Abstract
Swapping just small fractions of the configuration of an FPGA can be very beneficial in many applications. This is particularly useful for reconfiguring the instruction set of embedded soft core processors, which is highly relevant for software-driven design flows. Here, the system is initially implemented as far as possible in software (which is faster to accomplish than hardware development). By profiling the application, hot spots are identified and kernels are implemented on the FPGA for acceleration until the performance requirements are met. There are several methodologies to integrate such accelerator modules, ranging from small CPU instruction set extensions to large and fully autonomous modules that work concurrently with the CPU. In this chapter, we will investigate how CPU instruction set extensions can be used efficiently with the help of partial runtime reconfiguration. The basic idea of extending a CPU with exchangeable instructions is sketched in Fig. 5.1.
Keywords
- Partial Run-time Reconfiguration
- Soft-core Processor
- Reconfigurable Instruction
- Reconfigurable Region
- Resource Slots
Swapping just small fractions of the configuration of an FPGA can be very beneficial in many applications. This is particularly useful for reconfiguring the instruction set of embedded soft core processors, which is highly relevant for software-driven design flows. Here, the system is initially implemented as far as possible in software (which is faster to accomplish than hardware development). By profiling the application, hot spots are identified and kernels are implemented on the FPGA for acceleration until the performance requirements are met. There are several methodologies to integrate such accelerator modules, ranging from small CPU instruction set extensions to large and fully autonomous modules that work concurrently with the CPU. In this chapter, we will investigate how CPU instruction set extensions can be used efficiently with the help of partial runtime reconfiguration. The basic idea of extending a CPU with exchangeable instructions is sketched in Fig. 5.1. Custom instructions access the register file in the same way as the ALU. By decoding unused instructions in the CPU ISA (instruction set architecture), a multiplexer may select between normal ALU operation and one or more user-defined instructions. Soft core CPUs with statically implemented custom instructions are well supported. For example, the NIOS-II CPU from Altera can be easily extended with custom instructions when using the SOPC builder wizard of the Quartus design tools. Similarly, Xilinx allows adding custom hardware to its Microblaze soft core CPU using FSL ports, which basically provide a streaming interface between the Microblaze core and the custom hardware. However, support for implementing runtime reconfigurable custom instructions is weak, so this powerful opportunity is usually left unused.
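The instruction multiplexer described above can be illustrated with a small behavioral model. This is only a sketch under stated assumptions: the opcode values, the slots dictionary, and the two example instructions (byte_swap and parity64) are hypothetical and do not come from the chapter. Replacing an entry of the dictionary stands in for partially reconfiguring a custom instruction.

```python
MASK32 = 0xFFFFFFFF

# Hypothetical opcode values; a real ISA would use its own unused encodings.
OP_ADD, OP_XOR = 0, 1
OP_CUSTOM_BASE = 0x10  # first opcode assumed to be unused by the base ISA


def alu(opcode, op_a, op_b):
    """Fixed ALU operations of the base instruction set."""
    if opcode == OP_ADD:
        return (op_a + op_b) & MASK32
    if opcode == OP_XOR:
        return op_a ^ op_b
    raise ValueError(f"unsupported ALU opcode {opcode}")


def execute(opcode, op_a, op_b, slots):
    """Select between the ALU result and a custom instruction result.

    `slots` maps a slot index to the function currently configured there;
    replacing an entry stands in for partial reconfiguration.
    """
    if opcode >= OP_CUSTOM_BASE:
        instruction = slots.get(opcode - OP_CUSTOM_BASE)
        if instruction is None:
            raise RuntimeError("no instruction configured in this slot")
        return instruction(op_a, op_b) & MASK32
    return alu(opcode, op_a, op_b)


def byte_swap(op_a, _op_b):
    """Example custom instruction: a pure bit permutation (byte reversal)."""
    return int.from_bytes(op_a.to_bytes(4, "big"), "little")


def parity64(op_a, op_b):
    """Example custom instruction: XOR over all 64 operand bits."""
    return bin(op_a ^ op_b).count("1") & 1
```

Both example instructions read the same two operands as the ALU and return a single result word, which is why the same register file ports can be shared.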
In the following section, we will first demonstrate that commonly used techniques, like the Xilinx bus macro approach or the recent proxy logic technique, are not well suited for integrating custom instructions. After this, in Sect. 5.2, we will demonstrate for a reconfigurable soft core processor that instructions can be integrated into the system without causing any additional logic overhead for the communication. In Sect. 5.3, we reveal how such systems can be easily implemented with the tool ReCoBus-Builder. Rather than providing reconfigurable islands, we will integrate multiple custom instructions in a slot-based fashion. Finally, in Sect. 5.4, an experimental evaluation of a system providing a MIPS CPU extended to support reconfigurable custom instructions will be presented.
5.1 On-FPGA Communication for Custom Instructions
One basic problem to be solved in the design of partially reconfigurable systems is to constrain the routing of the interface signals for a partial module during its physical implementation. As introduced in Chap. 2, there are several ways to accomplish this. However, when considering a custom instruction set extension, as shown in Fig. 5.1, we have to take into account that a relatively large wire count is required for connecting a relatively small reconfigurable area. For example, if we consider a bit permutation function, we have to connect 32 wires towards an island hosting the permutation accelerator and an additional 32 wires for the result. Note that this example would not demand any logic on the FPGA, as a permutation is basically just wiring. Still, compared to a software implementation of a permute function, we can easily save a hundred or more assembly instructions, even if the function is fully unrolled. Or, if we consider a 64-bit XOR gate (over both operands) to be hosted in the same reconfigurable island, it requires two times 32 wires for connecting the input operands. But also in this case, we need only 21 4-input LUTs (e.g., on a Xilinx Virtex-II FPGA) or 13 6-input LUTs (e.g., on a Xilinx Spartan-6 FPGA) for implementing the 64-bit XOR gate. Again, this instruction would save about a hundred instructions per call of the function. In other words, for some programs, we could gain a substantial speed-up by adding just a little additional logic. And by making this logic reconfigurable, we could host a virtually unlimited number of different accelerators for supporting various software tasks.
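The LUT counts quoted for the 64-bit XOR gate follow from a simple counting argument: every k-input LUT in a reduction tree merges up to k signals into one and therefore removes at most k - 1 signals, so reducing n signals to a single output takes at least ceil((n - 1) / (k - 1)) LUTs. The function name below is chosen for illustration only.

```python
def lut_tree_size(num_inputs, lut_inputs):
    """Number of k-input LUTs in a reduction tree over num_inputs signals.

    Each LUT replaces up to k signals by one, i.e. it removes at most k - 1
    signals; num_inputs - 1 signals have to be removed in total.
    """
    if num_inputs <= 1:
        return 0
    # Ceiling division without floating point arithmetic.
    return -(-(num_inputs - 1) // (lut_inputs - 1))


print(lut_tree_size(64, 4))  # 4-input LUTs (Virtex-II) -> 21
print(lut_tree_size(64, 6))  # 6-input LUTs (Spartan-6) -> 13
```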
When implementing such custom instructions with slice-based bus macros, as illustrated in Fig. 5.2a, two LUTs per signal wire are needed just to provide the accelerator connection. If we consider in total 100 wires for linking two 32-bit operands, a 32-bit result vector, and a few additional signals, the overhead is 200 LUTs. This is roughly 10× more than actually needed for the XOR gate! Moreover, the look-up tables constitute not only a logic overhead, but also a latency overhead, which is roughly 0.4 ns per LUT on a Virtex-II FPGA. Finally, adding LUTs for the communication can negatively impact the placement of both the static system and the partial modules. For example, a placed bus macro LUT interrupts carry chains, and it can further force a module to be spread over more area.
With the recent proxy-logic approach, the situation has improved, as shown in Fig. 5.2b. However, it still needs 100 LUTs for the communication. Again this is pure overhead in terms of resources and latency. And as explained in Sect. 2.4.5 on Page 71, the proxy logic approach is not well suited to implement systems with many different reconfigurable modules.
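The overhead figures of the two approaches can be put side by side with the zero logic overhead technique of Sect. 5.2. The sketch below uses the numbers stated in the text (two LUTs per wire for slice-based bus macros, one LUT per wire for proxy logic, roughly 0.4 ns per LUT on Virtex-II). It additionally assumes that each traversed LUT adds its delay once to the signal path; the dictionary keys and the function name are illustrative.

```python
# LUTs inserted into every interface signal by each integration style.
LUTS_PER_WIRE = {"bus_macro": 2, "proxy_logic": 1, "pr_link": 0}
LUT_DELAY_NS = 0.4  # approximate LUT delay on a Virtex-II FPGA


def interface_overhead(style, wires):
    """Return (LUTs spent on communication, added delay per signal in ns)."""
    luts_per_wire = LUTS_PER_WIRE[style]
    return luts_per_wire * wires, luts_per_wire * LUT_DELAY_NS


for style in LUTS_PER_WIRE:
    luts, delay_ns = interface_overhead(style, wires=100)
    print(f"{style}: {luts} LUTs, +{delay_ns} ns per signal")
```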
At this point, one might consider using static-only implementations instead, if custom instructions are that small. This is probably the better option for very few instructions. With a rising number of instructions, however, the CPU gets larger and consequently slower. When assuming the simplified diagram of a CPU datapath in Fig. 5.1a, the ALU contains a multiplexer for selecting between the different sets of instructions of the ALU (e.g., Boolean logic, simple arithmetic, shifter, etc.). This multiplexer is in the critical path and unlikely to be pipelined [Met04], and although an FPGA fabric is mainly based on multiplexers, it is poor at implementing wide input multiplexers (see also Sect. 2.6 on Page 104). If carefully applied, runtime reconfiguration allows integrating more instructions while providing higher performance than a static system. Note that this is in many cases still valid even when considering the configuration overhead. Moreover, partial reconfiguration adds flexibility to the system by allowing hardware accelerators to be integrated dynamically, as is known from the software world.
5.2 Zero Logic Overhead Integration
In this section, we will demonstrate how the Xilinx vendor tools can be used to integrate reconfigurable instructions without any logic overhead. As shown in Fig. 5.3a, we are only interested in binding the signals between the static system (the CPU) and the partial modules (the custom instructions) to a precisely defined wire of the fabric, called a “PR link” in the following. In order to occupy a wire segment (i.e., use a PR link), we need a path that will use this wire. In other words, there must be a primitive source (e.g., a LUT output) and a primitive destination (e.g., a LUT input) somewhere in our netlist with a requested connection from the source to the destination. This creates a path in our netlist, but we have still not constrained the routing. This is done by generating blocker macros that occupy a user-specified set of routing resources such that the Xilinx vendor router cannot use these wires in further implementation steps. The blocker concept is introduced in Sect. 3.2.4. Note that we cannot constrain the routing directly by saying “use wire x for signal y”. We are basically defining a wire allocation by stating “do not use wire z”. However, if we ensure by our allocation that there is only one possible path remaining, we can actually achieve our goal of binding a signal path to a wire.
5.2.1 Static System Constraints
With the knowledge of how to create a path and how to constrain this path to certain wire resources of the FPGA fabric, we can implement the static system. The static system contains the CPU and a reconfigurable region. In order to create paths into this region for connecting the operands OP_A and OP_B (see Fig. 5.1), we place a connection primitive into the reconfigurable region (PR region), as depicted in Fig. 5.3b. This primitive acts as a placeholder for the partial module and is the destination for the operand routing. Similarly, for creating a path for the result vector back to the CPU in the static part, we place a placeholder acting as the source for the path. Note that the same LUT primitive (or, to be more precise, a slice) might be used as a placeholder for multiple input and output signals at the same time. So far, this seems to be pretty much identical to the proxy logic approach. However, we will now add a blocker into the reconfigurable region that blocks all routing resources in this region, except the wires to be used as PR links. Note that the placement of the placeholders and the blocking is not random and has to support the intended PR link. If we now start the router, it will create the routing of the static system, including paths to and from the partial region that are routed using the requested PR links. There are two things to remember: (1) we have not added any logic overhead to the static system, and (2) we only blocked wire resources inside the reconfigurable region.
5.2.2 Partial Module Constraints
The partial module implementation (here the custom instructions) is very similar to what we did for the static system. However, all signal directions are now reversed: with respect to a custom instruction, the operands are now inputs and the result vector is an output. Consequently, we place a source placeholder as the start for the operands outside the reconfigurable region (i.e., in the static region). Correspondingly, we also add placeholders acting as the destinations for the result vector. Again, placeholders for inputs and outputs can share the same FPGA primitive, as shown in Fig. 5.3. We will now add a blocker around the partial module that congests all routing resources, except the ones needed to route the operands and results over the PR links. Here it is important that the blocker releases PR links that are compatible with the PR links used in the static design. Again, there are two things to remember: (1) we have added no logic overhead to the static system, and (2) we only blocked wires outside the reconfigurable region. Consequently, when loading a reconfigurable instruction into a reconfigurable island that was created as described for the static system in the last section, there will be no placeholder module visible. The placeholders are only temporarily required to create a path over the PR link.
5.2.3 Communication Binding by Wire Allocation
The zero logic overhead technique has to follow some rules. Again, by blocking, we can only select the set of wires that are allowed for routing (i.e., wire allocation), but this does not necessarily ensure a particular binding of a logical signal to a physical wire. However, the binding is achieved by allocating wires such that only one unique routing path can be used to reach the connection macro (see Fig. 5.4). As a consequence, not all wires within a CLB can be used at the same time to implement the routing between the static part and the partial part of a system. This is because, when multiple wires are routed from one configurable logic block (CLB) to another, wires must be allocated that cannot be swapped. A possible swapping of wires would allow the router to decide between more than one option for a PR link, which cannot be accepted. A situation of allocating swappable wire resources is shown in Fig. 5.4a. Here, the problem is that both allocated wires can be arbitrarily used to connect to both placeholders that are used for the data signals data[0] and data[1]. Consequently, the router has two possibilities to choose from and we cannot guarantee a signal binding to a specific PR link. However, by allocating a different wire set, we leave only one possible path per data signal and we achieve an exact binding to wires, as shown in Fig. 5.4b. Note that designing PR link paths requires deep knowledge about the FPGA routing fabric, including wire resources and possible switch matrix settings. This information is provided by Xilinx individually for each FPGA in a language called XDL [BKT11].
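The swap check can be stated as a small combinatorial test: a wire allocation binds the signals only if the router is left with exactly one way of assigning distinct allocated wires to the placeholder pins. The following sketch counts these assignments by backtracking. It is an abstraction and not the ReCoBus-Builder implementation; the wire names w0 to w2 and the reachability table are hypothetical stand-ins for the switch matrix information that would come from the device description.

```python
def count_bindings(reachable, allocated):
    """Count the ways distinct allocated wires can be assigned to all signals.

    reachable: signal name -> set of wires that can reach its placeholder pin
    allocated: set of wires that the blocker leaves free for routing
    """
    signals = sorted(reachable)

    def assign(index, used):
        if index == len(signals):
            return 1  # every signal has received its own wire
        candidates = (reachable[signals[index]] & allocated) - used
        return sum(assign(index + 1, used | {wire}) for wire in candidates)

    return assign(0, frozenset())


def is_bound(reachable, allocated):
    """True if the allocation leaves exactly one possible signal binding."""
    return count_bindings(reachable, allocated) == 1


# Hypothetical switch matrix: w0 and w1 reach both pins, w2 only data[1].
REACHABLE = {"data[0]": {"w0", "w1"}, "data[1]": {"w0", "w1", "w2"}}

print(is_bound(REACHABLE, {"w0", "w1"}))  # swappable wires, as in Fig. 5.4a
print(is_bound(REACHABLE, {"w0", "w2"}))  # one path per signal, as in Fig. 5.4b
```

An exhaustive count like this is only practical for the handful of wires that cross a single CLB boundary, which is the granularity at which the rule is applied.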
5.3 Implementing Reconfigurable Instructions with the ReCoBus-Builder
The ReCoBus-Builder was originally designed for implementing bus-based systems consisting of many small resource slots that are integrated with the help of macros, as revealed in Sect. 3.2. At this point, we focus only on macros implementing the connection bar architecture (Sect. 2.6.1). For implementing the zero logic overhead approach, we follow the original ReCoBus-Builder flow, perform resource budgeting, and define a floorplan that fulfills the resource requirements. Then, we create our communication architecture that will provide connection primitives in the static part of the system as well as in each resource slot. Let us consider the simple case of a connection bar that connects only a single resource slot. We would then basically generate a Xilinx bus macro for an island reconfiguration style. When following the default ReCoBus-Builder flow, we will generate two blocker macros, one for the static design and one for the reconfigurable modules. We will use these blockers for implementing the PR link approach shown in Fig. 5.3. As the blockers generated by the ReCoBus-Builder will not block the wires that are already used for the connection bar macro, the blocker will contain a tunnel for a PR link. The only things now missing are the placeholder primitives. These primitives are taken directly from the generated connection bar macro. Consequently, we can generate compatible placeholder/blocker pairs for both the static system and the partial modules. If we assume a connection bar with one internal wire in eastward direction and another wire in westward direction, the resulting primitives and blockers would match the example in Fig. 5.3. The ReCoBus-Builder has a wire database for each supported device. This is used by the tool to check whether a wire allocation can ensure PR links without possible swaps, as discussed in Sect. 5.2.3. With this approach, we can provide four double wire PR links per CLB on a Xilinx Virtex-II FPGA.
As a case study, we consider integrating up to five different instructions into the system at the same time. Instead of using five individual islands for hosting the instruction modules (as would be necessary when following the Xilinx PR flow), the system uses a more flexible approach with one reconfigurable area that is tiled into five resource slots, as depicted in Fig. 5.5. This has the advantage that modules of different sizes can be integrated more efficiently into the system by taking a variable number of slots. The communication architecture has to link the two operands to each slot and the result vector back individually for each slot to an instruction multiplexer. By using different wire resources for the operands and the result vectors that route over different distances, both requirements can be properly implemented. By taking advantage of the regular FPGA fabric, the slots can be arranged completely identically, hence allowing free placement of instructions into the reconfigurable ALU. Figure 5.5 reveals a detail of the routing architecture of Xilinx Virtex-II FPGAs that was used to provide slots that are smaller than the routing distance of a wire. In the example, it is assumed that one resource slot is only one CLB wide and that the operands are routed using double lines that route two CLBs wide. However, by using a connection in the middle of the wire, which is provided by the routing fabric after a distance of one CLB, and by displacing the start points of the regular routing structure of the two operands by one CLB in horizontal direction, both operands can be accessed in any slot. This is possible by routing the signals in an interleaved manner. Note that it is also possible to route paths by cascading multiple different wires, which would allow widening the slots (in terms of CLB columns) and increasing the total number of slots for hosting modules (see Sect. 2.5.2 on Page 81 for more details).
The interleaving results in swapping the operands with respect to the placement position (odd or even start slot). However, for instructions that are not commutative, we can use two physical implementations in order to avoid the alignment multiplexing. See Sect. 2.5.3 on Page 92 for more details on interleaving.
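The effect of the interleaving on the operand order can be reproduced with a strongly simplified model. The assumptions here are not taken from the chapter: the double lines carrying operand A start in even CLB columns, those carrying operand B start in odd columns, and a line starting in column c offers a middle tap in column c + 1 and an end tap in column c + 2. The tap names, function names, and variant labels are illustrative.

```python
def operand_at_tap(slot, tap):
    """Operand visible at a tap ('mid' or 'end') of a module placed in `slot`.

    Assumed model: OP_A double lines start in even columns, OP_B double
    lines start in odd columns; a line starting in column c has its middle
    tap in column c + 1 and its end tap in column c + 2.
    """
    offsets = {"mid": 1, "end": 2}
    start_column = slot - offsets[tap]
    return "OP_A" if start_column % 2 == 0 else "OP_B"


def implementation_variant(slot):
    """Physical implementation to load for a non-commutative instruction."""
    return "A_on_end" if operand_at_tap(slot, "end") == "OP_A" else "A_on_mid"


for slot in range(5):
    print(slot, operand_at_tap(slot, "mid"), operand_at_tap(slot, "end"),
          implementation_variant(slot))
```

In this model every slot sees both operands, but their tap positions alternate between neighboring slots, which is why a non-commutative instruction needs either an alignment multiplexer or one physical implementation per parity.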
5.4 Case Study on Custom Instructions
The case study has been implemented with the ReCoBus-Builder on a Xilinx Virtex-II XC2V500-5 FPGA. The tool generates regularly structured macros together with the surrounding blocker macros that constrain the routing. The implementation directly follows the methodology revealed in Sect. 5.2. The communication macros provide the connection primitives and fix the wire resources. The ReCoBus-Builder generates all macros (including the blockers) in the Xilinx design language (XDL). While communication macros are instantiated using the HDL flow, the blockers are integrated into the design just before the final route step. A floorplanning view of the system is depicted in Fig. 5.6. The area reserved for hosting reconfigurable instructions is 8% of the total number of CLBs that are available on the device used. With five times 48 slices, the PR region provides roughly 15–20% of the amount of logic that would be required by an optimized 32-bit soft core processor, such as the Xilinx Microblaze. For the experiments, we used our own MIPS processor implementation that has not been optimized for speed or area, but which can be easily adapted to include reconfigurable instructions.
5.4.1 Static System Implementation
During implementation of the static system, connection primitives that are placed inside the reconfigurable region and that are surrounded with blocker macros have been used to constrain all signals required to integrate the instructions. A screenshot of the static system is shown in Fig. 5.7. The number of wires that are connected from the static part of the system to the PR region is 2 × 32 for the operands plus an additional eight wires for control signals. In the reverse direction, each of the five slots delivers a 32-bit result plus four additional flags. This results in a total of \(64 + 8 + 5 \times (4 + 32) = 252\) wires.
According to the partial design flow provided by Xilinx, the number of operand bits and control signals has to be multiplied by the number of slots, as that flow does not consider multicast routing to multiple slots without additional connection primitives. The slice-based macro approach would then cost \(2 \times 5 \times (72 + 36) = 1,080\) LUTs for the communication alone. This is 18% of the available LUTs on the target device and roughly one third of the logic a fully featured 32-bit Microblaze soft core processor would take. Even using the new flow that is based on proxy logic would still result in a considerable unnecessary overhead.
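Both totals can be recomputed from the interface parameters given in the text. The sketch below does this and also checks the quoted LUT share; the device capacity of 6,144 LUTs for the XC2V500 (3,072 slices with two LUTs each) is an assumption taken from the device data sheet, not from this chapter, and all constant and function names are illustrative.

```python
OPERAND_WIRES = 2 * 32  # OP_A and OP_B, shared by all slots (multicast)
CONTROL_WIRES = 8
RESULT_WIRES = 32       # per slot
FLAG_WIRES = 4          # per slot
SLOTS = 5
DEVICE_LUTS = 6144      # assumed XC2V500 capacity (3,072 slices x 2 LUTs)


def pr_link_wires():
    """Wires crossing the PR region border when inputs are multicast."""
    return OPERAND_WIRES + CONTROL_WIRES + SLOTS * (RESULT_WIRES + FLAG_WIRES)


def bus_macro_luts(luts_per_wire=2):
    """LUTs for slice-based bus macros when every slot gets its own inputs."""
    wires_per_slot = OPERAND_WIRES + CONTROL_WIRES + RESULT_WIRES + FLAG_WIRES
    return luts_per_wire * SLOTS * wires_per_slot


print(pr_link_wires())                               # 252
print(bus_macro_luts())                              # 1080
print(round(100 * bus_macro_luts() / DEVICE_LUTS))   # 18
```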
When floorplanning a reconfigurable system, it is recommended to consider the underlying FPGA architecture. For example, Xilinx FPGAs are reconfigured column-wise, which should be taken into account by designing the slots vertically. This optimizes the reconfiguration time. A restriction derived from the full column reconfiguration scheme is that no distributed memory can be used directly above or below the PR region, as reconfiguration would corrupt the state of these primitives. When following this rule, partial reconfiguration can be carried out while the system continues to operate.
FPGAs provide carry chain logic, which is used for different kinds of arithmetic operations. On Xilinx FPGAs, the carry chains include four LUTs per CLB and the chains are arranged in upward direction. Consequently, we built the system such that exactly two times four operand signal bits and four bits of the result vector are connected in a CLB. Furthermore, the signal vector bits are connected bottom-up (LSBs at the bottom) to follow the carry chain. Without this physical port mapping, routing will get very congested for the modules. In [CPF09], a tool using a simulated annealing heuristic was used to place communication macros around a reconfigurable region that was also used for reconfigurable CPU extensions. Such tools have an excessive runtime as they require a place and route step for each annealing step. It can be assumed that the final result would be very similar to the rule-based port mapping proposed here, which needs only one place and route run.
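The rule-based port mapping amounts to a simple index calculation. As a sketch, assuming that row 0 denotes the bottom CLB of a slot and that LUT position 0 is the lowest LUT of the carry chain within a CLB (both numbering conventions are chosen for illustration), bit i of each vector is placed as follows:

```python
BITS_PER_CLB = 4  # carry chain LUTs per CLB, as stated in the text


def port_location(bit):
    """Return (clb_row, lut_position) for bit `bit` of a signal vector.

    Row 0 is assumed to be the bottom CLB of the slot, so the LSB sits at
    the bottom and higher bits follow the upward carry chain.
    """
    return divmod(bit, BITS_PER_CLB)


def clb_rows_needed(width):
    """Number of CLB rows that a vector of `width` bits occupies."""
    return -(-width // BITS_PER_CLB)


print(port_location(0))     # (0, 0): LSB in the bottom CLB
print(port_location(31))    # (7, 3): MSB of a 32-bit vector
print(clb_rows_needed(32))  # 8
```

Applying the same function to OP_A, OP_B, and the result vector places bit i of all three vectors in the same CLB, which matches the two times four operand bits and four result bits per CLB described above.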
5.4.2 Reconfigurable Instructions
For implementing the reconfigurable modules, the complete static system was substituted with a connection bar macro, as depicted in Fig. 5.8. This permits implementing reconfigurable modules in the absence of the static system. As can be seen in Fig. 5.8 for a CRC checksum function, a module is surrounded with a blocker macro that restricts the module to a strict bounding box. This design has no connections to external pins. The timing was constrained with the Xilinx TPSYNC parameter.
5.4.3 Results and Overhead Analysis
Swapping instructions takes a significant amount of time for writing the corresponding partial bitstream to the right target position. In addition, extra time might be required for computing a placement position or performing some bitstream manipulations. This extra time overhead is implementation dependent and not further considered in the following. However, due to the small size of the systems, most of this work could be precomputed offline (e.g., a table for the placement position). When deciding whether to use reconfigurable instructions, it is important to know the latency of the reconfiguration process (response time) and the time the processor would require when executing the instructions alternatively as simple software function calls. This determines the breakeven factor k: the system has to trigger a reconfigurable instruction at least k times before gaining a benefit in the total execution time of the system. Note that we use function calls and no traps, as traps are very specific to emulating CPU instructions in software and because traps have a small additional overhead that would not occur in the case of normal function calls. The configuration times and the execution times for software implementations of the custom instructions (determined in a simulator) are listed in Table 5.1.
The reconfiguration process is relatively slow and would consequently prevent using custom instructions in time-critical parts of the software (e.g., interrupts). However, this is not problematic, as critical software parts should typically not perform complex computations. The breakeven factor k is the number of possible invocations of a particular instruction during the time to configure this instruction. As can be seen, for complex operations, such as the CRC instruction, fewer than 300 calls of this reconfigurable instruction would pay off the configuration overhead; and even if an instruction can save only a few cycles, this can pay off after just a few thousand cycles. Considering that the saturation addition/subtraction module is used in an image processing application, it can be assumed that this function is very likely to be triggered a sufficient number of times. It must be mentioned that the listed values are theoretical and the breakeven points will probably be higher. This is because the configuration data transfer in our system is in conflict with the CPU (shared memory buses); and even having only a few KB of configuration data results in a burst affecting the CPU. However, reconfigurable instructions are still an interesting option for both saving FPGA resources and gaining performance.
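The breakeven factor can be written down directly from this definition. The sketch below counts how many software invocations fit into the configuration time; as an optional refinement that is not part of the definition above, the execution time of the hardware instruction can be subtracted from the time saved per call. The software and hardware execution times in the example are hypothetical placeholders and not values from Table 5.1; only the 0.6 ms single-slot configuration time is taken from the chapter.

```python
def breakeven_factor(config_time_ns, software_time_ns, hardware_time_ns=0):
    """Smallest number of calls after which reconfiguration pays off.

    With hardware_time_ns == 0 this is the number of software invocations
    that fit into the configuration time, rounded up.
    """
    saved_per_call_ns = software_time_ns - hardware_time_ns
    if saved_per_call_ns <= 0:
        raise ValueError("the custom instruction does not save any time")
    # Integer ceiling division keeps the result exact.
    return -(-config_time_ns // saved_per_call_ns)


# 0.6 ms single-slot configuration time; 5 us software time is hypothetical.
print(breakeven_factor(600_000, 5_000))
# Same case when a hypothetical 1 us hardware execution time is included.
print(breakeven_factor(600_000, 5_000, hardware_time_ns=1_000))
```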
The values in brackets denote the utilization within the occupied slots. Although the CRC logic would easily fit into one slot, an additional slot was required to fully route the module. The bitstream size covers only the partial module and no static parts. The reconfiguration time is mainly related to the number of slots that have to be written to the device. A single slot configuration is 11.6 KB on this device, which results in a configuration time of 0.6 ms when assuming a configuration speed of 20 MB/s. The latency was determined using the FPGA editor. The values are measured from the operand fetching pipeline register through the combinatorial path of the instruction and further towards the output of the instruction select multiplexer. The max value denotes the critical path delay and the average value denotes the mean delay over all paths.
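The quoted configuration time follows from dividing the bitstream size by the configuration speed. A minimal sketch, assuming decimal units (1 KB = 1,000 bytes and 1 MB = 1,000,000 bytes) and a bitstream size that scales linearly with the number of slots:

```python
SLOT_BITSTREAM_BYTES = 11.6e3  # 11.6 KB per slot, decimal units assumed
CONFIG_SPEED_BYTES_PER_S = 20e6  # 20 MB/s


def config_time_ms(slots):
    """Time in milliseconds to write the configuration of `slots` slots."""
    return slots * SLOT_BITSTREAM_BYTES / CONFIG_SPEED_BYTES_PER_S * 1e3


print(f"{config_time_ms(1):.2f} ms")  # about 0.58 ms, i.e. roughly 0.6 ms
print(f"{config_time_ms(2):.2f} ms")  # a module occupying two slots
```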
The examples point out that small FPGA areas are sufficient to include very valuable instructions into a CPU with the help of partial runtime reconfiguration. Despite the small slots, a high number of signals can be interfaced to partial modules.
References
Ahmadinia A, Bobda C, Ding J, Majer M, Teich J, Fekete S, van der Veen J (2005) A practical approach for circuit routing on dynamic reconfigurable devices. In: Proceedings of the 16th IEEE international workshop on rapid system prototyping (RSP), June 2005, pp 84–90
Athanas P, Bowen J, Dunham T, Patterson C, Rice J, Shelburne M, Suris J, Bucciero M, Graf J (2007) Wires on demand: run-time communication synthesis for reconfigurable computing. In: International conference on field programmable logic and applications (FPL), Aug 2007. IEEE, pp 513–516
Abel N (2011) Design and implementation of an object-oriented framework for dynamic partial reconfiguration. PhD thesis, University of Heidelberg, Heidelberg
Ahmadinia A, Bobda C, Majer M, Teich J, Fekete S, van der Veen J (2005) DyNoC: a dynamic infrastructure for communication in dynamically reconfigurable devices. In: Proceedings of the international conference on field-programmable logic and applications (FPL), Aug 2005, pp 153–158
Abound Logic, Inc (2009) The raptor family of FPGAs (V1.0)
Altera Corporation (2007) Avalon memory-mapped interface specification V3.3. Available online: www.altera.com/literature/manual/mnl_avalon_spec.pdf
Altera Inc (2009) Altera devices. Available online: www.altera.com/products/devices/dev-index.jsp
Athanas PM, Silverman HF (1993) Processor reconfiguration through instruction-set metamorphosis: compiler and architectures. IEEE Comput 26(3):11–18
Ashenden PJ (2008) The designer’s guide to VHDL, 3rd edn. Morgan Kaufmann
Asadi G-H, Tahoori MB (2005) Soft error mitigation for SRAM-based FPGAs. In: Proceedings of the 23rd IEEE VLSI test symposium (VTS), IEEE Computer Society, pp 207–212
Bieser C, Bahlinger M, Heinz M, Stops C, Müller-Glaser KD (2006) A novel partial bitstream merging methodology accelerating Xilinx Virtex-II FPGA based RP system setup. In: Proceedings of the international conference on field programmable logic and applications (FPL), pp 1–4
Brebner GJ, Diessel O (2001) Chip-based reconfigurable task management. In: Proceedings of the 11th international conference on field programmable logic and application (FPL), Springer, pp 182–191
Beckhoff C (2007) Entwurf und Implementierung von Hardwaremodulen zur Dekompression von FPGA-Konfigurationsdaten. In: Studienarbeit, Lehrstuhl für Hardware-Software-Co-Design. Universität Erlangen-Nürnberg, Erlangen
Braun L, Hübner M, Becker J, Perschke T, Schatz V, Bach S (2007) Circuit switched run-time adaptive network-on-chip for image processing applications. In: International conference on field programmable logic and applications (FPL), Aug 2007. IEEE, pp 688–691
Blodget B, James-Roxby P, Keller E, McMillan S, Sundararajan P (2003) A self-reconfiguring platform. In: Proceedings of international conference on field-programmable logic and applications (FPL), pp 565–574
Beckhoff C, Koch D, Torresen J (2010) Short-circuits on FPGAs caused by partial runtime reconfiguration. In: Proceedings of the international conference on field programmable logic and applications (FPL), Aug 2010, pp 596–601
Beckhoff C, Koch D, Torresen J (2011) The Xilinx Design Language (XDL): tutorial and use cases. In: Proceedings of the 6th international workshop on reconfigurable communication-centric systems-on-chip (ReCoSoC), pp 1–8
Becker T, Luk W, Cheung PYK (2007) Enhancing relocatability of partial bitstreams for run-time reconfiguration. In: Proceedings of the 15th annual IEEE symposium on Field-programmable Custom Computing Machines (FCCM), IEEE Computer Society, pp 35–44
Benini L, De Micheli G (2002) Networks on chips: a new SoC paradigm. Computer 35(1):70–78
Bieser C, Mueller-Glaser K-D (2006) Rapid prototyping design acceleration using a novel merging methodology for partial configuration streams of Xilinx Virtex-II FPGAs. In: Proceedings of the 17th IEEE international workshop on Rapid System Prototyping (RSP). IEEE Computer Society, pp 193–199
Bobda C, Majer M, Koch D, Ahmadinia A, Teich J (2004) A dynamic NoC approach for communication in reconfigurable devices. In: Proceedings of international conference on field-programmable logic and applications (FPL). Volume 3203 of lecture notes in computer science (LNCS). Springer, pp 1032–1036
Bruneel K, Abouelella FMMA, Stroobandt D (2009) TMAP: a reconfigurability-aware technology mapper. In: Jacquemod G, Luxey C, Damiano J-P (eds) Design, automation and test Europe: university booth (DATE), Apr 2009
Baumgarte V, May F, Nückel A, Vorbach M, Weinhardt M (2001) PACT XPP – A self-reconfigurable data processing architecture. In: Proceedings of the international conference on Engineering of Reconfigurable Systems and Algorithms (ERSA), June 2001, pp 64–70
Betz V, Rose J (1997) VPR: a new packing, placement and routing tool for FPGA research. In: Proceedings of the 7th international workshop on field-programmable logic and applications (FPL). Springer, pp 213–222
Brebner G (1997) The swappable logic unit: a paradigm for virtual hardware. In: Pocek KL, Arnold J (eds) IEEE symposium on FPGAs for custom computing machines (FPGA), Apr 1997, pp 77–86
Betz V, Rose J, Marquardt A (eds) (1999) Architecture and CAD for deep-submicron FPGAs. Kluwer, Norwell
Babb J, Tessier R (1993) Virtual wires: overcoming pin limitations in FPGA-based logic emulators. In: IEEE workshop on FPGAs for custom computing machines, pp 142–151
Babb J, Tessier R, Dahl M, Hanono S, Hoki D, Agarwal A (1997) Logic emulation with virtual wires. IEEE trans Comput Aided Design 16:609–626
Curt D, Kalara P, Leblanc R, Eck V, Trynosky S, Lindholm J, Bauer T, Blodget B, McMillan S, Philip J, Prasanna S, Keller E (2004) Reconfiguration of the programmable logic of an integrated circuit. WO Patent WO002004055986A3, issued 25, Nov 2004
Compton K, Li Z, Cooley J, Knol S, Hauck S (2002) Configuration relocation and defragmentation for run-time reconfigurable computing. IEEE Trans Very Large Scale Integ Syst 10:209–220
Claus C, Müller FH, Stechele W (2006) Combitgen: a new approach for creating partial bitstreams in Virtex-II Pro. In: Karl W, Becker J, Großpietsch K-E, Hochberger C, Maehle E (eds) Workshops proceedings of the 19th international conference on Architecture of Computing Systems (ARCS). Volume 81 of lecture notes in informatics. GI, pp 122–131
Claus C, Müller FH, Zeppenfeld J, Stechele W (2007) A new framework to accelerate Virtex-II Pro dynamic partial self-reconfiguration. In: Proceedings of the IEEE 21th International Parallel and Distributed Processing Symposium (IPDPS), Mar 2007, pp 1–7
Carver JM, Pittman RN, Forin A (2009) Automatic bus macro placement for partially reconfigurable FPGA designs. In: Proceeding of the ACM/SIGDA international symposium on Field Programmable Gate Arrays (FPGA). ACM, pp 269–272
Department of Computer Science 12, ReCoNets-Project website, University of Erlangen Nuremberg, Germany. www.reconets.de
Dehon A (1999) Balancing interconnect and computation in a reconfigurable computing array (or, why you don’t really want 100% LUT utilization. In: Proceedings of the international symposium on Field Programmable Gate Arrays (FPGA), pp 69–78
Dittmann F, Frank S (2007) Hard real-time reconfiguration port scheduling. In: Proceedings of the conference on design, automation and test in Europe (DATE). EDA Consortium, pp 123–128
Dijkstra EW (1959) A note on two problems in connexion with graphs. Numer Math 1:269–271
Demirsoy SS, Langhammer M (2009) Cholesky decomposition using fused datapath synthesis. In: Proceeding of the ACM/SIGDA international symposium on Field-Programmable Gate Arrays (FPGA). ACM, pp 241–244
Dally WJ, Towles B (2001) Route packets, not wires: on-chip interconnection networks. In: Proceedings of the 38th conference on design automation (DAC), pp 684–689
Elnozahy EN, Alvisi L, Wang Y-M, Johnson D (2002) A survey of rollback-recovery protocols in message-passing systems. ACM Comput Surv 34(3): 375–408
Altera Inc. (2007) Enhanced configuration devices (EPC4, EPC8 & EPC16) data sheet. www.altera.com/literature/hb/cfg/ch_14_vol_2.pdf
Can programmables break out of $3.6bn niche. ElectronicWeekly, 26.06.2009
Fidge C (1988) Timestamps in message-passing systems that preserve the partial ordering. In: Proceedings of the 11th Australian Computer Science Conference (ACSC), pp 56–66
Fekete SP, Kamphans T, Schweer N, Tessars C, van der Veen JC, Angermeier J, Koch D, Teich J (2008) No-break dynamic defragmentation of reconfigurable devices. In: FPL 2008, international conference on field programmable logic and applications, Heidelberg, pp 113–118
Flynn M (2005) Area – time – power and design effort: the basic tradeoffs in application specific systems. In: Proceedings of the IEEE international conference on Application-Specific Systems, Architecture Processors (ASAP). IEEE Computer Society, pp 3–6
Fekete SP, van der Veen J, Angermeier J, Göhringer D, Majer M, Teich J (2007) Scheduling and communication-aware mapping of HW/SW modules for dynamically and partially reconfigurable SoC architectures. In: Proceedings of the 20th international conference on Architecture of Computing Systems (ARCS), Zurich. VDE-Verlag, pp 151–160
Fekete S, van der Veen J, Majer M, Teich J (2006) Minimizing communication cost for reconfigurable slot modules. In: Proceedings of 16th international conference on field programmable logic and applications (FPL), Aug 2006, pp 535–540
Guccione S, Levi D, Sundararajan P (1999) JBits: Java based interface for reconfigurable computing. In: Proceedings of the 2nd annual military and aerospace applications of programmable devices and technologies (MAPLD)
Gao S, Schmidt A, Sass R (2009) Hardware implementation of mpi barrier on an fpga cluster. In: Proceedings of the international conference on field programmable logic and applications (FPL), pp 12–17
Gupta RK, Zorian Y (1997) Introducing core-based system design. IEEE Des Test 14(4):15–25
Halfhill TR (2007) Tabulas time machine – Rapidly reconfigurable chips will challenge conventional FPGAs. Microprocessor Report, Issue 032910. Available online: www.tabula.com/news/M11_Tabula_Reprint.pdf.
Hauck S (1998) Configuration prefetch for single context reconfigurable coprocessors. In: Proceedings of the sixth ACM/SIGDA international symposium on Field Programmable Gate Arrays (FPGA). ACM, pp 65–74
Heun V (2003) Grundlegende algorithmen. Vieweg
Hempel G, Hochberger C, Koch A (2010) A comparison of hardware acceleration interfaces in a customizable soft core processor. In: Proceedings of the international conference on field programmable logic and applications (FPL). IEEE Computer Society, pp 469–474
Hagemeyer J, Kettelhoit B, Koester M, Porrmann M (2007) A design methodology for communication infrastructures on partially reconfigurable FPGAs. In: International conference on field programmable logic and applications (FPL), Aug 2007. IEEE, pp 331–338
Hagemeyer J, Kettelhoit B, Koester M, Porrmann M (2007) Design of homogeneous communication infrastructures for partially reconfigurable FPGAs. In: Proceedings of the international conference on Engineering of Reconfigurable Systems and Algorithms (ERSA), Jun 2007
Hagemeyer J, Kettelhoit B, Porrmann M (2006) Dedicated module access in dynamically reconfigurable systems. In: Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS)
Haubelt C, Koch D, Teich J (2003) Basic OS support for distributed reconfigurable hardware. In: Proceedings of the third international workshop on systems, architectures, modeling, and simulation, July 2003, pp 18–22
Hansen SG, Koch D, Torresen J (2011) High speed partial run-time reconfiguration using enhanced ICAP hard macro. In: Proceedings of the 18th reconfigurable architectures workshop (RAW). IEEE
Horta EL, Lockwood JW, Kofuji ST (2002) Using PARBIT to implement partial run-time reconfigurable systems. In: Proceedings of the 12th international conference on field-programmable logic and applications (FPL). Springer, pp 182–191
Horta EL, Lockwood JW, Taylor DE, Parlour D (2002) Dynamic hardware plugins in an FPGA with partial run-time reconfiguration. In: Proceedings of the 39th conference on design automation (DAC). ACM, pp 343–348
Huang W-J, McCluskey EJ (2001) Column-based precompiled configuration techniques for FPGA. In: Proceedings of the the 9th annual IEEE symposium on field-programmable custom computing machines (FCCM). IEEE Computer Society, pp 137–146
Hübner M, Schuck C, Kühnle M, Becker J (2006) New 2-dimensional partial dynamic reconfiguration techniques for real-time adaptive microelectronic circuits. In: Proceedings of the IEEE computer society annual symposium on emerging VLSI technologies and architectures (ISVLSI), p 97
Huffman DA (1952) A method for the construction of minimum-redundancy codes. Proc IRE 40(9):1098–1101
Hauck S, Wilson WD (1999) Runlength compression techniques for FPGA configurations. In: Proceedings of the seventh annual IEEE symposium on field-programmable custom computing machines (FCCM99). IEEE Computer Society, pp 286–287
Xilinx Inc (2002) Two flows for partial reconfiguration: module based or difference based. Available online: www.xilinx.com/bvdocs/appnotes/xapp290.pdf
Atmel Inc (2003) AT40K series configuration user guide. Available online: http://www.atmel.com/dyn/resources/prod_documents/DOC1009.PDF
Xilinx Inc (2005) Using look-up tables as shift registers (SRL16) in Spartan-3 generation FPGAs. Available online: www.xilinx.com/bvdocs/appnotes/xapp465.pdf
Xilinx Inc (2005) Virtex-II platform FPGAs: complete data sheet. Available online: http://www.xilinx.com/support/documentation/data_sheets/ds031.pdf
Xilinx Inc (2007) Xilinx Virtex-II platform FPGA user guide. Available online: http://www.xilinx.com/support/documentation/user_guides/ug002.pdf
Altera Inc (2008) Logic array blocks and adaptive logic modules in Stratix IV devices. Available online: http://www.altera.com/literature/hb/stratix-iv/
Xilinx Inc (2008) CLB white papers. Available online: http://www.xilinx.com/support/documentation/wpfpgafeaturedesign_clb.htm/
Xilinx Inc (2008) PlanAhead design analysis tool. Available online: http://www.xilinx.com/ise/optional_prod/planahead.htm
International Business Machines corporation (IBM) (1999) The CoreConnect bus architecture. Available online: http://www-03.ibm.com/chips/products/coreconnect/
Jones G, Sheeran M (1990) Circuit design in Ruby. In Staunstrup, J. (ed) Formal Methods for VLSI Design. North-Holland
Jerraya AA, Wolf W (2005) Hardware/software interface codesign for embedded systems. Computer 38(2):63–69
Koch D, Bobda C, Ahmadinia A, Teich J (2007) FPGA architecture extensions for preemptive multitasking and hardware defragmentation. In: Proceedings of International Conference on Field-Programmable Technology 2007 (ICFPT ’07), Dec 2007. IEEE, pp 433–436
Koch D, Beckhoff C, Teich J (2007) Bitstream decompression for high speed FPGA configuration from slow memories. In: Proceedings of international conference on field-programmable technology (ICFPT), Dec 2007. IEEE, pp 161–168
Koch D, Beckhoff C, Teich J (2008) ReCoBus-builder – A novel tool and technique to build statically and dynamically reconfigurable systems for FPGAs. In: Proceedings of international conference on field-programmable logic and applications (FPL 08), Sept 2008, pp 119–124
Koch D, Beckhoff C, Teich J (2009) A communication architecture for complex runtime reconfigurable systems and its implementation on Spartan-3 FPGAs. In: Proceedings of the 17th ACM/SIGDA international symposium on field-programmable gate arrays (FPGA 2009), Feb 2009. ACM, pp 233–236
Koch D, Beckhoff C, Teich J (2009) Hardware decompression techniques for FPGA-based embedded systems. ACM Trans Reconfig Technol Syst 2:9:1–9:23
Koch D, Beckhoff C, Teich J (2009) Minimizing internal fragmentation by fine-grained two-dimensional module placement for runtime reconfigurable systems. In: 17th annual IEEE symposium on field-programmable custom computing machines (FCCM 2009), Apr 2009. IEEE Computer Society, pp 251–254
Koch D, Beckhoff C, Torresen J (2010) Obstacle-free two-dimensional online-routing for run-time reconfigurable FPGA-based systems. In: Proceedings of international conference on field-programmable technology (ICFPT1́0). IEEE, pp 208–215
Koh S, Diessel O (2006) COMMA: a communications methodology for dynamic module-based reconfiguration of FPGAs. In: Workshops proceedings of the 19th international conference on architecture of computing systems (ARCS), Mar 2006, pp 173–182
Koh S, Diessel O (2006) COMMA: a communications methodology for dynamic module reconfiguration in FPGAs. In: Proceedings of the 14th annual IEEE symposium on field-programmable custom computing machines (FCCM). IEEE Computer Society, pp 273–274
Krasteva YE, de la Torre E, Riesgo T, Joly D (2006) Virtex II FPGA bitstream manipulation: application to reconfiguration control systems. In: Proceedings of the 16th international conference on field programmable logic and applications (FPL). IEEE, pp 1–8
Koch D, Haubelt C, Streichert T, Teich J (2007) Modeling and synthesis of hardware-software morphing. In: Proceedings of the international symposium on circuits and systems (ISCAS 2007), May 2007, pp 2746–2749
Koch D, Haubelt C, Teich J (2007) Efficient hardware checkpointing – Concepts, overhead analysis, and implementation. In: Proceedings of the 15th ACM/SIGDA international symposium on field-programmable gate arrays (FPGA 2007), Feb 2007. ACM, pp 188–196
Koch D, Haubelt C, Teich J (2008) Efficient reconfigurable on-chip buses for FPGAs. In: 16th annual IEEE symposium on field-programmable custom computing machines (FCCM 2008), Apr 2008. IEEE Computer Society, pp 287–290
Krasteva YE, Jimeno AB, de la Torre E, Riesgo T (2005) Straight method for reallocation of complex cores by dynamic reconfiguration in Virtex II FPGAs. In: Proceedings of the 16th IEEE international workshop on rapid system prototyping (RSP), pp 77–83
Kissler D, Kupriyanov A, Hannig F, Koch D, Teich J (2006) A generic framework for rapid prototyping of system-on-chip designs. In: Proceedings of international conference on computer design (CDES), June 2006, pp 189–195
Koch D, Körber M, Teich J (2006) Searching RC5-keys with distributed reconfigurable computing. In: Proceedings of international conference on engineering of reconfigurable systems and algorithms (ERSA 2006), June 2006. CSREA Press, pp 42–48
Kalte H, Lee G, Porrmann M, Rückert U (2005) REPLICA: a bitstream manipulation filter for module relocation in partial reconfigurable systems. In: Proceedings of the 19th International Parallel and Distributed Processing Symposium – Reconfigurable architectures workshop (IPDPS). IEEE Computer Society
Kapre N, Mehta N, Delorimier M, Rubin R, Barnor H, Wilson MJ, Wrighton M, Dehon A (2006) Packet-switched vs. time-multiplexed FPGA overlay networks. In: Proceedings of the IEEE symposium on Field-programmable Custom Computing Machines (FCCM). IEEE, pp 205–216
Kotolly A (2006) The economics of structured- and standard-cell-ASIC designs. Electronic News, 16.03.2006
Kalte H, Porrmann M (2005) Context saving and restoring for multitasking in reconfigurable systems. In: Proceedings of the 15th international conference on field programmable logic and applications (FPL), Aug 2005, pp 223–228
Kalte H, Porrmann M (2006) REPLICA2Pro: task relocation by bitstream manipulation in Virtex-II/Pro FPGAs. In: Proceedings of the 3rd conference on computing frontiers (CF). ACM, pp 403–412
Kalte H, Porrmann M, Rückert U (2002) A prototyping platform for dynamically reconfigurable system on chip designs. In: Proceedings of the IEEE workshop heterogeneous reconfigurable systems on chip (SoC)
Kalte H, Porrmann M, Rückert U (2004) Study on column wise design compaction for reconfigurable systems. In: Proceedings of the IEEE international conference on field programmable technology (FPT), Dec 2004
Kalte H, Porrmann M, Rückert U (2004) System-on-programmable-chip approach enabling online fine-grained 1D-placement. In: Proceedings of the 11th reconfigurable architectures workshop (RAW), pp 141–146
Kuon I, Rose J (2007) Measuring the gap between FPGAs and ASICs. Tran Comput-Aided Des Integr Circ Syst 26(2):203–215
Koch D, Reimann F, Streichert T, Haubelt C, Teich J (2010) ReCoNets – design methodology for embedded systems consisting of small networks of reconfigurable nodes and connections. In: Platzner M, Teich J, Wehn N (eds) Dynamically reconfigurable systems. Springer, pp 223–244
Koch D, Streichert T, Haubelt C, Teich J (2008) Logic chip, logic system and method for designing a logic chip. Patent PCT/EP2008/007342
Koch D, Streichert T, Haubelt C, Teich J (2008) Logic chip, method and computer program for providing a configuration information for a configurable logic chip. Patent PCT/EP2008/007343
Kissler D, Strawetz A, Hannig F, Teich J (2009) Power-efficient reconfiguration control in coarse-grained dynamically reconfigurable architectures. In: Proceedings of the 18th international workshop on power and timing modeling, optimization, and simulation (PATMOS), Sept 2009. Lecture notes in computer science (LNCS, vol 5349). Springer, pp 307–317
Koch D, Teich J (2004) Platform-independent methodology for partial reconfiguration. In: Proceedings of the 1st conference on computing frontiers (CF’04). ACM, pp 398–403
Koch D, Torresen J (2010) Routing optimizations for component-based system design and partial run-time reconfiguration on FPGAs. In: Proceedings of international conference on field-programmable technology (ICFPT). IEEE, pp 460–464
Koch D, Torresen J (2011) A routing architecture for mapping dataflow graphs at run-time. In: Proceedings of international conference on field-programmable logic and applications (FPL 11), Sept 2011, pp 286–290
Koch D, Torresen J (2011) FPGASort: a high performance sorting architecture exploiting run-time reconfiguration on FPGAs for large problem sorting. In: Proceedings of the 19th ACM/SIGDA international symposium on field-programmable gate arrays (FPGA 2011), Feb 2011. ACM, pp 45–54
Kachris C, Vassiliadis S (2006) Performance evaluation of an adaptive FPGA for network applications. In: Proceedings of the seventeenth IEEE international workshop on rapid system prototyping (RSP). IEEE Computer Society, pp 54–62
Lewis D, Ahmed E, Baeckler G, Betz V, Bourgeault M, Galloway D, Hutton M, Lane C, Lee A, Leventis P, Mcclintock C, Padalia K, Pedersen B, Powell G, Ratchev B, Reddy S, Schleicher J, Stevens K, Yuan R, Cliff R, Rose J (2005) The Stratix II logic and routing architecture. In: Proceedings of the ACM/SIGDA 13th international symposium on field-programmable gate arrays (FPGA). ACM, pp 14–20
Lysaght P, Blodget B, Mason J, Young J, Bridgford B (2006) Invited paper: enhanced architecture, design methodologies and CAD tools for dynamic reconfiguration of Xilinx FPGAs. In: Proceedings of the 16th international conference on field programmable logic and application (FPL), Aug 2006, pp 1–6
Lu R, Cao A, Koh C-K (2007) SAMBA-Bus: a high performance bus architecture for system-on-chips. IEEE Trans VLSI Syst 15(1):69–79
Li Z, Hauck S (1999) Don’t care discovery for FPGA configuration compression. In: Proceedings of the 7th ACM/SIGDA international symposium on field programmable gate arrays (FPGA). ACM, pp 91–98
Li Z, Hauck S (2001) Configuration compression for Virtex FPGAs. In: Proceedings of the 9th annual IEEE symposium on field-programmable custom computing machines (FCCM). IEEE Computer Society, pp 147–159
Lange H, Koch A (2010) Architectures and execution models for hardware/software compilation and their system-level realization. IEEE Trans Comput 59:1363–1377
Lemieux G, Lee E, Tom M, Yu A (2004) Directional and single-driver wires in FPGA interconnect. In: IEEE international conference on field-programmable technology (FPT)
Lockwood JW, Naufel N, Turner JS, Taylor DE (2001) Reprogrammable network packet processing on the field programmable port extender (FPX). In: Proceedings of the 1998 ACM/SIGDA sixth international symposium on field programmable gate arrays (FPGA). ACM, pp 87–93
Lübbers E, Platzner M (2008) Communication and synchronization in multithreaded reconfigurable computing systems. In: Proceedings of the international conference on engineering of reconfigurable systems and algorithms (ERSA), Jun 2008, pp 83–89
Luk W, Shirazi N, Cheung PYK (1996) Modelling and optimising run-time reconfigurable systems. In: IEEE symposium on FPGAs for custom computing machines. IEEE Computer Society Press, pp 167–176
Luk W, Shirazi N, Cheung PYK (1997) Compilation tools for run-time reconfigurable designs. In: Proceedings of the 5th IEEE symposium on FPGA-based custom computing machines (FCCM). IEEE Computer Society, pp 56–65
Advanced RISC Machines Ltd (2007) AMBA system architecture. Available online: http://www.arm.com/products/solutions/AMBAHomePage.html
Landaker WJ, Wirthlin MJ, Hutchings B (2002) Multitasking hardware on the SLAAC1-V reconfigurable computing system. In: Proceedings of the 12th international conference on field-programmable logic and applications (FPL), pp 806–815
Bourgeault M (2011) Alteras partial reconfiguration flow. Available online: http://www.eecg.utoronto.ca/jayar/FPGAseminar/FPGA_Bourgeault_June23_2011.pdf
Marescaux T, Bartic A, Verkest D, Vernalde S, Lauwereins R (2002) Interconnection networks enable fine-grain dynamic multi-tasking on FPGAs. In: Proceedings of the 12th international conference on field-programmable logic and applications (FPL), pp 795–805
Mcmurchie L, Ebeling C (1995) Pathfinder: a negotiation-based performance-driven router for FPGAs. In: Proceedings of the 1995 ACM third international symposium on field-programmable gate arrays (FPGA), pp 111–117
Metzgen P (2004) A high performance 32-bit ALU for programmable logic. In: Proceedings of the 2004 ACM/SIGDA 12th international symposium on field programmable gate arrays (FPGA). ACM, pp 61–70
Mesquita D, Moraes F, Palma J, Möller L, Calazans N (2003) Remote and partial reconfiguration of FPGAs: tools and trends. In: Proceedings of the 17th international symposium on parallel and distributed processing (IPDPS). IEEE Computer Society, p 177.1
Mignolet J-Y, Nollet V, Coene P, Verkest D, Vernalde S, Lauwereins R (2003) Infrastructure for design and management of relocatable tasks in a heterogeneous reconfigurable system-on-chip. In: Proceedings of the conference on design, automation and test in Europe (DATE). IEEE Computer Society, pp 986–991
Marescaux T, Nollet V, Mignolet J-Y, Bartic A, Moffat W, Avasare P, Coene P, Verkest D, Vernalde S, Lauwereins R (2004) Run-time support for heterogeneous multitasking on reconfigurable SoCs. Integr VLSI J 38(1):107–130
Mak TST, Sedcole NP, Cheung PYK, Luk W (2006) On-FPGA communication architectures and design factors. In: Proceedings of the 16th international conference on field programmable logic and applications (FPL). IEEE, pp 1–8
Mak T, Sedcole P, Cheung PYK, Luk W (2008) Wave-pipelined signalling for On-FPGA communication. In: IEEE international conference on field-programmable technology (FPT)
Majer M, Teich J, Ahmadinia A, Bobda C (2007) The erlangen slot machine: a dynamically reconfigurable FPGA-based computer. J VLSI Sig Process Syst 47(1):15–31
Oliver TF, Maskell DL (2007) Prerouted FPGA cores for rapid system construction in a dynamic reconfigurable system. EURASIP J Embed Syst 2007(1):7 [ope] opencores http://www.opencores.org
Palma JC, de Mello AV, Möller L, Moraes F, Calazans N (2002) Core communication interface for FPGAs. In: Proceedings of the 15th symposium on integrated circuits and systems design (SBCCI), p 183
Pan JH, Mitra T, Wong W-F (2004) Configuration bitstream compression for dynamically reconfigurable FPGAs. In: Proceedings of the IEEE/ACM international conference on computer-aided design (ICCAD), pp 766–773
Salomon D (2004) Data compression – The complete reference. Springer
Suris J, Athanas P, Patterson C (2008) An efficient run-time router for connecting modules in FPGAs. In: Proceedings of international conference on field-programmable logic and applications (FPL), Sept 2008, pp 125–130
Sedcole P, Blodget B, Becker T, Anderson J, Lysaght P (2006) Modular dynamic reconfiguration in Virtex FPGAs. IEE 153(3):157–164
Sethuraman B, Bhattacharya P, Khan J, Vemuri R (2005) LiPaR: a light-weight parallel router for FPGA-based networks-on-chip. In: Proceedings of the 15th ACM great lakes symposium on VLSI (GLSVSLI), New York. ACM, pp 452–457
Sedcole P, Cheung PYK, Constantinides GA, Luk W (2004) A structured methodology for system-on-an-FPGA design. In: 14th international conference on field programmable logic and application (FPL), Aug 2004. Volume 3203 of lecture notes in computer science, pp 1047–1051
Silvia M, Ferreira JC (2008) Generation of partial FPGA configurations at run-time. In: Proceedings of international conference on field-programmable logic and applications (FPL), Sept 2008, pp 367–372
Sunwoo J, Garimella S, Stroud C (2005) On embedded processor reconfiguration of logic BIST for FPGA cores in SoCs. In: Proceedings of the 14th IEEE North Atlantic test workshop (NATW), May 2005
SilicoreTM (2002) WISHBONE System-on-Chip (SoC)interconnection architecture for portable IP cores Rev. B.3. Available online: http://www.opencores.org/projects.cgi/web/wishbone/wbspec_b3.pdf
Streichert T, Koch D, Haubelt C, Teich J (2006) Modeling and design of fault-tolerant and self-adaptive reconfigurable networked embedded systems. EURASIP J Embed Syst 2006:1–15. Article ID 42168, doi:10.1155/ES/2006/42168
Simmler H, Levinson L, Manner R (2000) Multitasking on FPGA coprocessors. In: Proceedings of the 10rd international conference on field programmable logic and application (FPL), pp 121–130
Stitt G, Lysecky R, Vahid F (2003) Dynamic hardware/software partitioning: a first approach. In: Proceedings of the 40th conference on design automation (DAC). ACM, pp 250–255
Schallenberg A, Nebel W, Herrholz A, Hartmann P, Grüttner K, Oppenheimer F (2010) POLYDYN – object-oriented modelling and synthesis targeting dynamically reconfigurable FPGAs. In: Platzner M, Teich J, Wehn N (eds) Dynamically reconfigurable systems. Springer, pp 139–158
The Spirit Consortium (2007) http://www.spiritconsortium.org/
Saldana M, Patel A, Liu HJ, Chow P (2012) Using partial reconfiguration and message passing to enable FPGA-based generic computing platforms. EURASIP J Embed Syst 2012(127302):10
Storer JA, Szymanski TG (1982) Data compression via textual substitution. J ACM 29(4):928–951
Streichert T, Strengert C, Koch D, Haubelt C, Teich J (2007) Communication aware optimization of the task binding in hardware/software reconfigurable networks. Int J Circ Syst 2(1):29–36
Scalera SM, Vázquez JR (1998) The design and implementation of a context switching FPGA. In: Proceedings of the IEEE symposium on FPGAs for custom computing machines (FCCM). IEEE Computer Society, p 78
Snider GS, Williams S (2007) Nano/CMOS architectures using a field-programmable nanowire interconnect. Nanotechnology 18(3):1–11
Sim JE, Wong W-F, Teich J (2009) Optimal placement-aware trace-based scheduling of hardware reconfigurations for FPGA accelerators. In: Proceedings of the 17th annual IEEE symposium on field-programmable custom computing machines (FCCM), Apr 2009. IEEE Computer Society (to appear)
Synopsys Inc (2008) DesignWare GTECH library. Available online: www.synopsys.com/dw/doc.php/doc/dwf/manuals/dw_gtech.pdf
Trimberger S, Carberry D, Johnson A, Wong J (1997) A time-multiplexed FPGA. In: Proceedings of the 5th IEEE symposium on FPGA-based custom computing machines (FCCM). IEEE Computer Society, pp 22–28
Teich J (2007) Lecture on reconfigurable computing: Chapter 1. inroduction, University of Erlangen-Nuremberg. Available online: http://www12.informatik.uni-erlangen.de/edu/rc/slides/1_RC_introduction.pdf
Tessier R (1998) Negotiated A* routing for FPGAs. In: Proceedings of the fifth canadian workshop on field-programmable devices (FPD)
Tiwari A, Tomko KA (2003) Scan-chain based watch-points for efficient run-time debugging and verification of FPGA designs. In: Proceedings of the conference on Asia South Pacific design automation (ASPDAC). ACM, pp 705–711
Virtual Socket Interface Alliance (2007) Legacy documents of the VSI alliance. Available online: http://vsi.org/
Vanmeerbeeck G, Schaumont P, Vernalde S, Engels M, Bolsens I (2001) Hardware/software partitioning of embedded system in OCAPI-xl. In: Proceedings of the ninth international symposium on hardware/software codesign. ACM, pp 30–35
Wazlowski M, Agarwal L, Lee T, Smith A, Lam E, Athanas P, Silverman H, Ghosh S (1993) PRISM-II compiler and architecture. In: Buell DA, Pocek KL (eds) IEEE workshop on FPGAs for custom computing machines (FCCM). IEEE Computer Society Press, pp 9–16
Wheeler T, Graham P, Nelson BE, Hutchings B (2001) Using design-level scan to improve FPGA design observability and controllability for functional verification. In: Proceedings of the 11th international conference on field programmable logic and application (FPL), pp 483–492
Wirthlin M, Hutchings BL (1995) DISC: the dynamic instruction set computer. In: Schewel J (ed) Proceedings on Field Programmable Gate Arrays (FPGAs) for fast board development and reconfigurable computing (SPIE) 2607. SPIE – The International Society for Optical Engineering, pp 92–103
Wirthlin MJ, Hutchings BL (1996) Sequencing run-time reconfigured hardware with software. In: Proceedings of the 1996 ACM fourth international symposium on field-programmable gate arrays (FPGA). ACM, pp 122–128
Wang F, Jean J (2006) Architectural support for runtime 2D partial reconfiguration. In: Proceedings of international conference on engineering of reconfigurable systems and algorithms (ERSA), June 2006. CSREA Press, pp 231–232
Wetekam G, Lutz B(2005) Hardware-Implementierung einer 3D-Huffman-Decodierung für dynamische Volumendaten. In: Hardware for visual computing workshop, Universität Tübingen, University of Tübingen
Walder H, Platzner M (2002) Non-preemptive multitasking on FPGA: task placement and footprint transform. In: Proceedings of the international conference on engineering of reconfigurable systems and algorithms (ERSA), June 2002. CSREA Press, pp 24–30
Walder H, Platzner M (2004) A runtime environment for reconfigurable operating systems. In: Proceedings of the 14th international conference on field programmable logic and application (FPL), pp 831–835
Xilinx Inc (2007) Platform flash in-system programmable configuration PROMs. Available online: www.xilinx.com/support/documentation/data_sheets/ds123.pdf
Xilinx Inc (2007) Xilinx: silicon devices. Available online: www.xilinx.com/products/silicon_solutions/
Xilinx Inc (2011) Partial reconfiguration user guide (Rel 13.2). Available online: www.xilinx.com/support/documentation/sw_manuals/xilinx13_2/ug702.pdf
Xu W, Ramanarayanan R, Tessier R (2003) Adaptive fault recovery for networked reconfigurable systems. In: Proceedings of the 11th annual IEEE symposium on field-programmable custom computing machines (FCCM). IEEE Computer Society, p 143
eferino CA, Kreutz ME, Susin AA (2004) RASoC: a router soft-core for networks-on-chip. In: Proceedings of the conference on design, automation and test in Europe (DATE). IEEE Computer Society, p 30198
Ziv J, Lempel A (1977) A universal algorithm for sequential data compression. IEEE Trans Inform Theory 23(3):337–343
Copyright information
© 2013 Springer Science+Business Media New York
About this chapter
Cite this chapter
Koch, D. (2013). Reconfigurable CPU Instruction Set Extensions. In: Partial Reconfiguration on FPGAs. Lecture Notes in Electrical Engineering, vol 153. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-1225-0_5
DOI: https://doi.org/10.1007/978-1-4614-1225-0_5
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-1224-3
Online ISBN: 978-1-4614-1225-0
eBook Packages: Engineering, Engineering (R0)