Reconfigurable CPU Instruction Set Extensions

Koch, Dirk

doi:10.1007/978-1-4614-1225-0_5

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 153)

Abstract

Swapping just small fractions of the configuration of an FPGA can be very beneficial in many applications. This is particularly useful for reconfiguring the instruction set of embedded soft core processors, which is highly relevant for software-driven design flows. Here, the system is initially implemented as far as possible in software (which is faster to accomplish than hardware development). By profiling the application, hot spots are identified and kernels are implemented on the FPGA for acceleration until the performance requirements are met. There are several methodologies to integrate such accelerator modules, ranging from small CPU instruction set extensions to large and fully autonomous modules that work concurrently with the CPU. In this chapter, we will investigate how CPU instruction set extensions can be used efficiently with the help of partial runtime reconfiguration. The basic idea of extending a CPU with exchangeable instructions is sketched in Fig. 5.1.

Custom instructions access the register file in the same way as the ALU. By decoding unused instructions in the CPU ISA (instruction set architecture), a multiplexer may select between normal ALU operation and one or more user-defined instructions. Soft core CPUs with statically implemented custom instructions are well supported. For example, the NIOS-II CPU from Altera can be easily extended with custom instructions when using the SOPC Builder wizard of the Quartus design tools. Similarly, Xilinx allows adding custom hardware to their Microblaze soft core CPU using FSL ports. These ports basically provide a streaming interface between the Microblaze core and the custom hardware. However, support for implementing runtime reconfigurable custom instructions is weak, which leaves this powerful opportunity unused.
In the following section, we will first demonstrate that commonly used techniques, like the Xilinx bus macro approach or the recent proxy logic technique, are not well suited for integrating custom instructions. After this, in Sect. 5.2, we will demonstrate for a reconfigurable soft core processor that instructions can be integrated into the system without causing any additional logic overhead for the communication. In Sect. 5.3, we reveal how such systems can be easily implemented with the tool ReCoBus-Builder. Rather than providing reconfigurable islands, we will integrate multiple custom instructions in a slot-based fashion. Finally, in Sect. 5.4, an experimental evaluation of a system providing a MIPS CPU extended to support reconfigurable custom instructions will be presented.
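The instruction-select multiplexer described above can be illustrated with a small behavioral model. This is only a sketch: the opcode values, the `CUSTOM_BASE` threshold, and the `parity64` example instruction are assumptions made for illustration and are not taken from the MIPS implementation used later in this chapter.

```python
from typing import Callable, Dict

MASK32 = 0xFFFFFFFF

# Hypothetical opcode values, chosen only for this sketch.
OP_ADD, OP_AND, OP_XOR = 0x0, 0x1, 0x2
CUSTOM_BASE = 0x10  # first otherwise unused opcode, decoded as a custom instruction


def alu(opcode: int, a: int, b: int) -> int:
    """Fixed ALU operations of the static CPU."""
    if opcode == OP_ADD:
        return (a + b) & MASK32
    if opcode == OP_AND:
        return a & b
    if opcode == OP_XOR:
        return a ^ b
    raise ValueError(f"illegal opcode {opcode:#x}")


def execute(opcode: int, a: int, b: int,
            slots: Dict[int, Callable[[int, int], int]]) -> int:
    """Select between the ALU result and the result of a loaded custom instruction."""
    if opcode >= CUSTOM_BASE:
        slot = opcode - CUSTOM_BASE
        if slot not in slots:
            raise ValueError(f"no custom instruction loaded in slot {slot}")
        return slots[slot](a, b) & MASK32
    return alu(opcode, a, b)


def parity64(a: int, b: int) -> int:
    """Example custom instruction: XOR over all 64 operand bits."""
    return bin((a << 32) | b).count("1") & 1


slots = {0: parity64}  # "reconfigure" slot 0 with the parity instruction
print(execute(OP_ADD, 2, 3, slots))
print(execute(CUSTOM_BASE, 0b1011, 0b1, slots))
```

Replacing the entry in `slots` corresponds to swapping the instruction by partial reconfiguration; the decoder and the multiplexer themselves stay unchanged.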

5.1 On-FPGA Communication for Custom Instructions

One basic problem to be solved in the design of partially reconfigurable systems is to constrain the routing of the interface signals for a partial module during its physical implementation. As introduced in Chap. 2, there are several ways to accomplish this. However, when considering a custom instruction set extension, as shown in Fig. 5.1, we have to take into account that a relatively large wire count is required for connecting a relatively small reconfigurable area. For example, if we consider a bit permutation function, we have to connect 32 wires towards an island hosting the permutation accelerator and an additional 32 wires for the result. Note that this example would not demand any logic on the FPGA, as a permutation is basically just wiring. Still, compared to a software implementation of a permute function, we can easily save a hundred or more assembly instructions, even if the function is fully unrolled. Or, if we consider a 64-bit XOR gate (over both operands) to be hosted in the same reconfigurable island, it requires two times 32 wires for connecting the input operands. In this case, too, we need only 21 4-input LUTs (e.g., on a Xilinx Virtex-II FPGA) or 13 6-input LUTs (e.g., on a Xilinx Spartan-6 FPGA) for implementing the 64-bit XOR gate. Again, this instruction would save about a hundred instructions per call of the function. In other words, for some programs, we could gain a substantial speed-up by adding just a little logic. And by making this logic reconfigurable, we could host a virtually unlimited number of different accelerators for supporting various software tasks.
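The LUT counts quoted for the 64-bit XOR can be re-derived with a short calculation. The sketch below assumes a plain reduction tree in which every k-input LUT merges up to k signals into one; the function name is chosen for illustration.

```python
def xor_tree_luts(n_inputs: int, lut_inputs: int) -> int:
    """LUTs needed to XOR n signals into one with k-input LUTs.

    Each LUT replaces up to k signals by a single one, so it removes at
    most k - 1 signals; n - 1 signals have to be removed in total.
    """
    if n_inputs < 2:
        return 0
    removed = n_inputs - 1
    per_lut = lut_inputs - 1
    return -(-removed // per_lut)  # ceiling division


print(xor_tree_luts(64, 4))  # 4-input LUTs (Virtex-II)
print(xor_tree_luts(64, 6))  # 6-input LUTs (Spartan-6)
```

For 64 inputs this yields 21 LUTs with 4-input LUTs and 13 LUTs with 6-input LUTs, matching the numbers given in the text.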

When implementing such custom instructions with slice-based bus macros, as illustrated in Fig. 5.2a, it takes two LUTs per signal wire just to provide the accelerator connection. If we consider in total 100 wires for linking two 32-bit operands, a 32-bit result vector, and a few additional signals, the overhead is 200 LUTs. This is roughly 10× more than actually needed for the XOR gate! Moreover, the look-up tables constitute not only a logic overhead, but also a latency overhead, which is roughly 0.4 ns per LUT on a Virtex-II FPGA. Finally, adding LUTs for the communication can negatively impact the placement of both the static system and the partial modules. For example, a placed bus macro LUT interrupts carry chains and can further force a module to be spread over a larger area.

With the recent proxy-logic approach, the situation has improved, as shown in Fig. 5.2b. However, it still needs 100 LUTs for the communication. Again, this is pure overhead in terms of resources and latency. And as explained in Sect. 2.4.5 on Page 71, the proxy logic approach is not well suited to implementing systems with many different reconfigurable modules.
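The overhead figures of the two approaches can be compared with a few lines of arithmetic. The sketch assumes that the communication LUTs of one signal lie in series on its path, so that the added delay is the LUT count per wire times the 0.4 ns quoted above; the dictionary keys are illustrative names, and `pr_link` denotes a connection without any communication LUT.

```python
LUT_DELAY_NS = 0.4  # per-LUT latency on a Virtex-II FPGA, as stated in the text

# Communication LUTs per interface wire for each integration style.
LUTS_PER_WIRE = {"bus_macro": 2, "proxy_logic": 1, "pr_link": 0}


def interface_overhead(wires: int, style: str) -> tuple:
    """Return (total communication LUTs, added delay per signal path in ns)."""
    per_wire = LUTS_PER_WIRE[style]
    return wires * per_wire, per_wire * LUT_DELAY_NS


for style in LUTS_PER_WIRE:
    luts, delay = interface_overhead(100, style)
    print(f"{style:12s} {luts:4d} LUTs, +{delay:.1f} ns per signal path")
```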

At this point, one might consider using static-only implementations instead, if custom instructions are that small. This is probably the better option for very few instructions. With a rising number of instructions, however, the CPU gets larger and consequently slower. When assuming the simplified diagram of a CPU datapath in Fig. 5.1a, the ALU contains a multiplexer for selecting between the different sets of instructions of the ALU (e.g., Boolean logic, simple arithmetic, shifter, etc.). This multiplexer is in the critical path and unlikely to be pipelined [Met04], and although an FPGA fabric is mainly based on multiplexers, it is poor at implementing wide input multiplexers (see also Sect. 2.6 on Page 104). If carefully applied, runtime reconfiguration allows integrating more instructions while providing higher performance than a static system. Note that in many cases this still holds even when the configuration overhead is taken into account. Moreover, partial reconfiguration adds flexibility that allows hardware accelerators to be integrated dynamically into a system, as is known from the software world.

5.2 Zero Logic Overhead Integration

In this section, we will demonstrate how the Xilinx vendor tools can be used to integrate reconfigurable instructions without any logic overhead. As shown in Fig. 5.3a, we are only interested in binding the signals between the static system (the CPU) and the partial modules (the custom instructions) to a precisely defined wire of the fabric, called a “PR link” in the following. In order to occupy a wire segment (i.e., use a PR link), we need a path that will use this wire. In other words, there must be a primitive source (e.g., a LUT output) and a primitive destination (e.g., a LUT input) somewhere in our netlist with a requested connection from the source to the destination. This creates a path in our netlist, but we have still not constrained the routing. This is done by generating blocker macros that occupy a user-specified set of routing resources such that the Xilinx vendor router cannot use these wires in further implementation steps. The blocker concept is introduced in Sect. 3.2.4. Note that we cannot constrain the routing directly by saying “use wire x for signal y”. We can only define a wire allocation by saying “do not use wire z”. However, if our allocation ensures that there is only one possible path remaining, we can actually achieve our goal of binding a signal path to a wire.

5.2.1 Static System Constraints

With the knowledge of how to create a path and how to constrain this path to certain wire resources of the FPGA fabric, we can implement the static system. The static system contains the CPU and a reconfigurable region. In order to create paths into this region for connecting the operands OP_A and OP_B (see Fig. 5.1), we place a connection primitive into the reconfigurable region (PR region), as depicted in Fig. 5.3b. This primitive acts as a placeholder for the partial module and is the destination for the operand routing. Similarly, for creating a path for the result vector back to the CPU in the static part, we place a placeholder acting as the source for the path. Note that the same LUT primitive (or, to be more precise, a slice) might be used as a placeholder for multiple input and output signals at the same time. So far, this seems to be pretty much identical to the proxy logic approach. However, we will now add a blocker into the reconfigurable region that blocks all routing resources in this region, except the wires to be used as PR links. Note that the placement of the placeholders and the blocking is not random and has to support the intended PR link. If we now start the router, it will create the routing of the static system, including paths to and from the partial region that are routed using the requested PR links. There are two things to remember: (1) we have not added any logic overhead to the static system, and (2) we only blocked wire resources inside the reconfigurable region.

5.2.2 Partial Module Constraints

The partial module implementation (here the custom instructions) is very similar to what we did for the static system. However, all signal directions are now reversed: with respect to a custom instruction, the operands are now inputs and the result vector is an output. Consequently, we place a source placeholder as the start for the operands outside the reconfigurable region (i.e., in the static region). Correspondingly, we also add placeholders acting as the destinations for the result vector. Again, placeholders for inputs and outputs can share the same FPGA primitive, as shown in Fig. 5.3. We will now add a blocker around the partial module that congests all routing resources, except the ones needed to route the operands and results over the PR links. Here it is important that the blocker releases PR links that are compatible with the PR links used in the static design. Again, there are two things to remember: (1) we have added no logic overhead to the static system, and (2) we only blocked wires outside the reconfigurable region. Consequently, when loading a reconfigurable instruction into a reconfigurable island that was created as described for the static system in the last section, there will be no placeholder module visible. The placeholders are only temporarily required to create a path over the PR link.
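The two complementary blockers can be pictured as set operations on wire names. The following is a strongly simplified sketch: real blockers are macros generated for the vendor router, and all wire names used here (`r0`, `s0`, `link_op_a`, and so on) are hypothetical.

```python
def blocker(area_wires: frozenset, released_links: frozenset) -> frozenset:
    """Wires occupied by a blocker: everything in the area except the PR link tunnel."""
    return area_wires - released_links


def compatible(static_links: frozenset, module_links: frozenset) -> bool:
    """Static design and partial module must release exactly the same PR links."""
    return static_links == module_links


# Hypothetical wire names; PR links cross the region border and appear in both areas.
pr_links = frozenset({"link_op_a", "link_op_b", "link_result"})
region_wires = frozenset({"r0", "r1", "r2"}) | pr_links    # inside the PR region
outside_wires = frozenset({"s0", "s1", "s2"}) | pr_links   # around the PR region

static_blocker = blocker(region_wires, pr_links)    # used when routing the static system
module_blocker = blocker(outside_wires, pr_links)   # used when routing a partial module

print(sorted(static_blocker))
print(sorted(module_blocker))
print(compatible(pr_links, pr_links))
```

The static blocker only occupies wires inside the region and the module blocker only wires outside of it, while both leave the same tunnel open.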

5.2.3 Communication Binding by Wire Allocation

The zero logic overhead technique has to follow some rules. Again, by blocking, we can only select the set of wires that are allowed for routing (i.e., wire allocation), but this does not necessarily ensure a particular binding of a logical signal to a physical wire. However, the binding is achieved by allocating wires such that only one unique routing path can be used to reach the connection macro (see Fig. 5.4). As a consequence, not all wires within a CLB can be used at the same time to implement the routing between the static part and the partial part of a system. This is because, when multiple wires are routed from one configurable logic block (CLB) to another, wires must be allocated that cannot be swapped. A possible swapping of wires would allow the router to decide between more than one option for a PR link, which cannot be accepted. A situation of allocating swappable wire resources is shown in Fig. 5.4a. Here, the problem is that both allocated wires can be arbitrarily used to connect to both placeholders that are used for the data signals data[0] and data[1]. Consequently, the router has two possibilities to choose from and we cannot guarantee a signal binding to a specific PR link. However, by allocating a different wire set, we leave only one possible path per data signal and we achieve an exact binding to wires, as shown in Fig. 5.4b. Note that designing PR link paths requires deep knowledge about the FPGA routing fabric, including wire resources and possible switch matrix settings. This information is provided by Xilinx individually for each FPGA in a language called XDL [BKT11].
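The uniqueness requirement can be checked on an abstract model of the allocation. The sketch below assumes that, for each signal, the set of allocated wires able to reach its placeholder is known; it then counts the possible signal-to-wire assignments by brute force. The wire names are hypothetical, and this is not how the ReCoBus-Builder performs the check.

```python
from itertools import permutations


def count_bindings(reach: dict) -> int:
    """Count assignments of one distinct wire to every signal.

    reach maps each signal to the set of allocated wires that can
    reach the placeholder of that signal.
    """
    signals = list(reach)
    wires = sorted(set().union(*reach.values()))
    count = 0
    for choice in permutations(wires, len(signals)):
        if all(wire in reach[sig] for sig, wire in zip(signals, choice)):
            count += 1
    return count


# Situation as in Fig. 5.4a: both wires reach both placeholders (swappable).
swappable = {"data[0]": {"w0", "w1"}, "data[1]": {"w0", "w1"}}
# Situation as in Fig. 5.4b: a different wire set leaves one option per signal.
unique = {"data[0]": {"w0"}, "data[1]": {"w2"}}

print(count_bindings(swappable))  # more than one binding: not acceptable
print(count_bindings(unique))     # exactly one binding: signals are bound to wires
```

An allocation is acceptable only if the count is exactly one; the brute-force enumeration is sufficient for the handful of wires per CLB considered here.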

5.3 Implementing Reconfigurable Instructions with the ReCoBus-Builder

The ReCoBus-Builder was originally designed for implementing bus-based systems consisting of many small resource slots that are integrated with the help of macros, as revealed in Sect. 3.2. At this point, we focus only on macros implementing the connection bar architecture (Sect. 2.6.1). For implementing the zero logic overhead approach, we follow the original ReCoBus-Builder flow: we perform resource budgeting and define a floorplan that fulfills the resource requirements. Then, we create our communication architecture, which will provide connection primitives in the static part of the system as well as in each resource slot. Let us consider the simple case of a connection bar that connects only a single resource slot. We would then basically generate a Xilinx bus macro for an island reconfiguration style. When following the default ReCoBus-Builder flow, we will generate two blocker macros, one for the static design and one for the reconfigurable modules. We will use these blockers for implementing the PR link approach shown in Fig. 5.3. As the blockers generated by the ReCoBus-Builder will not block the wires that are already used for the connection bar macro, the blocker will contain a tunnel for a PR link. The only things that are now missing are the placeholder primitives. These primitives are taken directly from the generated connection bar macro. Consequently, we can generate compatible placeholder/blocker pairs for both the static system and the partial modules. If we assume a connection bar with one internal wire routed eastwards and another wire routed westwards, the resulting primitives and blockers would match the example in Fig. 5.3. The ReCoBus-Builder has a wire database for each supported device. The tool uses it to check whether a wire allocation can ensure PR links without possible swaps, as discussed in Sect. 5.2.3. With this approach, we can provide four double wire PR links per CLB on a Xilinx Virtex-II FPGA.

As a case study, we consider integrating up to five different instructions into the system at the same time. Instead of using five individual islands for hosting the instruction modules (as would be necessary when following the Xilinx PR flow), the system uses a more flexible approach with one reconfigurable area that is tiled into five resource slots, as depicted in Fig. 5.5. This has the advantage that modules of different size can be integrated more efficiently into the system by taking a variable number of slots. The communication architecture has to link the two operands to each slot and the result vector back, individually for each slot, to an instruction multiplexer. By using different wire resources for the operands and the result vectors that route over different distances, both requirements can be properly implemented. By taking advantage of the regular FPGA fabric, the slots can be arranged completely identically, hence allowing free placement of instructions into the reconfigurable ALU. Figure 5.5 reveals a detail of the routing architecture of Xilinx Virtex-II FPGAs that was used to provide slots that are smaller than the routing distance of a wire. In the example, it is assumed that one resource slot is only one CLB wide and that the operands are routed using double lines that span two CLBs. However, by using a connection in the middle of the wire, which is provided by the routing fabric after a distance of one CLB, and by displacing the start points of the regular routing structure of the two operands by one CLB in horizontal direction, both operands can be accessed in any slot. This is possible by routing the signals in an interleaved manner. Note that it is also possible to route paths by cascading multiple different wires, which would allow widening the slots (in terms of CLB columns) and extending the total number of slots for hosting modules (see Sect. 2.5.2 on Page 81 for more details).
The interleaving results in swapping the operands with respect to the placement position (odd or even start slot). However, for instructions that are not commutative, we can use two physical implementations in order to avoid the alignment multiplexing. See Sect. 2.5.3 on Page 92 for more details on interleaving.
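The consequence of the interleaving for module placement can be sketched as follows. The assignment of the unswapped operand order to even start slots and the bitstream file names are assumptions made purely for illustration; the text only states that the order depends on whether the start slot is odd or even.

```python
def operand_order(start_slot: int) -> tuple:
    """Operand order seen by a module (assumed: unswapped in even start slots)."""
    return ("OP_A", "OP_B") if start_slot % 2 == 0 else ("OP_B", "OP_A")


def pick_bitstream(name: str, start_slot: int, commutative: bool) -> str:
    """Choose the physical implementation to load (hypothetical file names).

    Commutative instructions work with either operand order; for all
    others, one implementation per slot parity avoids alignment multiplexers.
    """
    if commutative:
        return f"{name}.bit"
    parity = "even" if start_slot % 2 == 0 else "odd"
    return f"{name}_{parity}.bit"


for slot in range(5):
    print(slot, operand_order(slot), pick_bitstream("sat_sub", slot, commutative=False))
```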

5.4 Case Study on Custom Instructions

The case study has been implemented with the ReCoBus-Builder on a Xilinx Virtex-II XC2V500-5 FPGA. The tool generates regularly structured macros together with the surrounding blocker macros that constrain the routing. The implementation directly follows the methodology revealed in Sect. 5.2. The communication macros provide the connection primitives and fix the wire resources. The ReCoBus-Builder generates all macros (including the blockers) in the Xilinx Design Language (XDL). While communication macros are instantiated using the HDL flow, the blockers are integrated into the design just before the final route step. A floorplanning view of the system is depicted in Fig. 5.6. The area reserved for hosting reconfigurable instructions is 8% of the total number of CLBs that are available on the device. With five times 48 slices, the PR region provides roughly 15–20% of the amount of logic that would be required by an optimized 32-bit soft core processor, such as the Xilinx Microblaze. For the experiments, we used our own MIPS processor implementation, which has not been optimized for speed or area, but which can be easily adapted to include reconfigurable instructions.

5.4.1 Static System Implementation

During implementation of the static system, connection primitives that are placed inside the reconfigurable region and that are surrounded with blocker macros have been used to constrain all signals required to integrate the instructions. A screenshot of the static system is shown in Fig. 5.7. The number of wires that are connected from the static part of the system to the PR region is 2 × 32 for the operands plus an additional eight wires for control signals. In the reverse direction, each of the five slots delivers a 32-bit result plus an additional four flags. This results in a total of $64 + 8 + 5 \times (4 + 32) = 252$ wires.

According to the partial design flow provided by Xilinx, the number of operand bits and control signals has to be multiplied by the number of slots, as that flow does not consider multicast routing to multiple slots without additional connection primitives. The slice-based macro approach would then cost $2 \times 5 \times (72 + 36) = 1{,}080$ LUTs for the communication alone. This is 18% of the available LUTs on the target device and roughly one third of the logic a fully featured 32-bit Microblaze soft core processor would take. Even the new flow that is based on proxy logic would still result in a considerable and unnecessary overhead.
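The wire and LUT counts of this subsection can be reproduced with a few lines. The device totals for the XC2V500 (3,072 slices with two LUTs each) are an assumption taken from the Virtex-II data sheet as recalled here and are not stated in the chapter; all other numbers come from the text.

```python
SLOTS = 5
OPERAND_BITS, CONTROL_WIRES = 32, 8   # static system -> PR region
RESULT_BITS, FLAG_WIRES = 32, 4       # each slot -> static system
SLICES_PER_SLOT = 48

# Assumed device totals for the XC2V500 (not given in the chapter).
DEVICE_SLICES = 3072
DEVICE_LUTS = 2 * DEVICE_SLICES

to_region = 2 * OPERAND_BITS + CONTROL_WIRES           # 72 wires shared by all slots
from_slot = RESULT_BITS + FLAG_WIRES                   # 36 wires per slot
total_wires = to_region + SLOTS * from_slot            # PR links in the presented system

# Slice-based bus macros: two LUTs per wire, inputs replicated for every slot.
bus_macro_luts = 2 * SLOTS * (to_region + from_slot)

lut_share = bus_macro_luts / DEVICE_LUTS
region_share = SLOTS * SLICES_PER_SLOT / DEVICE_SLICES

print(total_wires, bus_macro_luts)
print(f"{lut_share:.1%} of the LUTs, PR region is {region_share:.1%} of the device")
```

Under the assumed device size, the bus macro overhead comes out at about 18% of the LUTs and the PR region at about 8% of the device, consistent with the percentages quoted in the text.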

When floorplanning a reconfigurable system, it is recommended to consider the underlying FPGA architecture. For example, Xilinx FPGAs are reconfigured column-wise, which should be taken into account by designing the slots vertically. This optimizes the reconfiguration time. A restriction derived from the full column reconfiguration scheme is that no distributed memory can be used directly above or below the PR region, as this would corrupt the state of these primitives. If this rule is followed, partial reconfiguration can be carried out while the system continues to operate.

FPGAs provide carry chain logic, which is used for different kinds of arithmetic operations. On Xilinx FPGAs, the carry chains include four LUTs per CLB and the chains are arranged in upwards direction. Consequently, we built the system such that exactly two times four operand signal bits and four bits of the result vector are connected in a CLB. Furthermore, the signal vector bits are connected bottom-up (LSBs at the bottom) to follow the carry chain. Without this physical port mapping, routing becomes very congested for the modules. In [CPF09], a tool using a simulated annealing heuristic was used to place communication macros around a reconfigurable region that was also used for reconfigurable CPU extensions. Such tools have an excessive runtime, as they require a place and route step for each annealing step. It can be assumed that the final result would be very similar to the rule-based port mapping proposed here, which needs only one place and route run.
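The rule-based port mapping can be written down as a simple index calculation. This is a sketch of the rule as described above; the function name and the coordinate convention (CLB row offset and position within the CLB, both counted from the bottom of a slot) are chosen for illustration.

```python
BITS_PER_CLB = 4  # the carry chain covers four LUTs per CLB (from the text)


def port_location(bit: int) -> tuple:
    """Map a vector bit to (CLB row offset, position in CLB), LSB at the bottom."""
    return divmod(bit, BITS_PER_CLB)


def clbs_needed(width: int) -> int:
    """CLB rows needed to connect a vector of the given width."""
    return -(-width // BITS_PER_CLB)  # ceiling division


for bit in (0, 3, 4, 31):
    print(bit, port_location(bit))
print(clbs_needed(32))
```

The same mapping is applied to OP_A, OP_B, and the result vector, so bit i of all three vectors meets in the same CLB and in carry chain order.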

5.4.2 Reconfigurable Instructions

For implementing the reconfigurable modules, the complete static system was substituted with a connection bar macro, as depicted in Fig. 5.8. This permits implementing reconfigurable modules in the absence of the static system. As can be seen in Fig. 5.8 for a CRC checksum function, a module is surrounded with a blocker macro that restricts the module to a strict bounding box. This design has no connections to external pins. The timing was constrained with the Xilinx TPSYNC parameter.

5.4.3 Results and Overhead Analysis

Swapping instructions takes a significant amount of time for writing the corresponding partial bitstream to the right target position. In addition, extra time might be required for computing a placement position or performing some bitstream manipulations. This extra time overhead is implementation dependent and not further considered in the following. However, due to the small size of the systems, most of the work could be precomputed offline (e.g., a table for the placement position). When deciding whether to use reconfigurable instructions, it is important to know the latency of the reconfiguration process (response time) and the time the processor requires when executing the instructions alternatively as simple software function calls. This determines the breakeven factor k: the system has to trigger a reconfigurable instruction at least k times before gaining a benefit in the total execution time of the system. Note that we use function calls and no traps, as traps are very specific for emulating CPU instructions in software and because traps have a tiny additional overhead that would not occur in the case of normal function calls. The configuration times and the execution times for software implementations of the custom instructions (determined in a simulator) are listed in Table 5.1.

Table 5.1 Implementation and performance details

The reconfiguration process is relatively slow and would consequently prevent using custom instructions in time-critical parts of the software (e.g., interrupts). However, this is not problematic, as critical software parts should typically not perform complex computations. The breakeven factor k is the number of possible invocations of a particular instruction during the time to configure this instruction. As can be seen, for complex operations, such as the CRC instruction, fewer than 300 calls of this reconfigurable instruction would pay off the configuration overhead; and even if an instruction can save only a few cycles, this can pay off after just a few thousand cycles. Considering that the saturation addition/subtraction module is used in an image processing application, it is very likely that this function is triggered a sufficient number of times. It must be mentioned that the listed values are theoretical and that the breakeven points will probably be higher. This is because the configuration data transfer in our system is in conflict with the CPU (shared memory buses), and even a few KB of configuration data result in a burst affecting the CPU. However, reconfigurable instructions are still an interesting option for both saving FPGA resources and gaining performance.
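The breakeven factor can be estimated with a short calculation. The sketch below divides the configuration time by the time saved per call; with a hardware time of zero it reduces to the definition given above (software invocations that fit into the configuration time). The timing values are hypothetical placeholders and are not the entries of Table 5.1.

```python
def breakeven_calls(t_config_ns: int, t_sw_ns: int, t_hw_ns: int = 0) -> int:
    """Smallest number of calls after which reconfiguration pays off.

    t_config_ns: time to write the partial bitstream
    t_sw_ns:     execution time of the software function call
    t_hw_ns:     execution time of the custom instruction (0 = neglected)
    """
    saved = t_sw_ns - t_hw_ns
    if saved <= 0:
        raise ValueError("custom instruction must be faster than the software version")
    return -(-t_config_ns // saved)  # ceiling division on integers


# Hypothetical example: 1.2 ms configuration time, 5 us per software call.
print(breakeven_calls(1_200_000, 5_000))
# Same example when the custom instruction itself is assumed to take 1 us.
print(breakeven_calls(1_200_000, 5_000, 1_000))
```

Integer nanoseconds are used so that the ceiling division is exact; the contention on the shared memory bus mentioned above is not modeled and would increase the real value.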

The values in brackets denote the utilization within the occupied slots. Although the CRC logic would easily fit into one slot, an additional slot was required to fully route the module. The bitstream size covers only the partial module and no static parts. The reconfiguration time is mainly related to the number of slots that have to be written to the device. A single slot configuration is 11.6 KB on this device, which results in a configuration time of 0.6 ms when assuming a configuration speed of 20 MB/s. The latency was determined using the FPGA editor. The values are measured from the operand fetching pipeline register through the combinational path of the instruction and further to the output of the instruction select multiplexer. The max value denotes the critical path delay, and the average value denotes the mean delay over all paths.
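The configuration time per slot follows directly from the bitstream size and the configuration speed. The sketch below assumes decimal units (1 KB = 1,000 bytes, 1 MB = 1,000,000 bytes), which the text does not specify; the result rounds to the stated 0.6 ms either way.

```python
SLOT_BYTES = 11_600               # 11.6 KB per slot (decimal units assumed)
CONFIG_BYTES_PER_S = 20_000_000   # 20 MB/s configuration speed


def config_time_ms(slots: int) -> float:
    """Time in milliseconds to write the given number of slots."""
    return slots * SLOT_BYTES / CONFIG_BYTES_PER_S * 1e3


for n in (1, 2):
    print(f"{n} slot(s): {config_time_ms(n):.2f} ms")
```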

The examples point out that small FPGA areas are sufficient to include very valuable instructions into a CPU with the help of partial runtime reconfiguration. Despite the small slots, a high number of signals can be interfaced to partial modules.

References

Ahmadinia A, Bobda C, Ding J, Majer M, Teich J, Fekete S, van der Veen J (2005) A practical approach for circuit routing on dynamic reconfigurable devices. In: Proceedings of the 16th IEEE international workshop on rapid system prototyping (RSP), June 2005, pp 84–90
Athanas P, Bowen J, Dunham T, Patterson C, Rice J, Shelburne M, Suris J, Bucciero M, Graf J (2007) Wires on demand: run-time communication synthesis for reconfigurable computing. In: International conference on field programmable logic and applications (FPL), Aug 2007. IEEE, pp 513–516.
Abel N (2011) Design and implementation of an object-oriented framework for dynamic partial reconfiguration. PhD thesis, University of Heidelberg, Heidelberg
Ahmadinia A, Bobda C, Majer M, Teich J, Fekete S, van der Veen J (2005) DyNoC: a dynamic infrastructure for communication in dynamically reconfigurable devices. In: Proceedings of the international conference on field-programmable logic and applications (FPL), Aug 2005, pp 153–158
Abound Logic, Inc (2009) The raptor family of FPGAs (V1.0)
Altera Corporation (2007) Avalon memory-mapped interface specification V3.3. Available online: www.altera.com/literature/manual/mnl_avalon_spec.pdf
Altera Inc (2009) Altera devices. Available online: www.altera.com/products/devices/dev-index.jsp
Athanas PM, Silverman HF (1993) Processor reconfiguration through instruction-set metamorphosis: compiler and architectures. IEEE Comput 26(3):11–18
Ashenden PJ (2008) The designer’s guide to VHDL, 3rd edn. Morgan Kaufmann
Asadi G-H, Tahoori MB (2005) Soft error mitigation for SRAM-based FPGAs. In: Proceedings of the 23rd IEEE VLSI test symposium (VTS), IEEE Computer Society, pp 207–212
Bieser C, Bahlinger M, Heinz M, Stops C, Müller-Glaser KD (2006) A novel partial bitstream merging methodology accelerating Xilinx Virtex-II FPGA based RP system setup. In: Proceedings of the international conference on field programmable logic and applications (FPL), pp 1–4
Brebner GJ, Diessel O (2001) Chip-based reconfigurable task management. In: Proceedings of the 11th international conference on field programmable logic and application (FPL), Springer, pp 182–191
Beckhoff C (2007) Entwurf und Implementierung von Hardwaremodulen zur Dekompression von FPGA-Konfigurationsdaten. In: Studienarbeit, Lehrstuhl für Hardware-Software-Co-Design. Universität Erlangen-Nürnberg, Erlangen
Braun L, Hübner M, Becker J, Perschke T, Schatz V, Bach S (2007) Circuit switched run-time adaptive network-on-chip for image processing applications. In: International conference on field programmable logic and applications (FPL), Aug 2007. IEEE, pp 688–691
Blodget B, James-Roxby P, Keller E, McMillan S, Sundararajan P (2003) A self-reconfiguring platform. In: Proceedings of international conference on field-programmable logic and applications (FPL), pp 565–574
Beckhoff C, Koch D, Torresen J (2010) Short-circuits on FPGAs caused by partial runtime reconfiguration. In: Proceedings of the international conference on field programmable logic and applications (FPL), Aug 2010, pp 596–601
Beckhoff C, Koch D, Torresen J (2011) The Xilinx Design Language (XDL): tutorial and use cases. In: Proceedings of the 6th international workshop on reconfigurable communication-centric systems-on-chip (ReCoSoC), pp 1–8
Becker T, Luk W, Cheung PYK (2007) Enhancing relocatability of partial bitstreams for run-time reconfiguration. In: Proceedings of the 15th annual IEEE symposium on Field-programmable Custom Computing Machines (FCCM), IEEE Computer Society, pp 35–44
Benini L, De Micheli G (2002) Networks on chips: a new SoC paradigm. Computer 35(1):70–78
Bieser C, Mueller-Glaser K-D (2006) Rapid prototyping design acceleration using a novel merging methodology for partial configuration streams of Xilinx Virtex-II FPGAs. In: Proceedings of the 17th IEEE international workshop on Rapid System Prototyping (RSP). IEEE Computer Society, pp 193–199
Bobda C, Majer M, Koch D, Ahmadinia A, Teich J (2004) A dynamic NoC approach for communication in reconfigurable devices. In: Proceedings of international conference on field-programmable logic and applications (FPL). Volume 3203 of lecture notes in computer science (LNCS). Springer, pp 1032–1036
Bruneel K, Abouelella FMMA, Stroobandt D (2009) TMAP: a reconfigurability-aware technology mapper. In: Jacquemod G, Luxey C, Damiano J-P (eds) Design, automation and test Europe: university booth (DATE), Apr 2009
Baumgarte V, May F, Nückel A, Vorbach M, Weinhardt M (2001) PACT XPP – A self-reconfigurable data processing architecture. In: Proceedings of the international conference on Engineering of Reconfigurable Systems and Algorithms (ERSA), June 2001, pp 64–70
Betz V, Rose J (1997) VPR: a new packing, placement and routing tool for FPGA research. In: Proceedings of the 7th international workshop on field-programmable logic and applications (FPL). Springer, pp 213–222
Google Scholar
Brebner G (1997) The swappable logic unit: a paradigm for virtual hardware. In: Pocek KL, Arnold J (eds) IEEE symposium on FPGAs for custom computing machines (FPGA), Apr 1997, pp 77–86
Google Scholar
Betz V, Rose J, Marquardt A (eds) (1999) Architecture and CAD for deep-submicron FPGAs. Kluwer, Norwell
Google Scholar
Babb J, Tessier R (1993) Virtual wires: overcoming pin limitations in FPGA-based logic emulators. In: IEEE workshop on FPGAs for custom computing machines, pp 142–151
Google Scholar
Babb J, Tessier R, Dahl M, Hanono S, Hoki D, Agarwal A (1997) Logic emulation with virtual wires. IEEE trans Comput Aided Design 16:609–626
Article Google Scholar
Curt D, Kalara P, Leblanc R, Eck V, Trynosky S, Lindholm J, Bauer T, Blodget B, McMillan S, Philip J, Prasanna S, Keller E (2004) Reconfiguration of the programmable logic of an integrated circuit. WO Patent WO002004055986A3, issued 25, Nov 2004
Google Scholar
Compton K, Li Z, Cooley J, Knol S, Hauck S (2002) Configuration relocation and defragmentation for run-time reconfigurable computing. IEEE Trans Very Large Scale Integ Syst 10:209–220
Article Google Scholar
Claus C, Müller FH, Stechele W (2006) Combitgen: a new approach for creating partial bitstreams in Virtex-II Pro. In: Karl W, Becker J, Großpietsch K-E, Hochberger C, Maehle E (eds) Workshops proceedings of the 19th international conference on Architecture of Computing Systems (ARCS). Volume 81 of lecture notes in informatics. GI, pp 122–131
Google Scholar
Claus C, Müller FH, Zeppenfeld J, Stechele W (2007) A new framework to accelerate Virtex-II Pro dynamic partial self-reconfiguration. In: Proceedings of the IEEE 21th International Parallel and Distributed Processing Symposium (IPDPS), Mar 2007, pp 1–7
Google Scholar
Carver JM, Pittman RN, Forin A (2009) Automatic bus macro placement for partially reconfigurable FPGA designs. In: Proceeding of the ACM/SIGDA international symposium on Field Programmable Gate Arrays (FPGA). ACM, pp 269–272
Google Scholar
Department of Computer Science 12, ReCoNets-Project website, University of Erlangen Nuremberg, Germany. www.reconets.de
Dehon A (1999) Balancing interconnect and computation in a reconfigurable computing array (or, why you don’t really want 100% LUT utilization. In: Proceedings of the international symposium on Field Programmable Gate Arrays (FPGA), pp 69–78
Google Scholar
Dittmann F, Frank S (2007) Hard real-time reconfiguration port scheduling. In: Proceedings of the conference on design, automation and test in Europe (DATE). EDA Consortium, pp 123–128
Google Scholar
Dijkstra EW (1959) A note on two problems in connexion with graphs. Numer Math 1:269–271
Article MathSciNet MATH Google Scholar
Demirsoy SS, Langhammer M (2009) Cholesky decomposition using fused datapath synthesis. In: Proceeding of the ACM/SIGDA international symposium on Field-Programmable Gate Arrays (FPGA). ACM, pp 241–244
Google Scholar
Dally WJ, Towles B (2001) Route packets, not wires: on-chip interconnection networks. In: Proceedings of the 38th conference on design automation (DAC), pp 684–689
Google Scholar
Elnozahy EN, Alvisi L, Wang Y-M, Johnson D (2002) A survey of rollback-recovery protocols in message-passing systems. ACM Comput Surv 34(3): 375–408
Article Google Scholar
Altera Inc. (2007) Enhanced configuration devices (EPC4, EPC8 & EPC16) data sheet. www.altera.com/literature/hb/cfg/ch_14_vol_2.pdf
Can programmables break out of $3.6bn niche. ElectronicWeekly, 26.06.2009
Google Scholar
Fidge C (1988) Timestamps in message-passing systems that preserve the partial ordering. In: Proceedings of the 11th Australian Computer Science Conference (ACSC), pp 56–66
Google Scholar
Fekete SP, Kamphans T, Schweer N, Tessars C, van der Veen JC, Angermeier J, Koch D, Teich J (2008) No-break dynamic defragmentation of reconfigurable devices. In: FPL 2008, international conference on field programmable logic and applications, Heidelberg, pp 113–118
Google Scholar
Flynn M (2005) Area – time – power and design effort: the basic tradeoffs in application specific systems. In: Proceedings of the IEEE international conference on Application-Specific Systems, Architecture Processors (ASAP). IEEE Computer Society, pp 3–6
Google Scholar
Fekete SP, van der Veen J, Angermeier J, Göhringer D, Majer M, Teich J (2007) Scheduling and communication-aware mapping of HW/SW modules for dynamically and partially reconfigurable SoC architectures. In: Proceedings of the 20th international conference on Architecture of Computing Systems (ARCS), Zurich. VDE-Verlag, pp 151–160
Google Scholar
Fekete S, van der Veen J, Majer M, Teich J (2006) Minimizing communication cost for reconfigurable slot modules. In: Proceedings of 16th international conference on field programmable logic and applications (FPL), Aug 2006, pp 535–540
Google Scholar
Guccione S, Levi D, Sundararajan P (1999) JBits: Java based interface for reconfigurable computing. In: Proceedings of the 2nd annual military and aerospace applications of programmable devices and technologies (MAPLD)
Google Scholar
Gao S, Schmidt A, Sass R (2009) Hardware implementation of mpi barrier on an fpga cluster. In: Proceedings of the international conference on field programmable logic and applications (FPL), pp 12–17
Google Scholar
Gupta RK, Zorian Y (1997) Introducing core-based system design. IEEE Des Test 14(4):15–25
Article Google Scholar
Halfhill TR (2007) Tabulas time machine – Rapidly reconfigurable chips will challenge conventional FPGAs. Microprocessor Report, Issue 032910. Available online: www.tabula.com/news/M11_Tabula_Reprint.pdf.
Hauck S (1998) Configuration prefetch for single context reconfigurable coprocessors. In: Proceedings of the sixth ACM/SIGDA international symposium on Field Programmable Gate Arrays (FPGA). ACM, pp 65–74
Google Scholar
Heun V (2003) Grundlegende algorithmen. Vieweg
Google Scholar
Hempel G, Hochberger C, Koch A (2010) A comparison of hardware acceleration interfaces in a customizable soft core processor. In: Proceedings of the international conference on field programmable logic and applications (FPL). IEEE Computer Society, pp 469–474
Google Scholar
Hagemeyer J, Kettelhoit B, Koester M, Porrmann M (2007) A design methodology for communication infrastructures on partially reconfigurable FPGAs. In: International conference on field programmable logic and applications (FPL), Aug 2007. IEEE, pp 331–338
Google Scholar
Hagemeyer J, Kettelhoit B, Koester M, Porrmann M (2007) Design of homogeneous communication infrastructures for partially reconfigurable FPGAs. In: Proceedings of the international conference on Engineering of Reconfigurable Systems and Algorithms (ERSA), Jun 2007
Google Scholar
Hagemeyer J, Kettelhoit B, Porrmann M (2006) Dedicated module access in dynamically reconfigurable systems. In: Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS)
Google Scholar
Haubelt C, Koch D, Teich J (2003) Basic OS support for distributed reconfigurable hardware. In: Proceedings of the third international workshop on systems, architectures, modeling, and simulation, July 2003, pp 18–22
Google Scholar
Hansen SG, Koch D, Torresen J (2011) High speed partial run-time reconfiguration using enhanced ICAP hard macro. In: Proceedings of the 18th reconfigurable architectures workshop (RAW). IEEE
Google Scholar
Horta EL, Lockwood JW, Kofuji ST (2002) Using PARBIT to implement partial run-time reconfigurable systems. In: Proceedings of the 12th international conference on field-programmable logic and applications (FPL). Springer, pp 182–191
Google Scholar
Horta EL, Lockwood JW, Taylor DE, Parlour D (2002) Dynamic hardware plugins in an FPGA with partial run-time reconfiguration. In: Proceedings of the 39th conference on design automation (DAC). ACM, pp 343–348
Google Scholar
Huang W-J, McCluskey EJ (2001) Column-based precompiled configuration techniques for FPGA. In: Proceedings of the the 9th annual IEEE symposium on field-programmable custom computing machines (FCCM). IEEE Computer Society, pp 137–146
Google Scholar
Hübner M, Schuck C, Kühnle M, Becker J (2006) New 2-dimensional partial dynamic reconfiguration techniques for real-time adaptive microelectronic circuits. In: Proceedings of the IEEE computer society annual symposium on emerging VLSI technologies and architectures (ISVLSI), p 97
Google Scholar
Huffman DA (1952) A method for the construction of minimum-redundancy codes. Proc IRE 40(9):1098–1101
Article Google Scholar
Hauck S, Wilson WD (1999) Runlength compression techniques for FPGA configurations. In: Proceedings of the seventh annual IEEE symposium on field-programmable custom computing machines (FCCM99). IEEE Computer Society, pp 286–287
Google Scholar
Xilinx Inc (2002) Two flows for partial reconfiguration: module based or difference based. Available online: www.xilinx.com/bvdocs/appnotes/xapp290.pdf
Atmel Inc (2003) AT40K series configuration user guide. Available online: http://www.atmel.com/dyn/resources/prod_documents/DOC1009.PDF
Xilinx Inc (2005) Using look-up tables as shift registers (SRL16) in Spartan-3 generation FPGAs. Available online: www.xilinx.com/bvdocs/appnotes/xapp465.pdf
Xilinx Inc (2005) Virtex-II platform FPGAs: complete data sheet. Available online: http://www.xilinx.com/support/documentation/data_sheets/ds031.pdf
Xilinx Inc (2007) Xilinx Virtex-II platform FPGA user guide. Available online: http://www.xilinx.com/support/documentation/user_guides/ug002.pdf
Altera Inc (2008) Logic array blocks and adaptive logic modules in Stratix IV devices. Available online: http://www.altera.com/literature/hb/stratix-iv/
Xilinx Inc (2008) CLB white papers. Available online: http://www.xilinx.com/support/documentation/wpfpgafeaturedesign_clb.htm/
Xilinx Inc (2008) PlanAhead design analysis tool. Available online: http://www.xilinx.com/ise/optional_prod/planahead.htm
International Business Machines corporation (IBM) (1999) The CoreConnect bus architecture. Available online: http://www-03.ibm.com/chips/products/coreconnect/
Jones G, Sheeran M (1990) Circuit design in Ruby. In Staunstrup, J. (ed) Formal Methods for VLSI Design. North-Holland
Google Scholar
Jerraya AA, Wolf W (2005) Hardware/software interface codesign for embedded systems. Computer 38(2):63–69
Article Google Scholar
Koch D, Bobda C, Ahmadinia A, Teich J (2007) FPGA architecture extensions for preemptive multitasking and hardware defragmentation. In: Proceedings of International Conference on Field-Programmable Technology 2007 (ICFPT ’07), Dec 2007. IEEE, pp 433–436
Google Scholar
Koch D, Beckhoff C, Teich J (2007) Bitstream decompression for high speed FPGA configuration from slow memories. In: Proceedings of international conference on field-programmable technology (ICFPT), Dec 2007. IEEE, pp 161–168
Google Scholar
Koch D, Beckhoff C, Teich J (2008) ReCoBus-builder – A novel tool and technique to build statically and dynamically reconfigurable systems for FPGAs. In: Proceedings of international conference on field-programmable logic and applications (FPL 08), Sept 2008, pp 119–124
Google Scholar
Koch D, Beckhoff C, Teich J (2009) A communication architecture for complex runtime reconfigurable systems and its implementation on Spartan-3 FPGAs. In: Proceedings of the 17th ACM/SIGDA international symposium on field-programmable gate arrays (FPGA 2009), Feb 2009. ACM, pp 233–236
Google Scholar
Koch D, Beckhoff C, Teich J (2009) Hardware decompression techniques for FPGA-based embedded systems. ACM Trans Reconfig Technol Syst 2:9:1–9:23
Google Scholar
Koch D, Beckhoff C, Teich J (2009) Minimizing internal fragmentation by fine-grained two-dimensional module placement for runtime reconfigurable systems. In: 17th annual IEEE symposium on field-programmable custom computing machines (FCCM 2009), Apr 2009. IEEE Computer Society, pp 251–254
Google Scholar
Koch D, Beckhoff C, Torresen J (2010) Obstacle-free two-dimensional online-routing for run-time reconfigurable FPGA-based systems. In: Proceedings of international conference on field-programmable technology (ICFPT1́0). IEEE, pp 208–215
Google Scholar
Koh S, Diessel O (2006) COMMA: a communications methodology for dynamic module-based reconfiguration of FPGAs. In: Workshops proceedings of the 19th international conference on architecture of computing systems (ARCS), Mar 2006, pp 173–182
Google Scholar
Koh S, Diessel O (2006) COMMA: a communications methodology for dynamic module reconfiguration in FPGAs. In: Proceedings of the 14th annual IEEE symposium on field-programmable custom computing machines (FCCM). IEEE Computer Society, pp 273–274
Google Scholar
Krasteva YE, de la Torre E, Riesgo T, Joly D (2006) Virtex II FPGA bitstream manipulation: application to reconfiguration control systems. In: Proceedings of the 16th international conference on field programmable logic and applications (FPL). IEEE, pp 1–8
Google Scholar
Koch D, Haubelt C, Streichert T, Teich J (2007) Modeling and synthesis of hardware-software morphing. In: Proceedings of the international symposium on circuits and systems (ISCAS 2007), May 2007, pp 2746–2749
Google Scholar
Koch D, Haubelt C, Teich J (2007) Efficient hardware checkpointing – Concepts, overhead analysis, and implementation. In: Proceedings of the 15th ACM/SIGDA international symposium on field-programmable gate arrays (FPGA 2007), Feb 2007. ACM, pp 188–196
Google Scholar
Koch D, Haubelt C, Teich J (2008) Efficient reconfigurable on-chip buses for FPGAs. In: 16th annual IEEE symposium on field-programmable custom computing machines (FCCM 2008), Apr 2008. IEEE Computer Society, pp 287–290
Google Scholar
Krasteva YE, Jimeno AB, de la Torre E, Riesgo T (2005) Straight method for reallocation of complex cores by dynamic reconfiguration in Virtex II FPGAs. In: Proceedings of the 16th IEEE international workshop on rapid system prototyping (RSP), pp 77–83
Google Scholar
Kissler D, Kupriyanov A, Hannig F, Koch D, Teich J (2006) A generic framework for rapid prototyping of system-on-chip designs. In: Proceedings of international conference on computer design (CDES), June 2006, pp 189–195
Google Scholar
Koch D, Körber M, Teich J (2006) Searching RC5-keys with distributed reconfigurable computing. In: Proceedings of international conference on engineering of reconfigurable systems and algorithms (ERSA 2006), June 2006. CSREA Press, pp 42–48
Google Scholar
Kalte H, Lee G, Porrmann M, Rückert U (2005) REPLICA: a bitstream manipulation filter for module relocation in partial reconfigurable systems. In: Proceedings of the 19th International Parallel and Distributed Processing Symposium – Reconfigurable architectures workshop (IPDPS). IEEE Computer Society
Google Scholar
Kapre N, Mehta N, Delorimier M, Rubin R, Barnor H, Wilson MJ, Wrighton M, Dehon A (2006) Packet-switched vs. time-multiplexed FPGA overlay networks. In: Proceedings of the IEEE symposium on Field-programmable Custom Computing Machines (FCCM). IEEE, pp 205–216
Google Scholar
Kotolly A (2006) The economics of structured- and standard-cell-ASIC designs. Electronic News, 16.03.2006
Google Scholar
Kalte H, Porrmann M (2005) Context saving and restoring for multitasking in reconfigurable systems. In: Proceedings of the 15th international conference on field programmable logic and applications (FPL), Aug 2005, pp 223–228
Google Scholar
Kalte H, Porrmann M (2006) REPLICA2Pro: task relocation by bitstream manipulation in Virtex-II/Pro FPGAs. In: Proceedings of the 3rd conference on computing frontiers (CF). ACM, pp 403–412
Google Scholar
Kalte H, Porrmann M, Rückert U (2002) A prototyping platform for dynamically reconfigurable system on chip designs. In: Proceedings of the IEEE workshop heterogeneous reconfigurable systems on chip (SoC)
Google Scholar
Kalte H, Porrmann M, Rückert U (2004) Study on column wise design compaction for reconfigurable systems. In: Proceedings of the IEEE international conference on field programmable technology (FPT), Dec 2004
Google Scholar
Kalte H, Porrmann M, Rückert U (2004) System-on-programmable-chip approach enabling online fine-grained 1D-placement. In: Proceedings of the 11th reconfigurable architectures workshop (RAW), pp 141–146
Google Scholar
Kuon I, Rose J (2007) Measuring the gap between FPGAs and ASICs. Tran Comput-Aided Des Integr Circ Syst 26(2):203–215
Article Google Scholar
Koch D, Reimann F, Streichert T, Haubelt C, Teich J (2010) ReCoNets – design methodology for embedded systems consisting of small networks of reconfigurable nodes and connections. In: Platzner M, Teich J, Wehn N (eds) Dynamically reconfigurable systems. Springer, pp 223–244
Google Scholar
Koch D, Streichert T, Haubelt C, Teich J (2008) Logic chip, logic system and method for designing a logic chip. Patent PCT/EP2008/007342
Google Scholar
Koch D, Streichert T, Haubelt C, Teich J (2008) Logic chip, method and computer program for providing a configuration information for a configurable logic chip. Patent PCT/EP2008/007343
Google Scholar
Kissler D, Strawetz A, Hannig F, Teich J (2009) Power-efficient reconfiguration control in coarse-grained dynamically reconfigurable architectures. In: Proceedings of the 18th international workshop on power and timing modeling, optimization, and simulation (PATMOS), Sept 2009. Lecture notes in computer science (LNCS, vol 5349). Springer, pp 307–317
Google Scholar
Koch D, Teich J (2004) Platform-independent methodology for partial reconfiguration. In: Proceedings of the 1st conference on computing frontiers (CF’04). ACM, pp 398–403
Google Scholar
Koch D, Torresen J (2010) Routing optimizations for component-based system design and partial run-time reconfiguration on FPGAs. In: Proceedings of international conference on field-programmable technology (ICFPT). IEEE, pp 460–464
Google Scholar
Koch D, Torresen J (2011) A routing architecture for mapping dataflow graphs at run-time. In: Proceedings of international conference on field-programmable logic and applications (FPL 11), Sept 2011, pp 286–290
Google Scholar
Koch D, Torresen J (2011) FPGASort: a high performance sorting architecture exploiting run-time reconfiguration on FPGAs for large problem sorting. In: Proceedings of the 19th ACM/SIGDA international symposium on field-programmable gate arrays (FPGA 2011), Feb 2011. ACM, pp 45–54
Google Scholar
Kachris C, Vassiliadis S (2006) Performance evaluation of an adaptive FPGA for network applications. In: Proceedings of the seventeenth IEEE international workshop on rapid system prototyping (RSP). IEEE Computer Society, pp 54–62
Google Scholar
Lewis D, Ahmed E, Baeckler G, Betz V, Bourgeault M, Galloway D, Hutton M, Lane C, Lee A, Leventis P, Mcclintock C, Padalia K, Pedersen B, Powell G, Ratchev B, Reddy S, Schleicher J, Stevens K, Yuan R, Cliff R, Rose J (2005) The Stratix II logic and routing architecture. In: Proceedings of the ACM/SIGDA 13th international symposium on field-programmable gate arrays (FPGA). ACM, pp 14–20
Google Scholar
Lysaght P, Blodget B, Mason J, Young J, Bridgford B (2006) Invited paper: enhanced architecture, design methodologies and CAD tools for dynamic reconfiguration of Xilinx FPGAs. In: Proceedings of the 16th international conference on field programmable logic and application (FPL), Aug 2006, pp 1–6
Google Scholar
Lu R, Cao A, Koh C-K (2007) SAMBA-Bus: a high performance bus architecture for system-on-chips. IEEE Trans VLSI Syst 15(1):69–79
Article Google Scholar
Li Z, Hauck S (1999) Don’t care discovery for FPGA configuration compression. In: Proceedings of the 7th ACM/SIGDA international symposium on field programmable gate arrays (FPGA). ACM, pp 91–98
Google Scholar
Li Z, Hauck S (2001) Configuration compression for Virtex FPGAs. In: Proceedings of the 9th annual IEEE symposium on field-programmable custom computing machines (FCCM). IEEE Computer Society, pp 147–159
Google Scholar
Lange H, Koch A (2010) Architectures and execution models for hardware/software compilation and their system-level realization. IEEE Trans Comput 59:1363–1377
Article MathSciNet Google Scholar
Lemieux G, Lee E, Tom M, Yu A (2004) Directional and single-driver wires in FPGA interconnect. In: IEEE international conference on field-programmable technology (FPT)
Google Scholar
Lockwood JW, Naufel N, Turner JS, Taylor DE (2001) Reprogrammable network packet processing on the field programmable port extender (FPX). In: Proceedings of the 1998 ACM/SIGDA sixth international symposium on field programmable gate arrays (FPGA). ACM, pp 87–93
Google Scholar
Lübbers E, Platzner M (2008) Communication and synchronization in multithreaded reconfigurable computing systems. In: Proceedings of the international conference on engineering of reconfigurable systems and algorithms (ERSA), Jun 2008, pp 83–89
Google Scholar
Luk W, Shirazi N, Cheung PYK (1996) Modelling and optimising run-time reconfigurable systems. In: IEEE symposium on FPGAs for custom computing machines. IEEE Computer Society Press, pp 167–176
Google Scholar
Luk W, Shirazi N, Cheung PYK (1997) Compilation tools for run-time reconfigurable designs. In: Proceedings of the 5th IEEE symposium on FPGA-based custom computing machines (FCCM). IEEE Computer Society, pp 56–65
Google Scholar
Advanced RISC Machines Ltd (2007) AMBA system architecture. Available online: http://www.arm.com/products/solutions/AMBAHomePage.html
Landaker WJ, Wirthlin MJ, Hutchings B (2002) Multitasking hardware on the SLAAC1-V reconfigurable computing system. In: Proceedings of the 12th international conference on field-programmable logic and applications (FPL), pp 806–815
Google Scholar
Bourgeault M (2011) Alteras partial reconfiguration flow. Available online: http://www.eecg.utoronto.ca/jayar/FPGAseminar/FPGA_Bourgeault_June23_2011.pdf
Marescaux T, Bartic A, Verkest D, Vernalde S, Lauwereins R (2002) Interconnection networks enable fine-grain dynamic multi-tasking on FPGAs. In: Proceedings of the 12th international conference on field-programmable logic and applications (FPL), pp 795–805
Google Scholar
Mcmurchie L, Ebeling C (1995) Pathfinder: a negotiation-based performance-driven router for FPGAs. In: Proceedings of the 1995 ACM third international symposium on field-programmable gate arrays (FPGA), pp 111–117
Google Scholar
Metzgen P (2004) A high performance 32-bit ALU for programmable logic. In: Proceedings of the 2004 ACM/SIGDA 12th international symposium on field programmable gate arrays (FPGA). ACM, pp 61–70
Google Scholar
Mesquita D, Moraes F, Palma J, Möller L, Calazans N (2003) Remote and partial reconfiguration of FPGAs: tools and trends. In: Proceedings of the 17th international symposium on parallel and distributed processing (IPDPS). IEEE Computer Society, p 177.1
Google Scholar
Mignolet J-Y, Nollet V, Coene P, Verkest D, Vernalde S, Lauwereins R (2003) Infrastructure for design and management of relocatable tasks in a heterogeneous reconfigurable system-on-chip. In: Proceedings of the conference on design, automation and test in Europe (DATE). IEEE Computer Society, pp 986–991
Google Scholar
Marescaux T, Nollet V, Mignolet J-Y, Bartic A, Moffat W, Avasare P, Coene P, Verkest D, Vernalde S, Lauwereins R (2004) Run-time support for heterogeneous multitasking on reconfigurable SoCs. Integr VLSI J 38(1):107–130
Article Google Scholar
Mak TST, Sedcole NP, Cheung PYK, Luk W (2006) On-FPGA communication architectures and design factors. In: Proceedings of the 16th international conference on field programmable logic and applications (FPL). IEEE, pp 1–8
Google Scholar
Mak T, Sedcole P, Cheung PYK, Luk W (2008) Wave-pipelined signalling for On-FPGA communication. In: IEEE international conference on field-programmable technology (FPT)
Google Scholar
Majer M, Teich J, Ahmadinia A, Bobda C (2007) The erlangen slot machine: a dynamically reconfigurable FPGA-based computer. J VLSI Sig Process Syst 47(1):15–31
Article Google Scholar
Oliver TF, Maskell DL (2007) Prerouted FPGA cores for rapid system construction in a dynamic reconfigurable system. EURASIP J Embed Syst 2007(1):7 [ope] opencores http://www.opencores.org
Palma JC, de Mello AV, Möller L, Moraes F, Calazans N (2002) Core communication interface for FPGAs. In: Proceedings of the 15th symposium on integrated circuits and systems design (SBCCI), p 183
Google Scholar
Pan JH, Mitra T, Wong W-F (2004) Configuration bitstream compression for dynamically reconfigurable FPGAs. In: Proceedings of the IEEE/ACM international conference on computer-aided design (ICCAD), pp 766–773
Google Scholar
Salomon D (2004) Data compression – The complete reference. Springer
Google Scholar
Suris J, Athanas P, Patterson C (2008) An efficient run-time router for connecting modules in FPGAs. In: Proceedings of international conference on field-programmable logic and applications (FPL), Sept 2008, pp 125–130
Google Scholar
Sedcole P, Blodget B, Becker T, Anderson J, Lysaght P (2006) Modular dynamic reconfiguration in Virtex FPGAs. IEE 153(3):157–164
Google Scholar
Sethuraman B, Bhattacharya P, Khan J, Vemuri R (2005) LiPaR: a light-weight parallel router for FPGA-based networks-on-chip. In: Proceedings of the 15th ACM great lakes symposium on VLSI (GLSVSLI), New York. ACM, pp 452–457
Google Scholar
Sedcole P, Cheung PYK, Constantinides GA, Luk W (2004) A structured methodology for system-on-an-FPGA design. In: 14th international conference on field programmable logic and application (FPL), Aug 2004. Volume 3203 of lecture notes in computer science, pp 1047–1051
Google Scholar
Silvia M, Ferreira JC (2008) Generation of partial FPGA configurations at run-time. In: Proceedings of international conference on field-programmable logic and applications (FPL), Sept 2008, pp 367–372
Google Scholar
Sunwoo J, Garimella S, Stroud C (2005) On embedded processor reconfiguration of logic BIST for FPGA cores in SoCs. In: Proceedings of the 14th IEEE North Atlantic test workshop (NATW), May 2005
Google Scholar
Silicore^TM (2002) WISHBONE System-on-Chip (SoC)interconnection architecture for portable IP cores Rev. B.3. Available online: http://www.opencores.org/projects.cgi/web/wishbone/wbspec_b3.pdf
Streichert T, Koch D, Haubelt C, Teich J (2006) Modeling and design of fault-tolerant and self-adaptive reconfigurable networked embedded systems. EURASIP J Embed Syst 2006:1–15. Article ID 42168, doi:10.1155/ES/2006/42168
Google Scholar
Simmler H, Levinson L, Manner R (2000) Multitasking on FPGA coprocessors. In: Proceedings of the 10rd international conference on field programmable logic and application (FPL), pp 121–130
Google Scholar
Stitt G, Lysecky R, Vahid F (2003) Dynamic hardware/software partitioning: a first approach. In: Proceedings of the 40th conference on design automation (DAC). ACM, pp 250–255
Google Scholar
Schallenberg A, Nebel W, Herrholz A, Hartmann P, Grüttner K, Oppenheimer F (2010) POLYDYN – object-oriented modelling and synthesis targeting dynamically reconfigurable FPGAs. In: Platzner M, Teich J, Wehn N (eds) Dynamically reconfigurable systems. Springer, pp 139–158
Google Scholar
The Spirit Consortium (2007) http://www.spiritconsortium.org/
Saldana M, Patel A, Liu HJ, Chow P (2012) Using partial reconfiguration and message passing to enable FPGA-based generic computing platforms. EURASIP J Embed Syst 2012(127302):10
Google Scholar
Storer JA, Szymanski TG (1982) Data compression via textual substitution. J ACM 29(4):928–951
Article MathSciNet MATH Google Scholar
Streichert T, Strengert C, Koch D, Haubelt C, Teich J (2007) Communication aware optimization of the task binding in hardware/software reconfigurable networks. Int J Circ Syst 2(1):29–36
Google Scholar
Scalera SM, Vázquez JR (1998) The design and implementation of a context switching FPGA. In: Proceedings of the IEEE symposium on FPGAs for custom computing machines (FCCM). IEEE Computer Society, p 78
Google Scholar
Snider GS, Williams S (2007) Nano/CMOS architectures using a field-programmable nanowire interconnect. Nanotechnology 18(3):1–11
Article Google Scholar
Sim JE, Wong W-F, Teich J (2009) Optimal placement-aware trace-based scheduling of hardware reconfigurations for FPGA accelerators. In: Proceedings of the 17th annual IEEE symposium on field-programmable custom computing machines (FCCM), Apr 2009. IEEE Computer Society (to appear)
Google Scholar
Synopsys Inc (2008) DesignWare GTECH library. Available online: www.synopsys.com/dw/doc.php/doc/dwf/manuals/dw_gtech.pdf
Trimberger S, Carberry D, Johnson A, Wong J (1997) A time-multiplexed FPGA. In: Proceedings of the 5th IEEE symposium on FPGA-based custom computing machines (FCCM). IEEE Computer Society, pp 22–28
Google Scholar
Teich J (2007) Lecture on reconfigurable computing: Chapter 1. inroduction, University of Erlangen-Nuremberg. Available online: http://www12.informatik.uni-erlangen.de/edu/rc/slides/1_RC_introduction.pdf
Tessier R (1998) Negotiated A* routing for FPGAs. In: Proceedings of the fifth canadian workshop on field-programmable devices (FPD)
Google Scholar
Tiwari A, Tomko KA (2003) Scan-chain based watch-points for efficient run-time debugging and verification of FPGA designs. In: Proceedings of the conference on Asia South Pacific design automation (ASPDAC). ACM, pp 705–711
Google Scholar
Virtual Socket Interface Alliance (2007) Legacy documents of the VSI alliance. Available online: http://vsi.org/
Vanmeerbeeck G, Schaumont P, Vernalde S, Engels M, Bolsens I (2001) Hardware/software partitioning of embedded system in OCAPI-xl. In: Proceedings of the ninth international symposium on hardware/software codesign. ACM, pp 30–35
Google Scholar
Wazlowski M, Agarwal L, Lee T, Smith A, Lam E, Athanas P, Silverman H, Ghosh S (1993) PRISM-II compiler and architecture. In: Buell DA, Pocek KL (eds) IEEE workshop on FPGAs for custom computing machines (FCCM). IEEE Computer Society Press, pp 9–16
Google Scholar
Wheeler T, Graham P, Nelson BE, Hutchings B (2001) Using design-level scan to improve FPGA design observability and controllability for functional verification. In: Proceedings of the 11th international conference on field programmable logic and application (FPL), pp 483–492
Google Scholar
Wirthlin M, Hutchings BL (1995) DISC: the dynamic instruction set computer. In: Schewel J (ed) Proceedings on Field Programmable Gate Arrays (FPGAs) for fast board development and reconfigurable computing (SPIE) 2607. SPIE – The International Society for Optical Engineering, pp 92–103
Google Scholar
Wirthlin MJ, Hutchings BL (1996) Sequencing run-time reconfigured hardware with software. In: Proceedings of the 1996 ACM fourth international symposium on field-programmable gate arrays (FPGA). ACM, pp 122–128
Google Scholar
Wang F, Jean J (2006) Architectural support for runtime 2D partial reconfiguration. In: Proceedings of international conference on engineering of reconfigurable systems and algorithms (ERSA), June 2006. CSREA Press, pp 231–232
Google Scholar
Wetekam G, Lutz B(2005) Hardware-Implementierung einer 3D-Huffman-Decodierung für dynamische Volumendaten. In: Hardware for visual computing workshop, Universität Tübingen, University of Tübingen
Google Scholar
Walder H, Platzner M (2002) Non-preemptive multitasking on FPGA: task placement and footprint transform. In: Proceedings of the international conference on engineering of reconfigurable systems and algorithms (ERSA), June 2002. CSREA Press, pp 24–30
Google Scholar
Walder H, Platzner M (2004) A runtime environment for reconfigurable operating systems. In: Proceedings of the 14th international conference on field programmable logic and application (FPL), pp 831–835
Google Scholar
Xilinx Inc (2007) Platform flash in-system programmable configuration PROMs. Available online: www.xilinx.com/support/documentation/data_sheets/ds123.pdf
Xilinx Inc (2007) Xilinx: silicon devices. Available online: www.xilinx.com/products/silicon_solutions/
Xilinx Inc (2011) Partial reconfiguration user guide (Rel 13.2). Available online: www.xilinx.com/support/documentation/sw_manuals/xilinx13_2/ug702.pdf
Xu W, Ramanarayanan R, Tessier R (2003) Adaptive fault recovery for networked reconfigurable systems. In: Proceedings of the 11th annual IEEE symposium on field-programmable custom computing machines (FCCM). IEEE Computer Society, p 143
Google Scholar
eferino CA, Kreutz ME, Susin AA (2004) RASoC: a router soft-core for networks-on-chip. In: Proceedings of the conference on design, automation and test in Europe (DATE). IEEE Computer Society, p 30198
Google Scholar
Ziv J, Lempel A (1977) A universal algorithm for sequential data compression. IEEE Trans Inform Theory 23(3):337–343
Article MathSciNet MATH Google Scholar

Download references

Author information

Authors and Affiliations

University of Oslo, Oslo, Norway
Dirk Koch

About this chapter

Cite this chapter

Koch, D. (2013). Reconfigurable CPU Instruction Set Extensions. In: Partial Reconfiguration on FPGAs. Lecture Notes in Electrical Engineering, vol 153. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-1225-0_5

DOI: https://doi.org/10.1007/978-1-4614-1225-0_5
Published: 19 March 2012
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-1224-3
Online ISBN: 978-1-4614-1225-0
eBook Packages: Engineering, Engineering (R0)
