1 Introduction

Pervasive computing is a growing trend for where devices with embedded microprocessors are used to enhance various aspects of our everyday lives (e.g., [43, 44, 57, 58]). Such devices operate on limited resources, and therefore, data processing, communication protocols and underlying technologies must be carefully chosen to meet strict operating requirements. Considering, however, that the information they handle is in many cases private or safety critical and must be appropriately protected from malicious attackers, appliance of secure cryptographic components becomes imperative. Lightweight cryptography (LWC) investigates the integration of cryptographic primitives and algorithms into constrained devices [55, 59].

Between the two main categories of cryptographic algorithms, symmetric and asymmetric key, the former exhibit good performance and are widely used for confidentiality, integrity checks and authentication protocols. The tested robustness and known levels of security of well-known block ciphers, like AES [132], make them good candidates as long as they can be adjusted to meet the corresponding resource constraints. AES is the standard symmetric-key cipher used in cryptographic applications. Newer block ciphers designed specifically for this domain are gaining ground, introducing novelties and improving the overall efficiency.

In the following sections, we present a survey of lightweight block ciphers for embedded systems as well as their implementations (in hardware and software). Similar surveys on LWC were first carried out in 2007. In [36] and [96], the authors evaluate hardware and software implementations for lightweight symmetric and asymmetric cryptography. In [118], the authors investigate the lightweight hardware and software solutions for wireless sensor networks (WSNs), i.e., groups of highly constrained hardware platforms. In [105], the authors report new trends for lightweight hardware block and stream ciphers. In [78], hardware implementations of block ciphers are examined, while in [27] and [37], the authors implement and evaluate lightweight block ciphers on the same platform for fair comparison. Software implementations of lightweight block ciphers on three different platforms are presented in [35]. The latest survey on cryptanalytic attacks on lightweight block ciphers was carried out in [8].

This paper is a comprehensive survey in LWC and includes recently proposed ciphers. We further present the latest trends in hardware/software implementations of lightweight block ciphers and summarize the state-of-the-art design directions. The latest cryptanalysis results are mentioned, and the vulnerable ciphers are reported. We analyze the features of block ciphers and identify the more suitable ciphers for different types of embedded devices. Based on the devices’ capabilities, we categorize the implementations in three groups: ultra-lightweight, low cost and lightweight. Ultra-lightweight implementations fit in the most constrained devices (in terms of computation capability, memory, power), like the standard 8051 microcontroller and the ATtiny45. Low-cost devices are affordable and perform a little better than ultra-lightweight ones, like the ATmega128. Lightweight devices include the remaining devices that are reported in LWC.

Hardware-based algorithm implementations are categorized based on chip area and complexity. Ultra-lightweight implementations occupy up to 1000 logic gates, low-cost implementations occupy up to 2000 logic gates and lightweight implementations occupy up to 3000 logic gates. The software implementations are categorized based on the ROM and RAM requirements. Ultra-lightweight implementations require up to 4KB ROM and 256 bytes RAM, low-cost implementations require up to 4KB ROM and 8KB RAM, and lightweight implementations require up to 32KB ROM and 8KB RAM. Finally, we record the implementation characteristics of the latest proposals in hardware/software and create benchmarks based on metrics.

The remainder of this article is structured as follows. In Sect. 2, we briefly introduce LWC and denote the evaluation metrics we applied for categorizing the different block cipher proposals. Section 3 presents the evolution of LWC in symmetric-key block ciphers. Section 4 proposes the more suitable implementations in the three proposed platform, and Sect. 5 concludes this work. Finally, “Appendix” summarizes the general features and cryptanalysis results of each cipher and presents the benchmarks for the hardware and software implementations.

2 LWC implementations

2.1 Brief overview

Embedded devices have inherent limitations in terms of processing power, memory, storage, connectivity and energy consumption [45, 95]. While ordinary cryptography focuses in providing high levels of security, lightweight cryptography also needs to consider the aforementioned restrictions, which can affect design and implementation choices.

Hardware LWC implementations try to achieve the required functionality with the minimum amount of hardware real-estate. These designs are suitable for ultra-constrained devices that perform specific tasks. The efficiency depends on a number of aspects analyzed below, like design complexity, CMOS technology, power and energy consumption, and throughput.

The implementation’s complexity is one of the most dominant factors that affect the design approach. It is determined by the logic gates that are required to implement a cipher. The relevant metric is called gate equivalent (GE). GE is a unit of measurement that specifies the manufacturing technology-independent complexity of digital electronic circuits. In CMOS technologies, the silicon area of a NAND2 gate usually constitutes the technology-dependent unit area commonly referred to as gate equivalent. Except for expressing complexity, the number of the logic gates or the GE metric reflect to a portion of the chip area that is occupied by the hardware implementation. Thus, they are usually utilized as an intuitive way to express the chip area of a design. A specification in GE for a certain circuit represents a complexity measure, for which the corresponding silicon area can be deduced for a dedicated manufacturing technology. The authors in [121] and [145] suggest that approximately 250–4000 logic gates, out of the total 1000–10,000 logic gates commonly present on RFID tag spaces, can be used for security-related tasks. For lightweight devices, up to 3000GE can be acceptable, while for even smaller devices, like low-cost RFIDs and 4-bit microcontrollers, 2000GE and 1000GE are proposed, respectively (Table 4).

Table 1 Characteristics of different CMOS technology nodes for commercial standard-cell libraries [138]
Table 2 Characteristics of different microcontroller platforms [118]

CMOS technology is another factor that affects the implementation characteristics. Different technologies and standard-cell libraries produce different results. For example, the same implementation of PRESENT that is presented in [117] produces 1075GE on \(0.18\,\upmu \mathrm{m}\), 1169GE on \(0.25\,\upmu \mathrm{m}\) and 1000GE on \(0.35\,\upmu \mathrm{m}\) CMOS technology.

Software implementations on the other hand target less constrained devices. One of the main goals of software implementations is to keep memory and CPU needs as low as possible in order to minimize power consumption and the compatible devices’ cost.

Software implementations are optimized for throughput and energy savings by requiring fewer cycles, since voltage and clock frequency are usually fixed for microcontrollers. These implementations are included in cryptographic libraries for embedded systems [47, 56].

Memory restrictions, however, are bound to negatively affect performance. As small memory elements are utilized, more cycles are needed to execute an operation. While in hardware implementations, the 4-bit S-boxes are a popular option [7, 21, 39, 48, 51, 111, 124, 135, 148, 150], in software, the use of 8-bit S-boxes has also been proposed [7, 24, 28, 37, 99, 103]. The reduction in cycles count is considered significant despite the use of more space.

For active devices, like wireless sensor nodes that have their own power supply, energy consumption is a significant aspect. For passive devices, like contactless smart cards and RFID tags that do not have their own power supply and must adapt to the host device’s constraints, power consumption is the main concern.

Power is important as it is related to two main issues: the power consumption of a device and attacks related to power analysis. When frequency is fixed at a low value, like 100 kHz that is usually examined in LWC research (Table 4), the power consumption is directly connected to chip area [110]. A small area predisposes that the circuit will consume low power. Table 1 summarizes the GE and power characteristics of standard-cell libraries. Power deviates by a factor of two to three across different technologies. Table 2 summarizes the general frequency, RAM, ROM and power characteristics of different microcontroller platforms in the sensor network market.

Researchers aim to make the power usage profile of their implementations independent of the secret key, in order to withstand simple and differential power analysis [46, 54, 102]. As a result, the power consumption is increased. In applications where differential power analysis (DPA) [81] is a threat, researchers attempt to counter DPA while trying to respect the power constraints. In less critical applications, power consumption is considered a priority while the countermeasures for DPA are less important.

Timing requirements are also imperative for deploying several special domain applications [80, 150] and ISO/IEC standard protocols [39]. The ideal lightweight cipher should occupy a small circuit area, consume the least amount of resources and power and perform as fast as possible.

2.2 Fair comparison

In order to fairly compare different ciphers, several factors should uphold:

  • The compared implementations need to provide the same security level.

  • For hardware, the benchmarking setup, including the CMOS library, the synthesis scripts, the synthesis tools need to be fixed and not to favor any of the competing ciphers.

  • Similarly, the same benchmarking machines need to be applied for software.

  • There are plenty of implementation choices (e.g., serial and round based), and the designs are optimized for specific evaluation metrics. This may lead to different architectures for different metrics.

  • The reported results should ideally be reproducible by the wider community.

In order to fairly compare different ciphers, the aforementioned factors should all be considered, as it is difficult to implement all the proposals on the same platform.

On the same issue, the authors in [52] studied the impact of technology and standard-cell libraries in the hardware implementation of hash functions. They conclude that accurate comparisons between different hardware implementations can be achieved when the same technology has been used, even if different standard-cell libraries are used. On the other hand, significant inaccuracy could be noticed when using different technologies. Similar observations are mentioned for all other reported features, like performance and power consumption. The authors of said work also note that only latency is independent of technology, while power and area depend on technology. Throughput may depend on technology when the maximum throughput is needed, as the maximum frequency depends on the technology being used. Similar remarks can be deduced for software implementations.

A comparative study is conducted in Sect. 5. The ciphers with low security level are initially indicated to draw the reader’s attention. In order to provide a fair comparison, the different implementations are initially grouped, based on the deployed technology in hardware and software. Also, the implementations with the worst features (e.g., high energy consumption or low throughput) are excluded. Among the efficient ciphers, the implementations of the same technology with the same key and datapath size are retrieved and compared.

Then, inter-technology evaluation is performed based on a subset of ciphers that are implemented in more than one technologies. For example, in hardware TWINE is only implemented in \(0.09\,\upmu \)m and RECTANGLE only in \(0.13\,\upmu \)m. PRESENT is implemented in both platforms. Based on the aforementioned analysis, PRESENT is more efficient than TWINE in \(0.09\,\upmu \)m and PRESENT is less efficient than RECTANGLE in \(0.13\,\upmu \)m. Thus, we derive that in general, RECTANGLE is also more efficient than TWINE.

2.3 Evaluation metrics

In Sect. 3, we compare several lightweight ciphers based on a number of metrics presented in this section. We summarize the details of 127 hardware and 233 software implementations. Specifically, we investigate security, cost and performance features. The best ciphers are those who provide the appropriate level of security and balance the tradeoffs between cost and performance. Some metrics are generic, while others are tied to the hardware implementations (e.g., hardware technology and GE) and some to software implementations (e.g., RAM/ROM size). The metrics used for the comparison presented in this paper are the following:

  • Security level: It is a logarithmic measure of the fastest known computational attack on an algorithm and is measured in bits. In most cases, the level is identified by the key length in bits. The level of security cannot exceed the key length, but it can be smaller. Initially, a cipher’s developers estimate its security level, which can be updated accordingly, based on reported attacks.

  • Hardware technology: It is the CMOS technology used to implement the cipher with the corresponding occupied area being measured in \(\upmu \)m. The technologies most commonly used in LWC research papers are 0.13 and 0.18 \(\upmu \)m (Appendix). The complexity and the area occupied by the hardware implementation are described by the Gate Equivalent (GE) metric and depend on the technology used. The GE metric is calculated by dividing the layout area of an implementation in \(\upmu \mathrm{m}^{2}\) by the corresponding area of a NAND2 gate.

  • Throughput: It stands for the Kb/s the cipher’s encryption/decryption operation achieves at a specific frequency. Most of the research papers in LWC utilize frequencies of 100 KHz for hardware and 4 MHz for software. We record the throughput of each evaluated implementation. In cases where an implementation uses a different frequency, the reported throughput is projected at these two frequencies.

  • Latency: It defines the number of clock cycles required to compute a single block’s plaintext/ciphertext.

  • Power and energy consumption: We evaluate the power of hardware implementations in \(\upmu \)W. The power can be roughly estimated based on the GE and the hardware technology details referred to in Table 1. For software implementations, we consider an average power of 0.004 and 0.00135 \(\upmu \)W for 8- and 16-bit microcontrollers in 4 MHz frequency with 0.9 V voltage, respectively, based on Table 2. Energy consumption per bit for both hardware and software implementations is calculated by the formula:

    $$\begin{aligned} Energy\,[\upmu \mathrm{J}]= & {} (Latency\,[cycles/block]\nonumber \\&\times Power\,[\upmu \mathrm{W}])/block\,size\,[bits] \end{aligned}$$
    (1)

    where latency is the number of clock cycles that are required to encrypt a block, power is the \(\upmu \)W that are consumed by the hardware or software implementation, and block size is the size of data in bits that each cipher can process in one encryption/decryption operation.

  • RAM/ROM memory: The amount of RAM and ROM (in bytes) the algorithm requires. RAM bytes represent the resources required to store the intermediate state. ROM bytes represent the code size which is actually the size of the implementation.

  • Efficiency: Indicates the trade-off between performance and implementation size. The higher the value, the better. For hardware implementations, the metric is calculated by the formula:

    $$\begin{aligned} Hardware\,Efficiency= & {} Throughput\,[Kbps]/\nonumber \\&Complexity\,[KGE] \end{aligned}$$
    (2)

    where throughput is the Kb/s the cipher’s encryption operation achieves at 100 KHz frequency and the complexity is the value of the chip area and complexity in KGE.

    For software implementations, the metric is calculated by the formula:

    $$\begin{aligned} Software\,Efficiency= & {} Throughput\,[Kbps]/\nonumber \\&Code\,size [KB] \end{aligned}$$
    (3)

    where throughput is the Kb/s the cipher’s encryption operation achieves at 4 MHz frequency and code size is the size of the executable code in KB.

3 Block ciphers

3.1 General information

There are five basic types of block ciphers based on their inner structure: Substitution Permutation Networks (SPNs), Feistel Networks, Add-Rotate-XOR (ARX), NLFSR-based and hybrid. AES is the best known cipher that adopts the SPN structure, DES is the best known Feistel-type cipher, while IDEA is the best known ARX cipher, KeeLoq is the best known NLFSR-based cipher, and the best known hybrid ciphers are those of the hummingbird family.

SPNs process plaintext through a series of sequential substitution and permutation boxes that transform the data and prepare them for the next round. SPN ciphers include AES [132], NOEKEON [32], ICEBERG [130], mCrypton [91], PRESENT [21], PUFFIN-2 [142], PRINTcipher [80], Klein [48], LED [51], EPCBC [150], PICARO [107], PRINCE [22], Zorro [46], RECTANGLE [155], \(\hbox {I-PRESENT}^{\mathrm{TM}}\) [154] and PRIDE [4].

Feistel networks perform a diffusion function on half of the data of each block, resulting in a smaller round function. Additional logic is needed to apply the diffusion function to the untransformed state, such as a bit-wise XOR operation, which requires 2.5–3 GE per bit. SPNs do not have this extra overhead, and as a result, serialized SPN ciphers are likely to achieve smaller datapaths. Feistel ciphers include DES [131] and its variants [87], GOST [111], TEA [146] and its variants [101, 147], Camellia [9], SEA [129], CLEFIA [125], KASUMI [42], MIBS [66], TWIS [103], TWINE [135], LBlock [148], Piccolo [124], SIMON [15], ITUbee [74], FeW [83], Robin [49], Fantomas [49] and HISEC [5].

SPN and Feistel are the most widely used types (Appendix). As will be made clear in the following sections, many of the proposed lightweight Feistel-type ciphers suffer from security problems, in contrast to SPN-type ones. Still, Feistel networks offer both encryption and decryption with little cost, but in many tag-based applications, decryption functionality is rarely required. An encryption-only SPN cipher still remains a strong competitor, if not the choice of preference, for the LWC field.

ARXs use addition, rotation and XORs with no S-boxes. They produce compact and fast implementations, but their security properties are not well studied, especially when compared to SPN and Feistel ciphers. ARX designs are IDEA [84], HIGHT [60], SPECK [15], LEA [61] and BEST-1 [67].

NLFSR-based ciphers utilize the building blocks of stream ciphers. They are mostly used for hardware implementations. The security of their inner components is based on stream cipher analysis. KeeLoq [63], KATAN and KTANTAN family [25] and Halka [34] have a stream cipher structure.

Hybrid ciphers combine the three aforementioned types to improve specific parameters, like throughput. Their analysis is determined by the selected cipher types. The hummingbird family [38, 39] is a special case with hybrid structure of stream and block cipher. PRESENT-GRP [12] is another hybrid design that combines the PRESENT SPN with the bit permutation of a grouping permutation (GRP) structure.

Table 3 indicates the general characteristics of each examined cipher (key size, block size, rounds of operation and cipher’s type) with references to the recorded cryptanalysis and attacks. Tables 4, 5, 6, 7, 8, 9 and 10 summarize the implementation characteristics of lightweight block ciphers in hardware and software, respectively. In Tables 4, 5, 6 and 7, the hardware implementations are categorized according to the supporting key size, the type of the block cipher, and the block size. We also indicate the parameters of latency, throughput, technology, GE, hardware efficiency, power, energy and cipher’s type. In Tables 8, 9 and 10, the categorization of software is only based on the key and block size as the cipher’s type column has been omitted. We further examine the parameters of ROM size, RAM size, latency, energy, throughput and software efficiency

The field “Type” indicates the inner structure of the cipher. The following classes are used: SPN, Feistel, GFN (generalized feistel network), ARX, NLFSR and hybrid.

The implementations presented are encryption-only, unless stated otherwise. Furthermore, the following notations are used:

  • (S) for serialized implementations

  • (D) for implementations that offer both encryption and decryption

  • (H) when the key is hard-wired on the device

4 Chronicles

We categorize the lightweight block cipher proposals in three time periods based on the general LWC features and design goals. The ciphers of each period are grouped based on their type and are referred to in chronological order.

4.1 Period 1: early lightweight applications

In 1980s and 1990s, cryptography on mainstream computers was investigated and the first cryptographic standards were established. Embedded systems usage was limited. Lightweight security was provided by compact implementations of mainstream ciphers and some early lightweight proposals. These solutions target specific applications, like GSM security and remote keyless systems. The first attacks exposed the vulnerabilities of the lower security settings and established the cryptanalysis principles in this domain.

4.1.1 SPN ciphers

AES is considered a landmark in traditional cryptography and, thus, cannot be ignored in the context of LWC. It uses 128-bit blocks with 128-, 192- and 256-bit keys through 10, 12 and 14 rounds, respectively. Latest achievements in AES hardware implementations based on a serialized AES S-box [99] use 2400GE and 226 cycles per block. The S-box is based on Canright’s research [26], who thoroughly investigated the hardware requirements for the AES S-box. Canright proposes a very compact S-box that is composed of smaller fields. The S-box that is used in the lightweight version is further minimized in area by the replacement of D flip-flops and MUX with scan flip-flops. The scan flip-flops are mainly utilized in the construction of the state and the key array. Also the area requirements of the control logic are reduced by the replacement of the FSM with a LFSR. The authors also show that row-wise processing is more hardware-friendly than column-wise. The cipher remains safe with biclique cryptanalysis achieving slightly better results than exhaustive search [20].

In [37], a compact software implementation of AES is proposed. Registers are used to store the internal state and the mix column step, while the key is stored in RAM. The S-box and the round constants are implemented as look-up tables. Also shift and XOR operations are used for the multiplications performed for mix column. The implementation requires 1659 bytes of ROM and needs 4557 cycles to encrypt a 128-bit block.

NOEKEON [32] uses 128-bit keys and blocks through 16 identical rounds. A related-key cryptanalysis was presented in [79] and as a result the cipher was rejected by the NESSIE project. Later, the designers of NOEKEON argued that the presented attacks were not practical and that the cipher remains safe [33]. The first hardware implementation [108] occupies 2880GE and is suitable for lightweight devices. In software [37], it requires 364 bytes of code for 21.7 Kbps throughput and is suitable for 32-bit processors. It is an early involutive cipher (uses the same datapath for encryption and decryption, allowing the same circuity to be reused for the inverse operation) whose design is explored by newer proposals.

ICEBERG [130] is a fast involutive cipher. It uses 128-bit keys with 64-bit blocks through 16 rounds. ICEBERG is optimized for reconfigurable hardware implementations as it allows changing the key at every clock cycle without any loss in performance and the deriving of the round keys on the fly. It produces very efficient combinations of encryption/decryption and requires 5800 gates for 400 Kbps of throughput [28]. Differential cryptanalysis on the 8-round version is the best known attack [134]. The overall design is investigated by newer involutive ciphers to provide low-cost encryption/decryption functionality.

4.1.2 Feistel ciphers

DES is one of the first ciphers to be investigated for LWC. It uses 56-bit keys with 64-bit blocks through 16 rounds. The disadvantage of DES compared to AES is the smaller key size (i.e., 56-bits), yielding a lower security level (also broken under linear cryptanalysis [72]). On the other hand, smaller key sizes allow for smaller inner structures and low area footprint. The DESX variant [87] uses key whitening to increase the security level and render brute force attacks impossible; it uses 184-bit keys for the same block size and rounds. Hardware implementations of DES and DESX are presented in [87], and their cost is 2309GE and 1848GE, respectively.

Older software implementations are presented in [36]. Newer, more compact software implementations are presented in [37] and [115].

TEA (Tiny Encryption Algorithm) [146, 153] (2355GE) uses 128-bit keys with 64-bit blocks and 64 rounds. It is notable for its efficiency in power–energy–memory, its simplicity and ease of implementation. TEA needs 648 bytes of code [37], requires 7408 cycles of encryption and does not use S-boxes. The weak points of TEA, identified in [75] and [113], are the equivalent keys attack and its bad performance as a hash function. XTEA (eXtended TEA or Block TEA) [73, 101] was proposed to overcome these weaknesses. Among others, XTEA operates on arbitrary size blocks and utilizes a more complex key scheduling procedure. A related-key rectangle attack on 36 rounds of XTEA is presented in [93]. At a later stage, XXTEA (Corrected Block TEA) [147] was proposed, but a chosen-plaintext attack based on differential analysis against the full-round cipher was later presented [151]. Even though these ciphers are designed for software implementations, hardware implementations are reported in [73] for XTEA with 3490GE, which well exceed the 3000GE limit.

Camellia [9] is developed by Nippon Telegraph and Telephone Corporation and Mitsubishi Electric Corporation and is approved by ISO/IEC, IETF, the projects NESSIE and CRYPTREC, and is adopted in Japan’s new e-Government Recommended Ciphers List. It became popular as it achieves similar security level and processing capabilities with AES. It uses the same block and key sizes as AES through 18 or 24 rounds. In LWC, it is mainly studied for the fast software implementations as the hardware implementation [123] exceeds the 3000GE bound (at 6511GE). In software [24], Camellia requires 1262 bytes of code and 64 cycles for encryption. Cache timing attacks in software implementations were presented in [158].

4.1.3 ARX ciphers

IDEA (International Data Encryption Algorithm) [84] uses 128-bit keys with 64-bit blocks through 8.5 rounds, and all data operations are performed in 16-bit unsigned integers. To reduce the memory overhead, IDEA does not contain S- and P-boxes. It is based on XOR, addition and modular multiplication operations. It has been implemented in hardware for encryption in high-speed networks [137]. However, these designs are not suitable for embedded devices. Yet, IDEA performs well in embedded software and is used in PGP v2.0. Its most compact implementation occupies 596 bytes of code for 94.8 Kbps throughput [100]. Similarly to AES, biclique attacks are slightly faster than exhaustive search [76].

4.1.4 NLFSR ciphers

KeeLoq [63] is widely used in remote keyless entry systems. It was created by Gideon Kuhn in the 1980s. The cipher is dedicated to hardware and utilizes a nonlinear feedback shift register (NLFSR). It uses 64-bit keys with 32-bit blocks, a 32-bit initialization vector (IV), and processes the data for 528 rounds. Practical key recovery based on a slide attack and a novel meet-in-the-middle attack is presented in [63].

4.2 Period 2: first LWC generation

The first LWC generation covers the years 2005–2012. In this era, embedded systems are gaining ground and adequate general purpose security becomes imperative. The foundations of LWC are set, and a high variety of ciphers are designed specifically for this domain. An arm race is held where the area reduction is one of the most targeted design goals. Mostly encryption-only implementations are evaluated, with speed and power optimizations also taking place. PRESENT is the new milestone as the first lightweight cipher that reached the bound of around 1000GE area. Speed- and power-optimized ciphers are also proposed. Cryptanalysis is evolved as new attacks are efficiently applied in lightweight primitives. The period ends by the formal representation of LWC, and the establishment of the ISO/IEC standard for LWC that includes the block ciphers PRESENT and CLEFIA.

4.2.1 SPN ciphers

mCrypton (miniature of Crypton) [91] is a compact edition of Crypton [90]. It uses smaller block size (64-bit) through 13 rounds and variable key sizes (64-, 96- and 128-bit) to comply with the new constraints in ubiquitous computing devices. It is designed for low-cost RFID tags and sensors and exhibits low-power and compact implementations in both hardware and software. A related-key rectangle attack was reported in [106] for the 8-round mCrypton.

PRESENT [21] meets lightweight and ultra-lightweight requirements. It is one of the first ciphers implemented on ultra-constrained devices with almost 1000GE (encryption-only) [117]. PRESENT is a milestone in the evolution of lightweight block ciphers and is used along with AES as a benchmark for newer proposals. As with CLEFIA, PRESENT is also standardized in ISO/IEC 29192. It uses 80- and 128-bit keys with 64-bit blocks through 31 rounds. It has an SPN structure and needs 1030GE for 80-bit keys of both encryption. The main feature is the replacement of the ordinary eight S-boxes with a carefully selected single S-box. PRESENT is the first cipher that applied this serialized architecture, and its design is extremely hardware efficient, since it uses a fully wired diffusion layer without any algebraic unit.

PRESENT software implementations on different platforms have also been studied [110]. The newest software proposal [37] decreases the code size, retains the throughput and performs both encryption and decryption. The code size is 1000 bytes, and the algorithm requires 11,342/13,599 cycles for encryption/decryption on an 8-bit microcontroller. The implementation stores the round key and the state in registers. Two 256-byte tables are used for the encryption and decryption S-boxes to permit parallel lookups, while the code for permutation is utilized in both operations.

On the downside, side-channel attacks [114, 149] and a related-key attack [104] have been reported on the 17 round of PRESENT. Improved differential fault analysis was presented in [71], which recovered the key by inducing two or three 2-byte random faults. Biclique cryptanalysis on the full-round cipher is slightly better than exhaustive search [69]. A truncated differential attack on the reduced 26-round cipher was presented in [19].

PUFFIN-2 [142] is based on its predecessor cipher PUFFIN (2303GE) [29] (the latter is broken under statistical saturation attacks [85]). It is designed to be implemented exclusively with a serialized architecture and supports the same key and block size as PRESENT through 34 rounds. PUFFIN-2 is faster and has smaller footprint (1083GE), while providing both encryption and decryption functionality; it is an involutive cipher using the same datapath for encryption and decryption. The cipher is resistant to related-key attacks, since at key scheduling the relevant permutation layers are not regularly distributed among rounds. Differential cryptanalysis on the full cipher is slightly better than exhaustive search [18].

Klein [48] uses 64-bit blocks with 64-, 80- and 96-bit keys through 12, 16 and 20 rounds, respectively. It targets legacy sensor platforms. The main goal was to achieve high software-based performance while keeping a compact hardware footprint. As, in general, sensors are more powerful than RFID tags, software implementations are more practical. For its inner structure, several choices are made to balance the small area in hardware and software performance. Thus, bit-shifting operations, used by many hardware compact ciphers, are avoided. Byte-oriented matrix multiplication operations are opted for because of their software efficiency on 8-bit processors.

Practical chosen-plaintext key-recovery attacks have been successful for up to 8 rounds of Klein-64 [10]. They exploit the existence of differentials of high probability deriving from the combination of the 4-bit S-box and the byte-oriented MixColumns operation. An asymmetric biclique attack on the full version cipher exploits weaknesses of the diffusion layer and key schedule [3]. The worst case computations are \(2^{62.84}\) with \(2^{39}\) data complexity. A modified version of the attack, with higher data requirements, is slightly faster.

LED (Lightweight Encryption Device) [51] is designed to achieve small hardware footprint while keeping reasonable software performance. It uses 64-, 80-, 96- and 128-bit keys with 64-bit blocks through 32 and 48 rounds. It is an AES-like cipher, and the authors apply some newer trends from the field of lightweight hash functions based on block ciphers. LED uses no key scheduling process, the PRESENT S-box, the row-wise processing as described for lightweight AES [99] and the mix column approach of the hash function PHOTON [50]. The absence of key scheduling is also proposed in CGEN [116]. This approach offers significant chip area reduction for hardware implementations but may give rise to serious security issues, like related-key attacks [16]. The authors study and address these vulnerabilities. The research for the absence of key scheduling is the main contribution of the cipher, though, as reported, further independent cryptanalysis must be performed. However, the reduced round cipher is vulnerable to biclique cryptanalysis [69]. Differential fault analysis based on Super-S-box techniques obtain significant improvements for fault attacks [156]. The search space for the 128-bit key exhaustive search is reduced on an average of \(2^{21.96}\).

4.2.2 Domain-specific SPNs

Domain-specific ciphers like PRINTcipher [80] and EPCBC [150] are implemented to meet the cryptographic requirements of specific applications, such as integrated circuit (IC) printing and electronic product code (EPC) encryption, respectively. IC printing is used for the production and personalization of circuits at very low cost. EPC is an industry standard by EPCglobal [41] and is considered as a replacement for barcodes using low-cost passive RFID. In its smallest form, it uses 96-bit keys.

PRINTcipher uses 80- and 160-bit keys with 48- and 96-bit blocks through 48 and 96 rounds, respectively. It is developed for both domains, i.e., PRINTcipher-48 (402GE) is designed for IC printing applications, while PRINTcipher-96 (726GE) for EPC encryption. It uses 3-bit operations. As computer architectures do not use odd number of bits, a software implementation would use redundant resources. It is noted that PRINTcipher was released in order to investigate this application domain and is not ready for actual deployment yet. The authors are not concerned about related-key attacks as they are considered unrealistic for IC printing applications. Such attacks on the full-round cipher were presented in [89].

EPCBC is a PRESENT-like cipher that uses 96-bit keys with 48- or 96-bit block size through 32 rounds. The main contribution is the adjustment of an improved PRESENT version with the 96-bit keys aiming to be used for EPC encryption. The authors took the security analysis of PRESENT into consideration. Thus, EPCBC uses an optimized key scheduling procedure that is more secure against related-key differential attacks. The most lightweight version of the cipher costs 1008GE, and to our knowledge, it is the most suitable cipher for EPC encryption. Reduced round versions are vulnerable to practical algebraic cryptanalysis [141].

Ciphers like AES and the original PRESENT support smaller or larger key sizes than 96 bit. Assuming that 96-bit keys are adequate, smaller keys do not offer the required level of security, while larger keys produce larger-than-necessary implementations due to redundant components. Other ciphers that support 96-bit keys are SEA (3758GE), mCrypton (2420GE), Klein (1528GE) and LED (1116GE). The older ciphers, i.e., SEA and mCrypton, produce larger implementations and are not suitable for EPC encryption, although mCrypton is the most efficient.

4.2.3 Feistel ciphers

SEA (Scalable Encryption Algorithm) [129] is designed for scalable software implementations on constrained devices, being parameterized in text, key and processor size. It uses low-cost encryption routines that are suitable for processors with a limited instruction set. The design goals were to meet low memory requirements, small code size and a limited instruction set. Other features are the efficient combination of encryption–decryption, the ability to derive keys on the fly and the straightforward implementation in assembly code. Its most compact software implementation on 8-bit microcontrollers [37] requires 426 bytes of code and 41,604 cycles for encryption. In [94], the authors propose a hardware implementation where they demonstrate a fully generic VHDL design to achieve the algorithm’s scalability. The most lightweight version requires 3758GE.

CLEFIA [7, 125] is another lightweight block cipher, known for its highly efficient hardware and software implementations. It was designed by SONY and is standardized in ISO/IEC 29192. CLEFIA uses 128-bit blocks with 128-, 192- and 256-bit keys through 18, 22 and 26 rounds, respectively. The most compact implementation occupies 2488GE (encryption-only) for a 128-bit key. It follows a serialized architecture without requiring any additional registers. Decryption can be implemented with an 116GE overhead. The most lightweight encryption/decryption version occupies 2604GE, which is 23% smaller than the corresponding AES-128 version. The authors apply clock gating techniques to reduce the number of multiplexers and increase the cycle counts. Furthermore, they adopt some older ideas utilized in compact AES implementations. For a compact matrix multiplier, operations are computed column by column. Scan flip-flops replace D flip-flops and MUX to further reduce area. Improbable differential attacks on reduced round versions achieve the best cryptanalysis results and are slightly better than exhaustive search [136].

KASUMI [42] is implemented for cryptography in GSM, UMTS and GPRS. It uses 128-bit keys with 64-bit blocks and processes the data in 8 rounds. A differential-based related-key attack was presented in [17].

Two newer DES and DESX variants, called DESL (DES Lightweight) and DESXL [87], are also presented for the same key and block sizes and rounds. DESL reduces the gate complexity by replacing the 8 original S-boxes with a single one, thus removing 7 S-boxes and a multiplexer. The single S-box is repeated 8 times and is designed to resist against differential, linear and the Davis–Murphy attacks. It achieves a size reduction of 20%. Also, it utilizes serial hardware techniques to reduce gate complexity. DESXL is the combination of DESL and DESX and features an 18% size reduction compared to DESL. Hardware implementations of DESL and DESXL are presented in [87], and their cost is 2629GE and 2168GE, respectively. Software implementations are presented in [37, 115]. For a compact implementation of DESXL, they propose the use of a function which can compute all permutations and expansions depending on the calling parameters, while producing 6-bit outputs as direct input to S-boxes. The permutation and expansion tables are stored and processed from flash memory.

MIBS [66] supports 64- and 80-bit keys with 64-bit blocks through 32 rounds. It has a Feistel structure with an SPN round function, utilizing the S-box of mCrypton [91]. The permutation is operated in nibbles and the F-function operates on half a block. The key scheduling is based on PRESENT. The authors take the side-channel and related-key attacks reported for PRESENT into consideration. They secure MIBS by using a round-dependent counter to mix the contents of the key register. However, many types of attacks [14] have come close to a complete compromise, namely linear attacks up to 18 rounds, first ciphertext-only attacks on 13 rounds, differential analysis up to 14 rounds and impossible differential attacks up to 12 rounds. Although the full cipher is not broken, these attacks reduce its security level by more than 50%.

TWIS [103] is based on the CLEFIA cipher. A differential distinguisher with probability 1 for its full-round version was reported in [133].

Attempting to create a lightweight version of the Soviet GOST cipher, authors in [111] managed to produce a 651GE implementation. It uses 256-bit key and 64-bit block size and is a Feistel network with 32 rounds. The main contribution of this proposal is the adaptation of the PRESENT S-box that decreases the GE metric. The authors chose not to use simple wiring for permutation to reduce area, since that would result in weaker differential and linear properties. Recently, vulnerabilities were published and the cipher is theoretically broken by an improved three-subset meet-in-the-middle attack [64]. In [30], the authors further analyze differential attacks on GOST and present a single-key attack which is the fastest attack to our knowledge.

LBlock [148] uses 80-bit keys and 64-bit blocks through 32 rounds. It produces ultra-lightweight implementations in both hardware and software. In hardware, it needs 1320GE for 200 Kbps throughput. In software, assuming an 8-bit microcontroller, it takes 3955 clock cycles to encrypt a single block. For diffusion, the authors chose to use only half of the data in each round and a simple rotation for the other half. This is applied efficiently on hardware and software. Permutation operations are implemented with no cost of GE in hardware. A 4-bit word-wise permutation is a bit-wise permutation, as it can be implemented efficiently in software. Therefore, diffusions and permutations are efficient in both hardware and software. Key scheduling was designed in a stream cipher way.

Biclique cryptanalysis has been performed against the full-round LBlock [143]. The complexity of the attack is slightly lower than exhaustive key search. The modification of the key scheduling operation is proposed as a countermeasure to prevent the attack [143]. Improved differential fault attacks were presented in [70] that recover the key in a few seconds to one hour on a general PC. A round addition differential fault analysis reconstructs the key by utilizing one correct and two faulty ciphertexts [152].

TWINE [135] is a 64-bit block cipher with 80- and 128-bit keys through 36 rounds. It produces a compact hardware implementation of 1866GE. It is a GFN and implements a unified encryption and decryption functionality. The authors aim to balance performance on both hardware and software. For this reason, similar choices to LBlock have been made. Still, TWINE is an independent work and presents several concrete design advantages. One of their differences is that LBlock uses ten S-boxes, while TWINE uses a single S-box. TWINE uses nibble permutation instead of bit permutation, while LBlock utilizes bit permutation for key scheduling purposes. The authors of TWINE published a saturation attack against a 22-round version of LBlock [135], lowering the security level originally claimed. While TWINE is not the most compact cipher, its performance is well balanced on both hardware and software. However, the simplified key scheduling process can be exploited by meet-in-the-middle attacks [62].

Piccolo [124] is one of the most lightweight block ciphers. It is a variant of a generalized Feistel network (GFN) and supports 64-bit blocks with 80- and 128-bit keys through 25 and 31 rounds, respectively. It features very compact implementations which achieve low energy consumption. The 80-bit key version is the most lightweight, requiring 432GE. Piccolo needs an additional 60GE for the decryption functionality. As a result, Piccolo remains more compact even when compared to encryption-only ciphers.

The key scheduling process is implemented by multiplexers without flip-flops (like GOST and KTANTAN) that require a large area. These values (key components) are stored in the key inputs segment, which is not hard-wired, and no registers are used for storing keys. As with other newer ciphers, Piccolo uses scan flip-flops for the data state. Also, the cipher uses AND-NOR and OR-NAND gates, which require 2GE, to perform the same functionality as XOR and XNOR, respectively, which require 2.25GE. Moreover, it adopts a half-word-based round permutation.

In software [27], it requires 2434 bytes of code and consumes 79 bytes of RAM. It has poor performance and achieves 7.8 Kbps of throughput.

Biclique cryptanalysis has been achieved on full-round versions of Piccolo-80, [69, 128]. For Piccolo-128, they perform slightly lower than exhaustive key search. Impossible differential cryptanalysis is also performed on reduced round versions [11].

4.2.4 ARX ciphers

HIGHT [60] uses a compact round function, without the use of S-boxes, and all operations are simple computations. It uses 128-bit keys with 64-bit blocks through 32 rounds. The most compact hardware implementation requires 2608GE for 188Kbps throughput [92]. An impossible differential attack on 26-round HIGHT was presented in [104], a related-key attack on full-round HIGHT was presented in ICISC2010 [82], biclique cryptanalysis has been achieved on full-round version [128], and zero-correlation attacks on the 26- and 27-round cipher were presented in [144].

4.2.5 NLFSR ciphers

The KATAN and KTANTAN [25] cipher family supports 80-bit keys and 32-, 48- and 64-bit blocks through 254 rounds. The ciphers follow the design of KeeLoq, but they apply an LFSR (instead of NLFSR) and process the data for less rounds (254 rounds). They use a very simple key scheduling mechanism. KATAN achieves a small footprint of 802GE. KTANTAN authors propose to hard-wire the key on the device to further reduce the GE, due to the absence of key generation operations. KTANTAN is only suitable for devices where the key is initialized once and does not change. The most compact version of KTANTAN achieves 462GE. The authors propose that the version of KTANTAN-48 (588GE) is more suitable for RFID devices. The basic cipher resembles the structure of the stream cipher Trivium [98].

Their software implementations are not efficient since they use bit manipulations extensively. This fact is noted in [37], where a compact software implementation of KATAN is presented. The authors mainly try to decrease the code size. Although the proposal achieves a quite low code size of 338 bytes, it produces low throughput and consumes too much energy. The encryption takes 72,963 cycles and the decryption 88,525 cycles. Multidimensional meet-in-the-middle attacks on reduced round KATAN that are faster than exhaustive search were presented in [159]. Several related-key attacks which recover the full 80-bit key of KTANTAN with a probability of one are presented in [2].

4.2.6 Hybrid ciphers

Hummingbird (HB) [38] is another promising ultra-lightweight cipher, which introduces a hybrid block and stream cipher structure, with 256-bit key and 16-bit block size through 20 rounds. It features better performance than PRESENT on 4-bit microcontrollers. The first version was found vulnerable to a number of attacks [119].

Its successor, hummingbird-2 (HB-2) [39], optionally produces a message authentication code (MAC) for each message processed and is developed with both lightweight software and hardware implementation constraints in mind. It can actually be considered a combination of a cipher and a protocol; its design fulfills the requirements of ISO 18000-6C protocol. HB-2 uses 128-bit secret keys and 64-bit initialization vectors (IVs), and as its predecessor, it has been targeted for low-end microcontrollers and hardware implementations in resource-constrained devices such as RFID tags and wireless sensors. HB-2 is broken under a related-key attack [120].

4.3 Period 3: second LWC generation

This is the current period of LWC’s lifetime. Pervasive computing is here with general purpose constrained devices being utilized in our everyday activities. The competition for area optimization continues but in a less binding manner. The first generation of ciphers achieved a high degree of reduction, where almost 80–90% of the implementation is occupied by memory elements (e.g., the cases of PRESENT and KATAN). New challenges are now considered to meet the real application requirements. Over and above the speed optimization, low latency is now the objective. Decryption functionality becomes essential, and involutive designs gain ground. Cryptanalysis reveals that many lightweight ciphers are vulnerable to side-channel attacks. Masking techniques and ciphers that are easy to mask are proposed. Strategies for designing secure and efficient ciphers are applied (e.g., wide-trail), along with further analysis on the linear and nonlinear layers of the inner structures.

4.3.1 SPN ciphers

The authors in [99] try to apply countermeasures against first-order side-channel attacks on AES. They implement the features proposed in [102] and increase the level of resistance. Their implementation occupies 11031GE (\(0.33\,\upmu \)m). A newer implementation [31] requires 6340GE, but it is deployed in larger CMOS technology (\(0.45\,\upmu \)m). PICARO [107] is a new cipher which is specifically designed to counter such attacks. The authors implement a cipher that fits the masking constraints well, instead of integrating masking schemes to existing ciphers, like the AES proposal. They consider 4 different masking levels. PICARO’s masked hardware implementation is faster than the corresponding AES. Zorro [46] is a newer proposal that is based on AES and provides efficient masking. It is suitable for embedded systems, and it is implemented in 8-bit microcontrollers. Zorro is more efficient than PICARO as it requires less cycles to operate. However, practical invariant subspace attacks are presented in [86].

PRINCE [22] accomplishes low latency in hardware. It is a notable proposal that applies the wide-trail strategy and uses 128-bit keys with 64-bit blocks through 12 rounds. Its lightweight implementation [13] requires 2953GE for 533.3 Kbps of throughput and has low energy consumption. Efficient software implementations are also presented [4]. Cryptanalysis in [68] and [127] reduced the security level, as attacks were performed on the reduced 6-round and the full 12-round cipher. A key-recovery attack on the 7-round version based on truncated differential cryptanalysis is presented in [157].

RECTANGLE [155] is a recent lightweight SPN proposal. It uses 64-bit blocks with 80- and 128-bit keys through 24, 28 and 32 rounds, respectively. The substitution is performed by 16 \(4\times 4\) S-boxes in parallel setting, and the permutation is executed in 3 rotations. Its authors propose a new type of S-box along with an asymmetric design of the permutation layer. These new designed criteria are motivated by the cryptanalysis of PRESENT. RECTANGLE adapts bit-slice techniques to LWC and allows lightweight hardware and fast software implementations. In hardware, its most lightweight version with 80-bit keys requires 1467GE for 246 Kbps throughput and consumes the lowest energy/bit. In software, it requires 5.38 cycles/byte for encryption of 1000 byte messages using Intel 128-bit SSE instructions (such implementations are outside the LWC scope).

I-PRESENT \(^{\mathrm{TM}}\) [154] is an involutive version of PRESENT, using the same key and block sizes through 30 rounds. The S-box layer uses two additional \(4\times 4\) S-boxes 16 times. The involutive part is inspired by PRINCE. A ciphertext block is produced after a 15-round function, followed by a 15-round involutive function (30 rounds in total, the original cipher takes 31 rounds). The S-box from NOEKEON is used as the S-box of the involutive function. The key schedule procedure generates 30 64-bit round subkeys. Decryption is identical to encryption, except the fact that the round subkeys are inputted in the reverse order. The most compact hardware implementation requires about 2769GE, providing both encryption and decryption. The relevant encryption-only implementation of PRESENT requires 1570GE.

PRIDE [4] is a software cipher that uses 128-bit keys with 64-bit blocks through 31 rounds. The wide-trail design strategy is adopted, and the linear properties of the S-box are further studied. A bit-sliced implementation is presented, with an extremely efficient linear layer. The implementation outperforms many other proposals in 8-bit microcontrollers and exhibits low latency and energy consumption.

4.3.2 Feistel ciphers

SIMON [15] was designed by the NSA. A performance evaluation was presented in [15]. It performs well in software and hardware. It supports several sizes for key (64, 72, 96, 128, 144, 192, 256) and block (32, 48, 64, 96, 128) and round numbers (32, 36, 42, 44, 52, 54, 68, 69, 72). Independent cryptanalysis efforts on SIMON [6] present a series of observations on the cipher’s construction and attacks on reduced round versions. Differential fault attacks on SIMON are presented in [139]. Cube and dynamic cube attacks on SIMON, with 64-bit key and 32-bit block, recover the full key in a practical time complexity [112].

ITUbee [74] is a newly proposed cipher, designed for lightweight software and suitable for 8-bit software-based platforms. It is based on a Feistel structure and has no key schedule, featuring 80-bit keys with 80-bit blocks. Round-dependent constants are used, instead of strong key scheduling. In its most compact form [74], it requires 586 bytes of code and 2937 cycles for encryption. Self-similarity cryptanalysis is performed in [126] that exploits the round constants and builds relations between them. The reduced round cipher is distinguishable from an ideal random permutation. A deterministic related-key differential distinguisher is presented for the 8-round version in the single-key model, reducing the security by one bit.

FeW [83] is a software-oriented design for LWC. It uses 80- and 128-bit keys with 64-bit blocks through 32 rounds. The S-box of hummingbird-2 is utilized in encryption, decryption and key schedule. The key expansion process is similar to PRESENT. FeW combines a Feistel and a generalized Feistel structure to enhance security against basic cryptographic attacks. It is reported as safe against present day cryptanalytic attacks [83].

Robin and Fantomas [49] are LS-designs (i.e., a combination of look-up table-based L-boxes and bit-slice S-boxes). Both ciphers use 128-bit keys and blocks and are based on the Feistel-type block cipher MISTY [97] (ancestor of KASUMI). They integrate LS-design techniques to provide efficient masking and protection against side-channel attacks. The ciphers use the same constants and target single-key security, excluding related or chosen key attacks. They target efficient and secure software implementations on 8-bit microcontrollers, but are considered to perform well in hardware as well.

Robin is an involutive instance of LS-design, with 16 rounds that utilize the MISTY Class13 S-box [140]. The S-box and L-box are involutive, allowing the same circuity to be reused for the inverse operation (i.e., encryption and decryption). It aims to have similar security margins and performance to NOEKEON. These two bit-slice ciphers optimized for Boolean masking are more efficient than AES, Zorro and PICARO. However, practical invariant subspace attacks are presented in [86], similarly with Zorro. A careful tweak of the cipher is proposed to prevent the attacks in case of Robin, and the LS-design security is not questioned.

Fantomas is a non-involutive LS-design with 12 rounds that utilizes the MISTY 3/5-bit S-boxes [97]. It illustrates the impact of choosing involutive components with respect to inherent efficiency limits imposed by the LS-designs. Analysis indicates that involutive LS-designs require about 4/3 rounds of non-involutive LS-designs to achieve a similar level of security. Fantomas is safe against the aforementioned attacks on Robin and is also faster.

HISEC [5] is a GFN and exhibits similar characteristics as PRESENT with a different bit permutation procedure. It uses 80-bit keys with 64-bit blocks and processes the data for 15 rounds. HISEC provides protection against the differential, integral and boomerang attacks that have been presented for other ciphers, like PRESENT, LBlock, Klein and TWINE. The hardware requirements are estimated to 1695GE. No performance details are provided.

4.3.3 ARX ciphers

SPECK [15] is designed by the NSA along with SIMON. As is the case with SIMON, SPECK also performs well in software and hardware. It uses the same key and block sizes through 22, 23, 26, 27, 28, 29, 32, 33 and 34 rounds. SIMON is better in hardware, and SPECK is better in software. Independent cryptanalysis efforts on SPECK [1] present a series of observations on the cipher’s construction and attacks on reduced versions of the cipher. Differential fault attacks are presented in [139].

BEST-1 (Better Encryption Security Technique-1) [67] is designed for ultra-lightweight implementations with low cost and power on WSN and RFID. It uses 128-bit keys with 64-bit block through 12 rounds and fits in 8-bit processors. It is an ARX proposal, and the main operations are \(\hbox {mod2}^{8}\) addition and subtraction, bit-wise shift and XOR. Decryption is performed at little cost as a sequence of steps after inverting the encryption process. In hardware, BEST-1 requires around 2200GE and achieves 265.7 Mbps at 80MHz frequency.

LEA (Lightweight block Encryption Algorithm) [61] is a software-oriented ARX designed by the Electronics and Telecommunication Research Institute of Korea. It targets fast encryption on common processors, and all operations are 32-bit wide. It uses 128-bit blocks with 128-, 192- and 256-bit keys through 24, 28 and 32 rounds, respectively. LEA has small code size and consumes low power. On ARM platforms, it requires 590 bytes of ROM and 32 bytes of RAM for 326.94 cycles per byte speed. The cipher exhibits high hardware demand with the most compact implementation requiring 3826GE for 76.19 Mbps throughput [88]. The security analysis is theoretical. Power analysis on hardware is presented in [77]. The 128-bit key is retrieved by attacking the first round. Implementing LEA with countermeasures becomes essential.

4.3.4 NLFSR ciphers

Halka [34] is a recently proposed cipher. It is designed for lightweight hardware but also remains software friendly. It uses 80-bit keys with 64-bit blocks through 24 rounds. The main feature is the implementation of a novel multiplicative inverse for 8-bit S-boxes using LFSR, which requires 138GE. The relevant feature of other existing standards demand more gates, like the Canright’s AES S-box [26] that occupies 253GE. Thus, the total gate count of the Halka should be low. The use of 8- instead of 4-bit S-boxes is considered more secure in this design. Then, eight 8 \(\times \) 256 bytes S-boxes are utilized to achieve super-fast software performance. The key schedule is similar to PRESENT, but it is materialized with 8-bit S-box. In hardware, Halka is estimated to require 7% less GE than PRESENT but is considered more secure [34]. In software, it could be three times faster than PRESENT [34].

4.3.5 Hybrid ciphers

PRESENT-GRP [12] is a hybrid cipher that combines the SPN of PRESENT with a GRP (group operation) structure for bit permutation. It uses 128-bit keys with 64-bit blocks through 31 rounds. The security properties of GRP are well studied and offer higher confusion properties than bit permutation. The P-box of the original PRESENT is replaced by GRP. The hybrid cipher with the PRESENT S-box and GRP enhances security and results in lower memory and area consumption. The hardware design requires 2125GE, which is a little larger than the corresponding PRESENT (1884GE). In software, the implementation size is slightly larger than PRESENT for the same amount of RAM.

5 Results and discussion

5.1 Security

Table 3 indicates the features of each cipher and the best publicly known cryptanalysis results. Security issues should be carefully considered before using a cipher. For example, KeeLoq, GOST, TEA, XTEA, mCrypton, HIGHT, KASUMI, PUFFIN, hummingbird, MIBS, KTANTAN, TWIS, PRINTcipher, hummingbird-2, LBlock, Zorro and LEA all suffer from security vulnerabilities and should be avoided. DES 56-bit key is not adequate for nowadays applications.

Newer second-generation proposals, namely PICARO, SPECK, ITUbee, RECTANGLE, FeW, Halka, BEST-1, Robin, Fantomas, HISEC, \(\hbox {I-PRESENT}^{\mathrm{TM}}\), PRESENT-GRP and PRIDE, have to be further analyzed for vulnerabilities and their claimed level of security. AES, Camellia, PRESENT, CLEFIA, and DES variants are the most studied solutions and as a consequence the most acceptable ciphers. PRESENT and CLEFIA are the standardized block ciphers for lightweight cryptography.

5.2 Comparison

We examine 127 hardware and 233 software implementations. We evaluate all implementations based on the metrics and the fair comparison approach detailed in Sect. 2. In hardware, 0.09, 0.13, 0.18 and \(0.35\,\upmu \)m technologies are used. The software implementations are deployed in 8-, 16- and 32-bit microcontrollers.

5.2.1 Hardware implementations

Regarding hardware implementations for constrained devices, Figs. 1, 2, 3 and 4 illustrate the best hardware implementations according to individual metrics. Implementations are summarized per technology in terms of GE, energy, power and hardware efficiency, respectively. Implementations on \(0.09\,\upmu \)m technology are colored with purple, on \(0.13\,\upmu \)m technology with deep blue, on \(0.18\,\upmu \)m technology with blue, and on \(0.35\,\upmu \)m technology with green. The standardized ciphers that are utilized as comparison units (AES, PRESENT and CLEFIA) have a yellow color, and implementations with excess demands in each metric are outlined with red color. All hardware implementations are detailed in Tables 4, 5, 6 and 7.

Fig. 1
figure 1

Best hardware implementations in terms of GE

All proposals for the ciphers IDEA, Camellia, ICEBERG, KASUMI and LEA exceed the boundary of 3000GE and are, therefore, not considered efficient. NOEKEON, PUFFIN-2, LED and EPCBC consume excessive energy per bit and produce low hardware efficiency. KTANTAN, HIGHT, HB-2, TEA, Klein, SEA, KATAN, AES, DES, DESX, EPCBC, LED, PUFFIN-2 and NOEKEON consume high energy per bit. DESX, DES, AES, KATAN, KTANTAN, EPCBC, PUFFIN-2 and NOEKEON produce high latency (144 3720 cycles per block). NOEKEON, PUFFIN-2 and EPCBC have also low throughput. DES, HB-2, \(\hbox {I-PRESENT}^{\mathrm{TM}}\), DESX and HIGHT have higher power requirements than other ciphers.

Fig. 2
figure 2

Best hardware implementations in terms of energy consumption (axis y is limited at \(200\,\upmu \hbox {J}\))

Fig. 3
figure 3

Best hardware implementations in terms of power requirements

KTANTAN, PRINTcipher, Piccolo, SIMON, TWINE, KATAN, SPECK and GOST produce compact implementations with the lowest power requirements (less than \(1\,\upmu \)W). Klein, LED, RECTANGLE, EPCBC, PRESENT, PUFFIN, PUFFIN-2, DESL, LBlock, MIBS, AES, CLEFIA, XTEA, HISEC and SEA consume also low power (around \(2.5\,\upmu \)W). Piccolo, DESL, Klein, DESXL, mCrypton, PRINCE, PRESENT, CLEFIA, TWINE, RECTANGLE, SPECK, GOST, PRINTcipher, PUFFIN, SIMON, LBlock and MIBS consume low energy per bit (less than 10 \(\upmu \mathrm{J}\) per bit).

Fig. 4
figure 4

Best hardware implementations in terms of hardware efficiency (Hardware Efficiency = Throughput [Kbps]/Complexity [KGE])

Figure 5 illustrates the best overall hardware implementations. Based on the criteria adopted in this work, Piccolo evaluates best compared to all the other proposals for many key sizes. PRESENT and TWINE perform similarly and achieve a good overall status. TWINE, as a Feistel network, can offer the decryption functionality at low cost. RECTANGLE, PRINCE, SPECK and mCrypton exhibit low latency and efficient implementations. SEA also offers low latency but it is quite slower. For lower security, DESL performs well.

Fig. 5
figure 5

Best overall hardware implementations per technology

Fig. 6
figure 6

Best software implementations in terms of energy consumption

Low-cost RFID devices require about 2000GE and 80-bit keys. From the ciphers that support 80-bit keys, LED, PUFFIN-2, KATAN, Klein, EPCBC and LBlock produce low hardware efficiency or consume a lot of energy per bit. Klein typically targets wireless sensors which are more powerful devices, and PUFFIN-2 requires more than 2000GE. Piccolo achieves the best overall status compared to all the other ciphers. It occupies a small chip area even when both encryption and decryption are implemented. Piccolo has higher throughput and hardware efficiency, while it consumes the least energy. SIMON, SPECK and RECTANGLE also achieve a good overall status and consume low energy but are newly proposed ciphers. GOST and PRINTcipher perform well in this category with higher key sizes. KTANTAN, MIBS, TWINE and PRESENT perform reasonably on such devices.

Suitable implementations for ultra-constrained devices require less than 1000GE and 64-bit keys. Klein, LED and MIBS provide exactly 64-bit keys. Among these three ciphers, only LED can be implemented within the GE limit but with high energy cost. The most compact ciphers that require around 1000GE and provide key sizes between 80- and 256-bits are the GOST, PRINTcipher, KTANTAN, Piccolo, SIMON, KATAN, SPECK, PRESENT and LED. GOST, PRINTcipher and KTANTAN perform the best.

Fig. 7
figure 7

Best software implementations in terms of throughput

Fig. 8
figure 8

Best software implementations in terms of software efficiency (Software Efficiency = Throughput [Kbps]/Code size [KB])

Some serial implementations have achieved very small chip areas. The ciphers PRINTcipher, GOST, Piccolo, KTANTAN and SIMON can be implemented with less than 800 GEs.

Among the novelties and well-known practices proposed to enhance hardware implementations is the use of a scan flip-flop instead of a D flip-flop and a 2-to-1 MUX, AND-NOR gates instead of XOR gates and OR-NAND gates instead of XNOR gates. Furthermore, the storage of memory elements in registers instead of flip-flops consumes less energy and occupies less chip area. Bit-slice techniques are also utilized to reduce the area and design complexity requirements. Key scheduling is simplified to enhance latency and throughput. Involutive designs are suggested for low-cost encryption/decryption functionality.

5.2.2 Software implementations

For software implementations, Figs. 6, 7 and 8 illustrate the best software implementations according to individual metrics. The ciphers AES, Camellia, IDEA, NOEKEON, KASUMI, GOST, HIGHT, PRESENT, PRESENT-GRP, CLEFIA, HB, HB-2, Piccolo, TEA, XTEA, SEA, mCrypton, MIBS, TWINE, LED, Klein, KATAN, KTANTAN, ITUbee, SIMON, SPECK, PRINTcipher, LBlock, PRINCE, PRIDE, Zorro, Robin, Fantomas, LEA and DES and its variants are examined.

Fig. 9
figure 9

Best overall software implementations per platform

KTANTAN, LED, KATAN and Camellia consume high energy. KTANTAN, KATAN, PRINTcipher, LED and Piccolo are too slow. The standardized PRESENT and CLEFIA have also low throughput as the lightweight proposals for mCrypton, PRINCE, KASUMI, MIBS, SEA, DESX, DES, DESXL and DESL. KASUMI, mCrypton, TWINE, NOEKEON, Piccolo, Camellia, IDEA, MIBS, PRINTcipher, CLEFIA, LED, KATAN and KTANTAN exhibit high latency in software (more than 10,000 cycles). KTANTAN, KATAN, PRINTcipher, GOST, LED, MIBS, Camellia, DESX and DES achieve the worst overall performance based on the software efficiency metric in all three microcontroller technologies.

A speed-optimized implementation of HB exceeds the boundary of 8KB RAM for both low-cost and lightweight devices and is not considered efficient for LWC. For ultra-constrained devices only, KTANTAN, PRESENT-GRP and Zorro exceed the boundary of 256 byte RAM and can not be applied. For ultra-constrained and low-cost devices, some of the implementations for GOST, PRINTcipher, KTANTAN, PRINCE, DES and DESX exceed the boundary of 4KB ROM with HB-2, DESL, DESXL, HB, LBlock and KATAN coming close.

The most compact implementations (in terms of code size) are reported for SPECK, SIMON, PRIDE, SEA, KATAN, NOEKEON, AES, ITUbee, HIGHT and XTEA (less than 0.5KB). The implementations with the higher throughput are those of SPECK, SIMON, AES, HB-2, TWINE, PRIDE, Fantomas, ITUbee, Robin, LEA, IDEA and HIGHT (more than 85 Kbps). The lowest latency is achieved by HB-2, SPECK, SIMON, HB, TWINE, PRIDE, AES, ITUbee, IDEA and HIGHT (less than 3000 cycles). HB-2, SPECK, HB, SIMON, AES, Fantomas, TWINE, PRIDE and Robin have low energy consumption (less than 10 \(\upmu \mathrm{J}\) per bit).

Figure 9 illustrates the best overall software implementations for the three microcontroller platforms and different keys. Based on the criteria adopted in this work, SPECK and PRIDE evaluate best compared to all the other proposals for many key sizes. Fantomas and Robin perform similarly well, providing additional security against side-channel attacks. AES, SEA and ITUbee achieve a good overall status. PRESENT is slower with high latency and energy consumption. SPECK and SEA also offer efficient encryption/decryption implementations with low cost. DESXL provides higher key size, while for lower security DESL performs well.

In software implementations, code size reduction can be achieved by using low-level languages, such as the device’s assembly language, and by developing segments of code that will facilitate code reuse. To reduce the memory requirements, it is suggested to reuse storage space for keeping different elements, whenever it is possible. To achieve better performance, it is preferred to use the device’s word size as a processing unit (usually byte words). Word-wise permutations (4- or 8-bit) are preferred to bit-wise permutations. Moreover, word-wise matrix multiplication operations are utilized.

The ciphers SIMON, SPECK, TWINE, Klein, LED and LBlock were designed both for lightweight hardware and compact software implementations. SIMON and SPECK shine in both domains and TWINE in hardware, while the rest of the ciphers perform reasonably well in both domains and do not excel in any of them.

5.3 Design directions

The maintenance of the internal state is the most challenging aspect of managing space requirements. The state has to be saved at each round which typically requires at least half of the total chip area and considerable power. As memory is limited in low-cost devices and read/write operations are power-demanding, the intermediate state is often kept in registers. The registers are materialized by flip-flops that have high area and power needs. Thus, lightweight ciphers try to minimize the storage requirements.

Space reduction can be achieved by lowering the block and key size. A small block or key size preserves small memory complexity. On the other hand, various security issues, such as birthday attacks, are introduced when we use block sizes that are less than 32 bits. Popular parameters for lightweight ciphers are 64-bit blocks and 80-bit keys size.

Absence of decryption functionality and hard-wired keys are techniques that can also improve the performance of applications that utilize cryptography. Some proposals tend to apply simple wiring, while others avoid this approach to harden differential analysis. For applications where the implementation cost is more important than the security level, Feistel networks and simple key schedule mechanisms are a good choice. For applications where moderate security is needed, SPNs with more robust key schedule mechanisms are preferred.

5.4 S-box

S-boxes are a critical component of block ciphers, typically used to introduce nonlinearity. In software, they are implemented using look-up tables (LUT). However, these LUTs, when implemented in hardware, may have a large footprint or introduce other technical complications. Combinatorial solutions [36] are usually more efficient, where the input bits affect the complexity of the equations and the output bits affect the number of the Boolean equations required. If a combinatorial solution does not deal with the inner structure of the S-box, the occupied area will grow rapidly with the number of input and output bits. High nonlinearity demands more area, but also offers higher resistance to differential and linear analysis.

The size of the S-box is another important parameter. Large S-boxes can achieve high levels of security but consume many resources for both their hardware and software implementations. On hardware, the gate count increases exponentially with the size. However, very small S-boxes cannot provide an acceptable level of security. A good choice that balances efficiency and security, and is adopted by the majority of the lightweight ciphers, is the \(4\times 4\) S-box (as an alternative to typical \(8\times 8\) S-boxes). If higher levels of security are required, it is preferable to increase the number of rounds.

6 Conclusions

Lightweight cryptography is one of the important building blocks of secure embedded systems. This paper presented the latest developments in the field and the state-of-the-art implementations in lightweight symmetric-key cryptography. To this end, hardware and software implementations of block ciphers were examined. Moreover, the security, cost and performance properties of all the different proposals were considered, and a comparative analysis was presented, with the information aggregated in the corresponding tables.

Research is ongoing, as new developments in the field constantly emerge, with novel algorithms and cryptanalysis techniques being proposed in the literature. Crypto-enabled resource-constrained devices are high in demand, a trend that is expected to continue to grow as embedded systems permeate our lives, leading to the realization of ubiquitous computing. Consequently, the authors hope that the analysis presented here will contribute to the design of robust systems and architectures, enabling secure the transition to this new reality and the Internet of Things.