1 Introduction

We live in an era where interconnected computing devices keep getting more numerous, while cyber-attacks keep getting more sophisticated and frequent. The need for new standards that can protect communications against such threats has therefore increased. However, the available cryptography standards do not meet the requirements of the challenges we face today, where constrained devices are massively deployed in the Internet of Things. These devices have hardware limitations, such as memory size, and their battery life must be preserved. For instance, resource-constrained health sensors (e.g. heart pacemakers, brain stimulators) are directly connected to a network to gather useful data. Security here plays a crucial role, since unauthorized access to these critical devices can be life-threatening. Other examples are smart homes, green cities, supply chain management, etc. Research on lightweight cryptography over the last 10 years has produced more than 1400 papers, all aiming to reduce resource consumption and to improve software and hardware efficiency on constrained platforms while remaining resilient against different kinds of attacks. To help in the process of development, evaluation and standardization of suitable lightweight cryptographic algorithms, NIST has initiated the Lightweight Cryptography Project. Looking back at the NIST contests for the selection of new cryptographic standards [36, 37], algorithms with weak security designs were disqualified after the first evaluation phase. Benchmarking the proposed solutions plays a major role in assessing an algorithm's efficiency in both hardware and software. Since benchmark frameworks allow for consistent evaluation, they are important not only in the selection process of new cryptographic standards, but also for carrying out a fair comparison of ciphers’ performance in given usage scenarios.

In 2015, NIST organized a workshop on lightweight cryptography to discuss the security and resource requirements that a standard should satisfy to secure IoT applications. NIST subsequently received and published 56 algorithm proposals, which include more than 200 AEAD cipher implementation variants. The final goal of this initiative is to select the best proposals for use in such constrained devices. The proposed algorithms are based on authenticated encryption with associated data (AEAD). An authenticated encryption (AE) algorithm can be defined as a symmetric cryptographic algorithm that is capable of simultaneously preserving the confidentiality and the authenticity of data [8].

In this paper, 12 candidate algorithms from the NIST second round are benchmarked on different hardware platforms. Our first objective is to evaluate the ciphers with the FELICS-AE benchmark and to measure three metrics for each algorithm: (1) RAM usage, (2) execution time and (3) binary code size. These metrics can be computed for a PC, an AVR, an MSP430 and a 32-bit ARM processor. The second objective of this work is to evaluate every algorithm on the IoT-LAB platform, a real test-bed that provides access to hundreds of IoT boards for experimenting and deploying across different sites. IoT-LAB is used in this study due to the following advantages: (a) user-friendly interface; (b) multi-platform: it offers several families of experimentation boards; (c) multi-radio: the boards embed different radio chips; (d) multi-topology: different physical deployments are available; and finally (e) multi-OS: each board supports one or more embedded operating systems.

1.1 Motivation and contribution

The goal of this work is to provide a broad overview of the ciphers' performance footprints, as well as to build a benchmark for the performance evaluation of the NIST round 2 candidate AEAD algorithms by using the IoT-LAB platform. The benchmark code is available at this link: https://gitlab.inria.fr/anon_group/iotlabandnist/. Our benchmarking process is depicted in Fig. 1.

Fig. 1 The steps of our benchmarking process

The four steps summarizing the motivation and the main contribution of our work are the following:

1. The NIST AEAD algorithms: The first step is to understand the algorithms of the NIST lightweight cryptography competition, which are candidates for standardization as lightweight algorithms dedicated to authenticated encryption with associated data (AEAD). 33 public algorithms were selected for the round 2 competition, and 12 of them are studied in this work.

2. FELICS-AE: These algorithms are adapted to the FELICS-AE [34] platform by importing their NIST releases and adding their implementations to the platform, together with one of the test vectors provided in these releases. The platform then checks the correctness of the implementations on three different hardware platforms. A benchmark is then carried out to measure the algorithms' performance in terms of cycle counts and memory usage. The current distribution of FELICS-AE includes only a few algorithms, but more can easily be added. It is based on the FELICS [18] platform, which is dedicated to the evaluation of stream and block ciphers that do not support authenticated encryption. This step is described in Sect. 3.2. Note that FELICS-AE and IoT-LAB support the same hardware platforms, which guided our choice of FELICS-AE for this step.

3. Contiki operating system: After evaluating the algorithms with FELICS-AE, we ported their respective codes to the Contiki operating system, which is supported by IoT-LAB. Contiki is designed to run on hardware devices that are severely constrained in memory, power, processing power and communication bandwidth, such as embedded systems and old 8-bit hardware. We use Contiki-NG, a newer version of Contiki OS, which runs on a variety of platforms based on energy-efficient architectures such as the ARM Cortex-M3/M4 and the Texas Instruments MSP430. Other operating systems, such as RIOT [39], could also be considered.

4. IoT-LAB: Finally, we run these algorithms in an IoT application deployed on the IoT-LAB platform to evaluate their performance from a networking perspective.

1.2 Organization

Section 2 describes the related work in the field of software performance evaluation of cryptosystems. Section 3 presents the IoT-LAB platform and details the 12 algorithms selected among the 33 NIST lightweight candidates, as well as their evaluation with FELICS-AE. In Sect. 4, we detail the way we instrument IoT-LAB to produce the seven identified metrics for the 12 algorithms. In Sect. 5, we provide the performance results obtained with two IoT nodes (one client and one server) for the seven identified metrics on the 12 algorithms. Finally, Sect. 6 concludes this paper.

2 Related work

In this section, we present related work on existing benchmarking tools for cryptographic algorithms and compare them with our methodology.

Many benchmarks have been proposed to evaluate the performance of cryptographic algorithms on both hardware and software [22, 31,32,33, 35]. For example, the BLOC project [12] is one of the first attempts to evaluate lightweight cryptographic algorithms on embedded devices. It is publicly available and contains one of the largest collections of algorithm implementations. In [13], the authors analyse the performance of lightweight cryptographic algorithms on wireless sensor nodes. The code is written in C and targets the 16-bit MSP430F1611 device [40]. Three metrics are considered: execution time, RAM requirement and code size. However, the RAM is not computed correctly: on the 16-bit MSP430F1611 microcontroller, the unsigned int data type requires two bytes and not one byte as assumed by the authors. Thus the reported RAM requirement is half of the actual value. Furthermore, the library is not flexible and does not allow new algorithms to be added easily. Finally, some implementations of the studied ciphers do not pass the test vectors. Therefore, we avoided using this platform and studied other options.

Two years after the BLOC project, the University of Luxembourg released the FELICS platform [18]. FELICS stands for Fair Evaluation of Lightweight Cryptographic Systems. This benchmarking framework is motivated by the need for a unified evaluation of the performance of lightweight block ciphers and stream ciphers. FELICS has a dedicated web page where an open virtual machine can be downloaded and benchmarks are maintained. Designers can upload new ciphers and get consistent and detailed feedback on how their cipher compares with the state of the art. The tool can evaluate execution time, RAM footprint and binary code size. It supports four microcontroller families: an 8-bit microcontroller (Atmel AVR ATmega128), a 16-bit microcontroller (Texas Instruments MSP430F1611) and two 32-bit microcontrollers (Arduino Due and ARM Cortex-M3). Finally, FELICS-AEAD was presented at the first NIST Lightweight Cryptography workshop [21] as an extension of FELICS, developed by the University of Luxembourg, that supports authenticated encryption. Unfortunately, we could not find its source code. Similarly, FELICS-AE, presented at the third NIST workshop [34], is also an extension of FELICS with the additional functionality of authenticated encryption.

The third platform we studied is eBACS (ECRYPT Benchmarking of Cryptographic Systems), which is considered the first step towards a consistent evaluation of cryptographic primitives in software [9]. The web page of this platform describes how to add new implementations and how to collect the data for the existing ones. It allows the benchmarking of algorithms implemented in C, C++ and assembly. The only metric extracted is the cycle count (speed), and the results are saved in a database in text format. The advantage of this platform is the variety of supported hardware platforms and architectures, while its drawback is that it supports only one metric, the execution time.

The fourth project is the XBX project (eXternal Benchmarking eXtension) [41]. It allows the benchmarking of hash functions on different microcontrollers. Two metrics are extracted: binary code size and RAM consumption. The code size is obtained through static analysis of the generated binary file. The RAM requirement is the sum of the stack consumption and the static RAM requirement obtained from the application binary. The framework is written in C, Perl and Bash. XBX is the first project to unify the measurement of the performance of software implementations of cryptographic primitives built for different embedded devices using the same evaluation methodology. The results in [42] are obtained for eight different devices with 8-bit, 16-bit and 32-bit CPUs. However, its web page is no longer maintained.

3 Preliminaries

In this work, we relied on different platforms to build our benchmarking process. First, FELICS-AE is used to verify the test vectors of each implemented algorithm. Once the test vectors are verified and the RAM size, code size and execution times have been obtained, IoT-LAB is used to analyse the performance of the algorithms on a real IoT network. Below, we describe the IoT-LAB platform as well as the evaluation of the selected algorithms using the FELICS-AE tool.

3.1 IoT-LAB: platform description

3.1.1 Overview

The IoT-LAB platform [30] offers an easy way to deploy experiments involving IoT nodes. It is part of the FIT (Future Internet Testing) experimental facility, provided by five French institutions of higher education and research: UPMC, Institut Mines-Télécom, Inria, CNRS and the University of Strasbourg. FIT is also part of the OneLab facility [38], which aims to facilitate experimentation for academic and industrial users. This facility is composed of four platforms: PlanetLab Europe, FIT CorteXlab, NITLab and FIT IoT-LAB. All these infrastructures are made available through a single entry point, providing access to the different test-beds.

The IoT-LAB platform provides 1786 wireless sensor nodes, located on six different sites. These nodes can be selected by an authenticated user in order to be used in one or more experiments. For each of the chosen nodes, the user can then provide a firmware to be deployed on it and select a profile defining which metrics will be measured while the experiment is running. This is done through an online dashboard or through dedicated Python scripts (the IoT-LAB cli-tools [27]). The collected metrics can then be accessed by the user through SSH.

3.1.2 Nodes, components and topologies

As previously mentioned, the IoT-LAB platform provides access to 1786 nodes. These nodes are located on six different sites: Inria Grenoble (640 nodes), Inria Lille (293 nodes), Inria Saclay (264 nodes), ICube Strasbourg (400 nodes), Institut Mines-Télécom Paris (160 nodes) and CITI Lab Lyon (29 nodes). Each site proposes a different topology. The nodes are divided into four main categories according to their architecture: 256 WSN430 nodes (868 MHz), 883 M3 nodes, 524 A8 nodes and 123 other “custom” nodes. Each node is identified by its site (Grenoble, Saclay, Lille, Lyon, Paris or Strasbourg), its architecture and an integer ID. For instance, the node m3-21.lille.iot-lab.info has Lille as its site, m3 as its architecture and 21 as its integer ID. Finally, each node has a state: it can be Alive if the node is available, Busy if it is currently used by an experiment, Suspected if it is not available, Dead if it is not working, or Absent.

Each node in IoT-LAB has three main components: the Open Node (ON), the Gateway (GW) and the Control Node (CN). The Open Node is flashed with the firmware provided by the user. During the experiment, it can be stopped, re-flashed, rebooted, etc. The Gateway provides a connection between the Open Node and the global infrastructure. Finally, the Control Node interacts with the Open Node in order to monitor its sensors: it handles the consumption measurements and selects the power supply (battery or Power over Ethernet). The Gateway and the Control Node are defined as “Host Nodes”, so the user has no interaction with them.

Several network topologies are also available on the different sites. A topology is defined for each experiment and represents the way the nodes communicate with each other; it is determined by the direct communications between them. Three types of topologies are available for our experiments: the line topology, the grid topology and the star topology. Unfortunately, we only present results for the line topology in this paper, as too many packets were lost in the other topologies and the obtained measurements were too noisy with respect to the algorithms' consumption.

3.1.3 Profile and experiment

A profile is used to determine which metrics are collected while the experiment is running. In an experiment, a profile can be associated with a specific node. The profile creation form requires three main pieces of information: information related to the architecture, information related to the consumption and information related to the radio. Only the first one is mandatory: a profile can, for instance, be set not to measure the consumption. The architecture information is one of three options, related to the architecture of the associated node: “M3”, “A8” or “Other”. The consumption information is divided into three parts: whether or not to measure the current (in amperes), the voltage (in volts) or the power (in watts); the “period”; and the “average”. The “period”, or “conversion time” (CT), and the “average” (AV) define the periodic measure (PM) given by the formula PM = CT * AV * 2. The periodic measure is used to configure the INA226, a component used to monitor the current and power. Moreover, the average value is used for filtering the signal: a greater number of averaged samples leads to a noise reduction in the measurements. Finally, the information related to the radio can be defined with two options: in the first option, the Received Signal Strength Indication (RSSI) is measured, and in the other option, the traffic is sniffed.

An experiment is launched by the user for a specific running time, so one of its parameters is the duration. The other parameters correspond to the resources of the experiment: the nodes, their associated firmwares and their associated profiles. An experiment can be scheduled at a specific time or launched as soon as possible. Once defined, an experiment has one of the following states: Waiting when it has not begun, Launching while it is starting (i.e. the nodes are being set up for it), Running while it is running, Terminated when it is done, or Error if an error occurred (or if it was manually stopped).
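As an illustration of the periodic measure defined above, taking a conversion time of 1.1 ms and an averaging count of 16 samples (values chosen here purely for illustration), one obtains:

$$\mathrm{PM} = \mathrm{CT} \times \mathrm{AV} \times 2 = 1.1~\mathrm{ms} \times 16 \times 2 = 35.2~\mathrm{ms}$$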

Usage. Interacting with IoT-LAB experiments and resources can be done in three main ways: through the dashboard, through the command-line tools, or through the Python library.

The dashboard is accessible through a web browser. It allows authenticated users to monitor their experiments, their profiles and firmwares, and to get the testbed status.

The command-line tools, developed in Python, are available on the IoT-LAB GitHub repository [29] and covered by the CeCILL v2.1 free software licence. They allow a user to manage their experiments and profiles, or to interact with running experiments. More documentation is available in the IoT-LAB CLI tools documentation [26].

The IoT-LAB client is a Python library to access the IoT-LAB API. Its source code is available on its Github repository [28], and the API is described in the IoT-LAB documentation [25].

3.2 Selected cryptographic algorithms

Due to lack of time, we have not implemented all the round 2 candidate algorithms of the NIST lightweight cryptography competition. Indeed, for each algorithm, we must first check that the code compiles correctly for the dedicated hardware platform using FELICS-AE and, once done, flash the corresponding code onto nodes running the Contiki OS. We therefore arbitrarily decided to test the following list of 20 algorithms (out of 33), which includes all the finalists:

  • Ascon-128 and Ascon-128a [20]

  • Elephant [10]

  • ForkAE-128 [2]

  • GIFT-COFB [4]

  • GRAIN-128AEAD [24]

  • Isap [19]

  • HyENA [14]

  • Lilliput-AE [1]

  • LOTUS-AEAD and LOCUS-AEAD [15]

  • PHOTON-Beetle [5]

  • PYJAMASK [23]

  • Saturnin [11]

  • SKINNY-AEAD [7]

  • SPARKLE [6]

  • Subterranean 2.0 [16]

  • SUNDAE-GIFT [3]

  • TinyJAMBU [44]

  • Xoodyak [17]

In the context of the IoT-LAB evaluation, we compare 12 of those algorithms with the baseline application no_enc where no encryption and no authentication are used.

All the codes of these algorithms are available in the submission packages of the round 2 candidates. The sizes of the key, the nonce and the tag are given in Table 1.

Table 1 Key, nonce and tag sizes for algorithms benchmarked with FELICS-AE

We provide in Tables 2, 3 and 4 the evaluation results in terms of code size (bytes), RAM (bytes) and execution time (cycles) obtained with the FELICS-AE framework for the 20 algorithms on the AVR ATmega128, the MSP430F1611 and a classical PC. The codes are compiled with the -O3 option, and we measure the performance of encrypting 16 bytes of plaintext with 16 bytes of associated data.

Table 2 Benchmarking results using FELICS-AE on AVR ATmega128

In Table 2, we show the results in terms of RAM size, code size and execution time for the 20 algorithms when executed on an AVR ATmega128. Two groups emerge regarding the execution time: a first group composed of GIFT-COFB, Isap-A-128a, GRAIN-128AEAD, HyENA-128, LOCUS-AEAD-128, LOTUS-AEAD-128, Pyjamask-128, Elephant-160 and SUNDAE-GIFT-96-128 requires more than one million cycles, whereas a second group composed of Ascon-128, Ascon-128a, ForkAE-128, Lilliput-I-128, Lilliput-II-128, Romulus-M1-128, SKINNY-AEAD-M1-128, Saturnin-CTR-Cascade-256, Schwaemm256-128, PHOTON-Beetle-AEAD128 (with a dedicated code), Subterranean-SAE-128 and Xoodyak-128 requires less than 550 000 cycles. The best performing algorithm in this category is Schwaemm256-128 (with a dedicated code), followed closely by PHOTON-Beetle-AEAD128, also with a dedicated code. Concerning code size, only six algorithms (PHOTON-Beetle-AEAD128, Ascon-128, Ascon-128a, Schwaemm256-128, Lilliput-I-128 and Lilliput-II-128) have a code size of less than, or around, 5000 bytes. The others have higher code sizes, up to 36 000 bytes. Concerning the RAM usage, except for HyENA-128, LOCUS-AEAD-128 and LOTUS-AEAD-128, all the algorithms require less than 1000 bytes of RAM, some even going down to 200 or 300 bytes. Of course, these results highlight the better performance of dedicated codes.

Table 3 Benchmarking results using FELICS-AE on MSP430F1611

In Table 3, we show the results in terms of RAM size, code size and execution time for the 20 algorithms when executed on an MSP430F1611. The same group (GIFT-COFB, Grain-128AEAD, Isap-A-128a, HyENA-128, LOCUS-AEAD-128, LOTUS-AEAD-128, Pyjamask-128, Elephant-160 and SUNDAE-GIFT-96-128), and surprisingly PHOTON-Beetle-AEAD128 (with the reference code), perform poorly, requiring more than one million cycles, whereas the others require less than 500 000 cycles. The case of GRAIN is clearly particular and mainly depends on the code version: with the reference code the performance is poor, but with the optimized one GRAIN is among the best. Concerning the code size, a dozen algorithms, including ForkAE-128, HyENA-128, Pyjamask-128, LOCUS-AEAD-128, Romulus-N, GIFT-COFB, Isap-A-128a, Xoodyak-128, Romulus-M1-128 and LOTUS-AEAD-128, have a code size greater than 10 000 bytes, while all the others are below this bound, with a special mention to Lilliput-II and Schwaemm256-128, whose code sizes do not exceed 3700 bytes. For the RAM size, only HyENA-128, LOCUS-AEAD-128 and LOTUS-AEAD-128 have a RAM footprint greater than 1000 bytes. We note that Grain with the optimized code requires less than 100 bytes of RAM, closely followed by TinyJAMBU-128 and Lilliput-II-128.

Thus, regarding the two embedded hardware platforms, the results of the algorithms follow closely similar trends: an algorithm that performs well (or poorly) on one tends to perform similarly on the other.

Table 4 Benchmarking results using FELICS-AE on PC

Finally, in Table 4, we show the results in terms of RAM size, code size and execution time for the 20 algorithms when executed on a classical PC (Intel Core™ i5-3570 CPU @ 3.40 GHz, 7.7 GiB RAM). This evaluation differs somewhat from the previous ones. Eight algorithms, including Grain-128AEAD, Elephant-160, PHOTON-Beetle-AEAD128, HyENA-128, LOCUS-AEAD-128, LOTUS-AEAD-128 and SUNDAE-GIFT-96-128, have an execution time greater than 80 000 cycles. However, GIFT-COFB, which behaves badly on small embedded platforms, becomes a reasonable candidate here with an execution time of 18 000 cycles; the same remark holds for Pyjamask-128, which also has a lower execution time on the PC compared to the other platforms. A special mention goes to Ascon-128 and Ascon-128a, which require about 2000 cycles. Concerning code size, surprisingly, the behavior on a PC closely matches the behavior on the AVR: Grain-128AEAD, ForkAE-128, GIFT-COFB, HyENA-128, LOCUS-AEAD-128, LOTUS-AEAD-128, Pyjamask-128, Romulus-M1-128, Romulus-N and Subterranean-SAE-128 have a code size greater than 10 000 bytes, whereas Ascon-128 and Ascon-128a with the reference code have a code size of about 2000 bytes. Concerning RAM size, Grain-128AEAD (ref), HyENA-128, LOCUS-AEAD-128, LOTUS-AEAD-128 and SUNDAE-GIFT-96-128 require more than 2000 bytes of RAM, whereas Ascon-128 (opt64), Ascon-128a (opt64) and Grain-128AEAD (opt32) require about 200 bytes of RAM.

3.3 Discussion

Our results show that the algorithm Schwaemm256-128 is suitable for the AVR ATMEGA128 hardware platform since it provides well balanced performance regarding code size, RAM usage and execution time. For the MSP430F1611 hardware platform, the most suitable algorithm is GRAIN-128AEAD with optimized code (opt32). On PC hardware platforms, the most suitable algorithm is Ascon-128.

Thus, analysing the performance of the algorithms on small embedded platforms remains important, because the results obtained on a PC can hide very different realities in terms of execution time, code size and RAM for many of them; in particular, the MSP430 is the most demanding platform in terms of execution time. Moreover, we also observe that dedicated codes are, unsurprisingly, more efficient than reference codes. It is therefore really important that authors provide dedicated codes for dedicated platforms. We also provide in Appendix 1 the results we obtained for the NIST finalists on an ARM Cortex-M3.

4 Our benchmarking framework with IoT-LAB

In this section, we present the way we implemented the identified performance metrics for the 12 selected algorithms. First, these algorithms need to be compiled for Contiki before we can use them in the IoT application deployed on IoT-LAB.

4.1 Architecture

The global architecture of our framework is presented in Fig. 2.

Fig. 2 Overview of the global architecture of our benchmarking framework

We describe here the different components interacting with the benchmarking tool. Two types of components can be distinguished: external components and internal components. The internal components are entities we developed and set up, while the external components are existing entities that we only use.

4.1.1 External components

The benchmarking tool interacts with the following two external components: the IoT-LAB platform and the IoT-LAB tools. Their code can be retrieved from the IoT-LAB Git repository.

The IoT-LAB platform, described in Sect. 3, allows an authenticated user to launch experiments on IoT assets. It then allows the user to access energy consumption data and traffic data. Thus, it handles the experiments and formats their related data.

The IoT-LAB tools are used to compile the firmwares in order to make them run on the IoT assets and to specify, retrieve and parse the requested experiments data.

4.1.2 Internal components

The benchmarking tool can be divided into three internal entities: the SQLite3 database, the local files and the Python scripts.

The SQLite3 database stores the experiments data, as well as the topologies and measured data.

The local files store the data files generated by IoT-LAB, which are retrieved by the Python scripts when an experiment ends. They can be considered as temporary files, as they are then parsed in order to extract metrics, which are in turn stored in the database. They come in different formats: the consumption data is contained in OML files, while the traffic data is contained in PCAP files. Log files are generated as well.

A set of Python scripts orchestrates the whole benchmarking process. They parse the configuration file, create and fill the database, manipulate the local files and interact with the IoT-LAB platform and tools. A complete description of the scripts and the way they interact with all the components is given in Appendix 1. The Python scripts create the database and store and retrieve data from it by using the Database module. They also copy the files generated by IoT-LAB while running an experiment to local files, and then parse these files in order to summarize them in the database.
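As an illustration, the following minimal sketch shows how such a script could store parsed consumption measures in an SQLite3 database; the table layout and helper names are hypothetical and do not reflect the exact schema used by our tool.

import sqlite3

def init_db(path="benchmark.db"):
    """Create a (hypothetical) table for per-node power measures."""
    con = sqlite3.connect(path)
    con.execute("""CREATE TABLE IF NOT EXISTS consumption (
                       experiment_id INTEGER,
                       node          TEXT,
                       timestamp     REAL,
                       power_w       REAL)""")
    con.commit()
    return con

def store_measures(con, experiment_id, node, samples):
    """samples: iterable of (timestamp, power) pairs parsed from an OML file."""
    con.executemany(
        "INSERT INTO consumption VALUES (?, ?, ?, ?)",
        [(experiment_id, node, t, p) for (t, p) in samples])
    con.commit()

# Hypothetical usage with dummy data:
# con = init_db()
# store_measures(con, 1, "m3-318.grenoble.iot-lab.info", [(0.0, 0.138), (0.5, 0.139)])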

Moreover, the IoT-LAB tools provide an interface through which the Python scripts can communicate with the IoT-LAB platform. It notably includes an API to launch experiments by submitting their parameters, and a user space from which the generated data can be retrieved.

4.2 Overall description of an experiment

We give here a general overview of generating an experiment. More details are available in Appendix 1.

4.2.1 Configuration

The parameters required by the script in order to launch experiments are given in Table 5.

Table 5 Parameters of the configuration file of a benchmarking experiment

4.2.2 How it works

Launching and handling an experiment is done in four main steps:

1. Defining the resources related to the given configuration file (or parameters).

2. Launching the experiment using the iotlabcli library.

3. Launching the traffic capture when the experiment's status is “Running”.

4. Retrieving the data related to the experiment when its status is “Terminated”.

The resources related to an experiment are its nodes, the firmwares and profiles associated to these nodes.

The client nodes and the server node are defined from the architecture and the topology provided in the configuration file. We manually identified the nodes corresponding to a given configuration (architecture, topology) by observing which nodes are linked together when creating an RPL network. Thus, the following nodes are associated with the architecture “m3” and the “line” topology:

  • m3-358.grenoble.iot-lab.info (as server),

  • m3-318.grenoble.iot-lab.info (as client),

  • m3-326.grenoble.iot-lab.info (as client),

  • m3-334.grenoble.iot-lab.info (as client),

  • m3-342.grenoble.iot-lab.info (as client),

  • m3-350.grenoble.iot-lab.info (as client).

For each experiment, two firmwares are used: one for the server node and one for the client nodes. For now, the firmwares to use are determined only by the requested operating system because, for each chosen operating system, all the available algorithms are launched. The profile “consumption_and_sniffer” is used when traffic capture is required.

Once the resources have been defined, the experiment is launched using the iotlabcli.experiment library. The state of the experiment is then retrieved every 30 seconds. As soon as this state is Running, the traffic capture is launched.
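A minimal sketch of this submission and polling step is given below. It assumes that the iotlabcli helpers exp_resources, submit_experiment and get_experiment behave as described in the IoT-LAB CLI tools documentation (their exact signatures should be checked there); the node list, firmware and profile names are placeholders.

import time
from iotlabcli import auth, rest, experiment

def run_experiment(nodes, firmware, profile, duration_min=20):
    # Authenticate against the IoT-LAB API (credentials stored beforehand with iotlab-auth).
    user, passwd = auth.get_user_credentials()
    api = rest.Api(user, passwd)

    # Associate the firmware and the monitoring profile with the chosen nodes.
    resources = [experiment.exp_resources(nodes, firmware, profile)]
    exp_id = experiment.submit_experiment(api, "bench", duration_min, resources)["id"]

    # Poll the experiment state every 30 seconds until it is Running.
    while experiment.get_experiment(api, exp_id, "state")["state"] != "Running":
        time.sleep(30)
    return exp_id

# Hypothetical usage:
# run_experiment(["m3-358.grenoble.iot-lab.info"], "server.elf", "consumption_and_sniffer")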

The traffic capture is launched through the following SSH command:

sniffer_aggregator -l <site>,<architecture>,<server_id> -o <experiment_id>-server.pcap

Thus, a PCAP file is created on the user space <user>@<site>.iot-lab.info. It is retrieved at the end of the experiment, along with the different measures defined by the profile.

4.2.3 Retrieving the data

Retrieving the data is the last step of the experiment deployment. It is done by remotely copying the folder generated by IoT-LAB according to the profiles selected for the nodes. Moreover, the PCAP file containing the packets generated by the nodes is also retrieved. These data are stored in the local folder data/<experiment_id> and used by the Experiments Analysis Tool. For each node, the voltage and the power measurements, as well as the captured traffic, are stored.

The Experiments Analysis Tool takes as inputs the files generated and retrieved by the Experiments Deployment Tool. The parameters required by the script in order to analyze the experiments data are given in Table 6.

Table 6 Parameters of the configuration file for retrieving experiments data

4.3 Performance metrics

First of all, for an accurate measurement of the performance metrics, we define a transition time corresponding to 5% of the total experiment time. Packets received within the transition time after the beginning of the experiment are not considered, nor are packets received within the transition time before the end of the experiment.
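A minimal sketch of this filtering, assuming packets are represented by (timestamp, length) pairs with timestamps relative to the start of the experiment (helper and field names are illustrative):

def filter_transition(packets, duration, transition_ratio=0.05):
    """Keep only the packets received outside the transition windows.

    packets:  list of (timestamp, length) tuples, timestamps in seconds
              relative to the start of the experiment.
    duration: total experiment duration in seconds.
    """
    transition = transition_ratio * duration
    start, end = transition, duration - transition
    return [(t, length) for (t, length) in packets if start <= t <= end]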

We identified the following metrics which are appropriate for evaluating the network performance of the algorithms.

4.3.1 Bytes per second

The number of bytes per second is computed for the experiment in which the client does not wait between sending two packets. The sum of the data packets' lengths is computed and divided by the time interval of the experiment, i.e. its duration minus twice the transition time.
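Written as a formula, with T the duration of the experiment, T_t the transition time and p_i the data packets kept after filtering:

$$\text{bytes per second} = \frac{\sum_i \mathrm{len}(p_i)}{T - 2\,T_t}$$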

4.3.2 Latency

When a client sends a packet, it writes a record in its log file. Likewise, when the server receives a packet, it logs it as well. In these records, each message is identified by an ID. Thus, we obtain the latency by subtracting the sending time from the reception time of the packet with the same ID.
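A minimal sketch of this matching, assuming the client and server logs have been parsed into dictionaries mapping message IDs to timestamps (names are illustrative):

def compute_latencies(sent, received):
    """sent, received: dicts mapping message ID -> timestamp (seconds).

    Returns a dict mapping message ID -> latency, for the messages
    that appear in both logs.
    """
    return {msg_id: received[msg_id] - sent[msg_id]
            for msg_id in sent
            if msg_id in received}

# e.g. compute_latencies({1: 0.10, 2: 0.20}, {1: 0.13}) gives a latency of
# about 0.03 s for message 1; message 2 is ignored since it was never received.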

4.3.3 Limit of packets per second

The limit of packets per second is obtained by dividing the number of data packets received by the server by the duration of the packet capture. In this case, the client does not wait between sending two packets. This number represents the network load a configuration can handle.
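Formally, with N_recv the number of data packets received by the server and T_capture the duration of the capture:

$$\text{packet limit} = \frac{N_{\mathrm{recv}}}{T_{\mathrm{capture}}}$$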

4.3.4 Packet delivery ratio (PDR)

This ratio is obtained by dividing the number of received packets by the total number of packets sent. This value is then multiplied by 100 in order to get a percentage. In this case, the client does not wait between sending two packets.
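That is, with N_recv the number of received packets and N_sent the total number of packets sent:

$$\mathrm{PDR} = \frac{N_{\mathrm{recv}}}{N_{\mathrm{sent}}} \times 100$$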

4.3.5 Packets per second

The number of received packets is computed and divided by the duration of the experiment, i.e. the running time of the experiment minus twice the transition time. In this case, the client does not wait between sending two packets.
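Using the same notation as above:

$$\text{packets per second} = \frac{N_{\mathrm{recv}}}{T - 2\,T_t}$$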

4.3.6 Mean of the power consumption

The experiment duration is split into several time subintervals. For each subinterval, the mean of the associated power measures (in watts) is computed by summing the power measures recorded for all the nodes of the experiment and dividing by the number of measures. These mean values are then represented in a boxplot (one boxplot per algorithm). We also represent these mean values on a graph, where the X-axis value of each measure is the midpoint of the associated interval. The graph can present several curves: one per algorithm.

4.3.7 Watts per packet

The number of watts consumed per packet is computed over the experiment duration as the total consumption divided by the number of received packets.
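A minimal sketch of these two computations, assuming the power samples of all nodes have been gathered into a single list of (timestamp, watts) pairs (helper names are illustrative):

def mean_power_per_interval(samples, duration, n_intervals):
    """samples: list of (timestamp, power_w) pairs over all nodes.

    Returns a list of (interval_midpoint, mean_power) values, one per
    non-empty subinterval.
    """
    width = duration / n_intervals
    means = []
    for i in range(n_intervals):
        start, end = i * width, (i + 1) * width
        values = [p for (t, p) in samples if start <= t < end]
        if values:
            means.append((start + width / 2, sum(values) / len(values)))
    return means

def watts_per_packet(samples, n_received_packets):
    """Sum of the power measures over the experiment divided by the
    number of received packets (the 'watts per packet' metric)."""
    total = sum(p for (_, p) in samples)
    return total / n_received_packets if n_received_packets else float("inf")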

5 Obtained results

In this section, we present the obtained results regarding the seven chosen metrics for the 12 candidate algorithms by using two series of experiments.

Each experiment involves an IoT application deployed on two nodes, a client and a server, and lasts 20 minutes. Each node is based on ARM Cortex-M3 hardware and runs Contiki-NG with RPL as the underlying routing protocol. We run two sets of experiments: one with packets of 128 bytes and one with packets of 512 bytes. Note that the latency results for 512-byte packets are missing because we lost the corresponding data.

We provide in Figs. 3, 4, 5, 6, 7, 8, 9 and 10 the obtained results for our seven metrics; the no_enc scenario is the baseline for all the experiments.

Fig. 3 Mean power consumption (watt) over time

Fig. 4 Power consumption (watt) statistics

The most interesting result is shown in Fig. 3, where the overall power consumption is given. First, we note that the consumption values are very low (between 0.134 and 0.142 W), meaning that the consumption overhead of all the algorithms with respect to the no_enc scenario is really small. We find that the cryptographic operations are not the most power-consuming part in IoT devices. The most efficient algorithms in terms of energy consumption are Ascon-128a, SPARKLE and GIFT-COFB for packets of 128 bytes, and SUNDAE-GIFT for packets of 512 bytes. GRAIN remains the most energy consuming for both packet sizes. This first finding is confirmed by the graph presented in Fig. 4.

The interpretation of the other metrics is more difficult. However, a global view of the results shows that the 512-byte packet case seems to flatten all these metrics, except for the no_enc case.

Fig. 5 The measured number of bytes per second generated by the IoT application for each algorithm

In terms of bytes per second, as shown in Fig. 5, the most efficient algorithms in the 128-byte packet case seem to be PYJAMASK, Lilliput-AE, Subterranean 2.0 and SUNDAE-GIFT. The worst algorithms in this case are Ascon-128a and SPARKLE. We think that Ascon-128a is penalized because it is a stream cipher and thus has a large warm-up step. The GRAIN algorithm seems to perform a little better.

In terms of latency, as shown in Fig. 6, the same algorithms seem to perform well. However, it is difficult to generalize the obtained result since the no_enc scenario is not representative, as it does not have the lowest average latency.

Fig. 6 Measured latency (seconds) of the packets generated by the IoT application for each algorithm

Regarding the limit of the number of packets per second, as shown in Fig. 7, the most efficient algorithms are Ascon-128a, GIFT-COFB, Lilliput-AE, PYJAMASK, SPARKLE, Subterranean 2.0 and SUNDAE-GIFT.

Fig. 7 The limit number of packets per second generated by the IoT application for each algorithm

Concerning the Packet Delivery Ratio (PDR), as depicted in Fig. 8, for both cases (128-byte or 512-byte packets), the two algorithms Ascon-128a and SPARKLE perform similarly to the baseline scenario no_enc, with a percentage close to 98%.

Fig. 8 The packet delivery ratio (percentage) of the IoT application for each algorithm

Regarding the number of packets per second generated by the IoT application, as shown in Fig. 9, five algorithms perform well: GIFT-COFB, Lilliput-AE, PYJAMASK, Subterranean 2.0 and SUNDAE-GIFT. This metric is clearly correlated with the number of bytes per second and with the watts-per-packet metric. Thus, the best algorithms remain the best for these three metrics.

Fig. 9 Number of packets per second generated by the IoT application for each algorithm

Finally, the watts-per-packet metric is given in Fig. 10. We clearly observe that Ascon-128a and SPARKLE are disadvantaged in the 128-byte packet case but behave better when 512-byte packets are considered. In both cases, the algorithms that behave well are GIFT-COFB, Lilliput-AE, PYJAMASK, Subterranean 2.0 and SUNDAE-GIFT. Note also that the poor performance of SKINNY-AEAD is mainly linked to the fact that we use SKINNY-TK3 in our implementation, leading to a clear degradation of its performance since 56 rounds of the ciphering function are required.

Fig. 10 The consumed watts per packet by the IoT application for each algorithm

In fact, this metric is more realistic than the average consumed power. Indeed, Ascon-128a, which is the best algorithm in terms of average consumed power, becomes the worst when considering watts per packet. This watts-per-packet consumption is correlated with its poor results in terms of bytes per second and packets per second, which are low compared to the other algorithms: the Ascon-128a algorithm spends less energy because it ciphers fewer packets. Thus, considering all the metrics, the average consumed power of the IoT application must be studied in relation with the other metrics. The algorithms that behave well for bytes per second, packets per second and watts per packet are the ones that are the most efficient regarding energy consumption. This is the case for GIFT-COFB, Lilliput-AE, PYJAMASK, Subterranean 2.0 and SUNDAE-GIFT.

6 Conclusion and future works

In this paper, we presented a new dedicated benchmarking framework based on the IoT-LAB platform to evaluate several lightweight cryptographic algorithms that are candidates in round 2 of the NIST competition. We ported these algorithms to the Contiki-NG OS and evaluated them by using an IoT application deployed on the physical nodes of the IoT-LAB platform while varying the packet size. The code of our benchmarking tool is publicly available, and we hope that developers and cryptographers will rely on it to add more algorithms and metrics.

Our work could be extended by adding algorithms, metrics, other versions of the available algorithms, etc. However, our main perspective is that this work establishes the first step towards real-world evaluation of lightweight cryptographic primitives to enhance IoT security.