1 Introduction

There are appealing advantages to using public-key cryptography (PKC) in RFID applications that require authentication. Indeed, PKC-based authentication significantly simplifies the distribution of cryptographic keys, resulting in a more scalable solution. The challenge of using elliptic curve cryptography (ECC) [8] in the RFID environment is how to deal with the high computational cost of ECC algorithms relative to the capabilities of the RFID platform [6, 10, 19, 20]. Fortunately, this question has been reasonably well solved, and Table 1 shows some of the more recent achievements. It is fair to state that a security level of 80 bits (ECC curves of at least 160 bits) is within reach, with sub-second latency, in small-footprint applications (i.e. 15 kGate, 100 kHz hardware implementations; or 16-bit, 8 MHz software implementations).

Table 1. Recent ECC implementations for RFID

To achieve these results, the authors of the designs in Table 1 need to use advanced algorithmic transformations and optimizations. Furthermore, they also make use of technology-specific features. Software implementations assume certain amounts of flash and RAM memory, or microcontroller features such as a hardware multiplier. Hardware implementations assume a target cell library of specific performance and feature size.

The assumption of a specific target architecture at the start of the design is typical for contemporary digital design methods. The designs in Table 1 are no exceptions. On the other hand, it is much harder for ECC designers to make clear commitments to application constraints such as the available energy budget and the required authentication latency. This is understandable: EDA tools do a poor job at estimating system performance and energy consumption. It is easier to optimize an implementation on a given target and then evaluate its characteristics on a prototype or using low-level simulation.

In this paper, we discuss the ECC RFID design problem by considering how to meet design requirements, rather than how to obtain the fastest ECC point multiplication. The main motivator for this is that the power source in the RFID environment is truly unique, and that this design problem deserves a more holistic approach which considers energy source as well as energy consumer.

Indeed, depending on the power source, RFID designs are either energy-constrained or power-constrained [4, 5]. They also have to optimize application throughput (determined by the time taken to complete a single signature) with respect to the available energy budget. We note that the requirements on energy and power depend on the type of RFID.

  • Active RFID are powered from a battery source, and they have to minimize the energy consumed per signature as this will maximize the battery lifetime.

  • Passive RFID are powered through an RF source, and they have to minimize the time required per signature while matching the available power budget.

  • Passive RFID, powered through an energy harvesting mechanism that includes an energy store, have to minimize the energy used per signature as well, since this allows uninterrupted RFID operation. Furthermore, the energy needed for the desired application throughput has to match the average energy influx in the harvester.

The question addressed in this paper is: how can we select architecture parameters such that we meet these design requirements, including energy budget and application throughput? We provide an empirical answer to this question by presenting the energy/latency characteristics of an RFID doing ECDSA key generation, signature generation and signature verification. We present our results for a microcontroller target, the MSP430F5438A from Texas Instruments. We evaluate the energy characteristics of signatures at the 80-bit and 128-bit security levels [13]. We examine multiple architecture configurations: multiple frequencies, multiple core voltages, and with/without use of the MSP430’s hardware multiplier. For each of these configurations, we carefully measure the required energy by tracking the MSP430 core current at high speed. The resulting curves then allow us to determine, for a given security level and energy budget, the most appropriate core frequency and voltage level.

In a nutshell, our results are as follows. Increasing the MSP430 core frequency always reduces the energy consumed per signature. This is because energy consumed by leakage (i.e. static power dissipation) becomes proportionally less important as the runtime of the application decreases. Hence, in a given energy harvesting method, it is always better to wait as long as possible before initiating ECC computations and, once sufficient energy is available, complete them as quickly as possible at the highest possible operating frequency. A second observation is that the impact of security level on energy budget is significant. For an MSP430 without a hardware multiplier, our prototype needs roughly six times as much energy per signature at the 128-bit security level as at the 80-bit security level. When a hardware multiplier can be used, the difference is roughly a factor of two. A third observation is that architecture specialization matters. At a constant security level, a hardware multiplier reduces the energy consumption by a factor of almost 8. Voltage scaling results in an additional gain factor of 2 in energy.

The remainder of the paper is organized as follows. In the next section, we describe the background related to power and energy in digital electronics. In Sect. 3, we review the target ECC design measured using our method. Section 4 introduces the target platform for our experiments. Section 5 describes a setup that can be used for precise energy measurement of RFID applications. The resulting energy/throughput curves are presented in Sect. 6, and applied in a methodology in Sect. 7. Section 8 concludes the paper.

2 Background: Power and Energy in Digital Electronics

Power dissipation in modern digital electronics has two major components: static power dissipation defined by static leakage current, and dynamic power consumption, defined by circuit activity. The static power dissipation depends, in first order, on the operating voltage of the circuit and the size of the circuit. Dynamic power dissipation depends, in first order, on the operating frequency of the circuit, the size of the circuit, and the square of the operating voltage.

Fig. 1. Expected energy dissipation as a function of frequency

We analyze what happens to the energy dissipation for a fixed workload, such as signature verification, under varying operating conditions. In the following, \(K\) and \(C\) are technology constants, \(\alpha \) is an application-dependent activity factor, \(V\) is the operating voltage, \(T_{cycle}\) is the clock cycle period, \(f_{cycle} = 1/T_{cycle}\) is the operating frequency, \(n\) is the workload cycle budget, and \(T_{alg} = n \cdot T_{cycle}\) is the resulting execution time. The energy dissipation per workload has a static and a dynamic component.

$$\begin{aligned} E_{dyn} = P_{dyn} \cdot T_{alg} = \alpha \cdot C \cdot V^2 \cdot f_{cycle} \cdot T_{cycle} \cdot n \end{aligned}$$
(1)
$$\begin{aligned} E_{stat} = P_{stat} \cdot T_{alg} = K \cdot V \cdot T_{cycle} \cdot n \end{aligned}$$
(2)

The total energy dissipation thus equals

$$\begin{aligned} E_{tot} = E_{dyn} + E_{stat} = n \cdot [ \alpha \cdot C \cdot V^2 + K \cdot V \cdot T_{cycle} ] \end{aligned}$$
(3)

This formulation leads to the following assessment. If the clock frequency increases, the total energy per workload will decrease: the dynamic energy remains constant, while the static energy decreases. Furthermore, if the operating voltage decreases, the total energy per workload will decrease as well. Finally, if the security level of the design increases (from 80 to 128 bit, for example), the cycle budget \(n\) will increase, and the total energy per workload will increase with it. This analysis is captured by Fig. 1. The leftmost point on the curve represents a design where the static and dynamic parts of the energy per workload are equal. As the operating frequency increases, the total energy decreases and approaches the dynamic component. We note that this figure is a theoretical model: it ignores overhead for clock generation, and for transitions between power modes.
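The trend predicted by Eq. (3) can be checked numerically. The following Python sketch evaluates the model for a fixed workload. The constants `alpha_c` (standing for \(\alpha \cdot C\)) and `k_leak` (standing for \(K\)), as well as the cycle budget, are placeholder values chosen for illustration only; they are not measured parameters of any device.

```python
def energy_per_workload(n_cycles, v, f_hz, alpha_c=1e-9, k_leak=1e-3):
    """Total energy in joules per Eq. (3): n * (alpha*C*V^2 + K*V*T_cycle).

    alpha_c and k_leak are illustrative placeholders, not measured values.
    """
    t_cycle = 1.0 / f_hz                          # clock period in seconds
    e_dyn = n_cycles * alpha_c * v ** 2           # independent of frequency
    e_stat = n_cycles * k_leak * v * t_cycle      # shrinks as frequency grows
    return e_dyn + e_stat


if __name__ == "__main__":
    # Sweep the clock frequency for a hypothetical 2-million-cycle workload.
    for f_mhz in (1, 2, 4, 8, 16):
        e = energy_per_workload(2_000_000, 2.0, f_mhz * 1e6)
        print(f"{f_mhz:2d} MHz: {e * 1e3:.3f} mJ")
```

Under these assumptions the static term halves each time the frequency doubles, while the dynamic term stays fixed, which reproduces the shape sketched in Fig. 1.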

3 Target Protocol: ECDSA in secp160r1 and nistp256

The driving application for our measurements is ECDSA key generation, signature generation, and signature verification. Rather than developing our own implementation from scratch, we use the RELIC library with support for the MSP430 and 32-bit hardware multiplier [16]. We implement two prime-field curves, secp160r1 and nistp256. The scalar multiplication is done with a left-to-right window-3 NAF multiplication, and using Jacobian Projective Coordinates. The field operations are basic Comba multiplication and squaring, with Montgomery reduction. SHA-1 is used for hashing and as a pseudorandom generator. We used two implementation variants for each of the curves: one which uses a 32-bit hardware multiplier (using RELIC’s msp-asm backend), and a second one which emulates multiplication in software (using RELIC’s easy backend).
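The recoding step behind a window-3 NAF scalar multiplication can be illustrated with a short Python sketch. This is not RELIC's code: it is a generic textbook width-\(w\) NAF recoding, and integers under addition stand in for elliptic-curve points so that the example stays self-contained. The function names `wnaf` and `scalar_mult` are our own.

```python
def wnaf(k, w=3):
    """Return the width-w NAF digits of k >= 0, least significant first."""
    digits = []
    while k > 0:
        if k & 1:
            d = k % (1 << w)            # k mod 2^w, an odd value
            if d >= (1 << (w - 1)):     # map into the signed digit range
                d -= (1 << w)
            k -= d                      # k is now divisible by 2^w
        else:
            d = 0
        digits.append(d)
        k >>= 1
    return digits


def scalar_mult(k, p, w=3):
    """Left-to-right evaluation of k*p from the wNAF digits of k.

    Integers under addition stand in for curve points: doubling is p + p,
    and subtracting a table entry models adding a negated point.
    """
    # Precompute the odd multiples 1*p, 3*p, ..., (2^(w-1) - 1)*p.
    table = {d: d * p for d in range(1, 1 << (w - 1), 2)}
    acc = 0
    for d in reversed(wnaf(k, w)):      # most significant digit first
        acc = acc + acc                 # point doubling
        if d > 0:
            acc = acc + table[d]        # point addition
        elif d < 0:
            acc = acc - table[-d]       # addition of the negated point
    return acc
```

With \(w = 3\) the nonzero digits are limited to \(\pm 1\) and \(\pm 3\), so only two precomputed multiples are needed, and every nonzero digit is followed by at least two zero digits, which lowers the number of point additions relative to plain binary double-and-add.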

Our standard testbench goes through ECDSA key generation, ECDSA signature generation of a fixed digest, and ECDSA signature verification. The execution time, as well as the energy, is measured for each of these steps separately. Our performance and energy numbers only cover the computations, and they don’t include initialization overhead, or overhead from data communications.

4 Target Platform: MSP430F5438A

We use the MSP430F5438A [1] as our prototyping platform. The MSP430F5438A is an ultra-low-power Reduced Instruction Set Computer (RISC) microcontroller from Texas Instruments, optimized for low-resource applications [2]. The architecture provides five different low-power modes suitable for battery operation. The MSP430F5438A features a 16-bit CPU, 256 KB flash, 16 KB SRAM, a CPU clock of up to 25 MHz, and 16 working registers, of which 12 are available as general-purpose registers. It also includes a 32-bit hardware multiplier.

Fig. 2. Voltage and frequency scaling

Figure 2 shows the internal energy management architecture of the MSP430F5438A. In general, VCore supplies the CPU, memories (flash and RAM), and the digital modules, while DVcc supplies the I/Os and all analog modules. The internal core voltage of the MSP430F5438A, VCore, needs to be adjusted as a function of the desired operating frequency. The VCore output is programmable in four steps, to provide only as much power as is needed for the speed that has been selected for the CPU. We configure VCore voltage by writing register bits PMMCOREV[1:0]. Table 2 shows recommended PMMCOREV settings and minimum external power supply voltage for different frequency ranges. We make use of these programmable VCore levels to optimize the energy efficiency of ECDSA on the MSP430.
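The selection of a VCore level for a target frequency amounts to a table lookup, sketched below in Python. The thresholds in `VCORE_LEVELS` are assumed values in the spirit of Table 2 and should be checked against the device datasheet before use; the function name `select_vcore` is hypothetical and is not part of any TI library.

```python
# Each entry: (maximum f_sys in MHz, PMMCOREV level, minimum DVcc in volts).
# ASSUMED values for illustration; verify against Table 2 and the datasheet.
VCORE_LEVELS = [
    (8.0, 0, 1.8),
    (12.0, 1, 2.0),
    (20.0, 2, 2.2),
    (25.0, 3, 2.4),
]


def select_vcore(f_sys_mhz):
    """Return (PMMCOREV level, minimum DVcc) for the lowest level that
    supports the requested system frequency."""
    for f_max, level, dvcc_min in VCORE_LEVELS:
        if f_sys_mhz <= f_max:
            return level, dvcc_min
    raise ValueError(f"{f_sys_mhz} MHz exceeds the supported frequency range")
```

Choosing the lowest admissible level follows the stated goal of providing only as much core voltage as the selected CPU speed requires.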

Table 2. Recommended PMMCOREV and \(\mathrm{{DV}}_{\text{ CC }}\) settings for selected \(\mathrm{{f}}_{\text{ sys }}\)

5 High Resolution Energy Measurement

Fig. 3. Energy measurement setup block diagram

Figure 3 depicts the block diagram of the energy measurement setup used in our experiments. The average current consumed by a microcontroller during a particular interval of time is measured by integrating the instantaneous current. The current is measured by means of the voltage drop over a shunt resistor on the microcontroller Vcc line. To sample the voltage drop, we use OpenADC [14, 15] with a Spartan FPGA [3], in place of a traditional high-speed oscilloscope setup. OpenADC is a custom ADC board with a 10-bit A/D converter that supports differential inputs and an adjustable reference voltage. The FPGA takes care of sample accumulation and sample counting.

The integration period is defined by means of trigger signals created by the microcontroller. This requires a simple instrumentation of the software application. The clock frequency of the A/D converter can be chosen independently of the microcontroller clock.

Fig. 4. Average current measurement

The procedure to measure the average current is as follows. At a desired time, the microcontroller triggers the FPGA to start sampling OpenADC data (Fig. 4). Once the trigger is asserted, the FPGA accumulates ADC samples until the trigger line is re-asserted. The accumulator value, divided by the number of samples, represents the average current consumed by the function being executed on the microcontroller. The FPGA also counts the number of samples collected during the trigger window, from which the execution time of that function is derived. The FPGA design then provides the accumulator value and the number of samples collected to a Python script running on a PC. The script uses this data to calculate the energy consumed by an MSP430 function as follows.

$$\begin{aligned} \text{Energy}&= V_{cc} \cdot I_{avg} \cdot T_{alg} \\&= V_{cc} \cdot \frac{Accm \cdot V_{ref}}{2^n \cdot N_s \cdot R} \cdot N_s \cdot T_s \\&= V_{cc} \cdot \frac{Accm \cdot V_{ref}}{2^n \cdot R} \cdot T_s \end{aligned}$$
(4)
$$\begin{aligned} \text{Cycle count} = N_s \cdot T_s \cdot F_{cpu} \end{aligned}$$
(5)

where \(V_{cc}\) is the supply voltage of the microcontroller, \(Accm\) is the FPGA accumulator value, \(V_{ref}\) is the ADC reference voltage, \(N_s\) is the number of samples collected during the trigger window, \(T_s\) is the sampling period, \(2^n\) is the number of ADC quantization levels with \(n = 10\) bits (here \(n\) denotes the ADC bit width, not the cycle budget of Sect. 2), \(R\) is the shunt resistor value, and \(F_{cpu}\) is the microcontroller frequency. Our energy formula assumes that the voltage supply at the microcontroller input is constant, in other words, that the voltage drop over the shunt resistor is negligible with respect to \(V_{cc}\).

The above formula shows that the resolution of the energy measurement improves with a lower ADC reference voltage, a higher sampling frequency, and a higher ADC resolution. We use a sampling rate of 20 MHz for operating frequencies below 15 MHz and 30 MHz for operating frequencies above 15 MHz. In our experiments, the ADC reference voltage is 0.5 V and the shunt resistor is \(100\,\Omega \). This setup can be used to measure the energy consumption of any device, provided that a resistor can be inserted in series with its power supply and that the device can generate a trigger signal to activate the FPGA accumulator.
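Equations (4) and (5) translate directly into a few lines of post-processing. The Python sketch below is our own illustration and not the script used in the experiments; the function names are hypothetical, and the default arguments reuse the reference voltage, shunt value, and ADC width stated above.

```python
def average_current(accm, n_samples, v_ref=0.5, r_shunt=100.0, adc_bits=10):
    """Average current in amperes from the accumulated ADC codes."""
    return (accm * v_ref) / ((1 << adc_bits) * n_samples * r_shunt)


def energy_joules(accm, v_cc, t_s, v_ref=0.5, r_shunt=100.0, adc_bits=10):
    """Eq. (4): E = Vcc * (Accm * Vref) / (2^n * R) * Ts.

    The sample count cancels out, so only the accumulator value and the
    sampling period t_s (in seconds) are needed.
    """
    return v_cc * (accm * v_ref) / ((1 << adc_bits) * r_shunt) * t_s


def cycle_count(n_samples, t_s, f_cpu_hz):
    """Eq. (5): number of CPU cycles spent inside the trigger window."""
    return n_samples * t_s * f_cpu_hz
```

As a consistency check on the formulas, a constant mid-scale ADC code of 512 corresponds to a shunt voltage of 0.25 V and hence, with the \(100\,\Omega \) shunt, to a current of 2.5 mA.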

6 Results

In this section, we present experimental results of our energy measurements under different architecture configurations. We measure the energy consumption of ECDSA key generation, signature generation and signature verification for different operating voltages and frequency settings. We also measure the runtime for each operation in order to obtain the throughput. We use the gcc 4.6.3 cross-compiler for the MSP430 family of microcontrollers. Table 3 shows cycle counts for different ECDSA operations, and Table 4 shows the footprint of the implementations.

Table 3. Cycle count of ECDSA operations on the MSP430F5438A
Table 4. Code size of the implementation of ECDSA on the MSP430F5438A

The graphs show the energy/throughput characteristics for signature generation.

Fig. 5. Energy consumption for secp160r1 without hardware multiplier

Fig. 6. Energy consumption for secp160r1 with hardware multiplier

Figures 5 and 6 show the energy consumption for the 80-bit security level (secp160r1 curve), without and with the hardware multiplier, respectively. We analyze the effect of the operating voltage on energy consumption. In our experiments, we found that operating the microcontroller at 2.0 V instead of 2.7 V reduces the energy consumption by a factor of almost 1.4. Reducing the operating voltage reduces dynamic power consumption because the circuit has smaller voltage swings during switching. The static power consumption decreases as well, because the leakage current is reduced. Further, the use of the hardware multiplier reduces the energy by a factor of almost 8. This is because the hardware multiplier accelerates the signing operation by a factor of almost 8.

Fig. 7. Energy consumption for nistp256 without hardware multiplier

Fig. 8. Energy consumption for nistp256 with hardware multiplier

Fig. 9. Energy profile for ECDSA secp160r1 and nistp256 on TI MSP430F5438A

Fig. 10. Energy harvesting system

Figures 7 and 8 show the energy consumption for the 128-bit security level (nistp256 curve), without and with the hardware multiplier, respectively.

The curves for operation at 2.7 V are discontinuous at 12 MHz and 20 MHz. The discontinuities are caused by reprogramming of the power management system of the CPU.

Figure 9 shows the energy improvement factors for different architecture configurations. Moving towards the origin yields the most optimized design, where energy consumption is minimal. Reducing the operating supply voltage from 2.7 to 2.0 V results in a gain of 1.4 for both security levels. Because energy consumption depends on the runtime of an algorithm, the computational complexity of the chosen algorithm leads to different improvement factors. The acceleration achieved with the hardware multiplier also depends on the security level, which gives different energy gain factors. Overall, the most significant impact comes from architecture specialization.

7 Methodology for Architecture-Energy Tuning

Finally, we show how the energy measurement method can be applied to meet the design requirements. Figure 10 shows a typical energy harvesting [18] setup where energy is harvested and stored. When sufficient energy is available, the application can execute. In the following examples, we demonstrate how our energy/throughput curves can be used for system dimensioning. We consider two cases. In the first case, we start from an energy constraint and derive the achievable performance, expressed as the latency to complete one signature. This case is relevant when we use an energy store of a given dimension, and we would like to evaluate what system performance can be achieved. In the second case, we start from a desired application performance, and we derive the required energy (and thus the required energy store). Since there are multiple energy/throughput curves (at different security levels, and at different architecture configurations), we propose that this evaluation be done concurrently over all available curves, in order to analyze the design space. In the examples discussed below, however, we focus on a single security level and a microcontroller with a hardware multiplier.

Fig. 11. Architecture tuning for energy constrained system

Fig. 12. Architecture tuning for throughput constrained system

Figure 11 shows the energy curve for the 80-bit security level. We assume an energy limit of 2 mJ per signature, and we assume a 2 V operating level. This limit sets a minimum operating frequency for the microcontroller. The 2 mJ energy level requires a 3 MHz operating frequency, which enables a signature to complete in one second. If we increase the core voltage to 2.7 V, the minimum operating frequency at 2 mJ per signature will increase to 9 MHz, and the signature will be done in 0.25 s.

A second example is to start from the required signature throughput. The example shown in Fig. 12 needs to complete a signature in 1/6 of a second. We use our graphs to decide the operating frequency and to find required energy per signature. First, we note that the system needs to operate at least at 14 MHz. At that frequency, only the 2.7 V core voltage mode is available. Under this setting, the corresponding energy per signature is 1.9 mJ.
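Both tuning cases reduce to simple lookups on a measured energy/frequency curve. The Python sketch below illustrates the procedure; the curve points in `CURVE` and the cycle count `CYCLES` are hypothetical placeholders and are not the measured values of Figs. 11 and 12. The function names are our own.

```python
# Hypothetical (frequency in MHz, energy per signature in mJ) points,
# sorted by frequency; energy is assumed non-increasing with frequency.
CURVE = [(1, 3.0), (2, 2.4), (4, 1.9), (8, 1.6)]
CYCLES = 2_400_000  # hypothetical cycle count of one signature


def min_frequency_for_energy(curve, budget_mj):
    """Case 1: lowest listed frequency whose energy fits the budget,
    or None if no point on the curve meets it."""
    for f_mhz, e_mj in curve:
        if e_mj <= budget_mj:
            return f_mhz
    return None


def latency_s(cycles, f_mhz):
    """Time to complete one signature at the given clock frequency."""
    return cycles / (f_mhz * 1e6)


def min_frequency_for_latency(cycles, max_latency_s):
    """Case 2: lowest frequency in MHz that meets the latency target."""
    return cycles / max_latency_s / 1e6
```

In the first case, the returned frequency is then converted into a latency with `latency_s`; in the second case, the returned frequency is used to read the required energy off the curve for a voltage level that supports that frequency.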

8 Conclusion

We demonstrated the importance of energy-architecture tuning for RFID. The complex energy-provisioning environment of these applications requires a holistic approach that not only considers performance optimization, but also the energy and/or power needs. We analyzed and quantified, for two different security levels, the impact of several architecture optimization techniques including voltage scaling, frequency scaling, and the use of a hardware multiplier. Using the analysis presented in this paper, we can now investigate the design of a secure RFID with integrated energy harvesting.