Keywords

1 Introduction

Random Numbers Generators (RNGs) form the backbone in the development of almost all of the devices requiring secure communication, device authentication, or data encryption. Applications of such devices include Smart Cards, RFID tags, and IoT devices. The unpredictability of random numbers is a crucial component in parameter and key generation for both symmetric and asymmetric cryptography, generation of random masks for padding data, generating session keys between communicating parties and several other secured applications. The RNGs rely on inherent chaos and unpredictability of various physical factors from the environment or hardware components to provide the much-needed randomness, which signifies that infinitely long history of previous random numbers provides no advantage over deciding or predicting the future random occurrences.

True Random Number Generators (TRNGs) derive its randomness from physical factors in the environment, and thus when implemented in hardware, the randomness is extracted from the physical parameters such as gate delay. Fault in design and fabrication procedure of TRNG results in insufficient entropy (randomness) or improper functioning of the TRNG. Depending on the type of randomness source and the post-processing involved, there have been quite a large number of TRNG designs in the literature. A popular TRNG based on Thermal Noise was first introduced by Intel [14] and recently can be found in [10, 19]. The most commonly used entropy source for both FPGA and ASIC TRNG’s is metastability, which can be found in [12, 20]. Timing jitter in electronic systems was also considered as TRNG source in [11, 26]. However, along with the design challenges involved in designing these TRNGs, an equally complex task is to develop test strategies for TRNGs. Owing to the fact that most tests are statistical and can only evaluate the statistical quality of the generated numbers and not their entropy, modern methods for certification of TRNGs involve a carefully chosen bag of tests. These tests mainly include: (1) NIST standard tests [22], which aims at estimating the min-entropy, i.e., evaluating the information content of the most likely outcome, and (2) AIS 20/31 Tests [15] proposed by the German Federal Office for Information Security, which estimates Shannon entropy, targeted at evaluating the average information content of the random variable associated with the random source.

Furthermore, the TRNGs which are often implemented using external hardware are susceptible to physical attacks. In [16] attackers inject periodic signals to the power supply in order to destroy the randomness of the Ring Oscillators (ROs) by reducing the entropy of the generated keystream. In [7], the adversaries used strong magnetic fields to tamper the randomness generated by 50 ROs. In other instances, usage of power, clock glitches, and other techniques of physical attacks have threatened the deployment of TRNGs. Though there exist methods of on-the-fly testing of TRNGs which can be effective during such attacks [21, 27], it would be desirable to develop TRNG sources which are available to a program without resorting to an external component. The TRNG should derive its randomness from the underlying hardware artifacts which are available in the computer architecture and which exhibit its randomness owing to the various processes which execute on them. Such an in-situ TRNG design would also make physical attacks more challenging, as compared to when the TRNG is an add-on hardware device which the adversary can target more effectively.

In this paper, we analyze and propose an in-situ design for TRNG based on the randomness derived from various hardware activities as observed in the underlying computer architecture. These hardware activities (like instruction-counts, CPU-cycles, etc.) are observed through Hardware Performance Counters (HPCs) allowing detailed, low-level measurement of different process behavior executed in CPUs. HPCs provide valuable information about program execution, which were extensively used in the literature to attack strong cryptographic algorithms [3, 9] and also used to detect the execution of malicious programs [1, 2, 5]. The information provided by HPCs are also used to identify vulnerabilities which aid reverse engineering of proprietary software [4]. However, modern processor vendors do not make any guarantee about determinism of these hardware activities (or HPC events) [25]. We analyze the source of non-determinism exhibited by these HPC events and aim to utilize this non-determinism in designing our TRNG module. We observe that the least significant bits of these HPC events display high-quality randomness and high entropy. We also propose a hybrid model coupled with a secure hash implementation in order to cope up with the latency in accessing the HPC events and thus to increase the throughput of the TRNG design where required.

Fig. 1.
figure 1

Generic structure of the proposed True Random Number Generator (TRNG) circuit

Contribution

The major highlights of the paper in context to the general structure of a TRNG (as illustrated in Fig. 1) are explained in details as follows:

  • We propose a TRNG derived from computer architecture, which thrives on the randomness observed through the architectural events from the underlying hardware. The HPC counters, which monitors these architectural events, exhibits inherent non-determinism in their implementation [25] and are also affected due to a vast number of processes running on a processor core in a fixed quantum of time. HPC event counters provide a cumulative count to these architectural events and thus proposed to be a high source of entropy.

  • It was also observed that the randomness was highest in the Least Significant bits (LSBs) for the observed values from these counters. The entropy reduces as we consider the bits more towards the Most Significant Bit (MSBs).

  • These event counter statistics over the monitored application along with the background noise can only be observed at periodic intervals, which could create a bottleneck in terms of throughput. Thus, in order to increase the throughput of the overall random number generation, we pair the proposed TRNG with a secured hash implementation using the Keccak algorithm.

In the next section, we present a brief discussion on HPCs, which is the primitive tool to extract the randomness exhibited by the architectural component.

2 Preliminaries on Hardware Performance Counters

Hardware Performance Counters (HPCs) are special purpose registers present in most of the modern-day microprocessors which store the hardware related activities during the execution of a program. The HPCs provide detailed, low-level hardware utilization statistics to advance user modules and are thus very useful in code optimization and process tuning operations. The HPCs typically count the number of occurrences of various hardware events such as number of instructions executed, number of bus-cycles consumed, number of CPU-cycles consumed, different operations related to the cache memory, branch misprediction operations and many more events related to the architectural activities of a system. There are a wide variety of events that can be measured with HPCs, and the event availability varies considerably among CPUs and vendors. A full list of available events can be found in various vendor’s architectural manuals. As these events are immensely useful in computer architectural design and optimizations, these were available to users having user-level privileges. However, various researchers developed security implications of such user-level accesses to these counter values for the cryptographic implementations, which led the security engineers to push the HPCs to higher privilege scale and thus the counters can only be accessed in modern systems with administrative privileges. It has already been shown in [25] that the HPC values obtained by monitoring a process are not deterministic in nature. In the next section, we analyze the reason behind the inherent non-determinism exhibited by the HPC events, which is the primary motivation behind the proposed TRNG design.

3 Non-determinism of HPCs and Motivation

All Linux based systems with kernel version 2.6.31 and above have a utility named perf, which can be used to access and read the HPC registers through the perf_event system call. The perf utility provides a simple command line interface to observe the detailed, low-level hardware based event counter values during the execution of a process. A user can monitor desired events for the entire duration of the execution or can observe them periodically with a fixed interval of time. The command to observe the values of a monitored event (<event_name>) with a fixed interval (<interval_duration>) of time during the execution of an executable (<executable_name>) is given as follows:

figure a

In all of our experiments, we took a C code snippet which does nothing but loops infinitely, over which we observed various event counts such as instructions, bus-cycles, cpu-cycles, cache-misses, branch-misses and many more. As an example, we observed two performance counter events instruction and cpu-cycles over the executable of infinite loop with 10 msFootnote 1 interval of time and the corresponding observations are shown in Fig. 2. In an ideal case, the HPC events instruction and cpu-cycles should report constant values over the duration of time, as the executable is doing nothing except looping infinitely. But, it can be observed from Fig. 2 that the number of instructions and the number of CPU cycles is not constant over time, which shows the significant amount of non-determinism exhibited by these performance counters.

Fig. 2.
figure 2

Variation of the HPC events (a) instructions and (b) cpu-cycles monitored over an infinite loop on different time intervals

Measuring exact event counts using HPCs can be difficult because of several external sources of variation such as program layout [18, 24], multiprocessor variation [6], operating system interaction [17], measurement overhead [29], and hardware implementation details [23, 24]. As discussed in [25], after carefully avoiding these sources of variation as much as possible, it was found that internal hardware interrupt is the potential source that leads to non-deterministic behaviour. Most of the HPC events get incremented an extra time for every hardware interrupt that occurs in the system. Hence, if an event is affected by hardware interrupts, then it cannot be a deterministic event, as it is impossible to predict in advance when these interrupts will happen. There are several types of interrupts affecting these HPC events such as Local Timer Interrupts (LOC), IRQ Work Interrupts (IWI), Rescheduling Interrupts (RES), Function Call Interrupts (CAL), and TLB Shootdowns (TLB). The effect of these interrupts can be monitored efficiently using /proc/interrupts, which also includes additional interrupt counts that happen outside of process context, adding extra assistance to the non-determinism of HPC events.

In order to validate this, we again monitored the events instructions and cpu-cycles as before and along with that we also measured the total number of interrupts received per second using the Linux mpstat command which uses /proc/interrupts as a subroutine. We performed the experimentation on a per-core approach using the Linux taskset command. We present the effect of these hardware interrupts on both the HPC events in Fig. 3. We can easily observe from both the figures that whenever there is a surge in the number of interrupts, the counts of the events also increases validating the association between hardware interrupts and HPC events.

HPCs are registers dedicated to each core of a processor. Apart from being affected by the hardware interrupts, these event counts are also affected by the execution of various processes which are running in the background. The Operating System entirely administrates the execution of all the processes on the actual hardware of the architectural components, which are extremely complex to model as it is mostly dominated by the effect of speculative executions, out-of-order execution, interrupts, instruction prefetching and many more optimization techniques. Most of the modern processors are multicore, and an innumerable number of processes can get executed concurrently on each of these processor cores. The HPCs measure the event counts for all the events for all those processes which are concurrently running on the same processor core along with their fine-grained context switches. Thus if these HPCs are monitored for a per-core based approach, the counters are not only affected by the event counts from the operations in the monitored process but also from the all other processes which are running concurrently to the monitored application. Hence the background processes also have a significant impact on the event count of the target process, which is being monitored. In the next section, we propose an efficient construction of TRNG using the statistics obtained from these HPCs.

Fig. 3.
figure 3

Effect of hardware interrupts on the HPC events (a) instructions and (b) cpu-cycles monitored over an infinite loop on different time instances

4 Randomness Extraction Using HPCs

In this section, we briefly discuss about the selection of appropriate architectural events to design the TRNG and the methodology to extract the randomness from such events. In the previous section, we have seen that the performance counter values range between a particular interval with low deviation from the mean, which makes the most significant bits of the observed values to be highly predictable. On the other hand, the noise component in these performance counter values is very high if we consider only the least significant bits. The analysis of selecting the LSBs is elaborated next.

4.1 Selection of the Least Significant Bits

In the previous section, we claimed that the performance counter events which are observed over a straightforward executable exhibit a high source of entropy as they inherently possess a considerable amount of noise which is entirely contributed by the unpredictable hardware interrupts in the system and various processes running in the background. It should be further noted that the entropy of each bit position would not be the same for the binary sequences converted directly from the monitored values. The entropy is highest with the LSB, which can be considered as the most random bit output while the MSB is highly predictable. In support of our claim, we observed 500, 000 instances of the performance counter events instructions and cpu-cycles, and calculated the entropy for each bit position. Figure 4 shows entropy values of all the bit positions for both the events. We can observe from the figures that the LSBs have the highest entropy, and as we move towards the MSBs, the entropy gets reduced. Hence, instead of considering the observed HPC value, we transform the data into binary sequences and consider the last 9Footnote 2 bits for our further analysis.

Fig. 4.
figure 4

Entropy of each LSBs for HPC event (a) instructions, and (b) cpu-cycles

4.2 Selection of HPC Events Using Yao’s Next-Bit Test

In all of our experiments, we considered the sampling interval time as 10 ms. Hence, the perf tool will generate data points for any HPC event after each 10 ms time interval. Let us consider last n LSB bits of a single instance of data as the source of randomness. At any time instant t, we considered the sequence of bits \(S(n, t) = \left( b_0^t, b_1^t, \cdots , b_{n-2}^t, b_{n-1}^t\right) \) derived directly from the value obtained by perf tool for each HPC events. Our conjecture to consider an HPC event as a source of TRNG is based on the idea that there will be no bias in predicting a bit, even if we know the previous values. In order to determine the bias in observation, we provide Yao’s next-bit test [28], which we discuss as follows.

Table 1. Next-bit test for different HPC events for \(m=4\)
Table 2. Experimental setups for validation of the proposed claim

Suppose we know first m-bits of the n possible bits for any sequence S(nt), i.e., the sequence S(mt) is already given (where \(m<n\)). Now, according to Yao’s Next-bit test, we say that the sequence S(nt) has no bias if probability of the \((m+1)^{th}\) bit being zero is \(0.5 \pm \delta \) (i.e., Pr\([b_m^t = 0] = 0.5 \pm \delta \)), given the knowledge of S(mt), when \(\delta \) is negligible (with respect to the security parameter). There are \(2^m\) possibilities for S(mt), and we perform the test for all such possibilities. In order to estimate the Pr\([b_m^t = 0]\), we consider N such sequences by observing an HPC event at N successive interval of time. Let the sequence S(mt) occurs at \(\mathcal {T}\) times out of N possibilities. We now count the occurrences of \(b_m^i = 0\) and \(b_m^i = 1\) as \(\mathcal {C}_0\) and \(\mathcal {C}_1\) respectively, where \(i=1, 2, \cdots , \mathcal {T}\). We define the estimated probabilities as and . Without loss of generality we first consider the case of \(m=4\), i.e., first 4 bits of the binary sequence is known. There are \(2^4\) possibilities of S(mt), which are shown in Known Bits column of Table 1. We observed \(N = 500,000\) values for the events instructions, cpu-cycles, cache-misses, and branches and estimated the probability as discussed previously. The estimated probability values for all the events and for all the combination of Known Bits and the corresponding values of \(\delta \) (as mentioned previously) are shown in Table 1. The first cell in Table 1 contains the value 0.499362, which signifies that for the event instructions if we known that the first 4 bits are 0000, then the estimated probability that the next bit will be 0 is 0.499362. We can also observe from the table that the corresponding value of \(\delta \) is 0.000638. It is clear from the table that all the combinations have probabilities close to 0.5 for the events instructions and cpu-cycles. The corresponding values of \(\delta \) are also negligible. In case of events cache-misses and branches the probability values are highly biased for some combinations with high value of \(\delta \) and are shown with bold faces. The average values of \(\delta \) for all the sequences corresponding to each of the performance counter events are also shown in Table 1 and justifiably these values are higher for the events cache-misses and branches. Similar results are observed for other values of m. Hence, we conclude after this analysis that the events instructions and cpu-cycles can act as better candidate for source of randomness while we discard the other two events cache-misses and branches in our further analysis. In the next section, we validate our conclusion through an extensive set of results with the help of NIST and AIS 20/31 Test suite.

5 Experimental Validation

In this section, we first provide results on TRNG output obtained from the raw HPC events followed by the results on the TRNG output in the presence of a strong adversarial perturbation.

5.1 Results on TRNG Output Obtained from HPC Events

All the experiments are conducted on two different processors as listed in Table 2, where the access to HPC events is available to users with administrative privilege. There exists a diverse set of HPC events which can be accessed via the perf utility. We considered some of the primitive events such as instructions, cpu-cycles, bus-cycles, cache-misses, branches etc., as the obtained values for each of these events are high compared to the other events. The perf statistics are recorded after every time interval of 10 ms for an executable which runs infinitely over time. The idea behind the selection of events which showed high values compared to the lower ones is because the ones reporting very high values can be expected to produce decent randomness in the Least Significant Bits. On the contrary, events which show low values as output are intuitively more predictable compared to the earlier case.

The NIST Test suite is observed to work the best for the events instructions and cpu-cycles in all the setups as mentioned in Table 2. The perf statistic is recorded for more than 15 hours of execution time, which resulted in altogether 10 sets, each set having more than \(5.5 \times 10^7\) performance counter values. For each set, we selected the last 9 bits from the LSB of each observation and appended one after another to generate a consolidated binary string. We applied the NIST Test suite on this consolidated binary sequence. We furnish our results from the NIST suite for both Intel and AMD processors in Table 3. We can observe from the table that both the events instructions and cpu-cycles pass all the 15 tests under the NIST Test suite. We perform the same experimentation on each of the 10 sets and obtain similar results for all the sets. Table 3 also shows the NIST Test results for the HPC event cache-misses on both the processors. We can observe that most of the tests under the NIST Test suite fails for this event, which aligns with the results shown in Table 1.

Table 3. NIST test results on TRNG output for different HPC events on two different processors

In order to further analyze the TRNG property of the events instructions and cpu-cycles, we applied the AIS 20/31 Test procedures on the consolidated binary output string as obtained before. The results of the tests for both Intel and AMD processors are shown in Table 4. We can observe from the table that all the tests under Procedure A and Procedure B of AIS 20/31 Test suite pass for both the events instructions and cpu-cycles. The details of the parameters mentioned under Procedure B can be found in the AIS 20/31 Test Manual. Hence, with the outcomes of these two test suites, we conclude that the HPC events which are affected by various hardware interrupt and the background noises can be effectively used to design a TRNG module.

Table 4. AIS 20/31 test results on TRNG output for different HPC events on two different processors

5.2 Perturbation in TRNG Output in Presence of an Adversary

In the previous subsections, we tested the sources of entropy through normal process execution framework in a multi-core processor setup, where we show that the inherent chaos of the various process execution and the unpredictability of hardware interrupts have an extensive impact on the HPC values. We claim with suitable results that the values obtained from HPCs qualify for a pure computer architecture based TRNG. But we are also interested in understanding the effect on the HPCs in the presence of a powerful adversary.

Let us consider a server setup, where there are multiple users logged into the same server, and all of the users are having administrative privileges. Thus all of these users can observe perf statistics over executables which run on processor cores shared across various user processes. Hence it is feasible for an adversary running on the same processor core as the TRNG module to modify these HPC values in regular time intervals. We performed several experiments where the adversary process runs on the same processor core as the target core and uses asynchronous perf ioctl system calls to set the value of the HPC event instructions to zero periodically. This manipulation by the adversary hampers the instruction counts observed over a synchronous measurement procedure to a great extent. The range of the instruction counts varied widely when a concurrent adversary module refreshed the instruction counts, which is also expected if the adversary wishes to modify the counter values instead of resetting it. Any modification to the counter values by a powerful adversary does have an impact in changing the overall values of the instruction counts but does not have any impact on the entropy of the least significant bits of the counter values. The reason behind it is that of the inherent chaos of a large number of concurrent process executions and optimization constructs of the Operating System and their effect on the underlying computer architecture modules. Hence, a powerful adversary needs to not only model the chaos exhibited by the background concurrent processes but also needs to have complete control of hardware interrupts appearing in the system, both of which is assumed to be a challenging task to execute. Without loss of generality, we tested the TRNG sequences generated by the HPC event instructions on the Intel processor in the presence of this adversary with both NIST and AIS 20/31 Test suites. The results are furnished in Table 5, which shows that all the tests under both of these test suites pass with the modified TRNG sequence. In the next section, we discuss a hybrid TRNG construction using a secure hash implementation for enhancing the throughput of the design to cope up with the latency in accessing HPC events.

Table 5. NIST and AIS 20/31 test results on TRNG output for the HPC event instructions on Intel processor after adversarial modification

6 Hybrid Construction to Enhance Throughput

In this section, we describe an efficient generation of random bit string through a secured hash implementation using Keccak algorithm [8] followed by its validation as TRNG using NIST and AIS 20/31 Test suites. The design is simple yet effective in context to generating a high-speed sequence of random numbers. In the previous section, we elaborate on how True Random Numbers were obtained from the Hardware Performance Counter values. The proposed design only considers last 9 bits from the LSB of each cumulative sample of event count for a periodic interval of 10 ms. This latency of 10 ms of the generation of 9 random bits is inappropriate when compared to real-life random number generation requirements. Thus we bridge the gap with a hybrid model which uses a shift register, the Keccak algorithm, and a control block by considering the random bits obtained from HPCs as input. If an application asks for a random number within the interval of 10 ms, the hybrid model uses its deterministic algorithm to generate a more extensive number of random bits using the previous inputs.

6.1 Cryptographic Post-processing of the TRNG Output

The hybrid TRNG construction as shown in Fig. 5, takes the output of the true random number sequence obtained from the HPC events as its input and generates a sequence of more number of bits using a shift register (SR_TRNG), a Control Block, and Keccak (or SHA-3) Algorithm. The hybrid construction works with two operational modes as follows:

Fig. 5.
figure 5

Hybrid Construction for generating internal random numbers

  1. 1.

    Initialization: The HPC based TRNG construction generates 9 random bits in every 10 ms interval. The hybrid construction waits for the first 80 ms after the start of the system. The shift register SR_TRNG, which is of length 72 bits, is filled from the LSB to MSB after each 10 ms in such a way that after first 80 ms the register SR_TRNG is filled with 72 bits of a random string. In this mode a register Count is also set to 0.

  2. 2.

    Generation: If a user needs a true random number from the system, it requests the Control Block to generate it. The Control Block then takes the 64-bit string SR_TRNG[Count:Count+63] and produces a 512-bit string using SHA-3 algorithm and provides these bits as an output to the request. The Control Block then increments the value of Count register by 1. After 10 ms the register SR_TRNG is right shifted by 9 bits, 9 random bits obtained from the new HPC value after the 10 ms are added to the shift register, and the register Count is again reset to 0.

Any user can obtain a maximum of \(9*512 = 4608\) bits of a random string within the latency of 10 ms. Hence the maximum throughput of the hybrid design is 46, 080 bits per second (or 45 Kbps). It is evident that the throughput of the hybrid design is directly proportional to the length of the shift register SR_TRNG, which can be tuned to support different kinds of applications with varying requirements of throughput.

6.2 Results on TRNG Output Obtained from Hybrid Construction

The TRNG output obtained from the HPC event values are used as input to the hybrid construction. As discussed previously, after every 10 ms the shift register SR_TRNG holding the recent history of random bits from the TRNG is right shifted by 9 bits to accommodate fresh random bits. In an interval of 10 ms, we obtain the upper bound of 4608 bits of random binary string which requires only 72 bits of extra storage. The storage will be marginally higher for higher throughput design. We also take the output from the hybrid construction and run both the NIST and AIS 20/31 Tests on the sequences. Without loss of generality, results for the event instructions on the Intel processor are furnished in Table 6, which shows that the sequences pass all the tests under both the test suites.

Table 6. NIST and AIS 20/31 test results on TRNG output for the HPC event instructions on Intel processor obtained from the hybrid construction
Table 7. NIST test results on the output of Linux /dev/urandom on both Intel and AMD Processors

7 Discussion

In this paper, we proposed a TRNG construction using the values obtained from the HPC events through the Linux based tool perf. However, all the Linux based systems have special character files /dev/urandom providing an interface to the kernel’s random number generator, which gathers environmental noise from device drivers and other sources into an entropy pool. However, several weaknesses of such random number generation with a detailed cryptographic analysis is shown in [13]. In order to stress the weakness, we collected “random” data using /dev/urandom and applied NIST Test suite on the output. The result of the tests on both Intel and ARM processors are shown in Table 7. We can easily observe that apart from the LinearComplexity test under the NIST Test suite the dataset fails to qualify for all other tests. Since the dataset did not qualify the NIST Test suite, we did not provide any results on AIS 20/31 Test to show its weakness further. The objective of this discussion is to stress on the fact that the proposed approach can be used as a TRNG source in modern Linux based systems as an alternative to apparently weaker random number generator using /dev/urandom.

8 Conclusion

In this paper, we showed that components of architecture infuse a huge level of randomness because of the Operating System optimization constructs and unpredictability of different hardware interrupts, which gets manifested through the Hardware Performance Counters. These counters digitize the randomness of the architectural constructs and various experimental results using standard NIST, and AIS 20/31 Test suites show that these counters can indeed be considered as a TRNG source. We have also shown that the proposed TRNG construction is robust and fault tolerant in the presence of a powerful adversary. The proposed TRNG module has a latency of 10 ms because of the time to access HPC events. Thus to enhance the throughput of the design, we combine the TRNG module with a simple yet effective Keccak hash implementation and a shift register to design a hybrid module which also qualifies NIST and AIS 20/31 Tests.