research-article

Open Access

Increasing the Robustness of TERO-TRNGs Against Process Variation

Authors:
Christian Skubich

Fraunhofer Institute for Integrated Circuits IIS, Germany

0000-0002-4689-8691

Peter Reichel

Fraunhofer Institute for Integrated Circuits IIS, Germany

0000-0001-7149-8238

Marc Reichenbach

BTU Cottbus-Senftenberg, Germany

0000-0002-9687-6247

ACM Transactions on Reconfigurable Technology and Systems, Volume 16, Issue 3, Article No. 48, pp. 1–29, https://doi.org/10.1145/3597418

Published: 27 July 2023

Abstract

The transition effect ring oscillator is a popular design for building entropy sources because it is compact, built from digital elements only, and very well suited for FPGAs. However, it is known to be quite sensitive to process variation. Although this sensitivity is useful for building physical unclonable functions, it interferes with the use as an entropy source.

In this article, we investigate an approach to increase reliability. We show that adding a third stage eliminates much of the susceptibility to process variation and how a resulting gigahertz oscillation can be evaluated on an FPGA. The design is supported by physical and stochastic modeling. The physical model is validated using an experiment with dynamically reconfigurable look-up tables.

1 INTRODUCTION

1.1 On the Relevance of FPGA-Based Entropy Sources

Randomness is often essential in cryptography, such as for generating keys, initialization vectors, or nonces. Popular modes like AES-GCM can fail spectacularly if, for example, the same initialization vector is used twice due to poor or malicious design of an entropy source [15]. Although there are efforts to eliminate the requirement of randomness (e.g., through synthetic initialization vectors), true randomness remains a general requirement for secure operation of cryptographic systems.

The dependence on a physical source distinguishes random number generation from other elements of cryptography. A True Random Number Generator (TRNG)

cannot be replaced by software,
cannot be proved purely mathematically,
cannot be evaluated by standardized test vectors alone, and
is subject to environmental conditions and aging.

There are some similarities to other security hardware elements such as trusted on-chip memory. However, most problems with security hardware blocks are limited to attackers who are able to execute code or gain physical access, whereas vulnerabilities of a TRNG can compromise remote connections and on-chip generated keys. In summary, TRNGs are significant, difficult to assess, and potentially exploitable remotely by attackers.

To build confidence in a TRNG, it is essential to enable its evaluation. It can be argued that this includes disclosure of the design. The barrier to independent research is further lowered if the design can be implemented without a large investment. Therefore, FPGA-based implementations are of particular interest and will be examined in more detail in the following sections.

1.2 Entropy Sources for FPGAs

On FPGAs, one is limited in terms of possible building blocks: one can use either available hard macros (e.g., PLLs [16]) or digital logic elements such as Look-Up Tables (LUTs) and Flip-Flops (FFs). Ring oscillator-based designs are a typical approach. Petura et al. [26] categorize their studied entropy sources into those that are based on

single-event ring oscillators,
multi-event ring oscillators without signal collisions, and
multi-event ring oscillators with signal collisions.

A current example of the first category is the ES-TRNG [40], in which a ring oscillator is sampled by a tapped delay chain that is clocked by a second ring oscillator. Self-timed rings [9] fall into the second category. The Transition Effect Ring Oscillator (TERO) is a typical example of the third category and is the focus of the following section.

1.3 Transition Effect Ring Oscillator

The TERO was introduced by Varchola and Drutarovsky [34]. It is basically a latch with extended feedback paths that oscillates. The length of the oscillation is the random variable of this entropy source. The implementation can be based, for example, on an XOR/AND combination or NAND gates (Figure 1(a) and (b)). The authors reported that placement and routing are important for this design. The TERO design was later extended to be used as a Physical Unclonable Function (PUF) [35]. A model based on noisy inverters was presented by Bernard et al. [4]. The authors developed an analytical expression for the probability of the number of oscillations and a procedure to determine the model parameters and the entropy rate.

Fig. 1. Three TERO topologies. (a) Original proposition. (b) NAND and buffer based. (c) Tunable implementation with additional delay elements.

Some authors proposed configuration options, for example, to compensate for process variation effects. Yang et al. [19] presented a tunable edge racing TRNG based on two ring stages. The proposed design uses multiple parallel inverters and selects one through a multiplexer (see Figure 1(c)). By varying the MUX configuration, some of the device mismatch can be compensated. A configurable TERO FPGA implementation has been proposed by Fujieda [17]. The implementation consists of NAND gates and multiplexers, with each multiplexer stage selecting between three inputs. This way, logic elements and routing paths can be adapted by configuration.

1.4 Goals and Contributions

Our goal is to explore how TERO designs can be implemented more reliably so that they are less susceptible to process variation effects. The contributions of this work are as follows:

(1)	We analyze timing issues of TERO counter implementations.
(2)	We demonstrate that a variant of the TERO design that adds an additional stage to reduce process variation effects can be implemented on FPGAs.
(3)	We show how the proposed entropy source can be evaluated in the system clock domain using asynchronous evaluation.
(4)	A model for the implementation is developed that is used for the entropy rate estimation.
(5)	We propose and perform experimental validation of the model beyond simple curve fitting.
(6)	Finally, we evaluate our proposed entropy source design and show that the design is indeed more robust to process variation effects.

2 TERO COUNTERS

2.1 TERO Counter Implementations

To use the TERO oscillator as an entropy source, you must measure the time that the oscillator is running. This is usually accomplished by using the TERO as a clock for a counter. We have found that in most publications, the actual counter implementation is not clearly stated. Where we could find more specific information, we identified four types of counter implementations (Figure 2 and Table 1).

The simplest and most direct implementation is shown in Figure 2(a): a synchronous counter that uses the TERO directly as a clock. Since this puts additional load on the oscillator, it may well affect the oscillation. This problem can be mitigated by an additional clock buffer, as shown in Figure 2(b).

Since synchronous counters are limited in performance, a common implementation is an asynchronous counter (see Figure 2(c)). When only the LSB of the counter is used as the entropy output, the length of the chain is limited to one bit (so only a single T-FF is used). This reduces the size of the entropy source but also limits the monitoring of the entropy source’s health.

Fig. 2. TERO counter implementations. (a) Synchronous. (b) Synchronous with clock buffer. (c) Asynchronous. (d) Synchronous with duty cycle restoration.

Finally, Delvaux [13] proposed the use of a T-FF for restoring the duty cycle (see Figure 2(d)). From the synchronous counter’s point of view, the FF also acts as a low-pass filter, since short pulses are absorbed. To maintain full resolution, he combines this with both a rising-edge and a falling-edge sensitive counter.

2.2 The Timing Issue

Although the TERO can generate an approximately uniform duty cycle at the beginning of the oscillation, the duty cycle changes over time until the oscillation collapses (Figure 3). In practice, the initial duty cycle may be skewed due to, for example, routing variations, a different number of inverters per branch, or process variation. We consider the frequency of the oscillation to be approximately constant. This signal has three characteristics that make it difficult to use as a clock:

the frequency is high and may already exceed the technological limits of synchronous designs at the beginning,
the duty cycle changes over time, with pulses becoming as short as technologically possible, and
at some point the inverters will probably not switch completely anymore.

Fig. 3. Pulse width changes of the TERO output.

There has been some discussion about the difficulties of using this signal as a clock. Varchola and Drutarovsky [33] implement the counter asynchronously because the TERO frequency exceeds the technological limitations of a synchronous implementation. Fujieda [17] mentions that timing constraints must be considered when implementing the counter. However, he does not address how to properly constrain the design. Delvaux [13] argues that duty cycle is an issue in a synchronous counter implementation: when oscillation exceeds technological limitations, some FFs of the counter may update their state, whereas others remain at their previous value. He also conducted experiments to confirm this assumption. Contrary to his argument, this is not a problem that affects all previous TERO implementations with multi-bit counters. Asynchronous counters are a common implementation [5, 6, 14, 33, 34] and are not subject to the same technological limitations. Although there is a frequency limit here as well, exceeding it does not result in corrupted counters. Instead, the first T-FF acts as a frequency filter that absorbs short pulses, and the counter just misses toggling.

As Delvaux’s experiments show, synchronous implementations (Figure 2(a) and (b)) should not be used. For an ASIC implementation, the circuit given in Figure 2(c) would be a good candidate. For FPGAs, however, the use of an asynchronous multi-bit counter is questionable because FPGA toolchains are tailored to synchronous designs. Constraining such designs is not part of the standard design flow. In particular, without proper timing analysis, one does not know when the counter bits are expected to be stable.

The implementation in Figure 2(d) may be better suited for FPGA designs. Only a single T-FF is required, where manual placement and routing are much more feasible than for a complete multi-bit counter. All remaining bits can be placed automatically by the placement tool. Proper operation of this implementation is only guaranteed if the clock for the synchronous counters is properly constrained. For this purpose, the developer must manually determine the maximum frequency. This can be challenging since, for example, Xilinx Vivado does not support direct timing analysis for a purely combinatorial path. Since both counters can only differ in their LSB, the two counters and the adder could be simplified to a single counter and a T-FF. Additionally, it might be advisable to add a dedicated clock buffer. In this case, this will limit the number of instances that can be implemented: for our target device, each clock region contains on average only eight suitable clock buffers. Although the counter is synchronous, other digital components run in a separate clock domain. Therefore, additional synchronization will be required.

Proper design and constraining of a TERO counter is challenging to say the least and requires detailed design analysis. From an implementation point of view, it would be much easier if the entropy source could be evaluated in the system clock domain. In the following section, we present such an architecture for our TERO variant.

3 THE THREE-STAGED TERO FOR FPGAS

3.1 From Two to Three Stages

The TERO design is very sensitive to process variation. The cause of this problem is the path of the two edges (Figure 4(a)). To illustrate the effect, assume that each gate g has a rising (falling) delay \(D_{g, r}\) (\(D_{g, f}\)) and wire delays can be ignored. Consider the first two periods of the edges traversing the oscillator. In this case, edge 1 accumulates the delay \(2 \cdot (D_{1, r} + D_{2, f}),\) whereas edge 2 accumulates the delay \(2 \cdot (D_{2, r} + D_{1, f})\). Thus, in the two-staged TERO, the mismatch between the paths of the edges is accumulated over time. So, the timing of the collapse of the oscillation depends not only on the noise but also is heavily influenced by the effects of process variation. This effect has been shown to be relevant for both ASIC and FPGA implementations. Bernard et al. [4] report that their adjustable ASIC design works well only in a specific range of path delays, which they identify as a potential problem for large-volume production. For FPGA implementations, the survey by Petura et al. [26] found that even the same configuration file can yield very different implementation results. In their feasibility and repeatability scale of 5 (best score) to 0 (worst score), the TERO was rated 1, meaning that a manual setup is required for each individual device. However, the dependence on actual physical parameters allows the TERO to be used as a PUF.

Fig. 4. Illustration of the waveform propagation in the two- and three-staged TERO. Only for 3S-TERO do the edges alternate their direction between rising and falling for each gate.

For ASIC implementations, one can adjust the design parameters to reduce the mismatch. For FPGAs, the number of available options is more limited. One approach used for both ASIC and FPGA implementations is a configurable design. A major disadvantage of a configurable design is the increased complexity of using it. One needs a way to select the correct implementation and must ensure that an attacker cannot exploit the configurable design.

It is advantageous to address the process variation sensitivity in the architecture of the entropy source. This can be done by adding a third stage (see Figure 4(b)). We refer to this architecture as the three-staged TERO or 3S-TERO. Here, the transit direction changes after each iteration. In the first two periods, each edge accumulates \(D_{1, r} + D_{1, f} + D_{2, r} + D_{2, f} + D_{3, r} + D_{3, f}\). Thus, the variation between the gates evens out every two complete iterations.
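The accumulation argument for both topologies can be checked with a few lines of Python. This is a minimal sketch: the gate delays in `rise` and `fall` are made-up illustrative values (not measurements), wire delays are ignored as in the text, and the helper `accumulated_delay` is a hypothetical name introduced here.

```python
# Illustrative gate delays in picoseconds (assumed values, not measured).
rise = {1: 105.0, 2: 98.0, 3: 101.0}
fall = {1: 93.0, 2: 110.0, 3: 97.0}

def accumulated_delay(n_stages, first_gate_rising, periods=2):
    """Sum the gate delays that one edge sees while circulating the ring.

    Every gate inverts, so the edge direction alternates from gate to gate.
    `first_gate_rising` sets the direction of the edge at gate 1.
    """
    total = 0.0
    rising = first_gate_rising
    for _ in range(periods):
        for gate in range(1, n_stages + 1):
            total += rise[gate] if rising else fall[gate]
            rising = not rising
    return total

# Two stages: each edge always sees the same direction at the same gate,
# so the mismatch between the two edges grows with every period.
two_stage = (accumulated_delay(2, True), accumulated_delay(2, False))

# Three stages: the direction at each gate flips every period, so after
# two periods every edge has collected all six delays exactly once.
three_stage = (accumulated_delay(3, True), accumulated_delay(3, False))

print(two_stage, three_stage)
```

With these example values, the two edges of the two-staged ring differ by 48 ps after two periods, whereas both edge orientations in the three-staged ring accumulate the same total.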

However, this advantage comes at a price. The TERO simply stops oscillating after the edge collapse. This makes evaluation easy by simply using the oscillator as a clock for a counter. For the 3S-TERO, this is more complicated: when two edges collide, only they disappear while the third continues to oscillate. We refer to this frequency as \(f_{low}\). We call the frequency at the beginning \(f_{high}\) with \(f_{high} = 3 \cdot f_{low}\). Thus, simply using the oscillator as a clock for a counter gives the collapse time (measured at \(f_{high}\)) mixed with a counter clocked at \(f_{low}\). This complicates entropy extraction and monitoring of the entropy source. To avoid this mixture, one must explicitly detect the edge collision and stop the counter accordingly, which requires additional hardware and implementation effort.

For ASICs, such an implementation was presented by Yang et al. [41]. To evaluate the oscillation state, they compare the frequency of the 3S-TERO with a reference oscillator (Figure 5). Both the area and the implementation complexity of the required Phase Frequency Detector (PFD) are significant, and the design is most likely not suitable for an FPGA implementation. Moreover, the design consumes additional power since it uses an oscillator as a reference. Finally, the authors report that the counter FF has mismatch issues, so the LSB is dropped and not used as an entropy output. In the following section, we propose our implementation that addresses all of these problems.

Fig. 5. ASIC implementation with reference oscillator and PFD. Adapted from Yang et al. [41].

3.2 FPGA Implementation of the 3S-TERO

We propose a design that builds on the three-staged TERO oscillator and uses a novel approach to evaluate the oscillator state. In addition to addressing the problem of process variation, we have two main goals. First, we want an architecture that is suitable for FPGA implementations. Second, we prefer to evaluate the entropy source in the system clock domain. This is motivated both by our analysis of the TERO timing problem in Section 2 and by the counter LSB mismatch issue reported by Yang et al. [41].

The structure of our proposed architecture is shown in Figure 6 and consists of an oscillator (the physical source of entropy), a counter (to measure the oscillation time), evaluation logic for the oscillator state, and a result register. These elements are described in detail in the following.

Fig. 6. Structure of the proposed entropy source.

The basic structure of the oscillator is found in Figure 4(b), right. It can be modified by adding either an even number of inverters or an arbitrary number of buffers between the NAND gates. The counter is reset and started by the start signal. It operates in the system clock domain. Therefore, no special timing constraints are required. It does not require dedicated clock buffers and can be shared by multiple active entropy source instances. The oscillator state evaluation controls the result register.

The challenge for the oscillator state evaluation is to evaluate a fast oscillator with a varying duty cycle using a system clock whose frequency is limited. For our most compact implementation, the ring oscillator operates at \(f_{high} \approx 1 \text{ GHz}\). At the same time, FPGA components such as clock buffers are frequency limited. For Xilinx Virtex UltraScale devices, for example, clock buffer limits range from 630 to 850 MHz [37]. Since any feasible sampling frequency is therefore far below the Nyquist rate of \(2 \cdot f_{high} \approx 2 \text{ GHz}\), simply sampling an oscillator output is insufficient.

However, for our application, sub-Nyquist sampling is possible by using three taps instead of just one. The basic idea is to take advantage of the fact that the signals on the three taps should be equal only at \(f_{high}\). This is illustrated in Figure 7: the combinatorial equality comparison will no longer be true after the oscillation collapse.

Fig. 7. Idealized illustration of the phase shifting and collapse of the 3S-TERO oscillator.

Initially, we experimented with simply sampling the combinatorial comparison of the three taps. This approach introduced distortions because the sampling depends on the phase relationship between the 3S-TERO oscillator and the system clock.

To solve this problem, we introduce the concept of asynchronous oscillation evaluation. The idea is to use an asynchronous FF input to continuously evaluate the oscillation state. This eliminates the dependence on the phase relationship. In our implementation, the three taps are compared by an equality gate that controls the asynchronous set input³ of an FF. The D input is set to constant zero (Figure 8). As a clock, we use only the system clock, for which we require \(f_{sys} \lt \frac{f_{high}}{2}\). While the oscillator is running at \(f_{high}\), the output of the equality gate will be high at least twice per \(T_{sys} = \frac{1}{f_{sys}}\). Thus, regardless of the phase relationship, the zero that is clocked in every cycle is always overridden by the asynchronous set input before the FF output is evaluated. In this way, the circuit can detect the last clock cycle in which the oscillator is still running at \(f_{high}\) and in which the generated pulse is long enough to trigger the asynchronous set input. Figure 9 shows a comparison between direct sampling and the asynchronous implementation.
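The behaviour of the asynchronous evaluation can be illustrated with a strongly simplified behavioural sketch. It does not model the actual FF or the tap signals: the equality gate output is assumed to be given as a list of pulses, an assumed minimum pulse width `min_width` stands in for the set input's sensitivity, and all times are integers in picoseconds. The function name `async_eval` and all numbers are illustrative assumptions.

```python
def async_eval(pulses, t_sys, n_cycles, min_width):
    """Return the FF output observed at the end of each system clock cycle.

    pulses:    list of (start, end) times during which the equality gate is high
    t_sys:     system clock period
    min_width: shortest pulse that still triggers the asynchronous set (assumed)

    In every cycle the FF clocks in a constant 0. Any sufficiently wide
    equality pulse inside the cycle overrides it with 1, independent of
    where in the cycle the pulse occurs.
    """
    out = []
    for n in range(n_cycles):
        lo, hi = n * t_sys, (n + 1) * t_sys
        set_seen = any(
            end - start >= min_width and start < hi and end > lo
            for start, end in pulses
        )
        out.append(1 if set_seen else 0)
    return out

# Assumed scenario: one equality pulse per nanosecond (f_high = 1 GHz),
# pulse width shrinking by 8 ps per round until the oscillation collapses.
pulses = [(i * 1000, i * 1000 + 300 - 8 * i) for i in range(35)]

# 100 MHz system clock (10 ns period), 100 ps minimum set pulse width.
q = async_eval(pulses, t_sys=10_000, n_cycles=5, min_width=100)
print(q)  # [1, 1, 1, 0, 0]
```

In this scenario the output stays high for the first three system clock cycles and drops once the remaining pulses are too short to trigger the set input, which is the cycle at which the counter value would be stored.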

Fig. 8. Implementation for the asynchronous evaluation of the ring oscillator state. The gray FF is not part of the actual evaluation logic.

Fig. 9. Example for direct sampling ( \(q_2\) ) and the asynchronous implementation ( \(q_1\) ).

Due to asymmetric routing and process variation, you must expect the actual design to be imbalanced. Here the equivalence comparison is critical: the skew of the taps must not be extreme to ensure a correct equivalence comparison. To ensure approximately symmetric routing, the designer must analyze routing and timing. Thus, similar to the implementations discussed in Section 2, the crossing of system clock and oscillator must be considered. Unlike the implementation proposed by Delvaux [13], the counter is fully implemented in the system clock domain and there is no additional clock domain crossing.

Figure 10 shows a comparison of the resulting distribution of the counter value for different sampling frequencies. A higher sampling frequency produces a wider distribution with the potential for a higher entropy rate. At higher sampling frequencies, the histograms show binning artifacts caused by compression (see Section 4.3). For our design, the 100 MHz of our system clock turned out to be a reasonable choice for the sampling frequency as well.

Fig. 10. Counter value distribution of a 3S-TERO for different sampling frequencies.

Fig. 11. Illustration of delay components for our model of the 3S-TERO.

4 MODELING

4.1 Physical Modeling of Pulse Shortening

The stochastic TERO model of Bernard et al. [4] is based on the physical model of Reyneri et al. [28]. For a single inverter, the pulse shortening (without considering noise components) is described as follows: (1) \(\begin{align} w_{\text{out}} = \frac{t_c}{2} + \left[w_{\text{in}} - \frac{t_c}{2}\right] [1 + H_d] = w_{\text{in}} + H_d \left(w_{\text{in}} - \frac{t_c}{2}\right). \end{align}\)

The length of the outgoing (respectively incoming) pulse of the inverter is denoted by \(w_{\text{out}}\) (\(w_{\text{in}}\)). \(H_d\) is a constant related to the physical properties of the oscillator, whereas \(\frac{t_c}{2}\) corresponds to the average pulse width.

With one additional stage, things become more complex for the three-staged TERO, since one additional pulse must be considered (Figure 11). We denote the outgoing (respectively, incoming) pulse at gate j in round k as \(\omega _{\text{out}, j, k}\) (\(\omega _{\text{in}, j, k}\)). We model the pulse shortening as follows: (2) \(\begin{align} \omega _{\text{out}, 0, k} = \omega _{\text{in}, 0, k} + H_d \left(\omega _{\text{in}, 0, k} - \frac{t_c}{2}\right) = \omega _{\text{out}, 2, k-1} + H_d \left(\omega _{\text{out}, 2, k-1} - \frac{t_c}{2}\right), \end{align}\) (3) \(\begin{align} \omega _{\text{out}, 1, k} = \omega _{\text{in}, 1, k} + H_d \left(\omega _{\text{in}, 1, k} - \frac{t_c}{2}\right) = \omega _{\text{out}, 0, k-1} + H_d \left(\omega _{\text{out}, 0, k-1} - \frac{t_c}{2}\right), \end{align}\) (4) \(\begin{align} \omega _{\text{out}, 2, k} = \omega _{\text{in}, 2, k} + H_d \left(\omega _{\text{in}, 2, k} - \frac{t_c}{2}\right) = \omega _{\text{out}, 1, k-1} + H_d \left(\omega _{\text{out}, 1, k-1} - \frac{t_c}{2}\right). \end{align}\)

For our design, the average pulse width is given by \(\frac{t_c}{2} = \frac{1}{3}(\tau _0+\tau _1+\tau _2)\). The initial pulse width is determined by the ring delay of the preceding stage: \(\begin{align*} \omega _{\text{out}, 0, 0} = \tau _2 \qquad \omega _{\text{out}, 1, 0} = \tau _0 \qquad \omega _{\text{out}, 2, 0} = \tau _1. \end{align*}\)

We denote the noise introduced at gate j in round k as \(n(j, k)\). We assume i.i.d. noise components for each edge \(n(j, k) \sim \ \mathcal {N}(0,\, \sigma),\) which for each round are added at the respective gate: (5) \(\begin{align} \omega _{\text{out}, 0, k} = \omega _{\text{out}, 2, k-1} + H_d \left(\omega _{\text{out}, 2, k-1} - \frac{t_c}{2}\right) + n(0, k), \end{align}\) (6) \(\begin{align} \omega _{\text{out}, 1, k} = \omega _{\text{out}, 0, k-1} + H_d \left(\omega _{\text{out}, 0, k-1} - \frac{t_c}{2}\right) + n(1, k), \end{align}\) (7) \(\begin{align} \omega _{\text{out}, 2, k} = \omega _{\text{out}, 1, k-1} + H_d \left(\omega _{\text{out}, 1, k-1} - \frac{t_c}{2}\right) + n(2, k). \end{align}\)

The oscillation stops as soon as one of the three pulse widths reaches zero. Thus, the probability \(P(k)\) that the oscillation collapses in round k is as follows: (8) \(\begin{align} P(k) = P\big(&\lbrace \omega _{\text{out}, 0, k} \le 0 \;\text{or}\; \omega _{\text{out}, 1, k} \le 0 \;\text{or}\; \omega _{\text{out}, 2, k} \le 0 \rbrace \nonumber \\ &\big|\; \lbrace \omega _{\text{out}, 0, l} \gt 0 \;\text{and}\; \omega _{\text{out}, 1, l} \gt 0 \;\text{and}\; \omega _{\text{out}, 2, l} \gt 0 \rbrace \;\; \forall \; l \in \lbrace 0, 1, \dots , k-1\rbrace \big). \end{align}\)

4.2 Model Simplification

For our model simplification, we define the following: \(\begin{align*} \omega _{\text{out}, j, k}^{\prime } = \frac{\omega _{\text{out}, j, k}}{\sigma } \qquad \tau _i^{\prime } = \frac{\tau _i}{\sigma } \qquad t_c^{\prime } = \frac{t_c}{\sigma } = \frac{2}{3}(\tau _0^{\prime }+\tau _1^{\prime }+\tau _2^{\prime }) \qquad n(j, k)^{\prime } = \frac{n(j, k)}{\sigma }. \end{align*}\)

With \(n(j, k)^{\prime } \sim \ \mathcal {N}(0,\, 1)\) we can now simplify the model by division of Equations (5) through (7) by \(\sigma\): (9) \(\begin{align} \omega _{\text{out}, 0, k}^{\prime } = \omega _{\text{out}, 2, k-1}^{\prime } + H_d \left(\omega _{\text{out}, 2, k-1}^{\prime } - \frac{t_c^{\prime }}{2}\right) + n(0, k)^{\prime }, \end{align}\) (10) \(\begin{align} \omega _{\text{out}, 1, k}^{\prime } = \omega _{\text{out}, 0, k-1}^{\prime } + H_d \left(\omega _{\text{out}, 0, k-1}^{\prime } - \frac{t_c^{\prime }}{2}\right) + n(1, k)^{\prime }, \end{align}\) (11) \(\begin{align} \omega _{\text{out}, 2, k}^{\prime } = \omega _{\text{out}, 1, k-1}^{\prime } + H_d \left(\omega _{\text{out}, 1, k-1}^{\prime } - \frac{t_c^{\prime }}{2}\right) + n(2, k)^{\prime }. \end{align}\)

This eliminates one model parameter, since we can work with a standard normal distribution. The model modification is also intuitive: the collapse probability does not depend on the absolute value of the delay parameters, but on the delays relative to the noise level.
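The effect of the normalization can be demonstrated with a small scalar simulation of the recurrences. The sketch below iterates Equations (5) through (7) for one run and compares the collapse round of an absolute parameter set with that of its normalized counterpart, using identical noise draws. All numeric values and the helper name `collapse_round` are assumptions chosen for illustration. The noise level is chosen as a power of two so that the scaling is exact in floating-point arithmetic; for other values, rounding could in rare cases shift the collapse by a round.

```python
import numpy as np

def collapse_round(tau, h_d, noise, sigma=1.0):
    """Iterate the pulse width recurrences until one width is <= 0.

    tau:   ring delays (tau_0, tau_1, tau_2)
    noise: array of shape (max_rounds, 3) with standard normal draws
    Returns the collapse round k, or None if no collapse occurred
    within the available number of rounds.
    """
    half_tc = (tau[0] + tau[1] + tau[2]) / 3.0
    # Initial widths for gates 0, 1, 2: tau_2, tau_0, tau_1.
    w = [tau[2], tau[0], tau[1]]
    for k in range(1, len(noise) + 1):
        prev = w
        # Gate j is fed by the output of gate j-1 (gate 0 by gate 2).
        w = [
            prev[(j - 1) % 3]
            + h_d * (prev[(j - 1) % 3] - half_tc)
            + sigma * noise[k - 1][j]
            for j in range(3)
        ]
        if min(w) <= 0.0:
            return k
    return None

rng = np.random.default_rng(1)
z = rng.standard_normal((5000, 3))

tau = (400.0, 404.0, 396.0)  # assumed delays in ps
sigma = 2.0                  # assumed noise level in ps
h_d = 0.02                   # assumed shortening constant

k_abs = collapse_round(tau, h_d, z, sigma=sigma)
k_norm = collapse_round(tuple(t / sigma for t in tau), h_d, z, sigma=1.0)
print(k_abs, k_norm)
```

Both calls report the same round, since only the ratio of the delays to the noise level enters the collapse condition.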

4.3 Modeling of the Asynchronous Evaluation

A complete oscillation period lasts \(2 (\tau _0+\tau _1+\tau _2)\) and contains three rising and three falling edges. From the modeling perspective, this corresponds to six rounds. If an ideal double-edge sensitive counter were clocked by this signal, it would count at an average frequency of \(f_{\text{high}} = \frac{6}{2 (\tau _0+\tau _1+\tau _2)} =\frac{3}{\tau _0+\tau _1+\tau _2}\). Once the oscillation collapses, the period is still the same but contains only one rising and one falling edge.

In our implementation, the actual counter is clocked by a system clock at \(f_{\text{sys}}\) with (12) \(\begin{align} f_{\text{sys}} \lt \frac{f_{\text{high}}}{2}. \end{align}\)

The same is true for the evaluation logic, so \(f_{\text{sys}}\) determines the resolution of the time measurement. When the counter is stopped at round \(k,\) the time \(t_0 =\frac{1}{f_{\text{high}}} \cdot k\) has elapsed. At this time, the counter has the value \(\lfloor f_{\text{sys}} \cdot t_0 \rfloor = \lfloor \frac{f_{\text{sys}}}{f_{\text{high}}} \cdot k\rfloor\). The probability \(P(c)\) that the counter stops at c can now be determined simply by summing all \(P(k)\) where \(c =\lfloor \frac{f_{\text{sys}}}{f_{\text{high}}} \cdot k \rfloor\). This means that the resulting distribution for \(P(c)\) is a compressed version of the distribution of \(P(k)\).
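The compression step can be sketched in a few lines. The example assumes that \(P(k)\) is available as an array indexed by the round k and that both frequencies are given as integers in the same unit, so that the floor operation can be done with integer arithmetic. The function name `compress` and the toy distribution are illustrative assumptions.

```python
import numpy as np

def compress(p_k, f_sys, f_high):
    """Map round probabilities P(k) to counter value probabilities P(c).

    p_k:    array with p_k[k] = P(k)
    f_sys:  system clock frequency (integer, e.g. in MHz)
    f_high: oscillator frequency before the collapse (integer, same unit)
    """
    k = np.arange(len(p_k))
    c = (k * f_sys) // f_high  # c = floor(f_sys / f_high * k)
    # Sum all P(k) that fall onto the same counter value c.
    return np.bincount(c, weights=p_k)

# Toy example: uniform P(k) over 40 rounds, f_sys = 100 MHz, f_high = 1000 MHz.
p_k = np.full(40, 1.0 / 40.0)
p_c = compress(p_k, f_sys=100, f_high=1000)
print(p_c)
```

Here ten consecutive rounds map onto each counter value, so the 40 round probabilities collapse into four bins of equal weight.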

4.4 Model Simulation and Parameter Identification

To speed up the computation, the simulation is vectorized. This covers both the round-by-round update of the pulse widths and the detection of finished simulation runs.
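A vectorized simulation of this kind could look as follows with NumPy. This is a sketch under assumptions, not the implementation used for the results: it simulates the normalized recurrences of Equations (9) through (11) for many runs at once, and the function name, the parameter values, and the convention of marking unfinished runs with -1 are choices made here.

```python
import numpy as np

def simulate_collapse_rounds(tau_n, h_d, n_runs, max_rounds, seed=0):
    """Vectorized Monte Carlo simulation of the normalized model.

    tau_n: normalized ring delays (tau_0', tau_1', tau_2')
    Returns an int array with the collapse round of every run
    (-1 if a run did not collapse within max_rounds).
    """
    rng = np.random.default_rng(seed)
    half_tc = sum(tau_n) / 3.0
    # Column j holds the pulse width at gate j; initial widths are
    # (tau_2', tau_0', tau_1') for gates 0, 1, 2.
    w = np.tile([tau_n[2], tau_n[0], tau_n[1]], (n_runs, 1)).astype(float)
    result = np.full(n_runs, -1, dtype=int)
    active = np.ones(n_runs, dtype=bool)
    for k in range(1, max_rounds + 1):
        # Gate j is fed by gate j-1 (gate 0 by gate 2): rotate the columns.
        w_in = np.roll(w, 1, axis=1)
        w = w_in + h_d * (w_in - half_tc) + rng.standard_normal(w.shape)
        # A run finishes in the first round in which any width is <= 0.
        done = active & (w.min(axis=1) <= 0.0)
        result[done] = k
        active &= ~done
        if not active.any():
            break
    return result

rounds = simulate_collapse_rounds((200.0, 202.0, 198.0), h_d=0.02,
                                  n_runs=1000, max_rounds=2000)
hist = np.bincount(rounds[rounds >= 0])
```

For simplicity, finished runs are still updated but masked out by `active`; removing them from the arrays would save work at the cost of more bookkeeping.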

The application of the model of the oscillator evaluation can be performed on the generated histogram data. Therefore, this step is not performance critical. Figure 12 shows examples of the pulse width simulation and the corresponding histogram of the counter values.

Fig. 12. Example for pulse width simulation and histogram.

For validation of our model, we want to compare model and measurements. Therefore, we want to identify the best fitting model parameters for our entropy source instances. Typically, this is done by some sort of curve fitting of the Probability Density Function (PDF) to the measured data. In general, there is no guarantee that a useful form of this function exists. Without knowing the PDF for our model, we must take a different route to determine the model parameters.

As an approximation to the PDF, we use the results of a Monte Carlo simulation of our model. One problem with this approach is the noise in the simulation results. This is problematic because many standard curve fitting algorithms are not designed to deal with both noisy measurements and noise in the function being fitted. We opted for a brute force approach where we select a reasonable parameter range and precompute our simulation-based PDF approximation for this parameter range. Then, to determine the actual model parameters for particular measurement data, we simply select the best result by scanning the entire parameter range. Analogous to the method of least squares, we use the sum of squares of the residuals as a metric.
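The brute force selection can be sketched as a scan over a table of precomputed histograms. The data layout (a dictionary from parameter tuples to simulated histograms), the function name `best_fit`, and the toy numbers are assumptions for illustration only.

```python
import numpy as np

def best_fit(measured_hist, candidates):
    """Select the precomputed histogram with the smallest sum of squared residuals.

    measured_hist: 1-D array of relative frequencies per counter value
    candidates:    dict mapping a parameter tuple to a simulated histogram
                   of the same length
    Returns the best parameter tuple and its residual sum of squares.
    """
    best_params, best_rss = None, np.inf
    for params, sim_hist in candidates.items():
        rss = float(np.sum((np.asarray(measured_hist) - np.asarray(sim_hist)) ** 2))
        if rss < best_rss:
            best_params, best_rss = params, rss
    return best_params, best_rss

# Toy example with three hypothetical parameter sets.
candidates = {
    (0.01,): [0.1, 0.6, 0.3],
    (0.02,): [0.2, 0.5, 0.3],
    (0.03,): [0.4, 0.4, 0.2],
}
params, rss = best_fit([0.22, 0.48, 0.30], candidates)
print(params, rss)
```

Because every candidate is evaluated, noise in the simulated histograms cannot mislead a gradient-based search; the cost is a full scan of the precomputed range for every measurement.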

4.5 Stochastic Model

In TRNG models, it is a common assumption that oscillator jitter is temporally uncorrelated [4, 21, 24]. Due to effects such as flicker noise, this assumption may be too optimistic. An analysis of noise effects on FPGAs was presented by Haddad et al. [18]. To investigate our platform, we performed the experiment they proposed (see Appendix A.1). According to this methodology, we should not exceed a jitter accumulation time of 2.4 µs for thermal noise to contribute 95% to the overall oscillator noise. Since we are well below this limit, we assume for the following model that the jitter realizations are independent. Statistical tests of the generated entropy data can provide additional confidence regarding possible correlations.

For an entropy output of one bit per sample, we select the LSB (without further post-processing). Let \(p_{i}\) be the probability that the LSB of a counter value is equal to i (\(i \in \lbrace 0,1\rbrace\)). Given that \(p_{i}\) is known, we can calculate the min-entropy as (13) \(\begin{align} H_{\infty } = -\log _{2} \left(\max \limits _{i \in \lbrace 0,1\rbrace } \, p_{i}\right). \end{align}\)

To determine \(p_{i}\), we use our identified model parameters and the results of our Monte Carlo simulation. For the calculation of \(H_{\infty }\), we need to account for possible inaccuracies of our \(p_{i}\) estimation. More details are presented in Section 6.4.
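As a small worked example of Equation (13), the following sketch derives the LSB probabilities from a counter value distribution and computes the min-entropy. The distribution is a toy example, the function name is hypothetical, and the sketch deliberately ignores the estimation inaccuracies mentioned above.

```python
import math

def lsb_min_entropy(p_c):
    """Min-entropy per output bit when only the counter LSB is used.

    p_c: sequence with p_c[c] = probability of counter value c
    """
    p0 = sum(p for c, p in enumerate(p_c) if c % 2 == 0)  # LSB = 0
    p1 = sum(p for c, p in enumerate(p_c) if c % 2 == 1)  # LSB = 1
    return -math.log2(max(p0, p1))

# Toy distribution: p0 = 0.55 and p1 = 0.45.
h = lsb_min_entropy([0.10, 0.20, 0.30, 0.25, 0.15])
print(round(h, 4))
```

A perfectly balanced LSB would yield one bit of min-entropy per sample; the imbalance of 0.55 versus 0.45 in this example reduces it to roughly 0.86 bits.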

5 MODEL VALIDATION

5.1 Motivation and Idea

Simple parameter fitting of experimental data to a modeling function can be misleading. Fitting data to a function becomes easier as the number of parameters increases. For example, a better fit of a more complex model may just be an artifact of its complexity. In this section, we propose an experiment that provides more evidence than a single curve fit. The idea is to use configurable ring elements so that you can construct permutations with a known relationship between their parameters. Since the different configurations share a common set of hardware elements (in our case, certain LUTs and routing resources), the curve fits can be cross-validated by a comparison of model parameters. It is important to note that this approach is only feasible if the curve fit allows elemental model parameters (e.g., ring element delays) to be inferred.

5.2 Dynamically Reconfigurable Look-Up Tables

Since our experiment is built with dynamically Reconfigurable Look-Up Tables (RLUTs), we briefly introduce this technology here. RLUTs are special LUTs that can be reconfigured at runtime using a simple serial interface. This allows the functionality of specific parts of a design to be adapted with deterministic timing. For Xilinx UltraScale devices, the primitive is named CFGLUT5 (Figure 13) and can only be implemented in slices of the SLICEM type. It takes 32 cycles to update the complete configuration. It is possible to chain the reconfiguration interface of multiple CFGLUT5 instances [38]. Unlike partial reconfiguration, RLUT reconfiguration cannot change the routing of the design; only the logical function of the LUT can be adapted. Applications in the context of hardware security include hardware Trojans and hardening against side channel attacks [29].

Fig. 13. CFGLUT5 primitive for Xilinx UltraScale devices.

5.3 Implementation

RLUTs do not allow you to change the routing, but it is possible to implement multiple routing options and select one by changing the RLUT function accordingly. The setup for our design is shown in Figure 14: the output of each LUT is connected to two inputs of its successor. Assuming that a LUT should operate as NAND, you then configure the RLUT function to either \(y=\lnot (I0 \wedge I4)\) or \(y=\lnot (I1 \wedge I4)\) to choose between routing paths. Depending on the FPGA architecture, the choice of LUT input pin may also matter, since the timing may differ between pins. For Xilinx FPGAs, the pin mapping between logical and physical LUT inputs can be adapted by synthesis. To control which physical LUT input pins are used, the pin mapping can be fixed [39].
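To illustrate how the two NAND variants differ only in their truth table, the following Python sketch derives a 32-bit LUT initialization word from a Boolean function of five inputs. It assumes the common convention that the truth-table bit at address \(\lbrace I4, I3, I2, I1, I0\rbrace\) holds the output for that input combination; the helper names are hypothetical, and the order in which the bits are shifted into the serial interface is not covered here.

```python
def lut5_init(func):
    """Build a 32-bit truth table for a 5-input Boolean function.

    Assumption: bit 'addr' of the result is the LUT output for the input
    combination addr = I4*16 + I3*8 + I2*4 + I1*2 + I0.
    """
    init = 0
    for addr in range(32):
        i0 = (addr >> 0) & 1
        i1 = (addr >> 1) & 1
        i2 = (addr >> 2) & 1
        i3 = (addr >> 3) & 1
        i4 = (addr >> 4) & 1
        if func(i0, i1, i2, i3, i4):
            init |= 1 << addr
    return init


def nand_i0_i4(i0, i1, i2, i3, i4):
    """Routing option A: y = not (I0 and I4)."""
    return not (i0 and i4)


def nand_i1_i4(i0, i1, i2, i3, i4):
    """Routing option B: y = not (I1 and I4)."""
    return not (i1 and i4)


if __name__ == "__main__":
    print(f"{lut5_init(nand_i0_i4):08X}")  # 5555FFFF
    print(f"{lut5_init(nand_i1_i4):08X}")  # 3333FFFF
```

Both words implement a two-input NAND; they differ only in which of the two redundant routing paths (via I0 or via I1) influences the output.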

Fig. 14. Setup and configuration options for the experimental validation of the 3S-TERO model.

5.4 Implementation Without RLUTs

The experiment can also be adapted for standard LUTs. Instead of changing the LUT function to select between multiple routing paths, you can use additional LUT inputs that are connected to configuration bits for the selection.

If runtime configuration is not required, there is also the option to modify the bitfile. For the Xilinx toolchain, the LUT configuration can be adapted by a TCL command. In this way, it is possible to export multiple bitfile versions where only the respective LUT configurations differ.

6 EVALUATION

6.1 Evaluation Setup

The evaluation setup for our design is shown in Figure 15. To access the FPGA hardware from Python, we use a rapid system prototyping framework [27]. This allows us to automate the complete measurement setup including PLL and RLUT configuration in a Python script. Settings like sample size or entropy source selection are stored in a register file. To reduce the amount of measurement data, the histograms of the counter value distributions can optionally be generated in hardware. Our target device is a Xilinx Virtex UltraScale XCVU440.

Fig. 15. Architecture of our evaluation setup.

For the TERO, we evaluate two implementations: one with two LUTs per stage and one with three. Both use an asynchronous counter implementation. For the 3S-TERO, we implemented the shortest possible version with only three LUTs configured as NAND gates. All of these designs are implemented with the CFGLUT5 primitive and are instantiated 32 times at different locations within the FPGA’s fabric.

Among other things, we want to study the effect of process variation. Therefore, we fix relative placement, local routing, and pin mapping for each of the three implementations using a mixture of VHDL attributes and XDC constraints. This approach assumes that positions on the FPGA where identical local routing is possible share an identical or at least very similar physical implementation. In practice, however, there may be differences, such as a mirrored layout or a nearby clock network.

6.2 Stability Regarding Process Variation

As shown in Section 3.1, the motivation for the 3S-TERO design is the assumption that it is less susceptible to the effects of process variation. The practical impact is evident when you compare the histograms of TERO and 3S-TERO implementations (see Appendix A).

To measure the actual differences, we evaluate three metrics. The results are shown in Figure 16. Figure 16(a) shows the mean of the counter values (measured over 500,000 runs) of each instance. For the 3S-TERO, the mean values are quite close, whereas the mean for the TERO varies considerably. For the 3S-TERO, the scaling of the distribution depends on the sampling frequency. Therefore, Figure 16(b) is a fairer comparison: it shows the relative deviation of the mean for one instance from the average mean of the respective implementation. Thus, for each instance \(k,\) we calculate \(\Delta \mu _{k, rel} = |\mu _{k}/ \frac{\sum \nolimits _{i=0}^{31} \mu _{i}}{32} - 1|\). Although this metric remains in the range of 20% for the 3S-TERO, the TERO sometimes exceeds 100%.

Fig. 16. Comparison of the 96 instances.

To assess the dispersion, we calculated the standard deviation across the instances for Figure 16(c). To normalize for the mean, we calculate the relative standard deviation \(c_v = \frac{\sigma }{\mu }\). Again, we find that the 3S-TERO is more stable across the instances.
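The two normalized metrics used for Figure 16(b) and Figure 16(c) can be sketched in a few lines of Python. The function names are illustrative, and the use of the population standard deviation for \(\sigma\) is an assumption, since the text does not state which estimator was used.

```python
import statistics


def relative_mean_deviation(instance_means):
    """Return |mu_k / mean(mu) - 1| for every instance k."""
    means = list(instance_means)
    average = sum(means) / len(means)
    return [abs(mu / average - 1.0) for mu in means]


def coefficient_of_variation(counter_values):
    """Relative standard deviation c_v = sigma / mu of one instance.

    Assumption: sigma is the population standard deviation.
    """
    values = list(counter_values)
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return sigma / mu


if __name__ == "__main__":
    # Made-up per-instance means, only to show the call pattern.
    print(relative_mean_deviation([100.0, 110.0, 90.0]))
    print(coefficient_of_variation([2, 4, 4, 4, 5, 5, 7, 9]))
```

Both metrics are dimensionless, which is what makes the comparison between implementations with differently scaled counter distributions meaningful.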

6.3 Model Validation

6.3.1 Fitting of Related Instances.

For the model validation experiment, it is essential to find a consistent set of parameters so that each change in configuration can be attributed to the change in a corresponding \(\tau ^{\prime }\). We determine the parameters as follows. First, we perform individual curve fits for two curves with distinct configurations. We store the best n parameter sets for each curve. In the second step, we generate all possible parameter sets, resulting in \(n^2\) combinations. Any combination where the parameter \(H_d\) does not match is discarded. For each remaining combination (\(H_d, \tau _{0_a}^{\prime }, \tau _{1_a}^{\prime }, \tau _{2_a}^{\prime }, \tau _{0_b}^{\prime }, \tau _{1_b}^{\prime }, \tau _{2_b}^{\prime }\)), we then generate the possible permutations (e.g., \(H_d, \tau _{1_a}^{\prime }, \tau _{0_a}^{\prime }, \tau _{2_a}^{\prime }, \tau _{0_b}^{\prime }, \tau _{1_b}^{\prime }, \tau _{2_b}^{\prime }\)). These parameter sets are then applied to the simulation model, and we calculate the sum of the curve fit metric. Finally, we select the parameter set with the lowest curve fit error as our result.
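The selection procedure can be sketched as follows. This is only an outline under stated assumptions: `best_a` and `best_b` stand for the n best individual fits of the two curves, `total_error` is a hypothetical callable that applies a candidate parameter set to the simulation model and returns the summed curve fit metric, and the sketch permutes the \(\tau ^{\prime }\) values of both configurations, which may be broader than what was actually evaluated.

```python
from itertools import permutations, product


def select_consistent_fit(best_a, best_b, total_error):
    """Pick the consistent parameter set with the lowest summed fit error.

    best_a, best_b: lists of (H_d, (tau0, tau1, tau2)) tuples.
    total_error:    callable (H_d, taus_a, taus_b) -> float (hypothetical).
    Returns (error, H_d, taus_a, taus_b) or None if no pair shares H_d.
    """
    best = None
    # Step 2: all n^2 combinations of the individual fits.
    for (hd_a, taus_a), (hd_b, taus_b) in product(best_a, best_b):
        # Discard combinations whose H_d does not match. Exact comparison
        # assumes both values come from the same discrete parameter grid.
        if hd_a != hd_b:
            continue
        # Try the possible assignments of delays to ring elements.
        for perm_a in permutations(taus_a):
            for perm_b in permutations(taus_b):
                err = total_error(hd_a, perm_a, perm_b)
                if best is None or err < best[0]:
                    best = (err, hd_a, perm_a, perm_b)
    return best
```

With n stored parameter sets per curve, at most \(n^2 \cdot 36\) candidates are evaluated in this sketch, so the cost is dominated by the evaluation of `total_error`.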

Two examples of the resulting curve fits can be found in Figure 17. In general, our curve fitting procedure finds consistent parameter sets that fit the measurement data well. One possible reason for the small remaining difference between curve fit and measurement data is our limited precomputation precision (e.g., the \(\tau ^{\prime }\) values were computed only in steps of 20).

Fig. 17. Two examples for the consistent curve fit of the eight possible configurations. The parameter set \((H_d,\tau _{0,a}^{\prime },\tau _{1,a}^{\prime },\tau _{2,a}^{\prime }, \tau _{0,b}^{\prime },\tau _{1,b}^{\prime },\tau _{2,b}^{\prime })\) for the two plots was determined as \((0.00225, 380, 520, 520, 240, 780, 560)\) and \((0.00300, 360, 380, 480, 240, 540, 500)\) . More results can be found later in Figure 26.

6.3.2 Computational Effort.

For our system, we calculated 1,400 rounds per simulation and performed 300,000 simulations per parameter set. The precomputation took about 10,000 CPU hours. In the second step, the compression due to the asynchronous evaluation logic is applied. The final result can be compressed to less than 15 MByte. Based on this dataset, scanning the entire parameter space for the best fit takes only a few minutes on a standard computer.

6.4 Entropy

To calculate \(H_\mathbf {\infty }\) for our design (see Section 4.5), we select the configuration with the widest distribution and identify the parameters as previously described. The calculation of \(H_\mathbf {\infty }\) relies on \(p_i\) for which two factors may lead to an inaccurate estimate. First, the Monte Carlo simulation approach causes statistical fluctuation in the values we determine for \(p_i\). Second, there may be uncertainty in the model parameters caused by the model fitting procedure, e.g., because the accuracy of the model parameters is limited by the step size of the parameters. A similar effect might be caused by different operating conditions.

To estimate the error caused by Monte Carlo simulation, we select one parameter set and calculate the probability of the counter LSBs based on multiple simulation runs. The result is shown in Figure 18(a). The deviation from the mean of most values is less than 0.5%. To account for uncertainties in the model parameters, we calculate \(H_\mathbf {\infty }\) for the variation of \(\tau ^{\prime }\) in a range of \(\pm 40\). For a conservative entropy estimate, we select the configuration with the lowest \(H_\mathbf {\infty }\). Additionally, we consider the possible 0.5% error of the Monte Carlo simulation. Under these conditions, we determined \(H_\mathbf {\infty }\) to be above 0.91 for all instances (see Figure 18(b)). If you use one bit per sample, an operating frequency of 100 MHz, and 60 cycles per sample, the throughput is about \(1.6 \cdot 10^6\) bit/s. Although a guaranteed min-entropy of 0.91 bit would be sufficient for the current version of AIS-31 [20], the draft for a revision [31] indicates that future standards might require a higher level of entropy per bit. In this case, further algorithmic post-processing is required. To validate the theoretical result, we performed test procedure B (T6–T8) as described in AIS-31. The tests pass and provide a Shannon entropy estimate of at least 0.9994 per bit for all instances.
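The throughput figure and the meaning of the 0.91 bit bound can be checked with a short calculation. The following Python sketch uses only the numbers given above; the function names are illustrative.

```python
def throughput_bits_per_second(clock_hz, cycles_per_sample, bits_per_sample=1):
    """Raw output rate of a source that emits bits_per_sample every
    cycles_per_sample clock cycles."""
    return clock_hz / cycles_per_sample * bits_per_sample


def max_lsb_probability(h_min):
    """Largest probability of the more likely LSB value that still gives
    a min-entropy of h_min bit (inverse of H_inf = -log2(p_max))."""
    return 2.0 ** (-h_min)


if __name__ == "__main__":
    print(throughput_bits_per_second(100e6, 60))  # roughly 1.67e6 bit/s
    print(max_lsb_probability(0.91))              # roughly 0.532
```

The first value is close to the throughput quoted above. The second shows that a min-entropy of 0.91 bit corresponds to the more likely LSB value occurring with a probability of about 0.53.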

Fig. 18. Evaluation of simulation-based probability estimation and \(H_\mathbf {\infty }\) .

6.5 NIST SP 800-90B IID and Non-IID Tests

For additional validation and comparison between the TERO and the 3S-TERO, we performed the IID and non-IID entropy estimations from NIST SP 800-90B for 32 instances each. For the TERO, we chose the same entropy output as for the 3S-TERO (LSB without further post-processing). For the 3S-TERO (see Table A.1), all 32 instances passed the IID tests. For the TERO (see Table A.2), four instances failed the IID tests. All four failed the IID permutation test, and three of them also failed the chi-square independence test.

Table 1. Examples of TERO Counter Implementations

Publication	Application	Technology	TERO Length	Counter Type
[8]	PUF	Cyclone III, ASIC (350 nm)	Various	(a)¹
[17]	TRNG	Artix-7 XC7A35T	8	(b)²
[34]	TRNG	Spartan 3E	4	(c)
[13]	TRNG, PUF	Zynq-7000	8–64	(d)

Length refers to the total number of elements (e.g., buffer, inverter, or NAND gates) of a TERO instance.

Table 2.

Type	Platform	Area	Throughput (Mbit/s)	Stochastic Model	Multiple Instances
Meta-stability [12]	Altera Cyclone V	4 LUTs, 3 FFs	0.76	✘	✘
	Microsemi SmartFusion 2
Multi-stage feedback ring oscillator [11]	Xilinx Spartan 6	24 LUTs, 2 FFs	150	✘	✘
	Xilinx Virtex 6		290
RO with configurable delay [23]	Xilinx Spartan 3	528 LUTs, 177 FFs	6	✘	✘
Digital nonlinear oscillator [1]	Xilinx Artix 7	15 LUTs	100	✘	6 FPGAs, 16 locations each
STR/Jitter-latch structure [36]	Xilinx Spartan 6	56 LUTs, 19 FFs	100	✘	\(\checkmark\)
	Xilinx Virtex 6
ERO [26]	Xilinx Spartan 6	46 LUTs, 19 FFs	0.0042	[2]	✘
	Altera Cyclone V	34 LUTs, 20 FFs	0.0027
	Microsemi SmartFusion 2	45 LUTs, 19 FFs	0.014
COSO [26]	Xilinx Spartan 6	18 LUTs, 3 FFs	0.54	\(\checkmark\)	✘
	Altera Cyclone V	13 LUTs, 3 FFs	1.44
	Microsemi SmartFusion 2	23 LUTs, 3 FFs	0.328
Configurable COSO [25]	Xilinx Spartan 6	108 LUTs, 39 FFs	3.3	\(\checkmark\)	\(\checkmark\)
	Microsemi SmartFusion 2	111 LUTs, 38 FFs	1.47
	Xilinx Spartan 7	82 LUTs, 62 FFs/58 LUTs, 62 FFs	4.65/2.34
MURO [26]	Xilinx Spartan 6	521 LUTs, 131 FFs	2.57	[32]	✘
	Altera Cyclone V	525 LUTs, 130 FFs	2.2
	Microsemi SmartFusion 2	545 LUTs, 130 FFs	3.62
PLL [26]	Xilinx Spartan 6	34 LUTs, 14 FFs	0.44	[3]	✘
	Altera Cyclone V	24 LUTs, 14 FFs	0.6
	Microsemi SmartFusion 2	30 LUTs, 15 FFs	0.37
TERO [26]	Xilinx Spartan 6	39 LUTs, 12 FFs⁴	0.625	[4]	✘
	Altera Cyclone V	46 LUTs, 12 FFs	1
	Microsemi SmartFusion 2	46 LUTs, 12 FFs	1
STR [26]	Xilinx Spartan 6	346 LUTs, 256 FFs	154	[10]	✘
	Altera Cyclone V	352 LUTs, 256 FFs	245
	Microsemi SmartFusion 2	350 LUTs, 256 FFs	188
High-precision edge sampling [40]	Xilinx Spartan 6	10 LUTs, 5 FFs + counter	1.15	\(\checkmark\)	✘
	Altera Cyclone V	10 LUTs, 6 FFs + counter	1.067
3S-TERO (this work)	Xilinx Virtex UltraScale	5 LUTs, 3 FFs + counter	1.6	\(\checkmark\)	32 instances

Table 2. Comparison with Other TRNGs for FPGAs

In the case of non-IID entropy outputs, NIST provides a set of entropy tests that aim to be diverse and conservative. In general, these tests tend to underestimate entropy [30]. We use a sample size of about 256 million samples per instance. For the 3S-TERO, the tests give fairly stable results across all instances (Figure 19(b)). According to these tests, the target of 0.91 bit of entropy per sample is achieved. The results for the TERO (see Figure 19(a)) correlate with the results of the IID test: the instances that failed the independence test are significantly weaker. Overall, both IID and Non-IID tests confirm that the proposed design provides more consistent results. The direct comparison indicates that the design goal was indeed achieved.

Fig. 19. Results of the NIST SP 800-90B Non-IID tests.

6.6 Temperature

To evaluate the temperature effect on random number generation, we performed experiments in a climate cabinet (Figure 20; only instance 32 was tested). We evaluated the data using the NIST Non-IID tests, as these provide a diverse set of entropy estimates for comparison. Overall, we did not detect any particular effect for the temperature range evaluated.

Fig. 20. Evaluation of the NIST Non-IID test for temperature variation.

6.7 Restart Tests

A design may produce a similar output pattern after each restart. To evaluate the restart behavior, you can record the initial output sequence for several restarts of the design [7, 11, 23, 36]. For the test, we assert the reset for a couple of seconds, release it, and acquire the initial random data generated; this is repeated for several restarts (Figure 21). The results do not indicate a particular repeating pattern.

Fig. 21. Initial sequence for six restarts.

6.8 Comparison

Table 2 provides a comparison between our 3S-TERO implementation and other FPGA TRNGs. Our implementation has one of the lowest resource consumptions. In terms of throughput, there are both slower and faster designs. When comparing throughput, one needs to take into account that our platform is manufactured in a more modern technology node, so the numbers are not directly comparable. The fifth column lists the availability of a stochastic model, which modern standards require for entropy estimation. For many designs, this is lacking, whereas for other designs, a detailed theoretical analysis is provided. The last column lists whether validation across multiple instances is reported.

With respect to active and passive attacks, we did not perform specific experiments. The results of Cao et al. [6] indicate that TEROs are resistant to low temperatures but vulnerable to underpower attacks. The work of Mureddu et al. [22] shows that TEROs are susceptible to the locking phenomenon. We expect similar results for our proposed implementation.

In a direct comparison with the original 3S-TERO ASIC implementation of Yang et al. [41], there are some differences worth mentioning. Our proposed evaluation logic is much simpler and smaller than a PFD with a reference oscillator. Additionally, the synchronous counter implementation fits better into a standard FPGA development flow and it will not face the LSB mismatch issues. These advantages come at a cost of resolution, since the counter distribution is less wide.

7 CONCLUSION

In this work, we introduced the 3S-TERO, a novel entropy source for FPGAs. Although its resource usage is comparable to the TERO, our results showed that it is much more robust against the effects of process variation. The counter implementation avoids using the oscillator as a clock at the cost of resolution. We proposed a model for our design that fits the measurement data. Furthermore, we constructed related instances that share certain model parameters and show that model and experimental results are consistent. Both theoretical and experimental results showed that a sufficient level of entropy can be achieved across all instances.

A APPENDIX

A.1 Experimental Noise Analysis

Fig. A.1. Measurement data of the jitter experiment. The ratio of the coefficients of the quadratic fit was determined as \(\frac{\alpha }{\beta } \approx 18 \cdot 10^3\) .

To investigate the noise parameters for our specific platform, we performed the analysis of noise effects that has been presented by Haddad et al. [18]. The proposed experiment compares the jitter between two oscillators, where one stops the other after N oscillations and the number of oscillations for the second oscillator is recorded for different values of N.

For their experiment, Haddad et al. [18] derive that the overall jitter variance \(\sigma _N^2\) is composed of a thermal noise contribution (\(\sigma _{N, th}^2 \propto N\)) and a flicker noise contribution (\(\sigma _{N, fl}^2 \propto N^2\)): (14) \(\begin{align} \sigma _N^2 & = \underbrace{\frac{2 \cdot b_{th}}{f_0^3} N}_{\sigma _{N, th}^2} + \underbrace{\frac{8 \cdot \ln (2) \cdot b_{fl}}{f_0^4} N^2}_{\sigma _{N, fl}^2}. \end{align}\)

We can use \(r_N := \frac{\sigma _{N, th}^2}{\sigma _N^2}\) as a metric of the thermal noise contribution. We determine \(\alpha := \frac{2 \cdot b_{th}}{f_0^3}\) and \(\beta := \frac{8 \cdot \ln (2) \cdot b_{fl}}{f_0^4}\) experimentally by measuring \(\sigma _N^2\) (and \(f_0\)) for various values of N and performing a curve fit on this data. Based on these parameters, we can calculate the upper limit of N for a given limit \(r_{N, min}\): (15) \(\begin{align} r_N & = \frac{\alpha \cdot N}{\alpha \cdot N + \beta \cdot N^2} = \frac{\alpha }{\alpha + \beta \cdot N}, \end{align}\) (16) \(\begin{align} N & < \frac{\alpha }{\beta }\cdot \left(\frac{1}{r_{N, min} } - 1\right). \end{align}\)
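The curve fit for \(\alpha\) and \(\beta\) is a linear least-squares problem, because Equation (14) is linear in both coefficients. The following Python sketch solves the corresponding normal equations directly; it is one possible way to perform the fit and not necessarily the procedure used for Figure A.1. The function name and the synthetic data are assumptions for illustration.

```python
def fit_alpha_beta(n_values, variances):
    """Least-squares fit of variance = alpha * N + beta * N^2 (no offset).

    Solves the 2x2 normal equations
        alpha * sum(N^2) + beta * sum(N^3) = sum(N * y)
        alpha * sum(N^3) + beta * sum(N^4) = sum(N^2 * y)
    with Cramer's rule.
    """
    if len(n_values) != len(variances) or len(n_values) < 2:
        raise ValueError("need at least two (N, variance) pairs")
    s2 = sum(n ** 2 for n in n_values)
    s3 = sum(n ** 3 for n in n_values)
    s4 = sum(n ** 4 for n in n_values)
    b1 = sum(n * y for n, y in zip(n_values, variances))
    b2 = sum(n ** 2 * y for n, y in zip(n_values, variances))
    det = s2 * s4 - s3 * s3
    if det == 0:
        raise ValueError("N values must not all be identical")
    alpha = (b1 * s4 - b2 * s3) / det
    beta = (s2 * b2 - s3 * b1) / det
    return alpha, beta


if __name__ == "__main__":
    # Synthetic, noise-free data with a known ratio alpha / beta = 18000.
    ns = [100, 200, 500, 1000, 2000]
    ys = [18.0 * n + 0.001 * n * n for n in ns]
    alpha, beta = fit_alpha_beta(ns, ys)
    print(alpha, beta, alpha / beta)
```

Only the ratio \(\frac{\alpha }{\beta }\) enters Equation (16), so the absolute scaling of the measured variance does not affect the resulting limit for N.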

Following this methodology, we found that \(\frac{\alpha }{\beta } \approx 18 \cdot 10^3\) at \(f_0 \approx 392\) MHz (Figure A.1). So, for example, for \(r_{N, min} = 0.95,\) Equation (16) gives \(N \lt 950\) oscillation periods, which means we can accumulate the oscillator jitter for around 2.4 µs.
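The step from the fitted ratio to the accumulation time can be reproduced with a short calculation. The following Python sketch evaluates Equation (16) and converts the resulting number of oscillation periods into a time using \(f_0\); the function names are illustrative.

```python
def max_accumulation_cycles(alpha_over_beta, r_min):
    """Upper limit of N from Equation (16) for a required thermal share r_min."""
    if not 0.0 < r_min <= 1.0:
        raise ValueError("r_min must be in (0, 1]")
    return alpha_over_beta * (1.0 / r_min - 1.0)


def accumulation_time(n_cycles, f0_hz):
    """Duration of n_cycles oscillation periods at frequency f0_hz."""
    return n_cycles / f0_hz


if __name__ == "__main__":
    n_max = max_accumulation_cycles(18e3, 0.95)
    t_max = accumulation_time(n_max, 392e6)
    print(n_max)  # about 947 periods
    print(t_max)  # about 2.4e-6 s
```

Requiring a larger thermal noise share shrinks the limit quickly: for \(r_{N, min} = 0.99\), the same ratio allows only about 180 periods.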

Table A.1.

Instance	H	Chi Square Independence p-Value	Chi Square Goodness-of-Fit p-Value	Chi Square Pass	LRS Pass	IID Permutation Pass
32	0.991892	0.915191	0.338877	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)
33	0.991442	0.271656	0.525349	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)
34	0.994842	0.708494	0.657599	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)
35	0.995434	0.412296	0.936311	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)
36	0.985251	0.566836	0.270507	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)
37	0.992816	0.294088	0.545550	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)
38	0.992053	0.359607	0.370609	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)
39	0.994908	0.811760	0.665165	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)
40	0.993793	0.178917	0.582806	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)
41	0.992920	0.327685	0.257683	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)
42	0.994681	0.900627	0.253804	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)
43	0.986757	0.280346	0.310491	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)
44	0.994092	0.272521	0.314782	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)
45	0.993526	0.621159	0.352952	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)
46	0.995009	0.713433	0.312900	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)
47	0.996199	0.617973	0.391529	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)
48	0.995788	0.199517	0.308791	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)
49	0.996228	0.800793	0.937441	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)
50	0.993095	0.713063	0.507468	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)
51	0.986717	0.001448	0.737751	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)
52	0.995912	0.353098	0.400936	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)
53	0.995236	0.733570	0.540364	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)
54	0.988668	0.845608	0.533944	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)
55	0.990338	0.368269	0.355110	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)
56	0.994388	0.824579	0.893011	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)
57	0.991080	0.409606	0.139833	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)
58	0.990023	0.460630	0.894219	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)
59	0.991700	0.274109	0.853038	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)
60	0.985654	0.483515	0.313661	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)
61	0.989813	0.372409	0.994627	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)
62	0.990556	0.648651	0.210829	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)
63	0.994273	0.478404	0.019948	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)

Table A.1. Results of the NIST SP 800-90B IID Tests for the 3S-TERO (Based on 1 Million Samples Each)

A.2 NIST SP 800-90B IID Test Results

Fig. A.2. Histograms of TERO (stage length = 2) instances.

Table A.2.

Instance	H	Chi Square Independence p-Value	Chi Square Goodness-of-Fit p-Value	Chi Square Pass	LRS Pass	IID Permutation Pass
0	0.971862	0.000000	–	✘	\(\checkmark\)	✘
1	0.979749	0.217717	0.763359	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)
2	0.994048	0.208420	0.890438	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)
3	0.972484	0.000000	–	✘	\(\checkmark\)	✘
4	0.993270	0.460021	0.481394	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)
5	0.989980	0.595433	0.457907	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)
6	0.994477	0.233347	0.704916	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)
7	0.993661	0.539419	0.388413	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)
8	0.992684	0.285122	0.858378	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)
9	0.995126	0.680832	0.946976	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)
10	0.995164	0.268435	0.854592	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)
11	0.988001	0.611209	0.201418	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)
12	0.977143	0.221607	0.182867	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)
13	0.984511	0.315191	0.312157	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)
14	0.995984	0.760858	0.414164	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)
15	0.979340	0.469458	0.251074	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)
16	0.989736	0.273900	0.219610	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)
17	0.988705	0.820762	0.576554	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)
18	0.994293	0.167817	0.239495	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)
19	0.989478	0.236539	0.382645	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)
20	0.994718	0.609784	0.502505	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)
21	0.988207	0.830978	0.712509	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)
22	0.991376	0.486608	0.143152	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)
23	0.981218	0.092790	0.458329	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)
24	0.993761	0.008326	0.507327	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)
25	0.985611	0.401483	0.851893	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)
26	0.989656	0.721159	0.716137	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)
27	0.978583	0.915450	0.354163	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)
28	0.989868	0.337326	0.106949	\(\checkmark\)	\(\checkmark\)	✘
29	0.987818	0.656701	0.749198	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)
30	0.986094	0.145232	0.494826	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)
31	0.990180	0.000000	–	✘	\(\checkmark\)	✘

Table A.2. Results of the NIST SP 800-90B IID Tests for the TERO (Based on 1 Million Samples Each)

A.3 TERO Histograms

For a better comparison, the plots in Figures A.2 through A.5 are all scaled to the same x-axis range. The chosen range is a compromise between wide and narrow distributions. Each plot represents a single instance.

Fig. A.3. Histograms of TERO (stage length = 3) instances. The LUTs between the NAND gates are configured as buffers.

A.4 3S-TERO Histograms

Fig. A.4. Histograms of 3S-TERO instances. Sampling frequency: 100 MHz.

A.5 Model Validation Experiment

Footnotes

¹ The source code for Spartan 6 and Cyclone V FPGAs on the project website uses a synchronous counter with no clock buffers explicitly instantiated.
² According to the source code on GitHub.
³ In the Xilinx library, this input is called pre.
⁴ The TEROs we implemented for this work were shorter (4 and 6 LUTs, respectively, plus an asynchronous counter).

REFERENCES

[1] Addabbo Tommaso, Fort Ada, Moretti Riccardo, Mugnaini Marco, Takaloo Hadis, and Vignoli Valerio. 2020. A new class of digital circuits for the design of entropy sources in programmable logic. IEEE Transactions on Circuits and Systems I: Regular Papers 67, 7 (July 2020), 2419–2430.
[2] Baudet Mathieu, Lubicz David, Micolod Julien, and Tassiaux André. 2011. On the security of oscillator-based random number generators. Journal of Cryptology 24, 2 (April 2011), 398–425.
[3] Bernard Florent, Fischer Viktor, and Valtchanov Boyan. 2010. Mathematical model of physical RNGs based on coherent sampling. Tatra Mountains Mathematical Publications 45, 1 (Dec. 2010), 1–14.
[4] Bernard Florent, Haddad Patrick, Fischer Viktor, and Nicolai Jean. 2019. From physical to stochastic modeling of a TERO-based TRNG. Journal of Cryptology 32, 2 (April 2019), 435–458.
[5] Bossuet Lilian, Ngo Xuan Thuy, Cherif Zouha, and Fischer Viktor. 2014. A PUF based on a transient effect ring oscillator and insensitive to locking phenomenon. IEEE Transactions on Emerging Topics in Computing 2, 1 (2014), 30–36.
[6] Cao Yang, Rozic Vladimir, Yang Bohan, Balasch Josep, and Verbauwhede Ingrid. 2016. Exploring active manipulation attacks on the TERO random number generator. In Proceedings of the 2016 IEEE 59th International Midwest Symposium on Circuits and Systems (MWSCAS’16). IEEE, Los Alamitos, CA, 1–4.
[7] Cheng Xin, Zhu Haowen, Xing Xinyi, Zhang Yunfeng, Zhang Yongqiang, Xie Guangjun, and Zhang Zhang. 2021. A feedback architecture of high speed true random number generator based on ring oscillator. In Proceedings of the 2021 IEEE Asian Solid-State Circuits Conference (A-SSCC’21). 1–3.
[8] Cherkaoui Abdelkarim, Bossuet Lilian, and Marchand Cedric. 2016. Design, evaluation, and optimization of physical unclonable functions based on transient effect ring oscillators. IEEE Transactions on Information Forensics and Security 11, 6 (June 2016), 1291–1305.
[9] Cherkaoui Abdelkarim, Fischer Viktor, Aubert Alain, and Fesquet Laurent. 2013. A self-timed ring based true random number generator. In Proceedings of the 2013 IEEE 19th International Symposium on Asynchronous Circuits and Systems (ASYNC’13). IEEE, Los Alamitos, CA, 99–106.
[10] Cherkaoui Abdelkarim, Fischer Viktor, Fesquet Laurent, and Aubert Alain. 2013. A very high speed true random number generator with entropy assessment. In Proceedings of the International Workshop on Cryptographic Hardware and Embedded Systems. 179–196.
[11] Cui Jianguo, Yi Maoxiang, Cao Di, Yao Liang, Wang Xinyu, Liang Huaguo, Huang Zhengfeng, Qi Haochen, Ni Tianming, and Lu Yingchun. 2022. Design of true random number generator based on multi-stage feedback ring oscillator. IEEE Transactions on Circuits and Systems II: Express Briefs 69, 3 (March 2022), 1752–1756.
[12] Sala Riccardo Della, Bellizia Davide, and Scotti Giuseppe. 2022. A novel ultra-compact FPGA-compatible TRNG architecture exploiting latched ring oscillators. IEEE Transactions on Circuits and Systems II: Express Briefs 69, 3 (March 2022), 1672–1676.
[13] Delvaux Jeroen. 2019. Refutation and Redesign of a Physical Model of TERO-Based TRNGs and PUFs. Paper 2019/810. IACR Cryptology ePrint Archive.
[14] Drutarovsky Milos and Varchola Michal. 2010. Analysis of randomness sources in transition effect ring oscillator based TRNG. In Proceedings of the International Workshop on Cryptographic Architectures Embedded in Reconfigurable Devices (CryptArchi’10). 102–107.
[15] Dworkin Morris J. 2007. NIST Special Publication 800-38D - Recommendation for Block Cipher Modes of Operation: Galois/Counter Mode (GCM) and GMAC. National Institute of Standards & Technology.
[16] Fischer Viktor and Drutarovský Milos. 2003. True random number generator embedded in reconfigurable hardware. In Cryptographic Hardware and Embedded Systems—CHES 2002. Lecture Notes in Computer Science, Vol. 2523. Springer, 415–430.
[17] Fujieda Naoki. 2020. On the feasibility of TERO-based true random number generator on Xilinx FPGAs. In Proceedings of the 2020 30th International Conference on Field-Programmable Logic and Applications (FPL’20). IEEE, Los Alamitos, CA, 103–108.
[18] Haddad Patrick, Teglia Yannick, Bernard Florent, and Fischer Viktor. 2014. On the assumption of mutual independence of jitter realizations in P-TRNG stochastic models. In Proceedings of the 2014 Design, Automation, and Test in Europe Conference and Exhibition (DATE’14). 1–6.
[19] Yang Kaiyuan, Blaauw David, and Sylvester Dennis. 2016. An all-digital edge racing true random number generator robust against PVT variations. IEEE Journal of Solid-State Circuits 51, 4 (April 2016), 1022–1031.
[20] Killmann Wolfgang and Schindler Werner. 2011. A Proposal for: Functionality Classes for Random Number Generators (v2.0). BSI.
[21] Li Xiang, Stanwicks Peter, Provelengios George, Tessier Russell, and Holcomb Daniel. 2023. Jitter-based adaptive true random number generation circuits for FPGAs in the cloud. ACM Transactions on Reconfigurable Technology and Systems 16, 1 (Sept. 2023), Article 3, 20 pages.
[22] Mureddu Ugo, Bochard Nathalie, Bossuet Lilian, and Fischer Viktor. 2019. Experimental study of locking phenomena on oscillating rings implemented in logic devices. IEEE Transactions on Circuits and Systems I: Regular Papers 66, 7 (July 2019), 2560–2571.
[23] Anandakumar N. Nalla, Sanadhya Somitra Kumar, and Hashmi Mohammad S. 2020. FPGA-based true random number generation using programmable delays in oscillator-rings. IEEE Transactions on Circuits and Systems II: Express Briefs 67, 3 (March 2020), 570–574.
[24] Peetermans Adriaan, Rožić Vladimir, and Verbauwhede Ingrid. 2021. Design and analysis of configurable ring oscillators for true random number generation based on coherent sampling. ACM Transactions on Reconfigurable Technology and Systems 14, 2 (June 2021), Article 7, 20 pages.
[25] Peetermans Adriaan, Rožić Vladimir, and Verbauwhede Ingrid. 2021. Design and analysis of configurable ring oscillators for true random number generation based on coherent sampling. ACM Transactions on Reconfigurable Technology and Systems 14, 2 (June 2021), 1–20.
[26] Petura Oto, Mureddu Ugo, Bochard Nathalie, Fischer Viktor, and Bossuet Lilian. 2016. A survey of AIS-20/31 compliant TRNG cores suitable for FPGA devices. In Proceedings of the 2016 26th International Conference on Field Programmable Logic and Applications (FPL’16). 1–10.
[27] Reichel Peter and Döge Jens. 2014. Hardware/software infrastructure for ASIC commissioning and rapid system prototyping. In Proceedings of the 2014 International Conference on ReConFigurable Computing and FPGAs (ReConFig’14). 1–6.
[28] Reyneri Leonardo M., Corso Dante Del, and Sacco Bruno. 1990. Oscillatory metastability in homogeneous and inhomogeneous flip-flops. IEEE Journal of Solid-State Circuits 25, 1 (Feb. 1990), 254–264.
[29] Roy Debapriya Basu, Bhasin Shivam, Danger Jean-Luc, Guilley Sylvain, He Wei, Mukhopadhyay Debdeep, Najm Zakaria, and Ngo Xuan Thuy. 2018. The conflicted usage of RLUTs for security-critical applications on FPGA. Journal of Hardware and Systems Security 2, 2 (June 2018), 162–178.
[30] Saarinen Markku-Juhani O. 2021. On entropy and bit patterns of ring oscillator jitter. In Proceedings of the 2021 Asian Hardware Oriented Security and Trust Symposium (AsianHOST’21). IEEE, Los Alamitos, CA, 1–6.
[31] Peter Matthias and Schindler Werner. 2022. A Proposal for Functionality Classes for Random Number Generators (v2.35). BSI.
[32] Sunar B., Martin W. J., and Stinson D. R. 2007. A provably secure true random number generator with built-in tolerance to active attacks. IEEE Transactions on Computers 56, 1 (Jan. 2007), 109–119.
[33] Varchola Michal and Drutarovsky Milos. 2009. New FPGA based TRNG principle using transition effect with built-in malfunction detection. In Proceedings of the International Workshop on Cryptographic Architectures Embedded in Reconfigurable Devices (CryptArchi’09). 150–155.
[34] Varchola Michal and Drutarovsky Milos. 2010. New high entropy element for FPGA based true random number generators. In Cryptographic Hardware and Embedded Systems, CHES 2010. Lecture Notes in Computer Science, Vol. 6225. Springer, 351–365.
[35] Varchola Michal, Drutarovsky Milos, and Fischer Viktor. 2013. New universal element with integrated PUF and TRNG capability. In Proceedings of the 2013 International Conference on Reconfigurable Computing and FPGAs (ReConFig’13). IEEE, Los Alamitos, CA, 1–6.
[36] Wang Xinyu, Liang Huaguo, Wang Yanjie, Yao Liang, Guo Yang, Yi Maoxiang, Huang Zhengfeng, Qi Haochen, and Lu Yingchun. 2021. High-throughput portable true random number generator based on Jitter-Latch structure. IEEE Transactions on Circuits and Systems I: Regular Papers 68, 2 (Feb. 2021), 741–750.
[37] Xilinx. 2019. Virtex UltraScale FPGAs Data Sheet: DC and AC Switching Characteristics (DS893, v1.12). Xilinx.
[38] Xilinx. 2021. UltraScale Architecture Libraries Guide (UG974, v2021.2). Xilinx.
[39] Xilinx. 2021. Vivado Design Suite User Guide: Using Constraints (UG903, v2021.2). Xilinx.
[40] Yang Bohan, Rožić Vladimir, Grujić Miloš, Mentens Nele, and Verbauwhede Ingrid. 2018. ES-TRNG: A high-throughput, low-area true random number generator based on edge sampling. IACR Transactions on Cryptographic Hardware and Embedded Systems 2018, 3 (Aug. 2018), 267–292.
[41] Yang Kaiyuan, Fick David, Henry Michael B., Lee Yoonmyung, Blaauw David, and Sylvester Dennis. 2014. 16.3 A 23Mb/s 23pJ/b fully synthesized true-random-number generator in 28nm and 65nm CMOS. In Proceedings of the 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC’14). 280–281.

Index Terms

Increasing the Robustness of TERO-TRNGs Against Process Variation
1. Hardware
  1. Integrated circuits
    1. Reconfigurable logic and FPGAs
      1. Reconfigurable logic applications
2. Security and privacy
  1. Security in hardware
    1. Hardware security implementation

Published in
ACM Transactions on Reconfigurable Technology and Systems Volume 16, Issue 3
September 2023
447 pages
ISSN:1936-7406
EISSN:1936-7414
DOI:10.1145/3604889
Editor:
Deming Chen
University of Illinois, Urbana-Champaign, USA
Issue’s Table of Contents
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 27 July 2023
- Online AM: 23 May 2023
- Accepted: 10 May 2023
- Revised: 9 April 2023
- Received: 2 January 2023
Author Tags
Hardware random number generators
ring oscillators
entropy
FPGA
Qualifiers
- research-article