Abstract
The transition effect ring oscillator is a popular design for building entropy sources because it is compact, built from digital elements only, and very well suited for FPGAs. However, it is known to be quite sensitive to process variation. Although this sensitivity is useful for building physical unclonable functions, it interferes with the use as an entropy source.
In this article, we investigate an approach to increase reliability. We show that adding a third stage eliminates much of the susceptibility to process variation and how a resulting gigahertz oscillation can be evaluated on an FPGA. The design is supported by physical and stochastic modeling. The physical model is validated using an experiment with dynamically reconfigurable look-up tables.
1 INTRODUCTION
1.1 On the Relevance of FPGA-Based Entropy Sources
Randomness is often essential in cryptography, such as for generating keys, initialization vectors, or nonces. Popular modes like AES-GCM can fail spectacularly if, for example, the same initialization vector is used twice due to poor or malicious design of an entropy source [15]. Although there are efforts to eliminate the requirement of randomness (e.g., through synthetic initialization vectors), true randomness remains a general requirement for secure operation of cryptographic systems.
The dependence on a physical source distinguishes random number generation from other elements of cryptography. A True Random Number Generator (TRNG)
cannot be replaced by software,
cannot be proved purely mathematically,
cannot be evaluated by standardized test vectors alone, and
is subject to environmental conditions and aging.
There are some similarities to other security hardware elements such as trusted on-chip memory. However, most problems with security hardware blocks are limited to attackers who are able to execute code or gain physical access, whereas vulnerabilities of a TRNG can compromise remote connections and on-chip generated keys. In summary, TRNGs are significant, difficult to assess, and potentially exploitable remotely by attackers.
To build confidence in a TRNG, it is essential to enable its evaluation. It can be argued that this includes disclosure of the design. The barrier to independent research is further lowered if the design can be implemented without a large investment. Therefore, FPGA-based implementations are of particular interest and will be examined in more detail in the following sections.
1.2 Entropy Sources for FPGAs
On FPGAs, one is limited in terms of possible building blocks: one can use either available hard macros (e.g., PLLs [16]) or digital logic elements such as Look-Up Tables (LUTs) and Flip-Flops (FFs). Ring oscillator-based designs are a typical approach. Petura et al. [26] categorize their studied entropy sources into those that are based on
single-event ring oscillators,
multi-event ring oscillators without signal collisions, and
multi-event ring oscillators with signal collisions.
A current example of the first category is the ES-TRNG [40], in which a ring oscillator is sampled by a tapped delay chain that is clocked by a second ring oscillator. Self-timed rings [9] fall into the second category. The Transition Effect Ring Oscillator (TERO) is a typical example of the third category and is the focus of the following section.
1.3 Transition Effect Ring Oscillator
The TERO was introduced by Varchola and Drutarovsky [34]. It is basically a latch with extended feedback paths that oscillates. The length of the oscillation is the random variable of this entropy source. The implementation can be based, for example, on an XOR/AND combination or NAND gates (Figure 1(a) and (b)). The authors reported that placement and routing are important for this design. The TERO design was later extended to be used as Physical Unclonable Function (PUF) [35]. A model based on noisy inverters was presented by Bernard et al. [4]. The authors develop an analytical expression for the probability of the number of oscillations and a procedure to determine the model parameters and the entropy rate.
Some authors proposed configuration options, for example, to compensate for process variation effects. Yang et al. [19] presented a tunable edge racing TRNG based on two ring stages. The proposed design uses multiple parallel inverters and selects one through a multiplexer (see Figure 1(c)). By varying the MUX configuration, some of the device mismatch can be compensated. A configurable TERO FPGA implementation has been proposed by Fujieda [17]. The implementation consists of NAND gates and multiplexers, with each multiplexer stage selecting between three inputs. This way, logic elements and routing paths can be adapted by configuration.
1.4 Goals and Contributions
Our goal is to explore how TERO designs can be implemented more reliably so that they are less susceptible to process variation effects. The contributions of this work are as follows:
(1) We analyze timing issues of TERO counter implementations.
(2) We demonstrate that a variant of the TERO design that adds an additional stage to reduce process variation effects can be implemented on FPGAs.
(3) We show how the proposed entropy source can be evaluated in the system clock domain using asynchronous evaluation.
(4) A model for the implementation is developed that is used for the entropy rate estimation.
(5) We propose and perform experimental validation of the model beyond simple curve fitting.
(6) Finally, we evaluate our proposed entropy source design and show that the design is indeed more robust to process variation effects.
2 TERO COUNTERS
2.1 TERO Counter Implementations
To use the TERO oscillator as an entropy source, you must measure the time that the oscillator is running. This is usually accomplished by using the TERO as a clock for a counter. We have found that in most publications, the actual counter implementation is not clearly stated. Where we could find more specific information, we identified four types of counter implementations (Figure 2 and Table 1).
The simplest and most direct implementation is shown in Figure 2(a): a synchronous counter that uses the TERO directly as a clock. Since this puts additional load on the oscillator, it may well affect the oscillation. This problem can be mitigated by an additional clock buffer, as shown in Figure 2(b).
Since synchronous counters are limited in performance, a common implementation is an asynchronous counter (see Figure 2(c)). When only the LSB of the counter is used as the entropy output, the length of the chain is limited to one bit (so only a single T-FF is used). This reduces the size of the entropy source but also limits the monitoring of the entropy source’s health.
Finally, Delvaux [13] proposed the use of a T-FF for restoring the duty cycle (see Figure 2(d)). From the synchronous counter’s point of view, the FF also acts as a low-pass filter, since short pulses are absorbed. To maintain full resolution, he combines this with both a rising and falling edge sensitive counter.
2.2 The Timing Issue
Although the TERO can generate an approximately uniform duty cycle at the beginning of the oscillation, the duty cycle changes over time until the oscillation collapses (Figure 3). In practice, the initial duty cycle may be skewed due to, for example, routing variations, a different number of inverters per branch, or process variation. We consider the frequency of the oscillation to be approximately constant. This signal has three characteristics that make it difficult to use as a clock:
the frequency is high and may already exceed the technological limits of synchronous designs at the beginning,
the duty cycle changes over time, with pulses eventually becoming as short as technologically possible, and
at some point the inverters will probably not switch completely anymore.
There has been some discussion about the difficulties of using this signal as a clock. Varchola and Drutarovsky [33] implement the counter asynchronously because the TERO frequency exceeds the technological limitations of a synchronous implementation. Fujieda [17] mentions that timing constraints must be considered when implementing the counter. However, he does not address how to properly constrain the design. Delvaux [13] argues that duty cycle is an issue in a synchronous counter implementation: when oscillation exceeds technological limitations, some FFs of the counter may update their state, whereas others remain at their previous value. He also conducted experiments to confirm this assumption. Contrary to his argument, this is not a problem that affects all previous TERO implementations with multi-bit counters. Asynchronous counters are a common implementation [5, 6, 14, 33, 34] and are not subject to the same technological limitations. Although there is a frequency limit here as well, exceeding it does not result in corrupted counters. Instead, the first T-FF acts as a frequency filter that absorbs short pulses, and the counter just misses toggling.
As Delvaux’s experiments show, synchronous implementations (Figure 2(a) and (b)) should not be used. For an ASIC implementation, the circuit given in Figure 2(c) would be a good candidate. For FPGAs, however, the use of an asynchronous multi-bit counter is questionable because FPGA toolchains are tailored to synchronous designs. Constraining such designs is not part of the standard design flow. In particular, without proper timing analysis, one does not know when the counter bits are expected to be stable.
The implementation in Figure 2(d) may be better suited for FPGA designs. Only a single T-FF is required, where manual placement and routing are much more feasible than for a complete multi-bit counter. All remaining bits can be placed automatically by the placement tool. Proper operation of this implementation is only guaranteed if the clock for the synchronous counters is properly constrained. For this purpose, the developer must manually determine the maximum frequency. This can be challenging since, for example, Xilinx Vivado does not support direct timing analysis for a purely combinatorial path. Since both counters can only differ in their LSB, the two counters and the adder could be simplified to a single counter and a T-FF. Additionally, it might be advisable to add a dedicated clock buffer. In this case, this will limit the number of instances that can be implemented: for our target device, each clock region contains on average only eight suitable clock buffers. Although the counter is synchronous, other digital components run in a separate clock domain. Therefore, additional synchronization will be required.
Proper design and constraining of a TERO counter is challenging to say the least and requires detailed design analysis. From an implementation point of view, it would be much easier if the entropy source could be evaluated in the system clock domain. In the following section, we present such an architecture for our TERO variant.
3 THE THREE-STAGED TERO FOR FPGAS
3.1 From Two to Three Stages
The TERO design is very sensitive to process variation. The cause of this problem is the path of the two edges (Figure 4(a)). To illustrate the effect, assume that each gate g has a rising (falling) delay \(D_{g, r}\) (\(D_{g, f}\)) and wire delays can be ignored. Consider the first two periods of the edges traversing the oscillator. In this case, edge 1 accumulates the delay \(2 \cdot (D_{1, r} + D_{2, f}),\) whereas edge 2 accumulates the delay \(2 \cdot (D_{2, r} + D_{1, f})\). Thus, in the two-staged TERO, the mismatch between the paths of the edges is accumulated over time. So, the timing of the collapse of the oscillation depends not only on the noise but also is heavily influenced by the effects of process variation. This effect has been shown to be relevant for both ASIC and FPGA implementations. Bernard et al. [4] report that their adjustable ASIC design works well only in a specific range of path delays, which they identify as a potential problem for large-volume production. For FPGA implementations, the survey by Petura et al. [26] found that even the same configuration file can yield very different implementation results. In their feasibility and repeatability scale of 5 (best score) to 0 (worst score), the TERO was rated 1, meaning that a manual setup is required for each individual device. However, the dependence on actual physical parameters allows the TERO to be used as a PUF.
For ASIC implementations, one can adjust the design parameters to reduce the mismatch. For FPGAs, the number of available options is more limited. One approach used for both ASIC and FPGA implementations is a configurable design. A major disadvantage of a configurable design is the increased complexity of using it. One needs a way to select the correct implementation and must ensure that an attacker cannot exploit the configurable design.
It is advantageous to address the process variation sensitivity in the architecture of the entropy source. This can be done by adding a third stage (see Figure 4(b)). We refer to this architecture as the three-staged TERO or 3S-TERO. Here, the transit direction changes after each iteration. In the first two periods, each edge accumulates \(D_{1, r} + D_{1, f} + D_{2, r} + D_{2, f} + D_{3, r} + D_{3, f}\). Thus, the variation between the gates evens out every two complete iterations.
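The accumulation argument for both architectures can be checked numerically. The following is a minimal sketch, not part of the original design flow: the function name `accumulated_delay` and all delay values are illustrative assumptions, and wire delays are ignored as in the text. It follows one edge around a ring of inverting gates, where the output transition alternates between rising and falling at every gate.

```python
def accumulated_delay(delays, start_gate, hops):
    """Sum the gate delays seen by one edge travelling around the ring.

    delays[g] is a (rise, fall) tuple for gate g (hypothetical values).
    The edge first causes a rising output at start_gate; because every
    gate inverts, the transition type alternates with each hop.
    """
    total = 0.0
    n = len(delays)
    for hop in range(hops):
        rise, fall = delays[(start_gate + hop) % n]
        total += rise if hop % 2 == 0 else fall
    return total


# Two stages: two trips around the ring are four hops.
two_stage = [(1.0, 1.2), (1.1, 0.9)]
edges_2s = [accumulated_delay(two_stage, g, 4) for g in range(2)]

# Three stages: two trips around the ring are six hops.
three_stage = two_stage + [(0.95, 1.05)]
edges_3s = [accumulated_delay(three_stage, g, 6) for g in range(3)]

print("two-stage:  ", edges_2s)
print("three-stage:", edges_3s)
```

With these assumed values, the two edges of the two-stage ring accumulate different delays, so the mismatch grows with every further period, whereas all three edges of the three-stage ring see each rising and each falling delay exactly once.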
However, this advantage comes at a price. The TERO simply stops oscillating after the edge collapse. This makes evaluation easy by simply using the oscillator as a clock for a counter. For the 3S-TERO, this is more complicated: when two edges collide, only they disappear while the third continues to oscillate. We refer to this frequency as \(f_{low}\). We call the frequency at the beginning \(f_{high}\) with \(f_{high} = 3 \cdot f_{low}\). Thus, simply using the oscillator as a clock for a counter gives the collapse time (measured at \(f_{high}\)) mixed with a counter clocked at \(f_{low}\). This complicates entropy extraction and monitoring of the entropy source. To avoid this mixture, one must explicitly detect the edge collision and stop the counter accordingly, which requires additional hardware and implementation effort.
For ASICs, such an implementation was presented by Yang et al. [41]. To evaluate the oscillation state, they compare the frequency of the 3S-TERO with a reference oscillator (Figure 5). Both the area and implementation complexity for the required Phase Frequency Detector (PFD) are significant, and the design is most likely not suitable for an FPGA implementation. Additionally, the design requires additional power since it uses an oscillator as a reference. Finally, the authors report that the counter FF has mismatch issues, so the LSB is dropped and not used as an entropy output. In the following section, we propose our implementation that addresses all of these problems.
3.2 FPGA Implementation of the 3S-TERO
We propose a design that builds on the three-staged TERO oscillator and uses a novel approach to evaluate the oscillator state. In addition to addressing the problem of process variation, we have two main goals. First, we want an architecture that is suitable for FPGA implementations. Second, we prefer to evaluate the entropy source in the system clock domain. This is motivated both by our analysis of the TERO timing problem in Section 2 and by the counter LSB mismatch issue reported by Yang et al. [41].
The structure of our proposed architecture is shown in Figure 6 and consists of an oscillator (the physical source of entropy), a counter (to measure the oscillation time), evaluation logic for the oscillator state, and a result register. These elements are described in detail in the following.
The basic structure of the oscillator is found in Figure 4(b), right. It can be modified by adding either an even number of inverters or an arbitrary number of buffers between the NAND gates. The counter is reset and started by the start signal. It operates in the system clock domain. Therefore, no special timing constraints are required. It does not require dedicated clock buffers and can be shared by multiple active entropy source instances. The oscillator state evaluation controls the result register.
The challenge for the oscillator state evaluation is to evaluate a fast oscillator with a varying duty cycle using a system clock whose frequency is limited. For our most compact implementation, the ring oscillator operates at \(f_{high} \approx 1 \text{ GHz}\). At the same time, FPGA components such as clock buffers are frequency limited. For Xilinx Virtex UltraScale devices, for example, clock buffer limits range from 630 to 850 MHz [37]. Since any possible sampling frequency is far below the Nyquist rate of the oscillator signal, simply sampling an oscillator output is insufficient.
However, for our application, sub-Nyquist sampling is possible by using three taps instead of just one. The basic idea is to take advantage of the fact that the signals on the three taps should be equal only at \(f_{high}\). This is illustrated in Figure 7: the combinatorial equality comparison will no longer be true after the oscillation collapse.
Initially, we experimented with simply sampling the combinatorial comparison of the three taps. This approach introduced distortions because the sampling depends on the phase relationship between the 3S-TERO oscillator and the system clock.
To solve this problem, we introduce the concept of asynchronous oscillation evaluation. The idea is to use an asynchronous FF input to continuously evaluate the oscillation state. This eliminates the dependence on the phase relationship. In our implementation, the three taps are compared by an equality gate that controls the asynchronous set input of an FF. The D input is set to constant zero (Figure 8). As a clock, we use only the system clock, for which we require \(f_{sys} \lt \frac{f_{high}}{2}\). While the oscillator is running at \(f_{high}\), the output of the equality gate will be high at least twice per \(T_{sys} = \frac{1}{f_{sys}}\). Thus, regardless of the phase relationship, the repeatedly set zero is constantly overridden by the asynchronous set input before the FF output is evaluated. In this way, the circuit can detect the last clock cycle in which the oscillator is still running at \(f_{high}\) and in which the generated pulse is long enough to trigger the asynchronous set input. Figure 9 shows a comparison between direct sampling and the asynchronous implementation.
Due to asymmetric routing and process variation, you must expect the actual design to be imbalanced. Here the equivalence comparison is critical: the skew of the taps must not be extreme to ensure a correct equivalence comparison. To ensure approximately symmetric routing, the designer must analyze routing and timing. Thus, similar to the implementations discussed in Section 2, the crossing of system clock and oscillator must be considered. Unlike the implementation proposed by Delvaux [13], the counter is fully implemented in the system clock domain and there is no additional clock domain crossing.
Figure 10 shows a comparison of the resulting distribution of the counter value for different sampling frequencies. A higher sampling frequency produces a wider distribution with the potential for a higher entropy rate. At higher sampling frequencies, the histograms show binning artifacts caused by compression (see Section 4.3). For our design, the 100 MHz of our system clock turned out to be a reasonable choice for the sampling frequency as well.
4 MODELING
4.1 Physical Modeling of Pulse Shortening
The stochastic TERO model of Bernard et al. [4] is based on the physical model of Reyneri et al. [28]. For a single inverter, the pulse shortening (without considering noise components) is described as follows: (1) \(\begin{align} w_{\text{out}} = \frac{t_c}{2} + \left[w_{\text{in}} - \frac{t_c}{2}\right] [1 + H_d] = w_{\text{in}} + H_d \left(w_{\text{in}} - \frac{t_c}{2}\right). \end{align}\)
The length of the outgoing (respectively incoming) pulse of the inverter is denoted by \(w_{\text{out}}\) (\(w_{\text{in}}\)). \(H_d\) is a constant related to the physical properties of the oscillator, whereas \(\frac{t_c}{2}\) corresponds to the average pulse width.
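As an illustration of Equation (1), and ignoring noise, subtracting \(\frac{t_c}{2}\) from both sides shows that the deviation from the average pulse width is scaled by \(1 + H_d\) at every inverter. The symbols \(w_0\) (initial pulse width) and \(w_n\) (pulse width after n inverters) are introduced only for this sketch, and \(H_d \gt 0\) is assumed: \(\begin{align*} w_{\text{out}} - \frac{t_c}{2} &= (1 + H_d) \left(w_{\text{in}} - \frac{t_c}{2}\right), \\ w_n - \frac{t_c}{2} &= (1 + H_d)^n \left(w_0 - \frac{t_c}{2}\right). \end{align*}\) Under these assumptions, a pulse that starts slightly shorter than average (\(w_0 \lt \frac{t_c}{2}\)) shrinks geometrically and vanishes (\(w_n \le 0\)) once \(\begin{align*} n \ge \frac{\ln \left(\frac{t_c/2}{t_c/2 - w_0}\right)}{\ln (1 + H_d)}. \end{align*}\)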
With one additional stage, things become more complex for the three-staged TERO, since one additional pulse must be considered (Figure 11). We denote the outgoing (respectively, incoming) pulse at gate j in round k as \(\omega _{\text{out}, j, k}\) (\(\omega _{\text{in}, j, k}\)). We model the pulse shortening as follows: (2) \(\begin{align} \omega _{\text{out}, 0, k} = \omega _{\text{in}, 0, k} + H_d \left(\omega _{\text{in}, 0, k} - \frac{t_c}{2}\right) = \omega _{\text{out}, 2, k-1} + H_d \left(\omega _{\text{out}, 2, k-1} - \frac{t_c}{2}\right), \end{align}\) (3) \(\begin{align} \omega _{\text{out}, 1, k} = \omega _{\text{in}, 1, k} + H_d \left(\omega _{\text{in}, 1, k} - \frac{t_c}{2}\right) = \omega _{\text{out}, 0, k-1} + H_d \left(\omega _{\text{out}, 0, k-1} - \frac{t_c}{2}\right), \end{align}\) (4) \(\begin{align} \omega _{\text{out}, 2, k} = \omega _{\text{in}, 2, k} + H_d \left(\omega _{\text{in}, 2, k} - \frac{t_c}{2}\right) = \omega _{\text{out}, 1, k-1} + H_d \left(\omega _{\text{out}, 1, k-1} - \frac{t_c}{2}\right). \end{align}\)
For our design, the average pulse width is given by \(\frac{t_c}{2} = \frac{1}{3}(\tau _0+\tau _1+\tau _2)\). The initial pulse width is determined by the ring delay of the preceding stage: \(\begin{align*} \omega _{\text{out}, 0, 0} = \tau _2 \qquad \omega _{\text{out}, 1, 0} = \tau _0 \qquad \omega _{\text{out}, 2, 0} = \tau _1. \end{align*}\)
We denote the noise introduced at gate j in round k as \(n(j, k)\). We assume i.i.d. noise components for each edge \(n(j, k) \sim \ \mathcal {N}(0,\, \sigma),\) which for each round are added at the respective gate: (5) \(\begin{align} \omega _{\text{out}, 0, k} = \omega _{\text{out}, 2, k-1} + H_d \left(\omega _{\text{out}, 2, k-1} - \frac{t_c}{2}\right) + n(0, k), \end{align}\) (6) \(\begin{align} \omega _{\text{out}, 1, k} = \omega _{\text{out}, 0, k-1} + H_d \left(\omega _{\text{out}, 0, k-1} - \frac{t_c}{2}\right) + n(1, k), \end{align}\) (7) \(\begin{align} \omega _{\text{out}, 2, k} = \omega _{\text{out}, 1, k-1} + H_d \left(\omega _{\text{out}, 1, k-1} - \frac{t_c}{2}\right) + n(2, k). \end{align}\)
The oscillation stops as soon as one of the three pulse widths reaches zero. Thus, the probability \(P(k)\) that the oscillation collapses in round k is as follows: (8) \(\begin{align} P(k) = P\big(& \lbrace \omega _{\text{out}, 0, k} \le 0 \text{ or } \omega _{\text{out}, 1, k} \le 0 \text{ or } \omega _{\text{out}, 2, k} \le 0 \rbrace \nonumber \\ & \mid \lbrace \omega _{\text{out}, 0, l} \gt 0 \text{ and } \omega _{\text{out}, 1, l} \gt 0 \text{ and } \omega _{\text{out}, 2, l} \gt 0 \rbrace \; \forall \; l \in [0,1, \dots , k-1]\big). \end{align}\)
4.2 Model Simplification
For our model simplification, we define the following: \(\begin{align*} \omega _{\text{out}, j, k}^{\prime } = \frac{\omega _{\text{out}, j, k}}{\sigma } \qquad \tau _i^{\prime } = \frac{\tau _i}{\sigma } \qquad t_c^{\prime } = \frac{t_c}{\sigma } = \frac{2}{3}(\tau _0^{\prime }+\tau _1^{\prime }+\tau _2^{\prime }) \qquad n(j, k)^{\prime } = \frac{n(j, k)}{\sigma }. \end{align*}\)
With \(n(j, k)^{\prime } \sim \ \mathcal {N}(0,\, 1)\) we can now simplify the model by division of Equations (5) through (7) by \(\sigma\): (9) \(\begin{align} \omega _{\text{out}, 0, k}^{\prime } = \omega _{\text{out}, 2, k-1}^{\prime } + H_d \left(\omega _{\text{out}, 2, k-1}^{\prime } - \frac{t_c^{\prime }}{2}\right) + n(0, k)^{\prime }, \end{align}\) (10) \(\begin{align} \omega _{\text{out}, 1, k}^{\prime } = \omega _{\text{out}, 0, k-1}^{\prime } + H_d \left(\omega _{\text{out}, 0, k-1}^{\prime } - \frac{t_c^{\prime }}{2}\right) + n(1, k)^{\prime }, \end{align}\) (11) \(\begin{align} \omega _{\text{out}, 2, k}^{\prime } = \omega _{\text{out}, 1, k-1}^{\prime } + H_d \left(\omega _{\text{out}, 1, k-1}^{\prime } - \frac{t_c^{\prime }}{2}\right) + n(2, k)^{\prime }. \end{align}\)
This eliminates one model parameter, since we can work with a standard normal distribution. The model modification is also intuitive: the collapse probability does not depend on the absolute value of the delay parameters, but on the delays relative to the noise level.
4.3 Modeling of the Asynchronous Evaluation
A complete oscillation period lasts \(2 (\tau _0+\tau _1+\tau _2)\) and contains three rising and three falling edges. From the modeling perspective, this corresponds to six rounds. If an ideal double-edge sensitive counter were clocked by this signal, it would count at an average frequency of \(f_{\text{high}} = \frac{6}{2 (\tau _0+\tau _1+\tau _2)} =\frac{3}{\tau _0+\tau _1+\tau _2}\). Once the oscillation collapses, the period is still the same but contains only one rising and one falling edge.
In our implementation, the actual counter is clocked by a system clock at \(f_{\text{sys}}\) with (12) \(\begin{align} f_{\text{sys}} \lt \frac{f_{\text{high}}}{2}. \end{align}\)
The same is true for the evaluation logic, so \(f_{\text{sys}}\) determines the resolution of the time measurement. When the counter is stopped at round \(k,\) the time \(t_0 =\frac{1}{f_{\text{high}}} \cdot k\) has elapsed. At this time, the counter has the value \(\lfloor f_{\text{sys}} \cdot t_0 \rfloor = \lfloor \frac{f_{\text{sys}}}{f_{\text{high}}} \cdot k\rfloor\). The probability \(P(c)\) that the counter stops at c can now be determined simply by summing all \(P(k)\) where \(c =\lfloor \frac{f_{\text{sys}}}{f_{\text{high}}} \cdot k \rfloor\). This means that the resulting distribution for \(P(c)\) is a compressed version of the distribution of \(P(k)\).
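The compression from \(P(k)\) to \(P(c)\) can be sketched in a few lines of NumPy. This is a minimal illustration under the assumptions of this section (one round lasts \(1/f_{\text{high}}\) and the counter value is the floor of the elapsed system clock cycles); the function name `compress_distribution` and the example frequencies are hypothetical.

```python
import numpy as np


def compress_distribution(p_k, f_sys, f_high):
    """Map a round distribution P(k) to a counter distribution P(c).

    p_k[k] is the probability of a collapse in round k. All rounds with
    the same counter value c = floor(f_sys / f_high * k) are summed up.
    """
    p_k = np.asarray(p_k, dtype=float)
    k = np.arange(len(p_k))
    # Multiply before dividing to limit rounding problems at bin edges.
    c = np.floor(k * f_sys / f_high).astype(int)
    return np.bincount(c, weights=p_k)


# Hypothetical example: uniform P(k) over eight rounds, f_sys = f_high / 4.
p_c = compress_distribution(np.full(8, 0.125), f_sys=250.0, f_high=1000.0)
print(p_c)
```

In this example, four consecutive rounds fall into each counter value, which is the binning effect described above.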
4.4 Model Simulation and Parameter Identification
To speed up the computation, the simulation is vectorized. This includes both the round-by-round update of the pulse width and the detection of the simulations as finished.
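A vectorized simulation of this kind could look as follows. This is a minimal sketch of Equations (9) through (11) and not the authors' implementation: the function name, the parameter values, and the convention of marking runs that do not collapse within `max_rounds` with -1 are assumptions made for illustration.

```python
import numpy as np


def simulate_collapse_rounds(tau, h_d, n_runs, max_rounds, seed=0):
    """Monte Carlo simulation of the normalized pulse width model.

    tau: normalized delays (tau_0', tau_1', tau_2') in units of sigma.
    Returns the collapse round of every run, or -1 if a run has not
    collapsed after max_rounds (an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    tau = np.asarray(tau, dtype=float)
    half_tc = tau.mean()  # t_c'/2 = (tau_0' + tau_1' + tau_2') / 3
    # Initial widths: gate 0 starts with tau_2, gate 1 with tau_0, gate 2 with tau_1.
    w = np.tile(np.roll(tau, 1), (n_runs, 1))
    collapse = np.full(n_runs, -1)
    active = np.ones(n_runs, dtype=bool)
    for k in range(1, max_rounds + 1):
        # Gate j receives the previous output of gate j-1 (mod 3).
        prev = np.roll(w, 1, axis=1)
        w_new = prev + h_d * (prev - half_tc) + rng.standard_normal(w.shape)
        # Only runs that are still oscillating are updated.
        w = np.where(active[:, None], w_new, w)
        done = active & (w <= 0).any(axis=1)
        collapse[done] = k
        active &= ~done
        if not active.any():
            break
    return collapse


rounds = simulate_collapse_rounds((80.0, 100.0, 120.0), h_d=0.01,
                                  n_runs=500, max_rounds=2000)
hist = np.bincount(rounds[rounds > 0])  # histogram over collapse rounds
```

All runs are advanced together in each round, and the boolean mask `active` implements the detection of finished simulations.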
The application of the model of the oscillator evaluation can be performed on the generated histogram data. Therefore, this step is not performance critical. Figure 12 shows examples of the pulse width simulation and the corresponding histogram of the counter values.
For validation of our model, we want to compare model and measurements. Therefore, we want to identify the best fitting model parameters for our entropy source instances. Typically, this is done by some sort of curve fitting of the Probability Density Function (PDF) to the measured data. In general, there is no guarantee that a useful form of this function exists. Without knowing the PDF for our model, we must take a different route to determine the model parameters.
As an approximation to the PDF, we use the results of a Monte Carlo simulation of our model. One problem with this approach is the noise in the simulation results. This is problematic because many standard curve fitting algorithms are not designed to deal with both noisy measurements and noise in the function being fitted. We opted for a brute force approach where we select a reasonable parameter range and precompute our simulation-based PDF approximation for this parameter range. Then, to determine the actual model parameters for particular measurement data, we simply select the best result by scanning the entire parameter range. Analogous to the method of least squares, we use the sum of squares of the residuals as a metric.
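The scan over a precomputed parameter range can be sketched as follows. The data layout (a dictionary mapping a parameter tuple to its simulated, normalized histogram) and the function name `scan_parameters` are assumptions for illustration; the metric is the sum of squared residuals described above.

```python
import numpy as np


def scan_parameters(measured_hist, candidates):
    """Select the candidate whose histogram has the smallest sum of squared residuals.

    measured_hist: counts per counter value from a measurement.
    candidates: dict mapping a parameter tuple to a precomputed,
                normalized model histogram (hypothetical layout).
    """
    measured = np.asarray(measured_hist, dtype=float)
    measured = measured / measured.sum()
    best_params, best_ssr = None, np.inf
    for params, model in candidates.items():
        model = np.asarray(model, dtype=float)
        # Pad the shorter histogram with zeros so both cover the same bins.
        n = max(len(measured), len(model))
        m = np.pad(measured, (0, n - len(measured)))
        s = np.pad(model, (0, n - len(model)))
        ssr = float(np.sum((m - s) ** 2))
        if ssr < best_ssr:
            best_params, best_ssr = params, ssr
    return best_params, best_ssr


# Hypothetical candidates with a single parameter each.
candidates = {(1.0,): [0.1, 0.8, 0.1], (2.0,): [0.3, 0.4, 0.3]}
params, ssr = scan_parameters([28, 45, 27], candidates)
```

Because every candidate is evaluated, noise in the simulated histograms can shift the selected optimum but cannot mislead a gradient-based search.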
4.5 Stochastic Model
In TRNG models, it is a common assumption that oscillator jitter is temporally uncorrelated [4, 21, 24]. Due to effects such as flicker noise, this assumption may be too optimistic. An analysis of noise effects on FPGAs was presented by Haddad et al. [18]. To investigate our platform, we performed the experiment they proposed (see Appendix A.1). According to this methodology, we should not exceed a jitter accumulation time of 2.4 µs for thermal noise to contribute 95% to the overall oscillator noise. Since we are well below this limit, we assume for the following model that the jitter realizations are independent. Statistical tests of the generated entropy data can provide additional confidence regarding possible correlations.
For an entropy output of one bit per sample, we select the LSB (without further post-processing). Let \(p_{i}\) be the probability that the LSB of a counter value is equal to i (\(i \in \lbrace 0,1\rbrace\)). Given that \(p_{i}\) is known, we can calculate the min-entropy as (13) \(\begin{align} H_\mathbf {\infty } = -\log _{2} (\max \limits _{i \in \lbrace 0,1\rbrace } \, p_{i}). \end{align}\)
To determine \(p_{i}\), we use our identified model parameters and the results of our Monte Carlo simulation. For the calculation of \(H_\mathbf {\infty }\), we need to account for possible inaccuracies of our \(p_{i}\) estimation. More details are presented in Section 6.4.
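Equation (13) can be evaluated directly from a counter value distribution. The following minimal sketch uses a hypothetical function name and deliberately omits the correction for the inaccuracy of the \(p_{i}\) estimate mentioned above, so it yields the plain point estimate only.

```python
import math


def lsb_min_entropy(counter_hist):
    """Min-entropy of the counter LSB for a given counter histogram.

    counter_hist[c] is the probability (or count) of counter value c.
    No confidence correction is applied (assumption of this sketch).
    """
    total = sum(counter_hist)
    # Odd counter values have an LSB of one.
    p1 = sum(p for c, p in enumerate(counter_hist) if c & 1) / total
    p0 = 1.0 - p1
    return -math.log2(max(p0, p1))


print(lsb_min_entropy([1, 1, 1, 1]))  # balanced LSB
print(lsb_min_entropy([3, 1]))        # biased LSB
```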
5 MODEL VALIDATION
5.1 Motivation and Idea
Simple parameter fitting of experimental data to a modeling function can be misleading. Fitting data to a function becomes easier as the number of parameters increases. For example, a better fit of a more complex model may just be an artifact of its complexity. In this section, we propose an experiment that provides more evidence than a single curve fit. The idea is to use configurable ring elements so that you can construct permutations with a known relationship between their parameters. Since the different configurations share a common set of hardware elements (in our case, certain LUTs and routing resources), the curve fits can be cross-validated by a comparison of model parameters. It is important to note that this approach is only feasible if the curve fit allows one to infer elemental model parameters (e.g., ring element delays).
5.2 Dynamically Reconfigurable Look-Up Tables
Since our experiment is built with dynamically Reconfigurable Look-Up Tables (RLUTs), this technology is introduced in the following. RLUTs are special LUTs that can be reconfigured at runtime using a simple serial interface. This makes it possible to adapt the functionality of specific parts of a design with deterministic timing. For Xilinx UltraScale devices, the primitive is named CFGLUT5 (Figure 13) and can only be implemented in slices of the SLICEM type. It takes 32 cycles to update the complete configuration. It is possible to chain the reconfiguration interface of multiple CFGLUT5 instances [38]. Unlike partial reconfiguration, RLUT reconfiguration cannot change the routing of the design; only the logical function of the LUT can be adapted. Applications in the context of hardware security include hardware Trojans and hardening against side channel attacks [29].
5.3 Implementation
RLUTs do not allow you to change the routing, but it is possible to implement multiple routing options and select one by changing the RLUT function accordingly. The setup for our design is shown in Figure 14: the output of each LUT is connected to two inputs of its successor. Assuming that a LUT should operate as NAND, you then configure the RLUT function to either \(y=\lnot (I0 \wedge I4)\) or \(y=\lnot (I1 \wedge I4)\) to choose between routing paths. Depending on the FPGA architecture, the LUT input may also be important since the timing may be different. For Xilinx FPGAs, the pin mapping between logical and physical LUT inputs can be adapted by synthesis. To control which physical LUT input pins are used, the pin mapping can be fixed [39].
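The two NAND configurations can be expressed as 32-bit truth tables. The sketch below is an illustration only: it assumes the common convention that the truth table bit at address \(\lbrace I4, I3, I2, I1, I0\rbrace\) holds the output for that input combination, which should be checked against the vendor documentation. The helper name `lut5_init` is hypothetical, and the order in which the bits are shifted in through the serial interface is not covered.

```python
def lut5_init(func):
    """Build a 32-bit truth table for a five-input LUT function.

    Assumed convention: bit number (I4 I3 I2 I1 I0), read as a binary
    address, stores the output for that input combination.
    """
    init = 0
    for addr in range(32):
        i0, i1, i2, i3, i4 = ((addr >> bit) & 1 for bit in range(5))
        if func(i0, i1, i2, i3, i4):
            init |= 1 << addr
    return init


# NAND using the routing path connected to I0 (I1 to I3 are ignored).
nand_i0_i4 = lut5_init(lambda i0, i1, i2, i3, i4: not (i0 and i4))
# NAND using the alternative routing path connected to I1.
nand_i1_i4 = lut5_init(lambda i0, i1, i2, i3, i4: not (i1 and i4))

print(f"{nand_i0_i4:08X} {nand_i1_i4:08X}")
```

Switching between the two values selects which of the two routed inputs drives the NAND, while the routing itself stays untouched.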
5.4 Implementation Without RLUTs
The experiment can also be adapted for standard LUTs. Instead of changing the LUT function to select between multiple routing paths, you can also use additional LUT inputs that are connected to configuration bits for the selection.
If runtime configuration is not required, there is also the option to modify the bitfile. For the Xilinx toolchain, the LUT configuration can be adapted by a TCL command. In this way, it is possible to export multiple bitfile versions where only the respective LUT configurations differ.
6 EVALUATION
6.1 Evaluation Setup
The evaluation setup for our design is shown in Figure 15. To access the FPGA hardware from Python, we use a rapid system prototyping framework [27]. This allows us to automate the complete measurement setup including PLL and RLUT configuration in a Python script. Settings like sample size or entropy source selection are stored in a register file. To reduce the amount of measurement data, the histograms of the counter value distributions can optionally be generated in hardware. Our target device is a Xilinx Virtex UltraScale XCVU440.
For the TERO, we evaluate two implementations: one with two LUTs per stage and one with three. Both use an asynchronous counter implementation. For the 3S-TERO, we implemented the shortest possible version with only three LUTs configured as NAND gates. All of these designs are implemented with the CFGLUT5 primitive and are instantiated 32 times at different locations within the FPGA’s fabric.
Among other things, we want to study the effect of process variation. Therefore, we fix relative placement, local routing, and pin mapping for each of the four implementations using a mixture of VHDL attributes and XDC constraints. This approach assumes that positions on the FPGA where identical local routing is possible share an identical or at least very similar physical implementation. In practice, however, there may be differences, such as a mirrored layout or a nearby clock network.
6.2 Stability Regarding Process Variation
As shown in Section 3.1, the motivation for the 3S-TERO design is the assumption that it is less susceptible to the effects of process variation. The practical impact is evident when you compare the histograms of TERO and 3S-TERO implementations (see Appendix A).
To measure the actual differences, we evaluate three metrics. The results are shown in Figure 16. Figure 16(a) shows the mean of the counter values (measured over 500,000 runs) of each instance. For the 3S-TERO, the mean values are quite close, whereas the mean for the TERO varies considerably. For the 3S-TERO, the scaling of the distribution depends on the sampling frequency. Therefore, Figure 16(b) is a fairer comparison: it shows the relative deviation of the mean for one instance from the average mean of the respective implementation. Thus, for each instance \(k,\) we calculate \(\Delta \mu _{k, rel} = |\mu _{k}/ \frac{\sum \nolimits _{i=0}^{31} \mu _{i}}{32} - 1|\). Although this metric remains in the range of 20% for the 3S-TERO, the TERO sometimes exceeds 100%.
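The relative deviation of the mean can be sketched in a few lines of Python. This is only an illustration of the formula above; the function name and the example numbers are hypothetical and are not measurement data.

```python
def relative_mean_deviation(means):
    """Return |mu_k / mean(mu) - 1| for every instance mean mu_k."""
    if not means:
        raise ValueError("at least one instance mean is required")
    average = sum(means) / len(means)
    return [abs(mu / average - 1.0) for mu in means]


# Hypothetical per-instance means of the counter values (not measured data).
example_means = [100.0, 110.0, 90.0, 100.0]
print(relative_mean_deviation(example_means))
```

For 32 instances, `means` would hold the 32 values \(\mu_0, \ldots, \mu_{31}\).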
To assess the dispersion, we calculated the standard deviation across the instances for Figure 16(c). To normalize for the mean, we calculate the relative standard deviation \(c_v = \frac{\sigma }{\mu }\). Again, we find that the 3S-TERO is more stable across the instances.
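As a small illustration, the relative standard deviation can be computed as follows. The sketch assumes that \(\sigma\) and \(\mu\) are the standard deviation and mean of the counter values of a single instance and that the population standard deviation is meant; the text does not specify either choice, and the function name is hypothetical.

```python
import statistics


def relative_std(counter_values):
    """Return c_v = sigma / mu for the counter values of one instance."""
    mu = statistics.fmean(counter_values)
    if mu == 0:
        raise ValueError("mean is zero, c_v is undefined")
    # Population standard deviation; using the sample version instead is a
    # choice that the text does not specify.
    sigma = statistics.pstdev(counter_values)
    return sigma / mu
```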
6.3 Model Validation
6.3.1 Fitting of Related Instances.
For the model validation experiment, it is essential to find a consistent set of parameters so that each change in configuration can be attributed to the change in a corresponding \(\tau ^{\prime }\). We determine the parameters as follows. First, we perform individual curve fits for two curves with distinct configurations. We store the best n parameter sets for each curve. In the second step, we generate all possible parameter sets, resulting in \(n^2\) combinations. Any combination where the parameter \(H_d\) does not match is discarded. For each remaining combination (\(H_d, \tau _{0_a}^{\prime }, \tau _{1_a}^{\prime }, \tau _{2_a}^{\prime }, \tau _{0_b}^{\prime }, \tau _{1_b}^{\prime }, \tau _{2_b}^{\prime }\)), we then generate the possible permutations (e.g., \(H_d, \tau _{1_a}^{\prime }, \tau _{0_a}^{\prime }, \tau _{2_a}^{\prime }, \tau _{0_b}^{\prime }, \tau _{1_b}^{\prime }, \tau _{2_b}^{\prime }\)). These parameter sets are then applied to the simulation model, and we calculate the sum of the curve fit metric. Finally, we select the parameter set with the lowest curve fit error as our result.
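The selection procedure described above can be sketched as follows. This is a simplified illustration, not the implementation used for the experiment: `fit_error` is a hypothetical callable that stands for running the simulation model and returning the curve fit metric, and the sketch assumes that the delays of both curves are permuted over the three stages.

```python
from itertools import permutations, product


def select_consistent_parameters(best_a, best_b, fit_error):
    """Pick the jointly best parameter set for two related curves.

    best_a, best_b: lists of (H_d, (tau0, tau1, tau2)) tuples, the n best
        individual fits for configuration a and configuration b.
    fit_error: hypothetical callable (curve, H_d, taus) -> float that runs
        the simulation model and returns the curve fit metric.
    """
    best = None
    for (hd_a, taus_a), (hd_b, taus_b) in product(best_a, best_b):
        # Discard combinations whose shared parameter H_d does not match.
        if hd_a != hd_b:
            continue
        # Try every assignment of the fitted delays to the three stages.
        for perm_a in permutations(taus_a):
            for perm_b in permutations(taus_b):
                error = (fit_error("a", hd_a, perm_a)
                         + fit_error("b", hd_b, perm_b))
                if best is None or error < best[0]:
                    best = (error, hd_a, perm_a, perm_b)
    return best
```

The function returns `None` if no combination shares the same \(H_d\), and otherwise a tuple with the summed error, \(H_d\), and the two delay assignments.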
Two examples of the resulting curve fits can be found in Figure 17. In general, our curve fitting procedure finds consistent parameter sets that fit the measurement data well. One possible reason for the small remaining difference between curve fit and measurement data is our limited precomputation precision (e.g., the \(\tau ^{\prime }\) values were computed only in steps of 20).
6.3.2 Computational Effort.
For our system, we calculated 1,400 rounds per simulation and performed 300,000 simulations per parameter set. The precomputation took about 10,000 CPU hours. In the second step, the compression due to the asynchronous evaluation logic is applied. The final result can be compressed to less than 15 MByte. Based on this dataset, scanning the entire parameter space for the best fit takes only a few minutes on a standard computer.
6.4 Entropy
To calculate \(H_\mathbf {\infty }\) for our design (see Section 4.5), we select the configuration with the widest distribution and identify the parameters as previously described. The calculation of \(H_\mathbf {\infty }\) relies on \(p_i\) for which two factors may lead to an inaccurate estimate. First, the Monte Carlo simulation approach causes statistical fluctuation in the values we determine for \(p_i\). Second, there may be uncertainty in the model parameters caused by the model fitting procedure, e.g., because the accuracy of the model parameters is limited by the step size of the parameters. A similar effect might be caused by different operating conditions.
To estimate the error caused by Monte Carlo simulation, we select one parameter set and calculate the probability of the counter LSBs based on multiple simulation runs. The result is shown in Figure 18(a). The deviation from the mean of most values is less than 0.5%. To account for uncertainties in the model parameters, we calculate \(H_\mathbf {\infty }\) for the variation of \(\tau ^{\prime }\) in a range of \(\pm 40\). For a conservative entropy estimate, we select the configuration with the lowest \(H_\mathbf {\infty }\). Additionally, we consider the possible 0.5% error of the Monte Carlo simulation. Under these conditions, we determined \(H_\mathbf {\infty }\) to be above 0.91 for all instances (see Figure 18(b)). If you use one bit per sample, an operating frequency of 100 MHz, and 60 cycles per sample, the throughput is about \(1.6 \cdot 10^6\) bit/s. Although a guaranteed min-entropy of 0.91 bit would be sufficient for the current version of AIS-31 [20], the draft for a revision [31] indicates that future standards might require a higher level of entropy per bit. In this case, further algorithmic post-processing is required. To validate the theoretical result, we performed test procedure B (T6–T8) as described in AIS-31. The tests pass and provide a Shannon entropy estimate of at least 0.9994 per bit for all instances.
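The conservative estimate can be illustrated with a short sketch. It assumes the standard definition of min-entropy, \(H_\infty = -\log_2 \max_i p_i\), and it treats the 0.5% Monte Carlo error as a relative error on the largest probability; both the function name and this interpretation of the error margin are assumptions made for illustration.

```python
import math


def conservative_min_entropy(probabilities, rel_error=0.005):
    """Return -log2 of the largest probability, inflated by rel_error.

    probabilities: estimated p_i of the output symbols (should sum to 1).
    rel_error: assumed relative Monte Carlo error on each p_i.
    """
    p_max = max(probabilities) * (1.0 + rel_error)
    p_max = min(p_max, 1.0)  # a probability cannot exceed 1
    return -math.log2(p_max)
```

Repeating the calculation for each \(\tau ^{\prime }\) variation and keeping the smallest result corresponds to the conservative selection described above.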
6.5 NIST SP 800-90B IID and Non-IID Tests
For additional validation and comparison between the TERO and the 3S-TERO, we performed the IID and non-IID entropy estimations from NIST SP 800-90B for 32 instances each. For the TERO, we chose the same entropy output as for the 3S-TERO (LSB without further post-processing). For the 3S-TERO (see Table 3), all 32 instances passed the IID tests. For the TERO (see Table 4), four instances failed the IID tests. All four failed the IID permutation test, and three of them also failed the chi-square independence test.
Publication | Application | Technology | TERO Length | Counter Type |
---|---|---|---|---|
[8] | PUF | Cyclone III, ASIC (350 nm) | Various | (a)1 |
[17] | TRNG | Artix-7 XC7A35T | 8 | (b)2 |
[34] | TRNG | Spartan 3E | 4 | (c) |
[13] | TRNG, PUF | Zynq-7000 | 8–64 | (d) |
Length refers to the total number of elements (e.g., buffer, inverter, or NAND gates) of a TERO instance.
| Type | Platform | Area | Throughput (Mbit/s) | Stochastic Model | Multiple Instances |
|---|---|---|---|---|---|
| Meta-stability [12] | Altera Cyclone V | 4 LUTs, 3 FFs | 0.76 | ✘ | ✘ |
| | Microsemi SmartFusion 2 | | | | |
| Multi-stage feedback ring oscillator [11] | Xilinx Spartan 6 | 24 LUTs, 2 FFs | 150 | ✘ | ✘ |
| | Xilinx Virtex 6 | | 290 | | |
| RO with configurable delay [23] | Xilinx Spartan 3 | 528 LUTs, 177 FFs | 6 | ✘ | ✘ |
| Digital nonlinear oscillator [1] | Xilinx Artix 7 | 15 LUTs | 100 | ✘ | 6 FPGAs, 16 locations each |
| STR/Jitter-latch structure [36] | Xilinx Spartan 6 | 56 LUTs, 19 FFs | 100 | ✘ | \(\checkmark\) |
| | Xilinx Virtex 6 | | | | |
| ERO [26] | Xilinx Spartan 6 | 46 LUTs, 19 FFs | 0.0042 | [2] | ✘ |
| | Altera Cyclone V | 34 LUTs, 20 FFs | 0.0027 | | |
| | Microsemi SmartFusion 2 | 45 LUTs, 19 FFs | 0.014 | | |
| COSO [26] | Xilinx Spartan 6 | 18 LUTs, 3 FFs | 0.54 | \(\checkmark\) | ✘ |
| | Altera Cyclone V | 13 LUTs, 3 FFs | 1.44 | | |
| | Microsemi SmartFusion 2 | 23 LUTs, 3 FFs | 0.328 | | |
| Configurable COSO [25] | Xilinx Spartan 6 | 108 LUTs, 39 FFs | 3.3 | \(\checkmark\) | \(\checkmark\) |
| | Microsemi SmartFusion 2 | 111 LUTs, 38 FFs | 1.47 | | |
| | Xilinx Spartan 7 | 82 LUTs, 62 FFs/58 LUTs, 62 FFs | 4.65/2.34 | | |
| MURO [26] | Xilinx Spartan 6 | 521 LUTs, 131 FFs | 2.57 | [32] | ✘ |
| | Altera Cyclone V | 525 LUTs, 130 FFs | 2.2 | | |
| | Microsemi SmartFusion 2 | 545 LUTs, 130 FFs | 3.62 | | |
| PLL [26] | Xilinx Spartan 6 | 34 LUTs, 14 FFs | 0.44 | [3] | ✘ |
| | Altera Cyclone V | 24 LUTs, 14 FFs | 0.6 | | |
| | Microsemi SmartFusion 2 | 30 LUTs, 15 FFs | 0.37 | | |
| TERO [26] | Xilinx Spartan 6 | 39 LUTs, 12 FFs (see footnote 4) | 0.625 | [4] | ✘ |
| | Altera Cyclone V | 46 LUTs, 12 FFs | 1 | | |
| | Microsemi SmartFusion 2 | 46 LUTs, 12 FFs | 1 | | |
| STR [26] | Xilinx Spartan 6 | 346 LUTs, 256 FFs | 154 | [10] | ✘ |
| | Altera Cyclone V | 352 LUTs, 256 FFs | 245 | | |
| | Microsemi SmartFusion 2 | 350 LUTs, 256 FFs | 188 | | |
| High-precision edge sampling [40] | Xilinx Spartan 6 | 10 LUTs, 5 FFs + counter | 1.15 | \(\checkmark\) | ✘ |
| | Altera Cyclone V | 10 LUTs, 6 FFs + counter | 1.067 | | |
| 3S-TERO (this work) | Xilinx Virtex UltraScale | 5 LUTs, 3 FFs + counter | 1.6 | \(\checkmark\) | 32 instances |
In the case of non-IID entropy outputs, NIST provides a set of entropy tests that aim to be diverse and conservative. In general, these tests tend to underestimate entropy [30]. We use a sample size of about 256 million samples per instance. For the 3S-TERO, the tests give fairly stable results across all instances (Figure 19(b)). According to these tests, the target of 0.91 bit of entropy per sample is achieved. The results for the TERO (see Figure 19(a)) correlate with the results of the IID test: the instances that failed the independence test are significantly weaker. Overall, both IID and Non-IID tests confirm that the proposed design provides more consistent results. The direct comparison indicates that the design goal was indeed achieved.
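To give an impression of how such an estimator works, the following sketch implements the most common value estimate, which is one of the estimators defined in NIST SP 800-90B. It is shown for illustration only; the results reported here were obtained with the complete test suite, and the function name is hypothetical.

```python
import math
from collections import Counter


def most_common_value_estimate(samples):
    """Min-entropy per sample via the SP 800-90B most common value estimate."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    # Relative frequency of the most common symbol.
    p_hat = Counter(samples).most_common(1)[0][1] / n
    # Upper bound of the 99% confidence interval on that probability.
    p_upper = min(1.0, p_hat + 2.576 * math.sqrt(p_hat * (1.0 - p_hat) / (n - 1)))
    return -math.log2(p_upper)
```

Because the estimate uses an upper confidence bound on the probability, it stays below 1 bit even for a perfectly balanced binary sequence of finite length.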
6.6 Temperature
To evaluate the temperature effect on random number generation, we performed experiments in a climate cabinet (Figure 20; only instance 32 was tested). We evaluated the data using the NIST Non-IID tests, as these provide a diverse set of entropy estimates for comparison. Overall, we did not detect any particular effect for the temperature range evaluated.
6.7 Restart Tests
It is possible for a design to have a similar pattern after each restart. To evaluate the restart behavior, you can record the initial output sequence for several restarts of the design [7, 11, 23, 36]. For the test, we enable the reset for a couple of seconds, release the reset, acquire the initial random data generated, and record the output sequence for several restarts (Figure 21). The results do not indicate a particular repeating pattern.
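One simple way to look for a repeating start pattern numerically is to compare the same bit position across all restarts. The sketch below is only one possible check under the assumption that each restart yields an equally long bit sequence; it is not the evaluation behind Figure 21, and the function name is hypothetical.

```python
def per_position_ones_fraction(restart_sequences):
    """Fraction of ones at each bit position across several restarts.

    restart_sequences: list of equally long bit lists, one per restart.
    """
    if not restart_sequences:
        raise ValueError("no restart sequences given")
    length = len(restart_sequences[0])
    if any(len(seq) != length for seq in restart_sequences):
        raise ValueError("all restart sequences must have the same length")
    count = len(restart_sequences)
    return [sum(seq[pos] for seq in restart_sequences) / count
            for pos in range(length)]
```

Positions whose fraction stays at 0 or 1 over many restarts would indicate output bits that repeat after every reset.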
6.8 Comparison
Table 2 provides a comparison between our 3S-TERO implementation and other FPGA TRNGs. Our implementation has one of the lowest resource consumptions. In terms of throughput, there are both slower and faster designs. To compare the throughput, one needs to take into account that our platform is manufactured in a more modern technology node, so the numbers are not directly comparable. The fifth column lists the availability of stochastic modeling, which is required for entropy estimation of modern standards. For many designs, this is lacking, whereas for other designs, a detailed theoretical analysis is provided. The last column lists whether validation across multiple instances is reported.
With respect to active and passive attacks, we did not perform specific experiments. The results of Cao et al. [6] indicate that TEROs are resistant to low temperatures but vulnerable to underpower attacks. The work of Mureddu et al. [22] shows that TEROs are susceptible to the locking phenomenon. We expect similar results for our proposed implementation.
In a direct comparison with the original 3S-TERO ASIC implementation of Yang et al. [41], there are some differences worth mentioning. Our proposed evaluation logic is much simpler and smaller than a PFD with a reference oscillator. Additionally, the synchronous counter implementation fits better into a standard FPGA development flow and does not face LSB mismatch issues. These advantages come at the cost of resolution, since the counter distribution is less wide.
7 CONCLUSION
In this work, we introduced the 3S-TERO, a novel entropy source for FPGAs. Although its resource usage is comparable to the TERO, our results showed that it is much more robust against the effects of process variation. The counter implementation avoids using the oscillator as a clock at the cost of resolution. We proposed a model for our design that fits the measurement data. Furthermore, we constructed related instances that share certain model parameters and show that model and experimental results are consistent. Both theoretical and experimental results showed that a sufficient level of entropy can be achieved across all instances.
A APPENDIX
A.1 Experimental Noise Analysis
To investigate the noise parameters for our specific platform, we performed the analysis of noise effects that has been presented by Haddad et al. [18]. The proposed experiment compares the jitter between two oscillators, where one stops the other after N oscillations and the number of oscillations for the second oscillator is recorded for different values of N.
For their experiment, Haddad et al. [18] derive that the overall jitter variance \(\sigma _N^2\) is composed of a thermal noise contribution (\(\sigma _{N, th}^2 \propto N\)) and a flicker noise contribution (\(\sigma _{N, fl}^2 \propto N^2\)):
\(\begin{align} \sigma _N^2 & = \underbrace{\frac{2 \cdot b_{th}}{f_0^3} N}_{\sigma _{N, th}^2} + \underbrace{\frac{8 \cdot \ln (2) \cdot b_{fl}}{f_0^4} N^2}_{\sigma _{N, fl}^2}. \tag{14} \end{align}\)
We can use \(r_N := \frac{\sigma _{N, th}^2}{\sigma _N^2}\) as a metric of the thermal noise contribution. We determine \(\alpha := \frac{2 \cdot b_{th}}{f_0^3}\) and \(\beta := \frac{8 \cdot \ln (2) \cdot b_{fl}}{f_0^4}\) experimentally by measuring \(\sigma _N^2\) (and \(f_0\)) for various values of N and performing a curve fit on this data. Based on these parameters, we can calculate the upper limit of N for a given limit \(r_{N, min}\):
\(\begin{align} r_N & = \frac{\alpha \cdot N}{\alpha \cdot N + \beta \cdot N^2} = \frac{\alpha }{\alpha + \beta N}, \tag{15} \end{align}\)
\(\begin{align} N &\lt \frac{\alpha }{\beta }\cdot \left(\frac{1}{r_{N, min} } - 1\right). \tag{16} \end{align}\)
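The curve fit for \(\alpha\) and \(\beta\) can be sketched as an ordinary least-squares problem, because the model \(\sigma _N^2 = \alpha N + \beta N^2\) is linear in both parameters. The sketch assumes an unweighted fit, which the text does not specify, and the function name is hypothetical.

```python
def fit_jitter_model(n_values, variances):
    """Least-squares fit of sigma_N^2 = alpha * N + beta * N^2."""
    # Entries of the 2x2 normal equations for the basis functions N and N^2.
    s11 = sum(n ** 2 for n in n_values)
    s12 = sum(n ** 3 for n in n_values)
    s22 = sum(n ** 4 for n in n_values)
    b1 = sum(n * v for n, v in zip(n_values, variances))
    b2 = sum(n ** 2 * v for n, v in zip(n_values, variances))
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("need at least two distinct values of N")
    alpha = (b1 * s22 - b2 * s12) / det
    beta = (s11 * b2 - s12 * b1) / det
    return alpha, beta
```

The ratio `alpha / beta` returned by such a fit is the quantity that enters the bound on N.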
Following this methodology, we found that \(\frac{\alpha }{\beta } \approx 18 \cdot 10^3\) at \(f_0 \approx 392\) MHz (Figure A.1). So, for example, for \(r_{N, min} = 0.95,\) the limit is about 947 oscillation periods, which means that we can accumulate the oscillator jitter for around 2.4 µs.
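The arithmetic behind this example can be reproduced with a few lines of Python. The function name is hypothetical; the inputs are the values stated above.

```python
def max_accumulation(alpha_over_beta, r_min, f0_hz):
    """Return (N_max, t_max) from N < (alpha/beta) * (1/r_min - 1)."""
    if not 0.0 < r_min <= 1.0:
        raise ValueError("r_min must be in (0, 1]")
    n_max = alpha_over_beta * (1.0 / r_min - 1.0)
    # Convert the number of oscillation periods into a time span.
    return n_max, n_max / f0_hz


n_max, t_max = max_accumulation(18e3, 0.95, 392e6)
print(f"N < {n_max:.0f}, t < {t_max * 1e6:.2f} us")  # N < 947, t < 2.42 us
```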
A.2 NIST SP 800-90B IID Test Results
A.3 TERO Histograms
For a better comparison, the plots in Figures A.2 through A.5 are all scaled to the same x-axis range. The chosen range is a compromise between wide and narrow distributions. Each plot represents a single instance.
A.4 3S-TERO Histograms
A.5 Model Validation Experiment
Footnotes
1 The source code for Spartan 6 and Cyclone V FPGAs on the project website uses a synchronous counter with no clock buffers explicitly instantiated.
2 According to the source code on GitHub.
3 In the Xilinx library, this input is called pre.
4 The TEROs we implemented for this work were shorter (4 and 6 LUTs, respectively, plus an asynchronous counter).
References
- [1] 2020. A new class of digital circuits for the design of entropy sources in programmable logic. IEEE Transactions on Circuits and Systems I: Regular Papers 67, 7 (July 2020), 2419–2430.
- [2] 2011. On the security of oscillator-based random number generators. Journal of Cryptology 24, 2 (April 2011), 398–425.
- [3] 2010. Mathematical model of physical RNGs based on coherent sampling. Tatra Mountains Mathematical Publications 45, 1 (Dec. 2010), 1–14.
- [4] 2019. From physical to stochastic modeling of a TERO-based TRNG. Journal of Cryptology 32, 2 (April 2019), 435–458.
- [5] 2014. A PUF based on a transient effect ring oscillator and insensitive to locking phenomenon. IEEE Transactions on Emerging Topics in Computing 2, 1 (2014), 30–36.
- [6] 2016. Exploring active manipulation attacks on the TERO random number generator. In Proceedings of the 2016 IEEE 59th International Midwest Symposium on Circuits and Systems (MWSCAS’16). IEEE, Los Alamitos, CA, 1–4.
- [7] 2021. A feedback architecture of high speed true random number generator based on ring oscillator. In Proceedings of the 2021 IEEE Asian Solid-State Circuits Conference (A-SSCC’21). 1–3.
- [8] 2016. Design, evaluation, and optimization of physical unclonable functions based on transient effect ring oscillators. IEEE Transactions on Information Forensics and Security 11, 6 (June 2016), 1291–1305.
- [9] 2013. A self-timed ring based true random number generator. In Proceedings of the 2013 IEEE 19th International Symposium on Asynchronous Circuits and Systems (ASYNC’13). IEEE, Los Alamitos, CA, 99–106.
- [10] 2013. A very high speed true random number generator with entropy assessment. In Proceedings of the International Workshop on Cryptographic Hardware and Embedded Systems. 179–196.
- [11] 2022. Design of true random number generator based on multi-stage feedback ring oscillator. IEEE Transactions on Circuits and Systems II: Express Briefs 69, 3 (March 2022), 1752–1756.
- [12] 2022. A novel ultra-compact FPGA-compatible TRNG architecture exploiting latched ring oscillators. IEEE Transactions on Circuits and Systems II: Express Briefs 69, 3 (March 2022), 1672–1676.
- [13] 2019. Refutation and Redesign of a Physical Model of TERO-Based TRNGs and PUFs. Paper 2019/810. IACR Cryptology ePrint Archive.
- [14] 2010. Analysis of randomness sources in transition effect ring oscillator based TRNG. In Proceedings of the International Workshop on Cryptographic Architectures Embedded in Reconfigurable Devices (CryptArchi’10). 102–107.
- [15] 2007. NIST Special Publication 800-38D - Recommendation for Block Cipher Modes of Operation: Galois/Counter Mode (GCM) and GMAC. National Institute of Standards & Technology.
- [16] 2003. True random number generator embedded in reconfigurable hardware. In Cryptographic Hardware and Embedded Systems—CHES 2002. Lecture Notes in Computer Science, Vol. 2523. Springer, 415–430.
- [17] 2020. On the feasibility of TERO-based true random number generator on Xilinx FPGAs. In Proceedings of the 2020 30th International Conference on Field-Programmable Logic and Applications (FPL’20). IEEE, Los Alamitos, CA, 103–108.
- [18] 2014. On the assumption of mutual independence of jitter realizations in P-TRNG stochastic models. In Proceedings of the 2014 Design, Automation, and Test in Europe Conference and Exhibition (DATE’14). 1–6.
- [19] 2016. An all-digital edge racing true random number generator robust against PVT variations. IEEE Journal of Solid-State Circuits 51, 4 (April 2016), 1022–1031.
- [20] 2011. A Proposal for: Functionality Classes for Random Number Generators (v2.0). BSI.
- [21] 2023. Jitter-based adaptive true random number generation circuits for FPGAs in the cloud. ACM Transactions on Reconfigurable Technology and Systems 16, 1 (Sept. 2023), Article 3, 20 pages.
- [22] 2019. Experimental study of locking phenomena on oscillating rings implemented in logic devices. IEEE Transactions on Circuits and Systems I: Regular Papers 66, 7 (July 2019), 2560–2571.
- [23] 2020. FPGA-based true random number generation using programmable delays in oscillator-rings. IEEE Transactions on Circuits and Systems II: Express Briefs 67, 3 (March 2020), 570–574.
- [24] 2021. Design and analysis of configurable ring oscillators for true random number generation based on coherent sampling. ACM Transactions on Reconfigurable Technology and Systems 14, 2 (June 2021), Article 7, 20 pages.
- [25] 2021. Design and analysis of configurable ring oscillators for true random number generation based on coherent sampling. ACM Transactions on Reconfigurable Technology and Systems 14, 2 (June 2021), 1–20.
- [26] 2016. A survey of AIS-20/31 compliant TRNG cores suitable for FPGA devices. In Proceedings of the 2016 26th International Conference on Field Programmable Logic and Applications (FPL’16). 1–10.
- [27] 2014. Hardware/software infrastructure for ASIC commissioning and rapid system prototyping. In Proceedings of the 2014 International Conference on ReConFigurable Computing and FPGAs (ReConFig’14). 1–6.
- [28] 1990. Oscillatory metastability in homogeneous and inhomogeneous flip-flops. IEEE Journal of Solid-State Circuits 25, 1 (Feb. 1990), 254–264.
- [29] 2018. The conflicted usage of RLUTs for security-critical applications on FPGA. Journal of Hardware and Systems Security 2, 2 (June 2018), 162–178.
- [30] 2021. On entropy and bit patterns of ring oscillator Jitter. In Proceedings of the 2021 Asian Hardware Oriented Security and Trust Symposium (AsianHOST’21). IEEE, Los Alamitos, CA, 1–6.
- [31] 2022. A Proposal for Functionality Classes for Random Number Generators (v2.35). BSI.
- [32] 2007. A provably secure true random number generator with built-in tolerance to active attacks. IEEE Transactions on Computers 56, 1 (Jan. 2007), 109–119.
- [33] 2009. New FPGA based TRNG principle using transition effect with built-in malfunction detection. In Proceedings of the International Workshop on Cryptographic Architectures Embedded in Reconfigurable Devices (CryptArchi’09). 150–155.
- [34] 2010. New high entropy element for FPGA based true random number generators. In Cryptographic Hardware and Embedded Systems, CHES 2010. Lecture Notes in Computer Science, Vol. 6225. Springer, 351–365.
- [35] 2013. New universal element with integrated PUF and TRNG capability. In Proceedings of the 2013 International Conference on Reconfigurable Computing and FPGAs (ReConFig’13). IEEE, Los Alamitos, CA, 1–6.
- [36] 2021. High-throughput portable true random number generator based on Jitter-Latch structure. IEEE Transactions on Circuits and Systems I: Regular Papers 68, 2 (Feb. 2021), 741–750.
- [37] 2019. Virtex UltraScale FPGAs Data Sheet: DC and AC Switching Characteristics (DS893, v1.12). Xilinx.
- [38] 2021. UltraScale Architecture Libraries Guide (UG974, v2021.2). Xilinx.
- [39] 2021. Vivado Design Suite User Guide: Using Constraints (UG903, v2021.2). Xilinx.
- [40] 2018. ES-TRNG: A high-throughput, low-area true random number generator based on edge sampling. IACR Transactions on Cryptographic Hardware and Embedded Systems 2018, 3 (Aug. 2018), 267–292.
- [41] 2014. 16.3 A 23Mb/s 23pJ/b fully synthesized true-random-number generator in 28nm and 65nm CMOS. In Proceedings of the 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC’14). 280–281.