A fault-tolerant variational quantum algorithm with limited T-depth

Hasan Sayginel; Francois Jamet; Abhishek Agarwal; Dan E Browne; Ivan Rungger

doi:10.1088/2058-9565/ad0571

1. Introduction

Quantum computing has seen significant progress in the last decade, leading to the development of quantum processors with an increasing number of qubits. These noisy intermediate-scale quantum (NISQ) processors are currently being investigated for a possible advantage over classical computers [1–4]. However, anticipated general quantum algorithms with a proven exponential speed-up over their classical counterparts lie outside of the reach of NISQ processors, because of the limited physical qubit numbers as well as the accumulation of noise for deep circuits [5–7]. Such algorithms include Shor's algorithm [8] and quantum phase estimation (QPE) [9]. Therefore, for the full potential of quantum computing to be achieved, fault-tolerant quantum computers (FTQCs) will be needed.

An essential part of developing an FTQC is the incorporation of quantum error correction (QEC) into the processors, which will not only increase the coherence times of the resulting logical qubits but also actively correct for errors in faulty operations as the information is being processed. To this end, there have been recent prototype demonstrations of QEC on superconducting [10] and ion-trap devices [11]. Furthermore, recently error suppression with increasing code-size was demonstrated for the surface code [12] despite an increased number of physical qubits and gate operations. While the gate error levels in current hardware are not yet sufficiently low for large-scale error correction, with rapid progress in the field, error-corrected quantum computers with a small number of logical qubits may emerge in the next few years.

The main difference between the quantum circuits achievable on near-term, so-called NISQ, computers and those achievable on future error-corrected quantum computers is the range of quantum gates available at the logical level. Typically, NISQ devices allow continuously parameterized single-qubit rotations, such as an arbitrary z-rotation, $R_z(\theta)$, where $\theta\in[0,2\pi]$. On the other hand, FTQCs only allow a fixed set of discrete rotation angles, such as the $\pi/4$ rotation about the z-axis, also known as the T-gate. For this reason, any algorithm must first be compiled into a code-specific fault-tolerant (FT) gate-set before it can be run on the logical qubits. FT gate synthesis has been investigated in depth, and a number of algorithms have been proposed [13–17].

In this article we study the variational quantum eigensolver (VQE) algorithm [18] in the context of FT quantum computation. VQE uses the variational principle to compute an upper bound for the ground state energy of a Hamiltonian by using a parameterized quantum circuit that approximates the ground state of the Hamiltonian after optimization of the parameters in the quantum circuit. Computing an approximation for the ground state and energy of a Hamiltonian is generally the first step in probing the energetic properties of physical systems, a problem that commonly arises in condensed matter physics and quantum chemistry [19]. As a result, VQE has potential practical applications in fields such as material science [20, 21] and drug discovery [22, 23]. Furthermore, general cost functions can typically be represented as a Hamiltonian, whose ground state, obtained with a variational algorithm, yields the optimized result for the given cost function. Therefore, results obtained for VQE are generally also applicable to optimization, in particular using the quantum approximate optimization algorithm (QAOA) [24], and to quantum machine learning [25, 26].

Due to its partial resilience to noise for moderate numbers of qubits and circuit depths, VQE has been studied as a near-term algorithm and used to perform a number of proof of concept demonstrations on noisy quantum hardware. However, the scaling to larger systems is hindered by the noise in the hardware, since the noise mitigation methods used successfully in small experiments become very expensive to scale up to larger circuits [19, 27]. As the system sizes increase, it is still an open question as to whether the VQE algorithm by itself will be an efficient method to compute the ground state of a Hamiltonian, since a number of challenges need to be solved, as outlined in [19].

For such large-scale systems there are a number of alternative algorithms requiring fault-tolerance, which in principle allow one to obtain desired states on a quantum computer with very high quality. These target states are typically ground states of a Hamiltonian, which can either represent a physical system or a general cost function in optimization and machine learning tasks. Such methods with known scalability include the QPE algorithm and the so-called Rodeo algorithm [28]. However, all these methods rely on the necessary condition that the initial state, to which the method is applied, already has a sufficient overlap with the target state. The problem is that as the systems scale up in size, the overlap can become exponentially small [20], in which case the use of these methods becomes impossible. As a potential solution to this problem we propose to integrate these methods with an FT implementation of VQE, which we call FT-VQE. For large systems FT-VQE may not directly obtain the final target state with very high accuracy, but it will likely provide sufficient overlap with the final target state to be used as the initial state in the algorithms above (for example QPE or Rodeo algorithms). We therefore expect FT-VQE to become an integral part of such applications in fault-tolerant devices for this initial state preparation step. We note that FT-VQE would not solve all the intrinsic scaling challenges of VQE, such as the difficulty in the classical optimization of the circuit parameters for large systems [19, 29, 30]. However, in the FT setting, we expect VQE to be used alongside algorithms with known scalability such as QPE [9].

The hurdle which must be overcome is that current VQE quantum circuits, designed for near-term hardware, include continuously parameterized rotation gates. The rotation angles are optimized on a classical computer, and the energies used in the optimization process for a given set of angles are obtained from the quantum computer. It had not yet been investigated how well this classical optimization process can be performed given the limitations of an FT discrete gate-set, where a continuous range of parameters can only be achieved in the limit of large numbers of T-gates, and whether the necessary approximations in the gates for circuits with finite T-gate depth would affect the convergence of the algorithm. In this article, we address this question and show that by integrating the Ross–Selinger (RS) recompilation of a continuous rotation to an FT gate-set into our algorithm, we can obtain systematic convergence of the VQE algorithm. We find that there is no slowdown in convergence efficiency when compared to the conventional VQE circuits with parameterized hardware gates if an adaptive RS recompilation accuracy is used.

The structure of the paper is as follows. Section 2 discusses the methods, introducing the FT Clifford+T gate-set, the RS algorithm used for FT gate synthesis, and an FT implementation of the VQE algorithm. Section 3 introduces the two spin models that we use to test the method, and then the results for both fixed angles of a quantum circuit and for a full VQE loop performed using our implementation with an FT gate-set. In section 4 we discuss the conclusions.

2. Methods

2.1. FT gate-set

The universal FT gate-set used in this study is the Clifford+T gate-set. The Clifford set in this study consists of Hadamard, phase-gate, and controlled-NOT gates, $\{H,S,\text{CNOT}\}$ , which can generate any N-qubit Clifford operation. With the addition of the T-gate, any unitary lying in $\text{SU}(2^N)$ can be approximated up to any target accuracy by a sequence of Clifford+T gates [31].

The T-gate is expressed in matrix form as

$\begin{equation} T = \begin{bmatrix} 1 & 0 \\ 0 & \mathrm{e}^{\mathrm{i}\pi/4} \end{bmatrix}. \end{equation} \tag{ 1 }$

There are diverse ways of achieving FT logic gates depending on the underlying code. However, in many codes, including the surface code [32] (currently the most promising quantum error correcting code due to its high threshold and simple two-dimensional modular implementation), Clifford group gates can be performed in a direct and fast method, while T-gates are more expensive.

In order to implement the T-gates in these codes, a non-unitary technique such as magic-state distillation and injection is required. This procedure has a spatial and temporal overhead due to the additional ancillary qubits and physical gates needed [33]. Therefore, an important metric in the analysis of the performance of an FT quantum algorithm implemented with Clifford+T gates is the number of T-gates.

In our work, we are, therefore, interested in the two following metrics.

T-count: T-count is defined as the total number of T-gates in the FT quantum circuit.
T-depth: T-depth is defined as the total number of layers of T-gates, where within each layer parallel execution of the T-gates on different logical qubits is possible.
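
To make the two metrics concrete, the following minimal sketch counts them for a toy circuit. The circuit representation (a list of `(name, qubits)` tuples), the gate names and the function `t_count_and_depth` are illustrative assumptions and not part of the software used in this work. Clifford gates are treated as free, and the T-depth is taken as the longest chain of T-gates that must be executed sequentially.

```python
def t_count_and_depth(circuit, num_qubits):
    """Return (T-count, T-depth) of a circuit given as a list of (name, qubits)."""
    depth = [0] * num_qubits          # number of T-layers accumulated on each wire
    count = 0
    for name, qubits in circuit:
        if name in ("T", "Tdg"):
            count += 1
            depth[qubits[0]] += 1     # one more T-layer on this wire
        elif len(qubits) > 1:
            # A multi-qubit Clifford gate synchronizes the wires it touches.
            layer = max(depth[q] for q in qubits)
            for q in qubits:
                depth[q] = layer
    return count, max(depth)


circuit = [
    ("H", (0,)), ("T", (0,)), ("T", (1,)),   # two T-gates that fit in one layer
    ("CNOT", (0, 1)),
    ("T", (1,)), ("S", (0,)), ("T", (1,)),   # two sequential T-gates on qubit 1
]
t_count, t_depth = t_count_and_depth(circuit, num_qubits=2)
print(t_count, t_depth)
```

In this toy circuit the first two T-gates act on different qubits and share a layer, so the T-depth is smaller than the T-count.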

2.2. RS algorithm

The RS algorithm computes approximations of arbitrary single qubit z-rotations, $R_z(\theta)$ over the Clifford+T gates. It achieves this by approximating $R_z(\theta)$ with another unitary U, where U has an exact decomposition over Clifford+T gates up to the single-qubit global phase, $\omega = \mathrm{e}^{\mathrm{i}\frac{\pi}{4}}$ [14]

$\begin{equation} R_z\left(\theta\right) = \begin{bmatrix} \mathrm{e}^{-\mathrm{i}\frac{\theta}{2}} & 0 \\ 0 & \mathrm{e}^{\mathrm{i}\frac{\theta}{2}} \end{bmatrix} \xrightarrow{\text{RS}} U = \prod_m U_m\in\left\{\omega,H,S,T\right\}. \end{equation} \tag{ 2 }$

The algorithm then outputs the Clifford+T decomposition of the unitary U. The error in this approximation is given as

$\begin{equation} ||R_z\left(\theta\right)-U||\leqslant\epsilon = 10^{-d} \end{equation} \tag{ 3 }$

where $||.||$ is the operator norm, $\epsilon$ is the error bound, and d is the digit accuracy. The accuracy of the decomposition can be systematically improved with increasing T-depth. Furthermore, the RS algorithm is efficient in the number of T-gates, which in the typical case scales with respect to $\epsilon$ as $3\log_2(\frac{1}{\epsilon}) + O(\log(\log(\frac{1}{\epsilon})))$. This is in contrast to the Solovay–Kitaev algorithm [13], which achieves T-counts of $O(\log^c(\frac{1}{\epsilon}))$, where c is a constant, making RS more favorable for c > 3.

Rotations about other axes, such as $R_x$ and $R_y$, can be achieved with the corresponding $R_z$ rotation and additional Clifford gates as

$\begin{equation} R_x\left(\theta\right) = HR_z\left(\theta\right)H, \;\;\; R_y\left(\theta\right) = SHR_z\left(\theta\right)HS^\dagger. \end{equation} \tag{ 4 }$

Therefore, the RS transpilation can be used to find the Clifford+T approximation of any $\text{SU(2)}$ unitary.
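
The two identities in equation (4) can be checked numerically. The sketch below uses only the Python standard library; the helper names (`matmul`, `close`, `rz`, `rx`, `ry`) are introduced here for illustration, and the usual conventions $R_x(\theta) = \mathrm{e}^{-\mathrm{i}\frac{\theta}{2}X}$ and $R_y(\theta) = \mathrm{e}^{-\mathrm{i}\frac{\theta}{2}Y}$ are assumed.

```python
import cmath
import math


def matmul(*matrices):
    """Multiply 2x2 matrices from left to right."""
    out = [[1, 0], [0, 1]]
    for m in matrices:
        out = [[sum(out[i][k] * m[k][j] for k in range(2)) for j in range(2)]
               for i in range(2)]
    return out


def close(a, b, tol=1e-12):
    """Entry-wise comparison of two 2x2 matrices."""
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))


def rz(t):
    return [[cmath.exp(-0.5j * t), 0], [0, cmath.exp(0.5j * t)]]


def rx(t):
    c, s = math.cos(t / 2), math.sin(t / 2)
    return [[c, -1j * s], [-1j * s, c]]


def ry(t):
    c, s = math.cos(t / 2), math.sin(t / 2)
    return [[c, -s], [s, c]]


r = 1 / math.sqrt(2)
H = [[r, r], [r, -r]]
S = [[1, 0], [0, 1j]]
S_DG = [[1, 0], [0, -1j]]

theta = 0.7321  # arbitrary test angle
rx_ok = close(matmul(H, rz(theta), H), rx(theta))
ry_ok = close(matmul(S, H, rz(theta), H, S_DG), ry(theta))
print(rx_ok, ry_ok)
```

Only the central $R_z(\theta)$ factor needs an RS decomposition; the surrounding H and S gates are Clifford gates and add no T-gates.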

The software implementation of the RS algorithm is available in an open-source software package [34]. Figure 1 shows practical numbers for the number of T gates in the RS decomposition of $R_z(\theta)$ rotations in the range $\theta\in[0,\pi]$ at various d, obtained using this software package. It can be observed that for a given d, the number of T gates required to realize different rotations is similar for all rotations. Note that multiples of $\pi/2$ can be carried out by S gates and hence have a T-count of 0.

**Figure 1.** T-counts of $R_z(\theta)$ rotations for increasing RS digit accuracy, d (see equation (3)).

2.3. FT implementation of VQE

The VQE is a hybrid quantum–classical algorithm, where the ground state wave function of a Hamiltonian $\mathcal{H}$ is expressed on the quantum computer by executing a parametric quantum circuit, denoted as ansatz $V(\boldsymbol{\theta})\vert 0\rangle$ . $V(\boldsymbol{\theta})$ is a sequence of parametric gates, and θ is a vector of parameters that is optimized to find the ground state. The energy of the parametric wave function,

$\begin{equation} \mathcal{E}\left(\boldsymbol{\theta}\right) = \langle 0\vert{V\left(\boldsymbol{\theta}\right)^\dagger \mathcal{H} V\left(\boldsymbol{\theta}\right)}\vert0\rangle, \end{equation} \tag{ 5 }$

can be obtained on a quantum processor as an expectation value.

If the quantum circuit can span the entire Hilbert space, the minimum value of $\mathcal{E}(\boldsymbol{\theta})$ will correspond to the ground state of $\mathcal{H}$ . However, practical implementations typically use a circuit with limited parameters, leading to an approximation of the ground state. The minimum value of $\mathcal{E}(\boldsymbol{\theta})$ is found via classical optimization.

2.3.1. Classical optimization

The task of finding the minimum value of $\mathcal{E}(\boldsymbol{\theta})$ is challenging, because it is usually a non-convex function. In this paper we focus on gradient-based minimization methods, specifically the Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm [35–38]. In BFGS, the procedure assumes a continuous optimization landscape as a function of the parameters. On the contrary, in FT quantum computation, one will necessarily be restricted to a discrete gate-set, typically Clifford+T. Here, we therefore substitute the parametric gate-set $R_x(\theta),R_y(\theta),R_z(\theta)$ by their RS decompositions (equations (2) and (4)).

Two strategies are usually used to obtain the gradient in VQE: the finite-difference rule and the parameter-shift rule [39–41]. In this paper, we use both strategies and evaluate their relative performance.

Finite-difference rule : this method requires two evaluations of a given circuit with an infinitesimally shifted parameter. The required derivatives can be approximately calculated as

$\begin{align} \partial_{\theta_\mu}\mathcal{E}\left(\theta_\mu\right) \approx \frac{\mathcal{E}\left(\theta_\mu + \frac{1}{2}\Delta\theta_\mu\right) - \mathcal{E}\left(\theta_\mu - \frac{1}{2}\Delta\theta_\mu\right)}{\Delta\theta_\mu}. \end{align} \tag{ 6 }$

For ease of notation in the equation we only specify the shifted parameter θ_µ as argument in the function; all other parameters in the vector θ are kept constant. For the FT-VQE algorithm, the finite-difference rule requires two runs of the RS algorithm for calculating the Clifford+T decompositions of the shifted angles, followed by two expectation value evaluations.
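
As an illustration of equation (6), the sketch below evaluates the central finite difference for a single-qubit toy problem. The ansatz $R_x(\theta)\vert 0\rangle$, the Hamiltonian $Z$ and the function names are assumptions chosen for this example only; for this choice the energy is $\cos\theta$, so the exact derivative is known.

```python
import math


def energy(theta):
    """E(theta) = <0| Rx(theta)^dagger Z Rx(theta) |0> for a single qubit."""
    amp0 = math.cos(theta / 2)           # amplitude of |0>
    amp1 = -1j * math.sin(theta / 2)     # amplitude of |1>
    return abs(amp0) ** 2 - abs(amp1) ** 2   # <Z> = P(0) - P(1)


def finite_difference(f, theta, delta):
    """Central finite difference of equation (6)."""
    return (f(theta + delta / 2) - f(theta - delta / 2)) / delta


theta = 0.4
fd = finite_difference(energy, theta, delta=0.1)
exact = -math.sin(theta)   # analytic derivative of cos(theta)
print(fd, exact)
```

The finite shift introduces a truncation error that shrinks with the square of the shift, so the two printed values agree only approximately.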

Parameter-shift rule : the parameter-shift rule is a method of computing the gradient of a parameterized gate by running the same quantum circuit twice with a finite shift in the gate parameter [39–41]. It states that if the generator G of the gate $\mathcal{G}(\theta_\mu)$ (with a single parameter θ_µ) has two unique eigenvalues denoted as e₀ and e₁, then the derivative of the circuit expectation with respect to the gate parameter θ_µ is given by the difference in expectation value of two circuits. The two circuits are run with shifted parameters scaled by a parameter r given as $r = \frac{e_1 - e_0}{2}$ . The gradient is obtained as

$\begin{equation} \partial_{\theta_\mu} \mathcal{E}\left(\theta_\mu\right) = r\left[\mathcal{E}\left(\theta_\mu + \frac{\pi}{4r}\right) - \mathcal{E}\left(\theta_\mu - \frac{\pi}{4r}\right)\right] \end{equation} \tag{ 7 }$

where r is the scaling parameter and $\frac{\pi}{4r}$ is a finite shift.

The main gate of interest in this work is $R_z(\theta) = \mathrm{e}^{-\mathrm{i}\frac{\theta}{2}Z}$. The Z-gate has eigenvalues ±1, so the generator $G = \frac{1}{2}Z$ has eigenvalues $\pm\frac{1}{2}$ and the value of r for $R_z(\theta)$ is $r = \frac{1}{2}$. The parameter-shift rule for $R_z(\theta)$ is, therefore, expressed as

$\begin{equation} \partial_{\theta_\mu} \mathcal{E}\left(\theta_\mu\right) = \frac{1}{2}\left[\mathcal{E}\left(\theta_\mu + \frac{\pi}{2}\right) - \mathcal{E}\left(\theta_\mu - \frac{\pi}{2}\right)\right]. \end{equation} \tag{ 8 }$

We note that the required parameter-shift can be exactly implemented with S gates together with the appropriate multiple of the phase ω: $R_z(\pi/2) = \omega^7S$ and $R_z(-\pi/2) = \omega SSS$ .
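
Both statements can be verified with a few lines of standard-library Python. The sketch first checks the two S-gate identities and then applies equation (8) to a toy energy, $\langle +\vert R_z(\theta)^\dagger X R_z(\theta)\vert +\rangle = \cos\theta$; this toy energy and the helper names are assumptions made for the example.

```python
import cmath
import math

OMEGA = cmath.exp(1j * math.pi / 4)


def rz(t):
    return [[cmath.exp(-0.5j * t), 0], [0, cmath.exp(0.5j * t)]]


def scale(c, m):
    return [[c * x for x in row] for row in m]


def close(a, b, tol=1e-12):
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))


S = [[1, 0], [0, 1j]]
S_CUBED = [[1, 0], [0, -1j]]   # S*S*S = diag(1, i**3)

plus_ok = close(rz(math.pi / 2), scale(OMEGA ** 7, S))
minus_ok = close(rz(-math.pi / 2), scale(OMEGA, S_CUBED))


def energy(theta):
    """Toy energy <+| Rz(theta)^dagger X Rz(theta) |+>, equal to cos(theta)."""
    a0 = cmath.exp(-0.5j * theta) / math.sqrt(2)
    a1 = cmath.exp(0.5j * theta) / math.sqrt(2)
    return 2 * (a0.conjugate() * a1).real


def parameter_shift(f, theta):
    """Gradient from equation (8), using the exact shifts +pi/2 and -pi/2."""
    return 0.5 * (f(theta + math.pi / 2) - f(theta - math.pi / 2))


theta = 1.1
grad = parameter_shift(energy, theta)
print(plus_ok, minus_ok, grad, -math.sin(theta))
```

Unlike a finite difference, the parameter-shift result is exact up to floating-point rounding for this gate.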

We remark that the two-qubit gate $ZZ(\theta) = \mathrm{e}^{\mathrm{i}\frac{\theta}{2}Z\otimes Z}$ , which is required in the Hamiltonian variational ansatz (HVA) used in the results section of this article, can be decomposed into CNOTs and a single-qubit rotation as shown in figure 2. The gradient of the $ZZ(\theta)$ can therefore be calculated with equation (8).

**Figure 2.** $ZZ(\theta)$ gate decomposed into CNOTs and a single-qubit $R_z(\theta)$ gate.

Therefore, the parameter-shift rule for $ZZ(\theta)$ -like gates does not have an additional FT overhead, i.e. no extra T-gates are needed for the required shifts because S gates can realize the desired shifts.
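
The CNOT-based decomposition can be checked on the diagonal of the two-qubit unitary. The sketch below assumes the standard construction CNOT, $R_z$ on the target qubit, CNOT; because the figure is not reproduced here, the sign of the inner rotation angle is an assumption. With $ZZ(\theta) = \mathrm{e}^{\mathrm{i}\frac{\theta}{2}Z\otimes Z}$ and $R_z(\phi) = \mathrm{e}^{-\mathrm{i}\frac{\phi}{2}Z}$ as written in the text, the two sides agree for $\phi = -\theta$.

```python
import cmath


def zz_target(theta):
    """Diagonal of exp(+i*theta/2 * Z(x)Z) in the basis |00>, |01>, |10>, |11>."""
    zz_eigenvalues = [1, -1, -1, 1]
    return [cmath.exp(0.5j * theta * e) for e in zz_eigenvalues]


def cnot_rz_cnot(phi):
    """Diagonal of CNOT (I (x) Rz(phi)) CNOT with the first qubit as control.

    Conjugating a diagonal matrix by a permutation matrix only permutes
    its diagonal entries, so the result stays diagonal.
    """
    rz_on_target = [cmath.exp(-0.5j * phi), cmath.exp(0.5j * phi)] * 2
    perm = [0, 1, 3, 2]   # CNOT exchanges |10> and |11>
    return [rz_on_target[perm[k]] for k in range(4)]


theta = 0.9
decomposition_ok = all(
    abs(a - b) < 1e-12 for a, b in zip(zz_target(theta), cnot_rz_cnot(-theta))
)
print(decomposition_ok)
```

Since the CNOT gates are Clifford gates, the only non-Clifford cost of the two-qubit gate is the single $R_z$ rotation.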

3. Results

To demonstrate the method, we apply the FT-VQE to two spin models, namely the transverse-field Ising model (TFIM) and the XXZ model, using an HVA [42, 43]. Within the HVA each ansatz is model-specific. For a general Hamiltonian, $\mathcal{H}$ , which is a linear sum of not-necessarily commuting terms $\mathcal{H} = \sum_i \mathcal{H}_i$ , an HVA is expressed as

$\begin{equation} \vert \psi_L\rangle = \prod_{l = 1}^{L}\left(\prod_i \exp\left(-i\theta_{i,l}\mathcal{H}_i\right)\right)\vert \psi_0\rangle \end{equation} \tag{ 9 }$

where L is the total number of layers (layer depth). $\vert \psi_0\rangle$ is the ground state of one of the individual terms $\mathcal{H}_i$ of the Hamiltonian [42, 43]. HVA is motivated by the QAOA [24], and the choice of $\vert \psi_0\rangle$ is similar, which is a state that is easy to prepare in hardware. For our investigation, HVA is chosen because of its linear scaling in the number of parameters with increasing number of layers.

TFIM . The TFIM Hamiltonian is a prototype model of quantum magnetism, and is given by

$\begin{equation} \mathcal{H}_{\text{TFIM}} = - \sum_{i = 1}^{N}\left[ \hat{Z}_{i} \hat{Z}_{i+1} + g \hat{X}_i\right], \end{equation} \tag{ 10 }$

where we consider periodic boundary conditions ( $\hat{Z}_{N+1} = \hat{Z}_{1}$ ). The value of g affects the magnetic phase, where g < 1 gives a ferromagnetic phase as ground state, g > 1 corresponds to a paramagnetic phase, and g = 1 is a critical point. The system is gapless at g = 1 in the thermodynamic limit where N goes to infinity. For our FT-VQE simulations, we consider the critical g = 1 point for a system of N = 16 qubits. The initial state is taken as $\vert \psi_0\rangle = \vert +\rangle^{\otimes 16}$. For the HVA we use a layer depth of L = 8, leading to 16 parameters. The corresponding quantum circuit ansatz is shown in figure 3. This system size and ansatz choice are based on those of [43], where the entanglement and optimization properties of this system were first investigated using the HVA. In [43] it is shown that barren plateaus are expected at L = 8 with a random parameter initialization. We, therefore, choose this setup as one of our test systems to evaluate the performance of FT-VQE in the presence of barren plateaus.
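
For orientation, the sketch below evaluates the expectation value of equation (10) on a small periodic chain using plain state vectors. It is an illustrative assumption-laden example: N = 4 is used instead of N = 16 to keep it fast, amplitudes are taken to be real, and the function `tfim_energy` is introduced here rather than taken from the authors' code.

```python
import math


def tfim_energy(state, n, g):
    """<state|H_TFIM|state> for equation (10), periodic chain, real amplitudes."""
    energy = 0.0
    for i in range(n):
        j = (i + 1) % n                        # periodic neighbour of site i
        for b, amp in enumerate(state):
            zi = 1 - 2 * ((b >> i) & 1)        # eigenvalue of Z_i on basis state b
            zj = 1 - 2 * ((b >> j) & 1)
            energy -= zi * zj * amp * amp          # -Z_i Z_{i+1} is diagonal
            energy -= g * amp * state[b ^ (1 << i)]  # -g X_i flips bit i
    return energy


n, g = 4, 1.0
plus_state = [2 ** (-n / 2)] * 2 ** n          # |+>^n, the HVA initial state
zero_state = [1.0] + [0.0] * (2 ** n - 1)      # |0>^n, a ferromagnetic product state

e_plus = tfim_energy(plus_state, n, g)
e_zero = tfim_energy(zero_state, n, g)
print(e_plus, e_zero)
```

The state $\vert +\rangle^{\otimes N}$ minimizes the transverse-field term alone and has energy $-gN$, while $\vert 0\rangle^{\otimes N}$ minimizes the coupling term alone and has energy $-N$; at g = 1 the two product states are degenerate, and the true ground state lies below both.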

**Figure 3.** TFIM ansatz with 16 qubits with two parameters per layer, where $\beta,\gamma$ are the variational parameters of all the gates in each layer. Layers are repeated eight times leading to 16 free parameters in total.

XXZ model . The XXZ model is another prototypical model of quantum magnetism. In the one-dimensional case its Hamiltonian is given by

$\begin{align} \mathcal{H}_{\text{XXZ}} = \sum_{i = 1}^{N} \hat{X}_{i} \hat{X}_{i+1} + \hat{Y}_{i} \hat{Y}_{i+1} + \Delta \hat{Z}_{i} \hat{Z}_{i+1} , \end{align} \tag{ 11 }$

where again periodic boundary conditions are applied. Δ represents the spin anisotropy of the model. For our FT-VQE simulations, we take $\Delta = 1$, at which there is a phase transition to the Néel ordered state. The system size we consider is N = 12 qubits, and for the HVA we use a layer depth L = 36, leading to 144 free parameters. The initial state is taken as $\vert \psi_0\rangle = \bigotimes_{n = 1}^{N/2 = 6}\vert \Psi^-\rangle$, where $\vert \Psi^-\rangle = \frac{1}{\sqrt{2}}(\vert 01\rangle-\vert 10\rangle)$ is a Bell state as in [43]. The ansatz is shown in figure 4. In [43] it is shown that such an ansatz achieves good VQE convergence at L = 36, which is attributed to the system being in the over-parameterization limit for the XXZ model, and hence avoids a barren plateau. Therefore, this second ansatz illustrated in figure 4 allows us to evaluate the performance of FT-VQE for a system that has no barren plateaus, but a large number of variational parameters and layers.
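
As a small consistency check of equation (11) and of the initial state, the sketch below computes the energy of a product of singlets on a periodic chain. It is a toy illustration: N = 4 replaces N = 12, the singlets are placed on the qubit pairs (0,1), (2,3), and the function names are assumptions of this example.

```python
import math


def singlet_product_state(n):
    """Real amplitudes of |Psi->^(x)(n/2), with singlets on pairs (0,1), (2,3), ..."""
    state = []
    for b in range(2 ** n):
        amp = 1.0
        for p in range(0, n, 2):
            b0, b1 = (b >> p) & 1, (b >> (p + 1)) & 1
            if b0 == b1:
                amp = 0.0                      # |00> and |11> do not appear
            else:
                amp *= (1 if b0 == 0 else -1) / math.sqrt(2)
        state.append(amp)
    return state


def xxz_energy(state, n, delta):
    """<state|H_XXZ|state> for equation (11), periodic chain, real amplitudes."""
    energy = 0.0
    for i in range(n):
        j = (i + 1) % n
        for b, amp in enumerate(state):
            bi, bj = (b >> i) & 1, (b >> j) & 1
            if bi != bj:
                flipped = b ^ (1 << i) ^ (1 << j)
                energy += 2 * amp * state[flipped]   # XX + YY exchanges unequal bits
                energy -= delta * amp * amp          # ZZ = -1 on unequal bits
            else:
                energy += delta * amp * amp          # ZZ = +1 on equal bits
    return energy


n = 4
psi0 = singlet_product_state(n)
norm = sum(a * a for a in psi0)
e0 = xxz_energy(psi0, n, delta=1.0)
print(norm, e0)
```

Each singlet bond contributes $-3$ at $\Delta = 1$ and the bonds between neighbouring singlets contribute zero, so this initial state has energy $-3N/2$ before any variational layer is applied.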

**Figure 4.** XXZ model ansatz with 12 qubits and four parameters per layer, where $\theta,\phi,\beta,\gamma$ are the variational parameters of all the gates in each layer. Layers are repeated 36 times leading to 144 free parameters in total. It was numerically observed that at least 36 layers were needed for sufficient convergence with random initial parameters.

Finally, all the calculations performed in this paper have been done on the quantum emulator Qulacs [44] as state vector emulations.

3.1. State preparation accuracy analysis with limited T-depth

First, we perform the VQE minimization using the parameterized rotation gates. We denote this circuit as the Rz(θ)-circuit due to the continuous parameterization of the rotation angles, and due to the decomposition of each parameterized gate into single-qubit $R_z(\theta)$ rotations. We then fix the rotation gates at the optimized values, perform the RS decomposition at different d for each rotation gate, and then replace all parameterized $R_z(\theta)$ rotations with their RS decompositions. We denote this circuit as the FT(θ)-circuit. We evaluate the energy difference between the Rz(θ)-circuit and FT(θ)-circuits for different digit accuracy d (equation (3)). Figure 5 shows this comparison for both the TFIM and XXZ models, together with the resulting T-count and T-depth. The energy difference decreases exponentially with d, until d is larger than about six, above which the energy difference becomes approximately constant. We verified that this tail-off is due to the finite numerical accuracy of the emulator software. In the appendix we show that when performing the numerical evaluations at higher precision the energy difference decays exponentially up to a digit accuracy of d = 16. Our results therefore show that one can systematically increase the accuracy of the energy of the FT(θ)-circuit by increasing d up to the numerical accuracy. Importantly, the circuit depth only increases linearly with d, as guaranteed by the RS decomposition.

**Figure 5.** Difference of the ground state energy between the solution obtained with the circuit using parameterized rotations (Rz(θ)-circuit) and its fault-tolerant form with limited T-depth following Clifford+T compilation (FT(θ)-circuit). The horizontal axis of the plots illustrates the Ross-Selinger digit accuracy, d in equation (3). (a) Difference in energy of the Rz(θ)-circuit and the FT(θ)-circuit for the same rotation parameters, as function of digit precision d (b) T-count and T-depth, showing that they increase approximately linearly with d.

For $d=4$ the error in the energy expectation value is less than 10⁻⁴ for both the TFIM and XXZ models. For the TFIM model the corresponding T-count is 10 232, and the T-depth is 974; for the XXZ model the T-count is 50 976 and the T-depth is 8554. The energy difference becomes approximately constant for d larger than six, where it reaches values of the order of 10⁻¹⁰ to 10⁻⁹. The T-count for d = 6 is 15 480, with a T-depth of 1474, for the TFIM; for the XXZ model the T-count is 77 680, with a T-depth of 13 056. TFIM with N = 16 qubits has a total number of 256 single-qubit $R_z(\theta)$ gates, whereas the XXZ model has a total of 1296 single-qubit $R_z(\theta)$ gates. The total T-count for both models at arbitrary d is approximately equal to the number of $R_z(\theta)$ gates multiplied by the average number of T-gates per RS decomposition of a single $R_z(\theta)$ gate, which can be extracted from figure 1. Note that in practice also the $R_z(\theta)$ gates have finite accuracy when implemented in hardware, so that the hardware result will also deviate from the ideal exact circuit results.
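
The relation between total T-count and number of rotations can be made explicit with the figures quoted above. The short script below only rearranges those numbers: it divides each reported T-count by the number of $R_z(\theta)$ gates and then by $\log_2(1/\epsilon) = d\log_2 10$, which can be compared with the leading-order RS scaling of section 2.2. The dictionary layout and variable names are assumptions of this example.

```python
import math

# (model, d): (reported T-count, number of R_z gates), as quoted in the text
reported = {
    ("TFIM", 4): (10232, 256),
    ("XXZ", 4): (50976, 1296),
    ("TFIM", 6): (15480, 256),
    ("XXZ", 6): (77680, 1296),
}

per_rotation = {}
for (model, d), (t_count, n_rz) in reported.items():
    average = t_count / n_rz           # T-gates per R_z rotation
    bits = d * math.log2(10)           # log2(1/eps) for eps = 10**-d
    per_rotation[(model, d)] = (average, average / bits)
    print(f"{model} d={d}: {average:.1f} T-gates per rotation, "
          f"{average / bits:.2f} per bit of accuracy")
```

For both models the average comes out at roughly 40 T-gates per rotation at d = 4 and roughly 60 at d = 6, consistent with a T-count that grows linearly in d.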

The above T-count estimates are modest when compared with recent resource count estimates for other FT algorithms. For example, for the problem of integer factoring, it is expected that at least 10⁹ T-gates and 10⁵ logical qubits will be needed [5]. In [45, 46] the required T-counts of various quantum simulation problems are investigated, and estimates range between 10⁷ and 10¹² T-gates depending on the complexity of the system. In particular, in [46], the authors estimate the T-counts of various spin systems. The Hamiltonian studied in [46] is similar to our XXZ model with $\Delta = 1$ (a Heisenberg chain), with an additional transverse magnetic field. Their lower estimates are of the order of 10⁷ T-gates for a spin system with 14 qubits for an error level of 10⁻³. For the XXZ model of a similar size, our estimate with FT-VQE is 10⁴–10⁵ T-gates.

3.2. VQE using the RS decompositions

We now evaluate and optimize the convergence of the full VQE minimization using the circuit with the RS decomposed rotation gates. As a starting point for the minimization process we use random initial rotation parameters, and compile these into their RS decomposition for a given d. We use gradient-based optimizers. To obtain the gradients we use two different methods, and compare the respective convergence behavior. In the first method we evaluate the energy for the parameters shifted by a small finite difference from their previous values (see section 2.3 for details), and we compute the gradient from the resulting energy difference. We choose $\Delta\theta_\mu = 0.1$ for this small shift, because the shift needs to be orders of magnitude larger than the error made in the RS decomposition. In general, the precision of gradients that can be obtained using FT(θ)-circuits depends on d. In the second method we use the parameter-shift rule outlined in section 2.3 to obtain the gradients. Once the gradients are obtained, the optimizer updates the rotation angles as part of the energy minimization process. The updated rotation parameters are again recompiled into FT(θ)-circuits. This process is iterated until convergence of the VQE optimization. The optimizer used for both methods is the BFGS algorithm [35–38]. The minimization is stopped when the energy difference between the new iteration and the previous one is less than 10⁻¹⁴. The same criterion is taken for both the Rz(θ)-circuit and the FT(θ)-circuits.
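
The requirement that the finite-difference shift be much larger than the decomposition error can be illustrated with a toy model. In the sketch below, rounding the angle to d decimal digits is used as a crude stand-in for the finite accuracy of an RS decomposition; this proxy, the toy energy $\cos\theta$ and the function names are assumptions of the example and do not reproduce the actual RS error behavior.

```python
import math


def energy_quantized(theta, d):
    """Toy energy cos(theta), evaluated at the angle rounded to d decimals.

    The rounding stands in for the finite accuracy of an RS decomposition.
    """
    return math.cos(round(theta, d))


def fd_gradient(theta, delta, d):
    """Central finite difference built from the quantized energies."""
    return (energy_quantized(theta + delta / 2, d)
            - energy_quantized(theta - delta / 2, d)) / delta


theta, d = 1.0002, 3
exact = -math.sin(theta)
err_large_shift = abs(fd_gradient(theta, 0.1, d) - exact)
err_small_shift = abs(fd_gradient(theta, 0.0004, d) - exact)
print(err_large_shift, err_small_shift)
```

With a shift of 0.1 the gradient is accurate to a few parts in ten thousand, whereas with a shift below the angular resolution both shifted angles collapse onto the same quantized value and the gradient estimate vanishes.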

We now analyze the convergence behavior of this FT-VQE method by evaluating the energy difference between the energy at each minimization step and the energy obtained by direct numerical diagonalization of the Hamiltonian matrix, which is exact up to numerical precision. This data is obtained from the TensorFlow Quantum dataset available at [47]. Figure 6 shows the results for the TFIM model, for both the finite-difference based gradient (figure 6(a)) and for the gradient obtained using the parameter-shift rule (figure 6(b)). We also show the convergence behavior for the Rz(θ)-circuit (gray solid line) as reference. Both in figures 6(a) and (b) we observe that the convergence behavior of the Rz(θ)-circuit and FT(θ)-circuits is similar. Up to about 150–300 optimization steps the energy decreases very slowly, and then rather abruptly the decrease of energy becomes much larger until it converges. This initial slow convergence is caused by small gradients in large parts of the energy landscape, which are commonly referred to as barren plateaus. It can be seen upon close inspection that the light purple d = 3 plot gets stuck in the barren plateau, where the energy error remains high at a value of the order of 10⁻¹. The gradient in the barren plateau is characterized by its small magnitude, which leads to correspondingly small updates in the optimization process. In this case, a higher d is needed to update the angle with enough accuracy. For $d\geqslant 5$, sufficiently high accuracy has been reached, allowing the convergence of the VQE. For $d\geqslant 5$ the convergence behavior in the presence of barren plateaus is similar for both the Rz(θ)-circuit and FT(θ)-circuits.

**Figure 6.** Fault-tolerant VQE with (a) finite-differences with $\Delta\theta_\mu = 0.1$ and (b) parameter-shift rule methods for TFIM. Energy differences were calculated with respect to the numerically exact ground state energy value for the TFIM calculated via matrix diagonalization. The dashed lines in the plots illustrate the optimal expectation value of the Rz(θ)-circuit with continuously parameterized rotation gates.

We note that the finite-differences method escapes the barren plateau at around 150 steps, while for the parameter-shift rule this happens above 300 steps. We attribute this behavior to the large shift of $\Delta\theta_\mu = 0.1$ in the angles for the gradient calculation. The rather large $\Delta\theta_\mu$ can lead to random errors of that order in the gradient. A stochastic component during optimization resulting from such random errors in finite precision gradients can speed up the convergence by bringing the system out of a barren plateau. These random errors can also help the system escape out of a local minimum and hence improve convergence. The Rz(θ)-circuit converges to an error of 10⁻¹² with the finite-difference rule and to 10⁻¹⁴ with the parameter-shift rule. On the other hand, the FT(θ)-circuits converge to a similar error level of $10^{-10}$ to $10^{-9}$ with both finite-differences and parameter-shift rule for $d\geqslant 7$. FT(θ)-circuits use RS decompositions, and as a result involve many more multiplications of matrices as compared to the Rz(θ)-circuits. The higher number of multiplications amplifies the numerical precision errors in Qulacs matrix multiplications. This is consistent with the results of figure 5, and with the numerical precision analysis presented in the appendix.

As explained in section 2.3, in the finite-difference rule one has to recompute the RS decompositions for the shifted angles, whereas in the parameter-shift rule the required angle shifts can be realized with S-gates. Therefore, in this setting the parameter-shift rule has less classical overhead for the computation of the gradient. Furthermore, since the required shifts are multiples of the S-gate, the gradient calculations do not contribute to the T-count. Finally, due to its resilience to noise, as outlined in [25], overall we expect that the parameter-shift rule is more efficient for FT-VQE.

Figure 7 shows the convergence behavior for the XXZ model, where we only use the parameter-shift method. One can see that there are no barren plateaus, in contrast to what was found for the TFIM system. One possible reason for this is the larger number of variational parameters in the XXZ model ansatz. Such over-parameterization can help avoid barren plateaus, as discussed in [43]. The system converges rather rapidly to errors below 10⁻² and $10^{-3}$ , which can be typical accuracy thresholds for practical applications. While for d = 3 the system only converges down to 10⁻¹, for d = 5 one reaches an accuracy of below 10⁻³, and hence d = 5 can be the appropriate digit precision of the RS decomposition for practical applications. If one further increases d and allows for more optimization steps, the energy difference keeps decreasing down to values of 10⁻⁶ and below. Importantly, the FT(θ)-circuit convergence at these high d values is very similar to the one for the Rz(θ)-circuit. Note that even the Rz(θ)-circuit ansatz has an inherent accuracy limit compared to the ground state energy calculated from matrix diagonalization, which is the reason for the small but finite remaining energy difference.

The results show that small values of d can achieve convergence to moderate errors in the energy, and that with increasing d one can systematically improve the accuracy of the final energy up to a numerical accuracy limit determined by the precision of the software. Furthermore, the results also show that when barren plateaus are encountered, random errors in the gradients can help the system escape from the barren plateau. For example, in figure 6(b) the d = 5 FT(θ)-circuit escapes the barren plateau at a significantly lower optimization step number than the circuits with larger d. We also note that a priori it is not known which d will allow the system to reach a target accuracy. We therefore propose an adaptive algorithm, where a small d is used to start the minimization process, and d is then progressively increased during the convergence until a target accuracy is reached. We implement this adaptive approach in our software, where we minimize the energy for a given d until the criterion $\vert{\mathcal{E}(\boldsymbol{\theta}_{t+1})-\mathcal{E}(\boldsymbol{\theta}_{t})\vert} \leqslant 1\times 10^{-14}$ is satisfied. Once this is the case, we proceed by increasing the accuracy d by one. We show the results for the TFIM in figure 6(b), and for the XXZ model in figure 7 (black curves). We set the initial d to d = 3, and then allow d to increase up to a maximum of d = 8 within the adaptive algorithm. Each '×' marker represents the point where d is incremented. The results show that the adaptive approach leads to efficient convergence, while allowing the T-depth of the circuit to be minimized. We therefore expect a generally improved performance of adaptive FT-VQE when compared to FT-VQE at fixed d.
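The control flow of the adaptive scheme can be sketched as follows. This is a toy illustration under loud assumptions and not the software used in this work: the RS synthesis is replaced by rounding the angle to d decimal digits, the energy is a hypothetical one-parameter function $1-\cos(\theta-\theta_{\mathrm{opt}})$, and the optimizer is plain gradient descent with a parameter-shift gradient. The names `synthesize`, `THETA_OPT` and `adaptive_minimize` are made up for the example. The accuracy d is incremented whenever the energy change between two steps is at most the tolerance.

```python
import math

THETA_OPT = math.pi / 4.3  # hypothetical optimum of the toy energy

def synthesize(theta, d):
    """Stand-in for RS synthesis: keep only d decimal digits of the angle."""
    return round(theta, d)

def energy(theta, d, shift=0.0):
    """Toy energy 1 - cos(theta - THETA_OPT), evaluated at the d-digit angle.

    The shift is added after the rounding, mimicking exact S-gate shifts
    that need no new synthesis.
    """
    return 1.0 - math.cos(synthesize(theta, d) + shift - THETA_OPT)

def gradient(theta, d):
    """Parameter-shift gradient at accuracy d."""
    return 0.5 * (energy(theta, d, math.pi / 2) - energy(theta, d, -math.pi / 2))

def adaptive_minimize(theta, d_start=3, d_max=8, tol=1e-14, lr=0.5, max_steps=2000):
    d = d_start
    increments = []  # optimization steps at which d was incremented
    e_prev = energy(theta, d)
    e = e_prev
    for step in range(1, max_steps + 1):
        theta -= lr * gradient(theta, d)
        e = energy(theta, d)
        if abs(e - e_prev) <= tol:
            if d == d_max:
                break  # converged at the highest allowed accuracy
            d += 1
            increments.append(step)
            e = energy(theta, d)  # re-evaluate at the new accuracy
        e_prev = e
    return theta, d, e, increments

theta_final, d_final, e_final, increments = adaptive_minimize(theta=2.0)
print(f"final d = {d_final}, final energy = {e_final:.2e}")
print(f"d was incremented at steps {increments}")
```

The list `increments` plays the role of the '×' markers: it records the steps at which the accuracy was raised.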

We note that while our work considered an HVA, the methods used are applicable to any variation of VQE. Since the Clifford+ $R_z(\theta)$ gate-set is universal, any quantum circuit can be decomposed first into this form, and then into the Clifford+T set via RS decomposition. In practice, this technique will be efficient for a large number of parametric quantum circuits, where the performance of FT-VQE will mostly depend on the performance of the underlying variational algorithm with continuously parameterized circuits. For example, the ADAPT-VQE ansatz [48] can be used as an alternative to the fixed HVA.

3.3. Proposed early hardware demonstration

The current state-of-the-art quantum hardware with its significant noise levels is not yet suitable for executing extended circuits with many gates, which are required to perform the FT-VQE computations presented in our work. The earliest experimental demonstration of FT-VQE on partially FT quantum hardware, as a proof-of-principle rather than for practical purposes, can be done using one- or two-qubit Hamiltonians. For the one-qubit case, a general single-qubit unitary $U\in\text{SU(2)}$ comprising three single-qubit rotations can be used. Assuming an adaptive RS accuracy of $d = 3$–$7$ , figure 1 suggests that between 25 and 75 T-gates for each of the single-qubit rotations should be sufficient. Therefore, this demonstration would require between 75 and 225 T-gates at each VQE iteration. For a demonstration with two qubits, one may consider a general two-qubit unitary $U\in\text{SU(4)}$ from [49]. This circuit requires three CNOTs and 15 single-qubit rotations. Assuming again an adaptive RS accuracy of $d = 3$–$7$ , we estimate 375–1125 T-gates to be needed at each VQE iteration for successful convergence.
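The T-gate estimates above follow from multiplying the number of single-qubit rotations by the per-rotation range of 25 to 75 T-gates. A short sketch of that arithmetic, with a hypothetical helper name:

```python
def t_count_range(n_rotations, t_per_rotation=(25, 75)):
    """Lower and upper T-count for a circuit with n_rotations synthesized rotations."""
    low, high = t_per_rotation
    return n_rotations * low, n_rotations * high

one_qubit = t_count_range(3)    # general SU(2): three rotations
two_qubit = t_count_range(15)   # general SU(4): 15 rotations (CNOTs need no T-gates)

print(one_qubit)  # (75, 225)
print(two_qubit)  # (375, 1125)
```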

4. Conclusions

Our results demonstrate that variational quantum algorithms, such as VQE, show promise for practical applications on error-corrected quantum computers with limited T-depth. Our findings suggest that VQE is viable for running on an FTQC. We expect that FT-VQE alone cannot achieve quantum advantage, but that it will be a required component of a number of quantum algorithms that combine different methods. For example, VQE can be the first step of an algorithm to compute the ground state, where VQE is used to prepare a state with finite overlap with the ground state, which is then used as a starting state for other algorithms such as QPE.

We presented the FT-VQE algorithm, which we demonstrated on a quantum emulator for 12 and 16 qubits for two prototypical spin systems. Our study shows that FT-VQE convergence behavior is analogous to standard VQE, especially when we use our proposed adaptive setting of the RS circuit re-compilation accuracy. The presented data show that the discretization of arbitrarily parameterized rotation angles with a finite T-gate depth does not negatively affect the convergence of the VQE algorithm, and also that the required T-gate depth for such good convergence is moderate. The main limitation of FT-VQE when compared to traditional VQE is that the circuit depth is significantly increased due to the restriction to the Clifford+T gate-set. The main advantage is that it allows for FT execution, where noise will not accumulate despite the increased circuit depth.

This work highlights the potential of FT-VQE as a powerful tool for practical quantum applications, with particular relevance in quantum chemistry simulations and optimization problems. These findings contribute to the ongoing development of quantum computing technologies and underscore the importance of continued research in this area.

Acknowledgments

H S is supported by the Engineering and Physical Sciences Research Council [Grant Number EP/S021582/1]. H S also acknowledges support from the National Physical Laboratory. H S acknowledges the use of the UCL Myriad High Performance Computing Facility (Myriad@UCL), and associated support services, in the completion of this work. F J, A A, and I R acknowledge the support of the UK Department for Science, Innovation & Technology through the UK National Quantum Technologies Programme. D E B acknowledges the support of the Engineering and Physical Sciences Research Council [Grant Numbers EP/S005021/1 and EP/T001062/1] and InnovateUK.

After the completion of this work, we became aware of [50], which investigates state preparation with the unitary coupled cluster ansatz in circuits with limited T-depth. This work is complementary to the analysis of convergence behavior of the HVA decomposition investigated here.

Data availability statement

All data that support the findings of this study are included within the article (and any supplementary files).

Appendix: Numerical precision of the classical emulation

To evaluate the numerical precision of the fault-tolerant variational quantum eigensolver (FT-VQE) circuits we first evaluate the precision of the Ross–Selinger (RS) circuit synthesis for a single rotation gate, and then evaluate how the implementation of a sequence of gates in the used emulators affects the overall numerical precision.

A.1. RS error analysis

We carry out error analysis for the RS algorithm in order to assess whether the z-rotation realized by the approximate Clifford+T unitary U is close to the expected rotation θ, and whether the error is compatible with the operator norm given in equation 3. Furthermore, the non-zero off-axis rotation realized by the unitary U is also calculated. For this purpose, we consider the Euler angle decomposition of a general SU(2) matrix in the ZXZ convention, given by

$\begin{align} U\left(\theta_1,\phi,\theta_2\right) & = \mathrm{e}^{-\mathrm{i}\frac{\theta_1}{2}Z}\mathrm{e}^{-\mathrm{i}\frac{\phi}{2}X}\mathrm{e}^{-\mathrm{i}\frac{\theta_2}{2}Z} \nonumber\\ & = \begin{bmatrix} \mathrm{e}^{-\mathrm{i}\left(\frac{\theta_1+\theta_2}{2}\right)}\cos\left(\frac{\phi}{2}\right) & -\mathrm{i}\mathrm{e}^{-\mathrm{i}\left(\frac{\theta_1-\theta_2}{2}\right)}\sin\left(\frac{\phi}{2}\right) \\ -\mathrm{i}\mathrm{e}^{\mathrm{i}\left(\frac{\theta_1-\theta_2}{2}\right)}\sin\left(\frac{\phi}{2}\right) & \mathrm{e}^{\mathrm{i}\left(\frac{\theta_1 + \theta_2}{2}\right)}\cos\left(\frac{\phi}{2}\right)\end{bmatrix} . \end{align} \tag{A1}$

We calculate the z-rotation angles θ₁ and θ₂ and the off-axis x-rotation ϕ.

In the ideal case, θ in equation 3 should be $\theta\approx\theta_1+\theta_2$ , whereas the off-axis rotation should be ϕ ≈ 0. Figure 8 shows the results for the computed angles. Note that we take the expression of U for different angles directly from the RS software, so that the only error in consideration is from the RS approximation of the $R_z(\theta)$ rotation. We compute the absolute difference of the Euler angles of U from their expected value. In figure 8 it can be seen that both the z-rotations and the undesired x-rotations have similar levels of errors, which can be systematically reduced by increasing d.
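The two quantities compared above, the total z-rotation $\theta_1+\theta_2$ and the off-axis angle ϕ, can be read off a given 2×2 unitary from its first row. The following is a minimal sketch under stated assumptions: it does not call the RS software, the test unitary is built from the product of exponentials in equation (A1) with small hypothetical errors added by hand, and the formulas assume $\cos(\phi/2) > 0$ and $\vert\theta_1+\theta_2\vert < 2\pi$. All function names are illustrative.

```python
import cmath
import math

def matmul2(a, b):
    """Product of two 2x2 complex matrices given as nested lists."""
    return [[a[i][0] * b[0][j] + a[i][1] * b[1][j] for j in range(2)]
            for i in range(2)]

def rz(theta):
    return [[cmath.exp(-0.5j * theta), 0.0], [0.0, cmath.exp(0.5j * theta)]]

def rx(phi):
    c, s = math.cos(phi / 2), math.sin(phi / 2)
    return [[c, -1j * s], [-1j * s, c]]

def euler_summary(u):
    """Return (theta1 + theta2, phi) for U = Rz(theta1) Rx(phi) Rz(theta2)."""
    # U[0][0] = exp(-i (theta1 + theta2) / 2) cos(phi / 2)
    total_z = -2.0 * cmath.phase(u[0][0])
    # |U[0][1]| = sin(phi / 2) and |U[0][0]| = cos(phi / 2)
    phi = 2.0 * math.atan2(abs(u[0][1]), abs(u[0][0]))
    return total_z, phi

# Hypothetical approximate z-rotation: target theta = 0.7 with small errors.
target = 0.7
u = matmul2(rz(0.3 + 2e-6), matmul2(rx(1e-6), rz(0.4 - 5e-7)))
total_z, phi = euler_summary(u)

print(f"z-rotation error: {abs(total_z - target):.2e}")
print(f"off-axis angle:   {phi:.2e}")
```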

A.2. Circuit emulation at finite numerical precision

RS decompositions lead to a large number of Clifford+T gates. When using an emulator, these are implemented as classical matrix multiplications. Due to the large number of matrix multiplications, finite numerical precision errors accumulate and become observable. This behavior can be seen in figures 5(a) and 6(a), (b). To analyze the accumulation of precision errors, and to explain why the error levels off when d is increased above some threshold in the FT-circuit emulator runs, we implement an emulator in Mathematica, which allows the computations to be performed at arbitrary numerical precision.

Since the arbitrary precision calculations are computationally expensive for a large number of qubits, we use a scaled-down version of the transverse-field Ising model (TFIM). We find that for N = 4 qubits we can observe the same leveling off of the error with increasing d as for the larger systems. We therefore use this scaled-down system as our test system. In analogy to figure 5, we first calculate the expectation value of the $R_z(\theta)$ -circuits with optimized parameters. Then, we perform the RS circuit recompilation for the $R_z(\theta)$ -circuits at various d, and compute the expectation values. The results are presented in figure 9. The axes of figures 9 and 5(a) are the same.

**Figure 9.** Classical emulation of Ross–Selinger FT-circuit synthesized for increasing accuracy d, performed at different levels of numerical precision of the emulator. The model used is TFIM with N = 4 qubits. p represents the finite numerical precision used in the emulator operations, and L is the layer depth of the TFIM-Q4 ansatz. In the L = 2 ansatz, there are a total of 16 parameterized gates with four distinct parameters, whereas in L = 8, there are 64 parameterized gates with 16 distinct parameters.

We take two different layer depths of the TFIM four-qubit ansatz in order to investigate the effect of an increasing number of matrix multiplications at various numerical precision values, which we indicate by p. In the L = 2 ansatz, there are a total of 16 parameterized gates, with four distinct parameters, whereas for L = 8 there are 64 parameterized gates, with 16 distinct parameters. At the limited p = 7 precision, a pattern similar to that of figure 5(a) is observed, where the error plateaus despite increasing d. Importantly, when we increase the precision of our Mathematica emulator to p = 16 and p = 20, the leveling off threshold of the error systematically decreases. For p = 20 no leveling off is observed. This confirms that the leveling off of the error in the emulator runs for the FT-circuits is due to the inherent numerical precision of the emulator.
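The dependence of the accumulated error on the working precision p can be reproduced qualitatively with a much smaller experiment. The sketch below is an assumption-laden stand-in for the Mathematica emulator: it uses Python's `decimal` module for arbitrary-precision arithmetic, multiplies a long alternating sequence of single-qubit H and T matrices (a hypothetical sequence, not an actual RS decomposition), and compares the result at precision p with a 60-digit reference.

```python
from decimal import Decimal, getcontext

def gates():
    """H and T as 2x2 matrices of (re, im) Decimal pairs at the current precision."""
    r = Decimal("0.5").sqrt()  # 1/sqrt(2), rounded to the context precision
    zero, one = Decimal(0), Decimal(1)
    h = [[(r, zero), (r, zero)], [(r, zero), (-r, zero)]]
    t = [[(one, zero), (zero, zero)], [(zero, zero), (r, r)]]  # diag(1, e^{i pi/4})
    return h, t

def cmul(a, b):
    return (a[0] * b[0] - a[1] * b[1], a[0] * b[1] + a[1] * b[0])

def cadd(a, b):
    return (a[0] + b[0], a[1] + b[1])

def matmul(a, b):
    return [[cadd(cmul(a[i][0], b[0][j]), cmul(a[i][1], b[1][j]))
             for j in range(2)] for i in range(2)]

def product(n_pairs, prec):
    """Multiply n_pairs repetitions of T*H with all arithmetic at `prec` digits."""
    getcontext().prec = prec
    h, t = gates()
    zero, one = Decimal(0), Decimal(1)
    u = [[(one, zero), (zero, zero)], [(zero, zero), (one, zero)]]
    for _ in range(n_pairs):
        u = matmul(t, matmul(h, u))
    return u

def max_error(n_pairs, prec, ref_prec=60):
    """Largest entry-wise deviation from the high-precision reference."""
    u = product(n_pairs, prec)
    ref = product(n_pairs, ref_prec)  # context stays at ref_prec afterwards
    return max(abs(u[i][j][k] - ref[i][j][k])
               for i in range(2) for j in range(2) for k in range(2))

errors = {p: max_error(500, p) for p in (7, 16, 20)}
for p, err in errors.items():
    print(f"p = {p:2d}: max entry error after 1000 gates = {err:.2e}")
```

In this toy setting the accumulated error drops by many orders of magnitude as p is raised, which is the same qualitative trend as the lowering of the leveling-off threshold with increasing p described above.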
