Brought to you by:
Paper

Decomposition of matrix product states into shallow quantum circuits

, , , and

Published 8 November 2023 © 2023 IOP Publishing Ltd
, , Citation Manuel S Rudolph et al 2024 Quantum Sci. Technol. 9 015012 DOI 10.1088/2058-9565/ad04e6

2058-9565/9/1/015012

Abstract

Tensor networks (TNs) are a family of computational methods built on graph-structured factorizations of large tensors, which have long represented state-of-the-art methods for the approximate simulation of complex quantum systems on classical computers. The rapid pace of recent advancements in numerical computation, notably the rise of GPU and TPU hardware accelerators, have allowed TN algorithms to scale to even larger quantum simulation problems, and to be employed more broadly for solving machine learning tasks. The 'quantum-inspired' nature of TNs permits them to be mapped to parametrized quantum circuits (PQCs), a fact which has inspired recent proposals for enhancing the performance of TN algorithms using near-term quantum devices, as well as enabling joint quantum–classical training frameworks that benefit from the distinct strengths of TN and PQC models. However, the success of any such methods depends on efficient and accurate methods for approximating TN states using realistic quantum circuits, which remains an unresolved question. This work compares a range of novel and previously-developed algorithmic protocols for decomposing matrix product states (MPS) of arbitrary bond dimension into low-depth quantum circuits consisting of stacked linear layers of two-qubit unitaries. These protocols are formed from different combinations of a preexisting analytical decomposition method together with constrained optimization of circuit unitaries, with initialization by the former method helping to avoid poor-quality local minima in the latter optimization process. While all of these protocols have efficient classical runtimes, our experimental results reveal one particular protocol employing sequential growth and optimization of the quantum circuit to outperform all others, with even greater benefits in the setting of limited computational resources. Given these promising results, we expect our proposed decomposition protocol to form a useful ingredient within any joint application of TNs and PQCs, further unlocking the rich and complementary benefits of classical and quantum computation.

Export citation and abstract BibTeX RIS

1. Introduction

Parametrized quantum algorithms represent a new and promising technology for tackling a variety of problems in fundamental research and real-world applications alike. While the term 'quantum algorithm' is commonly associated with quantum circuit algorithms running on quantum computers, there also exists a family of quantum (commonly referred to as 'quantum-inspired') algorithms whose evaluation is classically tractable using tensor network (TN) models [1]. The field of TNs is developing at a fast pace, with TNs enabling state-of-the-art performance in quantum simulation [24], and more recently also machine learning (see e.g. [510]) and combinatorial optimization [1113]. In contrast to quantum algorithms on quantum computers, TNs allow for an analytical construction of the encoded quantum system on classical computers which enables powerful tools such as low-rank factorization, and efficient exact calculation of state amplitudes and fidelities. Additionally, the abundance and comparative cheapness of classical hardware, along with GPU and TPU support, has allowed TNs to handle simulations of extraordinary scale [1416]. On the other hand, TN models are restricted in the amount of long-range entanglement that they can encode in a quantum state, which can be a limiting factor in some quantum simulation or machine learning applications [1719].

The counterpart to TNs on quantum computers are parametrized quantum circuits (PQCs), which are currently one of the most promising frameworks for effectively utilizing near- and mid-term quantum computers with limited numbers of qubits and modest amounts of hardware noise [2022]. With their distinct strengths and weaknesses relative to TNs, PQCs promise to help solve some tasks more effectively than TNs by enabling long-range entangling operations to prepare quantum states [2325].

Owing to their similar mathematical structure and shared use for parametrizing quantum states, both TNs and PQCs can generally be transformed exactly or approximately into instances of the other [26, 27], with the transformation from TNs to PQCs requiring acyclic TN architectures to work in full generality. Transferring previously optimized quantum states from classical to quantum hardware may not only reduce the amount of time and resources spent on expensive and scarce quantum computers, but it may enable new state-of-the-art results with joint optimization frameworks. Initial stages of the quantum state optimization could first be simulated using TNs, before continuing on quantum computers with access to increased model expressivity through long-range entangling gates (see e.g. [10] for a concrete realization using the developments of this work). The success of such joint schemes then clearly hinges on the quality of the transfer from TNs to PQCs, as well as the depth of the resulting quantum circuit. The latter is a critical requirement for near-term quantum computers, whose noise may lead to significant decay in state fidelity for deep circuits. While joint frameworks utilizing TNs and PQCs have already been proposed, and are expected to yield significant benefits [5], there remains a relative scarcity of computational techniques for converting TN models to high-quality shallow quantum circuits, with best practices for this conversion process still lacking.

In this work, we introduce a number of algorithmic protocols for approximately decomposing matrix product states (MPS), a one-dimensional flavor of TNs, of arbitrary bond dimension χ into stacked linear layers of two-qubit unitaries. These protocols each utilize different combinations of the analytic decomposition technique proposed in [28] with constrained optimization of circuit unitaries on classical hardware, and include as special cases the decomposition methods of [28, 29], the more sophisticated method of [30], and layerwise PQC training strategies [31]. We benchmark these protocols on a variety of MPS arising from many-body and machine learning settings, and find one particular protocol, involving sequential growth and optimization of the quantum circuit, that emerges as the most successful technique. This novel decomposition method consistently delivers lower approximation errors than other protocols, with an advantage that becomes more pronounced with a limited computational budget. This decomposition protocol can be used as an effective tool for mapping MPS to shallow quantum circuits with high fidelity, which we anticipate will make possible the realization of powerful joint frameworks utilizing both TNs and PQCs.

2. Background

2.1. TNs

TNs are low-rank factorizations of large tensors into networks of smaller tensors, which can represent wave functions and quantum circuits on classical hardware [1, 32]. TNs are typically illustrated as undirected graphs, whose nodes represent 'core' tensors containing the parameters of the model, and whose edges indicate contractions between core tensors along particular tensor axes. TN model families are defined by their underlying graph structures, with 1D line graphs giving MPS [33], 2D grid graphs giving projected entangled pair states (PEPS) [34], and tree graphs (or equivalently, acyclic graphs) giving tree tensor networks (TTNs) [26]. The main capacity parameter of TNs is the dimension of the vector spaces being contracted over, known as the bond dimension χ. This parameter controls the expressive power of the model, as well as the computational resources and memory requirements for classical simulations. The bond dimensions needed to encode a target wavefunction can be estimated by the entanglement entropy within the target state [35], with states possessing entanglement entropy S requiring TNs of bond dimensions $\chi \sim e^S$ to exactly represent. MPS implementations typically utilize bond dimensions of at most $\chi \sim 10^3$, although high-performance simulations using MPS are able to reach bond dimensions of $\chi \sim 10^4 - 10^5$ [36].

TN simulations of quantum circuits can be very effective when the graph structure of the TN matches the circuit layout [17, 27], with the singular value decomposition (SVD) enabling efficient approximate simulations by dropping lower magnitude singular values in the Schmidt decomposition of the target state [35]. As a concrete example, TN simulations in [3] were able to achieve higher overall state fidelity when simulating the multi-layer 2D circuits of the so-called quantum supremacy experiments in [37], where the fidelity deteriorated due to quantum hardware errors. On the other hand, computational cost of classical simulations can quickly escalate when the TN graph and quantum circuit layouts differ, such that circuit gates act on qubits that are far apart on the TN graph. In such cases, the bond dimensions necessary for a high-fidelity simulation will generally increase exponentially with the system size, owing to the need to carry entanglement between distant qubits through all intermediate edges.

2.2. Canonical forms and isometries

While simulating quantum circuits on classical hardware via representations as TNs has many fundamental use-cases, our focus here is on the reverse direction, of efficiently representing TNs as the output of quantum circuits. An important tool for carrying out this latter conversion are TN canonical forms, which facilitate the loss-less conversion of all core tensors in a TN into isometries (i.e. inner product preserving transformations between Hilbert spaces) [38]. While weaker canonical forms have been developed for general TNs [39], the ability to efficiently convert core tensors into isometries requires TN architectures to be defined on acyclic graphs, such as MPS and TTNs.

By restricting to bond dimensions which are powers of 2, the isometries arising in a TN in canonical form are equivalent to matrices of shape $2^n\times 2^m$ (assuming $n \unicode{x2A7E} m$ without loss of generality), which can always be expanded to $2^n \times 2^n$ unitary matrices (i.e. n-qubit unitaries). In more detail, if Q is an isometry of shape $2^n\times 2^m$ satisfying $Q^{\dagger}Q = \unicode{x1D7D9}_{2^m \times 2^m}$, the construction of the corresponding n-qubit unitary U involves extending the 2m columns of Q with $\kappa : = 2^n - 2^m$ basis vectors from the kernel space of $Q^{\dagger}$. These additional basis vectors can be arranged in a $2^n \times \kappa$ matrix X satisfying $Q^{\dagger}X = 0$ and $X^{\dagger} X = \unicode{x1D7D9}_{\kappa \times \kappa}$, which guarantees that the $2^n\times 2^n$ square matrix $U = [Q \quad X]$ satisfies the unitary condition $U^{{\,}\dagger}U = U U^{{\,}\dagger} = \unicode{x1D7D9}_{2^n \times 2^n}$.

For a general MPS parametrizing an N-qubit wavefunction, each tensor core will have left and right bond dimensions of $\chi_{\text{left}}$ and $\chi_{\text{right}}$, which, by padding with zeros, can be expanded to the nearest powers-of-two, namely $\chi_{\text{left}}^{\prime} = 2^n$ and $\chi_{\text{right}}^{\prime} = 2^m$, where $n = \lceil\log_2(\chi_{\text{left}})\rceil$, $m = \lceil\log_2(\chi_{\text{right}})\rceil$, and $\lceil \cdot \rceil$ denotes the ceiling operation. Placing this MPS in (left) canonical form via iterated QR decompositions (see [33] for details) will then yield a $2^{n+1} \times 2^m$ isometry, which can then be converted into a unitary gate acting on n + 1 qubits. The linear topology of the MPS leads the composition of gates from this conversion process to form one layer of multi-qubit unitaries arranged in a staircase topology, which we refer to as a linear circuit layer. Figure 1 illustrates this exact conversion from MPS to PQC for an MPS with ${\chi_{\text{max}}} = 8$.

Figure 1.

Figure 1. Exact or approximate mapping of an MPS with an arbitrary maximum bond dimension ${\chi_{\text{max}}}$ to a quantum circuit. (Top left) MPS with ${\chi_{\text{max}}} = 2$ can be mapped exactly to a quantum circuit consisting of one linear layer of two-qubit unitaries. (Bottom, left) In general, for an MPS with a maximum bond dimensions of ${\chi_{\text{max}}}$, each bond in the MPS with bond dimension χ maps to a multi-qubit unitary acting on $\left \lceil\log_2( \chi)\right \rceil +1$ qubits. (Bottom right) To more effectively transport MPS to quantum hardware, we study the decomposition of MPS into multiple layers of two-qubit unitaries.

Standard image High-resolution image

Despite the simplicity of the conversion process described above, a major deficit of this procedure is the fact that arbitrary n + 1 qubit unitary gates cannot be directly implemented in quantum circuits running on real quantum hardware, which typically only support gates acting on one or two qubits. Overcoming this limitation while still obtaining an efficient and accurate conversion from TNs to PQCs is a primary motivation for our work.

2.3. Related work

The problem of efficiently representing general MPS as the output of PQCs is crucial for many applications in condensed matter physics and quantum machine learning (QML) which utilize both classical and quantum computational resources. In the former domain, [29, 4042] explored the use of PQCs as quantum variational ansätze for calculating diverse properties of complex many-body systems, and identified distinct benefits of this method relative to the use of TNs on their own. Within QML, [5] introduced the idea of a joint optimization framework utilizing both TNs and PQCs, which the authors predicted would give improved performance in QML tasks. While this proposal lacked a concrete method for transferring TNs solutions to quantum computers, the PQC pretraining method of [43] gave numerical verification of the benefits of this style of joint training for MPS with ${\chi_{\text{max}}} = 2$ (which exactly decompose into a single linear layer of two-qubit unitaries). The recent work of [10], which utilizes the conversion method described here, illustrates the increased boost in QML task performance that results from the use of larger TN models for initializing PQCs. [4446] represent other approaches which combine TN and PQC models to solve QML tasks.

At the level of specific algorithms, [28] proposed an analytic technique for decomposing MPS of arbitrary ${\chi_{\text{max}}}$ into multiple linear layers of two-qubit unitaries, where the unitaries for each layer are chosen based on truncations of the MPS to χ = 2 across each bond, determined through iterated SVDs (see also the earlier work of [47]). This has the advantage of requiring relatively small computational resources, but unfortunately leads to relatively small gains with increasing circuit layers, decreasing fidelity past a critical number of layers, and poor performance compared to more sophisticated approaches. In a different direction, [29] proposed decomposing arbitrary MPS into circuits of stacked linear layers via iterative optimization over the two-qubit unitaries forming that circuit. This optimization method is similar to the Evenbly-Vidal algorithm [48], as well as the procedure described in section 3.2, and can support more general circuit architectures and achieve better performance via greater numbers of optimization sweeps. However, without a high-quality initialization for the circuit gates, this method remains vulnerable to barren plateaus [49], and becomes expensive when large numbers of sweeps are needed.

Closer to our work is the recent [30], which utilizes both analytical and optimization-based decomposition methods to efficiently approximate arbitrary MPS using PQCs. This method first initializes the gates of a circuit using the analytical method, before using iterated optimization of the unitary gates to improve fidelity with the target MPS, a combination that was shown to have significant performance benefits over that of [28]. This method is only one of many decomposition protocols we benchmark here (denoted by $D_{all} O_{all}$ in the language of section 3.3), and while we confirm the clear advantages of combining analytical and optimization methods, our empirical findings show that there are yet better ways of utilizing these two styles of decompositions to approximate arbitrary MPS using quantum circuits.

3. Methods

In this section, we outline the fundamental algorithmic building blocks used throughout our work, as well as the MPS decomposition protocols that can be built from them. To resolve any possible ambiguity that might arise, we give a detailed description of our notation and indexing convention for unitaries U and layers of unitaries L$[U]$ in appendix C.

3.1. Analytic decomposition by disentangling

[28] presents a technique to sequentially decompose an MPS into K layers of two-qubit unitaries with a linear next-neighbor topology. The approach is aimed at maximizing the fidelity

Equation (1)

between the original MPS state $| \psi_{{\chi_{\text{max}}}} \rangle$, and a quantum state

Equation (2)

which is meant to approximate the MPS. The resulting quantum circuit is $\prod_{k = K}^1 \textrm{L}[U]^{(k)}$, and it consists only of two-qubit U(4) unitaries. Equivalently, one could view the inverse layers $\prod_{k = 1}^K \textrm{L}[U]^{(k)\dagger}$ as the circuit that disentangles the MPS. To avoid ambiguity, we will at times refer to $| \psi_{{\chi_{\text{max}}}} \rangle$ as the 'target state' when it is being viewed as a target for approximation, and as the 'partially-decomposed state' when it is being disentangled by circuit layers.

Algorithm 1 describes the analytic decomposition technique, which is also illustrated in figure 2(top).

Figure 2.

Figure 2. Illustration of the two fundamental building-block algorithms that are used to compose practical decomposition protocols for an MPS with arbitrary bond dimension ${\chi_{\text{max}}}$ into linear layers of two-qubit unitaries. (TOP) Analytical decomposition algorithm presented in [28]. See section 3.1 for details. A ${\chi_{\text{max}}} = 2$ MPS map exactly to one layer of two-qubit unitaries (see also figure 1), and one can truncate an MPS via SVD to χ = 2 such that the fidelity $|\langle \psi_{\chi = 2} | \psi_{\chi_{\text{max}}} \rangle|$ is near optimal. The χ = 2 version of the MPS maps to one layer L$[U]$ of two-qubit unitaries, whose inverse can then be applied to disentangle the MPS. The process can be repeated iteratively to create several unitary layers that approximate the original MPS state. (Bottom) A constrained optimization algorithm for unitaries in TNs which is demonstrated in [51]. See section 3.2 for details. Updates for a circuit unitary Um are calculated by first calculating the environment tensor $\hat{\mathcal{F}}_m$ in equation (5), and applying SVD on it to find the locally optimal unitary. A learning rate is applied to improve the convergence behavior.

Standard image High-resolution image
Algorithm 1. Analytic Decomposition.
Input: MPS $\psi_{{\chi_{\text{max}}}}$, Maximum layers K, Target fidelity $\hat{f}$
Output: Quantum Circuit layers $\prod_{k = K}^1 \textrm{L}[U]^{(k)}$
1: $k \gets 0$
2: $| \psi^{(0)} \rangle \gets | \psi_{{\chi_{\text{max}}}} \rangle$
3: while k < K or $|\langle 0^{\otimes N} | \psi^{(k)} \rangle| \lt \hat{f}$ do
4:  Truncate $\psi^{(k)}$ to $\psi^{(k)}_{\chi = 2}$ via SVD
5:  Convert $\psi^{(k)}_{\chi = 2}$ to $\text{L}[U]^{(k+1)}$
6:  $|\psi^{(k+1)}\rangle \gets \textrm{L}[U]^{(k+1)\dagger} |\psi^{(k)}\rangle$
7:  $k \gets k+1$
8: end while

In this algorithm, one truncates a copy of the original MPS $\psi_{{\chi_{\text{max}}}}$ to χ = 2 via SVD, converts the truncated MPS $\psi_{\chi = 2}$ to one layer L$[U]$ of two-qubit unitaries, and applies the inverse of that layer L$[U]^{\dagger}$ to the original MPS. The resulting partially-disentangled MPS is expected to have less entanglement and the bond dimensions may decrease depending on the predefined singular value thresholds. The conversion of the χ = 2 MPS to a linear layer of two-qubit unitaries is explained in section 2.2 and [28, 41, 43]. This process can be repeated iteratively to create several layers $\prod_{k = K}^1 \textrm{L}[U]^{(k)}$ which are indexed from K to 1 owing to the reversal of the disentangling layers to form a circuit for approximating the target MPS. In particular, the Kth layer of the disentangling circuit ends up being used as the (adjoint of the) first layer of the quantum circuit for approximating $| \psi_{{\chi_{\text{max}}}} \rangle$.

The analytical decomposition algorithm is relatively inexpensive to perform, and the decomposition can be improved sequentially up until a desired fidelity, with every additional layer likely giving better decomposition performance. However, it is crucial to note that for a fixed number of layers K, it is not possible to iteratively increase the quality of the decomposition. This leads to our observation that K sometimes needs to be orders of magnitudes larger for high-fidelity decomposition than predicted by the lower bound $K_\mathrm{min} \sim \log_2({\chi_{\textrm{max}}})$ [50]. For this reason, this analytical decomposition algorithm is better utilized in conjunction with an effective constrained optimization algorithm for the resulting quantum circuit unitaries.

3.2. Decomposition by optimization

We now describe the constrained optimization algorithm for the circuit unitaries used throughout this work. It was demonstrated in [51] for learning quantum circuits to prepare quantum states. Given an initial quantum circuit $\prod_{m = 1}^MU_m$ consisting of M unitaries Um , the goal of the optimization algorithm is to increase the fidelity

Equation (3)

between the target MPS state $| \psi_{{\chi_{\text{max}}}} \rangle$, and a quantum state

Equation (4)

which is meant to approximate the MPS. In our case, the unitaries Um act only on two qubits, i.e. the unitaries are U(4) matrices, and the circuit layout is a linear next-neighbor topology for efficient classical simulation using MPS, as well as for compatibility with algorithm 1 in section 3.1.

Algorithm 2 describes the optimization algorithm used in this work, which is also illustrated in figure 2(bottom).

Algorithm 2. Decomposition by Optimization.
Input: Initial quantum circuit $\prod_{m = 1}^M U_{m, 0}$, number of sweeps T, target fidelity $\hat{f}$, learning rate $r \in [0, 1]$
Output: Optimized quantum circuit $\prod_{m = 1}^M U_m$
1: $t \gets 1$
2: While t < T or $\left | \langle 0^{\otimes N}| \prod_{m = M}^1 U^{{\,}\dagger}_m |\psi_{\chi_{max}}\rangle \right | \lt \hat{f}$ do
3:   for m in 1 ... M do
4:    Calculate $\hat{\mathcal{F}}_m$ in equation(5)
5:    SVD $\hat{\mathcal{F}} = \mathcal{U}\mathcal{S}\mathcal{V^{\dagger}}$
6:    $ U_{\text{new},m} \gets \mathcal{U}\mathcal{V^{\dagger}}$
7:    Apply learning rate in equation (6)
     $U_{m} \gets U_{m}(U^{{\,}\dagger}_{m} U_{\text{new},m})^r$
8:   end for
9:   $t \gets t+1$
10: end While

The optimization algorithm iterates over all unitaries in the circuit and updates them one by one. The number of sweeps can either be predefined through a maximum number T, or until a target fidelity $\hat{f}$ is reached with $f\big(\{U_m\}_m\big) \gt \hat{f}$ (compare equation (3)). The update for one unitary Um can be calculated by first calculating the so-called environment tensor for Um :

Equation (5)

Here, the expression $\text{Tr}_{\bar{U}_m}[\dots]$ denotes the partial trace over the qubit indices that do not interact with Um . The environment tensor $\hat{\mathcal{F}}_m$ is represented as a $4\times 4$ matrix, and can in practice be calculated by 'removing' Um from the circuit and contracting the remaining tensor network (see bottom of figure 2) while retaining the MPS structure throughout. Interestingly, $\hat{\mathcal{F}}_m$ is the operator which is locally optimal for the circuit to increase the fidelity with the MPS if it were replacing Um [51], but this operator is not generally unitary. In order to keep all quantum circuit operations unitary, we can compute the SVD $\hat{\mathcal{F}}_m = \mathcal{U}\mathcal{S}\mathcal{V}^{\dagger}$, then use the fact that the product $\mathcal{U}\mathcal{V}^{\dagger}$ is the unitary matrix that minimizes the Frobenius distance with $\hat{\mathcal{F}}_m$ [52], which in turn maximizes the fidelity of the PQC with the MPS among all unitaries. This leads $U_{\text{new}} = \mathcal{U}\mathcal{V}^{\dagger}$ to be the locally optimal unitary for increasing the fidelity with the target MPS. Since this is a rather strong local update, we introduce a learning rate $r\in[0, 1]$, which influences the unitary update rule via

Equation (6)

We find the optimization process to give excellent performance using a value of r = 0.6. We note that the fractional matrix exponential in $\left( U^{{\,}\dagger}_{m} U_{\text{new}} \right)^r$ can be implemented by simply applying the exponent to the diagonal elements in the eigendecomposition of $U^{{\,}\dagger}_{m} U_{\text{new}}$. Finally, $U_{m}^{^{\prime}}$ is the unitary that replaces Um , which gives improved fidelity with the MPS.

We note that it is vital to the computational feasibility of this algorithm to perform optimization sweeps consisting of consecutive values for the index m, because in that case the states used to compute $\hat{\mathcal{F}}_m$, i.e. $\prod_{i = M}^{m+1} U_{i}| \psi_{\chi_{max}} \rangle$ and $\langle 0^{\otimes N} |\prod_{j = 1}^{m-1} U^{{\,}\dagger}_{j}$, do not require full re-calculation. With a proper choice of the optimization order, most of these states can be cached and reused. For example, after $U_{m}^{^{\prime}}$ has been obtained, we apply $U_{m}^{^{\prime}\dagger}$ to $\langle 0^{\otimes N} |\prod_{j = 1}^{m-1}$ and $U^{{\,}\dagger}_{m+1}$ to $\prod_{i = M}^{m+1} U_{i}| \psi_{\chi_\mathrm {max}} \rangle$, which forwards the two sides of the TN to step m + 1. While one can analogously perform backward sweeps, we restrict ourselves to forward sweeps in this work.

The issue with decomposing an MPS with this optimization algorithm alone (what we call the brute-force method), is that when the initial circuit unitaries are random, the application of the circuit to the fully-optimized MPS state $| \psi_{{\chi_{\text{max}}}} \rangle$ can potentially increase the maximum bond dimension ${\chi_{\text{max}}}$ required to approximate the resulting quantum state without significant truncation of singular values. Therefore, we now combine algorithms 1 and 2. Let algorithm 1 provide the initial circuit guesses that tend to decrease the bond dimension of the MPS, and let algorithm 2 then optimize the unitaries to achieve the best approximation of the target MPS for a fixed number of unitaries and layers.

3.3. Combinations of analytic decomposition and optimization

In sections 3.1 and 3.2 we have introduced the fundamental algorithmic building blocks for decomposing MPS into quantum circuits consisting of only two-qubit unitaries. All steps of both algorithms are to be performed on classical hardware. Both approaches on their own have severe limitations, but complement each other very nicely. To briefly summarize, while relatively inexpensive to perform, the main disadvantage of the analytic decomposition in section 3.1. is that the number K of resulting quantum circuit layers can be very large to achieve a desired fidelity, and there is no way to increase fidelity for a fixed number of layers. Disadvantages of the brute-force decomposition algorithm in section 3.2 include that random initial guesses for the unitaries may not only require many sweeps T to optimize, but the random circuit can cause the bond dimension ${\chi_{\text{max}}}$ of the partially-disentangled MPS to increase. However, optimization is a very valuable approach, because one spends classical resources to increase the TN decomposition quality without increasing circuit depth and potentially hampering the performance of the quantum circuits on noisy quantum hardware. The performance of optimization-based decomposition methods can furthermore be improved by using high-quality initializations of the gate parameters being optimized [53].

Fortunately, both algorithms can be combined into several different decomposition protocols, towards achieving the best trade-off between the required classical computing resources and the resulting quantum circuit depth. In fact, one way of understanding their interplay is that the analytical decomposition step serves as a high-quality initialization for each layer in the subsequent constrained optimization. As quantum hardware improves, one can more reliably implement deeper circuits on quantum hardware, which could in turn alleviate some of the burden of the MPS decomposition from classical hardware.

In figure 3, we illustrate the combinations of algorithms 1 and 2 that are studied in this work. The letter D stands for the analytical decomposition algorithm, i.e. algorithm 1 in section 3.1. We make use of the possibility to either decompose one layer (indicated by i), or decompose to the full circuit depth K (indicated by all). The letter I represents an alternative approach to D, where linear layers are added to the quantum circuit in the form of identity unitaries. The letter O stands for the application of the optimization algorithm, i.e. algorithm 2 in section 3.2. We either optimize the last layer that was created (indicated by i), or all layers in the circuit so far (indicated by all). The 'Iter' notation implies that there is an iterative procedure between creating new layers (either with D or I), and optimizing the newest layer or all layers so far (indicated by i or all, respectively).

Figure 3.

Figure 3. Schematic depiction of the MPS decomposition protocols studied in this work. The protocols utilize algorithm 1 in section 3.1 (as indicated by 'D'), as well as algorithm 2 in section 3.2 (as indicated by 'O'). The different protocols are described in section 3.3.

Standard image High-resolution image

Our first protocol is Dall , which is equivalent to applying only algorithm 1. The second protocol is Iter$[D_i O_i]$, and it represents a minimal extension of Dall where only the unitaries in the layer $\textrm{L}[U]^{(k+1)}, k\lt K$ are optimized before the disentangling step $| \psi^{k+1} \rangle \gets \textrm{L}[U]^{(k+1)\dagger}| \psi^{(k)} \rangle$. This aims to ensure that $|\langle \psi^{(k)}_{\chi = 2} | \psi^{(k)} \rangle|$ is maximal in every iteration before disentangling, which the SVD only ensures approximately. The third protocol is Oall , which is equivalent to applying only algorithm 2. The circuit unitaries are initialized as $4 \times 4$ matrices with random entries from a normal distribution with zero mean. These are then converted into unitaries by selecting the unitary component Q in the QR-decomposition of the random matrix. The fourth protocol, $D_{all}O_{all}$, constitutes the first practical combination of both analytical decomposition and optimization. In this protocol, the K circuit layers are first created using algorithm 1, and then all unitaries are optimized in sweeps in further increase fidelity. The analytically decomposed layers provide a good starting point for the optimization algorithm as compared to random unitaries, and, crucially, the layers tend to already have a disentangling effect on the MPS, which reduces the bond dimension of the MPS. The fifth protocol, Iter$[I_i O_{all}]$, is a sequential protocol where new layers are iteratively added as identity operations, and and all existing layers are optimized using algorithm 2. The sixth and final protocol studied in this work is Iter$[D_i O_{all}]$, where new layers are added via one analytical decomposition step, and all existing layers are optimized using algorithm 2. The difference between the protocols Iter$[D_i O_i]$ and Iter$[D_i O_{all}]$ is that, while both maximize $|\langle \psi^{(k)} | \psi_{{\chi_{\text{max}}}} \rangle|$ at every decomposition step k, Iter$[D_i O_{i}]$ keeps layers $1\dots k-1$ fixed, however, these layers are also optimized in Iter$[D_i O_{all}]$.

The decomposition protocols studied in this work are constructed such that the protocol Iter$[D_i O_{all}]$ represents the most sophisticated combination of algorithms 1 and 2. The other protocols can be seen as reference benchmarks with different components of Iter$[D_i O_{all}]$ being stripped away—either the analytical decomposition, the optimization, or the sequential growing scheme of the circuit. As such, we compare this protocol to previous work, where Dall was proposed in [28], Oall is similar to the algorithm proposed in [29], and $D_{all} O_{all}$ is similar to the method implemented in [30].

3.4. Discussion of computational complexity

It is crucial to the decomposition protocols presented in this work that all necessary steps can feasibly be performed on classical hardware. A rigorous derivation of the computational resources required for a practical decomposition of an MPS of interest is not within the scope of this work, but we emphasize that the use of MPS as intermediate states during the decomposition procedure allows the classical runtime to remain polynomial with respect to the circuit depth and qubit count. Therefore, in this section, we merely present a general overview of the expected complexities of algorithms 1 and 2, as well as the protocols presented in section 3.3.

For an MPS with a maximal bond dimension ${\chi_{\text{max}}}$, the memory requirements of the MPS scale like $\sim N\chi^2_{\text{max}}$ [32]. However, including the cost of bringing the MPS into canonical form, the computational complexity $\mathcal{C}_{\text{MPS}}$ is

Equation (7)

Now approaching our MPS decomposition protocols, the protocol $D_{all} O_{all}$, for example, requires K steps of analytical decomposition (see section 3.1), whose complexity $\mathcal{C}_{\text{Decomp}}$ scales like

Equation (8)

We note that each linear layer has N − 1 two-qubit unitaries, but the computational cost $\mathcal{C}_{\text{MPS}}$ already includes performing SVD on all bonds of the MPS. Then performing the T optimization sweeps over all K layers requires results in the complexity $\mathcal{C}_{\text{Opt}}$ of the optimization

Equation (9)

The sequential protocols Iter$[O_i D_{i}]$, Iter$[I_i O_{all}]$ and Iter$[D_i O_{all}]$ apply T optimization sweeps not only on K layers, but on k < K layers for all previous iterations k. However, these previous layers have fewer unitaries. The total number of layers that are optimized in a protocol that concludes at K layers is $\frac{K(K+1)}{2}$. Thus, the sequential protocols (indicated by 'Iter') receive an additional scaling factor of K:

Equation (10)

The question now becomes, how many layers K are needed for a high-fidelity approximation of the MPS? A convenient property of our main proposed protocol Iter$[D_i O_{all}]$ is that the circuit can grow sequentially, and one does not have to guess the circuit depth that can feasibly be optimized on classical hardware. However, a lower-bound of $K_{\text{min}} \sim \log_2({\chi_{\text{max}}})$ is presented in [50]. We note that this bound provides a best-case circuit depth to exactly decompose a generic MPS, but exact decomposition is likely not the goal of a practical decomposition protocol. Therefore, we merely understand the scaling to provide a general sense of how deep quantum circuits consisting of two-qubit unitaries need to be to represent an MPS with maximum bond dimension ${\chi_{\text{max}}}$. In this case, we arrive at scalings of

Equation (11)

4. Empirical assessment

In this section, we quantitatively compare the six MPS decomposition protocols presented in section 3.3. We study three 12-qubit MPS with bond dimension ${\chi_{\text{max}}} = 64$ that represent the solutions to a Hamiltonian ground state search problem, a generative modelling task, as well as a random MPS.

The Hamiltonian is a spin $1/2$ antiferromagnetic Heisenberg model Hamiltonian with

Equation (12)

where $\hat{S}_{X,Y,Z} = \frac{1}{2}\sigma_{x,y,z}$ are the spin operators acting on sites i and j, and $\sigma_x, \sigma_y, \sigma_z$ are the Pauli operators. In particular, we study a $4\times3$ two-dimensional Heisenberg model with open boundary condition, where $\langle i, j \rangle$ denotes the interaction of the respective vertical and horizontal neighbors in a grid layout. The MPS with ${\chi_{\text{max}}} = 64$ exactly represents the ground state of this Hamiltonian.

The second ${\chi_{\text{max}}} = 64$ MPS exactly encodes a uniform superposition over the binary data samples in the $6 \times 2$ bars and stripes (BAS) dataset [54]. This dataset has emerged as a canonical benchmark for generative modelling tasks in the domain of QML. The BAS MPS is special in that the singular value spectrum in the middle bond consists of either constant degenerated large values, or zero. As such, we expect results on this MPS to exhibit sharper increases in fidelity when all singular values can be reproduced by a sufficiently deep circuit.

Finally, we also test the decomposition protocols on a random ${\chi_{\text{max}}} = 64$ MPS, where each with each tensor entry (before canonicalization) is drawn from a normal distribution with zero mean and variance of 1. The same random MPS was used to assess all decomposition methods, with the choice of MPS made prior to obtaining any experimental results (i.e. no cherry picking). There are different conventions used in literature as to how to construct a random MPS state. We note that the existence of both negative and positive values presents a more challenging decomposition task, as the singular values decay significantly slower than with only positive tensor entries.

Figure 4 depicts the decomposition performances on the three target MPS using the six protocols presented in section 3.3. We record the infidelity with the respective target MPS per number of circuit layers K, where infidelity is defined as $1-f$, with the fidelity f from equation (1) or equivalently equation (3). Note that some other works choose $1-f^2$. In addition to the finally attained infidelities, where applicable, we illustrate the optimization progress leading up to this final result in between the data points. This highlights how the sequential protocols (indicated by 'Iter') build upon previous circuit layers. We test the performance of the decomposition protocols for $T = 10, T = 100$ or T = 1000 optimization sweeps per number of layers. The cases T = 10 and T = 100 are more representative for results one may obtain in practice with a limited computational budget, whereas T = 1000 highlights the robustness to local minima during optimization. Because the protocols Iter$[I_iO_{all}]$ and Iter$[D_iO_{all}]$ utilize the solution from the previous data point, and thus there are already optimization sweeps invested for any k > 1, T is increased in Oall and $D_{all}O_{all}$ to match the total number of unitary optimization sweeps (compare also equations (9) and (10)). One important note is that we do not perform any truncation of singular values in these numerical results, but perform contractions with the exact statevector.

Figure 4.

Figure 4. Benchmark results of the MPS decomposition protocols presented in section 3.3 and figure 3. We plot the infidelity of three target MPS with the respective quantum state prepared by the decomposed quantum circuit. We study the MPS examples which encode the ground state of a $4\times 3$ Heisenberg Hamiltonian (top), a uniform superposition over data samples from the $6 \times 2$ BAS dataset (middle), as well as a random MPS (bottom). All MPS have ${\chi_{\text{max}}} = 64$. We report the infidelity with $T = 10, T = 100$ or T = 1000 optimization sweeps per data point, as well as an illustration of the infidelity convergence during optimization (where applicable) in-between data points.

Standard image High-resolution image

Our results show that the protocols Dall and Iter$[O_i D_{i}]$ improve only very slowly with additional layers, which was also observed in [30]. In contrast, the protocol Oall is mainly hindered by the large number of optimization sweeps T necessary to achieve high-quality approximations, but as we see when comparing T = 100 and T = 1000, the protocol may additionally run into complications related to local minima. While local minima for larger K also occur for the protocol $D_{all} O_{all}$, it achieves significantly improved approximations for low T. In the cases of the ground state and random MPS, where the singular value spectra are more continuous, the protocol Iter$[I_i O_{all}]$ can achieve competitive performance. However, for the BAS dataset MPS, the protocol is clearly under-performing. Finally, it becomes evident that the protocol Iter$[D_i O_{all}]$ is the best-performing and most consistent across the all target MPS and number of optimization sweeps. It is less affected by sub-optimal local minima than Oall and $D_{all} O_{all}$, and can achieve signficantly better fidelities than Iter$[I_i O_{all}]$ on the BAS dataset MPS. The improvements are most significant for T = 10 and T = 100, which is a very desirable property that reduces the amount of classical computing resources spent for a given decomposition fidelity.

In appendix A we provide further numerical evidence of the performance of the protocols. For the structured target MPS, and the more realistic scenarios of ${T = 10}$ and ${T = 100}$ optimization sweeps, the protocol Iter$[D_i O_{all}]$ achieves equivalent infidelities to all other methods with only $25\%$ the number of unitary updates and with 4 instead of 8 layers, in the best case.

To summarize, we find the protocol Iter$[D_i O_{all}]$ to be the most performant and promising for practical application. When only a relatively small amount of resources are available for optimization, we find the improvement in infidelity per number of layers K reach orders of magnitude relative to other protocols. In particular, our ablation analysis shows that, if one removes any one component of the protocol, namely the analytical decomposition, the optimization, or the sequential growing scheme of the circuit, the protocol performs significantly worse.

5. Conclusion

In this work, we presented a range of algorithmic protocols to decompose MPS with arbitrary bond dimension into linear quantum circuit layers consisting of two-qubit unitaries. The fundamental building-block algorithms for the protocols are the analytical decomposition algorithm presented in [28], and a constrained optimization algorithm to optimize the circuit unitaries on classical hardware, which was utilized in [51] (see also the earlier method of [48]). We benchmarked the MPS decomposition protocols on three different 12-qubit MPS with bond dimension χ = 64. Across our benchmarks, we found the protocol Iter$[D_i O_{all}]$ to be the most successful in reliably decomposing MPS with high fidelity into shallow quantum circuit layers. The protocol combines sequential growing the quantum circuit using the analytical decomposition algorithm with intertwined optimization steps of all existing circuit unitaries. It performs particularly well with a limited number of optimization steps—sometimes achieving orders of magnitude improvement in infidelity compared to other protocols. Nevertheless, further quantitative studies are necessary to evaluate the performance and scalability of this MPS decomposition protocol. Such studies should additionally explore the use of automatic differentiation and Riemannian optimization methods for performing the optimization steps, which offer different performance and implementational considerations than the sweeping optimization method employed here [5557]. Such work can also benefit from qualitative studies aimed at understanding which properties of quantum states determine the difficulty of their preparation using a circuit with a fixed topology and limited depth. This is an important question which our present results leave unsettled.

A promising follow-up work is to extend the protocols presented here to TTNs. TTNs are more flexible than MPS in terms of the long-range correlations that they can capture, a feature which has already proven practically beneficial in generative modeling tasks [58]. By efficiently converting TTNs into PQCs running on quantum hardware, these benefits can likely be translated into better performance on a variety of simulation and QML tasks. Decomposing the two-dimensional PEPS on classical hardware may be significantly more challenging, due to the lack of strong canonical forms facilitating the exact conversion into circuits of multi-qubit unitaries. However, the close resemblance between the graphical structure of PEPS and that of physical qubit topologies in typical quantum hardware (e.g. [37]) could make such conversions from PEPSs to PQCs extremely promising for practical applications. While our driving motivation in this work has been to limit the amount of quantum resources spent for the TN decomposition, one may attempt to perform certain sub-routines on quantum hardware [30]. This could be particularly useful when working with PEPS [59, 60], or when aiming to map MPS or TTNs to circuit layouts that are not classically efficient to simulate.

An accompanying work, [10], utilizes the MPS decomposition protocol presented here inside a synergistic optimization framework, and demonstrates how the performance of PQCs is continuously improved by MPS with increasing ${\chi_{\text{max}}}$ across several applications. Additionally, [10] provides evidence that PQCs initialized with classically trained TNs may not be affected by barren plateaus [49, 61, 62], due to the highly specific circuit initialization. This companion work can be seen as the realization of the benefits predicted in [5], which uses our decomposition protocol to convert MPS into shallow quantum circuits. We are optimistic that future work will identify yet greater improvements on our methods, and foster a rich interconnection between the growing TN and PQC research communities.

Data availability statement

All information necessary to replicate the data supporting the findings of this study are included within the article (and any supplementary files).

Conflict of interest

The authors declare no competing interests.

Author contributions

M S R, J C and A A reviewed the suitability of algorithms 1 and 2 for the purpose of this work. M S R and J M designed the proposed decomposition protocols. M S R wrote the code used in this work and performed the numerical simulations. J C provided optimized MPS models for the VQE simulations. J M, J C and A A provided relevant expertise in tensor network methods. A P-O helped supervise and coordinate the efforts in this work. All authors regularly analyzed the numerical results and contributed to the final version of the manuscript.

Appendix A: Performance as a function of unitary updates

In order to establish the relative performances of the decomposition protocols more clearly, we collect statistics for the infidelity of the decomposition over 50 repetitions as a function of the number of unitary updates and present them in figure 5. Seeing as the analytical decomposition step is quick relative to the total number of unitary updates, the x-axis can be understood as a proxy for the total wall time. The target MPS are structurally the same as in figure 4 and the final number of circuit layers is chosen to be 8. We depict the more realistic cases of 10 and 100 optimization sweeps, where the number of sweeps for the non-sequential sweeps is again increased accordingly. We note that the sequential growing of circuits in the protocols Iter$[I_iO_{all}]$ and Iter$[D_iO_{all}]$ occurs within the same computational budget, which leads to these methods only optimizing up to five layers before the half-point mark of total unitary updates. However, in the end, all models experience the same number of local unitary updates.

Figure 5.

Figure 5. Statistics on the performance of the decomposition protocols as a function of the number of local unitary updates. The target MPS are the same as in figure 4, and the number of decomposition circuit layers is chosen to be 8. We depict the median infidelity and the 5-95 percentiles over 50 repetitions. The random target MPS are re-generated for every repetition. Only the protocols Oall and Iter$[I_iO_{all}]$ exhibit significant variation because the initial gate parameters are random and not deterministic. The identity layers in Iter$[I_iO_{all}]$ are in fact near-identity with small parameters in order to facilitate better optimization in practice.

Standard image High-resolution image

For the cases of the structured and more practical target MPS, i.e. Heisenberg MPS and the BAS MPS, our selected decomposition protocol Iter$[D_iO_{all}]$ outperforms all other methods with approximately $25\%$ and approximately $50\%$ of unitary updates, respectively. This also implies that the resulting circuit depth at equal fidelity is between 4 and 6, instead of 8.

The case of the random MPS is less clear. While all methods achieve approximately equal performance at the end of the computational budget, the intermediate infidelities favor the protocols where one starts out the optimization with all 8 layers, rather than sequentially adding them. As seen in figure 4 however, for every number of optimization sweeps, the protocols perform approximately the same at the end, and the discrepancy at intermediate stages can be understood as an artifact of the sequential circuit growing.

Appendix B: Entanglement entropy

Although our results have focused on the infidelity of a state generated by a quantum circuit with a fixed target MPS, another useful lens for understanding the decomposition process is offered by entanglement. When applied to the target MPS, the (adjoint of the) circuit produced at each step of a decomposition protocol should lead to a state that is less entangled than the original, in a way that can be quantified by the entanglement entropy (EE),

Equation (B1)

where λi is the entanglement spectrum (here the singular values) of the resulting state across a given bond. Beyond its intrinsic meaning, the exponential of the EE also sets a lower bound on the bond dimension needed for an MPS to losslessly encode the given state.

Figure 6 shows the evolution of the EE across the middle bond of the partially-decomposed MPS during the optimization procedure used in various decomposition methods. Although this is an incomplete picture of the entanglement of the state, the similarity between figure 6 and the infidelity plots of figure 5 is immediate, and suggests that minimizing the infidelity between the reconstructed and target state achieves the same end as directly minimizing the entanglement of the partially-decomposed target state.

Figure 6.

Figure 6. Representative example of the entanglement entropy (EE) across the middle bond of the partially-decomposed target MPS state, viewed as a function of the number of unitary updates. A decrease in EE is seen in each case, in a manner which is strongly correlated with the decrease in infidelity shown in figure 5. The EE of the target MPS can be seen approximately in the initial value of the Iter$[I_i O_{all}]$ curves (orange), since this protocol initially applies a single layer of near-identity gates to the MPS state. While protocols that incorporate analytic decomposition methods tend to decrease the EE before any subsequent optimization, methods using a random gate initialization (such as Oall ) lead to an initial increase in the EE, which must be reduced by subsequent optimization steps. Such an increase in EE consequently results in an increase in bond dimension, which by itself can render such approaches impractical.

Standard image High-resolution image

Figure 7 broadens this investigation by depicting the EE across all bonds of the partially-decomposed target MPS throughout the decomposition process. Although this behavior largely follows the one seen in figure 6, the reader will notice an asymmetry of some EE plots for the BAS dataset under a spatial inversion (reversal) of the sites of the MPS, wherein the leftmost sites are more disentangled than the rightmost ones. This behavior is not due to any asymmetry in the BAS dataset itself (which is unchanged under spatial inversion), but most likely due to the use of the analytic decomposition method, which uses a series of MPS truncations proceeding from left to right. The initial truncations are most effective in reducing the entanglement across the associated bond, whereas subsequent truncation steps are based on the partially-truncated state, and are therefore less effective at reducing entanglement.

Figure 7.

Figure 7. Representative example of the entanglement entropy on all bonds (y-axis) as a function of the number of unitary updates (x-axis). Each column of the subplots represents a different decomposition protocol. Three different MPS are shown which are the $4\times3$ lattice Heisenberg model ground state, the $4\times 3$ Bars and Stripes MPS, and a random MPS. Note that the heatmap is colored according to the logarithm of the entanglement entropy.

Standard image High-resolution image

We note finally that EE holds promise as a useful a priori measure of the difficulty for state preparation, and the results above lend some limited support to this. In particular, figure 6 shows the EE of the target state to be higher for the MPS derived from the BAS dataset than for the other cases, and this is furthermore the state which certain decomposition protocols (such as Oall and Iter$[I_i O_{all}]$ ) struggle the most with. Nonetheless, the relative difficulty of all decomposition protocols to learn the random MPS, whose EE starts out lower, hints at the limitations of such single-number measures of entanglement. In general, we suspect the difficulty of state preparation through such decomposition protocols to be a more complicated question, whose resolution must also take into consideration qualitative factors, such as the topology of the correlations within the target state.

Appendix C: Unitaries & layers—notation and convention

In this work, we use the notation U for general U(4) unitaries acting on two qubits. A collection of unitaries $\{U_{m}\}_m$ will be indexed by the index m.

Linear layers of unitaries arranged in a stair-case or next-neighbor topology (see for example the layout in figures 1& 2) is denoted as L$[U]$. A collection of layers $\{\text{L}[U]^{(k)}\}_k$ will be indexed by the index k.

Whenever unitaries or layers of unitaries are applied to a state ψ, we will be indexing the order according to the application to the ket state $| \psi \rangle$.

In particular, in section 3.2 we apply unitaries via

Equation (C1)

and analogous

Equation (C2)

In the case of layers of unitaries, like in section 3.1, we follow

Equation (C3)

and analogous

Equation (C4)

where the order of unitaries in side each layer is reversed, and all unitaries are conjugated and transposed.

Please wait… references are loading.