
On Exponential-time Hypotheses, Derandomization, and Circuit Lower Bounds

Published: 20 April 2023


Abstract

The Exponential-Time Hypothesis (ETH) is a strengthening of the \(\mathcal {P}\ne \mathcal {NP}\) conjecture, stating that \(3\text{-}\mathtt {SAT}\) on n variables cannot be solved in (uniform) time \(2^{\epsilon \cdot n}\), for some \(\epsilon \gt 0\). In recent years, analogous hypotheses that are “exponentially strong” forms of other classical complexity conjectures (such as \(\mathcal {NP}\not\subseteq \mathcal {BPP}\) or \(co\mathcal {NP}\not\subseteq \mathcal {NP}\)) have also been introduced and have become widely influential.

In this work, we focus on the interaction of exponential-time hypotheses with the fundamental and closely related questions of derandomization and circuit lower bounds. We show that even relatively mild variants of exponential-time hypotheses have far-reaching implications to derandomization, circuit lower bounds, and the connections between the two. Specifically, we prove that:

(1)

The Randomized Exponential-Time Hypothesis (rETH) implies that \(\mathcal {BPP}\) can be simulated in “average-case” in deterministic (nearly-)polynomial-time (i.e., in time \(2^{\tilde{O}(\log (n))}=n^{\mathrm{loglog}(n)^{O(1)}}\)). The derandomization relies on a conditional construction of a pseudorandom generator with near-exponential stretch (i.e., with seed length \(\tilde{O}(\log (n))\)); this significantly improves the state-of-the-art in uniform “hardness-to-randomness” results, which previously only yielded pseudorandom generators with sub-exponential stretch from such hypotheses.

(2)

The Non-Deterministic Exponential-Time Hypothesis (NETH) implies that derandomization of \(\mathcal {BPP}\) is completely equivalent to circuit lower bounds against \(\mathcal {E}\), and in particular that pseudorandom generators are necessary for derandomization. In fact, we show that the foregoing equivalence follows from a very weak version of NETH, and we also show that this very weak version is necessary to prove a slightly stronger conclusion that we deduce from it.

Last, we show that disproving certain exponential-time hypotheses requires proving breakthrough circuit lower bounds. In particular, if \(\mathtt {CircuitSAT}\) for circuits over n bits of size \(\mathrm{poly}(n)\) can be solved by probabilistic algorithms in time \(2^{n/\mathrm{polylog}(n)}\), then \(\mathcal {BPE}\) does not have circuits of quasilinear size.


1 INTRODUCTION

The Exponential-Time Hypothesis (\(\mathbf {ETH}\)), introduced by Impagliazzo and Paturi [31] (and refined in [32]), conjectures that \(3\text{-}\mathtt {SAT}\) with n variables and \(m=O(n)\) clauses cannot be deterministically solved in time less than \(2^{\epsilon \cdot n}\) (for a constant \(\epsilon =\epsilon _{m/n}\gt 0\)). The \(\mathsf {ETH}\) may be viewed as an “exponentially strong” version of \(\mathcal {P}\ne \mathcal {NP}\), since it conjectures that a specific \(\mathcal {NP}\)-complete problem requires essentially exponential time to solve.

Since the introduction of \(\mathsf {ETH}\), many related variants, which are themselves “exponentially strong” versions of classical complexity-theoretic conjectures, have been introduced. For example, the Randomized Exponential-Time Hypothesis (\(\mathbf {rETH}\)), introduced in [15], conjectures that the same lower bound holds also for probabilistic algorithms (i.e., it is a strong version of \(\mathcal {NP}\not\subseteq \mathcal {BPP}\)). The Non-Deterministic Exponential-Time Hypothesis (\(\mathbf {NETH}\)), introduced (implicitly) in [7], conjectures that \(co\text{-}3\mathtt {SAT}\) (with n variables and \(O(n)\) clauses) cannot be solved by non-deterministic machines running in time \(2^{\epsilon \cdot n}\) for some constant \(\epsilon \gt 0\) (i.e., it is a strong version of \(co\mathcal {NP}\not\subseteq \mathcal {NP}\)). The variations \(\mathsf {MAETH}\) and \(\mathsf {AMETH}\) are defined analogously (see [61]), and other variations conjecture similar lower bounds for seemingly harder problems (e.g., for \(\#3\mathtt {SAT}\); see [15]).

These exponential-time hypotheses have been widely influential across different areas of complexity theory. Among the numerous areas in which they have been applied so far are structural complexity (i.e., showing classes of problems that, conditioned on exponential-time hypotheses, are “exponentially hard”), parameterized complexity, communication complexity, and fine-grained complexity; see, e.g., the surveys [40, 62, 63, 64].

Exponential-time hypotheses focus on conjectured lower bounds for uniform algorithms. Two other fundamental questions in theoretical computer science are those of derandomization, which refers to the power of probabilistic algorithms; and of circuit lower bounds, which refers to the power of non-uniform circuits. Despite the central place of all three questions, the interactions of exponential-time hypotheses with derandomization and circuit lower bounds have yet to be systematically studied.

1.1 Our Results: Bird’s Eye

In this work, we focus on the interactions between exponential-time hypotheses, derandomization, and circuit lower bounds. In a nutshell, our main contribution is showing that:

Even relatively mild variants of exponential-time hypotheses have far-reaching consequences for derandomization and circuit lower bounds.

Let us now give a brief overview of our specific results before describing them in more detail in Sections 1.2, 1.3, and 1.4. Our two main results are the following:

(1)

We show that \(\mathsf {rETH}\) implies a nearly-polynomial-time average-case derandomization of \(\mathcal {BPP}\). Specifically, assuming \(\mathsf {rETH}\), we show that \(\mathcal {BPP}\) can be decided, in average-case and on infinitely many input lengths, by deterministic algorithms that run in time \(n^{\mathrm{loglog}(n)^{O(1)}}\) (see Theorem 1.1). This significantly improves the state-of-the-art in the long line of uniform “hardness-to-randomness” results.

(2)

A classical open question is whether worst-case derandomization of \(\mathcal {BPP}\) requires pseudorandom generators. We show that a weak version of \(\mathsf {NETH}\) yields a positive answer to this question; specifically, it suffices to assume that \(\mathcal {E}=\mathcal {DTIME}[2^{O(n)}]\) is hard for small circuits that are uniformly generated by non-deterministic machines (see Section 1.3). This indicates that the answer to the classical question might be positive and suggests a path towards proving it.

Last, we show that disproving a conjecture similar to \(\mathsf {rETH}\) requires proving breakthrough circuit lower bounds (see Theorem 1.7, and see the discussion in Section 1.4 for a comparison with the state-of-the-art).

Relation to Strong Exponential Time Hypotheses. The exponential-time hypotheses that we consider also have “strong” variants, which conjecture a lower bound of \(2^{(1-\epsilon)\cdot n}\), where \(\epsilon \gt 0\) is arbitrarily small, for solving a corresponding problem (e.g., for solving \(\mathtt {SAT}\), \(co\mathtt {SAT}\), or \(\#\mathtt {SAT}\); see, e.g., [63]). In this article, we focus only on the “non-strong” variants, which conjecture lower bounds of \(2^{\epsilon \cdot n}\) for some \(\epsilon \gt 0\). Indeed, the point is that even the variants that we consider already have far-reaching consequences for derandomization and circuit lower bounds.

We mention that a recent work of Carmosino, Impagliazzo, and Sabin [8] studied the implications of hypotheses in fine-grained complexity on derandomization. These fine-grained hypotheses are implied by the “strong” version of rETH (i.e., by \(\mathsf {rSETH}\)), but are not known to follow from the “non-strong” versions that we consider in this article. We will refer again to their results in Section 1.2.

1.2 rETH and Pseudorandom Generators for Uniform Circuits

The first hypothesis that we study is rETH, which (slightly changing notation from above) asserts that probabilistic algorithms cannot decide if a given \(3\text{-}\mathtt {SAT}\) formula with v variables and \(O(v)\) clauses is satisfiable in time less than \(2^{\epsilon \cdot v}\), for some constant \(\epsilon \gt 0\). Note that such a formula can be represented with \(n=O(v\cdot \log (v))\) bits, and therefore the conjectured lower bound as a function of the input length is \(2^{\epsilon \cdot (n/\log (n))}\).
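For illustration, the conversion between the number of variables v and the input length n is the following short calculation, using the facts that each of the \(O(v)\) clauses takes \(O(\log (v))\) bits to write down and that \(\log (v)=\Theta (\log (n))\): \(\begin{align*} n=O(v\cdot \log (v)) \Longrightarrow v=\Omega (n/\log (n)) \Longrightarrow 2^{\epsilon \cdot v}\ge 2^{\epsilon ^{\prime }\cdot (n/\log (n))}\;\text{,} \end{align*}\) for some constant \(\epsilon ^{\prime }=\Theta (\epsilon)\gt 0\).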

1.2.1 Background: Uniform Hardness vs. Randomness.

Intuitively, using “hardness-to-randomness” results, we expect that a strong lower bound such as rETH would imply a strong derandomization result. When starting from lower bounds for non-uniform circuits and aiming to deduce worst-case derandomization, smooth tradeoffs that yield such results are well-known (see, e.g., [34, 44, 50, 54, 57]). The key problem, however, is that the long line of works that starts from hardness for uniform algorithms (and aims to deduce average-case derandomization) has not yielded such smooth tradeoffs so far (see [6, 8, 20, 26, 27, 33, 35, 41, 51, 56]).

Ideally, given an exponential lower bound for uniform probabilistic algorithms (such as \(\mathcal {E}\not\subseteq \mathtt {i.o.}\mathcal {BPTIME}[2^{\epsilon \cdot n}]\)), we would like to deduce that there exists a PRG with exponential stretch for uniform circuits, and consequently that \(\mathcal {BPP}=\mathcal {P}\) in “average-case.” However, prior to the current work, the state-of-the-art (by Trevisan and Vadhan [56]) could at best yield PRGs with sub-exponential stretch (i.e., with seed length \(\mathrm{polylog}(n)\)), even if the hypothesis refers to an exponential lower bound. Moreover, the best currently known PRG only works on infinitely many input lengths.

Previous works bypassed these two obstacles in various indirect ways. Carmosino, Impagliazzo, and Sabin [8] deduced polynomial-time derandomization of \(\mathcal {BPP}\) on all input lengths relying on strong hypotheses from fine-grained complexity (these hypotheses are implied by the “strong” version of rETH, i.e., by \(\mathsf {rSETH}\)). Gutfreund and Vadhan [27] deduced (subexponential-time) derandomization of \(\mathcal {RP}\) on all input lengths, rather than of \(\mathcal {BPP}\) (see details below). Last, a line of works dealing with uniform “hardness-to-randomness” for \(\mathcal {AM}\) (rather than for \(\mathcal {BPP}\)) was able to bypass both obstacles in this context (see, e.g., [26, 41, 51]).

1.2.2 Our Contribution to Uniform Hardness vs. Randomness.

In this work, we tackle both obstacles directly. Loosely speaking, our first main result is that rETH implies the existence of a PRG for uniform circuits with near-exponential stretch, which can be used for average-case derandomization of \(\mathcal {BPP}\) in nearly-polynomial-time. Specifically, the PRG that we construct has seed length \(\tilde{O}(\log (n))\), and the corresponding derandomization runs in time \(2^{\tilde{O}(\log (n))}=n^{\mathrm{loglog}(n)^{O(1)}}\).
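For concreteness, the equality between these two time bounds unpacks as follows, writing \(\tilde{O}(\log (n))\) for \(\log (n)\cdot (\mathrm{loglog}(n))^{O(1)}\): \(\begin{align*} 2^{\tilde{O}(\log (n))}=2^{\log (n)\cdot (\mathrm{loglog}(n))^{O(1)}}=n^{(\mathrm{loglog}(n))^{O(1)}}=n^{\mathrm{polyloglog}(n)}\;\text{.} \end{align*}\)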

Our hardness assumption will in fact be weaker than rETH: It suffices to assume that the Totally Quantified Boolean Formula (\(\mathbf {TQBF}\)) problem cannot be solved by probabilistic algorithms that run in time \(2^{n/\mathrm{polylog}(n)}\) (see Definition 4.6 for a standard definition of \(\mathtt {TQBF}\)). This hypothesis is weaker than rETH, because \(3\text{-}\mathtt {SAT}\) reduces to \(\mathtt {TQBF}\) with a linear overhead in the input length. (Indeed, it is a far weaker hypothesis, since \(\mathtt {TQBF}\) is \(\mathcal {PSPACE}\)-complete, whereas \(3\text{-}\mathtt {SAT}\) is only \(\mathcal {NP}\)-complete.)

Theorem 1.1

(rETH → PRG with Almost-exponential Stretch for Uniform Circuits; Informal)

Suppose that there exists \(T(n)=2^{n/\mathrm{polylog}(n)}\) such that \(\mathtt {TQBF}\notin \mathcal {BPTIME}[T]\). Then, there exists a PRG that has seed length \(\widetilde{O}(\log (n))\), runs in time \(n^{\mathrm{polyloglog}(n)}\), and is \((1/n)\)-pseudorandom on infinitely many input lengths for every distribution over circuits that can be sampled in polynomial time.

The technical statement of Theorem 1.1 is even stronger: For every \(t(n)=n^{\mathrm{polyloglog}(n)}\), the PRG is \((1/t)\)-pseudorandom for every distribution over circuits that can be sampled in time t and with \(O(\log (t))\) bits of advice (see Theorem 4.14 for details).

Theorem 1.1 establishes for the first time that hardness assumptions for \(\mathcal {BPTIME}\) yield a PRG for uniform circuits with seed length as short as \(\tilde{O}(\log (n))\) and running time as small as \(2^{\tilde{O}(\log (n))}\). The proof of this result is based on careful refinements of the proof framework of [33], using new technical tools that we construct. The latter tools significantly refine and strengthen the technical tools that were used by [56] to obtain the previously best uniform hardness-to-randomness tradeoff. For high-level overviews of the proof of Theorem 1.1 (and of the new constructions), see Section 2.1.

Overcoming the “infinitely-often” barrier. The hypothesis in Theorem 1.1 is that any probabilistic algorithm that runs in time \(2^{n/\mathrm{polylog}(n)}\) fails to compute \(\mathtt {TQBF}\) infinitely-often, and the corresponding conclusion is that the PRG “fools” uniform circuits only infinitely-often. (The meaning of “infinitely-often” is “on infinitely many input lengths,” and the meaning of “almost-always” that will be used next is “on all but finitely many input lengths.” Recall that a hypothesis of the form \(L\notin \mathcal {BPTIME}[T]\) only means that every probabilistic time-T algorithm fails to compute L infinitely-often.)
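For clarity, the two notions can be written in quantifier form as follows (our paraphrase of the parenthetical above): \(\begin{align*} L\notin \mathcal {BPTIME}[T] &\iff \text{every probabilistic time-}T\text{ algorithm fails to compute } L \text{ on infinitely many input lengths;}\\ L\notin \mathtt {i.o.}\mathcal {BPTIME}[T] &\iff \text{every probabilistic time-}T\text{ algorithm fails to compute } L \text{ on all but finitely many input lengths.} \end{align*}\)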

The shortcoming of Theorem 1.1, that the derandomization works only infinitely-often, is shared by all previous uniform “hardness-to-randomness” results that use the [33] proof framework. However, known techniques (see, e.g., [27]) can nevertheless be adapted to yield an almost-always PRG that uses \(O(\log (n))\) bits of non-uniform advice (relying on an almost-always lower bound hypothesis).

We are able to significantly improve this: Assuming the “almost-always” version of rETH, we show that \(\mathcal {BPP}\) can be derandomized in average-case and almost-always, using only a triply logarithmic number (i.e., \(O(\mathrm{logloglog}(n))\)) of advice bits. In fact, as in Theorem 1.1, it suffices to assume hardness for \(\mathtt {TQBF}\), rather than for \(3\text{-}\mathtt {SAT}\).

Theorem 1.2

(aa-rETH → Almost-always Derandomization in Time \(n^{\mathrm{polyloglog}(n)}\); Informal)

Assume that for some \(T(n)=2^{n/\mathrm{polylog}(n)}\) it holds that \(\mathtt {TQBF}\notin \mathtt {i.o.}\mathcal {BPTIME}[T]\), and let \(t(n)=n^{\mathrm{polyloglog}(n)}\). Then, for every \(L\in \mathcal {BPTIME}[t]\) and every distribution ensemble \(\mathcal {X}\) that can be sampled in polynomial time, there exists a deterministic algorithm \(D=D_{\mathcal {X}}\) that runs in time \(n^{\mathrm{polyloglog}(n)}\) and uses \(O(\mathrm{logloglog}(n))\) bits of non-uniform advice such that for almost all input lengths \(n\in \mathbb {N}\) it holds that \(\Pr _{x\sim \mathcal {X}_n}[D(x)\ne L(x)]\lt 1/n\).

As in Theorem 1.1, the conclusion of Theorem 1.2 can be strengthened so it holds for every distribution \(\mathcal {X}\) samplable in time \(t(n)=n^{\mathrm{polyloglog}(n)}\), and the derandomization succeeds on all but a \((1/t)\)-fraction of the inputs under \(\mathcal {X}\) (rather than only on a \(1-1/n\) fraction).

Remark 1.3

(Non-deterministic Extensions).

We note that “scaled-up” versions of Theorems 1.1 and 1.2 for non-deterministic settings follow easily from known results; that is, assuming lower bounds for non-deterministic uniform algorithms, we can deduce strong derandomization of corresponding non-deterministic classes. First, from the hypothesis \(\mathsf {MAETH}\), we can deduce strong circuit lower bounds, and hence also worst-case derandomization of \(pr\mathcal {BPP}\) and of \(pr\mathcal {MA}\) (see Appendix A for details and for a related result). Similarly, as shown by Gutfreund, Shaltiel, and Ta-Shma [26], a suitable variant of \(\mathsf {AMETH}\) implies an average-case derandomization of \(\mathcal {AM}\).

1.3 NETH and an Equivalence of Derandomization and Circuit Lower Bounds

In the previous section, we considered the hypothesis rETH, and now we consider the Non-Deterministic Exponential-Time Hypothesis (\(\mathbf {NETH}\)), which asserts that \(co\text{-}3\mathtt {SAT}\) (with n variables and \(O(n)\) clauses) cannot be solved by non-deterministic machines running in time \(2^{\epsilon \cdot n}\) for some \(\epsilon \gt 0\). This hypothesis is an exponential-time version of \(co\mathcal {NP}\not\subseteq \mathcal {NP}\), and is incomparable to rETH (and weaker than \(\mathsf {MAETH}\)).

1.3.1 Background and a Surprising Observation.

The motivating observation for our results in this section is that NETH has an unexpected consequence for the long-standing question of whether worst-case derandomization of \(pr\mathcal {BPP}\) is equivalent to circuit lower bounds against \(\mathcal {E}\). Specifically, recall that two-way implications between derandomization and circuit lower bounds have been gradually developing since the early ’90s (for surveys, see, e.g., [45, 60]), and that it is a long-standing question whether the foregoing implications can be strengthened to show a complete equivalence between the two. One well-known implication of such an equivalence would be that any worst-case derandomization of \(pr\mathcal {BPP}\) necessitates the construction of PRGs that “fool” non-uniform circuits.

More concretely, the observation is that NETH implies an affirmative answer to the foregoing classical question. In fact, this is not difficult to show, relying on known results (see Section 2.2 for details).

1.3.2 Our Results: Even Very Weak Forms of NETH Suffice for the Equivalence.

Our main contribution is in showing that, loosely speaking, even a very weak form of NETH suffices to answer the question of equivalence in the affirmative, and that this weak form of NETH is in some sense inherent. Specifically, we say that \(L\subseteq \lbrace 0,1\rbrace ^*\) has \(\mathcal {NTIME}[T]\)-uniform circuits if there exists a non-deterministic machine M that gets input \(1^n\), runs in time \(T(n)\), and satisfies the following: For some non-deterministic choices, M outputs a single circuit \(C:\lbrace 0,1\rbrace ^{n}\rightarrow \lbrace 0,1\rbrace\) that decides L on all inputs \(x\in \lbrace 0,1\rbrace ^n\), and whenever M does not output such a circuit, it outputs \(\perp\). We also quantify the size of the output circuit, when this size is smaller than \(T(n)\).

The weak forms of NETH that will suffice to show equivalences between derandomization and circuit lower bounds are of the form “\(\mathcal {E}\) does not have \(\mathcal {NTIME}[T]\)-uniform circuits of size \(S(n)\ll T(n)\),” for values of T and S that will be specified below. In words, this hypothesis rules out a world in which every \(L\in \mathcal {E}\) can be computed by small circuits that can be efficiently produced by a uniform (non-deterministic) machine. Indeed, this hypothesis is weaker than the NETH-style hypothesis \(\mathcal {E}\not\subseteq \mathcal {NTIME}[T]\), and even than the hypothesis \(\mathcal {E}\not\subseteq (\mathcal {NTIME}[T]\cap \mathcal {SIZE}[T])\). The fact that such a weak hypothesis suffices to deduce that derandomization and circuit lower bounds are equivalent can be seen as appealing evidence that the equivalence indeed holds.

Our results refer both to the “low-end” parameter regime, which connects relatively weak circuit lower bounds to relatively slow derandomization algorithms, and to the “high-end” parameter regime, which connects strong circuit lower bounds to fast derandomization algorithms. Showing an equivalence in the former regime requires a weaker hypothesis than in the latter regime.

Starting with the “low-end” regime, our first result is that if \(\mathcal {E}\) cannot be decided by \(\mathcal {NTIME}[2^{n^{\delta }}]\)-uniform circuits of polynomial size (for some \(\delta \gt 0\)), then derandomization of \(pr\mathcal {BPP}\) in sub-exponential time is equivalent to lower bounds for polynomial-sized circuits against \(\mathcal {EXP}\).

Theorem 1.4

(NETH → Circuit Lower Bounds are Equivalent to Derandomization; “Low-end” Setting)

Assume that there exists \(\delta \gt 0\) such that \(\mathcal {E}\) cannot be decided by \(\mathcal {NTIME}[2^{n^{\delta }}]\)-uniform circuits of arbitrary polynomial size, even infinitely-often. Then, \(\begin{align*} pr\mathcal {BPP}\subseteq \mathtt {i.o.}pr\mathcal {SUBEXP} \iff \mathcal {EXP}\not\subset \mathcal {P}/\mathrm{poly}\;\text{.} \end{align*}\)

The scaling of Theorem 1.4 to the “high-end” regime is not smooth and uses different proof techniques (see Section 5 for details). Nevertheless, an analogous result holds for the extreme “high-end” setting: Under the stronger hypothesis that \(\mathcal {E}\) cannot be decided by \(\mathcal {NTIME}[2^{\Omega (n)}]\)-uniform circuits, we show that \(pr\mathcal {BPP}=pr\mathcal {P}\) is equivalent to lower bounds for exponential-sized circuits against \(\mathcal {E}\); that is:

Theorem 1.5

(NETH → Circuit Lower Bounds are Equivalent to Derandomization; “High-end” Setting)

Assume that there exists \(\delta \gt 0\) such that \(\mathcal {E}\) cannot be decided by \(\mathcal {NTIME}[2^{\delta \cdot n}]\)-uniform circuits, even infinitely-often. Then: \(\begin{align*} pr\mathcal {BPP}=pr\mathcal {P}&\iff \exists \epsilon \gt 0:\mathcal {DTIME}[2^n]\not\subset \mathtt {i.o.}\mathcal {SIZE}[2^{\epsilon \cdot n}] \;\text{.} \end{align*}\)

(We remind the reader again that circuit lower bounds as in Theorems 1.4 and 1.5 are known to be equivalent to the existence of corresponding PRGs that fool non-uniform circuits [3, 34, 44, 54, 57]. Thus, the hypotheses in these theorems imply that derandomization requires PRGs.)

The very weak version of NETH is inherent (for the stronger conclusion that it yields). Remarkably, as mentioned above, hypotheses such as the ones in Theorems 1.4 and 1.5 actually yield a stronger conclusion and are also necessary for that stronger conclusion. Specifically, the stronger conclusion is that even non-deterministic derandomization of \(pr\mathcal {BPP}\) (such as \(pr\mathcal {BPP}\subseteq pr\mathcal {NSUBEXP}\)) yields circuit lower bounds against \(\mathcal {E}\), which in turn yield PRGs for non-uniform circuits.

Theorem 1.6

(NTIME-uniform Circuits for ℰ, Non-deterministic Derandomization, and Circuit Lower Bounds)

Assume that there exists \(\delta \gt 0\) such that \(\mathcal {E}\) cannot be decided by \(\mathcal {NTIME}[2^{n^{\delta }}]\)-uniform circuits of arbitrary polynomial size. Then, \(\begin{equation} pr\mathcal {BPP}\subseteq pr\mathcal {NSUBEXP} \Longrightarrow \mathcal {EXP}\not\subset \mathcal {P}/\mathrm{poly}\;\text{.} \tag{1.1} \end{equation}\) In the other direction, if Equation (1.1) holds, then \(\mathcal {E}\) cannot be decided by \(\mathcal {NP}\)-uniform circuits.

Note that in Theorem 1.6 there is a gap between the hypothesis that implies Equation (1.1) and the conclusion from Equation (1.1). Specifically, the hypothesis refers to \(\mathcal {NTIME}[2^{n^{\delta }}]\)-uniform circuits of polynomial size, whereas the conclusion refers to \(\mathcal {NP}\)-uniform circuits. By optimizing the parameters, this gap between sub-exponential and polynomial can be considerably narrowed (see Theorem 5.11).

1.4 Disproving a Version of rETH Requires Circuit Lower Bounds

Our last main result is that disproving a weak version of rETH requires breakthrough circuit lower bounds. Recall that rETH assumes hardness of the form \(2^{\epsilon \cdot n}\) for solving \(3\text{-}\mathtt {SAT}\) on n-bit formulas; thus, disproving rETH means constructing, for every \(\epsilon \gt 0\), a probabilistic algorithm that solves \(3\text{-}\mathtt {SAT}\) on n-bit formulas in time \(2^{\epsilon \cdot n}\).

We consider the stronger assumption that the problem \(\mathtt {CircuitSAT}\) for n-bit circuits can be solved in probabilistic time \(2^{n/\mathrm{polylog}(n)}\). (Recall that, in \(\mathtt {CircuitSAT}\), we want to solve satisfiability for a given general Boolean circuit, rather than for a given depth-two formula as in \(3\text{-}\mathtt {SAT}\).) We show that such an algorithm would yield lower bounds for circuits of quasilinear size against \(\mathcal {BPE}=\mathcal {BPTIME}[2^{O(n)}]\).

Theorem 1.7

(Circuit Lower Bounds from Randomized CircuitSAT Algorithms)

For any constant \(c\in \mathbb {N}\) there exists a constant \(c^{\prime }\in \mathbb {N}\) such that the following holds: If \(\mathtt {CircuitSAT}\) for circuits over n variables and of size \(n^{2} \cdot (\log n)^{c^{\prime }}\) can be solved in probabilistic time \(2^{n/(\log n)^{c^{\prime }}}\), then \(\mathcal {BPE}\not\subset \mathcal {SIZE}[n\cdot (\log n)^{c}]\).

Theorem 1.7 can be viewed from another perspective, which reveals that it constitutes progress on a well-known technical challenge. Specifically, we can view Theorem 1.7 as belonging to the family of results asserting that circuit-analysis algorithms imply circuit lower bounds (following Williams [59]). Previous results crucially rely on the hypothesis that the circuit-analysis algorithm is deterministic. It is a well-known challenge to obtain analogous results for randomized algorithms, and indeed Theorem 1.7 is such a result, albeit one that relies on a relatively fast algorithm (see Section 2.3 for further details and for comparison with known results).

Since Theorem 1.1 deduces a conclusion from a weak version of rETH, and Theorem 1.7 deduces a conclusion from the negation of a weak version of rETH, we can combine the two results to obtain a “win-win” statement. This yields the following unconditional Karp-Lipton style result: If \(\mathcal {BPE}\) can be decided by circuits of quasilinear size, then \(\mathcal {BPP}\) can be derandomized, in average-case and infinitely-often, in time \(2^{\tilde{O}(\log (n))}=n^{\mathrm{polyloglog}(n)}\). (See Corollary 6.6 for details and for a precise statement.)

1.5 Open Problems and Subsequent Work

Our work makes significant progress on several long-standing open problems, but by no means did we resolve them completely. Let us mention a few of these problems.

Uniform hardness vs. randomness. As mentioned in Section 1.2, the goal in this classical line of work is to deduce smooth tradeoffs between average-case derandomization and hardness for uniform probabilistic algorithms (which mirror the known tradeoffs between worst-case derandomization and hardness for non-uniform circuits).

The main open problem is to deduce polynomial-time derandomization from the existence of a hard function computable in exponential time (rather than in linear space as in Theorems 1.1 and 1.2); that is:

Open Problem 1.

Deduce average-case derandomization of \(\mathcal {BPP}\) that runs in polynomial time from the existence of a function in \(\mathcal {E}=\mathcal {DTIME}[2^{O(n)}]\) that is hard for uniform probabilistic algorithms.

Progress on the foregoing problem was recently made in a work by three of the current authors [12]. They deduced average-case derandomization of \(\mathcal {RP}\) that runs in polynomial time from the existence of a function computable by logspace-uniform circuits of size \(2^{O(n)}\) and depth \(2^{o(n)}\) that is hard for \(\mathcal {BPTIME}[2^{\epsilon \cdot n}]\) (for an arbitrary constant \(\epsilon \gt 0\)).

Theorem 1.2 (as well as another result in the aforementioned work [12]) deduced derandomization of \(\mathcal {BPP}\) on all input lengths that relies on a small number of bits of non-uniform advice. A second open problem is to deduce such derandomization without relying on non-uniform advice:

Open Problem 2.

Deduce fast derandomization of \(\mathcal {BPP}\) (ideally, polynomial time) that works for all input lengths and does not rely on any non-uniform advice, from the existence of a function in \(\mathcal {DSPACE}[O(n)]\) (or, better yet, in \(\mathcal {E}\)) that is hard for uniform probabilistic algorithms.

In a different direction, a subsequent work by two of the authors [13] showed worst-case derandomization from strong hardness assumptions for uniform probabilistic algorithms (namely, from the existence of a function f in \(\mathcal {P}\) such that every probabilistic algorithm running in a certain fixed polynomial time fails to compute f on each and every sufficiently large input). A follow-up work by Liu and Pass [39] showed an equivalence between worst-case derandomization and a similar (albeit more complicated) hardness assumption for conditional time-bounded Kolmogorov complexity.

Derandomization vs. circuit lower bounds. As mentioned in Section 1.3, it is a classical question whether derandomization of \(pr\mathcal {BPP}\) requires the circuit lower bounds against \(\mathcal {E}=\mathcal {DTIME}[2^{O(n)}]\) that are known to imply it. The conditional results in Theorems 1.4 and 1.5 suggest that the answer may be positive, yet proving unconditional results is still a major open problem.

Open Problem 3.

Show the implication \(pr\mathcal {BPP}=pr\mathcal {P}\Longrightarrow \mathcal {E}\not\subset \mathcal {P}/\mathrm{poly}\).

Interestingly, while the foregoing problem has been open for decades, we are not aware of any significant barriers towards solving it.


2 TECHNICAL OVERVIEW

In this section, we describe the proofs of our main results at a high level. In Section 2.1, we describe the proofs of Theorems 1.1 and 1.2; in Section 2.2, we describe the proofs of Theorems 1.4, 1.5, and 1.6; and in Section 2.3, we describe the proof of Theorem 1.7, which relies on the proofs from Section 2.1.

2.1 Near-optimal Uniform Hardness-to-randomness Results for TQBF

Recall that in typical “hardness-to-randomness” results, a PRG is based on a hard function, and the proof amounts to showing that an efficient distinguisher for the PRG can be transformed to an efficient algorithm or circuit that computes the hard function.

At a high level, our proof strategy follows this paradigm and relies on the classic approach of Impagliazzo and Wigderson [33] for transforming a distinguisher into an algorithm for the hard function. Loosely speaking, the latter approach works only when the hard function \(f^{\mathtt {ws}}:\lbrace 0,1\rbrace ^{*}\rightarrow \lbrace 0,1\rbrace ^{*}\) is well-structured; the precise meaning of the term “well-structured” differs across different follow-up works, and in the current work it will also take on a new meaning, but for now let us intuitively think of \(f^{\mathtt {ws}}\) as downward self-reducible and as having properties akin to random self-reducibility. Instantiating the Nisan-Wigderson PRG with a suitable encoding \(\mathtt {ECC}(f^{\mathtt {ws}})\) of \(f^{\mathtt {ws}}\) as the underlying function (again, the precise requirements from \(\mathtt {ECC}\) differ across works), our goal is to show that if the PRG with stretch \(t(n)\) does not “fool” uniform distinguishers even infinitely-often, then \(f^{\mathtt {ws}}\) is computable in probabilistic time \(t^{\prime }(n)\gt t(n)\).

The key challenge underlying this approach is the significant overheads in the proof, which increase the time complexity \(t^{\prime }\) of computing \(f^{\mathtt {ws}}\). In the original proof of [33] this time was roughly \(t^{\prime }(n)\approx t(t(n))\), and the state-of-the-art prior to the current work, by Trevisan and Vadhan [56] (following [6]), yielded \(t^{\prime }(n)=\mathrm{poly}(t(\mathrm{poly}(n)))\). Since the relevant functions \(f^{\mathtt {ws}}\) in all works are computable in \(\mathcal {E}\), proofs with such an overhead can yield at most a sub-exponential stretch \(t(n)=2^{n^{\Omega (1)}}\).

As mentioned in Section 1.2, previous works bypassed this difficulty either by using stronger hypotheses, or by deducing weaker conclusions, or by working in different contexts (e.g., considering derandomization of \(\mathcal {AM}\) rather than of \(\mathcal {BPP}\)). In contrast, we tackle this difficulty directly and manage to reduce all of the polynomial overheads in the input length to polylogarithmic overheads in the input length. That is, we will show that for carefully constructed \(f^{\mathtt {ws}}\) and suitably chosen \(\mathtt {ECC}\) (and with some variations in the proof approach), if the PRG instantiated with \(\mathtt {ECC}(f^{\mathtt {ws}})\) for stretch t does not “fool” uniform distinguishers infinitely-often, then \(f^{\mathtt {ws}}\) can be computed in time \(t^{\prime }(n)=t(\widetilde{O}(n))^{O(1)}\).

2.1.1 The Well-structured Function \(f^{\mathtt {ws}}\).

Let us now be more specific about the properties of the well-structured function \(f^{\mathtt {ws}}\) that we need in our proof. Our function \(f^{\mathtt {ws}}\) will satisfy the following:

(1)

(Very efficient \(\mathcal {PSPACE}\)-completeness:) The \(\mathcal {PSPACE}\)-complete problem \(\mathtt {TQBF}\) is reducible to \(f^{\mathtt {ws}}\) in quasilinear time, and \(f^{\mathtt {ws}}\) is computable in linear space.

(2)

(Not too inefficient downward self-reducibility:) The function \(f^{\mathtt {ws}}\) is downward self-reducible in time \(2^{n/\mathrm{polylog}(n)}\) (see Definition 4.1 for a standard definition).

(3)

(A strengthening of random self-reducibility:) The function \(f^{\mathtt {ws}}\) is sample-aided worst-case to \(\delta\)-average-case reducible, for \(\delta (n)=2^{-n/\mathrm{polylog}(n)}\).

The last property, which is implicit in many works and was recently made explicit by Goldreich and G. Rothblum [23], asserts the following: There exists a uniform algorithm T that gets as input a circuit \(C:\lbrace 0,1\rbrace ^{n}\rightarrow \lbrace 0,1\rbrace ^{*}\) that agrees with \(f^{\mathtt {ws}}_n\) on at least a \(\delta (n)\) fraction of the inputs, and labeled examples \((x,f^{\mathtt {ws}}(x))\) where \(x\in \lbrace 0,1\rbrace ^{n}\) is uniformly chosen; the algorithm runs in time \(2^{n/\mathrm{polylog}(n)}\) and with high probability outputs a circuit \(C^{\prime }:\lbrace 0,1\rbrace ^{n}\rightarrow \lbrace 0,1\rbrace ^{*}\) that computes \(f^{\mathtt {ws}}_n\) on all inputs (see Definition 4.2).

(Our construction of \(f^{\mathtt {ws}}\) will also satisfy an additional property, which will only be used in the proof of Theorem 1.2 (i.e., of the “almost-always” version of the result). We will describe this property in the proof outline for Theorem 1.2 below.)

The construction of \(f^{\mathtt {ws}}\). Let us now explain how we construct \(f^{\mathtt {ws}}\). Following Trevisan and Vadhan [56], our \(f^{\mathtt {ws}}\) is an artificial \(\mathcal {PSPACE}\)-complete problem that we carefully construct. Their goal was to construct a \(\mathcal {PSPACE}\)-complete problem that will be simultaneously downward self-reducible and randomly self-reducible. Our goal will be to obtain a construction with stronger completeness and random self-reducibility properties while compromising on a slower downward self-reducibility algorithm (as detailed above). In a gist, we do so by drastically improving the efficiency of parts of their construction; details follow.

The construction in [56] is based on the proof of \(\mathcal {IP}=\mathcal {PSPACE}\) [42, 52]. Recall that the latter proof starts with a given \(3\text{-}\mathtt {SAT}\) formula \(\varphi\), which represents a fully quantified instance for \(\mathtt {TQBF}\) (see Definition 4.6 for the standard definition). The proof then arithmetizes the \(\mathtt {TQBF}\) function on \(\varphi\) by a low-degree polynomial \(P^{(\varphi ,0)}=Q_1\circ Q_2\circ \cdots \circ Q_{\mathrm{poly}(n)} \circ P^{(\varphi)}\), where \(P^{(\varphi)}\) is a standard arithmetization of \(3\text{-}\mathtt {SAT}\), and the \(Q_i\)’s are suitable arithmetic operators (i.e., arithmetizations of the \(\forall\) and of the \(\exists\) operators, as well as an operator that lowers the degree of the intermediary polynomial). Finally, the proof defines a sequence of \(\mathrm{poly}(n)\) polynomials \(P^{(\varphi ,1)},\ldots ,P^{(\varphi ,\mathrm{poly}(n))}\), where for \(i=1,\ldots ,\mathrm{poly}(n)\), the polynomial \(P^{(\varphi ,i)}\) applies one fewer operator to \(P^{(\varphi)}\), compared to \(P^{(\varphi ,i-1)}\). The crucial observation of [56] is that computing each \(P^{(\varphi ,i)}\) efficiently reduces to computing \(P^{(\varphi ,i-1)}\), and thus this sequence of polynomials already has a property reminiscent of downward self-reducibility (and the polynomials are of low degree, and thus compute functions that are random self-reducible).

Loosely speaking, the function from [56] defines, for every integer \(n\in \mathbb {N}\), a corresponding interval \(I_n\) of \(\mathrm{poly}(n)\) input lengths; for simplicity of presentation, let us pretend that this interval is \(I_n=[n,\ldots ,N=\mathrm{poly}(n)]\). At input length \(N=\mathrm{poly}(n)\) the function gets as input a \(3\text{-}\mathtt {SAT}\) formula \(\varphi\) over n variables and outputs \(P^{(\varphi ,0)}\). Then, for \(i\in [\mathrm{poly}(n)]\), at input length \(N-i\), the function gets input \((\varphi ,w)\), where w is a sequence of auxiliary variables, and outputs \(P^{(\varphi ,i)}(w)\). Given the observation mentioned above, this function is downward self-reducible and randomly self-reducible.

Going through their proof (with needed adaptations for our “high-end” parameter setting), we encounter four different polynomial overheads in the input length when reducing from \(\mathtt {TQBF}\) to their function. The first and obvious one is that inputs of length n are mapped to inputs of length \(N=\mathrm{poly}(n)\), corresponding to the number of rounds in the \(\mathcal {IP}=\mathcal {PSPACE}\) protocol. The other polynomial overheads in the input length come from their reduction of \(\mathtt {TQBF}\) to an intermediate problem that takes both \(\varphi\) and w as part of the input and is still amenable to arithmetization; from the field size that is required for the stronger random self-reducibility property that we need; and from the way the \(\mathrm{poly}(n)\) polynomials are combined into a single Boolean function.

The main challenge is to eliminate all of the foregoing overheads simultaneously. Our first main idea is to use an \(\mathcal {IP}=\mathcal {PSPACE}\) protocol with \(\mathrm{polylog}(n)\) rounds instead of \(\mathrm{poly}(n)\) rounds, so the first overhead (i.e., the additive overhead in the input length caused by the number of operators) will be only \(\mathrm{polylog}(n)\) instead of \(\mathrm{poly}(n)\). Indeed, in such a protocol the verification time in each round is high, and therefore our downward self-reducibility algorithm is relatively slow and makes many queries; but we will be able to afford this.

While implementing this idea, we define a different intermediate problem that is both amenable to arithmetization and reducible from \(\mathtt {TQBF}\) in quasilinear time, relying on an efficient Cook-Levin theorem (see Claim 4.7.1); we move to an arithmetic setting that will support the strong random self-reducibility property that we want and arithmetize the intermediate problem in this setting (see Claim 4.7.2); we show how to execute arithmetic operators in a “batch” in this arithmetic setting (see Claim 4.7.3); and we efficiently combine the resulting collection of polynomials into a single Boolean function (see the last part of the proof of Lemma 4.7).

We stress that we are “paying” for all the optimizations above by the fact that the associated algorithms (for downward self-reducibility and for our notion of random self-reducibility) now run in time \(2^{n/\mathrm{polylog}(n)}\), rather than polynomial time; but again, we are able to afford this in our proof.

2.1.2 Instantiating the [33] Proof Framework with the Function \(f^{\mathtt {ws}}\).

Given this construction of \(f^{\mathtt {ws}}\), we now use a variant of the proof framework of Impagliazzo and Wigderson [33], as follows: For simplicity, in this overview, we show how to “fool” polynomial-time distinguishers that do not use advice. (The full technical proof appears in Section 4.2; see the proof of Lemma 4.9.)

Let \(\mathtt {ECC}\) be the Goldreich-Levin [21] (i.e., Hadamard) encoding \(\mathtt {ECC}(f^{\mathtt {ws}})(x,r)=\oplus _{i}f^{\mathtt {ws}}(x)_i\cdot r_i\). Our PRG is the Nisan-Wigderson PRG, instantiated with \(\mathtt {ECC}(f^{\mathtt {ws}})\) as the hard function, and with seed length \(\tilde{O}(\log (n))\). To analyze it, we rely on the well-known “uniform reconstruction” argument of [33] (following [44]), which shows the following: If for input length n there exists a uniform \(\mathrm{poly}(n)\)-time distinguisher A for the PRG, then for input length \(\ell =\widetilde{O}(\log (n))\) there is a weak learner for \(\mathtt {ECC}(f^{\mathtt {ws}})\). That is, there exists an algorithm that gets input \(1^{\ell }\) and oracle access to \(\mathtt {ECC}(f^{\mathtt {ws}})\) on \(\ell\)-bit inputs, runs in time \(\mathrm{poly}(n)\approx 2^{\ell /\mathrm{polylog}(\ell)}\), and outputs a small circuit that agrees with \(\mathtt {ECC}(f^{\mathtt {ws}})\) on approximately \(1/2+1/n^2\approx 1/2+\delta _0(\ell)\) of the \(\ell\)-bit inputs, where \(\delta _0(\ell)=2^{-\ell /\mathrm{polylog}(\ell)}\).
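As a concrete illustration of this encoding (a minimal sketch of ours, not code from the paper), the following Python function computes \(\mathtt {ECC}(f)(x,r)\) as the inner product modulo 2 of \(f(x)\) with the mask r:

    def ecc_goldreich_levin(f, x, r):
        """Hadamard/Goldreich-Levin encoding: ECC(f)(x, r) = XOR_i f(x)_i * r_i."""
        fx = f(x)  # f maps a bit-tuple to a bit-tuple; len(fx) == len(r)
        assert len(fx) == len(r)
        return sum(fi * ri for fi, ri in zip(fx, r)) % 2

    # Tiny usage example with an arbitrary toy function f:
    f = lambda x: (x[0] ^ x[1], x[0] & x[1])
    print(ecc_goldreich_levin(f, (1, 0), (1, 1)))  # prints 1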

Thus, assuming that there exists a distinguisher for the PRG as above for every \(n\in \mathbb {N}\), we deduce that a weak learner exists for every \(\ell \in \mathbb {N}\). Following the “bootstrapping” idea of [33], we now iteratively construct, for each input length \(i=1,\ldots ,\ell\), a circuit of size \(2^{i/\mathrm{polylog}(i)}\) for \(f^{\mathtt {ws}}_i\). The base case \(i=1\) is trivial. And, in iteration \(i\gt 1\), having already obtained a circuit \(C_{i-1}\) for \(f^{\mathtt {ws}}_{i-1}\), we run the weak learner for \(\mathtt {ECC}(f^{\mathtt {ws}})\) on input length \(2i\) and answer its oracle queries using the downward self-reducibility of \(f^{\mathtt {ws}}\), the circuit \(C_{i-1}\), and the fact that \(\mathtt {ECC}(f^{\mathtt {ws}})_{2i}\) is easily computable given access to \(f^{\mathtt {ws}}_i\).

The weak learner outputs a circuit \(C^{(0)}_i\) of size \(2^{2i/\mathrm{polylog}(2i)}\) that agrees with \(\mathtt {ECC}(f^{\mathtt {ws}})\) on approximately \(1/2+\delta _0(2i)\) of the \(2i\)-bit inputs, and we want to transform it into a circuit that computes \(f^{\mathtt {ws}}\) on all i-bit inputs. To do so, we first use the list-decoding algorithm of Goldreich and Levin [21] to efficiently transform \(C^{(0)}_i\) into a circuit \(C^{(1)}_i\) of similar size that computes \(f^{\mathtt {ws}}\) on approximately a \(\delta (i)=\mathrm{poly}(\delta _0(2i))\) fraction of the i-bit inputs; the algorithm of [21] succeeds only with probability \(\mathrm{poly}(\delta)\), so we run it \(\mathrm{poly}(1/\delta)\) times and each time test the agreement of the resulting circuit with \(f^{\mathtt {ws}}\), using the circuit \(C_{i-1}\) and the downward self-reducibility of \(f^{\mathtt {ws}}\).

Our goal now is to transform \(C^{(1)}_i\) into a circuit of similar size that computes \(f^{\mathtt {ws}}\) on all i-bit inputs. Recall that in general, performing such transformations by a uniform algorithm is challenging (intuitively, if the truth-table of \(f^{\mathtt {ws}}\) is a codeword in an error-correcting code, then this task corresponds to uniform list-decoding of a “very corrupt” version of \(f^{\mathtt {ws}}\)). However, in our specific setting, we can produce random labeled samples for \(f^{\mathtt {ws}}\), using its downward self-reducibility and the circuit \(C_{i-1}\). Relying on the sample-aided worst-case to average-case reducibility of \(f^{\mathtt {ws}}\), we can transform \(C^{(1)}_i\) to a circuit \(C_i\) of similar size that computes \(f^{\mathtt {ws}}_i\) on all inputs.
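The following Python-style skeleton summarizes the bootstrapping loop just described. It is a sketch under our own naming assumptions: weak_learner, gl_list_decode, test_agreement, sample_aided_correction, and dsr_query are hypothetical stand-ins for the components from the text (the weak learner, the [21] list-decoder, the agreement test, the sample-aided reduction, and downward self-reducibility), none of which is specified as code in the paper.

    def bootstrap_fws(ell, weak_learner, gl_list_decode, test_agreement,
                      sample_aided_correction, dsr_query, base_circuit,
                      num_retries):
        """Iteratively build circuits C_1, ..., C_ell for f^ws (sketch only)."""
        C = base_circuit  # trivial circuit for f^ws on 1-bit inputs
        for i in range(2, ell + 1):
            # Oracle for ECC(f^ws) at input length 2i, answered via the
            # downward self-reducibility of f^ws and the circuit C_{i-1}.
            oracle = lambda xr, C_prev=C, i=i: dsr_query(i, C_prev, xr)

            # Weak learner: a circuit agreeing with ECC(f^ws) on roughly
            # a 1/2 + delta_0(2i) fraction of the 2i-bit inputs.
            C0 = weak_learner(2 * i, oracle)

            # Goldreich-Levin list-decoding succeeds only with probability
            # poly(delta), so retry poly(1/delta) times, testing agreement
            # with f^ws each time (again via C_{i-1} and dsr_query).
            C1 = None
            for _ in range(num_retries):
                candidate = gl_list_decode(C0)
                if test_agreement(candidate, i, C, dsr_query):
                    C1 = candidate
                    break
            if C1 is None:
                raise RuntimeError("list-decoding failed on all retries")

            # Sample-aided worst-case to average-case reduction: labeled
            # samples (x, f^ws(x)) generated via C_{i-1} and dsr_query
            # yield a circuit C_i computing f^ws on *all* i-bit inputs.
            C = sample_aided_correction(C1, i, C, dsr_query)
        return C  # circuit computing f^ws on ell-bit inputs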

Finally, since \(\mathtt {TQBF}\) is reducible with quasilinear overhead to \(f^{\mathtt {ws}}\), if we can compute \(f^{\mathtt {ws}}\) in time \(2^{n/\mathrm{polylog}(n)}\), then we can compute \(\mathtt {TQBF}\) in such time, a contradiction. This establishes that the generator is indeed pseudorandom, and, since \(f^{\mathtt {ws}}\) is computable in space \(O(\ell)=\tilde{O}(\log (n))\) (and thus in time \(n^{\mathrm{polyloglog}(n)}\)), the pseudorandom generator is also computable in time \(n^{\mathrm{polyloglog}(n)}\).

2.1.3 The “Almost-always” Version: Proof of Theorem 1.2.

We now explain how to adapt the proof above to get an “almost-always” PRG with near-exponential stretch. For starters, we will use a stronger property of \(f^{\mathtt {ws}}\), namely, that it is downward self-reducible in a polylogarithmic number of steps; this means that for every input length \(\ell\) there exists an input length \(\ell _0\ge \ell -\mathrm{polylog}(\ell)\) such that \(f^{\mathtt {ws}}\) is efficiently computable at input length \(\ell _0\) (i.e., \(f^{\mathtt {ws}}_{\ell _0}\) is computable in time \(2^{\ell _0/\mathrm{polylog}(\ell _0)}\) without a “downward” oracle); see Section 4.1.1 for intuition and details about this property.

Now, observe that the transformation of a probabilistic distinguisher A for the PRG to a probabilistic algorithm F that computes \(f^{\mathtt {ws}}\) actually gives a “point-wise” guarantee: For every input length \(n\in \mathbb {N}\), if A distinguishes the PRG on a corresponding set of input lengths \(S_{n}\), then F computes \(f^{\mathtt {ws}}\) correctly at input length \(\ell =\ell (n)=\widetilde{O}(\log (n))\); specifically, we want to use the downward self-reducibility argument for \(f^{\mathtt {ws}}\) at input lengths \(\ell ,\ell -1,\ldots ,\ell _0\), and \(S_n\) is the set of input lengths at which we need a distinguisher for G to obtain a weak learner for \(\mathtt {ECC}(f^{\mathtt {ws}})\) at input lengths \(\ell ,\ell -1,\ldots ,\ell _0\). Moreover, since \(f^{\mathtt {ws}}\) is downward self-reducible in \(\mathrm{polylog}\) steps, we will only need weak learners at input lengths \(\ell ,\ldots ,\ell _0=\ell -\mathrm{polylog}(\ell)\); hence, we can show that \(S_n\) is a set of \(\mathrm{polylog}(\ell)=\mathrm{polyloglog}(n)\) input lengths in the interval \([n,n^2]\) (see Lemma 4.9 for the precise calculation). Taking the contrapositive, if \(f^{\mathtt {ws}}\) cannot be computed by F on almost all \(\ell\)’s, then for every \(n\in \mathbb {N}\) there exists an input length \(m\in S_n\subset [n,n^2]\) such that G fools A at input length m.

Our derandomization algorithm gets input \(1^n\) and also gets the “good” input length \(m\in S_{n}\) as non-uniform advice; it then simulates \(G(1^m)\) (i.e., the PRG at input length m) and truncates the output to n bits. (We can indeed show that truncating the output of our PRG preserves its pseudorandomness in a uniform setting; see Proposition 4.12 for details.) The crucial point is that, since \(|S_{n}|=\mathrm{polyloglog}(n)\), the advice length is \(O(\mathrm{logloglog}(n))\). Note, however, that for every potential distinguisher A there exists a different input length \(m\in S_{n}\) such that G is pseudorandom for A on m. Hence, our derandomization algorithm (or, more accurately, its advice) depends on the distinguisher that it wants to “fool.” Thus, for every \(L\in \mathcal {BPP}\) and every efficiently samplable distribution \(\mathcal {X}\) of inputs, there exists a corresponding “almost-always” derandomization algorithm \(D_{\mathcal {X}}\) (see Proposition 4.12).
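A minimal sketch of this advice-based derandomization (our illustration; prg, A, and n_rand are assumed interfaces, not objects defined in the paper) is:

    from itertools import product

    def derandomize_with_advice(A, x, prg, seed_len, m, n_rand):
        """Decide x by majority vote over all pseudorandom strings (sketch)."""
        ones, total = 0, 0
        for seed in product((0, 1), repeat=seed_len):
            # prg(seed, m) stands in for G(1^m); m is the "good" input
            # length from S_n, supplied as non-uniform advice. Truncate
            # its output to the n_rand coins that A uses on x.
            r = prg(seed, m)[:n_rand]
            ones += A(x, r)
            total += 1
        # With seed length O~(log n), this enumeration takes time
        # 2^{O~(log n)} = n^{polyloglog(n)}.
        return int(2 * ones >= total)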

2.2 \(\mathcal {NTIME}\)-uniform Circuits for ℰ and an Equivalence between Derandomization and Circuit Lower Bounds

The proofs that we describe in the current section are significantly simpler technically than the proofs described in Sections 2.1 and 2.3. As mentioned in Section 1.3, the motivating observation is that NETH implies an equivalence between derandomization and circuit lower bounds; let us start by proving this statement:

Proposition 2.1 (“Warm-Up”: A Weaker Version of Theorem 1.4).

Assume that \(\mathcal {EXP}\not\subset \mathtt {i.o.}\mathcal {NSUBEXP}\). Then, \(pr\mathcal {BPP}\subseteq pr\mathcal {SUBEXP} \iff \mathcal {EXP}\not\subset \mathtt {i.o.}\mathcal {P}/\mathrm{poly}\).

Proof.

The “\(\Longleftarrow\)” direction follows (without any assumption) from [3]. For the “\(\Longrightarrow\)” direction, assume that \(pr\mathcal {BPP}\subseteq pr\mathcal {SUBEXP}\), and assume towards a contradiction that \(\mathcal {EXP}\subset \mathtt {i.o.}\mathcal {P}/\mathrm{poly}\). The latter hypothesis implies (using the Karp-Lipton style result of [3]) that \(\mathcal {EXP}\subset \mathtt {i.o.}\mathcal {MA}\). Combining this with the former hypothesis, we deduce that \(\mathcal {EXP}\subset \mathtt {i.o.}\mathcal {NSUBEXP}\), a contradiction.□

Our proofs of Theorems 1.4 and 1.5 will follow the same logical structure as the proof of Proposition 2.1, and our goal will be to relax the hypothesis \(\mathcal {EXP}\not\subset \mathtt {i.o.}\mathcal {NSUBEXP}\). We will do so by strengthening the Karp-Lipton style result that uses [3] and asserts that a joint “collapse” hypothesis and derandomization hypothesis imply that \(\mathcal {EXP}\) can be decided in small non-deterministic time. We will show two different strengthenings, each referring to a different parameter setting: The first strengthening refers to a “low-end” setting and asserts that if \(\mathcal {EXP}\subset \mathcal {P}/\mathrm{poly}\) and \(pr\mathcal {BPP}\subseteq pr\mathcal {SUBEXP}\), then \(\mathcal {EXP}\) has \(\mathcal {NSUBEXP}\)-uniform circuits of polynomial size (see Item (1) of Proposition 5.6); and the second strengthening refers to a “high-end” setting and asserts that if \(\mathcal {E}\subset \mathtt {i.o.}\mathcal {SIZE}[2^{\epsilon \cdot n}]\) and \(pr\mathcal {BPP}=pr\mathcal {P}\), then \(\mathcal {E}\) has \(\mathcal {NTIME}[2^{O(\epsilon)\cdot n}]\)-uniform circuits (see Proposition 5.7). The proofs of these two different strengthenings rely on different ideas; for high-level descriptions of the proofs, see Sections 5.1.2 and 5.1.3, respectively.

For context, recall that, as noted by Fortnow, Santhanam, and Williams [17], the proof of [3] already supports the stronger result that \(\mathcal {EXP}\subset \mathcal {P}/\mathrm{poly}\iff \mathcal {EXP}=\mathcal {OMA}\); and by adding a derandomization hypothesis (e.g., \(pr\mathcal {BPP}=pr\mathcal {P}\)), we can deduce that \(\mathcal {EXP}=\mathcal {ONP}\). Nevertheless, our results above are stronger, because \(\mathcal {NP}\)-uniform circuits are an even weaker model than \(\mathcal {ONP}\): This is because, in the latter model, the proof is verified on an input-by-input basis, whereas, in the former model, we only verify once that the proof is convincing for all inputs. We also stress that some lower bounds for this weaker model (i.e., for \(\mathcal {NTIME}\)-uniform circuits of small size) are already known: Santhanam and Williams [49] proved that for every \(k\in \mathbb {N}\) there exists a function in \(\mathcal {NP}\) that cannot be computed by \(\mathcal {NP}\)-uniform circuits of size \(n^k\).

We also note that our proofs actually show that (conditioned on lower bounds for \(\mathcal {NTIME}\)-uniform circuits against \(\mathcal {E}\)) even a relaxed derandomization hypothesis is already equivalent to the corresponding circuit lower bounds. For example, in the “high-end” setting, to deduce that \(\mathcal {E}\not\subset \mathcal {SIZE}[2^{\Omega (n)}]\) it suffices to assume that \(\mathsf {CAPP}\) on v-bit circuits of size \(n=2^{\Omega (v)}\) can be solved in time \(2^{\epsilon \cdot v}\) for a sufficiently small \(\epsilon \gt 0\). For more details, see Section 5.2.

Proof of Theorem 1.6. The first part of Theorem 1.6 asserts that if \(\mathcal {E}\) does not have \(\mathcal {NTIME}[2^{n^{\delta }}]\)-uniform circuits of polynomial size, then the conditional statement “\(pr\mathcal {BPP}\subseteq pr\mathcal {NSUBEXP}\Longrightarrow \mathcal {EXP}\not\subset \mathcal {P}/\mathrm{poly}\)” holds. The proof of this statement again follows the logical structure from the proof of Proposition 2.1 and relies on a further strengthening of our “low-end” Karp-Lipton style result, so that the result only uses the hypothesis \(pr\mathcal {BPP}\subseteq pr\mathcal {NSUBEXP}\) rather than \(pr\mathcal {BPP}\subseteq pr\mathcal {SUBEXP}\).

The second part of Theorem 1.6 asserts that if the conditional statement “\(pr\mathcal {BPP}\subseteq pr\mathcal {NSUBEXP}\Longrightarrow \mathcal {EXP}\not\subset \mathcal {P}/\mathrm{poly}\)” holds, then \(\mathcal {E}\) does not have \(\mathcal {NP}\)-uniform circuits. We will in fact prove the stronger conclusion that \(\mathcal {E}\not\subseteq (\mathcal {NP}\cap \mathcal {P}/\mathrm{poly})\). (Recall that the class of problems decidable by \(\mathcal {NP}\)-uniform circuits is a subclass of \(\mathcal {ONP}\subseteq \mathcal {NP}\cap \mathcal {P}/\mathrm{poly}\).) The proof itself is very simple: Assume towards a contradiction that \(\mathcal {E}\subseteq (\mathcal {NP}\cap \mathcal {P}/\mathrm{poly})\); since \(\mathcal {BPP}\subseteq \mathcal {EXP}\), it follows that \(pr\mathcal {BPP}\subseteq pr\mathcal {NP}\) (see the proof of Theorem 5.10); and by the hypothesized conditional statement, we deduce that \(\mathcal {EXP}\not\subset \mathcal {P}/\mathrm{poly}\), a contradiction. Indeed, the parameter choices in the foregoing proof are far from tight, and (as mentioned after the statement of Theorem 1.6) the quantitative gap between the two parts of Theorem 1.6 can be considerably narrowed (see Theorem 5.11).

2.3 Circuit Lower Bounds from Randomized CircuitSAT Algorithms

Recall that Theorem 1.7 asserts that if \(\mathtt {CircuitSAT}\) for n-bit circuits of size \(\tilde{O}(n^2)\) can be solved in probabilistic time \(2^{n/(\log n)^c}\), then \(\mathcal {BPE}\not\subset \mathcal {SIZE}[n\cdot (\log n)^{c^{\prime }}]\), where \(c^{\prime }\) depends on c. The relevant context for this result is the known line of works that deduce circuit lower bounds from “non-trivial” circuit-analysis algorithms, following the celebrated result of Williams [59]. The main technical innovation in Theorem 1.7 is that our hypothesis is only that there exists a probabilistic circuit-analysis algorithm, whereas the aforementioned known results crucially rely on the fact that the circuit-analysis algorithm is deterministic. However, the aforementioned known results yield new circuit lower bounds even if the running time of the algorithm is \(2^{n}/n^{\omega (1)}\), whereas Theorem 1.7 only yields new circuit lower bounds if the running time is \(2^{n/\mathrm{polylog}(n)}\).

As far as we are aware, Theorem 1.7 is the first result that deduces circuit lower bounds from a near-exponential-time probabilistic algorithm for a natural circuit-analysis task. The closest result that we are aware of is by Oliveira and Santhanam [46, Theorem 14], who deduced lower bounds for circuits of size \(n^{O(1)}\) against \(\mathcal {BPE}\) from probabilistic algorithms for learning with membership queries (rather than for a circuit-analysis task such as \(\mathtt {CircuitSAT}\)); as explained next, we build on their techniques in our proof.

Our proof strategy is indeed very different from the proof strategies underlying known results that deduce circuit lower bounds from deterministic circuit-analysis algorithms (e.g., from the “easy-witness” proof strategy [9, 11, 14, 30, 43, 59] or from proofs that rely on \(\mathcal {MA}\) lower bounds [30, Remark 26], [48, 55]). At a high level, to prove our result, we exploit the connection between randomized learning algorithms and circuit lower bounds, which was recently discovered by Oliveira and Santhanam [46, Section 5] (following [16, 28, 37]). Loosely speaking, their connection relies on the classical results of [33], and we are able to significantly refine this connection using our refined version of the [33] argument that was detailed in Section 2.1.

Our starting point is the observation that \(\mathtt {CircuitSAT}\) algorithms yield learning algorithms. Specifically, fix \(k\in \mathbb {N}\), and assume (for simplicity) that \(\mathtt {CircuitSAT}\) for polynomial-sized n-bit circuits can be solved in probabilistic time \(2^{n/\mathrm{polylog}(n)}\) for an arbitrarily large polylogarithmic function. We show that in this case, any function that is computable by circuits of size \(n\cdot (\log n)^k\) can be learned (approximately) using membership queries in time \(2^{n/\mathrm{polylog}(n)}\) (we explain below how to prove this). Now, let \(f^{\mathtt {ws}}\) be the well-structured function from Section 2.1, and recall that \(f^{\mathtt {ws}}\) is computable in linear space and hard for linear space under quasilinear-time reductions. Then, exactly one of two cases holds:

(1)

The function \(f^{\mathtt {ws}}\) does not have circuits of size \(n\cdot (\log n)^k\). In this case, a Boolean version of \(f^{\mathtt {ws}}\) also does not have circuits of such size and, since this Boolean version is in \(\mathcal {SPACE}[O(n)]\subseteq \mathcal {BPE}\), we are done.

(2)

The function \(f^{\mathtt {ws}}\) has circuits of size \(n\cdot (\log n)^k\). Hence, \(f^{\mathtt {ws}}\) is also learnable (as we concluded above), and so the argument of [33] can be used to show that \(f^{\mathtt {ws}}\) is computable by an efficient probabilistic algorithm.22 Now, by a diagonalization argument, there exists \(L^{\sf diag}\in \Sigma _4[n\cdot (\log n)^{2k}]\) that cannot be computed by circuits of size \(n\cdot (\log n)^{k}\). We show that \(L^{\sf diag}\in \mathcal {BPE}\) by first reducing \(L^{\sf diag}\) to \(f^{\mathtt {ws}}\) in time \(\widetilde{O}(n)\) and then computing \(f^{\mathtt {ws}}\) (using the efficient probabilistic algorithm).

Thus, in both cases, we showed a function in \(\mathcal {BPE}\setminus \mathcal {SIZE}[n\cdot (\log n)^k]\). The crucial point is that in the second case, our new and efficient implementation of the [33] argument (which was described in Section 2.1) yields a probabilistic algorithm for \(f^{\mathtt {ws}}\) with very little overhead, which allows us to indeed show that \(L^{\sf diag}\in \mathcal {BPE}\). Specifically, our implementation of the argument (with the specific well-structured function \(f^{\mathtt {ws}}\)) shows that if \(f^{\mathtt {ws}}\) can be learned in time \(T(n)=2^{n/\mathrm{polylog}(n)}\), then \(f^{\mathtt {ws}}\) can be computed in similar time \(T^{\prime }(n)=2^{n/\mathrm{polylog}(n)}\) (see Corollary 4.10).

We thus only need to explain how a \(\mathtt {CircuitSAT}\) algorithm yields a learning algorithm with comparable running time. The idea here is quite simple: Given oracle access to a function \(f^{\mathtt {ws}}\), we generate a random sample of \(r=\mathrm{poly}(n)\) labeled examples \((x_1,f^{\mathtt {ws}}(x_1)),\ldots ,(x_r,f^{\mathtt {ws}}(x_r))\) for \(f^{\mathtt {ws}}\), and we use the \(\mathtt {CircuitSAT}\) algorithm to construct, bit-by-bit, a circuit of size \(n\cdot (\log n)^k\) that agrees with \(f^{\mathtt {ws}}\) on the sample. Note that the input for the \(\mathtt {CircuitSAT}\) algorithm is a circuit of size \(\mathrm{poly}(n)\) over only \(n^{\prime }\approx n\cdot (\log n)^{k+1}\) bits (corresponding to the size of the circuit that we wish to construct). Hence, the \(\mathtt {CircuitSAT}\) algorithm runs in time \(2^{n^{\prime }/\mathrm{polylog}(n^{\prime })}=2^{n/\mathrm{polylog}(n)}\). And if the sample size \(r=\mathrm{poly}(n)\) is large enough, then with high probability any circuit of size \(n\cdot (\log n)^k\) that agrees with \(f^{\mathtt {ws}}\) on the sample also agrees with \(f^{\mathtt {ws}}\) on almost all inputs (i.e., by a union-bound over all circuits of such size).
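For illustration, the following is a minimal sketch of this learner. The helpers \(\mathtt {circuit\_sat}\) (a probabilistic \(\mathtt {CircuitSAT}\) solver), \(\mathtt {membership\_query}\) (oracle access to \(f^{\mathtt {ws}}\)), and \(\mathtt {consistency\_circuit}\) (which, for a sample and a partially fixed circuit encoding, builds a \(\mathtt {CircuitSAT}\) instance over the remaining encoding bits that is satisfiable iff some completion encodes a circuit agreeing with the whole sample) are hypothetical stand-ins, not objects defined in this article.

```python
import random

def learn_via_circuit_sat(n, encoding_len, num_samples,
                          circuit_sat, membership_query, consistency_circuit):
    """Fix, bit-by-bit, the encoding of a small circuit that agrees with
    f^ws on a random sample; each step is one CircuitSAT call on a
    poly(n)-size circuit over at most `encoding_len` input bits."""
    # Draw a random labeled sample (x_1, f(x_1)), ..., (x_r, f(x_r)).
    sample = []
    for _ in range(num_samples):
        x = random.getrandbits(n)  # a uniformly random n-bit input
        sample.append((x, membership_query(x)))

    # Determine the encoding of a consistent circuit one bit at a time.
    fixed = []
    for _ in range(encoding_len):
        if circuit_sat(consistency_circuit(sample, fixed + [0])):
            fixed.append(0)
        elif circuit_sat(consistency_circuit(sample, fixed + [1])):
            fixed.append(1)
        else:
            return None  # no circuit of the target size fits the sample
    return fixed  # encoding of a circuit that agrees with the sample
```

Note that each \(\mathtt {CircuitSAT}\) call is over \(n^{\prime }\approx n\cdot (\log n)^{k+1}\) bits, which is what yields the \(2^{n/\mathrm{polylog}(n)}\) running time stated above.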


3 PRELIMINARIES

We denote random variables in boldface. For an alphabet \(\Sigma\) and \(n\in \mathbb {N}\), we denote the uniform distribution over \(\Sigma ^n\) by \({\bf u}_n\), where \(\Sigma\) will be clear from context.

For any set \(L\subseteq \lbrace 0,1\rbrace ^*\) and \(n\in \mathbb {N}\), we denote by \(L_n=L\cap \lbrace 0,1\rbrace ^n\) the restriction of L to n-bit inputs. Similarly, for \(f:\lbrace 0,1\rbrace ^*\rightarrow \lbrace 0,1\rbrace ^*\), we denote by \(f_n:\lbrace 0,1\rbrace ^n\rightarrow \lbrace 0,1\rbrace ^*\) the restriction of f to the domain of n-bit inputs.

3.1 Complexity Classes

We will use standard complexity-theoretic notation, which can be found in any standard textbook (such as [2, 19]). As a few specific reminders of classes that will be used in our article, let us recall that:

(1)

The class \(\mathcal {E}=\mathcal {DTIME}[2^{O(n)}]\) is the set of languages decidable in deterministic time \(2^{O(n)}\).

(2)

For a function \(s:\mathbb {N}\rightarrow \mathbb {N}\), the class \(\mathcal {SIZE}[s]\) is the set of languages decidable by an infinite family \(\lbrace C_n:\lbrace 0,1\rbrace ^{n}\rightarrow \lbrace 0,1\rbrace ^{}\rbrace _{n\in \mathbb {N}}\) of Boolean circuits with fan-in two over the De Morgan basis such that \(C_n\) is of size at most \(s(n)\).

(3)

For a class \(\mathcal {C}\) of languages, the notation \(\mathtt {i.o.}\mathcal {C}\) refers to the set of languages \(L\subseteq \lbrace 0,1\rbrace ^{*}\) that agree with some \(L^{\prime }\in \mathcal {C}\) on infinitely many input lengths; that is, there exists an infinite set \(S\subseteq \mathbb {N}\) such that for every \(n\in S\) it holds that \(L\cap \lbrace 0,1\rbrace ^{n}=L^{\prime }\cap \lbrace 0,1\rbrace ^{n}\).

(4)

The notation \(pr\mathcal {BPP}\) refers to the set of promise problems decidable in probabilistic polynomial time; that is, the set of pairs \((\mathsf {Y},\mathsf {N})\in \lbrace 0,1\rbrace ^{*}\times \lbrace 0,1\rbrace ^{*}\) such that there exists a probabilistic polynomial time machine M satisfying \(x\in \mathsf {Y}\Rightarrow \Pr [M(x)=1]\ge 2/3\) and \(x\in \mathsf {N}\Rightarrow \Pr [M(x)=0]\ge 2/3\).

(5)

The class \(\mathcal {SUBEXP}=\cap _{\epsilon \gt 0}\mathcal {DTIME}[2^{n^{\epsilon }}]\) is the set of languages decidable in sub-exponential time (i.e., time \(2^{n^{\epsilon }}\) where \(\epsilon \gt 0\) can be an arbitrarily small constant). Similarly, the class \(\mathtt {i.o.}pr\mathcal {SUBEXP}\) is the set of promise problems decidable in sub-exponential time on infinitely many input lengths; and the class \(pr\mathcal {NSUBEXP}\) is the set of promise problems decidable in non-deterministic sub-exponential time.

3.2 Two Exponential-time Hypotheses

We define two exponential-time hypotheses that we consider in this article. We note in advance that our actual results refer to various weaker variants of these hypotheses.

Hypothesis 1

(rETH; see Reference [15])

Randomized Exponential Time Hypothesis (\(\mathbf {rETH}\)): There exists \(\epsilon \gt 0\) and \(c\gt 1\) such that \(3\text{-}\mathtt {SAT}\) on n variables and with \(c\cdot n\) clauses cannot be solved by probabilistic algorithms that run in time \(2^{\epsilon \cdot n}\).

Hypothesis 2

(NETH; see Reference [7])

Non-Deterministic Exponential Time Hypothesis (\(\mathbf {NETH}\)): There exists \(\epsilon \gt 0\) and \(c\gt 1\) such that \(co\text{-}3\text{-}\mathtt {SAT}\) on n variables and with \(c\cdot n\) clauses cannot be solved by non-deterministic algorithms that run in time \(2^{\epsilon \cdot n}\).

We also extend the two foregoing hypotheses to stronger versions in which every algorithm (probabilistic or non-deterministic, respectively) fails to compute the corresponding “hard” function on all but finitely many input lengths. These stronger hypotheses are denoted a.a.-rETH and a.a.-NETH, respectively.

3.3 Worst-case Derandomization and Pseudorandom Generators

We now formally define the circuit acceptance probability problem (or \(\mathsf {CAPP}\), in short); this well-known problem is also sometimes called Circuit Derandomization, Approx Circuit Average, and GAP-SAT or GAP-UNSAT.

Definition 3.1

(CAPP)

The circuit acceptance probability problem with parameters \(\alpha ,\beta \in [0,1]\) such that \(\alpha \gt \beta\) and for size \(S:\mathbb {N}\rightarrow \mathbb {N}\) (or \((\alpha ,\beta)\text{-}\mathsf {CAPP}[S]\), in short) is the following promise problem:

  • The \(YES\) instances are (representations of) circuits over v input bits of size at most \(S(v)\) that accept at least an \(\alpha\) fraction of their inputs.

  • The \(NO\) instances are (representations of) circuits over v input bits of size at most \(S(v)\) that accept at most a \(\beta\) fraction of their inputs.

We define the \(\mathsf {CAPP}[S]\) problem (i.e., omitting \(\alpha\) and \(\beta\)) as the \((2/3,1/3)\text{-}\mathsf {CAPP}[S]\) problem. We define \(\mathsf {CAPP}\) to be the problem when there is no restriction on S.

It is well-known that \(\mathsf {CAPP}\) is complete for \(pr\mathcal {BPP}\) under deterministic polynomial-time reductions; in particular, \(\mathsf {CAPP}\) can be solved in deterministic polynomial time if and only if \(pr\mathcal {BPP}=pr\mathcal {P}\). (For a proof, see, e.g., [58, Corollary 2.31], [19, Exer. 6.14].)
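For intuition regarding the easy direction of this completeness, note that \(\mathsf {CAPP}\) is itself in \(pr\mathcal {BPP}\), by random sampling. The following is a minimal sketch, where \(\mathtt {circuit}\) is any predicate on v-bit inputs (a hypothetical stand-in for an evaluator of the given circuit):

```python
import random

def capp_by_sampling(circuit, v, trials=1000):
    """Distinguish acceptance probability >= 2/3 from <= 1/3 by sampling
    uniformly random inputs; a standard Chernoff bound makes the error
    probability exponentially small in `trials`."""
    hits = sum(1 for _ in range(trials)
               if circuit(random.getrandbits(v)))
    return hits * 2 > trials  # compare the empirical average to 1/2
```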

We will need the following well-known construction of a pseudorandom generator from a function that is “hard” for non-uniform circuits, by Umans [57] (following the line of works initiated by Nisan and Wigderson [44]).

Theorem 3.2 (Umans’ PRG; see Reference [57, Theorem 6]).

There exists a constant \(c\gt 1\) and an algorithm G such that the following holds: When G is given an n-bit truth-table of a function \(f:\lbrace 0,1\rbrace ^{\log (n)}\rightarrow \lbrace 0,1\rbrace\) that cannot be computed by circuits of size s, and a random seed of length \(\ell (n)=c\cdot \log (n)\), it runs in time \(n^c\), and for \(m=s^{1/c}\) outputs an m-bit string that is \((1/m)\)-pseudorandom for every size-m circuit over m bits.

Corollary 3.3 (Near-optimal Non-uniform Hardness-to-randomness using Umans’ PRG).

There exists a universal constant \({\Delta }\gt 1\) such that for every time-computable \(S:\mathbb {N}\rightarrow \mathbb {N}\) and for \(T(n)=2^{{\Delta }\cdot S^{-1}(n^{\Delta })}\), we have that

(1)

If \(\mathcal {E}\not\subset \mathcal {SIZE}[S]\), then \(\mathsf {CAPP}\in \mathtt {i.o.}pr\mathcal {DTIME}[T]\).

(2)

If \(\mathcal {E}\not\subset \mathtt {i.o.}\mathcal {SIZE}[S]\), then \(\mathsf {CAPP}\in pr\mathcal {DTIME}[T]\).
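To illustrate the mechanism behind Corollary 3.3, the following is a minimal sketch of the seed-enumeration that solves \(\mathsf {CAPP}\) deterministically. Here \(\mathtt {prg}\) stands in for the generator G of Theorem 3.2 and \(\mathtt {truth\_table}\) for the truth-table of a hard function in \(\mathcal {E}\) at a suitable input length; both are hypothetical parameters, not implementations from this article.

```python
def capp_by_enumeration(circuit, prg, truth_table, seed_len):
    """Decide (2/3,1/3)-CAPP by averaging the circuit over all PRG
    outputs; correct whenever the PRG's outputs (1/m)-fool the circuit.
    The running time is dominated by enumerating 2**seed_len seeds,
    which yields the time bound T of Corollary 3.3."""
    accepted = sum(1 for seed in range(2 ** seed_len)
                   if circuit(prg(truth_table, seed)))
    return accepted * 2 > 2 ** seed_len  # empirical average vs. 1/2
```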

In addition, we will need a suitable construction of an averaging sampler. Recall the standard definition of averaging samplers:

Definition 3.4

(Averaging Sampler).

A function \(Samp:\lbrace 0,1\rbrace ^{m^{\prime }}\rightarrow (\lbrace 0,1\rbrace ^m)^{D}\) is an averaging sampler with accuracy \(\epsilon\) and confidence \(\delta\) (or \((\epsilon ,\delta)\)-averaging sampler, in short) if for every \(T\subseteq \lbrace 0,1\rbrace ^m\), the probability over choice of \(x\in \lbrace 0,1\rbrace ^{m^{\prime }}\) that \(\Pr _{i\in [D]}[Samp(x)_i\in T]\notin |T|/2^m\pm \epsilon\) is at most \(\delta\).
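As a sanity check on this definition, the following brute-force test (feasible only for toy parameters) checks the accuracy/confidence condition with respect to a single test set T; the candidate \(\mathtt {samp}\), mapping \(m^{\prime }\)-bit strings (as bit-tuples) to D-tuples of m-bit strings, is a hypothetical stand-in.

```python
from itertools import product

def violates_sampler_property(samp, m_prime, m, eps, delta, T):
    """Return True if `samp` fails the (eps, delta)-averaging-sampler
    condition for the particular set T. Definition 3.4 quantifies over
    all T, so this checks one necessary condition."""
    density = len(T) / 2 ** m
    bad = 0
    for x in product((0, 1), repeat=m_prime):
        outputs = samp(x)  # a tuple of D strings, each in {0,1}^m
        empirical = sum(1 for y in outputs if y in T) / len(outputs)
        if abs(empirical - density) > eps:
            bad += 1  # Samp(x) mis-estimates the density of T
    return bad / 2 ** m_prime > delta
```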

We will specifically use the following well-known construction by Guruswami, Umans, and Vadhan [25]. (The construction in [25] is of an extractor, rather than of an averaging sampler, but the two are well-known to be essentially equivalent; see, e.g., [19, Section D.4.1.2] or [58, Corollary 6.24].)

Theorem 3.5 (The Near-optimal Extractor of Reference [25], Instantiated as a Sampler and for Specific Parameters).

Let \(\gamma \ge 1\) and \(\beta \gt \alpha \gt 0\) be constants. Then, there exists a polynomial-time algorithm that for every m computes an \((m^{-\gamma },2^{-(\beta -\alpha)\cdot m})\)-averaging sampler \(Samp:\lbrace 0,1\rbrace ^{m^{\prime }}\rightarrow (\lbrace 0,1\rbrace ^m)^D\), where \(m^{\prime }=(1+\beta)\cdot m\) and \(D=\mathrm{poly}(m)\).

3.4 Average-case Derandomization and Pseudorandom Generators

We now define the notions of “average-case” derandomization of probabilistic algorithms. The first definitions that we need are of circuits that distinguish a distribution from uniform and of distributions that are pseudorandom for uniform algorithms. Towards this purpose, we consider a generator G that gets input \(1^n\), a random seed of length \(\ell (n)\), and a stretch parameter \(\mathtt {str}(n)\) and outputs \(\mathtt {str}(n)\) pseudorandom bits.

Definition 3.6

(Distinguishing Distributions from Uniform).

For two functions \(\mathtt {str},\ell :\mathbb {N}\rightarrow \mathbb {N}\), let G be an algorithm that gets input \(1^n\) and a random seed of length \(\ell (n)\) and outputs a string of length \(\mathtt {str}(n)\). Then:

(1)

For \(n\in \mathbb {N}\) and \(n^{\prime }\in \mathtt {str}^{-1}(n)\), we say that \(D_n:\lbrace 0,1\rbrace ^n\rightarrow \lbrace 0,1\rbrace\) \(\epsilon\)-distinguishes \(G(1^{n^{\prime }},{\bf u}_{\ell (n^{\prime })})\) from uniform if \(\vert \Pr [D_n(G(1^{n^{\prime }},{\bf u}_{\ell (n^{\prime })}))=1]-\Pr [D_n({\bf u}_n)=1]\vert \gt \epsilon\).

(2)

For a probabilistic algorithm A, an integer n, and \(\epsilon \gt 0\), we say that \(G(1^n,{\bf u}_{\ell (n)})\) is \(\epsilon\)-pseudorandom for A if the probability that \(A(1^{\mathtt {str}(n)})\) outputs a circuit that \(\epsilon\)-distinguishes \(G(1^n,{\bf u}_{\ell (n)})\) from uniform is at most \(\epsilon\).

When applying this definition without specifying a function \(\mathtt {str}\), we assume that \(\mathtt {str}\) is the identity function.

We now use Definition 3.6 to define pseudorandom generators for uniform circuits and hitting-set generators for uniform circuits, which are analogous to the standard definitions of PRGs and HSGs for non-uniform circuits:

Definition 3.7

(PRGs for Uniform Circuits).

For \(\ell :\mathbb {N}\rightarrow \mathbb {N}\), let G be an algorithm that gets as input \(1^n\) and a random seed of length \(\ell (n)\) and outputs strings of length n. For \(t,a:\mathbb {N}\rightarrow \mathbb {N}\) and \(\epsilon :\mathbb {N}\rightarrow (0,1)\), we say that G is an \(\epsilon\)-i.o.-PRG for \((t,a)\)-uniform circuits if for every probabilistic algorithm A that runs in time \(t(n)\) and gets \(a(n)\) bits of non-uniform advice there exists an infinite set \(S_A\subseteq \mathbb {N}\) such that for every \(n\in S_A\) it holds that \(G(1^n,{\bf u}_{\ell (n)})\) is \(\epsilon (n)\)-pseudorandom for A. If for every such algorithm A there is a set \(S_A\) as above that contains all but finitely-many inputs, then we say that G is an \(\epsilon\)-PRG for \((t,a)\)-uniform circuits.

Definition 3.8

(HSGs for Uniform Circuits).

For \(\ell :\mathbb {N}\rightarrow \mathbb {N}\), let H be an algorithm that gets as input \(1^n\) and a random seed of length \(\ell (n)\) and outputs strings of length n. For \(t,a:\mathbb {N}\rightarrow \mathbb {N}\) and \(\epsilon :\mathbb {N}\rightarrow (0,1)\), we say that H is an \(\epsilon\)-HSG for \((t,a)\)-uniform circuits if the following holds: For every probabilistic algorithm A that gets input \(1^n\) and \(a(n)\) bits of non-uniform advice, runs in time \(t(n)\), and outputs a circuit \(D_n:\lbrace 0,1\rbrace ^n\rightarrow \lbrace 0,1\rbrace\), and every sufficiently large \(n\in \mathbb {N}\), with probability at least \(1-\epsilon (n)\) (over the coin tosses of A) at least one of the following two cases holds:

(1)

There exists \(s\in \lbrace 0,1\rbrace ^{\ell (n)}\) such that \(D_n(H(1^n,s))=1\).

(2)

The circuit \(D_n\) satisfies \(\Pr _{x\in \lbrace 0,1\rbrace ^n}[D_n(x)=1]\le \epsilon (n)\).

As mentioned in Section 1, PRGs for uniform circuits can be used to derandomize \(\mathcal {BPP}\) “on average” (see, e.g., [20, Proposition 4.4]). Analogously, HSGs for uniform circuits can be used to derandomize \(\mathcal {RP}\) “on average.” That is, loosely speaking, if there exists an HSG for uniform circuits, then for any \(L\in \mathcal {RP}\) there exists a deterministic algorithm D such that for every efficiently samplable distribution \(\mathcal {X}\), the probability over \(x\sim \mathcal {X}\) that \(D(x)\ne L(x)\) is small. For simplicity, we prove the foregoing claim for HSGs that are computable in polynomial time and have logarithmic seed length:

Claim 3.9

(HSGs for Uniform Circuits → Derandomization of RP “On Average”)

For \(\epsilon :\mathbb {N}\rightarrow (0,1)\) such that \(\epsilon (n)\le 1/3\), assume that for every \(k\in \mathbb {N}\) there exists an \(\epsilon\)-HSG for \((n^k,0)\)-uniform circuits that is polynomial-time computable and has logarithmic seed length. Then, for every \(L\in \mathcal {RP}\) and every \(c\in \mathbb {N}\), there exists a deterministic polynomial-time algorithm D such that for every probabilistic algorithm F that runs in time \(n^c\) and every sufficiently large \(n\in \mathbb {N}\), the probability (over the internal coin tosses of F) that \(F(1^n)\) outputs a string \(x\in \lbrace 0,1\rbrace ^n\) such that \(D(x)\ne L(x)\) is at most \(\epsilon (n)\).

Proof.

Let M be an \(\mathcal {RP}\) machine that decides L in time \(n^{c^{\prime }}\) for some \(c^{\prime }\in \mathbb {N}\). The deterministic algorithm D gets input \(x\in \lbrace 0,1\rbrace ^n\), enumerates the seeds of the HSG for output length \(m=n^{c^{\prime }}\) and with the parameter \(k=O(1+c/c^{\prime })\), and accepts x if and only if there exists an output r of the HSG such that M accepts x with random coins r. Note that D never accepts inputs \(x\notin L\) (since M is an \(\mathcal {RP}\) machine), and thus, we only have to prove that for every algorithm F as in the claim’s statement, the probability that \(x=F(1^n)\) satisfies both \(x\in L\) and \(D(x)=0\) is at most \(\epsilon (n)\).

To do so, let F be a probabilistic algorithm that runs in time \(n^c\). Consider the probabilistic algorithm A that, on input \(1^{m}\), runs the algorithm F on input \(1^{n}\) to obtain \(x\in \lbrace 0,1\rbrace ^{n}\) and outputs a circuit \(C_{m,x}:\lbrace 0,1\rbrace ^{m}\rightarrow \lbrace 0,1\rbrace\) that computes the decision of M at input x as a function of M’s \(m=n^{c^{\prime }}\) random coins. Note that the algorithm A runs in time at most \(m^{O(1+c/c^{\prime })}\), and also note that the only probabilistic choice that A makes is the choice of \(x=F(1^n)\). Thus, by Definition 3.8, for every sufficiently large m, with probability at least \(1-\epsilon (m)\gt 1-\epsilon (n)\) over the choice of \(x=F(1^n)\) (i.e., over the coin tosses of A), if \(D(x)=0\) then \(\Pr _r[C_{m,x}(r)=1]=\Pr [M(x)=1]\le \epsilon (n)\le 1/3\), which means that \(x\notin L\).□
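The following is a minimal sketch of the algorithm D from the proof above. The parameters \(\mathtt {rp\_machine}\) (the \(\mathcal {RP}\) machine M, taking an input and explicit coins) and \(\mathtt {hsg}\) (a polynomial-time computable HSG with logarithmic seed length) are hypothetical stand-ins for the objects of the claim.

```python
def derandomize_rp(x, rp_machine, hsg, seed_len, coins_len):
    """The algorithm D from Claim 3.9: accept x iff some HSG output,
    used as M's random coins, makes M accept. Since M has one-sided
    error, D never accepts inputs outside L."""
    for seed in range(2 ** seed_len):
        coins = hsg(coins_len, seed)  # candidate coins of length m
        if rp_machine(x, coins):
            return True
    return False
```

Since the seed length is logarithmic, the enumeration ranges over polynomially many seeds, so D runs in deterministic polynomial time.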

3.5 An ℰ-complete Problem with Useful Properties

Our proofs in Section 5 will rely on the well-known existence of an \(\mathcal {E}\)-complete problem \(L^{\mathtt {nice}}\) with the following useful properties: The problem \(L^{\mathtt {nice}}\) is randomly self-reducible and has an instance checker with linear-length queries, such that both the instance checker and the random self-reducibility algorithm use a linear number of random bits. Let us properly define these notions:

Definition 3.10

(Instance Checkers).

A probabilistic polynomial-time oracle machine \(\mathtt {IC}\) is an instance checker for a set \(L\subseteq \lbrace 0,1\rbrace ^*\) if for every \(x\in \lbrace 0,1\rbrace ^*\) the following holds:

(1)

(Completeness.) \(\mathtt {IC}^L(x)=L(x)\), with probability one.

(2)

(Soundness.) For every \(L^{\prime }\subseteq \lbrace 0,1\rbrace ^*\), we have that \(\Pr [\mathtt {IC}^{L^{\prime }}(x)\notin \lbrace L(x),\perp \rbrace ]\le 1/6\).23

For \(\ell :\mathbb {N}\rightarrow \mathbb {N}\), if for every \(x\in \lbrace 0,1\rbrace ^*\), all the oracle queries of \(\mathtt {IC}\) on input x are of length \(\ell (|x|)\), then we say that \(\mathtt {IC}\) has queries of length \(\ell\). We will also measure the maximal number of queries that \(\mathtt {IC}\) makes on inputs of any given length.

Definition 3.11

(Random Self-reducible Function).

We say that \(f:\lbrace 0,1\rbrace ^{*}\rightarrow \lbrace 0,1\rbrace ^{*}\) is randomly self-reducible if there exists a probabilistic oracle machine \(\mathtt {Dec}\) that gets input \(x\in \lbrace 0,1\rbrace ^n\) and access to an oracle \(g:\lbrace 0,1\rbrace ^{n}\rightarrow \lbrace 0,1\rbrace ^{*}\), runs in time \(\mathrm{poly}(n)\), makes oracle queries such that each query is uniformly distributed in \(\lbrace 0,1\rbrace ^{n}\), and if for every oracle query \(q\in \lbrace 0,1\rbrace ^n\) it holds that \(g(q)=f(q)\), then \(\mathtt {Dec}^g(x)=f(x)\).
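As a classical illustration (a standard fact that we recall for intuition, not a construction from this article), every polynomial \(f:\mathbb {F}^m\rightarrow \mathbb {F}\) of total degree \(d\lt |\mathbb {F}|-1\) is randomly self-reducible: on input \(x\in \mathbb {F}^m\), the machine \(\mathtt {Dec}\) picks a uniformly random \(y\in \mathbb {F}^m\) and queries its oracle at the points \(x+t\cdot y\) for \(t=1,\ldots ,d+1\), each of which is individually uniformly distributed in \(\mathbb {F}^m\); it then interpolates the univariate polynomial \(\begin{align*} p(t)=f(x+t\cdot y), \quad \deg (p)\le d, \end{align*}\) from the \(d+1\) answers and outputs \(p(0)=f(x)\).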

At a high level, the problem \(L^{\mathtt {nice}}\) is the low-degree extension of an (arbitrary) \(\mathcal {E}\)-complete problem. The intuition is that, since \(L^{\mathtt {nice}}\) is a low-degree extension, it is randomly self-reducible, and, since \(L^{\mathtt {nice}}\) is \(\mathcal {E}\)-complete, we can construct an instance checker for it. (Specifically, the instance checker for \(L^{\mathtt {nice}}\) simulates a PCP verifier for \(L^{\mathtt {nice}}\), and the problem of answering the verifier’s queries reduces to \(L^{\mathtt {nice}}\), so the verifier’s queries can be answered using an oracle to \(L^{\mathtt {nice}}\).) For details and a full proof, see Appendix C.

Proposition 3.12

(An ℰ-complete Problem that is Randomly Self-reducible and has a Good Instance Checker)

There exists \(L^{\mathtt {nice}}\in \mathcal {DTIME}[\tilde{O}(2^n)]\) such that:

(1)

Any \(L\in \mathcal {DTIME}[2^n]\) reduces to \(L^{\mathtt {nice}}\) in polynomial time with a constant multiplicative blow-up in the input length; specifically, for every n there exists \(n^{\prime }=O(n)\) such that any n-bit input for L is mapped to an \(n^{\prime }\)-bit input for \(L^{\mathtt {nice}}\).

(2)

The problem \(L^{\mathtt {nice}}\) is randomly self-reducible by an algorithm \(\mathtt {Dec}\) that on inputs of length n uses \(n+\mathrm{polylog}(n)\) random bits.

(3)

There is an instance checker \(\mathtt {IC}\) for \(L^{\mathtt {nice}}\) that on inputs of length n uses \(n+O(\log (n))\) random bits and makes \(O(1)\) queries of length \(\ell (n)=O(n)\).


4 RETH AND NEAR-OPTIMAL UNIFORM HARDNESS-TO-RANDOMNESS

In this section, we prove Theorems 1.1 and 1.2. First, in Section 4.1, we define and construct well-structured functions, which are the key technical component in our proof of Theorem 1.1. Then, in Section 4.2, we show how well-structured functions can be used in the proof framework of [33] (with minor variations) to construct a PRG that “fools” uniform circuits, assuming that the well-structured function cannot be computed by efficient probabilistic algorithms. Finally, in Section 4.3, we prove Theorems 1.1 and 1.2.

4.1 Construction of a Well-structured Function

In Section 4.1.1, we present the required properties of well-structured functions and define such functions. Then, in Section 4.1.2, we present a high-level overview of our construction of such functions. Finally, in Section 4.1.3, we present the construction itself in detail.

4.1.1 Well-structured Function: Definition.

Loosely speaking, we will say that a function \(f:\lbrace 0,1\rbrace ^*\rightarrow \lbrace 0,1\rbrace ^*\) is well-structured if it satisfies three properties. The first property, which is not crucial for our proofs but simplifies them a bit, is that f is length-preserving; that is, for every \(x\in \lbrace 0,1\rbrace ^*\) it holds that \(|f(x)|=|x|\).

The second property is a strengthening of the notion of downwards self-reducibility. Recall that a function \(f:\lbrace 0,1\rbrace ^*\rightarrow \lbrace 0,1\rbrace ^*\) is downwards self-reducible if \(f_n\) can be computed by an efficient algorithm that has oracle access to \(f_{n-1}\). First, we quantify the notion of “efficient,” to also allow for a very large running time (e.g., running time \(2^{n/\mathrm{polylog}(n)}\)). Second, we also require that for any \(n\in \mathbb {N}\) there exists an input length m that is not much smaller than n such that \(f_m\) is efficiently computable without any “downward” oracle. That is, intuitively, if we try to compute f on input length n by “iterating downwards” using downward self-reducibility, our “base case” in which the function is efficiently computable is not input length \(O(1)\), but a large input length m that is not much smaller than n. More formally:

Definition 4.1

(Downward Self-reducibility in Few Steps).

For \(t,s:\mathbb {N}\rightarrow \mathbb {N}\), we say that a function \(f:\lbrace 0,1\rbrace ^*\rightarrow \lbrace 0,1\rbrace ^*\) is downward self-reducible in time t and s steps if there exists a probabilistic oracle machine A that for any sufficiently large \(n\in \mathbb {N}\) satisfies the following:

(1)

When A is given input \(x\in \lbrace 0,1\rbrace ^n\) and oracle access to \(f_{n-1}\), it runs in time at most \(t(n)\) and satisfies \(\Pr _r[A^{f_{n-1}}(x,r)=f(x)]\ge 2/3\).

(2)

There exists an input length \(m\in [n-s(n),n]\) such that A computes \(f_m\) in time \(t(m)\) without using randomness or oracle queries.

In the special case that \(s(n)=n\), we simply say that f is downward self-reducible in time t.
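For intuition, the following is a minimal sketch of how Definition 4.1 is typically unwound, assuming hypothetical callables \(\mathtt {dsr\_step}\) (the machine A with its oracle supplied explicitly; error amplification is ignored here) and \(\mathtt {base\_eval}\) (A's oracle-free mode at the base length m). The naive recursion makes the structure explicit but compounds the query cost across levels; the arguments in this article avoid this blow-up by learning a circuit for each level instead.

```python
def eval_by_downward_self_reduction(x, n, m, dsr_step, base_eval):
    """Compute f on an n-bit input by recursing from length n down to
    the base length m (at most s(n) downward steps), where f is
    computable without an oracle."""
    def f_at(length):
        if length <= m:
            return base_eval  # the oracle-free base case
        lower = f_at(length - 1)
        return lambda z: dsr_step(z, lower)  # one downward step via A
    return f_at(n)(x)
```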

The third property that we need is a refinement of the notion of random self-reducibility, which is called sample-aided worst-case to average-case reducibility. This notion was recently made explicit by Goldreich and G. Rothblum [23] and is implicit in many previous results (see, e.g., the references in [23]).

To explain the notion, recall that if a function f is randomly self-reducible, then a circuit \(\widetilde{C}\) that computes f on most of the inputs can be efficiently transformed to a (probabilistic) circuit C that computes f on every input (whp). We want to relax this notion by allowing the efficient algorithm that transforms \(\widetilde{C}\) into C to obtain random labeled samples for f (i.e., inputs of the form \((r,f(r))\) where r is chosen uniformly at random). The main advantage in this relaxation is that we will not need to assume that \(\widetilde{C}\) computes f on most of the inputs but will be satisfied with the weaker assumption that \(\widetilde{C}\) computes f on a tiny fraction of the inputs. Specifically24:

Definition 4.2

(Sample-aided Reductions; see [23, Definition 4.1]).

Let \(f:\lbrace 0,1\rbrace ^*\rightarrow \lbrace 0,1\rbrace ^*\) be a length-preserving function, and let \(s:\mathbb {N}\rightarrow \mathbb {N}\) and \(\delta _0:\mathbb {N}\rightarrow [0,1)\). Let M be a probabilistic oracle machine that gets input \(1^n\) and a sequence of \(s(n)\) pairs of the form \((r,v)\in \lbrace 0,1\rbrace ^n\times \lbrace 0,1\rbrace ^n\) and oracle access to a function \(\tilde{f_n}:\lbrace 0,1\rbrace ^n\rightarrow \lbrace 0,1\rbrace ^n\), and outputs a circuit \(C:\lbrace 0,1\rbrace ^n\rightarrow \lbrace 0,1\rbrace ^n\) with oracle gates. We say that M is a sample-aided reduction of computing f in the worst-case to computing f on \(\delta _0\) of the inputs using a sample of size s if for every \(\tilde{f_n}:\lbrace 0,1\rbrace ^n\rightarrow \lbrace 0,1\rbrace ^n\) satisfying \(\Pr _{x\in \lbrace 0,1\rbrace ^n}[\tilde{f_n}(x)=f_n(x)]\ge \delta _0(n)\) the following holds: With probability at least \(1-\delta _0(n)\) over the choice of \(\bar{r}=r_1,\ldots ,r_{s(n)}\in \lbrace 0,1\rbrace ^n\) and over the internal coin tosses of M, we have that \(M^{\tilde{f_n}}(1^n,(r_i,f_n(r_i))_{i\in [s(n)]})\) outputs a circuit C such that \(\Pr [C^{\tilde{f_n}}(x)=f_n(x)]\ge 2/3\) for every \(x\in \lbrace 0,1\rbrace ^n\) (the probability bound of \(2/3\) is over the internal randomness of C).

Definition 4.3

(Sample-aided Worst-case to Average-case Reducibility).

For \(\delta _0:\mathbb {N}\rightarrow (0,1)\), we say that a function \(f:\lbrace 0,1\rbrace ^*\rightarrow \lbrace 0,1\rbrace ^*\) is sample-aided worst-case to \(\delta _0\)-average-case reducible if there exists a sample-aided reduction M of computing f in the worst-case to computing f on \(\delta _0\) of the inputs such that M runs in time \(\mathrm{poly}(n,1/\delta _0(n))\) and uses \(\mathrm{poly}(1/\delta _0(n))\) samples.

For high-level intuition of why labeled samples can be helpful for worst-case to average-case reductions, and for a proof that if f is a low-degree multivariate polynomial then it is sample-aided worst-case to average-case reducible, see Appendix B.

We are now ready to define well-structured functions. Fixing a parameter \(\delta \gt 0\), a function \(f^{\mathtt {ws}}\) is \(\delta\)-well-structured if it is length-preserving, downward self-reducible in time \(\mathrm{poly}(1/\delta)\), and sample-aided worst-case to \(\delta\)-average-case reducible. That is:

Definition 4.4

(Well-structured Function).

For \(\delta :\mathbb {N}\rightarrow (0,1)\) and \(s:\mathbb {N}\rightarrow \mathbb {N}\), we say that a function \(f^{\mathtt {ws}}:\lbrace 0,1\rbrace ^*\rightarrow \lbrace 0,1\rbrace ^*\) is \((\delta ,s)\)-well-structured if \(f^{\mathtt {ws}}\) is length-preserving, downward self-reducible in time \(\mathrm{poly}(1/\delta)\) and s steps, and sample-aided worst-case to \(\delta\)-average-case reducible. Also, when \(s(n)=n\) (i.e., \(f^{\mathtt {ws}}\) is simply downward self-reducible in time \(\mathrm{poly}(1/\delta)\)), we say that \(f^{\mathtt {ws}}\) is \(\delta\)-well-structured.

In the following definition, we consider reductions from a decision problem \(L\subseteq \lbrace 0,1\rbrace ^*\) to a well-structured function \(f^{\mathtt {ws}}:\lbrace 0,1\rbrace ^*\rightarrow \lbrace 0,1\rbrace ^*\). To formalize this, we consider both a reduction R, which transforms any input x for L to an input \(R(x)\) for \(f^{\mathtt {ws}}\), and a “decision algorithm” D, which translates the non-Boolean result \(f^{\mathtt {ws}}(R(x))\) into a decision of whether or not \(x\in L\).

Definition 4.5

(Reductions to Multi-output Functions).

Let \(L\subseteq \lbrace 0,1\rbrace ^*\) and \(f:\lbrace 0,1\rbrace ^*\rightarrow \lbrace 0,1\rbrace ^*\). For \(t,b:\mathbb {N}\rightarrow \mathbb {N}\), we say that L reduces to f in time t with blow-up b if there exist two deterministic time-t algorithms R and D such that for every \(x\in \lbrace 0,1\rbrace ^*\) it holds that \(|R(x)|\le b(|x|)\) and that \(x\in L\) if and only if \(D(f(R(x)))=1\).

4.1.2 Overview of Our Construction.

For \(\delta =2^{-n/\mathrm{polylog}(n)}\) and \(s=\mathrm{polylog}(n)\), our goal is to construct a \((\delta ,s)\)-well-structured function \(f^{\mathtt {ws}}:\lbrace 0,1\rbrace ^*\rightarrow \lbrace 0,1\rbrace ^*\) such that \(\mathtt {TQBF}\) reduces to \(f^{\mathtt {ws}}\) in quasilinear time (and thus with quasilinear blow-up). Throughout the section, an n-bit input to \(\mathtt {TQBF}\) is simply a \(3\text{-}\mathtt {SAT}\) formula \(\varphi\) on n variables, where it is assumed that all variables are quantified in order with alternating quantifiers (e.g., \(\exists w_1 \forall w_2 \exists w_3 ... \varphi (w_1,\ldots ,w_n)\); see Definition 4.6).

Our starting point is the well-known construction of Trevisan and Vadhan [56], which (loosely speaking) transforms the protocol underlying the \(\mathcal {IP}=\mathcal {PSPACE}\) proof into a computational problem \(L_{TV}:\lbrace 0,1\rbrace ^*\rightarrow \lbrace 0,1\rbrace ^*\).25 They required \(L_{TV}\) to meet the weaker requirements (compared to our requirements) of being downward self-reducible and randomly self-reducible, where the latter means that computing \(L_{TV}\) in the worst-case reduces to computing it on, say, .99 of the inputs.

Before describing our new construction, let us first review the original construction of \(L_{TV}\). For every \(n\in \mathbb {N}\), fix a corresponding interval \(I_n=[N_0,N_1]\) of \(r(n)=\mathrm{poly}(n)\) input lengths. The input to \(L_{TV}\) at any input length in \(I_n\) (disregarding necessary padding) is a pair \((\varphi ,w)\in \mathbb {F}^{2n}\), where \(\mathbb {F}\) is a sufficiently large field. (The field size is chosen such that both P and related polynomials that are described below will be of low degree.) If \((\varphi ,w)\in \lbrace 0,1\rbrace ^{2n}\), then we think of \(\varphi\) as representing a \(3\text{-}\mathtt {SAT}\) formula and of w as representing an assignment. At input length \(N_0\), we define \(L_{TV}(\varphi ,w)=P(\varphi ,w)\), where \(P(\varphi ,w)\) is a low-degree arithmetized version of the Boolean function \((\varphi ,w)\mapsto \varphi (w)\).

Now, recall that the \(\mathcal {IP}=\mathcal {PSPACE}\) protocol defines three arithmetic operators on polynomials (two quantification operators and a linearization operator). Then, at input length \(N_0+i\), the problem \(L_{TV}\) is recursively defined by applying one of the three arithmetic operators on the polynomial from the previous input length \(N_0+i-1\).26 Observe that computing \(L_{TV}\) at input length \(N_0+i\) corresponds to the residual computational problem that the verifier faces at the \((r-i)^{th}\) round of the \(\mathcal {IP}=\mathcal {PSPACE}\) protocol when instantiated for formula \(\varphi\) and with \(r=r(n)\) rounds. Indeed, at the largest input length \(N_1=N_0+r(n)\) the polynomial \(L_{TV}\) is simply a low-degree arithmetized version of the function that decides whether or not \(\varphi \in \mathtt {TQBF}\) (regardless of w); thus, \(\mathtt {TQBF}\) can be reduced to \(L_{TV}\) by mapping \(\varphi \in \lbrace 0,1\rbrace ^n\) to \((\varphi ,1^n)\in \mathbb {F}^{2n}\) and adding padding to get the input to be of length \(N_1=\mathrm{poly}(n)\). Note that \(L_{TV}\) is indeed both downward self-reducible (since for each operator O and polynomial P, we can compute \(O(P)(\varphi ,w)\) in polynomial time with two oracle queries to P) and randomly self-reducible (since the polynomials have low degree).

Let us now define our \(f^{\mathtt {ws}}:\lbrace 0,1\rbrace ^*\rightarrow \lbrace 0,1\rbrace ^*\), which replaces their \(L_{TV}\), and highlight what is different in our setting. Recall that our main goal is to construct the well-structured function \(f^{\mathtt {ws}}\) such that \(\mathtt {TQBF}\) is reducible to \(f^{\mathtt {ws}}\) with only quasilinear overhead in the input length (i.e., we need to avoid polynomial overheads), while keeping the running time of all operations (i.e., of the algorithms for downward self-reducibility and for sample-aided worst-case to rare-case reducibility) to be at most \(2^{n/\mathrm{polylog}(n)}\).

The first issue, which is relatively easy to handle, is the number of bits that we use to represent an (arithmetized) input \((\varphi ,w)\) for \(f^{\mathtt {ws}}\). Recall that we want \(f^{\mathtt {ws}}\) to be worst-case to \(\delta\)-average-case reducible for a tiny \(\delta =2^{-n/\mathrm{polylog}(n)}\); thus, \(f^{\mathtt {ws}}\) will involve computing polynomials over a field of large size \(|\mathbb {F}|\ge \mathrm{poly}(1/\delta)\). Using the approach of [56], we would need \(2n\cdot \log (|\mathbb {F}|)=\tilde{\Omega }(n^2)\) bits to represent \((\varphi ,w)\), and thus the reduction from \(\mathtt {TQBF}\) to \(f^{\mathtt {ws}}\) would incur a polynomial overhead. This is easily solvable by considering a “low-degree extension” instead of their “multilinear extension”: To represent an input \((\varphi ,w)\in \lbrace 0,1\rbrace ^{2n}\) to \(f^{\mathtt {ws}}\), we will use few elements in a very large field. Specifically, we will use \(\ell =\mathrm{polylog}(n)\) variables (i.e., the polynomial will be \(\mathbb {F}^{2\ell }\rightarrow \mathbb {F}\)) such that each variable “provides” \(O(n/\mathrm{polylog}(n))\) bits of information.
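For a concrete count (using the specific choices made in the proof of Lemma 4.7, where \(\ell =\mathrm{polylog}(n)\) and \(\log |\mathbb {F}|=O(n/\ell)\)): representing an input as a pair in \(\mathbb {F}^{\ell }\times \mathbb {F}^{\ell }\) takes \(\begin{align*} 2\ell \cdot \log |\mathbb {F}| = \mathrm{polylog}(n)\cdot O(n/\mathrm{polylog}(n)) = O(n) \end{align*}\) bits, in contrast to the \(\tilde{\Omega }(n^2)\) bits required by the multilinear approach.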

A second problem is constructing a low-degree arithmetization \(P(\varphi ,w)\) of the Boolean function that evaluates \(\varphi\) at w. In [56], they solve this by first reducing \(\mathtt {TQBF}\) to an intermediate problem \(\mathtt {TQBF}^{\prime }\) that is amenable to such low-degree arithmetization; however, their reduction incurs a quadratic blow-up in the input length, which we cannot afford in our setting. To overcome this, we reduce \(\mathtt {TQBF}\) to another intermediate problem, denoted \(\mathtt {TQBF^{loc}}\), which is amenable to low-degree arithmetization, such that the reduction incurs only a quasilinear blow-up in the input length. (Loosely speaking, we define \(\mathtt {TQBF^{loc}}\) by applying a very efficient Cook-Levin reduction to the Turing machine that gets input \((\varphi ,w)\) and outputs \(\varphi (w)\); see Claim 4.7.1 for precise details.) We then carefully arithmetize \(\mathtt {TQBF^{loc}}\), while “paying” for this efficient arithmetization by the fact that computing the corresponding polynomial now takes time \(\exp (n/\ell)=\mathrm{poly}(1/\delta)\), instead of \(\mathrm{poly}(n)\) time as in [56] (see Claim 4.7.2).

Third, the number of polynomials in the construction of \(L_{TV}\) (i.e., the size of the interval \(I_n\)) is \(r(n)=\mathrm{poly}(n)\), corresponding to the number of rounds in the \(\mathcal {IP}=\mathcal {PSPACE}\) protocol. This poses a problem for us, since the reduction from \(\mathtt {TQBF}\) maps an input of length n to an input of length \(N_1\ge \mathrm{poly}(n)\). We solve this problem by “shrinking” the number of polynomials to be polylogarithmic, using an approach similar to an \(\mathcal {IP}=\mathcal {PSPACE}\) protocol with only \(\mathrm{polylog}(n)\) rounds and a verifier that runs in time \(2^{n/\mathrm{polylog}(n)}\): Intuitively, at each input length, we define \(f^{\mathtt {ws}}\) by simultaneously applying \(O(\log (1/\delta))\) operators (rather than a single operator) to the polynomial that corresponds to the previous input length. Indeed, as one might expect, this increases the running time of the downward self-reducibility algorithm to \(\mathrm{poly}(1/\delta)\), but we can afford this. Implementing this approach requires some care, since multiple operators will be applied to a single variable (which represents many bits of information), and since the linearization operator needs to be replaced by a “degree-lowering operation” (that will reduce the individual degree of a variable to be \(\mathrm{poly}(1/\delta)\)); see Claim 4.7.3 for details.

Last, we also want our function to be downward self-reducible in \(\mathrm{polylog}(n)\) steps (i.e., after \(\mathrm{polylog}(n)\) “downward” steps, the function at the now-smaller input length is computable in time \(\mathrm{poly}(1/\delta)\) without an oracle). This follows by noting that the length of each interval \(I_n\) is now polylogarithmic, and that at the “bottom” input length the function \(f^{\mathtt {ws}}\) simply computes the arithmetized version of \(\mathtt {TQBF^{loc}}\), which (as mentioned above) is computable in time \(\mathrm{poly}(1/\delta)\).

The complexity of \(f^{\mathtt {ws}}\). For our derandomization result, it suffices to prove that \(f^{\mathtt {ws}}\) is computable in time \(2^{\tilde{O}(n)}\), rather than in linear space. (This is because our derandomization algorithm enumerates over all choices for a seed of length \(\tilde{O}(n)\) and computes the Nisan-Wigderson generator on each choice, with \(f^{\mathtt {ws}}\) as the hard function.) However, analogously to Trevisan and Vadhan [56], we prove the stronger statement that \(f^{\mathtt {ws}}\) is computable in linear space.27 This stronger property may be of independent interest and in particular may be used in future work for constructions of PRGs that work in small space. (See Remark 4.13 for further details.)

4.1.3 The Construction Itself.

We consider the standard “totally quantified” variant of the Quantified Boolean Formula (\(\mathbf {QBF}\)) problem, called Totally Quantified Boolean Formula (\(\mathbf {TQBF}\)). In this version, the quantifiers do not appear as part of the input, and we assume that all the variables are quantified and that the quantifiers alternate according to the index of the variable (i.e., \(x_i\) is quantified by \(\exists\) if i is odd and otherwise quantified by \(\forall\)).

Definition 4.6

(TQBF)

A string \(\varphi \in \lbrace 0,1\rbrace ^*\) of length \(n=|\varphi |\) is in the set \(\mathtt {TQBF}\subseteq \lbrace 0,1\rbrace ^*\) if \(\varphi\) is a representation of a \(3\text{-}\mathtt {SAT}\) formula in variables indexed by \([n]\) such that, denoting the variables by \(w_1,\ldots ,w_{n}\), it holds that \(\exists w_1 \forall w_2 \exists w_3 \forall w_4 ... \varphi (w_1,\ldots ,w_n)\). In other words, \(\varphi \in \mathtt {TQBF}\) if the quantified expression that is obtained by quantifying all n variables, in order of their indices and with alternating quantifiers (starting with \(\exists\)), evaluates to true.

Recall that a formula \(\varphi\) that is represented by n bits actually has fewer than n input variables, since the representation length of a formula with m variables and \(O(m)\) clauses is \(O(m\cdot \log (m))\). Thus, an n-bit \(\varphi\) actually has at most \(n/O(\log (n))\) variables. In Definition 4.6, we assume for simplicity (and to avoid cumbersome notation) that \(\varphi\) has precisely n input variables, but some of these are dummy variables that are ignored.28

Recall that \(\mathtt {QBF}\), in which the quantifiers are part of the input, is reducible in linear time to \(\mathtt {TQBF}\) from Definition 4.6 (by renaming variables and adding dummy variables).

The main result in this section is a construction of a well-structured function \(f^{\mathtt {ws}}\) such that \(\mathtt {TQBF}\) can be reduced to \(f^{\mathtt {ws}}\) with only quasilinear blow-up. This construction is detailed in the following lemma:

Lemma 4.7

(A Well-structured Set That is Hard for TQBF under Quasilinear Reductions)

There exists a universal constant \(r\in \mathbb {N}\) such that for every constant \(c\in \mathbb {N}\) the following holds: For \(\ell (n)=\log (n)^{3c}\) and \(\delta (n)=2^{-n/\ell (n)}\), there exists a \((\delta ,O(\ell ^2))\)-well-structured function \(f^{\mathtt {ws}}:\lbrace 0,1\rbrace ^*\rightarrow \lbrace 0,1\rbrace ^*\) such that \(f^{\mathtt {ws}}\) is computable in linear space, and \(\mathtt {TQBF}\) deterministically reduces to \(f^{\mathtt {ws}}\) in time \(n\cdot \log ^{2c+r}(n)\).

Proof.

At a high level, we first reduce \(\mathtt {TQBF}\) to a problem \(\mathtt {TQBF^{loc}}\) that will have a property useful for arithmetization and then reduce \(\mathtt {TQBF^{loc}}\) to a function \(f^{\mathtt {ws}}\) that we will construct as follows: We will first carefully arithmetize a suitable witness-relation that underlies \(\mathtt {TQBF^{loc}}\); then transform the corresponding arithmetic version of \(\mathtt {TQBF^{loc}}\) to a collection of low-degree polynomials that also satisfy a property akin to downward self-reducibility (loosely speaking, these polynomials arise from the protocol underlying the proof of \(\mathcal {IP}=\mathcal {PSPACE}\) [42, 52]); and finally “combine” these polynomials to a Boolean function \(f^{\mathtt {ws}}\) that will “inherit” the useful properties of the low-degree polynomials and will thus be well-structured.

A variant of \(\mathtt {TQBF}\) that is amenable to arithmetization. We will need a non-standard variant of \(\mathtt {TQBF}\), which we denote by \(\mathtt {TQBF^{loc}}\), such that \(\mathtt {TQBF}\) is reducible to \(\mathtt {TQBF^{loc}}\) with quasilinear blow-up, and \(\mathtt {TQBF^{loc}}\) has an additional useful property. To explain this property, recall that the verification procedure of a “witness” \(w=w_1,\ldots ,w_n\) in \(\mathtt {TQBF}\) is local, in the following sense: For every fixed \(\varphi\) it holds that \(\varphi \in \mathtt {TQBF}\) iff \(\exists w_1 \forall w_2...\;3SAT(\varphi ,w)\), where \(3SAT(\varphi ,w)=\varphi (w)\) is a relation that can be decided by a conjunction of local conditions on the “witness” w. We want the stronger property that the relation that underlies \(\mathtt {TQBF^{loc}}\) can be tested by a conjunction of conditions that are local both in the input and in the witness. That is, denoting the underlying relation by \(\mathtt {R}\text{-}\mathtt {TQBF^{loc}}\), we will have that \(x\in \mathtt {TQBF^{loc}}\) iff \(\exists w_1 \forall w_2 ... \; \mathtt {R}\text{-}\mathtt {TQBF^{loc}}(x,w)\), where \(\mathtt {R}\text{-}\mathtt {TQBF^{loc}}\) is a conjunction of local conditions on \((x,w)\). In more detail:

Claim 4.7.1

(A Variant of TQBF with Verification that is Local in Both Input and Witness)

There exists a set \(\mathtt {TQBF^{loc}}\in \mathcal {SPACE}[O(n)]\) and a relation \(\mathtt {R}\text{-}\mathtt {TQBF^{loc}}\subseteq (\lbrace 0,1\rbrace ^*\times \lbrace 0,1\rbrace ^*)\) such that \(\mathtt {TQBF^{loc}}=\lbrace x:\exists w_1\forall w_2\exists w_3\forall w_4 ... (x,w)\in \mathtt {R}\text{-}\mathtt {TQBF^{loc}}\rbrace\), and the following holds:

(1)

(Length-preserving witnesses.) For any \((x,w)\in \mathtt {R}\text{-}\mathtt {TQBF^{loc}}\) it holds that \(|w|=|x|\).

(2)

(Verification that is local in both input and witness.) For every \(n\in \mathbb {N}\) there exist n functions \(\lbrace f_i:\lbrace 0,1\rbrace ^{n}\times \lbrace 0,1\rbrace ^{n}\rightarrow \lbrace 0,1\rbrace \rbrace _{i\in [n]}\) such that the mapping \((x,w,i)\mapsto f_i(x,w)\) is computable in quasilinear time and linear space, and each \(f_i\) depends on only three variables, and \((x,w)\in \mathtt {R}\text{-}\mathtt {TQBF^{loc}}\) if and only if for all \(i\in [n]\) it holds that \(f_i(x,w)=1\).

(3)

(Efficient reduction with quasilinear blow-up.) There exists a deterministic linear-space and quasilinear-time algorithm A that gets as input \(\varphi \in \lbrace 0,1\rbrace ^n\) and outputs \(x=A(\varphi)\) such that \(\varphi \in \mathtt {TQBF}\) if and only if \(x\in \mathtt {TQBF^{loc}}\).


Proof.

Consider a \(3\text{-}\mathtt {SAT}\) formula \(\varphi \in \lbrace 0,1\rbrace ^n\) as an input to \(\mathtt {TQBF}\), and for simplicity assume that n is even (this assumption is insignificant for the proof and only simplifies the notation). By definition, we have that \(\varphi \in \mathtt {TQBF}\) if and only if \(\begin{align*} \exists w_1 \forall w_2 \exists w_3 .... \exists w_n \; \varphi (w_1,\ldots ,w_n) = 1 \;\text{.} \end{align*}\)

Now, let M be a linear-space and quasilinear-time machine that gets as input \((\varphi ,w)\) and outputs \(\varphi (w)\). We use an efficient Cook-Levin transformation of the computation of the machine M on inputs of length \(2n\) to a \(3\text{-}\mathtt {SAT}\) formula and deduce the following29: There exists a linear-space and quasilinear-time algorithm that, on input \(1^n\), constructs a \(3\text{-}\mathtt {SAT}\) formula \(\Phi _n:\lbrace 0,1\rbrace ^n\times \lbrace 0,1\rbrace ^n\times \lbrace 0,1\rbrace ^{\mathtt {ql}(n)}\rightarrow \lbrace 0,1\rbrace\) of size \(\mathtt {ql}(n)=\tilde{O}(n)\) such that for any \((\varphi ,w)\in \lbrace 0,1\rbrace ^{n}\times \lbrace 0,1\rbrace ^{n}\) it holds that \(\varphi (w)=1\) if and only if there exists a unique \(w^{\prime }\in \lbrace 0,1\rbrace ^{\mathtt {ql}(n)}\) satisfying \(\Phi _n(\varphi ,w,w^{\prime })=1\).

Now, using the formula \(\Phi _n\), note that \(\varphi \in \lbrace 0,1\rbrace ^n\) is in \(\mathtt {TQBF}\) if and only if (4.1) \(\begin{equation} \exists w_1 \forall w_2 \exists w_3 ... \exists w_n \; \exists w^{\prime }_1 \exists w^{\prime }_2 ... \exists w^{\prime }_{\mathtt {ql}(n)} \; \Phi _n(\varphi ,w,w^{\prime })=1 \;\text{.} \end{equation}\) We slightly modify \(\Phi _n\) to make the suffix of existential quantifiers in Equation (4.1) alternate with universal quantifiers that are applied to dummy variables. (Specifically, for each \(i\in [\mathtt {ql}(n)]\), we rename \(w^{\prime }_i\) to \(w^{\prime }_{2i}\), which effectively introduces a dummy variable before \(w^{\prime }_i\).) Denoting the modified formula by \(\Phi ^{\prime }_n\), we have that \(\varphi \in \mathtt {TQBF}\) if and only if \(\begin{align*} \exists w_1 \forall w_2 \exists w_3 ... \exists w_n \forall w^{\prime }_1 \exists w^{\prime }_2 \forall w^{\prime }_3 ... \exists w^{\prime }_{2\mathtt {ql}(n)} \; \Phi _n^{\prime }(\varphi ,w,w^{\prime })=1 \;\text{.} \end{align*}\)

We define the relation \(\mathtt {R}\text{-}\mathtt {TQBF^{loc}}\) to consist of all pairs \((x,w)\) such that \(x=(\varphi ,1^{2\mathtt {ql}(|\varphi |)})\) and \(w=(w^{(0)},w^{(1)})\in \lbrace 0,1\rbrace ^{|\varphi |}\times \lbrace 0,1\rbrace ^{2\mathtt {ql}(|\varphi |)}\) and \(\Phi ^{\prime }_{|\varphi |}(\varphi ,w^{(0)},w^{(1)})=1\). Indeed, in this case the corresponding set \(\mathtt {TQBF^{loc}}\) is defined by \(\begin{align*} \mathtt {TQBF^{loc}}= \left\lbrace (\varphi ,1^{2\mathtt {ql}(|\varphi |)}) : \exists w^{(0)}_1 \forall w^{(0)}_2 ... \exists w^{(0)}_{|\varphi |} \forall w^{(1)}_1 \exists w^{(1)}_2 ... \exists w^{(1)}_{2\mathtt {ql}(|\varphi |)} \; \Phi _{|\varphi |}^{\prime }(\varphi ,w^{(0)},w^{(1)})=1 \right\rbrace \;\text{.} \end{align*}\)

Note that, by definition, for every \((x,w)\in \mathtt {R}\text{-}\mathtt {TQBF^{loc}}\), we have that \(|w|=|x|\). To see that \(\mathtt {R}\text{-}\mathtt {TQBF^{loc}}\) can be tested by a conjunction of efficiently computable local conditions, note that an n-bit input to \(\mathtt {TQBF^{loc}}\) is of the form \((\varphi ,1^{2\mathtt {ql}(|\varphi |)})\in \lbrace 0,1\rbrace ^m\times \lbrace 1\rbrace ^{2\mathtt {ql}(m)}\), and recall that \(\Phi ^{\prime }_m\) is a \(3\text{-}\mathtt {SAT}\) formula of size \(\mathtt {ql}(m)\lt n\) that can be produced in linear space and quasilinear time from input \(1^m\). Also, \(\mathtt {TQBF^{loc}}\) is computable in linear space, since on input \((\varphi ,1^{2\mathtt {ql}(|\varphi |)})\) the number of variables that are quantified is \(|\varphi |+2\mathtt {ql}(|\varphi |)\), and since \(\Phi ^{\prime }_{|\varphi |}\) can be evaluated in space \(O(|\varphi |)\). Last, \(\mathtt {TQBF}\) trivially reduces to \(\mathtt {TQBF^{loc}}\) by adding padding \(\varphi \mapsto (\varphi ,1^{2\mathtt {ql}(|\varphi |)})\).□

Arithmetic setting. For any \(n\in \mathbb {N}\), let \(\ell _0=\ell _0(n)=\lfloor (\log n)^c\rfloor\), let \(n^{\prime }=\lceil n/\ell _0\rceil\), let \(\delta _0(n)=2^{-n^{\prime }}\), and let \(\mathbb {F}\) be the field with \(2^{5n^{\prime }}=1/\mathrm{poly}(\delta _0(n))\) elements. Recall that a representation of such a field (i.e., an irreducible polynomial of degree \(5n^{\prime }\) over \(\mathbb {F}_2\)) can be found deterministically either in linear space (by a brute-force algorithm) or in time \(\mathrm{poly}(n^{\prime })=\mathrm{poly}(n)\) (by Shoup’s [53] algorithm).

Fix a bijection \(\pi\) between \(\lbrace 0,1\rbrace ^{5n^{\prime }}\) and \(\mathbb {F}\) (i.e., \(\pi\) maps any string in \(\lbrace 0,1\rbrace ^{5n^{\prime }}\) to the bit-representation of the corresponding element in \(\mathbb {F}\)) such that both \(\pi\) and \(\pi ^{-1}\) can be computed in polynomial time and linear space. Let \(H\subset \mathbb {F}\) be the set of \(2^{n^{\prime }}\) elements that are represented (via \(\pi\)) by bit-strings with a prefix of \(n^{\prime }\) arbitrary bits and a suffix of \(4n^{\prime }\) zeroes (i.e., \(H=\lbrace \pi (z):z=x0^{4n^{\prime }},x\in \lbrace 0,1\rbrace ^{n^{\prime }}\rbrace \subset \mathbb {F}\) such that \(|H|=2^{n^{\prime }}\)).30

We will consider polynomials \(\mathbb {F}^{2\ell _0}\rightarrow \mathbb {F}\), and we think of the inputs to each such polynomial as of the form \((x,w)\in \mathbb {F}^{\ell _0}\times \mathbb {F}^{\ell _0}\). Note that, intuitively, x and w each represent about \(5n\) bits of information. When x and w are elements in the subset \(H^{\ell _0}\subset \mathbb {F}^{\ell _0}\), we think of them as a pair of n-bit strings that might belong to \(\mathtt {R}\text{-}\mathtt {TQBF^{loc}}\).

Arithmetization of \(\mathtt {R}\text{-}\mathtt {TQBF^{loc}}\). Our first step is to carefully arithmetize the relation \(\mathtt {R}\text{-}\mathtt {TQBF^{loc}}\) within the arithmetic setting detailed above. We will mainly rely on the property that there is a “doubly-local” verification procedure for \(\mathtt {R}\text{-}\mathtt {TQBF^{loc}}\).

Claim 4.7.2

(Low-degree Arithmetization).

There exists a polynomial \(P^{\mathtt {TQBF^{loc}}}:\mathbb {F}^{2\ell _0}\rightarrow \mathbb {F}\) such that the following holds:

(1)

(Low-degree.) The degree of \(P^{\mathtt {TQBF^{loc}}}\) is at most \(O(n\cdot 2^{n^{\prime }})\).

(2)

(Arithmetizes \(\mathtt {R}\text{-}\mathtt {TQBF^{loc}}\).) For every \((x,w)\in H^{\ell _0}\times H^{\ell _0}\) it holds that \(P^{\mathtt {TQBF^{loc}}}(x,w)=1\) if \((x,w)\in \mathtt {R}\text{-}\mathtt {TQBF^{loc}}\), and \(P^{\mathtt {TQBF^{loc}}}(x,w)=0\) otherwise.

(3)

(Efficiently computable.) There exists a deterministic algorithm that gets as input \((x,w)\in \mathbb {F}^{2\ell _0}\), runs in time \(\mathrm{poly}(|\mathbb {F}|)\), and outputs \(P^{\mathtt {TQBF^{loc}}}(x,w)\in \mathbb {F}\). There also exists a deterministic linear-space algorithm with the same functionality.

Proof.

We first show a polynomial-time and linear-space algorithm that, given input \(1^n\), constructs a low-degree polynomial \(P^{\mathtt {TQBF^{loc}}}_0:\mathbb {F}^{2n^{\prime }\cdot \ell _0}\rightarrow \mathbb {F}\) that satisfies the following: For every \((x,w)\in \mathbb {F}_2^{2n^{\prime }\cdot \ell _0}\) (i.e., when the input is a string of \(2n^{\prime }\cdot \ell _0\ge 2n\) bits, and we interpret it as a pair \((x,w)\in \lbrace 0,1\rbrace ^{2n}\)) it holds that \(P^{\mathtt {TQBF^{loc}}}_0(x,w)=1\) if \((x,w)\in \mathtt {R}\text{-}\mathtt {TQBF^{loc}}\), and \(P^{\mathtt {TQBF^{loc}}}_0(x,w)=0\) otherwise.

To do so, recall that, by Claim 4.7.1, we can construct in polynomial time and linear space a collection of n polynomials \(\lbrace f_i:\mathbb {F}_2^{2n^{\prime }\cdot \ell _0}\rightarrow \mathbb {F}_2\rbrace _{i\in [n]}\) such that for each \(i\in [n]\) the polynomial \(f_i\) depends only on three variables in the input \((x,w)\), and such that \((x,w)\in \mathtt {R}\text{-}\mathtt {TQBF^{loc}}\) if and only if for all \(i\in [n]\) it holds that \(f_i(x,w)=1\). For each \(i\in [n]\), let \(p_i:\mathbb {F}^{2n^{\prime }\cdot \ell _0}\rightarrow \mathbb {F}\) be the multilinear extension of \(f_i\), which can be evaluated in time \(\mathrm{poly}(n)\) and in linear space (since \(f_i\) depends only on three variables, and using Lagrange’s interpolation formula and the fact that \(\pi\) is efficiently computable). Then, the polynomial \(P^{\mathtt {TQBF^{loc}}}_0\) is simply the product of all the \(p_i\)’s; that is, \(P^{\mathtt {TQBF^{loc}}}_0(x,w)=\prod _{i\in [n]}p_i(x,w)\). Note that \(P^{\mathtt {TQBF^{loc}}}_0\) can indeed be evaluated in time \(\mathrm{poly}(n)\) and in linear space, and that the degree of \(P^{\mathtt {TQBF^{loc}}}_0\) is \(O(n)\) (since each \(p_i\) is a multilinear polynomial in \(O(1)\) variables).
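For concreteness, since each \(f_i\) depends on only three coordinates, say those indexed by \(k_1,k_2,k_3\), its multilinear extension is given by the standard interpolation formula \(\begin{align*} p_i(z) = \sum _{b\in \lbrace 0,1\rbrace ^{3}} f_i(b)\cdot \prod _{j=1}^{3}\left(b_j\cdot z_{k_j}+(1-b_j)\cdot (1-z_{k_j})\right) \;\text{,} \end{align*}\) where, abusing notation, \(f_i(b)\) denotes the value of \(f_i\) on any input whose coordinates \(k_1,k_2,k_3\) equal b; this formula makes the claimed \(\mathrm{poly}(n)\)-time and linear-space evaluation evident.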

Now, let \(\pi ^{(H)}_1,\ldots ,\pi ^{(H)}_{n^{\prime }}:H\rightarrow \lbrace 0,1\rbrace\) be the “projection” functions such that \(\pi ^{(H)}_i\) outputs the ith bit in the bit-representation of its input according to \(\pi\). Abusing notation, we let \(\pi ^{(H)}_1,\ldots ,\pi ^{(H)}_{n^{\prime }}:\mathbb {F}\rightarrow \mathbb {F}\) be the low-degree extensions of the \(\pi ^{(H)}_i\)’s, which are of degree at most \(|H|-1\lt 2^{n^{\prime }}\). Also, for every \(\sigma \in \mathbb {F}\), we denote by \(\pi ^{(H)}(\sigma)\) the string \(\pi ^{(H)}_1(\sigma),\ldots ,\pi ^{(H)}_{n^{\prime }}(\sigma)\in \mathbb {F}^{n^{\prime }}\). Note that the mapping of \(\sigma \in \mathbb {F}\) to \(\pi ^{(H)}(\sigma)\in \mathbb {F}^{n^{\prime }}\) can be computed in time \(\mathrm{poly}(|H|)=\mathrm{poly}(|\mathbb {F}|)\) and in linear space (again just using Lagrange’s interpolation formula and the fact that \(\pi\) is efficiently computable).

Finally, we define the polynomial \(P^{\mathtt {TQBF^{loc}}}:\mathbb {F}^{2\ell _0}\rightarrow \mathbb {F}\). Intuitively, for \((x,w)\in H^{\ell _0}\times H^{\ell _0}\), the polynomial \(P^{\mathtt {TQBF^{loc}}}\) first uses the \(\pi ^{(H)}_i\)’s to compute the bit-projections of x and w, which are each of length \(n^{\prime }\cdot \ell _0\), and then evaluates the polynomial \(P^{\mathtt {TQBF^{loc}}}_0\) on these \(2n^{\prime }\cdot \ell _0\) bit-projections. More formally, for every \((x,w)\in \mathbb {F}^{2\ell _0}\), we define \(\begin{align*} P^{\mathtt {TQBF^{loc}}}(x,w) &= P^{\mathtt {TQBF^{loc}}}_0\Big (\pi ^{(H)}(x_1),\ldots ,\pi ^{(H)}(x_{\ell _0}),\pi ^{(H)}(w_1),\ldots ,\pi ^{(H)}(w_{\ell _0}) \Big) \;\text{.} \end{align*}\)

The first item in the claim follows since for every \(i\in [n^{\prime }]\) the degree of \(\pi ^{(H)}_i\) is less than \(2^{n^{\prime }}\) and since \(\deg (P^{\mathtt {TQBF^{loc}}}_0)=O(n)\). The second item in the claim follows immediately from the definition of \(P^{\mathtt {TQBF^{loc}}}\). And the third item in the claim follows since \(\pi ^{(H)}\) can be computed in time \(\mathrm{poly}(|\mathbb {F}|)\) and in linear space, and since \(P^{\mathtt {TQBF^{loc}}}_0\) can be constructed and evaluated in polynomial time and in linear space. (The two different algorithms arise because we need to find an irreducible polynomial, which can be done either in linear space or in time \(\mathrm{poly}(n)\lt \mathrm{poly}(|\mathbb {F}|)\).)□

Constructing a “downward self-reducible” collection of low-degree polynomials. Our goal now is to define a collection of \(O(\ell _0^2)\) polynomials \(\lbrace P_{n,i}:\mathbb {F}^{2\ell _0}\rightarrow \mathbb {F}\rbrace _{i\in [O(\ell _0^2)]}\) such that the polynomials are of low degree, and \(P_{n,1}\) essentially computes \(\mathtt {TQBF^{loc}}\), and computing \(P_{n,i}\) can be reduced in time \(\mathrm{poly}(1/\delta _0(n))\) to computing \(P_{n,i+1}\). The collection and its properties are detailed in the following claim:

Claim 4.7.3.

There exists a collection of \(\bar{\ell _0}=\ell _0(2\ell _0+1)+1\) polynomials, denoted \(\lbrace P_{n,i}:\mathbb {F}^{2\ell _0}\rightarrow \mathbb {F}\rbrace _{i\in [\bar{\ell _0}]}\), that satisfies the following:

(1)

(Low degree:) For every \(i\in [\bar{\ell _0}]\), the degree of \(P_{n,i}\) is at most \(O(n\cdot \ell _0\cdot 2^{2n^{\prime }})\).

(2)

(\(P_{n,1}\) computes \(\mathtt {TQBF^{loc}}\) on H-inputs:) For any \((x,w)\in H^{\ell _0}\times H^{\ell _0}\) it holds that \(P_{n,1}(x,w)=1\) if \(x\in \mathtt {TQBF^{loc}}\), and \(P_{n,1}(x,w)=0\) if \(x\notin \mathtt {TQBF^{loc}}\). (Regardless of w.)

(3)

(“Forward” self-reducible:) For every \(i\in [\bar{\ell _0}-1]\) it holds that \(P_{n,i}\) can be computed in time \(\mathrm{poly}(2^{n^{\prime }})\) when given oracle access to \(P_{n,i+1}\).

(4)

(Efficiently computable:) The polynomial \(P_{n,\bar{\ell _0}}\) can be computed in time \(\mathrm{poly}(2^{n^{\prime }})\). Moreover, for every \(i\in [\bar{\ell _0}]\), it holds that \(P_{n,i}\) can be computed in space \(O(n\cdot \bar{\ell _0})\).

Proof.

For simplicity of notation, assume throughout the proof that \(n^{\prime }\) is even. Towards defining the collection of polynomials, we first define two operators on functions \(p:\mathbb {F}^{2\ell _0}\rightarrow \mathbb {F}\). Loosely speaking, the first operator corresponds to \(n^{\prime }\) alternating quantification steps in the \(\mathcal {IP}=\mathcal {PSPACE}\) proof (i.e., \(n^{\prime }\) steps of alternately quantifying the next variable either by \(\exists\) or by \(\forall\)), and the second operator roughly corresponds to a linearization step that is simultaneously applied to \(n^{\prime }\) variables. In both cases, the \(n^{\prime }\) variables that we consider are the bits in the representation of a single element in the second input to p.

Quantifications operator: Let \(i\in [\ell _0]\). Loosely speaking, \(\mathtt {Quant}^{(i)}(p)\) causes p to ignore the ith variable of its second input and instead consider alternating quantification steps applied to the bits that represent this variable. In more detail, consider an input \((x,w)\in \mathbb {F}^{2\ell _0}\) for p, and think of \(w=w_1,\ldots ,w_{\ell _0}\in \mathbb {F}\). The operator \(\mathtt {Quant}^{(i)}(p)\) causes p to ignore \(w_i\) and instead think of a variable \(\pi (\sigma 0^{4n^{\prime }})\in H\) that is determined by a sequence of \(n^{\prime }\) bits \(\sigma =\sigma _1,\ldots ,\sigma _{n^{\prime }}\); then, \(\mathtt {Quant}^{(i)}(p)\) will be the arithmetization of the expression “\(\exists \sigma _1\forall \sigma _2\exists \sigma _3...:p(x,w_{1},\ldots ,w_{i-1},\pi (\sigma 0^{4n^{\prime }}),w_{i+1},\ldots ,w_{\ell _0})\)” (obtained by arithmetizing the “\(\exists\)” and “\(\forall\)” operations in the usual way). To do this, we define a sequence of functions such that the first function replaces the ith variable in the second input for p by a dummy variable in H, and each subsequent function corresponds to a quantification step applied to a single bit in the representation of this dummy variable.

Formally, we recursively define \(n^{\prime }+1\) functions \(\mathtt {Quant}^{(i,0)},\ldots ,\mathtt {Quant}^{(i,n^{\prime })}=\mathtt {Quant}^{(i)}(p)\) such that for \(j\in \lbrace 0,\ldots ,n^{\prime }\rbrace\) it holds that \(\mathtt {Quant}^{(i,j)}(p)\) is a function \(\mathbb {F}^{2\ell _0}\times \lbrace 0,1\rbrace ^{n^{\prime }-j}\rightarrow \mathbb {F}\). The function \(\mathtt {Quant}^{(i,0)}(p)\) gets as input \((x,w)\in \mathbb {F}^{2\ell _0}\) and \(\sigma \in \lbrace 0,1\rbrace ^{n^{\prime }}\), ignores the ith element of w, and outputs \(\mathtt {Quant}^{(i,0)}(p)(x,w,\sigma)=p(x,w_1...w_{i-1}\pi (\sigma 0^{4n^{\prime }})w_{i+1}...w_{\ell _0})\). Then, for \(j\in [n^{\prime }]\), the jth step binds the bit \(\sigma _{n^{\prime }-j+1}\), and it uses the arithmetization of “\(\exists\)” when \(n^{\prime }-j+1\) is odd and of “\(\forall\)” when \(n^{\prime }-j+1\) is even (so the quantifiers alternate as \(\exists \sigma _1\forall \sigma _2...\), recalling that \(n^{\prime }\) is even). That is, if \(n^{\prime }-j+1\) is odd, then we define \(\begin{align*} \mathtt {Quant}^{(i,j)}(p)(x,w,\sigma _1...\sigma _{n^{\prime }-j}) = 1 - \left(\prod _{z\in \lbrace 0,1\rbrace }\left(1-\mathtt {Quant}^{(i,j-1)}(p)(x,w,\sigma _1,\ldots ,\sigma _{n^{\prime }-j},z)\right)\right) \;\text{,} \end{align*}\) and if \(n^{\prime }-j+1\) is even, then we define \(\begin{align*} \mathtt {Quant}^{(i,j)}(p)(x,w,\sigma _1,\ldots ,\sigma _{n^{\prime }-j}) = \prod _{z\in \lbrace 0,1\rbrace }\mathtt {Quant}^{(i,j-1)}(p)(x,w,\sigma _1...\sigma _{n^{\prime }-j}z) \;\text{.} \end{align*}\)
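The following toy Python sketch illustrates this arithmetization of the quantifier prefix (the names quant, eq, and le are ours; the \((x,w)\)-part and the encoding \(\pi\) are omitted, and a small prime field stands in for \(\mathbb {F}\)). It computes the same nested expression as the recursion \(\mathtt {Quant}^{(i,0)},\ldots ,\mathtt {Quant}^{(i,n^{\prime })}\), just unfolded with the outermost quantifier first:

Q = 97  # toy prime field

def quant(p, n_bits):
    # Arithmetization of "exists s_1 forall s_2 exists s_3 ..." for
    # p: {0,1}^{n_bits} -> F.  Odd-indexed bits are bound with
    # 1 - (1 - p_0)(1 - p_1) ("exists"), even-indexed bits with
    # p_0 * p_1 ("forall").
    def rec(prefix):
        if len(prefix) == n_bits:
            return p(prefix) % Q
        p0, p1 = rec(prefix + (0,)), rec(prefix + (1,))
        if len(prefix) % 2 == 0:        # binding bit len+1, an odd index
            return (1 - (1 - p0) * (1 - p1)) % Q
        return (p0 * p1) % Q
    return rec(())

# "exists s1 forall s2: s1 = s2" is false; "exists s1 forall s2: s1 <= s2"
# is true (take s1 = 0); the arithmetization agrees on both.
eq = lambda s: 1 if s[0] == s[1] else 0
le = lambda s: 1 if s[0] <= s[1] else 0
assert quant(eq, 2) == 0 and quant(le, 2) == 1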

Note that the function \(\mathtt {Quant}^{(i)}(p)\) can be evaluated at any input in linear space with oracle access to p (since each \(\mathtt {Quant}^{(i,j)}(p)\) can be evaluated in linear space with oracle access to \(\mathtt {Quant}^{(i,j-1)}(p)\)). Also observe the following property of \(\mathtt {Quant}^{(i)}(p)\), which follows immediately from the definition:

Fact 4.7.3.1.

If for some \(x\in H^{\ell _0}\) and any \(w\in H^{\ell _0}\) it holds that \(p(x,w)\in \lbrace 0,1\rbrace\), then for the same x and any \(w\in H^{\ell _0}\) it holds that \(\mathtt {Quant}^{(i)}(p)(x,w)=1\) if \(\exists \sigma _1\forall \sigma _2\exists \sigma _3...\forall \sigma _{n^{\prime }}\) such that \(p(x,w_1...w_{i-1}\pi (\sigma _1...\sigma _{n^{\prime }}0^{4n^{\prime }})w_{i+1}...w_{\ell _0})=1\), and \(\mathtt {Quant}^{(i)}(p)(x,w)=0\) otherwise.□

Degree-reduction operator: For every fixed \(z\in H\), let \(I_z:H\rightarrow \lbrace 0,1\rbrace\) be the indicator function of whether the input equals z, and let \(\bar{I}_z:\mathbb {F}\rightarrow \mathbb {F}\) be the low-degree extension of \(I_z\), which is of degree at most \(|H|-1\) (i.e., \(\bar{I}_z(x)=\prod _{h\in H\setminus \lbrace z\rbrace }\frac{x-h}{z-h}\)). Then, for any \(i\in [\ell _0]\), we define \(\begin{align*} \mathtt {DegRed}^{(i)}(p)(x,w) = \sum _{z\in H} \bar{I}_z(x_i)\cdot p(x_1...x_{i-1}zx_{i+1}...x_{\ell _0},w) \;\text{,} \end{align*}\) and similarly, for \(i\in [2\ell _0]\setminus [\ell _0]\), we denote \(i^{\prime }=i-\ell _0\) and define \(\begin{align*} \mathtt {DegRed}^{(i)}(p)(x,w) = \sum _{z\in H} \bar{I}_z(w_{i^{\prime }})\cdot p(x,w_1...w_{i^{\prime }-1}zw_{i^{\prime }+1}...w_{\ell _0}) \;\text{.} \end{align*}\)
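As an illustration, here is a toy Python sketch of the degree-reduction operator over a small prime field standing in for \(\mathbb {F}\) (the names I_bar and deg_red are ours; division is implemented by Fermat inversion, which is specific to the toy prime modulus):

Q = 97                # toy prime field standing in for F
H = list(range(8))    # toy subset standing in for H

def I_bar(z, x):
    # Low-degree extension of the indicator "x == z" on H:
    # I_bar_z(x) = prod_{h in H, h != z} (x - h) / (z - h), all mod Q.
    num, den = 1, 1
    for h in H:
        if h != z:
            num = num * (x - h) % Q
            den = den * (z - h) % Q
    return num * pow(den, Q - 2, Q) % Q

def deg_red(p, i):
    # DegRed^{(i)}: interpolate variable i of p through H, lowering its
    # individual degree to at most |H| - 1 while preserving p whenever
    # that variable takes a value in H.
    def p_new(xs):
        total = 0
        for z in H:
            ys = list(xs)
            ys[i] = z
            total = (total + I_bar(z, xs[i]) * p(ys)) % Q
        return total
    return p_new

# The "moreover" property: p and its degree-reduced version agree on H.
p = lambda xs: pow(xs[0], 20, Q)          # individual degree 20 in variable 0
p_red = deg_red(p, 0)
assert all(p_red([h, 3]) == p([h, 3]) for h in H)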

Similarly to the operator \(\mathtt {Quant}^{(i)}\), note that the function \(\mathtt {DegRed}^{(i)}(p)\) can be evaluated at any input in linear space with oracle access to p. Also, the definition of the operator \(\mathtt {DegRed}^{(i)}\) implies that:

Fact 4.7.3.2.

For \(i\in [2\ell _0]\), let v be the variable whose degree \(\mathtt {DegRed}^{(i)}\) reduces (i.e., \(v=x_i\) if \(i\in [\ell _0]\) and \(v=w_{i^{\prime }}=w_{i-\ell _0}\) if \(i\in [2\ell _0]\setminus [\ell _0]\)). Then, the individual degree of v in \(\mathtt {DegRed}^{(i)}(p)\) is at most \(|H|-1\), and the individual degree of any other input variable to \(\mathtt {DegRed}^{(i)}(p)\) remains the same as in p. Moreover, for every \((x,w)\in \mathbb {F}^{\ell _0}\times \mathbb {F}^{\ell _0}\), if the input \((x,w)\) assigns the variable v to a value in H, then \(\mathtt {DegRed}^{(i)}(p)(x,w)=p(x,w)\).

Composing the operators: We will be particularly interested in what happens when we first apply the quantifications operator to some variable \(i\in [\ell _0]\) and then apply the degree-reduction operator to all variables sequentially. A useful property of this operation is detailed in the following claim:

Claim 4.7.3.3.

Let \(p:\mathbb {F}^{2\ell _0}\rightarrow \mathbb {F}\) and \(x\in H^{\ell _0}\) such that for any \(w\in H^{\ell _0}\) it holds that \(p(x,w)\in \lbrace 0,1\rbrace\). For \(i\in [\ell _0]\), let \(p^{\prime }:\mathbb {F}^{2\ell _0}\rightarrow \mathbb {F}\) be the function that is obtained by first applying \(\mathtt {Quant}^{(i)}\) to p, then applying \(\mathtt {DegRed}^{(j)}\) for each \(j=1,\ldots ,2\ell _0\). Then, for any \(w^{\prime }\in H^{\ell _0}\), we have that \(p^{\prime }(x,w^{\prime })=1\) if \(\exists \sigma _1\forall \sigma _2\exists \sigma _3...\forall \sigma _{n^{\prime }} :p(x,w^{\prime }_1...w^{\prime }_{i-1}\pi (\sigma _1...\sigma _{n^{\prime }}0^{4n^{\prime }})w^{\prime }_{i+1}...w^{\prime }_{\ell _0})=1\), and \(p^{\prime }(x,w^{\prime })=0\) otherwise.

Proof.

Fix any \(w^{\prime }\in H^{\ell _0}\). By Fact 4.7.3.1, and relying on the hypothesis that for any \(w\in H^{\ell _0}\), we have that \(p(x,w)\in \lbrace 0,1\rbrace\), it follows that \(\mathtt {Quant}^{(i)}(p)(x,w^{\prime })=1\) if \(\exists \sigma _1\forall \sigma _2\exists \sigma _3...\forall \sigma _{n^{\prime }} : p(x,w^{\prime }_1...w^{\prime }_{i-1}\pi (\sigma _1...\sigma _{n^{\prime }}0^{4n^{\prime }})w^{\prime }_{i+1}...w^{\prime }_{\ell _0})=1\) and that \(\mathtt {Quant}^{(i)}(p)(x,w^{\prime })=0\) otherwise. Now, let \(p^{(0)}=\mathtt {Quant}^{(i)}(p)\), and for every \(j\in [2\ell _0]\) recursively define \(p^{(j)}=\mathtt {DegRed}^{(j)}(p^{(j-1)})\). By the “moreover” part of Fact 4.7.3.2, and since \((x,w^{\prime })\in H^{\ell _0}\times H^{\ell _0}\), for every \(j\in [2\ell _0]\), we have that \(p^{(j)}(x,w^{\prime })=p^{(j-1)}(x,w^{\prime })\), and hence \(p^{\prime }(x,w^{\prime })=\mathtt {Quant}^{(i)}(p)(x,w^{\prime })\).□

Defining the collection of polynomials: Let us now define the collection of \(\bar{\ell _0}=\ell _0(2\ell _0+1)+1\) polynomials. We first define \(P_{n,\ell _0(2\ell _0+1)+1}(x,w) = P^{\mathtt {TQBF^{loc}}}(x,w)\). Then, we recursively construct the collection in \(\ell _0\) blocks such that each block consists of \(2\ell _0+1\) polynomials. The base case will be block \(i=\ell _0\), and we will decrease i down to 1. Loosely speaking, in each block \(i\in [\ell _0]\), starting from the last polynomial in the previous block, we first apply a quantification operator to the ith variable of the second input w and then apply \(2\ell _0\) linearization operators, one for each variable in the inputs \((x,w)\). Specifically, for the ith block, we define the first polynomial by \(P_{n,i(2\ell _0+1)}(x,w)=\mathtt {Quant}^{(i)}(P_{n,i(2\ell _0+1)+1})(x,w)\); and for each \(j=1,\ldots ,2\ell _0\), we define \(P_{n,i(2\ell _0+1)-j}(x,w)=\mathtt {DegRed}^{(j)}(P_{n,i(2\ell _0+1)-j+1})(x,w)\).

Note that the claimed Property (3) of the collection follows immediately from our definition. To see that Property (4) also holds, note that the first part (regarding \(P_{n,\bar{\ell _0}}\)) holds by Claim 4.7.2; and for the “moreover” part, recall (by the properties of the operators \(\mathtt {Quant}^{(i)}\) and \(\mathtt {DegRed}^{(i)}\) that were mentioned above) that each polynomial \(P_{n,k}\) in the collection can be computed in linear space when given access to the previously constructed polynomial \(P_{n,k+1}\), and also that we can compute the “first” polynomial \(P_{n,\ell _0(2\ell _0+1)+1}\) in linear space (since this polynomial is just \(P^{\mathtt {TQBF^{loc}}}\) and relying on Claim 4.7.2). Using a suitable composition lemma for space-bounded computation (see, e.g., [19, Lemma 5.2]), we can compute any polynomial in the collection in space \(O(n\cdot \bar{\ell _0})\).

We now prove Property (1), which asserts that all the polynomials in the collection are of degree at most \(O(n\cdot \ell _0\cdot 2^{2n^{\prime }})\). We prove this by induction on the blocks, going from \(i=\ell _0\) down to \(i=1\), while maintaining the invariant that the “last” polynomial in the previous block \(i+1\) (i.e., the polynomial \(P_{n,i(2\ell _0+1)+1}\)) is of degree at most \(O(n\cdot 2^{n^{\prime }})\). For the base case \(i=\ell _0\) the invariant holds by our definition that \(P_{n,\ell _0(2\ell _0+1)+1}=P^{\mathtt {TQBF^{loc}}}\) and by Claim 4.7.2. Now, for every \(i=\ell _0,\ldots ,1\), note that the first polynomial \(P_{n,i(2\ell _0+1)}\) in the block is of degree at most \(2^{n^{\prime }}\cdot \deg (P_{n,i(2\ell _0+1)+1})=O(n\cdot 2^{2n^{\prime }})\) (i.e., the quantifications operator induces a degree blow-up of \(2^{n^{\prime }}\)), and in particular the individual degrees of all variables of \(P_{n,i(2\ell _0+1)}\) are upper-bounded by this expression. Then, in the subsequent \(2\ell _0\) polynomials in the block, we reduce the individual degrees of the variables (sequentially) until all individual degrees are at most \(|H|-1\lt 2^{n^{\prime }}\) (this relies on Fact 4.7.3.2). Thus, the degree of the last polynomial in the block (i.e., of \(P_{n,(i-1)(2\ell _0+1)+1}\)) is at most \(2\ell _0\cdot 2^{n^{\prime }}\lt n\cdot 2^{n^{\prime }}\), and the invariant is indeed maintained.

Finally, to see that Property (2) holds, fix any \((x,w)\in H^{\ell _0}\times H^{\ell _0}\). Our goal is to show that \(P_{n,1}(x,w)=1\) if \(x\in \mathtt {TQBF^{loc}}\) and \(P_{n,1}(x,w)=0\) otherwise (regardless of w). To do so, recall that \(P_{n,\bar{\ell _0}}=P^{\mathtt {TQBF^{loc}}}\), and hence for any \(w^{\prime }\in H^{\ell _0}\) it holds that \(P_{n,\bar{\ell _0}}(x,w^{\prime })=1\) if \((x,w^{\prime })\in \mathtt {R}\text{-}\mathtt {TQBF^{loc}}\) and \(P_{n,\bar{\ell _0}}(x,w^{\prime })=0\) otherwise. Note that the last polynomial in block \(i=\ell _0\) (i.e., the polynomial \(P_{n,\ell _0(2\ell _0+1)-2\ell _0}\)) is obtained by applying \(\mathtt {Quant}^{(\ell _0)}\) to \(P_{n,\bar{\ell _0}}\) and then applying \(\mathtt {DegRed}^{(j)}\) for each \(j=1,\ldots ,2\ell _0\). Using Claim 4.7.3.3, for any \(w^{\prime }\in H^{\ell _0}\), when this polynomial is given input \((x,w^{\prime })\), it outputs the value 1 if \(\exists \sigma _1\forall \sigma _2\exists \sigma _3...\forall \sigma _{n^{\prime }} : (x,w^{\prime }_1...w^{\prime }_{\ell _0-1}\pi (\sigma _1...\sigma _{n^{\prime }}0^{4n^{\prime }}))\in \mathtt {R}\text{-}\mathtt {TQBF^{loc}}\) and outputs 0 otherwise. By repeatedly using Claim 4.7.3.3 for the last polynomial in each block \(i=\ell _0-1,\ldots ,1\), we have that \(P_{n,1}(x,w)=1\) if \(\exists \sigma ^{(1)}_1\forall \sigma ^{(1)}_2...\forall \sigma ^{(1)}_{n^{\prime }}...\exists \sigma ^{(\ell _0)}_1...\forall \sigma ^{(\ell _0)}_{n^{\prime }}:(x,w^{\prime })\in \mathtt {R}\text{-}\mathtt {TQBF^{loc}}\), where \(w^{\prime }=(\pi (\sigma ^{(1)}_1...\sigma ^{(1)}_{n^{\prime }}0^{4n^{\prime }}),\ldots ,\pi (\sigma ^{(\ell _0)}_1...\sigma ^{(\ell _0)}_{n^{\prime }}0^{4n^{\prime }}))\); and \(P_{n,1}(x,w)=0\) otherwise. In other words, we have that \(P_{n,1}(x,w)=1\) if \(x\in \mathtt {TQBF^{loc}}\) and \(P_{n,1}(x,w)=0\) otherwise, as we wanted.□

Combining the polynomials into a Boolean function. Intuitively, the polynomials in our collection are already downward self-reducible (where “downward” here means that \(P_{n,i}\) is reducible to \(P_{n,i+1}\)) and sample-aided worst-case to average-case reducible (since the polynomials have low degree and relying on Proposition B.1). Our goal now is simply to “combine” these polynomials into a single Boolean function \(f^{\mathtt {ws}}:\lbrace 0,1\rbrace ^*\rightarrow \lbrace 0,1\rbrace ^*\) that will be \(\delta\)-well-structured.

For every \(n\in \mathbb {N}\), we define a corresponding interval of input lengths \(I_n=[ N,N+\bar{\ell _0}-1 ]\), where \(N=10n^{\prime }\cdot \ell _0 + 11n\cdot \bar{\ell _0}=O(n\cdot \bar{\ell _0})\). Then, for every \(i\in \lbrace 0,\ldots ,\bar{\ell _0}-1\rbrace\), we define \(f^{\mathtt {ws}}\) on input length \(N+i\) such that it computes (a Boolean version of) \(P_{n,\bar{\ell _0}-i}\). Specifically, \(f^{\mathtt {ws}}:\lbrace 0,1\rbrace ^{N+i}\rightarrow \lbrace 0,1\rbrace ^{N+i}\) considers only the first \(10n^{\prime }\cdot \ell _0=2\ell _0\cdot \log (|\mathbb {F}|)=O(n)\) bits of its input, maps these bits to \((x,w)\in \mathbb {F}^{2\ell _0}\) using \(\pi\), computes \(P_{n,\bar{\ell _0}-i}(x,w)\), and outputs the bit-representation of \(P_{n,\bar{\ell _0}-i}(x,w)\) (using \(\pi ^{-1}\)), padded to the appropriate length \(N+i\). On input lengths that do not belong to any interval \(I_n\) for \(n\in \mathbb {N}\), we define \(f^{\mathtt {ws}}\) in some fixed trivial way (e.g., as the identity function).

A straightforward calculation shows that the intervals \(\lbrace I_n\rbrace _{n\in \mathbb {N}}\) are disjoint, and thus \(f^{\mathtt {ws}}\) is well-defined. In addition, since the input length to \(f^{\mathtt {ws}}\) is \(N=O(n\cdot \bar{\ell _0})\) and each polynomial in the collection is computable in space \(O(n\cdot \bar{\ell _0})\), it follows that \(f^{\mathtt {ws}}\) is computable in linear space. To see that \(\mathtt {TQBF}\) reduces to \(f^{\mathtt {ws}}\), recall that, by Claim 4.7.1, we can reduce \(\mathtt {TQBF}\) to \(\mathtt {TQBF^{loc}}\) in time \(n\cdot (\log n)^r\) (for some universal constant \(r\in \mathbb {N}\)); and note that we can then further reduce \(\mathtt {TQBF^{loc}}\) to \(f^{\mathtt {ws}}\) by mapping any \(x\in \lbrace 0,1\rbrace ^n\) to an \((N+\bar{\ell _0}-1)\)-bit input of the form \((x,w,p)\), where w is an arbitrary string and p is padding. (This is since \(f^{\mathtt {ws}}\) on inputs of length \(N+\bar{\ell _0}-1\) essentially computes \(P_{n,1}\).) This reduction is computable in deterministic time \(n\cdot \log (n)^{r+2c+1}\).

We now want to show that \(f^{\mathtt {ws}}\) is downward self-reducible in time \(\mathrm{poly}(1/\delta)\) and in \(O((\log N)^{2c})\) steps, where \(\delta (N)=2^{-N/(\log N)^{3c}}\) and N denotes the input length. To see this, first note that given input length \(N\in \mathbb {N}\), we can find in polynomial time an input length n such that \(N\in I_{n}\), if such n exists. If such n does not exist, then the function is defined trivially on input length N and can be computed in polynomial time. Otherwise, let \(N_0\le N\) be the smallest input length in \(I_{n}\) (i.e., \(N_0=10\lceil n/\ell _0(n)\rceil \cdot \ell _0(n) + 11n\cdot \bar{\ell _0}(n)\)), and denote \(N=N_0+i\), for some \(i\in \lbrace 0,\ldots ,\bar{\ell _0}(n)-1\rbrace\). Note that \(f^{\mathtt {ws}}_N\) corresponds to the polynomial \(P_{n,\bar{\ell _0}(n)-i}\), and \(f^{\mathtt {ws}}_{N-1}\) corresponds to the polynomial \(P_{n,\bar{\ell _0}(n)-(i-1)}\). By Claim 4.7.3, the former can be computed in time \(\mathrm{poly}(2^{n^{\prime }})=\mathrm{poly}(2^{n/(\log n)^c})=\mathrm{poly}(2^{N/(\log N)^{3c}})\) with oracle access to the latter. Last, recall that \(|I_{n}|=\bar{\ell _0}(n)\lt O(\log N)^{2c}\) and that \(f^{\mathtt {ws}}_{N_0}\) corresponds to \(P_{n,\bar{\ell _0}(n)}\), which can be computed in time \(\mathrm{poly}(2^{n^{\prime }})\); hence, there exists an input length \(N_0\ge N-O((\log N)^{2c})\) such that \(f^{\mathtt {ws}}_{N_0}\) can be computed in time \(\mathrm{poly}(2^{n^{\prime }})\lt \mathrm{poly}(1/\delta (N_0))\).
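For concreteness, the following Python sketch performs the input-length bookkeeping just described, i.e., locating the interval \(I_n\) that contains a given length N. The parameter function ell0 below is a placeholder of our own (the actual \(\ell _0(n)\) is fixed earlier in the paper), so only the interval arithmetic mirrors the text:

import math

def ell0(n):                     # placeholder for the paper's ell_0(n)
    return max(2, int(math.log2(n))) if n > 1 else 2

def ell0_bar(n):                 # \bar{ell_0}(n) = ell_0 * (2 * ell_0 + 1) + 1
    e = ell0(n)
    return e * (2 * e + 1) + 1

def interval_start(n):           # N = 10 * n' * ell_0 + 11 * n * \bar{ell_0}
    n_prime = -(-n // ell0(n))   # ceil(n / ell_0(n))
    return 10 * n_prime * ell0(n) + 11 * n * ell0_bar(n)

def locate(N):
    # Return (n, i) with N = interval_start(n) + i and 0 <= i < ell0_bar(n),
    # or None if N lies in no interval (where f^ws is defined trivially).
    n = 1
    while interval_start(n) <= N:
        if N < interval_start(n) + ell0_bar(n):
            return n, N - interval_start(n)
        n += 1
    return None

print(locate(interval_start(16) + 3))   # -> (16, 3), i.e., P_{16, ell0_bar(16) - 3}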

To see that \(f^{\mathtt {ws}}\) is sample-aided worst-case to \(\delta\)-average-case reducible, first note that computing \(f^{\mathtt {ws}}\) on any input length N on which it is not trivially defined is equivalent (up to a polynomial factor in the runtime) to computing a polynomial \(\mathbb {F}^{2\ell _0(n)}\rightarrow \mathbb {F}\) of degree \(d=O(\mathrm{poly}(n)\cdot 2^{2n^{\prime }})\) in a field of size \(q=|\mathbb {F}|=2^{5n^{\prime }}\), where \(n\lt N/(\log N)^{2c}\) and \(n^{\prime }=\lceil n/\ell _0(n)\rceil\). We use Proposition B.1 with parameter \(\rho (\log (|\mathbb {F}^{2\ell _0(n)}|))=\delta _0(n)\lt \delta (N)\) and note that its hypothesis \(\delta _0(n)\ge 10\cdot \sqrt {d/|\mathbb {F}|}\) is satisfied, since we chose \(|\mathbb {F}|=\mathrm{poly}(1/\delta _0(n))\) to be sufficiently large.

4.2 PRGs for Uniform Circuits with Almost-exponential Stretch

Let \(\delta (n)=2^{-n/\mathrm{polylog}(n)}\). The following proposition asserts that if there exists a function that is both \(\delta\)-well-structured and “hard” for probabilistic algorithms that run in time \(2^{n/\mathrm{polylog}(n)}\), then there exists an i.o.-PRG for uniform circuits with almost-exponential stretch. That is:

Proposition 4.8

(Almost-exponential Hardness of a Well-structured Function → PRG for Uniform Circuits with Almost-exponential Stretch)

Assume that for some constant \(c\in \mathbb {N}\) and for \(\delta (n)=2^{-n/\log (n)^{c}}\) there exists a \(\delta\)-well-structured function that can be computed in linear space but cannot be computed by probabilistic algorithms that run in time \(2^{O(n/\log (n)^c)}\). Then, for every \(k\in \mathbb {N}\) and for \(t(n)=n^{\mathrm{loglog}(n)^k}\) there exists a \((1/t)\)-i.o.-PRG for \((t,\log (t))\)-uniform circuits that has seed length \(\tilde{O}(\log (n))\) and is computable in time \(n^{\mathrm{polyloglog}(n)}\).

Proposition 4.8 follows as an immediate corollary of the following lemma. Loosely speaking, the lemma asserts that for any \(\delta\)-well-structured function \(f^{\mathtt {ws}}\), there exists a corresponding PRG with almost-exponential stretch such that a uniform algorithm that distinguishes the output of the PRG from uniform yields a uniform probabilistic algorithm that computes \(f^{\mathtt {ws}}\). Moreover, the lemma provides a “point-wise” statement: For any \(n\in \mathbb {N}\), a distinguisher on a small number (i.e., \(\mathrm{polyloglog}(n)\)) of input lengths in a small interval around n yields a uniform algorithm for \(f^{\mathtt {ws}}\) on input length \(\tilde{O}(\log (n))\). We will later use this “point-wise” property of the lemma to extend Proposition 4.8 to “almost everywhere” versions (see Propositions 4.11 and 4.12).

In the following statement, we consider three algorithms: The pseudorandom generator G; a potential distinguisher for the PRG, denoted A; and an algorithm F for the “hard” function \(f^{\mathtt {ws}}\). Loosely speaking, the lemma asserts that for any \(n\in \mathbb {N}\), if G is not pseudorandom for A on every input length in a small set of input lengths surrounding n, then F computes \(f^{\mathtt {ws}}\) on input length \(\ell (n)=\tilde{O}(\log (n))\). We will first fix a constant c that determines the target running time of F (i.e., running time \(t_F(\ell)=2^{\ell /\log (\ell)^c}\)), and the other parameters (e.g., the parameters of the well-structured function and the seed length of the PRG) will depend on c. Specifically:

Lemma 4.9

(Distinguishing a PRG based on \(f^{\mathtt {ws}}\) → Computing \(f^{\mathtt {ws}}\))

Let \(c\in \mathbb {N}\) be an arbitrary constant, let \(\delta (n)=2^{-n/\log (n)^{c}}\), and let \(s:\mathbb {N}\rightarrow \mathbb {N}\) be a polynomial-time computable function such that \(s(n)\le n/2\) for all \(n\in \mathbb {N}\). Let \(f^{\mathtt {ws}}:\lbrace 0,1\rbrace ^*\rightarrow \lbrace 0,1\rbrace ^*\) be a \((\delta ,s)\)-well-structured function that is computable in linear space, let \(t(n)=n^{\mathrm{loglog}(n)^k}\) for some constant \(k\in \mathbb {N}\), and let \(\ell (n)=\lceil \log (n)\cdot (\mathrm{loglog}n)^b\rceil\) for a sufficiently large constant \(b\in \mathbb {N}\). Then, there exist two objects that satisfy the property detailed below:

(1)

(Pseudorandom generator). An algorithm \(G_0\) that gets as input \(1^{n}\) and a random seed of length \(\ell _{G}(n)=\tilde{O}(\ell (n))\), runs in time \(n^{\mathrm{polyloglog}(n)}\), and outputs a string of length n.

(2)

(Mapping of any input length to a small set of surrounding input lengths). A polynomial-time computable mapping of any unary string \(1^n\) to a set \(S_n\subset [n,n^2]\) of size \(|S_n|=s(\tilde{O}(\log (n)))\), where the constant hidden inside the \(\tilde{O}\) notation is sufficiently large and depends on k.

The property that the foregoing objects satisfy is the following: For every probabilistic time-t algorithm A that uses \(\log (t)\) bits of non-uniform advice there exists a corresponding probabilistic algorithm F that runs in time \(t_F(\ell)=2^{O(\ell /\log (\ell)^c)}\) such that for any \(n\in \mathbb {N}\), we have that: If for every \(m\in S_n\) it holds that \(G_0(1^m,{\bf u}_{\ell _{G}(m)})\) is not \((1/t(m))\)-pseudorandom for A, then F computes \(f^{\mathtt {ws}}\) on strings of length \(\ell (n)\).

Moreover, for any function \(\mathtt {str}:\mathbb {N}\rightarrow \mathbb {N}\) such that \(\mathtt {str}(n)\le n\), the above property holds if we replace \(G_0\) by the algorithm G that computes \(G_0\) and truncates the output to length \(\mathtt {str}(n)\) (i.e., \(G(1^n,z)=G_0(1^n,z)_{1},\ldots ,G_0(1^n,z)_{\mathtt {str}(n)}\)).

Observe that Proposition 4.8 indeed follows as a contrapositive of Lemma 4.9 (with \(\mathtt {str}\) being the identity function, which means that \(G=G_0\)): If every probabilistic algorithm F that gets an \(\ell\)-bit input and runs in time \(2^{O(\ell /\log (\ell)^c)}\) fails to compute \(f^{\mathtt {ws}}\) infinitely-often, then for every corresponding time-t algorithm A there exists an infinite set of inputs on which G is pseudorandom for A.

Proof of Lemma 4.9

We prove the “moreover” part, which implies the foregoing statement by taking \(\mathtt {str}(n)=n\).

Construction: The generator \(G_0\). For any c, s, \(\delta\), k, t, and \(f^{\mathtt {ws}}\) that satisfy our hypothesis, let \(f^{\mathtt {GL(ws)}}:\lbrace 0,1\rbrace ^*\rightarrow \lbrace 0,1\rbrace\) be defined as follows: For any \((x,r)\in \lbrace 0,1\rbrace ^{n}\times \lbrace 0,1\rbrace ^{n}\), we let \(f^{\mathtt {GL(ws)}}(x,r)=\sum _{i\in [n]}f^{\mathtt {ws}}(x)_i\cdot r_i\), where the arithmetic is over \(\mathbb {F}_2\). (We use the notation \(f^{\mathtt {GL(ws)}}\), since we will use the algorithm of Goldreich and Levin [21] to transform a circuit that agrees with \(f^{\mathtt {GL(ws)}}\) on \(1/2+\epsilon\) of the inputs into a circuit that computes \(f^{\mathtt {ws}}\) on \(\mathrm{poly}(\epsilon)\) of the inputs.) We will need the following standard definition:
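In code, \(f^{\mathtt {GL(ws)}}\) is just the inner product over \(\mathbb {F}_2\) of \(f^{\mathtt {ws}}(x)\) with r; in the following Python sketch the stand-in f_ws is an arbitrary length-preserving map, chosen only so that the snippet runs:

def f_ws(x):            # placeholder for the well-structured function
    return x[::-1]      # any map from n bits to n bits serves the demo

def f_gl(x, r):
    # f^{GL(ws)}(x, r) = <f^ws(x), r> mod 2 (the Goldreich-Levin predicate).
    y = f_ws(x)
    return sum(yi & ri for yi, ri in zip(y, r)) & 1

# <f_ws(110), 101> = <011, 101> = 0 + 0 + 1 = 1 (mod 2)
assert f_gl([1, 1, 0], [1, 0, 1]) == 1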

Definition 4.9.1

(Combinatorial Designs).

An \((\ell ,a)\)-combinatorial design is a collection of sets \(S_1,\ldots ,S_n\subseteq [d]\) such that for every \(i\in [n]\) it holds that \(|S_i|=\ell\), and for every distinct \(i,j\in [n]\) it holds that \(|S_i\cap S_j|\le a\). We call n the number of sets, d the universe size, and a the pairwise-intersection size.□

Consider a combinatorial design that has n sets of size \(\ell (n)=\lceil \log (n)\cdot (\mathrm{loglog}n)^b\rceil\) (where b is a sufficiently large constant that depends on k) with pairwise-intersection size \(\gamma \cdot \log (n)\), where \(\gamma \gt 0\) is a sufficiently small constant, in a universe of size \(\ell _{G}(n)=\tilde{O}(\ell (n))=\tilde{O}(\log (n))\) (see, e.g., [58, Problem 3.2] for a polynomial-time construction of such a design).

The algorithm \(G_0\) is the Nisan-Wigderson generator, instantiated with \(f^{\mathtt {GL(ws)}}\) as the hard function and with the foregoing design. Since \(f^{\mathtt {ws}}\) is computable in linear space, the function \(f^{\mathtt {GL(ws)}}(x,r)\) is computable in time \(n^{\mathrm{polyloglog}(n)}\), and hence \(G_0\) is computable in time \(n^{\mathrm{polyloglog}(n)}\) and has seed length \(\ell _G(n)\).
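The following toy Python sketch shows the mechanics of this instantiation. The Reed-Solomon-style design below (sets are graphs of low-degree polynomials over GF(q)) has universe size \(q^2\) rather than the \(\tilde{O}(\ell)\) needed here, for which one uses [58, Problem 3.2]; parity stands in for \(f^{\mathtt {GL(ws)}}\), so no pseudorandomness is claimed:

import random

def rs_design(num_sets, q, d):
    # num_sets <= q^(d+1) sets of size q in a universe of size q^2; two
    # distinct sets (graphs of distinct polynomials of degree <= d)
    # intersect in at most d points.  q must be prime.
    sets = []
    for idx in range(num_sets):
        coeffs, t = [], idx
        for _ in range(d + 1):
            coeffs.append(t % q)
            t //= q
        sets.append([a * q + sum(c * pow(a, e, q) for e, c in enumerate(coeffs)) % q
                     for a in range(q)])
    return sets

def nw(f_hard, seed, sets):
    # Nisan-Wigderson: the i-th output bit applies the hard function to
    # the seed positions indexed by the i-th design set.
    return [f_hard([seed[j] for j in S]) for S in sets]

q, d = 5, 1
sets = rs_design(8, q, d)
parity = lambda bits: sum(bits) & 1                  # toy "hard" function
seed = [random.randrange(2) for _ in range(q * q)]
print(nw(parity, seed, sets))                        # 8 output bits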

Analysis: Transforming a distinguisher A into an algorithm F for \(f^{\mathtt {ws}}\). Let us first fix some parameters that will be useful below. Denote \(\ell ^{\prime }(n)=\ell (n)/\log (\ell (n))^{c+1}\), and fix a sufficiently small universal constant \(\epsilon \gt 0\). We assume that \(\ell (n)\) is sufficiently large such that \(t(n)=n^{\mathrm{loglog}(n)^k}\le 2^{\epsilon \cdot \ell ^{\prime }(n)}\). Recall that, since \(f^{\mathtt {ws}}\) is downward self-reducible in s steps, there exists an input length \(\ell _0(n)\ge \ell (n)-s(\ell (n))\) such that \(f^{\mathtt {ws}}_{\ell _0(n)}\) is computable in time \(\mathrm{poly}(1/\delta (\ell _0(n)))\). For \(L_n=\lbrace \ell _0(n),\ldots ,\ell (n)\rbrace\), we define \(S_n=\lbrace \ell ^{-1}(2i):i\in L_n\rbrace\); see Figure 1 for an illustration. Note that indeed \(|S_n|\le s(\ell (n))=s(\tilde{O}(\log (n)))\); and relying on the fact that \(s(\ell (n))\le \ell (n)/2\), we have that \(S_n\subset [n_0,n_1]\) where \(n_0=\ell ^{-1}(2\ell _0)\ge \ell ^{-1}(\ell (n))=n\) and \(n_1= \ell ^{-1}(2\ell (n))\lt n^2\).

Fig. 1. We want to compute \(f^{\mathtt {ws}}\) on inputs of length \(\ell (n)\). We define a corresponding interval \(L_n=\lbrace \ell _0(n),\ldots ,\ell (n)\rbrace\) of input lengths, where \(\ell _0(n)\ge \ell (n)-s(\ell (n))\), in which we will use the downward self-reducibility of \(f^{\mathtt {ws}}\). We assume that there is a uniform distinguisher A for the PRG on all input lengths in \(S_n=\ell ^{-1}(2L_n)\), in which case there exists a weak learner \(F^{\mathtt {lrn}}\) for \(f^{\mathtt {GL(ws)}}\) on all input lengths in \(2L_n\).

Let A be a probabilistic algorithm that gets input \(1^n\) and \(\log (t(n))\) bits of non-uniform advice and runs in time \(t(n)\), fix a corresponding advice sequence, and fix a function \(\mathtt {str}(n)\le n\). Recall that we denote \(G(1^n,s)=G_0(1^n,s)_{1,\ldots ,\mathtt {str}(n)}\).

We call \(n\in \mathbb {N}\) distinguishable if for every \(m\in S_n\), when A is given input \(1^{\mathtt {str}(m)}\) and the advice bits, with probability at least \(1/t(m)\) it outputs a circuit \(D_{\mathtt {str}(m)}:\lbrace 0,1\rbrace ^{\mathtt {str}(m)}\rightarrow \lbrace 0,1\rbrace\) that \((1/t(m))\)-distinguishes \(G(1^m,{\bf u}_{\ell _G(m)})\) from uniform. We will construct a probabilistic algorithm \(F^{\mathtt {ckt}}\) that gets input \(1^{\ell (n)}\), runs in time \(2^{O(\ell (n)/\log (\ell (n))^{c})}\), and if n is distinguishable, then with high probability \(F^{\mathtt {ckt}}\) outputs a circuit \(\lbrace 0,1\rbrace ^{\ell (n)}\rightarrow \lbrace 0,1\rbrace\) that correctly computes \(f^{\mathtt {ws}}\) on \(\ell (n)\)-bit inputs. (It follows that a probabilistic algorithm F can decide \(f^{\mathtt {ws}}\) on \(\lbrace 0,1\rbrace ^{\ell (n)}\) in time at most \(2^{O(\ell (n)/\log (\ell (n))^c)}\) by running \(F^{\mathtt {ckt}}\) and evaluating the circuit at the given input.)

Construction and analysis of \(F^{\mathtt {ckt}}\). Given as input \(1^{\ell (n)}\), the algorithm \(F^{\mathtt {ckt}}\) iteratively constructs circuits for \(f^{\mathtt {ws}}_i\) for increasing values of \(i\in L_n=\lbrace \ell _0(n),\ldots ,\ell (n)\rbrace\). The construction for the base case \(i=\ell _0(n)\) relies on the fact that \(f^{\mathtt {ws}}_{\ell _0(n)}\) is computable in time \(\mathrm{poly}(1/\delta (\ell _0(n)))\) (i.e., the circuit for \(f^{\mathtt {ws}}_{\ell _0(n)}\) simply implements this algorithm). For subsequent iterations, the algorithm \(F^{\mathtt {ckt}}\) will rely on the following procedure:

Claim 4.9.2.

There exists an algorithm \(F^{\mathtt {step}}\) that gets as input \(i\in L_n\setminus \lbrace \ell _0(n)\rbrace\) and a circuit \(C_{i-1}:\lbrace 0,1\rbrace ^{i-1}\rightarrow \lbrace 0,1\rbrace ^{i-1}\) that computes \(f^{\mathtt {ws}}_{i-1}\), runs in time \(2^{O(i/\log (i)^{c})}\cdot \mathrm{poly}(|C_{i-1}|)\), and if n is distinguishable, then with probability at least \(1-\exp (-i/\log (i)^{c+1})\) the algorithm \(F^{\mathtt {step}}\) outputs a circuit \(C_i:\lbrace 0,1\rbrace ^i\rightarrow \lbrace 0,1\rbrace ^i\) of size \(2^{O(i/\log (i)^{c})}\) that computes \(f^{\mathtt {ws}}_i\).

Before proving Claim 4.9.2, let us see how it suffices for the construction of \(F^{\mathtt {ckt}}\). The algorithm \(F^{\mathtt {ckt}}\) uses \(F^{\mathtt {step}}\) with inputs \(i=\ell _0(n)+1,\ldots ,\ell (n)\), and thus it runs in time \(2^{O(\ell (n)/\log (\ell (n))^{c})}\). (Note that the size of the output circuit \(C_i\) in Claim 4.9.2 does not depend on the size of the input circuit \(C_{i-1}\).) The probability that it outputs a circuit that correctly computes \(f^{\mathtt {ws}}_{\ell (n)}\) is at least \(1-\sum _{i=\ell _0(n)+1}^{\ell (n)}\exp (-i/\log (i)^{c+1})\ge 2/3\), assuming that \(\ell\) is sufficiently large. Thus, it remains to prove Claim 4.9.2.

Preliminary step: Constructing a weak learner. Towards constructing \(F^{\mathtt {step}}\) and proving Claim 4.9.2, our first step is to construct an efficient algorithm \(F^{\mathtt {lrn}}\) that gets input \(1^{\ell (m)}\) and oracle access to \(f^{\mathtt {GL(ws)}}\) on \(\ell (m)\)-bit inputs, uses a small amount of non-uniform advice, and if \(m\in S_n\) for a distinguishable n, then the algorithm prints a circuit that computes \(f^{\mathtt {GL(ws)}}\) on noticeably more than half of the \(\ell (m)\)-bit inputs. The construction and proof follow the standard efficient uniform reconstruction argument for the Nisan-Wigderson PRG, from [33] (following [44]).

Claim 4.9.3.

There exists a probabilistic algorithm \(F^{\mathtt {lrn}}\) that gets input \(1^{\ell (m)}\), and oracle access to \(f^{\mathtt {GL(ws)}}\) on \(\ell (m)\)-bit inputs, and \(3\epsilon \cdot \ell ^{\prime }(m)\) bits of non-uniform advice, runs in time \(2^{\ell ^{\prime }(m)}\), and satisfies the following: If \(m\in S_n\) for a distinguishable n, then with probability more than \(2^{-\ell ^{\prime }(m)}\) the algorithm outputs a circuit \(\lbrace 0,1\rbrace ^{\ell (m)}\rightarrow \lbrace 0,1\rbrace\) that computes \(f^{\mathtt {GL(ws)}}\) correctly on more than \(1/2+2^{-\ell ^{\prime }(m)}\) of the inputs.

Proof.

Let \(\ell =\ell (m)\), let \(\ell ^{\prime }=\ell ^{\prime }(m)\), and let \(m^{\prime }=\mathtt {str}(m)\le m\). Let us first assume that \(m^{\prime }=m\) (i.e., \(\mathtt {str}\) is the identity function and \(G_0=G\)). In this case, a standard argument (based on [44] and first noted in [33]) shows that there exists a probabilistic polynomial time algorithm \(Rec^{\mathsf {NW}}\) that satisfies the following: When given as input a circuit \(D_{m}:\lbrace 0,1\rbrace ^{m}\rightarrow \lbrace 0,1\rbrace\) that \((1/m^{\mathrm{loglog}(m)^k})\)-distinguishes \(G(1^m,{\bf u}_{\ell _G(m)})\) from uniform, and also given oracle access to \(f^{\mathtt {GL(ws)}}\) on \(\ell\)-bit inputs, with probability at least \(1/O(m)\) the algorithm \(Rec^{\mathsf {NW}}\) outputs a circuit \(C_{\ell }:\lbrace 0,1\rbrace ^{\ell }\rightarrow \lbrace 0,1\rbrace\) such that \(\Pr _{x\in \lbrace 0,1\rbrace ^{\ell }}[C_{\ell }(x)=f^{\mathtt {GL(ws)}}(x)]\ge 1/2+1/O(m^{\mathrm{loglog}(m)^k})\).

Towards extending this claim to the setting of an arbitrary \(m^{\prime }=\mathtt {str}(m)\le m\), let us quickly recap the original construction of \(Rec^{\mathsf {NW}}\): The algorithm randomly chooses an index \(i\in [m]\) (for a hybrid argument) and values for all the bits in the seed of the NW generator outside the ith set (in the underlying design); then uses its oracle to query \(\mathrm{poly}(m)\) values for \(f^{\mathtt {GL(ws)}}\) (these are potential values for the output indices whose sets in the design intersect with the ith set) and “hard-wires” them into a circuit \(C_{\ell }\) that gets input \(x\in \lbrace 0,1\rbrace ^{\ell }\), simulates the corresponding m-bit output of the PRG, and uses the distinguisher to predict the value \(f^{\mathtt {GL(ws)}}(x)\). Now, note that if the output of the PRG is truncated to length \(m^{\prime }\lt m\), then the construction above works essentially the same if we choose an initial index \(i\in [m^{\prime }]\) instead of \(i\in [m]\) and if \(C_{\ell }\) completes x to an \(m^{\prime }\)-bit output of the PRG instead of an m-bit output. Indeed, referring to the underlying analysis, these changes only improve the guarantee on the algorithm’s probability of success (we do not use the fact that the guarantee is better).

Thus, there is an algorithm \(Rec^{\mathsf {NW}}\) that gets as input \(1^m\) and a circuit \(D_{m^{\prime }}:\lbrace 0,1\rbrace ^{m^{\prime }}\rightarrow \lbrace 0,1\rbrace\) that \((1/m^{\mathrm{loglog}(m)^k})\)-distinguishes \(G(1^{m},{\bf u}_{\ell _G(m)})\) from uniform, and oracle access to \(f^{\mathtt {GL(ws)}}_{\ell }\), and with probability at least \(1/O(m)\) outputs a circuit \(C_{\ell }:\lbrace 0,1\rbrace ^{\ell }\rightarrow \lbrace 0,1\rbrace\) such that \(\Pr _{x\in \lbrace 0,1\rbrace ^{\ell }}[C_{\ell }(x)=f^{\mathtt {GL(ws)}}(x)]\ge 1/2+1/O(m^{\mathrm{loglog}(m)^k})\).

Now, let n be distinguishable, let \(m\in S_n\), let \(\ell =\ell (m)\), and let \(m^{\prime }=\mathtt {str}(m)\). Our probabilistic algorithm \(F^{\mathtt {lrn}}\) is given as input \(1^{\ell }\) and non-uniform advice \((a,m^{\prime },m)\) such that \(|a|=\log (t(m))=\log (m)\cdot \mathrm{loglog}(m)^k\le \epsilon \cdot \ell ^{\prime }\); note that, since \(m^{\prime }\le m\), the total length of the advice is at most \(\epsilon \cdot \ell ^{\prime }+2\log (m)\lt 2\epsilon \cdot \ell ^{\prime }\). The algorithm \(F^{\mathtt {lrn}}\) simulates the algorithm A on input \(1^{m^{\prime }}\) with the advice a, feeds the output of A as input for \(Rec^{\mathsf {NW}}\) along with \(1^m\), and outputs the circuit given by \(Rec^{\mathsf {NW}}\).

Our algorithm \(F^{\mathtt {lrn}}\) runs in time \(m^{O(\mathrm{loglog}(m)^k)}\le 2^{\ell ^{\prime }}\). With probability more than \((1/m^{\mathrm{loglog}(m)^k})\), the algorithm A outputs \(D_{m^{\prime }}:\lbrace 0,1\rbrace ^{m^{\prime }}\rightarrow \lbrace 0,1\rbrace\) that \((1/m^{\mathrm{loglog}(m)^k})\)-distinguishes \(G(1^m,{\bf u}_{\ell _G(m)})\) from uniform, and conditioned on this event, with probability at least \(1/O(m)\) the algorithm \(F^{\mathtt {lrn}}\) outputs \(C_{\ell }:\lbrace 0,1\rbrace ^{\ell }\rightarrow \lbrace 0,1\rbrace\) that correctly computes \(f^{\mathtt {GL(ws)}}\) on \(1/2+1/O(m^{\mathrm{loglog}(m)^k})\gt 1/2+2^{-\ell ^{\prime }}\) of the \(\ell\)-bit inputs.□

Claim 4.9.3 implies that for any distinguishable n, when \(F^{\mathtt {lrn}}\) gets input \(1^r\) where \(r\in 2L_n=\lbrace 2i:i\in L_n\rbrace\), it succeeds (with probability \(\ge 2^{-\ell ^{\prime }(n)}\)) in printing a circuit that approximates \(f^{\mathtt {GL(ws)}}\) on r-bit inputs. (This is because, by the definition of \(S_n\), any such input length is of the form \(\ell (m)\) for \(m\in S_n\).) See Figure 1 for a pictorial description of the sets \(L_n\), \(2L_n\), and \(S_n\), and for a reminder about our assumptions at this point.

Proof of Claim 4.9.2. Let \(i^{\prime }=2i/\log (2i)^{c+1}\), and let \(S=|C_{i-1}|\). First note that the algorithm can compute \(f^{\mathtt {ws}}_i\) in time \(\mathrm{poly}(1/\delta (i),S)\) (using the downward self-reducibility of \(f^{\mathtt {ws}}\) and the circuit \(C_{i-1}\)) and also compute \(f^{\mathtt {GL(ws)}}_{2i}\) in time \(\mathrm{poly}(1/\delta (i),S)\) (using the fact that \(f^{\mathtt {GL(ws)}}(x,r)=\sum _{j\in [i]}f^{\mathtt {ws}}_i(x)_j\cdot r_j\)). We will construct \(C_i\) in a sequence of steps:

(1) Simulating the learner for \(f^{\mathtt {GL(ws)}}_{2i}\). We enumerate over all \(2^{3\epsilon \cdot i^{\prime }}\) possible advice strings for \(F^{\mathtt {lrn}}\). For each fixed advice string \(a\in \lbrace 0,1\rbrace ^{3\epsilon \cdot i^{\prime }}\), we simulate \(F^{\mathtt {lrn}}\) on input \(1^{2i}\) with advice a for \(2^{O(i^{\prime })}\) times (using independent randomness in each simulation) while answering its queries to \(f^{\mathtt {GL(ws)}}_{2i}\) using \(C_{i-1}\).

Analysis: When a is the “good” advice, each simulation of \(F^{\mathtt {lrn}}\) is successful with probability at least \(2^{-i^{\prime }}\). Thus, with probability at least \(1-\exp (-i^{\prime })\), we obtained a list of \(2^{O(i^{\prime })}\) circuits, at least one of which correctly computes \(f^{\mathtt {GL(ws)}}_{2i}\) on at least \(1/2+2^{-i^{\prime }}\) of its inputs.

(2) Weeding the list to find a circuit for \(f^{\mathtt {GL(ws)}}_{2i}\). We enumerate over the list of \(2^{O(i^{\prime })}\) circuits. For each circuit, we randomly sample \(2^{O(i^{\prime })}\) inputs, compute \(f^{\mathtt {GL(ws)}}_{2i}\) at each of these inputs using \(C_{i-1}\), and compare the value of \(f^{\mathtt {GL(ws)}}_{2i}\) to the output of the candidate circuit. If the circuit agrees with \(f^{\mathtt {GL(ws)}}_{2i}\) on at least \(1/2+2^{-i^{\prime }}-2^{-2i^{\prime }}\) of the inputs in the sample, then we denote this circuit by \(C^{(1)}_i\) and move on to Step 3; otherwise, we continue to the next circuit in the list. If we enumerated over the entire list and did not find a suitable circuit \(C^{(1)}_i\), then we abort.

Analysis: For each circuit, with probability at least \(1-2^{-O(i^{\prime })}\) over the sampled inputs, we correctly estimate its agreement with \(f^{\mathtt {GL(ws)}}_{2i}\) up to error \(2^{-2i^{\prime }-1}\). Union-bounding over the \(2^{O(i^{\prime })}\) circuits, with probability at least \(1-2^{-O(i^{\prime })}\), in this step, we obtained a circuit \(C^{(1)}_i\) that has agreement at least \(1/2+2^{-2i^{\prime }}\) with \(f^{\mathtt {GL(ws)}}_{2i}\).
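A minimal Python sketch of this weeding-by-sampling step (here candidates and the target are plain functions; in the proof the target values are obtained from \(C_{i-1}\) via downward self-reducibility, and the threshold is \(1/2+2^{-i^{\prime }}-2^{-2i^{\prime }}\)):

import random

def estimate_agreement(candidate, target, n, samples=20_000):
    hits = 0
    for _ in range(samples):
        x = tuple(random.randrange(2) for _ in range(n))
        hits += (candidate(x) == target(x))
    return hits / samples

def weed(candidates, target, n, threshold):
    # Return the index of the first candidate whose empirical agreement
    # clears the threshold, or None (abort, as in the proof).
    for k, c in enumerate(candidates):
        if estimate_agreement(c, target, n) >= threshold:
            return k
    return None

# Example: the constant-0 function is a good predictor of AND on 4 bits.
and_f = lambda x: int(all(x))
print(weed([lambda x: 1, lambda x: 0], and_f, 4, threshold=0.9))  # -> 1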

(3) Conversion to a probabilistic circuit that computes \(f^{\mathtt {ws}}_i\) with success \(\mathrm{poly}(\delta _0)\). We use the algorithm of Goldreich and Levin [21] to convert the deterministic circuit \(C^{(1)}_i\) into a probabilistic circuit \(C^{(2)}_i:\lbrace 0,1\rbrace ^i\rightarrow \lbrace 0,1\rbrace ^i\) of size \(2^{O(i^{\prime })}\) such that \(\Pr [C^{(2)}_i(x)=f^{\mathtt {ws}}_i(x)]\ge 2^{-O(i^{\prime })}\), where the probability is taken both over a random choice of \(x\in \lbrace 0,1\rbrace ^i\) and over the internal randomness of \(C^{(2)}_i\). Specifically, the circuit \(C^{(2)}_i\) gets input \(x\in \lbrace 0,1\rbrace ^i\) and simulates the algorithm from [19, Theorem 7.8] with parameter \(\delta _0=2^{-2i^{\prime }}\), while resolving the oracle queries of the algorithm using the circuit \(C^{(1)}_i\); then, the circuit \(C^{(2)}_i\) outputs a random element from the list that is produced by the algorithm from [19].

Analysis: Since \(\mathbb {E}_{x}[\Pr _r[C^{(1)}_i(x,r)=f^{\mathtt {GL(ws)}}_{2i}(x,r)]]\ge 1/2+\delta _0\), it follows that for at least \(\delta _0/2\) of the inputs \(x\in \lbrace 0,1\rbrace ^i\) it holds that \(\Pr _r[C^{(1)}_i(x,r)=f^{\mathtt {GL(ws)}}_{2i}(x,r)]\ge 1/2+\delta _0/2\). For each such input x, with probability at least \(1/2\) the algorithm of [21] outputs a list of size \(\mathrm{poly}(1/\delta _0)\) that contains \(f^{\mathtt {ws}}_i(x)\), and thus the circuit \(C^{(2)}_i\) outputs \(f^{\mathtt {ws}}_i(x)\) with probability \(\mathrm{poly}(\delta _0)\).
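For concreteness, here is a self-contained Python sketch of a Goldreich-Levin-style list decoder, exercised only in the noiseless regime; it follows the standard approach rather than reproducing [21] verbatim (the name gl_candidates is ours; predictor plays the role of \(C^{(1)}_i(x,\cdot)\) for a fixed good x, and the parameter t is toy-sized rather than \(O(\log (1/\delta _0))\)):

import itertools, random

def gl_candidates(predictor, n, t=5):
    # Guess the inner products <y, s_j> for t random strings s_j; for
    # the correct guess, every bit y_k is recovered by majority vote
    # over the subset-XORs of the s_j's, queried at r_T + e_k.
    s = [[random.randrange(2) for _ in range(n)] for _ in range(t)]
    subsets = [T for r in range(1, t + 1)
               for T in itertools.combinations(range(t), r)]
    out = []
    for guess in itertools.product([0, 1], repeat=t):
        y = []
        for k in range(n):
            votes = 0
            for T in subsets:
                r_T, b_T = [0] * n, 0
                for j in T:
                    r_T = [a ^ b for a, b in zip(r_T, s[j])]
                    b_T ^= guess[j]
                r_T[k] ^= 1                 # flip the k-th coordinate
                votes += predictor(r_T) ^ b_T
            y.append(int(2 * votes > len(subsets)))
        out.append(y)
    return out

# Noiseless check: a perfect predictor for y puts y on the output list.
y_true = [1, 0, 1, 1, 0]
pred = lambda r: sum(a & b for a, b in zip(y_true, r)) & 1
assert y_true in gl_candidates(pred, 5)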

(4) Fixing randomness for the probabilistic circuit. For \(t=2^{O(i^{\prime })}\) attempts, we choose a random string for \(C^{(2)}_i\), hard-wire it into the circuit, and estimate the agreement between the resulting deterministic circuit and \(f^{\mathtt {ws}}_i\), with an additive error of \(\delta _1=\mathrm{poly}(\delta _0)\) and confidence \(1-1/\mathrm{poly}(t)\). (The estimation in each attempt is done using random sampling of inputs, the downward self-reducibility of \(f^{\mathtt {ws}}_i\) and the circuit \(C_{i-1}\), similarly to Step 2.) We proceed to the next step if one of these attempts yields a deterministic circuit that (according to our estimations) agrees with \(f^{\mathtt {ws}}_i\) on at least \(2\delta _1\) of the inputs.

Analysis: With probability at least \(1-\exp (-i^{\prime })\), at least one choice of random string yields a deterministic circuit that agrees with \(f^{\mathtt {ws}}_i\) on at least \(3\delta _1\) of the inputs, and with probability at least \(1-\exp (-i^{\prime })\), all of our t estimates are correct up to an additive error of \(\delta _1\). Thus, with probability at least \(1-\exp (-i^{\prime })\), we proceed to the next step with a deterministic circuit \(C^{(3)}_i\) of size \(2^{O(i^{\prime })}\) that agrees with \(f^{\mathtt {ws}}_i\) on \(\delta _1=2^{-O(i^{\prime })}=2^{-O(i/\log (i)^{c+1})}\gt \delta (i)\) of the inputs.

(5) Worst-case to \(\delta\)-average-case reduction for \(f^{\mathtt {ws}}_i\). We use the sample-aided worst-case to \(\delta\)-average-case reduction for \(f^{\mathtt {ws}}\), generating random labeled samples \((r,f^{\mathtt {ws}}_i(r))\) by using the downward self-reducibility of \(f^{\mathtt {ws}}\) and the circuit \(C_{i-1}\) to compute \(f^{\mathtt {ws}}_i(r)\).

Analysis: With probability at least \(1-\delta (i)\), the uniform reduction outputs a probabilistic circuit \(C^{(4)}_i\) of size \(\mathrm{poly}(1/\delta (i))\) such that for every \(x\in \lbrace 0,1\rbrace ^i\) it holds that \(\Pr _r[C^{(4)}_i(x,r)=f^{\mathtt {ws}}_i(x)]\ge 2/3\).

(6) Fixing randomness for the final circuit. Applying naive error-reduction to \(C^{(4)}_i\), we obtain a circuit \(C^{(5)}_i\) of size \(\mathrm{poly}(1/\delta (i))\) that correctly computes \(f^{\mathtt {ws}}_i\) at any input with probability \(1-2^{-O(i)}\). Then, we uniformly choose randomness for \(C^{(5)}_i\) and “hard-wire” the randomness into it, such that with probability at least \(1-2^{-i}\), we obtain a deterministic circuit \(C_i:\lbrace 0,1\rbrace ^{i}\rightarrow \lbrace 0,1\rbrace ^{i}\) that computes \(f^{\mathtt {ws}}_i\) correctly on all inputs.

This completes the proof of Claim 4.9.2, and thus also the proof of Lemma 4.9.□

In the last part of the proof of Lemma 4.9, after we converted a distinguisher for \(f^{\mathtt {GL(ws)}}\) into a weak learner for \(f^{\mathtt {GL(ws)}}\) (i.e., after Claim 4.9.3), we used the existence of the weak learner for \(f^{\mathtt {GL(ws)}}\) on \(2L_n\) to obtain a circuit that computes \(f^{\mathtt {ws}}\) on \(L_n\). This part of the proof immediately implies the following, weaker corollary. (The corollary is weaker, since it does not have any “point-wise” property, i.e., does not convert a learner on specific input lengths to a circuit for \(f^{\mathtt {ws}}\) on a corresponding input length.)

Corollary 4.10

(Learning \(f^{\mathtt {GL(ws)}}\) ⇛ Computing \(f^{\mathtt {ws}}\))

Let \(c\in \mathbb {N}\) be an arbitrary constant, let \(f^{\mathtt {ws}}:\lbrace 0,1\rbrace ^*\rightarrow \lbrace 0,1\rbrace ^*\) be a \(\delta\)-well-structured function for \(\delta (n)=2^{-n/\log (n)^{c}}\), and let \(f^{\mathtt {GL(ws)}}\) be defined as in the proof of Lemma 4.9. Assume that for every \(\ell \in \mathbb {N}\) there exists a weak learner for \(f^{\mathtt {GL(ws)}}\); that is, an algorithm that gets input \(1^{\ell }\) and oracle access to \(f^{\mathtt {GL(ws)}}_{\ell }\), runs in time \(1/\delta (\ell)\), and with probability more than \(\delta (\ell)\) outputs a circuit over \(\ell\) bits that computes \(f^{\mathtt {GL(ws)}}\) correctly on more than \(1/2+\delta (\ell)\) of the inputs. Then, there exists an algorithm that for every \(\ell\), when given input \(1^{\ell }\), runs in time \(2^{O(\ell /\log (\ell)^c)}\) and outputs a circuit over \(\ell\)-bit inputs that computes \(f^{\mathtt {ws}}\).

We now use the “point-wise” property of Lemma 4.9 to deduce two “almost-always” versions of Proposition 4.8. Recall that in our construction of a well-structured function \(f^{\mathtt {ws}}\), on some input lengths \(f^{\mathtt {ws}}\) is defined trivially, and thus it cannot be that \(f^{\mathtt {ws}}\) is hard almost-always. However, since \(\mathtt {TQBF}\) can be reduced to \(f^{\mathtt {ws}}\) with a quasilinear blow-up \(b:\mathbb {N}\rightarrow \mathbb {N}\), we can still deduce the following: If \(\mathtt {TQBF}\) is “hard” almost-always, then for every \(n\in \mathbb {N}\) there exists \(n^{\prime }\le b(n)\) such that \(f^{\mathtt {ws}}\) is “hard” on input length \(n^{\prime }\) (i.e., this holds for the smallest \(n^{\prime }\ge n\) of the form \(b(n_0)\) for \(n_0\in \mathbb {N}\)).

In our first “almost-always” result, the hypothesis is that a well-structured function is “hard” on a dense set of input lengths as above, and the conclusion is that there exists an “almost-everywhere” HSG for uniform circuits.

Proposition 4.11

(“Almost Everywhere” Hardness of \(f^{\mathtt {ws}}\) → “Almost Everywhere” Derandomization of \(\mathcal {RP}\) “On Average”)

Assume that for some constant \(c\in \mathbb {N}\) and for \(\delta (n)=2^{-n/\log (n)^{c}}\) there exist a \((\delta ,\mathrm{polylog}(n))\)-well-structured function \(f^{\mathtt {ws}}\) and \(b(n)=\tilde{O}(n)\) such that for every probabilistic algorithm that runs in time \(2^{n/\log (n)^c}\), and every sufficiently large \(n\in \mathbb {N}\), the algorithm fails to compute \(f^{\mathtt {ws}}\) on input length \(\overline{n}=\min \lbrace b(n_0)\ge n:n_0\in \mathbb {N}\rbrace\). Then, for every \(k\in \mathbb {N}\) and for \(t(n)=n^{\mathrm{loglog}(n)^k}\) there exists a \((1/t)\)-HSG for \((t,\log (t))\)-uniform circuits that is computable in time \(n^{\mathrm{polyloglog}(n)}\) and has seed length \(\tilde{O}(\log (n))\).

Proof.

We instantiate Lemma 4.9 with the constant c, the function \(f^{\mathtt {ws}}\), the parameter \(2k\) instead of k (i.e., the parameter t in Lemma 4.9 is \(t(n)=n^{\mathrm{loglog}(n)^{2k}}\)), and with \(\mathtt {str}(n)=n\) (i.e., \(\mathtt {str}\) is the identity function). Let \(\ell (n)=\tilde{O}(\log (n))\) be the quasilogarithmic function given by Lemma 4.9, let \(G=G_0\) be the corresponding PRG, and let \(\ell _G(n)=\tilde{O}(\log (n))\) be the seed length of G. From our hypothesis regarding the hardness of \(f^{\mathtt {ws}}\), we can deduce the following:

Corollary 4.11.1.

For every \(n\in \mathbb {N}\) there is a polynomial-time-enumerable set \(\overline{S_n}\subset [n,n^{\mathrm{polyloglog}(n)}]\) of size \(\mathrm{polyloglog}(n)\) such that for every probabilistic algorithm \(A^{\prime }\) that runs in time \(t^2\) and uses \(2\log (t)\) bits of advice, if \(n\in \mathbb {N}\) is sufficiently large, then there exists \(m\in \overline{S_n}\) such that \(G(1^m,{\bf u}_{\ell _G(m)})\) is \((1/t^2(m))\)-pseudorandom for \(A^{\prime }\).

Proof.

For every \(n\in \mathbb {N}\), let \(\overline{\ell }(n)=\min \lbrace b(\ell _0)\ge \ell (n):\ell _0\in \mathbb {N}\rbrace\), and let \(\overline{n}=\ell ^{-1}(\overline{\ell }(n))\in [n,n^{\mathrm{polyloglog}(n)}]\). We define \(\overline{S_n}=S_{\overline{n}}\), where \(S_{\overline{n}}\) is the set from Item (2) of Lemma 4.9 that corresponds to \(\overline{n}\). Note that \(\overline{S_n}\subset [n,n^{\mathrm{polyloglog}(n)}]\) and that \(|\overline{S_n}|\le \mathrm{polyloglog}(n)\).

Now, let \(A^{\prime }\) be a probabilistic algorithm as in our hypothesis, let \(F^{\prime }\) be the corresponding probabilistic algorithm from Lemma 4.9 that runs in time \(t_{F^{\prime }}(i)=2^{i/\log (i)^c}\), and let \(n\in \mathbb {N}\) be sufficiently large. By Lemma 4.9, if there is no \(m\in \overline{S_n}\) such that \(G(1^m,{\bf u}_{\ell _G(m)})\) is \((1/t(m))\)-pseudorandom for \(A^{\prime }\), then \(F^{\prime }\) correctly computes \(f^{\mathtt {ws}}\) on input length \(\ell (\overline{n})=\overline{\ell }(n)\), which contradicts our hypothesis.□

The HSG, denoted H, gets input \(1^n\), uniformly chooses \(m\in \overline{S_n}\), computes \(G(1^m,s)\) for a random \(s\in \lbrace 0,1\rbrace ^{\ell _G(m)}\), and outputs the n-bit prefix of \(G(1^m,s)\). Note that the seed length that H requires is \(\tilde{O}(\log (n^{\mathrm{polyloglog}(n)}))+\log (|\overline{S_n}|) = \tilde{O}(\log (n))\) and that H is computable in time at most \(n^{\mathrm{polyloglog}(n)}\).
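A minimal Python sketch of H, where enumerate_S, G, and ell_G are assumed stand-ins for the objects constructed above (the stubs below exist only to make the sketch executable; no pseudorandomness is claimed):

def hsg(n, m_index, g_seed, enumerate_S, G, ell_G):
    # Two-part seed: an index into the enumerable set \overline{S_n},
    # and a seed for G; the output is the n-bit prefix of G's output.
    S = enumerate_S(n)
    m = S[m_index % len(S)]
    return G(m, g_seed[:ell_G(m)])[:n]

print(hsg(8, 1, [0, 1] * 16,
          enumerate_S=lambda n: [n, 2 * n, 4 * n],
          G=lambda m, s: (s * m)[:m],
          ell_G=lambda m: 16))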

To prove that H is a \((1/t)\)-HSG for \((t,\log (t))\)-uniform circuits, let A be a probabilistic algorithm that runs in time t and uses \(\log (t)\) bits of advice. Assume towards a contradiction that there exists an infinite set \(B_A\subseteq \mathbb {N}\) such that for every \(n\in B_A\), with probability more than \(1/t(n)\) the algorithm A outputs a circuit \(D_n:\lbrace 0,1\rbrace ^n\rightarrow \lbrace 0,1\rbrace\) satisfying \(\Pr _s[D_n(H(1^n,s))=0]=1\) and \(\Pr _{x\in \lbrace 0,1\rbrace ^n}[D_n(x)=1]\gt 1/t(n)\). We will construct an algorithm \(A^{\prime }\) that runs in time less than \(t^2\), uses \(\log (t)+\log (n)\lt 2\log (t)\) bits of advice, and for infinitely-many sets of the form \(\overline{S_n}\), for every \(m\in \overline{S_n}\) it holds that \(G(1^m,{\bf u}_{\ell _G(m)})\) is not \((1/t(m))\)-pseudorandom for \(A^{\prime }\). This contradicts Corollary 4.11.1.

The algorithm \(A^{\prime }\) gets input \(1^m\), and as advice it gets an integer of size at most m. Specifically, if m is in a set \(\overline{S_n}\) for some \(n\in B_A\), then the advice will be set to n; and otherwise the advice is zero (which signals to \(A^{\prime }\) that it can fail on input length m). For any \(m\in \mathbb {N}\) such that the first case holds, we know that \(A(1^n)\) outputs, with probability more than \(1/t(n)\), a circuit \(D_n:\lbrace 0,1\rbrace ^n\rightarrow \lbrace 0,1\rbrace\) satisfying both \(\Pr _{s\in \lbrace 0,1\rbrace ^{\tilde{O}(\log (n))}}[D_n(H(1^n,s))=0]=1\) and \(\Pr _{x\in \lbrace 0,1\rbrace ^n}[D_n(x)=1]\gt 1/t(n)\). The algorithm \(A^{\prime }\) simulates A on input length n and outputs a circuit \(D_m:\lbrace 0,1\rbrace ^m\rightarrow \lbrace 0,1\rbrace\) such that \(D_m\) computes \(D_n\) on the n-bit prefix of its input. By our hypothesis regarding \(D_n\), when fixing the first part of the seed of H to be the integer m, we have that \(\Pr _{s^{\prime }}[D_n(H(1^n,m\circ s^{\prime }))=0]=\Pr _{s^{\prime }}[D_m(G(1^m,s^{\prime }))=0]=1\), whereas \(\Pr _{x\in \lbrace 0,1\rbrace ^m}[D_m(x)=1]\gt 1/t(n)\). It follows that \(D_m\) distinguishes the m-bit output of G from uniform with advantage more than \(1/t(n)\ge 1/t(m)\), as required.□

We also prove another “almost-everywhere” version of Proposition 4.8. Loosely speaking, under the same hypothesis as in Proposition 4.11, we show that \(\mathcal {BPP}\) can be derandomized “on average” using only a small (triple-logarithmic) amount of advice. In contrast to the conclusion of Proposition 4.11, in the following proposition, we do not construct a PRG or HSG, but rather simulate every \(\mathcal {BPP}\) algorithm by a corresponding deterministic algorithm that uses a small amount of non-uniform advice.

Proposition 4.12

(“Almost Everywhere” Hardness of \(f^{\mathtt {ws}}\) → “Almost Everywhere” Derandomization of \(\mathcal {BPP}\) “On Average” with Short Advice)

Assume that for some constant \(c\in \mathbb {N}\) and for \(\delta (n)=2^{-n/\log (n)^{c}}\) there exist a \((\delta ,\mathrm{polylog}(n))\)-well-structured function \(f^{\mathtt {ws}}\) and \(b(n)=\tilde{O}(n)\) such that for every probabilistic algorithm that runs in time \(2^{O(n/\log (n)^c)}\), and every sufficiently large \(n\in \mathbb {N}\), the algorithm fails to compute \(f^{\mathtt {ws}}\) on input length \(\overline{n}=\min \lbrace b(n_0)\ge n:n_0\in \mathbb {N}\rbrace\).

For \(k\in \mathbb {N}\) and \(t(n)=n^{\mathrm{loglog}(n)^k}\), let \(L\in \mathcal {BPTIME}[t]\) and let F be a probabilistic t-time algorithm. Then, there exists a deterministic machine D that runs in time \(n^{\mathrm{polyloglog}(n)}\) and gets \(O(\mathrm{logloglog}(n))\) bits of non-uniform advice such that for all sufficiently large \(n\in \mathbb {N}\), the probability (over coin tosses of F) that \(F(1^n)\) is an input \(x\in \lbrace 0,1\rbrace ^n\) for which \(D(x)\ne L(x)\) is at most \(1/t(n)\).

Proof.

Let us first prove the claim assuming that \(L\in \mathcal {BPTIME}[t]\) can be decided using only a number of random coins that equals the input length; later on, we show how to remove this assumption (by a padding argument). For t as in our hypothesis and L as above, let M be a probabilistic t-time algorithm that decides L and that for every input \(x\in \lbrace 0,1\rbrace ^*\) uses \(|x|\) random coins, and let F be a probabilistic t-time algorithm. Consider the algorithm A that, on input \(1^n\), simulates F on input \(1^{n}\) to obtain \(x\in \lbrace 0,1\rbrace ^n\) and outputs a circuit \(C_x:\lbrace 0,1\rbrace ^{n}\rightarrow \lbrace 0,1\rbrace\) that computes the decision of M at input x as a function of the random coins of M.

We instantiate Lemma 4.9 with the constant c, the function \(f^{\mathtt {ws}}\), and the parameter k. Let \(\ell =\tilde{O}(\log (n))\) be the quasilogarithmic function given by the lemma, let \(G_0\) be the PRG, and let \(\ell _G=\tilde{O}(\log (n))\) be the seed length of \(G_0\). We first need a claim similar to Corollary 4.11.1, but this time also quantifying over the function \(\mathtt {str}\):

Corollary 4.12.1.

For every \(n\in \mathbb {N}\) there is a polynomial-time-enumerable set \(\overline{S_n}\subset [n,n^{\mathrm{polyloglog}(n)}]\) of size \(\mathrm{polyloglog}(n)\) that satisfies the following: For every \(\mathtt {str}:\mathbb {N}\rightarrow \mathbb {N}\) satisfying \(\mathtt {str}(n)\le n\), let \(G_{\mathtt {str}}\) be the algorithm that on input \(1^n\) uses a random seed of length \(\tilde{O}(\log (n))\), computes \(G_0\), which outputs an n-bit string, and truncates the output to length \(\mathtt {str}(n)\). Then, for every probabilistic algorithm \(A^{\prime }\) that runs in time t and uses \(\log (t)\) bits of advice, if \(n\in \mathbb {N}\) is sufficiently large, then there exists \(m\in \overline{S_n}\) such that \(G_{\mathtt {str}}(1^m,{\bf u}_{\ell _G(m)})\) is \((1/t(m))\)-pseudorandom for \(A^{\prime }\).

Proof.

For any \(n\in \mathbb {N}\), we define \(\overline{\ell }(n)\) and \(\overline{S_n}\) as in the proof of Corollary 4.11.1. For any \(\mathtt {str}:\mathbb {N}\rightarrow \mathbb {N}\) satisfying \(\mathtt {str}(n)\le n\), let \(G_{\mathtt {str}}\) be the corresponding function. Now, let \(A^{\prime }\) be any probabilistic algorithm as in our hypothesis, let \(F^{\prime }\) be the corresponding probabilistic algorithm from Lemma 4.9 that runs in time \(t_{F^{\prime }}(i)=2^{i/\log (i)^c}\), and let \(n\in \mathbb {N}\) be sufficiently large. By Lemma 4.9, if there is no \(m\in \overline{S_n}\) such that \(G_{\mathtt {str}}(1^m,{\bf u}_{\ell _G(m)})\) is \((1/t(m))\)-pseudorandom for \(A^{\prime }\), then \(F^{\prime }\) correctly computes \(f^{\mathtt {ws}}\) on input length \(\overline{\ell }(n)\). This contradicts our hypothesis regarding \(f^{\mathtt {ws}}\).□

The machine D gets input \(x\in \lbrace 0,1\rbrace ^n\) and advice of length \(O(\mathrm{logloglog}(n))\), which is interpreted as an index of an element m in the set \(\overline{S_n}\). Then, for each \(s\in \lbrace 0,1\rbrace ^{\ell _G(m)}\) the algorithm computes the n-bit prefix of \(G_0(1^m,s)\), denoted \(w_s=G_0(1^m,s)_{1,\ldots ,n}\), and outputs the majority value of \(\lbrace M(x,w_s):s\in \lbrace 0,1\rbrace ^{\ell _G(m)}\rbrace\). Note that the machine D indeed runs in time \(m^{\mathrm{polyloglog}(m)}=n^{\mathrm{polyloglog}(n)}\).
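For concreteness, here is a minimal Python sketch of the machine D; the helpers `G0`, `M`, and `seed_len` are hypothetical stand-ins for the PRG of Lemma 4.9, the randomized decider for L, and the seed length \(\ell _G\), respectively.

```python
def D(x, m):
    """A minimal sketch of the machine D: derandomize M on input x by
    enumerating all seeds of the PRG and taking a majority vote. The
    advice m (an element of the set S_n-bar) is given non-uniformly;
    G0, M, and seed_len are hypothetical stand-ins."""
    n = len(x)
    accepting = 0
    for s in range(2 ** seed_len(m)):
        w = G0(m, s)[:n]              # n-bit prefix of the PRG's output
        accepting += 1 if M(x, w) else 0
    return 2 * accepting > 2 ** seed_len(m)   # majority vote over all seeds
```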

Our goal now is to prove that for every sufficiently large \(n\in \mathbb {N}\) there exists advice \(m\in \overline{S_n}\) such that with probability at least \(1-1/t(n)\) over the coin tosses of F (which determine \(x\in \lbrace 0,1\rbrace ^n\) and \(C_x:\lbrace 0,1\rbrace ^{n}\rightarrow \lbrace 0,1\rbrace\)) it holds that (4.2) \(\begin{equation} \Big \vert \Pr _{r\in \lbrace 0,1\rbrace ^{n}}[C_x(r)=1]-\Pr _s[C_x(G_0(1^{m},s)_{1,\ldots ,n})=1]\Big \vert \lt 1/t(n) \;\text{,} \end{equation}\) which is equivalent (for a fixed \(x\in \lbrace 0,1\rbrace ^n\)) to the following statement: (4.3) \(\begin{equation} \Big \vert \Pr _{r\in \lbrace 0,1\rbrace ^{n}}[M(x,r)=1] - \Pr _s[M(x,w_s)=1]\Big \vert \lt 1/t(n) \;\text{.} \end{equation}\) Indeed, proving this would suffice to prove our claim, since for every \(x\in \lbrace 0,1\rbrace ^n\) such that Equation (4.3) holds, we have that \(D(x)=L(x)\).

To prove the claim above, assume towards a contradiction that there exists an infinite set of input lengths \(B_A\subseteq \mathbb {N}\) such that for every \(n\in B_A\) and every advice \(m\in \overline{S_n}\), with probability more than \(1/t(n)\) over \(x\leftarrow F(1^n)\) it holds that \(C_x:\lbrace 0,1\rbrace ^n\rightarrow \lbrace 0,1\rbrace\) violates Equation (4.2). Let \(\mathtt {str}:\mathbb {N}\rightarrow \mathbb {N}\) be defined by \(\mathtt {str}(m)=n\) if \(m\in \overline{S_n}\) for some \(n\in B_A\), and \(\mathtt {str}(m)=m\) otherwise.36 Then, our assumption implies that for infinitely-many input lengths \(n\in B_A\), for every \(m\in \overline{S_n}\) it holds that \(G_{\mathtt {str}}(1^m,{\bf u}_{\ell _G(m)})\) is not \((1/t(n))\)-pseudorandom for A. This contradicts Corollary 4.12.1.

Finally, let us remove the assumption that L can be decided using a linear number of coins by a padding argument. For any \(L\in \mathcal {BPTIME}[t]\), consider a padded version \(L^{\tt pad}=\lbrace (x,1^{t(|x|)}):x\in L\rbrace\), and note that \(L^{\tt pad}\) can be decided in linear time using \(|z|\) coins on any input z. By the argument above, for every probabilistic t-time algorithm \(F^{\tt pad}\) there exists an algorithm \(D^{\tt pad}\) that runs in time \(t_{D^{\tt pad}}(m)=m^{\mathrm{polyloglog}(m)}\) such that for all sufficiently large \(m\in \mathbb {N}\) it holds that \(\Pr _{z\leftarrow F^{\tt pad}(1^m)}[D^{\tt pad}(z)\ne L^{\tt pad}(z)]\le 1/t(m)\).

We define the algorithm D in the natural way, i.e., \(D(x)=D^{\tt pad}(x,1^{t(|x|)})\), and note that this algorithm runs in time \(n^{\mathrm{polyloglog}(n)}\). Assume towards a contradiction that there exists a t-time algorithm F and an infinite set of input lengths \(B_F\subseteq \mathbb {N}\) such that for every \(n\in B_F\), with probability more than \(1/t(n)\) over \(x\leftarrow F(1^n)\) it holds that \(D(x)\ne L(x)\). Consider the algorithm \(F^{\tt pad}\) that on input of the form \(1^{n+t(n)}\) runs \(F(1^n)\) to obtain \(x\in \lbrace 0,1\rbrace ^n\) and outputs \((x,1^{t(n)})\) (on inputs of another form \(F^{\tt pad}\) fails and halts), and let \(B_{F^{\tt pad}}=\lbrace n+t(n):n\in B_F\rbrace\). For any \(m\in B_{F^{\tt pad}}\), we have that \(\begin{align*} \Pr _{z\leftarrow F^{\tt pad}(1^m)}[D^{\tt pad}(z)\ne L^{\tt pad}(z)] &= \Pr _{x\leftarrow F(1^n)}[D(x)\ne L(x)] \gt 1/t(n) \gt 1/t(m) \;\text{,} \end{align*}\) which yields a contradiction.
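The padding step itself is a one-liner; in this sketch, `D_pad` and `t` are hypothetical stand-ins for the decider of \(L^{\tt pad}\) and the time bound:

```python
def D(x, D_pad, t):
    """Sketch of the unpadded decider: D(x) = D^pad(x, 1^{t(|x|)}).
    D_pad and t are hypothetical stand-ins."""
    return D_pad((x, '1' * t(len(x))))
```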

Remark 4.13

(A PRG That Runs in Quasilogarithmic Space).

The PRG constructed in Lemma 4.9 actually works in quasilogarithmic space (since \(f^{\mathtt {ws}}\) is computable in linear space), except for one crucial part: The construction of combinatorial designs. Combinatorial designs with parameters as in our proof actually can be constructed in logarithmic space, but these combinatorial designs work only for values of \(\ell\) that are of a specific form (since the constructions are algebraic).37 However, in our downward self-reducibility argument, we need such designs for every integer \(\ell\) (such that we can assume the existence of distinguishers on the set \(S_n=\ell ^{-1}(2L_n)\), and hence of learners for \(f^{\mathtt {GL(ws)}}\) on \(2L_n\)).

4.3 Proofs of Theorems 1.1 and 1.2

Let us now formally state Theorem 1.1 and prove it. The theorem follows immediately as a corollary of Lemma 4.7 and Proposition 4.8.

Theorem 4.14

(rETH → i.o.-PRG for Uniform Circuits)

Assume that there exists \(i\ge 1\) such that \(\mathtt {TQBF}\notin \mathcal {BPTIME}[2^{n/\log (n)^i}]\). Then, for every \(k\in \mathbb {N}\) and for \(t(n)=n^{\mathrm{loglog}(n)^k}\) there exists a \((1/t)\)-i.o.-PRG for \((t,\log (t))\)-uniform circuits that has seed length \(\tilde{O}(\log (n))\) and is computable in time \(n^{\mathrm{polyloglog}(n)}\).

Proof.

Let \(\delta (n)=2^{-n/\log (n)^{3c}}\) for a sufficiently large constant \(c\in \mathbb {N}\). By Lemma 4.7, there exists a \((\delta ,O(\log (n)^{6c}))\)-well-structured function \(f^{\mathtt {ws}}\) that is computable in linear space and such that \(\mathtt {TQBF}\) reduces to \(f^{\mathtt {ws}}\) in time \(\mathtt {ql}(n)=n\cdot \log (n)^{2c+r}\), where \(r\in \mathbb {N}\) is a universal constant. Using our hypothesis, we deduce that \(f^{\mathtt {ws}}\) cannot be computed in probabilistic time \(2^{n/\log (n)^{3c-1}}\gt 2^{O(n/\log (n)^{3c})}\); this is the case, since otherwise, \(\mathtt {TQBF}\) could have been computed in probabilistic time (4.4) \(\begin{equation} 2^{\mathtt {ql}(n)/\log (\mathtt {ql}(n))^{3c-1}}=2^{n\cdot \log (n)^{2c+r}/\log (\mathtt {ql}(n))^{3c-1}} \lt 2^{n/\log (n)^{c-r-1}} \;\text{,} \end{equation}\) which is a contradiction if \(c\ge i+r+1\). Our conclusion now follows from Proposition 4.8.□

We also formally state Theorem 1.2 and prove it, as a corollary of Lemma 4.7 and of Propositions 4.11 and 4.12.

Theorem 4.15

(a.a.-rETH → Almost-always HSG for Uniform Circuits and Almost-always “Average-case” Derandomization of \(\mathcal {BPP}\))

Assume that there exists \(i\ge 1\) such that \(\mathtt {TQBF}\notin \text{i.o.-}\mathcal {BPTIME}[2^{n/\log (n)^i}]\). Then, for every \(k\in \mathbb {N}\) and for \(t(n)=n^{\mathrm{loglog}(n)^k}\):

(1)

There exists a \((1/t)\)-HSG for \((t,\log (t))\)-uniform circuits that is computable in time \(n^{\mathrm{polyloglog}(n)}\) and has seed length \(\tilde{O}(\log (n))\).

(2)

For every \(L\in \mathcal {BPTIME}[t]\) and probabilistic t-time algorithm F there exists a deterministic machine D that runs in time \(n^{\mathrm{polyloglog}(n)}\) and gets \(O(\mathrm{logloglog}(n))\) bits of non-uniform advice such that for all sufficiently large \(n\in \mathbb {N}\) the probability (over coin tosses of F) that \(F(1^n)\) is an input \(x\in \lbrace 0,1\rbrace ^n\) for which \(D(x)\ne L(x)\) is at most \(1/t(n)\).

Proof.

Note that both Proposition 4.11 and Proposition 4.12 rely on the same hypothesis and that their respective conclusions correspond to Items (1) and (2) in our claim. Thus, it suffices to prove that their hypothesis holds.

To see this, as in the proof of Theorem 4.14, let \(\delta (n)=2^{-n/\log (n)^{3c}}\) for a sufficiently large constant \(c\in \mathbb {N}\), and let \(f^{\mathtt {ws}}\) be the \((\delta ,\mathrm{polylog}(n))\)-well-structured function that is obtained from Lemma 4.7 with parameter \(\delta\). Let \(r\in \mathbb {N}\) be the universal constant from Lemma 4.7, and let \(\mathtt {ql}(n)=n\cdot \log (n)^{2c+r}\). Note that for every algorithm that runs in time \(2^{n/\log (n)^{3c-1}}\gt 2^{O(n/\log (n)^{3c})}\) and every sufficiently large \(n_0\in \mathbb {N}\), the algorithm fails to compute \(f^{\mathtt {ws}}\) on input length \(n=\mathtt {ql}(n_0)\); this is because, otherwise, we could have computed \(\mathtt {TQBF}\) on infinitely many \(n_0\)’s in time \(2^{n/\log (n)^{c-r-1}}\le 2^{n_0/\log (n_0)^{i}}\), where the calculation is as in Equation (4.4). This implies the hypothesis of Propositions 4.11 and 4.12.□


5 NETH AND THE EQUIVALENCE OF DERANDOMIZATION AND CIRCUIT LOWER BOUNDS

In this section, we prove Theorems 1.4, 1.5, and 1.6. Recall that these results show two-way implications between the statement that derandomization and circuit lower bounds are equivalent, and a very weak variant of NETH. Specifically, the latter variant is that \(\mathcal {E}\) does not have \(\mathcal {NTIME}[T]\)-uniform circuits of small size; let us now properly define this notion:

Definition 5.1

(\(\mathcal {NTIME}[T]\)-uniform Circuits).

For \(S,T:\mathbb {N}\rightarrow \mathbb {N}\), we say that a set \(L\subseteq \lbrace 0,1\rbrace ^*\) can be decided by \(\mathcal {NTIME}[T]\)-uniform circuits of size S if there exists a non-deterministic machine M that gets input \(1^n\), runs in time \(T(n)\), and satisfies the following:

(1)

For every \(n\in \mathbb {N}\) there exist non-deterministic choices such that \(M(1^n)\) outputs a circuit \(C:\lbrace 0,1\rbrace ^{n}\rightarrow \lbrace 0,1\rbrace ^{}\) of size at most \(S(n)\) that decides \(L_n=L\cap \lbrace 0,1\rbrace ^{n}\).

(2)

For every \(n\in \mathbb {N}\) and non-deterministic choices, \(M(1^n)\) either outputs a circuit \(C:\lbrace 0,1\rbrace ^{n}\rightarrow \lbrace 0,1\rbrace ^{}\) that decides \(L_n\), or outputs \(\perp\).

When we simply say that L can be decided by \(\mathcal {NTIME}[T]\)-uniform circuits (without specifying a size bound S), we consider the trivial size bound \(S(n)=T(n)\).

The class \(\mathcal {ONTIME}[T]\), which was defined in [17, 22] and stands for “oblivious \(\mathcal {NTIME}[T]\),” consists of all sets decidable by non-deterministic time-T machines such that for every input length \(n\in \mathbb {N}\) there exists a single witness \(w_n\) that convinces the non-deterministic machine on all n-bit inputs in the set. As mentioned in Section 2.2, the class of problems decidable by \(\mathcal {NTIME}[T]\)-uniform circuits is a subclass of \(\mathcal {ONTIME}[T]\), which is in turn a subclass of \(\mathcal {NTIME}[T]\cap \mathcal {SIZE}[T]\). That is:

Fact 5.2.

For \(T:\mathbb {N}\rightarrow \mathbb {N}\), if \(L\subseteq \lbrace 0,1\rbrace ^{*}\) can be decided by \(\mathcal {NTIME}[T]\)-uniform circuits, then \(L\in \mathcal {ONTIME}[T^{\prime }]\subseteq (\mathcal {NTIME}[T^{\prime }]\cap \mathcal {SIZE}[T^{\prime }])\), for \(T^{\prime }(n)=\tilde{O}(T(n))\).

Proof.

Fix L, and let M be a non-deterministic machine that uniformly constructs circuits for L as in Definition 5.1. For every \(n\in \mathbb {N}\), let \(w_n\in \lbrace 0,1\rbrace ^{T(n)}\) be non-deterministic choices such that \(M(1^n,w_n)\) outputs a circuit for \(L_n\). Then, L can be decided by a non-deterministic machine that gets input \(x\in \lbrace 0,1\rbrace ^{n}\) and witness \(w_n\), constructs a circuit for \(L_n\) using \(w_n\), and evaluates this circuit at input x. The same witness \(w_n\) leads this non-deterministic machine to accept all \(x\in L_n\), and the running time is quasilinear in the size of the circuit (i.e., in T).□
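The simulation in the proof of Fact 5.2 can be sketched as follows; `build_circuit` and `eval_circuit` are hypothetical stand-ins for \(M(1^n)\) run with fixed non-deterministic choices and for circuit evaluation.

```python
def decide_with_oblivious_witness(x, w_n, build_circuit, eval_circuit):
    """Sketch of the ONTIME[T'] machine: the single witness w_n serves as
    the non-deterministic choices of the uniform constructor M(1^n); the
    resulting circuit for L_n is evaluated on x. Both helpers are
    hypothetical stand-ins."""
    C = build_circuit(len(x), w_n)   # M(1^n) with choices w_n
    if C is None:                    # M output 'bot' on these choices
        return False
    return eval_circuit(C, x)        # time quasilinear in the circuit size
```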

Since we will be repeating some technical non-degeneracy conditions on functions throughout the section, let us define these conditions concisely at this point:

Definition 5.3

(Size Functions and Time Functions).

We say that \(S:\mathbb {N}\rightarrow \mathbb {N}\) is a size function if S is time-computable, increasing, satisfies \(S(n)=o(2^n/n)\), and for every \(n\in \mathbb {N}\) satisfies \(S(n)\gt n\) and \(S(n+1)\le 2S(n)\). We say that \(T:\mathbb {N}\rightarrow \mathbb {N}\) is a time function if T is time-computable, increasing, and for every \(n\in \mathbb {N}\) satisfies \(T(n)\gt n\).

We will first prove, in Section 5.1, the key technical results that underlie the main theorems; these technical results will be strengthenings of classical Karp-Lipton style theorems. Then, in Section 5.2, we will prove Theorems 1.4, 1.5, and 1.6.

5.1 Strengthened Karp-Lipton Style Results

Recall that Babai et al. [3] proved that if \(\mathcal {EXP}\subset \mathcal {P}/\mathrm{poly},\) then \(\mathcal {EXP}=\mathcal {MA}\); if we also use an additional hypothesis that \(pr\mathcal {BPP}=pr\mathcal {P}\), then we can deduce the stronger conclusion \(\mathcal {EXP}= \mathcal {NP}\). In the current section, we will prove two strengthenings of this result, which further strengthen the foregoing conclusion: Instead of deducing that \(\mathcal {EXP}=\mathcal {NP}\), we will deduce that \(\mathcal {EXP}\) can be decided by \(\mathcal {NTIME}[T]\)-uniform circuits of size S, for small values of \(T,S\).

We first prove, in Section 5.1.1, a lemma that will be used in one of our proofs; we present this lemma and the underlying question in a separate section, since they might be of independent interest. The two strengthened Karp-Lipton style results will be subsequently proved in Sections 5.1.2 and 5.1.3, respectively.

5.1.1 Solving (1,1/3)-CAPP using Many Untrusted CAPP Algorithms.

Recall that in the problem \((\alpha ,\beta)\text{-}\mathsf {CAPP}\), we get as input a description of a circuit, and our goal is to distinguish between circuits with acceptance probability at least \(\alpha \gt 0\) and circuits with acceptance probability at most \(\beta \gt 0\); we also denote \(\mathsf {CAPP}=(2/3,1/3)\text{-}\mathsf {CAPP}\) (see Definition 3.1). Assume that we want to solve \(\mathsf {CAPP}\) on an input circuit C of description length n and that we are guaranteed that an algorithm A solves \(\mathsf {CAPP}\) on some input length (unknown to us) in the interval \([n,S(n)]\) for some function S. This problem arises, for example, if we assume that \(pr\mathcal {BPP}\subset \mathtt {i.o.}pr\mathcal {NP}\) (which implies that \(\mathsf {CAPP}\in \mathtt {i.o.}pr\mathcal {NP}\)) and want to derandomize \(\mathcal {MA}\) infinitely-often. (This is because when the \(\mathcal {MA}\) verifier gets an input of length m, the derandomization of the verifier corresponds to a \(\mathsf {CAPP}\) problem on some input length \(n=m^k\), but we are not guaranteed that the \(\mathsf {CAPP}\) algorithm works on input length n.38) How can we solve this problem?

If we invoke the algorithm A on each input length in the interval \([n,S(n)]\), while feeding it C as input each time (i.e., C is padded up to the appropriate length), then we obtain a variety of answers, and it is not clear a priori how we can distinguish the correct answer from possibly misleading ones. In this section, we show a solution for this problem in the setting where we only need to solve \(\mathsf {CAPP}\) with one-sided error and when A solves a problem in \(pr\mathcal {BPP}\) that slightly generalizes \(\mathsf {CAPP}\). Intuitively, since we only need to solve \((1,1/3)\text{-}\mathsf {CAPP}\), it will be possible to prove to us that C is not a YES instance (i.e., that C does not accept all of its inputs); and, since A solves a problem that slightly generalizes \(\mathsf {CAPP}\), we will be able to modify it to an algorithm that is able to provide such a proof when C is not a YES instance. Details follow.

We first define the aforementioned variation of \((\alpha ,\beta)\text{-}\mathsf {CAPP}\), denoted \(\mathsf {pCAPP}\) (for “parametrized \(\mathsf {CAPP}\)”), in which \(\alpha\) and \(\beta\) are specified as part of the input.

Definition 5.4

(Parametrized CAPP)

In the promise problem \(\mathsf {pCAPP}[S,\ell ]\), the input is a triplet \((C,\alpha ,\beta)\), where C is a Boolean circuit over v variables and of size \(S(v)\) and \(1\gt \alpha \gt \beta \gt 0\) are rational numbers specified with \(\ell (v)\) bits. The YES instances are such that \(\Pr _x[C(x)=1]\ge \alpha\) and the NO instances are such that \(\Pr _x[C(x)=1]\le \beta\).

Note that if \(\ell (v)=O(\log (S(v)))\), then \(\mathsf {pCAPP}[S,\ell ]\in pr\mathcal {BPP}\). (This is since we can uniformly sample \(O(\epsilon ^{-2})\) inputs for C, where \(\epsilon =\alpha -\beta \ge 1/\mathrm{poly}(S(v))\), and estimate \(\Pr _x[C(x)=1]\) with accuracy \((\alpha -\beta)/2\), with high probability.) We now show that solving \((1,1/3)\text{-}\mathsf {CAPP}\) for circuits of size \(S(n)\) infinitely-often reduces to solving \(\mathsf {pCAPP}\) infinitely-often (i.e., on an arbitrary infinite set of input lengths).
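The parenthetical sampling argument is standard; the following sketch makes it explicit, with `eval_circuit` a hypothetical evaluator for the circuit description and the sample count set by a Chernoff bound.

```python
import random

def pcapp_by_sampling(C, alpha, beta, v, eval_circuit):
    """Sketch of the prBPP algorithm for pCAPP: estimate Pr_x[C(x)=1] by
    sampling and compare the estimate to the midpoint of the promise gap.
    eval_circuit is a hypothetical stand-in."""
    gap = alpha - beta                    # >= 1/poly(S(v)) by the promise
    trials = int(100 / gap ** 2)          # Chernoff: error prob. ~ exp(-50)
    hits = sum(eval_circuit(C, random.getrandbits(v)) for _ in range(trials))
    return hits / trials >= (alpha + beta) / 2
```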

Lemma 5.5

(Solving CAPP with One-sided Error on a Fixed Input Length Reduces to Solving pCAPP on an Unknown “Close” Input Length)

For any two size functions \(S^{(n)},S^{(v)}:\mathbb {N}\rightarrow \mathbb {N}\) and time function \(T:\mathbb {N}\rightarrow \mathbb {N}\), assume that \(\mathsf {pCAPP}[S^{(v)},\ell ]\in \mathtt {i.o.}\mathcal {DTIME}[T]\), where \(\ell (v)=4\cdot \log (v)\). Then, there exists an algorithm \(M^{co\mathcal {RP}}\) that for infinitely-many values of \(n\in \mathbb {N}\), when given as input \((1^n,C)\) such that C is a v-bit circuit of size at most \(\max \lbrace S^{(n)}(n),S^{(v)}(v)\rbrace\), the algorithm \(M^{co\mathcal {RP}}\) solves \((1,1/3)\text{-}\mathsf {CAPP}\) on C in time \(\mathrm{poly}(n)\cdot v\cdot \tilde{O}(S^{(n)}(n))\cdot T(\tilde{O}(S^{(n)}(n)))\).

Proof.

Let \(\mathtt {ql}(S)=\tilde{O}(S)\) such that circuits of size S can be described by strings of length \(\mathtt {ql}(S)\). For any \(n\in \mathbb {N}\), we consider inputs of length \(S^{(n)}(n)\) that describe v-bit circuits of size \(S^{(v)}(v)\). Let \(I_n=[2\mathtt {ql}(S^{(n)}(n)),2\mathtt {ql}(S^{(n)}(n+1))-1]\), and note that any sufficiently large integer belongs to a unique interval \(I_n\). Let \(M^{\mathsf {pCAPP}}\) be a time-T algorithm that solves \(\mathsf {pCAPP}[S^{(v)},\ell ]\) infinitely-often. We will use \(M^{\mathsf {pCAPP}}\) to construct the following search algorithm:

Claim 5.5.1

(Search-to-decision Reduction that Preserves the Input Length).

There exists an algorithm F that gets as input \((1^n,C,m)\), where C is a v-bit circuit of size at most \(\max \lbrace S^{(n)}(n),S^{(v)}(v)\rbrace\) and \(m\in I_n\), runs in time \(\mathrm{poly}(n)\cdot v\cdot T(m)\), and if \(M^{\mathsf {pCAPP}}\) correctly solves \(\mathsf {pCAPP}[S^{(v)},\ell ]\) on input length m and \(\Pr _x[C(x)=1]\le 1/3\), then \(F(1^n,C,m)\in C^{-1}(0)\).

Proof.

In the following, we will construct a set of m-bit inputs and run \(M^{\mathsf {pCAPP}}\) on each of those inputs. Since all of our inputs will be of the form \((C,\alpha ,\beta)\) where \(\alpha\) and \(\beta\) can be specified with \(4\cdot \log (v)\) bits, each input will be of size less than \(2\mathtt {ql}(S^{(n)}(n))\le m\); we will therefore pad each input to be of length exactly m.

First, we run \(M^{\mathsf {pCAPP}}\) on input \((C,1/2,1/3)\), and if \(M^{\mathsf {pCAPP}}\) accepts, then we output \(0^v\). Otherwise, if \(M^{\mathsf {pCAPP}}\) rejects, then we have that \(\Pr _x[C(x)=1]\le 1/2\); in this case, our goal will be to construct a string in \(C^{-1}(0)\), bit-by-bit. Let \(\lnot C\) be the circuit that computes C and negates the output, let \(\sigma _0\) be the empty string, and for \(i\in [v]\), in iteration i, we act as follows:

(1)

We start with a prefix \(\sigma _{i-1}\in \lbrace 0,1\rbrace ^{i-1}\), and with the guarantee that the circuit \(\lnot C_{\sigma _{i-1}}\), which is obtained by fixing the first \(i-1\) input variables of \(\lnot C\) to \(\sigma _{i-1}\), satisfies \(\Pr _x[\lnot C_{\sigma _{i-1}}(x)=1]\ge 1/2-(i-1)\cdot v^{-2}\).

(2)

We run \(M^{\mathsf {pCAPP}}\) at input \((\lnot C_{\sigma _{i-1}0},1/2-(i-1)\cdot v^{-2},1/2-i\cdot v^{-2})\). If \(M^{\mathsf {pCAPP}}\) accepts, then we define \(\sigma _i=\sigma _{i-1}0\), and otherwise, we define \(\sigma _i=\sigma _{i-1}1\).

(3)

To see that the guarantee on \(\lnot C_{\sigma _i}\) is preserved for iteration \(i+1\), note that, if \(M^{\mathsf {pCAPP}}\) accepted, then \(\Pr _x[\lnot C_{\sigma _{i}}(x)=1]\gt 1/2-i\cdot v^{-2}\); and otherwise, we have that \(\Pr _x[\lnot C_{\sigma _{i-1}0}(x)=1]\le 1/2-(i-1)\cdot v^{-2}\), which implies (by the guarantee on \(\lnot C_{\sigma _{i-1}}\) from the beginning of the iteration) that \(\Pr _x[\lnot C_{\sigma _i}(x)=1]\ge 1/2-(i-1)\cdot v^{-2}\).

After the v iterations, we have that \(\Pr _x[\lnot C_{\sigma _v}(x)=1]\gt 0\), and therefore \(\sigma _v\in (\lnot C)^{-1}(1)=C^{-1}(0)\) and we output \(\sigma _v\). The running time of each iteration is dominated by the call to \(M^{\mathsf {pCAPP}}\), so the total running time is \(\mathrm{poly}(n)\cdot v\cdot T(m)\).□

Our algorithm \(M^{co\mathcal {RP}}\) runs F at inputs \(\lbrace (1^n,C,k)\rbrace _{k\in I_n}\) and evaluates C at the outputs of F; if for some \(k\in I_n\) it holds that \(C(F(1^n,C,k))=0,\) then \(M^{co\mathcal {RP}}\) rejects, and otherwise \(M^{co\mathcal {RP}}\) accepts. The running time of \(M^{co\mathcal {RP}}\) is \(\mathrm{poly}(n)\cdot v\cdot T(2\mathtt {ql}(S^{(n)}(n+1))) \cdot |I_n| = \mathrm{poly}(n)\cdot \tilde{O}(S^{(n)}(n))\) \(\cdot\) \(v\cdot T(\tilde{O}(S^{(n)}(n)))\).

Now, fix \(n\in \mathbb {N}\) such that for some \(m\in I_n\) it holds that \(M^{\mathsf {pCAPP}}\) decides \(\mathsf {pCAPP}[S^{(v)},\ell ]\) on inputs of length m. To see that \(M^{co\mathcal {RP}}\) correctly solves \((1,1/3)\text{-}\mathsf {CAPP}\) on an input circuit C over v bits of size at most \(\max \lbrace S^{(n)}(n),S^{(v)}(v)\rbrace\), note that if C accepts all its inputs, then \(M^{co\mathcal {RP}}\) always accepts C; and if C accepts at most \(1/3\) of its inputs, then for the “good” \(m\in I_n\) it holds that \(F(1^n,C,m)\in C^{-1}(0)\), in which case \(M^{co\mathcal {RP}}\) rejects.
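The whole construction can be summarized in the following sketch; here C is treated as a callable v-bit circuit, and `run_pcapp(m, D, alpha, beta)` is a hypothetical wrapper that pads the instance \((D,\alpha ,\beta)\) to length exactly m and runs \(M^{\mathsf {pCAPP}}\) on it.

```python
def F(C, v, m, run_pcapp):
    """Sketch of the search algorithm of Claim 5.5.1: construct a string
    in C^{-1}(0) bit by bit, trusting run_pcapp only on the (unknown)
    good input length m."""
    if run_pcapp(m, C, 1/2, 1/3):
        return [0] * v           # Pr[C=1] may exceed 1/3: the promise is void
    sigma = []                   # invariant: Pr[~C_sigma = 1] >= 1/2 - |sigma|/v^2
    for i in range(1, v + 1):
        ext0 = lambda z, s=tuple(sigma): 1 - C(list(s) + [0] + list(z))
        if run_pcapp(m, ext0, 1/2 - (i - 1) / v**2, 1/2 - i / v**2):
            sigma.append(0)
        else:
            sigma.append(1)
    return sigma

def M_coRP(C, v, I_n, run_pcapp):
    """Sketch of the one-sided-error CAPP algorithm: try every candidate
    input length in I_n and reject iff some run of F exhibits a zero of C."""
    for m in I_n:
        if C(F(C, v, m, run_pcapp)) == 0:
            return False         # certified: C is not a YES instance
    return True
```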

5.1.2 A Strengthened Karp-Lipton Style Result for the “Low-end” Setting.

To prove our first strengthening of [3], let \(L\in \mathcal {EXP}\), and note that by our assumption \(L\in \mathcal {P}/\mathrm{poly}\). Consider an \(\mathcal {MA}\) verifier V that gets input \(1^n\), guesses a circuit \(C_L:\lbrace 0,1\rbrace ^{n}\rightarrow \lbrace 0,1\rbrace ^{}\), and tries to decide if \(C_L\) correctly computes \(L_n=L\cap \lbrace 0,1\rbrace ^{n}\). The key observation is that, since this decision problem (of deciding whether or not a given n-bit circuit computes \(L_n\)) is in \(\mathcal {EXP}\), we can apply the original Karp-Lipton style result of [3] to it. The latter result implies that there exists an \(\mathcal {MA}\) verifier M that decides whether or not \(C_L\) computes \(L_n\) correctly. Our verifier V guesses \(C_L\) and a witness for M, simulates M, and if M confirms that \(C_L\) computes \(L_n\), then V outputs \(C_L\).

We will derandomize the foregoing \(\mathcal {MA}\) verifier in one of two ways: The first relies on a hypothesis of the form \(pr\mathcal {BPP}\subseteq pr\mathcal {NSUBEXP}\), which immediately implies that \(\mathcal {MA}\subseteq \mathcal {NSUBEXP}\). The second relies on a hypothesis of the form \(pr\mathcal {BPP}\subset \mathtt {i.o.}pr\mathcal {SUBEXP}\); in this case, we derandomize the \(\mathcal {MA}\) verifier infinitely-often, relying on the fact that the \(\mathcal {MA}\) verifier can be assumed to have perfect completeness [18] and on Lemma 5.5 (which was presented in Section 5.1.1). Note that in both cases, the running time of the resulting non-deterministic machine is sub-exponential, but the size of the output circuit \(C_L\) is nevertheless still polynomial.

The following statement and proof generalize the above, using parametrized “collapse” and derandomization hypotheses. Specifically, if we assume that \(\mathcal {E}\subset \mathcal {SIZE}[S]\) and that \(pr\mathcal {BPP}\) can be derandomized in time T, then we deduce that \(\mathcal {E}\) has \(\mathcal {NTIME}[T^{\prime }]\)-uniform circuits of size \(S(n)\), where \(T^{\prime }(n)\approx T(S(S(n)))\).

Proposition 5.6 (A Strengthened “Low-end” Karp-Lipton Style Result).

There exist two constants \(k,k^{\prime }\gt 1\) such that for any size function \(S:\mathbb {N}\rightarrow \mathbb {N}\) and time function \(T:\mathbb {N}\rightarrow \mathbb {N}\) satisfying \(T(n)\ge n^{k^{\prime }}\) the following holds: Let \(T^{\prime }(n)=T(\bar{S}(n))^{O(1)}\) where \(\bar{S}(n)=\tilde{O}(S(\tilde{O}(S(n))))\).

(1)

If \(\mathcal {DTIME}[2^n]\subset \mathcal {SIZE}[S]\) and \(\mathsf {pCAPP}[v^{k}\cdot S(v),4\cdot \log (v)]\in \mathtt {i.o.}pr\mathcal {DTIME}[T]\), then any \(L\in \mathcal {DTIME}[2^n]\) can be decided on infinitely-many input lengths by \(\mathcal {NTIME}[T^{\prime }]\)-uniform circuits of size \(S(n)\).

(2)

If \(\mathcal {DTIME}[2^n]\subset \mathcal {SIZE}[S]\) and \((1,1/3)\text{-}\mathsf {CAPP}[v^{k}\cdot S(v)]\in pr\mathcal {NTIME}[T]\), then any \(L\in \mathcal {DTIME}[2^n]\) can be decided (on all input lengths) by \(\mathcal {NTIME}[T^{\prime }]\)-uniform circuits of size \(S(n)\).

Proof.

We first prove Item (1). Fix \(L\in \mathcal {DTIME}[2^n]\), and recall that by our hypothesis \(L\in \mathcal {SIZE}[S]\). We define a corresponding problem \(L\text{-}\mathtt {Ckts}\) as the set of size-S circuits that decide L; that is, denoting by \(\mathtt {ql}(S)=\tilde{O}(S)\) the description length of size-S circuits, on inputs of length \(N=n+\mathtt {ql}(S(n))\) we define \(L\text{-}\mathtt {Ckts}\) by \(\begin{align*} L\text{-}\mathtt {Ckts}_N = \left\lbrace (1^n,C) : |C|=\mathtt {ql}(S(n))\wedge \forall x\in \lbrace 0,1\rbrace ^{n}, C(x)=L(x) \right\rbrace \;\text{,} \end{align*}\) and on inputs of length N that cannot be parsed as \(N=n+\mathtt {ql}(S(n))\), we define \(L\text{-}\mathtt {Ckts}\) trivially. Note that \(L\text{-}\mathtt {Ckts}\in \mathcal {DTIME}[2^N]\), since we can enumerate the \(2^n\lt 2^{o(N)}\) inputs, and for each \(x\in \lbrace 0,1\rbrace ^{n}\) compute \(C(x)\) and \(L(x)\) in time \(2^n+\mathrm{poly}(|C|)\lt 2^{o(N)}\).

Given input \(1^n\), we first guess a circuit \(C^{(L)}_n\) of size \(S(n)\), in the hope that \(C^{(L)}_n\) decides \(L_n\); note that a suitable circuit exists by our hypothesis. Now, we consider the problem of deciding if \(x=(1^n,C^{(L)}_n)\in L\text{-}\mathtt {Ckts}\), where \(x\in \lbrace 0,1\rbrace ^{N=n+\mathtt {ql}(S(n))}\). Since \(L\text{-}\mathtt {Ckts}\in \mathcal {DTIME}[2^N]\), we can reduce \(L\text{-}\mathtt {Ckts}\) to the problem \(L^{\mathtt {nice}}\) from Proposition 3.12; that is, we compute in time \(\mathrm{poly}(N)\) an input \(x^{\prime }\in \lbrace 0,1\rbrace ^{N^{\prime }=O(N)}\) for \(L^{\mathtt {nice}}\) such that \(x\in L\text{-}\mathtt {Ckts}\iff x^{\prime }\in L^{\mathtt {nice}}\).

Now, let \(\bar{N}=\ell (N^{\prime })=O(N)\), where \(\ell\) is the query length of the instance checker \(\mathtt {IC}\) for \(L^{\mathtt {nice}}\). We guess another circuit, which is of size \(S(2\bar{N})\) and denoted \(C^{L^{\mathtt {nice}}}_{\bar{N}}:\lbrace 0,1\rbrace ^{\bar{N}}\rightarrow \lbrace 0,1\rbrace ^{}\), in the hope that \(C^{L^{\mathtt {nice}}}_{\bar{N}}\) decides \(L^{\mathtt {nice}}_{\bar{N}}\); again, a suitable circuit exists by our hypothesis.39 We then construct a circuit \(\mathtt {IC}^{C^{L^{\mathtt {nice}}}_{\bar{N}}}_{x^{\prime }}:\lbrace 0,1\rbrace ^{O(\bar{N})}\rightarrow \lbrace 0,1\rbrace ^{}\) that computes the decision of \(\mathtt {IC}\) at input \(x^{\prime }\) and with oracle \(C^{L^{\mathtt {nice}}}_{\bar{N}}\), as a function of the \(O(\bar{N})\) random coins of \(\mathtt {IC}\), and maps the outputs \(\lbrace 0,\perp \rbrace\) of \(\mathtt {IC}\) to 0, and the output 1 of \(\mathtt {IC}\) to 1.

Note that the circuit \(\mathtt {IC}^{C^{L^{\mathtt {nice}}}_{\bar{N}}}_{x^{\prime }}\) is over \(v=O(\bar{N})\) input bits and of size \(S^{(n)}(n){\overset{def}{=}}\mathrm{poly}(N)\cdot S(2\bar{N})\). Also, measuring the size of \(\mathtt {IC}^{C^{L^{\mathtt {nice}}}_{\bar{N}}}_{x^{\prime }}\) as a function of its number of input bits (i.e., of v), the size is upper-bounded by \(S^{(v)}(v){\overset{def}{=}}v^{k}\cdot S(v)\), where \(k\in \mathbb {N}\) is a sufficiently large universal constant (and we assume without loss of generality that \(v\ge 2\bar{N}\)). By the properties of the instance checker, and using the fact that a suitable circuit \(C^{L^{\mathtt {nice}}}_{\bar{N}}\) for \(L^{\mathtt {nice}}_{\bar{N}}\) exists, we have that:

(1)

If \(C^{(L)}_n\) decides L, then \(x^{\prime }\in L^{\mathtt {nice}}\), and hence for some guess of \(C^{L^{\mathtt {nice}}}_{\bar{N}}\) the circuit \(\mathtt {IC}^{C^{L^{\mathtt {nice}}}_{\bar{N}}}_{x^{\prime }}\) will have acceptance probability one.

(2)

If \(C^{(L)}_n\) does not decide L, then \(x^{\prime }\notin L^{\mathtt {nice}}\), and hence for all guesses of \(C^{L^{\mathtt {nice}}}_{\bar{N}}\) the circuit \(\mathtt {IC}^{C^{L^{\mathtt {nice}}}_{\bar{N}}}_{x^{\prime }}\) accepts at most \(1/6\) of its inputs.

Using our hypothesis about \(\mathsf {pCAPP}\) and Lemma 5.5, there exists an algorithm \(M^{co\mathcal {RP}}\) that for infinitely-many values of \(n\in \mathbb {N}\) gets input \((1^n,\mathtt {IC}^{C^{L^{\mathtt {nice}}}_{\bar{N}}}_{x^{\prime }})\) and solves \((1,1/3)\text{-}\mathsf {CAPP}\) on \(\mathtt {IC}^{C^{L^{\mathtt {nice}}}_{\bar{N}}}_{x^{\prime }}\) in time \(\mathrm{poly}(n)\cdot v\cdot \tilde{O}(S(n))\cdot T(\tilde{O}(S^{(n)}(n)))\). We run this algorithm on \((1^n,\mathtt {IC}^{C^{L^{\mathtt {nice}}}_{\bar{N}}}_{x^{\prime }})\), and if it accepts (i.e., asserts that the acceptance probability of \(\mathtt {IC}^{C^{L^{\mathtt {nice}}}_{\bar{N}}}_{x^{\prime }}\) is larger than \(1/3\)), then we output the circuit \(C^{(L)}_n\); otherwise, we output \(\perp\).

Note that the size of the circuit that we output is \(S(n)\), and that our running time is at most \(\begin{align*} \mathrm{poly}(n)\cdot v\cdot \tilde{O}(S(n))\cdot T\left(\tilde{O}(S^{(n)}(n))\right) &= \mathrm{poly}(n)\cdot \tilde{O}(S(n))^2\cdot T\left(\tilde{O}(S(\tilde{O}(S(n)))) \right) \\ &\le T\left(\tilde{O}(S(\tilde{O}(S(n)))) \right)^{O(1)} \;\text{,} \end{align*}\) where the last inequality relied on the fact that \(T(n)\ge n^{k^{\prime }}\) for a sufficiently large constant \(k^{\prime }\).

Let us now explain how to prove Item (2). We guess \(C^{(L)}_n\) and \(C^{L^{\mathtt {nice}}}_{\bar{N}}\) and construct \(\mathtt {IC}^{C^{L^{\mathtt {nice}}}_{\bar{N}}}_{x^{\prime }}\) as above. However, instead of using Lemma 5.5, we run the hypothesized non-deterministic \((1,1/3)\text{-}\mathsf {CAPP}[v^k\cdot S(v)]\) machine, denoted \(M^{co\mathcal {RP}}\), on input \(\mathtt {IC}^{C^{L^{\mathtt {nice}}}_{\bar{N}}}_{x^{\prime }}\) (the advantage in the current setting being that, in contrast to the proof of Item (1), the machine \(M^{co\mathcal {RP}}\) is guaranteed to work on all input lengths). When \(C^{(L)}_n\) decides \(L_n\) there are some non-deterministic choices that will cause \(M^{co\mathcal {RP}}\) to accept, whereas when \(C^{(L)}_n\) does not decide \(L_n\), all non-deterministic choices will cause \(M^{co\mathcal {RP}}\) to reject. Our running time is \(T(\tilde{O}(S^{(n)}(n)))\), which can be bounded as above by \(T(\tilde{O}(S(\tilde{O}(S(n)))))^{O(1)}\).□

Note that, in the proof of Proposition 5.6, we did not use the fact that \(L^{\mathtt {nice}}\) is randomly self-reducible, but only the facts that \(L^{\mathtt {nice}}\) is complete for \(\mathcal {E}\) under linear-time reductions (such that all n-bit inputs are mapped to \(n^{\prime }\)-bit inputs, for \(n^{\prime }=O(n)\)) and that it has an instance checker with query length \(\ell (n)=O(n)\).

5.1.3 A Strengthened Karp-Lipton Style Result for the “High-end” Setting.

The result presented next asserts that if \(\mathcal {E}\subset \mathcal {SIZE}[S]\) and \(pr\mathcal {BPP}\) can be derandomized in time T, then \(\mathcal {E}\) has \(\mathcal {NTIME}[T^{\prime }]\)-uniform circuits (with a trivial size bound of \(T^{\prime }(n)\)), where \(T^{\prime }\approx T(S(n))\). The main difference between this result and the result presented in Section 5.1.2, other than the differences in parameters, is that for this result, we will need to assume that \(pr\mathcal {BPP}\) can be derandomized deterministically, rather than only non-deterministically.

Let us briefly describe the proof idea. We construct a circuit for an \(\mathcal {E}\)-complete problem \(L^{\mathtt {nice}}\) that has an instance checker and that is randomly self-reducible (see Section 3.5 for definitions and details). We guess a circuit \(C^{L^{\mathtt {nice}}}\) for \(L^{\mathtt {nice}}\), which exists by our “collapse” hypothesis, and randomly check whether or not this circuit “convinces” the instance checker on almost all inputs; if it does, then we instantiate the instance checker with \(C^{L^{\mathtt {nice}}}\) as an oracle, to obtain a “corrupt” version of \(L^{\mathtt {nice}}\), denoted \(\tilde{L}\). We then construct a probabilistic circuit \(C^{\prime }\) that decides \(L^{\mathtt {nice}}\), with high probability, using the random self-reducibility of \(L^{\mathtt {nice}}\) and oracle access to \(\tilde{L}\).

Now, under the hypothesis \(pr\mathcal {BPP}\subseteq pr\mathcal {DTIME}[T]\), we can derandomize the two probabilistic steps in the foregoing construction. Specifically, we derandomize the probabilistic verification that the circuit \(C^{L^{\mathtt {nice}}}\) “convinces” the instance checker on almost all inputs, and we also derandomize the probabilistic circuit itself (i.e., we actually output a deterministic circuit that constructs the probabilistic circuit \(C^{\prime }\) and applies a deterministic \(\mathsf {CAPP}\) algorithm to \(C^{\prime }\)). Details follow.

Proposition 5.7 (A Strengthened “High-end” Karp-Lipton Style Result).

There exist two constants \(k,k^{\prime }\gt 1\) such that for any size function \(S:\mathbb {N}\rightarrow \mathbb {N}\) and time function \(T:\mathbb {N}\rightarrow \mathbb {N}\) the following holds: Assume that \(\mathcal {DTIME}[2^n]\subset \mathtt {i.o.}\mathcal {SIZE}[S]\) and that \(\mathsf {CAPP}[v^{k^{\prime }}\cdot S(v)]\in pr\mathcal {DTIME}[T]\). Then any \(L\in \mathcal {DTIME}[2^n]\) can be decided on infinitely-many input lengths by \(\mathcal {NTIME}[T^{\prime }]\)-uniform circuits, where \(T^{\prime }(n)=\tilde{O}\!(T(n^k\cdot S(k\cdot n)))\).

Note that the actual hypothesis of Proposition 5.7 is weaker than the hypothesis \(pr\mathcal {BPP}\in pr\mathcal {DTIME}[T]\), since we only require an algorithm for \(\mathsf {CAPP}\) for large circuits (i.e., for v-bit circuits of size \(\mathrm{poly}(v)\cdot S(v)\)).

Proof of Proposition 5.7

Fixing any \(L\in \mathcal {DTIME}[2^n]\), we prove that there exist \(\mathcal {NTIME}[T^{\prime }]\)-uniform circuits that solve L infinitely-often. In what follows, it will be important to distinguish between the non-deterministic machine M, and the deterministic circuit \(C:\lbrace 0,1\rbrace ^{n}\rightarrow \lbrace 0,1\rbrace ^{}\) that M constructs. The machine M gets input \(1^n\) and constructs C as follows:

Step 1: Reduce L to \(L^{\mathtt {nice}}\). As its first step, the circuit C computes the linear-time reduction from L to the problem \(L^{\mathtt {nice}}\) from Proposition 3.12; that is, C maps its input \(x\in \lbrace 0,1\rbrace ^{n}\) into \(x^{\prime }\in \lbrace 0,1\rbrace ^{n^{\prime }}\), where \(n^{\prime }=O(n)\), such that \(x\in L\) if and only if \(x^{\prime }\in L^{\mathtt {nice}}\).

Step 2: Guess-and-verify a circuit for \(L^{\mathtt {nice}}_{\bar{n}}\). Let \(\mathtt {IC}\) be the instance checker for \(L^{\mathtt {nice}}\) and let \(\bar{n}=\ell (n^{\prime })\) be the length of queries that \(\mathtt {IC}\) makes to its oracle on inputs of length \(n^{\prime }\).

Claim 5.7.1.

For infinitely-many input lengths n there exists a circuit \(C^{L^{\mathtt {nice}}}_{\bar{n}}:\lbrace 0,1\rbrace ^{\bar{n}}\rightarrow \lbrace 0,1\rbrace ^{}\) of size \(S(4\bar{n})\) that decides \(L^{\mathtt {nice}}_{\bar{n}}\).

Proof.

For every \(n\in \mathbb {N},\) let \(I_{n}=[2\alpha \cdot n,2\alpha \cdot (n+1)-1]\), where \(\alpha \in \mathbb {N}\) is the constant such that \(\bar{n}=\ell (n^{\prime })=\alpha \cdot n\). Note that every sufficiently large integer \(m\in \mathbb {N}\) belongs to a unique interval \(I_{n}\) (i.e., \(n=\lfloor m/2\alpha \rfloor\)). We define \(L^{\prime }\) to be the language that on input length \(m\in I_{n}\) considers only its first \(\bar{n}=\alpha \cdot n\) input bits and decides \(L^{\mathtt {nice}}_{\bar{n}}\) on those input bits. Since \(L^{\prime }\) on input length m can be decided in time \(\tilde{O}(2^n)\lt 2^m\), by our hypothesis there exists an infinite set \(\mathcal {M}\subseteq \mathbb {N}\) of input lengths such that for every \(m\in \mathcal {M}\) there exist size-\(S(m)\) circuits for \(L^{\prime }_{m}\). For every such \(m\in I_{n}\), we hard-wire the last \(m-\bar{n}\) input bits (to be all-zeroes) and obtain a circuit of size \(S(m)\lt S(4\alpha \cdot n)= S(4\bar{n})\) that decides \(L^{\mathtt {nice}}_{\bar{n}}\).□

Thus, if n is one of the infinitely-many input lengths mentioned in Claim 5.7.1, then there exists \(C^{L^{\mathtt {nice}}}_{\bar{n}}:\lbrace 0,1\rbrace ^{\bar{n}}\rightarrow \lbrace 0,1\rbrace\) of size \(S(4\bar{n})\) that decides \(L^{\mathtt {nice}}_{\bar{n}}\). The machine M non-deterministically guesses such a circuit. We define the corruption of \(C^{L^{\mathtt {nice}}}_{\bar{n}}\) by \(\begin{align*} \mathtt {Crpt}\left(C^{L^{\mathtt {nice}}}_{\bar{n}}\right)=\Pr _{z\in \lbrace 0,1\rbrace ^{n^{\prime }}}\left[ \Pr [\mathtt {IC}^{C^{L^{\mathtt {nice}}}_{\bar{n}}}(z)=\perp ]\gt 1/6 \right] \;\text{,} \end{align*}\) where the internal probability is over the random choices of the machine \(\mathtt {IC}\). Let \(\mathtt {Dec}\) be the machine underlying the random self-reducibility of \(L^{\mathtt {nice}}\), and let \(c\in \mathbb {N}\) be such that the number of queries that \(\mathtt {Dec}\) makes on inputs of length \(n^{\prime }\) is at most \((n^{\prime })^c\). Consider the following promise problem \(\Pi\):

  • The input is guaranteed to be a circuit \(C^{L^{\mathtt {nice}}}_{\bar{n}}:\lbrace 0,1\rbrace ^{\bar{n}}\rightarrow \lbrace 0,1\rbrace ^{}\) of size \(S(4\bar{n})\).

  • YES instances: The circuit \(C^{L^{\mathtt {nice}}}_{\bar{n}}\) decides \(L^{\mathtt {nice}}_{\bar{n}}\), in which case \(\mathtt {Crpt}(C^{L^{\mathtt {nice}}}_{\bar{n}})=0\).

  • NO instances: It holds that \(\mathtt {Crpt}(C^{L^{\mathtt {nice}}}_{\bar{n}}) \gt (n^{\prime })^{-2c}\).

Now, note that \(\Pi \in pr\text{-}co\mathcal {RP}\), since a probabilistic algorithm that gets \(C^{L^{\mathtt {nice}}}_{\bar{n}}\) as input can decide whether \(C^{L^{\mathtt {nice}}}_{\bar{n}}\) is a YES instance or a NO instance by sampling z’s and estimating \(\Pr [\mathtt {IC}^{C^{L^{\mathtt {nice}}}_{\bar{n}}}(z)=\perp ]\) for each z. Moreover, using the sampler from Theorem 3.5, there is a probabilistic \(co\mathcal {RP}\) algorithm for \(\Pi\) that on input \(C^{L^{\mathtt {nice}}}_{\bar{n}}:\lbrace 0,1\rbrace ^{\bar{n}}\rightarrow \lbrace 0,1\rbrace ^{}\) of size \(S(4\bar{n})\) uses \(m=O(n)\) random bits and runs in time \(\mathrm{poly}(n)\cdot S(4\bar{n})\).40

Hence, the problem \(\Pi\) is reducible to an instance of \((1,1/3)\text{-}\mathsf {CAPP}\) with a circuit \(C_{\Pi }\) on \(v=O(n)\) input bits and of size \(n^{O(1)}\cdot S(4\bar{n})=v^{O(1)}\cdot S(v)\). The machine M runs the hypothesized \(\mathsf {CAPP}[v^{k^{\prime }}\cdot S(v)]\) algorithm on \(C_{\Pi }\), which takes time \(T(n^{O(1)}\cdot S(O(n)))\), and rejects iff the \(\mathsf {CAPP}\) algorithm rejects. Thus, from now on, we can assume that \(C^{L^{\mathtt {nice}}}_{\bar{n}}\) is not a NO instance of \(\Pi\), or in other words that \(\mathtt {Crpt}(C^{L^{\mathtt {nice}}}_{\bar{n}})\le (n^{\prime })^{-2c}\).
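For intuition, the probabilistic test underlying \(\Pi \in pr\text{-}co\mathcal {RP}\) (before its derandomization via the \(\mathsf {CAPP}\) algorithm) can be sketched as follows; `run_IC` and the sample sizes are hypothetical stand-ins, and a full implementation would use the sampler of Theorem 3.5 with \(\mathrm{poly}(n^{\prime })\) samples to detect corruption \((n^{\prime })^{-2c}\).

```python
import random

def corruption_test(C_nice, n_prime, run_IC, outer=10**4, inner=120):
    """Sketch of the test for Pi: sample inputs z and estimate, for each,
    the probability that the instance checker outputs 'bot' with oracle
    C_nice. Thresholding at 1/4 leaves slack over the promised 1/6, so a
    YES instance (Crpt = 0) is rejected only with exponentially small
    probability. run_IC(z, oracle, coins) is a hypothetical runner."""
    for _ in range(outer):
        z = random.getrandbits(n_prime)
        bots = sum(run_IC(z, C_nice, random.getrandbits(64)) == 'bot'
                   for _ in range(inner))
        if bots / inner > 1/4:
            return False    # evidence that Crpt(C_nice) is large: reject
    return True
```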

Step 3: Transforming a non-corrupt \(C^{L^{\mathtt {nice}}}_{\bar{n}}\) into a probabilistic circuit for L. Given that \(\mathtt {Crpt}(C^{L^{\mathtt {nice}}}_{\bar{n}})\le (n^{\prime })^{-2c}\), the machine M now transforms \(C^{L^{\mathtt {nice}}}\) into a probabilistic circuit \(C^{\prime }\) that computes L. At a high level, the circuit \(C^{\prime }\) simulates the random self-reducibility algorithm \(\mathtt {Dec}\) for L while resolving the random queries of \(\mathtt {Dec}\) by instantiating the instance checker with oracle \(C^{L^{\mathtt {nice}}}\). Details follow.

Lemma 5.7.2

(Non-corrupt \(C^{L^{\mathtt {nice}}}_{\bar{n}}\) → Probabilistic Circuit for \(L^{\mathtt {nice}}\))

There exists an algorithm that gets as input \(1^n\) and a circuit \(C^{L^{\mathtt {nice}}}_{\bar{n}}:\lbrace 0,1\rbrace ^{\bar{n}}\rightarrow \lbrace 0,1\rbrace ^{}\) of size \(S(4\bar{n})\) such that \(\mathtt {Crpt}(C^{L^{\mathtt {nice}}}_{\bar{n}})\le (n^{\prime })^{-2c}\) and outputs a probabilistic circuit \(C^{\prime }:\lbrace 0,1\rbrace ^{n^{\prime }}\rightarrow \lbrace 0,1\rbrace ^{}\) of size \(\mathrm{poly}(n)\cdot S(4\bar{n})\) that uses \(O(n)\) random coins such that for every \(x^{\prime }\in \lbrace 0,1\rbrace ^{n^{\prime }}\), with high probability over choice of random coins r for \(C^{\prime }\) it holds that \(C^{\prime }(x^{\prime },r)=L^{\mathtt {nice}}(x^{\prime })\).

Proof.

We consider an instantiation of \(\mathtt {IC}\) on inputs of length \(n^{\prime }\) and with oracle to \(C^{L^{\mathtt {nice}}}_{\bar{n}}\), and as a first step, we reduce the error of this algorithm. Let \(m=O(n)\) be the number of random bits that \(\mathtt {IC}\) uses on inputs of length \(n^{\prime }\). Consider the following probabilistic algorithm \(\hat{\mathtt {IC}}:\lbrace 0,1\rbrace ^{n^{\prime }}\rightarrow \lbrace 0,1,\perp \rbrace\). Given input \(z\in \lbrace 0,1\rbrace ^{n^{\prime }}\), the algorithm \(\hat{\mathtt {IC}}\) uses the sampler from Theorem 3.5, instantiated for output length m and with accuracy \(1/n\), to obtain a sample of \(D=\mathrm{poly}(n)\) strings \(r_1,\ldots ,r_{D}\in \lbrace 0,1\rbrace ^m\); then \(\hat{\mathtt {IC}}\) outputs the majority vote among the values \(\lbrace v_i\rbrace _{i\in [D]}\), where \(v_i\) is the output of \(\mathtt {IC}\) when instantiated on input z with oracle \(C^{L^{\mathtt {nice}}}_{\bar{n}}\) and fixed randomness \(r_i\).

Note that \(\hat{\mathtt {IC}}\) uses \(O(n)\) random bits and runs in time \(\mathrm{poly}(n)\cdot S(4\bar{n})\). We claim that there exists a set \(G\subseteq \lbrace 0,1\rbrace ^{n^{\prime }}\) of density \(1-(n^{\prime })^{-2c}\) such that for every \(z\in G\), with probability at least \(1-\exp (-n)\) over the randomness of \(\hat{\mathtt {IC}}\) it holds that \(\hat{\mathtt {IC}}(z)=L^{\mathtt {nice}}(z)\). To see this, let G be the set of z’s such that \(\Pr [\mathtt {IC}^{C^{L^{\mathtt {nice}}}_{\bar{n}}}(z)=\perp ]\le 1/6\), and recall that the density of G is at least \(1-(n^{\prime })^{-2c}\). Note that for any \(z\in G\), we have that \(\Pr [\mathtt {IC}^{C^{L^{\mathtt {nice}}}_{\bar{n}}}(z)=L^{\mathtt {nice}}(z)]\ge 2/3\), because \(\Pr [\mathtt {IC}^{C^{L^{\mathtt {nice}}}_{\bar{n}}}(z)\ne L^{\mathtt {nice}}(z)]\le \Pr [\mathtt {IC}^{C^{L^{\mathtt {nice}}}_{\bar{n}}}(z)=\perp ]+\Pr [\mathtt {IC}^{C^{L^{\mathtt {nice}}}_{\bar{n}}}(z)=\lnot L^{\mathtt {nice}}(z)]\le 1/3\). Thus, for any fixed \(z\in G\), the probability (over the random choices of \(\hat{\mathtt {IC}}\)) that the majority vote of the \(v_i\)’s will not equal \(L^{\mathtt {nice}}(z)\) is at most \(\exp (-n)\).

Now, consider a probabilistic circuit \(C^{\prime }:\lbrace 0,1\rbrace ^{n^{\prime }}\rightarrow \lbrace 0,1\rbrace ^{}\) that chooses \(O(n)\) random bits to be used as randomness for \(\hat{\mathtt {IC}}\) and simulates the random self-reducibility algorithm \(\mathtt {Dec}\) on its input \(x^{\prime }\in \lbrace 0,1\rbrace ^{n^{\prime }}\) while answering its queries using the algorithm \(\hat{\mathtt {IC}}\) with the fixed random bits chosen in advance. Note that the circuit \(C^{\prime }\) is of size \(\mathrm{poly}(n)\cdot S(4\bar{n})\). We claim that for every \(x^{\prime }\in \lbrace 0,1\rbrace ^{n^{\prime }}\), with high probability \(C^{\prime }(x^{\prime })=L^{\mathtt {nice}}(x^{\prime })\). To see this, recall that \(\mathtt {Dec}\) makes at most \((n^{\prime })^c\) queries such that each query is uniformly distributed, and thus the probability that all queries of \(\mathtt {Dec}\) lie in the set G is at least \(1-(n^{\prime })^{-c}\). Conditioned on this event, for each fixed query z, the probability over choice of randomness for \(\hat{\mathtt {IC}}\) that \(\hat{\mathtt {IC}}(z)\) does not output \(L^{\mathtt {nice}}(z)\) is at most \(\exp (-n)\). Hence, by another union-bound, with high probability, all the queries of \(\mathtt {Dec}\) are answered correctly, in which case \(C^{\prime }(x^{\prime })=L^{\mathtt {nice}}(x^{\prime })\).□
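The two objects built in this proof, \(\hat{\mathtt {IC}}\) and \(C^{\prime }\), can be sketched as follows; `run_IC`, `Dec`, and the list of coin strings (standing in for the sampler of Theorem 3.5) are hypothetical.

```python
def IC_hat(z, C_nice, coin_strings, run_IC):
    """Sketch of the error-reduced checker: majority vote of the instance
    checker over a fixed sample of coin strings; outputs in {0, 'bot'}
    count against 1. All helpers are hypothetical stand-ins."""
    ones = sum(run_IC(z, C_nice, r) == 1 for r in coin_strings)
    return 1 if 2 * ones > len(coin_strings) else 0

def C_prime(x_prime, coin_strings, Dec, C_nice, run_IC):
    """Sketch of the probabilistic circuit C': simulate the random
    self-reduction Dec on x', answering each of its uniformly distributed
    queries with IC_hat, reusing the same coins for every query."""
    return Dec(x_prime, lambda z: IC_hat(z, C_nice, coin_strings, run_IC))
```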

Step 4: Derandomizing \(C^{\prime }\). The non-deterministic machine guessed-and-verified a circuit \(C^{L^{\mathtt {nice}}}_{\bar{n}}:\lbrace 0,1\rbrace ^{\bar{n}}\rightarrow \lbrace 0,1\rbrace ^{}\) such that \(\mathtt {Crpt}(C^{L^{\mathtt {nice}}}_{\bar{n}})\le (n^{\prime })^{-2c}\), and transformed it (using the algorithm from Lemma 5.7.2) into a probabilistic circuit \(C^{\prime }\). The machine M then constructs the final circuit C, which gets input \(x\in \lbrace 0,1\rbrace ^{n}\) and acts as follows:

(1)

Computes the reduction from L to \(L^{\mathtt {nice}}\) to obtain \(x^{\prime }\in \lbrace 0,1\rbrace ^{n^{\prime }}\).

(2)

Hard-wires \(x^{\prime }\) into \(C^{\prime }\) to obtain a description of a circuit \(C^{\prime }_{x^{\prime }}:\lbrace 0,1\rbrace ^{O(n)}\rightarrow \lbrace 0,1\rbrace ^{}\) such that \(C^{\prime }_{x^{\prime }}(r)=C^{\prime }(x^{\prime },r)\).

(3)

Runs the hypothesized \(\mathsf {CAPP}[v^{k^{\prime }}\cdot S(v)]\) algorithm on \(C^{\prime }_{x^{\prime }}\) and outputs its decision.

Note that \(C^{\prime }_{x^{\prime }}\) is a circuit with \(v=O(n)\) input bits and of size \(\mathrm{poly}(n)\cdot S(4\bar{n})=v^{O(1)}\cdot S(v)\), and therefore for an appropriate choice of constant \(k^{\prime }\), the \(\mathsf {CAPP}[v^{k^{\prime }}\cdot S(v)]\) algorithm distinguishes between the case that \(C^{\prime }\) accepts \(x^{\prime }\) with high probability and the case that \(C^{\prime }\) rejects \(x^{\prime }\) with high probability. Thus, for every \(x\in \lbrace 0,1\rbrace ^{n}\) it holds that \(C(x)=L(x)\). Finally, both the size of the circuit C and the running time of our non-deterministic machine are bounded by \(\tilde{O}(T(n^{O(1)}\cdot S(O(n))))\).
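Putting the steps together, the deterministic circuit C that M outputs has the following shape; all three helpers are hypothetical stand-ins for the objects constructed in the proof.

```python
def C(x, reduce_to_nice, hardwire_into_Cprime, capp):
    """Sketch of the final circuit: reduce x to x' (Step 1), hard-wire x'
    into the probabilistic circuit C' so that only the random coins remain
    as inputs (Step 2), and decide by running the hypothesized
    deterministic CAPP algorithm on the resulting circuit (Step 4)."""
    x_prime = reduce_to_nice(x)               # linear-time reduction to L^nice
    C_coins = hardwire_into_Cprime(x_prime)   # C'_{x'}(r) = C'(x', r)
    return capp(C_coins)                      # accept iff acceptance prob. is high
```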

5.2 Proof of Theorems 1.4, 1.5, and 1.6

We now prove the main theorems from Section 1.3. We will first prove Theorem 1.4, which refers to the “low-end” parameter setting: Subexponential-time derandomization of \(pr\mathcal {BPP}\) and lower bounds for polynomial-sized circuits against \(\mathcal {EXP}\).

Theorem 5.8 (Theorem 1.4, Restated).

Assume that there exists \(\delta \gt 0\) such that \(\mathcal {DTIME}[2^n]\) cannot be decided by \(\mathcal {NTIME}[2^{n^{\delta }}]\)-uniform circuits of an arbitrarily large polynomial size, even infinitely-often. Then, denoting \(pr \mathcal {SUBEXP}=\cap _{\epsilon \gt 0}pr\mathcal {DTIME}[2^{n^{\epsilon }}]\), we have that \(\begin{align*} \cup _c\mathsf {pCAPP}[v^c,4\cdot \log (v)]\in \mathtt {i.o.}pr\mathcal {SUBEXP}\iff \mathcal {EXP}\not\subset \mathcal {P}/\mathrm{poly}\;\text{.} \end{align*}\)

Proof.

We prove the two directions in turn. The “\(\Longleftarrow\)” direction follows from [3], relying on the fact that \(\cup _c\mathsf {pCAPP}[v^c,4\cdot \log (v)]\in pr\mathcal {BPP}\). For the “\(\Longrightarrow\)” direction, assume that for every \(c\in \mathbb {N}\) and every \(\epsilon \gt 0\) it holds that \(\mathsf {pCAPP}[v^c,4\cdot \log (v)]\in \mathtt {i.o.}pr\mathcal {DTIME}[2^{n^{\epsilon }}]\). Assuming towards a contradiction that \(\mathcal {EXP}\subset \mathcal {P}/\mathrm{poly}\), we have that \(\mathcal {DTIME}[2^n]\subset \mathcal {SIZE}[n^c]\) for some \(c\in \mathbb {N}\). We use Item (1) of Proposition 5.6 with parameters \(S(n)=n^c\) and \(T(n)=2^{n^{\epsilon }}\), where \(\epsilon \gt 0\) is sufficiently small. We deduce that \(\mathcal {DTIME}[2^n]\) can be decided infinitely-often by \(\mathcal {NTIME}[T^{\prime }]\)-uniform circuits of size \(n^c\), where \(\begin{align*} T^{\prime }(n)\le T(\tilde{O}(S(\tilde{O}(S(n)))))^{O(1)} \lt T(n^{O_{c}(1)})^{O(1)} = 2^{n^{\epsilon \cdot O_{c}(1)}} \;\text{,} \end{align*}\) which contradicts our hypothesis if \(\epsilon\) is sufficiently small.□

We now prove Theorem 5.9, which refers to a “high-end” parameter setting (i.e., faster derandomization and lower bounds for larger circuits). We will in fact show that, conditioned on the hypothesis that \(\mathcal {E}\) cannot be decided by \(\mathcal {NTIME}[2^{\Omega (n)}]\)-uniform circuits, even a weaker derandomization hypothesis is already equivalent to circuit lower bounds. For example, instead of assuming that \(pr\mathcal {BPP}=pr\mathcal {P}\), we will only need to assume that \(\mathsf {CAPP}\) for v-bit circuits of size \(2^{\Omega (v)}\) can be solved deterministically in time \(2^{\alpha \cdot v}\) for some small constant \(\alpha \gt 0\).41

Theorem 5.9 (Theorem 1.5, Restated).

Assume that there exists \(\delta \gt 0\) such that \(\mathcal {E}\) cannot be decided by \(\mathcal {NTIME}[2^{\delta \cdot n}]\)-uniform circuits even infinitely-often. Then:

(1)

There exists a universal constant \(c\gt 1\) such that \(\begin{align*} \exists \epsilon \gt 0 : \mathsf {CAPP}[2^{\epsilon \cdot v}]\in pr\mathcal {DTIME}[n^{(\delta /c)/\epsilon }] \iff \exists \epsilon \gt 0 : \mathcal {E}\not\subset \mathtt {i.o.}\mathcal {SIZE}[2^{\epsilon \cdot n}] \;\text{.} \end{align*}\)

(2)

For every fixed constant \(c\gt 1\) it holds that \(\begin{align*} \exists \alpha \gt 1 : \mathsf {CAPP}[2^{v^{1/c}}] \in pr \mathcal {DTIME}[2^{\alpha \cdot (\log n)^{c}}] \iff \exists \epsilon \gt 0 : \mathcal {E}\not\subset \mathtt {i.o.}\mathcal {SIZE}[2^{\epsilon \cdot n^{1/c}}] \;\text{.} \end{align*}\)

Proof.

We first prove Item (1). The “\(\Longleftarrow\)” direction follows from [34] (or, alternatively, from the more general Corollary 3.3). Specifically, the hypothesized circuit lower bound implies that \(pr\mathcal {BPP}=pr\mathcal {P}\), and in particular that \(\mathsf {CAPP}\in pr\mathcal {DTIME}[n^{c^{\prime }}]\) for some \(c^{\prime }\in \mathbb {N}\). The conclusion then holds for \(\epsilon \lt \frac{\delta }{c\cdot c^{\prime }}\). For the “\(\Longrightarrow\)” direction, let \(k,k^{\prime }\in \mathbb {N}\) be as in Proposition 5.7, and let \(c=2k\). Assume that for some \(\epsilon \gt 0\) it holds that \(\mathsf {CAPP}[S^{\prime }]\in pr\mathcal {DTIME}[T]\), where \(T(n)=n^{(\delta /c)/\epsilon }\), \(S(n)=2^{\epsilon \cdot n}/n^{k^{\prime }}\), and \(S^{\prime }(v)=v^{k^{\prime }}\cdot S(v)=2^{\epsilon \cdot v}\). Assuming towards a contradiction that \(\mathcal {E}\subset \mathtt {i.o.}\mathcal {SIZE}[S]\), Proposition 5.7 implies that \(\mathcal {DTIME}[2^n]\) can be decided infinitely-often by \(\mathcal {NTIME}[T^{\prime }]\)-uniform circuits, where \(T^{\prime }(n)=\tilde{O}(T(n^k\cdot S(k\cdot n))) \lt 2^{\delta \cdot n}\); this is a contradiction.

The proof of Item (2) is similar. The “\(\Longleftarrow\)” follows from Corollary 3.3, instantiated with \(S(n)=2^{\epsilon \cdot n^{1/c}}\), to deduce that \(\mathsf {CAPP}\in pr\mathcal {DTIME}[T]\) for \(T(n)=2^{{\Delta }\cdot S^{-1}(n^{{\Delta }})}=2^{({\Delta }/\epsilon)^{c}\cdot (\log n)^{c}}\). For the “\(\Longrightarrow\)” direction, let \(\epsilon \lt (\delta /k\alpha)^{1/c}\) be sufficiently small, let \(S(n)=2^{\epsilon \cdot n^{1/c}}/ n^{k^{\prime }}\), let \(S^{\prime }(v)=v^{k^{\prime }}\cdot S(v)=2^{v^{1/c}}\), and let \(T(n)=2^{\alpha \cdot (\log n)^{c}}\). We use Proposition 5.7 as above, and rely on the fact that \(T^{\prime }(n)=\tilde{O}(T(n^k\cdot S(k\cdot n)))\lt 2^{\delta \cdot n}\).□

Next, we prove Theorem 1.6, which asserts that if non-deterministic derandomization implies lower bounds against \(\mathcal {EXP}\), then \(\mathcal {EXP}\) does not have \(\mathcal {NP}\)-uniform circuits. We will actually prove a stronger result: First, we will use a weaker hypothesis than in Theorem 1.6, namely, that \(pr\mathcal {BPP}\subseteq pr\mathcal {NP}\) implies circuit lower bounds against \(\mathcal {EXP}\); and second, we will deduce the stronger conclusion that \(\mathcal {EXP}\not\subseteq (\mathcal {NP}\cap \mathcal {P}/\mathrm{poly})\). (This conclusion is stronger, because the class of problems decidable by \(\mathcal {NP}\)-uniform circuits is a subclass of \(\mathcal {NP}\cap \mathcal {P}/\mathrm{poly}\).)

Theorem 5.10 (Theorem 1.6, Restated).

Assume that there exists \(\delta \gt 0\) such that \(\mathcal {E}\) does not have \(\mathcal {NTIME}[2^{n^{\delta }}]\)-uniform circuits of an arbitrarily large polynomial size. Then, (5.1) \(\begin{equation} pr\mathcal {BPP}\subset pr\mathcal {NSUBEXP}\Longrightarrow \mathcal {EXP}\not\subset \mathcal {P}/\mathrm{poly}\;\text{,} \end{equation}\) where \(pr \mathcal {NSUBEXP}=\cap _{\epsilon \gt 0}pr\mathcal {NTIME}[2^{n^{\epsilon }}]\). In the other direction, if Equation (5.1) holds,42 then \(\mathcal {EXP}\not\subseteq (\mathcal {NP}\cap \mathcal {P}/\mathrm{poly})\), and in particular \(\mathcal {EXP}\) does not have \(\mathcal {NP}\)-uniform circuits.

Proof.

The proof of the first statement is similar to the proof of Theorem 5.8. We assume that \(\mathcal {EXP}\subset \mathcal {P}/\mathrm{poly}\), and use Item (2) of Proposition 5.6 with parameters \(S(n)=n^c\) and \(T(n)=2^{n^{\epsilon }}\), where \(\epsilon \gt 0\) is sufficiently small; we deduce that any \(L\in \mathcal {E}\) can be decided on all input lengths by \(\mathcal {NTIME}[T^{\prime }]\)-uniform circuits of size \(n^c\), where \(T^{\prime }(n) \lt 2^{O(n^{3\epsilon \cdot c})} \lt 2^{n^{\delta }}\), which is a contradiction (the last inequality relied on \(\epsilon \gt 0\) being sufficiently small).

To prove the “in the other direction” statement, first recall that \(pr\mathcal {EXP}\subseteq pr(\mathcal {NP}\cap \mathcal {P}/\mathrm{poly})\iff \mathcal {EXP}\subseteq (\mathcal {NP}\cap \mathcal {P}/\mathrm{poly})\), because every exponential-time machine that solves a promise problem also induces a language.43 Now, assume towards a contradiction that \(pr\mathcal {EXP}\subseteq pr(\mathcal {NP}\cap \mathcal {P}/\mathrm{poly})\). Since \(pr\mathcal {BPP}\subseteq pr\mathcal {EXP}\), we have that \(pr\mathcal {BPP}\subseteq pr(\mathcal {NP}\cap \mathcal {P}/\mathrm{poly})\). By the hypothesized conditional statement, it follows that \(\mathcal {EXP}\not\subset \mathcal {P}/\mathrm{poly}\), a contradiction.□

As mentioned in the introduction, by optimizing the parameters, we can show tighter two-way implications between the statement “derandomization and lower bounds are equivalent” and the statement “\(\mathcal {E}\) does not have \(\mathcal {NTIME}[T]\)-uniform circuits.” Towards proving this result, we define the following class of growth functions, which lie “in between” quasipolynomial functions and sub-exponential functions. For every two constants \(k,c\in \mathbb {N}\), we denote by \(\mathtt {e}^{(k,c)}:\mathbb {N}\rightarrow \mathbb {N}\) the function that applies k logarithms to its input, raises the obtained expression to the power c, and then takes k exponentiations of this expression. For example, \(\mathtt {e}^{(1,c)}(n)=2^{(\log n)^c}\) and \(\mathtt {e}^{(2,c)}(n)=2^{2^{\mathrm{loglog}(n)^c}}\). Note that \(\mathtt {e}^{(k+1,c)}\) grows asymptotically faster than \(\mathtt {e}^{(k,c^{\prime })}\) for any constants \(c,c^{\prime }\), and that \(\mathtt {e}^{(k,c)}\) is smaller than any sub-exponential function. Then, we have that:
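A minimal sketch of this growth function (floating-point, for illustration only; it overflows quickly for \(k\ge 2\)):

```python
import math

def e(k, c, n):
    """The growth function e^{(k,c)}: apply k logarithms to n, raise the
    result to the power c, then apply k exponentiations. For example,
    e(1, c, n) = 2 ** (math.log2(n) ** c)."""
    x = float(n)
    for _ in range(k):
        x = math.log2(x)
    x = x ** c
    for _ in range(k):
        x = 2.0 ** x
    return x
```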

Theorem 5.11 (Theorem 1.6, a Tighter Version).

For any constant \(k\in \mathbb {N}\), we have that: (5.2) \(\begin{eqnarray} \exists \delta \gt 0 &: \mathcal {DTIME}[2^n] \text{ does not have } \mathcal {NTIME}[T]\text{-uniform circuits, for } T=2^{\mathtt {e}^{(k,\delta)}} ,\\ &\Big \Downarrow \end{eqnarray}\) (5.3) \(\begin{eqnarray} pr\mathcal {BPP}&\subseteq \cap _{\epsilon \gt 0} pr\mathcal {NTIME}[2^{\mathtt {e}^{(k,\epsilon)}}]\Longrightarrow \mathcal {DTIME}[2^n]\not\subset \cup _{c_0\in \mathbb {N}}\mathcal {SIZE}[\mathtt {e}^{(k,c_0)}] \\ &\Big \Downarrow \end{eqnarray}\) (5.4) \(\begin{eqnarray} \forall c_0\in \mathbb {N}&, \mathcal {DTIME}[2^n]\not\subset (\mathcal {NTIME}[T]\cap \mathcal {SIZE}[T]) \text{, for } T(n)=\mathtt {e}^{(k,c_0)} , \end{eqnarray}\) that is, statement (5.2) implies statement (5.3), which in turn implies statement (5.4).

We stress that the gap between the values of T in statements (5.2) and (5.4) is substantial, but nevertheless much smaller than an exponential gap. This is since in statement (5.2), the hypothesis is for T that is exponential in \(\mathtt {e}^{(k,\delta)}\) where \(\delta \gt 0\) is an arbitrarily small constant, whereas in statement (5.4), the conclusion is for \(T=\mathtt {e}^{(k,c_0)}\) where \(c_0\) is an arbitrarily large constant. For example, for \(k=1,\) this is the difference between quasipolynomial functions and functions of the form \(2^{2^{(\log n)^{\epsilon }}}\ll 2^{n^{\epsilon }}\).

Proof of Theorem 5.11

To see that statement (5.2) implies statement (5.3), first observe that for any two constants \(c,c^{\prime }\in \mathbb {N}\) it holds that \((\mathtt {e}^{(k,c)})^{-1}(n)=\mathtt {e}^{(k,1/c)}(n)\) and that \(\mathtt {e}^{(k,c)}(\mathtt {e}^{(k,c^{\prime })}(n))=\mathtt {e}^{(k,cc^{\prime })}(n)\). Now, assuming that \(pr\mathcal {BPP}\subseteq \cap _{\epsilon }pr\mathcal {NTIME}[2^{\mathtt {e}^{(k,\epsilon)}}]\) and that \(\mathcal {DTIME}[2^n]\subset \cup _{c_0}\mathcal {SIZE}[\mathtt {e}^{(k,c_0)}]\), we show that statement (5.2) does not hold. To do so, we use Item (2) of Proposition 5.6 with \(S(n)=\mathtt {e}^{(k,c_0)}(n)\) and with \(T(n)=2^{\mathtt {e}^{(k,\epsilon)}(n)}\) for a sufficiently small \(\epsilon \gt 0\), and rely on the fact that for some \(b\in \mathbb {N}\) it holds that \(T^{\prime }(n)\lt T(S(S(n)^b)^b)^b\lt T(\mathtt {e}^{(k,2b^2\cdot c_0)}(n))^b=2^{\mathtt {e}^{(k,2\epsilon b^3\cdot c_0)}(n)}\); since \(\epsilon\) can be taken arbitrarily small, this contradicts statement (5.2) for every \(\delta \gt 0\).

To see that statement (5.3) implies statement (5.4), assume towards a contradiction that for some \(c_0\in \mathbb {N}\) it holds that \(pr\mathcal {DTIME}[2^n]\subseteq pr(\mathcal {NTIME}[T]\cap \mathcal {SIZE}[T])\), where \(T(n)=\mathtt {e}^{(k,c_0)}(n)\). Hence, \(\mathsf {CAPP}\in pr\mathcal {DTIME}[\tilde{O}(2^n)]\subseteq pr(\mathcal {NTIME}[T(\tilde{O}(n))]\cap \mathcal {SIZE}[T(\tilde{O}(n))])\), and it follows that \(\begin{align*} pr\mathcal {BPP}&\subseteq \cup _{c\in \mathbb {N}}pr\mathcal {NTIME}[T(n^c)] \\ &\subseteq \cup _{c\in \mathbb {N}}pr\mathcal {NTIME}\left[\mathtt {e}^{(k,c)}\right]\\ &\subseteq \cap _{\epsilon \gt 0} pr\mathcal {NTIME}\left[2^{\mathtt {e}^{(k,\epsilon)}}\right] \;\text{.} \end{align*}\) By our hypothesis (i.e., by statement (5.3)), it follows that \(\mathcal {DTIME}[2^n]\not\subset \cup _{c_0\in \mathbb {N}} \mathcal {SIZE}[ \mathtt {e}^{(k,c_0)}]\), which contradicts our assumption. Finally, to deduce the statement as written (i.e., to bridge the gap between \(pr\mathcal {DTIME}[2^n]\) and \(\mathcal {DTIME}[2^n]\)), we use the same argument as in Footnote 43.□


6 NOT-RETH AND CIRCUIT LOWER BOUNDS FROM RANDOMIZED ALGORITHMS

In this section, we prove Theorem 1.7. We first show that the desired \(\mathcal {BPE}\) lower bound follows from a weak learning algorithm for general circuits of quasi-linear size, and then show that such an algorithm follows from a \(2^{n/\mathrm{polylog}(n)}\)-time randomized \(\mathtt {CircuitSAT}\) algorithm for roughly quadratic-size circuits.

We first generalize the definition of weak learning algorithms, so that the algorithm is required to learn every small circuit to which it is given oracle access.

Definition 6.1

(Weak Learner for General Circuits).

For \(S : \mathbb {N} \rightarrow \mathbb {N}\) and \(\delta : \mathbb {N} \rightarrow \mathbb {R}\), we say that a randomized oracle machine A is a \(\delta\)-weak learner for S-size circuits if the following holds:

  ‱ On input \(1^n\), A is given oracle access to an oracle \(O: \lbrace 0,1\rbrace ^n \rightarrow \lbrace 0,1\rbrace\) and runs in time \(1/\delta (n)\).

  ‱ If \(\mathcal {SIZE}(O) \le S(n)\), then with probability at least \(\delta (n)\), A outputs a circuit C on n input bits with size \(\le S(n)\) such that C computes O correctly on at least a \(1/2+\delta (n)\) fraction of inputs.44

Next, we need the following standard diagonalization argument:

Proposition 6.2

(Diagonalization Against Circuits in ÎŁ4)

Let \(\delta = 2^{-n/\mathrm{polylog}(n)}\), let \(k_{\sf ckt}\) be a constant, and let \(f^{\mathtt {ws}}\) be the \(\delta\)-well-structured function guaranteed by Lemma 4.7. Then there is a language \(L^{\sf diag}\) that is \(n \cdot \mathrm{polylog}(n)\)-time reducible to \(f^{\mathtt {ws}}\) and satisfies \(L^{\sf diag}\notin \mathcal {SIZE}[n \cdot (\log n)^{k_{\sf ckt}}]\).

Proof.

Let \(s = n \cdot (\log n)^{k_{\sf ckt}}\) and \(s^{\prime } = s \cdot \log n\). By standard counting arguments, there exists a function on n input bits that is computable by an \(s^{\prime }\)-size circuit but not by any s-size circuit.

Consider the following \(\Sigma _4\) algorithm:

  • Given an input \(x \in \lbrace 0,1\rbrace ^n\), we guess a circuit C of size \(s^{\prime }\) on n input bits and reject immediately if \(C(x) = 0\). Then, we check the following two conditions and accept if and only if both of them are satisfied.

  • (A): For all circuits D on n input bits with size \(\le s\), there exists an input \(y \in \lbrace 0,1\rbrace ^n\) such that \(C(y) \ne D(y)\). That is, C cannot be computed by any circuit with size \(\le s\).

  • (B): For all circuits D on n input bits with size \(s^{\prime }\) such that the description of D is lexicographically smaller than that of C, there exists a circuit E with size \(\le s\) such that for all \(y \in \lbrace 0,1\rbrace ^n\), \(E(y) = D(y)\). That is, C is the lexicographically first \(s^{\prime }\)-size circuit that cannot be computed by s-size circuits.

Clearly, the above algorithm can be formulated as an \(n \cdot \mathrm{polylog}(n)\)-size \(\Sigma _4\text{-}\mathtt {SAT}\) instance, and hence also as an \(n \cdot \mathrm{polylog}(n)\)-size \(\mathtt {TQBF}\) instance (which can be further reduced to \(f^{\mathtt {ws}}\) in \(n \cdot \mathrm{polylog}(n)\) time). Moreover, it is easy to see that the algorithm computes the truth-table of the lexicographically first \(s^{\prime }\)-size circuit on n input bits that cannot be computed by any circuit with size \(\le s\).

Therefore, we can set \(L^{\sf diag}\) to be the language computed by the above algorithm.□

Remark 6.3.

We remark that the standard \(\Sigma _3 P\) construction of a truth-table that is hard for s-size circuits actually takes \(\widetilde{O}(s^2)\) time: One first existentially guesses an \(s^{\prime }\)-length truth-table L (where \(s^{\prime } = s \cdot \mathrm{polylog}(s)\)), then enumerates all s-size circuits C and all \(s^{\prime }\)-length truth-tables \(L^{\prime }\) such that \(L^{\prime } \lt L\) (lexicographically), and checks that there exists an input x such that \(C(x) \ne L(x)\), and that there exists an s-size circuit \(C^{\prime }\) computing \(L^{\prime }\). In the last step, checking that \(C^{\prime }\) computes \(L^{\prime }\) requires evaluating \(C^{\prime }\) on \(s^{\prime }\) many inputs, which takes \(\widetilde{O}(s^2)\) time.
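The quantifier structure used above can be mirrored by a brute-force toy program (a sketch of ours; it replaces circuit size with circuit depth over AND/OR/NOT, which is only a crude proxy, and is feasible only for tiny parameters): it finds the lexicographically first truth table that small circuits miss, just as \(L^{\sf diag}\) encodes the lexicographically first hard function.

from itertools import product

def small_circuit_tables(n: int, depth: int) -> set:
    """All truth tables (bit-tuples of length 2^n) computable by AND/OR/NOT
    circuits of at most the given depth over the n input variables."""
    inputs = list(product((0, 1), repeat=n))
    tables = {tuple(x[i] for x in inputs) for i in range(n)}  # input wires
    for _ in range(depth):
        new = set(tables)
        for a in tables:
            new.add(tuple(1 - u for u in a))                 # NOT gate
            for b in tables:
                new.add(tuple(u & v for u, v in zip(a, b)))  # AND gate
                new.add(tuple(u | v for u, v in zip(a, b)))  # OR gate
        tables = new
    return tables

def lex_first_hard_table(n: int, depth: int):
    """Lexicographically first truth table not computed at the given depth."""
    easy = small_circuit_tables(n, depth)
    for tt in product((0, 1), repeat=2 ** n):  # lexicographic order
        if tt not in easy:
            return tt
    return None

print(lex_first_hard_table(2, 2))  # prints (0, 1, 1, 0), i.e., XOR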

Now, we are ready to show that weak learning algorithms imply non-trivial circuit lower bounds for \(\mathcal {BPE}\).

Theorem 6.4

(Weak Learning Algorithms Imply\(\mathcal {BPE}\)Lower Bounds)

For any constant \(k_{\sf ckt}\gt 0\), there is another constant \(k_{\sf learn}= k_{\sf learn}(k_{\sf ckt})\), such that letting \(\delta _{\sf learn} = 2^{-n/(\log n)^{k_{\sf learn}}}\), if there is a \(\delta _{\sf learn}\)-weak learner for \(n \cdot (\log n)^{k_{\sf ckt}}\)-size circuits, then \(\mathcal {BPTIME}[2^n] \not\subset \mathcal {SIZE}[n \cdot (\log n)^{k_{\sf ckt}}]\).

Proof.

Let \(\delta = 2^{-n/(\log n)^{k_{\delta }}}\) where \(k_{\delta }\) is a large enough constant, depending on \(k_{\sf ckt}\). Let \(f^{\mathtt {ws}}\) be the \(\delta\)-well-structured function guaranteed by Lemma 4.7.

Recall that \(f^{\mathtt {ws}}\in \mathcal {SPACE}[O(n)]\). Hence, the Boolean function \(f^{\mathtt {GL(ws)}}\), which is defined as in the proof of Lemma 4.9, is computable in \(\mathcal {SPACE}[O(n)]\) as well.

We can safely assume \(f^{\mathtt {GL(ws)}}\in \mathcal {SIZE}[n \cdot (\log n)^{k_{\sf ckt}}]\), as otherwise the theorem follows immediately. Then, by our assumption, it follows that there is a \(\delta _{\sf learn}\)-weak learner for \(f^{\mathtt {GL(ws)}}_n\). Applying Corollary 4.10 and setting \(k_{\sf learn}= k_{\delta }\), it follows that \(f^{\mathtt {ws}}\) can be computed in randomized time \(T^{ws}(n) {\overset{def}{=}}2^{n / (\log n)^{k_{\sf learn}-1}}\).

Let \(L^{\sf diag}\) be the language guaranteed by Proposition 6.2 such that \(L^{\sf diag}\notin \mathcal {SIZE}[n \cdot (\log n)^{k_{\sf ckt}}]\), and let \(d = d(k_{\sf ckt})\) be a constant such that \(L^{\sf diag}\) is \(n \cdot (\log n)^d\)-time reducible to \(f^{\mathtt {ws}}\). We can then compute \(L^{\sf diag}_n\) in randomized time \(T^{ws}(n \cdot (\log n)^d) = 2^{o(n)}\) by setting \(k_{\sf learn}\) to be large enough (see the calculation below). It follows that \(\mathcal {BPTIME}[2^n] \not\subset \mathcal {SIZE}[n \cdot (\log n)^{k_{\sf ckt}}]\).□
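For concreteness, the last step relies on the following bound (a quick check of ours, using \(\log (n\cdot (\log n)^d)\ge \log n\)):

\(\begin{align*} T^{ws}(n\cdot (\log n)^d) = 2^{\frac{n\cdot (\log n)^{d}}{(\log (n\cdot (\log n)^{d}))^{k_{\sf learn}-1}}} \le 2^{n\cdot (\log n)^{d-k_{\sf learn}+1}} = 2^{o(n)} \quad \text{whenever } k_{\sf learn}\ge d+2 \;\text{.} \end{align*}\)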

6.1 Randomized CircuitSAT Algorithms Imply BPE Circuit Lower Bounds

We now prove Theorem 1.7, which asserts that randomized algorithms that solve \(\mathtt {CircuitSAT}\) in time \(2^{n/\mathrm{polylog}(n)}\) imply circuit lower bounds against \(\mathcal {BPE}\). As explained in Section 2.3, we do so by showing that the foregoing algorithms for \(\mathtt {CircuitSAT}\) imply a weak learner for quasi-linear size circuits, which enables us to apply Theorem 6.4.

Reminder of Theorem 1.7. For any constant \(k_{\sf ckt}\in \mathbb {N}\) there exists a constant \(k_{\sf sat}\in \mathbb {N}\) such that the following holds: If \(\mathtt {CircuitSAT}\) for circuits over n variables and of size \(n^{2} \cdot (\log n)^{k_{\sf sat}}\) can be solved in probabilistic time \(2^{n/(\log n)^{k_{\sf sat}}}\), then \(\mathcal {BPTIME}[2^n]\not\subset \mathcal {SIZE}[n\cdot (\log n)^{k_{\sf ckt}}]\).

Proof.

Let \(s = s(n) = n \cdot (\log n)^{k_{\sf ckt}}\). Let \(k_{\sf learn}\) and \(\delta _{\sf learn}\) be as in Theorem 6.4 such that a \(\delta _{\sf learn}\)-weak learner for s-size circuits implies that \(\mathcal {BPE} \not\subset \mathcal {SIZE}[s]\). In the following, we construct such a weak learner A using the assumed \(\mathtt {CircuitSAT}\) algorithm. In fact, we construct a stronger learner such that:

  • If \(\mathcal {SIZE}(O) \le s(n)\), then with probability at least \(2/3\), A outputs a circuit C on n input bits with size \(\le s(n)\) such that C computes O correctly on at least a 0.99 fraction of inputs.

Let \(k_{\sf sat}= k_{\sf sat}(k_{\sf ckt})\) be a constant to be specified later. The learner A first draws \(t = n \cdot (\log n)^{k_{\sf ckt}+ 2}\) uniform random samples \(x_1,x_2,\ldots\,,x_{t}\) from \(\lbrace 0,1\rbrace ^n\) and queries O to get \(y_i = O(x_i)\) for all \(i \in [t]\). Note that A errs only if \(\mathcal {SIZE}(O) \le s(n)\) and A either fails to output a circuit or outputs a circuit D of size \(\le s(n)\) such that \(\Pr _{x \in \lbrace 0,1\rbrace ^n}[O(x) = D(x)] \lt 0.99\).

We say that a circuit D is bad if it has size \(\le s(n)\) and \(\Pr _{x \in \lbrace 0,1\rbrace ^n}[O(x) = D(x)] \lt 0.99\). For a fixed bad circuit D, by a Chernoff bound, with probability at least \(1 - 2^{-\Omega (t)}\), we have \(D(x_i) \ne y_i\) for some i. Since there are at most \(n^{O(s)}\) bad circuits, with probability at least \(1 - n^{O(s)} \cdot 2^{-\Omega (t)} \ge 1 - 2^{-\Omega (t) + O(s) \cdot \log n} = 1 - 2^{-\Omega (t)}\) (the last equality follows as \(t = n \cdot (\log n)^{k_{\sf ckt}+ 2}\)), it follows that for every bad circuit D there exists an index i such that \(D(x_i) \ne y_i\). In the following, we condition on such a good event.

By repeating the \(\mathtt {CircuitSAT}\) algorithm \(O(n)\) times and taking the majority of the outputs, we can assume without loss of generality that the \(\mathtt {CircuitSAT}\) algorithm has an error probability of at most \(2^{-n}\). Now, we use the randomized \(\mathtt {CircuitSAT}\) algorithm to construct, bit-by-bit, a circuit C of size \(\le s(n)\) such that \(C(x_i) = y_i\) for all i (this can be accomplished with the well-known search-to-decision reduction for SAT); the construction succeeds with probability at least 0.99. Note that, in each iteration, the number of variables of the \(\mathtt {CircuitSAT}\) instance is the length of the description of a circuit of size \(s(n)\), and hence at most \(s^{\prime }(n)= O(n \cdot (\log n)^{k_{\sf ckt}+1})\). Setting \(k_{\sf sat}\) large enough, it follows that A runs in randomized time \(1/\delta _{\sf learn}(n)\).

Assuming \(\mathcal {SIZE}(O) \le s(n)\), such circuits exist, and we can find one with probability at least 0.99. Conditioned on the good event, this circuit cannot be bad, and therefore it must agree with O on at least a 0.99 fraction of inputs. Putting everything together, when \(\mathcal {SIZE}(O) \le s(n)\), the algorithm A outputs a circuit C such that \(\Pr _{x \in \lbrace 0,1\rbrace ^n}[O(x) = C(x)] \ge 0.99\) with probability at least \(0.99 - 2^{-\Omega (t)} \ge 2/3\), which completes the proof.□
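The bit-by-bit construction in the proof is a standard search-to-decision loop. The following minimal Python sketch (ours) illustrates it, with a brute-force procedure standing in for the hypothesized \(\mathtt {CircuitSAT}\) algorithm and truth tables standing in for circuit descriptions:

import random

def search_to_decision(num_bits, sat):
    """Recover a satisfying description bit-by-bit from a decision oracle:
    sat(prefix) reports whether some extension of prefix (to num_bits bits
    total) satisfies the underlying constraints."""
    prefix = []
    if not sat(prefix):
        return None
    for _ in range(num_bits):
        prefix.append(0 if sat(prefix + [0]) else 1)
    return prefix

# Toy instance: find a description (here, the truth table of a 3-bit
# function) consistent with labeled samples from a secret oracle O.
n = 3
secret = [random.randrange(2) for _ in range(2 ** n)]
samples = [(x, secret[x]) for x in random.sample(range(2 ** n), 5)]

def consistent(desc):
    return all(desc[x] == y for x, y in samples)

def sat(prefix):
    # Brute-force stand-in for the assumed CircuitSAT algorithm.
    free = 2 ** n - len(prefix)
    return any(consistent(prefix + [(m >> i) & 1 for i in range(free)])
               for m in range(2 ** free))

print(search_to_decision(2 ** n, sat))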

6.2 Randomized ÎŁ2-SAT[n] Algorithms Imply BPE Circuit Lower Bounds

One shortcoming of Theorem 1.7 is that the hypothesized algorithm needs to decide the satisfiability of an n-bit circuit of size \(\tilde{O}(n^2)\), rather than the satisfiability of circuits (or of \(3\text{-}\mathtt {SAT}\) formulas) of linear size.45 To address this shortcoming, we now prove a different version of Theorem 1.7, which asserts that randomized algorithms that solve \(\Sigma _2\text{-}\mathtt {SAT}\) for formulas of linear size in time \(2^{n/\mathrm{polylog}(n)}\) imply circuit lower bounds against \(\mathcal {BPE}\).

Theorem 6.5

(Randomized\(\Sigma _2\text{-}\mathtt {SAT}\)Algorithms Imply Circuit Lower Bounds Against\(\mathcal {BPE}\))

For any constant \(k_{\sf ckt}\gt 0\), there is another constant \(k_{\sf sat}= k_{\sf sat}(k_{\sf ckt})\) such that if \(\Sigma _2\text{-}\mathtt {SAT}\) with n variables and n clauses can be decided in randomized \(2^{n / (\log n)^{k_{\sf sat}}}\) time, then \(\mathcal {BPTIME}[2^n] \not\subset \mathcal {SIZE}[n \cdot (\log n)^{k_{\sf ckt}}]\).

Proof.

Let \(\mathtt {TQBF^{loc}}\) be the function from Claim 4.7.1, and recall that \(\mathtt {TQBF^{loc}}\in \mathcal {SPACE}[O(n)]\). Therefore, we can safely assume \(\mathtt {TQBF^{loc}}\in \mathcal {SIZE}[s(n)]\), for \(s(n) = n \cdot (\log n)^{k_{\sf ckt}}\), as otherwise the theorem follows immediately.

Now, we describe a randomized algorithm computing a circuit for \(\mathtt {TQBF^{loc}}\) on inputs of length n. First, it computes the trivial circuit of size \(s(1)\) for \(\mathtt {TQBF^{loc}}_1\). Next, suppose that we have an \(s(m)\)-size circuit \(C_{m}\) computing \(\mathtt {TQBF^{loc}}_m\), where \(m \lt n\); we wish to find an \(s(m+1)\)-size circuit for \(\mathtt {TQBF^{loc}}_{m+1}\).

By the downward self-reducibility of \(\mathtt {TQBF^{loc}}\), from \(C_m\) we can directly obtain an \(O(s(m))\)-size circuit D for \(\mathtt {TQBF^{loc}}_{m+1}\). Our goal is to utilize the circuit D and our fast \(\Sigma _2\text{-}\mathtt {SAT}\) algorithm to compute an \(s(m+1)\)-size circuit for \(\mathtt {TQBF^{loc}}_{m+1}\). Consider the following \(\Sigma _2\text{-}\mathtt {SAT}\) question: Given a prefix p, is there an \(s(m+1)\)-size circuit C whose description starts with p, such that for all \(x \in \lbrace 0,1\rbrace ^{m+1}\), we have \(C(x) = D(x)\)? This can be formulated as a \(\Sigma _2\text{-}\mathtt {SAT}\) instance of size \(n \cdot \mathrm{polylog}(n)\). By fixing the description bit-by-bit, we can obtain an \(s(m+1)\)-size circuit for \(\mathtt {TQBF^{loc}}_{m+1}\). The success probability can be boosted to \(1 - 2^{-2 n}\) by repeating each call to the \(\Sigma _2\text{-}\mathtt {SAT}\) algorithm a polynomial number of times and taking the majority vote.

Let \(L^{\sf diag}\) be the language guaranteed by Proposition 6.2 and let d be a constant such that \(L^{\sf diag}\) is \(n \cdot (\log n)^d\)-time reducible to \(\mathtt {TQBF^{loc}}\). By setting \(k_{\sf sat}\) large enough, we can compute \(\mathtt {TQBF^{loc}}_{n \cdot (\log n)^d}\) (and therefore also \(L^{\sf diag}_n\)) in \(2^{o(n)}\) time. Therefore, it follows that \(\mathcal {BPTIME}[2^n] \not\subset \mathcal {SIZE}[n \cdot (\log n)^{k_{\sf ckt}}]\).□
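Schematically, the inductive construction in the proof is the following loop (a sketch of ours; the three callables are placeholders for the objects defined in the proof and are not implemented here):

def bootstrap_circuit(n, trivial_circuit, extend, compress):
    """Maintain a small circuit for TQBF^loc on m-bit inputs and lift it one
    input length at a time: extend(C) applies downward self-reducibility to
    produce an O(s(m))-size circuit for length m+1, and compress(D, m+1)
    stands for the Sigma2-SAT-based prefix-fixing step that finds an
    s(m+1)-size circuit equivalent to D."""
    C = trivial_circuit  # circuit for input length 1
    for m in range(1, n):
        D = extend(C)
        C = compress(D, m + 1)
    return C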

Finally, we use a “win-win” argument to deduce, unconditionally, that either we have an average-case derandomization of \(\mathcal {BPP}\) or \(\mathcal {BPE}\) is “hard” for circuits of quasilinear size (or both statements hold). An appealing interpretation of this result is as a Karp-Lipton-style theorem: If \(\mathcal {BPE}\) has circuits of quasilinear size, then \(\mathcal {BPP}\) can be derandomized in the average-case.

Corollary 6.6

(A “Win-win” Result for Average-case Derandomization of\(\mathcal {BPP}\)and Circuit Lower Bounds against\(\mathcal {BPE}\))

At least one of the following statements is true:

(1)

For every constant \(k\in \mathbb {N}\) it holds that \(\mathcal {BPTIME}[2^n]\not\subset \mathcal {SIZE}[n\cdot (\log n)^k]\).

(2)

For every constant \(k\in \mathbb {N}\) and for \(t(n)=n^{\mathrm{loglog}(n)^k}\) there exists a \((1/t)\)-i.o.-PRG for \((t,\log (t))\)-uniform circuits that has seed length \(\tilde{O}(\log (n))\) and is computable in time \(n^{\mathrm{polyloglog}(n)}\).

Proof.

If for every \(k^{\prime }\in \mathbb {N}\) it holds that \(\Sigma _2\text{-}\mathtt {SAT}\) for n-bit formulas with \(O(n)\) clauses can be decided by probabilistic algorithms that run in time \(2^{n/(\log n)^{k^{\prime }}}\), then by Theorem 6.5, we have that Item (1) holds. Otherwise, for some \(k^{\prime }\in \mathbb {N},\) it holds that \(\Sigma _2\text{-}\mathtt {SAT}\) for n-bit formulas with \(O(n)\) clauses cannot be decided by probabilistic algorithms that run in time \(2^{n/(\log n)^{k^{\prime }}}\). In particular, since solving satisfiability of a given n-bit \(\Sigma _2\) formula with \(O(n)\) clauses can be reduced in linear time to solving \(\mathtt {TQBF}\), we have that \(\mathtt {TQBF}\notin \mathcal {BPTIME}[2^{n/(\log n)^{k^{\prime }+1}}]\). In this case, Item (2) follows from Theorem 4.14.□

We note that to prove Corollary 6.6, we do not have to use Theorem 6.5. An alternative proof relies on the fact that the \(\Sigma _4\) formula from the proof of Proposition 6.2 can be constructed in polynomial time. In particular, if \(\mathtt {TQBF}\) can be decided in probabilistic time \(2^{n/\mathrm{polylog}(n)}\) for an arbitrarily large polylogarithmic function, then for every \(k_{\sf ckt}\), we can construct the corresponding \(\Sigma _4\) formula from Proposition 6.2 in polynomial time and decide its satisfiability in probabilistic time \(2^{o(n)}\), which implies that \(L^{\sf diag}\in \mathcal {BPE}\); Item (1) of Corollary 6.6 then follows. Otherwise, we have that \(\mathtt {TQBF}\) cannot be solved in probabilistic time \(2^{n/\mathrm{polylog}(n)}\) for some polylogarithmic function; then we can invoke Theorem 4.14 to deduce Item (2) of Corollary 6.6.

APPENDICES

A ON IMPLICATIONS OF MAETH

Consider the hypothesis \(\mathsf {MAETH}\), which asserts that \(co\text{-}3\mathtt {SAT}\) cannot be solved by Merlin-Arthur protocols running in time \(2^{\epsilon \cdot n}\), for some \(\epsilon \gt 0\). Recall that the “strong” version of this hypothesis is false (since Williams [61] showed that \(\#\mathtt {CircuitSAT}\) can be solved by a Merlin-Arthur protocol in time \(\tilde{O}(2^{n/2})\)), but there is currently no evidence against the “non-strong” version.

As mentioned in Section 1.3, the assumption \(\mathsf {MAETH}\) can be easily shown to imply strong circuit lower bounds and derandomization of \(pr\mathcal {BPP}\) (and thus also of \(pr\mathcal {MA}\)). Specifically, the following more general (i.e., parametrized) result relies on a standard Karp-Lipton-style argument, which originates in [3]. We note in advance that after the proof of this result, we prove another result, which shows a very different tradeoff between \(\mathcal {MA}\) lower bounds (specifically, lower bounds for fixed-polynomial-time verifiers) and derandomization.

Theorem A.1

(Lower Bounds for\(\mathcal {MA}\)Algorithms Imply Non-uniform Circuit Lower Bounds)

There exists \(L\in \mathcal {E}\) and a constant \(k\gt 1\) such that for any time-computable function \(S:\mathbb {N}\rightarrow \mathbb {N}\) such that \(S(n)\ge n\) the following holds: Assume that \(\mathcal {DTIME}[2^n]\not\subseteq \mathcal {MATIME}[S^{\prime }]\), where \(S^{\prime }(n)=S(k\cdot n)^k\). Then, \(L\not\in \mathcal {SIZE}[S]\).

Note that, using Corollary 3.3, under the hypothesis of Theorem A.1, we have that \(\mathsf {CAPP}\in \mathtt {i.o.}pr\mathcal {DTIME}[T]\), where \(T(n)=2^{O(S^{-1}(n^{O(1)}))}\). In particular, under \(\mathsf {MAETH}\) (which refers to \(S(n)=2^{\Omega (n/\log (n))}\)), we have that \(pr\mathcal {BPP}\subseteq \mathtt {i.o.}pr\mathcal {DTIME}[n^{O(\mathrm{loglog}(n))}]\).

Proof of Theorem A.1

Let L be the problem from Proposition 3.12. Assuming towards a contradiction that \(L\in \mathcal {SIZE}[S]\), we show that \(\mathcal {DTIME}[2^n]\subseteq \mathcal {MATIME}[S^{\prime }]\).

Let \(L_0\in \mathcal {DTIME}[2^n]\). We construct a probabilistic verifier that gets input \(x_0\in \lbrace 0,1\rbrace ^{n_0}\), and if \(x_0\in L_0\), then for some non-deterministic choices the verifier accepts with probability one, and if \(x_0\notin L_0\), then for all non-deterministic choices the verifier rejects with high probability. The verifier first reduces \(L_0\) to L by computing \(x\in \lbrace 0,1\rbrace ^n\) of length \(n=O(n_0)\) such that \(x_0\in L_0\) if and only if \(x\in L\).

Let \(n^{\prime }=\ell (n)=O(n)=O(n_0)\). By our hypothesis, there exists a circuit over \(n^{\prime }\) input bits of size \(S(n^{\prime })\) that decides \(L_{n^{\prime }}\). The verifier guesses a circuit \(C_L:\lbrace 0,1\rbrace ^{n^{\prime }}\rightarrow \lbrace 0,1\rbrace\) of size \(S(n^{\prime })\) and simulates the machine M from Proposition 3.12 on input x while resolving its oracle queries using \(C_L\). The verifier accepts if and only if M accepts. Note that if \(x_0\in L_0\) and the verifier’s guess was correct (i.e., \(C_L\) decides \(L_{n^{\prime }}\)), then the verifier accepts with probability one. However, if \(x_0\notin L_0\), then for every guess of \(C_L\) (i.e., every oracle for M) the verifier rejects with high probability. The running time of the verifier is \(\mathrm{poly}(n)\cdot \mathrm{poly}(S(n^{\prime }))=S(O(n))^{O(1)}\).□

In the following result, instead of assuming strong (e.g., super-polynomial) lower bounds for \(\mathcal {MATIME}\) against \(\mathcal {E}\), we assume fixed polynomial lower bounds for \(\mathcal {MATIME}\) against \(\mathcal {P}\), and deduce both a sub-exponential derandomization of \(\mathcal {BPP}\) and a polynomial-time derandomization of \(\mathcal {BPP}\) with \(n^{\epsilon }\) advice for an arbitrarily small constant \(\epsilon \gt 0\).46

Theorem A.2

(Fixed-polynomial-size Lower Bounds for\(\mathcal {MA}\)\(\Longrightarrow\)Derandomization and Circuit Lower Bounds)

Assume that for every \(k\in \mathbb {N}\) it holds that \(\mathcal {P}\not\subseteq \mathtt {i.o.}\mathcal {MATIME}[n^k]\). Then, for every \(\epsilon \gt 0,\) it holds that \(pr\mathcal {BPP}\subseteq (pr\mathcal {P}/n^{\epsilon }\cap pr\mathcal {DTIME}[2^{n^{\epsilon }}])\).

Proof.

At a high level, we want to use our hypothesis to deduce that there exists a polynomial-time algorithm that outputs the truth-table of a “hard” function, and then use that “hard” function for derandomization. Loosely speaking, the following claim, whose proof is a refinement of an argument from [10], asserts that if the output string of every polynomial-time algorithm has circuit complexity at most \(n^k\), then all of \(\mathcal {P}\) can be decided by \(\mathcal {MA}\) verifiers running in time \(n^{O(k)}\).

Claim A.2.1.

Assume that there exists \(k\in \mathbb {N}\) such that for every deterministic polynomial-time machine M there exists an infinite set \(S\subseteq \mathbb {N}\) such that for every \(n\in S\) the following holds: For every \(x\in \lbrace 0,1\rbrace ^n\), when the output string \(M(x)\) is viewed as a truth-table of a function, this function has circuit complexity at most \(n^k\). Then, \(\mathcal {P}\subseteq \mathtt {i.o.}\mathcal {MATIME}[n^{O(k)}]\).

Proof.

Let \(L\in \mathcal {P}\), and let M be a polynomial-time machine that decides L. Our goal is to decide L in \(\mathcal {MATIME}[n^{O(k)}]\) on infinitely-many input lengths.

For every \(x\in \lbrace 0,1\rbrace ^n\), let \(T_x:\lbrace 0,1\rbrace ^{\mathrm{poly}(n)}\rightarrow \lbrace 0,1\rbrace\) be a polynomial-sized circuit that gets as input a string \(\Pi\) and accepts if and only if \(\Pi\) is the computational history of \(M(x)\) and \(M(x)=1\). Note that the mapping of \(x\mapsto T_x\) can be computed in polynomial time (since M runs in polynomial time). Also, fix a PCP system for CircuitSAT with the following properties: The verifier runs in polynomial time and uses \(O(\log (n))\) randomness and \(O(1)\) queries; the verifier has perfect completeness and soundness error \(1/3\); and there is a polynomial-time algorithm W that maps any circuit C and a satisfying assignment for C (i.e., \(y\in C^{-1}(1)\)) to a PCP proof that the verifier accepts. For every \(x\in \lbrace 0,1\rbrace ^n\) and every input \(\Pi \in \lbrace 0,1\rbrace ^{\mathrm{poly}(n)}\) for \(T_x\), let \(W(T_x,\Pi)\) be the corresponding PCP proof that W produces.

Observe that there is a polynomial-time algorithm A that gets as input \(x\in \lbrace 0,1\rbrace ^n\), produces the computational history of \(M(x)\), which we denote by \(H_{M(x)}\), produces the circuit \(T_x\), and finally prints the PCP witness \(W(T_x,H_{M(x)})\). Thus, by our hypothesis, there exists an infinite set \(S\subseteq \mathbb {N}\) such that for every \(n\in S\) and every \(x\in \lbrace 0,1\rbrace ^n\) there exists a circuit \(C_{x}:\lbrace 0,1\rbrace ^{O(\log (n))}\rightarrow \lbrace 0,1\rbrace\) of size \(n^k\) whose truth-table is \(W(T_x,H_{M(x)})\).
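In code form, the algorithm A is a three-stage pipeline (a schematic sketch of ours; the callables are placeholders for the machine M's computational history, the circuit \(T_x\), and the PCP prover W from the proof):

def hard_string_printer(x, run_and_record, history_checker, pcp_prover):
    """Algorithm A from the proof of Claim A.2.1, schematically: print the
    PCP witness whose truth table, under the claim's hypothesis, is
    computable by a small circuit."""
    H = run_and_record(x)    # computational history H_{M(x)}
    T = history_checker(x)   # circuit T_x accepting valid accepting histories
    return pcp_prover(T, H)  # PCP witness W(T_x, H_{M(x)})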

The \(\mathcal {MA}\) verifier V gets input x and expects to get as proof a circuit \(C:\lbrace 0,1\rbrace ^{O(\log (n))}\rightarrow \lbrace 0,1\rbrace\) of size at most \(n^k\). The verifier V now simulates the PCP verifier while resolving its queries to the PCP using the circuit C. Note that for every \(n\in S\) and every \(x\in \lbrace 0,1\rbrace ^n\) the following holds: If \(M(x)=1\), then there exists a proof (i.e., a circuit \(C_x\)) such that the verifier accepts with probability one; however, if \(M(x)=0\), then \(T_x\) rejects all of its inputs, which implies that for every proof, with probability at least \(2/3\), the \(\mathcal {MA}\) verifier rejects.□

Using our hypothesis that for every \(k\in \mathbb {N}\) it holds that \(\mathcal {P}\not\subseteq \mathtt {i.o.}\mathcal {MATIME}[n^k]\), and taking the contrapositive of Claim A.2.1, we deduce that:

Corollary A.2.2.

For every \(k\in \mathbb {N}\) there exists a polynomial-time machine M such that for every sufficiently large \(n\in \mathbb {N}\) there exists an input \(x\in \lbrace 0,1\rbrace ^n\) such that \(M(x)\) is the truth-table of a function with circuit complexity more than \(n^{k}\).

Now, fix \(\epsilon \gt 0\), let \(L\in pr\mathcal {BPP}\), and let R be a probabilistic polynomial-time machine that decides L. Given input \(x\in \lbrace 0,1\rbrace ^n\), we decide whether \(x\in L\) in polynomial time with \(n^{\epsilon }\) advice, as follows: Consider the circuit \(R_x\) that computes the decision of R at x as a function of the random coins of R, and let \(c\gt 1\) be such that the size of \(R_x\) is at most \(n^c\). We instantiate Corollary A.2.2 with \(k=c^{\prime }/\epsilon\), where \(c^{\prime }\gt c\) is a sufficiently large constant. We expect as advice an input y of length \(n^{\epsilon }\) to the machine M such that \(M(y)\) is the truth-table of a function with circuit complexity more than \(n^{c^{\prime }}\). We then use \(M(y)\) to instantiate Theorem 3.2 with seed length \(O(\log (n))\), error \(1/10\), and circuits of size \(n^c\) (so that the PRG “fools” the circuit \(R_x\)), and enumerate its seeds to approximate the acceptance probability of \(R_x\) (and hence decide whether or not \(x\in L\)).
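The final derandomization step, enumerating the PRG's seeds, can be sketched as follows (ours; R_x and prg are placeholders for the circuit and the instantiated generator):

def approx_acceptance(R_x, prg, seed_len):
    """Decide x by majority vote over all seeds: since the PRG has error
    1/10 and R_x accepts with probability >= 2/3 or <= 1/3 (after standard
    error reduction), the average over pseudorandom strings lands strictly
    above or below 1/2."""
    total = sum(R_x(prg(s)) for s in range(2 ** seed_len))
    return 2 * total >= 2 ** seed_len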

We now also show that \(L\in pr\mathcal {DTIME}[2^{n^{2\epsilon }}]\). To do so, consider the foregoing algorithm, and assume that it gets no advice. Instead, it enumerates over all \(2^{n^{\epsilon }}\) possible advice strings to obtain \(2^{n^{\epsilon }}\) truth-tables, each of size \(\mathrm{poly}(n)\). We know that at least one of these truth-tables has circuit complexity more than \(n^{c^{\prime }}\). Now, the algorithm constructs the truth-table of a function f over \(n^{\epsilon }+O(\log (n))\) bits, which uses the first \(n^{\epsilon }\) bits to “choose” one of the \(2^{n^{\epsilon }}\) truth-tables, and uses the remaining \(O(\log (n))\) bits as an index to an entry in that truth-table (i.e., for \(i\in \lbrace 0,1\rbrace ^{n^{\epsilon }}\) and \(z\in \lbrace 0,1\rbrace ^{O(\log (n))}\) it holds that \(f(i,z)=g_i(z)\), where \(g_i\) is the function that is obtained from the ith advice string). Note that, since at least one of the \(2^{n^{\epsilon }}\) functions has circuit complexity more than \(n^{c^{\prime }}\), it follows that f also has circuit complexity more than \(n^{c^{\prime }}\) (hardwiring i in any circuit for f yields a circuit for \(g_i\) of no larger size). Thus, this algorithm can use f to instantiate Theorem 3.2 with seed length \(n^{\epsilon }+O(\log (n))\) and for circuits of size \(n^c\) to “fool” the circuit \(R_x\).

B POLYNOMIALS ARE SAMPLE-AIDED WORST-CASE TO AVERAGE-CASE REDUCIBLE

Recall that in Section 4.1 we defined the notion of a sample-aided worst-case to \(\delta\)-average-case reducible function (see Definitions 4.2 and 4.3), following [23]. In this Appendix, we explain why labeled samples can be helpful for uniform worst-case to “rare-case” reductions and show that low-degree polynomials are indeed sample-aided worst-case to average-case reducible.

Consider a function f whose truth-table is a codeword of a locally list-decodable code, and also assume that f is randomly self-reducible (i.e., computing f in the worst-case is reducible to computing f on, say, .99 of the inputs). Then, for every circuit \(\tilde{C}\) that agrees with f on a tiny fraction of inputs (i.e., \(\tilde{C}\) computes a “corrupt” version of f), we can efficiently produce a small list of circuits with oracle gates to \(\tilde{C}\) such that one of these circuits correctly computes f on all inputs. The main trouble is that we do not know which candidate circuit in this list to use. This is where the labeled samples come in: We can iterate over the candidates in the list, use the labeled samples to test each candidate circuit for agreement with f, and with high probability find a circuit that agrees with f on (say) .99 of the inputs. Then, using the random self-reducibility of f, we obtain a circuit that correctly computes f on each input, with high probability.
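The testing (“weeding”) step is simple; here is a minimal sketch of ours, where each candidate circuit is a callable and the labeled samples are pairs \((x,f(x))\):

def weed_candidates(candidates, labeled_samples, threshold=0.9):
    """Return the first candidate whose empirical agreement with f on the
    labeled samples reaches the threshold; with poly(1/delta) samples, a
    Chernoff bound makes this a reliable proxy for true agreement."""
    for C in candidates:
        agreement = sum(C(x) == y for x, y in labeled_samples)
        if agreement >= threshold * len(labeled_samples):
            return C
    return None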

The crucial property that we need from the code to make the foregoing algorithmic approach work is that the local list-decoding algorithm will efficiently produce a relatively short list. Specifically, recall that by our definition, a sample-aided worst-case to \(\delta\)-average-case reduction needs to run in time \(\mathrm{poly}(1/\delta)\). Hence, we need a list-decoding algorithm that runs in time \(\mathrm{poly}(1/\delta)\) (and indeed produces a list of such size). A suitable local list-decoding algorithm indeed exists in the case that the code is the Reed-Muller code, which leads us to the following result:

Proposition B.1 (Low-degree Polynomials are Uniformly Worst-case to Average-case Reducible with a Self-oracle).

Let \(q:\mathbb {N}\rightarrow \mathbb {N}\) be a field-size function, let \(\ell :\mathbb {N}\rightarrow \mathbb {N}\) be such that \(n\ge \ell (n)\cdot \log (q(n))\), and let \(d:\mathbb {N}\rightarrow \mathbb {N}\) and \(\rho :\mathbb {N}\rightarrow (0,1)\) be such that \(10\sqrt {d(n)/q(n)}\le \rho (n)\le (q(n))^{-\Omega (1)}=o(1)\). Let \(f=\lbrace f_n:\lbrace 0,1\rbrace ^n\rightarrow \lbrace 0,1\rbrace ^n\rbrace _{n\in \mathbb {N}}\) be a sequence of functions such that \(f_n\) computes a polynomial \(\mathbb {F}_n^{\ell (n)}\rightarrow \mathbb {F}_n\) of degree \(d(n)\), where \(|\mathbb {F}_n|=q(n)\). Then f is sample-aided worst-case to \(\rho\)-average-case reducible.

Proof.

We construct a probabilistic machine M that gets input \(1^{n}\), oracle access to a function \(\widetilde{f_n}\) that agrees with \(f_n\) on at least a \(\rho (n)\) fraction of the inputs, and \(\mathrm{poly}(1/\rho (n))\) labeled samples for \(f_n\), and with probability \(1-\rho (n)\) outputs a circuit \(C:\mathbb {F}^{\ell }\rightarrow \mathbb {F}\) such that for every \(x\in \mathbb {F}^{\ell }\) it holds that \(\Pr _r[C^{\widetilde{f_n}}(x,r)=f_n(x)]\ge 2/3\).

The first step of the machine M is to invoke the local list-decoding algorithm of [54, Theorem 29], instantiated with degree parameter \(d=d(n)\) and agreement parameter \(\rho =\rho (n)\). The algorithm runs in time \(\mathrm{poly}(\ell (n),d,\log (q(n)),1/\rho)=\mathrm{poly}(n,1/\rho)\) and outputs a list of \(O(1/\rho)\) probabilistic oracle circuits \(C_{1},\ldots ,C_{O(1/\rho)}:\lbrace 0,1\rbrace ^n\rightarrow \lbrace 0,1\rbrace ^n\) such that with probability at least \(2/3\) there exists \(i\in [O(1/\rho)]\) satisfying \(\Pr [C_i^{\widetilde{f_n}}(x)=f_n(x)]\ge 2/3\) for all \(x\in \lbrace 0,1\rbrace ^n\). We call any circuit that satisfies the latter condition good. By invoking the algorithm of [54] \(\mathrm{poly}(1/\rho)\) times, we obtain a list of \(t=\mathrm{poly}(1/\rho)\) circuits \(C_1,\ldots ,C_t\) such that with probability at least \(1-\mathrm{poly}(\rho)\) there exists \(i\in [t]\) such that \(C_i\) is good.

The second step of the machine is to transform the probabilistic circuits into deterministic circuits such that, with high probability, the deterministic circuit corresponding to the “good” circuit \(C_i\) will correctly compute \(f_n\) on .99 of the inputs (when given oracle access to \(\widetilde{f_n}\)). Specifically, by implementing naive error-reduction in all circuits, we can assume that for every \(x\in \mathbb {F}^{\ell }\) it holds that \(\Pr _r[C_i^{\widetilde{f_n}}(x,r)=f_n(x)]\ge .995\). Now, the machine M creates \(O(\log (1/\rho))\) copies of each circuit in the list and for each copy M “hard-wires” a randomly chosen fixed value for the circuit’s randomness. The result is a list of \(t^{\prime }=\mathrm{poly}(1/\rho)\) deterministic circuits \(D_1,\ldots ,D_{t^{\prime }}\) such that with probability \(1-\mathrm{poly}(\rho)\) there exists a circuit \(D_i\) satisfying \(\Pr _x[D_i^{\widetilde{f_n}}(x)=f_n(x)]\ge .99\).

The third step of the machine M is to “weed” the list to find a single circuit \(D_i\) that (when given access to \(\widetilde{f_n}\)) correctly computes f on .95 of the inputs. To do so, M iterates over the list and for each circuit \(D_j\) estimates the agreement of \(D_j^{\widetilde{f_n}}\) with \(f_n\) with error .01 and confidence \(1-\mathrm{poly}(\rho)\), using the random samples.

The final step of the machine M is to use the standard random self-reducibility of the Reed-Muller code to transform the circuit \(D_i\) into a probabilistic circuit that correctly computes f at each input with probability at least \(2/3\). Specifically, the probabilistic circuit implements the standard random self-reducibility algorithm for the \((q,\ell ,d)\) Reed-Muller code (see, e.g., [2, Theorem 19.19]) while resolving its oracle queries using the circuit \(D_i\). The standard algorithm runs in time \(\mathrm{poly}(q,\ell ,d)\) and works whenever \(D_i\) agrees with \(f_n\) on at least a \(1-\frac{1-d/q}{6}\) fraction of the inputs; since \(1-\frac{1-d/q}{6}\lt .95\) whenever \(d/q\lt .7\), this holds in our case, because \(d/q\le (\rho /10)^2=o(1)\).□

C AN ℰ-COMPLETE PROBLEM WITH USEFUL PROPERTIES

In this Appendix, we prove Proposition 3.12 (restated below as Proposition C.1), which asserts the existence of an \(\mathcal {E}\)-complete problem (under linear-time reductions) that is randomly self-reducible, has an instance checker with linear-length queries, and such that both the random self-reducibility algorithm and the instance checker use a linear number of random bits.

Proposition C.1

(An\(\mathcal {E}\)-complete Problem That is Random Self-reducible and Has a Good Instance Checker)

For every \(\eta \gt 0\) there exists \(L^{\mathtt {nice}}\in \mathcal {DTIME}[\tilde{O}(2^n)]\) such that:

(1)

Any \(L\in \mathcal {DTIME}[2^n]\) reduces to \(L^{\mathtt {nice}}\) in polynomial time with a multiplicative blow-up of at most \(1+\eta\) in the input length. Specifically, for every n there exists \(n^{\prime }\le (1+\eta)\cdot n\) such that any n-bit input for L is mapped to an \(n^{\prime }\)-bit input for \(L^{\mathtt {nice}}\).

(2)

The problem \(L^{\mathtt {nice}}\) is randomly self-reducible by an algorithm \(\mathtt {Dec}\) that on inputs of length n uses \(n+\mathrm{polylog}(n)\) random bits.

(3)

There is an instance checker \(\mathtt {IC}\) for \(L^{\mathtt {nice}}\) that on inputs of length n uses \(n+O(\log (n))\) random bits and makes \(O(1)\) queries of length \(\ell (n)\), where \(\ell (n)\lt (2+\eta)\cdot n\).

Proof.

For a sufficiently small \(\delta \le \eta /7\), let \(L^{\mathcal {E}}=\lbrace (\langle M \rangle ,x):M\text{ accepts }x\text{ in }2^{|x|}\text{ steps}\rbrace\). Let \(f_{L^{\mathcal {E}}}:\lbrace 0,1\rbrace ^{*}\rightarrow \lbrace 0,1\rbrace ^{*}\) be the low-degree extension of \(L^{\mathcal {E}}\) such that inputs of length \(n_0\) for \(L^{\mathcal {E}}\) are mapped to inputs in \(\mathbb {F}^m\), where \(m=\delta \cdot \frac{n_0}{\lfloor \log (n_0)\rfloor }\) and \(|\mathbb {F}|=2^{(1/\delta +1)\cdot \lceil \log (n_0)\rceil }\), for a polynomial of individual degree \(d=\lceil (n_0)^{1/\delta }\rceil\). Note that \((d+1)^m\ge 2^{n_0}\) (i.e., there is a unique extension of \(L^{\mathcal {E}}\) with these parameters) and that \(|\mathbb {F}|\gt m\cdot d\) (i.e., the polynomial is indeed of low degree). Finally, let \(L^{\mathtt {nice}}\) be the set of pairs \((z,i)\in \lbrace 0,1\rbrace ^{m\cdot \log (|\mathbb {F}|)}\times \lbrace 0,1\rbrace ^{\lceil \mathrm{loglog}(|\mathbb {F}|)\rceil }\), such that \(f_{L^{\mathcal {E}}}(z)_i=1\) (i.e., the ith bit in the binary representation of \(f_{L^{\mathcal {E}}}(z)\in \mathbb {F}\) equals one).
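A quick check of these two claims, for sufficiently large \(n_0\) (a calculation of ours, using \(d+1\gt n_0^{1/\delta }\), \(d\le 2n_0^{1/\delta }\), and \(|\mathbb {F}|\ge n_0^{1/\delta +1}\)):

\(\begin{align*} (d+1)^m &\ge \left(n_0^{1/\delta }\right)^{\delta \cdot n_0/\lfloor \log (n_0)\rfloor } = 2^{\log (n_0)\cdot n_0/\lfloor \log (n_0)\rfloor } \ge 2^{n_0} ,\\ m\cdot d &\le \frac{\delta \cdot n_0}{\lfloor \log (n_0)\rfloor }\cdot 2n_0^{1/\delta } \lt n_0^{1/\delta +1} \le |\mathbb {F}| \;\text{.} \end{align*}\)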

Note that \(L^{\mathcal {E}}\) is reducible in polynomial time to \(f_{L^{\mathcal {E}}}\), which is in turn reducible in polynomial time to \(L^{\mathtt {nice}}\); and that inputs of length \(n_0\in \mathbb {N}\) for \(L^{\mathcal {E}}\) are mapped to inputs of length \(n=m\cdot \log (|\mathbb {F}|)+\lceil \mathrm{loglog}(|\mathbb {F}|)\rceil +1\lt (1+2\delta)\cdot n_0\) for \(L^{\mathtt {nice}}\). Thus, any \(L\in \mathcal {DTIME}[2^n]\) is reducible in polynomial time to \(L^{\mathtt {nice}}\) with a multiplicative overhead of at most \(1+3\delta\) in the input length. Also note that \(L^{\mathtt {nice}}\in \mathcal {DTIME}[\tilde{O}(2^n)]\), since the polynomial \(f_{L^{\mathcal {E}}}\) can be evaluated in such time.

Let us now prove that \(L^{\mathtt {nice}}\) is randomly self-reducible with at most \(n+\mathrm{polylog}(n)\) random bits. Let \(\mathtt {Dec}_0\) be the standard random self-reducibility algorithm for \(f_{L^{\mathcal {E}}}\), which uses less than n random bits.47 Given input \((z,i)\in \lbrace 0,1\rbrace ^{m\cdot \lceil \log (|\mathbb {F}|)\rceil +\lceil \mathrm{loglog}(|\mathbb {F}|)\rceil }\) and oracle access to some \(L^{\prime }\subseteq \lbrace 0,1\rbrace ^{n}\), we simulate \(\mathtt {Dec}_0\) at input z and with oracle access to a function induced by \(L^{\prime }\) (as detailed below) and then output the ith bit of its answer. Specifically, we initially choose a random permutation \(\pi\) of \(\lbrace 0,1\rbrace ^{\lceil \mathrm{loglog}(|\mathbb {F}|)\rceil }\) using \(\mathrm{polylog}(n)\) random coins, and whenever \(\mathtt {Dec}_0\) makes a query \(q_1\in \mathbb {F}^m\), we query \(L^{\prime }\) at all inputs \(\lbrace (q_1,q_2)\rbrace _{q_2\in \lbrace 0,1\rbrace ^{\lceil \mathrm{loglog}(|\mathbb {F}|)\rceil }}\), ordered according to \(\pi\), and answer \(\mathtt {Dec}_0\) accordingly. Note that each of our queries is uniformly distributed: This is because for every query \((q_1,q_2)\), we have that \(q_1\) is uniform (because \(\mathtt {Dec}_0\)’s queries are uniform) and that \(q_2\) is uniform and independent of \(q_1\) (because we chose a random \(\pi\)). Also note that if \(L^{\prime }(q_1,q_2)=L^{\mathtt {nice}}(q_1,q_2)\) for every query \((q_1,q_2)\), then each query \(q_1\) of \(\mathtt {Dec}_0\) is answered by \(f_{L^{\mathcal {E}}}(q_1)\), in which case we output \(f_{L^{\mathcal {E}}}(z)_i=L^{\mathtt {nice}}(z,i)\).
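As a schematic sketch (ours; \(\mathtt {Dec}_0\), the oracle-query procedure, and the permutation \(\pi\) are placeholders for the objects in the proof), the wrapper looks as follows:

def lifted_decoder(z, i, Dec0, query_L_prime, pi):
    """Random self-reduction for L^nice: run the base self-corrector Dec0
    on z, answering each field query q1 by reading all bit positions
    (q1, q2) of the oracle in the order given by the random permutation pi,
    and finally output bit i of the decoded field element."""
    def field_oracle(q1):
        # bits of the binary representation of f_{L^E}(q1)
        return {q2: query_L_prime(q1, q2) for q2 in pi}
    return Dec0(z, field_oracle)[i]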

Finally, to see that \(L^{\mathtt {nice}}\) has an instance checker that uses \(n+O(\log (n))\) random bits and issues \(O(1)\) queries of length \((2+7\delta)\cdot n\), fix a PCP system for \(\mathcal {DTIME}[T]\), where \(T(n)=\tilde{O}(2^n)\), with the following specifications: The verifier V runs in polynomial time, uses \(n+O(\log (n))\) bits of randomness, issues \(O(1)\) queries, and has perfect completeness and soundness error \(1/6\); and there is an algorithm P that gets an input \(x\in \lbrace 0,1\rbrace ^{n}\) and outputs a proof for x in this PCP system (or \(\perp\), if \(x\notin L\)) in deterministic time \(\tilde{O}(2^{n})\) (for a suitable PCP system, see [4, Theorem 1]). We will instantiate this PCP system for the set \(L^{\mathtt {nice}}_1=\lbrace (z,i,b):L^{\mathtt {nice}}(z,i)=b\rbrace\), which is in \(\mathcal {DTIME}[\tilde{O}(2^n)]\).

The instance checker \(\mathtt {IC}\) for \(L^{\mathtt {nice}}\) gets input \((z,i)\in \lbrace 0,1\rbrace ^{n}\) and simulates the verifier V for \(L^{\mathtt {nice}}_1\) on inputs \((z,i,0)\) and \((z,i,1)\). Whenever \(V(z,i,b)\) queries its proof at location \(j\in [\tilde{O}(2^{n})]\), the instance checker \(\mathtt {IC}\) uses its oracle to try and decide the problem \(\Pi\) at input \((z,i,b,j)\), where \(\Pi =\lbrace ((z,i,b),j):P(z,i,b)_j=1\rbrace\). Specifically, since \(\Pi \in \mathcal {DTIME}[\tilde{O}(2^{n/2})]\subseteq \mathcal {DTIME}[\tilde{O}(2^n)]\), it holds that \(\Pi\) reduces to \(L^{\mathtt {nice}}\) in polynomial time and with multiplicative blow-up of \(1+3\delta\) in the input length; hence, \(\mathtt {IC}\) reduces \(((z,i,b),j)\) to an input for \(L^{\mathtt {nice}}\) of length \(\ell (n)\le (1+3\delta)\cdot (2n+1)\lt (2+7\delta)\cdot n\) and uses its oracle to try and obtain \(\Pi ((z,i,b),j)\). For \(\sigma \in \lbrace 0,1\rbrace\), the instance checker \(\mathtt {IC}\) outputs \(\sigma\) if and only if \(V(z,i,\sigma)=1\) and \(V(z,i,1-\sigma)=0\), and otherwise outputs \(\perp\). Note that \(\mathtt {IC}^{L^{\mathtt {nice}}}(z,i)=L^{\mathtt {nice}}(z,i)\), with probability one; and that \(\mathtt {IC}\) errs when given oracle \(L^{\prime }\ne L^{\mathtt {nice}}\) (i.e., \(\mathtt {IC}^{L^{\prime }}(z,i)=1-L^{\mathtt {nice}}(z,i)\)) only when V accepts \((z,i,1-L^{\mathtt {nice}}(z,i))\notin L^{\mathtt {nice}}_1\), which happens with probability at most \(1/6\) for any \(L^{\prime }\).□

ACKNOWLEDGMENTS

We are grateful to Igor Oliveira for pointing us to the results in [46, Section 5], which serve as a basis for the proof of Theorem 1.7. We thank Oded Goldreich, who provided feedback throughout the research process and detailed comments on the manuscript, both of which helped improve the work. We also thank Ryan Williams for a helpful discussion, for asking us whether a result as in Theorem 1.7 can be proved, and for feedback on the manuscript. Finally, we thank an anonymous reviewer for pointing out a bug in the initial proof of Theorem 1.6, which we fixed.

The work was initiated in the 2018 Complexity Workshop in Oberwolfach; the authors are grateful to the Mathematisches Forschungsinstitut Oberwolfach and to the organizers of the workshop for the productive and lovely work environment. Views and opinions expressed are those of the author(s) only and do not necessarily reflect those of the European Union or the European Research Council. Neither the European Union nor the granting authority can be held responsible for them.

Footnotes

  1. 1 In [61], the introduction of these variants is credited to a private communication from Carmosino, Gao, Impagliazzo, Mihajlin, Paturi, and Schneider [7].

  2. 2 Some “strong” variants of standard exponential-time hypotheses are in fact known to be false (see [61]).

  3. 3 See Section 3.1 for definitions of complexity classes used throughout the article.

  4. 4 Throughout the article, when we say that a PRG is \(\epsilon\)-pseudorandom for uniform circuits, we mean that for every efficiently samplable distribution over circuits, the probability over choice of circuit that the circuit distinguishes the output of the PRG from uniform with advantage more than \(\epsilon\) is at most \(\epsilon\) (see Definitions 3.6 and 3.7). The existence of such PRGs implies an “average-case” derandomization of \(\mathcal {BPP}\) in the following sense: For every \(L\in \mathcal {BPP}\) there exists an efficient deterministic algorithm D such that every probabilistic algorithm that gets input \(1^n\) and tries to find \(x\in \lbrace 0,1\rbrace ^{n}\) such that \(D(x)\ne L(x)\) has a small probability of success (see, e.g., [20, Proposition 4.4]).

  5. 5 Another relevant work is that of Goldreich [20]: He showed that if \(pr\mathcal {BPP}=pr\mathcal {P}\), then there exists a PRG for uniform circuits that suffices for this conclusion (in particular, the PRG runs in polynomial time and works for all input lengths).

  6. 6 Other proof strategies (which use different hypotheses) were able to support an “almost-always” conclusion, albeit not necessarily a PRG, from an “almost-always” hypothesis (see [8, 26]).

  7. 7 As mentioned above, Gutfreund and Vadhan [27, Section 6] showed that if we settle for average-case derandomization of \(\mathcal {RP}\) (rather than of \(\mathcal {BPP}\)), then the derandomization can work almost-always. As in previous results, their derandomization is relatively slow (i.e., it works in sub-exponential time). We show that their ideas can be combined with the techniques underlying Theorem 1.1 to deduce a fast average-case derandomization of \(\mathcal {RP}\) that works almost-always (see Theorem 4.15).

  8. 8 Note that indeed a non-deterministic analogue of rETH is \(\mathsf {MAETH}\) (or, arguably, \(\mathsf {AMETH}\)), rather than NETH, due to the use of randomness. Also recall that, while the “strong” version of \(\mathsf {MAETH}\) is false (see [61]), there is currently no evidence against the “non-strong” version \(\mathsf {MAETH}\).

  9. 9 The question of equivalence is mostly “folklore” but was mentioned several times in writing. It was asked in [30, Remark 33], which proved an analogous equivalence between non-deterministic derandomization with short advice and circuit lower bounds against non-deterministic classes (i.e., against \(\mathcal {NTIME}\); see also [11]). It was also mentioned as a hypothetical possibility in [56] (referred to there as a “super-Karp-Lipton theorem”). Following the results of [43], the question was recently raised again as a conjecture in [55].

  10. 10 We stress that our hypothesis refers to lower bounds for uniform models of computation, for which strong lower bounds (compared to those for non-uniform circuits) are already known. (For example, \(\mathcal {NP}\) is hard for \(\mathcal {NP}\)-uniform circuits of size \(n^k\) for every fixed \(k\in \mathbb {N}\) (see [49]), whereas we do not even know if \(\mathcal {E}^{\mathcal {NP}}\) is hard for non-uniform circuits of arbitrarily large linear size.)

  11. 11 For context, the best-known lower bounds for circuits of quasilinear size are against \(\Sigma _2\) (see [36]) or against \(\mathcal {MA}/1\) (i.e., Merlin-Arthur protocols that use one bit of non-uniform advice; see [48]).

  12. 12 For our derandomization results, it would have sufficed for \(f^{\mathtt {ws}}\) to be computable in quasiexponential time \(2^{\tilde{O}(n)}\) rather than linear space; see the comment in the end of Section 4.1.2.

  13. 13 Recall that the standard arithmetization of \(3\text{-}\mathtt {SAT}\) is a polynomial that depends on the input formula, whereas we want a single polynomial that gets both a formula and the assignment as input.

  14. 14 Actually, since \(f^{\mathtt {ws}}\) is downward self-reducible in \(\mathrm{polylog}\) steps, it can be computed relatively efficiently on infinitely-many input lengths and thus cannot be “hard” for almost all \(\ell\)’s. However, since \(\mathtt {TQBF}\) can be reduced to \(f^{\mathtt {ws}}\) with quasilinear overhead, if \(\mathtt {TQBF}\) is “hard” almost-always, then for every \(\ell (n)\) there exists \(\ell ^{\prime }\le \widetilde{O}(\ell (n))\) such that \(f^{\mathtt {ws}}\) is “hard” on \(\ell ^{\prime }\), which allows our argument to follow through, with a similar set \(\overline{S_n}\subset [n,n^{\mathrm{polyloglog}(n)}]\) (see Proposition 4.11 for details). For simplicity, we ignore this issue in the overview.

  15. 15 This high-level proof structure, which combines a non-uniform collapse hypothesis (using a Karp-Lipton-style theorem) and a derandomization hypothesis, dates back to the work of Impagliazzo, Kabanets, and Wigderson [30], underlies the algorithmic method of Williams [59], and has been used in works published in parallel to ours (such as Chen et al. [10]).

  16. 16 The notation \(\mathcal {OMA}\) stands for “oblivious” \(\mathcal {MA}\). It denotes the class of problems that can be decided by an \(\mathcal {MA}\) verifier such that for every input length there is a single “good” proof that convinces the verifier on all inputs in the set (rather than a separate proof for each input); see, e.g., [17, 22].

  17. 17 Note that the problem of solving \(\mathsf {CAPP}\) for v-bit circuits of size \(n=2^{\Omega (v)}\) can be trivially solved in time \(2^{O(v)}=\mathrm{poly}(n)\), and thus unconditionally lies in \(pr\mathcal {P}\cap pr\mathcal {BPTIME}[\tilde{O}(n)]\). The derandomization problem described above simply calls for a faster deterministic algorithm for this problem.

  18. 18 Intuitively, in the “low-end” Karp-Lipton result, we only need to derandomize probabilistic decisions made by the non-deterministic machine that constructs the circuit, whereas the circuit itself is deterministic; thus, a non-deterministic derandomization hypothesis suffices for this result. See Section 5.1.2 for details.

  19. 19 For example, from such an algorithm they deduce the lower bound \(\mathcal {NEXP}\not\subseteq \mathcal {P}/\mathrm{poly}\); and from an algorithm that runs in time \(2^{n/\mathrm{polylog}(n)}\) as in Theorem 1.7, their results yield the lower bound \(\mathcal {NP}\not\subset \mathcal {SIZE}[n^k]\) for every fixed \(k\in \mathbb {N}\).

  20. 20 Another known result, which was communicated to us by Igor Oliveira, asserts that if \(\mathtt {CircuitSAT}\) for circuits over n variables and of size \(\mathrm{poly}(n)\) can be solved in probabilistic sub-exponential time \(2^{n^{o(1)}}\), then \(\mathcal {BPTIME}[2^{O(n)}]\not\subset \mathcal {P}/\mathrm{poly}\). This result can be seen as a “high-end” form of our result (i.e., of Theorem 1.7), where the latter will use a weaker hypothesis but deduce a weaker conclusion.

  21. 21 That is, there exists a probabilistic algorithm that gets input \(1^n\) and oracle access to f and with high probability outputs an n-bit circuit of size \(n\cdot (\log n)^k\) that agrees with f on almost all inputs.

  22. 22 Actually, our implementation of the [33] argument shows that if the function \(\mathtt {ECC}(f^{\mathtt {ws}})\) (where \(\mathtt {ECC}\) is defined as in Section 2.1) can be learned, then the function \(f^{\mathtt {ws}}\) can be efficiently computed. For simplicity, we ignore the difference between \(f^{\mathtt {ws}}\) and \(\mathtt {ECC}(f^{\mathtt {ws}})\) in the current high-level description.

  23. 23 The standard definition of instance checkers fixes the error probability to \(1/3\), but we can reduce the error to \(1/6\) using standard error-reduction.

  24. 24 Definition 4.2 is actually a slightly modified version of the definition in [23]. First, we consider reductions of computing f in the worst-case to computing f in “rare-case,” whereas [23] both reduces the computation of f to the computation of a possibly different function \(f^{\prime }\) and parametrizes the success probability of computing both f and \(f^{\prime }\). Second, we separately account for the success probability of the transformation M and of the final circuit C. And, last, we also require f to be length-preserving.

  25. 25 Actually, in [56] they define a Boolean function, which treats a suffix of its input as an index of an output bit in the non-Boolean version that we describe and outputs the corresponding bit. To streamline our exposition, we ignore this issue.

  26. 26 In more detail, we define three arithmetic operators on functions \(\mathbb {F}^{2n}\rightarrow \mathbb {F}\), each indexed by a variable \(j\in [n]\), and denote these operators by \(\lbrace \mathcal {O}^{j}_k\rbrace _{k\in [3],j\in [n]}\). In each recursive step \(i\in [r(n)]\), the polynomial corresponding to input length \(N_0+i\) is obtained by applying operator \(\mathcal {O}^{j(i)}_{k(i)}\), where \(j,k:\mathbb {N}\rightarrow [3]\) are polynomial-time computable functions, to the polynomial corresponding to input length \(N_0+i-1\). Thus, at input length \(N_0+i\), we compute \(L_{TV}(\varphi ,w)\) by applying i operators on the polynomial P and evaluating the resulting polynomial at \((\varphi ,w)\).

  27. 27 Recall that the downward self-reducibility algorithm for \(f^{\mathtt {ws}}\) works in time \(\mathrm{poly}(1/\delta)=2^{n/\mathrm{polylog}(n)}\), and thus the existence of this algorithm does not immediately imply that \(f^{\mathtt {ws}}\in \mathcal {PSPACE}\).

  28. 28 This choice makes our reduction of \(\mathtt {TQBF}\) to \(f^{\mathtt {ws}}\) somewhat wasteful, but this waste only causes a polylogarithmic overhead, which is insignificant for our results. Thus, for simplicity, we assume that the number of variables indeed equals the representation length of \(\varphi\).

  29. 29 The algorithm transforms M into an oblivious machine [24, 47] and then applies an efficient Cook-Levin transformation of the oblivious machine to a \(3\text{-}\mathtt {SAT}\) formula (see, e.g., [2, Section 2.3.4]).

  30. 30 The specific choice of H as the image of \(H_0=\lbrace x0^{4n^{\prime }}:x\in \lbrace 0,1\rbrace ^{n^{\prime }}\rbrace\) under \(\pi\) is immaterial for our argument, as long as we can efficiently decide \(H_0\) and enumerate over \(H_0\).

  31. 31 This is the case, since the largest input length in \(I_n\) is \(10\lceil n/\ell _0(n)\rceil \cdot \ell _0(n)+11n\cdot \bar{\ell _0}(n)+(\bar{\ell _0}(n)-1)\lt 10n+10\ell _0(n)+(11n+1)\cdot \bar{\ell _0}(n)-1\lt 10n+11(n+1)\cdot \bar{\ell _0}(n)-1\), whereas the smallest input length in \(I_{n+1}\) is \(10\lceil (n+1)/\ell _0(n+1)\rceil \cdot \ell _0(n+1)+11(n+1)\cdot \bar{\ell _0}(n+1) \ge 10n + 11(n+1)\bar{\ell _0}(n+1)+10\).

  32. 32 The only potential issue here is that the Boolean function is actually a “padded” version of the function that corresponds to polynomial: It is not immediate that if there exists an algorithm that computes the Boolean function correctly on \(\epsilon \gt 0\) of the n-bit inputs, then there exists an algorithm that computes the polynomial correctly on the same fraction \(\epsilon \gt 0\) of the \(m=\log (|\mathbb {F}^{2\ell _0}|)\)-bit inputs. However, the latter assertion holds in our case, since we are interested in probabilistic algorithms.

  33. 33 On odd input lengths, the function \(f^{\mathtt {GL(ws)}}\) is defined by ignoring the last input bit; that is, \(f^{\mathtt {GL(ws)}}(x,r\sigma)=f^{\mathtt {GL(ws)}}(x,r)\), where \(|x|=|r|\) and \(|\sigma |=1\).

  34. 34 In Definition 4.3 the output circuit has oracle gates to a function that agrees with the target function on a \(\delta\) fraction of the inputs. Indeed, we replace these oracle gates with copies of the circuit \(C^{(3)}_i\).

  35. 35 Moreover, in every small interval of input lengths, there is an input length on which \(f^{\mathtt {ws}}\) can be solved in time \(\mathrm{poly}(1/\delta)\) (without using an oracle).

  36. 36 Note that \(\mathtt {str}\) is well-defined, since we can assume without loss of generality that \(\overline{S_n}\cap \overline{S_{n^{\prime }}}=\emptyset\) for distinct \(n,n^{\prime }\in B_A\) (i.e., we can assume without loss of generality that n and \(n^{\prime }\) are sufficiently far apart).

  37. 37 This can be done using an idea from [29, Lemma 5.5] (attributed to Salil Vadhan), essentially “composing” Reed-Solomon codes over \(GF(n)\) of degree \(n/\mathrm{polylog}(n)\) with standard designs (à la Nisan and Wigderson [44]; see [29, Lemma 2.2]) with set-size \(\ell =\mathrm{polylog}(n)\).

  38. Also, in this setting the function S represents “how far ahead” (beyond n) we are willing to look in our search for the “good” input length.

  39. To see this more formally, let \(L^{\mathtt {pad}}=\lbrace (x,1^{O(\log (|x|))}):x\in L^{\mathtt {nice}}\rbrace\). Since \(L^{\mathtt {nice}}\in \mathcal {DTIME}[\tilde{O}(2^n)]\), we have that \(L^{\mathtt {pad}}\in \mathcal {DTIME}[2^n]\). Using our hypothesis, \(L^{\mathtt {pad}}\) on inputs of length \(N^{\prime }=\bar{N}+O(\log (\bar{N}))\) has circuits of size \(S(N^{\prime })\), and these circuits can be converted (by hardwiring the last \(N^{\prime }-\bar{N}\) input bits) into \(\bar{N}\)-bit circuits for \(L^{\mathtt {nice}}\) of size \(S(N^{\prime })\lt S(2\bar{N})\).

  40. Specifically, the algorithm uses the sampler from Theorem 3.5 (with sufficiently large \(\beta ,\gamma \gt 1\) and sufficiently small \(\alpha \gt 0\)) to sample \(D=\mathrm{poly}(n)\) strings \(z_1,\ldots ,z_D\in \lbrace 0,1\rbrace ^{n^{\prime }}\), and then uses the sampler again to sample D strings \(r_1,\ldots ,r_{D}\in \lbrace 0,1\rbrace ^{n+O(\log (n))}\) to be used as randomness for the machine \(\mathtt {IC}\). The algorithm rejects \(C^{L^{\mathtt {nice}}}_{\bar{n}}\) if and only if \(\Pr _{i\in [D]}[ \Pr _{j\in [D]}[ \mathtt {IC}^{C^{L^{\mathtt {nice}}}_{\bar{n}}}(z_i,r_j)=\perp ] \ge .01 ] \ge \frac{1}{2}\cdot (n^{\prime })^{-2c}\), where \(\mathtt {IC}^{C^{L^{\mathtt {nice}}}_{\bar{n}}}(z_i,r_j)\) denotes the simulation of \(\mathtt {IC}^{C^{L^{\mathtt {nice}}}_{\bar{n}}}(z_i)\) with the fixed randomness \(r_j\). This algorithm always accepts YES instances. Now assume that \(C^{L^{\mathtt {nice}}}_{\bar{n}}\) is a NO instance, and call \(z\in \lbrace 0,1\rbrace ^{n^{\prime }}\) bad if \(\Pr [ \mathtt {IC}^{C^{L^{\mathtt {nice}}}_{\bar{n}}}(z)=\perp ]\ge 1/6\). By the properties of the sampler, with high probability over the choice of \(z_1,\ldots ,z_D\), the fraction of bad z’s in the sample is at least \(\frac{1}{2}\cdot (n^{\prime })^{-2c}\); and for any fixed bad z, the probability that \(\Pr _{j\in [D]}[ \mathtt {IC}^{C^{L^{\mathtt {nice}}}_{\bar{n}}}(z,r_j)=\perp ] \lt .01\) is \(\exp (-n)\). Hence, \(C^{L^{\mathtt {nice}}}_{\bar{n}}\) is rejected with high probability. The bound on the algorithm’s running time follows from standard quasilinear-time algorithms for the Circuit Eval problem (see, e.g., [38, Theorem 3.1]) and from the fact that \(\tilde{O}(S(4\bar{n}))\lt \mathrm{poly}(n)\cdot S(2\bar{n})\).
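  For concreteness, here is a minimal sketch of this two-level test, assuming hypothetical callables sample_z and sample_r (standing in for the sampler of Theorem 3.5) and simulate_IC (running \(\mathtt {IC}\) with the given circuit as oracle and fixed randomness); the thresholds mirror the ones in the footnote, and none of the helper names come from the paper.

    # Hedged sketch of the two-level sampling test (helper names are hypothetical).
    BOT = None  # stands for the failure symbol "bot"

    def rejects(circuit, sample_z, sample_r, simulate_IC, D, n_prime, c):
        """Reject iff, for at least a (1/2)*(n')^{-2c} fraction of the sampled
        z_i's, the simulation IC^{circuit}(z_i, r_j) outputs "bot" on at least
        1% of the sampled random strings r_j."""
        zs = [sample_z() for _ in range(D)]  # z_1, ..., z_D in {0,1}^{n'}
        rs = [sample_r() for _ in range(D)]  # r_1, ..., r_D, randomness for IC

        def failure_fraction(z):
            return sum(simulate_IC(circuit, z, r) is BOT for r in rs) / D

        bad_fraction = sum(failure_fraction(z) >= 0.01 for z in zs) / D
        return bad_fraction >= 0.5 * n_prime ** (-2 * c)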

  41. This is reminiscent of the recent results of Murray and Williams [43], who showed that solving \(\mathsf {CAPP}\) for v-bit circuits of size \(2^{\Omega (v)}\) in time \(2^{.99\cdot v}\) suffices to deduce circuit lower bounds. Note that the foregoing \(\mathsf {CAPP}\) problem can be solved in deterministic polynomial time, since the input length is \(2^{\Omega (v)}\) (i.e., this \(\mathsf {CAPP}\) problem lies in \(pr\mathcal {BPTIME}[\tilde{O}(n)]\cap pr\mathcal {P}\)).

  42. In fact, for this statement, it suffices to assume that \(pr\mathcal {BPP}\subseteq pr\mathcal {NP}\Longrightarrow \mathcal {EXP}\not\subset \mathcal {P}/\mathrm{poly}\). However, since we prove a result with tighter relations between the parameters below (see Theorem 5.11), we ignore this issue in the current statement for simplicity.

  43. In more detail, the “\(\Longrightarrow\)” direction is trivial, so we prove the “\(\Longleftarrow\)” direction. For every \(\Pi \in pr\mathcal {EXP}\), let M be an exponential-time machine that solves \(\Pi\), and let \(L_M\) be the set of inputs that M accepts. Since \(L_M\in \mathcal {EXP}\), there exists an \(\mathcal {NP}\)-machine that decides \(L_M\) and a polynomial-size circuit family that decides \(L_M\), and the foregoing machine and circuit family also solve \(\Pi\).

  44. In Section 3.1, we defined \(\mathcal {SIZE}\) as referring to languages, whereas here we apply this notation to a fixed n-bit function: \(\mathcal {SIZE}(O)\) denotes the size of the smallest circuit computing O.

  45. Since we are interested in algorithms that run in time \(2^{n/\mathrm{polylog}(n)}\) for a sufficiently large polylogarithmic function, there is no significant difference for us between circuits and \(3\text{-}\mathtt {SAT}\) formulas of linear (or quasilinear) size: any circuit can be transformed into a formula with only a polylogarithmic overhead, using an efficient Cook-Levin reduction, and we can “absorb” polylogarithmic overheads by assuming that the polylogarithmic function in the running time \(2^{n/\mathrm{polylog}(n)}\) is sufficiently large.

  46. Recall that, by Adleman’s theorem [1, 5], we can derandomize \(pr\mathcal {BPP}\) with \(\mathrm{poly}(n)\) bits of non-uniform advice (and even with \(O(n)\) bits, using Theorem 3.5). However, an unconditional derandomization of \(pr\mathcal {BPP}\) with \(o(n)\) bits of non-uniform advice is not known.

  47. Recall that \(\mathtt {Dec}_0\) chooses a random vector \(\vec{u}\in \mathbb {F}^m\), which requires \(m\cdot \log (|\mathbb {F}|)\lt n\) random bits, and queries its oracle on a set of points on the line corresponding to \(\vec{u}\); see, e.g., [19, Section 7.2.1.1].
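  As a concrete illustration of this query pattern, the following sketch assumes \(\mathbb {F}\) is a prime field \(GF(p)\); it shows only the standard line-query template, not the paper's decoder.

    import random

    def line_queries(x, p, m):
        """Pick a random direction u in GF(p)^m (costing m*log(p) random bits)
        and return the query points of the line {x + t*u : t in GF(p)} through
        the point x, with coordinate arithmetic mod p."""
        u = [random.randrange(p) for _ in range(m)]
        return [tuple((xi + t * ui) % p for xi, ui in zip(x, u))
                for t in range(p)]

    # Example: the p query points of a random line through (1, 2, 3) in GF(7)^3.
    # points = line_queries((1, 2, 3), p=7, m=3)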

REFERENCES

  [1] Leonard Adleman. 1978. Two theorems on random polynomial time. In Proceedings of the 19th Annual IEEE Symposium on Foundations of Computer Science (FOCS). 75–83.
  [2] Sanjeev Arora and Boaz Barak. 2009. Computational Complexity: A Modern Approach. Cambridge University Press, Cambridge.
  [3] László Babai, Lance Fortnow, Noam Nisan, and Avi Wigderson. 1993. BPP has subexponential time simulations unless EXPTIME has publishable proofs. Computat. Complex. 3, 4 (1993), 307–318.
  [4] Eli Ben-Sasson, Alessandro Chiesa, Daniel Genkin, and Eran Tromer. 2013. On the concrete efficiency of probabilistically-checkable proofs. In Proceedings of the 45th Annual ACM Symposium on Theory of Computing (STOC). 585–594.
  [5] Charles H. Bennett and John Gill. 1981. Relative to a random oracle A, \({\bf P}^{A}\not={\bf NP}^{A}\not={\rm co}-{\bf NP}^{A}\) with probability 1. SIAM J. Comput. 10, 1 (1981), 96–113.
  [6] Jin-Yi Cai, Ajay Nerurkar, and D. Sivakumar. 1999. Hardness and hierarchy theorems for probabilistic quasi-polynomial time. In Proceedings of the 31st Annual ACM Symposium on Theory of Computing (STOC). 726–735.
  [7] Marco L. Carmosino, Jiawei Gao, Russell Impagliazzo, Ivan Mihajlin, Ramamohan Paturi, and Stefan Schneider. 2016. Nondeterministic extensions of the strong exponential time hypothesis and consequences for non-reducibility. In Proceedings of the 7th Conference on Innovations in Theoretical Computer Science (ITCS). 261–270.
  [8] Marco L. Carmosino, Russell Impagliazzo, and Manuel Sabin. 2018. Fine-grained derandomization: From problem-centric to resource-centric complexity. In Proceedings of the 45th International Colloquium on Automata, Languages and Programming (ICALP).
  [9] Lijie Chen. 2019. Non-deterministic quasi-polynomial time is average-case hard for ACC circuits. In Proceedings of the 60th Annual IEEE Symposium on Foundations of Computer Science (FOCS).
  [10] Lijie Chen, Dylan M. McKay, Cody D. Murray, and R. Ryan Williams. 2019. Relations and equivalences between circuit lower bounds and Karp-Lipton theorems. In Proceedings of the 34th Annual IEEE Conference on Computational Complexity (CCC). 30:1–30:21.
  [11] Lijie Chen and Hanlin Ren. 2020. Strong average-case circuit lower bounds from non-trivial derandomization. In Proceedings of the 52nd Annual ACM Symposium on Theory of Computing (STOC).
  [12] Lijie Chen, Ron D. Rothblum, and Roei Tell. 2022. Unstructured hardness to average-case randomness. In Proceedings of the 63rd Annual IEEE Symposium on Foundations of Computer Science (FOCS).
  [13] Lijie Chen and Roei Tell. 2021. Hardness vs. randomness, revised: Uniform, non-black-box, and instance-wise. In Proceedings of the 62nd Annual IEEE Symposium on Foundations of Computer Science (FOCS). 125–136.
  [14] Lijie Chen and R. Ryan Williams. 2019. Stronger connections between circuit analysis and circuit lower bounds, via PCPs of proximity. In Proceedings of the 34th Annual IEEE Conference on Computational Complexity (CCC). 19:1–19:43.
  [15] Holger Dell, Thore Husfeldt, Dániel Marx, Nina Taslaman, and Martin Wahlén. 2014. Exponential time complexity of the permanent and the Tutte polynomial. ACM Trans. Algor. 10, 4 (2014).
  [16] Lance Fortnow and Adam R. Klivans. 2009. Efficient learning algorithms yield circuit lower bounds. J. Comput. Syst. Sci. 75, 1 (2009), 27–36.
  [17] Lance Fortnow, Rahul Santhanam, and Ryan Williams. 2009. Fixed-polynomial size circuit bounds. In Proceedings of the 24th Annual IEEE Conference on Computational Complexity (CCC). 19–26.
  [18] Martin FĂŒrer, Oded Goldreich, Yishay Mansour, Michael Sipser, and Stathis Zachos. 1989. On completeness and soundness in interactive proof systems. Adv. Comput. Res. 5 (1989), 429–442.
  [19] Oded Goldreich. 2008. Computational Complexity: A Conceptual Perspective. Cambridge University Press, New York, NY.
  [20] Oded Goldreich. 2011. In a world of P=BPP. In Studies in Complexity and Cryptography. Miscellanea on the Interplay between Randomness and Computation. 191–232.
  [21] Oded Goldreich and Leonid A. Levin. 1989. A hard-core predicate for all one-way functions. In Proceedings of the 21st Annual ACM Symposium on Theory of Computing (STOC). 25–32.
  [22] Oded Goldreich and Or Meir. 2015. Input-oblivious proof systems and a uniform complexity perspective on P/poly. ACM Trans. Computat. Theor. 7, 4 (2015).
  [23] Oded Goldreich and Guy N. Rothblum. 2017. Worst-case to average-case reductions for subclasses of P. Electron. Colloq. Computat. Complex. 26 (2017), 130.
  [24] Yuri Gurevich and Saharon Shelah. 1989. Nearly linear time. In Proceedings of the Symposium on Logical Foundations of Computer Science: Logic at Botik. 108–118.
  [25] Venkatesan Guruswami, Christopher Umans, and Salil Vadhan. 2009. Unbalanced expanders and randomness extractors from Parvaresh-Vardy codes. J. ACM 56, 4 (2009).
  [26] Dan Gutfreund, Ronen Shaltiel, and Amnon Ta-Shma. 2003. Uniform hardness versus randomness tradeoffs for Arthur-Merlin games. Computat. Complex. 12, 3-4 (2003), 85–130.
  [27] Dan Gutfreund and Salil Vadhan. 2008. Limitations of hardness vs. randomness under uniform reductions. In Proceedings of the 12th International Workshop on Randomization and Approximation Techniques in Computer Science (RANDOM). 469–482.
  [28] Ryan C. Harkins and John M. Hitchcock. 2013. Exact learning algorithms, betting games, and circuit lower bounds. ACM Trans. Computat. Theor. 5, 4 (2013).
  [29] Tzvika Hartman and Ran Raz. 2003. On the distribution of the number of roots of polynomials and explicit weak designs. Rand. Struct. Algor. 23, 3 (2003), 235–263.
  [30] Russell Impagliazzo, Valentine Kabanets, and Avi Wigderson. 2002. In search of an easy witness: Exponential time vs. probabilistic polynomial time. J. Comput. Syst. Sci. 65, 4 (2002), 672–694.
  [31] Russell Impagliazzo and Ramamohan Paturi. 2001. On the complexity of k-SAT. J. Comput. Syst. Sci. 62, 2 (2001), 367–375.
  [32] Russell Impagliazzo, Ramamohan Paturi, and Francis Zane. 2001. Which problems have strongly exponential complexity? J. Comput. Syst. Sci. 63, 4 (2001), 512–530.
  [33] Russell Impagliazzo and Avi Wigderson. 1998. Randomness vs. time: De-randomization under a uniform assumption. In Proceedings of the 39th Annual IEEE Symposium on Foundations of Computer Science (FOCS). 734–743.
  [34] Russell Impagliazzo and Avi Wigderson. 1997. \({\rm P}={\rm BPP}\) if \({\rm E}\) requires exponential circuits: Derandomizing the XOR lemma. In Proceedings of the 29th Annual ACM Symposium on Theory of Computing (STOC). 220–229.
  [35] Valentine Kabanets. 2001. Easiness assumptions and hardness tests: Trading time for zero error. J. Comput. Syst. Sci. 63, 2 (2001), 236–252. https://www.sciencedirect.com/science/article/pii/S0022000001917635
  [36] R. Kannan. 1982. Circuit-size lower bounds and non-reducibility to sparse sets. Inf. Contr. 55, 1-3 (1982), 40–56.
  [37] Adam Klivans, Pravesh Kothari, and Igor Oliveira. 2013. Constructing hard functions using learning algorithms. In Proceedings of the 28th Annual IEEE Conference on Computational Complexity (CCC). 86–97.
  [38] Richard J. Lipton and Ryan Williams. 2013. Amplifying circuit lower bounds against polynomial time, with applications. Computat. Complex. 22, 2 (2013), 311–343.
  [39] Yanyi Liu and Rafael Pass. 2022. Characterizing derandomization through fine-grained hardness of Levin-Kolmogorov complexity. In Proceedings of the 37th Annual IEEE Conference on Computational Complexity (CCC).
  [40] Daniel Lokshtanov, Dániel Marx, and Saket Saurabh. 2011. Lower bounds based on the exponential time hypothesis. Bull. Eur. Assoc. Theoret. Comput. Sci. 105 (2011), 41–71.
  [41] Chi-Jen Lu. 2001. Derandomizing Arthur-Merlin games under uniform assumptions. Computat. Complex. 10, 3 (2001), 247–259.
  [42] Carsten Lund, Lance Fortnow, Howard Karloff, and Noam Nisan. 1992. Algebraic methods for interactive proof systems. J. Assoc. Comput. Machin. 39, 4 (1992), 859–868.
  [43] Cody Murray and Ryan Williams. 2018. Circuit lower bounds for nondeterministic quasi-polytime: An easy witness lemma for NP and NQP. In Proceedings of the 50th Annual ACM Symposium on Theory of Computing (STOC).
  [44] Noam Nisan and Avi Wigderson. 1994. Hardness vs. randomness. J. Comput. Syst. Sci. 49, 2 (1994), 149–167.
  [45] Igor C. Oliveira. 2013. Algorithms versus circuit lower bounds. Electron. Colloq. Computat. Complex. 20 (2013), 117.
  [46] Igor C. Oliveira and Rahul Santhanam. 2017. Conspiracies between learning algorithms, circuit lower bounds, and pseudorandomness. In Proceedings of the 32nd Annual IEEE Conference on Computational Complexity (CCC). Vol. 79.
  [47] Nicholas Pippenger and Michael J. Fischer. 1979. Relations among complexity measures. J. ACM 26, 2 (1979), 361–381.
  [48] Rahul Santhanam. 2009. Circuit lower bounds for Merlin-Arthur classes. SIAM J. Comput. 39, 3 (2009), 1038–1061.
  [49] Rahul Santhanam and Ryan Williams. 2013. On medium-uniformity and circuit lower bounds. In Proceedings of the 28th Annual IEEE Conference on Computational Complexity (CCC). 15–23.
  [50] Ronen Shaltiel and Christopher Umans. 2005. Simple extractors for all min-entropies and a new pseudorandom generator. J. ACM 52, 2 (2005), 172–216.
  [51] Ronen Shaltiel and Christopher Umans. 2007. Low-end uniform hardness vs. randomness tradeoffs for AM. In Proceedings of the 39th Annual ACM Symposium on Theory of Computing (STOC). 430–439.
  [52] Adi Shamir. 1992. IP = PSPACE. J. ACM 39, 4 (1992), 869–877.
  [53] Victor Shoup. 1990. New algorithms for finding irreducible polynomials over finite fields. Math. Comput. 54, 189 (1990), 435–447.
  [54] Madhu Sudan, Luca Trevisan, and Salil Vadhan. 2001. Pseudorandom generators without the XOR lemma. J. Comput. Syst. Sci. 62, 2 (2001), 236–266.
  [55] Roei Tell. 2019. Proving that \(pr\mathcal {BPP}=pr\mathcal {P}\) is as hard as proving that “almost \(\mathcal {NP}\)” is not contained in \(\mathcal {P}/\mathrm{poly}\). Inf. Process. Lett. 152 (2019), 105841.
  [56] Luca Trevisan and Salil P. Vadhan. 2007. Pseudorandomness and average-case complexity via uniform reductions. Computat. Complex. 16, 4 (2007), 331–364.
  [57] Christopher Umans. 2003. Pseudo-random generators for all hardnesses. J. Comput. Syst. Sci. 67, 2 (2003), 419–440.
  [58] Salil P. Vadhan. 2012. Pseudorandomness. Now Publishers.
  [59] Ryan Williams. 2013. Improving exhaustive search implies superpolynomial lower bounds. SIAM J. Comput. 42, 3 (2013), 1218–1244.
  [60] Ryan Williams. 2014. Algorithms for circuits and circuits for algorithms: Connecting the tractable and intractable. In Proceedings of the International Congress of Mathematicians (ICM). 659–682.
  [61] Richard Ryan Williams. 2016. Strong ETH breaks with Merlin and Arthur: Short non-interactive proofs of batch evaluation. In Proceedings of the 31st Annual IEEE Conference on Computational Complexity (CCC).
  [62] Virginia V. Williams. 2015. Hardness of easy problems: Basing hardness on popular conjectures such as the strong exponential time hypothesis. In Proceedings of the 10th International Symposium on Parameterized and Exact Computation. Vol. 43. 17–29.
  [63] Virginia Vassilevska Williams. 2018. On some fine-grained questions in algorithms and complexity. Retrieved from https://people.csail.mit.edu/virgi/eccentri.pdf.
  [64] Gerhard J. Woeginger. 2003. Exact algorithms for NP-hard problems: A survey. In Combinatorial Optimization—Eureka, You Shrink! (Lecture Notes in Computer Science, Vol. 2570). Springer, Berlin, 185–207.
