1 Introduction

Building on the earlier theory of spin glasses, statistical physicists in the early 2000s developed a detailed collection of predictions for a broad class of sparse random constraint satisfaction problems (rcsp). These predictions describe a series of phase transitions as the constraint density varies, which are governed by one-step replica symmetry breaking (1rsb) ([31, 34]; cf. [4] and Chapter 19 of [33] for a survey). We study one such rcsp, the random regular k-nae-sat model, which is perhaps the most mathematically tractable model in the 1rsb class. As a continuation of our companion work [37], this paper completes our program to establish that the 1rsb prediction for random regular nae-sat holds with probability arbitrarily close to one.

The nae-sat problem is a random Boolean cnf formula in which n Boolean variables are subject to constraints in the form of clauses, each the “or” of k of the variables or their negations chosen uniformly at random. The formula itself is the “and” of these clauses. A variable assignment \(\underline{x}\in \{0,1\}^{n}\) is called a nae-sat solution if both \(\underline{x}\) and \(\lnot \underline{x}\) evaluate to true. We then choose a uniformly random instance of the d-regular (each variable appears d times) k-nae-sat (each clause has k literals) problem, which gives the random d-regular k-nae-sat problem with clause density \(\alpha =d/k\) (see Sect. 2 for the formal definition).
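As an illustration of the definitions, the NAE constraint and the solution check can be sketched in a few lines of Python. The tiny instance below is hypothetical and not d-regular; it only serves to illustrate the predicate and the symmetry of the solution set under global negation.

```python
from itertools import product

def nae(values):
    # A clause is NAE-satisfied iff its literals are neither all True nor all False.
    return not (all(values) or not any(values))

def is_nae_solution(x, clauses):
    # clauses: list of k-tuples of (variable index, negation bit L_e);
    # the literal on edge e evaluates to x[v] XOR L_e.
    return all(nae([x[v] ^ L for v, L in clause]) for clause in clauses)

# Toy instance with n = 4 variables and k = 3 (purely illustrative):
clauses = [((0, 0), (1, 0), (2, 1)), ((1, 1), (2, 0), (3, 0))]
sols = [x for x in product((0, 1), repeat=4) if is_nae_solution(x, clauses)]

# NAE-SAT is symmetric under global negation: x is a solution iff ~x is.
assert all(tuple(1 - b for b in x) in sols for x in sols)
```

The global-negation symmetry asserted at the end is what makes it natural to study the absolute overlap \(\rho_{\text{abs}}\) later in the introduction.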

Let \(Z_n\) denote the number of solutions for a given random d-regular k-nae-sat instance. The physics prediction is that for each fixed \(\alpha \), there exists \(\textsf {f}(\alpha )\) called the free energy such that

$$\begin{aligned} \frac{1}{n} \log Z_n \;\longrightarrow \; \textsf {f}(\alpha ) \quad \text {in probability}. \end{aligned}$$

A direct computation of the first moment \(\mathbb {E}Z_n\) gives that

$$\begin{aligned} \mathbb {E}Z_n = 2^n \left( 1-2^{-k+1} \right) ^m = e^{n\textsf {f}^{\textsf {rs}}(\alpha )}, \quad \text {where}\quad \textsf {f}^{\textsf {rs}}(\alpha )\equiv \log 2+ \alpha \log \left( 1-2^{-k+1}\right) , \end{aligned}$$

(\(\textsf {f}^{\textsf {rs}}(\alpha )\) is called the replica-symmetric free energy), so \(\textsf {f}\le \textsf {f}^{\textsf {rs}}\) holds by Markov’s inequality. The work of Ding–Sly–Sun [25] and Sly–Sun–Zhang [43] established some of the physics conjectures on the description of \(Z_n\) and \(\textsf {f}\) given in [31, 36, 45], which are summarized as follows.

  • ([25]) For large enough k, there exists the satisfiability threshold \(\alpha _{\textsf {sat}}\equiv \alpha _{\textsf {sat}}(k)>0\) such that

    $$\begin{aligned} \lim _{n\rightarrow \infty } \mathbb {P}(Z_n>0) = {\left\{ \begin{array}{ll} 1 &{} \text { for } \alpha \in (0,\alpha _{\textsf {sat}});\\ 0 &{} \text { for }\alpha > \alpha _{\textsf {sat}}. \end{array}\right. } \end{aligned}$$
  • ([43]) For large enough k, there exists the condensation threshold \(\alpha _{\textsf {cond}}\equiv \alpha _{\textsf {cond}}(k)\in (0,\alpha _{\textsf {sat}})\) such that

    $$\begin{aligned} \textsf {f}(\alpha )= {\left\{ \begin{array}{ll} \textsf {f}^{\textsf {rs}}(\alpha ) &{} \text { for } \alpha \le \alpha _{\textsf {cond}};\\ \textsf {f}^{1\textsf {rsb}}(\alpha ) &{} \text { for } \alpha > \alpha _{\textsf {cond}}, \end{array}\right. } \end{aligned}$$
    (1)

    where \(\textsf {f}^{1\textsf {rsb}}(\alpha )\) is the 1rsb free energy. Moreover, \(\textsf {f}^{\textsf {rs}}(\alpha ) > \textsf {f}^{1\textsf {rsb}}(\alpha )\) holds for \(\alpha \in (\alpha _{\textsf {cond}},\alpha _{\textsf {sat}})\). For the explicit formula and derivation of \(\textsf {f}^{1\textsf {rsb}}(\alpha )\) and \(\alpha _{\textsf {cond}}\), we refer to Section 1.6 of [43] for a concise overview.

Furthermore, there are more detailed physics predictions that for \(\alpha \in (\alpha _{\textsf {cond}},\alpha _{\textsf {sat}})\), the solution space of random regular k-nae-sat condenses into a bounded number of clusters. Here, a cluster is a connected component of the solution space, where two solutions are connected if they differ in exactly one variable. In [37], we proved that for large enough k, the solution space of random regular k-nae-sat indeed becomes condensed in the condensation regime for a positive fraction of the instances; that is, condensation holds with probability strictly bounded away from 0.

The following theorem strengthens the aforementioned result and shows that the condensation phenomenon holds with probability arbitrarily close to 1.

Theorem 1.1

Let \(k\ge k_0\) where \(k_0\) is a large absolute constant, and let \(\alpha \in (\alpha _{\textsf {cond}}, \alpha _{\textsf {sat}})\) such that \(d\equiv \alpha k\in \mathbb {N}\). For all \(\varepsilon >0\) and \(M\in \mathbb {N}\), there exist constants \(K\equiv K(\varepsilon ,\alpha ,k)\in \mathbb {N}\) and \(C\equiv C(M,\varepsilon ,\alpha ,k)>0\) such that with probability at least \(1-\varepsilon \), the random d-regular k-nae-sat instance satisfies the following:

  (a) The K largest solution clusters, \(\mathcal {C}_1,\ldots ,\mathcal {C}_K\), occupy at least a \(1-\varepsilon \) fraction of the solution space;

  (b) There are at least \(\exp (n \textsf{f}^{1\textsf{rsb}}(\alpha ) -c^{\star }\log n -C )\) many solutions in \(\mathcal {C}_1,\ldots ,\mathcal {C}_M\), the M largest clusters (see Definition 2.19 for the definition of \(c^\star \)).

Remark 1.2

Throughout the paper, we take \(k_0\) to be a large absolute constant so that the results of [25, 43] and [37] hold. In addition, it was shown in [43, Proposition 1.4] that \((\alpha _{\textsf {cond}}, \alpha _{\textsf {sat}})\) is a subset of \((\alpha _{\textsf {lbd}}, \alpha _{\textsf {ubd}})\), where \(\alpha _{\textsf {lbd}}\equiv (2^{k-1}-2)\log 2\) and \(\alpha _{\textsf {ubd}}\equiv 2^{k-1}\log 2\), so we restrict our attention to \(\alpha \in (\alpha _{\textsf {lbd}}, \alpha _{\textsf {ubd}})\).
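For a concrete sense of scale, the quantities in the remark can be evaluated numerically. The snippet below uses an arbitrary illustrative value of k (since \(k_0\) is a large unspecified constant) to compute \(\alpha _{\textsf {lbd}}\), \(\alpha _{\textsf {ubd}}\) and the replica-symmetric free energy \(\textsf {f}^{\textsf {rs}}\) from their explicit formulas.

```python
import math

def f_rs(alpha, k):
    # Replica-symmetric free energy: f_rs = log 2 + alpha * log(1 - 2^{1-k});
    # log1p avoids catastrophic rounding, since 2^{1-k} is tiny.
    return math.log(2) + alpha * math.log1p(-2 ** (1 - k))

k = 30  # arbitrary illustrative choice; k_0 is not made explicit
alpha_lbd = (2 ** (k - 1) - 2) * math.log(2)
alpha_ubd = 2 ** (k - 1) * math.log(2)

# The window has constant width 2*log(2), while the thresholds themselves
# grow like 2^{k-1} * log(2).
print(alpha_ubd - alpha_lbd)  # ≈ 1.386

# E[Z_n] = exp(n * f_rs(alpha)); f_rs changes sign inside the window,
# being only of order 2^{-k} at both endpoints.
print(f_rs(alpha_lbd, k), f_rs(alpha_ubd, k))
```

The sign change of \(\textsf {f}^{\textsf {rs}}\) inside the window reflects that the first-moment bound \(\textsf{f}\le \textsf{f}^{\textsf{rs}}\) alone cannot locate \(\alpha _{\textsf {sat}}\).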

1.1 One-step replica symmetry breaking

In the condensation regime \(\alpha \in (\alpha _{\textsf {cond}},\alpha _{\textsf {sat}})\), the random regular k-nae-sat model is believed to possess a single layer of hierarchy of clusters in the solution space. That is, the solutions are fairly well-connected inside each cluster, so that there is no additional hierarchical structure within it. Such behavior is conjectured in various other models, such as random graph coloring and random k-sat. We remark that there are also models, such as maximum independent set (or the high-fugacity hard-core model) on random graphs with small degrees [10] and the Sherrington–Kirkpatrick model [41, 44], which are expected, or proven [6], to undergo full rsb, meaning that there are infinitely many levels of hierarchy inside the solution clusters.

A way to characterize 1rsb is to look at the overlap between two uniformly drawn solutions. In the condensation regime, a bounded number of clusters contain most of the solutions. Thus, the events that two solutions belong to the same cluster or to different clusters each happen with non-trivial probability. According to the 1rsb description, there is no additional structure inside each cluster, so the Hamming distance between the two solutions is expected to concentrate at precisely two values, depending on whether they came from the same cluster or not.

It was verified in [37] that the overlap concentrates at two values for a positive fraction of the random regular nae-sat instances. Theorem 1.4 below verifies that the overlap concentration happens for almost all random regular nae-sat instances.

Definition 1.3

For \(\underline{x}^1,\underline{x}^2 \in \{0,1\}^n\), let \(\underline{y}^i = 2\underline{x}^i - {\textbf {1}}\). The overlap \(\rho (\underline{x}^1,\underline{x}^2)\) is defined by

$$\begin{aligned} \rho (\underline{x}^1,\underline{x}^2) \equiv \frac{1}{n} \underline{y}^1 \cdot \underline{y}^2 = \frac{1}{n} \sum _{i=1}^n y^1_i y^2_i. \end{aligned}$$

In words, the overlap is the normalized difference between the number of variables with the same value and the number of those with different values.
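Definition 1.3 translates directly into code; the following minimal sketch computes the overlap on small hand-picked assignments.

```python
def overlap(x1, x2):
    # rho = (1/n) * sum_i y1_i * y2_i with y = 2x - 1 in {-1,+1}^n;
    # equivalently (#agreements - #disagreements) / n.
    n = len(x1)
    return sum((2 * a - 1) * (2 * b - 1) for a, b in zip(x1, x2)) / n

x1 = [0, 1, 1, 0]
x2 = [0, 1, 0, 1]
print(overlap(x1, x2))                   # 2 agreements, 2 disagreements -> 0.0
print(overlap(x1, x1))                   # identical assignments -> 1.0
print(overlap(x1, [1 - b for b in x1]))  # global negation -> -1.0
```

Note that global negation flips the sign of the overlap, which is why Theorem 1.4 is stated in terms of the absolute overlap \(\rho_{\text{abs}}\).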

Theorem 1.4

Let \(k\ge k_0\), \(\alpha \in (\alpha _{\textsf {cond}}, \alpha _{\textsf {sat}})\) such that \(d\equiv \alpha k\in \mathbb {N}\), and \(p^\star \equiv p^\star (\alpha ,k)\in (0,1)\) be a fixed constant (for its definition, see Definition 6.8 of [37]). For all \(\varepsilon >0\), there exist constants \(\delta =\delta (\varepsilon ,\alpha ,k)>0\) and \(C\equiv C(\varepsilon ,\alpha ,k)\) such that with probability at least \(1-\varepsilon \), the random d-regular k-nae-sat instance \(\mathscr {G}\) satisfies the following. Let \(\underline{x}^1, \underline{x}^2\in \{0,1\}^n\) be independent, uniformly chosen satisfying assignments of \(\mathscr {G}\). Then, the absolute value \(\rho _{\text {abs}} \equiv |\rho |\) of their overlap \(\rho \equiv \rho (\underline{x}^1,\underline{x}^2)\) satisfies

  (a) \(\mathbb {P}(\rho _{\text {abs}}\le n^{-1/3} |\mathscr {G}) \ge \delta \);

  (b) \(\mathbb {P}( \big |\rho _{\text {abs}}-p^\star \big | \le n^{-1/3} |\mathscr {G})\ge \delta \);

  (c) \(\mathbb {P}( \min \{ \rho _{\text {abs}}, |\rho _{\text {abs}}-p^\star |\} \ge n^{-1/3}|\mathscr {G})\le Cn^{-1/4}\).

1.2 Related works

Many of the earlier works on rcsps focused on determining satisfiability thresholds, and for rcsp models known not to exhibit rsb, this goal was achieved. These models include random linear equations [7], random 2-sat [12, 13], random 1-in-k-sat [1] and random k-xor-sat [22, 26, 38]. On the other hand, for the models predicted to exhibit rsb, intensive studies have been conducted to estimate their satisfiability thresholds (random k-sat [5, 18, 30], random k-nae-sat [2, 17, 21] and random graph coloring [3, 14, 15, 19]).

More recently, the satisfiability thresholds for rcsps in the 1rsb class have been rigorously determined for several models, including maximum independent set [24], random regular k-nae-sat [25], random regular k-sat [18] and random k-sat [23]. These works applied a demanding second moment method to the number of clusters instead of the number of solutions. Although determining the colorability threshold is left open, the condensation threshold for random graph coloring was established in [9] through a challenging analysis based on a clever “planting” technique, and the results were generalized to other models in [16]. Also, [8] identified the condensation threshold for random regular k-sat, where each variable appears d/2 times positive and d/2 times negative.

Further theory was developed in [43] to establish the 1rsb free energy for random regular k-nae-sat in the condensation regime by applying the second moment method to the \(\lambda \)-tilted partition function. Later, our companion paper [37] made further progress on the same model by giving a cluster-level description of the condensation phenomenon. Namely, [37] showed that with positive probability, a bounded number of clusters dominate the solution space and the overlap concentrates on two points in the condensation regime. Our main contribution is to push this probability arbitrarily close to one and show that the same phenomenon holds with high probability.

Lastly, [11] studied the random k-max-nae-sat beyond \(\alpha _{\textsf {sat}}\), where they verified that the 1rsb description breaks down before \(\alpha \asymp k^{-3}4^k\). Indeed, the Gardner transition from 1rsb to full rsb is expected at \(\alpha _{\textsf {Ga}}\asymp k^{-3}4^k >\alpha _{\textsf {sat}}\) [32, 35], and [11] provides evidence of this phenomenon.

1.3 Proof ideas

In [37], the majority of the work was to compute moments of the tilted cluster partition functions \(\overline{{{\textbf {Z}}}}_\lambda \) and \(\overline{{{\textbf {Z}}}}_{\lambda ,s}\), defined as

$$\begin{aligned} \overline{{{\textbf {Z}}}}_\lambda \equiv \sum _{\Upsilon } |\Upsilon |^\lambda \quad \text{ and }\quad \overline{{{\textbf {Z}}}}_{\lambda ,s} := \sum _{\Upsilon } |\Upsilon |^\lambda \, \mathbb {1}{\{|\Upsilon | \in [e^{ns}, e^{ns+1}) \}}, \end{aligned}$$
(2)

where the sums are taken over all clusters \(\Upsilon \). Moreover, let \(\overline{{{\textbf {N}}}}_{s}\) denote the number of clusters whose size is in the interval \([e^{ns}, e^{ns +1} )\), i.e.

$$\begin{aligned} \overline{{{\textbf {N}}}}_s :=\overline{{{\textbf {Z}}}}_{0,s}. \end{aligned}$$
(3)
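Definitions (2) and (3) are straightforward to mirror in code. The sketch below takes a hypothetical list of cluster sizes as input (obtaining the actual cluster sizes of a nae-sat instance is, of course, the hard part of the analysis).

```python
import math

def Z_bar(lmbda, cluster_sizes):
    # \bar{Z}_lambda = sum over clusters of |cluster|^lambda, as in (2).
    return sum(s ** lmbda for s in cluster_sizes)

def Z_bar_s(lmbda, s, n, cluster_sizes):
    # Restrict the sum to clusters whose size lies in [e^{ns}, e^{ns+1}).
    lo, hi = math.exp(n * s), math.exp(n * s + 1)
    return sum(c ** lmbda for c in cluster_sizes if lo <= c < hi)

def N_bar(s, n, cluster_sizes):
    # \bar{N}_s = \bar{Z}_{0,s} simply counts clusters in the size window, as in (3).
    return Z_bar_s(0, s, n, cluster_sizes)

# Hypothetical cluster sizes for a toy n (not from a real nae-sat instance):
n, sizes = 10, [3, 5, 150, 160, 170, 40000]
print(Z_bar(1, sizes))       # lambda = 1 recovers the total number of solutions: 40488
print(N_bar(0.5, n, sizes))  # clusters with size in [e^5, e^6) ≈ [148.4, 403.4): 3
```

Setting \(\lambda = 1\) recovers the total solution count, while \(\lambda = 0\) counts clusters; intermediate tilts interpolate between the two.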

Denote by \(s_\circ \) the normalized logarithmic size of the solution space appearing in Theorem 1.1:

$$\begin{aligned} s_\circ \equiv s_\circ (n,\alpha ,C) \equiv \textsf {f}^{1\textsf {rsb}}(\alpha ) -\frac{ c^\star \log n}{n} - \frac{C}{n}, \end{aligned}$$
(4)

where \(c^\star \) is the constant introduced in Theorem 1.1 and \(C\in \mathbb {R}\). In [37], we obtained second moment estimates on \(\overline{{\textbf {N}}}_{s_\circ }\) showing that \(\mathbb {E}[\overline{{{\textbf {N}}}}_{s_\circ }^2] \lesssim _{k} (\mathbb {E}\overline{{{\textbf {N}}}}_{s_\circ })^2\) holds, and that \(\mathbb {E}\overline{{{\textbf {N}}}}_{s_\circ }\) decays exponentially as \(C\rightarrow -\infty \). Thus, it was shown in [37] that

$$\begin{aligned} \mathbb {P}(\overline{{{\textbf {N}}}}_{s_\circ }>0) {\left\{ \begin{array}{ll} \rightarrow 0, &{}{} \text{ as } C\rightarrow -\infty ;\\ \ge c>0, &{}{} \text{ as } C \rightarrow \infty . \end{array}\right. } \end{aligned}$$

However, in order to establish (a) and (b) of Theorem 1.1, we need to push the probability in the second line to \(1-\varepsilon \).

To do so, one may hope to show \(\mathbb {E}[\overline{{{\textbf {N}}}}_{s_\circ }^2] \approx (\mathbb {E}\overline{{{\textbf {N}}}}_{s_\circ })^2\) in order to deduce \(\mathbb {P}(\overline{{{\textbf {N}}}}_{s_\circ } >0) {\rightarrow } 1\) for large enough C, but this is false in the case of random regular nae-sat. The primary reason is that short cycles in the graph cause multiplicative fluctuations in \(\overline{{{\textbf {N}}}}_{s_\circ }\). Therefore, our approach is to rescale \(\overline{{{\textbf {N}}}}_{s_\circ }\) according to the effects of short cycles, so that the rescaled partition function \(\widetilde{{{\textbf {N}}}}_{s_\circ }\) concentrates around its expectation. That is, \(\mathbb {E}[\widetilde{{{\textbf {N}}}}_{s_\circ }^2] \approx (\mathbb {E}\widetilde{{{\textbf {N}}}}_{s_\circ })^2\) (to be precise, this only holds when C is large enough, due to the intrinsic correlations coming from the largest clusters). Furthermore, we argue that the fluctuations coming from the short cycles are not too big, and hence can be absorbed by \(\overline{{{\textbf {N}}}}_{s_\circ }\) if \(\mathbb {E}\overline{{{\textbf {N}}}}_{s_\circ }\) is large. To this end, we develop a new argument that combines small subgraph conditioning [39, 40], a widely used tool in problems on random graphs, with the Doob martingale approach used in [24, 25]; neither is effective in our model if used alone.

The small subgraph conditioning method ([39, 40]; for a survey, see Chapter 9.3 of [29]) has proven useful in many settings [27, 28, 42] for deriving precise distributional limits of partition functions. For example, [27] applied this method to the proper coloring model on bipartite random regular graphs, determining the limiting distribution of the number of colorings. However, this method relies heavily on algebraic identities specific to the model, which are sometimes intractable, as in our case. Roughly speaking, one needs a fairly explicit combinatorial formula for the second moment to carry out the algebraic and combinatorial computations.

Another technique that inspired our proof, which we will refer to as the Doob martingale approach, was introduced in [24, 25]. This method directly controls the multiplicative fluctuations of \(\overline{{\textbf {N}}}_{s_\circ }\) by investigating the Doob martingale increments of \(\log \overline{{{\textbf {N}}}}_{s_\circ }\). It has proven useful in the study of models like random regular nae-sat, as seen in [25]. However, in spin systems with infinitely many spins, such as our model, some of the key estimates in the argument become false due to the existence of rare spins (or large free components).

Our approach blends the two techniques in a novel way so that each compensates for the other's limitations. Although we could not algebraically derive the identities required for small subgraph conditioning, we instead deduce them by a modified Doob martingale approach for the truncated model, which has a finite spin space. Then, we send the truncation parameter to infinity in these algebraic identities and show that they converge to the corresponding formulas for the untruncated model. This step requires a more refined understanding of the first and second moments of \(\widetilde{{{\textbf {N}}}}_{s_\circ }\), including the constant coefficient of the leading exponential term, whereas the order of the leading term was sufficient in the earlier works [25, 43]. We then appeal to the small subgraph conditioning method to deduce the conclusion from those identities. We believe that our approach is potentially applicable to other models with an infinite spin space where the traditional small subgraph conditioning method is not tractable.

1.4 Notational conventions

For non-negative quantities \(f=f_{d,k, n}\) and \(g=g_{d,k,n}\), we use any of the equivalent notations \(f=O_{k}(g), g= \Omega _k(f), f\lesssim _{k} g\) and \(g \gtrsim _{k} f \) to indicate that for each \(k\ge k_0\),

$$\begin{aligned} \limsup _{n\rightarrow \infty } \frac{f}{g} <\infty , \end{aligned}$$

with the convention \(0/0\equiv 1\). We drop the subscript k if there exists a universal constant C such that

$$\begin{aligned} \limsup _{n\rightarrow \infty } \frac{f}{g} \le C. \end{aligned}$$

When \(f\lesssim _{k} g\) and \(g\lesssim _{k} f\), we write \(f\asymp _{k} g\). Similarly when \(f\lesssim g\) and \(g\lesssim f\), we write \(f \asymp g\).

2 The Combinatorial Model

We begin by introducing the mathematical framework to analyze the clusters of solutions. We follow the formulation derived in [43, Section 2]. In [37], we needed further definitions in addition to those from [43], but in this work it is enough to rely on the concepts of [43]. In this section, we briefly review the necessary definitions for completeness.

There is a natural graphical representation of a d-regular k-nae-sat instance by a labelled (d, k)-regular bipartite graph: Let \(V=\{ v_1, \ldots , v_n \}\) and \(F=\{a_1, \ldots , a_m \}\) be the sets of variables and clauses, respectively. Connect \(v_i\) and \(a_j\) by an edge if \(v_i\) is one of the variables contained in the clause \(a_j\). Let \(\mathcal {G}=(V,F,E)\) be this bipartite graph, and let \(\texttt {L}_e\in \{0,1\}\) for \(e\in E\) be the literal corresponding to the edge e. Then, the labelled bipartite graph \(\mathscr {G}=(V,F,E,\underline{\texttt {L}})\equiv (V,F,E,\{\texttt {L}_{e}\}_{e\in E})\) represents a nae-sat instance.

For each \(e\in E\), we denote the variable (resp. clause) adjacent to it by v(e) (resp. a(e)). Moreover, \(\delta v\) (resp. \(\delta a\)) denotes the collection of edges adjacent to \(v\in V\) (resp. \(a \in F\)). We write \(\delta v {\setminus } e:= \delta v {\setminus } \{e\}\) and \(\delta a \setminus e:= \delta a \setminus \{e\}\) for simplicity. Formally speaking, we regard E as a perfect matching between the set of half-edges adjacent to variables and those adjacent to clauses, each labelled from 1 to \(nd=mk\), and hence as a permutation in \(S_{nd}\).

Definition 2.1

For an integer \(l\ge 1\) and \(\underline{{\textbf {x}}}=(\textbf{x}_i) \in \{0,1\}^l\), define

$$\begin{aligned} I^{\textsc {nae}}(\underline{{\textbf {x}}}) := \mathbb {1} \{\underline{{\textbf {x}}}\text { is neither identically } 0 \text { nor }1 \}. \end{aligned}$$
(5)

Let \(\mathscr {G}= (V,F,E,\underline{\texttt {L}})\) be a nae-sat instance. An assignment \(\underline{{\textbf {x}}}\in \{0,1\}^V\) is called a solution if

$$\begin{aligned} I^{\textsc {nae}}(\underline{{\textbf {x}}};\mathscr {G}) := \prod _{a\in F} I^{\textsc {nae}} \big ((\textbf{x}_{v(e)} \oplus \texttt {L}_e)_{e\in \delta a}\big ) =1, \end{aligned}$$
(6)

where \(\oplus \) denotes addition mod 2. Denote by \(\textsf {SOL}(\mathscr {G})\subset \{0,1\}^V\) the set of solutions, and endow \(\textsf {SOL}(\mathscr {G})\) with a graph structure by connecting \(\underline{{\textbf {x}}}\sim \underline{{\textbf {x}}}'\) if and only if they have Hamming distance one. Also, let \(\textsf {CL}(\mathscr {G})\) be the set of clusters, namely the connected components under this adjacency.
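For small instances, \(\textsf {SOL}(\mathscr {G})\) and \(\textsf {CL}(\mathscr {G})\) can be computed by brute force. The following Python sketch (on a hypothetical toy instance, not d-regular) enumerates the solutions and splits them into clusters under the Hamming-distance-one adjacency.

```python
from itertools import product

def nae(vals):
    # Neither all literals 0 nor all literals 1.
    return 0 < sum(vals) < len(vals)

def solutions(n, clauses):
    # clauses: tuples of (variable index, literal L_e) pairs; L_e is XORed in.
    return {x for x in product((0, 1), repeat=n)
            if all(nae([x[v] ^ L for v, L in c]) for c in clauses)}

def clusters(sols):
    # Connected components of SOL(G) under Hamming-distance-1 adjacency,
    # found by an iterative depth-first search.
    remaining, comps = set(sols), []
    while remaining:
        comp, stack = set(), [remaining.pop()]
        while stack:
            x = stack.pop()
            comp.add(x)
            for i in range(len(x)):
                y = x[:i] + (1 - x[i],) + x[i + 1:]
                if y in remaining:
                    remaining.remove(y)
                    stack.append(y)
        comps.append(comp)
    return comps

# Toy instance: one NAE clause on three variables (purely illustrative).
sols = solutions(3, [((0, 0), (1, 0), (2, 0))])
print(sorted(len(c) for c in clusters(sols)))  # [6]: all six solutions form one cluster
```

Here the six solutions (all of \(\{0,1\}^3\) except the two constant assignments) are mutually reachable by single flips, so \(\textsf{CL}\) consists of a single cluster.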

2.1 The frozen configuration

Our first step is to define the frozen configuration, which is a basic way of encoding clusters. We introduce the free symbol, denoted by \({\texttt {f}}\), whose Boolean addition is defined as \(\texttt {f}\oplus 0=\texttt {f}\oplus 1=\texttt {f}\). Recalling the definition of \(I^{\textsc {nae}}\) in (6), a frozen configuration is defined as follows.

Definition 2.2

(Frozen configuration). For \(\mathscr {G}= (V,F,E, \underline{\texttt {L}})\), \(\underline{x}\in \{0,1,{\texttt {f}}\}^V\) is called a frozen configuration if the following conditions are satisfied:

  • No nae-sat constraints are violated for \(\underline{x}\). That is, \(I^{\textsc {nae}}(\underline{x};\mathscr {G})=1\).

  • For \(v\in V\), \(x_v\in \{0,1\}\) if and only if it is forced to be so. That is, \(x_v\in \{0,1\}\) if and only if there exists \(e\in \delta v\) such that a(e) becomes violated if \(\texttt {L}_e\) is negated, i.e., \(I^{\textsc {nae}} (\underline{x}; \mathscr {G}\oplus \mathbb {1}_e )=0\) where \(\mathscr {G}\oplus \mathbb {1}_e\) denotes \(\mathscr {G}\) with \(\texttt {L}_e\) flipped. \(x_v={\texttt {f}}\) if and only if no such \(e\in \delta v\) exists.

We record some observations that follow directly from the definition. Details can be found in the previous works ([25], Section 2 and [43], Section 2).

  (1) We can map a nae-sat solution \(\underline{{\textbf {x}}}\in \{0,1 \}^V\) to a frozen configuration via the following coarsening algorithm: If there is a variable v such that \(\textbf{x}_v\in \{0,1\}\) and \(I^{\textsc {nae}}(\underline{{\textbf {x}}};\mathscr {G}) = I^{\textsc {nae}}(\underline{{\textbf {x}}}\oplus \mathbb {1}_v;\mathscr {G})=1\) (i.e., flipping \(\textbf{x}_v\) does not violate any clause), then set \(\textbf{x}_v = {\texttt {f}}\). Iterate this process until no further modification is possible.

  (2) All solutions in a cluster \(\Upsilon \in \textsf{CL}(\mathscr {G})\) are mapped to the same frozen configuration \(\underline{x}\equiv \underline{x}[\Upsilon ] \in \{0,1,{\texttt {f}}\}^V\). However, the coarsening algorithm is not necessarily surjective. For instance, a typical instance of \(\mathscr {G}\) does not have a cluster corresponding to the all-free configuration (\(\underline{{{\textbf {x}}}}\equiv {\texttt {f}}\)).
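The coarsening algorithm of observation (1) can be sketched as follows. This is a minimal illustration, not an implementation from [25] or [43]; it uses the convention that a clause containing a free literal is neither identically 0 nor 1 and is hence satisfied.

```python
def coarsen(x, clauses):
    # x: a nae-sat solution as a tuple in {0,1}^n; clauses: tuples of
    # (variable index, literal L_e) pairs.  Repeatedly set x_v = 'f'
    # whenever flipping x_v violates no clause, until no change occurs.
    def satisfied(y, clause):
        vals = [y[v] ^ L for v, L in clause if y[v] != 'f']
        # A clause with a free literal is neither identically 0 nor 1
        # (convention f XOR 0 = f XOR 1 = f), hence satisfied.
        return len(vals) < len(clause) or 0 < sum(vals) < len(vals)

    x, changed = list(x), True
    while changed:
        changed = False
        for v in range(len(x)):
            if x[v] != 'f':
                flipped = x[:v] + [1 - x[v]] + x[v + 1:]
                if all(satisfied(flipped, c) for c in clauses):
                    x[v] = 'f'
                    changed = True
    return tuple(x)

# A single 2-clause freezes both variables, while a lone 3-clause frees all:
print(coarsen((0, 1), [((0, 0), (1, 0))]))             # (0, 1)
print(coarsen((0, 0, 1), [((0, 0), (1, 0), (2, 0))]))  # ('f', 'f', 'f')
```

The second example matches the cluster picture above: the lone 3-clause has a single cluster, encoded by the all-free frozen configuration on this non-typical toy instance.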

2.2 Message configurations

Although frozen configurations provide a representation of clusters, they do not directly tell us the sizes of the clusters. The main obstacle comes from the connected structure of the free variables, which can potentially be complicated. We now introduce notions that make this issue tractable.

Definition 2.3

(Separating and forcing clauses). Let \(\underline{x}\) be a given frozen configuration on \(\mathscr {G}= (V,F,E,\underline{\texttt {L}})\). A clause \(a\in F\) is called separating if there exist \(e, e^\prime \in \delta a\) such that \(\texttt {L}_{e}\oplus x_{v(e)} = 0\) and \(\texttt {L}_{e^\prime } \oplus x_{v(e^\prime )}=1\). We say \(a\in F\) is non-separating if it is not separating. Moreover, \(e\in E\) is called forcing if \(\texttt {L}_{e}\oplus x_{v(e)} \oplus 1 = \texttt {L}_{e'}\oplus x_{v(e')}\in \{0,1\}\) for all \(e'\in \delta a(e) {\setminus } e\). We say \(a\in F\) is forcing if there exists a forcing edge \(e\in \delta a\). In particular, a forcing clause is also separating.

Observe that a non-separating clause must be adjacent to at least two free variables, which is a fact frequently used throughout the paper.

Definition 2.4

(Free cycles). Let \(\underline{x}\) be a given frozen configuration on \(\mathscr {G}= (V,F,E,\underline{\texttt {L}})\). A cycle in \(\mathscr {G}\) (which necessarily has even length) is called a free cycle if

  • Every variable v on the cycle satisfies \(x_v = {\texttt {f}}\);

  • Every clause a on the cycle is non-separating.

Throughout the paper, our primary interest is in frozen configurations that do not contain any free cycles. If \(\underline{x}\) has no free cycle, then we can easily extend it to a nae-sat solution \(\underline{{\textbf {x}}}\) such that \(\textbf{x}_v = x_v\) whenever \(x_v\in \{0,1\}\), since the nae-sat problem on a tree is always solvable.

Definition 2.5

(Free trees). Let \(\underline{x}\) be a frozen configuration on \(\mathscr {G}\) without any free cycles. Consider the subgraph H of \(\mathscr {G}\) induced by the free variables and non-separating clauses. Each connected component of H is called a free piece of \(\underline{x}\) and denoted by \(\mathfrak {t}^{\text {in}}\). For each free piece \(\mathfrak {t}^{\text {in}}\), the free tree \(\mathfrak {t}\) is defined as the union of \(\mathfrak {t}^{\text {in}}\) and the half-edges incident to \(\mathfrak {t}^{\text {in}}\).

For the pair \((\underline{x}, \mathscr {G})\), we write \(\mathscr {F}(\underline{x},\mathscr {G})\) for the collection of free trees inside \((\underline{x}, \mathscr {G})\). We write \(V(\mathfrak {t})=V(\mathfrak {t}^{\text {in}})\), \(F(\mathfrak {t})=F(\mathfrak {t}^{\text {in}})\) and \(E(\mathfrak {t})=E(\mathfrak {t}^{\text {in}})\) for the collections of variables, clauses and (full-)edges in \(\mathfrak {t}\). Moreover, define \(\dot{\partial } \mathfrak {t}\) (resp. \(\hat{\partial } \mathfrak {t}\)) to be the collection of boundary half-edges adjacent to \(F(\mathfrak {t})\) (resp. \(V(\mathfrak {t})\)), and write \(\partial \mathfrak {t}:= \dot{\partial }\mathfrak {t}\sqcup \hat{\partial } \mathfrak {t}\).

We now introduce the message configuration, which enables us to calculate the size of a free tree (that is, the number of nae-sat solutions on \(\mathfrak {t}\) that extend \(\underline{x}\)) by local quantities. The message configuration is given by \(\underline{\tau }= (\tau _e)_{e\in E} \in \mathscr {M}^E\) (\(\mathscr {M}\) is defined below). Here, \(\tau _e=(\dot{\tau }_e,\hat{\tau }_e)\), where \(\dot{\tau }\) (resp. \(\hat{\tau }\)) denotes the message from v(e) to a(e) (resp. from a(e) to v(e)).

A message will carry information about the structure of the free tree it belongs to. To this end, we first define the notion of joining l trees at a vertex (either a variable or a clause) to produce a new tree. Let \(t_1,\ldots , t_l\) be a collection of rooted bipartite factor trees satisfying the following conditions:

  • Their roots \(\rho _1,\ldots ,\rho _l\) are all of the same type (i.e., all variables or all clauses) and all have degree one.

  • If an edge in \(t_i\) is adjacent to a degree one vertex, which is not the root, then the edge is called a boundary-edge. The rest of the edges are called internal-edges. For the special case where \(t_i\) consists of a single edge and a single vertex, we regard the single edge to be a boundary-edge.

  • \(t_1,\ldots ,t_l\) are boundary-labelled trees, meaning that their variables, clauses, and internal edges are unlabelled (except that we distinguish the root), but the boundary edges are assigned values from \(\{0,1,{{\texttt {S}}}\}\), where \({{\texttt {S}}}\) stands for ‘separating’.

Then, the joined tree \(t \equiv \textsf {j}(t_1,\ldots , t_l) \) is obtained by identifying all the roots as a single vertex o, and adding an edge which joins o to a new root \(o'\) of the opposite type to o (e.g., if o is a variable, then \(o'\) is a clause). Note that \(t= \textsf {j}(t_1,\ldots ,t_l)\) is also a boundary-labelled tree, whose labels at the boundary edges are induced by those of \(t_1,\ldots ,t_l\).

For the simplest trees, consisting of a single vertex and a single edge, we use 0 (resp. 1) to denote the ones whose edge is labelled 0 (resp. 1): in the case of \(\dot{\tau }\), the root is the clause, and in the case of \(\hat{\tau }\), the root is the variable. Also, if the root is a variable and the edge is labelled \({{\texttt {S}}}\), we write the tree as \({{\texttt {S}}}\).

We can also define Boolean addition \(t\oplus \texttt {L}\) on a boundary-labelled tree t as follows. For the trees 0, 1, the Boolean additions \(0\oplus \texttt {L}\), \(1\oplus \texttt {L}\) are defined as above, and we set \({{\texttt {S}}}\oplus \texttt {L}= {{\texttt {S}}}\) for \(\texttt {L}\in \{0,1\}\). For the remaining trees, \(t \oplus 0:= t\), and \(t\oplus 1\) is the boundary-labelled tree with the same graphical structure as t and with each boundary label Boolean-added by 1 (here, we define \({{\texttt {S}}}\oplus 1 = {{\texttt {S}}}\) for the \({{\texttt {S}}}\)-labels).

Definition 2.6

(Message configuration). Let \(\dot{\mathscr {M}}_0:= \{0,1,\star \}\) and \(\hat{\mathscr {M}}_0:= \emptyset \). Suppose that \(\dot{\mathscr {M}}_t, \hat{\mathscr {M}}_t\) are defined, and we inductively define \(\dot{\mathscr {M}}_{t+1}, \hat{\mathscr {M}}_{t+1}\) as follows: For \(\hat{\underline{\tau }} \in (\hat{\mathscr {M}}_t)^{d-1}\), \(\dot{\underline{\tau }} \in (\dot{\mathscr {M}}_t)^{k-1}\), we write \(\{\hat{\tau }_i \}:= \{\hat{\tau }_1,\ldots ,\hat{\tau }_{d-1} \}\) and similarly for \(\{\dot{\tau }_i \}\). We define

$$\begin{aligned} \hat{T}\left( \dot{\underline{\tau }} \right) := {\left\{ \begin{array}{ll} 0 &{}{} \{\dot{\tau }_i \} = \{ 1 \};\\ 1 &{}{} \{ \dot{\tau }_i \} = \{0 \};\\ {{\texttt {S}}}&{}{} \{\dot{\tau }_i \} \supseteq \{ 0,1 \};\\ \star &{}{} \star \in \{ \dot{\tau }_i \}, \{0,1 \} \nsubseteq \{\dot{\tau }_i \};\\ \textsf {j}\left( \dot{\underline{\tau }} \right) &{}{} \text{ otherwise }, \end{array}\right. }&\dot{T}(\hat{\underline{\tau }}):= {\left\{ \begin{array}{ll} 0 &{}{} 0 \in \{\hat{\tau }_i \} \subseteq \hat{\mathscr {M}}_t \setminus \{ 1\};\\ 1 &{}{} 1\in \{\hat{\tau }_i\} \subseteq \hat{\mathscr {M}}_t \setminus \{0 \};\\ {\texttt {z}}&{}{} \{0,1 \} \subseteq \{\hat{\tau }_i \};\\ \star &{}{} \star \in \{\hat{\tau }_i \} \subseteq \hat{\mathscr {M}}_t \setminus \{0,1\};\\ \textsf {j}\left( \hat{\underline{\tau }} \right) &{}{} \{\hat{\tau }_i \} \subseteq \hat{\mathscr {M}}_t\setminus \{0,1,\star \}. \end{array}\right. } \end{aligned}$$
(7)

Further, we set \(\dot{\mathscr {M}}_{t+1}:= \dot{\mathscr {M}}_t \cup \dot{T}( \hat{\mathscr {M}}_t^{d-1} ) {\setminus } \{{\texttt {z}}\}\), and \(\hat{\mathscr {M}}_{t+1}:= \hat{\mathscr {M}}_t \cup \hat{T}(\dot{\mathscr {M}}_t^{k-1} )\), and define \(\dot{\mathscr {M}}\) (resp. \(\hat{\mathscr {M}}\)) to be the union of all \(\dot{\mathscr {M}}_t\) (resp. \(\hat{\mathscr {M}}_t\)) and \(\mathscr {M}:= \dot{\mathscr {M}} \times \hat{\mathscr {M}}\). Then, a (valid) message configuration on \(\mathscr {G}=(V,F,E,\underline{\texttt {L}})\) is a configuration \(\underline{\tau }\in \mathscr {M}^E\) that satisfies (i) the local equations given by

$$\begin{aligned} \tau _e = (\dot{\tau }_e, \hat{\tau }_e) = \left( \dot{T}\big (\hat{\underline{\tau }}_{\delta v(e)\setminus e} \big ), \texttt {L}_e \oplus \hat{T} \big ((\underline{\texttt {L}}\oplus \dot{\underline{\tau }})_{\delta a(e)\setminus e} \big ) \right) , \end{aligned}$$
(8)

for all \(e\in E\), and (ii) if one element of \(\{\dot{\tau }_e,\hat{\tau }_e\}\) equals \(\star \) then the other element is in \(\{0,1\}\).

In the definition, \(\star \) is the symbol introduced to handle cycles, and \({\texttt {z}}\) is an error message. See Figure 1 in Section 2 of [43] for an example of a \(\star \) message.
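Restricted to the symbolic messages \(\{0,1,{\texttt {S}},\star \}\), the update rules (7) can be sketched directly; in this partial illustration the joined-tree branch \(\textsf {j}(\cdot )\) is replaced by a placeholder string.

```python
def hat_T(dot_msgs):
    # Clause-to-variable update from (7), symbolic messages only;
    # 'j' is a placeholder for the joined-tree branch j(...).
    s = set(dot_msgs)
    if s == {1}:
        return 0          # all other literals evaluate to 1: forces a 0
    if s == {0}:
        return 1          # all other literals evaluate to 0: forces a 1
    if {0, 1} <= s:
        return 'S'        # both literal values present: separating
    if '*' in s:
        return '*'
    return 'j'

def dot_T(hat_msgs):
    # Variable-to-clause update from (7); 'z' is the error message.
    s = set(hat_msgs)
    if {0, 1} <= s:
        return 'z'        # contradictory forcings
    if 0 in s:
        return 0
    if 1 in s:
        return 1
    if '*' in s:
        return '*'
    return 'j'

print(hat_T([1, 1, 1]))    # 0
print(hat_T([0, 1, 'S']))  # S
```

The branch order mirrors the case analysis in (7): forcing values first, then the separating/error symbols, then \(\star \), with everything else falling through to the join.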

When a frozen configuration \(\underline{x}\) on \(\mathscr {G}\) is given, we can construct a message configuration \(\underline{\tau }\) via the following procedure:

  1.

    For a forcing edge e, set \(\hat{\tau }_e=x_{v(e)}\). Also, for an edge \(e\in E\), if there exists \(e^\prime \in \delta v(e) {\setminus } e\) such that \(\hat{\tau }_{e^\prime } \in \{0,1\}\), then set \(\dot{\tau }_e=x_{v(e)}\).

  2.

    For an edge \(e\in E\), if there exist \(e_1,e_2\in \delta a(e){\setminus } e\) such that \(\{\texttt {L}_{e_1}\oplus \dot{\tau }_{e_1}, \texttt {L}_{e_2}\oplus \dot{\tau }_{e_2}\}=\{0,1\}\), then set \(\hat{\tau }_e = {{\texttt {S}}}\).

  3.

    After these steps, apply the local equations (8) recursively to define \(\dot{\tau }_e\) and \(\hat{\tau }_e\) wherever possible.

  4.

    For the edges whose messages remain undefined after the previous steps, set those messages to be \(\star \).

The following lemma gives the precise relation between frozen and message configurations; we refer to [43], Lemma 2.7 for its proof.

Lemma 2.7

The mapping explained above defines a bijection

$$\begin{aligned} \begin{Bmatrix} \text{ Frozen } \text{ configurations } \underline{x}\in \{0,1,{\texttt {f}}\}^V\\ \text{ without } \text{ free } \text{ cycles } \end{Bmatrix} \quad \longleftrightarrow \quad \begin{Bmatrix} \text{ Message } \text{ configurations }\\ \underline{\tau }\in \mathscr {M}^E \end{Bmatrix}. \end{aligned}$$
(9)

Next, we introduce a dynamic-programming method, based on belief propagation, which computes the size of a free tree from local quantities of a message configuration.

Definition 2.8

Let \(\mathcal {P}\{0,1\} \) denote the space of probability measures on \(\{0,1\}\). We define the mappings \(\dot{{{\texttt {m}}}}:\dot{\mathscr {M}} \rightarrow \mathcal {P}\{0,1\}\) and \(\hat{{{\texttt {m}}}}: \hat{\mathscr {M}} \rightarrow \mathcal {P}\{0,1\}\) as follows. For \(\dot{\tau }\in \{0,1\}\) and \(\hat{\tau }\in \{0,1\}\), let \(\dot{{{\texttt {m}}}}[\dot{\tau }] =\delta _{\dot{\tau }}\), \(\hat{{{\texttt {m}}}}[\hat{\tau }] = \delta _{\hat{\tau }}\). For \(\dot{\tau }\in \dot{\mathscr {M}} {\setminus } \{0,1,\star \}\) and \(\hat{\tau }\in \hat{\mathscr {M}} {\setminus } \{0,1,\star \}\), \(\dot{{{\texttt {m}}}}[\dot{\tau }]\) and \(\hat{{{\texttt {m}}}}[\hat{\tau }]\) are recursively defined:

  • Let \(\dot{\tau } = \dot{T}(\hat{\tau }_1,\ldots ,\hat{\tau }_{d-1})\), with \(\star \notin \{\hat{\tau }_i \}\). Define

    $$\begin{aligned} \dot{z}[\dot{\tau }] := \sum _{\textbf{x}\in \{0,1\} } \prod _{i=1}^{d-1} \hat{{{\texttt {m}}}}[\hat{\tau }_i](\textbf{x}), \quad \dot{{{\texttt {m}}}}[\dot{\tau }](\textbf{x}) := \frac{1}{\dot{z}[\dot{\tau }]} \prod _{i=1}^{d-1} \hat{{{\texttt {m}}}}[\hat{\tau }_i](\textbf{x}). \end{aligned}$$
    (10)

    Note that these equations are well-defined: \((\hat{\tau }_1,\ldots , \hat{\tau }_{d-1})\) is determined by \(\dot{\tau }\) up to permutation, and the right-hand sides are permutation-invariant.

  • Let \(\hat{\tau } = \hat{T} ( \dot{\tau }_1,\ldots ,\dot{\tau }_{k-1}; \texttt {L})\), with \(\star \notin \{\dot{\tau }_i \}\). Define

    $$\begin{aligned} \hat{z}[\hat{\tau }] := 2-\sum _{\textbf{x}\in \{0,1\}} \prod _{i=1}^{k-1} \dot{{{\texttt {m}}}}[\dot{\tau }_i](\textbf{x}), \quad \hat{{{\texttt {m}}}}[\hat{\tau }](\textbf{x}) := \frac{1}{\hat{z}[\hat{\tau }]} \left\{ 1- \prod _{i=1}^{k-1} \dot{{{\texttt {m}}}}[\dot{\tau }_i](\textbf{x}) \right\} . \end{aligned}$$
    (11)

    Similarly to the above, these equations are well-defined.

Moreover, observe that inductively, \(\dot{{{\texttt {m}}}}[\dot{\tau }], \hat{{{\texttt {m}}}}[\hat{\tau }] \) are not Dirac measures unless \(\dot{\tau }, \hat{\tau }\in \{0,1\}\).

It turns out that \(\dot{{{\texttt {m}}}}[\star ], \hat{{{\texttt {m}}}}[\star ]\) may be taken to be arbitrary measures for our purposes, and hence we set both to be the uniform measure on \(\{0,1\}\).
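As a concrete illustration, the recursions (10) and (11) can be sketched numerically. This is our own minimal sketch, not from [43]: we assume each message is encoded as a pair \((\texttt{m}(0),\texttt{m}(1))\), and the function names are ours.

```python
# Sketch of the recursions (10) and (11), with each message m encoded
# as the pair (m(0), m(1)).  Names and encoding are our assumptions.

def var_message(incoming):
    """Variable-side update (10): normalized product of the d-1
    incoming clause-to-variable messages."""
    prod = [1.0, 1.0]
    for m in incoming:
        prod = [prod[0] * m[0], prod[1] * m[1]]
    z_dot = prod[0] + prod[1]                    # \dot{z} in (10)
    return (prod[0] / z_dot, prod[1] / z_dot)

def clause_message(incoming):
    """Clause-side update (11): forbids the all-equal assignment,
    using the k-1 incoming variable-to-clause messages."""
    prod = [1.0, 1.0]
    for m in incoming:
        prod = [prod[0] * m[0], prod[1] * m[1]]
    z_hat = 2.0 - (prod[0] + prod[1])            # \hat{z} in (11)
    return ((1.0 - prod[0]) / z_hat, (1.0 - prod[1]) / z_hat)

# Sanity checks: uniform inputs stay uniform under both updates.
u = (0.5, 0.5)
print(var_message([u] * 4))     # -> (0.5, 0.5)
print(clause_message([u] * 6))  # -> (0.5, 0.5)
```

Both updates return normalized probability pairs, matching the remark that \(\dot{{{\texttt {m}}}}, \hat{{{\texttt {m}}}}\) are non-Dirac unless the inputs force a frozen value.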

Equations (10) and (11) are known as the belief propagation equations. For a detailed explanation, we refer to [43], Section 2, where the same notions are introduced, or to [33], Chapter 14 for broader background. From these quantities, we define the following local weights, which lead to the computation of cluster sizes.

$$\begin{aligned} \begin{aligned}&\bar{\varphi } (\dot{\tau }, \hat{\tau }) := \bigg \{ \sum _{\textbf{x}\in \{0,1\}} \dot{{{\texttt {m}}}}[\dot{\tau }](\textbf{x}) \hat{{{\texttt {m}}}}[\hat{\tau }](\textbf{x}) \bigg \}^{-1}\,;\\&\hat{\varphi }^{\text {lit}} (\dot{\tau }_1,\ldots , \dot{\tau }_k):= 1-\sum _{\textbf{x}\in \{0,1\}} \prod _{i=1}^k \dot{{{\texttt {m}}}}[\dot{\tau }_i](\textbf{x})=\frac{\hat{z}\big [\hat{T}\big ((\dot{\tau }_j)_{j\ne i}\big )\big ]}{\bar{\varphi }\big (\dot{\tau }_i,\hat{T}\big ((\dot{\tau }_j)_{j\ne i}\big )\big )}\,;\\&\dot{\varphi } (\hat{\tau }_1,\ldots ,\hat{\tau }_d):= \sum _{\textbf{x}\in \{0,1\} }\prod _{i=1}^d \hat{{{\texttt {m}}}}[\hat{\tau }_i](\textbf{x})=\frac{\dot{z}\big [\dot{T}\big ((\hat{\tau }_j)_{j\ne i}\big )\big ]}{\bar{\varphi }\big (\dot{T}\big ((\hat{\tau }_j)_{j\ne i}\big ), \hat{\tau }_i\big )}\,, \end{aligned} \end{aligned}$$
(12)

where the last identities in the last two lines hold for any choice of i. These weight factors can be used to derive the size of a free tree. Let \(\mathfrak {t}\) be a free tree in \(\mathscr {F}(\underline{x};\mathscr {G})\), and let \(w^{\text {lit}} (\mathfrak {t}; \underline{x},\mathscr {G})\) be the number of nae-sat solutions that extend \(\underline{x}\) to \(\{0,1\}^{V(\mathfrak {t})}\). Further, let \(\textsf {size}(\underline{x};\mathscr {G})\) denote the total number of nae-sat solutions that extend \(\underline{x}\) to \(\{0,1\}^V.\)
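As a quick sanity check of the first identity for \(\hat{\varphi }^{\text {lit}}\) in (12), consider a single clause with all k incoming messages uniform (our own check, not from the text): multiplying \(\hat{\varphi }^{\text {lit}}\) by the \(2^k\) assignments of the clause should recover the number of not-all-equal assignments, \(2^k-2\).

```python
from itertools import product
from math import prod

# Single-clause check of \hat{varphi}^{lit} in (12), assuming uniform
# messages m_i = (1/2, 1/2); variable names are ours.
k = 5
m = [(0.5, 0.5)] * k                     # uniform variable-to-clause messages
phi_hat_lit = 1.0 - sum(prod(mi[x] for mi in m) for x in (0, 1))

# Brute-force count of not-all-equal assignments of one clause:
nae = sum(1 for xs in product((0, 1), repeat=k) if 0 < sum(xs) < k)
print(2**k * phi_hat_lit, nae)   # both equal 30 for k = 5
```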

Lemma 2.9

([43], Lemma 2.9 and Corollary 2.10; [33], Ch. 14). Let \(\underline{x}\) be a frozen configuration on \(\mathscr {G}=(V,F,E,\underline{\texttt {L}})\) without any free cycles, and \(\underline{\tau }\) be the corresponding message configuration. For a free tree \(\mathfrak {t}\in \mathscr {F}(\underline{x};\mathscr {G})\), we have that

$$\begin{aligned} w^{\text {lit}}(\mathfrak {t};\underline{x},\mathscr {G})= \prod _{v\in V(\mathfrak {t})} \left\{ \dot{\varphi }(\hat{\underline{\tau }}_{\delta v}) \prod _{e\in \delta v} \bar{\varphi }(\tau _e) \right\} \prod _{a\in F(\mathfrak {t})} \hat{\varphi }^{\text {lit}}\big ( (\dot{\underline{\tau }} \oplus \underline{\texttt {L}})_{\delta a} \big ). \end{aligned}$$
(13)

Furthermore, let \(\Upsilon \in \textsf {CL}(\mathscr {G})\) be the cluster corresponding to \(\underline{x}\). Then, we have

$$\begin{aligned} \textsf {size}(\underline{x};\mathscr {G})= |\Upsilon | = \prod _{v\in V} \dot{\varphi } (\hat{\underline{\tau }}_{\delta v}) \prod _{a\in F} \hat{\varphi }^{\text {lit}}\big ((\dot{\underline{\tau }}\oplus \underline{\texttt {L}})_{\delta a} \big ) \prod _{e \in E} \bar{\varphi } (\tau _e). \end{aligned}$$

2.3 Colorings

In this subsection, we introduce the coloring configuration, which is a simplification of the message configuration. We define it analogously to [43].

Recall the definition of \(\mathscr {M}=\dot{\mathscr {M}}\times \hat{\mathscr {M}}, \) and let \(\{ {\texttt {F}}\} \subset \mathscr {M}\) be defined by \(\{{\texttt {F}}\}:= \{\tau \in \mathscr {M}: \, \dot{\tau } \notin \{ 0,1,\star \} \text { and } \hat{\tau }\notin \{ 0,1,\star \} \}\).

Note that \(\{{\texttt {F}}\}\) corresponds to the messages on the edges of free trees, except the boundary edges labelled either 0 or 1.

Define \(\Omega := \{{{{\texttt {R}}}}_0, {{{\texttt {R}}}}_1, {{{\texttt {B}}}}_0, {{{\texttt {B}}}}_1\} \cup \{{\texttt {F}}\}\) and let \(\textsf {S}: \mathscr {M}\setminus \{(\star ,\star )\} \rightarrow \Omega \) be the projection given by

$$\begin{aligned} \textsf {S}(\tau ) := {\left\{ \begin{array}{ll} {{{\texttt {R}}}}_0 &{} \hat{\tau }=0;\\ {{{\texttt {R}}}}_1 &{} \hat{\tau }=1;\\ {{{\texttt {B}}}}_0 &{} \hat{\tau } \ne 0, \, \dot{\tau }=0;\\ {{{\texttt {B}}}}_1 &{} \hat{\tau } \ne 1, \, \dot{\tau }=1;\\ \tau &{} \text {otherwise, i.e., } \tau \in \{ {\texttt {F}}\}. \end{array}\right. } \end{aligned}$$

Here, we note that a (valid) message configuration \(\underline{\tau }=(\tau _e)_{e\in E} \in \mathscr {M}^{E}\) cannot have an edge e such that \(\tau _e=(\star ,\star )\) (see Definition 2.6), thus we may safely exclude the spin \((\star ,\star )\) from \(\mathscr {M}\).
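The projection \(\textsf {S}\) is simple enough to transcribe directly. The sketch below is our own, with an ad hoc encoding of spins: 0 and 1 stand for frozen values, and any other Python object (e.g. a string) stands for a non-frozen message.

```python
# Direct transcription of the projection S, under our own encoding:
# 0/1 for frozen messages, other objects (strings) for non-frozen ones.

def project(dot_tau, hat_tau):
    # (star, star) is excluded from the domain (see Definition 2.6).
    assert (dot_tau, hat_tau) != ("star", "star")
    if hat_tau == 0:
        return "R0"                      # forcing clause message 0
    if hat_tau == 1:
        return "R1"                      # forcing clause message 1
    if dot_tau == 0:
        return "B0"                      # hat_tau not 0, dot_tau = 0
    if dot_tau == 1:
        return "B1"                      # hat_tau not 1, dot_tau = 1
    return ("F", dot_tau, hat_tau)       # the pair itself, an element of {F}

print(project(0, "S"))      # -> B0
print(project("star", 1))   # -> R1
```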

For convenience, we abbreviate \(\{{{{\texttt {R}}}}\}= \{{{{\texttt {R}}}}_0, {{{\texttt {R}}}}_1 \}\) and \(\{{{{\texttt {B}}}}\} = \{{{{\texttt {B}}}}_0, {{{\texttt {B}}}}_1 \}\), and define the Boolean addition as \({{{\texttt {B}}}}_\textbf{x}\oplus \texttt {L}:= {{{\texttt {B}}}}_{\textbf{x}\oplus \texttt {L}}\), and similarly for \({{{\texttt {R}}}}_\textbf{x}\). Also, for \(\sigma \in \{ {{{\texttt {R}}}},{{{\texttt {B}}}},{{\texttt {S}}}\}\), we set \(\dot{\sigma }:= \sigma =:\hat{\sigma }\).

Definition 2.10

(Colorings). For \(\underline{\sigma }\in \Omega ^d\), let

$$\begin{aligned} \dot{I}(\underline{\sigma }) : = {\left\{ \begin{array}{ll} 1 &{} {{{\texttt {R}}}}_0 \in \{\sigma _i\} \subseteq \{{{{\texttt {R}}}}_0, {{{\texttt {B}}}}_0 \};\\ 1&{} {{{\texttt {R}}}}_1 \in \{\sigma _i\} \subseteq \{{{{\texttt {R}}}}_1,{{{\texttt {B}}}}_1 \};\\ 1 &{} \{\sigma _i \} \subseteq \{{{\texttt {S}}}\}\cup \{ {{\texttt {F}}} \}, \text { and } \dot{\sigma }_i = \dot{T}\big ( (\hat{\sigma }_j)_{j\ne i};0 \big ), \ \forall i;\\ 0 &{} \text {otherwise}. \end{array}\right. } \end{aligned}$$

Also, define \( \hat{I}^{\text {lit}}: \Omega ^k \rightarrow \mathbb {R}\) to be

$$\begin{aligned} \begin{aligned} \hat{I}^{\text {lit}}(\underline{\sigma })&:= {\left\{ \begin{array}{ll} 1 &{} \exists i:\, \sigma _i = {{{\texttt {R}}}}_0 \text { and } \{\sigma _j \}_{j\ne i} = \{{{{\texttt {B}}}}_1 \};\\ 1 &{} \exists i:\, \sigma _i = {{{\texttt {R}}}}_1 \text { and } \{\sigma _j \}_{j\ne i} = \{{{{\texttt {B}}}}_0 \};\\ 1 &{} \{{{{\texttt {B}}}}\} \subseteq \{\sigma _i \} \subseteq \{{{{\texttt {B}}}}\} \cup \{\sigma \in \{{\texttt {F}}\}: \,\hat{\sigma }={{\texttt {S}}}\};\\ 1 &{} \{\sigma _i \} \subseteq \{ {{{\texttt {B}}}}_0, {{\texttt {F}}} \}, \, |\{i: \sigma _i\in \{{{\texttt {F}}} \}\} | \ge 2 , \text { and } \hat{\sigma }_i = \hat{T}((\dot{\sigma }_j)_{j\ne i}; 0), \ \forall i \text { s.t. } \sigma _i \ne {{{\texttt {B}}}}_0;\\ 1 &{} \{\sigma _i \} \subseteq \{ {{{\texttt {B}}}}_1, {{\texttt {F}}} \}, \, |\{i: \sigma _i\in \{{{\texttt {F}}} \}\}| \ge 2 , \text { and } \hat{\sigma }_i = \hat{T}((\dot{\sigma }_j)_{j\ne i}; 0), \ \forall i \text { s.t. } \sigma _i \ne {{{\texttt {B}}}}_1;\\ 0 &{} \text {otherwise}. \end{array}\right. } \end{aligned} \end{aligned}$$

On a nae-sat instance \(\mathscr {G}= (V,F,E,\underline{\texttt {L}})\), \(\underline{\sigma }\in \Omega ^E\) is a (valid) coloring if \(\dot{I}(\underline{\sigma }_{\delta v})=\hat{I}^{\text {lit}}((\underline{\sigma }\oplus \underline{\texttt {L}})_{\delta a}) =1 \) for all \(v\in V, a\in F\).

Given a nae-sat instance \(\mathscr {G}\), it was shown in Lemma 2.12 of [43] that there is a bijection

$$\begin{aligned} \begin{Bmatrix} \text {message configurations}\\ \underline{\tau }\in \mathscr {M}^E \end{Bmatrix} \ \longleftrightarrow \ \begin{Bmatrix} \text {colorings} \\ \underline{\sigma }\in \Omega ^E\,. \end{Bmatrix} \end{aligned}$$
(14)

The weight factors for colorings, denoted by \(\dot{\Phi }, \hat{\Phi }^{\text {lit}}, \bar{\Phi }\), are defined as follows. For \(\underline{\sigma }\in \Omega ^d,\) let

$$\begin{aligned} \begin{aligned} \dot{\Phi }(\underline{\sigma }) := {\left\{ \begin{array}{ll} \dot{\varphi }(\hat{\underline{\sigma }}) &{} \dot{I}(\underline{\sigma }) =1 \text { and } \{\sigma _i \} \subseteq \{{\texttt {F}}\};\\ 1 &{} \dot{I}(\underline{\sigma }) =1 \text { and } \{\sigma _i \}\subseteq \{{{{\texttt {B}}}}, {{{\texttt {R}}}}\};\\ 0 &{} \text {otherwise, i.e., } \dot{I}(\underline{\sigma })=0. \end{array}\right. } \end{aligned} \end{aligned}$$

For \(\underline{\sigma }\in \Omega ^k\), let

$$\begin{aligned} \hat{\Phi }^{\text {lit}}(\underline{\sigma }) := {\left\{ \begin{array}{ll} \hat{\varphi }^\text {lit}((\dot{\tau }(\sigma _i))_i) &{} \hat{I}^{\text {lit}} (\underline{\sigma }) = 1 \text { and } \{\sigma _i \} \cap \{{{{\texttt {R}}}}\} = \emptyset ;\\ 1 &{} \hat{I}^{\text {lit}}(\underline{\sigma }) = 1 \text { and } \{\sigma _i \} \cap \{{{{\texttt {R}}}}\} \ne \emptyset ;\\ 0 &{} \text {otherwise, i.e., } \hat{I}^{\text {lit}}(\underline{\sigma })=0. \end{array}\right. } \end{aligned}$$

(If \(\sigma _i \notin \{{{{\texttt {R}}}}\}\), then \(\dot{\tau }(\sigma _i)\) is well-defined.)

Lastly, let

$$\begin{aligned} \bar{\Phi }(\sigma ) := {\left\{ \begin{array}{ll} \bar{\varphi } (\sigma ) &{} \sigma \in \{{\texttt {F}}\};\\ 1 &{} \sigma \in \{{{{\texttt {R}}}}, {{{\texttt {B}}}}\}. \end{array}\right. } \end{aligned}$$

Note that if \(\hat{\sigma }={{\texttt {S}}}\), then \(\bar{\varphi }(\dot{\sigma },\hat{\sigma })=2\) for any \(\dot{\sigma }\). The remaining details on the compatibility of \(\varphi \) and \(\Phi \) can be found in [43], Section 2.4. The formula for the cluster size from Lemma 2.9 then carries over to coloring configurations.

Lemma 2.11

([43], Lemma 2.13). Let \(\underline{x}\in \{0,1,\text {{\texttt {f}}}\}^V\) be a frozen configuration on \(\mathscr {G}=(V,F,E,\underline{\texttt {L}})\), and let \(\underline{\sigma }\in \Omega ^E\) be the corresponding coloring. Define

$$\begin{aligned} w_\mathscr {G}^{\text {lit}}(\underline{\sigma }):= \prod _{v\in V}\dot{\Phi }(\underline{\sigma }_{\delta v}) \prod _{a\in F} \hat{\Phi }^{\text {lit}} ((\underline{\sigma }\oplus \underline{\texttt {L}})_{\delta a}) \prod _{e\in E} \bar{\Phi }(\sigma _e). \end{aligned}$$

Then, we have \(\textsf {size}(\underline{x};\mathscr {G}) = w_\mathscr {G}^{\text {lit}}(\underline{\sigma })\).

Among the valid frozen configurations, we can ignore the contribution from those with too many free variables or red (forcing) edges, as the following lemma shows.

Lemma 2.12

([25] Proposition 2.2, [43] Lemma 3.3). For a frozen configuration \(\underline{x}\in \{0,1,\text {{\texttt {f}}}\}^{V}\), let \({{{\texttt {R}}}}(\underline{x})\) denote the number of forcing edges and \(\text {{\texttt {f}}}(\underline{x})\) the number of free variables. There exists an absolute constant \(c>0\) such that for \(k\ge k_0\), \(\alpha \in [\alpha _{\textsf {lbd}}, \alpha _{\textsf {ubd}}]\), and \(\lambda \in (0,1]\),

$$\begin{aligned} \sum _{\underline{x}\in \{0,1,\text {{\texttt {f}}}\}^V} \mathbb {E}\left[ \textsf {size}(\underline{x};\mathscr {G})^\lambda \right] \mathbb {1}\left\{ \frac{{{{\texttt {R}}}}(\underline{x})}{nd} \vee \frac{\text {{\texttt {f}}}(\underline{x})}{n}> \frac{7}{2^k} \right\} \le e^{-cn}, \end{aligned}$$

where \(\textsf {size}(\underline{x};\mathscr {G})\) is the number of nae-sat solutions \(\underline{{\textbf {x}}}\in \{0,1\}^{V}\) which extend \(\underline{x}\in \{0,1,\text {{\texttt {f}}}\}^{V}\).

Thus, our interest is in counting the frozen configurations and colorings such that the fraction of red edges and the fraction of free variables are both bounded by \(7/2^k\). To this end, we define

$$\begin{aligned} \begin{aligned}&\text {{\textbf {Z}}}_{\lambda }:= \sum _{\underline{x}\in \{0,1,\text {{\texttt {f}}}\}^V} \textsf {size}(\underline{x};\mathscr {G})^\lambda \mathbb {1}\left\{ \frac{{{{\texttt {R}}}}(\underline{x})}{nd} \vee \frac{\text {{\texttt {f}}}(\underline{x})}{n}\le \frac{7}{2^k} \right\} ;\\&\text {{\textbf {Z}}}_{\lambda }^{\text {tr}}:= \sum _{\underline{\sigma }\in \Omega ^E}w_\mathscr {G}^{\text {lit}} (\underline{\sigma })^\lambda \mathbb {1}\left\{ \frac{{{{\texttt {R}}}}(\underline{\sigma })}{nd} \vee \frac{\text {{\texttt {f}}}(\underline{\sigma })}{n} \le \frac{7}{2^k} \right\} ;\\&\text {{\textbf {Z}}}_{\lambda ,s}:= \sum _{\underline{x}\in \{0,1,\text {{\texttt {f}}}\}^V} \textsf {size}(\underline{x};\mathscr {G})^\lambda \mathbb {1}\left\{ \frac{{{{\texttt {R}}}}(\underline{x})}{nd} \vee \frac{\text {{\texttt {f}}}(\underline{x})}{n}\le \frac{7}{2^k}, ~~~~~e^{ns} \le \textsf {size}(\underline{x};\mathscr {G})< e^{ns+1}\right\} ;\\&\text {{\textbf {Z}}}_{\lambda ,s}^{\text {tr}}:= \sum _{\underline{\sigma }\in \Omega ^E} w_\mathscr {G}^{\text {lit}}(\underline{\sigma })^{\lambda } \mathbb {1}\left\{ \frac{{{{\texttt {R}}}}(\underline{\sigma })\vee {{\texttt {S}}}(\hat{\underline{\sigma }})}{nd} \le \frac{7}{2^k},~~~~~e^{ns} \le w_\mathscr {G}^{\text {lit}}(\underline{\sigma }) < e^{ns+1} \right\} , \end{aligned} \end{aligned}$$
(15)

where \({{{\texttt {R}}}}(\underline{\sigma })\) counts the number of red edges and \(\text {{\texttt {f}}}(\underline{\sigma })\) counts the number of free variables. The superscript \(\text {tr}\) emphasizes that the above quantities count the contribution from frozen configurations which contain only free trees, i.e. no free cycles (recall that, by Lemma 2.7 and (14), the space of colorings is in bijective correspondence with the space of frozen configurations without free cycles). Similarly, recalling the definition of \(\overline{\text {{\textbf {N}}}}_s\) in (3), the total number of clusters of size in \([e^{ns},e^{ns+1})\), we define \({{\textbf {N}}}_s\) and \({{\textbf {N}}}_s^{\text{ tr }}\) by

$$\begin{aligned} {{\textbf {N}}}_s:={{\textbf {Z}}}_{0,s}\quad \text{ and }\quad {{\textbf {N}}}_s^{\text{ tr }}:={{\textbf {Z}}}^{\text{ tr }}_{0,s}. \end{aligned}$$

Hence, \(e^{-n\lambda s-\lambda }{{\textbf {Z}}}_{\lambda ,s}\le {{\textbf {N}}}_{s}\le e^{-n\lambda s} {{\textbf {Z}}}_{\lambda ,s}\) holds.
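This sandwich bound is elementary, since every cluster counted by \({{\textbf {N}}}_s\) has size in \([e^{ns},e^{ns+1})\). A small numeric check, with arbitrary synthetic cluster sizes rather than data from the model:

```python
import math
import random

# Numeric check of e^{-n*lam*s - lam} Z_{lam,s} <= N_s <= e^{-n*lam*s} Z_{lam,s},
# using synthetic cluster sizes; only sizes in [e^{ns}, e^{ns+1}) contribute.
random.seed(0)
n, s, lam = 20, 0.3, 0.7
sizes = [math.exp(n * s + random.random()) for _ in range(50)]  # all in the window

N_s = len(sizes)                          # number of clusters in the window
Z = sum(w ** lam for w in sizes)          # Z_{lam,s} restricted to the window
lower = math.exp(-n * lam * s - lam) * Z
upper = math.exp(-n * lam * s) * Z
print(lower <= N_s <= upper)   # True
```

The inequality holds because \(w^\lambda \in [e^{n\lambda s}, e^{\lambda (ns+1)})\) for every contributing cluster.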

Definition 2.13

(Truncated colorings). Let \(1\le L< \infty \), \(\underline{x}\) be a frozen configuration on \(\mathscr {G}\) without free cycles and \(\underline{\sigma }\in \Omega ^E\) be the coloring corresponding to \(\underline{x}\). Recalling the notation \(\mathscr {F}(\underline{x};\mathscr {G})\) (Definition 2.5), we say \(\underline{\sigma }\) is a (valid) L-truncated coloring if \(|V(\mathfrak {t})| \le L\) for all \(\mathfrak {t}\in \mathscr {F}(\underline{x};\mathscr {G})\). For an equivalent definition, let \(|\sigma |:=v(\dot{\sigma })+v(\hat{\sigma })-1\) for \(\sigma \in \{{\texttt {F}}\}\), where \(v(\dot{\sigma })\) (resp. \(v(\hat{\sigma })\)) denotes the number of variables in \(\dot{\sigma }\) (resp. \(\hat{\sigma }\)). Define \(\Omega _L:= \{{{{\texttt {R}}}},{{{\texttt {B}}}}\}\cup \{{\texttt {F}}\}_L\), where \(\{{\texttt {F}}\}_L\) is the collection of \(\sigma \in \{{\texttt {F}}\}\) such that \(|\sigma | \le L\). Then, \(\underline{\sigma }\) is a (valid) L-truncated coloring if \(\underline{\sigma }\in \Omega _L^E\).

To clarify the names, we often call the original coloring \(\underline{\sigma }\in \Omega ^E\) the untruncated coloring.

Analogously to (15), define the truncated partition functions

$$\begin{aligned} \begin{aligned}&{{\textbf {Z}}}_{\lambda }^{(L),\text{ tr }} := \sum _{\underline{\sigma }\in \Omega _L^E}w_\mathscr {G}^{\text{ lit }} (\underline{\sigma })^\lambda \mathbb {1}\left\{ \frac{{{{\texttt {R}}}}(\underline{\sigma })}{nd} \vee \frac{{\texttt {f}}(\underline{\sigma })}{n} \le \frac{7}{2^k} \right\} ;\\ {}&{{\textbf {Z}}}_{\lambda ,s}^{(L), \text{ tr }}:= \sum _{\underline{\sigma }\in \Omega _L^E} w_\mathscr {G}^{\text{ lit }}(\underline{\sigma })^{\lambda } \mathbb {1}\left\{ \frac{{{{\texttt {R}}}}(\underline{\sigma })}{nd} \vee \frac{{\texttt {f}}(\underline{\sigma })}{n} \le \frac{7}{2^k},~~~~~e^{ns} \le w_\mathscr {G}^{\text{ lit }}(\underline{\sigma }) < e^{ns+1} \right\} . \end{aligned} \end{aligned}$$

2.4 Averaging over the literals

Let \(\mathscr {G}=(V,F,E,\underline{\texttt {L}})\) be a nae-sat instance and \(\mathcal {G}=(V,F,E)\) be the factor graph without the literal assignment. Let \(\mathbb {E}^{\text {lit}}\) denote the expectation over the literals \(\underline{\texttt {L}}\sim \text {Unif} [\{0,1\}^E]\). Then, for a coloring \(\underline{\sigma }\in \Omega ^{E}\), we can use Lemma 2.11 to write \(\mathbb {E}^{\text {lit}}[w_\mathscr {G}^{\text {lit}}(\underline{\sigma })^\lambda ]\) as

$$\begin{aligned} w_{\mathscr {G}}(\underline{\sigma })^{\lambda }:=\mathbb {E}^{\text {lit}} [ w_\mathscr {G}^{\text {lit}}(\underline{\sigma })^\lambda ] = \prod _{v\in V} \dot{\Phi }(\underline{\sigma }_{\delta v})^\lambda \prod _{a\in F} \mathbb {E}^{\text {lit}} \hat{\Phi }^{\text {lit}}((\underline{\sigma }\oplus \underline{\texttt {L}})_{\delta a})^\lambda \prod _{e\in E} \bar{\Phi }(\sigma _e)^\lambda . \end{aligned}$$

Accordingly, define

$$\begin{aligned} \hat{\Phi }(\underline{\sigma }_{\delta a})^\lambda := \mathbb {E}^{\text {lit}}[ \hat{\Phi }^{\text {lit}}((\underline{\sigma }\oplus \underline{\texttt {L}})_{\delta a})^\lambda ]. \end{aligned}$$

We now recall a property of \(\hat{\Phi }^{\text {lit}}\) from [43], Lemma 2.17:

Lemma 2.14

([43], Lemma 2.17). \(\hat{\Phi }^{\text {lit}}\) can be factorized as \(\hat{\Phi }^{\text {lit}}(\underline{\sigma }\oplus \underline{\texttt {L}}) = \hat{I}^{\text {lit}}(\underline{\sigma }\oplus \underline{\texttt {L}}) \hat{\Phi }^{\text {m}}(\underline{\sigma })\) for

$$\begin{aligned} \hat{\Phi }^{\text {m}}(\underline{\sigma }) := \max \big \{\hat{\Phi }^{\text {lit}}(\underline{\sigma }\oplus \underline{\texttt {L}}): \underline{\texttt {L}}\in \{0,1\}^k \big \}= {\left\{ \begin{array}{ll} 1 &{} \underline{\sigma }\in \{{{{\texttt {R}}}},{{{\texttt {B}}}}\}^{k},\\ \frac{\hat{z}[\hat{\sigma }_j]}{\bar{\varphi }(\sigma _j)} &{}\underline{\sigma }\in \Omega ^{k}\text { with } \sigma _j \in \{{\texttt {F}}\}. \end{array}\right. } \end{aligned}$$
(16)

As a consequence, we can write \(\hat{\Phi }(\underline{\sigma })^\lambda = \hat{\Phi }^{\text {m}}(\underline{\sigma })^\lambda \hat{v}(\underline{\sigma })\), where

$$\begin{aligned} \hat{v}(\underline{\sigma }) := \mathbb {E}^{\text {lit}} [ \hat{I}^{\text {lit}}(\underline{\sigma }\oplus \underline{\texttt {L}})]. \end{aligned}$$
(17)

2.5 Empirical profile of colorings

The coloring profile, defined below, was introduced in [43]. Hereafter, \(\mathscr {P}(\mathfrak {X})\) denotes the space of probability measures on \(\mathfrak {X}\).

Definition 2.15

(Coloring profile and the simplex of coloring profiles, Definitions 3.1 and 3.2 of [43]). Given a nae-sat instance \(\mathscr {G}\) and a coloring configuration \(\underline{\sigma }\in \Omega ^E \), the coloring profile of \(\underline{\sigma }\) is the triple \(H[\underline{\sigma }]\equiv H\equiv (\dot{H},\hat{H},\bar{H}) \) defined as follows.

$$\begin{aligned} \begin{aligned} \dot{H}\in \mathscr {P}(\Omega ^d), \quad&\dot{H}(\underline{\tau }) = |\{v\in V: \underline{\sigma }_{\delta v}=\underline{\tau } \} | / |V| \quad \text {for all } \underline{\tau }\in \Omega ^d;\\ \hat{H}\in \mathscr {P}(\Omega ^k), \quad&\hat{H}(\underline{\tau }) = |\{a\in F: \underline{\sigma }_{\delta a}=\underline{\tau } \} | / |F| \quad \text {for all } \underline{\tau }\in \Omega ^k;\\ \bar{H}\in \mathscr {P}(\Omega ), \quad&\bar{H}(\tau ) = |\{e\in E: \sigma _e=\tau \} | / |E| \quad \text {for all } \tau \in \Omega . \end{aligned} \end{aligned}$$

A valid H must satisfy the following compatibility equation:

$$\begin{aligned} \frac{1}{d} \sum _{\underline{\tau }\in \Omega ^{d}}\dot{H}(\underline{\tau })\sum _{i=1}^{d}\mathbb {1}\{\tau _i=\tau \} = \bar{H}(\tau ) = \frac{1}{k}\sum _{\underline{\tau }\in \Omega ^k}\hat{H}(\underline{\tau })\sum _{j=1}^{k}\mathbb {1}\{\tau _j = \tau \}\quad \text {for all}\quad \tau \in \Omega \,. \end{aligned}$$
(18)
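The compatibility equation (18) holds automatically for any edge configuration, since both sides merely recount the empirical color frequency on the nd = mk edge slots. A sketch on a random (d,k)-biregular pairing (a standard configuration-model construction, assumed here purely for illustration; names are ours):

```python
import random
from collections import Counter

# Sanity check of (18): for ANY edge coloring of a (d,k)-biregular
# factor graph, the edge-color frequencies computed from variable
# neighborhoods and from clause neighborhoods must agree with bar{H}.
random.seed(1)
d, k, n = 3, 4, 8                       # n*d must be divisible by k
m = n * d // k
colors = ["R0", "R1", "B0", "B1", "F"]

perm = list(range(n * d))
random.shuffle(perm)                    # variable half-edge -> clause half-edge
sigma = [random.choice(colors) for _ in range(n * d)]   # color per half-edge

var_nbhd = [tuple(sigma[v * d + i] for i in range(d)) for v in range(n)]
slot_color = {perm[e]: sigma[e] for e in range(n * d)}
cls_nbhd = [tuple(slot_color[a * k + j] for j in range(k)) for a in range(m)]

bar_from_var = Counter(c for nb in var_nbhd for c in nb)
bar_from_cls = Counter(c for nb in cls_nbhd for c in nb)
assert bar_from_var == bar_from_cls     # both sides of (18), times nd = mk
print("compatibility (18) verified on a random instance")
```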

The simplex of coloring profiles \(\varvec{\Delta }\) is the space of triples \(H=(\dot{H},\hat{H},\bar{H})\) which satisfy the following conditions:

  • \(\dot{H} \in \mathscr {P}(\text {supp}\,\dot{\Phi }), \hat{H} \in \mathscr {P}(\text {supp}\,\hat{\Phi })\) and \(\bar{H} \in \mathscr {P}(\Omega )\).

  • \(\dot{H},\hat{H}\) and \(\bar{H}\) satisfy (18).

  • Recalling the definition of \(\text {{\textbf {Z}}}_{\lambda }\) in (15), \(\dot{H},\hat{H}\) and \(\bar{H}\) satisfy \(\max \{\bar{H}({\texttt {f}}),\bar{H}({{{\texttt {R}}}})\} \le \frac{7}{2^k}\).

For \(L <\infty \), we let \(\varvec{\Delta }^{(L)}\) be the subspace of \(\varvec{\Delta }\) satisfying the following extra condition:

  • \(\dot{H} \in \mathscr {P}(\text {supp}\,\dot{\Phi }\cap \Omega _{L}^{d}), \hat{H} \in \mathscr {P}(\text {supp}\,\hat{\Phi }\cap \Omega _L^{k})\) and \(\bar{H} \in \mathscr {P}(\Omega _L)\).

Given a coloring profile \(H\in \varvec{\Delta }\), let \({{\textbf {Z}}}_{\lambda }^{\text{ tr }}[H]\) denote the contribution to \({{\textbf {Z}}}_{\lambda }^{\text{ tr }}\) from the coloring configurations whose coloring profile is H. That is, \({{\textbf {Z}}}_\lambda ^{\text{ tr }}[H]\!:=\! \sum _{\underline{\sigma }:\; H[\underline{\sigma }] = H} w^{\text{ lit }}(\underline{\sigma })^\lambda \). For \(H \in \varvec{\Delta }^{(L)}\), \({{\textbf {Z}}}^{(L),\text{ tr }}_{\lambda }[H]\) is defined analogously. It was shown in [43] that \(\mathbb {E}{{\textbf {Z}}}^{(L),\text{ tr }}_\lambda [H]\) for the L-truncated coloring model can be written as the following formula, obtained via Stirling's approximation:

$$\begin{aligned} \begin{aligned}&\mathbb {E}{{\textbf {Z}}}^{(L),\text{ tr }}_\lambda [H] = n^{O_{L}(1)} \exp \left\{ n F_{\lambda ,L}(H)\right\} \quad \text{ for }\\ {}&\quad F_{\lambda ,L}(H):=\Sigma (H)+\lambda s(H),\quad H\in \varvec{\Delta }^{(L)},\quad \text{ where }\\ {}&\Sigma (H):=\sum _{\underline{\sigma }\in \Omega ^{d}}\dot{H}(\underline{\sigma }) \log \Big (\frac{1}{\dot{H}(\underline{\sigma })}\Big ) + \frac{d}{k} \sum _{\underline{\sigma }\in \Omega ^{k}} \hat{H}(\underline{\sigma }) \log \Big (\frac{\hat{v}(\underline{\sigma })}{\hat{H}(\underline{\sigma })} \Big ) \\ {}&\quad \quad \quad + d\sum _{\sigma \in \Omega } \bar{H}(\sigma ) \log \big (\bar{H}(\sigma )\big )\quad \text{ and }\\ {}&s(H):=\sum _{\underline{\sigma }\in \Omega ^{d}}\dot{H}(\underline{\sigma }) \log \big (\dot{\Phi }(\underline{\sigma })\big ) + \frac{d}{k} \sum _{\underline{\sigma }\in \Omega ^{k}} \hat{H}(\underline{\sigma }) \log \big (\hat{\Phi }^{\text{ m }}(\underline{\sigma }) \big ) + d\sum _{\sigma \in \Omega } \bar{H}(\sigma ) \log \big (\bar{\Phi }(\sigma )\big ). \end{aligned} \end{aligned}$$
(19)

Similar to \(F_{\lambda ,L}(H)\) for \(H\in \varvec{\Delta }^{(L)}\), the untruncated free energy \(F_{\lambda }(H)\) for \(H\in \varvec{\Delta }\) is defined by the same equation \(F_{\lambda }(H):=\Sigma (H)+\lambda s(H)\).

2.6 Belief propagation fixed point and optimal profiles

It was proven in [43] that the truncated free energy \(F_{\lambda ,L}(H)\) is maximized at the optimal profile \(H^\star _{\lambda ,L}\), defined in terms of a Belief Propagation (BP) fixed point. In this subsection, we review the notions needed to define \(H^\star _{\lambda ,L}\) (cf. Section 5 of [43]). To do so, we first define the BP functional: for probability measures \(\dot{\textbf{q}},\hat{\textbf{q}}\in \mathscr {P}(\Omega _L)\), where \(L<\infty \), let

$$\begin{aligned} \begin{aligned}&[\dot{\textbf{B}}_{1,\lambda }(\hat{\textbf{q}})](\sigma )\cong \bar{\Phi }(\sigma )^\lambda \sum _{\underline{\sigma }\in \Omega _L^{d}}\mathbb {1}\{\sigma _1=\sigma \}\dot{\Phi }(\underline{\sigma })^{\lambda }\prod _{i=2}^{d}\hat{\textbf{q}}(\sigma _i);\\&[\hat{\textbf{B}}_{1,\lambda }(\dot{\textbf{q}})](\sigma )\cong \bar{\Phi }(\sigma )^\lambda \sum _{\underline{\sigma }\in \Omega _L^{k}}\mathbb {1}\{\sigma _1=\sigma \}\hat{\Phi }(\underline{\sigma })^{\lambda }\prod _{i=2}^{k}\dot{\textbf{q}}(\sigma _i), \end{aligned} \end{aligned}$$
(20)

where \(\sigma \in \Omega _L\) and \(\cong \) denotes equality up to normalization, so that the output is a probability measure. Now, restrict the domain to the probability measures with one-sided dependence, i.e. satisfying \(\dot{\textbf{q}}(\sigma )=\dot{f}(\dot{\sigma })\) and \(\hat{\textbf{q}}(\sigma )=\hat{f}(\hat{\sigma })\) for some \(\dot{f}:\dot{\Omega }_L\rightarrow \mathbb {R}_{\ge 0}\) and \(\hat{f}:\hat{\Omega }_L\rightarrow \mathbb {R}_{\ge 0}\). It can be checked that \(\dot{\textbf{B}}_{1,\lambda }, \hat{\textbf{B}}_{1,\lambda }\) preserve the one-sided property, inducing

$$\begin{aligned} \dot{\text {BP}}_{\lambda ,L}:\mathscr {P}(\hat{\Omega }_{L}) \rightarrow \mathscr {P}(\dot{\Omega }_{L}),\quad \hat{\text {BP}}_{\lambda ,L}:\mathscr {P}(\dot{\Omega }_L) \rightarrow \mathscr {P}(\hat{\Omega }_L). \end{aligned}$$

More precisely, for \(\hat{q}\in \mathscr {P}(\hat{\Omega }_L)\) and \(\dot{q} \in \mathscr {P}(\dot{\Omega }_L)\), define the probability measures \(\dot{\text {BP}}_{\lambda ,L}(\hat{q})\in \mathscr {P}(\dot{\Omega }_L)\) and \(\hat{\text {BP}}_{\lambda ,L}(\dot{q})\in \mathscr {P}(\hat{\Omega }_L)\) as follows. For \(\dot{\sigma }\in \dot{\Omega }_L\) and \(\hat{\sigma }\in \hat{\Omega }_L\), let

$$\begin{aligned} \begin{aligned}&[\dot{\text {BP}}_{\lambda ,L}(\hat{q})](\dot{\sigma })=\big (\dot{\mathscr {Z}}_{\hat{q}}\big )^{-1} \cdot \bar{\Phi }(\dot{\sigma }, \hat{\sigma }^\prime )^\lambda \sum _{\underline{\sigma }\in \Omega _L^{d}}\mathbb {1}\{\sigma _1=(\dot{\sigma },\hat{\sigma }^\prime )\}\dot{\Phi }(\underline{\sigma })^{\lambda }\prod _{i=2}^{d}\hat{q}(\hat{\sigma }_i)\,,\\&[\hat{\text {BP}}_{\lambda ,L}(\dot{q})](\hat{\sigma })=\big (\hat{\mathscr {Z}}_{\dot{q}}\big )^{-1} \cdot \bar{\Phi }(\dot{\sigma }^\prime , \hat{\sigma })^\lambda \sum _{\underline{\sigma }\in \Omega _L^{k}}\mathbb {1}\{\sigma _1=(\dot{\sigma }^\prime , \hat{\sigma })\}\hat{\Phi }(\underline{\sigma })^{\lambda }\prod _{i=2}^{k}\dot{q}(\dot{\sigma }_i)\,, \end{aligned} \end{aligned}$$
(21)

where \(\hat{\sigma }^\prime \in \hat{\Omega }_L\) and \(\dot{\sigma }^\prime \in \dot{\Omega }_L\) are arbitrary with the only exception that when \(\dot{\sigma }\in \{{{{\texttt {R}}}},{{{\texttt {B}}}}\}\) (resp. \(\hat{\sigma }\in \{{{{\texttt {R}}}},{{{\texttt {B}}}}\}\)), then we take \(\hat{\sigma }^\prime = \dot{\sigma }\) (resp. \(\dot{\sigma }^\prime = \hat{\sigma }\)) so that the RHS above is non-zero. From the definition of \(\dot{\Phi },\hat{\Phi }\), and \(\bar{\Phi }\), it can be checked that the choices of \(\hat{\sigma }^\prime \in \hat{\Omega }_L\) and \(\dot{\sigma }^\prime \in \dot{\Omega }_L\) do not affect the values of the RHS above (see (12)). The normalizing constants \(\dot{\mathscr {Z}}_{\hat{q}}\) and \(\hat{\mathscr {Z}}_{\dot{q}}\) are given by

$$\begin{aligned} \begin{aligned}&\dot{\mathscr {Z}}_{\hat{q}}\equiv \sum _{\dot{\sigma } \in \dot{\Omega }_L} \bar{\Phi }(\dot{\sigma }, \hat{\sigma }^\prime )^\lambda \sum _{\underline{\sigma }\in \Omega _L^{d}}\mathbb {1}\{\sigma _1=(\dot{\sigma },\hat{\sigma }^\prime )\}\dot{\Phi }(\underline{\sigma })^{\lambda }\prod _{i=2}^{d}\hat{q}(\hat{\sigma }_i)\,,\\&\hat{\mathscr {Z}}_{\dot{q}}\equiv \sum _{\hat{\sigma } \in \hat{\Omega }_L}\bar{\Phi }(\dot{\sigma }^\prime , \hat{\sigma })^\lambda \sum _{\underline{\sigma }\in \Omega _L^{k}}\mathbb {1}\{\sigma _1=(\dot{\sigma }^\prime , \hat{\sigma })\}\hat{\Phi }(\underline{\sigma })^{\lambda }\prod _{i=2}^{k}\dot{q}(\dot{\sigma }_i)\,. \end{aligned} \end{aligned}$$
(22)

Here, \(\hat{\sigma }^\prime \in \hat{\Omega }_L\) and \(\dot{\sigma }^\prime \in \dot{\Omega }_L\) are again arbitrary. We then define the Belief Propagation functional by \(\text {BP}_{\lambda ,L}:= \dot{\text {BP}}_{\lambda ,L}\circ \hat{\text {BP}}_{\lambda ,L}\). The untruncated BP map, which we denote by \(\text {BP}_{\lambda }:\mathscr {P}(\dot{\Omega }) \rightarrow \mathscr {P}(\dot{\Omega })\), is defined analogously, replacing \(\dot{\Omega }_L\) (resp. \(\hat{\Omega }_L\)) with \(\dot{\Omega }\) (resp. \(\hat{\Omega }\)).

Remark 2.16

In defining the untruncated BP map, note that \(\dot{\Omega }\) and \(\hat{\Omega }\) are not finite sets, so the normalizing constants, the analogues of (22), could a priori be infinite. However, from the definitions of \(\dot{\Phi },\hat{\Phi }\), and \(\bar{\Phi }\), we have \(\bar{\Phi }(\sigma _1)\dot{\Phi }(\underline{\sigma })\le 2\) and \(\bar{\Phi }(\tau _1)\hat{\Phi }(\underline{\tau })\le 2\) for \(\underline{\sigma }=(\sigma _1,\ldots ,\sigma _d) \in \Omega ^{d}\) and \(\underline{\tau }=(\tau _1,\ldots ,\tau _k) \in \Omega ^k\). Thus, the normalizing constants for the untruncated BP map are at most 2. We also remark that \(\underline{\sigma }=((\dot{\sigma }_1,\hat{\sigma }_1),\ldots ,(\dot{\sigma }_d,\hat{\sigma }_d))\in \Omega ^{d}\) such that \(\dot{\Phi }(\underline{\sigma })\ne 0\) is fully determined by \((\dot{\sigma }_1,\hat{\sigma }_1)\) and \(\hat{\sigma }_2,\ldots ,\hat{\sigma }_d\). Thus, the inner sum over \(\underline{\sigma }\in \Omega _L^d\) in the definition of \(\dot{\mathscr {Z}}_{\hat{q}}\) in (22) can be replaced with a sum over \(\sigma _1\in \Omega \) and \(\hat{\sigma }_2,\ldots , \hat{\sigma }_d\in \hat{\Omega }\). The analogous remark holds for \(\hat{\mathscr {Z}}_{\dot{q}}\) and for the untruncated model.

Let \(\mathbf {\Gamma }_C\) be the set of \(\dot{q} \in \mathscr {P}(\dot{\Omega })\) such that

$$\begin{aligned} \dot{q}(\dot{\sigma })=\dot{q}(\dot{\sigma }\oplus 1)\quad \text{ for }\quad \dot{\sigma } \in \dot{\Omega },\quad \text{ and }\quad \frac{\dot{q}({{{\texttt {R}}}})+2^k\dot{q}({\texttt {f}})}{C}\le \dot{q}({{{\texttt {B}}}}) \le \frac{\dot{q}({{{\texttt {R}}}})}{1-C2^{-k}}, \end{aligned}$$
(23)

where \(\{{{{\texttt {R}}}}\}\equiv \{{{{\texttt {R}}}}_0,{{{\texttt {R}}}}_1\}\) and \(\{{{{\texttt {B}}}}\}\equiv \{{{{\texttt {B}}}}_0,{{{\texttt {B}}}}_1\}\). The proposition below shows that the BP map is a contraction on the set \(\mathbf {\Gamma }_C\) for large enough C, which guarantees the existence of a Belief Propagation fixed point.

Proposition 2.17

(Proposition 5.5, items (a) and (b), of [43]). For \(\lambda \in [0,1]\), the following holds:

  1.

    There exists a large enough universal constant C such that the map \(\text {BP}\equiv \text {BP}_{\lambda ,L}\) has a unique fixed point \(\dot{q}^\star _{\lambda ,L}\in \mathbf {\Gamma }_C\). Moreover, if \(\dot{q}\in \mathbf {\Gamma }_C\), then \(\text {BP}\dot{q}\in \mathbf {\Gamma }_C\) and

    $$\begin{aligned} ||\text {BP}\dot{q}-\dot{q}^\star _{\lambda ,L}||_1\lesssim k^2 2^{-k}||\dot{q}-\dot{q}^\star _{\lambda ,L}||_1. \end{aligned}$$
    (24)

    The same holds for the untruncated BP map \(\text {BP}_{\lambda }\), with fixed point \(\dot{q}^\star _{\lambda }\in \mathbf {\Gamma }_C\). Moreover, \(\dot{q}^\star _{\lambda ,L}\) (for large enough L) and \(\dot{q}^\star _{\lambda }\) have full support in their respective domains.

  2.

    In the limit \(L \rightarrow \infty \), \(||\dot{q}^\star _{\lambda ,L}-\dot{q}^\star _{\lambda }||_1 \rightarrow 0\).
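The contraction bound (24) yields the fixed point via the Banach fixed-point theorem: iterating any map satisfying such an \(L^1\) bound converges geometrically. A minimal numerical sketch with a toy contraction (a stand-in for the actual BP recursion of (22), which is not reproduced here; the vector `q_star` is an arbitrary hypothetical fixed point):

```python
import numpy as np

k = 10
c = k ** 2 * 2 ** (-k)               # contraction rate of order k^2 2^{-k} < 1
q_star = np.array([0.5, 0.3, 0.2])   # hypothetical fixed point (toy measure)

def bp_toy(q):
    # Toy map contracting toward q_star at rate c in L1; the true BP map
    # obeys the same kind of contraction bound as (24).
    return q_star + c * (q - q_star)

q = np.array([0.9, 0.05, 0.05])      # initial L1 error is 0.8
for _ in range(30):
    q = bp_toy(q)

# geometric convergence: the L1 error shrinks by a factor c per iteration
assert np.abs(q - q_star).sum() <= c ** 30 * 0.8 + 1e-12
assert abs(q.sum() - 1.0) < 1e-12    # q remains a probability vector
```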

For \(\dot{q} \in \mathscr {P}(\dot{\Omega })\), denote \(\hat{q}\equiv \hat{\text {BP}}\dot{q}\), and define \(H_{\dot{q}}=(\dot{H}_{\dot{q}},\hat{H}_{\dot{q}}, \bar{H}_{\dot{q}})\in \varvec{\Delta }\) by

$$\begin{aligned} \dot{H}_{\dot{q}}(\underline{\sigma })=\frac{\dot{\Phi }(\underline{\sigma })^{\lambda }}{\dot{\mathfrak {Z}}}\prod _{i=1}^{d}\hat{q}(\hat{\sigma }_i),\quad \hat{H}_{\dot{q}}(\underline{\sigma })=\frac{\hat{\Phi }(\underline{\sigma })^{\lambda }}{\hat{\mathfrak {Z}}}\prod _{i=1}^{k}\dot{q}(\dot{\sigma }_i),\quad \bar{H}_{\dot{q}}(\sigma )=\frac{\bar{\Phi }(\sigma )^{-\lambda }}{\bar{\mathfrak {Z}}}\dot{q}(\dot{\sigma })\hat{q}(\hat{\sigma }), \end{aligned}$$
(25)

where \(\dot{\mathfrak {Z}}\equiv \dot{\mathfrak {Z}}_{\dot{q}},\hat{\mathfrak {Z}}\equiv \hat{\mathfrak {Z}}_{\dot{q}}\) and \(\bar{\mathfrak {Z}}\equiv \bar{\mathfrak {Z}}_{\dot{q}}\) are normalizing constants.

Definition 2.18

(Definition 5.6 of [43]). The optimal coloring profiles for the truncated and untruncated models are the tuples \(H^\star _{\lambda ,L}=(\dot{H}^\star _{\lambda ,L},\hat{H}^\star _{\lambda ,L},\bar{H}^\star _{\lambda ,L})\) and \(H^\star _{\lambda }=(\dot{H}^\star _{\lambda },\hat{H}^\star _{\lambda },\bar{H}^\star _{\lambda })\), defined respectively by \( H^\star _{\lambda ,L}:= H_{\dot{q}^\star _{\lambda ,L}}\) and \(H^\star _{\lambda }:=H_{\dot{q}^\star _{\lambda }}\).

Definition 2.19

For \(k\ge k_0,\alpha \in (\alpha _{\textsf {cond}}, \alpha _{\textsf {sat}})\) and \(\lambda \in [0,1]\), define the optimal \(\lambda \)-tilted truncated weight \(s^\star _{\lambda ,L}\equiv s^\star _{\lambda ,L}(\alpha ,k)\) and untruncated weight \(s^\star _{\lambda } \equiv s^\star _{\lambda }(\alpha ,k)\) by

$$\begin{aligned} \begin{aligned} s^\star _{\lambda ,L}&:=s(H^\star _{\lambda ,L})\equiv \big \langle \log \dot{\Phi }, \dot{H}^\star _{\lambda ,L}\big \rangle +\big \langle \log \hat{\Phi }^{\text {m}}, \hat{H}^\star _{\lambda ,L}\big \rangle +\big \langle \log \bar{\Phi }, \bar{H}^\star _{\lambda ,L}\big \rangle ;\\ s^\star _{\lambda }&:=s(H^\star _{\lambda })\equiv \big \langle \log \dot{\Phi }, \dot{H}^\star _{\lambda }\big \rangle +\big \langle \log \hat{\Phi }^{\text {m}}, \hat{H}^\star _{\lambda }\big \rangle +\big \langle \log \bar{\Phi }, \bar{H}^\star _{\lambda }\big \rangle . \end{aligned} \end{aligned}$$
(26)

Then, define the optimal tilting constants \(\lambda ^\star _L\equiv \lambda ^\star _L(\alpha ,k)\) and \(\lambda ^\star \equiv \lambda ^\star (\alpha , k)\) by

$$\begin{aligned} \lambda ^\star _L:=\sup \{\lambda \in [0,1]: F_{\lambda ,L}(H^\star _{\lambda ,L}) \ge \lambda s^\star _{\lambda ,L} \}\quad \text {and}\quad \lambda ^\star := \sup \{\lambda \in [0,1]: F_{\lambda }(H^\star _\lambda ) \ge \lambda s^\star _{\lambda } \}. \end{aligned}$$
(27)

Finally, we define \(s^\star _L\equiv s^\star _L(\alpha ,k),s^\star \equiv s^\star (\alpha ,k)\) and \(c^\star \equiv c^\star (\alpha ,k)\) by

$$\begin{aligned} s^\star _{L}:= s^\star _{\lambda ^\star _L,L},\quad s^\star := s^\star _{\lambda ^\star },\quad \text {and}\quad c^\star := (2\lambda ^\star )^{-1}. \end{aligned}$$
(28)

We remark that \(s^\star = \textsf {f}^{1\textsf {rsb}}(\alpha )\) and \(\lambda ^\star \in (0,1)\) hold for \(\alpha \in (\alpha _{\textsf {cond}}, \alpha _{\textsf {sat}})\) (see Proposition 1.4 of [43]).

To end this section, we define the optimal coloring profile in the second moment (cf. Definition 5.6 of [43]). Define the analogue of \((\dot{\Phi },\hat{\Phi },\bar{\Phi })\) in the second moment \((\dot{\Phi }_2,\hat{\Phi }_2,\bar{\Phi }_2)\) by \(\dot{\Phi }_2:=\dot{\Phi }\otimes \dot{\Phi }\), \(\bar{\Phi }_2:=\bar{\Phi }\otimes \bar{\Phi }\) and

$$\begin{aligned} \hat{\Phi }_2(\underline{\varvec{\sigma }})^{\lambda }:=\mathbb {E}^{\text {lit}}\Big [\hat{\Phi }^{\text {lit}}(\underline{\sigma }^1\oplus \underline{\texttt {L}}^1)^{\lambda }\hat{\Phi }^{\text {lit}}(\underline{\sigma }^2\oplus \underline{\texttt {L}}^2)^{\lambda } \Big ]\quad \text {for}\quad \underline{\varvec{\sigma }}=(\underline{\sigma }^1,\underline{\sigma }^2)\in \Omega ^{2k}. \end{aligned}$$

Then, the BP map in the second moment is defined by replacing \((\dot{\Phi },\hat{\Phi },\bar{\Phi })\) in (20) by \((\dot{\Phi }_2,\hat{\Phi }_2,\bar{\Phi }_2)\). Moreover, analogous to (25), for \(\dot{q}\in \mathscr {P}\big ((\dot{\Omega }_L)^2\big )\), define the corresponding coloring profile by replacing \((\dot{\Phi },\hat{\Phi },\bar{\Phi })\) in (25) by \((\dot{\Phi }_2,\hat{\Phi }_2,\bar{\Phi }_2)\).

Definition 2.20

(Definition 5.6 of [43]). The optimal coloring profile in the second moment for the truncated model is the tuple \(H^{\bullet }_{\lambda ,L}=(\dot{H}^{\bullet }_{\lambda ,L},\hat{H}^{\bullet }_{\lambda ,L},\bar{H}^{\bullet }_{\lambda ,L})\), defined analogously to Definition 2.18 from the fixed point of the second-moment BP map.

3 Proof Outline

Recall that \({{\textbf {N}}}_s^{\text{ tr }}\equiv {{\textbf {Z}}}_{0,s}^{\text{ tr }}\) counts the number of valid colorings with weight between \(e^{ns}\) and \(e^{ns+1}\) which do not contain a free cycle. Also, recall the constant \(s_\circ (C)\equiv s_\circ (n,\alpha ,C)\) from (4). It was shown in [37] that for fixed \(C\in \mathbb {R}\), \(\mathbb {E}{{\textbf {N}}}^{\text{ tr }}_{s_{\circ }(C)}\asymp _{k} e^{\lambda ^\star C}\) holds, and moreover:

$$\begin{aligned} \mathbb {E}({{\textbf {N}}}_{s_\circ (C)}^{\text{ tr }})^2 \lesssim _{k} (\mathbb {E}{{\textbf {N}}}_{s_\circ (C)}^{\text{ tr }})^{2} +\mathbb {E}{{\textbf {N}}}_{s_\circ (C)}^{\text{ tr }}. \end{aligned}$$
(29)

Hence, the Cauchy-Schwarz inequality shows that there is a constant \(C_k\in (0,1)\), depending only on \(\alpha \) and k, such that for \(C>0\),

$$\begin{aligned} \mathbb {P}\left( {{\textbf {N}}}_{s_\circ (C)}^{\text{ tr }}>0 \right) >C_k. \end{aligned}$$

The remaining work is to push this probability close to 1. The key to proving Theorems 1.1 and 1.4 is the following theorem.

Theorem 3.1

Let \(k\ge k_0\), \(\alpha \in (\alpha _{\textsf {cond}}, \alpha _{\textsf {sat}})\), and set \(\lambda ^\star , s^\star \) as in Definition 2.19. For every \(\varepsilon >0\), there exist \(C(\varepsilon ,\alpha ,k)>0\) and \(\delta \equiv \delta (\varepsilon ,\alpha ,k)>0\) such that for \(n\ge n_0(\varepsilon ,\alpha ,k)\) and \(C\ge C(\varepsilon ,\alpha ,k)\),

$$\begin{aligned} \mathbb {P}\bigg ({{\textbf {N}}}^{\text{ tr }}_{s_\circ (C)}\ge \delta \mathbb {E}{{\textbf {N}}}^{\text{ tr }}_{s_\circ (C)}\bigg )\ge 1-\varepsilon , \end{aligned}$$

where \(s_\circ (C)\equiv s_\circ (n,\alpha ,C)\equiv s^\star -\frac{ \log n}{2\lambda ^\star n} - \frac{C}{n}\).

Theorem 3.1 easily implies Theorems 1.1 and 1.4: in [37, Remark 6.11], we have already shown that Theorem 3.1 implies Theorem 1.4, so we are left to prove Theorem 1.1.

Proof of Theorem 1.1

By Theorem 3.22 of [37], \(\mathbb {E}{{\textbf {N}}}^{\text{ tr }}_{s_{\circ }(C)}\asymp e^{\lambda ^\star C}\), so Theorem 3.1 implies Theorem 1.1-(b). Hence, it remains to prove Theorem 1.1-(a). Fix \(\varepsilon >0\) throughout the proof. By Theorem 3.1, there exists \(C_1\equiv C_1(\varepsilon ,\alpha ,k)\) such that

$$\begin{aligned} \mathbb {P}({{\textbf {N}}}^{\text{ tr }}_{s_\circ (C_1)}\ge 1) \ge 1-\frac{\varepsilon }{4}. \end{aligned}$$
(30)

Note that on the event \({{\textbf {N}}}_{s_\circ (C_1)}^{\text{ tr }}>0\), we have

$$\begin{aligned} Z_n \ge {{\textbf {Z}}}^\text{ tr}_{1} \ge e^{ns_\circ (C_1)} = e^{-C_1} n^{-\frac{1}{2\lambda ^\star }} e^{ns^\star }, \end{aligned}$$

where \(Z_n\) denotes the number of nae-sat solutions in \(\mathscr {G}\). Moreover, it was shown in [37, Theorem 1.1-(a)] that for \(C_2\le n^{1/5}\), we have

$$\begin{aligned} \sum _{s\le s_{\circ }(C_2)}\mathbb {E}{{\textbf {Z}}}_{1,s}\le n^{-\frac{1}{2\lambda ^\star }}\exp \big (ns^\star -(1-\lambda ^\star )C_2+C_k\big ), \end{aligned}$$

where \(C_k\) is a constant depending only on k and the sum on the lhs runs over \(s\in n^{-1}\mathbb {Z}\). Therefore, by Markov’s inequality, we can choose \(C_2\equiv C_2(\varepsilon ,\alpha ,k)\) large enough so that

$$\begin{aligned} \mathbb {P}\bigg ( \sum _{s\le s_{\circ }(C_2)}{{\textbf {Z}}}_{1,s}\ge \varepsilon e^{-C_1} n^{-\frac{1}{2\lambda ^\star }} e^{ns^\star } \bigg ) \le \frac{\varepsilon }{4}. \end{aligned}$$
(31)

Furthermore, by Theorem 1.1-(a) of [37], there exists \(C_3\equiv C_3(\varepsilon ,\alpha ,k)\) such that

$$\begin{aligned} \mathbb {P}\bigg ( \sum _{s\ge s_\circ (C_3)} {{\textbf {N}}}_{s} \ge 1 \bigg ) \le \frac{\varepsilon }{4}. \end{aligned}$$
(32)

Finally, Theorem 3.24 and Proposition 3.25 of [37] show that for \(|C|\le n^{1/4}\), \(\mathbb {E}{{\textbf {N}}}_{s_{\circ }(C)}\asymp _{k}e^{-\lambda ^\star C}\) holds. Thus, we can choose \(K\equiv K(\varepsilon ,\alpha ,k)\in \mathbb {N}\) large enough so that

$$\begin{aligned} \mathbb {P}\bigg ( \sum _{s\in n^{-1}\mathbb {Z}:~s_\circ (C_2)\le s\le s_\circ (C_3)} {{\textbf {N}}}_s \ge K \bigg ) \le \frac{\varepsilon }{4}. \end{aligned}$$
(33)

Therefore, by (30)–(33), the conclusion of Theorem 1.1-(a) holds with \(K=K(\varepsilon ,\alpha ,k)\). \(\square \)

3.1 Outline of the proof of Theorem 3.1

In this subsection, we discuss the outline of the proof of Theorem 3.1. We begin with a natural way of characterizing cycles in \(\mathscr {G}= (\mathcal {G}, \underline{\texttt {L}})\) which was also used in [20].

Definition 3.2

(\(\zeta \)-cycle). Let \(l>0\) be an integer. For each \({\zeta }\in \{0,1\}^{2l}\), a \(\zeta \)-cycle in \(\mathscr {G}=(\mathcal {G},\underline{\texttt {L}})\) consists of

$$\begin{aligned} \mathcal {Y}(\zeta ) =\{v_i,a_i,(e_{v_i}^j, e_{a_i}^j)_{j=0,1} \}_{i=1}^l \end{aligned}$$

which satisfies the following conditions:

  • \(v_1,\ldots ,v_l\in [n]\equiv V\) are distinct variables, and for each \(i\in [l]\), \(e_{v_i}^0\) and \(e_{v_i}^1\) are distinct half-edges attached to \(v_i\).

  • \(a_1,\ldots ,a_l\in [m]\equiv F\) are distinct clauses, and for each \(i\in [l]\), \(e_{a_i}^0\) and \(e_{a_i}^1\in [k]\) are distinct half-edges attached to \(a_i\). Moreover,

    $$\begin{aligned} a_1 = \min \{a_i:i\in [l] \}, \quad \text {and} \quad e_{a_1}^0<e_{a_1}^1. \end{aligned}$$
    (34)
  • \((e_{v_i}^1,e_{a_{i+1}}^0)\) and \((e_{a_i}^1,e_{v_i}^0)\) are edges in \(\mathcal {G}\) for each \(i\in [l]\) (with the convention \(a_{l+1}=a_1\)).

  • The literal on the half-edge \(e_{a_i}^j\) is given by \(\texttt {L}({e_{a_i}^j}) = \zeta _{2(i-1)+j }\) for each \(i\in [l]\) and \(j\in \{0,1\}\) (with the convention \(\zeta _0=\zeta _{2l}\)).

Note that (34) is introduced in order to prevent overcounting. Also, we denote the size of \(\zeta \) by \(||\zeta ||\), defined as

$$\begin{aligned} ||\zeta ||=l. \end{aligned}$$
(35)

Furthermore, we denote by \(X({\zeta })\) the number of \(\zeta \)-cycles in \(\mathscr {G}=(\mathcal {G},\underline{\texttt {L}})\). For \(\zeta \in \{0,1\}^{2l}\), it is not difficult to see that

$$\begin{aligned} X({\zeta })\; {\overset{\text {d}}{\longrightarrow }}\; \text {Poisson}(\mu ({\zeta })), \quad \text {where } \;\; \mu ({\zeta }) := \frac{1}{2l}2^{-2l}(k-1)^l(d-1)^l. \end{aligned}$$
(36)
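For concreteness, the mean in (36) is straightforward to evaluate; summing it over all \(2^{2l}\) literal patterns \(\zeta \) of length 2l recovers the familiar expected number \(\frac{1}{2l}\big ((k-1)(d-1)\big )^l\) of l-cycles in the underlying biregular graph. A short sketch (the values of k and d below are arbitrary stand-ins, not parameters from the paper):

```python
k, d = 8, 100  # arbitrary stand-in parameters

def mu(l):
    # Poisson mean of X(zeta) for a fixed zeta in {0,1}^{2l}, as in (36):
    # mu(zeta) = (1/(2l)) * 2^{-2l} * (k-1)^l * (d-1)^l
    return (1 / (2 * l)) * 4 ** (-l) * ((k - 1) * (d - 1)) ** l

for l in range(1, 5):
    # summing over the 2^{2l} = 4^l literal patterns of length 2l
    total = 4 ** l * mu(l)
    assert abs(total - ((k - 1) * (d - 1)) ** l / (2 * l)) < 1e-9 * total
```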

Moreover, \(\{X({\zeta })\}\) are asymptotically jointly independent in the sense that for any \(l_0>0\),

$$\begin{aligned} \lim _{n\rightarrow \infty } \mathbb {P}\bigg (\bigcap _{\zeta :\, ||\zeta || \le l_0} \left\{ X({\zeta }) = x_{\zeta } \right\} \bigg ) = \prod _{\zeta :\,||\zeta ||\le l_0} \mathbb {P}\left( \text {Poisson}(\mu ({\zeta })) = x_{\zeta }\right) . \end{aligned}$$
(37)

Both (36) and (37) follow from an application of the method of moments (e.g., see Theorem 9.5 in [29]). Given these definitions and properties, we are ready to state the small subgraph conditioning method, appropriately adjusted to our setting.

Theorem 3.3

(Small subgraph conditioning [39, 40]). Let \(\mathscr {G}= (\mathcal {G}, \underline{\texttt {L}})\) be a random d-regular k-nae-sat instance and let \(X({\zeta })\equiv X({\zeta ,n})\) be the number of \(\zeta \)-cycles in \(\mathscr {G}\) with \(\mu ({\zeta })\) given as (36). Suppose that a random variable \(Z_n\equiv Z_n(\mathscr {G})\) satisfies the following conditions:

  (a)

    For each \(l\in \mathbb {N}\) and \(\zeta \in \{0,1\}^{2l}\), the following limit exists:

    $$\begin{aligned} 1+ \delta ({\zeta }) \equiv \lim _{n \rightarrow \infty } \frac{\mathbb {E}[ Z_nX({\zeta })]}{\mu ({\zeta })\mathbb {E}Z_n }. \end{aligned}$$
    (38)

    Moreover, for each \(a,l\in \mathbb {N}\) and \(\zeta \in \{0,1 \}^{2l}\), we have

    $$\begin{aligned} \lim _{n\rightarrow \infty } \frac{\mathbb {E}[Z_n (X(\zeta ))_a] }{ \mathbb {E}Z_n} = (1+\delta (\zeta ))^a \mu (\zeta )^a, \end{aligned}$$

    where \((b)_a\) denotes the falling factorial \((b)_a:= b(b-1)\cdots (b-a+1)\).

  (b)

    The following limit exists:

    $$\begin{aligned} C\equiv \lim _{n \rightarrow \infty } \frac{\mathbb {E}Z_n^2}{(\mathbb {E}Z_n)^2}. \end{aligned}$$
  (c)

    We have \(\sum _{l=1}^\infty \sum _{\zeta \in \{0,1\}^{2l}} \mu ({\zeta }) \delta ({\zeta })^2 <\infty .\)

  (d)

    Moreover, the constant C satisfies \(C\le \exp \left( \sum _{l=1}^\infty \sum _{\zeta \in \{0,1\}^{2l}} \mu ({\zeta }) \delta ({\zeta })^2 \right) \).

Then, we have the following conclusion:

$$\begin{aligned} \frac{Z_n}{\mathbb {E}Z_n} \overset{\text {d}}{\longrightarrow } W\equiv \prod _{l=1}^\infty \prod _{\zeta \in \{0,1\}^{2l}} \big (1+\delta ({\zeta })\big )^{\bar{X}({\zeta })} \exp \big (-\mu ({\zeta }) \delta (\zeta )\big ), \end{aligned}$$
(39)

where \(\bar{X}({\zeta })\) are independent Poisson random variables with mean \(\mu ({\zeta })\).

We briefly explain the crux of the theorem. Since \(\{X({\zeta })\}\) converge jointly to \(\{\bar{X}({\zeta })\}\), it is not hard to see that

$$\begin{aligned} \frac{\mathbb {E}\left[ \mathbb {E}\big [Z_n\,\big |\,\{X({\zeta })\}\big ]^2 \right] }{(\mathbb {E}Z_n)^2} \rightarrow \exp \bigg (\sum _\zeta \mu ({\zeta }) \delta ({\zeta })^2\bigg ), \end{aligned}$$

using conditions (a), (b), (c), and (d) (e.g., see Theorem 9.12 in [29] and its proof). Therefore, conditions (b) and (d) imply that the conditional variance of \(Z_n\) given \(\{X({\zeta })\}\) is negligible compared to \((\mathbb {E}Z_n)^2\); hence the distribution of \(Z_n\) is asymptotically the same as that of \(\mathbb {E}\big [Z_n\big | \{X({\zeta })\}\big ]\), as stated in the conclusion of the theorem.

With Theorem 3.3 in mind, our goal is to (approximately) establish the four assumptions for (a truncated version of) \({{\textbf {Z}}}_{\lambda ,s_\circ (C)}^{\text{ tr }}\), where \(s_{\circ }(C)\equiv s^\star -\frac{\log n}{2\lambda ^\star n}-\frac{C}{n}\). Condition (b) has already been obtained from the moment analysis in [37]. Condition (a) will be derived in Proposition 4.1 below, and (c) in Lemma 4.6 below. Condition (d), however, holds only in an approximate sense because of within-cluster correlations; the approximation error becomes smaller as the constant C is taken larger.

In the previous works [27, 28, 39, 40, 42], condition (d) could be obtained through a direct calculation of the second moment in a purely combinatorial way. However, this approach seems intractable in our model; for instance, the empirical measure \(H^\star _{\lambda }\), which gives the main contribution to the first moment, has little combinatorial meaning.

Instead, we first establish (39) for the L-truncated model, by showing the concentration of the rescaled partition function, introduced in (40) below. The truncated model will be easier to work with since it has a finite spin space unlike the untruncated model. Then, we rely on the convergence results regarding the leading constants of first and second moments, derived in [37], to deduce (d) for the untruncated model in an approximate sense. We then apply ideas behind the proof of Theorem 3.3 to deduce Theorem 3.1 (for details, see Sect. 6).

We now give a more precise description of how we establish (d) for the truncated model. Let \(1\le L,l_0 <\infty \) and \(\lambda \in (0,\lambda ^\star _L)\), where \(\lambda ^\star _L\) is defined in Definition 2.19. Then, define the rescaled partition function \({{\textbf {Y}}}^{(L)}_{\lambda , l_0}\) by

$$\begin{aligned} \begin{aligned} {{\textbf {Y}}}^{(L)}_{\lambda , l_0}&\equiv {{\textbf {Y}}}^{(L)}_{\lambda , l_0}(\mathscr {G}) := \widetilde{{{\textbf {Z}}}}_{\lambda }^{(L),\text{ tr }} \prod _{||\zeta || \le l_0 } \left( 1+\delta _{L}(\zeta ) \right) ^{-X(\zeta )}\quad \text{ where }\\ \widetilde{{{\textbf {Z}}}}_{\lambda }^{(L),\text{ tr }}&\equiv \widetilde{{{\textbf {Z}}}}_{\lambda }^{(L),\text{ tr }}(\mathscr {G}):=\sum _{||H-H^\star _{\lambda ,L}||_1 \le n^{-1/2}\log ^{2}n}{{\textbf {Z}}}^{(L),\text{ tr }}_{\lambda }[H]\,. \end{aligned} \end{aligned}$$
(40)

Here, \(\delta _{L}(\zeta )\) is the constant defined in (38) for \(Z_n = {{\textbf {Z}}}_{\lambda }^{(L),\text{ tr }}\), assuming its existence. The precise definition of \(\delta _{L}(\zeta )\) is given in (45). The reason to consider \(\widetilde{{{\textbf {Z}}}}^{(L),\text{ tr }}_{\lambda }\) instead of \({{\textbf {Z}}}^{(L),\text{ tr }}_{\lambda }\) is to ignore the contribution from near-identical copies in the second moment. Then, Proposition 3.4 below shows that the rescaled partition function is concentrated for each \(L<\infty \). Its proof is provided in Sect. 5.

Proposition 3.4

Let \(k\ge k_0\), \(L<\infty \), and \(\lambda \in (\lambda ^\star _L-0.01\cdot 2^{-k},\lambda ^\star _L)\). Then we have

$$\begin{aligned} \lim _{l_0\rightarrow \infty } \lim _{n \rightarrow \infty } \frac{ \mathbb {E}\big ({{\textbf {Y}}}^{(L)}_{\lambda , l_0}\big )^{2}}{ \big (\mathbb {E}{{\textbf {Y}}}^{(L)}_{\lambda ,l_0} \big )^2} =1. \end{aligned}$$

Remark 3.5

An important point to note here is that Proposition 3.4 is not true for \(\lambda =\lambda ^\star _L\). If \(\lambda <\lambda ^\star _L\), then \(s^\star _{\lambda ,L}<s^\star _L\), so there should exist exponentially many clusters of size \(e^{n s^\star _{\lambda ,L}}\). Therefore, the intrinsic correlations within clusters are negligible (that is, when we pick two clusters at random, the probability of selecting the same one is close to 0) and the fluctuation is dominated by cycle effects. However, when there are a bounded number of clusters of size \(e^{ns^\star _{\lambda ,L}}\) (that is, when \(\lambda \) is very close to \(\lambda ^\star _L\)), within-cluster correlations become non-trivial. Mathematically, we can see this from (29), where the first-moment term on the rhs can be ignored if (and only if) the first moment is large enough.

Nevertheless, for \(s_\circ (C)\) defined as in Theorem 3.1, we will see in Sect. 6 that if C is taken large, then (d) holds approximately, and hence the conclusion of Theorem 3.3 holds up to a small error.

Further notation. Throughout this paper, we will often use the following multi-index notation. Let \(\underline{a}=(a_\zeta )_{||\zeta ||\le l_0}\), \(\underline{b}=(b_\zeta )_{||\zeta ||\le l_0}\) be tuples of integers indexed by \(\zeta \) with \(||\zeta ||\le l_0\). Then, we write

$$\begin{aligned} (\underline{a})^{\underline{b}} = \prod _{\zeta :||\zeta ||\le l_0} a_\zeta ^{b_\zeta }; \quad \quad \quad (\underline{a})_{\underline{b}} = \prod _{\zeta :||\zeta ||\le l_0 } (a_\zeta )_{b_\zeta }= \prod _{\zeta :||\zeta ||\le l_0} \prod _{i=0}^{b_\zeta -1} (a_\zeta -i). \end{aligned}$$
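A literal transcription of this multi-index notation (the index set and values below are hypothetical, chosen only for illustration):

```python
from math import prod

def falling(b, a):
    # falling factorial (b)_a = b(b-1)...(b-a+1)
    return prod(b - i for i in range(a))

def multi_pow(a_vec, b_vec):
    # (a)^b = product over zeta of a_zeta ** b_zeta
    return prod(x ** y for x, y in zip(a_vec, b_vec))

def multi_falling(a_vec, b_vec):
    # (a)_b = product over zeta of (a_zeta)_{b_zeta}
    return prod(falling(x, y) for x, y in zip(a_vec, b_vec))

# toy tuples indexed by the patterns zeta with ||zeta|| <= l_0
a, b = [5, 4, 3], [2, 2, 1]
assert falling(5, 2) == 20
assert multi_pow(a, b) == 25 * 16 * 3
assert multi_falling(a, b) == 20 * 12 * 3
```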

Moreover, for non-negative quantities \(f=f_{d,k, L, n}\) and \(g=g_{d,k,L, n}\), we use any of the equivalent notations \(f=O_{k,L}(g), g= \Omega _{k,L}(f), f\lesssim _{k,L} g\) and \(g \gtrsim _{k,L} f \) to indicate that \(f\le C_{k,L}\cdot g\) holds for some constant \(C_{k,L}>0\) depending only on k and L.

4 The Effects of Cycles

In this section, our goal is to obtain condition (a) of Theorem 3.3 for (truncated versions of) \({{\textbf {Z}}}^{(L),\text{ tr }}_{\lambda }\) and \({{\textbf {Z}}}^{\text{ tr }}_{\lambda ,s_n}\), where \(|s_n-s^\star _{\lambda }|=O(n^{-2/3})\) (see Proposition 4.1 below). To do so, we first introduce the notation necessary to define \(\delta (\zeta )\) appearing in Theorem 3.3.

For \(\lambda \in [0,1]\), recall the optimal coloring profile of the untruncated model \(H^\star _{\lambda }\) and truncated model \(H^\star _{\lambda ,L}\) from Definition 2.18. We denote the two-point marginals of \(\dot{H}^\star _{\lambda }\) by

$$\begin{aligned} \dot{H}^\star _{\lambda } (\tau _1,\tau _2) := \sum _{\underline{\sigma }\in \Omega ^d} \dot{H}^\star _{\lambda } (\underline{\sigma })\mathbb {1}{\{\sigma _1=\tau _1, \sigma _2=\tau _2 \}},\quad (\tau _1,\tau _2)\in \Omega ^{2} \end{aligned}$$

and similarly for \(\dot{H}^\star _{\lambda ,L}\). On the other hand, for \({\underline{\texttt {L}}}\in \{0,1\}^k\), consider the optimal clause empirical measure \(\hat{H}^{\underline{\texttt {L}}}_{\lambda }\) given the literal assignment \(\underline{\texttt {L}}\) around a clause: namely, for \(\underline{\sigma }\in \Omega ^k\),

$$\begin{aligned} \hat{H}^{\underline{\texttt {L}}}_{\lambda } (\underline{\sigma }) := \frac{1}{\hat{\mathfrak {Z}}^{\underline{\texttt {L}}}_{\lambda }} \hat{\Phi }^{\text {lit}}(\underline{\sigma }\oplus {\underline{\texttt {L}}})^\lambda \prod _{i=1}^k \dot{q}^\star _{\lambda } (\dot{\sigma }_i), \end{aligned}$$
(41)

where \(\hat{\mathfrak {Z}}^{\underline{\texttt {L}}}_{\lambda }\) is the normalizing constant. Note that \(\hat{\mathfrak {Z}}^{\underline{\texttt {L}}}_{\lambda } = \hat{\mathfrak {Z}}_{\lambda }\) is independent of \(\underline{\texttt {L}}\) due to the symmetry \(\dot{q}_\lambda ^\star (\dot{\sigma })=\dot{q}_\lambda ^\star (\dot{\sigma }\oplus 1)\). Similarly, define \(\hat{H}^{\underline{\texttt {L}}}_{\lambda ,L}\) for the truncated model. Given the literals \(\texttt {L}_1,\texttt {L}_2\) at the first two coordinates of a clause, the two-point marginal of \(\hat{H}^{\underline{\texttt {L}}}_{\lambda }\) is defined by

$$\begin{aligned} \hat{H}^{\texttt {L}_1,\texttt {L}_2}_{\lambda } (\tau _1,\tau _2 )&:=\frac{1}{2^{k-2}} \sum _{\texttt {L}_3,\ldots \texttt {L}_k\in \{0,1\}} \sum _{\underline{\sigma }\in \Omega ^k} \hat{H}^{\underline{\texttt {L}}}_{\lambda } (\underline{\sigma }) \mathbb {1}\{\sigma _1=\tau _1,\sigma _2=\tau _2 \}\nonumber \\&= \sum _{\underline{\sigma }\in \Omega ^k} \hat{H}^{\underline{\texttt {L}}}_{\lambda } (\underline{\sigma }) \mathbb {1}\{\sigma _1=\tau _1,\sigma _2=\tau _2 \}, \end{aligned}$$
(42)

where the second equality holds for any \(\underline{\texttt {L}}\in \{0,1\}^k\) that agrees with \(\texttt {L}_1,\texttt {L}_2\) at the first two coordinates, due to the symmetry \(\hat{H}^{\underline{\texttt {L}}}_{\lambda }(\underline{\tau })=\hat{H}_{\lambda }^{\underline{\texttt {L}}\oplus \underline{\texttt {L}}'}(\underline{\tau }\oplus \underline{\texttt {L}}')\). The symmetry also implies that

$$\begin{aligned} \sum _{\tau _2} \hat{H}^{\texttt {L}_1,\texttt {L}_2}_{\lambda }(\tau _1,\tau _2) = \bar{H}^\star _{\lambda } (\tau _1), \end{aligned}$$

for any \(\texttt {L}_1, \texttt {L}_2 \in \{0,1\}\) and \(\tau _1\in \Omega \). We also define \(\hat{H}^{\texttt {L}_1,\texttt {L}_2}_{\lambda ,L}\) analogously for the truncated model.

Then, we define \(\dot{A}\equiv \dot{A}_{\lambda },\hat{A}^{\texttt {L}_1,\texttt {L}_2}\equiv \hat{A}^{\texttt {L}_1,\texttt {L}_2}_{\lambda }\) to be the \(\Omega \times \Omega \) matrices as follows:

$$\begin{aligned} \dot{A}(\tau _1,\tau _2) = \frac{\dot{H}^\star _{\lambda } (\tau _1,\tau _2)}{\bar{H}^\star _{\lambda }(\tau _1)}, \quad \hat{A}^{\texttt {L}_1,\texttt {L}_2}(\tau _1,\tau _2) = \frac{\hat{H}_{\lambda }^{\texttt {L}_1,\texttt {L}_2}(\tau _1,\tau _2)}{\bar{H}^\star _{\lambda }(\tau _1)}, \end{aligned}$$
(43)

and \(\Omega _L\times \Omega _L\) matrices \(\dot{A}_{\lambda ,L}\) and \(\hat{A}_{\lambda ,L}^{\texttt {L}_1,\texttt {L}_2}\) are defined analogously using \(\dot{H}^\star _{\lambda ,L}, \hat{H}^{\texttt {L}_1,\texttt {L}_2}_{\lambda ,L}\) and \(\bar{H}^\star _{\lambda ,L}\). Note that both matrices have row sums equal to 1, and hence their largest eigenvalue is 1. For \(\zeta \in \{0,1\}^{2l}\), we introduce the following notation for convenience:

$$\begin{aligned} (\dot{A} \hat{A} )^{\zeta } \equiv \prod _{i=0}^{l-1} \left( \dot{A} \hat{A}^{\zeta _{2i},\zeta _{2i+1}} \right) , \end{aligned}$$
(44)

where \(\zeta _0 = \zeta _{2l}\). Moreover, we define \((\dot{A}_L \hat{A}_L)^\zeta \) analogously. Then, the main proposition of this section is given as follows.

Proposition 4.1

Let \(L,l_0>0\) and let \(\underline{X} = \{X({\zeta })\}_{||\zeta ||\le l_0}\) denote the numbers of \(\zeta \)-cycles in \(\mathscr {G}\). Also, set \(\mu ({\zeta })\) as in (36), and for each \(\zeta \in \cup _l\{0,1\}^{2l}\), define

$$\begin{aligned} \begin{aligned} \delta (\zeta ) \equiv \delta ({\zeta };\lambda )&:= {\text {Tr}}\left[ (\dot{A} \hat{A})^\zeta \right] -1, \\ \delta _L(\zeta ) \equiv \delta _{L} (\zeta ;\lambda )&:= {\text {Tr}} \left[ (\dot{A}_L \hat{A}_L)^{\zeta }\right] -1. \end{aligned} \end{aligned}$$
(45)

Then, there exists a constant \(c_{\textsf {cyc}}=c_{\textsf {cyc}}(l_0)\) such that the following statements hold:

  (1)

    For \(\lambda \in (0,1)\) and any tuple of nonnegative integers \(\underline{a}=(a_\zeta )_{||\zeta ||\le l_0}\) such that \(||\underline{a}||_\infty \le c_{\textsf {cyc}} \log n\), we have

    $$\begin{aligned} \mathbb {E}\left[ \widetilde{{{\textbf {Z}}}}_{\lambda }^{(L),\text{ tr }} \cdot (\underline{X})_{\underline{a}} \right] = \left( 1+ err(n,\underline{a}) \right) \left( \underline{\mu } ( 1+ \underline{\delta }_L)\right) ^{\underline{a}} \mathbb {E}\widetilde{{{\textbf {Z}}}}_{\lambda }^{(L),\text{ tr }}, \end{aligned}$$
    (46)

    where \(\widetilde{{{\textbf {Z}}}}^{(L),\text{ tr }}_{\lambda }\) is defined in (40) and \(err(n,\underline{a}) = O_k \left( ||\underline{a}||_1 n^{-1/2} \log ^2 n \right) \).

  (2)

    For \(\lambda \in (0,\lambda _L^\star )\), where \(\lambda ^\star _{L}\) is defined in Definition 2.19, the analogue of (46) holds for the second moment as well. That is, for \(\underline{a}=(a_\zeta )_{||\zeta ||\le l_0}\) with \(||\underline{a}||_\infty \le c_{\textsf {cyc}} \log n\), we have

    $$\begin{aligned} \mathbb {E}\left[ \big (\widetilde{{{\textbf {Z}}}}_{\lambda }^{(L),\text{ tr }}\big )^2 \cdot (\underline{X})_{\underline{a}} \right] = \left( 1+ err(n,\underline{a}) \right) \left( \underline{\mu } ( 1+ \underline{\delta }_L)^2\right) ^{\underline{a}} \mathbb {E}\big (\widetilde{{{\textbf {Z}}}}_{\lambda }^{(L),\text{ tr }}\big )^2. \end{aligned}$$
    (47)
  (3)

    Under a slightly weaker error bound, given by \(err'(n,\underline{a}) = O_k(||\underline{a}||_1 n^{-1/8})\), the analogue of (1) holds for the untruncated model with any \(\lambda \in (0,1)\). Namely, analogously to (40), define \(\widetilde{{{\textbf {Z}}}}_{\lambda }^{\text{ tr }}\) and \(\widetilde{{{\textbf {Z}}}}^{\text{ tr }}_{\lambda ,s}\) by

    $$\begin{aligned}&\widetilde{{{\textbf {Z}}}}^{\text{ tr }}_{\lambda }:=\sum _{||H-H^\star _{\lambda }||_1\le n^{-1/2}\log ^2 n}{{\textbf {Z}}}^{\text{ tr }}_{\lambda }[H];\nonumber \\ {}&\widetilde{{{\textbf {Z}}}}^{\text{ tr }}_{\lambda ,s}:=\sum _{||H-H^\star _{\lambda }||_1\le n^{-1/2}\log ^2 n}{{\textbf {Z}}}^{\text{ tr }}_{\lambda }[H]\mathbb {1}\big \{s(H)\in [ns,ns+1)\big \}\,. \end{aligned}$$
    (48)

    Then, (46) continues to hold when we replace \(\widetilde{{{\textbf {Z}}}}_{\lambda }^{(L),\text{ tr }},err(n,\underline{a})\) and \(\underline{\delta }_L\) by \(\widetilde{{{\textbf {Z}}}}_{\lambda }^{\text{ tr }}, err'(n,\underline{a})\) and \(\underline{\delta }\) respectively. Moreover, (46) continues to hold when we replace \(\widetilde{{{\textbf {Z}}}}_{\lambda }^{(L),\text{ tr }},err(n,\underline{a})\) and \(\underline{\delta }_L\) by \({{\textbf {Z}}}_{\lambda ,s_n}^{\text{ tr }}, err'(n,\underline{a})\) and \(\underline{\delta }\) respectively, where \(|s_n-s^\star _{\lambda }|=O(n^{-2/3})\).

  (4)

    For each \(\zeta \in \cup _l \{0,1\}^{2l}\), we have \(\lim _{L\rightarrow \infty } \delta _L (\zeta ) = \delta (\zeta )\).
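For intuition on (45): since products of row-stochastic matrices are row-stochastic, the top eigenvalue of \((\dot{A}\hat{A})^\zeta \) equals 1, so \(\delta (\zeta )\) is exactly the contribution of the subdominant eigenvalues to the trace. A numerical sanity check on arbitrary random stand-in matrices (not the actual \(\dot{A},\hat{A}\) of (43)):

```python
import numpy as np

rng = np.random.default_rng(0)

def row_stochastic(n):
    # random positive matrix with rows normalized to sum to 1
    M = rng.random((n, n))
    return M / M.sum(axis=1, keepdims=True)

A_dot, A_hat = row_stochastic(4), row_stochastic(4)
P = A_dot @ A_hat

# products of row-stochastic matrices are row-stochastic
assert np.allclose(P.sum(axis=1), 1.0)

# delta = Tr[P^l] - 1 equals the sum of l-th powers of the eigenvalues
# of P other than the Perron eigenvalue 1
l = 3
delta = np.trace(np.linalg.matrix_power(P, l)) - 1
eigs = sorted(np.linalg.eigvals(P), key=abs)
assert abs(eigs[-1] - 1) < 1e-10          # top eigenvalue is 1
assert abs(delta - sum(e ** l for e in eigs[:-1])) < 1e-8
```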

In the remainder of this section, we focus on proving (1) of Proposition 4.1. In the course of the proof, we will see that (2) follows by an analogous argument (see Remark 4.4). The proofs of (3) and (4) are deferred to Appendix 6, since they require a substantial amount of additional technical work.

For each \(\zeta \in \{0,1\}^{2l}\) and a nonnegative integer \(a_\zeta \), let \(\mathcal {Y}_i \equiv \mathcal {Y}_i(\zeta ) \in \big \{\{v_{\iota },a_{\iota },(e_{v_{\iota }}^{j}, e_{a_{\iota }}^{j})_{j=0,1} \}_{\iota =1}^l\big \}\), \(i\in [a_\zeta ]\), denote the possible locations of \(a_\zeta \) \(\zeta \)-cycles as in Definition 3.2. Then, it is not difficult to see that

$$\begin{aligned} (X(\zeta ))_{a_\zeta } = \sum \mathbb {1}\{\mathcal {Y}_1,\ldots , \mathcal {Y}_{a_\zeta } \in \mathcal {G},\text { and } \underline{\texttt {L}}(\mathcal {Y}_i;\mathscr {G})=\zeta \,,\,\forall i\le a_{\zeta }\} \equiv \sum \mathbb {1}\{\mathcal {Y}_1,\ldots , \mathcal {Y}_{a_\zeta } \} , \end{aligned}$$
(49)

where the summation runs over distinct \(\mathcal {Y}_1,\ldots ,\mathcal {Y}_{a_\zeta }\), and \(\underline{\texttt {L}}(\mathcal {Y}_i;\mathscr {G})\) denotes the literals on \(\mathcal {Y}_i\) inside \(\mathscr {G}\). Based on this observation, we will show (1) of Proposition 4.1 by computing the cost of planting cycles at specific locations \(\{\mathcal {Y}_i\}\). Moreover, in addition to \(\{\mathcal {Y}_i\}\), prescribing a particular coloring on those locations will be useful. In the following definition, we introduce the formal notations to carry out such an idea.
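Identity (49) rests on the elementary fact that the falling factorial \((X)_a\) counts ordered a-tuples of distinct objects drawn from X candidates. A quick check:

```python
from itertools import permutations
from math import prod

X, a = 5, 3
ordered = sum(1 for _ in permutations(range(X), a))  # ordered distinct triples
falling = prod(X - i for i in range(a))              # X(X-1)(X-2)
assert ordered == falling == 60
```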

Definition 4.2

(Empirical profile on \(\mathcal {Y}\)). Let \(L,l_0>0\) be given integers and let \(\underline{a}=(a_\zeta )_{||\zeta ||\le l_0}\). Moreover, let

$$\begin{aligned} \mathcal {Y}\equiv \{\mathcal {Y}_i(\zeta ) \}_{i\in [a_\zeta ], ||\zeta ||\le l_0} \end{aligned}$$

denote the distinct \(a_\zeta \) \(\zeta \)-cycles for each \(||\zeta ||\le l_0\) inside \(\mathscr {G}\) (Definition 3.2), and let \(\underline{\sigma }\) be a valid coloring on \(\mathscr {G}\). We define \({\Delta } \equiv {\Delta }[\underline{\sigma };\mathcal {Y}]\), the empirical profile on \(\mathcal {Y}\), as follows.

  • Let \(V(\mathcal {Y})\) (resp. \(F(\mathcal {Y})\)) be the set of variables (resp. clauses) in \(\cup _{||\zeta ||\le l_0} \cup _{i=1}^{a_\zeta } \mathcal {Y}_i(\zeta )\), and let \(E_c(\mathcal {Y})\) denote the collection of variable-adjacent half-edges included in \(\cup _{||\zeta ||\le l_0} \cup _{i=1}^{a_\zeta } \mathcal {Y}_i(\zeta )\). We write \(\underline{\sigma }_\mathcal {Y}\) to denote the restriction of \(\underline{\sigma }\) onto \(V(\mathcal {Y})\) and \(F(\mathcal {Y})\).

  • \(\Delta \equiv \Delta [\underline{\sigma };\mathcal {Y}] \equiv (\dot{\Delta }, (\hat{\Delta }^{\underline{\texttt {L}}})_{\underline{\texttt {L}}\in \{0,1\}^k}, \bar{\Delta }_c)\) is the counting measure of coloring configurations around \(V(\mathcal {Y}), F(\mathcal {Y})\) and \(E_c(\mathcal {Y})\) given as follows.

    $$\begin{aligned} \begin{aligned}&\dot{\Delta } (\underline{\tau }) = |\{v\in V(\mathcal {Y}): \underline{\sigma }_{\delta v} = \underline{\tau } \} |, \quad \text {for all } \underline{\tau } \in \dot{\Omega }_L^d;\\&\hat{\Delta }^{\underline{\texttt {L}}} (\underline{\tau }) = |\{a\in F(\mathcal {Y}): \underline{\sigma }_{\delta a} = \underline{\tau }, \;\underline{\texttt {L}}_{\delta a} = \underline{\texttt {L}}\} |, \quad \text {for all } \underline{\tau } \in \dot{\Omega }_L^k, \; \underline{\texttt {L}}\in \{0,1\}^k;\\&\bar{\Delta }_c ({\tau }) = |\{e\in E_c(\mathcal {Y}): \sigma _{e} = \tau \} |, \quad \text {for all } \tau \in \dot{\Omega }_L. \end{aligned} \end{aligned}$$
    (50)
  • We write \(|\dot{\Delta }| \equiv \langle \dot{\Delta },1 \rangle \), and define \(|\hat{\Delta }^{\underline{\texttt {L}}} |\), \(|\bar{\Delta }_c|\) analogously.

Note that \(\Delta \) is well-defined if \(\mathcal {Y}\) and \(\underline{\sigma }_\mathcal {Y}\) are given.

In the proof of Proposition 4.1, we will fix \(\mathcal {Y}\), the locations of \(\underline{a}\) \(\zeta \)-cycles, and a coloring configuration \(\underline{\tau }_\mathcal {Y}\) on \(\mathcal {Y}\), and compute the contributions from the pairs \(\mathscr {G}\) and \(\underline{\sigma }\) that have cycles at \(\mathcal {Y}\) and satisfy \(\underline{\sigma }_\mathcal {Y}= \underline{\tau }_\mathcal {Y}\). Formally, abbreviate \({{\textbf {Z}}}^\prime \equiv \widetilde{{{\textbf {Z}}}}^{(L),\text{ tr }}_{\lambda }\) for simplicity and define

$$\begin{aligned} {{\textbf {Z}}}^\prime [\underline{\tau }_\mathcal {Y}] = \sum _{\underline{\sigma }} w^\text{ lit }(\underline{\sigma })^\lambda \mathbb {1} \{\underline{\sigma }_{\mathcal {Y}} = \underline{\tau }_{\mathcal {Y}} \}. \end{aligned}$$

Then, we can write

$$\begin{aligned} \mathbb {E}\left[ {{\textbf {Z}}}^\prime (\underline{X})_{\underline{a}} \right]&= \sum _{\mathcal {Y}} \sum _{\underline{\tau }_\mathcal {Y}} \mathbb {E}\left[ {{\textbf {Z}}}^\prime [\underline{\tau }_\mathcal {Y}] \mathbb {1}\{\mathcal {Y}_i(\zeta ) \in \mathscr {G}, \;\forall i\in [a_\zeta ] , \, \forall ||\zeta ||\le l_0 \} \right] \nonumber \\ {}&\equiv \sum _{\mathcal {Y}} \sum _{\underline{\tau }_\mathcal {Y}} \mathbb {E}\left[ {{\textbf {Z}}}^\prime \mathbb {1}\{\mathcal {Y}, \underline{\tau }_\mathcal {Y}\} \right] , \end{aligned}$$
(51)

where the notation in the last equality is introduced for convenience. The key idea of the proof is to study the rhs of the above equation. We follow an idea similar to the one developed in Section 6 of [25], which is to decompose \({{\textbf {Z}}}^\prime \) in terms of empirical profiles of \(\underline{\sigma }\) on \(\mathcal {G}\). The main contribution of our proof is a method that overcomes the complications caused by the indicator term (i.e., the planted cycles).

Proof of Proposition 4.1-(1)

As discussed above, our goal is to understand \(\mathbb {E}[{{\textbf {Z}}}^\prime \mathbb {1}\{\mathcal {Y},\underline{\tau }_\mathcal {Y}\} ]\) for given \(\mathcal {Y}\) and \(\underline{\tau }_\mathcal {Y}\). To this end, we decompose the partition function in terms of coloring profiles. It will be convenient to work with

$$\begin{aligned} g\equiv g(H)\equiv (\dot{g},(\hat{g}^{\underline{\texttt {L}}})_{\underline{\texttt {L}}\in \{0,1\}^k},\bar{g}) \equiv \bigg (n\dot{H}, \Big (\frac{m}{2^k}\hat{H}^{\underline{\texttt {L}}}\Big )_{\underline{\texttt {L}}\in \{0,1 \}^k}, nd\bar{H} \bigg ), \end{aligned}$$
(52)

the non-normalized empirical counts of H. Moreover, if g is given, then the product of the variable, clause, and edge factors is also determined. Let us denote this by w(g), defined by

$$\begin{aligned} w(g) \equiv w(\dot{g},(\hat{g}^{\underline{\texttt {L}}})_{\underline{\texttt {L}}}) \equiv {\prod _{\underline{\tau }\in \dot{\Omega }_L^d } \dot{\Phi }(\underline{\tau })^{\dot{g}(\underline{\tau }) } \prod _{\underline{\texttt {L}}\in \{0,1\}^k}\prod _{\underline{\tau }\in \dot{\Omega }_L^k} \hat{\Phi }^{\text {lit}} (\underline{\tau }\oplus \underline{\texttt {L}})^{\hat{g}^{\underline{\texttt {L}}}(\underline{\tau })} \prod _{\tau \in \dot{\Omega }_L} \bar{\Phi }(\tau )^{\dot{M}\dot{g}(\tau ) }}. \end{aligned}$$
(53)

Recalling the definition of \({{\textbf {Z}}}^\prime \equiv \widetilde{{{\textbf {Z}}}}^{(L),\text{ tr }}_{\lambda }\) in (40), we consider g such that \(||g-g^\star _{\lambda ,L}||_1\le \sqrt{n}\log ^2 n \), where we defined

$$\begin{aligned} g^\star _{\lambda ,L}:= g(H^\star _{\lambda ,L})\quad \text {and}\quad g^\star _{\lambda }:=g(H^\star _{\lambda }). \end{aligned}$$
(54)

Now, fix a literal assignment \(\underline{\texttt {L}}_E\) on \(\mathscr {G}\) which agrees with the literals on the cycles given by \(\mathcal {Y}\). Further, let \(\Delta =(\dot{\Delta },\hat{\Delta }, \bar{\Delta }_c)\) denote the empirical profile on \(\mathcal {Y}\) induced by \(\underline{\tau }_\mathcal {Y}\). Then, we have

$$\begin{aligned}&\mathbb {E}\left[ \left. {{\textbf {Z}}}^\prime [g] \mathbb {1}\{\mathcal {Y}, \underline{\tau }_\mathcal {Y}\} \,\right| \, \underline{\texttt {L}}_E \right] \nonumber \\ {}&\quad = \frac{(\bar{g}-\bar{\Delta }_c)!}{(nd)!} {n- |\dot{\Delta }| \atopwithdelims ()\dot{g}-\dot{\Delta }} \prod _{\underline{\texttt {L}}\in \{0,1\}^k}{| \hat{g}^{\underline{\texttt {L}}}-\hat{\Delta }^{\underline{\texttt {L}}} | \atopwithdelims ()\hat{g}^{\underline{\texttt {L}}}-\hat{\Delta }^{\underline{\texttt {L}}}} \times w(g)^\lambda \nonumber \\ {}&\quad = \frac{1}{(n)_{|\dot{\Delta }|} (m)_{|\hat{\Delta }|}} {n\atopwithdelims ()\dot{g}} \left\{ \prod _{\underline{\texttt {L}}}{| \hat{g}^{\underline{\texttt {L}}} | \atopwithdelims ()\hat{g}^{\underline{\texttt {L}}}} \right\} {nd \atopwithdelims ()\bar{g}}^{-1} \frac{(\dot{g})_{\dot{\Delta }} \prod _{\underline{\texttt {L}}}(\hat{g}^{\underline{\texttt {L}}})_{\hat{\Delta }^{\underline{\texttt {L}}}} }{(\bar{g})_{\bar{\Delta }_c}}\times w(g)^\lambda \nonumber \\ {}&\quad = \frac{1+ O_k\left( ||\underline{a}||_1 n^{-1/2}\log ^2n\right) }{(nd)^{|\bar{\Delta }_c|}}\mathbb {E}[{{\textbf {Z}}}'[g]\,|\,\underline{\texttt {L}}_E] \frac{ (\dot{H}^\star )^{\dot{\Delta }} \prod _{\underline{\texttt {L}}} (\hat{H}^{\underline{\texttt {L}}})^{\hat{\Delta }^{\underline{\texttt {L}}}}}{(\bar{H}^\star )^{\bar{\Delta }_c}} , \end{aligned}$$
(55)

where we wrote \(H^\star = H^\star _{\lambda ,L}\) and the last equality follows from \(||g-g^\star _{\lambda ,L}||\le \sqrt{n}\log ^2 n\).

In what remains, we sum the above over \(\mathcal {Y}\) and \(\underline{\tau }_\mathcal {Y}\), depending on the structure of \(\mathcal {Y}\). To this end, we define \(\eta \equiv \eta (\mathcal {Y})\) by

$$\begin{aligned} \eta \equiv \eta (\mathcal {Y}) := |\bar{\Delta }_c|-|\dot{\Delta }| - |\hat{\Delta }| , \end{aligned}$$
(56)

where \(|\hat{\Delta }| = \sum _{\underline{\texttt {L}}} |\hat{\Delta }^{\underline{\texttt {L}}}|\), and we note that \(|\dot{\Delta }|, |\hat{\Delta }|\) and \(|\bar{\Delta }_c|\) are well-defined once \(\mathcal {Y}\) is given. Note that \(\eta \) measures the amount of overlap among the cycles of \(\mathcal {Y}\), in the sense that

$$\begin{aligned} \#\{\text {disjoint components of }\mathcal {Y}\} = ||\underline{a}||_1 - \eta . \end{aligned}$$
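For a quick illustration of this relation, suppose \(\mathcal {Y}\) consists of two cycles of lengths \(||\zeta _1||\) and \(||\zeta _2||\) that share exactly one variable and no half-edges. Recalling from Definition 3.2 that a \(\zeta \)-cycle with \(||\zeta ||=l\) contains l variables, l clauses, and 2l variable-adjacent half-edges, we find

$$\begin{aligned} |\dot{\Delta }| = ||\zeta _1||+||\zeta _2||-1, \quad |\hat{\Delta }| = ||\zeta _1||+||\zeta _2||, \quad |\bar{\Delta }_c| = 2\left( ||\zeta _1||+||\zeta _2||\right) , \end{aligned}$$

so that \(\eta (\mathcal {Y})=1\), and the two cycles indeed form \(||\underline{a}||_1-\eta = 1\) connected component.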

First, suppose that all the cycles given by \(\mathcal {Y}\) are disjoint, that is, \(\eta (\mathcal {Y})=0\). In other words, all the variable sets \(V(\mathcal {Y}_i (\zeta ))\), \(i\in [a_\zeta ]\), \(||\zeta ||\le l_0\), are pairwise disjoint, and the same holds for the clause sets \(F(\mathcal {Y}_i (\zeta ))\). In this case, the effects of the cycles decouple when summing (55) over \(\underline{\tau }_\mathcal {Y}\), which gives us

$$\begin{aligned} \begin{aligned} \frac{ \sum _{\underline{\tau }_\mathcal {Y}} \mathbb {E}[{{\textbf {Z}}}^\prime [g] \mathbb {1}\{\mathcal {Y}, \underline{\tau }_\mathcal {Y}\} \,|\,\underline{\texttt {L}}_E ]}{ \mathbb {E}[{{\textbf {Z}}}^\prime [g]\,|\,\underline{\texttt {L}}_E] } =\frac{1+ O_k\left( ||\underline{a}||_1 n^{-1/2}\log ^2 n\right) }{(nd)^{|\bar{\Delta }_c|}} \prod _{||\zeta ||\le l_0} \left( Tr\left[ (\dot{A}_L \hat{A}_L)^\zeta \right] \right) ^{a_\zeta }, \end{aligned} \end{aligned}$$
(57)

where \((\dot{A}_L \hat{A}_L)^\zeta \) is defined as in (44). Also, note that although \({\Delta }\) depends on \(\underline{\tau }_\mathcal {Y}\), the quantity \(|\bar{\Delta }_c|\) in the denominator is well-defined given \(\mathcal {Y}\) alone. Thus, averaging the above over all \(\underline{\texttt {L}}_E\) gives

$$\begin{aligned} \begin{aligned} \frac{ \mathbb {E}[{{\textbf {Z}}}^\prime [g] \mathbb {1}\{\mathcal {Y}\} ]}{ \mathbb {E}[{{\textbf {Z}}}^\prime [g]] }&=\frac{1+ O_k\left( ||\underline{a}||_1 n^{-1/2}\log ^2 n\right) }{(2nd)^{|\bar{\Delta }_c|}} \prod _{||\zeta ||\le l_0} \left( Tr\left[ (\dot{A}_L \hat{A}_L)^\zeta \right] \right) ^{a_\zeta }\\ {}&= \left( 1+ O_k\left( ||\underline{a}||_1 n^{-1/2}\log ^2 n\right) \right) \mathbb {P}(\mathcal {Y}) \prod _{||\zeta ||\le l_0} \left( Tr\left[ (\dot{A}_L \hat{A}_L)^\zeta \right] \right) ^{a_\zeta }. \end{aligned} \end{aligned}$$
(58)

Moreover, setting \(a^\dagger = \sum _{||\zeta ||\le l_0} a_\zeta ||\zeta ||\), the number of ways of choosing \(\mathcal {Y}\) to consist of \(\underline{a}\) disjoint \(\zeta \)-cycles is

$$\begin{aligned} (n)_{a^\dagger } (m)_{a^\dagger } (d(d-1)k(k-1))^{a^\dagger } \prod _{||\zeta ||\le l_0} \left( \frac{1}{2||\zeta ||} \right) ^{a_\zeta }. \end{aligned}$$
(59)
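Each factor in (59) can be read off as follows (roughly speaking: the ordered count, divided by the symmetries of each cycle):

$$\begin{aligned} \underbrace{(n)_{a^\dagger } (m)_{a^\dagger }}_{\text {ordered choices of variables and clauses}} \times \underbrace{(d(d-1)k(k-1))^{a^\dagger }}_{\text {two half-edges at each variable and clause}} \times \underbrace{\prod _{||\zeta ||\le l_0} \left( 2||\zeta ||\right) ^{-a_\zeta }}_{\text {starting point and orientation}}, \end{aligned}$$

since each \(\zeta \)-cycle is produced once for every choice of starting variable (\(||\zeta ||\) choices) and traversal direction (2 choices).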

Having this in mind, summing (58) over all \(\mathcal {Y}\) that describe \(\underline{a}\) disjoint \(\zeta \)-cycles, and then over all g with \(||g-g^\star _{\lambda ,L}||\le \sqrt{n}\log ^2 n\), we obtain that

$$\begin{aligned} \frac{ \sum _{||g-g^\star _{\lambda ,L} ||\le \sqrt{n}\log ^2 n} \sum _{\mathcal {Y} \text{ disjoint }} \mathbb {E}[{{\textbf {Z}}}^\prime [g] \mathbb {1}\{\mathcal {Y}\} ] }{\mathbb {E}[{{\textbf {Z}}}^\prime ]} = \left( 1+O_k(||\underline{a}||_1 n^{-1/2}\log ^2 n) \right) \left( \underline{\mu } (1+\underline{\delta }_L ) \right) ^{\underline{a}}, \end{aligned}$$
(60)

where \(\underline{\mu }, \underline{\delta }_L\) are defined as in the statement of the proposition.

Our next goal is to deal with \(\mathcal {Y}\) such that \(\eta (\mathcal {Y})=\eta >0\) and to show that such \(\mathcal {Y}\) provide a negligible contribution. If \(\eta >0\), then at least \(||\underline{a}||_1 - 2\eta \) cycles of \(\mathcal {Y}\) must be disjoint from everything else in \(\mathcal {Y}\). Therefore, when summing the terms with \(H^\star \) in (55) over \(\underline{\tau }_\mathcal {Y}\), all but at most \(2\eta \) cycles contribute a factor of \((1+{\delta }_L(\zeta ))\), while those with intersections may contribute a different value. Thus, we obtain that

$$\begin{aligned} \begin{aligned} \frac{ \sum _{\underline{\tau }_\mathcal {Y}} \mathbb {E}[{{\textbf {Z}}}^\prime [g] \mathbb {1}\{\mathcal {Y}, \underline{\tau }_\mathcal {Y}\} \,|\,\underline{\texttt {L}}_E ]}{ \mathbb {E}[{{\textbf {Z}}}^\prime [g]\,|\,\underline{\texttt {L}}_E] } \le \frac{ (1+\underline{\delta }_L)^{\underline{a}}\, C^{2\eta }}{(nd)^{|\bar{\Delta }_c|}} , \end{aligned} \end{aligned}$$
(61)

for some constant \(C>0\) depending on \(k, L, l_0\).

Then, similarly to (59), we can bound the number of choices of \(\mathcal {Y}\) satisfying \(\eta (\mathcal {Y})=\eta \). Since all but \(2\eta \) of the cycles are disjoint from the others, we have

$$\begin{aligned}&\#\{ \mathcal {Y}\text { such that }\eta (\mathcal {Y}) = \eta \} \nonumber \\&\quad \le \left\{ (n)_{|\dot{\Delta }|} (m)_{|\hat{\Delta }| } (d(d-1))^{|\dot{\Delta }|}(k(k-1))^{|\hat{\Delta }|} (d-2)^{|\bar{\Delta }_c|-2|\dot{\Delta }|} (k-2)^{|\bar{\Delta }_c| - 2|\hat{\Delta }|} \right\} \nonumber \\&\qquad \times \left\{ \prod _{||\zeta ||\le l_0} \left( \frac{1}{2||\zeta ||} \right) ^{a_\zeta } \times (2l_0)^{2\eta } \right\} \times \left\{ (a^\dagger )^{\eta } d^{2a^\dagger - |\bar{\Delta }_c| }\right\} . \end{aligned}$$
(62)

The factors on the rhs can be described as follows.

  1.

    The first bracket describes the number of ways to choose variables and clauses, along with the locations of half-edges described by \(\mathcal {Y}\). Note that at this point we have not yet chosen the places of variables, clauses and half-edges that are given by the intersections of cycles in \(\mathcal {Y}\).

  2.

    The second bracket is introduced to prevent overcounting the locations of cycles that are disjoint from all others. Multiplication of \((2l_0)^{2\eta }\) comes from the observation that there can be at most \(2\eta \) intersecting cycles.

  3.

    The third bracket bounds the number of ways of choosing where to put overlapping variables and clauses, which can be understood as follows.

    • Choose where to put an overlapping variable (or clause): number of choices bounded by \(a^\dagger \).

    • If there is an overlapping half-edge adjacent to the chosen variable (or clause), we decide where to put the clause at its endpoint: number of choices bounded by d.

    • Since there are \(2a^\dagger - |\bar{\Delta }_c|\) overlapping half-edges and \(2a^\dagger - |\dot{\Delta }|-|\hat{\Delta }|\) overlapping variables and clauses, we obtain the expression (62).

To conclude the analysis, we need to sum (61) over \(\mathcal {Y}\) with \(\eta (\mathcal {Y}) = \eta \), using (62) (and average over \(\underline{\texttt {L}}_E\)). One thing to note here is the following relation among \(|\dot{\Delta }|, |\hat{\Delta }|,\) and \(|\bar{\Delta }_c|\):

$$\begin{aligned} \min \{a^\dagger - |\dot{\Delta }|, \,a^\dagger - |\hat{\Delta }| \} \ge 2a^\dagger - |\bar{\Delta }_c|, \end{aligned}$$

which comes from the fact that for each overlapping edge, its endpoints count as overlapping variables and clauses. Therefore, we can simplify (62) as

$$\begin{aligned} \begin{aligned} \#\{ \mathcal {Y}\text { such that }\eta (\mathcal {Y}) = \eta \} \le (nd)^{|\dot{\Delta }| + |\hat{\Delta }|} 2^{2a^\dagger } \underline{\mu }^{\underline{a}} \times (4l_0^2 a^\dagger d^3 k^2)^\eta . \end{aligned} \end{aligned}$$
(63)

Thus, we obtain that

$$\begin{aligned} \begin{aligned} \sum _{\mathcal {Y}: \eta (\mathcal {Y})=\eta } \sum _{\underline{\tau }_\mathcal {Y}} \frac{ \mathbb {E}[{{\textbf {Z}}}^\prime [g] \mathbb {1}\{\mathcal {Y}, \underline{\tau }_\mathcal {Y}\}\,|\, \underline{\texttt {L}}_E ]}{ \mathbb {E}[{{\textbf {Z}}}^\prime [g]\,|\, \underline{\texttt {L}}_E] } \le 2^{2a^\dagger } \left( \underline{\mu }(1+\underline{\delta }_L)\right) ^{\underline{a}}\, \left( \frac{C' a^\dagger }{n} \right) ^{\eta }, \end{aligned} \end{aligned}$$
(64)

for another constant \(C'\) depending on \(k, L, l_0\). We choose \(c_{\textsf {cyc}}=c_{\textsf {cyc}}(l_0)\) so that \(2^{2a^\dagger } \le n^{1/3}\) whenever \(||\underline{a}||_\infty \le c_{\textsf {cyc}} \log n\). Then, summing this over \(\eta \ge 1\) and over all g with \(||g-g^\star _{\lambda ,L}|| \le \sqrt{n}\log ^2 n\) shows that the contribution from \(\mathcal {Y}\) with \(\eta (\mathcal {Y})\ge 1\) is negligible for our purposes. Combining with (60), we deduce the conclusion. \(\square \)

Remark 4.3

Although we will not use it in the rest of the paper, the analogue of Proposition 4.1-(1) for \({{\textbf {Z}}}^{(L),\text{ tr }}_{\lambda }\) holds under the same condition. That is, we have

$$\begin{aligned} \mathbb {E}\left[ {{\textbf {Z}}}_{\lambda }^{(L),\text{ tr }} \cdot (\underline{X})_{\underline{a}} \right] = \left( 1+ err(n,\underline{a}) \right) \left( \underline{\mu } ( 1+ \underline{\delta }_L)\right) ^{\underline{a}} \mathbb {E}{{\textbf {Z}}}_{\lambda }^{(L),\text{ tr }}. \end{aligned}$$
(65)

To prove the equation above, we first note from Proposition 3.4 of [43] that

$$\begin{aligned} \begin{aligned}&\sum _{g: ||g-g_{\lambda ,L}^\star ||_1 \ge \sqrt{n} \log ^2 n} \mathbb {E}\big [ {{\textbf {Z}}}_{\lambda }^{(L),\text{ tr }}[g] (\underline{X})_{\underline{a}} \big ] \\ {}&\quad \le \sum _{||g-g_{\lambda ,L}^\star ||_1 \ge \sqrt{n}\log ^2 n} \mathbb {E}\big [{{\textbf {Z}}}_{\lambda }^{(L),\text{ tr }}[g]\big ] n^{O_k(\log ^2 n)} + \mathbb {E}\big [{{\textbf {Z}}}_{\lambda }^{(L),\text{ tr }} (\underline{X})_{\underline{a}} \mathbb {1}\{ ||\underline{X}||_{\infty } \ge \log ^2 n \} \big ] \\ {}&\quad \le e^{-\Omega _k(\log ^4 n)} \mathbb {E}{{\textbf {Z}}}_{\lambda }^{(L),\text{ tr }}. \end{aligned} \end{aligned}$$
(66)

In the second line, we controlled the second term crudely using \({{\textbf {Z}}}_{\lambda }^{(L),\text{ tr }}\le 2^n\) and (37). With (66) in hand, the rest of the proof of (65) is the same as that of Proposition 4.1-(1).

Remark 4.4

Having proved Proposition 4.1-(1), the proof of Proposition 4.1-(2) is almost identical. Namely, if we consider the empirical coloring profile in the second moment and consider the analogue of w(g) (53) in the second moment (i.e. replace \(\dot{\Phi },\hat{\Phi }^{\text {lit}}\), and \(\bar{\Phi }\) in (53) respectively by \(\dot{\Phi }\otimes \dot{\Phi }, \hat{\Phi }^{\text {lit}}(\cdot \oplus \underline{\texttt {L}}_1)\otimes \hat{\Phi }^{\text {lit}}(\cdot \oplus \underline{\texttt {L}}_2)\), and \(\bar{\Phi }\otimes \bar{\Phi }\)), then the rest of the argument is the same.

As a corollary, we observe that the contributions to \(\mathbb {E}{{\textbf {Z}}}^{(L),\text{ tr }}_{\lambda }\) and \(\mathbb {E}\big (\widetilde{{{\textbf {Z}}}}^{(L),\text{ tr }}_{\lambda }\big )^2\) from excessively large \(X(\zeta )\) are negligible.

Corollary 4.5

Let \(c>0\), \(L>0\), \(\lambda \in (0,\lambda ^\star _L)\) and \(\zeta \in \cup _l \{0,1\}^{2\,l}\) be fixed. Then, the following estimates hold true:

  1. (1)

\(\mathbb {E}\Big [ \widetilde{{{\textbf {Z}}}}_{\lambda }^{(L),\text{ tr }} \mathbb {1}\{X(\zeta )\ge c\log n \}\Big ] = n^{-\Omega (\log \log n)} \mathbb {E}\widetilde{{{\textbf {Z}}}}_{\lambda }^{(L),\text{ tr }}\);

  2. (2)

    \(\mathbb {E}\Big [ \big (\widetilde{{{\textbf {Z}}}}_{\lambda }^{(L),\text{ tr }}\big )^2 \mathbb {1}\{X(\zeta )\ge c\log n \}\Big ] = n^{-\Omega (\log \log n)} \mathbb {E}\Big [\big (\widetilde{{{\textbf {Z}}}}_{\lambda }^{(L),\text{ tr }}\big )^2\Big ]\);

  3. (3)

    The analogue of (1) is true for the untruncated model with \(\lambda \in (0,\lambda ^\star )\). Namely, (1) continues to hold when we replace \({{\textbf {Z}}}_{\lambda }^{(L),\text{ tr }}\) by \({{\textbf {Z}}}_{\lambda }^{\text{ tr }}\).

Proof

We present the proof of (1) of the corollary; the others follow by the same idea, thanks to Proposition 4.1. Let \(c_{\textsf {cyc}}=c_{\textsf {cyc}}(||\zeta ||)\) be as in Proposition 4.1, and set \(c'=\frac{1}{2}(c\wedge c_{\textsf {cyc}})\). Then, by Markov's inequality, we have

$$\begin{aligned} \begin{aligned} \mathbb {E}\left[ \widetilde{{{\textbf {Z}}}}_{\lambda }^{(L),\text{ tr }} \mathbb {1}\{X(\zeta )\ge c\log n \}\right] \le \left( \frac{c}{2}\log n \right) ^{-c'\log n} \mathbb {E}\left[ \widetilde{{{\textbf {Z}}}}_{\lambda }^{(L),\text{ tr }} \cdot \left( X(\zeta )\right) _{{c'\log n}} \right] . \end{aligned} \end{aligned}$$

Then, plugging the estimate from Proposition 4.1-(1) in the rhs implies the conclusion. \(\square \)
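To spell out the last step (a sketch of the bookkeeping, with \(C_0\) a constant we introduce for illustration): on the event \(\{X(\zeta )\ge c\log n \}\), each of the \(c'\log n\) factors of the falling factorial is at least \(c\log n - c'\log n \ge \frac{c}{2}\log n\) since \(c'\le c/2\), so

$$\begin{aligned} \left( X(\zeta )\right) _{c'\log n} \ge \left( \frac{c}{2}\log n \right) ^{c'\log n}, \end{aligned}$$

which justifies the Markov-type inequality above. Proposition 4.1-(1), applied with \(a_\zeta = c'\log n\), bounds \(\mathbb {E}\big [ \widetilde{{{\textbf {Z}}}}_{\lambda }^{(L),\text{ tr }} (X(\zeta ))_{c'\log n}\big ]\) by \(C_0^{c'\log n}\, \mathbb {E}\widetilde{{{\textbf {Z}}}}_{\lambda }^{(L),\text{ tr }}\) for some \(C_0=C_0(k,L,||\zeta ||)\), and \(\big ( \frac{c}{2}\log n \big )^{-c'\log n} C_0^{c'\log n} = n^{-\Omega (\log \log n)}\).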

To conclude this section, we present an estimate that bounds the sizes of \(\delta (\zeta )\) and \(\delta _L(\zeta )\). One purpose of doing so is to verify Assumption (c) of Theorem 3.3.

Lemma 4.6

In the setting of Proposition 4.1, let \(\lambda \in (0,\lambda ^\star ]\) and let \(\delta _L\) be defined as in (45). Then, there exists an absolute constant \(C>0\) such that for all \(\zeta \in \cup _l \{0,1\}^{2l}\) and L large enough,

$$\begin{aligned} \delta _L(\zeta ;\lambda ) \le (k^C 2^{-k})^{||\zeta ||}. \end{aligned}$$
(67)

Hence, \(\delta (\zeta ;\lambda ) \le (k^C 2^{-k})^{||\zeta ||}\) holds by Proposition 4.1-(4), and we have for large enough k,

$$\begin{aligned} \sum _{\zeta }\mu (\zeta )\delta _L(\zeta ;\lambda )^2 \le \sum _{l=1}^{\infty } \frac{1}{2l} (k-1)^{l}(d-1)^{l}(k^C 2^{-k})^{2l}<\infty , \end{aligned}$$

where the last inequality holds because \(d\le k 2^k\) holds by Remark 1.2. Replacing \(\delta _L(\zeta ;\lambda )\) by \(\delta (\zeta ;\lambda )\) in the equation above, the analogue also holds for the untruncated model.
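To verify the finiteness of the last sum, note that \(d\le k2^k\) gives

$$\begin{aligned} (k-1)(d-1)\left( k^C 2^{-k}\right) ^{2} \le k \cdot k2^k \cdot k^{2C} 4^{-k} = k^{2C+2} 2^{-k}, \end{aligned}$$

which is strictly less than 1 for k large enough, so the summand decays geometrically in l.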

5 The Rescaled Partition Function and Its Concentration

In random regular k-nae-sat, it is believed that the primary source of the non-concentration of \({{\textbf {Z}}}_{\lambda }^{\text{ tr }}\) is the existence of short cycles in the graph. Based on the computations done in the previous section, we show that the partition function is indeed concentrated once we rescale it by the cycle effects. However, we work with the truncated model, since some of our key estimates break down in the untruncated model. The goal of this section is to establish Proposition 3.4.

To this end, we write the variance of the rescaled partition function as a sum of squares of Doob martingale increments with respect to the clause-revealing filtration, and study each increment using a version of the discrete Fourier transform. Although a similar idea was used in [25] to study \({{\textbf {Z}}}_0\), the rescaling factors of the partition function make the analysis more involved and call for more delicate estimates (for instance, Proposition 4.1) than those in [25]. Moreover, it is important to note that due to the rescaling, the result we obtain in Proposition 3.4 is stronger than Proposition 6.1 in [25]. This improvement exhibits the underlying principle more clearly: the multiplicative fluctuation of the partition function originates from the existence of cycles.

Although the setting in this section is similar to that of Section 6 of [25], we begin by briefly explaining it for completeness. We then focus on the source of the aforementioned improvement, and outline the remaining technical details, which are essentially analogous to those in [25].

Throughout this section, we fix \(L\ge 1\), \(\lambda \in (0,\lambda ^\star _L)\) and \(l_0>0\), which all can be arbitrary. Recall the rescaled partition function \({{\textbf {Y}}}\equiv {{\textbf {Y}}}_{\lambda ,l_0}^{(L)}(\mathscr {G})\) defined in (40):

$$\begin{aligned} {{\textbf {Y}}}\equiv {{\textbf {Y}}}_{\lambda ,l_0}^{(L)}(\mathscr {G}) \equiv \widetilde{{{\textbf {Z}}}}_{\lambda }^{(L),\text{ tr }} \prod _{\zeta : \,||\zeta ||\le l_0} (1+\delta _{L}(\zeta ) )^{-X({\zeta })}, \end{aligned}$$

where \(\widetilde{{{\textbf {Z}}}}^{(L),\text{ tr }}_{\lambda }\equiv \sum _{||H-H^\star _{\lambda ,L}||_1\le n^{-1/2}\log ^{2}n}{{\textbf {Z}}}^{(L),\text{ tr }}_{\lambda }[H]\). We sometimes write \({{\textbf {Y}}}(\mathscr {G})\) to emphasize the dependence on \(\mathscr {G}= (\mathcal {G}, \underline{\texttt {L}})\), the underlying random (d, k)-regular graph.

Let \(\mathcal {F}_i\) be the \(\sigma \)-algebra generated by the first i clauses \(a_1,\ldots ,a_i\) and the matching of the half-edges adjacent to them. Then, we can write

$$\begin{aligned} {{\,\text {Var}\,}}({{\textbf {Y}}}) = \sum _{i=1}^m \mathbb {E}\left( \mathbb {E}\left[ \left. {{\textbf {Y}}}\right| \mathcal {F}_i \right] - \mathbb {E}\left[ \left. {{\textbf {Y}}}\right| \mathcal {F}_{i-1} \right] \right) ^2\equiv \sum _{i=1}^m {{\,\text {Var}\,}}_i({{\textbf {Y}}}). \end{aligned}$$
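This identity is the standard orthogonal decomposition into Doob martingale increments: writing \(D_i \equiv \mathbb {E}[{{\textbf {Y}}}\,|\,\mathcal {F}_i] - \mathbb {E}[{{\textbf {Y}}}\,|\,\mathcal {F}_{i-1}]\), we have \({{\textbf {Y}}}-\mathbb {E}{{\textbf {Y}}}= \sum _{i=1}^m D_i\) since \({{\textbf {Y}}}\) is \(\mathcal {F}_m\)-measurable, and for \(i<j\) the tower property gives

$$\begin{aligned} \mathbb {E}\left[ D_i D_j \right] = \mathbb {E}\big [ D_i\, \mathbb {E}[D_j \,|\, \mathcal {F}_{j-1}] \big ] = 0, \end{aligned}$$

so the cross terms vanish and \({{\,\text {Var}\,}}({{\textbf {Y}}}) = \sum _{i} \mathbb {E}D_i^2\).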

For each i, let A denote the set of clauses with indices between \(i \vee (m-k+1)\) and m, and let \(\mathscr {K}\) be the collection of variable-adjacent half-edges matched to A. Further, let \(\acute{\mathscr {G}} = (\acute{\mathcal {G}}, \acute{\texttt {L}})\) be the random (d, k)-regular graph coupled to \(\mathscr {G}\), which shares the clauses \(a_1,\ldots ,a_{\max {\{i-1,m-k\}}}\) and their adjacent literals with \(\mathscr {G}\), while the clauses matched to \(\mathscr {K}\) and their adjacent literals are resampled:

$$\begin{aligned} \begin{aligned}&A \equiv (a_{\max \{i, m-k+1\}},\ldots , a_m );\\&\acute{A}\equiv (\acute{a}_{\max {\{i,m-k+1\}}},\ldots , \acute{a}_m). \end{aligned} \end{aligned}$$

Let \(G^\circ \equiv \mathcal {G}\setminus A\) be the graph obtained by removing A and the half-edges adjacent to it from \(\mathcal {G}\). Then, for \(i\le m-k+1\), Jensen’s inequality implies that

$$\begin{aligned} {{\,\text {Var}\,}}_i({{\textbf {Y}}}) \le \mathbb {E}\left( {{\textbf {Y}}}(\mathscr {G}) - {{\textbf {Y}}}(\acute{\mathscr {G}}) \right) ^2 \le \sum _{A,\acute{A}} \mathbb {E}\left( {{\textbf {Y}}}(G^\circ \cup A) - {{\textbf {Y}}}(G^\circ \cup \acute{A}) \right) ^2, \end{aligned}$$

where the summation in the rhs runs over all possible matchings \(A, \acute{A}\) of \(\mathscr {K}\) by k clauses (we refer to Section 6.1 of [25] for the details). Note that the sum runs over finitely many choices, whose number depends only on k, which is affordable in our estimates. The same inequality holds for \(i>m-k+1\), the only difference being that the size of \(\mathscr {K}\) is then smaller than \(k^2\). Thus, in the remainder of this section, our goal is to show that for \(|\mathscr {K}|=k^2 \), there exists an absolute constant \(C>0\) such that

$$\begin{aligned} \mathbb {E}\left( {{\textbf {Y}}}( A) - {{\textbf {Y}}}(\acute{A}) \right) ^2 \lesssim _{k,L} \frac{(k^C4^{-k})^{l_0}}{n} (\mathbb {E}{{\textbf {Y}}})^2, \end{aligned}$$
(68)

where we denoted \({{\textbf {Y}}}( A)\equiv {{\textbf {Y}}}(G^\circ \cup A)\). This estimate, which is shown at the end of Sect. 5.3, directly implies the conclusion of Proposition 3.4.
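To see why (68) suffices, note that summing over the m martingale increments (and over the finitely many, k-dependent choices of \(A,\acute{A}\); the boundary cases \(i>m-k+1\) are handled identically) gives, heuristically,

$$\begin{aligned} {{\,\text {Var}\,}}({{\textbf {Y}}}) = \sum _{i=1}^m {{\,\text {Var}\,}}_i({{\textbf {Y}}}) \lesssim _{k,L}\; m \cdot \frac{(k^C4^{-k})^{l_0}}{n} \left( \mathbb {E}{{\textbf {Y}}}\right) ^2 \lesssim _{k,L} (k^C4^{-k})^{l_0} \left( \mathbb {E}{{\textbf {Y}}}\right) ^2, \end{aligned}$$

where the last step uses \(m = nd/k\).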

Before moving on, we present an analogue of Corollary 4.5 for the rescaled partition function, which will be useful in our later analysis of \({{\textbf {Y}}}\). Due to the rescaling factors in \({{\textbf {Y}}}\), the proof is more complicated than that of Corollary 4.5, but it is still based on ideas similar to those of Proposition 4.1, and hence we defer it to Sect. A.2.

Corollary 5.1

Let \(c>0\), \(L>0\), \(\lambda \in (0,\lambda ^\star _L)\) and \(l_0>0 \) be fixed, and let \({{\textbf {Y}}}= {{\textbf {Y}}}_{\lambda ,l_0}^{(L)}\) be as above. Then, for any \(\zeta \) such that \(||\zeta ||\le l_0\), the following estimates hold true:

  1. (1)

    \(\mathbb {E}[{{\textbf {Y}}}\mathbb {1}\{X(\zeta )\ge c\log n \}] = n^{-\Omega _k(\log \log n)} \mathbb {E}\widetilde{{{\textbf {Z}}}}_{\lambda }^{(L),\text{ tr }}\);

  2. (2)

\(\mathbb {E}[ {{\textbf {Y}}}^2 \mathbb {1}\{X(\zeta )\ge c\log n \}] = n^{-\Omega _k(\log \log n)} \mathbb {E}(\widetilde{{{\textbf {Z}}}}_{\lambda }^{(L),\text{ tr }})^2\).

5.1 Fourier decomposition and the effect of rescaling

To see (68), we will apply a discrete Fourier transform to \({{\textbf {Y}}}(A)\) and control its Fourier coefficients. We begin by introducing the following definitions to study the effect of A and \(\acute{A}\). Let \(B_t^\circ (\mathscr {K})\) denote the ball of graph-distance t in \(G^\circ \) around \(\mathscr {K}\); for instance, if t is even then the leaves of \(B_t^\circ (\mathscr {K})\) are the half-edges adjacent to clauses. Then, we set

$$\begin{aligned} T\equiv B_{l_0}^\circ (\mathscr {K}). \end{aligned}$$

Note that T is most likely a union of \(|\mathscr {K}|\) disjoint trees, but it can contain a cycle with probability \(O((dk)^{l_0/2}/n)\). Let \(\mathscr {U}\) denote the collection of leaves of T other than the ones in \(\mathscr {K}\), and we write \(G^\partial \equiv G^\circ \setminus T\).

Remark 5.2

(A parity assumption) For the rest of Sect. 5, we assume that \(l_0\) is even. This assumption guarantees that the half-edges in \(\mathscr {U}\) are adjacent to clauses of T, and hence that their counterparts are adjacent to variables of \(G^\partial \). For technical reasons in dealing with the rescaling factors (Lemma 5.5), we have to treat the case of odd \(l_0\) separately; however, it will be apparent that the argument of Sects. 5.1–5.3 works in the same way. In Remark 5.4, we explain the main difference in formulating the Fourier decomposition for odd \(l_0\).

Based on the above decomposition of \(\mathcal {G}\), we introduce several more notions as follows. For \(\zeta \in \{0,1\}^{2l}\) with \(l\le l_0\), let \(X({\zeta })\) and \(X^T({\zeta })\) (resp. \(\acute{X}({\zeta })\) and \(\acute{X}^T({\zeta })\)) be the number of \(\zeta \)-cycles in the graph \(G^\circ \cup A = \mathcal {G}\) and \(A\cup T\) (resp. \(G^\circ \cup \acute{A} = \acute{\mathcal {G}}\) and \(\acute{A}\cup T\)), respectively, and set

$$\begin{aligned} X^\partial (\zeta ) \equiv X({\zeta }) - X^T({\zeta }). \end{aligned}$$

(Note that this quantity is the same as \(\acute{X}({\zeta })-\acute{X}^T({\zeta })\), since the distance from \(\mathscr {U}\) to \(\mathscr {K}\) is at least \(2l_0\).) Based on this notation, we define the local-neighborhood-rescaled partition function \({{\textbf {Z}}}_T\) and \(\acute{{{\textbf {Z}}}}_T\) by

$$\begin{aligned} \begin{aligned} {{\textbf {Z}}}_T&\equiv {{\textbf {Z}}}'[G^\circ \cup A] \prod _{\zeta : ||\zeta || \le l_0} \left( 1+\delta _L (\zeta ) \right) ^{-X^T (\zeta )};\\ \acute{{{\textbf {Z}}}}_T&\equiv {{\textbf {Z}}}' [G^\circ \cup \acute{A}] \prod _{\zeta : ||\zeta || \le l_0} \left( 1+\delta _L (\zeta ) \right) ^{-\acute{X}^T (\zeta )}, \end{aligned} \end{aligned}$$
(69)

where \({{\textbf {Z}}}' \equiv \widetilde{{{\textbf {Z}}}}_{\lambda }^{(L),\text{ tr }}\) and \({{\textbf {Z}}}'[G^\circ \cup A]\) denotes the partition function on the graph \(G^\circ \cup A=\mathcal {G}\). Here, we omitted the dependence on the literals \(\underline{\texttt {L}}\) on \(\mathcal {G}\), since we are only interested in their moments.

One of the main ideas of Sect. 5 is to relate \({{\textbf {Y}}}\) and \({{\textbf {Z}}}_T\), by establishing the following lemma:

Lemma 5.3

Let \({{\textbf {Y}}}(A), {{\textbf {Y}}}(\acute{A}), {{\textbf {Z}}}_T\), \(\acute{\text {{\textbf {Z}}}}_T\) and \(X^\partial \) be defined as above. Then, we have

$$\begin{aligned} \begin{aligned}&\mathbb {E}\left[ \left( {{\textbf {Y}}}(A)- {{\textbf {Y}}}(\acute{A})\right) ^2 \right] \\ {}&= (1+o(1 ) ) \mathbb {E}\left[ \left( {{\textbf {Z}}}_T - \acute{{{\textbf {Z}}}}_T \right) ^2 \right] \exp \left( -\sum _{||\zeta ||\le l_0} \mu (\zeta ) (2\delta (\zeta ) +\delta (\zeta )^2) \right) + O\left( \frac{\log ^6 n}{n^{3/2}} \right) \mathbb {E}({{\textbf {Z}}}')^2, \end{aligned} \end{aligned}$$

where \({{\textbf {Z}}}'\equiv \widetilde{{{\textbf {Z}}}}_{\lambda }^{(L),\text{ tr }}\) and the error o(1) depends on L, \(l_0\).

The lemma can be understood as a generalization of Proposition 4.1 to the case of \({{\textbf {Z}}}_T\). Although the proof of the lemma is based on ideas similar to those of the proposition, the analysis becomes more delicate since we need to work with the difference \({{\textbf {Y}}}(A)- {{\textbf {Y}}}(\acute{A})\). The proof is given later, in Sect. 5.4.

In the rest of this section, we develop ideas to deduce (68) from Lemma 5.3. To work with \({{\textbf {D}}}:={{\textbf {Z}}}_T - \acute{{{\textbf {Z}}}}_T\), we develop a discrete Fourier transform framework as introduced in Section 6 of [25]. Recall the definition of the weight factor \(w^{\text {lit}}_{\mathscr {G}}(\underline{\sigma }_{\mathcal {G}})\) on a factor graph \(\mathcal {G}\), which is

$$\begin{aligned} w^{\text {lit}}_{\mathcal {G}} (\underline{\sigma }_{\mathcal {G}})\equiv { \prod _{v\in V(\mathcal {G})} \dot{\Phi }(\underline{\sigma }_v) \prod _{a\in F(\mathcal {G})} \hat{\Phi }^{\text {lit}}_a(\underline{\sigma }_a\oplus \underline{\texttt {L}}_a) \prod _{e\in E(\mathcal {G})} \bar{\Phi }(\sigma _e)}. \end{aligned}$$

Let \(\kappa (\underline{\sigma }_\mathscr {U})\) (resp. \({{\textbf {Z}}}^\partial (\underline{\sigma }_\mathscr {U})\)) denote the contribution to \({{\textbf {Y}}}(A)\) coming from \(A\cup T{\setminus } \mathscr {U}\) (resp. \(G^\partial \)) given \(\underline{\sigma }_\mathscr {U}\), namely,

$$\begin{aligned} \begin{aligned} \kappa (\underline{\sigma }_\mathscr {U}) \equiv \kappa (\underline{\sigma }_\mathscr {U}, \mathscr {G})&\equiv \frac{\sum _{\underline{\sigma }_T \sim \underline{\sigma }_\mathscr {U}} w^{\text{ lit }}_{A\cup T\setminus \mathscr {U}}(\underline{\sigma }_{A\cup T\setminus \mathscr {U}})^\lambda }{ (1+\underline{\delta }_{L})^{\underline{X}^T} };\\ {{\textbf {Z}}}^\partial (\underline{\sigma }_\mathscr {U}) \equiv {{\textbf {Z}}}^\partial (\underline{\sigma }_\mathscr {U}, \mathscr {G})&\equiv {\sum _{\underline{\sigma }_{G^\partial } \sim \underline{\sigma }_\mathscr {U}} w^{\text{ lit }}_{G^\partial }(\underline{\sigma }_{G^\partial })^\lambda },\end{aligned} \end{aligned}$$
(70)

where \(\underline{\sigma }_T \sim \underline{\sigma }_\mathscr {U}\) means that the configuration of \(\underline{\sigma }_T\) on \(\mathscr {U}\) is \(\underline{\sigma }_\mathscr {U}\). Define \(\acute{\kappa }(\underline{\sigma }_\mathscr {U})\) analogously, by \(\acute{\kappa } (\underline{\sigma }_\mathscr {U}) \equiv \kappa (\underline{\sigma }_\mathscr {U}, \acute{\mathscr {G}})\).

The main intuition is that the dependence of \(\mathbb {E}{{\textbf {Z}}}^\partial (\underline{\sigma }_\mathscr {U})\) on \(\underline{\sigma }_\mathscr {U}\) should be given by the product measure that is i.i.d. \(\dot{q}^\star _{\lambda ,L}\) at each \(u\in \mathscr {U}\), where \(\dot{q}^\star _{\lambda ,L}\) is the fixed point of the BP recursion we saw in Proposition 2.17. To formalize this idea, we perform a discrete Fourier decomposition with respect to \(\underline{\sigma }_\mathscr {U}\) in the following setting. Let \(({{\textbf {b}}}_1,\ldots ,{{\textbf {b}}}_{|\dot{\Omega }_L|})\) be an orthonormal basis for \(L^2(\dot{\Omega }_L,\dot{q}^\star _{\lambda ,L})\) with \({{\textbf {b}}}_1\equiv 1\), and let \({{\textbf {q}}} \) be the product measure \(\otimes _{u\in \mathscr {U}} \dot{q}^\star _{\lambda ,L}\). Extend this to the orthonormal basis \(({{\textbf {b}}}_{\underline{r}})\) on \(L^2((\dot{\Omega }_L)^\mathscr {U}, {{\textbf {q}}})\) by

$$\begin{aligned} {{\textbf {b}}}_{\underline{r}}(\underline{\sigma }_\mathscr {U}) \equiv \prod _{u\in \mathscr {U}} {{\textbf {b}}}_{r(u)}(\sigma _u) \quad \text{ for } \text{ each } \underline{r}\in [|\dot{\Omega }_L|]^\mathscr {U}, \end{aligned}$$

where \([|\dot{\Omega }_L|]:= \{1,2,\ldots , |\dot{\Omega }_L| \}.\) For a function f on \((\dot{\Omega }_L)^\mathscr {U}\), we denote its Fourier coefficient by

$$\begin{aligned} f^\wedge (\underline{r}) \equiv \sum _{\underline{\sigma }_\mathscr {U}} f(\underline{\sigma }_\mathscr {U}) {{\textbf {b}}}_{\underline{r}}(\underline{\sigma }_\mathscr {U}) {{\textbf {q}}}(\underline{\sigma }_\mathscr {U}). \end{aligned}$$

Then, defining \({{\textbf {F}}}(\underline{\sigma }_\mathscr {U})\equiv {{\textbf {q}}}(\underline{\sigma }_\mathscr {U})^{-1} {{\textbf {Z}}}^\partial (\underline{\sigma }_\mathscr {U})\), we use Plancherel’s identity to obtain that

$$\begin{aligned} \begin{aligned} {{\textbf {Z}}}_{\lambda }^{(L),\text{ tr }}(\mathscr {G})\prod _{\zeta : ||\zeta || \le l_0} \left( 1+\delta _L (\zeta ) \right) ^{-X^T (\zeta )}= \sum _{\underline{r}} \kappa ^\wedge (\underline{r}) {{\textbf {F}}}^\wedge (\underline{r}). \end{aligned} \end{aligned}$$
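As a sanity check of this framework, the following sketch builds an orthonormal basis with \({{\textbf {b}}}_1\equiv 1\) for a toy finite probability measure (a generic stand-in for \(\dot{q}^\star _{\lambda ,L}\); the alphabet size and weights are hypothetical) and verifies Plancherel's identity numerically:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in: a 3-letter alphabet with a generic probability measure q
q = np.array([0.5, 0.3, 0.2])

# Build an orthonormal basis of L^2(q) with b_1 = 1, via Gram-Schmidt
# for the inner product <f, g>_q = sum_s f(s) g(s) q(s).
def gram_schmidt(q):
    n = len(q)
    basis = [np.ones(n)]          # b_1 is the constant function 1
    for i in range(1, n):
        v = np.zeros(n)
        v[i] = 1.0
        for b in basis:
            v = v - np.sum(v * b * q) * b
        basis.append(v / np.sqrt(np.sum(v * v * q)))
    return np.array(basis)

B = gram_schmidt(q)

# Fourier coefficient: f^(r) = sum_s f(s) b_r(s) q(s)
def fourier(f):
    return np.array([np.sum(f * b * q) for b in B])

f = rng.normal(size=3)
g = rng.normal(size=3)

# Plancherel: <f, g>_q equals the sum of products of Fourier coefficients
lhs = np.sum(f * g * q)
rhs = np.sum(fourier(f) * fourier(g))
assert np.isclose(lhs, rhs)
```

The product basis on \((\dot{\Omega }_L)^\mathscr {U}\) is then obtained coordinatewise, exactly as in the display defining \({{\textbf {b}}}_{\underline{r}}\).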

Remark 5.4

(When \(l_0\) is odd). If \(l_0\) is odd, then the half-edges \(\mathscr {U}\) are adjacent to the clauses of \(G^\partial \). Therefore, the base measure of the Fourier decomposition should be \(\hat{q}^\star _{\lambda ,L}\) rather than \(\dot{q}^\star _{\lambda ,L}\). In this case, we rely on the same idea that \({{\textbf {Z}}}^\partial (\underline{\sigma }_{\mathscr {U}})\) should approximately be written in terms of the product measure of \(\hat{q}^\star _{\lambda ,L}\).

To describe the second moment of the above quantity, we abuse notation and write \({{\textbf {q}}}\), \({{\textbf {b}}}\) for the product measure of \(\dot{q}_{\lambda ,L}^\star \otimes \dot{q}_{\lambda ,L}^\star \) on \(\mathscr {U}\) and the orthonormal basis given by \({{\textbf {b}}}_{\underline{r}^1,\underline{r}^2}(\underline{\sigma }^1,\underline{\sigma }^2)\equiv {{\textbf {b}}}_{\underline{r}^1}(\underline{\sigma }^1){{\textbf {b}}}_{\underline{r}^2}(\underline{\sigma }^2).\) Moreover, we denote the pair configuration by \(\underline{\varvec{\sigma }}=(\underline{\sigma }^1,\underline{\sigma }^2)\) throughout Sect. 5. Let be the contribution of the pair configurations on \(G^\partial \) given by

Then, denote by the contribution to from pair coloring profiles H with \(||H-H^\bullet _{\lambda ,L}||_1\le n^{-1/2}\log ^{2}n\), where \(H^\bullet _{\lambda ,L}\) is defined in Definition 2.20. Recall that \({{\textbf {Z}}}_T\) is defined in terms of \(\widetilde{{{\textbf {Z}}}}_{\lambda }^{(L),\text{ tr }}\) as in (69). Since \(\lambda <\lambda ^\star _L\) and we restricted our attention to \(||H-H^\star _{\lambda ,L}||_1 \le n^{-1/2}\log ^{2}n\) in \(\widetilde{{{\textbf {Z}}}}_{\lambda }^{(L),\text{ tr }}\), the major contribution to the second moment \(\mathbb {E}{{\textbf {D}}}^{2}\equiv \mathbb {E}({{\textbf {Z}}}_T-\acute{{{\textbf {Z}}}}_T)^{2}\) comes from , where is defined by

(71)

Namely, Proposition 4.20 of [37] and Proposition 3.10 of [43] imply that

Thus, we aim to upper bound . Let \(\mathbb {E}_T\) denote the conditional expectation given T. Again using Plancherel’s identity, we can write

(72)

where we wrote

(73)

In the remaining subsections, we begin with estimating \(\kappa ^\wedge \) in Sect. 5.2. This is the part that carries the major conceptual difference from [25], and it in turn provides Proposition 3.4, a stronger conclusion than Proposition 6.1 of [25]. Then, since the Fourier coefficients deal with the non-rescaled partition function, we may appeal to the analysis given in [25] to deduce (68) in Sect. 5.3.

Before moving on, we introduce some notation following [25] that is used in the remainder of Sect. 5. We write \(\varnothing \) for the all-1 index vector, so that \({{\textbf {b}}}_{\varnothing }\equiv 1\). Moreover, for \(\underline{r}=(\underline{r}^1,\underline{r}^2)\in [|\dot{\Omega }_L|]^{2\mathscr {U}}\), we define

$$\begin{aligned} |\{\underline{r}^1 \underline{r}^2 \}| \equiv |\{u\in \mathscr {U}: r^1(u) \ne 1 \text { or } r^2(u)\ne 1 \}|. \end{aligned}$$
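In code, this count of jointly non-trivial coordinates is a one-liner (the index vectors below are hypothetical examples; the value 1 marks the constant basis element \({{\textbf {b}}}_1\equiv 1\)):

```python
def joint_support(r1, r2):
    """Number of coordinates u where r1(u) != 1 or r2(u) != 1."""
    return sum(1 for a, b in zip(r1, r2) if a != 1 or b != 1)

# Indices take values in {1, ..., |Omega|}; index 1 corresponds to b_1 = 1.
assert joint_support([1, 1, 2], [1, 3, 1]) == 2
assert joint_support([1, 1], [1, 1]) == 0
```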

5.2 Local neighborhood Fourier coefficients

The properties of \(\kappa ^\wedge \) may vary significantly depending on the structure of \(T= B^\circ _{l_0}(\mathscr {K})\). Typically, T consists of \(|\mathscr {K}|\) disjoint trees, and in this case the rescaling factor has no effect due to the absence of cycles. Therefore, the analysis done in Section 6.4 of [25] can be applied to our case as follows. Let \({{\textbf {T}}}\) be the event that T consists of \(|\mathscr {K}|\) tree components. Then, Lemmas 6.8 and 6.9 of [25] imply that when \({{\textbf {T}}}\) holds, for \(\underline{r}\in [|\dot{\Omega }_L|]^{2\mathscr {U}}\),

  • \(\kappa ^\wedge (\underline{r}) = \acute{\kappa }^\wedge (\underline{r})\) for all \(|\{\underline{r} \}|\le 1\).

  • \(\left. \kappa ^\wedge (\varnothing )\right| _{\text {{\textbf {T}}}}\) takes a constant value \(\overline{\kappa }^\wedge (\varnothing )\) independent of A and the literals on T.

  • \(|\kappa ^\wedge (\underline{r}) - \acute{\kappa }^\wedge (\underline{r})| \lesssim _k \overline{\kappa }^\wedge (\varnothing )/4^{(k-4)l_0}\) for all \(|\{\underline{r}\}|=2\).

Moreover, let \({{\textbf {C}}}^\circ \) denote the event that T consists of \(|\mathscr {K}|\) connected components, one of which contains a single cycle while the others are trees. In this case, although the rescaling factor is now non-trivial, it is the same for both \(\kappa \) and \(\acute{\kappa }\). Therefore, Lemma 6.8 of [25] tells us that

  • \(\kappa ^\wedge (\varnothing ) = \acute{\kappa }^\wedge (\varnothing )\).

The case where we notice an important difference is the event \({{\textbf {C}}}_{t}\), \(t\le l_0\), on which \(B_{t-1}^\circ (\mathscr {K})\) has \(|\mathscr {K}|\) connected components but \(B_{t'}^\circ (\mathscr {K})\) has \(|\mathscr {K}|-1\) components for \(t\le t'\le l_0\). Using the cycle effect, we deduce the following estimate, which is stronger than Lemma 6.10 of [25].

Lemma 5.5

Suppose that \(T\in {{\textbf {C}}}_{t}\) for some \(t\le l_0\). Then, for any choices A and \(\acute{A}\) of k clauses matching \(\mathscr {K}\), we have

$$\begin{aligned} \kappa ^\wedge (\varnothing ) = \acute{\kappa }^\wedge (\varnothing ). \end{aligned}$$

Proof

Let \(T_0\) and \(T_\textsf {link}\) be the connected components of T defined as follows: \(T\in {{\textbf {C}}}_t\) consists of \(|\mathscr {K}|-2\) isomorphic copies of a tree \(T_0\) and one tree \(T_\textsf {link}\) that contains two half-edges of \(\mathscr {K}\). Note that \(T\cup A\) and \(T\cup \acute{A}\) have different structures only in the following situation:

  • One clause in A is connected with both half-edges of \(\mathscr {K}\cap T_\textsf {link}\). Thus, the connected components of \(T\cup A\) are \((k-1)\) copies of \(\mathcal {T}_0\) and one copy of \(\mathcal {T}_\textsf {cyc}\) as illustrated in Fig. 1. (Recall that we assumed \(|\mathscr {K}|=k^2\) in (68).) Here, \(\mathcal {T}_0\) is the union of k disjoint copies of \(T_0\) and a clause connecting them. Also, \(\mathcal {T}_\textsf {cyc}\) is the union of \(k-2\) disjoint copies of \(T_0\), one \(T_\textsf {link}\), and a clause connecting them.

  • The two half-edges \(\mathscr {K}\cap T_{\textsf {link}}\) are connected to different clauses of \(\acute{A}\). Therefore, the connected components of \(T\cup \acute{A}\) are \((k-2)\) copies of \(\mathcal {T}_0\) and one copy of \(\mathcal {T}_\textsf {link}\). Here, \(\mathcal {T}_{\textsf {link}}\) is the union of \(2k-2\) disjoint copies of \(T_0\), one \(T_{\textsf {link}}\) and two clauses connecting them as illustrated in Fig. 1.

Fig. 1
figure 1

An illustration of the graphs \(A\cup T\) (left) and \(\acute{A}\cup T\) (right)

Let \(\kappa _0^\wedge \) and \(\kappa _\textsf {cyc}^\wedge \) denote the contributions to \(\kappa ^\wedge (\varnothing )\) from \(\mathcal {T}_0\) and \(\mathcal {T}_\textsf {cyc}\), respectively, and let \(\kappa _\textsf {link}^\wedge \) denote the contribution to \(\acute{\kappa }^\wedge (\varnothing )\) from \(\mathcal {T}_{\textsf {link}}\). Then, we have

$$\begin{aligned} \kappa ^\wedge (\varnothing ) = (\kappa _0^\wedge )^{k-1} \kappa ^\wedge _\textsf {cyc} ,\quad \text {and}\quad \acute{\kappa }^\wedge (\varnothing ) =(\kappa _0^\wedge )^{k-2} \kappa _\textsf {link}^\wedge . \end{aligned}$$
(74)

In what follows, we present an explicit computation of \(\kappa _0^\wedge \), \(\kappa _\textsf {cyc}^\wedge \) and \(\kappa _\textsf {link}^\wedge \) and show that the two quantities in (74) are the same.

We begin by computing \(\kappa _0^\wedge \). Since \(T_0\) is a tree, \(\kappa _0^\wedge \) does not depend on the assignment of literals, and hence we can replace the weight factor \(w^{\text {lit}}\) by its averaged version w. Let \(e_0\) (resp. \(\mathscr {Y}_0\)) be the root half-edge (resp. the collection of leaf half-edges) of \(T_0\). We define

$$\begin{aligned} \varkappa _0 (\sigma ; \underline{\sigma }_{\mathscr {Y}_0}) \equiv \sum _{\underline{\sigma }_{T_0}\sim (\sigma , \underline{\sigma }_{\mathscr {Y}_0})} w(\underline{\sigma }_{T_0})^\lambda , \end{aligned}$$
(75)

where \(\underline{\sigma }_{T_0}\sim (\sigma , \underline{\sigma }_{\mathscr {Y}_0})\) means that \(\underline{\sigma }_{T_0}\) agrees with \(\sigma \) and \(\underline{\sigma }_{\mathscr {Y}_0}\) at \(e_0\) and \(\mathscr {Y}_0\), respectively. Note that since \(\mathcal {T}_0\) is a tree, the rescaling factor from the cycle effect is trivial. Denoting the number of variables and clauses of \(T_0\) by \(v(T_0)\) and \(a(T_0)\), respectively, the Fourier coefficient of \(\varkappa _0(\sigma ;\,\cdot \,)\) at \(\varnothing \) is given by

$$\begin{aligned} \varkappa _0^\wedge (\sigma ) \equiv \sum _{\underline{\sigma }_{\mathscr {Y}_0}} \varkappa _0(\sigma ;\underline{\sigma }_{\mathscr {Y}_0}) {{\textbf {q}}}(\underline{\sigma }_{\mathscr {Y}_0}) = \dot{q}^\star _{\lambda ,L}(\sigma ) \dot{\mathscr {Z}}^{v(T_0)} \hat{\mathscr {Z}}^{a(T_0)},\end{aligned}$$
(76)

where the second equality follows from the fact that \(\dot{q}^\star _{\lambda ,L}\) is the fixed point of the Belief Propagation recursions (20), with \(\dot{\mathscr {Z}} = \dot{\mathscr {Z}}_{{q}^\star _{\lambda ,L}}\) and \(\hat{\mathscr {Z}} = \hat{\mathscr {Z}}_{q^\star _{\lambda ,L}}\) their normalizing constants. Thus, we can calculate \(\kappa _0^\wedge \) by

$$\begin{aligned} \kappa _0^\wedge = \sum _{\underline{\sigma }\in (\dot{\Omega }_L)^k} \hat{\Phi }(\underline{\sigma }) \prod _{i=1}^k \varkappa _0^\wedge (\sigma _i) = \hat{\mathfrak {Z}}\,\dot{\mathscr {Z}}^{k\,v(T_0)} \hat{\mathscr {Z}}^{k\,a(T_0)}, \end{aligned}$$
(77)

where \(\hat{\mathfrak {Z}}\) is the normalizing constant of \(\hat{H}^\star _{\lambda ,L}\) given by (25). Since \(\mathcal {T}_{\textsf {link}}\) is a tree, we can compute \(\kappa _\textsf {link}^\wedge \) using the same argument, namely,

$$\begin{aligned} \kappa _\textsf {link}^\wedge = \hat{\mathfrak {Z}}\, \dot{\mathscr {Z}}^{(2k-2)v(T_0)+v(T_\textsf {link})} \hat{\mathscr {Z}}^{(2k-2)a(T_0)+ a(T_\textsf {link})+1}, \end{aligned}$$
(78)

since the total numbers of variables and clauses in \(\mathcal {T}_{\textsf {link}}\) are \((2k-2)v(T_0)+v(T_\textsf {link})\) and \((2k-2)a(T_0)+ a(T_\textsf {link})+2\), respectively.

What remains is to calculate \(\kappa _\textsf {cyc}^\wedge \). The graph \(T\cup A\) contains a single cycle of length 2t; let this be a \(\zeta \)-cycle with \(\zeta \in \{0,1\}^{2t}\). Unlike the previous two cases, the literal assignment \(\zeta \) has a non-trivial effect, but the literals outside of the cycle can still be ignored. We first compute the quantity

$$\begin{aligned} \tilde{\kappa }_\textsf {cyc}^\wedge = \kappa _\textsf {cyc}^\wedge \cdot Tr\left[ \prod _{i=1}^t \dot{A}_L \hat{A}_L^{\zeta _{2i-1},\zeta _{2i}} \right] , \end{aligned}$$

which does not include the rescaling term from the cycle effect. Let C denote the cycle in \(\mathcal {T}_{\textsf {cyc}}\), of length 2t, and let \(\mathscr {Y}_{C}\) be the half-edges that are adjacent to but not contained in C. Then, \(t(d-2)\) (resp. \(t(k-2)\)) half-edges in \(\mathscr {Y}_{C}\) are adjacent to a variable (resp. a clause) in C.
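The trace appearing in the display above is the standard transfer-matrix identity: the weighted sum over spin sequences around a cycle of length 2t equals the trace of the alternating product of edge matrices. A minimal numerical sketch, with generic random matrices standing in for \(\dot{A}_L\) and \(\hat{A}_L^{\zeta _{2i-1},\zeta _{2i}}\) (all stand-ins, not the actual model matrices):

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
m, t = 4, 3                      # alphabet size; the cycle has length 2t
A_dot = rng.random((m, m))       # stand-in for the variable-side transfer matrix
A_hat = [rng.random((m, m)) for _ in range(t)]  # clause-side matrices, one per step

# Trace of the alternating product around the cycle
M = np.eye(m)
for i in range(t):
    M = M @ A_dot @ A_hat[i]
trace_value = np.trace(M)

# Brute force: sum over all closed sequences (s_0, ..., s_{2t-1}) of the
# product of edge weights around the cycle (wrapping back to s_0).
total = 0.0
for s in itertools.product(range(m), repeat=2 * t):
    w = 1.0
    for i in range(t):
        w *= A_dot[s[2 * i], s[2 * i + 1]]
        w *= A_hat[i][s[2 * i + 1], s[(2 * i + 2) % (2 * t)]]
    total += w
assert np.isclose(trace_value, total)
```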

For each \(u\in \mathscr {Y}_{C}\), let \(T_u\) denote the connected component of \(\mathcal {T}_{\textsf {cyc}}\setminus \{u \}\) that is a tree. Let \(e_u\) denote the root half-edge of \(T_u\), that is, the half-edge that is matched with u in \(\mathcal {T}_{\textsf {cyc}}\), and let \(\varkappa _u(\sigma ;\,\cdot \,)\) be defined analogously to (75). Then, by the same computation as (76), we obtain that

$$\begin{aligned} \varkappa _u^\wedge (\sigma _u) = {\left\{ \begin{array}{ll} \dot{q}_{\lambda ,L}^\star (\sigma _u) \dot{\mathscr {Z}}^{v(T_u)} \hat{\mathscr {Z}}^{a(T_u)} , &{}\text {if } u\text { is adjacent to a clause in }C,\\ \hat{q}_{\lambda ,L}^\star (\sigma _u) \dot{\mathscr {Z}}^{v(T_u)} \hat{\mathscr {Z}}^{a(T_u)} , &{}\text {if } u\text { is adjacent to a variable in }C. \end{array}\right. } \end{aligned}$$
(79)

Furthermore, for convenience we denote the sets of variables, clauses and edges of C by V, F, and E, respectively, and set \(\mathscr {Y}\equiv \mathscr {Y}_{C}\cup E\). For each \(a\in F\), denote the two literals on C that are adjacent to a by \(\zeta _a^1, \zeta _a^2\). Observe that \(\tilde{\kappa }_\textsf {cyc}^\wedge \) can be written as

$$\begin{aligned} \tilde{\kappa }_\textsf {cyc}^\wedge&= \sum _{\underline{\sigma }_{\mathscr {Y}}} {\prod _{v\in V} \dot{\Phi }(\underline{\sigma }_v)^\lambda \prod _{a\in F} \hat{\Phi }^{\zeta _a^1,\zeta _a^2}(\underline{\sigma }_a)^\lambda }{ \prod _{e\in E} \bar{\Phi }(\underline{\sigma }_e)^\lambda } \prod _{u\in \mathscr {Y}_{C}} \varkappa _u^\wedge (\sigma _u) \end{aligned}$$
(80)
$$\begin{aligned}&= \dot{\mathscr {Z}}^{\sum _{u\in \mathscr {Y}_{C}} v(T_u) } \hat{\mathscr {Z}}^{\sum _{u\in \mathscr {Y}_{C}} a(T_u) } \sum _{\underline{\sigma }_{\mathscr {Y}}} \frac{ \prod _{v\in V} \dot{H}^\star (\underline{\sigma }_v) \prod _{a\in F} \hat{H}^{\zeta _a^1,\zeta _a^2}(\underline{\sigma }_a) }{\prod _{e\in E} \bar{H}^\star (\sigma _e)}\, \frac{ \dot{\mathfrak {Z}}^t \hat{\mathfrak {Z}}^t }{\bar{\mathfrak {Z}}^{2t}}, \end{aligned}$$
(81)

where the second equality is obtained by multiplying \(\prod _{e\in E} \dot{q}^\star _{\lambda ,L}(\sigma _e) \hat{q}^\star _{\lambda ,L}(\sigma _e)\) in both the numerator and the denominator of the first line. Moreover, the normalizing constant for \(\hat{H}^{\zeta _1,\zeta _2}\) is the same regardless of \(\zeta _1,\zeta _2\) (see (41)). (Note that in the RHS we wrote \(\dot{H}^\star \equiv \dot{H}^\star _{\lambda ,L}\), and similarly for \(\hat{H}^{\zeta _1,\zeta _2}, \bar{H}^\star \).) The literal assignments did not play a role in the previous two cases \(\mathcal {T}_0\), \(\mathcal {T}_{\textsf {link}}\), which are trees, but in \(\mathcal {T}_{\textsf {cyc}}\) their effect is in principle non-trivial due to the cycle C. Plugging the identities \(\dot{\mathfrak {Z}}=\dot{\mathscr {Z}}\bar{\mathfrak {Z}}\) and \(\hat{\mathfrak {Z}}=\hat{\mathscr {Z}}\bar{\mathfrak {Z}}\) into (81), we deduce that

$$\begin{aligned} \tilde{\kappa }_\textsf {cyc}^\wedge = \dot{\mathscr {Z}}^{v(\mathcal {T}_{\textsf {cyc}})} \hat{\mathscr {Z}}^{a(\mathcal {T}_{\textsf {cyc}})} \cdot Tr\left[ \prod _{i=1}^t \dot{A}_L \hat{A}_L^{\zeta _{2i-1},\zeta _{2i}} \right] , \end{aligned}$$

and hence \(\kappa _\textsf {cyc}^\wedge = \dot{\mathscr {Z}}^{v(\mathcal {T}_{\textsf {cyc}})} \hat{\mathscr {Z}}^{a(\mathcal {T}_{\textsf {cyc}})} \) by the definition of \(\tilde{\kappa }_\textsf {cyc}^\wedge \). Therefore, combining this result with (74), (77) and (78), we obtain the conclusion \(\kappa ^\wedge (\varnothing ) = \acute{\kappa }^\wedge (\varnothing ) \). \(\square \)
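For concreteness, the exponent bookkeeping behind the last step can be spelled out. Writing \(v_0=v(T_0)\) and \(a_0=a(T_0)\), and substituting (77), (78) and \(\kappa _\textsf {cyc}^\wedge = \dot{\mathscr {Z}}^{v(\mathcal {T}_{\textsf {cyc}})} \hat{\mathscr {Z}}^{a(\mathcal {T}_{\textsf {cyc}})}\) into (74), with \(v(\mathcal {T}_{\textsf {cyc}})=(k-2)v_0+v(T_{\textsf {link}})\) and \(a(\mathcal {T}_{\textsf {cyc}})=(k-2)a_0+a(T_{\textsf {link}})+1\):

```latex
\begin{aligned}
(\kappa_0^\wedge)^{k-1}\,\kappa^\wedge_{\mathsf{cyc}}
 &= \hat{\mathfrak{Z}}^{\,k-1}\,
    \dot{\mathscr{Z}}^{\,k(k-1)v_0+(k-2)v_0+v(T_{\mathsf{link}})}\,
    \hat{\mathscr{Z}}^{\,k(k-1)a_0+(k-2)a_0+a(T_{\mathsf{link}})+1},\\
(\kappa_0^\wedge)^{k-2}\,\kappa^\wedge_{\mathsf{link}}
 &= \hat{\mathfrak{Z}}^{\,k-1}\,
    \dot{\mathscr{Z}}^{\,k(k-2)v_0+(2k-2)v_0+v(T_{\mathsf{link}})}\,
    \hat{\mathscr{Z}}^{\,k(k-2)a_0+(2k-2)a_0+a(T_{\mathsf{link}})+1},
\end{aligned}
```

and the two sides agree: both powers of \(\dot{\mathscr{Z}}\) equal \((k^2-2)v_0+v(T_{\textsf {link}})\), and both powers of \(\hat{\mathscr{Z}}\) equal \((k^2-2)a_0+a(T_{\textsf {link}})+1\).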

5.3 The martingale increment estimate and the proof of Proposition 3.4

We begin by establishing (68), combining the discussions in the previous subsections. The proof follows the same argument as Section 7 of [25], plugging in the improved estimate of Lemma 5.5 and an estimate on \(\mathbb {E}{{\textbf {Y}}}\) obtained from Proposition 4.1.

To this end, we first review the result from [25] that gives the estimate on the Fourier coefficients defined in (73). In Lemma 6.7 of [25] and the discussion below it, it was shown that

(82)

independent of T. (The logarithmic factor for \(|\{\underline{r}^1, \underline{r}^2 \}| \ge 3\) is slightly worse than that of [25], since we work with g such that \(||g-g^\star ||\le \sqrt{n}\log ^2 n\), not \(||g-g^\star ||\le \sqrt{n}\log n\).) Based on this fact and the analysis from Sect. 5.2, our first goal in this subsection is to establish the following:

Lemma 5.6

Let \(L>0, \lambda \in (0,\lambda ^\star _L) \) and \(l_0 >0\) be fixed, and let \({{\textbf {Z}}}_T\) and \(\acute{{{\textbf {Z}}}}_T\) be given as (69). Then, there exist an absolute constant \(C>0\) and a constant \(C_{k,L}>0\) such that for large enough n,

$$\begin{aligned} \mathbb {E}\left[ \left( {{\textbf {Z}}}_T -\acute{{{\textbf {Z}}}}_T \right) ^2 \right] \le \frac{C_{k,L}}{n} (k^C 4^{-k})^{l_0} (\mathbb {E}{{\textbf {Z}}}')^2,\end{aligned}$$
(83)

where \({{\textbf {Z}}}' = \widetilde{{{\textbf {Z}}}}_{\lambda }^{(L),\text{ tr }} \).

Proof

Let be defined as in (71). Based on the expression (72), we study the conditional expectation for different shapes of T. To this end, we first recall the events \({{\textbf {T}}}\), \({{\textbf {C}}}^\circ \) and \({{\textbf {C}}}_t\) defined at the beginning of Sect. 5.2. We additionally write

$$\begin{aligned} {{\textbf {B}}}\equiv \left( \cup _{t\le l_0} {{\textbf {C}}}_t \cup {{\textbf {T}}}\cup {{\textbf {C}}}^\circ \right) ^c . \end{aligned}$$
(84)

Note that T can be constructed from a configuration model in a depth-\(l_0\) neighborhood of \(\mathscr {K}\), which is of size \(O_k(1)\). Revealing the edges of these neighborhoods one by one, each new edge creates a cycle with probability \(O_k(1/n)\). The event \({{\textbf {C}}}^\circ \) requires a single cycle, so by a union bound \(\mathbb {P}({{\textbf {C}}}^\circ )=O_k(1/n)\), while the event \({{\textbf {B}}}\) requires at least two cycles, so again by a union bound \(\mathbb {P}({{\textbf {B}}})=O_k(n^{-2})\).

For each event above, we can make the following observation. When we have \({{\textbf {T}}}\), the only contribution to comes from \((\underline{r}^1, \underline{r}^2)\) such that \(|\{\underline{r}^1, \underline{r}^2 \}| \ge 2\), due to the properties of \(\kappa ^\wedge \) discussed at the beginning of Sect. 5.2. Note that the number of choices of \((\underline{r}^1, \underline{r}^2)\) with \(|\{\underline{r}^1, \underline{r}^2 \}| = 2\) is at most \(|\dot{\Omega }_L|^4 (k^5 4^k)^{l_0}\). Therefore, (82) gives that

(85)

Similarly on \({{\textbf {C}}}^\circ \), the analysis on \(\kappa ^\wedge \) implies that there is no contribution from \((\underline{r}^1,\underline{r}^2)= \varnothing \). Thus, we obtain from (82) that

(86)

Since the event \({{\textbf {B}}}\) has probability \(\mathbb {P}({{\textbf {B}}}) =O_k(n^{-2})\), we also have that

(87)

The last remaining case is \({{\textbf {C}}}_{t}\), and this is where we obtain a nontrivial improvement over [25]. Lemma 5.5 tells us that there is no contribution from \((\underline{r}^1,\underline{r}^2) = \varnothing \). Thus, similarly to (86), for each \(t\le l_0\) we have

(88)

Thus, combining (85)–(88), we obtain the conclusion. \(\square \)

To obtain the conclusion of the form (68), we need to replace \((\mathbb {E}{{\textbf {Z}}}')^2\) in (83) by \((\mathbb {E}{{\textbf {Y}}})^2\). This follows from Proposition 4.1 and can be summarized as follows.

Corollary 5.7

Let \(L>0\), \(\lambda \in (0,\lambda ^\star _L)\) and \(l_0>0\) be fixed, and let \({{\textbf {Y}}}\equiv {{\textbf {Y}}}_{\lambda ,l_0}^{(L)}\) be the rescaled partition function defined by (40). Further, let \(\underline{\mu }\), \(\underline{\delta }_L\) be as in Proposition 4.1. Then, we have

$$\begin{aligned} \mathbb {E}{{\textbf {Y}}}= \left( 1+O\left( \frac{\log ^3 n}{n^{1/2}}\right) \right) \mathbb {E}{{\textbf {Z}}}' \cdot \left\{ \exp \left( -\sum _{||\zeta ||\le l_0} \mu (\zeta ) \delta _L(\zeta ) \right) +o(n^{-1}) \right\} ,\end{aligned}$$

where \({{\textbf {Z}}}^\prime \equiv \widetilde{{{\textbf {Z}}}}^{(L),\text{ tr }}_{\lambda }\).

Proof

Let \(c_\textsf {cyc}=c_\textsf {cyc}(l_0)\) be given as in Proposition 4.1. Corollary 5.1 shows that \(\mathbb {E}{{\textbf {Y}}}\mathbb {1}\{||\underline{X}||_\infty \ge c_{\textsf {cyc}} \log n \}\) is negligible for our purposes, and hence we focus on estimating \(\mathbb {E}{{\textbf {Y}}}\mathbb {1}\{||\underline{X}||_\infty \le c_{\textsf {cyc}} \log n \}\).

Note that for an integer \(x\ge 0\), \((1+\theta )^x = \sum _{a\ge 0} \frac{(x)_a}{a!} \theta ^a \). Thus, if we define \(\tilde{\delta }(\zeta ) \equiv (1+\delta _L(\zeta ))^{-1} -1\), we can write

$$\begin{aligned} \begin{aligned} \mathbb {E}[{{\textbf {Y}}}\mathbb {1}\{||\underline{X}||_\infty \le c_{\textsf {cyc}} \log n \}]&= \sum _{\underline{a}\ge 0} \frac{1}{\underline{a}!} \mathbb {E}\left[ {{\textbf {Z}}}' (\tilde{\underline{\delta }})^{\underline{a}} (\underline{X})_{\underline{a}} \mathbb {1}\{ ||\underline{X}||_\infty \le c_{\textsf {cyc}}\log n \} \right] \\ {}&= \left( 1+O\left( \frac{\log ^3 n}{n^{1/2}}\right) \right) \sum _{ ||\underline{a}||_\infty \le c_{\textsf {cyc}}\log n } \frac{1}{\underline{a}!} \mathbb {E}{{\textbf {Z}}}' \left( \tilde{\underline{\delta }} \underline{\mu } (1+ \underline{\delta }_L) \right) ^{\underline{a}},\end{aligned} \end{aligned}$$

and performing the summation in the rhs easily implies the conclusion.\(\square \)
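The elementary expansion \((1+\theta )^x = \sum _{a\ge 0} \frac{(x)_a}{a!} \theta ^a\) used at the start of the proof above (and again in Sect. 5.4) is the binomial theorem in falling-factorial form; for a nonnegative integer x the sum terminates at \(a=x\), since \((x)_a=0\) for \(a>x\). A quick numerical check, purely illustrative:

```python
from math import prod

def falling_factorial(x, a):
    """(x)_a = x (x-1) ... (x-a+1), with (x)_0 = 1."""
    return prod(x - i for i in range(a))

def series(x, theta):
    # Sum of (x)_a / a! * theta^a; terminates at a = x for integer x >= 0.
    total, fact = 0.0, 1
    for a in range(x + 1):
        if a > 0:
            fact *= a
        total += falling_factorial(x, a) / fact * theta ** a
    return total

for x in range(8):
    for theta in (-0.5, 0.1, 2.0):
        assert abs(series(x, theta) - (1 + theta) ** x) < 1e-9
```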

We conclude this subsection by presenting the proof of Proposition 3.4.

Proof of Proposition 3.4

As discussed in the beginning of Sect. 5, it suffices to establish (68) to deduce Proposition 3.4. Combining Lemmas 5.3, 5.6 and Corollary 5.7 gives that

$$\begin{aligned} \frac{\mathbb {E}[({{\textbf {Y}}}(A) - {{\textbf {Y}}}(\acute{A}))^2 ] }{ (\mathbb {E}{{\textbf {Y}}})^2 } \le \frac{1}{n} (k^C 4^{-k})^{l_0} \exp \left( \sum _{||\zeta ||\le l_0} \mu (\zeta ) \delta _L(\zeta )^2 \right) + O\left( \frac{\log ^6 n}{n^{3/2}} \right) ,\end{aligned}$$

for some absolute constant \(C>0\). Moreover, Lemma 4.6 implies that

$$\begin{aligned} \sum _{\zeta } \mu (\zeta ) \delta _L(\zeta )^2 <\infty , \end{aligned}$$

hence establishing (68). \(\square \)

5.4 Proof of Lemma 5.3

In this subsection, we establish Lemma 5.3. One nontrivial aspect of this lemma is achieving the error \(O(n^{-3/2} \log ^6 n) \mathbb {E}[({{\textbf {Z}}}')^2]\), where \({{\textbf {Z}}}^\prime \equiv \widetilde{{{\textbf {Z}}}}_{\lambda }^{(L),\text{ tr }}\). For instance, there can be short cycles in \(\mathcal {G}\) intersecting T (but not contained in T) with probability \(O(n^{-1})\), and in principle these would contribute \(O(n^{-1})\) to the error term. One observation we will see later is that the effect of these cycles cancels out, since we are looking at the difference \({{\textbf {Y}}}(A) - {{\textbf {Y}}}(\acute{A})\) between rescaled partition functions.

To begin with, we decompose the rescaling factor (which is exponential in \(\underline{X}^\partial \)) into the sum of polynomial factors based on an elementary fact we also saw in the proof of Corollary 5.7: for a nonnegative integer x, we have \((1+\theta )^x = \sum _{a\ge 0} \frac{(x)_a}{a!} \theta ^a\). Let \(\tilde{\delta }(\zeta )= (1+\delta _L(\zeta ) )^{-2}-1\), and write

$$\begin{aligned} \left( 1+\underline{\delta }_L \right) ^{-2\underline{X}^\partial } = \sum _{\underline{a}\ge 0} \frac{1}{\underline{a}!} \tilde{\underline{\delta }}^{\underline{a}} (\underline{X}^\partial )_{\underline{a}}. \end{aligned}$$
(89)

Therefore, our goal is to understand \(\mathbb {E}[({{\textbf {Z}}}_T-\acute{{{\textbf {Z}}}}_T)^2 (\underline{X}^\partial )_{\underline{a}}]\), which can be described as follows.

Lemma 5.8

Let \(L>0\), \(\lambda \in (0,\lambda _L^\star )\) and \(l_0>0\) be fixed, set \(\underline{\mu }\), \(\underline{\delta }_L\) as in Proposition 4.1, and let \({{\textbf {Z}}}_T, \acute{{{\textbf {Z}}}}_T\) be defined as (69). For any \(\underline{a}=(a_\zeta )_{||\zeta ||\le l_0} \) with \(||\underline{a}||_\infty \le \log ^2 n\), we have

$$\begin{aligned} \mathbb {E}\left[ \left( {{\textbf {Z}}}_T -\acute{{{\textbf {Z}}}}_T \right) ^2 (\underline{X}^\partial )_{\underline{a}} \right] =&\left( 1+ O\left( \frac{||\underline{a}||_1^2}{n} \right) \right) \mathbb {E}\left[ \left( {{\textbf {Z}}}_T -\acute{{{\textbf {Z}}}}_T \right) ^2\right] \left( \underline{\mu } (1+\underline{\delta }_L)^2 \right) ^{\underline{a}} \nonumber \\ {}&\quad + O\left( \frac{||\underline{a}||_1 \log ^6 n }{n^{3/2}} \right) \mathbb {E}[({{\textbf {Z}}}')^2].\end{aligned}$$
(90)

The first step towards the proof is to write the lhs of (90) using the Fourier decomposition as in Sect. 5.1. To this end, we recall Definitions 3.2, 4.2 (but now \(\Delta \) counts the number of pair-coloring configurations around variables, clauses, and half-edges) and decompose \((\underline{X}^\partial )_{\underline{a}}\) similarly as the expression (51). Hence, we write

$$\begin{aligned} \mathbb {E}_T \left[ \left( {{\textbf {Z}}}_T -\acute{{{\textbf {Z}}}}_T \right) ^2 (\underline{X}^\partial )_{\underline{a}} \right] =\sum _{\mathcal {Y}} \sum _{\underline{\varvec{\sigma }}_\mathcal {Y}} \mathbb {E}_T \left[ \left( {{\textbf {Z}}}_T -\acute{{{\textbf {Z}}}}_T \right) ^2 \mathbb {1}\{\mathcal {Y},\underline{\varvec{\sigma }}_\mathcal {Y}\} \right] , \end{aligned}$$

where \(\mathcal {Y}= \{\mathcal {Y}_i(\zeta ) \}_{i\in [a_\zeta ],\, ||\zeta ||\le l_0}\) denotes the locations of \(\underline{a}\) \(\zeta \)-cycles and \(\underline{\varvec{\sigma }}_\mathcal {Y}\) describes a prescribed coloring configuration on them.

In what follows, we fix a tuple \((\mathcal {Y},\underline{\varvec{\sigma }}_\mathcal {Y})\) and work with the summand of above via Fourier decomposition. Let

$$\begin{aligned} U \equiv \mathscr {U}\cap \left( \cup _{v\in V(\mathcal {Y})} \delta v \right) \end{aligned}$$

be the set of half-edges in \(\mathscr {U}\) that are adjacent to a variable in \(\mathcal {Y}\). Since the colors on U are already given by \(\underline{\varvec{\sigma }}_\mathcal {Y}\), we will perform a Fourier decomposition in terms of \(\underline{\varvec{\sigma }}_{\mathscr {U}'}\), with \(\mathscr {U}' \equiv \mathscr {U}\setminus U\). Let \(\kappa (\underline{\sigma }_{\mathscr {U}'}; \underline{\sigma }_\mathcal {Y}) \) (resp. \(\acute{\kappa }(\underline{\sigma }_{\mathscr {U}'}; \underline{\sigma }_\mathcal {Y}) \)) be the partition function on \(T \cup A\) (resp. \(T\cup \acute{A}\)) (in terms of the single-copy model), under the prescribed coloring configuration \(\underline{\sigma }_{\mathscr {U}'}\) on \(\mathscr {U}'\) and \(\underline{\sigma }_{\mathcal {Y}\cap T}\) on \(\mathcal {Y}\cap T\). Setting

$$\begin{aligned} \varpi (\,\cdot ; \underline{\sigma }_{\mathcal {Y}}) \equiv \kappa (\,\cdot ; \underline{\sigma }_\mathcal {Y}) - \acute{\kappa }(\,\cdot ; \underline{\sigma }_\mathcal {Y}), \end{aligned}$$

and writing \(\underline{\varvec{\sigma }}_\mathcal {Y}= (\underline{\sigma }_\mathcal {Y}^1, \underline{\sigma }_\mathcal {Y}^2 )\), we obtain by following the same idea as (71) that

(91)

Note that \((\underline{X}^\partial )_{\underline{a}}\) is deterministically bounded by \(\exp (O(\log ^3 n))\), and hence in the end the second term will have a negligible contribution due to the factor \(\exp (-\Omega (n) )\), which comes from the correlated pairs of colorings. Then, we investigate

(92)

To be specific, we want to derive the analog of Lemma 6.7 of [25], which dealt with without the planted cycles inside the graph. To explain the main computation, we introduce some notation before moving on. Let \(\bar{\Delta }\), \(\bar{\Delta }_U\) be counting measures on \(\dot{\Omega }_L^2\) defined as

$$\begin{aligned} \begin{aligned}&\bar{\Delta }(\varvec{\tau }) = |\{e\in E_c(\mathcal {Y}) \setminus (E(T) \cup U) : \varvec{\sigma }_{e} = \varvec{\tau }\} |, \quad \text {for all } \varvec{\tau }\in \dot{\Omega }_L^2;\\&\bar{\Delta }_U ({\varvec{\tau }}) = |\{e\in U: \varvec{\sigma }_{e} = \varvec{\tau }\} |, \quad \text {for all } \varvec{\tau }\in \dot{\Omega }_L^2. \end{aligned} \end{aligned}$$

Note that \(\bar{\Delta }\) and \(\bar{\Delta }_U\) indicate empirical counts of edge-colors on disjoint sets. Moreover, for a given coloring configuration \(\underline{\varvec{\sigma }}_\mathcal {Y}\) on \(\mathcal {Y}\), we define \(\Delta _\partial =(\dot{\Delta }_\partial , (\hat{\Delta }^{\underline{\texttt {L}}}_\partial )_{\underline{\texttt {L}}})\), the restricted empirical profile on \(\mathcal {Y}\setminus T\), by

$$\begin{aligned} \begin{aligned} \dot{\Delta }_\partial ( \underline{\varvec{\sigma }})&= |\{v\in V(\mathcal {Y}) \setminus V(T) : \underline{\varvec{\sigma }}_{\delta v} = \underline{\varvec{\sigma }}\}|, \quad \text {for all }\underline{\varvec{\sigma }}\in (\dot{\Omega }_L^2)^d; \\ \hat{\Delta }_\partial ^{\underline{\texttt {L}}} ( \underline{\varvec{\sigma }})&= |\{a\in F(\mathcal {Y}) \setminus F(T) : \underline{\varvec{\sigma }}_{\delta a} = \underline{\varvec{\sigma }}, \underline{\texttt {L}}_{\delta a} = \underline{\texttt {L}}\}|, \quad \text {for all }\underline{\varvec{\sigma }}\in (\dot{\Omega }_L^2)^k, \; \underline{\texttt {L}}\in \{0,1\}^k. \end{aligned} \end{aligned}$$

Note that \(\dot{\Delta }_\partial \) carries the information on the colors on U, while \(\bar{\Delta }\) does not (and hence we use different notations). Lastly, let \(\mathscr {U}' \equiv \mathscr {U}\setminus U\), and for a given coloring configuration \(\underline{\varvec{\sigma }}_{\mathscr {U}'}\) on \(\mathscr {U}'\), define \(\bar{h}^{\underline{\varvec{\sigma }}_{\mathscr {U}'}}\) to be the following counting measure on \(\dot{\Omega }_L^2\):

$$\begin{aligned} \bar{h}^{\underline{\varvec{\sigma }}_{\mathscr {U}'}} (\varvec{\sigma }) = |\{e\in \mathscr {U}' : \varvec{\sigma }_e = \varvec{\sigma }\}|, \quad \text {for all } \varvec{\sigma }\in \dot{\Omega }_L^2. \end{aligned}$$
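These counting measures are nothing more than multiset counts of edge colors over prescribed, pairwise disjoint edge sets. The following minimal sketch illustrates the bookkeeping; the colors, edge sets, and variable names are hypothetical stand-ins, not the paper's actual objects:

```python
from collections import Counter

def counting_measure(coloring, edges):
    """Counting measure: number of edges in `edges` carrying each color."""
    return Counter(coloring[e] for e in edges)

# Hypothetical edge coloring with colors from a small alphabet.
coloring = {0: "r", 1: "g", 2: "r", 3: "b", 4: "g", 5: "r", 6: "b"}
U = {4, 5}            # plays the role of the planted half-edge set U
rest = {0, 1, 2, 3}   # plays the role of E_c(Y) \ (E(T) ∪ U)
U_prime = {6}         # plays the role of U' (the set 𝒰 \ U)

bar_Delta = counting_measure(coloring, rest)     # role of Δ̄
bar_Delta_U = counting_measure(coloring, U)      # role of Δ̄_U
bar_h = counting_measure(coloring, U_prime)      # role of h̄

# The measures live on disjoint edge sets, so their totals add up.
assert sum(bar_Delta.values()) + sum(bar_Delta_U.values()) + sum(bar_h.values()) == len(coloring)
assert bar_Delta == Counter({"r": 2, "g": 1, "b": 1})
```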

Then, the next lemma provides a refined estimate on (92), which can be thought of as a planted-cycles analog of Lemma 6.7, [25].

Lemma 5.9

Let \(\mathcal {Y}, \underline{\varvec{\sigma }}_\mathcal {Y}\) be given as above. For any given \(\underline{a}\) with \(||\underline{a}||_\infty \le \log ^2 n\) and for all \(\underline{\varvec{\sigma }}_{\mathscr {U}'}\), we have

(93)

where the terms in the identity can be explained as follows.

(1) \(c_0>0\) is a constant depending only on \(|\mathscr {U}|\).

(2) \(\epsilon (\underline{\varvec{\sigma }}_{\mathcal {Y}})\) is a quantity such that \(|\epsilon (\underline{\varvec{\sigma }}_{\mathcal {Y}})| = O(n^{-1/2} \log ^2n)\), independent of \(\underline{\varvec{\sigma }}_{\mathscr {U}'}\).

(3) \(C_{k,L}>0\) is an integer depending only on k and L, and \(\xi _j = (\xi _j (\tau ))_{\tau \in \dot{\Omega }_L^2}\), \(0\le j\le C_{k,L}\), are fixed vectors on \(\dot{\Omega }_L^2\) satisfying

$$\begin{aligned} ||\xi _j||_\infty = O(n^{-1/2}). \end{aligned}$$
(4) \(\mathbb {P}_T(\mathcal {Y})\) is the conditional probability, given the structure T, that the prescribed half-edges of \(\mathcal {Y}\) are all paired together and assigned the correct literals.

(5) Write \(\dot{H}\equiv \dot{H}^\star _{\lambda ,L} \), and similarly for \(\hat{H}^{\underline{\texttt {L}}}\), \(\bar{H}\). The function \(\beta _T( \mathcal {Y}, \Delta )\) is defined as

    $$\begin{aligned} \beta _T(\mathcal {Y},\Delta ) \equiv \frac{\dot{H}^{\dot{\Delta }_\partial } \prod _{\underline{\texttt {L}}} (\hat{H}^{\underline{\texttt {L}}})^{\hat{\Delta }_\partial ^{\underline{\texttt {L}}}} }{ \bar{H}^{\bar{\Delta } + \bar{\Delta }_U }} \times \prod _{e\in U} \dot{q}^\star _{\lambda ,L} (\varvec{\sigma }_e). \end{aligned}$$

The proof proceeds similarly to that of Proposition 4.1, but requires extra care due to complications caused by the possible intersection between \(\mathcal {Y}\) and T. Due to its technicality, we defer the proof to Sect. A.4 in the appendix.

Based on the expansion obtained from Lemma 5.9, we conclude the proof of Lemma 5.8.

Proof of Lemma 5.8

We work with fixed \(\mathcal {Y}, \underline{\varvec{\sigma }}_\mathcal {Y}\) as in Lemma 5.9. For \(\underline{r}=(\underline{r}^1,\underline{r}^2)\), define the Fourier coefficient of (92) as

(94)

We compare this with the Fourier coefficients

(95)

whose estimates we already saw in (82). In addition, it will be crucial to understand the corresponding expansion as in Lemma 5.9. This was already done in Lemma 6.7 of [25], and we record the result as follows.

Lemma 5.10

(Lemma 6.7, [25]). There exist a constant \(C_{k,L}'>0\) and coefficients \(\xi _j'\equiv (\xi _j'(\varvec{\sigma }))_{\varvec{\sigma }\in \dot{\Omega }_L^2}\) indexed by \(0\le j\le C_{k,L}'\), such that \(||\xi _j'||_\infty = O(n^{-1/2})\) and

(96)

where \(c_0\) is the constant appearing in Lemma 5.9. Moreover, \(C_{k,L}'\) and the coefficients \(\xi _j'\), \(1\le j \le C_{k,L}'\) can be set to be the same as \(C_{k,L}\) and \(\xi _j\) in Lemma 5.9.

The identity (96) follows directly from Lemma 6.7, [25], and the last statement is apparent from the proof of Lemma 5.9 (see Sect. A.4).

Based on Lemma 5.9, we obtain the following bound on the Fourier coefficient (94):

(97)

Moreover, suppose that \(U= \emptyset \), that is, \(\mathcal {Y}\) does not intersect with \(\mathscr {U}\). In this case, we can compare (94) and (95) in the following way, based on Lemmas 5.9 and 5.10:

(98)

Using these observations, we investigate the following formula which can be deduced from (91) by Plancherel’s identity:

(99)

where the Fourier coefficients of \(\varpi \) are given by

$$\begin{aligned} \varpi ^\wedge (\underline{r}^1;\underline{\sigma }_{\mathcal {Y}}^1) \equiv \sum _{\underline{\sigma }_{\mathscr {U}'}^1} \varpi (\underline{\sigma }_{\mathscr {U}'}^1; \underline{\sigma }^1_\mathcal {Y}) \,{{\textbf {b}}}_{\underline{r}^1} (\underline{\sigma }_{\mathscr {U}'}^1)\,{{\textbf {q}}}(\underline{\sigma }_{\mathscr {U}'}^1) .\end{aligned}$$

Define \(\eta (\mathcal {Y})\equiv \eta (\mathcal {Y};T)\equiv |\bar{\Delta }|+|U|-|\dot{\Delta }_\partial |- |\hat{\Delta }_\partial |\), analogously to (56). As before, note that the quantities \(|\bar{\Delta }|, |U|, |\dot{\Delta }_\partial |,\) and \( |\hat{\Delta }_\partial |\) are all well-defined once T and \(\mathcal {Y}\) are given. Observe that

$$\begin{aligned} \#\{ \text {connected components in }\mathcal {Y}\text { disjoint with } \mathscr {U}\} = ||\underline{a}||_1 -\eta (\mathcal {Y}). \end{aligned}$$

The remaining work is done by a case analysis with respect to \(\eta (\mathcal {Y})\).

Case 1. \(\eta (\mathcal {Y})=0\).

In this case, all cycles in \(\mathcal {Y}\) are not only pairwise disjoint, but also disjoint from \(\mathscr {U}\). As we will see below, such \(\mathcal {Y}\) gives the dominant contribution to (99). Recall the events \(\text {{\textbf {T}}}\), \(\text {{\textbf {C}}}^\circ \), \(\text {{\textbf {C}}}_t\) and \(\text {{\textbf {B}}}\) defined in the beginning of Sect. 5.2 and in (84).

On the event \({{\textbf {T}}}^c = \cup _{t\le l_0} {{\textbf {C}}}_t \cup {{\textbf {C}}}^\circ \cup {{\textbf {B}}}\), we can apply the same approach as in the proof of Lemma 5.6 using (97) and obtain that

$$\begin{aligned} \mathbb {E}\left[ \left( {{\textbf {Z}}}_T -\acute{{{\textbf {Z}}}}_T \right) ^2 \mathbb {1}\{\mathcal {Y}, \underline{\varvec{\sigma }}_\mathcal {Y}\} \;; {{\textbf {T}}}^c\right] = O\left( \frac{\log n}{n^{3/2}} \right) \mathbb {E}[({{\textbf {Z}}}')^2] \; \mathbb {P}(\mathcal {Y}|{{\textbf {T}}}^c) \; \beta _T(\mathcal {Y},\Delta ).\end{aligned}$$

On the other hand, on \({{\textbf {T}}}\), \(\varpi ^\wedge (\underline{r}^1) = 0 \) for \(|\{\underline{r}^1 \} |\le 1\), and hence the dominant contribution comes from \(|\{\underline{r} \} | =2\). To control this quantity, we use the estimate (98) and get

$$\begin{aligned} \begin{aligned}&\mathbb {E}\left[ \left( {{\textbf {Z}}}_T -\acute{{{\textbf {Z}}}}_T \right) ^2 \mathbb {1}\{\mathcal {Y}, \underline{\varvec{\sigma }}_\mathcal {Y}\} \;; {{\textbf {T}}}\right] \\ {}&\quad = \mathbb {P}({{\textbf {T}}})\; \mathbb {P}(\mathcal {Y}|{{\textbf {T}}}) \; \beta _T(\mathcal {Y},\Delta ) \left( \mathbb {E}_T \left[ \left( {{\textbf {Z}}}_T -\acute{{{\textbf {Z}}}}_T \right) ^2\right] + O\left( \frac{\log ^{12}n}{n^{3/2}} \right) \mathbb {E}[({{\textbf {Z}}}')^2] \right) .\end{aligned} \end{aligned}$$

If we sum over all \(\underline{\varvec{\sigma }}_\mathcal {Y}\), and then over all \(\mathcal {Y}\) such that \(\eta (\mathcal {Y})=0\), we obtain by following the same computations as (57)–(60) that

$$\begin{aligned} \begin{aligned}&\sum _{\mathcal {Y}: \eta (\mathcal {Y})=0} \sum _{\underline{\varvec{\sigma }}_\mathcal {Y}} \mathbb {E}\left[ \left( {{\textbf {Z}}}_T -\acute{{{\textbf {Z}}}}_T \right) ^2 \mathbb {1}\{\mathcal {Y}, \underline{\varvec{\sigma }}_\mathcal {Y}\} \right] \\ {}&\quad = \left( 1+ O\left( \frac{||\underline{a}||_1^2}{n} \right) \right) \left( \underline{\mu } ( 1+\underline{\delta }_L)^2 \right) ^{\underline{a}} \left( \mathbb {E}_T \left[ \left( {{\textbf {Z}}}_T -\acute{{{\textbf {Z}}}}_T \right) ^2\right] + O\left( \frac{\log ^{12}n}{n^{3/2}} \right) \mathbb {E}[({{\textbf {Z}}}')^2] \right) .\end{aligned} \end{aligned}$$
(100)

Case 2. \(\eta (\mathcal {Y})=1\).

One important observation we make here is that if \(T\in {{\textbf {T}}}\) and \(\eta (\mathcal {Y})=1\), then for any \(\underline{\varvec{\sigma }}_\mathcal {Y}= (\underline{\sigma }_\mathcal {Y}^1,\underline{\sigma }_\mathcal {Y}^2)\), we have

$$\begin{aligned} \kappa ^\wedge (\varnothing ; \underline{\sigma }_\mathcal {Y}^1) = \acute{\kappa }^\wedge (\varnothing ; \underline{\sigma }_\mathcal {Y}^1), \end{aligned}$$

and analogously for the second copy \(\underline{\sigma }_\mathcal {Y}^2\). If \(|U|\le 1\), this is a direct consequence of the results mentioned in the beginning of Sect. 5.1.

On the other hand, suppose that \(|U|=2\). In order to have \(\eta (\mathcal {Y})=1\), the only choice of \(\mathcal {Y}\) is that one cycle in \(\mathcal {Y}\) intersects \(\mathscr {U}\) at two distinct half-edges, while all other cycles in \(\mathcal {Y}\) are disjoint from each other and from \(\mathscr {U}\). In such a case, since the lengths of cycles in \(\mathcal {Y}\) are all at most \(2l_0\), the cycle intersecting \(\mathscr {U}\) cannot intersect A (or \(\acute{A}\)). Therefore, the two half-edges of U are contained in the same tree of T, and hence by symmetry the \(\varnothing \)-th Fourier coefficient does not depend on A (or \(\acute{A}\)).

With this in mind, the \(\varnothing \)-th Fourier coefficient does not contribute to (99), and hence we get

$$\begin{aligned} \mathbb {E}\left[ \left( {{\textbf {Z}}}_T -\acute{{{\textbf {Z}}}}_T \right) ^2 \mathbb {1}\{\mathcal {Y}, \underline{\varvec{\sigma }}_\mathcal {Y}\} \;; {{\textbf {T}}}\right] = O\left( n^{-1/2}\right) \mathbb {E}[({{\textbf {Z}}}')^2] \; \mathbb {P}(\mathcal {Y}| {{\textbf {T}}}) \; \beta _T(\mathcal {Y},\Delta ),\end{aligned}$$

where \(\Delta = \Delta [\underline{\varvec{\sigma }}_\mathcal {Y}]\).

On the event \(\text {{\textbf {T}}}^c\), we can bound it coarsely by

$$\begin{aligned} \begin{aligned} \mathbb {E}\left[ \left( \text {{\textbf {Z}}}_T -\acute{\text {{\textbf {Z}}}}_T \right) ^2 \mathbb {1}\{\mathcal {Y}, \underline{\varvec{\sigma }}_\mathcal {Y}\} \;; \text {{\textbf {T}}}^c\right]&\lesssim _{k,L} \mathbb {P}(\text {{\textbf {T}}}^c) \, \mathbb {E}[(\text {{\textbf {Z}}}')^2] \; \mathbb {P}_T (\mathcal {Y}|\text {{\textbf {T}}}^c) \; \beta _T(\mathcal {Y},\Delta )\\&=O\left( \frac{\log n}{n} \right) \mathbb {E}[(\text {{\textbf {Z}}}')^2] \; \mathbb {P}_T (\mathcal {Y}) \; \beta _T(\mathcal {Y},\Delta ). \end{aligned} \end{aligned}$$

What remains is to sum the above two bounds over \(\underline{\varvec{\sigma }}_\mathcal {Y}\) and over \(\mathcal {Y}\) such that \(\eta ({\mathcal {Y}})=1\). Since at most two cycles of \(\mathcal {Y}\) can fail to be disjoint from all the rest, there exists a constant \(C=C_{k,L,l_0}\) such that

$$\begin{aligned} \sum _{\underline{\varvec{\sigma }}_\mathcal {Y}} \beta _T (\mathcal {Y},\Delta ) \le \left( 1+\underline{\delta }_L \right) ^{2\underline{a}} C^2. \end{aligned}$$
(101)

(see (61)). Then, we can bound the number of choices of \(\mathcal {Y}\) as done in (62) and (64). This gives that

$$\begin{aligned} \sum _{\mathcal {Y}: \eta (\mathcal {Y})=1} \sum _{\underline{\varvec{\sigma }}_\mathcal {Y}} \mathbb {E}\left[ \left( {{\textbf {Z}}}_T -\acute{{{\textbf {Z}}}}_T \right) ^2 \mathbb {1}\{\mathcal {Y}, \underline{\varvec{\sigma }}_\mathcal {Y}\} \right] = O\left( \frac{C^2 l_0 ||\underline{a}||_1 }{n^{3/2}} \right) \left( \underline{\mu } (1+\underline{\delta }_L)^2 \right) ^{\underline{a}} \mathbb {E}[({{\textbf {Z}}}')^2].\end{aligned}$$
(102)

Case 3. \(\eta (\mathcal {Y})\ge 2\).

In this case, the conclusion is deduced relatively straightforwardly, since \(\sum _{\mathcal {Y}} \mathbb {P}_T (\mathcal {Y})\) is sufficiently small. Namely, from (97) we first have the crude bound

$$\begin{aligned} \mathbb {E}\left[ \left( {{\textbf {Z}}}_T -\acute{{{\textbf {Z}}}}_T \right) ^2 \mathbb {1}\{\mathcal {Y}, \underline{\varvec{\sigma }}_\mathcal {Y}\} \right] = O(1) \mathbb {E}[({{\textbf {Z}}}')^2] \; \mathbb {P}_T (\mathcal {Y}) \; \beta _T(\mathcal {Y},\Delta ).\end{aligned}$$

From observations similar to (101), we obtain that

$$\begin{aligned} \sum _{\underline{\varvec{\sigma }}_\mathcal {Y}}\beta _T (\mathcal {Y},\Delta ) \le \left( 1+\underline{\delta }_L \right) ^{2\underline{a} } C^{2\eta }, \end{aligned}$$

where C is as in (101). Further, we control the number of choices of \(\mathcal {Y}\) as before, which gives that

$$\begin{aligned} \sum _{\mathcal {Y}: \eta (\mathcal {Y})=\eta } \sum _{\underline{\varvec{\sigma }}_\mathcal {Y}} \mathbb {E}\left[ \left( {{\textbf {Z}}}_T -\acute{{{\textbf {Z}}}}_T \right) ^2 \mathbb {1}\{\mathcal {Y}, \underline{\varvec{\sigma }}_\mathcal {Y}\} \right] = O\left( \left( \frac{C^2 l_0 ||\underline{a}||_1 }{n}\right) ^\eta \right) \left( \underline{\mu } (1+\underline{\delta }_L)^2 \right) ^{\underline{a}} \mathbb {E}[({{\textbf {Z}}}')^2].\end{aligned}$$
(103)

Combining (100), (102) and (103), we obtain the conclusion.\(\square \)

Having Lemma 5.8 in hand, we are now ready to finish the proof of Lemma 5.3.

Proof of Lemma 5.3

Set \(\tilde{{\delta }}(\zeta ) = (1+\delta _L(\zeta ))^{-2}-1\). Using the identity \((1+\theta )^x = \sum _{a\ge 0} \frac{(x)_a}{a!} \theta ^a\) (which holds for all nonnegative integers x), we can write

$$\begin{aligned} \begin{aligned}&\mathbb {E}\left[ \left( {{\textbf {Z}}}_T -\acute{{{\textbf {Z}}}}_T \right) ^2 (1+\underline{\delta }_L)^{-2\underline{X}^\partial } \mathbb {1}\{||\underline{X}^\partial ||_\infty \le \log n \} \right] \\ {}&\quad = \sum _{||\underline{a}||_\infty \le \log n} \frac{1}{\underline{a}!} \mathbb {E}\left[ \left( {{\textbf {Z}}}_T -\acute{{{\textbf {Z}}}}_T \right) ^2 \tilde{\underline{\delta }}^{\underline{a}} (\underline{X}^\partial )_{\underline{a}} \right] + n^{-\Omega (\log \log n)} \mathbb {E}[ ({{\textbf {Z}}}')^2],\end{aligned} \end{aligned}$$

where we used Corollary 5.1 to obtain the error term on the rhs. Also note that \((\underline{X}^\partial )_{\underline{a}}=0\) if \(||\underline{a}||_\infty >\log n\) and \(||\underline{X}^\partial ||_\infty \le \log n\). Therefore, by applying Lemma 5.8, we see that the above equals

$$\begin{aligned} \left( 1+ O\left( \frac{\log ^2 n}{n} \right) \right) \mathbb {E}\left[ \left( {{\textbf {Z}}}_T -\acute{{{\textbf {Z}}}}_T \right) ^2\right] \sum _{||\underline{a}||_\infty \le \log n} \frac{1}{\underline{a}!} \left( \tilde{\underline{\delta }} \underline{\mu } (1+\underline{\delta }_L)^2 \right) ^{\underline{a}} + O\left( \frac{\log ^{12}n }{n^{3/2}} \right) \mathbb {E}[({{\textbf {Z}}}')^2],\end{aligned}$$

and from here the conclusion follows directly by performing the summation. \(\square \)
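The expansion identity \((1+\theta )^x = \sum _{a\ge 0} \frac{(x)_a}{a!} \theta ^a\) invoked at the start of the proof is the binomial theorem in disguise: \((x)_a/a!=\binom{x}{a}\), and the falling factorial \((x)_a\) vanishes for \(a>x\) when x is a nonnegative integer. A quick numeric check:

```python
import math

def falling(x, a):
    """Falling factorial (x)_a = x(x-1)...(x-a+1); zero once a > x for integer x >= 0."""
    out = 1
    for i in range(a):
        out *= x - i
    return out

def expansion(theta, x, terms=50):
    """Right-hand side sum; terms with a > x vanish since (x)_a = 0."""
    return sum(falling(x, a) / math.factorial(a) * theta**a for a in range(terms))

for x in range(6):
    for theta in (0.3, -0.4, 2.0):
        assert abs(expansion(theta, x) - (1 + theta)**x) < 1e-9
```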

6 Small Subgraph Conditioning and the Proof of Theorem 3.1

In this section, we prove Theorem 3.1 by the small subgraph conditioning method. To do so, we first derive condition (d) of Theorem 3.3 for the truncated model, and then deduce the analogue for the untruncated model based on the continuity of the coefficients, which was proved in [37, Lemma 4.17].

Proposition 6.1

Let \(L>0\) and \(\lambda \in (0,\lambda ^\star _L)\) be given. Moreover, set \(\mu (\zeta ), \delta _L(\zeta ;\lambda )\) as in Proposition 4.1. Recalling \(\widetilde{{{\textbf {Z}}}}_{\lambda }^{(L),\text{ tr }}\) defined in (40), we have

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{\mathbb {E}\big (\widetilde{{{\textbf {Z}}}}^{(L),\text{ tr }}_{\lambda }\big )^{2} }{\big (\mathbb {E}\widetilde{{{\textbf {Z}}}}_{\lambda }^{(L),\text{ tr }}\big )^2 }= \exp \bigg ( \sum _{\zeta } \mu (\zeta ) \delta _L(\zeta ;\lambda )^2 \bigg ).\end{aligned}$$
(104)

Proof

Fix \(\lambda <\lambda ^\star _L\) and abbreviate \(\delta _L(\zeta )\equiv \delta _L(\zeta ;\lambda )\) as before. We first show that the lhs is lower bounded by the rhs in (104). Let \(\underline{X} = (X(\zeta ))_\zeta \) be the numbers of \(\zeta \)-cycles in \(\mathscr {G}\). For an integer \(l_0>0\), we write \(\underline{X}_{\le l_0}=(X(\zeta ))_{||\zeta ||\le l_0}\) (note the difference from the notations used in the previous subsections). Note that Proposition 4.1-(1) gives us that the limiting law of \(\underline{X}_{\le l_0}\) reweighted by \(\widetilde{{{\textbf {Z}}}}^{(L),\text{ tr }}_{\lambda }\) must be that of independent \(\text{ Pois }\big (\mu (\zeta )(1+\delta _L(\zeta ))\big )\) random variables, since the moments of falling factorials are given by (46). Namely, for a given collection of integers \(\underline{x}_{\le l_0} = (x(\zeta ))_{||\zeta || \le l_0}\), we have

$$\begin{aligned} \lim _{n\rightarrow \infty } \frac{\mathbb {E}\big [ \widetilde{{{\textbf {Z}}}}^{(L),\text{ tr }}_{\lambda } \mathbb {1} \{\underline{X}_{\le l_0} = \underline{x}_{\le l_0} \}\big ] }{ \mathbb {E}\widetilde{{{\textbf {Z}}}}^{(L),\text{ tr }}_{\lambda } } = \prod _{||\zeta || \le l_0 } \mathbb {P}\bigg ( \text{ Pois }\Big (\mu (\zeta )\big (1+\delta _L(\zeta )\big )\Big ) = x(\zeta ) \bigg ).\end{aligned}$$

Recall that the unweighted \(\underline{X}_{\le l_0}\) has the limiting law given by (37). Thus, we have

$$\begin{aligned} \lim _{n\rightarrow \infty } \frac{\mathbb {E}\big [ \widetilde{{{\textbf {Z}}}}^{(L),\text{ tr }}_{\lambda } \, \big | \, \underline{X}_{\le l_0} = \underline{x}_{\le l_0} \big ]}{ \mathbb {E}\widetilde{{{\textbf {Z}}}}^{(L),\text{ tr }}_{\lambda } } = \prod _{||\zeta ||\le l_0 } (1+\delta _L(\zeta ))^{x(\zeta )} e^{-\mu (\zeta )\delta _L(\zeta )}, \end{aligned}$$
(105)

for any \(\underline{x}_{\le l_0} = (x(\zeta ))_{||\zeta || \le l_0}\). Thus, by Fatou’s Lemma, we have

$$\begin{aligned} \liminf _{n\rightarrow \infty } \frac{\mathbb {E}\left[ \mathbb {E}\big [\widetilde{{{\textbf {Z}}}}^{(L),\text{ tr }}_{\lambda } \,\big |\, \underline{X}_{\le l_0}\big ]^{2} \right] }{ \big (\mathbb {E}\widetilde{{{\textbf {Z}}}}^{(L),\text{ tr }}_{\lambda }\big )^2 }\ge \exp \bigg ( \sum _{||\zeta ||\le l_0} \mu (\zeta ) \delta _L(\zeta )^2 \bigg )\end{aligned}$$
(106)

Since this holds for any \(l_0\), we conclude that the lhs of (104) is lower bounded by the rhs.
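The mechanism behind (105) is exponential tilting of a Poisson law: reweighting \(X\sim \text{ Pois }(\mu )\) by \((1+\delta )^{X}\) produces the law \(\text{ Pois }(\mu (1+\delta ))\), with normalizing constant \(\mathbb {E}[(1+\delta )^{X}]=e^{\mu \delta }\). A numeric sketch with hypothetical \(\mu ,\delta \) (the paper's actual \(\mu (\zeta ),\delta _L(\zeta )\) are not used here):

```python
import math

def pois_pmf(mu, x):
    """Poisson(mu) probability mass at x."""
    return math.exp(-mu) * mu**x / math.factorial(x)

mu, delta = 1.3, 0.7
# Normalizer: E[(1+delta)^X] = e^{mu*delta} (truncated sum; the tail is negligible).
norm = sum((1 + delta)**x * pois_pmf(mu, x) for x in range(80))
assert abs(norm - math.exp(mu * delta)) < 1e-9

# The tilted pmf coincides pointwise with the Pois(mu*(1+delta)) pmf.
for x in range(20):
    tilted = (1 + delta)**x * pois_pmf(mu, x) / norm
    assert abs(tilted - pois_pmf(mu * (1 + delta), x)) < 1e-12
```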

To work with the upper bound, recall the definition of the rescaled partition function \({{\textbf {Y}}}_{l_0} \equiv {{\textbf {Y}}}_{\lambda ,l_0}^{(L)}\) in (40). For any \(\varepsilon >0\), Proposition 3.4 implies that there exists \(l_0(\varepsilon )>0\) such that for \(l_0\ge l_0(\varepsilon )\),

$$\begin{aligned} \lim _{n\rightarrow \infty } \frac{\mathbb {E}{{\textbf {Y}}}_{l_0}^2}{ (\mathbb {E}{{\textbf {Y}}}_{l_0})^2 } \le 1+\varepsilon . \end{aligned}$$
(107)

On the other hand, we make the following observations, which are consequences of Corollaries 4.5, 5.1 and Proposition 4.1:

$$\begin{aligned} \begin{aligned} \mathbb {E}{{\textbf {Y}}}_{l_0}&= (1+o(1)) \mathbb {E}\widetilde{{{\textbf {Z}}}}^{(L),\text{ tr }}_{\lambda } \exp \bigg (-\sum _{||\zeta ||\le l_0} \mu (\zeta ) \delta _L(\zeta ) \bigg )\,,\\ \mathbb {E}{{\textbf {Y}}}_{l_0}^2&= (1+o(1)) \mathbb {E}\big (\widetilde{{{\textbf {Z}}}}^{(L),\text{ tr }}_{\lambda }\big )^2 \exp \bigg (-\sum _{||\zeta ||\le l_0} \mu (\zeta ) \left( 2\delta _L(\zeta )+\delta _L(\zeta )^2 \right) \bigg )\,.\end{aligned} \end{aligned}$$
(108)

We briefly explain how to obtain (108). First, note that it suffices to estimate \(\mathbb {E}[{{\textbf {Y}}}_{l_0} \mathbb {1}_{\{||\underline{X} ||_\infty \le \log n \} }] \) due to Corollary 5.1. Then, we expand the rescaling factor of \({{\textbf {Y}}}_{l_0}\) by falling factorials using the formula (89). Each correlation term \(\mathbb {E}[{{\textbf {Z}}}^{(L),\text{ tr }}_{\lambda } (\underline{X})_{\underline{a}}\mathbb {1}_{\{||\underline{X} ||_\infty \le \log n \} }]\) can then be studied based on Proposition 4.1 and Corollary 4.5. We can investigate the second moment of \({{\textbf {Y}}}_{l_0}\) analogously.
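The first line of (108) can be sanity-checked one cycle type at a time: under the reweighted measure, each cycle count is asymptotically \(\text{ Pois }(\mu (1+\delta ))\), and if, as the form of (108) suggests, the rescaling factor of \({{\textbf {Y}}}_{l_0}\) contributes \((1+\delta )^{-X(\zeta )}\) per cycle type, then \(\mathbb {E}[(1+\delta )^{-X}] = \exp \big (\mu (1+\delta )\big ((1+\delta )^{-1}-1\big )\big ) = e^{-\mu \delta }\). A numeric check with hypothetical parameters:

```python
import math

def pois_pmf(nu, x):
    """Poisson(nu) probability mass at x."""
    return math.exp(-nu) * nu**x / math.factorial(x)

mu, delta = 0.9, 0.6
nu = mu * (1 + delta)  # mean of the reweighted cycle count
# Truncated sum for E[(1+delta)^{-X}] with X ~ Pois(nu); the tail is negligible.
mean_rescale = sum((1 + delta)**(-x) * pois_pmf(nu, x) for x in range(80))
assert abs(mean_rescale - math.exp(-mu * delta)) < 1e-12
```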

Combining (107) with (108), namely dividing the second line of (108) by the square of the first, shows

$$\begin{aligned} \lim _{n\rightarrow \infty } \frac{\mathbb {E}\big (\widetilde{{{\textbf {Z}}}}^{(L),\text{ tr }}_{\lambda }\big )^2 }{\big (\mathbb {E}\widetilde{{{\textbf {Z}}}}^{(L),\text{ tr }}_{\lambda }\big )^2 } \le (1+\varepsilon ) \exp \left( \sum _{||\zeta ||\le l_0} \mu (\zeta ) \delta _L(\zeta )^2 \right) ,\end{aligned}$$

which holds for all \(l_0 \ge l_0(\varepsilon )\) and \(\varepsilon >0\). Therefore, letting \(l_0\rightarrow \infty \) and \(\varepsilon \rightarrow 0\) gives the conclusion. \(\square \)

The next step is to deduce the analogue of Proposition 6.1 for the untruncated model. To do so, we first review the following notions from [37]: for coloring configurations \(\underline{\sigma }^{1},\underline{\sigma }^{2}\in \Omega ^{E}\), let \(x^{1}\in \{0,1,{\texttt {f}}\}^{V}\) (resp. \(x^{2}\in \{0,1,{\texttt {f}}\}^{V}\)) be the frozen configuration corresponding to \(\underline{\sigma }^{1}\) (resp. \(\underline{\sigma }^{2}\)) via Lemma 2.7 and (14). Then, define the overlap \(\rho (\underline{\sigma }^{1},\underline{\sigma }^{2})\) of \(\underline{\sigma }^1\) and \(\underline{\sigma }^2\) by

$$\begin{aligned} \rho (\underline{\sigma }^{1},\underline{\sigma }^{2}):=\frac{1}{n}\sum _{i=1}^{n}\mathbb {1}\{x^{1}_i\ne x^{2}_i\}. \end{aligned}$$
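Concretely, \(\rho \) is simply the fraction of coordinates on which the two frozen configurations disagree. A minimal sketch with hypothetical configurations in \(\{0,1,\texttt {f}\}^{V}\):

```python
def overlap(x1, x2):
    """rho(sigma^1, sigma^2): fraction of coordinates on which the frozen
    configurations x^1, x^2 in {0, 1, 'f'}^V disagree."""
    assert len(x1) == len(x2)
    return sum(a != b for a, b in zip(x1, x2)) / len(x1)

# Hypothetical frozen configurations on n = 8 variables.
x1 = [0, 1, 'f', 0, 1, 'f', 0, 1]
x2 = [0, 0, 'f', 1, 1, 'f', 1, 0]
assert overlap(x1, x2) == 0.5  # disagreements at coordinates 1, 3, 6, 7
```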

Then, for \(\lambda \in [0,1]\) and \(s\in [0,\log 2)\), denote by \({{\textbf {Z}}}^{2}_{\lambda ,s,\text{ ind }}\) the contribution to \((\text {{\textbf {Z}}}^{\text {tr}}_{\lambda ,s})^{2}\) from the near-independence regime \(|\rho (\underline{\sigma }^{1},\underline{\sigma }^{2})-\frac{1}{2}|< k^{2}2^{-k/2}\):

$$\begin{aligned} {{\textbf {Z}}}^{2}_{\lambda ,s,\text{ ind }}&:=\sum _{\underline{\sigma }^{1},\underline{\sigma }^{2}\in \Omega ^{E}}w_{\mathscr {G}}^{\text{ lit }}(\underline{\sigma }^{1})^{\lambda } w_{\mathscr {G}}^{\text{ lit }}(\underline{\sigma }^{2})^{\lambda }\mathbb {1}\Big \{w_{\mathscr {G}}^{\text{ lit }}(\underline{\sigma }^{1}),w_{\mathscr {G}}^{\text{ lit }}(\underline{\sigma }^{2})\in [e^{ns},e^{ns+1})\,,\,\\&\Big |\rho (\underline{\sigma }^1,\underline{\sigma }^2)-\frac{1}{2}\Big |< \frac{k^2}{2^{k/2}}\Big \}. \end{aligned}$$

Similarly, we denote by \({{\textbf {Z}}}^{2}_{\lambda ,\text{ ind }}\), \({{\textbf {Z}}}^{2,(L)}_{\lambda ,\text{ ind }}\) and \({{\textbf {Z}}}^{2,(L)}_{\lambda ,s,\text{ ind }}\) the respective contributions to \(({{\textbf {Z}}}^{\text{ tr }}_{\lambda })^{2}\), \(({{\textbf {Z}}}^{(L),\text{ tr }}_{\lambda })^2\) and \(({{\textbf {Z}}}^{(L),\text{ tr }}_{\lambda ,s})^{2}\) from the near-independence regime \(|\rho (\underline{\sigma }^{1},\underline{\sigma }^{2})-\frac{1}{2}|< k^{2}2^{-k/2}\).

Proposition 6.2

Let \(\mu (\zeta )\) and \(\delta (\zeta ;\lambda ^\star )\) be the constants from Proposition 4.1. Then, for \((s_n)_{n \ge 1}\) converging to \(s^\star \) with \(|s_n-s^\star |\le n^{-2/3}\), we have

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{\mathbb {E}{{\textbf {Z}}}^{2}_{\lambda ^\star ,s_n,\text{ ind }}}{\big (\mathbb {E}{{\textbf {Z}}}^{\text{ tr }}_{\lambda ^\star ,s_n}\big )^{2}} = \exp \bigg ( \sum _{\zeta } \mu (\zeta ) \delta (\zeta ;\lambda ^\star )^2 \bigg ). \end{aligned}$$

Proof

Note that for \(\lambda <\lambda ^\star _L\), Proposition 4.20 of [37] shows that the contribution to \(\mathbb {E}\big ( \widetilde{{{\textbf {Z}}}}_{\lambda }^{(L),\text{ tr }}\big )^{2}\) from the correlated regime \(|\rho (\underline{\sigma }^{1},\underline{\sigma }^{2})-\frac{1}{2}|\ge k^{2}2^{-k/2}\) is negligible compared to the near-independence regime. Also, since \(\widetilde{{{\textbf {Z}}}}^{(L),\text{ tr }}_{\lambda }\) is defined to be the contribution to \({{\textbf {Z}}}^{(L),\text{ tr }}_{\lambda }\) from \(||H-H^\star _{\lambda ,L}||_1 \le n^{-1/2}\log ^{2}n\), Proposition 3.10 of [43] shows that the contribution to \(\mathbb {E}\big ( \widetilde{{{\textbf {Z}}}}_{\lambda }^{(L),\text{ tr }}\big )^{2}\) from the near-independence regime is \(\big (1-o(1)\big )\mathbb {E}{{\textbf {Z}}}^{2,(L)}_{\lambda ,\text{ ind }}\). Similarly, Proposition 3.4 of [43] shows \(\mathbb {E}\widetilde{{{\textbf {Z}}}}_{\lambda }^{(L),\text{ tr }}= \big (1-o(1)\big )\mathbb {E}{{\textbf {Z}}}_{\lambda }^{(L),\text{ tr }}\). Therefore, for \(\lambda <\lambda ^\star _L\), we have

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{\mathbb {E}{{\textbf {Z}}}^{2,(L)}_{\lambda ,\text{ ind }} }{\big (\mathbb {E}{{\textbf {Z}}}_{\lambda }^{(L),\text{ tr }}\big )^2 }= \lim _{n\rightarrow \infty }\frac{\mathbb {E}\big (\widetilde{{{\textbf {Z}}}}^{(L),\text{ tr }}_{\lambda }\big )^{2} }{\big (\mathbb {E}\widetilde{{{\textbf {Z}}}}_{\lambda }^{(L),\text{ tr }}\big )^2 }= \exp \bigg ( \sum _{\zeta } \mu (\zeta ) \delta _L(\zeta ;\lambda )^2 \bigg ), \end{aligned}$$
(109)

where the last equality is from Proposition 6.1. By Theorem 3.21, Proposition 4.15 and Proposition 4.18 in [37], we can send \(L\rightarrow \infty \) and \(\lambda \nearrow \lambda ^\star \) to obtain

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{\mathbb {E}{{\textbf {Z}}}^2_{\lambda ^\star ,\text{ ind }}}{ \big (\mathbb {E}{{\textbf {Z}}}^\text{ tr}_{\lambda ^\star }\big )^2 } = \lim _{\lambda \nearrow \lambda ^\star }\lim _{L\rightarrow \infty } \lim _{n\rightarrow \infty }\frac{\mathbb {E}{{\textbf {Z}}}^{2,(L)}_{\lambda ,\text{ ind }} }{\big (\mathbb {E}{{\textbf {Z}}}_{\lambda }^{(L),\text{ tr }}\big )^2 }= \exp \left( \sum _{\zeta } \mu (\zeta ) \delta (\zeta ;\lambda ^\star )^2 \right) , \end{aligned}$$
(110)

where in the last equality, we used (109) and Proposition 4.1-(4). Finally, Lemma 4.17 and Proposition 4.19 of [37] show that the lhs of the equation above equals \(\lim _{n\rightarrow \infty }\frac{\mathbb {E}{{\textbf {Z}}}^{2}_{\lambda ^\star ,s_n,\text{ ind }}}{\big (\mathbb {E}{{\textbf {Z}}}^{\text{ tr }}_{\lambda ^\star ,s_n}\big )^{2}}\), so (110) concludes the proof. \(\square \)

Corollary 6.3

Let \(\underline{X}_{\le l_0}=(X(\zeta ))_{||\zeta ||\le l_0}\) be the collection of the numbers of \(\zeta \)-cycles in \(\mathscr {G}\) with size \(||\zeta ||\le l_0\). Denote \(s_\circ (C)\equiv s^\star -\frac{\log n}{2\lambda ^\star n}-\frac{C}{n}\). Recalling the definition of \(\widetilde{{{\textbf {Z}}}}^{\text{ tr }}_{\lambda ,s}\) from (48), we have

$$\begin{aligned} \lim _{C\rightarrow \infty } \limsup _{l_0\rightarrow \infty }\limsup _{n\rightarrow \infty } \frac{\mathbb {E}\Big [{{\,\text {Var}\,}}\big (\widetilde{{{\textbf {Z}}}}^{\text{ tr }}_{\lambda ^\star ,s_{\circ }(C)} \,\big |\,\underline{X}_{\le l_0}\big )\Big ]}{\big (\mathbb {E}\widetilde{{{\textbf {Z}}}}^{\text{ tr }}_{\lambda ^\star , s_{\circ }(C)}\big )^{2}}=0. \end{aligned}$$
(111)

Proof

Proceeding in the same fashion as for (106) in the proof of Proposition 6.1, Proposition 4.1-(3) shows

$$\begin{aligned} \liminf _{n\rightarrow \infty }\frac{\mathbb {E}\left[ \mathbb {E}\big [\widetilde{{{\textbf {Z}}}}^{\text{ tr }}_{\lambda ^\star ,s_\circ (C)} \,\big |\, \underline{X}_{\le l_0}\big ]^{2} \right] }{ \big (\mathbb {E}\widetilde{{{\textbf {Z}}}}^{\text{ tr }}_{\lambda ^\star ,s_\circ (C)}\big )^2 }\ge \exp \bigg ( \sum _{||\zeta ||\le l_0} \mu (\zeta ) \delta (\zeta ;\lambda ^\star )^2 \bigg ). \end{aligned}$$
(112)

Next, we aim to find a matching upper bound for \(\frac{\mathbb {E}\big (\widetilde{{{\textbf {Z}}}}^{\text{ tr }}_{\lambda ^\star ,s_{\circ }(C)}\big )^{2}}{\big (\mathbb {E}\widetilde{{{\textbf {Z}}}}^{\text{ tr }}_{\lambda ^\star ,s_{\circ }(C)}\big )^{2}}\). Note that Proposition 4.20 of [37] shows that the contribution to \(\mathbb {E}\big (\widetilde{{{\textbf {Z}}}}^{\text{ tr }}_{\lambda ^\star ,s_{\circ }(C)}\big )^{2}\) from the correlated regime \(|\rho (\underline{\sigma }^{1},\underline{\sigma }^{2})-\frac{1}{2}|\ge k^{2}2^{-k/2}\) is \(\lesssim _{k}e^{2n\lambda ^\star s_{\circ }(C)}\mathbb {E}{{\textbf {N}}}_{s_{\circ }(C)}+e^{-\Omega _{k}(n)}\). Thus, we have

$$\begin{aligned} \mathbb {E}\big (\widetilde{{{\textbf {Z}}}}^{\text{ tr }}_{\lambda ^\star ,s_{\circ }(C)}\big )^{2}\le C_k e^{2n\lambda ^\star s_{\circ }(C)}e^{\lambda ^\star C}+\mathbb {E}{{\textbf {Z}}}^{2}_{\lambda ^\star ,s_\circ (C),\text{ ind }}\le \Big (1+C_k^\prime e^{-\lambda ^\star C}\Big )\mathbb {E}{{\textbf {Z}}}^{2}_{\lambda ^\star ,s_\circ (C),\text{ ind }}, \end{aligned}$$
(113)

where \(C_k\) and \(C_k^\prime \) are constants depending only on k, and the last inequality holds because of Proposition 4.16 in [37]. Moreover, by Proposition 3.17 in [37], \(\mathbb {E}\widetilde{{{\textbf {Z}}}}^{\text{ tr }}_{\lambda ^\star ,s_{\circ }(C)}=\big (1-o(1)\big )\mathbb {E}{{\textbf {Z}}}^{\text{ tr }}_{\lambda ^\star ,s_{\circ }(C)}\) holds. Thus, combining (113) and Proposition 6.2, we have

$$\begin{aligned} \limsup _{C\rightarrow \infty }\limsup _{n\rightarrow \infty } \frac{\mathbb {E}\big (\widetilde{{{\textbf {Z}}}}^{\text{ tr }}_{\lambda ^\star ,s_{\circ }(C)}\big )^{2}}{\big (\mathbb {E}\widetilde{{{\textbf {Z}}}}^{\text{ tr }}_{\lambda ^\star ,s_{\circ }(C)}\big )^{2}}\le \exp \bigg ( \sum _{\zeta } \mu (\zeta ) \delta (\zeta ;\lambda ^\star )^2 \bigg ). \end{aligned}$$
(114)

Therefore, (112), (114), and Lemma 4.6 conclude the proof. \(\square \)

Proof of Theorem 3.1

Fix \(\varepsilon >0\). Having Corollary 6.3 in mind, for \(\delta>0, C>0\) and \(l_0\in \mathbb {N}\), we bound

$$\begin{aligned}&\mathbb {P}\bigg (\frac{\widetilde{{{\textbf {Z}}}}^{\text{ tr }}_{\lambda ^\star ,s_{\circ }(C)}}{\mathbb {E}\widetilde{{{\textbf {Z}}}}^{\text{ tr }}_{\lambda ^\star ,s_{\circ }(C)}}\le \delta \bigg )\le \mathbb {P}\bigg (\bigg |\frac{\widetilde{{{\textbf {Z}}}}^{\text{ tr }}_{\lambda ^\star ,s_{\circ }(C)}}{\mathbb {E}\widetilde{{{\textbf {Z}}}}^{\text{ tr }}_{\lambda ^\star ,s_{\circ }(C)}}-\frac{\mathbb {E}\big [\widetilde{{{\textbf {Z}}}}^{\text{ tr }}_{\lambda ^\star ,s_{\circ }(C)}\,\big |\, \underline{X}_{\le l_0}\big ]}{\mathbb {E}\widetilde{{{\textbf {Z}}}}^{\text{ tr }}_{\lambda ^\star ,s_{\circ }(C)}}\bigg |\ge \delta \bigg )\nonumber \\&+\mathbb {P}\bigg (\frac{\mathbb {E}\big [\widetilde{{{\textbf {Z}}}}^{\text{ tr }}_{\lambda ^\star ,s_{\circ }(C)}\,\big |\, \underline{X}_{\le l_0}\big ]}{\mathbb {E}\widetilde{{{\textbf {Z}}}}^{\text{ tr }}_{\lambda ^\star ,s_{\circ }(C)}}\le 2\delta \bigg ). \end{aligned}$$
(115)

We first control the second term of the rhs of (115): Proposition 4.1-(3) shows (cf. (105))

$$\begin{aligned} \frac{\mathbb {E}\big [\widetilde{{{\textbf {Z}}}}^{\text{ tr }}_{\lambda ^\star ,s_{\circ }(C)}\,\big |\, \underline{X}_{\le l_0}\big ]}{\mathbb {E}\widetilde{{{\textbf {Z}}}}^{\text{ tr }}_{\lambda ^\star ,s_{\circ }(C)}}\; {\overset{\text{ d }}{\longrightarrow }}\; W_{l_0}:=\prod _{||\zeta ||\le l_0}\Big (1+\delta (\zeta ;\lambda ^\star )\Big )^{\bar{X}(\zeta )}e^{-\mu (\zeta )\delta (\zeta )}, \end{aligned}$$

where \(\{\bar{X}(\zeta )\}_{\zeta }\) are independent Poisson random variables with means \(\{\mu (\zeta )\}_\zeta \). Moreover, we have

$$\begin{aligned} W_{l_0} \; {\overset{\text {a.s.}}{\longrightarrow }}\; W:=\prod _{\zeta }\Big (1+\delta (\zeta ;\lambda ^\star )\Big )^{\bar{X}(\zeta )}e^{-\mu (\zeta )\delta (\zeta )}\quad \text {and}\quad W>0\quad \text {a.s.}, \end{aligned}$$

where the infinite product in W is well defined a.s. due to Lemma 4.6 (see Theorem 9.13 of [29] for a proof). Thus, for small enough \(\delta \equiv \delta _{\varepsilon }\), which does not depend on C, and for large enough \(l_0\ge l_0(\varepsilon )\), we have

$$\begin{aligned} \limsup _{n\rightarrow \infty }\mathbb {P}\bigg (\frac{\mathbb {E}\big [\widetilde{{{\textbf {Z}}}}^{\text{ tr }}_{\lambda ^\star ,s_{\circ }(C)}\,\big |\, \underline{X}_{\le l_0}\big ]}{\mathbb {E}\widetilde{{{\textbf {Z}}}}^{\text{ tr }}_{\lambda ^\star ,s_{\circ }(C)}}\le 2\delta _{\varepsilon } \bigg )\le \frac{\varepsilon }{2}. \end{aligned}$$
(116)

We now turn to the first term of the rhs of (115). By Chebyshev’s inequality and Corollary 6.3, for large enough \(C\ge C_{\varepsilon }\) and \(l_0\ge l_0(\varepsilon )\), we have

$$\begin{aligned}&\limsup _{n\rightarrow \infty } \mathbb {P}\bigg (\bigg |\frac{\widetilde{{{\textbf {Z}}}}^{\text{ tr }}_{\lambda ^\star ,s_{\circ }(C)}}{\mathbb {E}\widetilde{{{\textbf {Z}}}}^{\text{ tr }}_{\lambda ^\star ,s_{\circ }(C)}}-\frac{\mathbb {E}\big [\widetilde{{{\textbf {Z}}}}^{\text{ tr }}_{\lambda ^\star ,s_{\circ }(C)}\,\big |\, \underline{X}_{\le l_0}\big ]}{\mathbb {E}\widetilde{{{\textbf {Z}}}}^{\text{ tr }}_{\lambda ^\star ,s_{\circ }(C)}}\bigg |\ge \delta _{\varepsilon } \bigg )\nonumber \\ {}&\quad \le (\delta _{\varepsilon })^{-2}\limsup _{n\rightarrow \infty }\frac{\mathbb {E}\Big [{{\,\text {Var}\,}}\big (\widetilde{{{\textbf {Z}}}}^{\text{ tr }}_{\lambda ^\star ,s_{\circ }(C)} \,\big |\,\underline{X}_{\le l_0}\big )\Big ]}{\big (\mathbb {E}\widetilde{{{\textbf {Z}}}}^{\text{ tr }}_{\lambda ^\star , s_{\circ }(C)}\big )^{2}}\le \frac{\varepsilon }{2}. \end{aligned}$$
(117)

Therefore, by (115), (116) and (117), for \(\delta \equiv \delta _{\varepsilon }\) and \(C\ge C_{\varepsilon }\), we have

$$\begin{aligned} \limsup _{n\rightarrow \infty } \mathbb {P}\bigg (\frac{\widetilde{{{\textbf {Z}}}}^{\text{ tr }}_{\lambda ^\star ,s_{\circ }(C)}}{\mathbb {E}\widetilde{{{\textbf {Z}}}}^{\text{ tr }}_{\lambda ^\star ,s_{\circ }(C)}}\le \delta \bigg )\le \varepsilon . \end{aligned}$$
(118)

Since \(\mathbb {E}\widetilde{{{\textbf {Z}}}}^{\text{ tr }}_{\lambda ^\star ,s_{\circ }(C)}=\big (1-o(1)\big )\mathbb {E}{{\textbf {Z}}}^{\text{ tr }}_{\lambda ^\star ,s_{\circ }(C)}\) holds by Proposition 3.17 of [37] and \({{\textbf {Z}}}^{\text{ tr }}_{\lambda ^\star ,s}\asymp e^{n\lambda ^\star s}{{\textbf {N}}}^{\text{ tr }}_{s}\) holds by definition, (118) concludes the proof. \(\square \)