
Identifying Overly Restrictive Matching Patterns in SMT-based Program Verifiers (Extended Version)

Published: 24 June 2023


Abstract

Universal quantifiers occur frequently in proof obligations produced by program verifiers, for instance, to axiomatize uninterpreted functions and to statically express properties of arrays. SMT-based verifiers typically reason about them via E-matching, an SMT algorithm that requires syntactic matching patterns to guide the quantifier instantiations. Devising good matching patterns is challenging. In particular, overly restrictive patterns may lead to spurious verification errors if the quantifiers needed for a proof are not instantiated; they may also conceal unsoundness caused by inconsistent axiomatizations. In this article, we present the first technique that identifies and helps the users and the developers of program verifiers remedy the effects of overly restrictive matching patterns. We designed a novel algorithm to synthesize missing triggering terms required to complete unsatisfiability proofs via E-matching. Tool developers can use this information to refine their matching patterns and prevent similar verification errors, or to fix a detected unsoundness.


1 INTRODUCTION

Proof obligations frequently contain universal quantifiers, both in the specification and to encode the semantics of the programming language. Most deductive verifiers [10, 13, 16, 22, 26, 32, 50] rely on SMT solvers to discharge their proof obligations via E-matching [25]. This SMT algorithm requires syntactic matching patterns of ground terms (called patterns in the following) to control the instantiations of the quantifiers. For example, the pattern \(\lbrace \mathtt {f}(x, y)\rbrace\) in the formula \(\forall x:\mathrm{Int}, y:\mathrm{Int} :: \lbrace \mathtt {f}(x, y) \rbrace \; (x = y) \wedge \lnot \mathtt {f}(x, y)\) instructs the solver to instantiate the quantifier only when it finds a triggering term that matches the pattern, e.g., \(\mathtt {f}(7, z)\), where \(\mathtt {f}\) is an uninterpreted function and z is a free integer variable. The patterns can be written manually or inferred automatically by the solver or the verifier. However, devising them is challenging [33, 37]. Overly permissive patterns may lead to unnecessary instantiations that slow down verification or even cause non-termination (if each instantiation produces a new triggering term, in a so-called matching loop [25]). Overly restrictive patterns may prevent the instantiations needed to complete a proof; they cause two major problems in program verification: incompleteness and undetected unsoundness.

Incompleteness. Overly restrictive patterns may cause spurious verification errors when the proofs of valid proof obligations fail. Figure 1 illustrates this case. The integer x represents the address of a node, and the uninterpreted functions len and nxt encode operations on linked lists. The axiom defines len: its result is positive, the last node points to itself, and any added node increases the length of the list by one. The assertion directly follows from the axiom, yet the proof fails, as the proof obligation generated by the verifier for the assert statement does not contain any triggering term that matches the pattern {len(nxt(x))}. Thus, the axiom does not get instantiated. However, realistic proof obligations often contain hundreds of quantifiers [5], making manual identification of missing triggering terms extremely difficult.

Unsoundness. Most of the universal quantifiers in proof obligations appear in axioms over uninterpreted functions (to encode type information, heap models, datatypes, etc.). To obtain sound results, these axioms must be consistent (i.e., satisfiable); otherwise, all the proof obligations hold trivially. Consistency can be proved once and for all by showing the existence of a model that satisfies all the axioms, as part of the soundness proof of the verification technique. However, this solution is difficult to apply for verifiers that generate axioms dynamically, depending on the program to be verified. Proving consistency then requires verifying the algorithm that generates the axioms for all possible inputs and needs to consider many subtle issues [23, 34, 45].

Fig. 1.

Fig. 1. Example (written in Boogie [15]) that leads to a spurious verification error. The assertion follows from the axiom, but the axiom does not get instantiated without the triggering term len(nxt(7)).
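The spurious error can be reproduced directly at the SMT level. The following minimal sketch uses Z3's Python API; the axiom body is our reconstruction of Figure 1, the negated assertion stands in for the verification condition, and disabling MBQI is one plausible solver configuration that restricts Z3 to E-matching:

```python
from z3 import *

ln  = Function('len', IntSort(), IntSort())   # list length
nxt = Function('nxt', IntSort(), IntSort())   # next node

# Axiom from Figure 1, with the overly restrictive pattern {len(nxt(x))}
x = Int('x')
axiom = ForAll([x],
               And(ln(x) > 0,
                   Implies(nxt(x) == x, ln(x) == 1),
                   Implies(nxt(x) != x, ln(x) == ln(nxt(x)) + 1)),
               patterns=[ln(nxt(x))])

s = Solver()
s.set(auto_config=False, mbqi=False)  # rely on E-matching only
s.add(axiom, Not(ln(7) > 0))          # axiom /\ negated assertion
print(s.check())                      # expected: unknown (nothing matches the pattern)

dummy = Function('dummy', IntSort(), BoolSort())
s.add(dummy(ln(nxt(7))))              # synthesized triggering term
print(s.check())                      # unsat: the axiom is now instantiated for x = 7
```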

A more practical approach is to check if the axioms generated for a given program are consistent. However, this check also depends on triggering: the solver may fail to prove unsat if the triggering terms needed to instantiate the contradictory axioms are missing. The unsoundness can thus remain undetected. For example, Dafny’s [32] sequence axiomatization from June 2008 contained an inconsistency found only over a year later. A fragment of this axiomatization is shown in Figure 2.

Fig. 2.

Fig. 2. Fragment of an old version of Dafny’s [32] sequence axiomatization. \(\mathrm{U}\) and \(\mathrm{V}\) are uninterpreted types. All the named functions are uninterpreted. To improve readability, we use mathematical notation throughout this article instead of SMT-LIB syntax [18].

The types \(\mathrm{U}\) and \(\mathrm{V}\) are uninterpreted. All the named functions are uninterpreted and are used to describe operations over generic sequences (their original names have been simplified for presentation purposes): \(\mathtt {Type} :\mathrm{V} \rightarrow \mathrm{V}\) represents the sequence’s type, while \(\mathtt {ElemType} :\mathrm{V} \rightarrow \mathrm{V}\) denotes the type of the sequence’s elements. Therefore, \(F_0\) states that the elements of a sequence of \(t_0\) (e.g., integers) have type \(t_0\). The function \(\mathtt {typ} :\mathrm{U} \rightarrow \mathrm{V}\) returns the type of its argument, i.e., of the sequence (e.g., \(s_4\) in \(F_4\)) or of its elements (e.g., \(v_4\) in \(F_4\)). The elements of a sequence can also be sequences. \(\mathtt {Empty} :\mathrm{V} \rightarrow \mathrm{U}\) denotes an empty sequence of elements of a given type. \(\mathtt {Build} :\mathrm{U} \times \mathrm{Int} \times \mathrm{U} \times \mathrm{Int} \rightarrow \mathrm{U}\) creates a new sequence from the one provided as the first argument.

The axioms from Figure 2 express that sequences (including empty sequences and sequences obtained through the \(\mathtt {Build}\) operation) are well-typed (\(F_0\)–\(F_2\)), that the length of a type-correct sequence must be non-negative (\(F_3\)), and that \(\mathtt {Build}\) constructs a new sequence of the required length (\(F_4\)). The intended behavior of \(\mathtt {Build}\) is to update the element at index \(i_4\) in sequence \(s_4\) to \(v_4\). However, since there are no constraints on the parameter \(l_4\), \(\mathtt {Build}\) can be used with a negative length, leading to a contradiction with \(F_3\). This unsoundness cannot be detected by checking the satisfiability of the formula \(F_0 \wedge \ldots \wedge F_4\) because the axioms \(F_0\)–\(F_4\) do not get instantiated.

This work. For SMT-based deductive verifiers, discharging proof obligations and revealing inconsistencies in axiomatizations require the SMT solver to prove unsat via E-matching. (Verification techniques based on proof assistants are out of scope.) Given an SMT formula for which E-matching yields unknown due to insufficient quantifier instantiations, our technique generates suitable triggering terms that allow the solver to complete the unsatisfiability proof. These terms enable tool users and developers to understand and remedy the revealed completeness or soundness issue. Since the SMT encodings of different input programs and their specifications typically share axiomatizations or parts of the verification condition that encode the semantics of the programming language, fixing such issues benefits the verification of many or even all future runs of the verifier.

Fixing the incompleteness. For Figure 1, our technique finds the triggering term len(nxt(7)), which allows one to fix the incompleteness. Tool users (who cannot change the axioms) can add the triggering term to their program. For example, adding the lines var t: int; t := len(nxt(7)) before the assertion has no effect on the execution of the program but triggers the instantiation of the axiom. Tool developers can devise less restrictive patterns; e.g., they can move the conjunct len(x) > 0 to a separate axiom with the pattern {len(x)} (simply changing the original axiom’s pattern to {len(x)} would cause matching loops). Alternatively, they can use this information to adapt the encoding to emit additional triggering terms enforcing certain instantiations [29, 33].

Fixing the unsoundness. In Figure 2, our synthesized triggering term \(\mathtt {Len}(\mathtt {Build}(\mathtt {Empty}(\mathtt {typ}(v)), 0, v, -1))\) (for a fresh value v) is sufficient to detect the unsoundness (see Section 2). Tool users can use this triggering term to report bugs in the implementation of the program verifier, while tool developers can add an antecedent to \(F_4\) that prevents the construction of sequences with negative lengths.

Soundness modulo patterns. Figure 3 illustrates another scenario: Boogie’s [15] map axiomatization is inconsistent by design at the SMT level [35]; since \(F_2\) states that storing a key-value pair into a map results in a new map with a potentially different type, one can prove that two different types (e.g., Boolean and Int) are equal in SMT. However, this behavior cannot be exposed from Boogie, as the type system prevents the required instantiations. Thus, it does not affect Boogie’s soundness.

Fig. 3.

Fig. 3. Fragment of Boogie’s [15] map axiomatization, sound only modulo patterns. \(\mathrm{U}\) and \(\mathrm{V}\) are uninterpreted types. All the named functions are uninterpreted.

Still, it is necessary to detect such cases as they could surface while using Boogie with quantifier instantiation strategies not based on E-matching (such as MBQI [28]) or with first-order provers (e.g., Vampire [31]), which do not consider patterns. They could thus unsoundly classify an invalid Boogie program that uses this map axiomatization as valid. Since the verifier proves the validity of the verification condition by showing that its negation is unsatisfiable, if the refutation algorithm yields unsat, the verifier concludes that the program fulfills its specification. This is the case when checking the axioms from Figure 3 with MBQI: the formula \(F_0 \wedge \ldots \wedge F_3\) is equivalent to \(\mathtt {false}\), so any (even invalid) Boogie program whose SMT encoding contains the axioms \(F_0\)–\(F_3\) is reported as valid.

This example shows that the problems tackled in our work cannot be solved simply by switching to alternative instantiation strategies, which ignore the patterns. First, these are not the preferred choices of most modern verifiers [10, 13, 16, 22, 26, 32, 50], and are unlikely to outperform E-matching. Second, these alternatives may produce unsound results for those verifiers designed for E-matching, with axiomatizations that are sound only modulo patterns (as the one from Figure 3).

Contributions. This article makes the following technical contributions:

(1)

We present the first automated technique that allows users and developers of SMT-based program verifiers to detect completeness issues in their tools and soundness problems in their axiomatizations. Moreover, our approach helps them devise better triggering strategies for all future runs of their tools with E-matching.

(2)

We developed a novel algorithm for synthesizing the triggering terms necessary to complete unsatisfiability proofs using E-matching. Since quantifier instantiation is undecidable for first-order formulas over uninterpreted functions, our algorithm might not terminate. However, all identified triggering terms are sufficient to complete the proof; there are no false positives.

(3)

We evaluated our technique on benchmarks with known triggering problems from four program verifiers. Our experimental results show that it successfully synthesized the missing triggering terms in 65.6% of the cases and can significantly reduce the human effort in localizing and fixing the errors.

Outline. Section 2 presents background information on E-matching. Section 3 gives an overview of our technique; the details follow in Section 4. In Section 5, we present our experimental results, in Section 6, we describe various optimizations that allow our algorithm to scale to real-world inputs, and in Section 7, we explain its limitations. We discuss related work in Section 8 and conclude in Section 9.

The current article is an extended and revised version of our paper “Identifying Overly Restrictive Matching Patterns in SMT-based Program Verifiers” presented at FM’21 [20]. Compared to the conference paper, this article explains in more detail the concept of “soundness modulo patterns” (Sections 1 and 5.3), describes how E-matching proves the unsatisfiability of an input formula (Section 2), presents five extensions of our algorithm (Section 4.3), and illustrates the (extended) algorithm on various examples: the Boogie and the Dafny examples from Figure 1 and Figure 2, a new VCC/Havoc [22, 47] benchmark, and a list axiomatization with nested quantifiers (Section 4.4). Moreover, the current article explains how our algorithm supports quantifier-free formulas and more complex inputs with synonym functions as patterns, multi-patterns, and alternative patterns (Section 4.4). It also discusses the impact of various configurations of our technique on its effectiveness (Section 5.1), provides a mechanism for automatically selecting benchmarks with triggering issues for the evaluation (Section 5.2), includes a more detailed discussion about the differences between our algorithm and MBQI and Vampire (Section 5.3), presents threats to the validity of our experiments (Section 5.4), describes four optimizations implemented in our tool (Section 6), and discusses additional related work (Section 8) and various research directions we would like to explore in the future (Section 9).


2 BACKGROUND: E-MATCHING

In this section, we present the E-matching-related terminology used in this article and explain how this quantifier-instantiation algorithm works on an example.

Patterns vs. triggering terms. Patterns are syntactic hints attached to quantifiers, which instruct the SMT solver when to perform an instantiation. In Figure 2, the quantified formula \(F_3\) will be instantiated only when a triggering term that matches the pattern \(\lbrace \mathtt {Len}(s_3)\rbrace\) is encountered during the SMT run (i.e., the triggering term is present in the quantifier-free part of the input formula or is obtained by the solver from the body of a previously-instantiated quantifier). Patterns are matched modulo equalities, that is, \(F_4\), which has the pattern \(\lbrace \mathtt {Len}(\mathtt {Build}(s_4, i_4, v_4, l_4))\rbrace\), will be instantiated also when the solver is provided the triggering term \(\mathtt {Len}(s)\) and it knows that \(s=\mathtt {Build}(s_4, i_4, v_4, l_4)\) holds for some \(s_4:\mathrm{U}, i_4:\mathrm{Int}, v_4:\mathrm{U}, l_4:\mathrm{Int}\). However, our algorithm does not generate such triggering terms, as it automatically substitutes s by the right-hand side of the equality.

E-matching. We now illustrate how E-matching works on the example from Figure 2; in particular, we show how our synthesized triggering term \(\mathtt {Len}(\mathtt {Build}(\mathtt {Empty}(\mathtt {typ}(v)), 0, v, -1))\) helps the solver to prove unsat when added to the axiomatization (v is a fresh variable of type \(\mathrm{U}\)). To keep the explanation concise, we omit unnecessary instantiations. First, the sub-terms \(\mathtt {Empty}(\mathtt {typ}(v))\) and \(\mathtt {Len}(\mathtt {Build}(\mathtt {Empty}(\mathtt {typ}(v)), 0, v, -1))\) trigger the instantiation of \(F_1\) and \(F_4\), respectively. The solver obtains the body of the quantifiers for these particular values: \(\begin{equation*} \begin{array}{ll} B_1:& \mathtt {typ}(\mathtt {Empty}(\mathtt {typ}(v))) = \mathtt {Type}(\mathtt {typ}(v)) \\ B_4:& \lnot (\mathtt {typ}(\mathtt {Empty}(\mathtt {typ}(v))) = \mathtt {Type}(\mathtt {typ}(v))) \: \vee \\ & (\mathtt {Len}(\mathtt {Build}(\mathtt {Empty}(\mathtt {typ}(v)), 0, v, -1)) = -1) \\ \end{array} \end{equation*}\)

As the first disjunct of \(B_4\) evaluates to \(\mathtt {false}\) (from \(B_1\)), the solver learns that the second disjunct must hold (i.e., the length must be \(-\)1); we abbreviate it as \(L = -1\). The sub-terms \(\mathtt {Build}(\mathtt {Empty}(\mathtt {typ}(v)), 0, v, -1)\) and \(\mathtt {Len}(\mathtt {Build}(\mathtt {Empty}(\mathtt {typ}(v)), 0, v, -1))\) of the synthesized triggering term lead to the instantiation of \(F_2\) and \(F_3\), respectively: \(\begin{equation*} \begin{array}{ll} B_2:& \mathtt {typ}(\mathtt {Build}(\mathtt {Empty}(\mathtt {typ}(v)), 0, v, -1)) = \mathtt {Type}(\mathtt {typ}(v)) \\ B_3:& \lnot (\mathtt {typ}(\mathtt {Build}(\mathtt {Empty}(\mathtt {typ}(v)), 0, v, -1)) = \\ & \mathtt {Type}(\mathtt {ElemType}(\mathtt {typ}(\mathtt {Build}(\mathtt {Empty}(\mathtt {typ}(v)), 0, v, -1))))) \: \vee \\ & (0 \le \mathtt {Len}(\mathtt {Build}(\mathtt {Empty}(\mathtt {typ}(v)), 0, v, -1))) \end{array} \end{equation*}\)

The term \(\mathtt {Type}(\mathtt {ElemType}(\mathtt {typ}(\mathtt {Build}(\mathtt {Empty}(\mathtt {typ}(v)), 0, v, -1))))\) from \(B_3\) triggers \(F_0\): \(\begin{equation*} \begin{array}{ll} &B_0:\mathtt {ElemType}(\mathtt {typ}(\mathtt {Build}(\mathtt {Empty}(\mathtt {typ}(v)), 0, v, -1))) \\ &\quad = \mathtt {ElemType}(\mathtt {Type}(\mathtt {ElemType}(\mathtt {typ}(\mathtt {Build}(\mathtt {Empty}(\mathtt {typ}(v)), 0, v, -1))))) \end{array} \end{equation*}\)

By equating the arguments of the outer-most \(\mathtt {ElemType}\) in \(B_0\), the solver learns that the first disjunct of \(B_3\) is \(\mathtt {false}\). The second disjunct must thus hold (i.e., the length must be non-negative); we abbreviate it as \(0 \le L\). Since \((L = -1) \wedge (0 \le L)\) is \(\mathtt {false}\), the unsatisfiability proof succeeds.
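The walkthrough above can be replayed end-to-end. The sketch below uses Z3's Python API; the bodies and patterns of \(F_0\)–\(F_4\) are our reconstruction of Figure 2, and disabling MBQI is one plausible way to force Z3 to use E-matching only:

```python
from z3 import *

U, V = DeclareSort('U'), DeclareSort('V')
Type     = Function('Type', V, V)
ElemType = Function('ElemType', V, V)
typ      = Function('typ', U, V)
Empty    = Function('Empty', V, U)
Build    = Function('Build', U, IntSort(), U, IntSort(), U)
Len      = Function('Len', U, IntSort())

t0, t1 = Consts('t0 t1', V)
s2, s3, s4, v2, v4 = Consts('s2 s3 s4 v2 v4', U)
i2, l2, i4, l4 = Ints('i2 l2 i4 l4')

F0 = ForAll([t0], ElemType(Type(t0)) == t0, patterns=[Type(t0)])
F1 = ForAll([t1], typ(Empty(t1)) == Type(t1), patterns=[Empty(t1)])
F2 = ForAll([s2, i2, v2, l2], typ(Build(s2, i2, v2, l2)) == Type(typ(v2)),
            patterns=[Build(s2, i2, v2, l2)])
F3 = ForAll([s3], Implies(typ(s3) == Type(ElemType(typ(s3))), Len(s3) >= 0),
            patterns=[Len(s3)])
F4 = ForAll([s4, i4, v4, l4],
            Implies(typ(s4) == Type(typ(v4)), Len(Build(s4, i4, v4, l4)) == l4),
            patterns=[Len(Build(s4, i4, v4, l4))])

s = Solver()
s.set(auto_config=False, mbqi=False)   # E-matching only
s.add(F0, F1, F2, F3, F4)
print(s.check())                       # expected: unknown (no ground term matches a pattern)

v = Const('v', U)
dummy = Function('dummy', IntSort(), BoolSort())
s.add(dummy(Len(Build(Empty(typ(v)), 0, v, -1))))  # synthesized triggering term
print(s.check())                       # unsat: the instances B0-B4 are derived
```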


3 OVERVIEW

Our goal is to synthesize missing triggering terms, i.e., concrete instantiations for (a small subset of) the quantified variables of an unsatisfiable input formula \(\texttt {I}\), which are necessary for the solver to actually prove its unsatisfiability. Intuitively, these triggering terms include counterexamples to the satisfiability of \(\texttt {I}\) and can be obtained from a model of its negation. For example, \(\texttt {I}= \forall n:\mathrm{Int} :: n \gt 7\) is unsatisfiable, and a counterexample \(n=6\) is a model of its negation \(\lnot \texttt {I}= \exists n:\mathrm{Int} :: n \le 7\).

However, this idea does not apply to formulas over uninterpreted functions, which are common in proof obligations. The negation of \(\texttt {I}= \exists \mathtt {f}, \forall n:\mathrm{Int} :: \mathtt {f}(n, 7)\), where \(\mathtt {f}\) is an uninterpreted function, is \(\lnot \texttt {I}=\forall \mathtt {f}, \exists n :\mathrm{Int} :: \lnot \mathtt {f}(n, 7)\). This is a second-order constraint (it quantifies over functions) and cannot be directly encoded in SMT. We thus take a different approach.

Let F be a second-order formula, in which universal quantifiers appear only in positive positions. We define its approximation as (1) \(\begin{equation} F_{\approx } = F[\exists \overline{\mathtt {f}} \: / \: \forall \overline{\mathtt {f}}] , \end{equation}\) where \(\overline{\mathtt {f}}\) are uninterpreted functions. The approximation considers only one interpretation, not all possible interpretations for each uninterpreted function.

We, therefore, construct a candidate triggering term from a model of \(\lnot \texttt {I}_{\approx }\) and check if it is sufficient to prove that \(\texttt {I}\) is unsatisfiable (due to the approximation, a model is no longer guaranteed to be a counterexample for the original formula).
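For instance, for the formula above, the approximation yields \(\lnot \texttt {I}_{\approx } = \exists \mathtt {f}, \exists n:\mathrm{Int} :: \lnot \mathtt {f}(n, 7)\). This formula is first-order and, after Skolemizing \(\mathtt {f}\) and n, quantifier-free, so the solver can produce a model, e.g., one that interprets \(\mathtt {f}\) as the constant-\(\mathtt {false}\) function and sets \(n = 0\). Since the approximation fixes a single interpretation of \(\mathtt {f}\), such a model is only a candidate; the validation step described below checks whether it generalizes.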

The four main steps of our algorithm are depicted in Figure 4. The algorithm is stand-alone, i.e., not integrated into, nor dependent on any specific SMT solver. We illustrate it on the inconsistent axioms from Figure 5 (which we assume are part of a larger axiomatization). To show that the formula \(\texttt {I}= F_0 \wedge F_1 \wedge \ldots\) is unsatisfiable, the solver requires the triggering term \(\mathtt {f}(\mathtt {g}(7))\). The corresponding instantiations of \(F_0\) and \(F_1\) generate contradictory constraints: \(\mathtt {f}(\mathtt {g}(7)) \ne 7\) and \(\mathtt {f}(\mathtt {g}(7)) = 7\). In the following, we explain how we obtain this triggering term systematically.

Fig. 4.

Fig. 4. Main steps of our algorithm, represented as blue boxes, which helps the developers of SMT-based verifiers devise better triggering strategies (and enable E-matching to prove unsat). The arrows depict data.

Fig. 5.

Fig. 5. Formulas that set contradictory constraints on the uninterpreted function \(\mathtt {f}\) . Synthesizing the triggering term \(\mathtt {dummy}(\mathtt {f}(\mathtt {g}(7)))\) requires theory reasoning and syntactic unification. \(\mathtt {dummy}\) is a fresh uninterpreted function (see Step 4).

Step 1: Clustering. As typical proof obligations or axiomatizations contain hundreds of quantifiers, exploring combinations of triggering terms for all of them does not scale. To prune the search space, we exploit the fact that \(\texttt {I}\) is unsatisfiable only if there exist instantiations of some (in the worst case all) of its quantified conjuncts F such that they produce contradictory constraints on some uninterpreted functions. (If there is a contradiction among the quantifier-free conjuncts, the solver will detect it directly.) We thus identify clusters C of formulas F that share function symbols and then process each cluster separately. In Figure 5, \(F_0\) and \(F_1\) share the function symbol \(\mathtt {f}\), so we build the cluster \(C = F_0 \wedge F_1\).

Step 2: Syntactic unification. The formulas within clusters usually contain uninterpreted functions applied to different arguments (e.g., \(\mathtt {f}\) is applied to \(x_0\) in \(F_0\) and to \(\mathtt {g}(x_1)\) in \(F_1\)). We thus perform syntactic unification to identify sharing constraints on the quantified variables (which we call rewritings and denote their set by R) such that instantiations that satisfy these rewritings generate formulas with common terms (on which they might set contradictory constraints). \(F_0\) and \(F_1\) share the term \(\mathtt {f}(\mathtt {g}(x_1))\) if we perform the rewritings \(R =\lbrace x_0 = \mathtt {g}(x_1)\rbrace\).

Step 3: Identifying candidate triggering terms. The cluster \(C = F_0 \wedge F_1\) from step 1 contains a contradiction if there exists a formula \(F_i\) in C such that: (1) \(F_i\) is unsatisfiable by itself, or (2) \(F_i\) contradicts at least one other formula from C.

To address scenario (1), we ask an SMT solver for a model of the formula \(G = \lnot C_{\approx }\), where \(C_{\approx }\) is defined in (1). After Skolemization, G is quantifier-free, so the solver is generally able to provide a model, if one exists. We then obtain a candidate triggering term by substituting the quantified variables from the patterns of the formulas in C with their corresponding values from the model. However, scenario (1) is not sufficient to expose the contradiction from Figure 5, since both \(F_0\) and \(F_1\) are individually satisfiable. Our algorithm thus also derives stronger G formulas corresponding to scenario (2). That is, it will next consider the case where \(F_0\) contradicts \(F_1\), whose encoding into first-order logic is: \(\lnot {F_0}_{\approx } \wedge F_1 \wedge \bigwedge {R}\), where R is the set of rewritings identified in step 2, used to connect the quantified variables. This formula is universally-quantified (since \(F_1\) is), so the solver cannot prove its satisfiability and generate models. We solve this issue by requiring \(F_0\) to contradict the instantiation of \(F_1\), which is a weaker constraint.

Let F be an arbitrary formula, with universal quantifiers only in positive positions. We define its instantiation as (2) \(\begin{equation} F_{Inst} = F[\exists \overline{\mathtt {x}} \: / \: \forall \overline{\mathtt {x}}] , \end{equation}\) where \(\overline{\mathtt {x}}\) are variables. Then \(G=\lnot {F_0}_{\approx } \wedge {F_1}_{Inst} \wedge \bigwedge {R}\) is equivalent to \((\mathtt {f}(x_0) = 7) \wedge (\mathtt {f}(\mathtt {g}(x_1)) = x_1) \wedge (x_0=\mathtt {g}(x_1))\). (To simplify the notation, here and in the following formulas, we omit existential quantifiers.) All its models set \(x_1\) to 7. Substituting \(x_0\) by \(\mathtt {g}(x_1)\) (according to R) and \(x_1\) by 7 (its value from the model) in the patterns of \(F_0\) and \(F_1\) yields the candidate triggering term \(\mathtt {f}(\mathtt {g}(7))\).
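This step can be checked directly with a solver. A minimal sketch with Z3's Python API (the function names follow Figure 5; the encoding itself is ours):

```python
from z3 import *

f = Function('f', IntSort(), IntSort())
g = Function('g', IntSort(), IntSort())
x0, x1 = Ints('x0 x1')

# G = not F0_approx /\ F1_Inst /\ R for Figure 5
G = And(f(x0) == 7,        # negation of F0's body
        f(g(x1)) == x1,    # instantiation of F1
        x0 == g(x1))       # rewriting from step 2
s = Solver()
s.add(G)
print(s.check())           # sat
print(s.model()[x1])       # 7: substituting into the patterns gives f(g(7))
```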

Step 4: Validation. Once we have found a candidate triggering term, we add it to the original formula \(\texttt {I}\) (wrapped in a fresh uninterpreted function \(\mathtt {dummy}\), to make it available to E-matching, but not affect the input’s satisfiability) and check if the solver can prove unsat. If so, our algorithm terminates successfully and reports the synthesized triggering term (after a minimization step that removes unnecessary sub-terms); otherwise, we go back to step 3 to obtain another candidate. In our example, the triggering term \(\mathtt {dummy}(\mathtt {f}(\mathtt {g}(7)))\) is sufficient to complete the unsatisfiability proof.
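For the running example, validation amounts to one more solver call. A minimal sketch, again with Z3's Python API; the bodies and patterns of \(F_0\) and \(F_1\) are our reconstruction of Figure 5, and disabling MBQI is one plausible way to restrict Z3 to E-matching:

```python
from z3 import *

f = Function('f', IntSort(), IntSort())
g = Function('g', IntSort(), IntSort())
dummy = Function('dummy', IntSort(), BoolSort())
x0, x1 = Ints('x0 x1')

F0 = ForAll([x0], f(x0) != 7, patterns=[f(x0)])
F1 = ForAll([x1], f(g(x1)) == x1, patterns=[f(g(x1))])

s = Solver()
s.set(auto_config=False, mbqi=False)
s.add(F0, F1)
s.add(dummy(f(g(7))))   # candidate term, wrapped in the fresh function dummy
print(s.check())        # unsat: f(g(7)) triggers both F0 (x0 = g(7)) and F1 (x1 = 7)
```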


4 SYNTHESIZING TRIGGERING TERMS

Next, we present our algorithm for synthesizing triggering terms required by E-matching to return unsat: in Section 4.1, we define the input formulas and in Section 4.2, we explain the details of the algorithm. Its extensions follow in Section 4.3. We illustrate the algorithm on additional examples in Section 4.4.

4.1 Input Formula

To simplify our algorithm, we pre-process the inputs (i.e., the proof obligations or the axioms of a verifier): we Skolemize existential quantifiers and transform all propositional formulas into negation normal form (NNF), where negation is applied only to literals and the only logical connectives are conjunction and disjunction; we also apply the distributivity of disjunction over conjunction and split conjunctions into separate formulas. These steps preserve satisfiability and the semantics of patterns (Section 6 addresses scalability issues). The resulting formulas follow the grammar from Figure 6. Literals L may include interpreted and uninterpreted functions, variables, and constants. Free variables are nullary functions. Quantified variables can have interpreted or uninterpreted types, and we ensure that their names are globally unique. We assume that each quantifier is equipped with a pattern P (if none is provided, we run the solver to infer one). Patterns are combinations of uninterpreted functions and must mention all quantified variables. Since there are no existential quantifiers after Skolemization, we use the term quantifier to denote universal quantifiers.

Fig. 6.

Fig. 6. Grammar of input formulas \(\texttt {I}\) . Inputs are conjunctions of formulas F, which are (typically quantified) disjunctions of literals (L or \(\lnot L\) ) or nested quantified formulas. Each quantifier is equipped with a pattern P. \(\overline{x}\) denotes a (non-empty) list of variables.

4.2 Algorithm

The pseudo-code of our algorithm is given in Algorithm 1. It takes as input an SMT formula \(\texttt {I}\) (defined in Figure 6), which we treat in a slight abuse of notation as both a formula and a set of conjuncts. Three other parameters allow us to customize the search strategy and are discussed later. The algorithm yields a triggering term that enables the unsat proof, or None if no term was found. We assume here that \(\texttt {I}\) contains no nested quantifiers and present those at the end of this subsection.

The algorithm iterates over each quantified conjunct F of \(\texttt {I}\) (Algorithm 1, line 3) and checks if it is individually unsatisfiable (for depth = 0). For complex proofs, this is usually not sufficient, as \(\texttt {I}\) is typically inconsistent due to a combination of conjuncts (\(F_0 \wedge F_1\) in Figure 5). In such cases, the algorithm proceeds as follows:

Step 1: Clustering. It constructs clusters of formulas similar to F (Algorithm 2, line 4), based on their Jaccard similarity index. Let \(F_i\) and \(F_j\) be two arbitrary formulas, and \(S_i\) and \(S_j\) their respective sets of uninterpreted function symbols (from their bodies and the patterns of the quantifiers). The Jaccard similarity index is defined as (3) \(\begin{equation} J(F_i, F_j) =\frac{|S_i \cap S_j|}{|S_i \cup S_j|} . \end{equation}\) That is, the number of common uninterpreted functions divided by the total number. For the two formulas from Figure 5, \(S_0 =\lbrace \mathtt {f}\rbrace\), \(S_1 =\lbrace \mathtt {f, g} \rbrace\), therefore \(J(F_0, F_1) = \frac{|\lbrace \mathtt {f}\rbrace |}{|\lbrace \mathtt {f, g} \rbrace |} = 0.5\).

Our algorithm explores the search space by iteratively expanding clusters to include transitively-similar formulas up to a maximum depth (parameter \(\delta\) in Algorithm 1). For two formulas \(F_i, F_j \in \texttt {I}\), we define the similarity function as (4) \(\begin{equation} \texttt {sim}_\texttt {I}^{\delta }(F_i, F_j, \sigma)= \left\lbrace \begin{array}{ll} J(F_i, F_j) \ge \sigma , &\: \: \delta = 1 \\ \exists F_k: \texttt {sim}_{\texttt {I}\setminus \lbrace F_i\rbrace }^{\delta -1}(F_i, F_k, \sigma) \text{ and } J(F_k, F_j) \ge \sigma , &\:\:\delta \gt 1 \end{array} \right. \end{equation}\) where \(\sigma \in [0,1]\) is a similarity threshold used to parameterize our algorithm and J is defined in (3).

The initial cluster (for \(\texttt {depth} = 1\)) includes all the conjuncts of \(\texttt {I}\) that are directly similar to F. Each subsequent iteration adds the conjuncts that are directly similar to an element of the cluster from the previous iteration, that is, transitively similar to F. This search strategy allows us to gradually strengthen the formulas G (used to synthesize candidate terms in step 3) without overly constraining them (an over-constrained formula is unsatisfiable, and has no models).
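A minimal sketch of the similarity computation and of one expansion iteration (plain Python; the set-based representation of formulas is ours):

```python
def jaccard(Si: set, Sj: set) -> float:
    """Jaccard similarity index (equation (3)) of two sets of
    uninterpreted function symbols."""
    return len(Si & Sj) / len(Si | Sj)

def expand(cluster: set, symbols: dict, sigma: float) -> set:
    """One clustering iteration: add every conjunct that is directly
    similar (J >= sigma) to some formula already in the cluster."""
    return cluster | {Fj for Fj in symbols
                      if any(jaccard(symbols[Fi], symbols[Fj]) >= sigma
                             for Fi in cluster)}

# Figure 5: S0 = {f}, S1 = {f, g}, so J(F0, F1) = 0.5
symbols = {'F0': {'f'}, 'F1': {'f', 'g'}}
print(jaccard(symbols['F0'], symbols['F1']))  # 0.5
print(expand({'F0'}, symbols, 0.5))           # {'F0', 'F1'}
```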

Step 2: Syntactic unification. Next (Algorithm 2, line 7), we identify rewritings, i.e., constraints under which two similar quantified formulas share terms. (Section 4.4 presents the quantifier-free case.) We obtain the rewritings by performing a simplified form of syntactic term unification, which reduces their number to a practical size. Our rewritings are directed equalities. For two formulas \(F_i\) and \(F_j\) and an uninterpreted function \(\mathtt {f}\) they have the following shape: (5) \(\begin{equation} x_m = {rhs}_n , \end{equation}\) where \(m = i\) and \(n = j\) or \(m = j\) and \(n = i\), \(x_m\) is a quantified variable of \(F_m\), \(F_m\) contains a term \(\mathtt {f}(x_m)\), \(F_n\) contains a term \(\mathtt {f}({rhs}_n)\), and \({rhs}_n\) is a constant \(c_n\), a quantified variable \(x_n\), or a composite function \((\mathtt {f} \circ \mathtt {g}_0 \circ \cdots \circ \mathtt {g}_p)(\overline{c_n}, \overline{x_n})\) occurring in the formula \(F_n\); \(\mathtt {g}_0, \ldots , \mathtt {g}_p\) are arbitrary (interpreted or uninterpreted) functions. We thus determine the most general unifier [14] only for those terms that have uninterpreted functions as the outer-most functions and quantified variables as arguments. The unification algorithm is standard (except for the restricted shape), so it is not shown explicitly.

In Figure 5, \(F_1\) is similar to \(F_0\) for any \(\sigma \le 0.5\). We then compute the rewritings for all the quantified variables of \(F_0\) that appear in its body as arguments to some common uninterpreted functions (in this case, only \(x_0\)). Unifying the terms \(\mathtt {f}(x_0)\) and \(\mathtt {f}(\mathtt {g}(x_1))\) generates the rewriting \(x_0 = \mathtt {g}(x_1)\), which has the shape defined in (5).
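A minimal sketch of this restricted unification (plain Python; the tuple representation of terms and the naming convention for quantified variables are ours):

```python
def is_var(t):
    """Quantified variables are strings starting with 'x' in this sketch."""
    return isinstance(t, str) and t.startswith('x')

def subterms(t):
    """Yield a term and all of its sub-terms; terms are nested tuples,
    e.g., f(g(x1)) is ('f', ('g', 'x1'))."""
    yield t
    if isinstance(t, tuple):
        for arg in t[1:]:
            yield from subterms(arg)

def rewritings(term_i, term_j):
    """Directed rewritings x_m = rhs_n of the shape (5): the left-hand
    side is always a quantified variable, so no occurs-check is needed."""
    rws = set()
    for a in subterms(term_i):
        for b in subterms(term_j):
            if isinstance(a, tuple) and isinstance(b, tuple) and a[0] == b[0]:
                for arg_a, arg_b in zip(a[1:], b[1:]):
                    if is_var(arg_a) and arg_a != arg_b:
                        rws.add((arg_a, arg_b))
                    elif is_var(arg_b) and arg_a != arg_b:
                        rws.add((arg_b, arg_a))
    return rws

# Unifying f(x0) from F0 with f(g(x1)) from F1 (Figure 5) yields x0 = g(x1):
print(rewritings(('f', 'x0'), ('f', ('g', 'x1'))))  # {('x0', ('g', 'x1'))}
```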

Since a term may appear more than once in F, or F unifies with multiple similar formulas through the same quantified variable, we can obtain alternative rewritings for a quantified variable. In such cases, we either duplicate or split the cluster, such that in each cluster-rewriting pair, each quantified variable is rewritten at most once (see Algorithm 2, line 11). For example, in Figure 7, both \(F_1\) and \(F_2\) are similar to \(F_0\) (all three formulas share the uninterpreted function symbol \(\mathtt {f}\)). As the unification produces alternative rewritings for \(x_0\) (\(x_0 = x_1\) and \(x_0 = x_2\)), the procedure clustersRewritings returns the pairs \(\lbrace (\lbrace F_1\rbrace , \lbrace x_0 = x_1\rbrace), (\lbrace F_2\rbrace , \lbrace x_0 = x_2\rbrace)\rbrace\).

Step 3: Identifying candidate terms. From the clusters and the rewritings (identified before), we then derive quantifier-free formulas G (Algorithm 1, line 10), and, if they are satisfiable, construct the candidate triggering terms from their models (Algorithm 1, line 15). Each formula G consists of: (1) \(\lnot F_{\approx }\) (defined in (1), which is equivalent to \(\lnot F^{\prime }\), since F has the shape \(\forall \overline{x} :: F^{\prime }\) from Algorithm 1, line 3), (2) the instantiations (defined in (2)) of all the similar formulas from the cluster, and (3) the corresponding rewritings R. (As we assume that all the quantified variables are globally unique, we do not perform variable renaming when computing the instantiations.)

Fig. 7.

Fig. 7. Formulas that set contradictory constraints on the uninterpreted function \(\mathtt {f}\) . Synthesizing the triggering term \(\mathtt {dummy}(\mathtt {f}(0))\) requires clusters of similar formulas with alternative rewritings.

If a similar formula has multiple disjuncts \(D_k\), the SMT solver may use short-circuiting semantics when generating the model for G. That is, if it can find a model that satisfies the first disjunct, it may not consider the remaining ones. To obtain more diverse models, we synthesize formulas that cover each disjunct, i.e., make sure that it evaluates to \(\mathtt {true}\) at least once. We thus compute multiple instantiations of each similar formula, of the form \((\bigwedge _{0 \le j \lt k} \lnot D_j) \wedge D_k\), for all \(k:0 \le k \le n\) (see Algorithm 1, line 7). To consider all the combinations of disjuncts, we derive the formula G from the Cartesian product of the instantiations (Algorithm 1, line 9). (For presentation purposes, we also store \(\lnot F^{\prime }\) in the instantiations map (Algorithm 1, line 8), even if it does not represent the instantiation of F.)

In Figure 8, \(F_1\) is similar to \(F_0\) and \(R = \lbrace x_0 = x_1\rbrace\). \(F_1\) has two disjuncts and thus two possible instantiations: \(\textsf {Inst[}F_1\textsf {]} = \lbrace x_1 \ge 1, (x_1 \lt 1) \wedge (\mathtt {f}(x_1) = 6) \rbrace\). The formula \(G = (x_0 \gt -1) \wedge (\mathtt {f}(x_0) \le 7) \wedge (x_1 \ge 1) \wedge (x_0 = x_1)\) for the first instantiation is satisfiable, but none of the values the solver can assign to \(x_0\) (which are all greater than or equal to 1) are sufficient for the unsatisfiability proof to succeed. The second instantiation adds additional constraints: instead of \(x_1 \ge 1\), it requires \((x_1 \lt 1) \wedge (\mathtt {f}(x_1) = 6)\). The resulting G formula has a unique solution for \(x_0\), namely 0, and the triggering term \(\mathtt {f}(0)\) is sufficient to prove unsat.

Fig. 8.

Fig. 8. Formulas that set contradictory constraints on the uninterpreted function \(\mathtt {f}\) . Synthesizing the triggering term \(\mathtt {dummy}(\mathtt {f}(0))\) requires instantiations that cover all the disjuncts.
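The effect of covering each disjunct can be reproduced with a few solver calls. A minimal sketch with Z3's Python API (our encoding of the bodies from Figure 8):

```python
from z3 import *

f = Function('f', IntSort(), IntSort())
x0, x1 = Ints('x0 x1')

neg_F0 = And(x0 > -1, f(x0) <= 7)   # negated body of F0 (Figure 8)
R = x0 == x1                        # rewriting from step 2

# Covering only F1's first disjunct under-constrains the model:
s = Solver()
s.add(neg_F0, x1 >= 1, R)
print(s.check(), s.model())  # sat, but every model has x0 >= 1 and fails to validate

# Covering the second disjunct pins the model down:
s = Solver()
s.add(neg_F0, Not(x1 >= 1), f(x1) == 6, R)
print(s.check(), s.model())  # sat with x0 = 0 -> candidate triggering term f(0)
```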

The procedure \(\textsf {candidateTerm}\) in Algorithm 3 synthesizes a candidate triggering term T from the models of G and the rewritings R. We first collect all the patterns of the formulas from the cluster C (Algorithm 3, line 2), i.e., of F and of its similar conjuncts (see Algorithm 1, line 15). Then, we apply the rewritings, in an arbitrary order (Algorithm 3, lines 3–6). That is, we substitute the quantified variable x from the left-hand side of the rewriting with the right-hand side term rhs and propagate this substitution to the remaining rewritings. This step allows us to include in the synthesized triggering terms additional information, which cannot be provided by the solver. Then (Algorithm 3, lines 7–8) we substitute the remaining variables with their constant values from the model (i.e., constants for built-in types, and fresh, unconstrained variables for uninterpreted types). For interpreted, user-defined types (such as a type IList for representing a List of Int, where List and Int are both interpreted types), the solver generates constants for each type component, or a sequence of operations required to construct them. For instance, insert(0, nil) (i.e., a singleton list containing the constant 0) is a possible model provided by the SMT solver Z3 [24] for a variable of type IList. The resulting triggering term is wrapped in an application to a fresh, uninterpreted function dummy to ensure that conjoining it to \(\texttt {I}\) does not change \(\texttt {I}\)’s satisfiability.

Step 4: Validation. We validate the candidate triggering term T by checking if \(\texttt {I}\wedge T\) is unsatisfiable, i.e., if these particular interpretations for the uninterpreted functions generalize to all interpretations (Algorithm 1, line 16). If this is the case, then we return the minimized triggering term (Algorithm 1, line 18). The \(\mathtt {dummy}\) function has multiple arguments, each of them corresponding to one pattern from the cluster (Algorithm 3, line 9). This is an over-approximation of the required triggering terms (once instantiated, the formulas may trigger each other), so minimized removes redundant (sub-)terms. If T does not validate, we re-iterate its construction up to a bound \(\mu\) and strengthen the formula G to obtain a different model (Algorithm 1, lines 19 and 11). The parameter \(\mu\) allows us to deal with other sources of incompleteness, as we explain next.

Let us consider the formula from Figure 9, which was part of an axiomatization with 2,495 axioms. F axiomatizes the uninterpreted function \(\mathtt {\_div}:\mathrm{Int} \times \mathrm{Int} \rightarrow \mathrm{Int}\) and is inconsistent, because there exist two integers whose real division (“/”) is not an integer. The model produced by Z3 for the formula \(G = \lnot F^{\prime }\) is \(x = -1, y = 0\). \(-1/0\) is defined (“/” is a total function [18]), but its result is not specified. Thus, the solver cannot validate this model (i.e., it returns unknown). In such cases, we ask the solver for a different model. In Figure 9, if we simply exclude previous models, we can obtain a sequence of models with different values for the numerator, but with the same value (0) for the denominator. There are infinitely many such models; all of them fail to validate for the same reason.

Fig. 9.

Fig. 9. Inconsistent axiom from F* [49]. \(\mathtt {\_div}:\mathrm{Int} \times \mathrm{Int} \rightarrow \mathrm{Int}\) is an uninterpreted function. Synthesizing the triggering term \(\mathtt {dummy}(\mathtt {\_div}(1, 2))\) requires diverse models.

There are various heuristics one can employ to guide the solver’s search for a new model and our algorithm can be parameterized with different ones. In our experiments, we interpret the conjunct \(\lnot \mathtt {model}\) from Algorithm 1, line 19 as \((\bigwedge _{x\in \overline{x}} x \ne \mathtt {model}(x)) \wedge (\bigwedge _{x_i, x_j\in \overline{x}, \; i \ne j, \; \mathtt {model}(x_i) = \mathtt {model}(x_j)} x_i \ne x_j)\). This allows us to synthesize the triggering term \(\mathtt {dummy}(\mathtt {\_div}(1, 2))\) and expose the error from Figure 9.

The first component (\(\bigwedge _{x\in \overline{x}} x \ne \mathtt {model}(x)\)) requires all the variables to have different values than before. This requirement may be too strong for some variables, but since these are soft constraints, the solver may ignore some of them if it cannot otherwise generate a satisfying assignment. The second part (i.e., \(\bigwedge _{x_i, x_j\in \overline{x}, \; i \ne j, \; \mathtt {model}(x_i) = \mathtt {model}(x_j)}\) \(x_i \ne x_j\)) requires models from different equivalence classes, where an equivalence class includes all the variables that are equal in the model. For example, if the model is \(x_0 = x, x_1 = x\), where x is a value of the corresponding type, then \(x_0\) and \(x_1\) belong to the same equivalence class. Considering equivalence classes is particularly important for variables of uninterpreted types: the solver cannot provide actual values for them, so it assigns fresh, unconstrained variables. However, different fresh variables do not lead to diverse models.
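A minimal sketch of this diversification strategy, using Z3's Optimize engine and its add_soft constraints; the reconstruction of Figure 9's axiom body via real division is our assumption:

```python
from z3 import *

_div = Function('_div', IntSort(), IntSort(), IntSort())
x, y = Ints('x y')

# not F': assuming F states _div(x, y) = x / y, where "/" is the total
# (but underspecified at y = 0) real division
G = ToReal(_div(x, y)) != ToReal(x) / ToReal(y)

opt = Optimize()
opt.add(G)
opt.check()
m = opt.model()                    # e.g., x = -1, y = 0: fails to validate
for v in (x, y):
    # soft constraint: prefer a different value for v in the next model
    opt.add_soft(v != m.eval(v, model_completion=True))
print(opt.check(), opt.model())    # a more diverse model, e.g., x = 1, y = 2
```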

Nested quantifiers. Our algorithm also supports nested quantifiers. Nested existential quantifiers in positive positions and nested universal quantifiers in negative positions are replaced in NNF by new, uninterpreted Skolem functions. Step 2 is also applicable to them: Skolem functions with arguments (the quantified variables from the outer scope) are unified as regular uninterpreted functions; they can also appear as rhs in a rewriting, but not as the left-hand side (we do not perform higher-order unification). In such cases, the result is imprecise: the unification of \(\mathtt {f}(x_0, \mathtt {skolem}())\) and \(\mathtt {f}(x_1, 1)\) produces only the rewriting \(x_0 = x_1\).

After pre-processing, the conjunct F and the similar formulas may still contain nested universal quantifiers. F is always negated in G, thus it becomes, after Skolemization, quantifier-free. To ensure that G is also quantifier-free (and the solver can generate a model), we extend the algorithm to recursively instantiate similar formulas with nested quantifiers when computing the instantiations.

4.3 Extensions

Next, we describe various extensions of our algorithm that enable complex unsatisfiability proofs.

Combining multiple candidate terms. In Algorithm 1, each candidate term is validated separately. To enable proofs that require multiple instantiations of the same formula, we developed an extension that validates multiple triggering terms at the same time. In such cases, the algorithm returns a set of terms that are necessary and sufficient to prove unsat. Figure 10 presents a simple example from the SMT-COMP 2019 pending benchmarks [3]. (The files in this category are not guaranteed to comply with the SMT-LIB standard, but our benchmark-selection algorithm described in Section 5.2 checks this automatically.) The input \(\texttt {I}= F_0 \wedge F_1\) is unsatisfiable, as there does not exist an interpretation for the uninterpreted function \(\mathtt {U}\) that satisfies all the constraints: \(F_1\) requires \(\mathtt {U}(\mathtt {s})\) to be \(\mathtt {true}\); if \(F_0\) is instantiated for \(x_0 = \mathtt {s}\), the solver learns that \(\mathtt {U}(\mathtt {il})\) must be \(\mathtt {true}\) as well; however, if \(x_0 = \mathtt {il}\), then \(\mathtt {U}(\mathtt {il})\) must be \(\mathtt {false}\), which is a contradiction. Exposing the inconsistency thus requires two instantiations of \(F_0\), triggered by \(\mathtt {f}(\mathtt {s})\) and \(\mathtt {f}(\mathtt {il})\), respectively. We generate both triggering terms, but in separate iterations (independently, both fail to validate). However, by validating them simultaneously (i.e., conjoining both of them to \(\texttt {I}\), as arguments to the fresh function \(\mathtt {dummy}\)), our algorithm identifies the required triggering term \(T = \mathtt {dummy}(\mathtt {f}(\mathtt {s}), \mathtt {f}(\mathtt {il}))\).

Fig. 10.

Fig. 10. Benchmark from SMT-COMP 2019 [3]. The formulas set contradictory constraints on the uninterpreted function \(\mathtt {U}\) . \(\mathrm{S}\) is an uninterpreted type, \(\mathtt {s}\) and \(\mathtt {il}\) are user-defined constants of type \(\mathrm{S}\) . Synthesizing the triggering term \(\mathtt {dummy}(\mathtt {f}(\mathtt {s}), \mathtt {f}(\mathtt {il}))\) requires multiple candidate terms. We use conjunctions here for simplicity, but our pre-processing applies distributivity of disjunction over conjunction and splits \(F_0\) into three different formulas with unique names for the quantified variables.

Unification across multiple instantiations. The clusters constructed by our algorithm are sets (see Algorithm 2, line 11), so they contain a formula at most once, even if it is similar to multiple other formulas from the cluster. We thus consider the rewritings for multiple instantiations of the same formula separately, in different iterations. To handle cases that require multiple (but boundedly many) instantiations, we extend the algorithm with a parameter \(\Phi\), which bounds the maximum frequency of a quantified conjunct within the formulas G. That is, it allows a similar quantified formula, as well as F itself, to be added to a cluster (now represented as a list) more than once (after performing variable renaming, to ensure that the names of the quantified variables are still globally unique). This results in an equisatisfiable formula for which our algorithm determines multiple triggering terms. Inputs whose unsatisfiability proofs require an unbounded number of instantiations typically contain a matching loop; thus, we do not consider them here. Figure 11 presents an example, which consists of a single inconsistent formula F. Our regular algorithm from Algorithm 2 does not identify any rewritings. However, with this extension, F unifies with itself for any \(\Phi \gt 1\), and one possible rewriting is \(x^{\prime } = 7\) (where \(x^{\prime }\) is a fresh variable representing the second instantiation of F). The corresponding triggering term, \(T = \mathtt {dummy}(\mathtt {g}(7))\), allows E-matching to prove unsat. Note that the uninterpreted function \(\mathtt {g}\) is used only as a pattern. Had the pattern been \(\mathtt {f}(x)\), any triggering term \(\mathtt {f}(c)\), where c is an integer constant, would have been sufficient to complete the proof: (1) for \(c = 7\), the contradiction would have been exposed directly; (2) for \(c \ne 7\), the term \(\mathtt {f}(7)\), obtained from the first instantiation of F, would have triggered its second instantiation and case (1) would have then applied.

Type-based constraints. The rewritings of the form \(x_i = x_j\) can be too imprecise (especially for quantified variables of uninterpreted types), as they do not constrain the rhs. In Figure 12, the solver cannot provide concrete values of the uninterpreted type \(\mathrm{U}\) for \(e_1\) and \(op_1\), it can only assign fresh, unconstrained variables (e.g., e and op). However, the triggering terms \(\mathtt {some}(e)\) and \(\mathtt {get}(op)\), which can be obtained from these fresh variables, are not sufficient to prove unsat; one would additionally need the rewriting \(e_1 = \mathtt {get}(op_1)\), which cannot be identified by our unification from Section 4.2. To address such scenarios, we extend the unification to also consider as rhs a constant or an uninterpreted function from the body of the similar formulas, which has the same type as the quantified variable from the left-hand side. For Figure 12, it will thus generate the rewritings \(R = \lbrace e_0 = \mathtt {get}(op_2), e_1 = \mathtt {get}(op_2), op_1 = \mathtt {none}, op_2 = \mathtt {none}\rbrace\) (this is one of the alternatives). These type-based constraints allow us to synthesize the triggering term \(T = \mathtt {dummy}(\mathtt {some}(\mathtt {get}(\mathtt {none})))\), which exposes the unsoundness from Gobra’s [52] option types axiomatization.

Fig. 11.

Fig. 11. Formula that sets contradictory constraints on the uninterpreted function \(\mathtt {f}\) . The uninterpreted function \(\mathtt {g}\) is used only as a pattern (i.e., it does not appear in the body of F, see Section 4.4). Synthesizing the triggering term \(\mathtt {dummy}(\mathtt {g}(7))\) requires unification across multiple instantiations.

Fig. 12.

Fig. 12. Fragment of Gobra’s [52] option types axiomatization. \(\mathrm{U}\) is an uninterpreted type, \(\mathtt {none}\) is a user-defined constant of type \(\mathrm{U}\) . \(F_1\) and \(F_2\) have multi-patterns (discussed in Section 4.4). Synthesizing the triggering term \(\mathtt {dummy}(\mathtt {some}(\mathtt {get}(\mathtt {none})))\) requires type-based constraints.

Unification for sub-terms. Figure 13 shows an example that cannot be solved by any extension discussed so far, since it requires semantic reasoning: by applying \(\mathtt {f}\) on both sides of the equality, one can learn from \(F_1\) that \(\mathtt {f}(\mathtt {g}(2020)) = \mathtt {f}(\mathtt {g}(2021))\). From \(F_0\) though, \(\mathtt {f}(\mathtt {g}(2020)) = 2020\) and \(\mathtt {f}(\mathtt {g}(2021)) = 2021\), which implies that \(2020 = 2021\), i.e., \(\mathtt {false}\). Our extended algorithm synthesizes the required triggering term \(T = \mathtt {dummy}(\mathtt {f}(\mathtt {g}(2020)), \mathtt {f}(\mathtt {g}(2021)))\) by applying the unification also to sub-terms; due to our restrictive shape of the rewritings, the sub-terms can only be applications of uninterpreted functions. In Figure 13, trying to unify \(\mathtt {f}(\mathtt {g}(x_0))\) does not produce any rewritings, as \(F_1\) does not contain \(\mathtt {f}(\mathtt {g})\). We thus unify the sub-term \(\mathtt {g}(x_0)\) with \(\mathtt {g}(2020)\) and \(\mathtt {g}(2021)\) and obtain the rewritings \(R = \lbrace x_0 = 2020, x_0 = 2021\rbrace\). Together with the extension for combining multiple candidate terms described above, these rewritings provide sufficient information for the unsat proof to succeed. This unification is syntactic, but it produces the triggering terms that would be obtained if the solver applied some uninterpreted function present in the input to a learned predicate (the solver performs such semantic reasoning automatically, but without generating new triggering terms).

Alternative triggering terms. Our algorithm returns the first candidate term that successfully validates (Algorithm 1, line 18). However, it might also be useful to synthesize alternative triggering terms for the same input, as they may correspond to different completeness or soundness issues. Our tool provides this option and can also return all the triggering terms found within the given timeout.

Fig. 13.

Fig. 13. Formulas that set contradictory constraints on the uninterpreted function \(\mathtt {f}\) . Synthesizing the triggering term \(\mathtt {dummy}(\mathtt {f}(\mathtt {g}(2020)), \mathtt {f}(\mathtt {g}(2021)))\) requires unification for sub-terms.

All these extensions (individually or together with other extensions) allow us to complete the refutation proofs for particular benchmarks. Section 5 evaluates the impact of a few configurations of our technique, which can be obtained by enabling some of the extensions or by setting certain values for the additional parameters. Automatically determining the best-suited configuration for a particular input is left as future work.

4.4 Additional Examples

In this section, we illustrate our algorithm on various examples (including those from Figure 1 and Figure 2, and an example with nested quantifiers). We also explain how the algorithm supports quantifier-free formulas, synonym functions as patterns, multi-patterns, and alternative patterns.

Nested quantifiers. Our algorithm handles inputs with nested quantifiers as described in Section 4.2. We illustrate this aspect on the formulas from Figure 14, which axiomatize operations over lists of integers. The axioms \(F_3\) and \(F_4\) set contradictory constraints on \(\mathtt {indexOf}\) when the element is not contained in the list. According to Algorithm 2, one of the clusters generated for \(F_3\) is \(C=\lbrace F_2, F_0\rbrace\), with the rewritings \(R=\lbrace l_3 = l_2, el_3 = el_2, l_2 = l_0\rbrace\). The algorithm then computes the instantiations for \(F_0\) and \(F_2\); as \(F_2\) contains nested quantifiers, we instantiate (and thus remove) both of them and obtain: \(\textsf {Inst[}F_2\textsf {]} = \lbrace \lnot \mathtt {isEmpty}(l_2), \mathtt {isEmpty}(l_2) \wedge \lnot \mathtt {has}(l_2, el_2)\rbrace\), \(\textsf {Inst[}F_0\textsf {]} = \lbrace \lnot (l_0 = \mathtt {EmptyList}), (l_0 = \mathtt {EmptyList}) \wedge \mathtt {isEmpty}(l_0) \rbrace\). The model of the corresponding G formula and R allow us to synthesize the required triggering term \(T= \mathtt {dummy}(\mathtt {isEmpty}(\mathtt {EmptyList}), \mathtt {has}(\mathtt {EmptyList}, 0))\).

Quantifier-free formulas. Our algorithm iterates only over quantified conjuncts but leverages the additional information provided by quantifier-free formulas and includes them in the clusters even if the unification cannot find a rewriting (Algorithm 2, line 8). Since quantifier-free conjuncts can be seen as already instantiated formulas, we only have to cover all their disjuncts (Algorithm 1, line 7).

Boogie example. Figure 15 shows the example from Figure 1 encoded in our input format. The quantifier-free formula \(F_3\) (i.e., the negation of the verification condition) is similar to \(F_0\) (they share the function symbol \(\mathtt {len}\)) and unifies through the rewritings \(R = \lbrace x_0 = 7\rbrace\). We obtain the required triggering term \(T = \mathtt {dummy}(\mathtt {len}(\mathtt {nxt}(7)))\) from the model of the formula \(G = \lnot F_0^{\prime } \wedge \textsf {Inst[}F_3\textsf {][}0\textsf {]} \wedge \bigwedge {R} = (\mathtt {len}(x_0) \le 0) \wedge (\mathtt {len}(7) \le 0) \wedge (x_0 = 7)\).
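This G formula can again be inspected with a single solver call; a minimal sketch with Z3's Python API (names are ours):

```python
from z3 import *

ln = Function('len', IntSort(), IntSort())
x0 = Int('x0')

# G = not F0' /\ Inst[F3][0] /\ R for Figure 15
G = And(ln(x0) <= 0,   # negated body of F0
        ln(7) <= 0,    # quantifier-free conjunct F3 (the negated assertion)
        x0 == 7)       # rewriting from unifying len(x0) with len(7)
s = Solver()
s.add(G)
print(s.check())       # sat; substituting x0 = 7 into the pattern
                       # {len(nxt(x0))} yields the triggering term len(nxt(7))
```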

Fig. 14.

Fig. 14. Formulas that set contradictory constraints on the uninterpreted function \(\mathtt {indexOf}\) . \(\mathrm{L}\) is an uninterpreted type, \(\mathtt {EmptyList}\) is a user-defined constant of type \(\mathrm{L}\) . \(\mathtt {f_1}\) is a Skolem function, which replaces a nested existential quantifier. \(F_2\) contains nested universal quantifiers.

Fig. 15.

Fig. 15. Boogie example from Figure 1 encoded in our input format. \(F_0\) – \(F_2\) represent the axiom, while the quantifier-free formula \(F_3\) is the negation of the assertion (for discharging the proof obligation, the verifier considers the axioms and the negation of the verification condition).

Dafny example. Our algorithm can synthesize various triggering terms that expose the unsoundness from Figure 2, depending on the values of its parameters. We explain here one, obtained for \(\sigma = 0.1\). For \(\texttt {depth} = 0\), the algorithm checks each formula \(F_0\)–\(F_4\) in isolation. As they are all individually satisfiable, it continues with \(\texttt {depth} = 1\). To avoid redundant explanations, we present only the iteration for \(F_3\). \(F_3\) shares at least two uninterpreted function symbols with each of the other formulas, so there are various alternative rewritings: \(s_3 = \mathtt {Empty}(t_1)\), \(s_3 = s_4\), \(s_3 = \mathtt {Build}(s_4, i_4, v_4, l_4)\), and so on. As we consider cluster-rewriting pairs in which each quantified variable has at most one rewriting, one such pair is \((C = \lbrace F_4\rbrace , R = \lbrace s_3 = \mathtt {Build}(s_4, i_4, v_4, l_4)\rbrace)\). \(F_4\) has two disjuncts, therefore its instantiations are: \(\textsf {Inst[}F_4\textsf {]} = \lbrace \lnot (\mathtt {typ}(s_4) = \mathtt {Type}(\mathtt {typ}(v_4))),\) \((\mathtt {typ}(s_4) = \mathtt {Type}(\mathtt {typ}(v_4))) \wedge (\mathtt {Len}(\mathtt {Build}(s_4,\) \(i_4, v_4, l_4)) = l_4)\rbrace\). From these instantiations and the rewritings R, we derive two formulas: \(G_0 = \lnot F_3^{\prime } \wedge \textsf {Inst[}F_4\textsf {][}0\textsf {]} \wedge \bigwedge R\), with the model \(s_3 = s\), \(s_4 = s^{\prime }\), \(i_4 = 0\), \(v_4 = v\), \(l_4 = 1\), and \(G_1 = \lnot F_3^{\prime } \wedge \textsf {Inst[}F_4\textsf {][}1\textsf {]} \wedge \bigwedge R\), with the model \(s_3 = s\), \(s_4 = s^{\prime }\), \(i_4 = 0\), \(v_4 = v\), \(l_4 = -1\), where s, \(s^{\prime }\), and v are fresh variables of type \(\mathrm{U}\). (We use indexes for the G formulas to refer to them later.) We then construct the candidate triggering terms from the patterns of the formulas \(F_3\) and \(F_4\). We replace \(s_3\) by its rhs in the rewriting, i.e., \(\mathtt {Build}(s_4, i_4, v_4, l_4)\), and all the other quantified variables by their constants from the model. The result after removing redundant terms is: \(T_0 = \mathtt {dummy}(\mathtt {Len}(\mathtt {Build}(s^{\prime }, 0, v, 1)))\) and \(T_1 = \mathtt {dummy}(\mathtt {Len}(\mathtt {Build}(s^{\prime }, 0, v, -1)))\). Since the validation step fails for both \(T_0\) and \(T_1\), we continue with the other \((C, R)\) pairs, the remaining quantified conjuncts, and their similarity clusters.

If no candidate term is sufficient to prove unsat, our algorithm expands the clusters. To scale to real-world axiomatizations, it efficiently reuses the results from the previous iterations; i.e., it prunes the search space if a previously synthesized formula G is unsatisfiable and strengthens G if it is satisfiable. The pair \((C = \lbrace F_4\rbrace , R = \lbrace s_3 = \mathtt {Build}(s_4, i_4, v_4, l_4)\rbrace)\) can be extended to \((C = \lbrace F_4, F_1\rbrace , R = \lbrace s_3 = \mathtt {Build}(s_4, i_4, v_4, l_4), s_4 = \mathtt {Empty}(t_1), t_1 = \mathtt {typ}(v_4)\rbrace)\), as \(F_1\) is similar to \(F_4\) through the rewritings \(R = \lbrace s_4 = \mathtt {Empty}(t_1), t_1 = \mathtt {typ}(v_4)\rbrace\). We thus conjoin the instantiation of \(F_1\) and the two additional rewritings to the formulas \(G_0\) and \(G_1\) from the previous iteration. This is equivalent to recomputing the similarity cluster, the rewritings, and the combinations of instantiations. We then obtain: \(G_0^{\prime } = G_0 \wedge (\mathtt {typ}(\mathtt {Empty}(t_1)) = \mathtt {Type}(t_1)) \wedge (s_4 = \mathtt {Empty}(t_1)) \wedge (t_1 = \mathtt {typ}(v_4))\), which is unsatisfiable, and \(G_1^{\prime } = G_1 \wedge (\mathtt {typ}(\mathtt {Empty}(t_1)) = \mathtt {Type}(t_1)) \wedge (s_4 = \mathtt {Empty}(t_1)) \wedge (t_1 = \mathtt {typ}(v_4))\) with the model: \(s_3 = s\), \(s_4 = s^{\prime }\), \(i_4 = 0\), \(v_4 = v\), \(l_4 = -1\), \(t_1 = t\), where s, \(s^{\prime }\), and v are fresh variables of type \(\mathrm{U}\), and t is a fresh variable of type \(\mathrm{V}\). From this model and the rewritings, we construct the triggering term \(T = \mathtt {dummy}(\mathtt {Len}(\mathtt {Build}(\mathtt {Empty}(\mathtt {typ}(v)), 0, v, -1)))\), which is sufficient to expose the inconsistency between \(F_3\) and \(F_4\).

VCC/HAVOC example. Figure 16 presents a fragment of an unsatisfiable benchmark generated by VCC/HAVOC [22, 47], which can be refuted neither by E-matching nor by MBQI. (Section 5 provides additional experimental results and a detailed comparison between our algorithm and alternative refutation techniques.) F, which was part of a set of 160 formulas, is inconsistent by itself: when \(size = 0\), it evaluates to \(\mathtt {false}\) for any integer values a, b, such that \(a \lt b\). Our algorithm synthesizes a triggering term for E-matching in \(\approx\)7 s because it initially considers each quantified conjunct in isolation. The formula \(G = \lnot F^{\prime } = \mathtt {both\_ptr}(a, b, size) * size \gt a - b\) is satisfiable, and the simplest models the solver can provide (without assigning an interpretation to the uninterpreted function \(\mathtt {both\_ptr}\)) all include \(size = 0\) and some values for a and b such that \(a \lt b\) (e.g., \(a = -2\), \(b = -1\)).

Synonym functions as patterns. For the examples discussed so far, the functions used as patterns were also present in the body of the quantifiers. However, to have better control over the instantiations, one can also write formulas whose patterns are additional uninterpreted functions that do not appear in the bodies. Such patterns are not uncommon in proof obligations. Figure 17 shows an example, which uses the synonym functions technique [33] to avoid matching loops. \(\mathtt {sum}\) and \(\mathtt {sum\_syn}\) compute the sum of the elements of a sequence, between a lower and an upper bound. The two functions are identical (according to \(F_0\)), but only \(\mathtt {sum}\) is used as a pattern.
For equal bounds, \(F_1\) and \(F_2\) set contradictory constraints on the interpretation of \(\mathtt {sum\_syn}\); \(\mathtt {seq.nth}\) returns the nth element of the sequence. Using the information from the quantifier-free formula \(F_3\), our algorithm generates the triggering term \(T=\mathtt {dummy}(\mathtt {sum}(\mathtt {empty}, 0, 0), \mathtt {sum}(\mathtt {empty}, 0+1, 0))\). The term “\(0+1\)” comes from the rewriting \(l_0 = l_2 +1\). Although addition is a built-in function, it is used here as an argument to the uninterpreted function \(\mathtt {sum\_syn}\); thus, it is supported by our unification. Our algorithm is syntactic, so it does not perform arithmetic operations; it simply substitutes \(l_2\) with its value from the model. The solver then performs theory reasoning and concludes unsat.

Fig. 16.

Fig. 16. Inconsistent formula from a VCC/HAVOC [22, 47] benchmark from SMT-COMP 2020 [5], which cannot be proved unsat by MBQI. Our synthesized triggering term \(\mathtt {dummy}(\mathtt {both\_ptr}(-2, -1, 0))\) allows E-matching to refute it.
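A hedged z3py sketch of this benchmark fragment, assuming the body implied by the formula \(G = \lnot F^{\prime }\) above and the pattern \(\lbrace \mathtt {both\_ptr}(a, b, size)\rbrace\):

```python
# Hedged z3py sketch of the Figure 16 inconsistency (body inferred from ¬F').
from z3 import *

set_param('auto_config', False)
set_param('smt.mbqi', False)

a, b, size = Ints('a b size')
both_ptr = Function('both_ptr', IntSort(), IntSort(), IntSort(), IntSort())
dummy = Function('dummy', IntSort(), BoolSort())

s = Solver()
s.add(ForAll([a, b, size],
             both_ptr(a, b, size) * size <= a - b,
             patterns=[both_ptr(a, b, size)]))
print(s.check())                   # unknown: no ground both_ptr term exists
s.add(dummy(both_ptr(-2, -1, 0)))  # the synthesized triggering term
print(s.check())                   # unsat: the instance 0 <= -2 - (-1) is false
```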

Fig. 17.

Fig. 17. Formulas with synonym functions as patterns that axiomatize sequence comprehensions and set contradictory constraints on the uninterpreted function \(\mathtt {sum\_syn}\). \(\mathrm{ISeq}\) is a user-defined type, \(\mathtt {empty}\) is a user-defined constant of type \(\mathrm{ISeq}\) (i.e., the empty sequence).
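In z3py, the synonym axiom \(F_0\) can be sketched as follows (the signatures of \(\mathtt {sum}\) and \(\mathtt {sum\_syn}\) are assumed from the surrounding text):

```python
# Sketch of the synonym-function idiom of F_0 (assumed signatures).
from z3 import *

ISeq = DeclareSort('ISeq')
s, l, u = Const('s', ISeq), Int('l'), Int('u')
sum_f = Function('sum', ISeq, IntSort(), IntSort(), IntSort())
sum_syn = Function('sum_syn', ISeq, IntSort(), IntSort(), IntSort())

# F_0: the two functions agree everywhere, but only sum acts as the pattern,
# so recursive axioms whose bodies mention only sum_syn cannot re-trigger
# themselves (the synonym breaks the matching loop).
F0 = ForAll([s, l, u], sum_f(s, l, u) == sum_syn(s, l, u),
            patterns=[sum_f(s, l, u)])
```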

Multi-patterns and alternative patterns. SMT solvers allow patterns to contain multiple terms, all of which must be present to perform an instantiation. \(F_1\) in Figure 18 has such a multi-pattern and can be instantiated only when triggering terms that match both \(\lbrace \mathtt {g}(b_1)\rbrace\) and \(\lbrace \mathtt {f}(x_1)\rbrace\) are present in the SMT run. Our algorithm directly supports multi-patterns, as the procedure \(\textsf {candidateTerm}\) instantiates all the patterns from the given cluster (see Algorithm 3, line 9). For the example from Figure 18, our technique synthesizes the triggering term \(T = \mathtt {dummy}(\mathtt {f}(7), \mathtt {g}(b))\) from the rewritings \(R=\lbrace x_0 = x_1\rbrace\) and the model of the formula \(G = \lnot F_0^{\prime } \wedge \textsf {Inst[}F_1\textsf {][}1\textsf {]} \wedge \bigwedge R = (\mathtt {f}(x_0) = 7) \wedge (\lnot \mathtt {g}(b_1) \wedge \mathtt {f}(x_1) = x_1) \wedge (x_0 = x_1)\). Here b is a fresh, unconstrained variable of the uninterpreted type \(\mathrm{B}\).

Fig. 18.

Fig. 18. Formulas that set contradictory constraints on the uninterpreted function \(\mathtt {f}\). \(\mathrm{B}\) is an uninterpreted type. \(F_1\) has a multi-pattern.
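The mechanics of multi-patterns can be demonstrated with a small z3py example (a generic sketch, not the exact formulas of Figure 18):

```python
# Generic multi-pattern demonstration: the quantifier fires only once ground
# terms matching BOTH f(x) and g(y) are available.
from z3 import *

set_param('auto_config', False)
set_param('smt.mbqi', False)

B = DeclareSort('B')
x, y = Int('x'), Const('y', B)
b = Const('b', B)
f = Function('f', IntSort(), IntSort())
g = Function('g', B, BoolSort())
dummy = Function('dummy', IntSort(), BoolSort(), BoolSort())

s = Solver()
s.add(ForAll([x, y], f(x) > 0, patterns=[MultiPattern(f(x), g(y))]))
s.add(f(7) == -1)         # contradicts the axiom once it is instantiated
print(s.check())          # unknown: f(7) alone does not match the multi-pattern
s.add(dummy(f(7), g(b)))  # supply a g-term as well
print(s.check())          # unsat
```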

Formulas can also contain alternative patterns. For example, the quantified formula \(\forall x:\mathrm{Int} :: \lbrace \mathtt {f}(x)\rbrace \: \lbrace \mathtt {h}(x)\rbrace \; (\mathtt {f}(x) \ne 7) \vee (\mathtt {h}(x) = 6)\) is instantiated only if there exists a triggering term that matches \(\lbrace \mathtt {f}(x)\rbrace\) or one that matches \(\lbrace \mathtt {h}(x)\rbrace\). Our algorithm does not differentiate between multi-patterns and alternative patterns, thus it always synthesizes the arguments for all the patterns of a cluster. For alternative patterns, this results in an over-approximation of the set of necessary triggering terms. However, the minimization step (performed before returning the triggering term that successfully validates) removes the unnecessary terms.
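In z3py, alternative patterns are expressed as a list of separate patterns, in contrast to a single MultiPattern; a sketch of the formula above:

```python
# Alternative patterns: the quantifier can trigger on either f(x) or h(x).
from z3 import *

x = Int('x')
f = Function('f', IntSort(), IntSort())
h = Function('h', IntSort(), IntSort())
q = ForAll([x], Or(f(x) != 7, h(x) == 6), patterns=[f(x), h(x)])
```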


5 EVALUATION

Evaluating our work requires benchmarks with known triggering issues (i.e., for which E-matching yields unknown due to incomplete quantifier instantiations). Since there is no publicly available suite, in Section 5.1, we used manually-collected benchmarks from four verifiers [32, 38, 49, 52]; our algorithm succeeded for 65.6% of them. To evaluate its applicability to other verifiers, in Section 5.2, we used SMT-COMP [5] inputs. As they were not designed to expose triggering issues, we developed a filtering step to automatically identify the subset that falls into this category. The results show that our algorithm is also suited for benchmarks from Spec# [16], Havoc [22], and VCC [47]. Section 5.3 illustrates that our triggering terms are simpler than the unsat proofs produced by quantifier instantiation and refutation techniques, enabling one to fix the root cause of the revealed issues.

Setup. We used Z3 (4.8.10) [24] to infer the patterns, generate the models, and validate the candidate terms. However, our open-source tool [7] can be used with any solver that supports E-matching and exposes the inferred patterns. We used Z3’s NNF tactic to transform the inputs into NNF and locality-sensitive hashing to compute the clusters. We fixed Z3’s random seeds to the following values: sat.random_seed to 488, smt.random_seed to 599, nlsat.seed to 611. We set the (soft) timeout to 600 s and the memory limit to 6 GB per run and used a 1 s timeout for obtaining a model and for validating a candidate term. The experiments were conducted on a Linux server with 252 GB of RAM and 32 Intel Xeon CPUs at 3.3 GHz and can be replicated via our Docker image [6].
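For reference, the following z3py calls correspond to this setup (a sketch; the seed parameters are the ones listed above, and the formula is a toy input):

```python
# Sketch of the experimental setup: fixed seeds and the NNF tactic.
from z3 import *

set_param('sat.random_seed', 488)
set_param('smt.random_seed', 599)
set_param('nlsat.seed', 611)

x = Int('x')
f = Function('f', IntSort(), BoolSort())
g = Goal()
g.add(Not(ForAll([x], Implies(f(x), x > 0))))
print(Tactic('nnf')(g))  # the goal rewritten into negation normal form
```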

5.1 Effectiveness on Verification Benchmarks with Known Triggering Issues

First, we used manually-collected benchmarks with known triggering issues from four state-of-the-art verifiers: Dafny [32], F* [49], Gobra [52], and Viper [38]. We reconstructed 4 inconsistent axiomatizations from Dafny and 2 from F*, based on the changes from the repositories and the messages from the issue trackers; we obtained 11 inconsistent axiomatizations of arrays and option types from Gobra’s developers; and we collected 15 incompleteness issues from Viper’s test suite [1], each with at least one assertion needed only for triggering (we removed these assertions from the benchmarks, as our work is expected to find the triggering terms automatically). The Viper files contain algorithms for arrays, binomial heaps, binary search trees, and regression tests. The input sizes (minimum–maximum number of formulas or quantifiers) are shown in Table 1, columns \(\mathbf {\#F}\)–\(\mathbf {\#\forall }\).

Table 1.

| Source | # | #F (min–max) | #∀ (min–max) | C0 (default) | C1 (σ=0.1) | C2 (β=1) | C3 (type) | C4 (σ=0.1 ∧ sub) | Our work | Z3 (MBQI) | CVC4 (enum inst) | Vampire (CASC ∧ Z3) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Dafny | 4 | 6–16 | 5–16 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 2 |
| F* | 2 | 18–2,388 | 15–2,543 | 1 | 1 | 1 | 1 | 2 | 2 | 1 | 0 | 2 |
| Gobra | 11 | 64–78 | 50–63 | 5 | 10 | 1 | 7 | 10 | 11 | 6 | 0 | 11 |
| Viper | 15 | 84–143 | 68–203 | 7 | 5 | 3 | 5 | 5 | 7 | 11 | 0 | 15 |
| Total | 32 | | | | | | | | 21 (65.6%) | 19 (59.3%) | 0 (0%) | 30 (93.7%) |

  • σ = similarity threshold; β = batch size; type = type-based constraints; sub = sub-terms. C0: σ = 0.3; β = 64; ¬type; ¬sub.

Table 1. Results on Verification Benchmarks with Known Triggering Issues. The columns from left to right show: the source of the benchmarks, the number of files (#), their number of conjuncts (#F) and of quantifiers (#∀), the number of files for which five different configurations of our algorithm (C0–C4) synthesized suited triggering terms, our results across all configurations, and the number of unsat proofs generated by Z3 (with MBQI [28]), CVC4 (with enumerative instantiation [43]), and Vampire [31] (in CASC mode [48], using Z3 for ground theory reasoning). The columns C0–C4, Our work, and CVC4’s enumerative instantiation correspond to E-matching-based algorithms; only those can be soundly used by verifiers whose SMT encodings have patterns, i.e., are designed for E-matching.

Configurations. We ran our tool with five configurations, to also analyze the impact of its parameters (see Algorithm 1 and Section 4.3). The default configuration C0 has: \(\sigma = 0.3\) (similarity threshold), \(\beta =64\) (batch size, i.e., the number of candidate terms validated together), \(\lnot\)type (no type-based constraints), \(\lnot\)sub (no unification for sub-terms). The other configurations differ from C0 in the parameters shown in Table 1. All configurations use \(\delta = 2\) (maximum transitivity depth), \(\mu = 4\) (maximum number of different models), and 600 s timeout per file.
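For clarity, the configurations can be summarized as plain data (a sketch; the key names are ours, the values come from Table 1 and the text above):

```python
# The five configurations and the parameters shared by all of them.
COMMON = dict(delta=2, mu=4, timeout_s=600)  # transitivity depth, max models, timeout
CONFIGS = {
    'C0': dict(sigma=0.3, beta=64, type_constraints=False, sub_terms=False),
    'C1': dict(sigma=0.1, beta=64, type_constraints=False, sub_terms=False),
    'C2': dict(sigma=0.3, beta=1,  type_constraints=False, sub_terms=False),
    'C3': dict(sigma=0.3, beta=64, type_constraints=True,  sub_terms=False),
    'C4': dict(sigma=0.1, beta=64, type_constraints=False, sub_terms=True),
}
```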

Results. Columns C0–C4 in Table 1 show the number of files solved by each configuration; the column Our work summarizes those solved by at least one. Overall, we found suited triggering terms for 65.6% of the files, including all F* and Gobra benchmarks. An F* unsoundness exposed by all configurations in \(\approx\)60 s is given in Figure 9; diagnosing it manually, based on a bug report [4], had required the effort of two developers. A simplified Gobra axiomatization for option types is shown in Figure 12; the entire axiomatization (considered in Table 1) was solved only by C4, in \(\approx\)13 s. Gobra’s team spent one week to identify some of the issues. As our triggering terms for F* and Gobra were similar to the manually-written ones, we believe they could have reduced the human effort in localizing and fixing the errors.

Our algorithm synthesized missing triggering terms for 7 Viper files, including the array maximum example [2], for which E-matching could not prove that the maximal element in a strictly increasing array of size 3 is its last element. Our triggering term loc(a,2) (loc maps arrays and integers to heap locations) can be added by a user of the verifier to their postcondition. A developer can fix the root cause of the incompleteness by including a generalization of the triggering term to arbitrary array sizes: len(a) != 0 ==> x == loc(a, len(a)-1).val (val allows one to access the value at the corresponding heap location). Both fixes result in E-matching refuting the proof obligation in under 0.1 s. We also exposed another case where Boogie (which is used by Viper) is sound only modulo patterns (as in Figure 3), i.e., the unsoundness is visible only at the SMT level.

As Table 1 shows, configurations with smaller \(\sigma\) (C1 and C4) were particularly important for some of the F* and Gobra benchmarks. Our algorithm starts with the given \(\sigma\) and, if it does not find the required triggering terms, it decreases \(\sigma\) by 0.1 and reiterates. Thus C0 also covers the case \(\sigma = 0.1\), if the overall timeout is large enough. However, always starting with a small \(\sigma\) may prevent our algorithm from synthesizing the triggering terms, since the number of rewritings it has to explore is considerably higher. The extensions for unifying sub-terms (C4) and identifying type-based constraints (C3) were also needed for one and two input files, respectively.

5.2 Effectiveness on SMT-COMP Benchmarks

Next, we considered 61 SMT-COMP [5] benchmarks from Spec# [16], VCC [47], Havoc [22], Simplify [25], and the Bit-Width-Independent (BWI) encoding [39]. These were selected automatically using a filtering algorithm that we designed (described below) and are summarized in Table 2.

Table 2.

| Source | # | #F (min–max) | #∀ (min–max) | C0 (default) | C1 (σ=0.1) | C2 (β=1) | C3 (type) | C4 (σ=0.1 ∧ sub) | Our work | Z3 (MBQI) | CVC4 (enum inst) | Vampire (CASC ∧ Z3) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Spec# | 33 | 28–2,363 | 25–645 | 16 | 16 | 14 | 16 | 15 | 16 | 16 | 0 | 29 |
| VCC/Havoc | 14 | 129–1,126 | 100–1,027 | 11 | 9 | 5 | 11 | 9 | 11 | 12 | 0 | 14 |
| Simplify | 1 | 256 | 129 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| BWI | 13 | 189–384 | 198–456 | 1 | 1 | 2 | 1 | 1 | 2 | 12 | 0 | 12 |
| Total | 61 | | | | | | | | 29 (47.5%) | 41 (67.2%) | 0 (0%) | 55 (90.1%) |

  • σ = similarity threshold; β = batch size; type = type-based constraints; sub = sub-terms. C0: σ = 0.3; β = 64; ¬type; ¬sub.

Table 2. Results on SMT-COMP Inputs. The columns have the structure from Table 1.

Benchmarks selection. We collected all 27,716 benchmarks from SMT-COMP 2020 (single query track) [5], with ground truth unsat and at least one pattern (as this suggests they were designed for E-matching). We then ran Z3 to infer the missing patterns and to transform the formulas into NNF, and removed all benchmarks for which the inference or the transformation did not succeed within 600 s per file and 4 s per formula. We also removed the files with features not yet supported by PySMT [27], the parsing library used in our experiments (e.g., sort signatures in datatype declarations), but we did extend PySMT to handle, e.g., patterns and overloaded functions. This filtering resulted in 6,481 benchmarks. We then ran E-matching and kept only those 61 examples that could not be solved within 600 s due to incompleteness in instantiating quantifiers (our work only targets this incompleteness, but the SMT-COMP suite also contains other solving challenges).

Results. The results are shown in Table 2, which follows the structure of Table 1. Our algorithm enabled E-matching to refute 47.5% of the files, most of them from Spec# and VCC/Havoc. We manually inspected some BWI benchmarks (for which the algorithm had worse results) and observed that the validation step times out even with a much higher timeout. This shows that some candidate terms trigger matching loops and explains why C2 (which validates them individually) solved one more file. Extending our algorithm to avoid matching loops by construction is left as future work. The other configurations did not prove to be better than C0 for these SMT-COMP inputs.

5.3 Comparison with Unsatisfiability Proofs

As an alternative to our work, the developers of program verifiers could try to manually identify triggering issues from refutation proofs. In this experiment, we considered three state-of-the-art provers that rely on different solving strategies and could generate such proofs: Z3 (4.8.10, the same version as for our tool) with MBQI [28] (a model based quantifier instantiation technique), CVC4 (1.6) [17] with enumerative instantiation [43] (an algorithm based on E-matching, used when E-matching saturates), and the first-order theorem prover Vampire (4.4) [31], using Z3 for ground theory reasoning [42] and the CASC [48] portfolio mode with competition presets. Note that the only approach comparable with ours is enumerative instantiation; MBQI and Vampire do not consider patterns, thus they solve a different problem. We are not aware of any other work that synthesizes triggering terms for E-matching.

The last three columns in Tables 1 and 2 show the number of unsatisfiability proofs produced by each of these alternatives. CVC4 failed for all examples (it cannot construct proofs for quantified logics), while Vampire refuted most of them. Our algorithm enabled E-matching to solve more inputs than MBQI for F* and Gobra, and had similar results for Dafny, Spec#, and VCC/Havoc. All our five configurations solved two VCC/Havoc files not refuted by MBQI (Figure 16 presents one).

In terms of complexity, our triggering terms are much simpler than the proofs and directly highlight the root cause of the issues. The term loc(a,2) generated for Viper’s array maximum example from Section 5.1 is easier to understand than MBQI’s proof (which has 2,135 lines and over 700 reasoning steps) and than Vampire’s proof (with 348 lines and 101 inference steps). Other proofs have similar sizes. Therefore, determining the source of the inconsistency from such proofs requires expert knowledge of the tool-specific proof format and significant manual effort.

Most deductive verifiers [10, 13, 16, 22, 26, 32, 50] employ E-matching for discharging their proof obligations because E-matching is the most efficient SMT algorithm for program verification [28] (the vast majority of the SMT-COMP benchmarks we initially collected were also directly refuted by E-matching). It is thus important to help the developers use the algorithm of their choice and return sound results even if they rely on patterns for soundness (as in Figure 3).

As our algorithm accepts as input an SMT formula, it can also produce triggering terms that are required only at the SMT level but cannot be encoded into the input language of the verifier (e.g., Boogie), since they are rejected by the type system. Such triggering terms can be filtered out, as lifting them to the input language is mostly straightforward (we performed this step manually in our experiments, to identify the cases of soundness modulo patterns; automating this process is a possible future extension). This is not the case for refutation proofs, whose back-translation to the source language is an open research problem. To enable the developers to debug the axiomatizations or fix the incompleteness more efficiently, our tool can also generate multiple triggering terms (as explained in Section 4.3). It can thus reveal multiple triggering issues for the same input formula, information which cannot be directly obtained from unsatisfiability proofs.

5.4 Threats to Validity

We identified the following two threats to the validity of our experiments:

Non-determinism. The SMT solvers use randomized algorithms, which can cause non-determinism. To mitigate this problem, we fixed all the available random seeds and used the same seeds in all the phases of our evaluation (i.e., for inferring the patterns, pre-filtering via E-matching, running our tool and MBQI).

Benchmarks selection. We relied on Z3’s E-matching algorithm to select examples with incompleteness in instantiating quantifiers. An implementation of E-matching from another solver could have led to different files. To avoid biases, we used Z3 in all the experiments.


6 OPTIMIZATIONS

In this section, we present various optimizations implemented in our tool, which allow the algorithm to scale to real-world verification benchmarks.

Grammar. The grammar from Figure 6 allows us to simplify the presentation of the algorithm. However, eliminating conjunctions by applying distributivity and splitting (as described in Section 4.1) can result in an exponential increase in the number of terms and introduce redundancy, affecting the performance. Conjunction elimination is not implemented in Z3’s NNF tactic (used in our evaluation from Section 5), thus it is not performed automatically. We apply this transformation only at the top level, i.e., we do not recursively distribute disjunctions over conjunctions. For this reason, the input conjuncts F supported by our tool can actually contain conjunctions, in which case we use an extended algorithm when computing the instantiations, to ensure that all the resulting G formulas are still quantifier-free. The number of conjuncts and the number of quantifiers reported in Tables 1 and 2 were computed before applying distributivity, thus they are not artificially increased.

Rewritings. The restrictive shape of our rewritings (see (5)) ensures that their number is finite, because if it exists, the most general unifier is unique up to variable renaming, i.e., substitutions of the type \(\lbrace x_i \rightarrow x_j, x_j \rightarrow x_i\rbrace\) [14]. (Such substitutions are rewritings of shape (5), where rhs is also a quantified variable.) However, for most practical examples, the number of rewritings is very large, thus our implementation identifies them lazily, in increasing order of cardinality. If a rewriting \(r \in R\) leads to an unsat G formula for some instantiations, then we discard all the subsequent G formulas that contain r and the same instantiations (they will also be unsatisfiable). To make sure that the algorithm terminates within a given amount of time, in our experiments we bound the number of G formulas derived for each quantified conjunct F to 100.

Instantiations. Our implementation lazily computes the Cartesian product of the instantiations (Algorithm 1, line 9), since it can also have a high number of elements. However, many of them are unsatisfiable in practice; our tool efficiently identifies trivial conflicts (e.g., \(\lnot D_i \wedge D_i\)), pruning the search space accordingly.
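A minimal sketch of this lazy enumeration, modeling each instantiation as a set of (atom, polarity) literals (the representation is ours, not the tool’s actual data structures):

```python
# Lazy product of instantiations with trivial-conflict pruning.
from itertools import product

def compatible(combo):
    """Reject tuples that contain both a literal and its negation."""
    seen = {}
    for inst in combo:                     # inst: iterable of (atom, polarity)
        for atom, polarity in inst:
            if seen.setdefault(atom, polarity) != polarity:
                return False               # trivial conflict, e.g., D_i and not D_i
    return True

def instantiation_tuples(inst_lists):
    # itertools.product is lazy: tuples are generated one at a time
    for combo in product(*inst_lists):
        if compatible(combo):
            yield combo
```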

Candidate terms. To improve the performance of our algorithm, we keep track of all the candidate triggering terms that failed to validate (i.e., of the models from which they were synthesized). Then, we add constraints (similar to the conjunct \(\lnot \mathtt {model}\) from Algorithm 1, line 19) to ensure the solver does not provide previously-seen models for the quantified variables from the same set of patterns.
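A sketch of this model-blocking step in z3py (the helper is illustrative, not our tool’s actual API):

```python
# After a candidate term fails to validate, forbid the model it came from:
# at least one quantified variable must take a different value next time.
from z3 import Or

def block_model(solver, model, quantified_vars):
    solver.add(Or([v != model.eval(v, model_completion=True)
                   for v in quantified_vars]))
```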


7 LIMITATIONS

In the following, we discuss the limitations of our approach, as well as possible solutions.

Applicability. Our algorithm effectively addresses a common cause of failed unsatisfiability proofs in program verification, i.e., missing triggering terms. Other causes (e.g., incompleteness in the solver’s decision procedures due to undecidable theories) are beyond the scope of our work. Also, our algorithm is tailored to unsatisfiability proofs; satisfiability proofs cannot be reduced to unsatisfiability proofs by negating the input, because the negation cannot usually be encoded in SMT (as we have explained in Section 3).

SMT solvers. Our algorithm synthesizes triggering terms as long as the solver can find models for our quantifier-free formulas. However, solvers are incomplete, i.e., they can return unknown and produce only partial models, which are not guaranteed to be correct. Nonetheless, we also use partial models, as the validation step (step 4 in Figure 4) ensures that they do not lead to false positives.

Patterns. Since our algorithm is based on patterns (provided or inferred), it will not succeed if they do not permit the necessary instantiations. For example, the formula \(\forall x:\mathrm{Int}, y:\mathrm{Int} :: x = y\) is unsatisfiable. However, the SMT solver cannot automatically infer a pattern from the body of the quantifier, since equality is an interpreted function and must not occur in a pattern. Thus E-matching (and implicitly our algorithm) cannot solve this example, unless the user provides as pattern some uninterpreted function that mentions both x and y (e.g., \(\mathtt {f}(x, y)\)).
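The following z3py sketch illustrates this workaround: with the user-provided pattern \(\mathtt {f}(x, y)\), a single triggering term makes the formula refutable by E-matching:

```python
# The unsatisfiable formula forall x, y. x == y with a user-provided pattern.
from z3 import *

set_param('auto_config', False)
set_param('smt.mbqi', False)

x, y = Ints('x y')
f = Function('f', IntSort(), IntSort(), BoolSort())
s = Solver()
s.add(ForAll([x, y], x == y, patterns=[f(x, y)]))
s.add(f(0, 1))    # triggering term: instantiates x := 0, y := 1
print(s.check())  # unsat: the instance 0 == 1 is contradictory
```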

Bounds and rewritings. Synthesizing triggering terms is generally undecidable. We ensure termination by bounding the search space through various customizable parameters, thus our algorithm misses results not found within these bounds. We also only unify applications of uninterpreted functions, which are common in verification. Efficiently supporting interpreted functions (especially equality) is very challenging for inputs with a small number of types (e.g., from Boogie [15]).

Intended behavior. Our technique can detect soundness errors in axiomatizations, but it cannot check if the given axioms correctly model the intended behavior of the uninterpreted functions. For instance, the formula \(F = \forall t:\mathrm{V} :: \lbrace \mathtt {Empty}(t)\rbrace \; \mathtt {Len}(\mathtt {Empty}(t)) = 7\) is satisfiable, as the solver can find an interpretation for the uninterpreted functions \(\mathtt {Empty}\) and \(\mathtt {Len}\) (representing empty sequences and the length of a sequence, respectively). Our algorithm is thus not applicable. Nonetheless, F wrongly axiomatizes empty sequences, whose length should be 0, for all the possible types. Approaches that use non-axiomatic semantics, such as VST [12] or CompCert [36], could address this problem, which is orthogonal to our work.
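For illustration, F can be checked directly in z3py (the return sort of \(\mathtt {Empty}\) is an assumption):

```python
# The wrongly axiomatized, yet satisfiable, formula F from the text.
from z3 import *

V = DeclareSort('V')
U = DeclareSort('U')
t = Const('t', V)
Empty = Function('Empty', V, U)
Len = Function('Len', U, IntSort())

s = Solver()
s.add(ForAll([t], Len(Empty(t)) == 7, patterns=[Empty(t)]))
print(s.check())  # sat: e.g., the model interprets Len as constantly 7
```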

Despite these limitations, our algorithm effectively identifies the triggering terms required in practical examples, as we have experimentally shown in Section 5.


8 RELATED WORK

To the best of our knowledge, no other approach automatically produces the information needed by the developers of program verifiers to remedy the effects of overly restrictive patterns. Quantifier instantiation and refutation techniques (discussed next) can produce unsatisfiability proofs, but these are much more complex than our triggering terms (as we have shown in Section 5.3).

Quantifier instantiation techniques. Model-based quantifier instantiation (MBQI) [28] was designed for satisfiable formulas. It checks if the models obtained for the quantifier-free part of the input satisfy the quantifiers, whereas we check if the synthesized triggering terms obtained for some interpretation of the uninterpreted functions generalize to all interpretations. In some cases, MBQI can also generate unsatisfiability proofs, but they require expert knowledge to be understood; our triggering terms are much simpler. Counterexample-guided quantifier instantiation [44] is a technique for satisfiable formulas, which synthesizes computable functions from logical specifications. It is applicable to functions whose specifications have explicit syntactic restrictions on the space of possible solutions, which is usually not the case for axiomatizations. Thus the technique cannot directly solve the complementary problem of proving the soundness of the axiomatization.

E-matching-based approaches. Rümmer [46] proposed a calculus for first-order logic modulo linear integer arithmetic that integrates constraint-based free variable reasoning with E-matching. Our algorithm does not require reasoning steps, so it is applicable to formulas from all the logics supported by the SMT solver. Enumerative instantiation [43] is an approach that exhaustively enumerates ground terms from a set of ordered, quantifier-free terms from the input. It can be used to refute formulas with quantifiers, but not to construct proofs (see Section 5.3). Our algorithm derives quantifier-free formulas and synthesizes the triggering terms from their models, even if the input does not have a quantifier-free part; we also use syntactic information (obtained from the rewritings) to construct complex triggering terms.

Theorem provers. First-order theorem provers (e.g., Vampire [31]) also generate refutation proofs. More recent works combine a superposition calculus with theory reasoning [42, 51], integrating SAT/SMT solvers with theorem provers. We also use unification, but to synthesize triggering terms required by E-matching. However, our triggering terms are much simpler than Vampire’s proofs and can be used to improve the triggering strategies for all future runs of the verifier.

Detecting matching loops. Becker et al. [19] dynamically detect performance issues in quantified SMT formulas that already include triggering terms, by identifying too permissive patterns that lead to matching loops. Our work targets soundness and completeness errors and synthesizes the triggering terms required to refute SMT inputs with overly restrictive patterns.

Testing verifiers. As formally verifying state-of-the-art program analysis tools (i.e., static analyzers, program verifiers) is rarely possible in practice [11, 21], a few research efforts focus on testing them. Ahn and Denney [9] proposed an approach that identifies inconsistencies in a verifier’s quantified axiomatizations without patterns. The work also requires a computational model of the axioms, which includes interpretations for all the function symbols. Thus, it cannot be applied to axiomatizations with uninterpreted functions and types, which are very common in program verification. The technique tests each axiom in isolation, so it cannot find non-trivial inconsistencies caused by the interaction between axioms. Our approach is fully automatic and detects complex errors by identifying sharing constraints between formulas and synthesizing triggering terms from clusters of similar formulas. Recent work by Irfan et al. [30] tests the soundness and precision of the Dafny verifier [32] by generating annotated random programs, which, by construction, fulfill or violate their specifications. The approach detects an error if the verifier accepts a program it should reject or vice versa; this decision is based on an SMT solver being able to refute the corresponding SMT formula or to find a model for it. However, the technique does not address the case in which the solver returns unknown. Our work provides a solution for incomplete quantifier instantiations.


9 CONCLUSIONS

In this article, we presented the first automated technique that enables the users and the developers of verifiers to remedy the effects of overly restrictive patterns. Since discharging proof obligations and identifying inconsistencies in axiomatizations require the SMT solver to prove the unsatisfiability of a formula via E-matching, we developed a novel algorithm for synthesizing triggering terms that allow the solver to complete the proof. Our approach is effective for a diverse set of verifiers, and can significantly reduce the human effort in localizing and fixing triggering issues. This article focuses on the applications of our algorithm to program verification. However, our technique is also suited for other kinds of first-order constraint-solving problems, e.g., system-level modeling via Event-B [8] and the Isabelle/HOL [40] backend Sledgehammer [41]. (A thorough evaluation of our tool on systems beyond program verifiers is left as future work.) To tackle such problems, solvers generally require additional guidance (typically, in the form of syntactic patterns) to decide how the space of possible quantifier instantiations can be pruned, resulting in tractable, yet incomplete, search strategies. This article offers a systematic approach to gradually improve their completeness. As future work, we also plan to extend the syntactic unification, to efficiently support commonly-used interpreted functions and to avoid generating triggering terms that cause matching loops. Automatically determining the best combination of parameters (i.e., the best configuration of our algorithm) for a specific input formula is another research direction we would like to explore in the future. We also plan to investigate if our triggering terms could be used to identify potential fixes for unsound axiomatizations or to guide the developers in devising new, sound ones.


ACKNOWLEDGMENTS

We would like to thank the anonymous FAC’22 journal reviewers, as well as the FM’21 reviewers for their insightful comments and suggestions. We are also grateful to Felix Wolf for providing us the Gobra benchmarks, and to Evgenii Kotelnikov for his detailed explanations about Vampire.

REFERENCES

[1] 2013. Viper Test Suite. https://github.com/viperproject/silver/tree/master/src/test/resources. Accessed on May 4, 2021.
[2] 2015. Array Maximum, by Elimination. http://viper.ethz.ch/examples/max-array-elimination.html. Accessed on May 6, 2021.
[3] 2019. The 14th International Satisfiability Modulo Theories Competition (Including Pending Benchmarks). https://smt-comp.github.io/2019/, https://clc-gitlab.cs.uiowa.edu:2443/SMT-LIB-benchmarks-tmp/benchmarks-pending. Accessed on May 14, 2020.
[4] 2019. F* Issue 1848. https://github.com/FStarLang/FStar/issues/1848. Accessed on May 6, 2021.
[5] 2020. The 15th International Satisfiability Modulo Theories Competition. https://smt-comp.github.io/2020/. Accessed on May 6, 2021.
[6] 2022. Our Docker Image. https://hub.docker.com/r/aterga/smt-triggen. Accessed on August 21, 2022.
[7] 2022. Our Tool. https://github.com/alebugariu/smt-triggen. Accessed on August 21, 2022.
[8] Jean-Raymond Abrial. 2010. Modeling in Event-B: System and Software Engineering (1st ed.). Cambridge University Press.
[9] Ki Yung Ahn and Ewen Denney. 2010. Testing first-order logic axioms in program verification. In Tests and Proofs. Springer, Berlin, 22–37.
[10] Afshin Amighi, Stefan Blom, and Marieke Huisman. 2016. VerCors: A layered approach to practical verification of concurrent software. In Proceedings of the 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing. IEEE Computer Society, 495–503.
[11] Esben Sparre Andreasen, Anders Møller, and Benjamin Barslev Nielsen. 2017. Systematic approaches for increasing soundness and precision of static analyzers. In Proceedings of the 6th ACM SIGPLAN International Workshop on State Of the Art in Program Analysis (SOAP'17). ACM, New York, NY, 31–36.
[12] Andrew W. Appel. 2011. Verified software toolchain. In Programming Languages and Systems. Springer, Berlin, 1–17.
[13] Vytautas Astrauskas, Peter Müller, Federico Poli, and Alexander J. Summers. 2019. Leveraging Rust types for modular specification and verification. Proceedings of the ACM on Programming Languages 3 (OOPSLA'19), 147:1–147:30.
[14] Franz Baader and Wayne Snyder. 2001. Unification theory. In Handbook of Automated Reasoning. Elsevier and MIT Press, 445–532.
[15] Michael Barnett, Bor-Yuh Evan Chang, Robert DeLine, Bart Jacobs, and K. Rustan M. Leino. 2005. Boogie: A modular reusable verifier for object-oriented programs. In Formal Methods for Components and Objects (FMCO'05) (LNCS, Vol. 5). Springer, 364–387.
[16] Michael Barnett, Manuel Fähndrich, K. Rustan M. Leino, Peter Müller, Wolfram Schulte, and Herman Venter. 2011. Specification and verification: The Spec# experience. Communications of the ACM 54, 6 (June 2011), 81–91.
[17] Clark Barrett, Christopher L. Conway, Morgan Deters, Liana Hadarean, Dejan Jovanović, Tim King, Andrew Reynolds, and Cesare Tinelli. 2011. CVC4. In Computer Aided Verification. Springer, Berlin, 171–177.
[18] Clark Barrett, Pascal Fontaine, and Cesare Tinelli. 2017. The SMT-LIB Standard: Version 2.6. Technical Report. Department of Computer Science, The University of Iowa. www.SMT-LIB.org.
[19] Nils Becker, Peter Müller, and Alexander J. Summers. 2019. The axiom profiler: Understanding and debugging SMT quantifier instantiations. In Tools and Algorithms for the Construction and Analysis of Systems (TACAS'19) (LNCS, Vol. 11427). Springer-Verlag, 99–116.
[20] Alexandra Bugariu, Arshavir Ter-Gabrielyan, and Peter Müller. 2021. Identifying overly restrictive matching patterns in SMT-based program verifiers. In Formal Methods. Springer International Publishing, Cham, 273–291.
[21] Cristian Cadar and Alastair F. Donaldson. 2016. Analysing the program analyser. In Proceedings of the 38th International Conference on Software Engineering Companion (ICSE'16). ACM, New York, NY, 765–768.
[22] Shaunak Chatterjee, Shuvendu K. Lahiri, Shaz Qadeer, and Zvonimir Rakamarić. 2007. A reachability predicate for analyzing low-level software. In Tools and Algorithms for the Construction and Analysis of Systems. Springer, Berlin, 19–33.
[23] Ádám Darvas and K. Rustan M. Leino. 2007. Practical reasoning about invocations and implementations of pure methods. In Fundamental Approaches to Software Engineering (FASE'07) (LNCS, Vol. 4422). Springer-Verlag, 336–351.
[24] Leonardo de Moura and Nikolaj Bjørner. 2008. Z3: An efficient SMT solver. In Tools and Algorithms for the Construction and Analysis of Systems. Springer, Berlin, 337–340.
[25] David Detlefs, Greg Nelson, and James B. Saxe. 2005. Simplify: A theorem prover for program checking. Journal of the ACM 52, 3 (May 2005), 365–473.
[26] Marco Eilers and Peter Müller. 2018. Nagini: A static verifier for Python. In Computer Aided Verification (CAV'18) (LNCS, Vol. 10982). Springer International Publishing, 596–603.
[27] Marco Gario and Andrea Micheli. 2015. PySMT: A solver-agnostic library for fast prototyping of SMT-based algorithms. In Proceedings of the SMT Workshop 2015.
[28] Yeting Ge and Leonardo de Moura. 2009. Complete instantiation for quantified formulas in satisfiability modulo theories. In Computer Aided Verification. Springer, Berlin, 306–320.
[29] Stefan Heule, Ioannis T. Kassios, Peter Müller, and Alexander J. Summers. 2013. Verification condition generation for permission logics with abstract predicates and abstraction functions. In European Conference on Object-Oriented Programming (ECOOP'13) (LNCS, Vol. 7920). Springer, 451–476.
[30] Ahmed Irfan, Sorawee Porncharoenwase, Zvonimir Rakamarić, Neha Rungta, and Emina Torlak. 2022. Testing Dafny (experience paper). In Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA'22). ACM, New York, NY, 556–567.
[31] Laura Kovács and Andrei Voronkov. 2013. First-order theorem proving and Vampire. In Computer Aided Verification. Springer, Berlin, 1–35.
[32] K. Rustan M. Leino. 2010. Dafny: An automatic program verifier for functional correctness. In Logic for Programming, Artificial Intelligence, and Reasoning. Springer, Berlin, 348–370.
[33] K. Rustan M. Leino and Rosemary Monahan. 2009. Reasoning about comprehensions with first-order SMT solvers. In Proceedings of the 2009 ACM Symposium on Applied Computing (SAC'09). ACM, New York, NY, 615–622.
[34] K. Rustan M. Leino and Peter Müller. 2008. Verification of equivalent-results methods. In European Symposium on Programming (ESOP'08) (LNCS, Vol. 4960). Springer-Verlag, 307–321.
[35] K. Rustan M. Leino and Philipp Rümmer. 2010. A polymorphic intermediate verification language: Design and logical encoding. In Tools and Algorithms for the Construction and Analysis of Systems. Springer, Berlin, 312–327.
[36] Xavier Leroy. 2009. Formal verification of a realistic compiler. Communications of the ACM 52, 7 (July 2009), 107–115.
[37] Michał Moskal. 2009. Programming with triggers. In Proceedings of the 7th International Workshop on Satisfiability Modulo Theories. ACM, 20–29.
[38] Peter Müller, Malte Schwerhoff, and Alexander J. Summers. 2016. Viper: A verification infrastructure for permission-based reasoning. In Verification, Model Checking, and Abstract Interpretation (VMCAI'16) (LNCS, Vol. 9583). Springer-Verlag, 41–62.
[39] Aina Niemetz, Mathias Preiner, Andrew Reynolds, Yoni Zohar, Clark Barrett, and Cesare Tinelli. 2019. Towards bit-width-independent proofs in SMT solvers. In Automated Deduction (CADE 27). Springer International Publishing, Cham, 366–384.
[40] Tobias Nipkow, Lawrence C. Paulson, and Markus Wenzel. 2002. Isabelle/HOL: A Proof Assistant for Higher-Order Logic. LNCS, Vol. 2283. Springer.
[41] Lawrence C. Paulson and Kong Woei Susanto. 2007. Source-level proof reconstruction for interactive theorem proving. In Theorem Proving in Higher Order Logics. Springer, Berlin, 232–245.
[42] Giles Reger, Nikolaj Bjørner, Martin Suda, and Andrei Voronkov. 2016. AVATAR modulo theories. In GCAI 2016, 2nd Global Conference on Artificial Intelligence (EPiC Series in Computing, Vol. 41). EasyChair, 39–52.
[43] Andrew Reynolds, Haniel Barbosa, and Pascal Fontaine. 2018. Revisiting enumerative instantiation. In Tools and Algorithms for the Construction and Analysis of Systems. Springer International Publishing, Cham, 112–131.
[44] Andrew Reynolds, Morgan Deters, Viktor Kuncak, Cesare Tinelli, and Clark Barrett. 2015. Counterexample-guided quantifier instantiation for synthesis in SMT. In Computer Aided Verification. Springer International Publishing, Cham, 198–216.
[45] Arsenii Rudich, Ádám Darvas, and Peter Müller. 2008. Checking well-formedness of pure-method specifications. In Formal Methods (FM) (LNCS, Vol. 5014). Springer-Verlag, 68–83.
[46] Philipp Rümmer. 2012. E-matching with free variables. In Logic for Programming, Artificial Intelligence, and Reasoning. Springer, Berlin, 359–374.
[47] Wolfram Schulte. 2008. VCC: Contract-based modular verification of concurrent C. In Proceedings of the 31st International Conference on Software Engineering (ICSE'09). IEEE Computer Society.
[48] Geoff Sutcliffe. 2016. The CADE ATP system competition (CASC). AI Magazine 37, 2 (2016), 99–101.
[49] Nikhil Swamy, Cătălin Hrițcu, Chantal Keller, Aseem Rastogi, Antoine Delignat-Lavaud, Simon Forest, Karthikeyan Bhargavan, Cédric Fournet, Pierre-Yves Strub, Markulf Kohlweiss, Jean-Karim Zinzindohoue, and Santiago Zanella-Béguelin. 2016. Dependent types and multi-monadic effects in F*. In Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL'16). ACM, New York, NY, 256–270.
[50] Nikhil Swamy, Joel Weinberger, Cole Schlesinger, Juan Chen, and Benjamin Livshits. 2013. Verifying higher-order programs with the Dijkstra monad. In Proceedings of the 34th Annual ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'13). 387–398.
[51] Andrei Voronkov. 2014. AVATAR: The architecture for first-order theorem provers. In Computer Aided Verification. Springer International Publishing, Cham, 696–710.
[52] Felix A. Wolf, Linard Arquint, Martin Clochard, Wytse Oortwijn, João C. Pereira, and Peter Müller. 2021. Gobra: Modular specification and verification of Go programs. In Computer Aided Verification (CAV'21). Springer International Publishing, 367–379.
