Abstract
Static analyses based on typestates are important in certifying correctness of code contracts. Such analyses rely on Deterministic Finite Automata (DFAs) to specify properties of an object. We target the analysis of contracts in low-latency environments, where many useful contracts are impractical to codify as DFAs and/or the size of their associated DFAs leads to sub-par performance. To address this bottleneck, we present a lightweight compositional typestate analyzer, based on an expressive specification language that can succinctly specify code contracts. By implementing it in the static analyzer Infer, we demonstrate considerable performance and usability benefits when compared to existing techniques. A central insight is to rely on a sub-class of DFAs whose analysis uses efficient bit-vector operations.
1 INTRODUCTION
Industrial-scale software is generally composed of multiple interacting components, which are typically produced separately. As a result, software integration is a major source of bugs [20]. Many integration bugs can be attributed to violations of code contracts. Because these contracts are implicit and informal in nature, the resulting bugs are particularly insidious. To address this problem, formal code contracts are an effective solution [13] because static analyzers can automatically check whether client code adheres to ascribed contracts.
Typestate is a fundamental concept in ensuring the correct use of contracts and APIs. A typestate refines the concept of a type: whereas a type denotes the valid operations on an object, a typestate denotes operations valid on an object in its current program context [23]. Typestate analysis is a technique used to enforce temporal code contracts. In object-oriented programs, where objects change state over time, typestates denote the valid sequences of method calls for a given object. The behavior of the object is prescribed by the collection of typestates, and each method call can potentially change the object’s typestate.
Given this, it is natural for static typestate checkers, such as Fugue [10], SAFE [26], and Infer’s Topl checker [1], to define the analysis property using Deterministic Finite Automata (DFAs). The abstract domain of the analysis is a set of states in the DFA; each operation on the object modifies the set of possible reachable states. If the set of abstract states contains an error state, then the analyzer warns the user that a code contract may be violated. Widely applicable and conceptually simple, DFAs are the de facto model in typestate analyses.
Here, we target the analysis of realistic code contracts in low-latency environments, such as Integrated Development Environments (IDEs) [24, 25]. In this context, to avoid noticeable disruptions in the users’ workflow, the analysis should ideally run in under a second [2]. However, relying on DFAs jeopardizes this goal, as it can lead to scalability issues.
To illustrate these limitations, consider the representative example of a class with four setter/getter method pairs, where each setter method enables a corresponding getter method and then disables itself; the intention is that values can be set once and accessed multiple times. The associated DFA contract has \(2^4\) states, as any subset of getter methods can be available at a particular program point, depending on previous calls (cf. Figure 1). Additionally, the full DFA-based specification requires as many as 64 state transitions. To see this, note that each state has 4 outgoing transitions, one per setter/getter pair: e.g., state \(q_3\) has outgoing transitions with labels \(g_1, g_2, s_3, s_4\) and state \(q_7\) has outgoing transitions with labels \(g_1, g_2, g_3, s_4\). In the general case (n methods), a DFA for this kind of contract can have \(2^{n}\) states. Even with a small n, as in Figure 1, such contracts are impractical to codify manually and are likely to result in sub-par performance.
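As a concrete check of this counting argument, the following Python sketch (ours, not from the paper; method names `s1`…`s4` and `g1`…`g4` are illustrative) enumerates the reachable states and transitions of the four-pair setter/getter contract:

```python
# Sketch (ours, not from the paper): enumerate the DFA induced by a contract
# with four setter/getter pairs, where setter s_i enables getter g_i and then
# disables itself, and getters stay enabled once enabled.
N = 4
SETTERS = [f"s{i}" for i in range(1, N + 1)]
GETTERS = [f"g{i}" for i in range(1, N + 1)]

def step(state, method):
    """Successor state for `method`, or None if `method` is disabled."""
    if method not in state:
        return None
    if method in SETTERS:
        getter = "g" + method[1:]
        # The setter disables itself and enables its getter.
        return (state - {method}) | {getter}
    return state  # a getter call leaves the state unchanged

initial = frozenset(SETTERS)  # at construction, only the setters are enabled
states, frontier, transitions = {initial}, [initial], 0
while frontier:
    src = frontier.pop()
    for m in SETTERS + GETTERS:
        dst = step(src, m)
        if dst is None:
            continue
        transitions += 1
        if dst not in states:
            states.add(dst)
            frontier.append(dst)

print(len(states), transitions)  # 16 states (2^4) and 64 transitions
```

The counts match the text: \(2^4\) states, each with 4 outgoing transitions, for 64 transitions in total.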
Such enable/disable properties are referred to as may call properties. Interestingly, the specification of common “must call” properties can also result in a prohibitively large DFA state space. As an example, consider a class that has m pairs of methods for acquiring/releasing some resources. The contract should ensure that all acquired resources are released before the object is destructed. Because states would need to track unreleased resources, a DFA for this contract requires \(2^m\) states.
Any DFA-based typestate analysis crucially depends on the number of states. Typically the analysis has a finite-state domain and a distributive transfer function; it thus falls into the category of so-called distributive analyses, which admit precise interprocedural (compositional) analysis in polynomial time (see IFDS [21]). The number of states is critical: in the worst case, the analysis takes \(|Q|^3\) operations per method invocation, where Q is the set of states of the underlying DFA. To see why this is the case, notice that a procedure can be invoked in any state—thus, we need to analyze a function with every state as a potential entry state. Furthermore, this per-state analysis must deal with subsets of states. Thus, contracts that induce a large state space can severely impact the performance of the compositional analysis.
Interestingly, many practical contracts do not require a full DFA. In our enable/disable example, the method dependencies are local to a subset of methods—an enabling/disabling relation concerns a pair of methods. In contrast, DFA-based approaches have by definition a global standpoint; as a result, local method dependencies can impact transitions of unrelated methods. Thus, using DFAs for contracts that specify dependencies that are local to each method (or to a few methods) is redundant and/or prone to inefficient implementations.
Our Solution. Based on these observations, we present a lightweight typestate analyzer for locally dependent code contracts in low-latency environments. It rests upon two insights:
(1) Allowed and disallowed sequences of method calls for objects can be succinctly specified without using DFAs. To unburden the task of specifying typestates, we introduce lightweight annotations that specify method dependencies directly on the methods involved. Lightweight annotations can specify code contracts for usage scenarios commonly encountered when using libraries such as File, Stream, Socket, and so on, in considerably fewer lines of code than DFAs.
(2) A sub-class of DFAs suffices to express many useful code contracts. To give semantics to lightweight annotations, we define Bit-Vector Finite Automata (BFAs), a sub-class of DFAs in which states record the set of currently enabled methods and transitions are computed by efficient bit-vector operations.
Importantly, code contracts that are locally dependent allow efficient reasoning about contract subtyping, as required by class inheritance. Relying on DFAs can make specifying and reasoning about contract subtyping a difficult task. Suppose \(c_2\) is a sub-class of \(c_1\) (i.e., \(c_1\) is the super-class of \(c_2\)). Intuitively, a contract for \(c_2\) must be at least as permissive as a contract for \(c_1\). That is, the set of allowed sequences of method invocations for \(c_2\) must subsume that of \(c_1\). Locally-dependent contracts enable succinct specifications, which in turn enable an efficient subsumption checking algorithm, thereby making reasoning about subtyping an easy task. Indeed, by relying on our annotation language, we can check the subtyping relation simply by comparing annotations of the corresponding methods of super- and sub-classes; because this comparison is the usual set inclusion, subtype checking is insensitive to the number of states in a corresponding DFA.
We have implemented our lightweight typestate analysis in the industrial-strength static analyzer Infer [8]. Our analysis exhibits concrete usability and performance advantages and is expressive enough to encode many relevant typestate properties from the literature. On average, compared to state-of-the-art typestate analyses, our approach requires fewer annotations than DFA-based analyzers and does not exhibit slowdowns as the number of states increases.
Contributions and Organization. We summarize our contributions as follows:
– A specification language for typestates based on lightweight annotations. Our language rests upon BFAs (Section 2).
– A lightweight analysis technique for code contracts, implemented in Infer (Section 3). An associated artifact is publicly available [4].
– The specification language in Section 2 and the analysis technique in Section 3 concern “may call” properties, which involve methods that may be called at some program point. In Section 4, we extend our approach to consider also “must call” properties, which are useful to express that a method requires another one to be invoked in a code continuation.
– Extensive evaluations for our lightweight analysis technique, which demonstrate considerable gains in performance and usability (Section 5).
We review related work in Section 6 and collect some closing remarks in Section 7.
This article is an extended and revised version of our conference paper [3]. In this presentation, we consider a more general formalism of BFAs, which accounts also for “must call” properties (Section 4).
2 BIT-VECTOR TYPESTATE ANALYSIS
2.1 Annotation Language
We introduce our annotation language, based on two method annotations: “\(\texttt {@Enable}(n) \ m\)”, which asserts that a call to method m enables subsequent calls to the methods in n, and “\(\texttt {@Disable}(n) \ m\)”, which asserts that a call to m disables subsequent calls to the methods in n.
We define some base sets and notations. Let \(\mathit {Classes}\) denote the set of classes; given \(c \in \mathit {Classes}\), \(\Sigma ^{}_c\) denotes its set of method names, including the constructor \(m^\uparrow\) and the destructor \(m^\downarrow\), and \(\Sigma ^{\bullet }_c = \Sigma ^{}_c \setminus \lbrace m^\uparrow , m^\downarrow \rbrace\) (Notation 2.1).
We will often use E and D to denote subsets of \(\Sigma ^{\bullet }_c\). Also, we shall write \(\tilde{x}\) to denote finite sequences of elements \(x_1, \ldots , x_k\) (with \(k \gt 0\)).
Definition 2.1 (Annotation Language). Following the above intuitions on “\(\texttt {@Enable}(n) \ m\)” and “\(\texttt {@Disable}(n) \ m\)”, we define our annotation language as follows.
Let \(c \in \mathit {Classes}\) such that \(\Sigma ^{}_c= \lbrace m^\uparrow , m_1, \ldots , m_n, m^\downarrow \rbrace\). Each method \(m_i\) is annotated as “\(\texttt {@Enable}(E_i) \ \texttt {@Disable}(D_i) \ m_i\)”, where \(E_i, D_i \subseteq \Sigma ^{\bullet }_c\).
Let \(\tilde{x} = m^\uparrow , x_1, x_2, \ldots\) be a sequence where each \(x_i \in \Sigma ^{\bullet }_c\). We say that \(\tilde{x}\) is valid (w.r.t. annotations) if for all subsequences \(\tilde{x}^{\prime }=x_i, \ldots ,x_{k}\) of \(\tilde{x}\) such that \(x_k \in D_i\) there is j (\(i \lt j \le k\)) such that \(x_k \in E_j\).
The formal semantics for these specifications is given in Section 2.2. We note that if \(E_i\) or \(D_i\) is \(\emptyset\) then we omit the corresponding annotation.
Derived Annotations. The annotation language can be used to derive other useful annotations: \(\begin{align*} \texttt {@EnableOnly}(E_i) \ m_i &\stackrel{\text{def}}{=}\texttt {@Enable}(E_i) \ \texttt {@Disable}(\Sigma ^{\bullet }_c\setminus E_i) \ m_i \\ \texttt {@DisableOnly}(D_i) \ m_i &\stackrel{\text{def}}{=}\texttt {@Disable}(D_i) \ \texttt {@Enable}(\Sigma ^{\bullet }_c\setminus D_i) \ m_i \\ \texttt {@EnableAll} \ m_i &\stackrel{\text{def}}{=}\texttt {@Enable}(\Sigma ^{\bullet }_c) \ m_i \end{align*}\) This way, the annotation “\(\texttt {@EnableOnly}(E_i) \ m_i\)” asserts that a call to method \(m_i\) enables only calls to methods in \(E_i\) while disabling all other methods in \(\Sigma ^{\bullet }_c\). The annotation “\(\texttt {@DisableOnly}(D_i) \ m_i\)” is defined dually. Finally, the annotation “\(\texttt {@EnableAll} \ m_i\)” asserts that a call to method \(m_i\) enables all methods in a class; an annotation “\(\texttt {@DisableAll} \ m_i\)” can be defined similarly.
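To make the expansions concrete, here is a small Python sketch (the four-method alphabet is an assumed example, not from the paper) that computes the Enable/Disable sets denoted by the derived annotations:

```python
# Illustrative sketch: expanding the derived annotations into plain
# @Enable/@Disable sets.  The four-method alphabet is an assumed example.
SIGMA = {"open", "read", "write", "close"}  # plays the role of Sigma_c^bullet

def enable_only(E):
    """@EnableOnly(E) m  =  @Enable(E) @Disable(SIGMA \\ E) m."""
    return set(E), SIGMA - set(E)

def disable_only(D):
    """@DisableOnly(D) m  =  @Disable(D) @Enable(SIGMA \\ D) m."""
    return SIGMA - set(D), set(D)

def enable_all():
    """@EnableAll m  =  @Enable(SIGMA) m."""
    return set(SIGMA), set()

E, D = enable_only({"read", "close"})
print(sorted(E), sorted(D))  # ['close', 'read'] ['open', 'write']
```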
Examples. We illustrate the expressivity and usability of our annotation language through some examples.
Next, we consider the SparseLU class from the Eigen library, whose contract serves as our running example.
Some method call sequences do not cause errors but have redundancies. For example, we can disallow consecutive calls to the same method, as the result of the first call would simply be recomputed by the second.
Figure 2 gives the corresponding DFA that substitutes dynamic checks and avoids redundancies. (In the figure, and in the following, we write \(\mathit {aP}\) to denote/abbreviate “analyzePattern”.)
The entire contract for the SparseLU class can be specified with one annotation per method; in contrast, the DFA of Figure 2 must spell out every state and transition explicitly.
Another difference concerns the treatment of local method dependencies: a small change in a local dependency requires only a local change to the affected annotations, whereas in a DFA it may require revisiting transitions across many states.
2.2 Bit-Vector Finite Automata
We define BFAs (Bit-Vector Finite Automata), the sub-class of DFAs that gives semantics to our annotation language: states are identified with bit-vectors recording the enabled methods, and transitions are computed by bit-vector operations.
Notation 2.2 (Sets and Bit-vectors).
Let \(\mathcal {B}^n\) denote the set of bit-vectors of length \(n \gt 0\). We write \(b, b^{\prime }, \ldots\) to denote elements of \(\mathcal {B}^n\), with \(b[i]\) denoting the ith bit in b. Given a finite set S with \(|S|=n\), every \(A \subseteq S\) can be represented by a bit-vector \(b_A \in \mathcal {B}^n\), obtained via the usual characteristic function.
By a small abuse of notation, given sets \(A, A^{\prime } \subseteq S\), we may write \(A \subseteq A^{\prime }\) to denote the subset operation applied on \(b_{A}\) and \(b_{A^{\prime }}\) (and similarly for \(\cup ,\cap\), and \(\setminus\)).
We first define a bit-vector representation for the annotations of a class.
Definition 2.3, given next, gives a mapping from methods to triples of bit-vectors, denoted \(\mathcal {L}_c\). Given \(k \gt 0\), let us write \(1^k\) (resp. \(0^k\)) to denote a sequence of 1s (resp. 0s) of length k.
The initial state is determined by \(E^c\), the set of enabling annotations on the constructor.
Definition 2.3 (Mapping \(\mathcal {L}_c\))
Given a class c, we define \(\mathcal {L}_c\) as a mapping from methods to triples of subsets of \(\Sigma ^{\bullet }_c\) as follows: \(\begin{equation*} \mathcal {L}_c : \Sigma ^{}_c\rightarrow \mathcal {P}(\Sigma ^{\bullet }_c) \times \mathcal {P}(\Sigma ^{\bullet }_c) \times \mathcal {P}(\Sigma ^{\bullet }_c) \end{equation*}\)
Given \(m_i \in \Sigma ^{}_c\), we shall write \(E_i\), \(D_i\), and \(P_i\) to denote each of the elements of the triple \(\mathcal {L}_c(m_i)\). Similarly, we write \(E^c\), \(D^c\), and \(P^c\) to denote the elements of the triple \(\mathcal {L}_c(m^\uparrow)\). The mapping \(\mathcal {L}_c\) is induced by the annotations in class c: for each \(m_i\), the sets \(E_i\) and \(D_i\) are explicit, and \(P_i\) is simply the singleton \(\lbrace m_i\rbrace\). This singleton formulation is convenient for defining the domain of the compositional analysis in Section 3.2: as we will see later, it allows us to uniformly treat method calls and procedure calls, whose pre-sets \(P_i\) can contain more elements.
We impose some natural well-formedness conditions on the mapping \(\mathcal {L}_c\).
Definition 2.4 (\({\mathit {well\_formed}}(\mathcal {L}_c)\))
Let c, \(\Sigma _c\), and \(\mathcal {L}_c\) be a class, its method set, and its annotation mapping, respectively. Then, \({\mathit {well\_formed}}(\mathcal {L}_c)\) holds if: (1) \(E^c \cap D^c = \emptyset\) and \(E^c \cup D^c = \Sigma ^{\bullet }_c\); and (2) \(E_i \cap D_i = \emptyset\) with \(E_i, D_i \subseteq \Sigma ^{\bullet }_c\), for every \(m_i \in \Sigma ^{\bullet }_c\).
The first condition says that the constructor’s enabling and disabling sets must be disjoint and complementary with respect to \(\Sigma ^{\bullet }_c\); this will be convenient later when defining the compositional analysis algorithm in Section 3. The second condition ensures that every method’s enabling and disabling sets are disjoint. Furthermore, by taking \(E_i, D_i \subseteq \Sigma ^{\bullet }_c\) we ensure that the annotations of method \(m_i\) cannot refer to the constructor nor the destructor (see Notation 2.1).
In a BFA, a state \(q_b\) is identified by the set b of methods enabled at that point: a call to method \(m_i\) is allowed only if \(m_i \in b\), and its effect is to add \(E_i\) to b and remove \(D_i\) from it.
These intuitions should serve to illustrate our approach and, in particular, the local nature of enabling/disabling dependencies between methods. The following definition makes them precise.
Definition 2.5 (BFA). A BFA is a tuple \(M = (Q, \Sigma ^{\bullet }_c, \delta , q_{E^c}, \mathcal {L}_c)\), where:

– Q is a finite set of states \(q_b, q_{b^{\prime }}, \ldots\), where \(b, b^{\prime }, \ldots \in \mathcal {B}^n\);
– \(\Sigma ^{\bullet }_c= \lbrace m_1, \ldots , m_n\rbrace\) is the alphabet (method identities);
– \(q_{E^c}\) is the starting state (recall that \(E^c\) is the enabling set of the constructor);
– \(\mathcal {L}_c\) is a well-formed mapping (cf. Definitions 2.3 and 2.4);
– \(\delta : Q \times \Sigma _c \rightarrow Q\) is the (partial) transition function, where \(\begin{equation*} \delta (q_b, m_i) = q_{b^{\prime }} \end{equation*}\) with \(b^{\prime } = (b \cup E_i) \setminus D_i\), if \(P_i \subseteq b\), and is undefined otherwise.
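The transition function of Definition 2.5 can be sketched directly in Python; the two-method contract below (`reset` enables `run`, which then disables itself) is an assumed example, not one from the paper:

```python
# Sketch of the transition function of Definition 2.5: states are frozensets
# of enabled methods, and L maps each method to its <E, D, P> triple
# (Definition 2.3).  The two-method contract is an assumed example.
L = {
    "reset": (frozenset({"run"}), frozenset(), frozenset({"reset"})),
    "run":   (frozenset(), frozenset({"run"}), frozenset({"run"})),
}

def delta(state, method):
    """delta(q_b, m_i) = q_{(b ∪ E_i) \\ D_i} if P_i ⊆ b; undefined otherwise."""
    E, D, P = L[method]
    if not P <= state:
        return None  # undefined transition: the call violates the contract
    return (state | E) - D

q0 = frozenset({"reset"})       # initial state: the constructor enables `reset`
assert delta(q0, "run") is None                  # `run` is not yet enabled
q1 = delta(q0, "reset")                          # enables `run`
assert delta(q1, "run") == frozenset({"reset"})  # `run` disables itself
```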
We remark that in a BFA the transition function is induced entirely by \(\mathcal {L}_c\): each transition is computed from the current bit-vector using only unions and differences.
Example (SparseLU). We give the BFA corresponding to the SparseLU contract of Figure 2.
Contrasting the BFA with the DFA of Figure 2 highlights the succinctness of the bit-vector representation: local enabling/disabling dependencies need not be spelled out in every state.
The following property, called context-independence, is satisfied by all BFAs but, as we will see, not by all DFAs.
Theorem (Context-independence). Let \(M = (Q, \Sigma ^{\bullet }_c, \delta , q_{E^c},\mathcal {L}_c)\) be a BFA. Then, for any \(m_n, m_{n+1} \in \Sigma ^{\bullet }_c\): (1) if \(\widetilde{p} \cdot m_{n+1} \notin L(M)\) and \(\widetilde{p} \cdot m_n \cdot m_{n+1} \in L(M)\) for some \(\widetilde{p}\), then \(\widetilde{m} \cdot m_n \cdot m_{n+1} \in L(M)\) for every \(\widetilde{m}\) with \(\widetilde{m} \cdot m_n \in L(M)\); (2) if \(\widetilde{p} \cdot m_{n+1} \in L(M)\) and \(\widetilde{p} \cdot m_n \cdot m_{n+1} \notin L(M)\) for some \(\widetilde{p}\), then \(\widetilde{m} \cdot m_n \cdot m_{n+1} \notin L(M)\) for every \(\widetilde{m}\) with \(\widetilde{m} \cdot m_n \in L(M)\).
We only consider the first item, as the second item is shown similarly. By \(\widetilde{p} \cdot m_{n+1} \notin L(M)\) and \(\widetilde{p} \cdot m_n \cdot m_{n+1} \in L(M)\) and Definition 2.5 we know that (1) \(\begin{align} m_{n+1} \in E_{n} \end{align}\) Furthermore, for any \(\widetilde{m} \in ({\Sigma ^{\bullet }_c})^*\), let \(q_{b}\) be such that \(\delta (q_{10^{n-1}}, \widetilde{m})=q_b\) and \(q_{b^{\prime }}\) such that \(\delta (q_b, m_n)=q_{b^{\prime }}\). Now, by Definition 2.5 we have that \(\delta (q_{b^{\prime }},m_{n+1})\) is defined, as by (1) we know \(P_{n+1} = \lbrace m_{n+1} \rbrace \subseteq b^{\prime }\). Thus, for all \(\widetilde{m} \in L(M)\) we have \(\widetilde{m} \cdot m_n \cdot m_{n+1} \in L(M)\). This concludes the proof.□
Informally, the above theorem says that the effect of a call to \(m_n\) on subsequent calls (\(m_{n+1}\)) is not influenced by previous calls (i.e., the context) \(\widetilde{m}\). That is, Item (1) (resp. Item (2)) says that method \(m_n\) enables (resp. disables) the same set of methods in any context.
The context-independence property is not satisfied by all DFAs. Consider, for example, a DFA that disallows modifying a collection while iterating over it: whether a call that advances the iterator is allowed depends on whether the collection was modified after the iterator was created, i.e., on the calling context. Hence, such a contract cannot be expressed as a BFA.
3 A COMPOSITIONAL ANALYSIS ALGORITHM
Since BFA contracts are locally dependent, our analysis can summarize the effect of a procedure on an object with a single triple of bit-vectors. We first illustrate the key ideas of the analysis and then present the algorithm itself.
3.1 Key Ideas
We motivate our compositional analysis technique with the example below.
Let us illustrate the computation of a summary for a procedure that manipulates a SparseLU object.
The central idea of our analysis is to accumulate enabling and disabling annotations. For this, the abstract domain maps object access paths to triples from the definition of \(\mathcal {L}_{\text{SparseLU}}\) (cf. Definition 2.3). A transfer function interprets method calls in this abstract state. We illustrate the transfer function; the evolution of the abstract state is presented as comments in the following code listing.
At the procedure entry (line 2), we initialize the abstract state as a triple with empty sets (\(s_1\)). Next, the abstract state is updated at each method invocation, accumulating the enabling and disabling annotations of the invoked method (yielding \(s_2\) and \(s_3\) in the two branches).
Finally, we join the abstract states of the two branches (i.e., \(s_2\) and \(s_3\)) at line 7. Intuitively, this join operates as follows: (i) a method is enabled only if it is enabled in both branches and not disabled in any branch; (ii) a method is disabled if it is disabled in either branch; (iii) a method called in either branch must be in the pre-condition (cf. Definition 3.2). Accordingly, in line 8, we obtain the final state \(s_4\), which also serves as the summary for the analyzed method.
Now, we illustrate the checking of client code against this summary.
Above, at line 2, the abstract state is initialized with the annotations of the constructor (cf. \(E^c\) and \(D^c\) in Definition 2.3).
Class Composition. In the above example, the allowed orderings of method calls concern an object of a single class. Our analysis also supports class composition, i.e., classes whose members are themselves objects with BFA contracts; the states of such members are tracked explicitly via access paths (cf. Section 3.2).
3.2 The Algorithm
We formally define our analysis, which presupposes the control-flow graph (CFG) of a program. Let us write \(\mathcal {AP}\) to denote the set of access paths, which enable a field-sensitive data-flow analysis; see, e.g., [5, 18, 22] for more information on this subject. Access paths model heap locations as paths used to access them: a program variable followed by a finite sequence of field accesses (e.g., \(foo.a.b\)). We use access paths because we would like to explicitly track the states of class members; this, in turn, enables a precise compositional analysis. The abstract domain, denoted \(\mathbb {D}\), maps access paths in \(\mathcal {AP}\) to elements of \(Cod(\mathcal {L}_c)\), i.e., triples \(\langle E, D, P \rangle\) (cf. Definition 2.3).
Definition 3.2 (Join Operator).
We define \(\sqcup : Cod(\mathcal {L}_c) \times Cod(\mathcal {L}_c) \rightarrow Cod(\mathcal {L}_c)\) as follows: \(\begin{equation*} \langle E_1, D_1, P_1 \rangle \sqcup \langle E_2, D_2, P_2 \rangle = \langle (E_1 \cap E_2) \setminus (D_1 \cup D_2),\ D_1 \cup D_2,\ P_1 \cup P_2 \rangle \end{equation*}\)
The join operator on \(Cod(\mathcal {L}_c)\) is lifted to \(\mathbb {D}\) by taking the union of un-matched entries in the mapping.
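A Python sketch of the join, together with a brute-force check (over a small universe) of the soundness equation \([\![\phi_1]\!](b) \cap [\![\phi_2]\!](b) = [\![\phi_1 \sqcup \phi_2]\!](b)\) proved later in this section; the \(P \subseteq b\) guard of the semantics is ignored here for simplicity:

```python
# Sketch of the join of Definition 3.2 on <E, D, P> triples (Python sets),
# with a brute-force check of the soundness equation over a 3-element
# universe; the P ⊆ b guard of the semantics is ignored for simplicity.
from itertools import chain, combinations

def join(phi1, phi2):
    (E1, D1, P1), (E2, D2, P2) = phi1, phi2
    return ((E1 & E2) - (D1 | D2), D1 | D2, P1 | P2)

def sem(phi, b):
    """[[<E, D, P>]](b) = (b ∪ E) \\ D."""
    E, D, _ = phi
    return (b | E) - D

U = [0, 1, 2]
subsets = [frozenset(s) for s in
           chain.from_iterable(combinations(U, k) for k in range(len(U) + 1))]

ok = all(
    sem((E1, D1, frozenset()), b) & sem((E2, D2, frozenset()), b)
    == sem(join((E1, D1, frozenset()), (E2, D2, frozenset())), b)
    for E1 in subsets for D1 in subsets if not E1 & D1
    for E2 in subsets for D2 in subsets if not E2 & D2
    for b in subsets
)
assert ok  # the join is exact w.r.t. intersecting the two post-states
```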
We now define some useful functions and predicates. First, we remark that our analysis is only concerned with three types of CFG nodes: method call nodes and the entry and exit nodes of a method body; all other node types are irrelevant.
We introduce convenient notations for entry and method call nodes (Notation 3.1): we write \(m_j(p_0, \ldots , p_n)\) for the entry node of a method \(m_j\) with formal arguments \(p_0, \ldots , p_n\), and \(\texttt {Call-node}[m_j(p_0:b_0, \ldots , p_n:b_n)]\) for a call node with actual arguments \(b_0, \ldots , b_n\).
The following definitions concern CFG traversal, predecessor nodes, exit nodes, and actual parameters:
Definition 3.3 (forward(-))
Let G be a CFG. Then, \(\mathit {forward}(G)\) enumerates nodes of G by traversing it in a breadth-first manner.
Definition 3.4 (pred(-))
Let G be a CFG and v a node of G. Then, \(\mathit {pred}(v)\) denotes a set of predecessor nodes of v. That is, \({pred}(v) = W\) such that \(w \in W\) if and only if there is an edge from w to v in G.
Definition 3.5 (warning(-))
Let G be a CFG and \(\mathcal {L}_1,\ldots , \mathcal {L}_k\) be a collection of annotation mappings for the classes used in G. Then, \(\mathit {warning}(v)\) reports to the user that the method call at node v may violate its contract.
Definition 3.6 (exit_node(-))
Let v be a method call node. Then, \(\mathit {exit\_node}(v)\) denotes the exit node w of a method body corresponding to v.
Definition 3.7 (actual_arg(-,-))
Let \(v = \texttt {Call-node}[m_j(p_0:b_0, \ldots , p_n:b_n)]\) be a call node. Suppose \(p \in \mathcal {AP}\). We define \({actual\_arg}(p, v) = b_i\) if \(p=p_i\) for \(i \in \lbrace 0,\ldots ,n \rbrace\); otherwise \({actual\_arg}(p, v)=p\).
For convenience, we use a dot notation to access the elements of triples in \(Cod(\mathcal {L}_c)\).
Definition 3.8 (Dot Notation). Given \(\phi = \langle E, D, P \rangle \in Cod(\mathcal {L}_c)\), we write \(\phi .E\), \(\phi .D\), and \(\phi .P\) to denote E, D, and P, respectively.
The compositional analysis is given in Algorithm 1. It expects a program’s CFG and a series of contracts, expressed as annotation mappings \(\mathcal {L}_1, \ldots , \mathcal {L}_k\).
The algorithm traverses the CFG nodes top-down in a for-loop (lines 2–7), as given by \(\mathit {forward}(G)\) (cf. Definition 3.3). For each node v, we first check whether v has predecessors: if not, i.e., when \(\mathit {pred}(v) = \emptyset\), we initialize the domain \(\sigma\) as an empty mapping of type \(\mathbb {D}\); otherwise, we collect the post-states of its predecessors (as given by \(\mathit {pred}(v)\)) and join them into \(\sigma\) (line 6). Then, the algorithm uses the predicate \(\textsf {guard}\) to check whether the method call at v is admissible in \(\sigma\), reporting a warning if it is not, and applies the transfer function to obtain the post-state of v.
Guard Predicate. The predicate \(\textsf {guard}(v, \sigma)\) checks whether the pre-condition for method call node v is met in the abstract state \(\sigma\) (cf. Algorithm 2). We represent a call node as \(m_j(p_0:b_0,\ldots ,p_n:b_n)\), where \(p_i\) and \(b_i\) (for \(i \in \lbrace 0, \ldots , n\rbrace\)) are formal and actual arguments, respectively. Let \(\sigma _w\) be the post-state of the exit node of method \(m_j\). The pre-condition is satisfied if, for all \(b_i\), no element of the pre-condition set (i.e., the third element of \(\sigma _w[b_i]\)) is in the disabling set of the current abstract state \(\sigma [b_i]\).
For this predicate, we need the property \(D = \Sigma ^{\bullet }_{c_i} \setminus E\), where \(\Sigma ^{\bullet }_{c_i}\) is the set of methods of class \(c_i\). This is ensured by the condition \(well\_formed(\mathcal {L}_{c_i})\) (Definition 2.4) and by the definition of the transfer function, which preserves it.
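The check can be sketched as follows (a simplified, single-argument version with illustrative method names; Algorithm 2 itself is not reproduced here):

```python
# Simplified sketch of the guard check: a call is admissible when no method
# in the callee's pre-condition set P is currently disabled in the caller's
# abstract state.  Triples and method names are illustrative.
def guard(callee_triple, caller_triple):
    """callee_triple = <E, D, P> from the callee summary;
    caller_triple = <E, D, P> of the current abstract state."""
    _, _, P_callee = callee_triple
    _, D_caller, _ = caller_triple
    return not (P_callee & D_caller)

# `run` is required by the callee but disabled in the caller: reject.
assert not guard((set(), set(), {"run"}), (set(), {"run"}, set()))
# Nothing required by the callee is disabled: accept.
assert guard((set(), set(), {"run"}), ({"run"}, set(), set()))
```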
The Transfer Function. The transfer function, given in Algorithm 3, distinguishes between two types of CFG nodes:
Entry-node: (lines 3–6) This is a function entry node. As described in Notation 3.1, for simplicity, we represent it as \(m_j(p_0, \ldots , p_n)\) where \(m_j\) is a method name and \(p_0, \ldots , p_n\) are formal arguments. We assume \(p_0\) is a reference to the receiver object (i.e., this). If method \(m_j\) is defined in a class \(c_i\) with user-supplied annotations \(\mathcal {L}_{c_i}\), in line 5, we initialize the domain to the singleton map (i.e., this mapped to \(\mathcal {L}_{c_i}(m_j)\)). Otherwise, we return an empty map meaning that a summary has to be computed.
Call-node: (lines 7–20) We represent a call node as \(m_j(p_0:b_0, \ldots , p_n:b_n)\) (cf. Notation 3.1) where we assume actual arguments \(b_0, \ldots , b_n\) are access paths for objects, with \(b_0\) representing a receiver object.
The analysis is skipped if this is in the domain (line 10): this means the method has user-entered annotations. Otherwise, we transfer an abstract state for each argument \(b_i\), but also for each class member whose state is updated by \(m_j\). Thus, we consider all access paths in the domain of \(\sigma _w\), that is \(ap \in dom(\sigma _w)\) (line 11). We construct an access path \(ap^{\prime }\) given ap. We distinguish two cases: ap denotes (i) a member and (ii) a formal argument of \(m_j\). In line 12, we handle both cases. In the former case, we know ap has form \(this.c_1. \ldots . c_n\). We then construct \(ap^{\prime }\) as ap with \({this}\) substituted for \(b_0\) (\(\mathsf {actual\_arg}(\text{-})\) is the identity in this case, see Definition 3.7): e.g., if receiver \(b_0\) is \(this.a\) and ap is \(this.c_1. \ldots . c_n\) then \(ap^{\prime } = this.a.c_1.\ldots .c_n\). In the latter case ap denotes the formal argument \(p_i\) and \(\mathsf {actual\_arg}(\text{-})\) returns the corresponding actual argument \(b_i\) (as \(p_i\lbrace b_0 / this\rbrace = p_i\)).
Now, as \(ap^{\prime }\) is determined, we construct its new abstract value by composing the callee’s summary entry \(\sigma _w[ap]\) with the current state \(\sigma [ap^{\prime }]\), accumulating enabling and disabling sets as described above.
We can see that for each method call we have a constant number of bit-vector operations per argument. That is, our analysis is insensitive to the number of states of the DFA corresponding to the contract.
Analysis Complexity: Comparison to DFA-based algorithm. As already mentioned, the performance of a compositional DFA-based analysis depends on the number of states.
In DFA-based analyses, the analysis domain is given by \(\mathcal {P}(Q)\), where Q is the set of states. In the intraprocedural analysis, at each method call, the transfer function would need to transition each state in the abstract state according to a given DFA. That is, the transfer function is the DFA’s transition function lifted to a subset of states (with signature \(\mathcal {P}(Q) \mapsto \mathcal {P}(Q)\)). Clearly, the intraprocedural analysis depends linearly on the number of DFA states.
Even more prominently, the compositional interprocedural analysis is affected by the number of states. Each procedure has to be analyzed taking each state as an entry state: thus, effectively, we would need to run the intraprocedural analysis \(|Q|\) times. Now, as a procedure body can contain branches, the analysis can produce a set of states for a given input state: the procedure summary is a mapping from a state to a set of states. For a procedure call, the transfer function would need to apply this mapping, taking \(|Q|^2\) operations in the worst case. Overall, the compositional analysis takes \(|Q|^3\) operations in the worst case per procedure call.
To sum up, taking BFAs as specifications reduces the worst-case cost of analyzing a procedure call from \(|Q|^3\) set operations to a constant number of bit-vector operations per argument.
Implementation. In our implementation, we use several features specific to Infer: (1) Infer’s summaries, which allow us to use a single domain for intra- and inter-procedural analysis; (2) scheduling on top-down CFG traversal, which simplifies the handling of branch statements. In principle, however, our technique is not tied to Infer and could be implemented in other compositional analysis frameworks.
Correctness. In a BFA, a set of reachable states can be soundly abstracted by a single state, obtained by intersecting their bit-vectors; the following notions make this precise.
Definition 3.9 (\({[\![ }\text{-}{]\!] }(\text{-})\))
Let \(\langle E, D, P \rangle \in Cod(\mathcal {L}_c)\) and \(b \in \mathcal {B}^n\). We define \({[\![ }\langle E, D, P \rangle {]\!] }(b) = {b^{\prime }}\), where \(b^{\prime }=(b \cup E) \setminus D\) if \(P \subseteq b\), and is undefined otherwise.
Theorem 3.10. Let M be a BFA, \(S \subseteq Q\) a set of its states, and \(b_* = \bigcap _{q_b \in S} b\). Then: (1) if \(\delta (q_b, m)\) is defined for all \(q_b \in S\), then \(\delta (q_{b_*}, m)\) is defined; (2) \(\bigcap _{q_b \in S} b^{\prime }_b = b^{\prime }_*\), where \(q_{b^{\prime }_b} = \delta (q_b, m)\) for \(q_b \in S\) and \(q_{b^{\prime }_*} = \delta (q_{b_*}, m)\).

We show the two items:

(1) By Definition 2.5, for all \(q_b \in S\) we know \(\delta (q_b, m)\) is defined when \(P \subseteq b\) with \(\langle E, D, P \rangle = \mathcal {L}_c(m)\). So, we have \(P \subseteq \bigcap _{q_b \in S} b = b_*\) and \(\delta (q_{b_*}, m)\) is defined.

(2) The second item follows by distributing \(\cup \, E\) and \(\setminus D\) over the intersection: \(\bigcap _{q_b \in S} ((b \cup E) \setminus D) = ((\bigcap _{q_b \in S} b) \cup E) \setminus D = (b_* \cup E) \setminus D\).□
Our transfer function (Algorithm 3) computes summaries compositionally; to reason about its correctness, we relate it to a declarative formulation over \(Cod(\mathcal {L}_c)\).
We define the declarative transfer function as follows:
Definition 3.11 (\(\mathsf {dtransfer}_c(\text{-},\text{-})\))
Let \(c \in \mathit {Classes}\) be a class, \(\Sigma ^{\bullet }_c\) a set of methods of c, and \(\mathcal {L}_c\) its annotation mapping. Given \(m \in \Sigma ^{\bullet }_c\) with \(\mathcal {L}_c(m) = \langle E^m, D^m, P^m \rangle\) and \(\phi = \langle E, D, P \rangle\), we define \(\mathsf {dtransfer}_c(m, \phi) = \langle E^{\prime }, D^{\prime }, P^{\prime } \rangle\), where:

– \(E^{\prime } = (E \ \cup \ E^m) \setminus D^m\);
– \(D^{\prime } = (D \ \cup \ D^m) \setminus E^m\);
– \(P^{\prime } = P \ \cup \ (P^{m} \ \setminus \ E)\), if \(P^m \cap D = \emptyset\), and is undefined otherwise.
Let \(m_1,\ldots ,m_n, m_{n+1}\) be a method sequence and \(\phi = \langle E, D, P \rangle\), then \(\begin{align*} &\mathsf {dtransfer}_c(m_1,\ldots ,m_n, m_{n+1}, \phi) = \mathsf {dtransfer}_c(m_{n+1}, \mathsf {dtransfer}_c(m_1,\ldots ,m_n, \phi)) \end{align*}\)
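The following Python sketch illustrates the intended behavior of \(\mathsf {dtransfer}\): accumulating two calls symbolically yields the same result as stepping the automaton twice. The E′/D′ accumulation equations are our reconstruction from the surrounding soundness argument, and the two-method contract is an assumed example:

```python
# Sketch relating the declarative transfer to the BFA semantics.  The E'/D'
# accumulation equations are a reconstruction from the soundness argument;
# the two-method contract (m1 enables m2, disabling itself) is an assumed
# example.
def dtransfer(phi, ann):
    """Accumulate one call with annotations ann = <Em, Dm, Pm> into phi."""
    (E, D, P), (Em, Dm, Pm) = phi, ann
    if Pm & D:
        return None  # pre-condition violated: undefined
    return ((E | Em) - Dm, (D | Dm) - Em, P | (Pm - E))

def sem(phi, b):
    """[[<E, D, P>]](b) = (b ∪ E) \\ D, defined when P ⊆ b."""
    E, D, P = phi
    return (b | E) - D if P <= b else None

ann1 = ({"m2"}, {"m1"}, {"m1"})  # m1: enables m2, disables itself
ann2 = (set(), set(), {"m2"})    # m2: no effect, must be enabled

# Symbolic summary of the sequence m1; m2, starting from the empty triple.
summary = dtransfer(dtransfer((set(), set(), set()), ann1), ann2)

b0 = frozenset({"m1"})             # entry state: only m1 enabled
direct = sem(ann2, sem(ann1, b0))  # step the automaton twice
assert sem(summary, b0) == direct == frozenset({"m2"})
```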
Relying on Theorem 3.10, we state the soundness of \(\mathsf {join}\):
(Soundness of ⊔)
Let \(q_b \in Q\) and \(\phi _i = \langle E_i, D_i, P_i \rangle\) for \(i \in \lbrace 1,2\rbrace\). Then, \(\begin{equation*} {[\![ }\phi _1{]\!] }(b) \cap {[\![ }\phi _2{]\!] }(b) = {[\![ }\phi _1 \sqcup \phi _2{]\!] }(b) \end{equation*}\)
By Definition 3.9, Definition 3.2, and set laws we have: \(\begin{align*} {[\![ }\phi _1{]\!] }(b) \cap {[\![ }\phi _2{]\!] }(b) &= ((b \cup E_1) \setminus D_1) \cap ((b \cup E_2) \setminus D_2) \\ &= ((b \cup E_1) \cap (b \cup E_2)) \setminus (D_1 \cup D_2) \\ &= (b \cup (E_1 \cap E_2)) \setminus (D_1 \cup D_2) \\ &= \big(b \cup ((E_1 \cap E_2) \setminus (D_1 \cup D_2))\big) \setminus (D_1 \cup D_2) \\ & = {[\![ }\phi _1 \sqcup \phi _2{]\!] }(b) \end{align*}\)□
With these auxiliary notions in place, we show the correctness of the transfer function (i.e., of the summary computation, specialized for code checking):
(Correctness of \(\mathsf {dtransfer}_c(\text{-},\text{-})\))
Let \(M = (Q, \Sigma ^{\bullet }_c, \delta , q_{E^c}, \mathcal {L}_c)\). Let \(q_b \in Q\) and \(\widetilde{m} = m_1, \ldots , m_n \in {(\Sigma ^{\bullet }_c)}^*\). Then \(\begin{align*} &\mathsf {dtransfer}_{c}(m_1, \ldots , m_n, \langle \emptyset , \emptyset , \emptyset \rangle) = \langle E^{\prime }, D^{\prime }, P^{\prime } \rangle \iff \hat{\delta }(q_b, m_1, \ldots , m_n)=q_{b^{\prime }} \end{align*}\) such that \(b^{\prime } = {[\![ }\langle E^{\prime }, D^{\prime }, P^{\prime } \rangle {]\!] }(b)\).
We show the two directions of the equivalence:
– (\(\Rightarrow\), Soundness): By induction on n, the length of \(\widetilde{m} = m_1,\ldots ,m_n\).
– (\(\Leftarrow\), Completeness): By induction on n, the length of \(\widetilde{m} = m_1,\ldots ,m_n\).
Let us discuss the specialization of the above correctness theorem for code checking. In this case, we know that a method sequence starts with the constructor method (i.e., the sequence is of the form \(m^\uparrow , m_1, \ldots , m_n\)) and \(q_{E^c}\) is the input state. By \(\mathit {well\_formed}(\mathcal {L}_c)\) (Definition 2.4), we know that if \(\delta (q_{E^c}, m^\uparrow)=q_b\) and \(\begin{equation*} \mathsf {dtransfer}_c(m^\uparrow , m_1, \ldots , m_n, \langle \emptyset , \emptyset , \emptyset \rangle) = \sigma \end{equation*}\) then methods not enabled in \(q_b\) are in the disabling set of \(\sigma\). Thus, for any sequence \(m_1,\ldots , m_{k-1}, m_{k}\) such that \(m_k\) is disabled by the constructor and not enabled in the substring \(m_1,\ldots , m_{k-1}\), the condition \(P \cap D_i \not= \emptyset\) correctly checks that a method is disabled. If \(\mathit {well\_formed}(\mathcal {L}_c)\) did not hold, the algorithm would fail to detect an error, as it would put \(m_k\) in P since \(m_k \notin E\).
Aliasing. We discuss how aliasing information can be integrated into our approach. In Example 3.1, members are accessed through syntactically distinct access paths; in general, however, two access paths (say, \(S_1\) and \(S_2\)) may refer to the same object.
Above, at line 2, we would need to update the bindings of \(S_1\) and \(S_2\) by applying a join of their abstract values, since they may alias the same object.
4 ANALYZING “MUST CALL” PROPERTIES
Up to here, we have considered the specification of so-called may call properties—our annotations specify which methods may (or may not) be called at a given program point. In this section, we extend our approach to “must call” properties, which require certain methods to be invoked in every code continuation.
We note that local contracts involving only “must call” method dependencies also suffer from the state explosion problem. To illustrate this, consider a class that contains n pairs of methods such that one method in each pair requires the other to be invoked in a code continuation. Depending on the call history, at any given program point, any subset of these n methods may still need to be called in the continuation. As this information must be encoded in states, the corresponding DFA would have \(2^n\) reachable states.
Now we discuss how we refine our abstraction of states (sets of states) in the presence of require annotations. In the case of enabling/disabling annotations, we showed that states only differ in their sets of outgoing edges. We leveraged this fact to abstract a set of states into a set of outgoing edges. However, with the additional “require” annotations, there could be two distinct states with the same set of outgoing edges where the incoming paths of one state satisfy the “require” annotation, whereas the paths of the other do not. Furthermore, only states whose incoming paths satisfy all “require” conditions can be accepting. Therefore, our abstraction of states must include information about required methods in addition to enabled methods. We remark that this refined abstraction still allows us to represent a set of states as a single state.
4.1 Annotation Language Extension
First, we extend the annotation language of Section 2.1 with a new annotation, “\(\texttt {@Require}(R_i) \ m_i\)”, asserting that after a call to \(m_i\), every method in \(R_i\) must eventually be invoked.
We extend the definition of annotation language from Definition 2.1 as follows:
(Annotation Language, Extended).
Let \(\Sigma ^{}_c= \lbrace m^\uparrow , m_1, \ldots , m_n, m^\downarrow \rbrace\) be a set of method names, where we have
Let \(\tilde{x} = m^\uparrow , x_0, x_1, x_2, \ldots\) be a sequence where each \(x_i \in \Sigma ^{\bullet }_c\). We say that \(\tilde{x}\) is valid (w.r.t. annotations) if the following holds:
Analogously to \(\texttt {@EnableOnly}(E_i) \ m_i\) we can derive \(\texttt {@RequireOnly}(R_i) \ m_i\) as follows: \(\begin{align*} \texttt {@RequireOnly}(R_i) \ m_i &\stackrel{\text{def}}{=}\texttt {@Enable}(R_i) \ \texttt {@Disable}(\Sigma ^{\bullet }_c\setminus R_i) \ \texttt {@Require}(R_i) \ m_i \end{align*}\)
We illustrate the semantics of \(\texttt {@Require}(R_i) \ m_i\) by appealing to our running example from Figure 2. We wish to refine the contract for class
Observe that the “must call” contract induces an extended
Our insight is that every state q should record the accumulated requirements for its outgoing paths, i.e., methods that must be invoked to reach accepting states. For example, the abstraction of state \(q_2\) should contain information that method
4.2 Formalizing the “Must Call” Property
4.2.1 Extended BFA (\(\textsf {BFA}^*\)).
Following the intuition that a state must record requirements for outgoing paths, we extend the state bit-vector representation as follows: \(\begin{equation*} q_{b,f} \end{equation*}\) where \(b, f \in \mathcal {B}^n\) with n being the number of methods in a class. Here, b represents the enabled methods in a state, as before, and f accumulates require annotations: methods that must be elements of every path from \(q_{b,f}\) to some accepting state.
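Concretely, b and f can be packed into machine integers so that the set operations used below compile to single-word bit operations; the following small Python sketch uses an illustrative encoding.

```python
# Packing a BFA* state q_{b,f} into two machine integers: set operations
# on methods 0..n-1 become single-word bit operations. Illustrative only.
def bits(methods):
    m = 0
    for i in methods:
        m |= 1 << i
    return m

b = bits({1, 2})                 # enabled methods
f = bits({2})                    # outstanding "must call" obligations
assert b == 0b110 and f == 0b100
assert f & ~b == 0               # the invariant f ⊆ b as a single AND

# The f-bit update f' = (f \ C_i) ∪ R_i is two word operations:
C, R = bits({2}), bits({1})
assert (f & ~C) | R == bits({1})
```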
Now, we define \(\mathcal {L}^*_c\) as the extension of the mapping \(\mathcal {L}_c\) from Definition 2.3 as follows:
(Mapping \(\mathcal {L}^*_c\))
Given a class c, we define \(\mathcal {L}^*_c\) as a mapping from methods to tuples of subsets of \(\Sigma ^{}_c\): \(\begin{equation*} \mathcal {L}^*_c: \Sigma ^{}_c\rightarrow \big (\mathcal {P}(\Sigma ^{\bullet }_c) \times \mathcal {P}(\Sigma ^{\bullet }_c) \times \mathcal {P}(\Sigma ^{\bullet }_c)\big) \times \big (\mathcal {P}(\Sigma ^{\bullet }_c) \times \mathcal {P}(\Sigma ^{\bullet }_c)\big) \end{equation*}\)
Above, the first triple is as before: given \(m_i \in \Sigma ^{}_c\) we write \(E_i\), \(D_i\), and \(P_i\) to denote the first three elements of \(\mathcal {L}^*_c (m_i)\). There is an additional pair in \(\mathcal {L}^*_c (m_i)\), which collects information needed to encode the “must call” property. We shall write \(R_i\) and \(C_i\) to denote its elements.
As before, transitions between states \(q_{b, f}, q_{b^{\prime }, f^{\prime }}, \ldots\) are determined by \(\mathcal {L}^*_c\). In addition to the semantics of \(E_i\), \(D_i\), and \(P_i\) on transitions, we give the following intuitions for \(R_i\) and \(C_i\). The set of methods \(R_i\) adds requirements for subsequent transitions: given \(m_i \in \Sigma ^{}_c\), we have \(l \in R_i\) if and only if \(m_l\) must be called after \(m_i\). Dually, \(C_i\) records the fulfillment of requirements on a transition. Similarly to \(P_i\), \(C_i\) is a singleton set containing method \(m_i\); again, we define it as a set to ease the definition of the domain of the compositional analysis algorithm in Section 4.3. We formalize these intuitions as an extension of
Well-formed mapping. We identify some natural well-formedness conditions on the mapping \(\mathcal {L}^*_c\). First, we remark that a method cannot require a call to itself, as this would create a self-loop of requirements that cannot be satisfied by any finite sequence. Furthermore, in order to be able to satisfy requirements (i.e., to reach accepting states), we need the condition that require annotations are a subset of enabling annotations. We incorporate these conditions in the extension of predicate \({well\_formed}(\text{-})\) (Definition 2.4):
(\({well\_formed}(\mathcal {L}^*_c)\))
Let c, \(\Sigma ^{}_c\), and \(\mathcal {L}^*_c\) be a class, its method set, and its mapping, respectively. Then, \(\mathsf {well\_formed}(\mathcal {L}^*_c)={\bf true}\) iff the following conditions hold:
We are now ready to extend the definition of
(BFA*)
Given a \(c \in \mathit {Classes}\) with \(n \gt 0\) methods, an extended
– Q is a finite set of states \(q_{b,f}, q_{b^{\prime },f^{\prime }},\ldots\), where \(b, b^{\prime }, \ldots , f, f^{\prime }, \ldots \in \mathcal {B}^n\);
– \(\Sigma ^{\bullet }_c= \lbrace m_1, \ldots , m_n\rbrace\) is the alphabet (method identities);
– \(q_{E^c, R^c}\) is the starting state;
– \(\delta : Q \times \Sigma ^{\bullet }_c\rightarrow Q\) is the transition function, where \(\begin{equation*} \delta (q_{b,f}, m_i) = q_{b^{\prime }, f^{\prime }} \end{equation*}\) with \(b^{\prime } = (b \cup E_i) \setminus D_i\) if \(P_i \subseteq b\), and is undefined otherwise; also, \(f^{\prime }= (f \setminus C_i) \cup R_i\);
– \(\mathcal {L}^*_c\) is an extended
– The set of accepting states F is defined as \(\begin{equation*} F = \lbrace q_{b,0^{n}} : q_{b,0^{n}} \in Q \rbrace \end{equation*}\)
The definition of F captures the intuition that a state is accepting only if it has no outstanding requirements, i.e., its bit-vector f is the zero-vector.
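The transition function can be prototyped directly over sets standing in for bit-vectors; the following Python sketch mirrors the definition of \(\delta\) above, borrowing annotation values from the SparseLU example below purely for illustration.

```python
# Prototype of the BFA* transition function δ, with sets standing in for
# bit-vectors; None models "δ undefined".
def delta(state, annot):
    b, f = state
    (E, D, P), (R, C) = annot
    if not P <= b:                      # the step needs P_i ⊆ b
        return None
    return ((b | E) - D,                # b' = (b ∪ E_i) \ D_i
            (f - C) | R)                # f' = (f \ C_i) ∪ R_i

def accepting(state):                   # accepting iff f = 0^n
    return state[1] == set()

# One step of the SparseLU contract: from q_{1100,0000}, calling method 1
# (annotated <<{3},{1,2,4},{1}>, <{3},{1}>>) yields q_{0010,0010}.
q1 = delta(({1, 2}, set()), (({3}, {1, 2, 4}, {1}), ({3}, {1})))
assert q1 == ({3}, {3}) and not accepting(q1)
```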
We now need to show that a well-formed \({\mathcal {L}^*_c}\) ensures that its induced \(\textsf {BFA}^*\) has reachable accepting states. This boils down to showing that in each state, the required bit set f is contained in the enabled bit set b:
Let \(M = (Q, \Sigma ^{\bullet }_c, \delta ,q_{E^c, R^c},{\mathcal {L}^*_c}, {F})\) be a \(\textsf {BFA}^*\). Then, for \(q_{b,f} \in Q\) we have \(f \subseteq b\).
First, we can see that initial state \(q_{E^c, R^c}\) trivially satisfies \(f \subseteq b\). Furthermore, let \(q_{b,f} \in Q\) such that \(f \subseteq b\). Then, for \(m_i \in \Sigma ^{\bullet }_c\) we have \(\begin{equation*} \delta (q_{b,f}, m_i) = q_{b^{\prime }, f^{\prime }} \end{equation*}\) with \(b^{\prime } = (b \cup E_i) \setminus D_i\) if \(P_i \subseteq b\), and is undefined otherwise. Also, \(f^{\prime }= f \setminus C_i \cup R_i\). Now, the goal \(f^{\prime } \subseteq b^{\prime }\) follows by this and conditions \(E_i \cap D_i = \emptyset\) and \(R_i \subseteq E_i\) ensured by \(\mathsf {well\_formed}(\mathcal {L}^*_c)\) (Definition 4.3).□
We illustrate the states and transitions of the \(\textsf {BFA}^*\) given in Figure 4 in the following example:
(SparseLU must-contract).
The mapping \(\mathcal {L}^*_{\text{SparseLU}}\) that corresponds to the contract given in Listing 6 is as follows: \(\begin{align*} \mathcal {L}^*_{\text{SparseLU}} &= \big \lbrace 0 \mapsto \langle \langle \lbrace 1,2\rbrace ,\lbrace 3,4\rbrace , \emptyset \rangle , \langle \emptyset , \emptyset \rangle \rangle ,\ 1 \mapsto \langle \langle \lbrace 3\rbrace ,\lbrace 1,2,4\rbrace ,\lbrace 1\rbrace \rangle , \langle \lbrace 3\rbrace ,\lbrace 1\rbrace \rangle \rangle ,\ \\ & \quad 2 \mapsto \langle \langle \lbrace 4\rbrace ,\lbrace 1,2,3\rbrace ,\lbrace 2\rbrace \rangle , \langle \lbrace 4\rbrace , \lbrace 2\rbrace \rangle \rangle ,\ 3 \mapsto \langle \langle \lbrace 4\rbrace ,\lbrace 1,2,3\rbrace ,\lbrace 3\rbrace \rangle , \langle \lbrace 4 \rbrace , \lbrace 3\rbrace \rangle \rangle ,\ \\ & \quad 4 \mapsto \langle \langle \lbrace 1,2,3\rbrace ,\emptyset ,\lbrace 4\rbrace \rangle , \langle \emptyset , \lbrace 4 \rbrace \rangle \rangle \big \rbrace \end{align*}\) The starting state is \(q_{1100, 0000}\). The set of states is \(\begin{equation*} Q=\lbrace q_{1100, 0000}, q_{0010, 0010}, q_{0001, 0001}, q_{1111, 0000}\rbrace \end{equation*}\) Differently from the contract given in Example 2.6, in which all states were accepting, here we have an explicit set of accepting states: \(\begin{equation*} F= \lbrace q_{1100, 0000}, q_{1111, 0000} \rbrace . \end{equation*}\) The corresponding transition function \(\delta (\text{-})\) is as follows:
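As a mechanical cross-check, the state space and accepting set above can be recomputed by closing the starting state under \(\delta\); in the Python sketch below, sets stand in for bit-vectors, and constructor transitions are omitted since \(\Sigma ^{\bullet }_c\) excludes \(m^\uparrow\).

```python
# Cross-check of Q and F: close the starting state q_{1100,0000} under δ
# using the mapping L*_SparseLU above (methods 1-4 only).
L = {
    1: (({3}, {1, 2, 4}, {1}), ({3}, {1})),
    2: (({4}, {1, 2, 3}, {2}), ({4}, {2})),
    3: (({4}, {1, 2, 3}, {3}), ({4}, {3})),
    4: (({1, 2, 3}, set(), {4}), (set(), {4})),
}

def delta(state, m):
    b, f = state
    (E, D, P), (R, C) = L[m]
    if not P <= b:
        return None                       # δ undefined: P_i not enabled
    return (frozenset((b | E) - D), frozenset((f - C) | R))

start = (frozenset({1, 2}), frozenset())  # q_{1100,0000}
Q, todo = {start}, [start]
while todo:
    q = todo.pop()
    for m in L:
        q2 = delta(q, m)
        if q2 is not None and q2 not in Q:
            Q.add(q2)
            todo.append(q2)

F = {q for q in Q if not q[1]}            # accepting: f is the zero-vector
assert len(Q) == 4 and len(F) == 2        # matches the sets Q and F above
```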
Notice that the transformations of b-bits of states are as in Example 2.6. Additionally, transitions operate on f-bits to determine the accepting states. For example, the transition
\(\begin{equation*} \delta (q_{1111, 0000}, \mathit {compute}) = q_{0001, 0001} \end{equation*}\) adds the requirement to call
\(\textsf {BFA}^*\) subtyping. We now discuss the extension of the subtyping relation given in Section 2.2. To check that \(c_1\) is a superclass of \(c_2\), that is, that \(M_2\) subsumes \(M_1\) (\(M_2 \succeq M_1\)), in addition to checking the respective E, D, and P sets of \(\mathcal {L}^*_{c_1}\) and \(\mathcal {L}^*_{c_2}\) for each method, as given in Section 2.2, we need the following checks: \(R_2 \subseteq R_1\) and \(C_1 \subseteq C_2\). This follows the intuition that a superclass must be at least as permissive as its subclasses: the subclass methods can only have fewer requirements.
4.3 An Extended Algorithm
We now present the extension of the compositional analysis algorithm to account for \(\textsf {BFAs}^*\). We illustrate the key ideas of the required extensions with an example.
In Listing 7, we give class
Analogously to how our original algorithm accumulates enabling annotations by traversing a program’s CFG, in the extension we accumulate require annotations. We extend the abstract domain with a pair \(\langle R, C \rangle\), where R and C are sets of methods in which we appropriately accumulate require annotations. Intuitively, we use R to record call requirements for a code continuation and C to track the methods that have been called up to the current program point.
First, we compute a summary for
At procedure entry, we initialize the abstract state as an empty pair (\(s_1\)). Next, on the invocation of
Next, we compute a summary for
In the first if-branch, on line 4, we copy the corresponding annotations from \(\mathcal {L}^*_{SparseLU} (aP)\) to obtain \(s_2\). Here, we remark that
Now, on line 12, we should join the resulting sets of the two branches, that is, \(s_3\) and \(s_4\). For this we take the union of the require sets and the intersection of the called sets: this follows the intuition that a method must be called in a continuation if it is required within any branch; dually, a method call required prior to branching is satisfied only if it is invoked in both branches.
Once summaries for
Here, on line 4, we simply copy the summary computed for method
We show how to extend our compositional analysis algorithm from Section 3 to incorporate analysis of “must call” properties.
Abstract Domain. First, we recall that our abstract domain \(\mathbb {D}\) is a mapping from access paths to elements of mapping \(\mathcal {L}_c\). Given the extended mapping \(\mathcal {L}^*_c\), this is reflected on the abstract domain as follows: \(\begin{align*} \mathbb {D}: \mathcal {AP}\rightarrow \bigcup _{c \in \mathit {Classes}} Cod(\mathcal {L}^*_c) \end{align*}\)
The elements of the co-domain have now the following form: \(\begin{equation*} \big \langle \langle E, D, P \rangle , \langle R, C \rangle \big \rangle \end{equation*}\) where \(R, C \subseteq \Sigma ^{\bullet }_c\). Intuitively, R is a set of methods that must be called in a code continuation, and C is a set of methods that have been called up to the current program point.
Algorithm. We modify the algorithm to work with an abstract domain extended with the pair \(\langle R, C \rangle\). To this end, we extend (i) the join operator, (ii) the guard predicate (Algorithm 2), and (iii) the transfer function (Algorithm 3). Next, we discuss these extensions.
Join operator. The modified join operator has the following signature: \(\begin{equation*} \bigsqcup : Cod(\mathcal {L}^*_c) \times Cod(\mathcal {L}^*_c) \rightarrow Cod(\mathcal {L}^*_c) \end{equation*}\) Its definition is conservatively extended as follows: \(\begin{align*} &\big \langle \langle E_1, D_1, P_1 \rangle , \langle R_1, C_1 \rangle \big \rangle \sqcup \big \langle \langle E_2, D_2, P_2 \rangle , \langle R_2, C_2 \rangle \big \rangle \\ &\qquad =\big \langle \langle E_1 \ \cap \ E_2 \setminus (\ D_1 \cup D_2),\ D_1 \cup D_2,\ P_1 \cup \ P_2 \rangle ,\ \langle R_1 \cup R_2,\ C_1 \cap C_2 \rangle \big \rangle \end{align*}\)
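A Python sketch of this join follows, with sets standing in for bit-vectors; the two branch summaries are illustrative.

```python
# The extended join on summaries: the may-call triple joins as before;
# require sets union, called sets intersect.
def join(s1, s2):
    (E1, D1, P1), (R1, C1) = s1
    (E2, D2, P2), (R2, C2) = s2
    return (((E1 & E2) - (D1 | D2), D1 | D2, P1 | P2),
            (R1 | R2, C1 & C2))

# Branch 1 still requires method 3; branch 2 has already called it:
left  = ((set(), set(), set()), ({3}, set()))
right = ((set(), set(), set()), (set(), {3}))
# After the join, 3 is still required: required-in-any-branch wins, and a
# call counts as performed only if it happened in both branches.
assert join(left, right)[1] == ({3}, set())
```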
Guard predicate. In Algorithm 2, in the body of case Call-node[\(m_j(p_0:b_0,\ldots ,p_n:b_n)\)] we add the following check after line 4: \(\begin{align*} {\bf if} \ m_j == destructor \ {\bf and} \ \sigma _w[p_0].R \not= \emptyset \ {\bf then return} {\bf {\it False}}{\bf ;} \end{align*}\) In the case where \(m_j\) is the destructor, we additionally check whether its requirements are empty; if they are not, we raise a warning.
Transfer function. In Algorithm 3, we add the following lines after line 16 to transfer the new elements \(\langle R, C \rangle\): \(\begin{align*} R^{\prime } &= \big (\sigma (ap).R \cup \sigma _w(ap).R \big) \setminus \sigma _w(ap).C \\ C^{\prime } &= (\sigma (ap).C \cup \sigma _w(ap).C) \setminus \sigma _w(ap).R \end{align*}\) Then, the output abstract state \(\sigma ^{\prime }\) is constructed as follows: \(\begin{align*} \sigma ^{\prime }(ap^{\prime }) = \big \langle \langle E^{\prime }, D^{\prime }, P^{\prime } \rangle , \langle R^{\prime }, C^{\prime } \rangle \big \rangle \end{align*}\) where \(E^{\prime }, D^{\prime }\), and \(P^{\prime }\) are constructed as in Algorithm 3.
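The two added transfer equations can be sketched as follows; the concrete values are illustrative.

```python
# The added transfer of the <R, C> pair: applying a callee summary
# sigma_w to the caller's pair sigma at an access path.
def transfer_rc(caller_rc, callee_rc):
    R, C = caller_rc                # pending/called sets before the call
    Rw, Cw = callee_rc              # callee summary <R, C> for this path
    return ((R | Rw) - Cw,          # callee's calls discharge requirements
            (C | Cw) - Rw)          # fresh requirements mask earlier calls

# The caller still requires method 3; the callee calls 3 but in turn
# requires method 4:
assert transfer_rc(({3}, set()), ({4}, {3})) == ({4}, {3})
```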
4.4 Extended Proofs of Correctness
Here, we present the correctness guarantees for \(\textsf {BFAs}^*\). We describe the needed extensions to the definitions, theorems, and proofs we discussed in the case of
Context-independence. Here, we characterize the context-independence property for require annotations. Recall that context-independence states that the effects of annotations on subsequent calls do not depend on previous calls. As in the case of enabling/disabling annotations, this property follows directly from the idempotence of the operation on f-bits in the extended definition of \(\delta (\text{-})\), that is, \(f^{\prime } = (f \setminus C_i) \cup R_i\). The effect of this operation is independent of the bits in f, which are accumulated by preceding calls (i.e., they represent the context).
Now, we formalize the extension of the statement and proof. First, as not all states in a \(\textsf {BFA}^*\) are accepting, the definition of \(L(M)\) that denotes strings accepted by M is now as follows: \(\begin{equation*} L(M) = \lbrace \widetilde{m} : \hat{\delta }(q_{E^c, R^c}, \widetilde{m}) = q^{\prime } \wedge q^{\prime } \in F \rbrace \end{equation*}\)
Consequently, we need to reformulate the statements of the first two items to preserve their meanings, and add an item concerning require annotations. Thus, we extend Theorem 2.1 as follows:
Let \(M = (Q, {\Sigma ^{\bullet }_c}, \delta , {q_{E^c, R^c}},\mathcal {L}^*_c, F)\) be a \(\textsf {BFA}^*\). Then, for \(m_n \in \Sigma ^{\bullet }_c\) we have
This property follows directly by the definition of transition function \(\delta (\text{-})\) in \(\textsf {BFAs}^*\) (Definition 4.4): that is, by the idempotence of b and f-bits transformation. More precisely, the effects of transformations \(b^{\prime } = (b \cup E_i) \setminus D_i\) (resp. \(f^{\prime } = (f \setminus C_i) \cup R_i\)) do not depend on input bits b (resp. f).
The first two items are shown similarly as in the proof of Theorem 2.1. We remark that additional sequences are only introduced in order to properly use the definition of \(L(M)\) for \(\textsf {BFAs}^*\).
Now we show item (3). We proceed directly by the extended definition of the transition function \(\delta (\text{-})\), that is, by \(f^{\prime } = (f \setminus C_i) \cup R_i\). First, let \(q_{b,f}\) be defined as follows: \(\begin{equation*} \hat{\delta }(q_{E^c, R^c}, \widetilde{p}_1 \cdot m_n)=q_{b,f} \end{equation*}\)
Further, by \(\widetilde{p}_1 \cdot m_n \cdot \widetilde{p}_2 \in L(M)\) we have \(\widetilde{p}_2 \supseteq f\), as \(q_{b^{\prime }, 0^n} \in F\) by the definition of F. By this we have \(\widetilde{p}_2 \supseteq R_n\), as \(R_n \subseteq f\). Finally, as \(\mathcal {L}^*_c (m).R = \emptyset\) for \(m \in \widetilde{m}\) we have \(\begin{equation*} \hat{\delta }(q_{E^c, R^c}, \widetilde{m} \cdot m_n)=q_{b^{\prime },R_n} \end{equation*}\) Using this and \(\widetilde{p}_2 \supseteq R_n\) we have \(\widetilde{m} \cdot m_n \cdot \widetilde{p}_2 \in L(M)\).□
(\({[\![ }\text{-}{]\!] }(\text{-})\) Extended)
Let \(\big \langle \langle E, D, P \rangle , \langle R, C\rangle \big \rangle \in Cod(\mathcal {L}^*_c)\), \(b,f \in \mathcal {B}^n\). We define \(\begin{align*} {[\![ }\big \langle \langle E, D, P \rangle , \langle R, C\rangle \big \rangle {]\!] } (b, f) = {b^{\prime }, f^{\prime }} \end{align*}\) where \(b^{\prime }=(b \cup E) \setminus D\) if \(P \subseteq b\), and is undefined otherwise; and \(f^{\prime } = (f \setminus C) \cup R\).
Now, to abstract a set of states of a \(\textsf {BFA}^*\) we also need to handle the f-bits of states. Complementary to \(b_*\), we define \(f^*\) as the union of f-bits. We extend Theorem 3.10 by incorporating f-bits in states and also item (3), which shows that the union of f-bits is the right way to abstract a set of states S into a single state: intuitively, a set of states S can be abstracted into an accepting state only if all states in S are accepting.
(\(\textsf {BFA}^*\)\(\cap\)-Property)
Suppose \(M = (Q, {\Sigma ^{\bullet }_c}, \delta , {q_{E^c, R^c}},\mathcal {L}^*_c, F)\), \(S\subseteq Q\), \(b_* = \bigcap _{q_{b,f} \in S} b\), and \(f^*=\bigcup _{q_{b,f} \in S} f\). Then we have:
The first item is only concerned with b-bits, thus it is shown as in Theorem 3.10.
Now, we discuss the proof of item (2). Here, we can prove the parts for b-bits and f-bits separately. The former proof is the same as in the corresponding case of Theorem 3.10. The proof concerning f-bits follows the same lines as for b-bits (by induction on the cardinality of \({S}\) and set laws): it again follows directly from the idempotence of the transformation of f-bits (i.e., \(f^{\prime } = (f \setminus C_i) \cup R_i\)); we remark that the difference here is that we use the union (in the definition of \(f^*\)) instead of the intersection.
Finally, the proof of item (3) follows directly from the definition of accepting states, that is \(F = \lbrace q_{b,0^{n}} : q_{b,0^{n}} \in Q \rbrace\). Thus, we know \(S\subseteq F\) if and only if for all \(q_{b,f} \in S\) we have \(f =0^{n}\). The right-hand side is equivalent to \(f^* = 0^{n}\).□
Soundness of join operator. We extend Theorem 3.2 with f-bits in the state representation, \(\langle R_i, C_i \rangle\) in \(\phi _i\), and using the extended \({[\![ }\text{-}{]\!] }(\text{-})\) from Definition 4.8. We note that this theorem again relies on Theorem 4.2: we abstract the set of reachable states by the union of f-bits.
For convenience, we will use “projections” of \({[\![ }\text{-}{]\!] }(\text{-})\) to b- and f-bits. Let \(\phi = \langle \langle E, D, P \rangle , \langle R, C\rangle \rangle\); then we will use \({[\![ }\phi {]\!] }_b(b)=b^{\prime }\) and \({[\![ }\phi {]\!] }_f(f)=f^{\prime }\), where \(b^{\prime }\) and \(f^{\prime }\) are defined as in Definition 4.8.
(Soundness of Extended \(\sqcup\))
Let \(q_{b,f} \in Q\) and \(\phi _i = \langle \langle E_i, D_i, P_i \rangle , \langle R_i, C_i \rangle \rangle\) for \(i \in \lbrace 1,2\rbrace\). Then, \({[\![ }\phi _1{]\!] }_b(b) \cap {[\![ }\phi _2{]\!] }_b(b) = {[\![ }\phi _1 \sqcup \phi _2{]\!] }_b(b)\) and \({[\![ }\phi _1{]\!] }_f(f) \cup {[\![ }\phi _2{]\!] }_f(f) = {[\![ }\phi _1 \sqcup \phi _2{]\!] }_f(f)\).
The proof concerning b-bits is the same as in Theorem 3.2. Now, we show the part concerning f-bits, that is \(\begin{equation*} {[\![ }\phi _1{]\!] }_f(f) \cup {[\![ }\phi _2{]\!] }_f(f) = {[\![ }\phi _1 {\sqcup } \phi _2{]\!] }_f(f) \end{equation*}\)
The proof follows by the extended definition of \({[\![ }\text{-}{]\!] }(\text{-})\) from Definition 4.8 and set laws as follows: \(\begin{align*} {[\![ }\phi _1{]\!] }_f(f) \cup {[\![ }\phi _2{]\!] }_f(f) &= ((f \setminus C_1) \cup R_1) \cup ((f \setminus C_2) \cup R_2) \\ &= (f \setminus (C_1 \cap C_2)) \cup (R_1 \cup R_2) = {[\![ }\phi _1 \sqcup \phi _2{]\!] }_f(f) \end{align*}\)□
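The set-law step used in this derivation can also be checked exhaustively over a small universe; the identity holds unconditionally.

```python
from itertools import combinations, product

# Exhaustive check, over all subsets of a small universe, of
#   ((f\C1) ∪ R1) ∪ ((f\C2) ∪ R2)  =  (f \ (C1 ∩ C2)) ∪ (R1 ∪ R2)
U = [0, 1, 2]
subsets = [set(s) for r in range(len(U) + 1) for s in combinations(U, r)]
for f, C1, R1, C2, R2 in product(subsets, repeat=5):
    lhs = ((f - C1) | R1) | ((f - C2) | R2)
    rhs = (f - (C1 & C2)) | (R1 | R2)
    assert lhs == rhs
```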
Correctness of \(\mathsf {dtransfer}_{c}(\text{-},\text{-})\). We extend \(\mathsf {dtransfer}_{c}(\text{-},\text{-})\) from Definition 3.10 to account for the extended transfer function as follows:
(\(\mathsf {dtransfer}_c(\text{-},\text{-})\))
Let \(c \in \mathit {Classes}\) be a class, \(\Sigma ^{\bullet }_c\) the set of methods of c, and \(\mathcal {L}^*_c\) its extended mapping. Furthermore, let \(m \in \Sigma ^{\bullet }_c\) be a method, \(\langle \langle E^m, D^m, P^m \rangle , \langle R^m,C^m \rangle \rangle =\mathcal {L}^*_c (m)\), and \(\langle \langle E, D, P \rangle , \langle R, C \rangle \rangle \in Cod(\mathcal {L}^*_c)\). Then, \(\begin{align*} \mathsf {dtransfer}_{c}(m, \big \langle \langle E, D, P \rangle ,\ \langle R, C \rangle \big \rangle) = \big \langle \langle E^{\prime }, D^{\prime }, P^{\prime } \rangle ,\ \langle R^{\prime }, C^{\prime } \rangle \big \rangle \end{align*}\) where \(E^{\prime } = (E \ \cup \ E^{m}) \setminus D^{m}\), \(D^{\prime } = (D \ \cup \ D^{m}) \setminus E^{m}\), and \(P^{\prime } = P \ \cup \ (P^{m} \ \setminus \ E)\), if \(P^m \cap D = \emptyset\), and is undefined otherwise. Also, \(R^{\prime }= (R \cup R^m) \setminus C^m\) and \(C^{\prime }= (C \cup C^m) \setminus R^m\).
Let \(m_1,\ldots ,m_n, m_{n+1}\) be a method sequence and \(\phi = \big \langle \langle E, D, P \rangle , \langle R, C \rangle \big \rangle\), then \(\begin{align*} &\mathsf {dtransfer}_c(m_1,\ldots ,m_n, m_{n+1}, \phi) = \mathsf {dtransfer}_c(m_{n+1}, \mathsf {dtransfer}_c(m_1,\ldots ,m_n, \phi)) \end{align*}\)
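This fold property can be exercised on the \(\langle R, C\rangle\) component directly; the method names below are illustrative.

```python
# The <R, C> component of dtransfer and its fold property: transferring a
# whole sequence equals transferring the prefix, then the last method.
def step(rc, m_rc):
    (R, C), (Rm, Cm) = rc, m_rc
    return ((R | Rm) - Cm, (C | Cm) - Rm)

def dtransfer(seq, spec):
    rc = (set(), set())
    for m in seq:
        rc = step(rc, spec[m])
    return rc

spec = {                         # <R_m, C_m> per method, with C_m = {m}
    "compute": ({"solve"}, {"compute"}),
    "solve":   (set(), {"solve"}),
}
seq = ["compute", "solve"]
assert dtransfer(seq, spec) == step(dtransfer(seq[:-1], spec), spec["solve"])
assert dtransfer(seq, spec)[0] == set()      # no pending requirements left
```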
We now extend Theorem 3.3 to show the correctness of the extended \(\mathsf {dtransfer}_{c}(\text{-},\text{-})\) as follows:
(Correctness of \(\mathsf {dtransfer}_{c}(\text{-},\text{-})\))
Let \(M = (Q, {\Sigma ^{\bullet }_c}, \delta , {q_{E^c, R^c}},\mathcal {L}^*_c, F)\). Let \(q_{b,f} \in Q\) and \(m_1, \ldots , m_n \in (\Sigma ^{\bullet }_c)^*\). Then \(\begin{align*} &\mathsf {dtransfer}_{c}(m_1, \ldots , m_n, \big \langle \langle \emptyset , \emptyset , \emptyset \rangle ,\ \langle \emptyset , \emptyset \rangle \big \rangle) = \phi ^{\prime } \iff \hat{\delta }(q_{b,f}, m_1, \ldots , m_n)=q_{b^{\prime },f^{\prime }} \end{align*}\) such that \(b^{\prime },f^{\prime } = {[\![ }\phi ^{\prime }{]\!] }(b, f)\) where \(\phi ^{\prime }=\big \langle \langle E^{\prime }, D^{\prime }, P^{\prime } \rangle ,\ \langle R^{\prime }, C^{\prime } \rangle \big \rangle\).
The proof concerning b-bits is as in Theorem 3.3. Now, we will prove the part concerning transformation of f-bits.
We show only the Soundness (\(\Rightarrow\)) direction as the other direction is shown similarly. The proof is by induction. We strengthen the induction hypothesis with the following invariant: \(R^{\prime } \cap C^{\prime } = \emptyset\).
– Case \(n=1\). We have \(\widetilde{m} = m_1\). Let \(R^m = \mathcal {L}^*_c (m_1).R\) and \(C^m = \mathcal {L}^*_c (m_1).C\). First, by the definition of \(\mathsf {dtransfer}_{c}(\text{-})\) we have \(R^{\prime } = (\emptyset \cup R^m) \setminus C^m = R^m\) and \(C^{\prime } = (\emptyset \cup C^m) \setminus R^m = C^m\). Thus, we have \(f^{\prime } = {[\![ }\phi ^{\prime }{]\!] }_f(f)=(f \setminus C^m) \cup R^m\). Then, directly by the definition of \(\delta (\text{-})\) we have \(\delta (q_{b,f}, m_1) = q_{b^{\prime },f^{\prime }}\).
– Case \(n \gt 1\). Let \(\widetilde{m}=m_1,\ldots ,m_n, m_{n+1}\). By the IH we know (9) \(\begin{align} &\mathsf {dtransfer}_{c}(m_1,\ldots ,m_n, \big \langle \langle \emptyset , \emptyset , \emptyset \rangle ,\ \langle \emptyset , \emptyset \rangle \big \rangle) = \phi ^{\prime } \Rightarrow \hat{\delta }(q_{b,f}, m_1,\ldots ,m_n)=q_{b^{\prime },f^{\prime }} \end{align}\) such that \(b^{\prime },f^{\prime } = {[\![ }\phi ^{\prime }{]\!] }(b, f)\) and \(f^{\prime } \subseteq b^{\prime }\), where \(\phi ^{\prime }=\big \langle \langle E^{\prime }, D^{\prime }, P^{\prime } \rangle ,\ \langle R^{\prime }, C^{\prime } \rangle \big \rangle\). As we focus only on the \(f^{\prime }\) bits, we can infer \(f^{\prime } = (f \setminus C^{\prime }) \cup R^{\prime }\). Now, we assume (10) \(\begin{align} &\mathsf {dtransfer}_{c}(m_1,\ldots ,m_n, m_{n+1}, \big \langle \langle \emptyset , \emptyset , \emptyset \rangle ,\ \langle \emptyset , \emptyset \rangle \big \rangle) = \phi ^{\prime \prime } \end{align}\) such that \(\phi ^{\prime \prime }=\big \langle \langle E^{\prime \prime }, D^{\prime \prime }, P^{\prime \prime } \rangle ,\ \langle R^{\prime \prime }, C^{\prime \prime } \rangle \big \rangle\). We should show (11) \(\begin{align} \hat{\delta }(q_{b,f}, m_1,\ldots ,m_n, m_{n+1})=q_{b^{\prime \prime },f^{\prime \prime }} \end{align}\) such that \(f^{\prime \prime } = (f \setminus C^{\prime \prime }) \cup R^{\prime \prime }\) and \(f^{\prime \prime } \subseteq b^{\prime \prime }\). Let \(\langle R^m, C^m \rangle\) be the second component of \(\mathcal {L}^*_c (m_{n+1})\). We know \(C^m=\lbrace m_{n+1}\rbrace\).
By Definition 3.10 we have \(\begin{align*} \mathsf {dtransfer}_{c}(m_1,\ldots ,m_n, m_{n+1}, \big \langle \langle \emptyset , \emptyset , \emptyset \rangle ,\ \langle \emptyset , \emptyset \rangle \big \rangle) = \mathsf {dtransfer}_{c}(m_{n+1}, \phi ^{\prime }) \end{align*}\) Furthermore, by Equations (9), (10) and Definition 3.10 we have: \(\begin{align*} R^{\prime \prime } &= (R^{\prime } \cup R^m) \setminus C^m \\ C^{\prime \prime } &= (C^{\prime } \cup C^m) \setminus R^m \end{align*}\) Here, we remark that the invariant \(R^{\prime \prime } \cap C^{\prime \prime } = \emptyset\) holds as \(R^m \cap C^m = \emptyset\) by \({well\_formed}(\mathcal {L}^*_c)\) (Definition 4.3). Now, by substitution and De Morgan’s laws we have: \(\begin{align*} f^{\prime \prime } &= (f \setminus C^{\prime \prime }) \cup R^{\prime \prime } \\ &= (f \setminus (C^{\prime } \cup C^m)) \cup ((R^{\prime } \cup R^m) \setminus C^m) \\ &= (((f \setminus C^{\prime }) \cup R^{\prime }) \setminus C^m) \cup R^m \\ &= (f^{\prime } \setminus C^m) \cup R^m \end{align*}\) where the third equality holds by the invariants \(R^{\prime } \cap C^{\prime } = \emptyset\) and \(R^m \cap C^m = \emptyset\). Furthermore, by the definition of \(\delta (\text{-})\) (Definition 4.4) we have \(\delta (q_{b^{\prime }, f^{\prime }}, m_{n+1})=q_{b^{\prime \prime }, f^{\prime \prime }}\). This concludes this case.
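The step justified by the invariants in the derivation above can be checked exhaustively over a small universe.

```python
from itertools import combinations, product

# Exhaustive check: for all subsets of a small universe, under the
# invariants R' ∩ C' = ∅ and R^m ∩ C^m = ∅,
#   (f \ (C'∪C^m)) ∪ ((R'∪R^m) \ C^m)  =  (((f\C') ∪ R') \ C^m) ∪ R^m
U = [0, 1]
subs = [set(s) for r in range(len(U) + 1) for s in combinations(U, r)]
for f, C1, R1, Cm, Rm in product(subs, repeat=5):
    if (R1 & C1) or (Rm & Cm):         # keep only invariant-respecting tuples
        continue
    lhs = (f - (C1 | Cm)) | ((R1 | Rm) - Cm)
    rhs = (((f - C1) | R1) - Cm) | Rm
    assert lhs == rhs
```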
Summing up, we presented \(\textsf {BFAs}^*\), the extension to
5 EVALUATION
To evaluate our technique, we implement two analyses in Infer, namely \(\textsf {BFA}^*\) and DFA, and use the default Infer typestate analysis Topl as a baseline for comparison. In more detail:
(1) \(\textsf {BFA}^*\): The Infer implementation of the analysis technique introduced in this article.
(2) DFA: A lightweight, DFA-based typestate analyzer implemented in Infer. We translate \(\textsf {BFA}^*\) annotations to a minimal DFA and perform the analysis.
(3) Topl: An industrial typestate analyzer, implemented in Infer [1].
We remark that Topl is designed for high precision rather than for low-latency environments. It uses Pulse, an Infer memory safety analysis, which provides it with alias information. We include it in our evaluation as a baseline state-of-the-art typestate analysis, i.e., an off-the-shelf industrial-strength tool that we could hypothetically use. We note that our benchmarks do not require aliasing, so in theory Pulse is not needed.
Goals and Considered Contracts. Our evaluation aims to validate the following two claims:
We analyzed a benchmark of 22 contracts that specify common patterns of locally dependent contract annotations for a class. Of these, 18 are may contracts and 4 are must contracts. We identified common patterns of locally dependent contracts, such as the setter/getter example given in Figure 1, and generated variants of them (e.g., by varying annotations and number of methods) such that we have contract samples that are almost linearly distributed in the number of (DFA) states. This can be seen in Figure 7, which outlines key features of these contracts (such as number of methods and number of states). The annotations for \(\textsf {BFA}^*\) are varied; from them, we generated minimal DFA representations in the DFA annotation format and Topl annotation format. This allows us to clearly show how the performance of the analyzers under consideration is impacted by the increase of the state space.
Moreover, we self-generated 122 client programs that follow the compositional patterns we described in Example 3.1 (such patterns are also considered in, e.g., [14]). The pattern defines a composed class, as the class
5.1 Experimental Setup
We used an Intel(R) Core(TM) i9-9880H CPU at 2.3 GHz with 16 GB of physical RAM running macOS 11.6 on bare metal. The experiments were conducted in isolation without virtualization so that runtime results are robust. All experiments shown here were run single-threaded on Infer 1.1.0 with OCaml 4.11.1.
Our use case is to integrate static analyses into interactive IDEs, e.g., Microsoft Visual Studio Code [24], so that code can be analyzed at coding time. This requires low-latency execution of the static analysis. Our SLA is based on the RAIL user-centric performance model [2].
5.2 Usability Evaluation
Figure 7 outlines the key features of the 22 contracts we considered, called CR-1 – CR-22. Among these, CR-12, CR-14, CR-17, and CR-22 are must contracts. For each contract, we specify the number of methods, the number of DFA states the contract corresponds to, and the number of atomic annotation terms in \(\textsf {BFA}^*\), DFA, and Topl. An atomic annotation term is a standalone annotation in the given annotation language. In Figures 5 and 6, we detail CR-4 as an example.
Figure 7 shows that as contract sizes increase in the number of states, the annotation overhead for DFA and Topl increases significantly. On the other hand, the annotation overhead for \(\textsf {BFA}^*\) remains largely constant with respect to the number of states and grows roughly proportionally with the number of methods in a contract. Observe that for contracts on classes with four or more methods, a manual specification using DFA or Topl annotations becomes impractical. Overall, we validate Claim-I by the fact that \(\textsf {BFA}^*\) requires less annotation overhead on all of the contracts, making contract specification more practical.
5.3 Performance Evaluation
Recall that we distinguish between base and composed classes: the former have a user-entered contract, and the latter have contracts that are implicitly inferred based on those of their members (that could be either base or composed classes themselves). The total number of base classes in a composed class and contract size (i.e., the number of states in a minimal DFA that is a translation of a \(\textsf {BFA}^*\) contract) play the most significant roles in execution-time. In Figure 8, we present a comparison of analyzer execution-times (y-axis) with contract size (x-axis), where each line in the graph represents a different number of base classes composed in a given class (given in legends).
Comparing \(\textsf {BFA}^*\) and DFA analyses. The comparison is presented in Figure 8(a) and 8(b):
– Figure 8(a) compares various class compositions (with contracts) specified in the legend, for client programs of 500-1K LoC. The DFA implementation sharply increases in execution-time as the number of states increases. The \(\textsf {BFA}^*\) implementation remains rather constant, always under the SLA of 1 second. Overall, \(\textsf {BFA}^*\) produces a geometric mean speedup over DFA of 5.7\(\times\).
– Figure 8(b) compares various class compositions for client programs of 15K LoC. Both implementations fail to meet the SLA; however, the \(\textsf {BFA}^*\) implementation comes close and exhibits constant behavior regardless of the number of states in the contract. The DFA implementation is rather erratic, tending to sharply increase in execution-time as the number of states increases. Overall, \(\textsf {BFA}^*\) produces a geometric mean speedup over DFA of 1.5\(\times\). We note that must contracts do not exhibit noticeable performance differences from may contracts.
Comparing \(\textsf {BFA}^*\)-based analysis vs the Topl typestate implementation (execution time). Here again, client programs do not require aliasing. The comparison is presented in Figures 8(c) and 8(d):
– Figure 8(c) compares various class compositions for client programs of 500–1K LoC. The Topl implementation increases sharply in execution time as the number of states increases, quickly missing the SLA. In contrast, the \(\textsf {BFA}^*\) implementation remains constant, always under the SLA. Overall, \(\textsf {BFA}^*\) produces a geometric mean speedup over Topl of 6.59\(\times\).
– Figure 8(d) compares various class compositions for client programs of 15K LoC. Both implementations fail to meet the SLA. The Topl implementation remains constant until \(\sim\)30 states and then rapidly increases in execution time. Overall, \(\textsf {BFA}^*\) produces a geometric mean speedup over Topl of 301.65\(\times\).
Overall, we validate Claim-II by showing that our technique removes the number of states as a factor of performance degradation, at the expense of limited but sufficient contract expressivity. Even on client programs of 15K LoC, we remain close to our SLA, with the potential to achieve it through further optimizations. Again, we note that must contracts show no noticeable performance difference from may contracts.
6 RELATED WORK
We focus on comparisons with restricted forms of typestate contracts, and refer to the typestate literature [7, 9, 10, 17, 23] for a more general treatment. The work [15] proposes a restricted form of typestates tailored to object construction via the builder pattern. The approach is restricted in that it only accumulates called methods in an abstract (monotonic) state, and it does not require aliasing information for the contracts it supports. We share with this approach the idea of specifying typestate without explicitly mentioning states. On the other hand, their technique is less expressive than our annotations: they cannot express various properties that we can (e.g., the property “cannot call a method”). Similarly, [12] defines heap-monotonic typestates, where monotonicity can be seen as a restriction; this analysis can also be performed without an alias analysis.
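The expressivity gap can be illustrated with a small sketch (hypothetical method and contract names; this is our own illustration, not the notation of [15] or of our annotation language). An accumulation-only abstract state, as in builder-style verification, can check “must have called X before Y,” but because it only grows, it cannot disable a method once enabled; a per-method enabled/disabled bit-vector can.

```python
class BuilderState:
    """Accumulation-only typestate: the abstract state grows monotonically,
    which suffices for 'must have called X before Y' checks."""
    def __init__(self):
        self.called = set()

    def call(self, m, requires=()):
        missing = set(requires) - self.called
        assert not missing, f"{m} requires prior calls: {missing}"
        self.called.add(m)

class BitvectorState:
    """Each method has an enabled bit; a call may enable or disable other
    methods, so properties like 'cannot call m after n' are expressible."""
    def __init__(self, methods):
        self.enabled = {m: True for m in methods}

    def call(self, m, enables=(), disables=()):
        assert self.enabled[m], f"contract violation: {m} is disabled"
        for e in enables:
            self.enabled[e] = True
        for d in disables:
            self.enabled[d] = False

s = BitvectorState(["open", "read", "close"])
s.call("open", disables=["open"])            # e.g. cannot re-open
s.call("read")
s.call("close", disables=["read", "close"])  # reads after close are forbidden
# A subsequent s.call("read") would now fail: 'read' is disabled.
```

The monotonic `BuilderState` has no way to model the last step: nothing ever leaves its `called` set, so it cannot rule out a call that was once legal.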
Recent work on the Rapid analyzer [11] aims to verify the usage of cloud-based APIs. It combines local typestate checking with global value-flow analysis. Locality of typestate checking in their work relates to aliasing, not to typestate specification as in our work, and their typestate approach is DFA-based. They also highlight the state explosion problem for common contracts found in practice, where a set of methods has to be invoked prior to some event. In comparison, we allow more granular contract specifications with a very large number of states while avoiding an explicit DFA. The Fugue tool [9] allows DFA-based specifications, but also annotations for describing specific resource protocol contracts. These annotations have a locality flavor: annotations on one method do not refer to other methods. Moreover, we share the idea of specifying typestate without explicitly mentioning states. The annotations in Fugue can specify “must call” properties (e.g., “must call a release method”); the must contracts considered in this article capture properties of this kind.
Our annotations could be mimicked by having a local DFA attached to each method. In this case, the DFAs would have the same restrictions as our annotation language. We are not aware of prior work in this direction. We also note that while our technique is implemented in Infer using the algorithm in Section 2, the fact that we can translate typestates to bit-vectors allows typestate analysis for local contracts to be used in distributive dataflow frameworks, such as IFDS [21].
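The observation that bit-vector typestates fit distributive dataflow frameworks can be made concrete. A minimal sketch (our own formulation, not the paper's implementation): each method call becomes a gen/kill transfer function over a bit-vector of enabled methods, and the sequential composition of two such functions is again a single gen/kill pair, which is exactly the compositionality that IFDS-style frameworks [21] exploit.

```python
class Transfer:
    """A gen/kill transfer function over a bit-vector abstract state."""
    def __init__(self, gen=0, kill=0):
        self.gen, self.kill = gen, kill

    def apply(self, bits):
        # Kill first, then gen: standard bit-vector dataflow update.
        return (bits & ~self.kill) | self.gen

    def then(self, other):
        # Sequential composition collapses to one gen/kill pair,
        # witnessing distributivity of the transfer functions.
        gen = (self.gen & ~other.kill) | other.gen
        kill = (self.kill | other.kill) & ~other.gen
        return Transfer(gen, kill)

# Hypothetical two-method contract: bit 0 = open enabled, bit 1 = read enabled.
OPEN, READ = 0b01, 0b10
f_open = Transfer(gen=READ, kill=OPEN)   # open() enables read(), forbids re-open
f_close = Transfer(kill=READ)            # close() disables read()

# Composing then applying equals applying step by step:
state = OPEN
assert (f_open.then(f_close)).apply(state) == f_close.apply(f_open.apply(state))
```

Because composition stays within the gen/kill class, whole call sequences summarize into a single constant-time bit operation, regardless of how many DFA states the contract would otherwise need.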
7 CONCLUDING REMARKS
In this article, we have tackled the problem of analyzing code contracts in low-latency environments by developing a novel lightweight typestate analysis. Our technique is based on an expressive specification language whose contracts correspond to a sub-class of DFAs that can be analyzed using efficient bit-vector operations.
Future Work. There are several interesting research directions for future work. First, it is worth investigating how our technique interacts with aliasing, since the contracts considered here do not require an alias analysis.
Moreover, it would be interesting to explore whether our bit-vector typestates can be deployed within distributive dataflow frameworks such as IFDS [21].
ACKNOWLEDGMENTS
We are grateful to the anonymous reviewers for their constructive remarks.
- [1] 2021. Infer TOPL. Retrieved from https://fbinfer.com/docs/checker-topl/
- [2] 2021. RAIL model. Retrieved from https://web.dev/rail/ Accessed: 2021-09-30.
- [3] 2022. Scalable typestate analysis for low-latency environments. In Integrated Formal Methods - 17th International Conference, IFM 2022, Lugano, Switzerland, June 7-10, 2022, Proceedings (Lecture Notes in Computer Science), Vol. 13274. Springer, 322–340.
- [4] 2022. LFA checker: Scalable typestate analysis for low-latency environments. (Mar 2022).
- [5] 2014. FlowDroid: Precise context, flow, field, object-sensitive and lifecycle-aware taint analysis for Android apps. ACM SIGPLAN Notices 49, 6 (2014), 259–269.
- [6] 2007. Modular typestate checking of aliased objects. In Proceedings of the 22nd Annual ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages and Applications (OOPSLA’07). Association for Computing Machinery, New York, NY, 301–320.
- [7] 2012. The Clara framework for hybrid typestate analysis. International Journal on Software Tools for Technology Transfer 14, 3 (2012), 307–326.
- [8] 2011. Infer: An automatic program verifier for memory safety of C programs. In NASA Formal Methods. Springer, Berlin, 459–465.
- [9] 2004. The Fugue Protocol Checker: Is Your Software Baroque? Technical Report MSR-TR-2004-07. Microsoft Research.
- [10] 2004. Typestates for objects. In ECOOP 2004 - Object-Oriented Programming, 18th European Conference, Oslo, Norway, June 14-18, 2004, Proceedings (Lecture Notes in Computer Science), Vol. 3086. Springer, 465–490.
- [11] 2021. RAPID: Checking API usage for the cloud in the cloud. In Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2021). Association for Computing Machinery, New York, NY, 1416–1426.
- [12] 2003. Heap monotonic typestate. In Proceedings of the 1st International Workshop on Alias Confinement and Ownership (IWACO). Retrieved from https://www.microsoft.com/en-us/research/publication/heap-monotonic-typestate/
- [13] 2010. Static contract checking with abstract interpretation. In Proceedings of the 2010 International Conference on Formal Verification of Object-Oriented Software (FoVeOOS’10). Springer-Verlag, Berlin, 10–30.
- [14] 2021. Papaya: Global typestate analysis of aliased objects. In Proceedings of the 23rd International Symposium on Principles and Practice of Declarative Programming (PPDP’21). Association for Computing Machinery, New York, NY, Article 19, 13 pages.
- [15] 2020. Verifying object construction. In ICSE 2020, Proceedings of the 42nd International Conference on Software Engineering. Seoul, Korea.
- [16] 2017. Data Flow Analysis: Theory and Practice. CRC Press.
- [17] 2004. Generalized typestate checking using set interfaces and pluggable analyses. SIGPLAN Notices 39, 3 (March 2004), 46–55.
- [18] 2015. Access-path abstraction: Scaling field-sensitive data-flow analysis with unbounded access paths. In Proceedings of the 30th IEEE/ACM International Conference on Automated Software Engineering (ASE’15). IEEE Press, 619–629.
- [19] 2021. Java typestate checker. In Coordination Models and Languages. Springer International Publishing, Cham, 121–133.
- [20] 2021. Why security defects go unnoticed during code reviews? A case-control study of the Chromium OS project. In Proceedings of the 43rd IEEE/ACM International Conference on Software Engineering, ICSE 2021, Madrid, Spain, 22-30 May 2021. IEEE, 1373–1385.
- [21] 1995. Precise interprocedural dataflow analysis via graph reachability. In Proceedings of the 22nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL’95). Association for Computing Machinery, New York, NY, 49–61.
- [22] 2019. Context-, flow-, and field-sensitive data-flow analysis using synchronized pushdown systems. Proceedings of the ACM on Programming Languages 3, POPL, Article 48 (Jan 2019), 29 pages.
- [23] 1986. Typestate: A programming language concept for enhancing software reliability. IEEE Transactions on Software Engineering 12, 1 (1986), 157–171.
- [24] 2022. A static analysis framework for data science notebooks. In Proceedings of the 44th International Conference on Software Engineering.
- [25] 2016. IncA: A DSL for the definition of incremental program analyses. In Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering (ASE’16). Association for Computing Machinery, New York, NY, 320–331.
- [26] 2011. The SAFE Experience. Springer, Berlin, 17–33.
Index Terms
- Bit-Vector Typestate Analysis