1 Introduction

Numerous extensions of nondeterministic finite-state string automata have been proposed in the past few decades. On the one hand, the qualitative evaluation of inputs was extended to a quantitative evaluation in the weighted automata of [26]. This development led to the fruitful study of recognizable formal power series [25], which are well-suited for representing factors such as costs, consumption of resources, or time and probabilities related to the processed input. The main algebraic structures for the weight calculations are semirings [16, 17], which offer a nice compromise between generality and efficiency of computation (due to their distributivity). On the other hand, finite-state automata have been generalized to other input structures such as infinite words [24] and trees [4]. Finite-state tree automata were introduced independently in [7, 27, 28], and they and the tree languages they generate, called regular tree languages, have been intensively studied since their inception [4]. They are successfully utilized in various applications in many diverse areas like natural language processing [18], picture generation [8], and compiler construction [31]. Indeed, several applications require the combination of the two mentioned generalizations, and a broad range of weighted tree automaton (WTA) models has been studied (see [13, Chapter 9] for an overview).

It is well-known that finite-state tree automata cannot ensure that two subtrees (of potentially arbitrary size) are always equal in an accepted tree [14]. An extension proposed in [22] aims to remedy this problem and introduces a tree automaton model that explicitly can require certain subtrees to be equal or different. Such models are very useful when investigating (tree) transformation models (see [13] for an overview) that can copy subtrees (thus resulting in equal subtrees in the output), and they are the main tool used in the seminal paper [15] that proved that the HOM problem is decidable. The HOM problem was a long-standing open problem in the theory of tree languages and recently solved in [15]. It asks whether the image of an (effectively presented) regular tree language under a given tree homomorphism is again regular. This is not necessarily the case as tree homomorphisms can create copies of subtrees. Indeed removing this ability from the tree homomorphism, obtaining a linear tree homomorphism, yields that the mentioned image is always regular [14]. In the solution to the HOM problem provided in [15] the image is first represented by a tree automaton with constraints, and then it is investigated whether this tree automaton actually generates a regular tree language.

The HOM problem is also interesting in the weighted setting as it once again provides an answer whether a given homomorphic image of a regular weighted tree language can be represented efficiently. While preservation of regularity has also been investigated in the weighted setting [3, 10, 11, 12], the decidability of the HOM problem remains wide open. With the goal of investigating this problem, we introduce weighted tree grammars with constraints (WTGc for short) in this contribution. We demonstrate that those WTGc can again represent all (nondeleting and nonerasing) homomorphic images of the regular weighted tree languages. Thus, in principle, it only remains to provide a decision procedure for determining whether a given WTGc generates a regular weighted tree language. We approach this task by providing some common closure properties following essentially the steps also taken in [15]. For zero-sum free semirings we can also show that decidability of support emptiness and finiteness is directly inherited from the unweighted case [15].

The present work is a revised and extended version of [20] presented at the 26th Int. Conf. Developments in Language Theory (DLT 2022). We provide additional proof details and examples, as well as a new pumping lemma for the class of (nondeleting and nonerasing) homomorphic images of regular weighted tree languages. We utilize this pumping lemma to show that for any zero-sum free semiring, the class of homomorphic images of regular weighted tree languages is properly contained in the class of weighted tree languages generated by all positive WTGc, which are WTGc that utilize only equality constraints.

2 Preliminaries

We denote the set of nonnegative integers by \(\mathbb {N}\), and we let \([k] = \{i \in \mathbb {N}\mid 1 \le i \le k\}\) for every \(k \in \mathbb {N}\). For all sets T and Z let \(T^Z\) be the set of all mappings \(\varphi :Z \rightarrow T\), and correspondingly we sometimes write \(\varphi _z\) instead of \(\varphi (z)\) for every \(\varphi \in T^Z\). The inverse image \(\varphi ^{-1}(S)\) of \(\varphi \) for a subset \(S \subseteq T\) is \(\varphi ^{-1}(S) = \{z \in Z \mid \varphi (z) \in S\}\), and we write \(\varphi ^{-1}(t)\) instead of \(\varphi ^{-1}(\{t\})\) for every \(t \in T\). The range of \(\varphi \) is

$$ \textrm{ran}(\varphi ) = \bigl \{\varphi (z) \mid z \in Z \bigr \} \hspace{5.0pt}. $$

Let \(R \subseteq T \times Z\) be a relation. We denote its inverse relation \(\{(z, t) \mid (t, z) \in R\}\) by \(R^{-1}\). The identity relation on T is \( {{{\,\textrm{id}\,}}}_T = \{(t,t) \mid t \in T\}\), or simply ‘\({{\,\textrm{id}\,}}\)’ if the set T is clear from the context. Finally, the cardinality of Z is denoted by \(|Z |\).

A ranked alphabet \((\varSigma , {{{\,\textrm{rk}\,}}})\) is a pair consisting of a finite set \(\varSigma \) and a map \( {{{\,\textrm{rk}\,}}} \in \mathbb {N}^\varSigma \) that assigns a rank to each symbol of \(\varSigma \). If there is no risk of confusion, we denote a ranked alphabet \((\varSigma , {{{\,\textrm{rk}\,}}})\) by \(\varSigma \). We write \(\sigma ^{(k)}\) to indicate that \({{\,\textrm{rk}\,}}(\sigma ) = k\). Moreover, for every \(k \in \mathbb {N}\) we let \(\varSigma _k = {{\,\textrm{rk}\,}}^{-1}(k)\). Let \(X = \{x_i \mid i\ge 1\}\) be a countable set of (formal) variables. For each \(k \in \mathbb {N}\) we let \(X_k = \bigl \{x_i \mid i \in [k] \bigr \}\). Given a ranked alphabet \(\varSigma \) and a set Z, the set \(T_\varSigma (Z)\) of \(\varSigma \)-trees indexed by Z is the smallest set such that \(Z \subseteq T_\varSigma (Z)\) and \(\sigma (t_{1}, \dotsc , t_{k}) \in T_\varSigma (Z)\) for every \(k \in \mathbb {N}\), \(\sigma \in \varSigma _k\), and \(t_{1}, \dotsc , t_{k} \in T_\varSigma (Z)\). We abbreviate \(T_\varSigma (\emptyset )\) simply to \(T_\varSigma \), and any subset \(L \subseteq T_\varSigma \) is called a tree language.

Let \(\varSigma \) be a ranked alphabet, Z a set, and \(t \in T_\varSigma (Z)\). The set \({{\,\textrm{pos}\,}}(t)\) of positions of t is inductively defined by \({{\,\textrm{pos}\,}}(z) = \{\varepsilon \}\) for all \(z \in Z\) and by

$$ {{\,\textrm{pos}\,}}\bigl (\sigma (t_{1}, \dotsc , t_{k}) \bigr ) = \bigl \{\varepsilon \bigr \} \cup \bigcup _{i \in [k]} \bigl \{iw \mid w \in {{\,\textrm{pos}\,}}(t_i) \bigr \} $$

for all \(k \in \mathbb {N}\), \(\sigma \in \varSigma _k\), and \(t_{1}, \dotsc , t_{k} \in T_\varSigma (Z)\). The size \(|t |\) of t is defined as \(|t | = |{{\,\textrm{pos}\,}}(t) |\), and its height \({{\,\textrm{ht}\,}}(t)\) is \({{\,\textrm{ht}\,}}(t) = \max _{w \in {{\,\textrm{pos}\,}}(t)} |w |\). For \(w\in {{\,\textrm{pos}\,}}(t)\) and \(t' \in T_\varSigma (Z)\), the label t(w) of t at w, the subtree \(t|_w\) of t at w, and the substitution \(t[t']_w\) of \(t'\) into t at w are defined by \(z(\varepsilon ) = z|_\varepsilon = z\) and \(z[t']_\varepsilon = t'\) for all \(z \in Z\) and for \(t = \sigma (t_{1}, \dotsc , t_{k})\) by \(t(\varepsilon ) = \sigma \), \(t(iw') = t_i(w')\), \(t|_\varepsilon = t\), \(t|_{iw'} = t_i|_{w'}\), \(t[t']_\varepsilon = t'\), and

$$ t[t']_{iw'} = \sigma \bigl (t_{1}, \dotsc , t_{i-1}, t_i[t']_{w'}, t_{i+1}, \dotsc , t_{k} \bigr ) $$

for all \(k \in \mathbb {N}\), \(\sigma \in \varSigma _k\), \(t_{1}, \dotsc , t_{k} \in T_\varSigma (Z)\), \(i \in [k]\), and \(w' \in {{\,\textrm{pos}\,}}(t_i)\). For all \(S \subseteq \varSigma \cup Z\), we let \({{\,\textrm{pos}\,}}_S(t) = \bigl \{w \in {{\,\textrm{pos}\,}}(t) \mid t(w) \in S \bigr \}\) and \({{\,\textrm{var}\,}}(t) = \bigl \{x \in X \mid {{\,\textrm{pos}\,}}_x(t) \ne \emptyset \bigr \}\). For a single \(\sigma \in \varSigma \cup Z\) we abbreviate \({{\,\textrm{pos}\,}}_{\{\sigma \}}(t)\) simply by \({{\,\textrm{pos}\,}}_\sigma (t)\).

The yield mapping \(\text {yield} :T_\varSigma (Z) \rightarrow Z^*\) is recursively defined by

$$ {{\,\textrm{yield}\,}}\bigl (z \bigr ) = z \qquad \text {and} \qquad {{\,\textrm{yield}\,}}\bigl ( \sigma (t_{1}, \dotsc , t_{k}) \bigr ) = {{\,\textrm{yield}\,}}(t_1) \cdots {{\,\textrm{yield}\,}}(t_k) $$

for every \(z \in Z\), \(k \in \mathbb N\), \(\sigma \in \varSigma _k\), and trees \(t_{1}, \dotsc , t_{k} \in T_\varSigma (Z)\). A tree \(t \in T_\varSigma (Z)\) is called a context if \(|{{\,\textrm{pos}\,}}_z(t) | = 1\) for every \(z \in Z\). We write \(C_\varSigma (Z)\) for the set of such contexts and \(\widehat{C}_\varSigma (X_k) = \bigl \{c \in C_\varSigma (X_k) \mid {{\,\textrm{yield}\,}}(c) = x_{1} \cdots x_{k} \bigr \}\). Finally, for every \(t\in T_\varSigma (Z)\), finite \(V \subseteq Z\), and \(\theta \in T_\varSigma (Z)^V\), the substitution \(\theta \) applied to t is written as \(t\theta \) and defined by \(v\theta = \theta _v\) for every \(v \in V\), \(z\theta = z\) for every \(z \in Z \setminus V\), and

$$ \sigma (t_{1}, \dotsc , t_{k})\theta = \sigma (t_1\theta , \dotsc , t_k\theta ) $$

for all \(k \in \mathbb {N}\), \(\sigma \in \varSigma _k\), and \(t_{1}, \dotsc , t_{k} \in T_\varSigma (Z)\). We also write the substitution \(\theta \in T_\varSigma (Z)^V\) as \([v_1 \leftarrow \theta _{v_1}, \dotsc , v_n \leftarrow \theta _{v_n}]\) if \(V = \{v_{1}, \dotsc , v_{n}\}\). Finally, we abbreviate it further to just \([\theta _{v_1}, \dotsc , \theta _{v_n}]\) if \(V = X_n\).

A commutative semiring [16, 17] is a tuple \((\mathbb {S}, +, \cdot , 0, 1)\) such that \((\mathbb {S}, +, 0)\) and \((\mathbb {S}, \cdot , 1)\) are commutative monoids, \(\cdot \) distributes over \(+\), and \(0 \cdot s = 0\) for all \(s\in \mathbb {S}\). Examples include (i) the Boolean semiring \(\mathbb {B} = \bigl (\{ 0,1 \}, \vee , \wedge , 0, 1 \bigr )\), (ii) the semiring \(\mathbb {N}= \bigl (\mathbb {N}, +, \cdot , 0, 1 \bigr )\), (iii) the tropical semiring \(\mathbb T = \bigl (\mathbb {N}\cup \{\infty \}, {\min }, +, \infty , 0 \bigr )\), and (iv) the arctic semiring \(\mathbb A = \bigl (\mathbb {N}\cup \{-\infty \}, {\max }, +, -\infty , 0 \bigr )\). Given two semirings

$$ (\mathbb S, +, \cdot , 0, 1) \qquad \text {and} \qquad (\mathbb T, \oplus , \odot , \bot , \top ) \hspace{5.0pt}, $$

a semiring homomorphism is a mapping \(h \in \mathbb T^{\mathbb S}\) such that \(h(0) = \bot \), \(h(1) = \top \), and \(h(s_1 + s_2) = h(s_1) \oplus h(s_2)\) as well as \(h(s_1 \cdot s_2) = h(s_1) \odot h(s_2)\) for all \(s_1, s_2 \in \mathbb S\). When there is no risk of confusion, we refer to a semiring \((\mathbb {S}, +, \cdot , 0, 1)\) simply by its carrier set \(\mathbb {S}\). A semiring \(\mathbb {S}\) is a ring if there exists \(-1 \in \mathbb S\) such that \(-1 + 1 = 0\). Let \(\varSigma \) be a ranked alphabet. Any mapping \(A \in \mathbb S^{T_\varSigma }\) is called a weighted tree language over \(\mathbb {S}\), and its support is \({{\,\textrm{supp}\,}}(A) = \{t \in T_\varSigma \mid A_t \ne 0\}\).
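The four example semirings can be written down directly, which makes it easy to spot-check the semiring laws on a few sample values. The following Python sketch is only an illustration under our own assumptions: the `Semiring` record and the `check_laws` helper are hypothetical names, and testing finitely many samples is of course no proof of the laws.

```python
import math
from dataclasses import dataclass
from typing import Any, Callable

@dataclass(frozen=True)
class Semiring:
    add: Callable[[Any, Any], Any]   # semiring sum
    mul: Callable[[Any, Any], Any]   # semiring product
    zero: Any                        # neutral element of add
    one: Any                         # neutral element of mul

BOOLEAN = Semiring(lambda a, b: a or b, lambda a, b: a and b, False, True)
NATURAL = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0, 1)
TROPICAL = Semiring(min, lambda a, b: a + b, math.inf, 0)
ARCTIC = Semiring(max, lambda a, b: a + b, -math.inf, 0)

def check_laws(s, samples):
    """Spot-check neutrality, absorption, and distributivity on samples."""
    for a in samples:
        assert s.add(s.zero, a) == a          # 0 is neutral for the sum
        assert s.mul(s.one, a) == a           # 1 is neutral for the product
        assert s.mul(s.zero, a) == s.zero     # 0 is absorbing
        for b in samples:
            for c in samples:
                left = s.mul(a, s.add(b, c))
                right = s.add(s.mul(a, b), s.mul(a, c))
                assert left == right          # distributivity
    return True
```

Note that in the tropical and arctic semirings the semiring product is ordinary addition, so the element called 1 in the definition is the number 0 there.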

Let \(\varSigma \) and \(\varDelta \) be ranked alphabets and \(h' \in T_\varDelta (X)^\varSigma \) a map such that \(h'_\sigma \in T_\varDelta (X_k) \) for all \(k \in \mathbb {N}\) and \(\sigma \in \varSigma _k\). We extend \(h'\) to \(h \in T_\varDelta ^{T_\varSigma }\) by (i) \(h(\alpha ) = h'_\alpha \in T_\varDelta (X_0) = T_\varDelta \) for all \(\alpha \in \varSigma _0\) and (ii) \(h\bigl (\sigma (t_{1}, \dotsc , t_{k}) \bigr ) = h'_\sigma \bigl [h(t_1), \dotsc , h(t_k) \bigr ]\) for all \(k \in \mathbb {N}\), \(\sigma \in \varSigma _k\), and \(t_{1}, \dotsc , t_{k} \in T_\varSigma \). The mapping h is called the tree homomorphism induced by \(h'\). For complexity arguments, we define the size \(|h |\) of h as \(|h |=\sum _{\sigma \in \varSigma } |{{\,\textrm{pos}\,}}(h'_\sigma ) |\). The tree homomorphism h is nonerasing if \(h'_\sigma \notin X\) for all \(k\in \mathbb {N}\) and \(\sigma \in \varSigma _k\), and it is nondeleting if \({{\,\textrm{var}\,}}(h'_\sigma ) = X_k\) for all \(k\in \mathbb {N}\) and \(\sigma \in \varSigma _k\). For simplicity, we denote both \(h'\) and its induced tree homomorphism by h.

Let \(h \in T_\varDelta ^{T_\varSigma }\) be a nonerasing and nondeleting homomorphism. Then h is input finitary; i.e., the set \(h^{-1}(u)\) is finite for every \(u \in T_\varDelta \) because \(|t | \le |u |\) for each \(t \in h^{-1}(u)\). Additionally, let \(A \in \mathbb {S}^{T_\varSigma }\) be a weighted tree language. We define the weighted tree language \(h(A) \in \mathbb {S}^{T_\varDelta }\) for every \(u \in T_\varDelta \) by \(h(A)_u = \sum _{t \in h^{-1}(u)} A_t\).
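A small Python sketch may help to see how a tree homomorphism copies subtrees and how weights of distinct preimages are summed. It rests on our own assumptions: trees are tuples `(label, child_1, ..., child_k)`, variables are leaves labeled `"x1"`, `"x2"`, and so on, and a weighted tree language is restricted to a finitely supported dictionary. The maps `h` and `h2` are hypothetical examples.

```python
def substitute(t, theta):
    """Apply the substitution theta (dict: variable name -> tree) to t."""
    if len(t) == 1 and t[0] in theta:
        return theta[t[0]]
    return (t[0],) + tuple(substitute(child, theta) for child in t[1:])

def hom(h, t):
    """Apply the tree homomorphism induced by the symbol map h to t."""
    images = [hom(h, child) for child in t[1:]]
    theta = {f"x{i}": u for i, u in enumerate(images, start=1)}
    return substitute(h[t[0]], theta)

def image(h, A, add):
    """Push a finitely supported weighted tree language A through h."""
    result = {}
    for t, weight in A.items():
        u = hom(h, t)
        result[u] = add(result[u], weight) if u in result else weight
    return result

# Hypothetical nonerasing and nondeleting homomorphism that copies
# the argument of gamma, so it is not linear.
h = {"alpha": ("a",), "gamma": ("f", ("x1",), ("x1",))}
u = hom(h, ("gamma", ("gamma", ("alpha",))))

# Hypothetical non-injective homomorphism: both constants map to a,
# so their weights are added in the image (here over the semiring N).
h2 = {"alpha": ("a",), "beta": ("a",)}
B = image(h2, {("alpha",): 2, ("beta",): 3}, lambda x, y: x + y)
```

The tree `u` has two equal direct subtrees, which illustrates why homomorphic images of regular tree languages need not be regular.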

3 Weighted Tree Grammars with Constraints

Let us start with the formal definition of our weighted tree grammars. They are a weighted variant of the tree automata with equality and inequality constraints originally introduced in [1, 5]. Compared to [1, 5] our model is slightly more expressive as we allow arbitrary constraints, whereas constraints were restricted to subtrees occurring in the productions in [1, 5]. This more restricted version will be called classic in the following. An overview of further developments for these automata can be found in [29]. We essentially use the version recently utilized to solve the HOM problem [15, Definition 4.1]. For the rest of this section, let \((\mathbb {S}, +, \cdot , 0, 1)\) be a commutative semiring.

Definition 1

(see [15, Definition 4.1]) A weighted tree grammar with constraints (WTGc) is a tuple \(G = (Q, \varSigma , F, P, {{{\,\textrm{wt}\,}}})\) such that

  • Q is a finite set of nonterminals and \(F \in \mathbb {S}^Q\) assigns final weights,

  • \(\varSigma \) is a ranked alphabet of input symbols,

  • P is a finite set of productions of the form \((\ell , q, E, I)\), where \(\ell \in T_\varSigma (Q) {\setminus } Q\), \(q \in Q\), and \(E, I \subseteq \mathbb N^* \times \mathbb N^*\) are finite sets, and

  • \({{{\,\textrm{wt}\,}}} \in \mathbb {S}^P\) assigns a weight to each production. \(\square \)

In the following, let \(G = (Q, \varSigma , F, P, {{{\,\textrm{wt}\,}}})\) be a WTGc. The components of a production \(p = (\ell , q, E, I) \in P\) are the left-hand side \(\ell \), the target nonterminal q, the set E of equality constraints, and the set I of inequality constraints. Correspondingly, the production p is also written \(\ell {\mathop {\longrightarrow }\limits ^{E,I}}q\) or even \(\ell {\mathop {\longrightarrow }\limits ^{E,I}}_{{{\,\textrm{wt}\,}}_p} q\) if we want to indicate its weight. Additionally, we simply list an equality constraint \((v, v') \in E\) as \(v = v'\) and an inequality constraint \((v, v') \in I\) as \(v \ne v'\). A production \(\ell {\mathop {\longrightarrow }\limits ^{E,I}}q \in P\) is normalized if \(\ell = \sigma (q_{1}, \dotsc , q_{k})\) for some \(k \in \mathbb N\), \(\sigma \in \varSigma _k\), and \(q_{1}, \dotsc , q_{k} \in Q\). It is positive if \(I = \emptyset \); i.e., it has no inequality constraints, and it is unconstrained if \(E = \emptyset = I\); i.e., the production has no constraints at all. Instead of \(\ell {\mathop {\longrightarrow }\limits ^{\emptyset , \emptyset }}q\) we also write just \(\ell \rightarrow q\). The production is classic if \(\{v, v'\} \subseteq {{\,\textrm{pos}\,}}_Q(\ell )\) for all constraints \((v, v') \in E \cup I\). In other words, in a classic production the constraints can only refer to nonterminal-labeled subtrees of the left-hand side. The WTGc G is a weighted tree automaton with constraints (WTAc) if all productions \(p \in P\) are normalized, and it is a weighted tree grammar (WTG) [14] if all productions \(p \in P\) are unconstrained. If G is both a WTAc as well as a WTG, then it is a weighted tree automaton (WTA) [14]. All these devices have Boolean final weights if \(F \in \{0,1\}^Q\), they are positive if every \(p \in P\) is positive, and they are classic if every production \(p \in P\) is classic. 
Finally, if we utilize the Boolean semiring \(\mathbb B\), then we reobtain the unweighted versions and omit the ‘W’ in the abbreviations and the mapping ‘\({{\,\textrm{wt}\,}}\)’ from the tuple.

The semantics for our WTGc G is a slightly non-standard derivation semantics when compared to [15, Definitions 4.3 & 4.4]. Let \((v,v') \in \mathbb {N}^* \times \mathbb {N}^*\) and \(t \in T_\varSigma \). If \(v,v' \in {{\,\textrm{pos}\,}}(t)\) and \(t|_v = t|_{v'}\), we say that t satisfies \((v,v')\), otherwise t dissatisfies \((v,v')\). Let now \(C \subseteq \mathbb {N}^* \times \mathbb {N}^*\) be a finite set of constraints. We write \(t\models C\) if t satisfies all \((v,v') \in C\), and we say that t universally dissatisfies C if t dissatisfies all \((v,v') \in C\). Universally dissatisfying C is generally stronger than simply not satisfying C.
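The difference between not satisfying a constraint set and dissatisfying all of its constraints can be checked on a small instance. The following Python sketch uses our own hypothetical encoding (trees as tuples `(label, child_1, ..., child_k)`, positions as tuples of 1-based indices, a constraint as a pair of positions); the function names are illustrative only.

```python
def subtree(t, w):
    """Return t|_w, or None if w is not a position of t."""
    for i in w:
        if not 1 <= i < len(t):
            return None
        t = t[i]
    return t

def satisfies(t, constraint):
    """Both positions exist in t and carry equal subtrees."""
    v, v2 = constraint
    s, s2 = subtree(t, v), subtree(t, v2)
    return s is not None and s2 is not None and s == s2

def models(t, constraints):
    """t satisfies every constraint of the set."""
    return all(satisfies(t, c) for c in constraints)

def dissatisfies_all(t, constraints):
    """t dissatisfies every constraint of the set."""
    return not any(satisfies(t, c) for c in constraints)

a = ("alpha",)
t = ("sigma", ("gamma", a), ("gamma", a))

# The constraint 1 = 2 holds in t, whereas 11 = 2 does not.
C = {((1,), (2,)), ((1, 1), (2,))}
```

For this tree, `models(t, C)` and `dissatisfies_all(t, C)` are both false: t does not satisfy C, yet it does not dissatisfy all constraints of C either. A constraint that mentions a position outside \({{\,\textrm{pos}\,}}(t)\), such as `((3,), (1,))`, is always dissatisfied.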

Definition 2

A sentential form (for G) is simply a tree \(\xi \in T_\varSigma (Q)\). Given an input tree \(t \in T_\varSigma \), sentential forms \(\xi , \zeta \in T_\varSigma (Q)\), a production \(p = \ell {\mathop {\longrightarrow }\limits ^{E,I}}q \in P\), and a position \(w \in {{\,\textrm{pos}\,}}(\xi )\), we write \(\xi \Rightarrow _{G,t}^{p,w} \zeta \) if \(\xi |_w = \ell \), \(\zeta = \xi [q]_w\), and the constraints E and I are fulfilled on \(t|_w\); i.e., \(t|_w \models E\) and \(t|_w\) dissatisfies all \((v,v') \in I\). A sequence

$$ d = (p_1,w_1) \cdots (p_n, w_n) \in (P \times \mathbb N^*)^* $$

is a derivation of G for t if there exist \(\xi _{1}, \dotsc , \xi _{n} \in T_\varSigma (Q)\) such that

$$ t \Rightarrow _{G, t}^{p_1,w_1} \xi _1 \Rightarrow _{G, t}^{p_2,w_2} \cdots \Rightarrow _{G, t}^{p_n,w_n} \xi _n \hspace{5.0pt}. $$

It is left-most if additionally \(w_1 \prec w_2 \prec \cdots \prec w_n\), where \(\preceq \) is the lexicographic order on \(\mathbb N^*\) in which prefixes are larger, so \(\varepsilon \) is the largest element. \(\square \)

Note that the sentential forms \(\xi _{1}, \dotsc , \xi _{n}\) are uniquely determined if they exist, and for any derivation d for t there exists a unique permutation of d that is a left-most derivation for t. The derivation d is complete if \(\xi _n \in Q\), and in that case it is also called a derivation to \(\xi _n\). The set of all complete left-most derivations for t to \(q \in Q\) is denoted by \(D^q_G(t)\). The WTGc G is unambiguous if \(\sum _{q \in {{\,\textrm{supp}\,}}(F)} |D_G^q(t) | \le 1\) for every \(t \in T_\varSigma \).

Let \(p = \ell {\mathop {\longrightarrow }\limits ^{E,I}}q \in P\) be a production. Since there exist unique \(k = |{{\,\textrm{pos}\,}}_Q(\ell ) |\), context \(c \in \widehat{C}_\varSigma (X_k)\), and \(q_{1}, \dotsc , q_{k} \in Q\) such that \(\ell = c[q_{1}, \dotsc , q_{k}]\), we also simply write

$$ c[q_{1}, \dotsc , q_{k}] {\mathop {\longrightarrow }\limits ^{E, I}}q $$

instead of p. Using this notation, we can present a recursion for the set \(D_G^q(t)\) of complete left-most derivations for \(t \in T_\varSigma \) to \(q \in Q\).

Specifically, let \(d = (p_1, w_1) \cdots (p_n, w_n)\) be a complete derivation for some tree \(t \in T_\varSigma \). For a given position \(w \in \{w_{1}, \dotsc , w_{n}\}\), we let \(k \in \mathbb N\) and \(1 \le i_1< \cdots < i_k \le n\) be the indices such that \(\bigl \{i_{1}, \dotsc , i_{k} \bigr \} = \bigl \{ i \in [n] \mid w_i = ww'_i \bigr \}\); i.e., the indices of the derivation steps applied to positions below w with \(w'_i\) being the suffix of \(w_i\) following the prefix w for all \(i \in \{i_{1}, \dotsc , i_{k}\}\). The derivation for \(t|_w\) incorporated in d is the derivation \((p_{i_1}, w'_{i_1}), \dotsc , (p_{i_k}, w'_{i_k})\). Conversely, for every \(w \in \mathbb {N}^*\) we abbreviate the derivation \((p_1, ww_1) \cdots (p_n, ww_n)\) by simply wd.

Definition 3

The weight of a derivation \(d = (p_1,w_1) \cdots (p_n, w_n)\) is defined to be

$$ {{\,\textrm{wt}\,}}_G(d) = \prod _{i = 1}^n {{\,\textrm{wt}\,}}(p_i) \hspace{5.0pt}. $$

The weighted tree language generated by G, written simply \(G \in \mathbb S^{T_\varSigma }\), is defined for every \(t \in T_\varSigma \) by

$$ G_t = \sum _{q \in Q,\, d \in D^q_G(t)} F_q \cdot {{\,\textrm{wt}\,}}_G(d) . \qquad \quad {\square } $$

Two WTGc are equivalent if they generate the same weighted tree language. Finally, a weighted tree language is

  • regular if it is generated by some WTG,

  • positive constraint-regular if it is generated by some positive WTGc,

  • classic constraint-regular if it is generated by some classic WTGc, and

  • constraint-regular if it is generated by some WTGc.

Since the weights of productions are multiplied, we can assume without loss of generality that \({{\,\textrm{wt}\,}}_p \ne 0\) for all \(p \in P\).

Example 1

Consider the WTGc \(G = (Q, \varSigma , F, P, {{{\,\textrm{wt}\,}}})\) over the arctic semiring \(\mathbb A\) with nonterminals \(Q = \{q, q'\}\), \(\varSigma = \{\alpha ^{(0)}, \gamma ^{(1)}, \sigma ^{(2)}\}\), \(F_q = -\infty \), \(F_{q'} = 0\), and P and ‘\({{\,\textrm{wt}\,}}\)’ given by the productions \(p_1 = \alpha \rightarrow _0 q\), \(p_2 = \gamma (q) \rightarrow _1 q\), and \(p_3 = \sigma \bigl (\gamma (q), q\bigr ) {\mathop {\longrightarrow }\limits ^{11=2}}_1 q'\). Clearly, G is positive and classic, but not a WTAc. The tree \(t = \sigma \bigl (\gamma (\gamma (\alpha )), \gamma (\alpha ) \bigr )\) has the unique left-most derivation

$$ d = (p_1, 111) \, (p_2, 11) \, (p_1, 21) \, (p_2, 2) \, (p_3, \varepsilon ) $$

to the nonterminal \(q'\), which is illustrated in Fig. 1. Overall, we have

$$ {{\,\textrm{supp}\,}}(G) = \big \{ \sigma \bigl (\gamma ^{i+1}(\alpha ), \gamma ^i(\alpha )\bigr ) \mid i \in \mathbb {N}\bigr \} $$

and \(G_t = |{{\,\textrm{pos}\,}}_\gamma (t) |\) for every \(t \in {{\,\textrm{supp}\,}}(G)\), where \(\gamma ^i(t)\) abbreviates \(\gamma ( \cdots \gamma (t) \cdots )\) containing i times the unary symbol \(\gamma \) atop t. \(\square \)
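To spell out the arithmetic behind Example 1: in the arctic semiring the semiring product is ordinary addition and the semiring sum is the maximum. Since d is the only complete left-most derivation for \(t = \sigma \bigl (\gamma (\gamma (\alpha )), \gamma (\alpha ) \bigr )\), we obtain

$$ {{\,\textrm{wt}\,}}_G(d) = {{\,\textrm{wt}\,}}(p_1) + {{\,\textrm{wt}\,}}(p_2) + {{\,\textrm{wt}\,}}(p_1) + {{\,\textrm{wt}\,}}(p_2) + {{\,\textrm{wt}\,}}(p_3) = 0 + 1 + 0 + 1 + 1 = 3 \qquad \text {and} \qquad G_t = F_{q'} + {{\,\textrm{wt}\,}}_G(d) = 0 + 3 = 3 = |{{\,\textrm{pos}\,}}_\gamma (t) | \hspace{5.0pt}. $$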

Fig. 1 Illustration of the derivation mentioned in Example 1

Next, we introduce another semantics, called initial algebra semantics, which is based on the presented recursive presentation of derivations and often more convenient in proofs.

Definition 4

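For every nonterminal \(q \in Q\) we recursively define the map \({{{\,\textrm{wt}\,}}_G^q} \in \mathbb {S}^{T_\varSigma }\) for every \(t \in T_\varSigma \) by

$$ {{\,\textrm{wt}\,}}_G^q(t) = \sum _{\substack{p = c[q_{1}, \dotsc , q_{k}] {\mathop {\longrightarrow }\limits ^{E,I}}q \in P \\ t_{1}, \dotsc , t_{k} \in T_\varSigma \\ t = c[t_{1}, \dotsc , t_{k}] \\ t \models E,\; t \text { dissatisfies all } (v,v') \in I}} {{\,\textrm{wt}\,}}_p \cdot \prod _{i = 1}^k {{\,\textrm{wt}\,}}_G^{q_i}(t_i) \hspace{5.0pt}. \tag{1} $$

\(\square \)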

It is a routine matter to verify that \({{\,\textrm{wt}\,}}_G^q(t) = \sum _{d \in D^q_G(t)} {{\,\textrm{wt}\,}}_G(d)\) for every \(q \in Q\) and \(t \in T_\varSigma \). This utilizes the presented recursive decomposition of complete derivations as well as distributivity of the semiring \(\mathbb S\).
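The recursive computation of these weights can be sketched in Python for the grammar of Example 1 over the arctic semiring. The sketch assumes the usual initial algebra reading: the weight at q sums, over all productions with target q whose left-hand side matches the tree and whose constraints are fulfilled, the production weight multiplied by the weights of the subtrees at the matched nonterminals. The encoding (trees as tuples, nonterminal q as the leaf `("q",)`, productions as 5-tuples) and all function names are our own hypothetical choices.

```python
import math

NEG_INF = -math.inf  # zero of the arctic semiring (max, +, -inf, 0)

def subtree(t, w):
    """Return t|_w, or None if w is not a position of t."""
    for i in w:
        if not 1 <= i < len(t):
            return None
        t = t[i]
    return t

def satisfies(t, v, v2):
    s, s2 = subtree(t, v), subtree(t, v2)
    return s is not None and s == s2

def match(lhs, t, nonterminals):
    """Match lhs against t; return [(q_i, t_i), ...] or None on mismatch."""
    if len(lhs) == 1 and lhs[0] in nonterminals:
        return [(lhs[0], t)]
    if lhs[0] != t[0] or len(lhs) != len(t):
        return None
    pairs = []
    for l_child, t_child in zip(lhs[1:], t[1:]):
        sub = match(l_child, t_child, nonterminals)
        if sub is None:
            return None
        pairs += sub
    return pairs

def wt(grammar, q, t):
    """Arctic weight of t at q: max over productions, + along each one."""
    nonterminals, productions = grammar
    total = NEG_INF
    for lhs, target, eqs, neqs, weight in productions:
        pairs = match(lhs, t, nonterminals) if target == q else None
        if pairs is None:
            continue
        if not all(satisfies(t, v, v2) for v, v2 in eqs):
            continue  # some equality constraint is violated
        if any(satisfies(t, v, v2) for v, v2 in neqs):
            continue  # some inequality constraint is violated
        value = weight + sum(wt(grammar, q_i, t_i) for q_i, t_i in pairs)
        total = max(total, value)
    return total

def generated(grammar, final, t):
    """Weight assigned to t: max over q of F_q + wt^q(t)."""
    return max(final[q] + wt(grammar, q, t) for q in final)

# The grammar of Example 1; positions are tuples of 1-based indices.
G = ({"q", "q'"}, [
    (("alpha",), "q", [], [], 0),
    (("gamma", ("q",)), "q", [], [], 1),
    (("sigma", ("gamma", ("q",)), ("q",)), "q'", [((1, 1), (2,))], [], 1),
])
F = {"q": NEG_INF, "q'": 0}

a = ("alpha",)
t_good = ("sigma", ("gamma", ("gamma", a)), ("gamma", a))
t_bad = ("sigma", ("gamma", a), ("gamma", a))
```

For `t_good` the sketch yields 3, the number of occurrences of \(\gamma \), whereas `t_bad` violates the equality constraint and receives the arctic zero \(-\infty \).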

As for WTG and WTA [13], also every (positive) WTGc can be turned into an equivalent (positive) WTAc at the expense of additional nonterminals by decomposing the left-hand sides.

Lemma 1

(cf. [15, Lemma 4.8]) WTGc and WTAc are equally expressive. This also applies to positive WTGc.

Proof

Let \(G = (Q, \varSigma , F, P, {{{\,\textrm{wt}\,}}})\) be a WTGc with a non-normalized production

$$ p = \sigma (\ell _{1}, \dotsc , \ell _{k}) {\mathop {\longrightarrow }\limits ^{E,I}}q \in P \hspace{5.0pt}, $$

let \(U\supseteq Q\) and let \(\varphi \in U^{T_\varSigma (Q)}\) be an injective map such that \(\varphi _q = q\) for all \(q \in Q\). We define the WTGc \(G' = (Q', \varSigma , F', P', {{{\,\textrm{wt}\,}}'})\) such that \(Q' = Q \cup \{\varphi _{\ell _1}, \dotsc , \varphi _{\ell _k}\}\), \(F'_q = F_q\) for all \(q \in Q\) and \(F'_{q'} = 0\) for all \(q' \in Q' {\setminus } Q\), and

$$ P' = \bigl (P {\setminus } \{p\} \bigr ) \cup \bigl \{\sigma (\varphi _{\ell _1}, \dotsc , \varphi _{\ell _k}) {\mathop {\longrightarrow }\limits ^{E,I}}q \bigr \} \cup \bigl \{\ell _i \rightarrow \varphi _{\ell _i} \mid i \in [k], \ell _i \notin Q \bigr \} \hspace{5.0pt}, $$

and for every \(p' \in P'\)

$$ {{\,\textrm{wt}\,}}'_{p'} = {\left\{ \begin{array}{ll} {{\,\textrm{wt}\,}}_{p'} &{} \text {if } p' \in P {\setminus } \{p\} \\ {{\,\textrm{wt}\,}}_p &{} \text {if } p' = \sigma (\varphi _{\ell _1}, \dotsc , \varphi _{\ell _k}) {\mathop {\longrightarrow }\limits ^{E,I}}q \\ 1 &{} \text {otherwise.} \end{array}\right. } $$

To prove that \(G'\) is equivalent to G we observe that for every left-most derivation

$$ d = (p_1, w_1) \cdots (p_n, w_n) $$

of G, there exists a corresponding derivation \(d'\) of \(G'\), which is obtained by replacing each derivation step \((p_a, w_a)\) with \(p_a = p\) by the sequence

$$ (\ell _i \rightarrow \varphi _{\ell _i},\, w_ai)_{i \in [k], \ell _i \notin Q} \bigl (\sigma (\varphi _{\ell _1}, \dotsc , \varphi _{\ell _k}) {\mathop {\longrightarrow }\limits ^{E,I}}q, w_a \bigr ) $$

of derivation steps of \(G'\) (yielding also a unique corresponding left-most derivation). This replacement preserves the weight of the derivation. Vice versa any left-most derivation of \(G'\) that utilizes the production \(\sigma (\varphi _{\ell _1}, \dotsc , \varphi _{\ell _k}) {\mathop {\longrightarrow }\limits ^{E,I}}q \in P'\) at w needs to previously utilize the productions \(\ell _i \rightarrow \varphi _{\ell _i} \in P'\) at wi for all \(i \in [k]\) with \(\ell _i \notin Q\) since these are the only productions that generate the nonterminal \(\varphi _{\ell _i}\). Thus, we established a weight-preserving bijection between the left-most derivations of G and \(G'\), so it is obvious that \(G' = G\). Repeated application of the normalization yields an equivalent WTAc. Assuming a fixed ranked alphabet \(\varSigma \), every step of the normalization adds at most \(\max _{\sigma \in \varSigma } {{\,\textrm{rk}\,}}(\sigma )\) new productions and states that can be computed in constant time. In total \(\sum _{c[q_{1}, \dotsc , q_{k}] {\mathop {\longrightarrow }\limits ^{E,I}}q\in P} \bigl (|{{\,\textrm{pos}\,}}_\varSigma (c) | -1 \bigr )\) steps will be executed, so the overall construction runs in linear time and returns an output WTAc whose size is linear in the size of the input WTGc. Finally, we note that the constructed WTAc is positive if the original WTGc is positive. \(\square \)
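A single normalization step of this proof can be sketched in Python. The sketch is illustrative and rests on our own assumptions: productions are 5-tuples `(lhs, target, eqs, neqs, weight)`, a nonterminal q occurs in a left-hand side as the leaf `(q,)`, and the injective map \(\varphi \) is realized by naming the fresh nonterminal for a subtree \(\ell _i\) as the pair `("nt", l_i)`. The function name `normalize_step` is hypothetical.

```python
def normalize_step(nonterminals, productions, one):
    """Split the first non-normalized production; None if there is none."""
    def is_nt(tree):
        return len(tree) == 1 and tree[0] in nonterminals

    def phi(tree):
        # Nonterminals are kept; any other subtree gets a fresh name.
        return tree[0] if is_nt(tree) else ("nt", tree)

    for index, (lhs, target, eqs, neqs, weight) in enumerate(productions):
        children = lhs[1:]
        if all(is_nt(c) for c in children):
            continue  # already of the form sigma(q_1, ..., q_k)
        # The constraints and the weight stay on the top production.
        new_lhs = (lhs[0],) + tuple((phi(c),) for c in children)
        result = productions[:index] + productions[index + 1:]
        result.append((new_lhs, target, eqs, neqs, weight))
        for c in children:
            helper = (c, phi(c), (), (), one)  # unconstrained, weight 1
            if not is_nt(c) and helper not in result:
                result.append(helper)
        return nonterminals | {phi(c) for c in children}, result
    return None

# Example 1 over the arctic semiring, whose multiplicative unit is 0.
Q = {"q", "q'"}
P = [
    (("alpha",), "q", (), (), 0),
    (("gamma", ("q",)), "q", (), (), 1),
    (("sigma", ("gamma", ("q",)), ("q",)), "q'", (((1, 1), (2,)),), (), 1),
]
Q2, P2 = normalize_step(Q, P, one=0)
```

Calling `normalize_step` repeatedly until it returns `None` yields a grammar in which every production is normalized; for the sample grammar a single step suffices.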

As we will see in the next example, the construction used in the proof of Lemma 1 does not preserve the classic property.

Example 2

Consider the classic and positive WTGc G of Example 1 and its non-normalized production \(p = \sigma \big (\gamma (q), q\big ) {\mathop {\longrightarrow }\limits ^{11=2}}_1 q'\). Applying the construction in the proof of Lemma 1 we replace p by the productions \(\sigma (q'', q) {\mathop {\longrightarrow }\limits ^{11=2}}_1 q'\), which is not classic because position 11 is not a nonterminal-labeled position of its left-hand side, and \(\gamma (q) \rightarrow _0 q''\), where \(q''\) is some new nonterminal. The WTGc obtained this way is already a positive WTAc. \(\square \)

Another routine normalization turns the final weights into Boolean final weights following the approach of [2, Lemma 6.1.1]. This is achieved by adding special copies of all nonterminals that terminate the derivation and pre-apply the final weight.

Lemma 2

WTGc and WTGc with Boolean final weights are equally expressive. This also applies to positive WTGc, classic WTGc, and classic positive WTGc as well as to the corresponding WTAc.

Proof

Let \(G = (Q, \varSigma , F, P, {{{\,\textrm{wt}\,}}})\) be a WTGc. Let \(f \in C^Q\) be bijective with \(C \cap Q = \emptyset \). We construct the WTGc \(G' = (Q \cup C, \varSigma , F', P \cup P', {{{\,\textrm{wt}\,}}} \cup {{{\,\textrm{wt}\,}}'})\) such that \(p' = \ell {\mathop {\longrightarrow }\limits ^{E, I}}f_q\) belongs to \(P'\) and \({{\,\textrm{wt}\,}}'_{p'} = {{\,\textrm{wt}\,}}_p \cdot F_q\) for every \(p = \ell {\mathop {\longrightarrow }\limits ^{E, I}}q \in P\). No other productions belong to \(P'\). Finally, \(F'_q = 0\) for all \(q \in Q\) and \(F'_c = 1\) for all \(c \in C\). The proof of equivalence is straightforward; one shows for every \(t \in T_\varSigma \) and \(q \in Q\) that

$$ {{\,\textrm{wt}\,}}_{G'}^q(t) = {{\,\textrm{wt}\,}}_G^q(t) \qquad \text {and} \qquad {{\,\textrm{wt}\,}}_{G'}^{f(q)}(t) = {{\,\textrm{wt}\,}}_G^q(t) \cdot F_q \hspace{5.0pt}. $$

The construction trivially preserves the properties normalized, positive, and classic. \(\square \)
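The construction of this proof is short enough to sketch in Python. The sketch uses our own hypothetical encoding (productions as 5-tuples `(lhs, target, eqs, neqs, weight)`, the copy \(f_q\) of a nonterminal q named `("final", q)`), and the function name `boolean_final_weights` is illustrative only.

```python
import math

def boolean_final_weights(nonterminals, final, productions, mul, zero, one):
    """Add a copy f(q) of each nonterminal q that pre-applies F_q."""
    copy = {q: ("final", q) for q in nonterminals}   # the bijection f
    new_productions = list(productions)
    for lhs, q, eqs, neqs, weight in productions:
        # Same left-hand side and constraints, but the target is the
        # copy of q and the final weight of q is multiplied in.
        new_productions.append(
            (lhs, copy[q], eqs, neqs, mul(weight, final[q])))
    new_final = {q: zero for q in nonterminals}      # originals reject
    new_final.update({c: one for c in copy.values()})  # copies accept
    return set(nonterminals) | set(copy.values()), new_final, new_productions

# Example 1 over the arctic semiring: product is +, zero is -inf, one is 0.
Q = {"q", "q'"}
F = {"q": -math.inf, "q'": 0}
P = [
    (("alpha",), "q", (), (), 0),
    (("gamma", ("q",)), "q", (), (), 1),
    (("sigma", ("gamma", ("q",)), ("q",)), "q'", (((1, 1), (2,)),), (), 1),
]
Q2, F2, P2 = boolean_final_weights(
    Q, F, P, mul=lambda a, b: a + b, zero=-math.inf, one=0)
```

In the result all final weights are the semiring zero or the semiring one, which in the arctic semiring are \(-\infty \) and 0. Copied productions whose weight became the semiring zero, as happens here for the copies targeting `("final", "q")`, contribute nothing and could be dropped.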

Let \(d \in D_G^q(t)\) be a derivation for some \(q \in Q\) and \(t \in T_\varSigma \). Since we often argue with the help of such derivations d, it is a nuisance that we might have \({{\,\textrm{wt}\,}}_G(d) = 0\). This anomaly can occur even if \({{\,\textrm{wt}\,}}_p \ne 0\) for all \(p \in P\) due to the presence of zero-divisors, which are elements \(s, s' \in \mathbb S {\setminus } \{0\}\) such that \(s \cdot s' = 0\). However, we can fortunately avoid such anomalies altogether utilizing a construction of [19], which has been lifted to tree automata in [9].

Lemma 3

For every WTGc G there exists a WTGc \(G' = (Q', \varSigma , F', P', {{{\,\textrm{wt}\,}}'})\) that is equivalent to G and satisfies \({{\,\textrm{wt}\,}}_{G'}(d') \ne 0\) for all \(q' \in Q'\), \(t' \in T_\varSigma \), and \(d' \in D_{G'}^{q'}(t')\). This also applies to positive WTGc, classic WTGc, and classic positive WTGc as well as to the corresponding WTAc. The construction also preserves Boolean final weights.

Proof

Let \(G = (Q, \varSigma , F, P, {{{\,\textrm{wt}\,}}})\). Obviously, \((\mathbb S, \cdot , 1, 0)\) is a commutative monoid with zero. Let \((s_{1}, \dotsc , s_{n})\) be an enumeration of the finite set \({{\,\textrm{wt}\,}}(P) \setminus \{1\} \subseteq \mathbb S\). We consider the monoid homomorphism \(h :\mathbb N^n \rightarrow \mathbb S\), which is given by

$$ h(m_{1}, \dotsc , m_{n}) = \prod _{i = 1}^n s_i^{m_i} $$

for every \(m_{1}, \dotsc , m_{n} \in \mathbb N\). According to Dickson’s lemma [6] the set \(\min h^{-1}(0)\) is finite, where the partial order is the standard pointwise order on \(\mathbb N^n\). Hence there is \(u \in \mathbb N\) such that \(\min h^{-1}(0) \subseteq \{0, \dotsc , u\}^n = U\). We define the operation \({\oplus } :U^2 \rightarrow U\) by \((v \oplus v')_i = \min (v_i + v'_i, u)\) for every \(v, v' \in U\) and \(i \in [n]\). Moreover, for every \(i \in [n]\) we let \(1_{s_i} \in U\) be the vector such that \((1_{s_i})_i = 1\) and \((1_{s_i})_a = 0\) for all \(a \in [n] \setminus \{i\}\). Let \(V = U {\setminus } h^{-1}(0)\). We construct the equivalent WTGc \(G'\) such that \(Q' = Q \times V\), \(F'_{\langle q, v\rangle } = F_q\) for all \(\langle q, v \rangle \in Q'\), and \(P'\) and \({{\,\textrm{wt}\,}}'\) are given as follows. For every production

$$ p = c[q_{1}, \dotsc , q_{k}] {\mathop {\longrightarrow }\limits ^{E,I}}q \in P $$

and all \(v_{1}, \dotsc , v_{k} \in V\) such that \(v = 1_{{{\,\textrm{wt}\,}}_p} \oplus \bigoplus _{i = 1}^k v_i \in V\) the production

$$ c \bigl [\langle q_1, v_1\rangle , \dotsc , \langle q_k, v_k\rangle \bigr ] {\mathop {\longrightarrow }\limits ^{E,I}}\langle q, v \rangle $$

belongs to \(P'\) and its weight is \({{\,\textrm{wt}\,}}'_{p'} = {{\,\textrm{wt}\,}}_p\). No further productions are in \(P'\). The construction trivially preserves the properties positive, classic, and normalized. For correctness, let \(q' = \langle q, v\rangle \in Q'\), \(t' \in T_\varSigma \), and \(d' \in D_{G'}^{q'}(t')\). We suitably (for the purpose of zero-divisors) track the weight of the derivation in v, and \(h_v \ne 0\) since \(v \in V\). Consequently, \({{\,\textrm{wt}\,}}_{G'}(d') \ne 0\) as required. We note that possibly \({{\,\textrm{wt}\,}}_{G'}(d') \ne h_v\). \(\square \)
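As a small illustration of this construction (a hypothetical instance of our own, not taken from the cited sources), consider the commutative ring \(\mathbb Z_6\) of integers modulo 6 as semiring and suppose that \({{\,\textrm{wt}\,}}(P) \setminus \{1\} = \{2, 3\}\), enumerated as \((s_1, s_2) = (2, 3)\). Since \(2 \cdot 3 = 0\) in \(\mathbb Z_6\), whereas no power of 2 and no power of 3 is 0, we obtain

$$ h(m_1, m_2) = 2^{m_1} \cdot 3^{m_2} , \qquad h^{-1}(0) = \bigl \{(m_1, m_2) \in \mathbb N^2 \mid m_1 \ge 1,\, m_2 \ge 1 \bigr \} , \qquad \min h^{-1}(0) = \bigl \{(1, 1) \bigr \} \hspace{5.0pt}. $$

Hence we can choose \(u = 1\), which gives \(U = \{0, 1\}^2\) and \(V = \bigl \{(0,0), (1,0), (0,1) \bigr \}\). A nonterminal \(\langle q, (1,0) \rangle \) thus records that the weight 2 has been used at least once and the weight 3 not at all. Since \((1,0) \oplus (0,1) = (1,1) \notin V\), no derivation of \(G'\) uses both a production of weight 2 and a production of weight 3, and these are exactly the derivations of G that have weight 0.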

For zero-sum free semirings [16, 17] we obtain that the support \({{\,\textrm{supp}\,}}(G)\) of a WTGc can be generated by a TGc. A semiring is zero-sum free if \(s = 0 = s'\) for every \(s, s' \in \mathbb S\) such that \(s + s' = 0\). Clearly, rings are never zero-sum free, but the mentioned semirings \(\mathbb B\), \(\mathbb N\), \(\mathbb T\), and \(\mathbb A\) are all zero-sum free.

Corollary 1

(of Lemmas 2 and 3) If \(\mathbb S\) is zero-sum free, then \({{\,\textrm{supp}\,}}(G)\) is (positive, classic) constraint-regular for every (respectively, positive, classic) WTGc G.

Proof

We apply Lemma 2 to obtain an equivalent WTGc with Boolean final weights and then Lemma 3 to obtain the WTGc \(G' = (Q', \varSigma , F', P', {{{\,\textrm{wt}\,}}'})\) with Boolean final weights. As mentioned we can assume that \({{\,\textrm{wt}\,}}'_{p'} \ne 0\) for all \(p' \in P'\). Let \(q' \in {{\,\textrm{supp}\,}}(F')\) and \(t' \in T_\varSigma \) with \(D_{G'}^{q'}(t')\ne \emptyset \). Since \({{\,\textrm{wt}\,}}'_{G'}(d') \ne 0\) for every derivation \(d' \in D_{G'}^{q'}(t')\) and \(s + s' \ne 0\) for all \(s, s' \in \mathbb S \setminus \{0\}\) due to zero-sum freeness, we obtain \(t' \in {{\,\textrm{supp}\,}}(G')\). Thus, the existence of a complete derivation for \(t'\) to an accepting nonterminal (i.e., one with final weight 1) characterizes whether we have \(t' \in {{\,\textrm{supp}\,}}(G')\). Consequently, the TGc \(\bigl (Q', \varSigma , {{\,\textrm{supp}\,}}(F'), P' \bigr )\) generates the tree language \({{\,\textrm{supp}\,}}(G')\), which is thus constraint-regular. The properties positive and classic are preserved in all the constructions. \(\square \)

4 Closure Properties

Next we investigate several closure properties of the constraint-regular weighted tree languages. We start with the (point-wise) sum, which is given by \((A + A')_t = A_t + A'_t\) for every \(t \in T_\varSigma \) and \(A, A' \in \mathbb S^{T_\varSigma }\). Given WTGc G and \(G'\) generating A and \(A'\) we can trivially use a disjoint union construction to obtain a WTGc generating \(A + A'\). We omit the details.

Proposition 1

The (positive, classic) constraint-regular weighted tree languages (over a fixed ranked alphabet) are closed under sums. \(\square \)

The corresponding (point-wise) product is the Hadamard product, which is given by \((A \cdot A')_t = A_t \cdot A'_t\) for every \(t \in T_\varSigma \) and \(A, A' \in \mathbb S^{T_\varSigma }\). With the help of a standard product construction we show that the (positive) constraint-regular weighted tree languages are also closed under Hadamard product. As preparation we introduce a special normal form. A WTAc \(G = (Q, \varSigma , F, P, {{{\,\textrm{wt}\,}}})\) is constraint-determined if \(E = E'\) and \(I = I'\) for all productions

$$ \sigma (q_{1}, \dotsc , q_{k}) {\mathop {\longrightarrow }\limits ^{E, I}}q \in P \quad \text {and} \quad \sigma (q_{1}, \dotsc , q_{k}) {\mathop {\longrightarrow }\limits ^{E', I'}}q \in P \hspace{5.0pt}. $$

In other words, two productions cannot differ only in the sets of constraints. It is straightforward to turn any (positive) WTAc into an equivalent constraint-determined (positive) WTAc by introducing additional nonterminals (e.g., by annotating the nonterminal on the right-hand side with the constraints).

Theorem 1

The (positive) constraint-regular weighted tree languages (over a fixed ranked alphabet) are closed under Hadamard products.

Proof

Let \(A, A' \in \mathbb S^{T_\varSigma }\) be constraint-regular. Without loss of generality (see Lemma 1) we can assume constraint-determined WTAc

$$ G = (Q, \varSigma , F, P, {{{\,\textrm{wt}\,}}}) \qquad \text {and} \qquad G' = (Q', \varSigma , F', P', {{{\,\textrm{wt}\,}}'}) $$

that generate A and \(A'\), respectively. We construct the direct product WTAc

$$\begin{aligned} G \times G' = (Q \times Q', \varSigma , F'', P'', {{{\,\textrm{wt}\,}}''}) \end{aligned}$$

such that \(F''_{\langle q, q'\rangle } = F_q \cdot F'_{q'}\) for every \(q \in Q\) and \(q' \in Q'\) and for every production \(p = \sigma (q_{1}, \dotsc , q_{k}) {\mathop {\longrightarrow }\limits ^{E, I}}q \in P\) and production \(p' = \sigma (q'_{1}, \dotsc , q'_{k}) {\mathop {\longrightarrow }\limits ^{E', I'}}q' \in P'\) the production

$$ p'' = \sigma \bigl ( \langle q_1, q'_1\rangle , \dotsc , \langle q_k, q'_k\rangle \bigr ) {\mathop {\longrightarrow }\limits ^{E \cup E', I \cup I'}}\langle q, q'\rangle $$

belongs to \(P''\) and its weight is \({{\,\textrm{wt}\,}}''_{p''} = {{\,\textrm{wt}\,}}_p \cdot {{\,\textrm{wt}\,}}'_{p'}\). No other productions belong to \(P''\). It is straightforward to see that the property positive is preserved. The correctness proof that \(G \times G' = A \cdot A'\) is a straightforward induction proving

$$ {{\,\textrm{wt}\,}}_{G \times G'}^{\langle q, q'\rangle }(t) = {{\,\textrm{wt}\,}}_G^q(t) \cdot {{\,\textrm{wt}\,}}_{G'}^{q'}(t) $$

for all \(t \in T_\varSigma \) using the initial algebra semantics. The WTAc G and \(G'\) are required to be constraint-determined, so that we can uniquely identify the basic productions \(p \in P\) and \(p' \in P'\) that construct a newly formed production \(p'' \in P''\).

We can obtain a constraint-determined WTAc at the expense of a polynomial increase in the number of productions (assuming that the ranked alphabet of input symbols is fixed). Let \(r = \max _{\sigma \in \varSigma } {{\,\textrm{rk}\,}}(\sigma )\) be the maximal rank of an input symbol and \(c = |P |\) be the number of productions of the given WTAc \(G = (Q, \varSigma , F, P, \mathord {{{\,\textrm{wt}\,}}})\). First, we modify the target nonterminal q of each production \(\rho = (\ell , q, E, I) \in P\) to additionally include the identifier \(\rho \), which yields the production \((\ell , \langle q, \rho \rangle , E, I)\). This effectively yields the new nonterminal set \(Q \times P\), which has size \(|Q | \cdot c\). Then the copies of the production \((\sigma (q_{1}, \dotsc , q_{k}), \langle q, \rho \rangle , E, I)\) form the set

$$ \Bigl \{ \bigl (\sigma (\langle q_1, \rho _1\rangle , \dotsc , \langle q_k, \rho _k \rangle ), \langle q, \rho \rangle , E, I \bigr ) \;\Big |\; \rho _{1}, \dotsc , \rho _{k} \in P \Bigr \} \hspace{5.0pt}. $$

Clearly, this turns each production into at most \(c^r\) productions since \(k \le r\), so the overall number of productions after all replacements is at most \(c^{r+1}\). The product construction itself is then quadratic. \(\square \)

We note that the previous construction also works for classic WTAc.
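The normalization to constraint-determined form described in the proof of Theorem 1 can be sketched as follows. The encoding is an assumption made for illustration: a production is a tuple `(symbol, children, target, E, I)`, production identifiers are list indices, and weights (which carry over unchanged) are omitted. The function names are hypothetical.

```python
from itertools import product

def constraint_determine(productions):
    """Annotate each target with its production id and replicate every
    production for all annotations of its child nonterminals."""
    ids = range(len(productions))
    result = []
    for rho, (sym, children, target, eqs, ineqs) in enumerate(productions):
        for rhos in product(ids, repeat=len(children)):
            annotated = tuple(zip(children, rhos))
            result.append((sym, annotated, (target, rho), eqs, ineqs))
    return result

def is_constraint_determined(productions):
    """No two productions differ only in their constraint sets."""
    seen = {}
    for sym, children, target, eqs, ineqs in productions:
        key = (sym, children, target)
        if seen.setdefault(key, (eqs, ineqs)) != (eqs, ineqs):
            return False
    return True
```

A production with k children is replaced by c^k copies, where c is the number of productions, which matches the bound c^{r+1} on the overall number of productions stated in the proof.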

Example 3

Let \(G = \bigl (\{q\}, \varSigma , F, P, \mathord {{{\,\textrm{wt}\,}}} \bigr )\) and \(G' = \bigl (\{z\}, \varSigma , F', P', \mathord {{{\,\textrm{wt}\,}}'} \bigr )\) be WTAc over \(\mathbb A\) and \(\varSigma = \{\alpha ^{(0)}, \gamma ^{(1)}, \sigma ^{(2)}\}\), \(F_q = F'_z = 0\), and the productions

$$\begin{aligned} \alpha \rightarrow _0 q \qquad \qquad \gamma (q) \rightarrow _2 q \qquad \qquad \sigma (q, q) {\mathop {\longrightarrow }\limits ^{1=2}}_0 q \qquad \qquad (P) \end{aligned}$$
$$\begin{aligned} \alpha \rightarrow _0 z \qquad \qquad \gamma (z) {\mathop {\longrightarrow }\limits ^{11 \ne 12}}_1 z \qquad \qquad \sigma (z, z) \rightarrow _1 z \, . \qquad \qquad (P') \end{aligned}$$

We observe that

$$\begin{aligned} {{\,\textrm{supp}\,}}(G) = \bigl \{t \in T_\varSigma \mid \forall w \in {{\,\textrm{pos}\,}}_\sigma (t) :t|_{w1} = t|_{w2} \bigr \} \end{aligned}$$
$$\begin{aligned} {{\,\textrm{supp}\,}}(G') = \bigl \{t \in T_\varSigma \mid \forall w \in {{\,\textrm{pos}\,}}_\gamma (t) :\text { if } t(w1) = \sigma \text { then } t|_{w11} \ne t|_{w12} \bigr \} \end{aligned}$$

and \(G_t = 2|{{\,\textrm{pos}\,}}_\gamma (t) |\) as well as \(G'_{t'} = |{{\,\textrm{pos}\,}}_\gamma (t') | + |{{\,\textrm{pos}\,}}_\sigma (t') |\) for every tree \(t \in {{\,\textrm{supp}\,}}(G)\) and tree \(t' \in {{\,\textrm{supp}\,}}(G')\). We obtain the WTAc \(G \times G' = \bigl (\{\langle q,z\rangle \}, \varSigma , F'', P'', {{{\,\textrm{wt}\,}}''}\bigr )\) with \(F''_{\langle q, z\rangle } = 0\) and the following productions.

$$ \alpha \rightarrow _0 \langle q,z\rangle \qquad \gamma \bigl (\langle q,z\rangle \bigr ) {\mathop {\longrightarrow }\limits ^{11\ne 12}}_3 \langle q, z\rangle \qquad \sigma \bigl ( \langle q,z\rangle , \langle q,z\rangle \bigr ) {\mathop {\longrightarrow }\limits ^{1=2}}_1 \langle q,z\rangle $$

Hence we obtain the equality \((G \times G')_t = 3|{{\,\textrm{pos}\,}}_\gamma (t) | + |{{\,\textrm{pos}\,}}_\sigma (t) | = G_t \cdot G'_t\) for every tree \(t \in {{\,\textrm{supp}\,}}(G) \cap {{\,\textrm{supp}\,}}(G')\). \(\square \)
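The product construction can be replayed on the productions of Example 3. The sketch below assumes a production is encoded as `(symbol, children, target, E, I, weight)` with positions written as strings; the function name `hadamard_product` is hypothetical. Since the weights come from the arctic semiring, the semiring multiplication is ordinary addition.

```python
def hadamard_product(P1, P2, mul):
    """Direct product of two production lists over the same ranked alphabet."""
    result = []
    for (s1, kids1, q1, E1, I1, w1) in P1:
        for (s2, kids2, q2, E2, I2, w2) in P2:
            if s1 != s2 or len(kids1) != len(kids2):
                continue  # only productions for the same symbol are paired
            kids = tuple(zip(kids1, kids2))
            result.append((s1, kids, (q1, q2), E1 | E2, I1 | I2, mul(w1, w2)))
    return result

none = frozenset()
P = [("alpha", (), "q", none, none, 0),
     ("gamma", ("q",), "q", none, none, 2),
     ("sigma", ("q", "q"), "q", frozenset({("1", "2")}), none, 0)]
P_prime = [("alpha", (), "z", none, none, 0),
           ("gamma", ("z",), "z", none, frozenset({("11", "12")}), 1),
           ("sigma", ("z", "z"), "z", none, none, 1)]

# Arctic semiring: the product of two weights is their sum.
product_P = hadamard_product(P, P_prime, lambda a, b: a + b)
```

The resulting weights are 0, 3, and 1 for the productions of alpha, gamma, and sigma, in agreement with the productions listed in the example.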

Next, we use an extended version of the classical power set construction to obtain an unambiguous WTAc that keeps track of the reachable nonterminals, but preserves only the homomorphic image of its weight. The unweighted part of the construction mimics a power-set construction and the handling of constraints roughly follows [15, Definition 3.1].

Theorem 2

Let \(h \in \mathbb {T}^{\mathbb S}\) be a semiring homomorphism into a finite semiring \(\mathbb T\). For every (classic) WTAc \(G = (Q, \varSigma , F, P, \mathord {{{\,\textrm{wt}\,}}})\) over \(\mathbb S\) there exists an unambiguous (classic) WTAc \(G' = (\mathbb {T}^Q, \varSigma , F', P', {{{\,\textrm{wt}\,}}'})\) such that for every tree \(t \in T_\varSigma \) and \(\varphi \in \mathbb {T}^Q\)

$$ {{\,\textrm{wt}\,}}_{G'}^\varphi (t) = {\left\{ \begin{array}{ll} 1 &{} \text {if } \varphi _q = h \bigl ({{\,\textrm{wt}\,}}_G^q(t) \bigr ) \text { for all } q \in Q \\ 0 &{} \text {otherwise.} \end{array}\right. } $$

Moreover, \(G'_t = h(G_t)\) for every \(t \in T_\varSigma \).

Proof

For every \(\sigma \in \varSigma \), let

$$ \mathscr {C}_\sigma = \bigl \{ E \mid \sigma (q_{1}, \dotsc , q_{k}) {\mathop {\longrightarrow }\limits ^{E,I}}q \in P \bigr \} \cup \bigl \{ I \mid \sigma (q_{1}, \dotsc , q_{k}) {\mathop {\longrightarrow }\limits ^{E,I}}q \in P \bigr \} $$

be the constraints that occur in productions of G whose left-hand side contains \(\sigma \). We let \(F'_\varphi = \sum _{q \in Q} h(F_q) \cdot \varphi _q\) for every \(\varphi \in \mathbb T^Q\). For all \(k \in \mathbb {N}\), \(\sigma \in \varSigma _k\), nonterminals \(\varphi ^1, \dotsc , \varphi ^k \in \mathbb {T}^Q\), and constraints \(\mathscr {E} \subseteq \mathscr {C}_\sigma \) we let \(p' = \sigma (\varphi ^1, \dotsc , \varphi ^k) {\mathop {\longrightarrow }\limits ^{\mathscr {E}, \mathscr {I}}}\varphi \in P'\), where \(\mathscr {I} = \mathscr {C}_\sigma \setminus \mathscr {E}\) and for every \(q \in Q\)

$$\begin{aligned} \varphi _q = \sum _{\begin{array}{c} p = \sigma (q_{1}, \dotsc , q_{k}) {\mathop {\longrightarrow }\limits ^{E, I}}q \in P \\ E \subseteq \mathscr {E},\, I \subseteq \mathscr {I} \end{array}} h({{\,\textrm{wt}\,}}_p) \cdot \varphi ^1_{q_1} \cdot \ldots \cdot \varphi ^k_{q_k} \hspace{5.0pt}. \end{aligned}$$
(2)

No additional productions belong to \(P'\). Finally, we set \(\mathord {{{\,\textrm{wt}\,}}'_{p'}} = 1\) for all \(p' \in P'\). In general, the WTAc \(G'\) is certainly not deterministic due to the choice of constraints, but \(G'\) is unambiguous since the resulting \(2^{|\mathscr {C}_\sigma |}\) rules for each left-hand side have mutually exclusive constraint sets. In fact, for each \(t\in T_\varSigma \) there is exactly one left-most complete derivation of \(G'\) for t, and it derives to \(\varphi \in \mathbb {T}^Q\) such that \(\varphi _q = h \bigl ({{\,\textrm{wt}\,}}_G^q(t) \bigr )\) for every \(q \in Q\). The weight of that derivation is 1. These statements are proven inductively. The final statement \(G'_t = h(G_t)\) for every \(t \in T_\varSigma \) is an easy consequence of the previous statements. If G is classic, then also the constructed WTAc \(G'\) is classic. \(\square \)

Example 4

Recall the WTAc G and \(G'\) from Example 3. Consider the WTAc generating their disjoint union, as well as the semiring homomorphism \(h \in \mathbb {B}^{\mathbb A}\) given by \(h_a = 1\) for all \(a \in \mathbb {A} {\setminus } \{-\infty \}\) and \(h_{-\infty } = 0\). The sets \(\mathscr {C}_\gamma \) and \(\mathscr {C}_\sigma \) of utilized constraints are \(\mathscr {C}_\gamma = \bigl \{(11, 12) \bigr \}\) and \(\mathscr {C}_\sigma = \bigl \{(1,2) \bigr \}\), and we write \(\varphi \in \mathbb {B}^Q\) simply as subsets of Q. We obtain the unambiguous WTAc \(G''\) with the following sensible (i.e., having satisfiable constraints) productions for all \(Q', Q'' \subseteq \{q, z\}\), which all have weight 1.

$$\begin{aligned} \alpha&\longrightarrow \{q, z\} \\* \gamma (Q')&\overset{11 = 12}{\longrightarrow }Q' \cap \{q\}&\gamma (Q')&\overset{11 \ne 12}{\longrightarrow }Q' \\* \sigma (Q', Q'')&\overset{1=2}{\longrightarrow }Q' \cap Q''&\sigma (Q', Q'')&\overset{1 \ne 2}{\longrightarrow }Q' \cap Q'' \cap \{z\} \end{aligned}$$

Each \(t \in T_\varSigma \) has exactly one left-most complete derivation in \(G''\); it derives to \(Q'\), where (i) \(q \in Q'\) iff \(t \in {{\,\textrm{supp}\,}}(G)\) and (ii) \(z \in Q'\) iff \(t \in {{\,\textrm{supp}\,}}(G')\). The final weights are \(F''_\emptyset =0\) and \(F''_Q=1\) for all non-empty \(Q\subseteq \{q,z\}\). \(\square \)
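The unique derivation of \(G''\) can be computed bottom-up, as the following sketch illustrates. It rests on assumptions: trees are nested tuples `(symbol, child, ...)`, positions are strings of child indices, and an equality constraint counts as satisfied only if both positions exist and carry equal subtrees, with the inequality constraint being its negation (this reading matches the description of the supports in Example 3). The function names are hypothetical.

```python
def subtree(t, pos):
    """Subtree of t at position pos (string of 1-based child indices), or None."""
    for digit in pos:
        i = int(digit)
        if i > len(t) - 1:
            return None
        t = t[i]
    return t

def equal_at(t, v, w):
    """Equality constraint v = w: both positions exist with equal subtrees."""
    a, b = subtree(t, v), subtree(t, w)
    return a is not None and a == b

def reach(t):
    """Subset of {q, z} reached by the unique derivation for t."""
    sym, kids = t[0], [reach(c) for c in t[1:]]
    if sym == "alpha":
        return frozenset({"q", "z"})
    if sym == "gamma":
        return kids[0] & {"q"} if equal_at(t, "11", "12") else kids[0]
    if sym == "sigma":
        both = kids[0] & kids[1]
        return both if equal_at(t, "1", "2") else both & {"z"}
    raise ValueError(f"unknown symbol {sym!r}")
```

A tree then belongs to the support of the disjoint union exactly when `reach` returns a non-empty set, which mirrors the final weights of the example.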

Corollary 2

(of Theorem 2) Let \(\mathbb S\) be finite. For every (classic) WTAc over \(\mathbb S\) there exists an equivalent unambiguous (classic) WTAc. \(\square \)

Corollary 3

(of Theorem 2) Let \(\mathbb S\) be zero-sum free. For every (classic) WTAc G over \(\mathbb S\) there exists an unambiguous (classic) TAc generating \({{\,\textrm{supp}\,}}(G)\).

Proof

Utilizing Lemma 2 we can first construct an equivalent WTAc with Boolean final weights. If \(\mathbb S\) is zero-sum free, then there exists a semiring homomorphism \(h \in \mathbb B^{\mathbb S}\) by [30]. By Lemma 3 we can assume that each derivation of G has non-zero weight, and sums of non-zero elements remain non-zero by zero-sum freeness. Thus we can simply replace the factor \(h({{\,\textrm{wt}\,}}_p)\) by 1 in (2). The TAc obtained in this way generates \({{\,\textrm{supp}\,}}(G)\). \(\square \)

Corollary 4

(of Theorem 2) Let \(\mathbb S\) be zero-sum free. For every (classic) WTAc G over \(\mathbb S\) there exists an unambiguous (classic) TAc generating \(T_\varSigma \setminus {{\,\textrm{supp}\,}}(G)\).

Proof

Let \(G' = (Z, \varSigma , Z_0, P')\) be the unambiguous TAc given by Corollary 3. Since \(G'\) is also complete in the sense that every input tree has a derivation, the desired unambiguous TAc \(G''\) is simply \(G'' = (Z, \varSigma , Z\setminus Z_0, P')\). \(\square \)

Let \(A, A' \in \mathbb {S}^{T_\varSigma }\). It is often useful (see [15, Definition 4.11]) to restrict A to the support of \(A'\) but without changing the weights of those trees inside the support. Formally, we define \(A|_{{{\,\textrm{supp}\,}}(A')} \in \mathbb S^{T_\varSigma }\) for every \(t \in T_\varSigma \) by \(A|_{{{\,\textrm{supp}\,}}(A')}(t) = A_t\) if \(t \in {{\,\textrm{supp}\,}}(A')\) and \(A|_{{{\,\textrm{supp}\,}}(A')}(t) = 0\) otherwise. Utilizing unambiguous WTAc and the Hadamard product, we can show that \(A|_{{{\,\textrm{supp}\,}}(A')}\) is constraint-regular if A and \(A'\) are constraint-regular and the semiring \(\mathbb {S}\) is zero-sum free.
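On finitely many trees the restriction is a simple filter. The sketch below assumes a weighted tree language is stored as a dictionary from trees to weights in which missing trees have weight zero; the function name `restrict` is hypothetical.

```python
def restrict(A, A_prime, zero):
    """Restriction of A to supp(A'): keep the weight of A on every tree
    to which A' assigns a non-zero weight, and drop all other trees."""
    return {t: w for t, w in A.items() if A_prime.get(t, zero) != zero}
```

Note that the weights of \(A'\) themselves play no role beyond being non-zero, which is why the proof below first replaces \(G'\) by an unambiguous device with weights 0 and 1 before taking the Hadamard product.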

Theorem 3

Let \(\mathbb {S}\) be zero-sum free. For all (classic) WTAc G and \(G'\) there exists a (classic) WTAc H such that \(H = G|_{{{\,\textrm{supp}\,}}(G')}\).

Proof

By Corollary 1 the support \({{\,\textrm{supp}\,}}(G')\) is constraint-regular. Hence we can obtain an unambiguous WTAc \(G''\) for \({{\,\textrm{supp}\,}}(G')\) using Corollary 3. Without loss of generality we assume that both G and \(G''\) are constraint-determined; we note that the normalization preserves unambiguous WTAc. Finally we construct \(G \times G''\), which by Theorem 1 generates exactly \(G|_{{{\,\textrm{supp}\,}}(G')}\) as required. \(\square \)

In the following, we establish a special property for classic WTGc. To this end, we first need another notion. Let \(G = (Q, \varSigma , F, P, {{{\,\textrm{wt}\,}}})\) be a WTGc. A nonterminal \(\bot \in Q\) is a sink nonterminal (in G) if \(F_\bot = 0\) and

$$ \bigl \{ \sigma (\bot , \dotsc , \bot ) \rightarrow _1 \bot \mid \sigma \in \varSigma \bigr \} = \bigl \{ \ell {\mathop {\longrightarrow }\limits ^{E,I}}_s q \in P \mid q = \bot \bigr \} \hspace{5.0pt}. $$

In other words, for every sink nonterminal \(\bot \) the production \(\sigma (\bot , \dotsc , \bot ) \rightarrow \bot \) belongs to P with weight 1 for every symbol \(\sigma \in \varSigma \). Additionally, no other productions have the sink nonterminal \(\bot \) as target nonterminal. Given a set \(E \subseteq \mathbb N^* \times \mathbb N^*\) of equality constraints, we let \(\mathord {\equiv }_E = (E \cup E^{-1} \cup {{{\,\textrm{id}\,}}})^{\scriptscriptstyle +}\) be the smallest equivalence relation containing E (where the superscript \({}^{\scriptscriptstyle +}\) denotes the transitive closure), and \([w]_{\mathord {\equiv }_E}\) be the equivalence class of \(w \in \mathbb N^*\). Additionally, for every \(c[q_{1}, \dotsc , q_{k}] {\mathop {\longrightarrow }\limits ^{E,I}}q \in P\) we let

$$ c(E) = \bigl \{(i,j) \in [k] \times [k] \mid (v, v') \in E,\, c(v) = x_i,\, c(v') = x_j \bigr \} $$

be a representation of the equality constraints on the indices [k].

Definition 5

A classic WTGc \(G = (Q, \varSigma , F, P, {{{\,\textrm{wt}\,}}})\) is eq-restricted if there exists a sink nonterminal \(\bot \in Q\) such that for every production \(p = c[q_{1}, \dotsc , q_{k}] {\mathop {\longrightarrow }\limits ^{E,I}}q \in P\) and index \(i \in [k]\) there exists a nonterminal \(q' \in Q\) such that

  1. \(\{q_j \mid j \in [i]_{\equiv _{c(E)}} \} \subseteq \{q', \bot \}\) and

  2. there exists exactly one index \(j \in [i]_{\equiv _{c(E)}}\), also called governing index for i in p, such that \(q_j = q'\).

The mapping \(g_p :[k] \rightarrow [k]\) assigns to each index \(i \in [k]\) its governing index for i in p. \(\square \)
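The two conditions of Definition 5 can be checked per production once the relation \(c(E)\) on the indices is known. The following sketch assumes that \(c(E)\) is given as a set of index pairs over 1, ..., k and that nonterminals are compared by equality; the names `classes_of_indices` and `governing_map` are hypothetical.

```python
def classes_of_indices(k, c_E):
    """Equivalence classes of {1, ..., k} generated by the pairs in c_E."""
    parent = {i: i for i in range(1, k + 1)}

    def find(i):
        while parent[i] != i:
            i = parent[i]
        return i

    for i, j in c_E:
        parent[find(i)] = find(j)
    classes = {}
    for i in parent:
        classes.setdefault(find(i), []).append(i)
    return list(classes.values())

def governing_map(states, c_E, sink):
    """Return g_p as a dict, or None if the production violates Definition 5."""
    g = {}
    for cls in classes_of_indices(len(states), c_E):
        non_sink = [j for j in cls if states[j - 1] != sink]
        if len(non_sink) == 1:
            gov = non_sink[0]          # the single normally generated subtree
        elif not non_sink and len(cls) == 1:
            gov = cls[0]               # an unconstrained sink position
        else:
            return None                # two governing candidates or none
        for i in cls:
            g[i] = gov
    return g
```

For instance, for child nonterminals `("q", "bot", "r")` with the single pair `(1, 2)` the indices 1 and 2 are both governed by index 1, whereas `("q", "q")` with the same pair is rejected.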

In other words, in an eq-restricted WTGc one subtree is generated normally by the WTGc and all the subtrees that are required to be equal by means of the equality constraints are generated by the sink nonterminal \(\bot \), which can generate any tree with weight 1. In this manner, the restrictions on subtree and weight generation induced by the WTGc are exhibited completely on a single subtree and the “copies” are only provided by the equality constraint, but not further restricted by the WTGc. We will continue to use \(\bot \) for the suitable sink nonterminal of an eq-restricted WTGc.

Finally, we show that the weighted tree languages generated by eq-restricted positive WTGc are closed under relabelings. A relabeling is a tree homomorphism of the form \(\pi \in T_\varDelta (X)^\varSigma \) such that for every \(k \in \mathbb {N}\) and \(\sigma \in \varSigma _k\) there exists \(\delta \in \varDelta _k\) with \(\pi _\sigma = \delta (x_{1}, \dotsc , x_{k})\). In other words, a relabeling deterministically replaces symbols respecting their rank. We often specify a relabeling just as a mapping \(\pi \in \varDelta ^\varSigma \) such that \(\pi _\sigma \in \varDelta _k\) for every \(k \in \mathbb {N}\) and \(\sigma \in \varSigma _k\).

Theorem 4

The weighted tree languages generated by eq-restricted positive WTGc are closed under relabelings.

Proof

Let \(G = (Q, \varSigma , F, P, {{{\,\textrm{wt}\,}}})\) be an eq-restricted positive WTGc with sink nonterminal \(\bot \). Without loss of generality, suppose that \(\varSigma \cap X = \emptyset \), and let \(\pi \in \varDelta ^\varSigma \) be a relabeling. We first extend \(\pi \) to a mapping \(\pi ' \in (\varDelta \cup X)^{\varSigma \cup X}\), in which we treat the elements of X as nullary symbols, for every \(\sigma \in \varSigma \) and \(x \in X\) by \(\pi '_\sigma = \pi _\sigma \) and \(\pi '_x = x\). Let \(G' = (Q, \varDelta , F, P', {{\,\textrm{wt}\,}}')\) be the eq-restricted positive WTGc such that

$$\begin{aligned} P'&= \Bigl \{ \pi '(c)[q_{1}, \dotsc , q_{k}] \overset{E, \emptyset }{\longrightarrow }q \mid c[q_{1}, \dotsc , q_{k}] \overset{E, \emptyset }{\longrightarrow }q \in P,\, q \ne \bot \Bigr \} \\&\quad \cup \Bigl \{ \delta (\bot , \dotsc , \bot ) \rightarrow \bot \mid \delta \in \varDelta \Bigr \} \end{aligned}$$

and for every production \(p' = c'[q_{1}, \dotsc , q_{k}] \overset{E, \emptyset }{\longrightarrow }q \in P'\) with \(q \ne \bot \) we let

$$\begin{aligned} {{\,\textrm{wt}\,}}'_{p'} = \sum _{\begin{array}{c} p = c[q_{1}, \dotsc , q_{k}] \overset{E,\emptyset }{\longrightarrow }q \in P \\ c \in (\pi ')^{-1}(c') \end{array}} {{\,\textrm{wt}\,}}_p \hspace{5.0pt}. \end{aligned}$$
(3)

Finally, \({{\,\textrm{wt}\,}}' \bigl (\delta (\bot , \dotsc , \bot ) \rightarrow \bot \bigr ) = 1\) for all \(\delta \in \varDelta \). Since the relabeling replaces symbols by symbols, the size of \(G'\) is linear in the size of G. For the weight function \({{\,\textrm{wt}\,}}'\), we must compute the sum (3), which can be achieved by accumulating the weight during the construction of the new productions, so the overall time complexity remains linear. For correctness we prove the following equality for every \(u \in T_\varDelta \) and \(q \in Q\) by induction on u

$$\begin{aligned} {{\,\textrm{wt}\,}}_{G'}^q(u) = {\left\{ \begin{array}{ll} \sum _{t \in \pi ^{-1}(u)} {{\,\textrm{wt}\,}}_G^q(t) &{} \text {if } q \ne \bot \\ 1 &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$
(4)

The second case is immediate since there is a single derivation, namely the one utilizing only nonterminal \(\bot \), for u to \(\bot \) and its weight is 1. In the remaining case we have \(q \ne \bot \). Then

$$\begin{aligned}&\qquad {{\,\textrm{wt}\,}}_{G'}^q(u) \\&\overset{(1)}{=} \sum _{\begin{array}{c} p' = c'[q_{1}, \dotsc , q_{k}] \overset{E, \emptyset }{\longrightarrow }q \in P' \\ u_{1}, \dotsc , u_{k} \in T_\varDelta \\ u = c'[u_{1}, \dotsc , u_{k}] \\ u \models E \end{array}} {{\,\textrm{wt}\,}}'_{p'} \cdot \prod _{i = 1}^k {{\,\textrm{wt}\,}}_{G'}^{q_i}(u_i) \\&\overset{\text {IH}}{=} \sum _{\begin{array}{c} p' = c'[q_{1}, \dotsc , q_{k}] \overset{E, \emptyset }{\longrightarrow }q \in P' \\ u_{1}, \dotsc , u_{k} \in T_\varDelta \\ u = c'[u_{1}, \dotsc , u_{k}] \\ u \models E \end{array}} {{\,\textrm{wt}\,}}'_{p'} \cdot \prod _{\begin{array}{c} i \in [k] \\ q_i \ne \bot \end{array}} \Bigl ( \sum _{t_i \in \pi ^{-1}(u_i)} {{\,\textrm{wt}\,}}_G^{q_i}(t_i) \Bigr ) \cdot \prod _{\begin{array}{c} i \in [k] \\ q_i = \bot \end{array}} 1\quad . \end{aligned}$$

Recall that \(g_p :[k] \rightarrow [k]\) assigns to each index its governing index. For better readability, we write just \(g'\). Note that due to the special form of substitution we automatically fulfill \(u \models E\) and can thus drop it.

$$\begin{aligned}&\overset{(3)}{=} \sum _{\begin{array}{c} p' = c'[q_{1}, \dotsc , q_{k}] \overset{E, \emptyset }{\longrightarrow }q \in P' \\ \forall i \in {{\,\textrm{ran}\,}}(g') :u_i \in T_\varDelta ,\, t_i \in \pi ^{-1}(u_i) \\ u = c'[u_{g'(1)}, \dotsc , u_{g'(k)}] \end{array}} \Bigl (\sum _{\begin{array}{c} p = c[q_{1}, \dotsc , q_{k}] \overset{E, \emptyset }{\longrightarrow }q \in P \\ c \in (\pi ')^{-1}(c') \end{array}} {{\,\textrm{wt}\,}}_p \Bigr ) \cdot \prod _{i \in {{\,\textrm{ran}\,}}(g')} {{\,\textrm{wt}\,}}_G^{q_i}(t_i) \end{aligned}$$

We note that \(g_{p'} = g_p\) for all used productions p, so we just write g. Additionally, for every \(q_i\) with \(i \in [k] {\setminus } {{\,\textrm{ran}\,}}(g)\) we have \(q_i = \bot \) and thus \({{\,\textrm{wt}\,}}_G^{q_i}(t_{g(i)}) = 1\) because there is exactly one such derivation with weight 1.

$$\begin{aligned}&= \sum _{\begin{array}{c} p = c[q_{1}, \dotsc , q_{k}] {\mathop {\longrightarrow }\limits ^{E, \emptyset }}q \in P \\ \forall i \in {{\,\textrm{ran}\,}}(g) :t_i \in T_\varSigma \\ u = \pi (c[t_{g(1)}, \dotsc , t_{g(k)}]) \end{array}} {{\,\textrm{wt}\,}}_p \cdot \prod _{i = 1}^k {{\,\textrm{wt}\,}}_G^{q_i}(t_{g(i)}) \\&= \sum _{t \in \pi ^{-1}(u)} \Biggl ( \sum _{\begin{array}{c} p = c[q_{1}, \dotsc , q_{k}] {\mathop {\longrightarrow }\limits ^{E, \emptyset }}q \in P \\ t_{1}, \dotsc , t_{k} \in T_\varSigma \\ t = c[t_{1}, \dotsc , t_{k}] \\ t \models E \end{array}} {{\,\textrm{wt}\,}}_p \cdot \prod _{i = 1}^k {{\,\textrm{wt}\,}}_G^{q_i}(t_i) \Biggr ) {\mathop {=}\limits ^{(1)}} \sum _{t \in \pi ^{-1}(u)} {{\,\textrm{wt}\,}}_G^q(t) \end{aligned}$$

We complete the proof for every \(u \in T_\varDelta \) as follows.

$$\begin{aligned} G'_u&= \sum _{q \in Q} F_q \cdot {{\,\textrm{wt}\,}}_{G'}^q(u) {\mathop {=}\limits ^{(4)}} \sum _{q \in Q \setminus \{\bot \}} F_q \cdot \Bigl (\sum _{t \in \pi ^{-1}(u)} {{\,\textrm{wt}\,}}_G^q(t) \Bigr ) = \sum _{t \in \pi ^{-1}(u)} \Bigl (\sum _{q \in Q} F_q \cdot {{\,\textrm{wt}\,}}_G^q(t) \Bigr ) \\&= \sum _{t \in \pi ^{-1}(u)} G_t \qquad {\square } \end{aligned}$$
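The accumulation of weights in (3) during the construction of \(P'\) can be sketched as follows. The encoding is an assumption: contexts are nested tuples `(symbol, child, ...)` whose variables are plain strings, a production is `(context, states, E, target, weight)` with a hashable E, and the relabeling is a dictionary on symbols. The sink productions over the new alphabet are not generated here, and all function names are hypothetical.

```python
def relabel(tree, pi):
    """Apply the relabeling symbol-wise; variables (plain strings) stay fixed."""
    if isinstance(tree, str):
        return tree
    return (pi[tree[0]],) + tuple(relabel(c, pi) for c in tree[1:])

def relabel_productions(productions, pi, plus, sink):
    """Map each relabeled non-sink production to its accumulated weight."""
    weights = {}
    for context, states, eqs, target, weight in productions:
        if target == sink:
            continue  # sink productions are re-created over the new alphabet
        key = (relabel(context, pi), states, eqs, target)
        weights[key] = plus(weights[key], weight) if key in weights else weight
    return weights
```

Each production is visited once, and productions that coincide after relabeling have their weights summed, so the number of semiring additions is bounded by the number of productions.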

5 Towards the HOM Problem

The strategy of [15] for deciding the HOM problem first represents the homomorphic image \(L' = h(L)\) of the regular tree language L with the help of a WTGc \(G'\). For deciding whether \(L'\) is regular, a tree automaton \(G''\) simulating the behavior of \(G'\) up to a certain bounded height is constructed. If the automata \(G'\) and \(G''\) are equivalent, i.e., \(G'' = G'\), then \(L'\) is regular. In the remaining case, pumping arguments are used to prove that it is impossible to find any TA for \(L'\). Overall, this reduces the HOM problem to an equivalence problem.

Towards solving the HOM problem in the weighted case we now proceed similarly. First, we show that WTGc can encode each (well-defined) homomorphic image of a regular weighted tree language. This ability motivated their definition in the unweighted case [15, Proposition 4.6], and it also applies in the weighted case with minor restrictions that just enforce that all obtained sums are finite.

Theorem 5

Let \(G = (Q, \varSigma , F, P, {{{\,\textrm{wt}\,}}})\) be a WTA and \(h \in T_\varDelta ^{T_\varSigma }\) be a nondeleting and nonerasing tree homomorphism. There exists an eq-restricted positive WTGc \(G'\) with \(G' = h(G)\).

Proof

We recall that we also use h for the mapping \(h \in T_\varDelta (X)^\varSigma \) inducing the tree homomorphism h. We construct a WTGc \(G'\) for h(G) in two stages. First, let

$$ G'' = \bigl (Q \cup \{\bot \}, \varDelta \cup \varDelta \times P, F'', P'', {{{\,\textrm{wt}\,}}''} \bigr ) $$

such that for every \(p = \sigma (q_{1}, \dotsc , q_{k}) \rightarrow q \in P\) and \(h_\sigma = u = \delta (u_{1}, \dotsc , u_{n})\),

$$\begin{aligned} p''&= \Bigl ( \langle \delta ,p\rangle (u_{1}, \dotsc , u_{n}) \llbracket q_{1}, \dotsc , q_{k}\rrbracket {\mathop {\longrightarrow }\limits ^{E, \emptyset }}q \Bigr ) \in P'' \end{aligned}$$

with \(E = \bigcup _{i \in [k]} {{\,\textrm{pos}\,}}_{x_i}(u)^2\), in which the substitution \(\langle \delta ,p\rangle (u_{1}, \dotsc , u_{n}) \llbracket q_{1}, \dotsc , q_{k} \rrbracket \) replaces for every \(i \in [k]\) only the left-most occurrence of \(x_i\) in \(\langle \delta ,p\rangle (u_{1}, \dotsc , u_{n})\) by \(q_i\) and all other occurrences by \(\bot \). Moreover \({{\,\textrm{wt}\,}}''_{p''} = {{\,\textrm{wt}\,}}_p\). Additionally, we let

$$ p''_\delta = \delta (\bot , \dotsc , \bot ) \rightarrow \bot \in P'' $$

with weight \({{\,\textrm{wt}\,}}''_{p''_\delta } = 1\) for every \(k \in \mathbb N\) and \(\delta \in \varDelta _k \cup \varDelta _k \times P\). No other productions are in \(P''\). Finally, we let \(F''_q = F_q\) for all \(q \in Q\) and \(F''_\bot = 0\). Obviously, \(G''\) is eq-restricted and positive.

In order to better describe the behaviour of \(G''\), let us introduce the following notation. Given a tree \(t = \sigma (t_{1}, \dotsc , t_{k}) \in T_\varSigma \) and a complete left-most derivation \(d = (p_1, w_1) \cdots (p_m, w_m)\) of G for t, let \(d_{1}, \dotsc , d_{k}\) be the derivations for \(t_{1}, \dotsc , t_{k}\), respectively, that are incorporated in d, and let \(h_\sigma = \delta (u_{1}, \dotsc , u_{n})\). Then we define the tree \(h(t, d) \in T_{\varDelta \cup \varDelta \times P}\) inductively by

$$ h(t, d) = \langle \delta , p_m \rangle (u_{1}, \dotsc , u_{n}) \big [h(t_1, d_1), \dotsc , h(t_k, d_k) \big ] \hspace{5.0pt}. $$

Using this notation, let us now prove that for each \(q\in Q\) we have

$$\begin{aligned} \bigl \{s \in T_{\varDelta \cup \varDelta \times P} \mid D_{G''}^q(s) \ne \emptyset \bigr \} = \bigl \{ h(t, d) \mid t \in T_\varSigma , d \in D_G^q(t) \bigr \} \end{aligned}$$
(5)

and, in turn, every such \(D_{G''}^q(s)\) is a singleton set with \({{\,\textrm{wt}\,}}_{G''}(d'') = {{\,\textrm{wt}\,}}_G(d)\) for the unique \(d'' \in D_{G''}^q \big (h(t, d) \big )\).

We start with the inclusion from right to left. To this end, let \(t \in T_\varSigma \) be a tree and \(d = (p_1, w_1) \cdots (p_m, w_m)\) be a complete left-most derivation of G for t to some nonterminal \(q \in Q\). Let \(t = \sigma (t_{1}, \dotsc , t_{k})\) be the input tree with \(h_\sigma = \delta (u_{1}, \dotsc , u_{n})\), let \(p_m = \sigma (q_{1}, \dotsc , q_{k}) \rightarrow q\) be the production utilized last in d, and let \(d_i\) be the complete left-most derivation for \(t_i\) to \(q_i\) incorporated in d for every \(i \in [k]\). For every \(i \in [k]\), we utilize the induction hypothesis to conclude that \(D_{G''}^{q_i} \bigl (h(t_i, d_i) \bigr )\) is a singleton set, so let \(d''_i \in D_{G''}^{q_i} \bigl (h(t_i, d_i) \bigr )\) be the unique element, for which we additionally have \({{\,\textrm{wt}\,}}_{G''}(d''_i) = {{\,\textrm{wt}\,}}_G(d_i)\). Moreover, for every \(i \in [k]\) there is a derivation \(d^\bot _i\) for \(h(t_i, d_i)\) with weight 1 that exclusively utilizes the nonterminal \(\bot \). We define

$$ s = \langle \delta , p_m\rangle (u_{1}, \dotsc , u_{n}) \bigl [h(t_1, d_1), \dotsc , h(t_k, d_k) \bigr ] \hspace{5.0pt}. $$

For every \(i \in [k]\), let \(v_i\) be the left-most position labeled by \(x_i\) in \(h_\sigma \). We consider the derivations \(v_1d_1'', \dotsc , v_kd_k''\), and for every other occurrence v of \(x_i\) in \(h_\sigma \) we consider the derivation \(vd^\bot _i\). Let \(d''\) be the derivation assembled from the considered subderivations followed by \((p''_m, \varepsilon )\), where \(p''_m = \langle \delta , p_m\rangle (u_{1}, \dotsc , u_{n}) \llbracket q_{1}, \dotsc , q_{k} \rrbracket {\mathop {\longrightarrow }\limits ^{E, \emptyset }}q\) with the constraints \(E = \bigcup _{i=1}^k {{\,\textrm{pos}\,}}_{x_i}(h_\sigma )^2\). Clearly, the production \(p''_m\) is the only applicable one since the only other production whose left-hand side is labeled by \(\langle \delta , p_m\rangle \) at the root reaches \(\bot \ne q\). Reordering the derivation \(d''\) to be left-most, we obtain the desired complete left-most derivation \(\underline{d}''\) for s, for which we also have \({{\,\textrm{wt}\,}}_{G''}(\underline{d}'') = {{\,\textrm{wt}\,}}_G(d)\). This proves that \(\underline{d}''\) is the required single element of \(D_{G''}^q(s) = D_{G''}^q \bigl (h(t, d) \bigr ) \ne \emptyset \).

On the other hand, consider \(s \in T_{\varDelta \cup \varDelta \times P}\) such that there exists a complete left-most derivation \(d'' = (p_1'', w''_1) \cdots (p''_m, w''_m)\) for s to q; i.e. \(d'' \in D_{G''}^q(s) \ne \emptyset \). The final rule \(p''_m\) that is applied must be of the form

$$ p''_m = \langle \delta , p\rangle (u_{1}, \dotsc , u_{n}) \llbracket q_{1}, \dotsc , q_{k} \rrbracket {\mathop {\longrightarrow }\limits ^{E,\emptyset }}q $$

with \(\delta (u_{1}, \dotsc , u_{n}) \llbracket q_{1}, \dotsc , q_{k} \rrbracket = h_\sigma \llbracket q_{1}, \dotsc , q_{k} \rrbracket \) for some symbol \(\sigma \in \varSigma _k\) and production \(p = \sigma (q_{1}, \dotsc , q_{k}) \rightarrow q\). For every \(i \in [k]\), we denote by \(w_i\) the unique position among \({{\,\textrm{pos}\,}}_{x_i}(h_\sigma )\) that is labeled by \(q_i\) in \(h_\sigma \llbracket q_{1}, \dotsc , q_{k} \rrbracket \) (recall that every other position related to \(w_i\) via E will be labeled by \(\bot \)). By the induction hypothesis applied to \(s|_{w_i}\), for which the complete left-most derivation \(d''_i\) for \(s|_{w_i}\) to \(q_i\) incorporated in \(d''\) exists, there exists a tree \(t_i \in T_\varSigma \) and a complete left-most derivation \(d_i\) of G for \(t_i\) to \(q_i\) such that \(s|_{w_i} = h(t_i, d_i)\) and \({{\,\textrm{wt}\,}}_G(d_i) = {{\,\textrm{wt}\,}}_{G''}(d''_i)\). For the tree \(t = \sigma (t_{1}, \dotsc , t_{k})\) we obtain that \(s = h(t, d)\) for the complete left-most derivation \(d \in D_G^q(t)\) given by

$$\begin{aligned} d= (1d_1) \cdots (kd_k) (p,\varepsilon ) \hspace{5.0pt}, \end{aligned}$$

for which we also have \({{\,\textrm{wt}\,}}_G(d) = {{\,\textrm{wt}\,}}_{G''}(d'')\), which completes this proof.

So far, \(Q''\) and \(P''\) are larger than Q and P only by a constant (assuming a fixed alphabet \(\varSigma \)) caused by the additional sink nonterminal \(\bot \) and its productions, but the alphabet size increases by the summand \(|\varDelta |\cdot |P |\). Computing a single production only takes linear time in the size of h (assuming constant-time access to the tree in the range of h), so \(G''\) is constructed in linear time in \(|G | \cdot |h |\).

We now delete the annotation with the help of the relabeling \(\pi \in \varDelta ^{\varDelta \cup \varDelta \times P}\) given for every \(\delta \in \varDelta \) and \(p \in P\) by \(\pi _\delta = \pi _{\langle \delta , p\rangle } = \delta \) following the construction in Theorem 4. We obtain

$$\begin{aligned} \pi (G'')_u&= \sum _{s \in \pi ^{-1}(u)} G''_s = \sum _{s \in \pi ^{-1}(u)} \Bigl ( \sum _{q \in Q} F''_q \cdot {{\,\textrm{wt}\,}}_{G''}^q(s) \Bigr ) = \sum _{\begin{array}{c} q \in Q,\, s \in \pi ^{-1}(u) \\ d'' \in D_{G''}^q(s) \end{array}} F''_q \cdot {{\,\textrm{wt}\,}}_{G''}(d'') \\&\overset{(5)}{=} \sum _{\begin{array}{c} q \in Q,\, s \in \pi ^{-1}(u) \\ t \in T_\varSigma ,\, d \in D_G^q(t) \\ s = h(t, d) \end{array}} F_q \cdot {{\,\textrm{wt}\,}}_G(d) = \sum _{\begin{array}{c} q \in Q \\ t \in h^{-1}(u) \end{array}} F_q \cdot {{\,\textrm{wt}\,}}_G^q(t) = \sum _{t \in h^{-1}(u)} G_t = h(G)_u \end{aligned}$$

for every \(u \in T_\varDelta \). The overall time complexity is thus still in \(\mathscr {O}(|G | \cdot |h |)\). The construction of Theorem 4 is applicable because \(\bot \) is clearly a sink nonterminal in \(G''\) and \(G''\) is an eq-restricted positive WTGc. \(\square \)

Let us illustrate the construction on a simple example.

Example 5

Consider the WTA \(G = \bigl (\{q, q'\}, \varSigma , F, P, {{{\,\textrm{wt}\,}}} \bigr )\) over the semiring \(\mathbb {N}\) of nonnegative integers with \(\varSigma = \{\alpha ^{(0)}, \phi ^{(1)}, \gamma ^{(1)}, \varepsilon ^{(1)}\}\), \(F_q = 0\), \(F_{q'} = 1\), and the set of productions and their weights given by

$$ p_1 = \alpha \rightarrow _1 q \qquad p_2 = \gamma (q) \rightarrow _2 q \qquad p_3 = \varepsilon (q) \rightarrow _1 q \quad \text {and} \quad p_4 = \phi (q) \rightarrow _1 q' \hspace{5.0pt}. $$

Then \({{\,\textrm{supp}\,}}(G) = \bigl \{ \phi (t) \mid t \in T_{\varSigma {\setminus }\{\phi \}} \bigr \}\) and \(G_t = 2^{|{{\,\textrm{pos}\,}}_\gamma (t) |}\) for every \(t \in {{\,\textrm{supp}\,}}(G)\). Consider the ranked alphabet \(\varDelta = \{\alpha ^{(0)}, \gamma ^{(1)}, \sigma ^{(2)}\}\) and the tree homomorphism h induced by \(h_\alpha = \alpha \), \(h_\gamma = h_\varepsilon = \gamma (x_1)\), and \(h_\phi = \sigma \bigl (\gamma (x_1), x_1 \bigr )\). Consequently,

$$ {{\,\textrm{supp}\,}}\bigl (h(G) \bigr ) = \bigl \{ \sigma \bigl (\gamma ^{n+1}(\alpha ), \gamma ^n(\alpha ) \bigr ) \mid n \in \mathbb {N}\bigr \} $$

and \(h(G)_t = \sum _{k=0}^{n} \left( {\begin{array}{c}n\\ k\end{array}}\right) 2^k=3^n\) for every \(t = \sigma \bigl (\gamma ^{n+1}(\alpha ), \gamma ^n(\alpha ) \bigr ) \in {{\,\textrm{supp}\,}}\bigl (h(G) \bigr )\). A WTGc for h(G) is constructed as follows. First, we let

$$ G'' = \bigl (\{q, q', \bot \}, \varDelta \cup \varDelta \times P, F'', P'', \mathord {{{\,\textrm{wt}\,}}''} \bigr ) $$

with \(F''_{q'} = 1\), \(F''_{q} = F''_\bot = 0\) and the productions and their weights are given by

$$\begin{aligned} \langle \alpha , p_1\rangle&\rightarrow _1 q&\langle \gamma , p_2 \rangle (q)&\rightarrow _2 q&\langle \gamma , p_3 \rangle (q)&\rightarrow _1 q&\langle \sigma , p_4 \rangle \bigl (\gamma (q), \bot \bigr )&{\mathop {\longrightarrow }\limits ^{11=2}}_1 q' \end{aligned}$$

and \(\delta (\bot , \dotsc , \bot ) \rightarrow _1 \bot \) for all \(\delta \in \varDelta \cup \varDelta \times P\). Next we remove the second component of the symbols of \(\varDelta \times P\) and add the weights of all productions that yield the same production once the second components are removed. In our example, this applies to the production \(\gamma (q) \rightarrow q\), which is the result of the two productions \(\langle \gamma , p_2\rangle (q) \rightarrow _2 q\) and \(\langle \gamma , p_3\rangle (q) \rightarrow _1 q\), so its weight is \(2 + 1 = 3\). Overall, we obtain the WTGc \(G' = \bigl (\{q, q', \bot \}, \varDelta , F'', P', {{{\,\textrm{wt}\,}}'} \bigr )\) with the following productions for all \(\delta \in \varDelta \):

$$\begin{aligned} \alpha&\rightarrow _1 q&\gamma (q)&\rightarrow _3 q&\sigma \bigl (\gamma (q), \bot \bigr )&{\mathop {\longrightarrow }\limits ^{11=2}}_1 q'&\delta (\bot , \dotsc , \bot )&\rightarrow _1 \bot \hspace{5.0pt}. \qquad {\square } \end{aligned}$$
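The two computations in Example 5 can be checked mechanically. The following is a minimal sketch, not part of the construction itself: `weight_of_image` sums \(G_t\) by brute force over all preimages of \(\sigma \bigl (\gamma ^{n+1}(\alpha ), \gamma ^n(\alpha ) \bigr )\), and `relabel` adds up the weights of annotated productions that coincide once the annotation is dropped. All function names and the tuple encoding of productions are assumptions made for illustration, and the constrained production for \(\sigma \) is left out because it is not merged with any other production.

```python
from collections import defaultdict
from itertools import product

# Weights of p_2 = gamma(q) -> q and p_3 = epsilon(q) -> q; p_1 and p_4 have weight 1.
UNARY_WEIGHT = {"gamma": 2, "epsilon": 1}


def weight_of_image(n):
    """Sum G_t over all t = phi(s) with h(t) = sigma(gamma^(n+1)(alpha), gamma^n(alpha)).

    Every such s is a chain of n unary symbols from {gamma, epsilon} above alpha,
    and G_t is the product of the weights of the unary productions used.
    """
    total = 0
    for chain in product(UNARY_WEIGHT, repeat=n):
        weight = 1
        for symbol in chain:
            weight *= UNARY_WEIGHT[symbol]
        total += weight
    return total


def relabel(productions):
    """Drop the production annotation and add the weights of coinciding productions."""
    merged = defaultdict(int)
    for (symbol, _annotation), arguments, target, weight in productions:
        merged[(symbol, arguments, target)] += weight
    return dict(merged)


# Annotated productions of G'' without constraints (hypothetical encoding).
ANNOTATED = [
    (("alpha", "p1"), (), "q", 1),
    (("gamma", "p2"), ("q",), "q", 2),
    (("gamma", "p3"), ("q",), "q", 1),
]

print([weight_of_image(n) for n in range(5)])  # [1, 3, 9, 27, 81]
print(relabel(ANNOTATED)[("gamma", ("q",), "q")])  # 3
```

The brute-force sum agrees with the closed form \(3^n\) from the binomial theorem, and the merged weight of \(\gamma (q) \rightarrow q\) is \(2 + 1 = 3\) as stated in the example.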

Trees generated by a WTGc must satisfy certain equality constraints on their subtrees. Therefore, if we naively swap subtrees of generated trees, then we might violate such an equality constraint and obtain a tree that is no longer generated by the WTGc. Luckily, the particular kind of WTGc constructed in Theorem 5, namely eq-restricted positive WTGc, allows us to refine the subtree substitution such that it takes into consideration the equality constraints in force. The following definition is the natural adaptation of [15, Definition 5.1] for (Boolean) tree automata with constraints.

Definition 6

Let \(G = (Q, \varSigma , F, P, {{{\,\textrm{wt}\,}}})\) be an eq-restricted positive WTGc with sink nonterminal \(\bot \). Moreover, let \(q, q' \in Q\), \(t, t' \in T_\varSigma \), and \(d \in D_G^q(t)\) as well as \(d' \in D_G^{q'}(t')\) such that \(q \ne \bot \ne q'\) and \(d = \underline{d} (p, \varepsilon )\) uses \(p = c[q_{1}, \dotsc , q_{k}] {\mathop {\longrightarrow }\limits ^{E, \emptyset }}q \in P\) as its final production. For every \(i \in [k]\) let \(w_i = {{\,\textrm{pos}\,}}_{x_i}(c)\) and \(d_i\) be the unique left-most derivation for \(t_i = t|_{{{\,\textrm{pos}\,}}_{x_i}(c)}\) incorporated in d. Finally, for every tree \(u \in T_\varSigma \) let \(d^\bot _u\) be the unique left-most derivation for u to \(\bot \). For every \(w \in {{\,\textrm{pos}\,}}(t)\), for which the derivation for \(t|_w\) incorporated in d yields \(q'\), we recursively define the derivation substitution \(d \llbracket d' \rrbracket _w\) of \(d'\) into d at w and the resulting tree \(t \llbracket t' \rrbracket _w^d\) as follows. If \(w = \varepsilon \), then \(d \llbracket d' \rrbracket _{\varepsilon } = d'\) and \(t \llbracket t' \rrbracket _{\varepsilon }^d = t'\). Otherwise \(w = w_j\underline{w}\) for some \(j \in [k]\) and we have

$$ d \llbracket d' \rrbracket _w = d'_{1} \cdots d'_{k} (p, \varepsilon ) \qquad \text {and} \qquad t \llbracket t' \rrbracket _w^d = c[t'_{1}, \dotsc , t'_{k}] \hspace{5.0pt}, $$

where for each \(i \in [k]\) we have

  • if \(i = j\) (i.e., \(w_i\) is a prefix of w), then \(d'_i = w_i (d_i \llbracket d' \rrbracket _{\underline{w}})\) and \(t'_i = t_i \llbracket t' \rrbracket _{\underline{w}}^{d'_i}\),

  • if \(q_i = \bot \) and \(w_i \in [w_j]_{\equiv _E}\) (i.e., it is a position that is equality restricted to \(w_j\)), then \(d'_i = w_id^\bot _u\) and \(t'_i = u\) with \(u = t_j \llbracket t' \rrbracket _{\underline{w}}^{d'_j}\), and

  • otherwise \(d'_i = w_id_i\) and \(t'_i = t_i\) (i.e., derivation and tree remain unchanged).

It is straightforward to verify that \(d \llbracket d' \rrbracket _w\) is a complete left-most derivation of G for \(t \llbracket t'\rrbracket _w^d\) to q. \(\square \)

Example 6

We consider the WTGc \(G = \big (\{q, \bot \}, \varSigma , F, P, {{{\,\textrm{wt}\,}}}\big )\) with input ranked alphabet \(\varSigma = \{a ^{(0)}, g^{(2)}, f^{(2)}\}\), final weights \(F_q = 1\) and \(F_\bot = 0\) as well as productions

$$ p_a = a \rightarrow _1 q \qquad p_g = g(q, \bot ) {\mathop {\longrightarrow }\limits ^{1=2}}_1 q \quad \text {and} \quad p_f = f \big (q, f(q, \bot ) \big ) {\mathop {\longrightarrow }\limits ^{1=22}}_1 q $$

besides the sink nonterminal productions \(p_\sigma ^\bot = \sigma (\bot , \dotsc , \bot ) \rightarrow _1 \bot \) for all \(\sigma \in \varSigma \). As before, for every \(u \in T_\varSigma \) we let \(d^\bot _u \in D_G^\bot (u)\) be the unique derivation of G for u to \(\bot \), which utilizes only the nonterminal \(\bot \). According to Definition 6 we choose the states \(q = q'\) and the trees t and \(t'\) and derivations d and \(d'\) as given in Fig. 2 and below.

$$\begin{aligned} d&= (p_a, 11) \, (p^\bot _a, 12) \, (p_g, 1) \, (p_a, 21) \, (p^\bot _a, 221) \, (p^\bot _a, 222) \, (p^\bot _g, 22) \, (p_f, \varepsilon ) \\ d'&= (p_a, 1) \, (p^\bot _a, 2) \, (p_g, \varepsilon ) \end{aligned}$$

We select the position \(w = 11\) and observe that the derivation for \(t|_{11}\) is \((p_a, \varepsilon )\), which yields \(q = q'\). We compute \(d \llbracket d' \rrbracket _w\) as follows

$$\begin{aligned} d \llbracket d' \rrbracket _{11}&= \Bigl (1 (d_1 \llbracket d' \rrbracket _1) \Bigr ) \, \Bigl (21 (p_a, \varepsilon ) \Bigr ) \, \Bigl (22 d^\bot _u \Bigr ) \, (p_f, \varepsilon ) \\&= \Biggl (1 \Bigl (1 d' \Bigr ) \, \Bigl (2 d_{g(a,a)}^\bot \Bigr ) \, (p_g, \varepsilon ) \Biggr ) \, (p_a, 21) \, (22d_u^\bot ) \, (p_f, \varepsilon ) \\&= (p_a, 111) \, (p^\bot _a, 112) \, (p_g, 11) \, (12d_{g(a,a)}^\bot ) \, (p_g, 1) \, (p_a, 21) \, (22d_u^\bot ) \, (p_f, \varepsilon ) \hspace{5.0pt}, \end{aligned}$$

where \(d_1 = (p_a, 1) \, (p^\bot _a, 2) \, (p_g, \varepsilon )\) is the derivation for \(t|_1\) incorporated in d and \(u = g \bigl (g(a, a), g(a, a) \bigr )\). We note that \(w = 11\) is explicitly equality constrained to position 12 in d via the constraint \(1 = 2\) at position 1 and implicitly equality constrained to positions 221 and 222 via the constraint \(1 = 22\) at the root \(\varepsilon \). Thus, we obtain \(d \llbracket d' \rrbracket _{11}\) by substituting \(d'\) into d at position 11 as well as substituting \(d_{t'}^\bot \) into d at positions 12, 221, and 222. The obtained tree \(t \llbracket t' \rrbracket _w^d\) is displayed in Fig.  3. \(\square \)

Fig. 2: Input trees t and \(t'\) from Example 6

Fig. 3: Obtained pumped tree \(t \llbracket t' \rrbracket _{11}^d\) from Example 6

As our example illustrates, the tree \(t \llbracket t' \rrbracket _w^d\) is obtained from t by (i) identifying the set of all positions of t that are explicitly or implicitly equality constrained to w by the productions in the derivation d and (ii) substituting \(t'\) into t at every such position. If \(w' \in {{\,\textrm{pos}\,}}(t)\) is parallel to all positions constrained to w, like position 21 in Example 6, then \(t \llbracket t' \rrbracket _w^d|_{w'} = t|_{w'}\). Note that \(t|_{21}\) is equal to the replaced subtree \(t|_{11}\), but we only replace constrained subtrees and not all equal subtrees.
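The tree part of Definition 6 can be sketched in a few lines and replayed on Example 6. The sketch below computes only the tree \(t \llbracket t' \rrbracket _w^d\), not the derivation \(d \llbracket d' \rrbracket _w\), and it assumes without checking that w lies below variable positions of the productions used in d. The encoding is hypothetical: trees are nested tuples, positions are tuples of 1-based integers, a derivation is a dictionary from positions to production names, and each production is stored as its variable positions, their nonterminals, and its equality constraints.

```python
BOT = "bot"  # stands for the sink nonterminal

# name -> (variable positions, nonterminals at these positions, equality constraints)
PRODS = {
    "p_a": ((), (), ()),
    "p_g": (((1,), (2,)), ("q", BOT), (((1,), (2,)),)),
    "p_f": (((1,), (2, 1), (2, 2)), ("q", "q", BOT), (((1,), (2, 2)),)),
    "p_a_bot": ((), (), ()),
    "p_g_bot": (((1,), (2,)), (BOT, BOT), ()),
}


def subtree(t, w):
    """Subtree of t at position w; child i of (label, c_1, ..., c_n) is t[i]."""
    for i in w:
        t = t[i]
    return t


def replace(t, w, s):
    """Copy of t in which the subtree at position w is replaced by s."""
    if not w:
        return s
    i = w[0]
    return t[:i] + (replace(t[i], w[1:], s),) + t[i + 1:]


def eq_class(w, constraints):
    """Positions related to w by the equivalence generated by the constraint pairs."""
    related, changed = {w}, True
    while changed:
        changed = False
        for a, b in constraints:
            if (a in related) != (b in related):
                related |= {a, b}
                changed = True
    return related


def substitute(t, d, w, t_new):
    """Tree obtained by substituting t_new into t at w, respecting the constraints in d."""
    if not w:
        return t_new
    var_pos, nonterminals, constraints = PRODS[d[()]]
    # w_j: the variable position of the root production that is a prefix of w
    w_j = next(v for v in var_pos if w[:len(v)] == v)
    d_j = {p[len(w_j):]: name for p, name in d.items() if p[:len(w_j)] == w_j}
    u = substitute(subtree(t, w_j), d_j, w[len(w_j):], t_new)
    related = eq_class(w_j, constraints)
    for w_i, q_i in zip(var_pos, nonterminals):
        # update w_j itself and every sink position that is constrained to it
        if w_i == w_j or (q_i == BOT and w_i in related):
            t = replace(t, w_i, u)
    return t


A = ("a",)
G_AA = ("g", A, A)
T = ("f", G_AA, ("f", A, G_AA))
D = {(1, 1): "p_a", (1, 2): "p_a_bot", (1,): "p_g", (2, 1): "p_a",
     (2, 2, 1): "p_a_bot", (2, 2, 2): "p_a_bot", (2, 2): "p_g_bot", (): "p_f"}

RESULT = substitute(T, D, (1, 1), G_AA)
```

With these inputs `RESULT` has \(g(a, a)\) at positions 11, 12, 221, and 222, while the subtree at position 21 is still a, which matches the discussion of Example 6.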

This substitution allows us to prove a pumping lemma for eq-restricted positive WTGc, which can generate all (nondeleting and nonerasing) homomorphic images of regular weighted tree languages by Theorem 5. To this end, we need some final notions. Let \(G = (Q, \varSigma , F, P, {{{\,\textrm{wt}\,}}})\) be a WTGc. Moreover, let \(p = \ell {\mathop {\longrightarrow }\limits ^{E,I}}q\in P\) be a production. We define the height \({{\,\textrm{ht}\,}}(p)\) of p by \({{\,\textrm{ht}\,}}(p) = {{\,\textrm{ht}\,}}(\ell )\) (i.e., the height of its left-hand side). Moreover, we let

$$ {{\,\textrm{ht}\,}}(P) = \max \bigl \{{{\,\textrm{ht}\,}}(p) \mid p \in P \bigr \} \qquad \text {and} \qquad {{\,\textrm{ht}\,}}(G) = (|Q | + 1) \cdot {{\,\textrm{ht}\,}}(P) \hspace{5.0pt}. $$
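As a small illustration, the bound \({{\,\textrm{ht}\,}}(G)\) can be computed for the WTGc of Example 6. This is a sketch under two assumptions: trees are encoded as nested tuples with nonterminals as leaves (a hypothetical encoding), and a single leaf has height 0, which is the convention suggested by calling \(t_n\) with \(t_0 = a\) a tree of height n.

```python
def ht(t):
    """Height of a tree (label, child_1, ..., child_n); a leaf has height 0."""
    return 0 if len(t) == 1 else 1 + max(ht(child) for child in t[1:])


def ht_of_grammar(nonterminals, left_hand_sides):
    """ht(G) = (|Q| + 1) * ht(P), where ht(P) is the maximal height of a left-hand side."""
    ht_of_productions = max(ht(lhs) for lhs in left_hand_sides)
    return (len(nonterminals) + 1) * ht_of_productions


Q, BOT = ("q",), ("bot",)
LEFT_HAND_SIDES = [
    ("a",),                      # p_a and the sink production for a
    ("g", Q, BOT),               # p_g
    ("f", Q, ("f", Q, BOT)),     # p_f
    ("g", BOT, BOT),             # sink production for g
    ("f", BOT, BOT),             # sink production for f
]

print(ht_of_grammar({"q", "bot"}, LEFT_HAND_SIDES))  # 6
```

Here \({{\,\textrm{ht}\,}}(P) = 2\) is attained by \(p_f\), and with two nonterminals we get \({{\,\textrm{ht}\,}}(G) = 3 \cdot 2 = 6\).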

Lemma 4

Let \(G = (Q, \varSigma , F, P, {{{\,\textrm{wt}\,}}})\) be an eq-restricted positive WTGc with sink nonterminal \(\bot \). There exists \(n \in \mathbb N\) such that for every tree \(t_0 \in T_\varSigma \), nonterminal \(q \in Q \setminus \{\bot \}\), and derivation \(d \in D_G^q(t_0)\) such that \({{\,\textrm{ht}\,}}(t_0) > n\) and \({{\,\textrm{wt}\,}}_G(d) \ne 0\) there are infinitely many trees \(t_1, t_2, \dotsc \) and derivations \(d_1, d_2, \dotsc \) such that \(d_i \in D_G^q(t_i)\) and \({{\,\textrm{wt}\,}}_G(d_i) \ne 0\) for all \(i \in \mathbb {N}\).

Proof

Without loss of generality, suppose that for every \(c[q_{1}, \dotsc , q_{k}] {\mathop {\longrightarrow }\limits ^{E,\emptyset }}q' \in P\) with \(q' \ne \bot \) and \(k \ne 0\) there exists \(i \in [k]\) such that \(q_i \ne \bot \). This can easily be achieved by introducing a copy \(\top \) of nonterminal \(\bot \) and replacing one instance of \(\bot \) by \(\top \) in offending productions. Similarly, we can assume without loss of generality that the construction in the proof of Lemma 3 has been applied to G. If this is the case, then we can select \(n = {{\,\textrm{ht}\,}}(G)\). Let \(t_0 \in T_\varSigma \) be such that \({{\,\textrm{ht}\,}}(t_0) > n\). Let \(Q' = Q \setminus \{\bot \}\), \(d \in D_G^q(t_0)\) be a derivation with \({{\,\textrm{wt}\,}}_G(d) \ne 0\), and select a position \(w \in {{\,\textrm{pos}\,}}(t_0)\) of maximal length such that d incorporates a derivation for \(t_0|_w\) to some \(q' \in Q'\). Then

$$ |w | \ge {{\,\textrm{ht}\,}}(t_0) - {{\,\textrm{ht}\,}}(P) \ge {{\,\textrm{ht}\,}}(G) - {{\,\textrm{ht}\,}}(P) = |Q | \cdot {{\,\textrm{ht}\,}}(P) \hspace{5.0pt}, $$

which yields that at least \(|Q |\) proper prefixes \(w'\) of w exist such that d incorporates a derivation for \(t_0|_{w'}\) to some \(q' \in Q'\). Hence there exist prefixes \(w', w''\) of w such that d incorporates a derivation \(d'\) for \(t' = t_0|_{w'}\) to \(q' \in Q'\) as well as a derivation for \(t_0|_{w''}\) to the same nonterminal \(q'\), and, without loss of generality, \(w'\) is a proper prefix of \(w''\). Then \(d_1 = d \llbracket d' \rrbracket _{w''}\) is a derivation of G for \(t_1 = t_0 \llbracket t' \rrbracket _{w''}^d\) to q with \({{\,\textrm{ht}\,}}(t_1) > {{\,\textrm{ht}\,}}(t_0)\). Since we achieve the same state q, the annotation of the proof of Lemma 3 guarantees that \({{\,\textrm{wt}\,}}_G(d_1) \ne 0\). Iterating this substitution yields the desired trees \(t_1, t_2, \dotsc \) and derivations \(d_1, d_2, \dotsc \). \(\square \)
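The pigeonhole step of the proof amounts to scanning the prefixes of w for a repeated nonterminal. The following sketch shows only this step; the function name and the list encoding of the path are assumptions for illustration, and the sample path is made up rather than taken from a concrete grammar.

```python
def repeated_nonterminal(path):
    """Find prefixes w', w'' of w that carry the same nonterminal.

    path lists (position, nonterminal) for those prefixes of w at which the
    derivation reaches a nonterminal different from the sink, ordered by length.
    Returns (w', w'', q') with w' a proper prefix of w'', or None if no
    nonterminal occurs twice.
    """
    first_seen = {}
    for position, nonterminal in path:
        if nonterminal in first_seen:
            return first_seen[nonterminal], position, nonterminal
        first_seen[nonterminal] = position
    return None


# Hypothetical path with three prefixes and two distinct nonterminals.
PATH = [((), "q0"), ((1,), "q1"), ((1, 2), "q0")]
print(repeated_nonterminal(PATH))  # ((), (1, 2), 'q0')
```

Whenever the path contains more entries than there are non-sink nonterminals, the function cannot return `None`, which is exactly the counting argument used above.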

A WTGc generating a (nondeleting and nonerasing) homomorphic image of a regular weighted tree language, if constructed as described in Theorem 5, will never have overlapping constraints since constraints always point to leaves of the left-hand sides of productions as required by classic WTGc. It is intuitive that this limitation to the operating range of constraints leads to an actual restriction in the expressive power of WTGc, but we will only prove it for eq-restricted positive WTGc.

Proposition 2

Let \(\mathbb {S}\) be a zero-sum free semiring. The class of positive constraint-regular weighted tree languages is strictly more expressive than the class of weighted tree languages generated by eq-restricted positive WTGc.

Proof

Let us consider the positive WTGc \(G = \bigl (\{q, q'\}, \varSigma , F, P, \mathord {{{\,\textrm{wt}\,}}}\bigr )\) with input ranked alphabet \(\varSigma = \{f^{(2)}, \underline{f}^{(2)}, g^{(2)}, a^{(0)}\}\), final weights \(F_q = 1\) and \(F_{q'} = 0\), and the following productions, of which each has weight 1.

$$\begin{aligned} a \rightarrow _1 q' \qquad \qquad g(q', q') \rightarrow _1 q \qquad \qquad f(q, q) {\mathop {\longrightarrow }\limits ^{12=21}}_1 q \qquad \qquad \underline{f}(q, q){\mathop {\longrightarrow }\limits ^{12=21}}_1 q \end{aligned}$$

The first two productions are only used on leaves and on subtrees of the form \(g(a, a)\). Every other position w (i.e., neither leaf nor position with two leaves as children) is labeled either f or \(\underline{f}\) and additionally every derivation enforces the constraint \(12 = 21\), so the subtrees \(t|_{w12}\) and \(t|_{w21}\) of the input tree t need to be equal for a complete derivation of G to exist.

For the sake of a contradiction, suppose that there exists an eq-restricted positive WTGc \(G' = (Q', \varSigma , F', P', {{{\,\textrm{wt}\,}}'})\) that is equivalent to G. We recursively define the trees \(t_n \in T_\varSigma \) and \(t'_n \in T_\varSigma \) for every \(n \in \mathbb N\) with \(n \ge 1\) by

$$\begin{aligned} t_0&= a \qquad&\qquad t_1&= g(t_0, t_0) \qquad&\qquad t_{n+1}&= f(t_n, t_n) \\ t'_0&= a \qquad&\qquad t'_1&= g(t'_0, t_0) \qquad&\qquad t'_{n+1}&= \underline{f}(t'_n, t_n) \end{aligned}$$

Clearly, \(t_n\) and \(t'_n\) are both complete binary trees of height n. Naturally, the leaves are labeled a, and the penultimate level in both trees is always labeled g. In \(t_n\) the remaining levels are universally labeled f, whereas in \(t'_n\) the left-most spine on those levels is labeled \(\underline{f}\). We illustrate an example tree \(t'_n\) in Fig. 4. Obviously \(G(t_n) = 1\) as well as \(G(t'_n) = 1\) for every \(n \in \mathbb N\) with \(n \ge 1\). Furthermore we note that the derivations of G only enforce equality constraints on positions of the form w12 or w21, but since \({{\,\textrm{pos}\,}}_{\underline{f}}(t'_n) \subseteq \{1\}^*\), the positions, in which the labels in \(t_n\) and \(t'_n\) differ, are not affected by any equality constraint. This can be used to verify that \(G(t'_n) = 1\) for each \(n \ge 1\).
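The claim that both families of trees are generated by G can be checked for small n with a direct membership test. This is a minimal sketch with a hypothetical tuple encoding of trees, in which the symbol \(\underline{f}\) is written `"f_"`. Since all productions of G have weight 1, \(F_q = 1\), and derivations of G are unique, a tree has weight 1 exactly if it has a complete derivation to q.

```python
A = ("a",)


def t(n):
    """t_0 = a, t_1 = g(t_0, t_0), and t_{n+1} = f(t_n, t_n)."""
    if n == 0:
        return A
    if n == 1:
        return ("g", A, A)
    return ("f", t(n - 1), t(n - 1))


def t_prime(n):
    """t'_0 = a, t'_1 = g(t'_0, t_0), and t'_{n+1} = f_(t'_n, t_n)."""
    if n == 0:
        return A
    if n == 1:
        return ("g", A, A)
    return ("f_", t_prime(n - 1), t(n - 1))


def derives_q(s):
    """True iff G has a complete derivation for s to the final nonterminal q."""
    if s == ("g", A, A):
        return True
    if s[0] in ("f", "f_") and len(s) == 3:
        left, right = s[1], s[2]
        # constraint 12 = 21: the second child of the left subtree must equal
        # the first child of the right subtree
        return derives_q(left) and derives_q(right) and left[2] == right[1]
    return False


print(all(derives_q(t(n)) and derives_q(t_prime(n)) for n in range(1, 8)))  # True
print(derives_q(("f", t(2), t(3))))  # False
```

The second call fails because the subtrees at positions 12 and 21 of \(f(t_2, t_3)\) are \(t_1\) and \(t_2\), which differ.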

Fig. 4: A snippet of the tree \(t'_n\) and the productions used by \(G'\)

In the following, let \(n = 3 {{\,\textrm{ht}\,}}(G') + 2\). Since \(G'\) is equivalent to G, we need to have \(G'(t'_n) = 1\) as well, which requires a complete derivation of \(G'\) for \(t'_n\) to some final nonterminal \(q_0 \in Q'\). Let \(d \in D_{G'}^{q_0}(t'_n)\) be such a derivation. Moreover, let \(d = \underline{d} (p, \varepsilon )\) for some production \(p = c[q_{1}, \dotsc , q_{k}] {\mathop {\longrightarrow }\limits ^{E, \emptyset }}q_0 \in P'\). Since the input tree \(t'_n\) contains positions

$$ \Bigl \{1^i = \underbrace{11\cdots 1}_{i \text { times}} \mid 0 \le i \le n \Bigr \} \subseteq {{\,\textrm{pos}\,}}(t'_n) \hspace{5.0pt}, $$

there must exist \(j \in \mathbb {N}\) such that \(c(1^j) = x_1\); i.e., position \(1^j\) is labeled \(x_1\) in c. Obviously, \(j \le {{\,\textrm{ht}\,}}(G')\), so the height of the subtree \(t'' = t'_n|_{1^j}\), which is still a complete binary tree, is at least \(2 {{\,\textrm{ht}\,}}(G') + 2\). We can thus apply Lemma 4 to the tree \(t''\) in such a way that it modifies its second direct subtree (starting from \(1^j \in {{\,\textrm{pos}\,}}(t'_n)\), we descend to \(1^j2\); from there, we either find a subderivation to some nonterminal different from \(\bot \), or all subtrees below \(1^j2\) are copies of subtrees below \(1^j1\), and in that case, we apply the pumping to an equality constrained subtree below \(1^j1\), which then also modifies the corresponding subtree below \(1^j2\)). Let u be the pumped tree obtained in this way, which according to zero-sum freeness and Lemma 4 is also in the support of \(G'\); i.e., \(u \in {{\,\textrm{supp}\,}}(G')\). Let \(d'\) be the derivation constructed in Lemma 4 corresponding to u. We have \(u(1^{j-1}) = \underline{f}\), so the position \(1^{j-1}\) is labeled \(\underline{f}\). Since G and \(G'\) are equivalent, there must be a derivation of G for u as well, which enforces the equality constraint \(u|_{1^{j-1}12} = u|_{1^{j-1}21}\). By construction we have \(t'_n|_{1^{j-1}12} \ne u|_{1^{j-1}12}\). Since the positions \(1^{j-1}12\) and \(1^{j-1}21\) have no common suffix, this equality can only be guaranteed by \(G'\) if \(1^{j-1}12\) and \(1^{j-1}21\) are themselves (explicitly or implicitly) equality constrained in \(d'\). The potentially several constraints that achieve this must of course be located at prefixes of \(1^{j-1}12\) and \(1^{j-1}21\), and since the production used in \(d'\) at the root is still p and stretches all the way to \(1^j\), this can only be achieved if \(d'\) enforces \(1^{j-1}1 = 1^{j-1}2\) via p at the root as well as \(1 = 2\) at \(1^{j-1}1\) or at \(1^{j-1}2\). 
However, this is impossible since \(u(1^{j-1}1) = \underline{f} \ne f = u(1^{j-1}2)\). Hence there is no explicit or implicit equality constraint between \(1^{j-1}12\) and \(1^{j-1}21\), and thus \(u|_{1^{j-1}21} = t'_n|_{1^{j-1}21}\), which contradicts that G has a complete derivation for u. \(\square \)

Although for zero-sum free semirings, the support of a regular weighted tree language is again regular, in general, the converse is not true, so we cannot apply the decision procedure of [15] to the support of a homomorphic image in order to decide its regularity. Instead, we hope to extend the unweighted argument in a way that tracks the weights sufficiently closely. For this, we prepare two decidability results, which rely mostly on the corresponding results in the unweighted case. To this end, we need to relate our WTGc constructed in Theorem 5 to the classic TGc used in [15]. At this point we mention that their classic TGc additionally require that equality constrained positions have the same nonterminal label. Compared to our eq-restriction this change is entirely immaterial in the unweighted case.

Theorem 6

Let \(\mathbb S\) be a zero-sum free semiring. Moreover, let \(G = (Q, \varSigma , F, P, {{{\,\textrm{wt}\,}}})\) be a WTA and \(h \in T_\varDelta ^{T_\varSigma }\) be a nondeleting and nonerasing tree homomorphism. Finally, let \(G' = h(G)\). Emptiness and finiteness of \({{\,\textrm{supp}\,}}(G')\) are decidable.

Proof

We apply the construction in the proof of Lemma 3 to the eq-restricted positive WTGc \(G' = (Q', \varSigma , F', P', {{{\,\textrm{wt}\,}}'})\) constructed according to Theorem 5. Thus we ensure that all derivations have non-zero weight. Due to zero-sum freeness, we can simply drop the weights and obtain an eq-restricted positive TGc \(G'' = (Q'', \varSigma , F'', P'')\) generating \({{\,\textrm{supp}\,}}(G')\). Emptiness and finiteness are decidable for the tree language \({{\,\textrm{supp}\,}}(G')\) generated by \(G''\) according to [15, Corollaries 5.11 & 5.20]. \(\square \)

6 Conclusion

The purpose of this contribution is to lay the groundwork for investigating and, ideally, eventually deciding the weighted HOM problem. For this, we have introduced the model of eq-restricted WTGh and showed that they are well-suited to efficiently represent homomorphic images of regular weighted tree languages (Theorem 5). Apart from classical closure properties, we have proved a pumping lemma for these devices (Lemma 4).

Recently, significant progress has been made on the topic of the weighted HOM problem. Most notably, the \(\mathbb {N}\)-weighted version of this problem was proved to be decidable [21]. There, the HOM problem is reduced to a specific property of the WTGh constructed in our Theorem 5 for the homomorphic image, and this specific property is shown to be decidable. The proof of the latter part is based on our pumping Lemma 4. Additionally [23] shows that if the input of the HOM problem satisfies particular conditions (intuitively, the tree homomorphism must satisfy a condition generalizing injectivity and the input WTA must satisfy an ambiguity restriction), then the WTGh for the homomorphic image constructed in Theorem 5 is unambiguous. In that case, the (thus restricted) weighted HOM problem over any zero-sum free semiring can be reduced to the unweighted HOM problem [15] and is therefore decidable. Our current efforts are centered around the weighted HOM problem over fields, for which we hope to prove decidability with the same strategy that was used in [21] for \(\mathbb {N}\)-weights and a pumping lemma similar to Lemma 4 for zero-sum free semirings.