1 Introduction and Results

A very extensively studied version of graph modification problems asks to modify a given graph to a graph that satisfies a certain property \(\mathcal G\) by deleting a minimum number of vertices. The case \(\mathcal G\) being ‘edgeless’ is the well-known vertex cover problem, one of the classical \(\textsf{NP}\)-hard problems. If \(\mathcal G\) is a ‘cluster graph’, a graph in which every connected component is a clique, the corresponding problem is another well-known \(\textsf{NP}\)-hard problem, the cluster vertex deletion problem (cluster-vd for short). In this paper, we revisit the computational complexity of cluster-vd, formally given below.

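cluster-vd

Instance: A graph \(G=(V,E)\) and an integer k.

Question: Is there a subset \(S\subseteq V\) of size at most k such that \(G-S\) is a cluster graph?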

Since being a cluster graph is a hereditary property on induced subgraphs, cluster-vd is \(\textsf{NP}\)-complete [25] and cannot be solved in \(2^{o(n+m)}\) time unless the ETH (Exponential-Time Hypothesis) fails [21], where n and m are the numbers of vertices and edges of the input graph, respectively. cluster-vd remains \(\textsf{NP}\)-complete even when restricted to planar graphs [32], to bipartite graphs [33], and to planar bipartite graphs of maximum degree 3 [14]. Most recent works on cluster-vd deal with exact, FPT and approximation algorithms [1, 2, 15, 31].

Notably, only a few cases are known in which the problem can be solved efficiently: cluster-vd is polynomially solvable on block graphs, split graphs and interval graphs [3], and on graphs of bounded treewidth [29]. On the other hand, the complexity status of cluster-vd on many well-studied graph classes is still open, e.g., chordal graphs discussed in [3] and planar bipartite graphs mentioned in [4].

In this paper we initiate studying the computational complexity of cluster-vd on graphs defined by forbidding certain induced subgraphs. We remark that related approaches for other problems are quite common in the literature, e.g., for vertex cover (aka independent set) [10, 13] and coloring [11, 23], and that many popular graph classes are defined or characterized by forbidding induced subgraphs, e.g., chordal and bipartite graphs (by infinitely many forbidden subgraphs), and cographs and line graphs (by finitely many forbidden subgraphs).

All graphs considered are undirected, finite and have no multiple edges or self-loops. Let H be a given graph. A graph G is H-free if no induced subgraph in G is isomorphic to H. A path with n vertices and \(n-1\) edges is denoted by \(P_n\). The main result of the present paper is the following complexity dichotomy:

Theorem 1

Let H be a fixed graph. cluster-vd is polynomially solvable on H-free graphs if H is an induced subgraph of the 4-vertex path \(P_4\), and \(\textsf{NP}\)-complete otherwise.

Furthermore, in case H is not an induced subgraph of \(P_4\), no algorithm of runtime \(2^{o(n)}\) can solve cluster-vd on H-free n-vertex graphs, unless the ETH fails.

We also consider the connected variant of cluster-vd, which is as follows.

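connected cluster-vd

Instance: A graph \(G=(V,E)\) and an integer k.

Question: Is there a subset \(S\subseteq V\) of size at most k such that \(G-S\) is a cluster graph and \(G[S]\) is connected?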

It is known that connected cluster-vd is \(\textsf{NP}\)-complete and cannot be solved in \(2^{o(n+m)}\) time unless the ETH fails [21]. It turns out that connected cluster-vd admits the same complexity dichotomy as cluster-vd:

Theorem 2

Let H be a fixed graph. connected cluster-vd is polynomially solvable on H-free graphs if H is an induced subgraph of the 4-vertex path \(P_4\), and \(\textsf{NP}\)-complete otherwise.

Furthermore, in case H is not an induced subgraph of \(P_4\), no algorithm of runtime \(2^{o(n)}\) can solve connected cluster-vd on H-free n-vertex graphs, unless the ETH fails.

Theorems 1 and 2 enlarge a list of rare dichotomy theorems on H-free graphs: Korobitsin [22] proved that dominating set is solvable in polynomial time on H-free graphs if H is an induced subgraph of \(P_4+tP_1\), the union of \(P_4\) and t isolated vertices for \(t\ge 0\), and \(\textsf{NP}\)-complete otherwise. Munaro [27] proved that the same dichotomy holds for connected dominating set and for graph VC\(_{\textsc {con}}\) dimension. Král, Kratochvíl, Tuza and Woeginger [23] proved that colouring on H-free graphs is solvable in polynomial time if H is an induced subgraph of \(P_4\) or of \(P_3+P_1\) and \(\textsf{NP}\)-complete otherwise. Kamiński [20] proved that max-cut is solvable in polynomial time if H is an induced subgraph of \(P_4\) and \(\textsf{NP}\)-complete otherwise.

2 Preliminaries

For a set \(\mathcal H\) of graphs, \(\mathcal{H}\)-free graphs are those in which no induced subgraph is isomorphic to a graph in \(\mathcal H\). We denote by \(K_{1,n}\) the tree with \(n+1\ge 3\) vertices and n leaves, and by \(C_n\) the n-vertex cycle. The girth girth(G) of a graph G is the smallest length of a cycle in G; we set \(girth(G)=\infty \) if G is a forest, a graph without cycles. Thus, for any fixed integer \(g\ge 3\), \(girth(G)>g\) if and only if G is \(\{C_3,C_4, \ldots , C_g\}\)-free.

As usual, we denote by \(\overline{G}\) the complement of a graph G. The union \(G+H\) of two vertex-disjoint graphs G and H is the graph with vertex set \(V(G)\cup V(H)\) and edge set \(E(G)\cup E(H)\); we write pG for the union of p copies of G. For a subset \(S \subseteq V(G)\), let G[S] denote the subgraph of G induced by S; \(G-S\) stands for \(G[V(G)\setminus S]\). By ‘G contains an H’ we mean G contains H as an induced subgraph. Graphs in which every vertex has degree 3 are called 3-regular graphs or cubic graphs and graphs with maximum degree 3 subcubic graphs.

A graph G is a cluster graph if each of its connected components is a clique. Observe that G is a cluster graph if and only if G is \(P_3\)-free. If \(S\subseteq V(G)\) is a subset of vertices of G such that \(G-S\) is \(P_3\)-free, then S is called a cluster vertex deletion set of G. An optimal cluster vertex deletion set is one of minimum size.
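To make these definitions concrete, the following minimal Python sketch tests whether a graph is \(P_3\)-free and finds an optimal cluster vertex deletion set by exhaustive search. The function names `is_cluster_graph` and `min_cluster_vd` and the edge-list representation are assumptions made for illustration only; the search takes exponential time and is meant for tiny examples.

```python
from itertools import combinations

def is_cluster_graph(vertices, edges):
    """Return True if the subgraph induced by `vertices` has no induced P_3."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        if u in adj and v in adj:      # keep only edges inside the vertex set
            adj[u].add(v)
            adj[v].add(u)
    # An induced P_3 is a vertex with two non-adjacent neighbours.
    for v in adj:
        for a, b in combinations(adj[v], 2):
            if b not in adj[a]:
                return False
    return True

def min_cluster_vd(vertices, edges):
    """Return an optimal cluster vertex deletion set by brute force."""
    vertices = list(vertices)
    for size in range(len(vertices) + 1):
        for S in combinations(vertices, size):
            rest = [v for v in vertices if v not in S]
            if is_cluster_graph(rest, edges):
                return set(S)

# The path P_4 on vertices 0-1-2-3 needs one deletion.
p4_solution = min_cluster_vd(range(4), [(0, 1), (1, 2), (2, 3)])
```

For the path \(P_4\) the sketch returns a set of size one: deleting an inner vertex leaves an isolated vertex and an edge.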

Algorithmic lower bounds in this paper are conditional, based on the Exponential Time Hypothesis (ETH) [16]. The ETH asserts that no algorithm can solve 3sat in subexponential time \(2^{o(n)}\) for n-variable 3-cnf formulas. As shown by the Sparsification Lemma in [17], the hard cases of 3sat consist of sparse formulas with \(m=O(n)\) clauses. Hence, the ETH implies that 3sat cannot be solved in time \(2^{o(n+m)}\).

Recall that an instance for nae 3sat is a 3-cnf formula \(F=C_1\wedge C_2\wedge \cdots \wedge C_m\) over n variables, in which each clause \(C_j\) consists of three distinct literals. The problem asks whether there is a truth assignment of the variables such that every clause in F has at least one true and at least one false literal. Such an assignment is called an nae assignment, i.e. a not-all-equal assignment. There is a polynomial reduction from 3sat to nae 3sat ([26, Theorem 7.3]), which transforms an instance for 3sat with n variables and m clauses to an equivalent instance for nae 3sat with \(2n+24m\) variables and 32m clauses. Thus, we obtain:

Theorem 3

([17, 26]) nae 3sat is \(\textsf{NP}\)-complete and, assuming ETH, cannot be solved in time \(2^{o(n+m)}\) on inputs with n variables and m clauses.
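The not-all-equal condition defined above is easy to check for a given assignment. The sketch below assumes, purely for illustration, that literals are encoded as non-zero integers (variable i as `i`, its negation as `-i`) and that the hypothetical helper `is_nae_assignment` receives the assignment as a dictionary.

```python
def is_nae_assignment(clauses, assignment):
    """Check that every clause has at least one true and one false literal.

    clauses: iterable of 3-tuples of non-zero ints (assumed encoding).
    assignment: dict mapping each variable index to True or False.
    """
    for clause in clauses:
        values = [assignment[abs(lit)] == (lit > 0) for lit in clause]
        if all(values) or not any(values):
            return False
    return True

formula = [(1, 2, 3), (-1, 2, -3)]
good = {1: True, 2: False, 3: False}
bad = {1: True, 2: False, 3: True}   # second clause has only false literals
```

Note that flipping every variable of an nae assignment again gives an nae assignment, a symmetry that plain 3sat does not have.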

We will also need the following restriction of nae 3sat. For integers \(p, q\ge 2\), let (p, q)-3sat denote the problem of deciding if a 3-cnf formula in which each variable occurs at most p times positively and at most q times negatively is satisfiable. (p, q)-nae 3sat is defined analogously. A reduction from 3sat, linear in the number of clauses, due to Tovey [30] shows that (2, 2)-3sat remains \(\textsf{NP}\)-complete and, assuming ETH, cannot be solved in time \(2^{o(n)}\) for inputs with n variables. Now, the reduction due to Moret [26, Theorem 7.3] mentioned above transforms an instance for (2, 2)-3sat to an equivalent instance for (4, 4)-nae 3sat, linear in the number of variables and clauses. Hence, we obtain:

Theorem 4

([17, 26, 30]) (4, 4)-nae 3sat is \(\textsf{NP}\)-complete and, assuming ETH, cannot be solved in time \(2^{o(n)}\) on inputs with n variables.

Structure of the paper

We first address the polynomial part of Theorems 1 and 2 in the next section. Then we present two new \(\textsf{NP}\)-completeness results for cluster-vd and connected cluster-vd in Sections 4 and 5. These hardness results allow us to clear the \(\textsf{NP}\)-completeness part of Theorems 1 and 2 in Section 6. The last section concludes the paper.

3 H-free Graphs: Polynomial Cases

The polynomial part in Theorems 1 and 2 consists of six cases; see Fig. 1 for all graphs H for which cluster-vd and connected cluster-vd are polynomially solvable on H-free graphs.

Observe that H-freeness is hereditary, meaning if \(H'\) is an induced subgraph of H then \(H'\)-free graphs are H-free graphs. Thus, it suffices to prove the polynomial part only for the case where H is the 4-vertex path \(P_4\).

The proof will follow from the concept of clique-width of graphs in connection with the so-called monadic second-order logic, \(MSOL_1\) for short, an extension of first-order logic with quantification over vertex set variables. Briefly, the clique-width of a graph G, introduced in [8], is the minimum number of labels needed to construct G by:

  • creating a new vertex with label i,

  • taking a disjoint union of two labeled graphs,

  • joining every vertex with label i to every vertex with label \(j\not = i\), and

  • renaming label i to label j.

Fig. 1. The graphs H for which cluster-vd and connected cluster-vd are polynomially solvable on H-free graphs

Such a construction with k labels defines an algebraic k-expression. A well-known meta-theorem by Courcelle, Makowsky and Rotics [9] states that any graph property expressible in \(MSOL_1\) is decidable in linear time for graphs with bounded clique-width, provided a k-expression of the graphs is given. It is well known that \(P_4\)-free graphs, also known as cographs, have clique-width at most 2 and a corresponding 2-expression can be constructed in linear time (see, e.g., [9]). Hence, any \(MSOL_1\) graph property is decidable in linear time when restricted to \(P_4\)-free graphs.
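The four operations can be mimicked directly in code. The sketch below is a toy evaluator; the function names `create`, `union`, `join` and `rename` and the representation of a labeled graph as a (labels, edges) pair are assumptions for illustration. It builds the 4-cycle, a cograph, using only the labels 1 and 2.

```python
def create(v, label):
    """A single vertex v carrying the given label."""
    return ({v: label}, set())

def union(g1, g2):
    """Disjoint union of two labeled graphs (vertex names must differ)."""
    return ({**g1[0], **g2[0]}, g1[1] | g2[1])

def join(g, i, j):
    """Join every vertex with label i to every vertex with label j != i."""
    labels, edges = g
    new = {frozenset((u, v)) for u in labels for v in labels
           if labels[u] == i and labels[v] == j}
    return (labels, edges | new)

def rename(g, i, j):
    """Rename label i to label j."""
    labels, edges = g
    return ({v: (j if lab == i else lab) for v, lab in labels.items()}, edges)

# C_4 as the join of two edgeless graphs {a, b} and {c, d}.
left = union(create("a", 1), create("b", 1))
right = union(create("c", 2), create("d", 2))
c4 = join(union(left, right), 1, 2)
# Renaming label 2 to label 1 frees label 2 for further construction steps.
c4_relabeled = rename(c4, 2, 1)
```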

Now, being a cluster vertex deletion set is an \(MSOL_1\) property:

$$\begin{aligned}&\forall u, v, w \big (\lnot S(u)\wedge \lnot S(v)\wedge \lnot S(w) \wedge E(u,v)\wedge E(v,w)\wedge (u\not = w) \rightarrow E(u,w)\big ), \end{aligned}$$

where S(x) means \(x\in S\) and E(x, y) means \(xy\in E(G)\). (The sentence says that the graph \(G-S\) is \(P_3\)-free.)

Also, the fact that the vertex set S in a graph G induces a connected subgraph of G can be written as an \(MSOL_1\) sentence:

$$\begin{aligned}&\forall T\subseteq S \Big ((T\not =\emptyset \wedge S\setminus T\not =\emptyset ) \rightarrow \big (\exists u\in S\setminus T,\, \exists v\in T:\, E(u,v)\big )\Big ). \end{aligned}$$

(The sentence says that, for any bipartition of S into two non-empty sets, there is an edge joining two vertices in different parts of the bipartition.)
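The bipartition reading of the sentence can be evaluated literally on small examples. The following sketch does so by enumerating all bipartitions of S into two non-empty parts; the helper name `is_connected_by_cuts` is hypothetical, and the enumeration is exponential in |S|, unlike a standard graph search.

```python
from itertools import combinations

def is_connected_by_cuts(S, edges):
    """G[S] is connected iff every bipartition of S into two non-empty
    parts is crossed by at least one edge."""
    S = list(S)
    E = {frozenset(e) for e in edges}
    for size in range(1, len(S)):            # T non-empty and S - T non-empty
        for T in combinations(S, size):
            rest = [u for u in S if u not in T]
            if not any(frozenset((u, v)) in E for u in rest for v in T):
                return False
    return True

path_edges = [(0, 1), (1, 2)]
```

On the path 0-1-2, the set {0, 1, 2} passes the test while {0, 2} fails, since the bipartition into {0} and {2} is crossed by no edge.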

Thus, cluster-vd and connected cluster-vd can be solved in linear time on \(P_4\)-free graphs. Indeed, we have a stronger fact. The weighted optimization versions of cluster-vd and connected cluster-vd, minimum cluster-vd and minimum connected cluster-vd, are \(LinEMSOL_{\tau _{1,p}}\) problems (\(LinEMSOL_{\tau _{1,p}}\) is an extension of \(MSOL_1\) which allows one to search for optimal sets of vertices with respect to some linear objective function). We refer to the paper [9] for details, in which it is shown that every \(LinEMSOL_{\tau _{1,p}}\) problem on \(P_4\)-free graphs can be solved in linear time [9, Theorem 4]. To sum up, we have:

Proposition 5

cluster-vd and connected cluster-vd can be solved in linear time on \(P_4\)-free graphs, even in the weighted optimization version.

Another approach for obtaining the above results is to use the so-called cotree of cographs. Using the cotree of a cograph G, we are able to compute an optimal (connected) cluster vertex deletion set of G in linear time in a direct and simple way. The details are given in Appendices A and B.
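As an illustration of the cotree idea, the sketch below computes the size of an optimal cluster vertex deletion set for the unweighted, non-connected variant only. It is a sketch under our own assumptions and not necessarily the algorithm of the appendices: the nested-tuple encoding of cotrees and the names `solve` and `min_cluster_vd_size` are hypothetical. It uses the observation that, in a join of cographs, an induced cluster graph meeting at least two parts is connected and hence a clique.

```python
def solve(node):
    """Return (n, omega, best) for the cograph of this cotree node:
    vertex count, maximum clique size, and the maximum number of
    vertices inducing a cluster graph."""
    kind = node[0]
    if kind == "leaf":
        return 1, 1, 1
    parts = [solve(child) for child in node[1:]]
    n = sum(p[0] for p in parts)
    if kind == "union":
        # Cliques live in one part; cluster graphs combine across parts.
        return n, max(p[1] for p in parts), sum(p[2] for p in parts)
    if kind == "join":
        # Either stay inside one part, or take a clique across all parts.
        omega = sum(p[1] for p in parts)
        return n, omega, max(omega, max(p[2] for p in parts))
    raise ValueError(f"unknown cotree node kind: {kind}")

def min_cluster_vd_size(cotree):
    n, _, best = solve(cotree)
    return n - best

leaf = ("leaf",)
c4 = ("join", ("union", leaf, leaf), ("union", leaf, leaf))
claw = ("join", leaf, ("union", leaf, leaf, leaf))
```

Each cotree node is visited once, so the sketch runs in time linear in the size of the cotree.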

4 Cluster-VD and Connected Cluster-VD on Dense Graphs

In this section, we give a polynomial reduction from vertex cover to cluster-vd, showing that cluster-vd remains \(\textsf{NP}\)-complete when restricted to \(\{3P_1, 2P_2\}\)-free n-vertex graphs with minimum degree at least \(n-4\).

Recall that the vertex cover problem asks, for a given graph G and an integer k, if one can delete a vertex set S of size at most k such that \(G-S\) is edgeless. It is well known that vertex cover is \(\textsf{NP}\)-complete and, assuming ETH, cannot be solved in \(2^{o(n+m)}\) time on n-vertex m-edge graphs. This fact and a result in [18] imply that, assuming ETH, vertex cover cannot be solved in \(2^{o(n)}\) time on subcubic n-vertex graphs. There is a polynomial-time reduction from vertex cover in cubic graphs to vertex cover in subcubic planar graphs with arbitrarily large girth, which transforms an instance (G, k) of the first version to an equivalent instance \((G',k')\) for the second version, where the vertex number of \(G'\) is linear in the vertex number of G (see, e.g., [28] or [21]). Thus, we obtain:

Theorem 6

([18, 21, 28]) Let \(g\ge 3\) be a fixed integer. vertex cover is \(\textsf{NP}\)-complete even when restricted to subcubic graphs of girth \(>g\) and, assuming ETH, vertex cover cannot be solved in \(2^{o(n)}\) time in this restricted graph class.

We now describe the announced reduction. Let \(g\ge 3\) be an integer and let (G, k) be an instance for vertex cover, where G is an n-vertex subcubic graph with girth \(>g\). We may assume that

  • G is not perfect. This is because vertex cover is polynomially solvable on perfect graphs (see [12]); notice that G is perfect if and only if \(\overline{G}\) is perfect and perfect graphs can be recognized in polynomial time [5], and

  • \(k\le |V(G)|/2\). This fact can be easily seen as follows: given G with n vertices and an integer k, let \(G'\) be obtained from G by adding \(p=\max \{0,2k-n\}\) isolated vertices. Then \(k\le |V(G')|/2\) and \((G,k)\in \textsc {vertex cover} \) if and only if \((G',k)\in \textsc {vertex cover} \). Notice that like G, \(G'\) is subcubic, not perfect and has girth \(>g\), too.

From (G, k) we construct an equivalent instance \((G',k')\) for cluster-vd as follows: \(G'\) is obtained from two disjoint copies of \(\overline{G}\), \(G_1\) and \(G_2\), by adding all possible edges between \(V(G_1)\) and \(V(G_2)\). Set \(k'=2k\).

We argue that \((G,k)\in \textsc {vertex cover} \) if and only if \((G',k')\in \textsc {cluster-vd} \). First, let \(S\subset V(G)\) be a vertex cover, that is \(G-S\) is edgeless, with \(|S|\le k\). Let \(S_1\) and \(S_2\) be the copies of S in \(G_1\) and \(G_2\), respectively. Then, for each \(i\in \{1,2\}\), \(G_i-S_i\) is a clique in \(G_i=\overline{G}\), and with \(S'=S_1\cup S_2\), \(G'-S'\) is a clique in \(G'\) with \(|S'|=2|S|\le 2k=k'\).

Conversely, let \(S'\subseteq V(G')\) be a cluster vertex deletion set of \(G'\) with \(|S'|\le k'=2k\). Observe that, for each \(i\in \{1,2\}\), \(S'\cap V(G_i)\) is a proper nonempty subset of \(V(G_i)\): if for some i, \(S'\cap V(G_i)=\emptyset \) then \(G_i\) (hence G) would be perfect because in this case \(G_i\) would be a cluster graph, and if \(V(G_i)\subseteq S'\) then, since \(S'\) also meets the other copy, \(2k\ge |S'|>|V(G_i)|=|V(G)|\), contradicting \(k\le |V(G)|/2\). Since all edges between \(V(G_1)\) and \(V(G_2)\) are present, \(G'-S'\) is connected and hence a single clique, implying that for each \(i\in \{1,2\}\), \(G_i-S_i\) is a clique in \(G_i\) where \(S_i=S'\cap V(G_i)\). Since \(|S'|\le 2k\), \(|S_1|\le k\) or \(|S_2|\le k\). Let \(|S_1|\le k\), say, and let \(S\subseteq V(G)\) be the set of the corresponding vertices in G. Then \(G-S\) is edgeless with \(|S|\le k\).

We have seen that G has a vertex cover of size at most k if and only if \(G'\) has a cluster vertex deletion set of size at most \(k'\), as claimed.
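The construction can be tried out on a small instance. The sketch below builds \(G'\) from G and compares optimal solution sizes by exhaustive search; all function names are hypothetical, and the brute-force search is only feasible for tiny graphs. The test graph, \(C_5\) plus one isolated vertex with \(k=3\), is chosen so that both assumptions of the reduction hold (G is not perfect and \(k\le |V(G)|/2\)).

```python
from itertools import combinations

def reduce_vc_to_cluster_vd(n, edges, k):
    """Map (G, k) to (G', 2k); G has vertices 0..n-1, G' has vertices (copy, v)."""
    E = {frozenset(e) for e in edges}
    comp = [p for p in combinations(range(n), 2) if frozenset(p) not in E]
    new_edges = [((i, u), (i, v)) for i in (1, 2) for u, v in comp]
    # All possible edges between the two copies of the complement of G.
    new_edges += [((1, u), (2, v)) for u in range(n) for v in range(n)]
    vertices = [(i, v) for i in (1, 2) for v in range(n)]
    return vertices, new_edges, 2 * k

def is_edgeless(rest, edges):
    return not any(u in rest and v in rest for u, v in edges)

def is_cluster_graph(rest, edges):
    adj = {v: set() for v in rest}
    for u, v in edges:
        if u in rest and v in rest:
            adj[u].add(v)
            adj[v].add(u)
    # No vertex may have two non-adjacent neighbours (an induced P_3).
    return all(b in adj[a] for v in rest for a, b in combinations(adj[v], 2))

def smallest_deletion(vertices, edges, good):
    """Brute force: least |S| such that good(V - S, edges) holds."""
    vertices = list(vertices)
    for size in range(len(vertices) + 1):
        for S in combinations(vertices, size):
            if good(set(vertices) - set(S), edges):
                return size

# G = C_5 plus one isolated vertex (vertex 5); n = 6 and k = 3 <= n/2.
n, k = 6, 3
g_edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
vc = smallest_deletion(range(n), g_edges, is_edgeless)
v2, e2, k2 = reduce_vc_to_cluster_vd(n, g_edges, k)
cvd = smallest_deletion(v2, e2, is_cluster_graph)
```

Here G has a vertex cover of size 3 but none of size 2, and correspondingly \(G'\) has a cluster vertex deletion set of size 6 but none smaller.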

Note that \(G'\) has 2n vertices and minimum degree at least \(2n-4\) (as G has n vertices and maximum degree at most 3). Now, observe that, for any connected graph X, if G is X-free then \(G'\) is \(\overline{X}\)-free. Since G is \(\{C_3,C_4,\ldots ,C_g\}\)-free, we obtain with Theorem 6:

Theorem 7

For any fixed \(g\ge 3\), cluster-vd is \(\textsf{NP}\)-complete on \(\{\overline{C_3}, \overline{C_4}, \ldots , \overline{C_g}\}\)-free n-vertex graphs with minimum degree at least \(n-4\) and, assuming ETH, cannot be solved in \(2^{o(n)}\) time.

In particular, cluster-vd is \(\textsf{NP}\)-complete on \(\{3P_1, 2P_2\}\)-free graphs and, assuming ETH, cannot be solved in \(2^{o(n)}\) time.

We observe that the proof of Theorem 7 remains true for connected cluster vertex deletion sets: G has a vertex cover of size at most \(k\le |V(G)|/2\) if and only if \(G'\) has a connected cluster vertex deletion set of size at most \(k'=2k\). Thus, Theorem 7 also holds for connected cluster-vd:

Theorem 8

For any fixed \(g\ge 3\), connected cluster-vd is \(\textsf{NP}\)-complete on \(\{\overline{C_3}, \overline{C_4}, \ldots , \overline{C_g}\}\)-free n-vertex graphs with minimum degree at least \(n-4\) and, assuming ETH, cannot be solved in \(2^{o(n)}\) time.

In particular, connected cluster-vd is \(\textsf{NP}\)-complete on \(\{3P_1, 2P_2\}\)-free graphs and, assuming ETH, cannot be solved in \(2^{o(n)}\) time.

5 Cluster-VD and Connected Cluster-VD on Sparse Graphs

In [33, Lemma 1], Yannakakis gave a polynomial-time reduction from nae 3sat to cluster-vd, which transforms an instance for nae 3sat with n variables and m clauses, into an equivalent instance (G, k) for cluster-vd, where G is a bipartite graph with \(6n+12m\) vertices. Thus, by Theorem 3, cluster-vd is \(\textsf{NP}\)-complete even when restricted to bipartite graphs and, assuming ETH, cluster-vd cannot be solved in \(2^{o(n)}\) time on bipartite graphs with n vertices.

We remark that by considering (4, 4)-nae 3sat instead of nae 3sat, the bipartite graph obtained from the reduction of Yannakakis mentioned above has maximum degree at most four. Thus, by Theorem 4, we obtain:

Theorem 9

([33]) cluster-vd is \(\textsf{NP}\)-complete even when restricted to n-vertex bipartite graphs of maximum degree at most 4 and, assuming ETH, cannot be solved in \(2^{o(n)}\) time.

In [14], Hsieh, Le, Le and Peng gave another polynomial-time reduction from nae 3sat to cluster-vd, which transforms an instance for nae 3sat with n variables and m clauses, into an equivalent instance (G, k) for cluster-vd, where G is a subcubic bipartite graph with \(6nm+30m\) vertices. Recall that we may assume (by the Sparsification Lemma) that \(m=O(n)\). Thus, by Theorem 3, we obtain:

Theorem 10

([14]) cluster-vd is \(\textsf{NP}\)-complete even when restricted to subcubic n-vertex bipartite graphs and, assuming ETH, cannot be solved in time \(2^{o(\sqrt{n})}\).

In this section, we will further improve Theorems 9 and 10 by Theorems 12 and 13, respectively. We begin with the following fact.

Lemma 11

Given a graph G, let \(G'\) be obtained from G by subdividing each edge \(e=xy\) in G with three new vertices \(e_x, e_{xy}\) and \(e_y\), thus obtaining the 5-vertex path \(xe_xe_{xy}e_yy\) in \(G'\) in which all new vertices are of degree 2. Assuming G is triangle-free, G has a cluster vertex deletion set of size at most k if and only if \(G'\) has a cluster vertex deletion set of size at most \(k+m\), where m is the edge number of G.

Proof

Observe that since G is triangle-free, any induced subgraph of G that is a cluster graph is a collection of isolated vertices and edges.

For one direction, extend a cluster vertex deletion set \(S\subseteq V(G)\) to a cluster vertex deletion set \(S'\subseteq V(G')\) of size \(|S|+m\) as follows; see also Fig. 2: initially, set \(S'=S\). Then, for each edge \(e=xy\) in G,

  • if both x and y are in S or outside S, put \(e_{xy}\) into \(S'\);

  • if \(x\in S\) and \(y\notin S\), put \(e_y\) into \(S'\);

  • if \(x\notin S\) and \(y\in S\), put \(e_x\) into \(S'\).

To see that \(G'-S'\) is \(P_3\)-free, notice that by construction, for each edge \(e=xy\) in G, exactly one of \(e_x, e_{xy}\) and \(e_y\) is in \(S'\), and if \(e_x, e_{xy}\notin S'\) then \(x\in S\), and if \(e_x, x\notin S'\) then \(y\notin S\), hence \(e_{xy}\in S'\). Since each \(P_3\) in \(G'\) has the form \(xe_xe_{xy}\), \(e_xe_{xy}e_y\) or \(e_xxe'_x\) for some edge \(e=xy\) and \(e'=xz\), it follows from these facts and the assumption that G is triangle-free that \(G'-S'\) is \(P_3\)-free.

Fig. 2. Proof of Lemma 11 illustrated: A triangle-free graph G (left) with two highlighted edges \(e=xy\) and \(e'=xz\), and the graph \(G'\) obtained from G as described in Lemma 11 (right); the cluster vertex deletion set \(S=\{x,y\}\) of G is extended to the cluster vertex deletion set \(S'\) of \(G'\) consisting of the nine black vertices

For the other direction, suppose that \(G'\) has a cluster vertex deletion set of size at most \(k+m\), and consider such a set \(S'\) of minimum size. Then, we may assume that, for each edge \(e=xy\) in G, \(S'\) contains exactly one of \(e_x, e_{xy}\) and \(e_y\): note that \(e_xe_{xy}e_y\) is a \(P_3\), hence \(|S'\cap \{e_x,e_{xy},e_y\}|\ge 1\), and by minimality, \(|S'\cap \{e_x,e_{xy},e_y\}|\le 2\). Now, if \(|S'\cap \{e_x,e_{xy},e_y\}| = 2\) for some edge \(e=xy\) in G, then \(S'\) can be modified to a minimum cluster vertex deletion set containing exactly one of \(e_x, e_{xy}\) and \(e_y\) as follows:

  • suppose that \(e_x, e_{xy}\in S'\). Then \(x, y\not \in S'\) (if \(x\in S'\) then \(S'-e_x\) would be a cluster vertex deletion set of \(G'\), and if \(y\in S'\) then \(S'-e_{xy}\) would be a cluster vertex deletion set of \(G'\), contradicting the minimality of \(S'\)), and \(S''=S'-e_{xy}+y\) is the desired cluster vertex deletion set of minimum size;

  • suppose that \(e_y, e_{xy}\in S'\). Then similar to the above case, \(x, y\not \in S'\), and \(S''=S'-e_{xy}+x\) is the desired cluster vertex deletion set of minimum size;

  • suppose that \(e_x,e_y\in S'\). Then \(x,y\notin S'\) (if \(x\in S'\) or \(y\in S'\) then \(S''=S'-e_x\), respectively \(S''=S'-e_y\), would be a cluster vertex deletion set of \(G'\), contradicting the minimality of \(S'\)), and \(S''=S'-e_x+x\) is the desired cluster vertex deletion set of minimum size.

Hence, \(S=S'\cap V(G)\) has at most k vertices, and \(G-S\) is \(P_3\)-free: if there were an induced \(P_3\) xyz in G with edges \(e=xy\) and \(e'=yz\), then, as \(|S'\cap \{e_x,e_{xy},e_y\}|=1=|S'\cap \{e'_y,e'_{yz},e'_z\}|\), one of the 3-paths \(xe_xe_{xy}\), \(e_yye'_y\) and \(e'_{yz}e'_zz\) would be outside \(S'\).

Thus, G has a cluster vertex deletion set of size at most k if and only if \(G'\) has a cluster vertex deletion set of size at most \(k+m\), as claimed. \(\square \)
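Lemma 11 can be checked experimentally on small triangle-free graphs. The sketch below performs the subdivision and compares optimal solution sizes by brute force; the function names and the tuple naming of the new vertices are assumptions for illustration, and vertices are taken to be the endpoints of the given edges (so G has no isolated vertices).

```python
from itertools import combinations

def subdivide(edges):
    """Replace every edge xy by the path x, e_x, e_xy, e_y, y."""
    new_edges = []
    for idx, (x, y) in enumerate(edges):
        ex, exy, ey = ("e", idx, "x"), ("e", idx, "xy"), ("e", idx, "y")
        new_edges += [(x, ex), (ex, exy), (exy, ey), (ey, y)]
    return new_edges

def is_cluster_graph(rest, edges):
    adj = {v: set() for v in rest}
    for u, v in edges:
        if u in rest and v in rest:
            adj[u].add(v)
            adj[v].add(u)
    # No vertex may have two non-adjacent neighbours (an induced P_3).
    return all(b in adj[a] for v in rest for a, b in combinations(adj[v], 2))

def min_cluster_vd_size(edges):
    """Brute-force optimum over the graph spanned by the given edges."""
    vertices = list({v for e in edges for v in e})
    for size in range(len(vertices) + 1):
        for S in combinations(vertices, size):
            if is_cluster_graph(set(vertices) - set(S), edges):
                return size

c4 = [(0, 1), (1, 2), (2, 3), (3, 0)]        # triangle-free, m = 4
k_opt = min_cluster_vd_size(c4)              # optimum for G
k_sub = min_cluster_vd_size(subdivide(c4))   # optimum for G', here C_16
```

For \(G=C_4\) the subdivided graph is \(C_{16}\), and the two optima differ by exactly \(m=4\), as the lemma predicts.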

We now show that, for any given tree T containing two vertices of degree 3, cluster-vd remains \(\textsf{NP}\)-complete when restricted to T-free bipartite graphs of maximum degree 4 and with arbitrarily large girth.

Theorem 12

For any given integer \(g\ge 3\) and any given tree T containing two degree-3 vertices, cluster-vd is \(\textsf{NP}\)-complete on T-free n-vertex bipartite graphs of maximum degree at most 4 and with girth \(>g\) and, assuming ETH, cannot be solved in \(2^{o(n)}\) time.

Proof

Note that cluster-vd restricted to the graph class in question is in \(\textsf{NP}\). Below we give a polynomial-time reduction from cluster-vd restricted to bipartite graphs of degree at most 4 to cluster-vd restricted to T-free bipartite graphs of degree at most 4 and with arbitrarily large girth.

First, given a bipartite graph G of maximum degree at most 4 with n vertices and m edges, let \(G'\) be obtained from G by subdividing the edges as described in Lemma 11. Note that like G, \(G'\) is bipartite and has maximum degree at most 4. By Lemma 11, G has a cluster vertex deletion set of size at most k if and only if \(G'\) has a cluster vertex deletion set of size at most \(k+m\).

Now, given \(g>0\) and a tree T with two degree-3 vertices, fix an integer \(t\ge \max \{\log _4 g{,} |V(T)|\}\). Then, repeating the construction in Lemma 11 t times, the final bipartite graph \(G'\) has girth \(4^t\cdot girth(G) > g\) and maximum degree at most 4, and contains no induced subgraph isomorphic to T (as the distance between two degree-3 vertices in \(G'\) is larger than |V(T)|). Thus the \(\textsf{NP}\)-hardness part of the theorem follows from the first part of Theorem 9. Note that \(G'\) has \(n+(4^t-1)m=O(n)\) vertices, hence, the second part of the theorem follows from the second part of Theorem 9. \(\square \)
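The vertex count used in the proof can be verified by a short recurrence. Write \(n_i\), \(m_i\) and \(k_i\) for the vertex number, edge number and budget after i applications of Lemma 11 (this notation is introduced here only for the calculation), with \(n_0=n\), \(m_0=m\) and \(k_0=k\). Each application replaces every edge by a path with four edges and three new vertices, and raises the budget by the current number of edges, so

$$\begin{aligned} m_i&=4m_{i-1}=4^i m,\qquad n_i=n_{i-1}+3m_{i-1},\qquad k_i=k_{i-1}+m_{i-1},\\ n_t&=n+3m\sum _{i=0}^{t-1}4^i=n+(4^t-1)m,\qquad k_t=k+m\sum _{i=0}^{t-1}4^i=k+\frac{4^t-1}{3}\,m. \end{aligned}$$

Since G has maximum degree at most 4, we have \(m\le 2n\), and as t is a constant depending only on g and T, this gives \(n_t\le \big (1+2(4^t-1)\big )n=O(n)\).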

Observe that if we consider subcubic bipartite graphs and make use of Theorem 10 instead of Theorem 9 in the proof of Theorem 12, we obtain:

Theorem 13

For any given integer \(g\ge 3\) and any given tree T containing two degree-3 vertices, cluster-vd is \(\textsf{NP}\)-complete on T-free subcubic bipartite graphs with girth \(>g\) and, assuming ETH, cannot be solved in \(2^{o(\sqrt{n})}\) time.

Fig. 3. The tree H(g, r, s). The \((g+2)n\) black vertices form an optimal (connected) cluster vertex deletion set

We are now going to show that connected cluster-vd remains \(\textsf{NP}\)-complete when restricted to bipartite graphs with arbitrarily large girth. (Notice that a reduction based on Lemma 11, similar to the reduction in Theorem 12, does not work for connected cluster-vd.) Let \(g>0\) be a given integer. From an instance (G, k) of cluster-vd, where \(G=(X\cup Y,E)\) is a bipartite graph with girth \(>g\), we construct an instance \((G(g),k')\), where G(g) is a bipartite graph of girth \(>g\), for connected cluster-vd as follows:

  • We may assume that g is odd (otherwise, replace g by \(g+1\));

  • Write \(X=\{x_1,x_2,\ldots ,x_r\}\), \(Y=\{y_1,y_2,\ldots ,y_s\}\), and \(n=r+s\);

  • Let H(g, r, s) be the tree depicted in Fig. 3; note that H(g, r, s) has \(6r+3gr+6s+3gs=(6+3g)n\) vertices. The property of H(g, r, s) that will be used is that the set of all degree-3 vertices of H(g, r, s), that is all \(x_{ig}\), \(1\le i \le r\), and all \(y_{jg}\), \(1\le j\le s\), is both an optimal cluster vertex deletion set and the unique connected cluster vertex deletion set. The vertices \(x_{ig}\) and \(y_{jg}\) will have degree 3 in the whole graph G(g). In Fig. 3 the unique connected cluster vertex deletion set contains the \((g + 2)n\) black vertices.

Then, let G(g) be obtained from G and H(g, r, s) by adding an edge between \(x_i\) and \(x_{ig}\), \(1\le i\le r\), and between \(y_j\) and \(y_{jg}\), \(1\le j\le s\). Note that like G, G(g) is bipartite (as g is odd) and has \(n'=n+(6+3g)n=(7+3g)n\) vertices. See Fig. 4 for an example in case \(g=3\). Finally, set \(k'=k+(g+2)n\). Clearly, \((G(g),k')\) can be constructed in polynomial time from (G, k).

Fig. 4. An example of the reduction from cluster-vd to connected cluster-vd: A bipartite graph G (left) and the bipartite graph G(3) (right) obtained from G and H(3, 4, 3); the bipartition of the vertex set is indicated by circle and rectangle vertices

Now, let S be a cluster vertex deletion set of G of size at most k. Then G(g) has a connected cluster vertex deletion set \(S'\) of size \(|S|+(g+2)n\le k'\): \(S'\) is obtained from S by adding all vertices of H(g, r, s) with degree 3 in G(g) (the \((g+2)n\) black vertices in Fig. 3). Observe that \(S'\) induces a connected subgraph in G(g) since every vertex in S is adjacent to some \(x_{ig}\) or \(y_{jg}\), and all vertices of H(g, r, s) with degree 3 in G(g) induce a connected subgraph in G(g).

Conversely, let \(S'\) be a (connected or not) cluster vertex deletion set of G(g) of size at most \(k'\). Since every vertex u in H(g, r, s) with degree 3 in G(g) (the black vertices in Fig. 3) belongs to an induced \(P_3=uvw\) in H(g, r, s) with \(\deg _{G(g)}(v)=2\) and \(\deg _{G(g)}(w)=1\), we may assume that \(S'\) contains all \((g+2)n\) vertices of H(g, r, s) with degree 3 (and no other vertices of H(g, r, s)). Let S be the restriction of \(S'\) on V(G). Then S is a cluster vertex deletion set of G of size \(|S|=|S'|-(g+2)n\le k\).

Observe that the girth of G(g) is at least \(\max \{girth(G), 2g+6\}>g\) and the maximum degree of G(g) is one more than the maximum degree of G. Hence, by Theorems 12 and 13, we obtain:

Theorem 14

For any given integer \(g\ge 3\), connected cluster-vd is \(\textsf{NP}\)-complete on bipartite graphs of maximum degree at most 5 and with girth \(>g\) and, assuming ETH, cannot be solved in \(2^{o(n)}\) time.

Theorem 15

For any given integer \(g\ge 3\), connected cluster-vd is \(\textsf{NP}\)-complete on bipartite graphs of maximum degree at most 4 and with girth \(>g\) and, assuming ETH, cannot be solved in \(2^{o(\sqrt{n})}\) time.

6 H-free Graphs: \(\textsf{NP}\)-completeness Cases

In this section we give the proof of the \(\textsf{NP}\)-completeness part of Theorems 1 and 2.

Let H be a fixed graph. By Proposition 5, cluster-vd is polynomially solvable on H-free graphs whenever H is an induced subgraph of the 4-vertex path \(P_4\). The following fact is easy to see:

Observation 16

A graph is an induced subgraph of the 4-path \(P_4\) if and only if it is a \(\{3P_1, 2P_2\}\)-free forest.
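The observation can be confirmed mechanically for small graphs. The sketch below enumerates all labeled graphs on at most five vertices and compares both sides of the equivalence; the function names are hypothetical and the enumeration is a sanity check, not a proof. Larger graphs need no check: a forest on at least five vertices is bipartite with a colour class of size at least three, hence contains a \(3P_1\).

```python
from itertools import combinations, permutations

P4_EDGES = {frozenset(e) for e in [(0, 1), (1, 2), (2, 3)]}
TWO_P2 = {frozenset((0, 1)), frozenset((2, 3))}

def is_forest(n, E):
    """Union-find cycle detection on vertices 0..n-1."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for e in E:
        u, v = tuple(e)
        ru, rv = find(u), find(v)
        if ru == rv:
            return False
        parent[ru] = rv
    return True

def has_induced(n, E, k, pattern):
    """Does the graph (n, E) contain an induced copy of the k-vertex pattern?"""
    for image in permutations(range(n), k):
        if all((frozenset((image[a], image[b])) in E) == (frozenset((a, b)) in pattern)
               for a, b in combinations(range(k), 2)):
            return True
    return False

def check_observation(max_n=5):
    for n in range(1, max_n + 1):
        pairs = list(combinations(range(n), 2))
        for mask in range(1 << len(pairs)):
            E = {frozenset(p) for i, p in enumerate(pairs) if mask >> i & 1}
            # Left side: the graph embeds into P_4 as an induced subgraph.
            lhs = n <= 4 and has_induced(4, P4_EDGES, n, E)
            # Right side: a forest without induced 3P_1 and 2P_2.
            rhs = (is_forest(n, E)
                   and not has_induced(n, E, 3, set())
                   and not has_induced(n, E, 4, TWO_P2))
            if lhs != rhs:
                return False
    return True
```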

Thus, it remains to consider the cases where H contains a cycle or a \(3P_1\) or a \(2P_2\) as an induced subgraph.

Now, if H contains a cycle then graphs of girth \(> g=|V(H)|\) are H-free, hence Theorems 12 and 14 imply that cluster-vd and connected cluster-vd are \(\textsf{NP}\)-complete on H-free graphs and, assuming ETH, cannot be solved in \(2^{o(n)}\) time on H-free n-vertex graphs. If H contains a \(3P_1\) or a \(2P_2\) then \(\{3P_1, 2P_2\}\)-free graphs are H-free graphs, hence Theorems 7 and 8 imply that cluster-vd and connected cluster-vd are \(\textsf{NP}\)-complete on H-free graphs and, assuming ETH, cannot be solved in \(2^{o(n)}\) time on H-free n-vertex graphs.

The proofs of Theorems 1 and 2 are complete.

7 Conclusion

We have found a complete characterization of graphs H for which cluster-vd on H-free graphs is polynomially solvable and for which it is \(\textsf{NP}\)-complete (Theorem 1). The same complexity dichotomy holds also for connected cluster-vd (Theorem 2).

We remark that a complexity dichotomy for vertex cover and connected vertex cover on H-free graphs, like Theorems 1 and 2 for cluster-vd and connected cluster-vd, respectively, seems very hard to achieve. Indeed, it is a long-standing open problem whether there exists a constant t for which vertex cover or connected vertex cover is \(\textsf{NP}\)-complete on \(P_t\)-free graphs. So far it is known that such a constant t, if any, must be at least 7 for vertex cover [13], respectively, at least 6 for connected vertex cover [19].

Let \(\mathcal H\) be a set of (possibly infinitely many) graphs. A natural question generalizing the case of one forbidden induced subgraph is: what is the complexity of cluster-vd and of connected cluster-vd on \(\mathcal{H}\)-free graphs? The case \(\mathcal{H}=\{H\}\) is completely solved by Theorems 1 and 2. The case \(\mathcal{H}=\{C_\ell \mid \ell \ge 4\}\), also known as chordal graphs, addressed in [3] is still open. The next step may be the case of two-element sets \(\mathcal{H} =\{H_1,H_2\}\); in particular, \(\mathcal{H} =\{H,\overline{H}\}\). Another interesting problem is to clear the complexity of cluster-vd and connected cluster-vd on line graphs, a well-studied graph class defined by excluding nine small induced subgraphs.