1 Introduction

Faliszewski et al. [18] propose a fictitious election of the chair of the Computer Science Department at a faraway university. We modify and expand their example to illustrate the problems we are going to study.

The department consists of three research teams and each one can nominate exactly one of two candidates: a or \(a'\) for group A(rtificial Intelligence), b or \(b'\) for B(usiness Informatics), and c or \(c'\) for C(omputer Networks). The preferences of the members of the department are as follows:

$$\begin{aligned} \begin{array}{cc} \text {Group 1 (18 members) } &{} b\succ a \succ c'\succ a'\succ b'\succ c \\ \text {Group 2 (10 members) } &{} b'\succ a \succ c'\succ a'\succ c\succ b \\ \text {Group 3 ( 8 members) } &{} a'\succ c \succ a\succ b\succ b'\succ c' \\ \text {Group 4 (20 members) } &{} c'\succ a \succ c\succ a'\succ b'\succ b \\ \end{array} \end{aligned}$$

The voting rule has not been decided yet, so the research teams face a difficult decision: which candidate should each of them nominate to secure victory for itself? As there are two possible candidates for each of the three teams, there are eight possible candidate sets. Moreover, the winner of the election depends on the voting rule used.

Table 1 Election winners for different candidate sets and voting rules

Table 1 shows that Team A can ensure the victory of its nominee, irrespective of the nominations of the other teams, if it nominates candidate a, unless Plurality or Plurality with runoff (PRun for short) is the rule used among those listed in Table 1. Its candidate \(a'\) can also possibly win, but only on condition that the other nominees are \(b'\) and c. Team C could get its candidate elected under all voting rules if it nominates candidate \(c'\), provided Team A simultaneously nominates \(a'\). By contrast, Team B cannot ensure the victory of its candidate under any circumstances, irrespective of the voting rule.
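As the body of Table 1 is not reproduced in this text, some of the claims above can be spot-checked by brute force. The following Python sketch enumerates the eight candidate sets and computes the co-winner sets under plurality and Borda only; the remaining rules of Table 1 are not covered. The helper names (`restrict`, `plurality_winners`, `borda_winners`, `winner_table`) are our own illustrative choices.

```python
from itertools import product

# Preference profile of the introductory example: (number of voters, ranking).
# Primed candidates are written as "a'", "b'", "c'".
PROFILE = [
    (18, ["b", "a", "c'", "a'", "b'", "c"]),
    (10, ["b'", "a", "c'", "a'", "c", "b"]),
    (8,  ["a'", "c", "a", "b", "b'", "c'"]),
    (20, ["c'", "a", "c", "a'", "b'", "b"]),
]
TEAMS = [("a", "a'"), ("b", "b'"), ("c", "c'")]


def restrict(profile, nominees):
    """Restrict every ranking to the nominated candidates, keeping their order."""
    return [(w, [x for x in ranking if x in nominees]) for w, ranking in profile]


def winners(scores):
    """Return the set of candidates with the maximum score."""
    best = max(scores.values())
    return {x for x, s in scores.items() if s == best}


def plurality_winners(profile):
    """Each voter gives one point to her top-ranked candidate."""
    scores = {x: 0 for x in profile[0][1]}
    for w, ranking in profile:
        scores[ranking[0]] += w
    return winners(scores)


def borda_winners(profile):
    """A candidate gets one point per voter for every candidate ranked below her."""
    scores = {x: 0 for x in profile[0][1]}
    for w, ranking in profile:
        for pos, x in enumerate(ranking):
            scores[x] += w * (len(ranking) - 1 - pos)
    return winners(scores)


def winner_table():
    """Map each of the eight candidate sets to (plurality winners, Borda winners)."""
    table = {}
    for nominees in product(*TEAMS):
        reduced = restrict(PROFILE, set(nominees))
        table[nominees] = (plurality_winners(reduced), borda_winners(reduced))
    return table


if __name__ == "__main__":
    for nominees, (plur, borda) in winner_table().items():
        print(nominees, "plurality:", sorted(plur), "Borda:", sorted(borda))
```

For these two rules the sketch agrees with the text: whenever a is nominated, a is the unique Borda winner, whereas under plurality the candidate set consisting of a, b and \(c'\) elects \(c'\).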

In real life, political parties usually consider carefully whom to nominate for important positions in the political structure of the country. This process is often performed through primaries, where voters affiliated with the political party vote over candidates of their own party and the winner of these primaries then competes in a general election [10]. However, not taking into account the preferences of voters outside the party might not be an optimal strategy. Have a look at our example again. Assume that Team C comprises Group 3 and 6 voters from Group 4 and that Borda is used as the voting rule. In the primaries, the 8 members of Group 3 prefer c to \(c'\), while the 6 members from Group 4 prefer \(c'\) to c, so c is chosen. If Team C nominates c, it is sure to lose the election. By contrast, if these voters take into account the preferences of other voters, then under favourable circumstances (when Team A nominates \(a'\)) their candidate \(c'\) might be the winner.

In other cases parties decide their support for a concrete candidate based on information obtained from polls about the preferences of all potential voters, and the final nomination is decided not by an election within the party, but by its leadership. We mention two recent real-life examples, from the home country of most authors of this paper, that illustrate the importance of candidate nominations.

The President of Slovakia has a limited, but not merely ceremonial, role. The president has the power to nominate the prime minister, to veto bills, and to nominate judges on the highest tiers of the judicial branch. The presidential elections in 2019 took place in an atmosphere of anti-government protests following the murder of a young journalist, Ján Kuciak, and his fiancée, Martina Kušnírová. Kuciak had been investigating the alleged ties between government officials and organised crime. SMER, Slovakia’s ruling party at that time, and its leader, Slovakia’s former prime minister Robert Fico, have both been tarnished with corruption claims (Footnote 1). Clearly, too much was at stake. SMER was delaying the announcement of its candidate, considering several possible nominees and testing their expected acceptance by the public, with the clear aim of securing the position in the palace for itself (Footnote 2).

In 2020 Slovakia was electing a new general prosecutor – a key post for the future of law enforcement in the country. After rumours that one of the coalition parties had made a deal with the opposition to support one of the candidates, the prime minister warned that should one of the coalition parties elect the new general prosecutor with the help of the opposition, his party would leave the government and resign [19].

Plurality with run-off is used both in the election of the president and in the election of the general prosecutor in Slovakia, but while the president is elected by popular vote, the general prosecutor is elected by the parliament, which has 150 members.

Faliszewski et al. [18] formally defined the problem of candidate nomination for the election where the set of candidates is split into parties and each party can nominate just one candidate. Given the preferences of all voters over all potential candidates, the question is whether a given party can choose its nominee in such a way that she will be the winner of the election for some nominations of other parties (problem Possible President) or that she will be the winner irrespective of other nominations (problem Necessary President).

Faliszewski et al. [18] explored the computational complexity of these problems for plurality. They showed that Possible President is NP-complete and Necessary President is coNP-complete for general preference profiles. Given these hardness results, they concentrated on structured preferences and showed that Necessary President admits a polynomial-time algorithm for single-peaked profiles. Possible President, however, remains NP-complete even on 1D-Euclidean profiles, although it admits a polynomial-time algorithm if the elections are restricted to single-peaked profiles where the candidates of any party appear consecutively on the societal axis.

In a follow-up paper Misra [29] invoked the framework of parameterized complexity, again for plurality. He showed that Possible President is in XP and also W[2]-hard when parameterized by the number of parties, but FPT when restricted to profiles that are 1D-Euclidean. When parameterized by the size of the largest party, Possible President is para-NP-hard, even when constrained to profiles that are both single-peaked and single-crossing. As far as Necessary President is concerned, he showed that this problem is also in P for single-crossing profiles with no additional requirements.

A different approach to the problem of strategic selection of party nominees was presented in [21]. The authors assumed that each potential party candidate as well as each voter has a predefined position on a one-dimensional political spectrum and a candidate attracts the closest voters compared to all other candidates. The players of the game are parties and they strategise over which candidate to select as their nominee. In this game, the Nash equilibria are not guaranteed to exist even in two party competitions. Finding a Nash equilibrium is NP-complete for the general case, but if there are only two competing parties, this can be achieved in linear time. The model was extended in [12]. Now, each candidate comes at a different cost and the goal of a party is to choose the most profitable nominee, given the nominees chosen by the rest of the parties. The profit of a party is the number of voters closer to their nominee minus its cost. The authors examine the parameterized complexity of deciding whether a pure Nash equilibrium exists for this model. Several FPT and XP algorithms and W[1]-hardness results are presented.

1.1 Our contribution

Let us stress here that all the above works used only plurality as the voting rule. Therefore, we extend our understanding of parties’ candidate nomination problem by exploring the computational complexity of Possible President and Necessary President for several different voting rules: k-approval (its special case is plurality), k-veto, Borda, Plurality with run-off, and three Condorcet-consistent voting rules: Copeland, Llull, and maximin.

We show that for all the considered voting rules Possible President is NP-complete, even when each party has size at most two. Moreover, we show that Necessary President is coNP-complete for k-approval, k-veto and Plurality with run-off. Then we formulate integer programs for the Possible President and Necessary President problems for the studied voting rules and test them on real and artificial data.

1.2 Related work

Our work fits into the framework of strategic candidacy games, introduced in [13]. There is a set of voters and a set of candidates, and the voters as well as the candidates have preferences over all candidates. In the first stage of the game the candidates decide whether to run in the election or not, and then the chosen voting procedure is applied. The authors call a voting rule candidate stable if the joint action where all candidates enter the election is always a pure strategy Nash equilibrium, and they show that no non-dictatorial voting procedure satisfying unanimity is candidate stable.

Lang et al. [23] study Nash equilibria in strategic candidacy games for several voting rules. They show that in the case of 4 candidates a pure strategy Nash equilibrium always exists for Condorcet-consistent rules, but for plurality with at least 4 candidates, and for maximin with at least 5 candidates, there are candidacy games without Nash equilibria. Brill and Conitzer [11] extend the analysis to also include strategic behavior by the voters and study the complexity of computing the set of potential outcomes.

By contrast to the situations when candidates themselves or their parties decide whether to run in the election or whom to nominate, the problem of election control by adding or deleting candidates by a chair or other manipulating entity has been studied for a much longer time. The first work dealing with its complexity was [1]. These authors already showed that the complexity of the control problem may depend on the particular voting rule used. Namely, they showed that the control by adding as well as deleting candidates is NP-hard for plurality, but Condorcet rule is immune to control by adding candidates and computationally vulnerable to control by deleting candidates. As there exists a huge number of works dealing with various special cases of this type of election control, we mention here just a monograph chapter [16] and a recent paper [15] giving a detailed classification and overview of known results concerning various constructive (a given candidate is meant to be the winner) as well as destructive (the aim is to prevent a certain candidate from winning) control problems by replacing, adding and deleting candidates or voters for the most prominent voting rules.

For hard control problems alternative solution methods have been proposed. We have been mostly inspired by the following papers. [7] investigates the parameterized complexity of control problems for Llull and Copeland voting rules by representing the elections in a form of directed graphs. In our experimental work we mainly follow the approach taken in [20] and [31]. Gurski and Roos [20] consider constructive control by adding and deleting candidates in Copeland and Llull voting rules and derive binary linear programming formulations for these problems. Polyakovskiy et al. [31] design integer linear programs for constructive control by deleting voters or candidates in range elections, plurality, Condorcet, maximin and Bucklin rule. They also sketch how their models proposed for problems of constructive control can be adapted for destructive control.

There are also works studying situations when uncertainty stems from the voters, rather than from incomplete knowledge of the candidate set. Konczak and Lang [22] introduced a model of election where voters have only partial preferences over candidates. A candidate is a possible winner if she is a winner for some complete extension of preferences and she is a necessary winner if she is a winner for all complete extensions. Konczak and Lang [22] showed that both possible winner and necessary winner can be computed in polynomial time for the Condorcet rule. Xia and Conitzer [34] determined the complexity of possible/necessary winner problems for further voting rules, including plurality, veto, k-approval, Borda, Copeland, maximin and plurality with runoff. Betzler and Dorn [6] and Baumeister and Rothe [2] fully characterized the possible winner problem for positional scoring rules, showing that it is NP-complete for all such rules apart from plurality and veto. In a different model, Baumeister et al. [4] assumed that voters have weights but these are unknown. Now a candidate is a possible winner if she is the winner of the election for some weights of the voters. Baumeister et al. [4] classified several voting rules into polynomial-time solvable or NP-complete with respect to the possible winner problem, and their results have been complemented by [30].

2 Definitions and notation

An election is a pair \(E=(A,V)\) where \(A=\{a_1,a_2,\dots , a_M\}\) is a finite set of M candidates (alternatives) and \(V=\{v_1,v_2,\dots , v_N\}\) is a finite set of N voters. The preference of each voter v is represented by a strict linear order \(\succ _v\) on the set of candidates A, where \(a\succ _v a'\) means that voter v prefers candidate a to candidate \(a'\), or, as we shall sometimes say, that candidate a is higher in the preference list of voter v than candidate \(a'\). The collection of all elections will be denoted by \({{{\mathcal {E}}}}\). We shall also assume that a partition \({{{\mathcal {P}}}}=\{P_1,\dots ,P_r\}\) of the set of candidates A is given; the sets \(P_\ell\) are interpreted as parties that have to decide which of their potential candidates to nominate for the election.

Formally, the reduced election that arises after all parties have nominated a unique candidate, will be \(E_c=(A_c,V)\), where \(|A_c\cap P_\ell |=1\) for each party \(P_\ell \in {{\mathcal {P}}}\) and the preference of each voter \(v\in V\) is the restriction of her original preference over A to \(A_c\).

A voting rule \(f:{{{\mathcal {E}}}}\rightarrow 2^A\) chooses a set of winners of the election.

We shall consider the following voting rules.

  • A positional scoring rule for elections with M candidates is associated with a vector \(\textbf{s}=(s_1,s_2,\dots ,s_M)\), such that \(s_1\ge s_2\ge \cdots \ge s_M\) and at least one inequality is strict. The rule assigns \(s_i\) points to a candidate a for each voter who places a in the \(i^{th}\) position of her preference list. The winners are all the candidates with the highest total number of points (score). Special cases of positional scoring rules are:

    • The Borda rule assigns a candidate \(M-1\) points each time it is ranked first, \(M-2\) points each time it is ranked second,..., and 0 points each time it is ranked last. This can also be interpreted in such a way that the number of points candidate a receives from a voter v is equal to the number of other candidates that are ranked lower than a in the preference list of v.

    • Plurality corresponds to the score vector \(\textbf{s}=(1,0,\dots ,0)\). In other words, in plurality each voter votes only for her top candidate.

    • The k-approval rule corresponds to the score vector with ones in its first k positions and zeros afterwards. This means that in k-approval each voter votes for their top k preferred candidates and the winners are all the candidates with the highest number of votes.

    • In k-veto rules each voter votes for their k least preferred candidates and the winners are all the candidates with the smallest number of votes. k-veto has the score vector containing ones in the first \(M-k\) positions and then zeros.

  • Plurality with runoff (PRun for brevity) consists of two rounds. In the first round every voter casts a single vote for their most preferred candidate. If some candidate a receives a strict majority of votes then she is a winner. Otherwise, if there are at least two candidates with the maximum number of votes, they all proceed to the second round; and if there is only one such candidate, then this candidate together with all candidates with the second maximum number of votes proceed to the runoff round. In the runoff round the candidate with the support of an absolute majority of voters is the winner; if more than one candidate receives the maximum number of votes, a tie-breaking rule is applied to choose a unique winner (Footnote 3).

  • Voting rules based on pairwise contests between pairs of candidates. For two candidates \(a, b \in A\), let \(N_E(a, b)\) be the number of voters preferring a to b in election E, called the advantage of a over b. A Condorcet winner is a candidate a who beats all other candidates in pairwise contests, i.e., for each other candidate b, it holds that \(N_E(a, b) > N_E(b, a)\). As a Condorcet winner does not always exist (but if it exists, it is unique), several Condorcet consistent rules, i.e., rules that always choose the Condorcet winner if it exists, have been proposed.

    • In the maximin voting rule the winners of the election E are all the candidates a whose minimum advantage \(N_E(a)=\min _{b\in A\setminus \{a\}} N_E(a,b)\) is maximum over all candidates.

    • In Copeland\(^\alpha\)-rule with the parameter \(\alpha \in [0,1]\) every candidate a receives one point for every other candidate b such that \(N_E(a, b) > N_E(b, a)\) and she receives \(\alpha\) points for every other candidate b such that \(N_E(a, b) = N_E(b, a)\). The winners of Copeland\(^\alpha\) rule are all the candidates with the highest score. (Notice that if there are an odd number of voters then no head-to-head ties can happen and all the election rules in the Copeland\(^\alpha\) family are equivalent.) In this paper we examine two important cases: \(\alpha = 0\), denoted as Copeland, and \(\alpha = 1\), denoted as Llull.
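The rules listed above, with the exception of plurality with runoff, can be sketched compactly in Python. This is an illustrative sketch only: the function names, the string labels of the rules and the four-voter toy profile `EXAMPLE` are our own assumptions, and each function returns the set of co-winners before any tie-breaking.

```python
def score_vector(rule, m, k=1):
    """Score vector (s_1, ..., s_M) of a positional scoring rule for m candidates."""
    if rule == "borda":
        return [m - 1 - i for i in range(m)]
    if rule == "k-approval":          # plurality is the case k = 1
        return [1] * k + [0] * (m - k)
    if rule == "k-veto":              # (simple) veto is the case k = 1
        return [1] * (m - k) + [0] * k
    raise ValueError(f"unknown rule: {rule}")


def argmax_set(scores):
    """All candidates attaining the maximum score."""
    best = max(scores.values())
    return {a for a, s in scores.items() if s == best}


def scoring_winners(rankings, vector):
    """Winners of the positional scoring rule given by the score vector."""
    scores = {a: 0 for a in rankings[0]}
    for ranking in rankings:
        for pos, a in enumerate(ranking):
            scores[a] += vector[pos]
    return argmax_set(scores)


def advantage(rankings, a, b):
    """N_E(a, b): the number of voters ranking a above b."""
    return sum(r.index(a) < r.index(b) for r in rankings)


def maximin_winners(rankings):
    """Maximise the minimum advantage over all other candidates (b != a)."""
    cands = rankings[0]
    scores = {a: min(advantage(rankings, a, b) for b in cands if b != a)
              for a in cands}
    return argmax_set(scores)


def copeland_winners(rankings, alpha):
    """Copeland^alpha: 1 point per pairwise win, alpha points per pairwise tie."""
    cands = rankings[0]
    scores = {a: 0.0 for a in cands}
    for a in cands:
        for b in cands:
            if a == b:
                continue
            ab, ba = advantage(rankings, a, b), advantage(rankings, b, a)
            if ab > ba:
                scores[a] += 1
            elif ab == ba:
                scores[a] += alpha
    return argmax_set(scores)


# Toy profile with an even number of voters, so head-to-head ties can occur.
EXAMPLE = [
    ["x", "y", "z"],
    ["x", "y", "z"],
    ["y", "z", "x"],
    ["z", "y", "x"],
]
```

On `EXAMPLE`, candidate x ties with both y and z in pairwise contests while y beats z, so Copeland (\(\alpha = 0\)) selects only y, whereas Llull (\(\alpha = 1\)) selects both x and y.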

Now we formulate the problems studied in this paper. Notice that we consider the unique winner model, i.e., we consider a candidate \(p\in A\) to be the winner only if \(f(E)=\{p\}\).

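[Figures a and b: boxed formal statements of the problems Possible President and Necessary President; not reproduced in this text.]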
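Since the formal problem statements are not reproduced in this text, the following Python sketch follows the informal description given in the introduction together with the unique-winner model: Possible President asks whether some nomination makes the nominee of the given party the unique winner, and Necessary President is read here as asking whether the party has a candidate who is the unique winner for every choice of the other parties' nominees. All function names are our own, and the exhaustive enumeration takes time exponential in the number of parties, so it is meant only for small instances such as the introductory example.

```python
from itertools import product

# Profile of the introductory example: (number of voters, ranking).
PROFILE = [
    (18, ["b", "a", "c'", "a'", "b'", "c"]),
    (10, ["b'", "a", "c'", "a'", "c", "b"]),
    (8,  ["a'", "c", "a", "b", "b'", "c'"]),
    (20, ["c'", "a", "c", "a'", "b'", "b"]),
]
PARTIES = [("a", "a'"), ("b", "b'"), ("c", "c'")]


def restrict(profile, nominees):
    """Reduced election: keep only the nominated candidates in each ranking."""
    return [(w, [x for x in r if x in nominees]) for w, r in profile]


def unique_winner(scores):
    """Unique-winner model: return the sole top scorer, or None on a tie."""
    best = max(scores.values())
    top = [x for x, s in scores.items() if s == best]
    return top[0] if len(top) == 1 else None


def plurality(profile):
    scores = {x: 0 for x in profile[0][1]}
    for w, r in profile:
        scores[r[0]] += w
    return unique_winner(scores)


def borda(profile):
    scores = {x: 0 for x in profile[0][1]}
    for w, r in profile:
        for pos, x in enumerate(r):
            scores[x] += w * (len(r) - 1 - pos)
    return unique_winner(scores)


def possible_president(profile, parties, i, rule):
    """Does some nomination make the nominee of party i the unique winner?"""
    return any(rule(restrict(profile, set(nom))) == nom[i]
               for nom in product(*parties))


def necessary_president(profile, parties, i, rule):
    """Does party i have a candidate who wins whatever the other parties nominate?"""
    others = parties[:i] + parties[i + 1:]
    return any(
        all(rule(restrict(profile, set(rest) | {own})) == own
            for rest in product(*others))
        for own in parties[i]
    )
```

For instance, `necessary_president(PROFILE, PARTIES, 0, borda)` holds for Team A, while `possible_president(PROFILE, PARTIES, 1, rule)` fails for Team B under both rules sketched here.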

3 Intractability of candidate nomination

Notice that for all our considered voting rules the problem Possible President belongs to the class NP and the problem Necessary President to the class coNP, since when the nominated candidates are known, the winner of the election can be computed in polynomial time.

All our hardness proofs use a polynomial transformation from the following NP-complete modification of satisfiability [5]. Here and elsewhere, the symbol [z] for some integer z denotes the set of integers \(\{1,2,\dots ,z\}\).

[Figure c: boxed formal statement of the problem (2,2)-e3-SAT; not reproduced in this text.]

In all our intractability proofs of Theorems 1–7 we construct a polynomial transformation from the problem (2,2)-e3-SAT to the Possible President or Necessary President problem for a certain voting rule. So we always assume that a boolean formula B with variables \(x_j, j\in [n]\) and clauses \(C_i,i\in [m]\) of the form \(C_i=\ell _r+\ell _s+\ell _t\) for some literals \(\ell _r,\ell _s\) and \(\ell _t\) is given. Based on the formula B we specify the components of an election E.
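To make the input of the reductions concrete, the following Python sketch checks the two structural properties of B that the proofs rely on: every clause consists of three literals, and every literal occurs exactly twice (which gives \(3m=4n\)). The formal definition of (2,2)-e3-SAT is not reproduced in this text, so these properties are read off from how the proofs use the problem. The integer encoding of literals, the instance `FORMULA` and the function names are assumptions of this sketch.

```python
from collections import Counter
from itertools import product

# Literal encoding (an assumption of this sketch):
# +j stands for x_j and -j for its negation.
FORMULA = [(1, 2, 3), (1, -2, -3), (-1, 2, -3), (-1, -2, 3)]


def has_22_e3_structure(clauses):
    """Every clause has three literals and every literal occurs exactly twice."""
    if any(len(clause) != 3 for clause in clauses):
        return False
    counts = Counter(lit for clause in clauses for lit in clause)
    variables = {abs(lit) for lit in counts}
    return all(counts[v] == 2 and counts[-v] == 2 for v in variables)


def satisfying_assignments(clauses):
    """Brute-force enumeration of satisfying valuations (exponential in n)."""
    variables = sorted({abs(lit) for clause in clauses for lit in clause})
    for values in product([False, True], repeat=len(variables)):
        valuation = dict(zip(variables, values))
        if all(any(valuation[abs(lit)] == (lit > 0) for lit in clause)
               for clause in clauses):
            yield valuation


if __name__ == "__main__":
    print(has_22_e3_structure(FORMULA))
    print(list(satisfying_assignments(FORMULA)))
```

The instance `FORMULA` has \(m=4\) clauses and \(n=3\) variables, the smallest size compatible with \(n=3m/4\).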

To simplify notation, a candidate associated with a variable \(x_j\) (literal candidates) or a clause \(C_i\) (clause candidates) will be denoted by the same symbol \(x_j\) or \(c_i\) respectively, as no confusion should occur. Sometimes a general literal or a literal candidate will be denoted by \(\ell\).

Without loss of generality, we shall assume that the ordering of the literals in each clause is derived from their standard ordering \(x_1,{\bar{x}}_1,x_2,{\bar{x}}_2,\dots ,x_n,{\bar{x}}_n\). To write the preference lists of voters in a more compact way, we shall use some shorthands. When we write [X] we mean all the literal candidates in the standard ordering, \([\overleftarrow{X}]\) denotes all the literal candidates in the reversed ordering; similarly [C] is the ordered list \(c_1,c_2,\dots ,c_m\) and \([\overleftarrow{C}]\) the ordered list \(c_m,c_{m-1},\dots ,c_1\). Symbol \([\dots ]\) indicates all the remaining candidates, not explicitly written in the respective preference list, in any strict order.

3.1 Scoring rules

First we prove intractability of k-approval and k-veto. Our proof of Theorem 1 is a modification of the proof of Theorem 1 in [18]. However, our result is a bit stronger, as we achieve that the intended candidate is the unique winner of the election, not just one member of a larger set of the winners.

Notice that k-approval and k-veto for \(k=1\) reduce to plurality and (simple) veto, respectively, and in this case, the ‘garbage collector’ candidates denoted by Greek letters do not appear in the constructions.

Theorem 1

Possible President is NP-complete and Necessary President is coNP-complete for k-approval for any k, even if the size of each party is at most 2.

Proof

Let a formula B be given. We construct an election as follows.

There are \(n + (m+4)\cdot (k-1) + 2\) parties: \(P = \{p\}, P' = \{p'\}, P_{\alpha _q} = \{\alpha _q\}, P_{\beta _q} = \{\beta _q\}, P_{\gamma _q} = \{\gamma _q\}, P_{\delta _q} = \{\delta _q\},\) for each \(q \in [k-1], P_{i,q} = \{\epsilon _{i,q}\}\) for each \(i \in [m], q \in [k-1]\) and \(P_{j} = \{x_j, {\bar{x}}_j\}\) for each variable \(x_j, j \in [n]\). The set of voters is \(V = V_1 \cup V'_1 \cup V_2 \cup V'_2 \cup V_3\) and their number is \(2\,m + 9\). The voters’ preferences are depicted in Table 2.

Table 2 Voters for the construction in the proof of Theorem 1

According to the preferences given in Table 2, candidate p receives 5 votes from the voters in \(V_1 \cup V'_{1}\) and candidate \(p'\) receives 4 votes from the voters in \(V_2 \cup V'_{2}\). Further, irrespective of how the parties nominate their candidates, none of the garbage collectors is approved by more than 3 voters and hence cannot be a winner and no literal candidate receives more than 4 approvals.

Assume now that B is satisfied by a boolean valuation f. Let each party \(P_j\) nominate the true literal of variable \(x_j\). In that case no candidate different from p receives more than 4 votes, so p is the winner of the election.

Conversely, assume that p is the winner of the election. Candidate \(p'\) must not receive any votes from \(V_3\), since in that case she would receive at least two votes (in addition to the 4 votes from \(V_2\) and \(V'_2\)), as voters in \(V_3\) come in identical pairs. But that means that parties \(P_j\) have to nominate their candidates in such a way that B is satisfied by the truth valuation defined by setting the literals corresponding to nominated candidates true.

This means that there exist nominations of candidates such that p is the only winner of the election if and only if B is satisfiable, hence Possible President is NP-complete.

Finally, candidate \(p'\) is the winner of the election for any possible nominations of literal candidates if and only if B is not satisfiable, because then she receives at least two additional votes from voters in \(V_3\), that is, at least 6 votes in total. Therefore Necessary President is coNP-complete. \(\square\)

Theorem 2

Possible President is NP-complete and Necessary President is coNP-complete for k-veto for any k, even if the size of each party is at most 2.

Proof

Given a formula B, we construct \(n+k\) parties: \(P = \{p\}\), \(P_{j} = \{x_j, {\bar{x}}_j\}\) for each variable \(x_j, j \in [n]\) and \(P'_u=\{\gamma _u\}\) for \(u\in [k-1]\). There are \(2m+3n+3\) voters in \(V=V_1\cup V_2\cup V_3\cup V_4\) with the preferences given in Table 3. The symbol \(\Gamma\) in the preference lists represents the ordered sequence \(\gamma _1,\gamma _2,\dots ,\gamma _{k-1}\).

Table 3 Voters for the construction in the proof of Theorem 2

Notice that each \(\gamma _u\), \(u\in [k-1]\) is vetoed by each voter, while candidate \(p'\) always receives 2 vetos and each literal candidate receives at least 3 vetos.

If B is satisfied by a boolean valuation f and each party \(P_j\) nominates the true literal of variable \(x_j\), then p receives no additional veto to those from the voter in \(V_1\) and so she is the unique winner of the election with only one veto.

Conversely, should p be the winner of the election, she must not receive any additional veto from the voters in \(V_3\), since in that case she would receive in total at least 3 vetos, as voters in \(V_3\) come in identical pairs. This means that parties \(P_j\) have to nominate their candidates in such a way that B is satisfied by the truth valuation defined by setting the literals corresponding to nominated candidates true.

Therefore there exist nominations of candidates such that p is the only winner of the election if and only if B is satisfiable, so Possible President is NP-complete.

Further, it is easy to see that \(p'\) is the winner of the election with exactly two vetos irrespective of the nominations of the literal candidates, if and only if B is not satisfiable, so Necessary President is coNP-complete. \(\square\)

The following theorem deals with another important scoring rule.

Theorem 3

Possible President is NP-complete for Borda, even if the size of each party is at most 2.

Proof

Based on a formula B we shall construct an election E as follows. First we define the set of candidates A. The \(m+n+1\) parties are \(P=\{p\}\), \(P'_i=\{c_i\}\), \(i\in [m]\) and \(P_j=\{x_j,{\bar{x}}_j\}\) for \(j\in [n]\). The set of voters is \(V=V_1\cup V_2\cup V_3\cup V_4\) where \(|V_1|=|V_2|=m-1\), \(V_3=\{v_1,\dots , v_m\}\), \(V_4=\{v'_1,\dots , v'_m\}\) and their preferences are given in Table 4. Preferences of all voters in \(V_1\) are the same, similarly for voters in \(V_2\). Voters \(v_i\) and \(v'_i\) correspond to clause \(C_i\); we assume that \(C_i=\ell _r+\ell _s+\ell _t\). The indices of clause candidates different from \(c_i\) in the preference lists of these voters are understood modulo m. The notation \([\overleftarrow{X^{-i}}]\) denotes all the literal candidates except those present in clause \(C_i\) in the ordering reversed to the standard ordering.

Table 4 Voters for the construction in the proof of Theorem 3

Recall that the number of points that a given candidate a receives from a voter v is equal to the number of other candidates that are ranked lower than a in the preference list of v. Let sc(a) denote the total score of candidate \(a\in A\).

Let us count the points received by candidate p. As she is higher than all clause and all variable candidates in the preference lists of all voters in \(V_1\cup V_2\) and all variable candidates plus candidate \(c_i\) in the list of each voter \(v_i\) in \(V_3\), her total score is

$$\begin{aligned} sc(p)=2(m-1)(m+n)+m(n+1)=3mn+2m^2-m-2n. \end{aligned}$$

Now take a look at a nominated candidate \(\ell _j\in \{x_j,{\bar{x}}_j\}\) for some j. For voters in \(V_1\) she is higher than candidates nominated by parties \(P_{j+1},\dots , P_{n}\), in total \(n-j\) candidates, and for voters in \(V_2\) she is higher than candidates nominated by parties \(P_{j-1},\dots , P_1\), in total \(j-1\) candidates. Therefore from voters in \(V_1\cup V_2\) she gets altogether \((m-1)(n-j)+(m-1)(j-1)=(m-1)(n-1)\) points. From each voter \(v_i\in V_3\) she gets at most n points and from each voter \(v'_i\in V_4\) at most \(n+m\) points (this happens if literal \(\ell _j\) appears in clause \(C_i\)), so

$$\begin{aligned} sc(x_j)\le (m-1)(n-1)+mn+m(n+m)=3mn+m^2-m-n+1. \end{aligned}$$

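Due to the form of a formula in (2,2)-e3-SAT (each of the m clauses contains three literals and each of the 2n literals occurs exactly twice, so \(3m=4n\)), we know that \(n=3m/4\), therefore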

$$\begin{aligned} sc(p)=\frac{9}{4}m^2+2m^2-m-\frac{3}{2}m=\frac{17}{4}m^2-\frac{5}{2}m \end{aligned}$$

and

$$\begin{aligned} sc(x_j)\le \frac{9}{4}m^2+m^2-m-\frac{3}{4}m+1=\frac{13}{4}m^2-\frac{7}{4}m+1, \end{aligned}$$

therefore

$$\begin{aligned} sc(p)-sc(x_j)\ge m^2-\frac{3}{4}m-1=\left( m-\frac{3}{8}\right) ^2-\frac{73}{64}>0, \end{aligned}$$

because \(m\ge 4\). Therefore no literal candidate can have more points than p in the election.
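The algebra above can be double-checked numerically. The following Python sketch evaluates the two counting expressions from the proof for several values of m divisible by 4 (so that \(n=3m/4\) is an integer) and compares them with the simplified closed forms; the function names are our own.

```python
def sc_p(m, n):
    """Total Borda score of p, as counted in the proof."""
    return 2 * (m - 1) * (m + n) + m * (n + 1)


def sc_literal_bound(m, n):
    """Upper bound on the Borda score of a nominated literal candidate."""
    return (m - 1) * (n - 1) + m * n + m * (n + m)


def gap(m):
    """sc(p) minus the literal bound when n = 3m/4 (m must be divisible by 4)."""
    n, rem = divmod(3 * m, 4)
    if rem:
        raise ValueError("n = 3m/4 must be an integer")
    return sc_p(m, n) - sc_literal_bound(m, n)


def check(max_m=40):
    """Compare the counted scores with the closed forms for m = 4, 8, ..., max_m."""
    for m in range(4, max_m + 1, 4):
        n = 3 * m // 4
        assert sc_p(m, n) == 3 * m * n + 2 * m * m - m - 2 * n
        assert sc_literal_bound(m, n) == 3 * m * n + m * m - m - n + 1
        # gap = m^2 - (3/4) m - 1, written without fractions
        assert 4 * gap(m) == 4 * m * m - 3 * m - 4
        assert gap(m) > 0
    return True
```

For the smallest case \(m=4\), \(n=3\), the sketch gives \(sc(p)=58\) against a bound of 46 for any literal candidate.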

Now let us count the number of candidates that are lower in the preferences of voters than candidate \(c_i\) for some i. First, \(c_i\) is higher than all literal candidates for voters in \(V_1\cup V_2\), this means \(2(m-1)n\) points. \(c_i\) is also higher than candidates \(c _{i+1}, \dots , c_m\) in the preference lists of voters in \(V_1\) and than candidates \(c _{i-1}, \dots , c_1\) in the preference lists of voters in \(V_2\), in total \((m-1)(m-i)+(m-1)(i-1)=(m-1)(m-1)\). Similarly, as the ordering of clause candidates by a voter \(v_s\in V_3\) is reversed to the ordering of clause candidates by the corresponding voter \(v'_s\in V_4\), candidate \(c_i\) receives exactly \(m(m-1)\) points from voters in \(V_3\cup V_4\) thanks to other clause candidates. Further, candidate \(c_i\) is higher than candidate p in the preference lists of all but one voter in \(V_3\) and in the preference lists of all voters in \(V_4\) and she is higher than all literal candidates in the preference lists of all voters in \(V_3\), except voter \(v_i\). This gives a partial score

$$\begin{aligned} sc'(c_i)=2(m-1)n+ (m-1)(m-1)+m(m-1)+(2m-1)+(m-1)n=3mn+2m^2-m-3n. \end{aligned}$$
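For readers who wish to verify the simplification, the individual terms of the partial score expand as follows (a routine check, using only the expressions already stated):

$$\begin{aligned} 2(m-1)n&=2mn-2n,\\ (m-1)(m-1)&=m^2-2m+1,\\ m(m-1)&=m^2-m,\\ (m-1)n&=mn-n. \end{aligned}$$

Adding these to the remaining term \(2m-1\) gives \(3mn+2m^2+(-2m-m+2m)+(1-1)-3n=3mn+2m^2-m-3n\). Compared with \(sc(p)=3mn+2m^2-m-2n\), the partial score falls short of \(sc(p)\) by exactly n.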

Further notice that the only voter in \(V_4\) who prefers candidate \(c_i\) to some literal candidate is voter \(v'_i\).

Now suppose that B is satisfied by a boolean valuation f. Assume that each party \(P_j\) nominates the candidate corresponding to the true literal of \(x_j\). Then for each i, voter \(v'_i\) prefers \(c_i\) to at most \(n-1\) variable candidates, hence

$$\begin{aligned} sc(c_i)\le sc'(c_i)+n-1=3mn+2m^2-m-2n-1<sc(p), \end{aligned}$$

so p is the only winner of the election.

Conversely, suppose that p is the only winner of the election for some nomination of literal candidates. Let \(\theta\) be the number of points that a candidate \(c_i\) receives from voter \(v'_i\) due to the nominated variable candidates. Since \(sc(p)>sc(c_i)\) must hold, we have

$$\begin{aligned} 0<sc(p)- (sc'(c_i)+\theta )= n-\theta , \end{aligned}$$

so \(\theta <n\). This means that each \(c_i\) may receive from \(v'_i\) at most \(n-1\) points due to the nominated variable candidates. From this we obtain that the nomination contains at least one literal candidate present in each clause – setting these literals true makes B satisfied.

Hence, Possible President for Borda is NP-complete. \(\square\)

3.2 Plurality with run-off

Theorem 4

Possible President is NP-complete and Necessary President is coNP-complete for plurality with run-off, even if the size of each party is at most 2.

Proof

Let B be a boolean formula. In the constructed election there are \(n+2\) parties \(P=\{p\}\), \(P'=\{p'\}\) and \(P_j=\{x_j,{\bar{x}}_j\}\) for each variable \(x_j, j\in [n]\). The set of voters is \(V=V_1\cup V_2\cup V_3\cup V_4\) and their number is \(2\,m+10\). The voters’ preferences are given in Table 5.

Table 5 Voters for the construction in the proof of Theorem 4

First notice that in any case, in the first round of the election each literal candidate receives at most 4 votes from voters in \(V_3\), as each literal occurs exactly twice in B. However, the nominee of party \(P_1\) gets exactly 4 votes from voters in \(V_3\), as the preferences of voters in \(V_3\) obey the standard ordering, plus one additional vote from the voter in \(V_4\).

Assume that B is satisfied by a boolean valuation f. Let each party \(P_j\) nominate the true literal of variable \(x_j\). Let us count the votes for the candidates in the first round. Candidate p gets 5 votes from \(V_1\) and candidate \(p'\) gets 4 votes from \(V_2\). Moreover, as said above, the true literal of variable \(x_1\), say \(\ell _1\), gets 5 votes and no other candidate receives more than 4 votes. Therefore the participants of the second round are the candidates p and \(\ell _1\). In the second round, \(\ell _1\) receives still 5 votes, but p gets in addition to the 5 votes from \(V_1\) also 4 votes from \(V_2\) and \(2\,m-4\) votes from those voters in \(V_3\) that correspond to clauses not containing \(\ell _1\). Hence p wins the election.
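As a quick check of the second-round tally, using only the vote counts stated above:

$$\begin{aligned} \underbrace{5}_{V_1}+\underbrace{4}_{V_2}+\underbrace{2m-4}_{V_3}=2m+5, \qquad (2m+5)+5=2m+10=|V|. \end{aligned}$$

Thus p obtains \(2m+5\) of the \(2m+10\) votes, which exceeds \(|V|/2=m+5\) for every \(m\ge 1\), while \(\ell _1\) keeps its 5 votes.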

Conversely, assume that p is the winner of the election. For that to happen, p has to make it to the second round. As p gets 5 votes from \(V_1\), candidate \(p'\) must not get the votes from \(V_3\), since in that case she would receive at least two votes (additionally to the 4 votes from \(V_2\)) as voters in \(V_3\) come in identical pairs. But that means that parties \(P_j\) have to nominate their candidates in such a way that B is satisfied by setting the literals corresponding to nominated candidates true.

Hence, Possible President is NP-complete.

Further, let us realize that \(p'\) is the winner of the election irrespective of the nominations from all the parties if and only if B is not satisfiable: candidates \(p,\ell _1\) and \(p'\) are the participants of the second round after obtaining 5, 5 and at least 6 plurality votes in the first round, respectively. Therefore Necessary President is coNP-complete. \(\square\)

3.3 Condorcet consistent voting rules

Theorem 5

Possible President is NP-complete for maximin voting rule, even if the size of each party is at most 2.

Proof

Based on a boolean formula B we construct an election E as follows. There are \(m+n+1\) parties \(P=\{p\}\), \(P'_i=\{c_i\}\) for \(i\in [m]\) and \(P_j=\{x_j,{\bar{x}}_j\}\) for each variable \(x_j, j\in [n]\). The set of voters is \(V=V_1\cup V_2\cup \ V_3\cup V_4\cup V_5\) and their number is \(6m+1\). The voters’ preferences are given in Table 6.

To express the constructed preferences in a compact form, in addition to shorthands defined in Sect. 2 we use the following notation. \([X^{-i}]\) means all the literal candidates except those contained in clause \(C_i\), written in the standard ordering, and \([C^{-i}]\) means \(c_1\succ \dots \succ c_{i-1}\succ c_{i+1}\succ \dots \succ c_m\).

Table 6 Voters for the construction in the proof of Theorem 5

Let us evaluate the pairwise contests of the candidates for the election \(E_c\) obtained by some nominations of candidates by parties.

\(N_{E_c}(p,\ell )=4m+1\) for any nominated literal candidate \(\ell\), as candidate p is preferred to \(\ell\) by all voters in \(V_1\cup V_2\cup V_3\cup V_4\). Further, \(N_{E_c}(p,c_i)=2\,m+1\) for any candidate \(c_i,i\in [m]\), as p is preferred to \(c_i\) by all voters in \(V_1\cup V_2\) and no one else. Hence \(N_{E_c}(p)=2m+1\).

A nominated literal candidate \(\ell\) has \(N_{E_c}(\ell ,p)=2m\), since she is preferred to p only by voters in \(V_5\). Hence \(N_{E_c}(\ell )\le 2m\) and so no nominated literal candidate can be the winner of the election.

Notice also that for any \(i\in [m]\) we have \(N_{E_c}(c_i,p)=4m\) and \(N_{E_c}(c_i,c_k)\ge 2m+2\) since \(c_i\) is preferred to \(c_k\) by voters in \(V_1\) if and only if \(c_k\) is preferred to \(c_i\) by voters in \(V_2\). Similarly, the preferences of voters in \(V_3\) and in \(V_4\) are reversed with respect to candidates \(c_i,i\in [m]\). Moreover, each \(c_i\) is preferred to all other c-candidates by voters \(v_i\) and \(v'_i\).

Now assume that B is satisfied by some truth assignment f and let the candidates corresponding to true literals be nominated. Take a candidate \(c_i\) for any \(i\in [m]\) and a literal candidate \(\ell\) such that \(\ell\) is contained in clause \(C_i\) and true according to f. Then candidate \(c_i\) is preferred to \(\ell\) only by voters in \(V_3\cup V_4\), hence \(N_{E_c}(c_i,\ell )=2m\), thus \(N_{E_c}(c_i)\le 2\,m\) and therefore candidate p is the only winner of election E.

Conversely, assume that p is the only winner of the election. This means that we must have \(N_{E_c}(c_i)\le 2m\) for each \(i\in [m]\). Hence there must be a nominated literal candidate \(\ell\) contained in each clause \(C_i\) such that the voters \(v_i\) and \(v'_i\) do not prefer \(c_i\) to \(\ell\). It is now easy to see that if we assign the literals corresponding to the nominated candidates to be true, this truth assignment makes formula B true. \(\square\)

Election control for Copeland and Llull elections was studied in [7] and [20] using the language of graph theory, and this is the approach we shall also follow.

First we summarize some terminology. A directed graph (digraph) is a pair \(G=(A,H)\) where A is the set of vertices and H is the set of arcs, i.e., ordered pairs of vertices. If \((a,a')\in H\) then we say that arc \((a,a')\) starts in vertex a and ends in vertex \(a'\). We do not allow bidirected arcs and loops, i.e., for any two different vertices \(a,a'\) either \((a,a')\in H\) or \((a',a)\in H\) or none, and \((a,a)\) is never an arc. The outdegree of a vertex \(a\in A\), denoted by \(\delta ^+(a)\), is the number of arcs starting in a and the indegree \(\delta ^-(a)\) of vertex \(a\in A\) is the number of arcs ending in a. More precisely, \(\delta ^+(a)=|\{a'\in A: (a,a')\in H \}|\) and \(\delta ^-(a)=|\{a'\in A: (a',a)\in H \}|\). If a subset \(A'\subseteq A\) is given, then the digraph induced by \(A'\) has \(A'\) as its vertex set and its arc set is \(\{(a,a')\in H: a,a'\in A'\}\).

Each election \(E=(A,V)\) can be represented by a digraph \(G_E = (A,H)\), called the majority graph of the election, whose set of vertices corresponds to the set of candidates A in E. There is an arc from vertex a to vertex \(a'\) if and only if the corresponding candidate a wins the head-to-head contest with candidate \(a'\). No arc between vertices a and \(a'\) indicates that candidates a and \(a'\) tie in their head-to-head contest.

The Copeland score of the candidate is the number of head-to-head contests won. By the definition of the set of arcs in the digraph \(G_E\), the Copeland score of a candidate a in election E equals \(\delta ^+(a)\) in \(G_E\) and the Copeland winner of E corresponds to a vertex with maximum outdegree in \(G_E\).

The Llull score of a candidate a can be represented as the total number of candidates except a minus the number of candidates that beat a in the pairwise head-to-head contest. Thus, the Llull score of a candidate corresponding to vertex a is equal to \(|A|-1-\delta ^-(a)\) and the Llull winner of E corresponds to a vertex with minimum indegree in \(G_E\).
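The two scores can be illustrated by a minimal sketch that builds the majority graph from ranked ballots and reads the scores off the degrees. The function names `majority_arcs` and `copeland_and_llull` and the four ballots are illustrative assumptions, not part of the paper's implementation.

```python
from itertools import permutations

def majority_arcs(candidates, votes):
    """Arc (a, b) is present iff a strict majority of voters ranks a above b."""
    arcs = set()
    for a, b in permutations(candidates, 2):
        prefer_a = sum(1 for v in votes if v.index(a) < v.index(b))
        if 2 * prefer_a > len(votes):
            arcs.add((a, b))
    return arcs

def copeland_and_llull(candidates, votes):
    """Copeland score = outdegree, Llull score = |A| - 1 - indegree."""
    arcs = majority_arcs(candidates, votes)
    outdeg = {a: sum(1 for u, _ in arcs if u == a) for a in candidates}
    indeg = {a: sum(1 for _, w in arcs if w == a) for a in candidates}
    llull = {a: len(candidates) - 1 - indeg[a] for a in candidates}
    return outdeg, llull

# Four hypothetical voters; rankings are listed from most to least preferred.
votes = [["a", "b", "c"], ["a", "b", "c"], ["b", "c", "a"], ["c", "a", "b"]]
copeland, llull = copeland_and_llull(["a", "b", "c"], votes)
print(copeland, llull)
```

In this toy election a and c tie head-to-head, so a and b share the top Copeland score while a alone has the highest Llull score.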

Conversely, by a result of [28], for each digraph \(G=(A,H)\) there is an election \(E_G=(A,V)\) such that the outcomes of the pairwise head-to-head contests in \(E_G\) correspond to the arcs of the digraph. This means that the pair \((a,a')\in H\) if and only if candidate a beats candidate \(a'\) in the pairwise contest in \(E_G\). To create such an election, we do the following. For every arc \((a,a')\in H\) we add two voters with preferences \(a\succ a'\succ [A']\) and \([\overleftarrow{A'}]\succ a\succ a'\), where \(A'=A\backslash \{a,a'\}\) and the orderings of candidates in \([\overleftarrow{A'}]\) and \([A']\) are exactly opposite. In the preferences of these two voters, a beats \(a'\) and all other pairs of candidates are tied.
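The two-voters-per-arc construction can be checked mechanically. The following sketch, with the assumed helper names `election_from_digraph` and `beats` and a small hypothetical digraph, builds the voters as described and recovers the arc set from the pairwise contests.

```python
def election_from_digraph(vertices, arcs):
    """For every arc (a, b) add two voters: a > b > [A'] and reversed [A'] > a > b."""
    votes = []
    for a, b in arcs:
        rest = [v for v in vertices if v not in (a, b)]
        votes.append([a, b] + rest)
        votes.append(rest[::-1] + [a, b])
    return votes

def beats(votes, a, b):
    """True iff a strict majority of voters ranks a above b."""
    prefer_a = sum(1 for v in votes if v.index(a) < v.index(b))
    return 2 * prefer_a > len(votes)

# Hypothetical digraph: a directed triangle plus an isolated vertex a4.
vertices = ["a1", "a2", "a3", "a4"]
arcs = [("a1", "a2"), ("a2", "a3"), ("a3", "a1")]
votes = election_from_digraph(vertices, arcs)
recovered = {(a, b) for a in vertices for b in vertices
             if a != b and beats(votes, a, b)}
print(recovered == set(arcs))
```

The isolated vertex a4 ties with every other candidate, which matches the convention that a missing arc represents a tie.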

Now we define the studied graph problems.

[Problem definition boxes: Max-Outdegree-Choice and Min-Indegree-Choice]

Theorem 6

Max-Outdegree-Choice is NP-complete.

Proof

Based on an instance of (2,2)-e3-sat represented by a formula B we construct a digraph \(G = (A,H)\). The set of vertices A equals \(A_0\cup C\cup X\) where

$$\begin{aligned} A_0=\bigcup _{s=1}^4 A_s \qquad C=\bigcup _{i=1}^m C_i \qquad X=\bigcup _{j=1}^n X_j \end{aligned}$$

and the partition sets of A are

$$\begin{aligned} A_s=\{a_s\}, s\in [4]\quad C_i=\{c_i\},i\in [m]\quad X_j=\{x_j,{\bar{x}}_j\}, j \in [n]. \end{aligned}$$

The arc set H contains the arcs \((a_1,a_s)\) for \(s=2,3,4\) and for each \(i\in [m]\) the arcs from the clause vertex \(c_i\) to the literal vertices corresponding to the literals contained in clause \(C_i\). The constructed digraph is illustrated in Fig. 1.

Notice that \(a_1\) and all clause vertices \(c_i\) have outdegree three and all other vertices have outdegree zero in G.

Fig. 1 The digraph constructed in the proof of Theorem 6

Assume now that B is satisfied by a boolean valuation f. Let us choose for each \(j\in [n]\) the vertex in \(X_j\) that corresponds to the false literal of variable \(x_j\) and denote their set by \(X_F\). Since f makes B true, every clause contains at least one true literal, so when the vertices in \(X_F\) are chosen from the sets \(X_j\), the outdegree of each clause vertex decreases to at most two. Hence \(a_1\) becomes the only vertex with the maximum outdegree, equal to three, in the digraph induced by \(A_0\cup C\cup X_F\).

Conversely, assume that there is a set \(A_c\subset A\) containing a unique vertex in each \(X_j\) for \(j\in [n]\) such that \(a_1\) is the only vertex with maximum outdegree in the digraph induced by \(A_c\). This implies that the outdegree of every clause vertex \(c_i\), \(i \in [m]\) is at most 2. If we define the literals corresponding to the literal vertices in \(A_c\) to be false for each \(j\in [n]\), we ensure that each clause contains at least one true literal and B is satisfied. \(\square\)
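The reduction can be explored by brute force on a small instance. The sketch below is only an illustration: the formula with three variables and four clauses is a hypothetical instance in which every variable occurs twice positively and twice negatively, the string "~" marks negation, and the function name is our own.

```python
from itertools import product

def choices_with_unique_max_outdegree(parts, arcs, target):
    """Return all choices of one vertex per partition set for which `target`
    is the only vertex of maximum outdegree in the induced digraph."""
    good = []
    for choice in product(*parts):
        chosen = set(choice)
        outdeg = {v: 0 for v in chosen}
        for u, w in arcs:
            if u in chosen and w in chosen:
                outdeg[u] += 1
        if all(outdeg[target] > d for v, d in outdeg.items() if v != target):
            good.append(choice)
    return good

# Hypothetical formula: 3 variables, 4 clauses.
clauses = [("x1", "x2", "x3"), ("x1", "~x2", "~x3"),
           ("~x1", "x2", "~x3"), ("~x1", "~x2", "x3")]
parts = ([[f"a{s}"] for s in range(1, 5)]
         + [[f"c{i}"] for i in range(1, 5)]
         + [[f"x{j}", f"~x{j}"] for j in range(1, 4)])
arcs = ([("a1", f"a{s}") for s in (2, 3, 4)]
        + [(f"c{i}", lit) for i, clause in enumerate(clauses, start=1)
           for lit in clause])

solutions = choices_with_unique_max_outdegree(parts, arcs, "a1")
# As in the proof, the chosen literal vertices are read as the FALSE literals.
false_literal_sets = [set(choice[-3:]) for choice in solutions]
print(len(solutions))
```

Each clause rules out exactly the one choice that selects all three of its literal vertices, so four of the eight choices remain, one per satisfying assignment of this formula.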

Theorem 7

Min-Indegree-Choice is NP-complete.

Proof

Let a boolean formula B as an instance of (2,2)-e3-sat be given. We construct a digraph G for B as follows.

The set of vertices is

$$\begin{aligned} A=A_1\cup Y \cup Z \cup W \cup C \cup X \end{aligned}$$

where

$$\begin{aligned} Y=\bigcup _{s=1}^3 Y_s, \ Z=\bigcup _{s=1}^3 Z_s, \ W=\bigcup _{s=1}^3 W_s,\ C=\bigcup _{i=1}^m C_i,\ X=\bigcup _{j=1}^n X_j \end{aligned}$$

and \(A_1=\{a_1\}\), \(Y_s=\{y_s\}\), \(Z_s=\{z_s\}\) and \(W_s=\{w_s\}\) for \(s\in [3]\), \(C_i=\{c_i\}\) for \(i\in [m]\) and \(X_j=\{x_j,{\bar{x}}_j\}\) for \(j\in [n]\). The arcs of G are as follows.

$$\begin{aligned} \begin{array}{ll} (y_q,z_s),\ (z_s, w_\ell ),\ (w_\ell , y_q) &{} \text{for } q, s,\ell \in [3] \\ (y_1,a_1),\ (y_2,a_1) &{} \\ (y_1,c_i),\ (y_2,c_i) &{} \text{for } i\in [m] \\ (y_q,x_j),\ (y_q,{\bar{x}}_j) &{} \text{for } q\in [3],\ j\in [n] \\ (x_j,c_i),\ ({\bar{x}}_j, c_i) &{} \text{for } j\in [n],\ i\in [m] \text{ if clause } C_i \text{ contains the corresponding literal } x_j \text{ or } {\bar{x}}_j\text{, respectively} \end{array} \end{aligned}$$

For an illustration of the construction see Fig. 2. Its left part shows the subgraph induced by ‘fixed’ vertices, i.e., those that belong to one-element partition sets \(A_1, Y_s, Z_s, W_s\), and \(C_i\), and its right part shows the digraph obtained by a particular choice of vertices from X. The ovals represent the sets Y, Z and W and the thick arrows mean that there is an arc from each vertex in the first oval to each vertex in the second oval. Notice that the indegrees of all the vertices in the digraph induced by ‘fixed’ vertices are 3 with the exception of vertices \(c_i, i\in [m]\) and \(a_1\) that all have indegree 2. Also, each vertex in X, if added, will have the indegree 3.

Fig. 2 The digraph constructed in the proof of Theorem 7

Assume that B is satisfiable. Choose for each \(j\in [n]\) the literal vertex corresponding to the true literal. Obviously, all the clause vertices will now have indegree at least 3, so \(a_1\) becomes the only vertex with minimum indegree equal to 2.

Conversely, if the vertex \(a_1\) is the only vertex with the minimum indegree in the digraph induced by \(A_c\), this means that for each clause vertex \(c_i\) there is at least one chosen literal vertex with the arc pointing to \(c_i\). This implies that the boolean valuation that makes the literals corresponding to the chosen vertices true satisfies B. \(\square\)
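As a sanity check of this construction, the following brute-force sketch builds the digraph for a small hypothetical formula (three variables, four clauses, "~" marking negation) and lists all choices for which \(a_1\) is the unique vertex of minimum indegree. All identifiers are illustrative assumptions.

```python
from itertools import product

def choices_with_unique_min_indegree(parts, arcs, target):
    """Return all choices of one vertex per partition set for which `target`
    is the only vertex of minimum indegree in the induced digraph."""
    good = []
    for choice in product(*parts):
        chosen = set(choice)
        indeg = {v: 0 for v in chosen}
        for u, w in arcs:
            if u in chosen and w in chosen:
                indeg[w] += 1
        if all(indeg[target] < d for v, d in indeg.items() if v != target):
            good.append(choice)
    return good

clauses = [("x1", "x2", "x3"), ("x1", "~x2", "~x3"),
           ("~x1", "x2", "~x3"), ("~x1", "~x2", "x3")]
Y = [f"y{s}" for s in (1, 2, 3)]
Z = [f"z{s}" for s in (1, 2, 3)]
W = [f"w{s}" for s in (1, 2, 3)]
clause_vertices = [f"c{i}" for i in range(1, 5)]
literal_parts = [[f"x{j}", f"~x{j}"] for j in (1, 2, 3)]
parts = [["a1"]] + [[v] for v in Y + Z + W + clause_vertices] + literal_parts

arcs = [(y, z) for y in Y for z in Z]            # Y -> Z
arcs += [(z, w) for z in Z for w in W]           # Z -> W
arcs += [(w, y) for w in W for y in Y]           # W -> Y
arcs += [("y1", "a1"), ("y2", "a1")]
arcs += [(y, c) for y in ("y1", "y2") for c in clause_vertices]
arcs += [(y, lit) for y in Y for part in literal_parts for lit in part]
arcs += [(lit, f"c{i}") for i, clause in enumerate(clauses, start=1)
         for lit in clause]

solutions = choices_with_unique_min_indegree(parts, arcs, "a1")
# Here the chosen literal vertices are read as the TRUE literals.
true_literal_sets = [set(choice[-3:]) for choice in solutions]
print(len(solutions))
```

A clause vertex keeps indegree 2 exactly when none of its literals is chosen, so the surviving choices correspond to the satisfying assignments of the formula.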

Theorems 6 and 7 imply the following assertion.

Theorem 8

Possible President is NP-complete for Copeland and Llull voting rules, even if the size of each party is at most 2.

4 Integer programs for candidate nomination

In this section we propose integer programs to solve the Possible President and Necessary President problems for various voting rules. The programs for scoring rules were inspired by the program given in [31] for plurality. For Copeland and Llull voting rules we use their equivalent formulations in the graph-theoretical language, and we modify the ideas from [20].

Recall that we use the letters M and N to denote the numbers of candidates and voters in the election, respectively.

In all our programs we have the decision variables \(x_i\) for each candidate \(a_i, i\in [M]\) with the following interpretation:

$$\begin{aligned} x_i=\left\{ \begin{array}{ll} 1 &{} \text{ if } a_i \text{ is } \text{ nominated }\\ 0 &{} \text{ otherwise } \end{array}\right. \end{aligned}$$

The following constraint ensures that exactly one candidate is nominated by each party.

$$\begin{aligned} \sum _{a_i\in P_\ell } x_i= & {} 1 \text{ for } \text{ each } \ell \in [r] \end{aligned}$$
(1)

We will also assume that the preferences of voters are represented by a family of matrices \(Q^j=(q^j_{is})\in \{0,1\}^{M\times M}\), j corresponding to voter \(v_j\), such that

$$\begin{aligned} q^j_{is}=1 \Longleftrightarrow a_i\succ _{v_j} a_s. \end{aligned}$$
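For illustration, the matrix \(Q^j\) of a single voter can be derived from her ranking as sketched below. The function name `preference_matrix` and the use of 0-based candidate indices are our assumptions; the paper indexes candidates from 1.

```python
def preference_matrix(ranking, M):
    """ranking lists candidate indices 0..M-1 from most to least preferred;
    the result q satisfies q[i][s] == 1 iff candidate i is ranked above s."""
    pos = {c: r for r, c in enumerate(ranking)}
    return [[int(i != s and pos[i] < pos[s]) for s in range(M)]
            for i in range(M)]

# Hypothetical voter with preferences a_2 > a_0 > a_1 (0-based indices).
Q = preference_matrix([2, 0, 1], 3)

# With nominations x, the sum over s of q[s][i] * x[s] counts the nominated
# candidates that the voter prefers over candidate i.
x = [1, 1, 0]
preferred_over_1 = sum(Q[s][1] * x[s] for s in range(3))
print(Q, preferred_over_1)
```

Sums of this form appear in the left-hand sides of the inequalities in the following subsections.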

4.1 k-approval

In addition to variables \(x_i\), \(i\in [M]\), there are binary variables \(z^j_i\) for \(i\in [M]\) and \(j\in [N]\) with the following interpretation

$$\begin{aligned} z_i^j=\left\{ \begin{array}{ll} 1 &{} \text{ if } a_i \text{ is one of the top-}k \text{ candidates among nominated candidates for voter } v_j \\ 0 &{} \text{ otherwise } \end{array}\right. \end{aligned}$$

Inequalities (2)–(4) given below achieve the desired interpretation of variables \(z^j_i\) as follows. First notice that if \(a_i\) is not nominated, i.e., \(x_i=0\), then (2) leads to \(z_i^j=0\) for each voter \(v_j\). Then inequalities (3)–(4) are fulfilled trivially.

Now take a nominated candidate \(a_i\). The sums in the left-hand sides of (3) and (4) count the number of nominated candidates that voter \(v_j\) prefers over \(a_i\). If \(z^j_i=1\), i.e., if candidate \(a_i\) is among the top-k candidates for voter \(v_j\), inequality (3) ensures that this sum is at most \(k-1\). Conversely, if \(a_i\) is not among the top-k candidates for voter \(v_j\), i.e., \(z^j_i=0\), then taking into account that \(x_i=1\), thanks to (4) there are at least k nominated candidates preferred by \(v_j\) over \(a_i\).

$$\begin{aligned} \sum _{j=1}^N z^j_i\ \le & {}\ Nx_i \text{ for } \text{ each } i\in [M]\end{aligned}$$
(2)
$$\begin{aligned} \sum _{s=1}^M q^j_{si}x_s+(M-k+1)z^j_i\le & {}\ M \text{ for } \text{ each } i\in [M], j\in [N] \end{aligned}$$
(3)
$$\begin{aligned} \sum _{s=1}^M q^j_{si}x_s+k(z^j_i-x_i)\ge & {}\ 0 \text{ for } \text{ each } i\in [M], j\in [N] \end{aligned}$$
(4)

Finally, inequalities (5) given below ensure that the number of voters that have candidate \(a_1\) among their top-k candidates is greater than the number of voters that have any other nominated candidate in their set of top-k candidates.

$$\begin{aligned} \sum _{j=1}^N z^j_1- \sum _{j=1}^N z^j_i\ge & {}\ 1 \text{ for } \text{ each } i\in [M], i> 1 \end{aligned}$$
(5)

In summary:

Theorem 9

Candidate \(a_1\) is a possible winner of the k-approval election if and only if the binary program with variables \(x_i, i\in [M]\) and \(z_i^j\) for \(i\in [M]\) and \(j\in [N]\) consisting of inequalities (2)–(5) plus equalities (1) is feasible.
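The role of the individual inequalities can be illustrated without an ILP solver. The sketch below enumerates all nominations satisfying (1), derives the only values of \(z^j_i\) compatible with (2)–(4), and tests (2)–(5) literally. The four rankings, the party structure and all function names are hypothetical, candidate 0 plays the role of \(a_1\), and indices are 0-based.

```python
from itertools import product

def q_matrix(ranking, M):
    """q[i][s] = 1 iff the voter ranks candidate i above candidate s."""
    pos = {c: r for r, c in enumerate(ranking)}
    return [[int(i != s and pos[i] < pos[s]) for s in range(M)]
            for i in range(M)]

def derive_z(x, Q, k):
    """z[j][i] = 1 iff i is nominated and fewer than k nominated
    candidates are preferred over i by voter j."""
    M = len(x)
    return [[int(x[i] == 1 and
                 sum(Qj[s][i] * x[s] for s in range(M)) <= k - 1)
             for i in range(M)] for Qj in Q]

def satisfies_2_to_5(x, z, Q, k):
    M, N = len(x), len(Q)
    for i in range(M):
        if sum(z[j][i] for j in range(N)) > N * x[i]:          # (2)
            return False
        for j in range(N):
            better = sum(Q[j][s][i] * x[s] for s in range(M))
            if better + (M - k + 1) * z[j][i] > M:             # (3)
                return False
            if better + k * (z[j][i] - x[i]) < 0:              # (4)
                return False
    score = [sum(z[j][i] for j in range(N)) for i in range(M)]
    return all(score[0] - score[i] >= 1 for i in range(1, M))  # (5)

rankings = [[0, 1, 2, 3], [1, 0, 3, 2], [2, 0, 1, 3], [3, 2, 0, 1]]
parties = [[0], [1, 2], [3]]
M, k = 4, 2
Q = [q_matrix(r, M) for r in rankings]

feasible = []
for choice in product(*parties):        # every choice satisfies (1)
    x = [int(i in choice) for i in range(M)]
    if satisfies_2_to_5(x, derive_z(x, Q, k), Q, k):
        feasible.append(tuple(x))
print(feasible)
```

In this toy instance only the nomination of candidates 0, 1 and 3 makes candidate 0 the unique 2-approval winner, so she is a possible but not a necessary winner. A solver-based implementation would replace the enumeration by a feasibility check of the same constraints.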

To modify the previous ILP to capture the Necessary President problem, we need to achieve that \(a_1\) receives fewer votes than at least one candidate. Therefore we introduce new binary variables \(y_i\) for each candidate \(a_i, i\in [M], i\ne 1\) with the following interpretation:

$$\begin{aligned} y_i=\left\{ \begin{array}{ll} 1 &{} \text{ if } \text{ candidate } a_i \text{ receives } \text{ more } \text{ votes } \text{ than } \text{ candidate } a_1 \\ 0 &{} \text{ otherwise } \end{array}\right. \end{aligned}$$

To ensure that at least one candidate receives more votes than candidate \(a_1\) we add

$$\begin{aligned} \sum _{i=2}^M y_i\ge 1 \end{aligned}$$
(6)

and to achieve that only nominated candidates can receive more votes than \(a_1\) we add inequalities

$$\begin{aligned} y_i\le x_i \text{ for } \text{ each } i>1 \end{aligned}$$
(7)

Finally, we have that

$$\begin{aligned} \sum _{j=1}^N z^j_1- \sum _{j=1}^N z^j_i \le N(1-y_i) \text{ for } \text{ each } i\in [M], i>1 \end{aligned}$$
(8)

Let us see what inequalities (6)–(8) together imply. If \(y_i=0\) then inequality (8) is trivially fulfilled. Thanks to inequalities (6) and (7) we have that \(y_i=1\) for at least one nominated candidate \(a_i,i>1\), and this will be the candidate that receives more votes than \(a_1\) for some nominations resulting from a solution of the ILP.

Theorem 10

Candidate \(a_1\) is not the necessary winner of the k-approval election if and only if the binary program with variables \(x_i, i\in [M]\), \(y_i, i\in [M], i>1\) and \(z_i^j\) for \(i\in [M]\) and \(j\in [N]\) consisting of inequalities (2)–(4), (6)–(8) plus equalities (1) is feasible.

4.2 k-veto

In addition to variables \(x_i\) there are binary variables \(z^j_i\), \(i\in [M], j\in [N]\) with the following interpretation

$$\begin{aligned} z_i^j=\left\{ \begin{array}{ll} 1 &{} \text{ if } a_i \text{ is } \text{ one } \text{ of } \text{ the } \text{ k } \text{ least } \text{ preferred } \text{(worst) } \text{ candidates } \text{ among } \\ &{} \hbox { nominated candidates for voter}\ v_j \\ 0 &{} \text{ otherwise } \end{array}\right. \end{aligned}$$

Inequalities (9)–(11) achieve the desired interpretation of variables \(z^j_i\) as follows. If \(a_i\) is not nominated, i.e., \(x_i=0\), then (9) leads to \(z_i^j=0\) for each voter \(v_j\) and inequalities (10)–(12) are fulfilled trivially.

Now consider a nominated candidate \(a_i\). The sums \(\sum _{s=1}^M q^j_{is}x_s\) in the left-hand sides of (10) and (11) count the number of nominated candidates that are less preferred than \(a_i\) by voter \(v_j\). If \(z^j_i=1\), i.e., if candidate \(a_i\) is among the k worst candidates for voter \(v_j\), inequalities (10) ensure that these sums are at most \(k-1\). Conversely, if \(z^j_i=0\) for a nominated candidate \(a_i\) then taking into account that \(x_i=1\), inequalities (11) imply that there are at least k nominated candidates less preferred by \(v_j\) than \(a_i\).

$$\begin{aligned} \sum _{j=1}^N z^j_i\le & {}\ Nx_i \text{ for } \text{ each } i\in [M]\end{aligned}$$
(9)
$$\begin{aligned} \sum _{s=1}^M q^j_{is}x_s+(M-k+1)z^j_i\le & {}\ M \text{ for } \text{ each } i\in [M], j\in [N] \end{aligned}$$
(10)
$$\begin{aligned} \sum _{s=1}^M q^j_{is}x_s+k(z^j_i-x_i)\ge & {}\ 0 \text{ for } \text{ each } i\in [M], j\in [N] \end{aligned}$$
(11)

Finally, inequalities (12) given below ensure that the number of voters that have candidate \(a_1\) among their k worst nominated candidates is strictly smaller than the number of voters that have any other nominated candidate \(a_i\) in their set of k worst candidates. Note that we cannot simply reverse the inequality sign in (5), as this time, when candidate \(a_i\) is not nominated, the sum \(\sum _{j=1}^N z_i^j\) is equal to 0 and the obtained ILP will never be feasible. Therefore \(N+1\) is added to the right-hand side of the inequality when \(x_i = 0\).

$$\begin{aligned} \sum _{j=1}^N z^j_1-\sum _{j=1}^N z^j_i \le (N+1)(1 - x_i)-1 \text{ for } \text{ each } i\in [M], i\ne 1 \end{aligned}$$
(12)
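The effect of the correction term in (12) can be seen on a tiny example. The sketch below compares (12) with the naive reversal of (5); the function names and the veto counts are hypothetical, and index 0 plays the role of \(a_1\).

```python
def holds_12(veto, x, N):
    """veto[i] stands for the sum over j of z_i^j, i.e. the number of voters
    having candidate i among their k worst nominated candidates."""
    return all(veto[0] - veto[i] <= (N + 1) * (1 - x[i]) - 1
               for i in range(1, len(x)))

def holds_reversed_5(veto):
    """Naive variant: inequality (5) with the sign reversed, no correction."""
    return all(veto[0] - veto[i] <= -1 for i in range(1, len(veto)))

N = 5                 # five voters, 1-veto
x = [1, 1, 0]         # candidate 2 is not nominated ...
veto = [1, 4, 0]      # ... and therefore collects no negative votes

print(holds_12(veto, x, N), holds_reversed_5(veto))
```

Candidate 0 has strictly fewer negative votes than the only other nominee, which (12) accepts, whereas the naive variant rejects the instance because of the non-nominated candidate.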

To sum up:

Theorem 11

Candidate \(a_1\) is a possible winner of the k-veto election if and only if the binary program with variables \(x_i, i\in [M]\) and \(z_i^j\) for \(i\in [M]\) and \(j\in [N]\) consisting of inequalities (9)–(12) plus equalities (1) is feasible.

To get the ILP for the Necessary President problem for k-veto, we use, similarly as for k-approval, binary variables \(y_i\) for each candidate \(a_i, i\in [M], i\ne 1\), now with the interpretation \(y_i=1\) if candidate \(a_i\) receives fewer negative votes than \(a_1\), together with inequalities (6) and (7). Inequalities (12) are replaced by

$$\begin{aligned} \sum _{j=1}^N z^j_i- \sum _{j=1}^N z^j_1 \le (N+1)(1-y_i) \text{ for } \text{ each } i\in [M], i>1 \end{aligned}$$
(13)

while again the second term in the right-hand side ensures the inequality trivially for those candidates \(a_i\) that are not winning, and so also for those that are not nominated.

Theorem 12

Candidate \(a_1\) is not the necessary winner of the k-veto election if and only if the binary program with variables \(x_i, i\in [M]\), \(y_i, i\in [M], i>1\) and \(z_i^j\) for \(i\in [M]\) and \(j\in [N]\) consisting of inequalities (9)–(11), (6)–(7), (13) plus equalities (1) is feasible.

4.2.1 Borda

Recall that the number of points that a given candidate \(a_i\) receives from a voter \(v_j\) is equal to the number of other nominated candidates that are lower in \(v_j\)’s preference list than candidate \(a_i\). The double sums in the left-hand sides of inequality (14) thus count the Borda score of candidates \(a_1\) and \(a_i\), respectively. The second term in the right-hand side ensures that the inequality is automatically fulfilled if candidate \(a_i\) is not nominated.

$$\begin{aligned} \sum _{j=1}^N\sum _{s=1}^M q^j_{1s}x_s-\sum _{j=1}^N\sum _{s=1}^M q^j_{is}x_s \ge 1-MN(1-x_i) \text{ for } \text{ each } i\in [M], i\ne 1 \end{aligned}$$
(14)

Theorem 13

Candidate \(a_1\) is a possible winner for Borda elections if and only if the binary program with variables \(x_i, i\in [M]\) and the set of equalities (1) and inequalities (14) is feasible.
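A minimal sketch of inequality (14), evaluated by enumeration rather than by a solver, is given below. The rankings and parties are hypothetical, candidate 0 plays the role of \(a_1\) and forms a one-element party (an assumption that guarantees she is always nominated), and the necessary-winner test is done by brute force, not via the variables \(y_i\).

```python
from itertools import product

def q_matrix(ranking, M):
    """q[i][s] = 1 iff the voter ranks candidate i above candidate s."""
    pos = {c: r for r, c in enumerate(ranking)}
    return [[int(i != s and pos[i] < pos[s]) for s in range(M)]
            for i in range(M)]

def borda_sums(x, Q):
    """Double sums of (14): for each i, the number of (voter, nominated s)
    pairs with i ranked above s. For nominated i this is the Borda score."""
    M = len(x)
    return [sum(Qj[i][s] * x[s] for Qj in Q for s in range(M))
            for i in range(M)]

def satisfies_14(x, Q):
    M, N = len(x), len(Q)
    score = borda_sums(x, Q)
    return all(score[0] - score[i] >= 1 - M * N * (1 - x[i])
               for i in range(1, M))

rankings = [[0, 1, 2, 3], [1, 0, 3, 2], [2, 0, 1, 3], [3, 2, 0, 1]]
parties = [[0], [1, 2], [3]]
M = 4
Q = [q_matrix(r, M) for r in rankings]

nominations = [[int(i in choice) for i in range(M)]
               for choice in product(*parties)]
winning = [tuple(x) for x in nominations if satisfies_14(x, Q)]
possible = len(winning) > 0
necessary = len(winning) == len(nominations)
print(winning, possible, necessary)
```

For a non-nominated candidate the double sum may be as large as that of \(a_1\) (here both equal 6 under the first nomination), which is exactly why the term \(MN(1-x_i)\) is needed.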

To capture the Necessary President problem for Borda election, we again use the binary variables \(y_i, i\in [M], i>1\) and replace inequality (14) by inequality (15).

$$\begin{aligned} \hspace{-4.25pt}\sum _{j=1}^N\sum _{s=1}^M q^j_{1s}x_s-\sum _{j=1}^N\sum _{s=1}^M q^j_{is}x_s \le MN(1-y_i) \text{ for } \text{ each } i\in [M], i\ne 1 \end{aligned}$$
(15)

Theorem 14

Candidate \(a_1\) is not the necessary winner for Borda elections if and only if the binary program with variables \(x_i, i\in [M]\) and \(y_i, i\in [M], i>1\) consisting of equalities (1) and inequalities (6)–(7) and (15) is feasible.

4.3 Condorcet consistent voting rules

4.3.1 Maximin voting rule

Recall that the winner of the maximin election E is the candidate a, who achieves maximum advantage of all candidates, i.e., for whom the quantity \(N_E(a)=\min _{b\in A}N_E(a,b)\) is maximum. The advantage of the intended winner \(a_1\) over a nominated candidate \(a_s\) in the reduced election \(E_c\) is thus

$$\begin{aligned} N_{E_c}(a_1,a_s)=\sum _{j=1}^N q_{1s}^j. \end{aligned}$$

Let B denote the minimum advantage of \(a_1\) over all nominated candidates. Then B has to fulfil for each \(s\ne 1\)

$$\begin{aligned} B\le \sum _{j=1}^N q_{1s}^j+N(1-x_s) \end{aligned}$$
(16)

The second term in the right-hand side of (16) ensures that the inequality for a candidate \(a_s\) that is not nominated is not taken into account in the computation of the minimum advantage. We introduce new binary variables \(z_{is}\) for each pair of candidates \(a_i,a_s\in A\), \(i\ne s\) with the interpretation

$$\begin{aligned} z_{is}=\left\{ \begin{array}{ll} 1 &{} \text{ if } N_{E_c}(a_i,a_s)<B \\ 0 &{} \text{ otherwise } \end{array}\right. \end{aligned}$$

To ensure this interpretation we add the following inequalities.

$$\begin{aligned} \sum _{s=1, s\ne i}^M z_{is}+M(1-x_i)\ge & {}\ 1 \text{ for } \text{ all } i>1 \end{aligned}$$
(17)
$$\begin{aligned} z_{is}\le & {}\ x_s \text{ for } \text{ all } i,s, i>1\text{, } i\ne s \end{aligned}$$
(18)
$$\begin{aligned} \sum _{j=1}^N q^j_{is}x_s-N(1-z_{is})-N(1-x_i)\le & {}\ B-1 \text{ for } \text{ all } i,s, i>1\text{, } i\ne s \end{aligned}$$
(19)

In detail, if \(a_i\) is not nominated, i.e., if \(x_i=0\), then (17) and (19) are trivially fulfilled. Now assume that \(a_i\) is nominated. Then thanks to (17) and (18) we have \(z_{is}= 1\) for at least one nominated candidate \(a_s\). Then for such a candidate \(a_s\) we have that the advantage of \(a_i\) over \(a_s\), and hence the minimum advantage of \(a_i\), is strictly smaller than the minimum advantage of \(a_1\) due to (19).

Theorem 15

Candidate \(a_1\) is a possible winner of the maximin election if and only if the binary program with variables \(x_i, i\in [M]\), B, and \(z_{is}\) for \(i,s\in [M], i>1, i\ne s\) consisting of inequalities (16) for each \(s>1\) and (17)–(19) for each \(i,s>1, i\ne s\) plus equalities (1) is feasible.
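The quantities used in this program can be computed directly for small instances. In the following sketch the rankings, the parties and the function names are hypothetical, candidate 0 plays the role of \(a_1\), and the maximin score of candidate 0 is the largest value of B admitted by (16).

```python
from itertools import product

def q_matrix(ranking, M):
    """q[i][s] = 1 iff the voter ranks candidate i above candidate s."""
    pos = {c: r for r, c in enumerate(ranking)}
    return [[int(i != s and pos[i] < pos[s]) for s in range(M)]
            for i in range(M)]

def advantage(Q, a, b):
    """N(a, b): the number of voters preferring a to b."""
    return sum(Qj[a][b] for Qj in Q)

def maximin_scores(x, Q):
    """Minimum advantage of each nominated candidate over the other nominees."""
    nominated = [i for i, xi in enumerate(x) if xi == 1]
    return {i: min(advantage(Q, i, s) for s in nominated if s != i)
            for i in nominated}

def unique_winner(x, Q, target=0):
    scores = maximin_scores(x, Q)
    return all(scores[target] > v for i, v in scores.items() if i != target)

rankings = [[0, 1, 2, 3], [1, 0, 3, 2], [2, 0, 1, 3], [3, 2, 0, 1]]
parties = [[0], [1, 2], [3]]
M = 4
Q = [q_matrix(r, M) for r in rankings]

nominations = [[int(i in choice) for i in range(M)]
               for choice in product(*parties)]
results = [unique_winner(x, Q) for x in nominations]
print(results)
```

Candidate 0 is the unique maximin winner when candidate 1 is nominated and ties with candidate 2 otherwise, so in this toy election she is a possible but not a necessary winner.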

To write the ILP for the Necessary President, let us denote by \(B_i\) the minimum advantage of candidate \(a_i\), \(i>1\) over all nominated candidates. \(B_i\) fulfils for each \(s\in [M]\), \(i\ne s\)

$$\begin{aligned} B_i\le \sum _{j=1}^N q_{is}^j+N(1-x_s), \end{aligned}$$
(20)

where again the second term in the right-hand side of (20) ensures that the inequality for a candidate \(a_s\) that is not nominated is not taken into account. We further have binary variables \(\rho _{is}\) for each pair of candidates \(a_i,a_s\in A\), \(i,s>1\) with the interpretation

$$\begin{aligned} \rho _{is}=\left\{ \begin{array}{ll} 1 &{} \text{ if } N_{E_c}(a_1,a_s)<B_i \\ 0 &{} \text{ otherwise } \end{array}\right. \end{aligned}$$

The following inequalities ensure that there exist nominated candidates \(i,s>1\), such that \(N_{E_c}(a_1,a_s)< B_i\).

$$\begin{aligned} \sum _{i=2}^M\sum _{s=2}^M \rho _{is}\ge & {}\ 1 \end{aligned}$$
(21)
$$\begin{aligned} \rho _{is}\le & {}\ x_s \text{ for } \text{ each } i,s=2,\dots , M \end{aligned}$$
(22)
$$\begin{aligned} \rho _{is}\le & {}\ x_i \text{ for } \text{ each } i,s=2,\dots , M \end{aligned}$$
(23)
$$\begin{aligned} \sum _{j=1}^N q^j_{1s}x_s-N(1-\rho _{is})\le & {}\ B_i \text{ for } \text{ each } i,s=2,\dots , M \end{aligned}$$
(24)

In more detail, if \(a_i\) or \(a_s\) are not nominated, then (22) and (23) ensure that \(\rho _{is}=0\). Then (24) implies that the advantage of candidate \(a_1\) over at least one nominated candidate \(a_s\) is smaller than the minimum advantage of some nominated candidate \(a_i\).

Theorem 16

Candidate \(a_1\) is not the necessary winner of the maximin election if and only if the binary program with variables \(x_i,i\in [M]\), \(B_i,i\in [M], i>1\) and \(\rho _{is}, i,s\in [M], i,s>1\) and \(i\ne s\) consisting of inequalities (20)–(24) plus equalities (1) is feasible.

4.3.2 Copeland and Llull elections

We interpreted Copeland and Llull elections in the language of graph theory, therefore we derive the ILP for this version too. The binary variable \(x_i\) now corresponds to candidate \(a_i\) as well as to the respective vertex.

The following inequalities ensure that \(a_1\) will be the only vertex with the maximum outdegree in the digraph induced by the chosen vertices.

$$\begin{aligned} \sum _{(a_i,a_s)\in H}x_s-M(1-x_i)+1 \le \sum _{(a_1,a_s)\in H}x_s \text{ for } \text{ each } a_i\in A\backslash \{a_1\} \end{aligned}$$
(25)

To see the correctness of inequalities (25), realize that their right-hand side correctly counts the outdegree of vertex \(a_1\), taking into account only chosen vertices. If vertex \(a_i\) is chosen, i.e., \(x_i=1\), then the left-hand sides of (25) count its outdegree to the chosen vertices. If \(a_i\) is not chosen then \(x_i=0\) ensures the fulfillment of (25) thanks to the second term in the left-hand side.

By a similar argument, inequalities (26) ensure that \(a_1\) will be the only vertex with the minimum indegree in the digraph induced by the chosen vertices.

$$\begin{aligned} \sum _{(a_s,a_i)\in H}x_s+M(1-x_i)-1 \ge \sum _{(a_s,a_1)\in H}x_s \text{ for } \text{ each } a_i\in A\backslash \{a_1\} \end{aligned}$$
(26)

Theorem 17

The candidate corresponding to vertex \(a_1\) is a possible winner of Copeland election if and only if the integer program with binary variables \(x_i\), \(i\in [M]\) and constraints (1) and (25) is feasible and she is a possible winner of Llull election if and only if the integer program with binary variables \(x_i, i\in [M]\) and constraints (1) and (26) is feasible.
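Inequalities (25) and (26) can be evaluated directly on a small digraph. The sketch below uses a hypothetical arc set and partition, 0-based vertex indices with vertex 0 in the role of \(a_1\), and function names of our own choosing.

```python
from itertools import product

def outdeg(v, x, arcs):
    """Outdegree of v counted towards chosen vertices only."""
    return sum(x[s] for (u, s) in arcs if u == v)

def indeg(v, x, arcs):
    """Indegree of v counted from chosen vertices only."""
    return sum(x[s] for (s, w) in arcs if w == v)

def satisfies_25(x, arcs):
    M = len(x)
    return all(outdeg(i, x, arcs) - M * (1 - x[i]) + 1 <= outdeg(0, x, arcs)
               for i in range(1, M))

def satisfies_26(x, arcs):
    M = len(x)
    return all(indeg(i, x, arcs) + M * (1 - x[i]) - 1 >= indeg(0, x, arcs)
               for i in range(1, M))

# Hypothetical majority graph on vertices 0..3 and partition into parties.
arcs = {(0, 1), (0, 3), (1, 3), (2, 0), (2, 3)}
parties = [[0], [1, 2], [3]]
M = 4

copeland_ok, llull_ok = [], []
for choice in product(*parties):
    x = [int(i in choice) for i in range(M)]
    if satisfies_25(x, arcs):
        copeland_ok.append(tuple(x))
    if satisfies_26(x, arcs):
        llull_ok.append(tuple(x))
print(copeland_ok, llull_ok)
```

Choosing vertex 1 removes the only vertex that beats vertex 0, so this choice satisfies both (25) and (26), while choosing vertex 2 violates both.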

To get the integer programs for the Necessary President problem, we add variables \(y_i, i\in [M], i>1\) with inequalities (6) and (7). Then inequalities (27) ensure that at least one nominated vertex has its outdegree greater than or equal to that of \(a_1\), and inequalities (28) make sure that at least one nominated vertex has its indegree smaller than or equal to that of \(a_1.\)

$$\begin{aligned} \sum _{(a_i,a_s)\in H}x_s+M(1-y_i) \ge \sum _{(a_1,a_s)\in H}x_s \text{ for } \text{ each } a_i\in A\backslash \{a_1\} \end{aligned}$$
(27)
$$\begin{aligned} \sum _{(a_s,a_i)\in H}x_s-M(1-y_i) \le \sum _{(a_s,a_1)\in H}x_s \text{ for } \text{ each } a_i\in A\backslash \{a_1\} \end{aligned}$$
(28)

Theorem 18

The candidate corresponding to vertex \(a_1\) is not the necessary winner of Copeland election if and only if the integer program with binary variables \(x_i\), \(i\in [M]\), \(y_i, i\in [M], i>1\) and constraints (1), (6)–(7) and (27) is feasible and she is not the necessary winner of Llull election if and only if the integer program with binary variables \(x_i, i\in [M]\), \(y_i, i\in [M], i>1\) and constraints (1), (6)–(7) and (28) is feasible.

5 Computational experiments

To experimentally verify the efficiency of the proposed integer programs for the candidate nomination problems, we tested them on real as well as randomly generated data.

We implemented all the integer programs from Sect. 4 in the programming language Python 3.10.4, using libraries PuLP 2.6.0, pandas 1.4.2, and numpy 1.22.3. For processing and visualizing data obtained from the simulations, we utilized data science tools, specifically the pandas library, and visualization libraries matplotlib 3.6.3 and seaborn 0.12.2. We ran the simulations on a computer with an AMD Ryzen 7 5800H 3.2 GHz processor and 16.0 GB RAM.

We generated 30 different elections for each considered case and for each one we solved the ILP for the Possible President as well as for Necessary President.

In all cases we randomly chose the intended winner among the candidates and the remaining candidates were assigned randomly to parties. We used two different methods to generate parties.

For the first case, briefly referred to as large parties, we generated the sizes of parties to be independent integers between 1 and the total number of candidates, using uniform distribution. When the number of remaining candidates was smaller than the size of the next generated party, we let this party contain all the remaining candidates. As under uniform distribution the probability that the party will contain at least half of the candidates is the same as the probability that the party will contain less than half of the candidates, the number of parties thus created was never larger than 6, and equal to 3 in approximately one half of generated instances, irrespective of the dataset used.

To compare how the computation times differ in the case when there are a few parties with many members and in the case with many small parties, in alternative simulations we prescribed the size of each party to be 2, with one party of size 3 if necessary. We refer to this case as small parties.

Once we generated the sizes of parties \(size_1, size_2, \dots , size_r\), we used the Python function random.shuffle to generate a random order of candidates. The first \(size_1\) candidates in this order were allocated to the first party, the following \(size_2\) candidates to the second party and so on.
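The generation procedure can be sketched as follows. This is not the code used in the experiments: the function names are ours, we assume that the bounds of the uniform distribution are inclusive and that a generated size exceeding the number of remaining candidates is truncated to the remainder, and the special handling of the intended winner is omitted.

```python
import random

def large_party_sizes(num_candidates, rng):
    """Draw sizes uniformly from 1..num_candidates until all candidates
    are covered; the last party takes whatever is left."""
    sizes, remaining = [], num_candidates
    while remaining > 0:
        size = min(rng.randint(1, num_candidates), remaining)
        sizes.append(size)
        remaining -= size
    return sizes

def small_party_sizes(num_candidates):
    """Parties of size 2, with one party of size 3 if the number is odd
    (assumes at least two candidates)."""
    sizes = [2] * (num_candidates // 2)
    if num_candidates % 2 == 1:
        sizes[-1] = 3
    return sizes

def allocate(candidates, sizes, rng):
    """Shuffle the candidates and cut the random order into parties."""
    order = list(candidates)
    rng.shuffle(order)
    parties, start = [], 0
    for size in sizes:
        parties.append(order[start:start + size])
        start += size
    return parties

rng = random.Random(0)          # fixed seed only for reproducibility
sizes = large_party_sizes(11, rng)
parties = allocate(range(11), sizes, rng)
print(sizes, parties)
```

For 11 candidates, as in the T-Shirts data, `small_party_sizes(11)` yields four parties of size 2 and one of size 3.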

To each generated election instance we applied all the considered voting rules (with some exceptions, explained later). We recorded computation times that include the time needed to create the ILP as well as the solution time for the ILP. We also illustrate the proportions of elections where the chosen candidate was identified as a possible winner as well as those where she was the necessary winner.

5.1 Real data

In real elections, especially with a large number of candidates, it is unrealistic to expect that each voter is able to give a complete preference list containing all candidates. Similarly, if the number of voters is large, based on polls or other information, one could infer say, which proportion of voters has which candidate as their first choice, but it would hardly be possible to estimate the complete preference lists of all voters. These restrictions also apply to the availability of real data. We used two datasets from the Preflib library [26, 27], that we now describe in greater detail.

5.1.1 T-Shirts data

This dataset (see footnote 4) was created by Carleton Coffrin and contains complete strict rank orderings of 11 T-Shirt designs (candidates) voted on by 30 members of the Optimization Research Group at NICTA.

The structure of elections with large parties generated for this dataset may model situations similar to our introductory example. Imagine again a research department that consists of a small number of research teams where about one third of its members might be potential candidates for the chair. These people vote too, and we do not assume that everybody considers herself to be the most suitable candidate.

The computation times of the respective ILPs are presented in Figs. 3 and 4 in the form of boxplots, with red representing the elections with small parties and green the elections with large parties. Notice that, in general, the ILP for the Necessary President problem needed a longer computation time. The large parties case also required more time, which is quite counterintuitive, as the number of possible reduced elections is much smaller than in the small parties case.

The numbers above the boxes indicate the numbers of election instances with ‘nomination success’, i.e., those in which the chosen candidate was identified as a possible or necessary winner of the election, respectively. Let us remark that in approximately half of the generated instances the chosen candidate was identified as a possible winner for both variants of party sizes, with the exception of 3-approval and 3-veto in the case of large parties. On the other hand, she was never the necessary winner, with very few exceptions.

5.1.2 Countries rankings

Boehmer and Schaar [9] created elections generated from indicator-based rankings of countries (see footnote 5) based on the popular World Happiness Report (see footnote 6). In these elections the countries are the candidates and each vote ranks them according to one indicator. The raw data were post-processed by deleting some candidates and voters to make the election complete. We chose years 2012–2014, with the numbers of candidates 112, 115 and 116, respectively, and the number of voters (i.e., indicators) equal to 15.

In this dataset the votes cannot be considered independent, as some of the indicators are significantly correlated, for example GDP per capita, social support, and healthy life years [32].

For each dataset we generated 30 random elections with large parties and 30 random elections with small parties. The computation results are depicted in Fig. 5 in the form of boxplots using the same color code to distinguish between the small parties case and the large parties case. The numbers above the plots represent, once again, the numbers of instances in each case where the nominations were successful. Now, however, as the computation times differ greatly, we use the logarithmic scale to better enable comparisons.

Notice that the number of variables as well as constraints in the constructed ILPs for the maximin voting rule is quadratic in the number of candidates (which means in this case tens of thousands) and O(mn) for k-approval and k-veto. These sizes proved to be prohibitive and for many samples the computations did not finish within 5 minutes, so we only display the results for the plurality, veto, Borda, Copeland and Llull voting rules. One can again see that the computation times for the case with large parties are much higher than for the small parties, with the exception of Borda. Further, the chosen candidate was almost never the necessary winner. She was almost always a possible winner for large parties, whereas in the case with small parties the nomination success dropped to only a few instances.
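A quick order-of-magnitude check of the quadratic growth can be sketched as follows. We only count ordered pairs of distinct candidates. That an ILP of quadratic size scales with this quantity is an assumption made for illustration, and the exact numbers of variables and constraints depend on the formulation given earlier.

```python
def ordered_pairs(num_candidates):
    """Number of ordered pairs of distinct candidates, m * (m - 1)."""
    return num_candidates * (num_candidates - 1)


# Numbers of candidates in the countries datasets for 2012, 2013 and 2014.
for year, m in zip((2012, 2013, 2014), (112, 115, 116)):
    print(year, m, ordered_pairs(m))
# 2012 112 12432
# 2013 115 13110
# 2014 116 13340
```

Even a small constant number of variables or constraints per pair thus leads to ILPs with tens of thousands of rows and columns.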

5.2 Random data

To test the proposed ILPs on random data we fixed the number of voters to be 100 and varied the size of the candidate set to be 17, 33, 65 and 129, i.e., powers of two plus one. For each size of the candidate set we randomly generated 30 preference profiles of mutually independent complete orderings, again using the Python function random.shuffle.
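The generation of random profiles can be sketched as below. The names `random_profile` and `CANDIDATE_COUNTS` are hypothetical, and labelling the candidates by the integers 1 to m is an assumption made for illustration.

```python
import random

# Sizes of the candidate set: powers of two plus one.
CANDIDATE_COUNTS = [2 ** k + 1 for k in range(4, 8)]  # 17, 33, 65, 129
NUM_VOTERS = 100


def random_profile(num_voters, num_candidates, rng=random):
    """Return mutually independent complete orderings, one per voter."""
    candidates = list(range(1, num_candidates + 1))
    profile = []
    for _ in range(num_voters):
        ranking = candidates[:]  # fresh copy for each voter
        rng.shuffle(ranking)     # uniformly random complete ordering
        profile.append(ranking)
    return profile
```

Calling `random_profile(NUM_VOTERS, m)` 30 times for each `m` in `CANDIDATE_COUNTS` yields profiles of the kind described above.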

Candidate \(a_1\) was the chosen candidate and we again generated random partitions of the remaining candidates into small and large parties as described above.

Comparisons of computation times for the ILPs for the Possible President and the Necessary President problems for the studied voting rules as a function of the number of candidates are presented in Fig. 6 on the logarithmic scale. Notice that while candidate \(a_1\) has never been the necessary winner, the ratios of instances where candidate \(a_1\) was identified as a possible winner varied between 6 and 30, with greater success rate in the case of large parties.

One can see that the computation times grow more than exponentially with the number of candidates. The fastest were the ILPs for Copeland and Llull. This is expected, as the ILPs for these voting rules are the smallest ones.

Again, the computation times grew too fast for k-approval and k-veto for \(k>1\) and maximin; in some cases the program was running for hours without a conclusion. Therefore we did not include these voting rules in our illustrations.

6 Conclusions and outlook

In this paper we proved intractability of two versions of the candidate nomination problem for several voting rules and demonstrated how integer programming can be used to solve them. Here we summarize where we see possible extensions of the current results.

  • Notice that we leave the formal intractability proofs of Necessary President for Borda and for the considered Condorcet-consistent voting rules open. As far as Copeland\(^\alpha\) where \(\alpha \in (0,1)\) is concerned, we hypothesize that both Possible President and Necessary President are hard, but we do not have formal proofs either. Recall that the two extreme cases, Copeland and Llull, use two different graph-theoretical encodings; the intermediate values of \(\alpha\) may need a completely different approach. Further, our work does not deal with other popular voting rules for which the complexity of election control has been studied, for example in [15], such as fallback, range or normalized range voting.

  • In this paper we only studied constructive control, i.e., the party wants its candidate to win. How would the results differ for destructive control, i.e., can a party prevent the victory of a ‘hated’ candidate of some other party by a suitable nomination of its candidate(s)?

  • Walsh [33] asked about the computational complexity of manipulating an election by adding, deleting or partitioning candidates when preferences are guaranteed to be single-peaked. A partial answer was given by Faliszewski et al. [17], who showed that plurality voting is vulnerable to constructive and destructive control by adding candidates. By contrast, for the Borda rule Yang [35] showed that constructive control by adding and deleting voters remains NP-hard for single-peaked preferences. However, he left the complexity of control by adding or deleting candidates open. In the context of the candidate nomination problem, [18] showed for plurality that Necessary President is in P provided that the input election is single-peaked. However, Possible President remains NP-complete for single-peaked preferences, unless the candidates from each party are ranked consecutively on the societal axis. If we restrict the preference profiles to single-peaked preferences, will the intractability results for candidate nomination for other voting rules still hold?

  • Recently, many authors have studied the parameterized complexity of election control, for example [7, 14, 24, 25] or [36]. What are the suitable parameters for candidate nomination problems? We already know that the size of the maximum party alone would not help, as all our intractability results hold for parties of size 2, but one could consider for example the number of parties or the number of voters.

Fig. 3 Summary of computation times and nomination success for k-approval, k-veto, \(k=1,2,3\) in 30 randomly generated elections for the dataset T-shirts

Fig. 4 Summary of computation times and nomination success for Borda, maximin, Copeland and Llull voting rules in 30 randomly generated elections for the dataset T-shirts

Fig. 5 Summary of computation times and nomination success for the randomly generated elections with small and large parties for three different datasets based on ranking of countries

Fig. 6 Summary of computation times and nomination success for the randomly generated elections with small and large parties and growing numbers of potential candidates