1 Introduction

The Gibbard-Satterthwaite theorem (Gibbard, 1973; Satterthwaite, 1975) implies that essentially any voting rule is coalitionally manipulable (CM), i.e. sensitive to strategic voting by a coalition of voters (except dictatorship, and provided that at least three distinct candidates can be elected). However, not all voting rules need be equal in this respect: they may differ in the frequency of the situations where they are CM, the complexity of computing the strategic ballots, the number of candidates who can benefit from the manipulation, the consequences of strategic voting on the quality of the outcome, and the balance of power between naive, sincere voters and sophisticated, strategic ones. In order to investigate all these quantitative aspects, we run computer simulations with the Python package SVVAMP (Durand et al., 2016a) on the basis of two datasets: the FairVote dataset, gathering 162 American political elections, and the Netflix dataset, which enabled us to generate 2243 preference profiles of users over movies. The rest of the paper is organized as follows: Section 2 gives our general definitions and notations; Section 3 defines the voting rules of our study; Section 4 introduces the two datasets; Section 5 gives an overview of SVVAMP and the algorithms that we use; Section 6 presents our results; Section 7 concludes.

2 Definitions and notations

A profile is defined by:

  • Two non-empty finite sets \(\mathscr {V}\) and \(\mathscr {C}\), whose elements are respectively called voters and candidates,

  • For each voter v, a utility function \(u_v\) that assigns a value \(u_{vc}\) to each candidate c.

We denote \(V = {{\,\textrm{card}\,}}(\mathscr {V})\) and \(C = {{\,\textrm{card}\,}}(\mathscr {C})\). In the following, v denotes a generic voter; c and d, generic candidates. We always assume that for any voter, all her utility values are distinct. We denote by \(r_v\) her preference ranking, defined by: \(r_{vc} = 1 + {{\,\textrm{card}\,}}\{d \in \mathscr {C} \text { s.t. } u_{vd} > u_{vc} \}\). For example, \(r_{vc} = 1\) if c is her most liked candidate.

W denotes the weighted majority matrix of the profile, defined by \(W_{cd} = {{\,\textrm{card}\,}}\{v \in \mathscr {V} \text { s.t. } u_{vc} > u_{vd}\}\). The graph naturally associated to W is called the weighted majority graph. M denotes the majority matrix of the profile, defined by \(M_{cd} = \mathbbm {1}_{W_{cd} > W_{dc}}\) (where \(\mathbbm {1}\) denotes the indicator function). A candidate c is a Condorcet winner (CW) if, for any other candidate d, \(M_{cd} = 1\). The Smith set is the smallest set of candidates S such that, for any \(c \in S\) and \(d \notin S\), \(M_{cd} = 1\). A profile has a Condorcet Order (CO) if the binary relation represented by M is a strict total order. A candidate c is a majority favorite (MF) if \({{\,\textrm{card}\,}}\{v \in \mathscr {V} \text { s.t. } r_{vc} = 1\} > \frac{V}{2}\).
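To make these definitions concrete, the following minimal sketch computes W, the Condorcet winner and the majority favorite from a table of utilities. It is only an illustration: the function names and the list-of-lists layout `utilities[v][c]` are our own assumptions and are not part of SVVAMP.

```python
def weighted_majority_matrix(utilities):
    """W[c][d] = number of voters v with u_vc > u_vd (utilities[v][c] is u_vc)."""
    n_cand = len(utilities[0])
    return [[sum(1 for u in utilities if u[c] > u[d]) for d in range(n_cand)]
            for c in range(n_cand)]


def condorcet_winner(utilities):
    """Return the candidate c with M_cd = 1 for all d != c, or None."""
    W = weighted_majority_matrix(utilities)
    n_cand = len(W)
    for c in range(n_cand):
        if all(W[c][d] > W[d][c] for d in range(n_cand) if d != c):
            return c
    return None


def majority_favorite(utilities):
    """Return the candidate ranked first by more than V / 2 voters, or None."""
    n_voters, n_cand = len(utilities), len(utilities[0])
    for c in range(n_cand):
        n_tops = sum(1 for u in utilities if max(u) == u[c])
        if n_tops > n_voters / 2:
            return c
    return None


# Hypothetical toy profiles with 3 voters and 3 candidates (distinct utilities).
profile = [[3, 2, 1], [1, 3, 2], [2, 3, 1]]
cycle = [[3, 2, 1], [1, 3, 2], [2, 1, 3]]   # Condorcet cycle: no CW, no MF
W = weighted_majority_matrix(profile)        # [[0, 1, 2], [2, 0, 3], [1, 0, 0]]
```

In the first toy profile, candidate 1 is both the Condorcet winner and the majority favorite; in the second one, the majority relation is cyclic and both functions return `None`.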

A voting rule is defined, for any \(\mathscr {V}\) and \(\mathscr {C}\), by:

  • A set of strategies \(\mathscr {S}\) (in this paper, it is the same for all voters),

  • A counting function f that maps a tuple of strategies \((\sigma _v)_{v \in \mathscr {V}} \in \mathscr {S}^\mathscr {V}\) to a winning candidate \(w \in \mathscr {C}\),

  • A sincerity function s that maps the utility function \(u_v\) of a voter v to a strategy \(\sigma _v \in \mathscr {S}\).

Together, the utility function of the profile, the set of strategies and the counting function of the voting rule define a game in the usual sense of game theory. All the voting rules in this paper consist in a succession of rounds (often only one) where all voters play simultaneously. The action of a voter at a given round is called a ballot. In particular, for rules in one round, the strategy of a voter is simply called her ballot.

In the following, we will always denote \(w = f\Big ( \big (s(u_v)\big )_{v \in \mathscr {V}} \Big )\) and call her (by a slight abuse of language) the sincere winner. We say that a voting rule is Condorcet-consistent if, for any profile with a Condorcet winner c, it holds that \(w = c\).

For any \(c \ne w\), we denote \(\mathscr {V}_\text {NM}(c) = \{ v \in \mathscr {V} \text { s.t. } u_{vw} > u_{vc} \}\) and \(\mathscr {V}_\text {M}(c) = \{ v \in \mathscr {V} \text { s.t. } u_{vc} > u_{vw} \}\), whose respective cardinalities are denoted \(V_\text {NM}(c)\) and \(V_\text {M}(c)\). We say that a voting rule is coalitionally manipulable (CM) in a given profile if there exist \(c \ne w\) and \(\big (\sigma _v\big )_{v \in \mathscr {V}_\text {M}(c)} \in \mathscr {S}^{\mathscr {V}_\text {M}(c)}\) such that \(f\Big ( \big (s(u_v)\big )_{v \in \mathscr {V}_\text {NM}(c)}, \big (\sigma _v\big )_{v \in \mathscr {V}_\text {M}(c)} \Big ) = c\). In that case, we say that c is a CM winner. We say that a voting rule is unison manipulable (UM) in a given profile if there exist \(c \ne w\) and \(\sigma \in \mathscr {S}\) such that \(f\Big ( \big (s(u_v)\big )_{v \in \mathscr {V}_\text {NM}(c)}, \big (\sigma \big )_{v \in \mathscr {V}_\text {M}(c)} \Big ) = c\) (note that all the manipulators use the same strategy).

In a given profile, we say that a candidate c is a resistant Condorcet winner (RCW) if, for any pair of candidates (d, e) that are different from c: \({{\,\textrm{card}\,}}\{v \in \mathscr {V} \text { s.t. } u_{vc}> u_{vd} \text { and } u_{vc}> u_{ve} \} > \frac{V}{2}\). This is equivalent to: in this profile, any Condorcet-consistent voting rule elects c and is not CM (Durand et al., 2016b). This property is stronger than CW and weaker than MF, in the sense that: c is MF \(\Rightarrow\) c is RCW \(\Rightarrow\) c is CW.
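The RCW condition can be checked directly from the utilities, as in the minimal sketch below. The function name and the `utilities[v][c]` layout are our own assumptions. We also let d and e coincide, which is an assumption of this sketch: it does not change the condition when there are at least three candidates, and it makes the check imply the CW condition in all cases.

```python
from itertools import combinations_with_replacement


def is_resistant_condorcet_winner(utilities, c):
    """True if, for every pair (d, e) of candidates other than c, a strict
    majority of voters prefer c to both d and e."""
    n_voters = len(utilities)
    others = [d for d in range(len(utilities[0])) if d != c]
    for d, e in combinations_with_replacement(others, 2):
        supporters = sum(1 for u in utilities if u[c] > u[d] and u[c] > u[e])
        if supporters <= n_voters / 2:
            return False
    return True


# Hypothetical profile: candidate 0 is a Condorcet winner (4 to 1 against
# candidate 1, 3 to 2 against candidate 2), but only 2 voters out of 5
# prefer her to both opponents at once, so she is not an RCW.
cw_only = [[3, 2, 1], [3, 2, 1], [2, 3, 1], [2, 1, 3], [2, 1, 3]]
# Hypothetical profile: candidate 0 is a majority favorite, hence an RCW.
mf_profile = [[3, 2, 1], [3, 2, 1], [1, 2, 3]]
```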

3 Voting rules under study

The voting rules described in the literature are often irresolute: in a (generally limited) number of cases, they can output several candidates. In order to always select a single winner, we use the same tie-breaking principle for all of them: the candidates of a profile are equipped a priori with distinct integer indices and, in case of a tie, candidates with lower indices are favored. Hence, when describing a voting rule, we may write “the candidate with highest (resp. lowest) score is elected (resp. eliminated)” as shorthand for “among the candidates with the highest (resp. lowest) score, the candidate of lowest (resp. highest) index is elected (resp. eliminated)”.

The classification of the voting rules in the following sections is made for exposition purposes only; many rules could be included with reason in several categories. For each rule, we also define an abbreviated name that we use in the figures. Sections 3.1 to 3.4 present ordinal rules, where sincere voting depends only on the voter’s preference ranking; unless otherwise stated, the set of strategies is the set of rankings over the candidates, and sincere voting consists in giving one’s true preference ranking. Section 3.5 presents cardinal rules (i.e. non-ordinal).

3.1 Score-based voting rules

Each candidate c is assigned a numerical score denoted \({{\,\textrm{score}\,}}(c)\). Elect the candidate with the highest score.

Plurality (Plu):

\({{\,\textrm{score}\,}}(c) = \sum _{v \in \mathscr {V}} \mathbbm {1}_{(r_{vc} = 1)}\).

Veto, or antiplurality (Vet):

\({{\,\textrm{score}\,}}(c) = - \sum _{v \in \mathscr {V}} \mathbbm {1}_{(r_{vc} = C)}\).

Borda rule (Bor):

\({{\,\textrm{score}\,}}(c) = \sum _{v \in \mathscr {V}} (C - r_{vc})\).

Copeland rule (Cop):

\({{\,\textrm{score}\,}}(c) = \sum _{d \ne c} M_{cd}\).

Maximin rule (Max):

\({{\,\textrm{score}\,}}(c) = \min _{d \ne c} W_{cd}\).

Bucklin rule (Buc):

\({{\,\textrm{score}\,}}(c) = (-m_c, x_c)\), where \(m_c = {{\,\textrm{median}\,}}_{v \in \mathscr {V}} r_{vc}\) and \(x_c = {{\,\textrm{card}\,}}\{v \in \mathscr {V} \text { s.t. } r_{vc} \le m_c \}\). Scores are compared using the lexicographic order.
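As an illustration of the first three scores and of the tie-breaking principle, here is a minimal sketch. The function names and the layout `ranks[v][c]` (holding \(r_{vc}\), with candidates indexed from 0) are our own assumptions.

```python
def plurality_scores(ranks):
    n_cand = len(ranks[0])
    return [sum(1 for r in ranks if r[c] == 1) for c in range(n_cand)]


def veto_scores(ranks):
    n_cand = len(ranks[0])
    return [-sum(1 for r in ranks if r[c] == n_cand) for c in range(n_cand)]


def borda_scores(ranks):
    n_cand = len(ranks[0])
    return [sum(n_cand - r[c] for r in ranks) for c in range(n_cand)]


def elect(scores):
    """Highest score wins; among tied candidates, the lowest index wins."""
    return scores.index(max(scores))


# Hypothetical profile with 5 voters: two rank 0 > 1 > 2, one ranks 1 > 0 > 2,
# and two rank 2 > 0 > 1.
ranks = [[1, 2, 3], [1, 2, 3], [2, 1, 3], [2, 3, 1], [2, 3, 1]]
plu = plurality_scores(ranks)   # [2, 1, 2]: candidates 0 and 2 are tied
vet = veto_scores(ranks)        # [0, -2, -3]
bor = borda_scores(ranks)       # [7, 4, 4]
```

In this example, plurality yields a tie between candidates 0 and 2, which is broken in favor of candidate 0 because she has the lower index.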

3.2 Elimination rules

In the six following rules, one or several candidates are eliminated, and the process is iterated until only one candidate remains, who is then declared the winner. When a score is used (plurality score, Borda score, etc.), it is always computed on the profile restricted to the non-eliminated candidates.

Instant-runoff voting (IRV):

Eliminate the candidate with the lowest plurality score.

Baldwin rule (Bal):

Eliminate the candidate with the lowest Borda score.

Nanson rule (Nan):

Eliminate all candidates whose Borda score is below the average.

Coombs rule (Coo):

Eliminate the candidate with the lowest veto score.

Kim-Roush rule (KR):

Eliminate all candidates whose veto score is below the average (Kim & Roush, 1996).

Viennot rule (Vie):

Let (c, d) be the two candidates with the lowest plurality scores. If \(W_{cd} > W_{dc}\), then eliminate d, and vice versa (Durand, 2015).

In the two following rules, the election proceeds in several rounds.

Exhaustive ballot (EB):

At each round, each voter casts a ballot for one candidate. Sincere voting consists in voting for one’s preferred candidate among the non-eliminated ones. The candidate with the lowest score is eliminated. Note that if voters are sincere, the winner is the same as in IRV.

Two-round system (TR):

This is similar to exhaustive ballot, but after the first round, only the two candidates with the highest plurality scores are selected for the second and last round. Note that for \(C = 3\), this rule is equivalent to exhaustive ballot.
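The elimination principle of this section can be sketched for IRV with sincere voters (which, as noted above, also gives the sincere winner of exhaustive ballot). This is a minimal illustration under our own assumptions: the function name and the layout `ranks[v][c]` are hypothetical, and ties for elimination are broken against the highest index, following the convention of Sect. 3.

```python
def irv_winner(ranks):
    """Repeatedly eliminate the candidate with the lowest plurality score,
    computed on the profile restricted to the remaining candidates."""
    remaining = list(range(len(ranks[0])))
    while len(remaining) > 1:
        scores = {c: 0 for c in remaining}
        for r in ranks:
            # Each voter supports her best-ranked remaining candidate.
            top = min(remaining, key=lambda c: r[c])
            scores[top] += 1
        lowest = min(scores.values())
        # Among the candidates with the lowest score, eliminate the highest index.
        loser = max(c for c in remaining if scores[c] == lowest)
        remaining.remove(loser)
    return remaining[0]


# Hypothetical profile with 9 voters: four rank 0 > 1 > 2, three rank
# 1 > 2 > 0 and two rank 2 > 1 > 0.
ranks = [[1, 2, 3]] * 4 + [[3, 1, 2]] * 3 + [[3, 2, 1]] * 2
winner = irv_winner(ranks)
```

In this example, candidate 0 has the highest plurality score in the first round, but candidate 2 is eliminated and her two voters transfer to candidate 1, who then wins by 5 votes to 4.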

3.3 Condorcet-consistent variants of IRV

Together with IRV and exhaustive ballot, we call the five following rules the IRV family.

Condorcet-IRV (CI):

If a Condorcet winner exists, elect her. Otherwise, elect the IRV winner.

Benham rule (Ben):

As long as the profile has no Condorcet winner, eliminate the candidate with the lowest plurality score. Then elect the Condorcet winner of the restricted profile.

Tideman rule (Tid):

Alternately, eliminate all the candidates outside the Smith set (if any), and the candidate with the lowest plurality score. When only one candidate remains, she is declared the winner.

Smith-IRV (SI):

Eliminate the candidates outside the Smith set, then run IRV on the restricted profile.

Woodall rule (Woo):

Among the candidates of the Smith set, elect the one that is eliminated latest in IRV.

Condorcet-IRV is defined by Green-Armytage et al. (2014) and Durand et al. (2016b); the four other rules above are described by Green-Armytage (2011).

3.4 Other Condorcet rules

Among the rules mentioned above, Copeland, Maximin, Baldwin, Nanson, Viennot, Condorcet-IRV, Benham, Tideman, Smith-IRV and Woodall are Condorcet-consistent. In addition, we study the four following ones.

Black rule (Bla):

If a Condorcet winner exists, elect her. Otherwise, elect the Borda winner (Black, 1958).

Ranked Pairs (RP):

Construct a graph whose vertices are the candidates. One by one, add the same edges as in the weighted majority graph by order of decreasing weight, except when the newly added edge would create a cycle. Finally, elect the candidate at the maximal vertex of the graph (Tideman, 1987).

We now denote by \(S_{cd}\) the width of the widest path from c to d in the weighted majority graph.

Schulze rule (Sch):

Elect the candidate w such that \(\forall c \ne w, S_{wc} \ge S_{cw}\) (Schulze, 2011).

Split Cycle (SC):

Elect the candidate w such that \(\forall c \ne w, S_{wc} \ge W_{cw}\) (Holliday & Pacuit, 2020).
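The widths \(S_{cd}\) can be computed with a variant of the Floyd-Warshall algorithm, as in the minimal sketch below. We assume here that the weighted majority graph contains an edge from c to d, of weight \(W_{cd}\), only when \(W_{cd} > W_{dc}\); this convention is an assumption of the sketch, as are the function names. Returning the first candidate that satisfies the Schulze condition is meant to mimic the tie-breaking principle of this section.

```python
def widest_paths(W):
    """S[c][d] = width of the widest path from c to d (0 if there is none)."""
    n = len(W)
    S = [[W[c][d] if c != d and W[c][d] > W[d][c] else 0 for d in range(n)]
         for c in range(n)]
    for k in range(n):              # k: intermediate vertex
        for c in range(n):
            for d in range(n):
                if c != d and c != k and d != k:
                    S[c][d] = max(S[c][d], min(S[c][k], S[k][d]))
    return S


def schulze_winner(W):
    """Lowest-index candidate w with S[w][c] >= S[c][w] for all c != w."""
    S = widest_paths(W)
    n = len(W)
    for w in range(n):
        if all(S[w][c] >= S[c][w] for c in range(n) if c != w):
            return w
    return None


# Hypothetical weighted majority matrix for 9 voters, with a majority cycle:
# 0 beats 1 (6 to 3), 1 beats 2 (7 to 2) and 2 beats 0 (5 to 4).
W = [[0, 6, 4], [3, 0, 7], [5, 2, 0]]
S = widest_paths(W)   # [[0, 6, 6], [5, 0, 7], [5, 5, 0]]
```

In this example, the weakest victory of the cycle (2 over 0) is the one that ends up being overruled, and candidate 0 is elected.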

3.5 Cardinal rules

Approval voting (AV):

Each voter votes for any number of candidates. Elect the candidate with the most votes.

Range voting (RV):

Each voter assigns a numerical grade to each candidate, in a set of authorized grades. Elect the candidate with the highest total grade.

Scoring then automatic runoff (Star):

Ballots are the same as in range voting. Let c and d be the two candidates with the highest total grades. If c is rated higher than d by more voters than the opposite, then elect c, and vice versa.

Majority Judgment (MJ):

Each voter v assigns a mention \(m_{vc}\) to each candidate c, in an ordered set of authorized mentions. Denote \(m_c = {{\,\textrm{median}\,}}_{v \in \mathscr {V}} m_{vc}\), \(p_c = {{\,\textrm{card}\,}}\{v \in \mathscr {V} \text { s.t. } m_{vc} > m_c \}\) and \(q_c = {{\,\textrm{card}\,}}\{v \in \mathscr {V} \text { s.t. } m_{vc} < m_c \}\). If \(p_c > q_c\), then \({{\,\textrm{score}\,}}(c) = (m_c, p_c)\); otherwise, \({{\,\textrm{score}\,}}(c) = (m_c, - q_c)\). Scores are compared using the lexicographic order (Balinski & Laraki, 2010).

We will specify the sincerity function that we consider for these rules (and the set of authorized grades or mentions for range voting, Star and majority judgment) when we present our datasets in Sect. 4.

4 Datasets

In this paper, we study two datasets that we call the FairVote dataset and the Netflix dataset.

The FairVote organization (www.fairvote.org) has collected the ballots of 172 single-winner elections using IRV in the US: member of city council, member of board of supervisors, mayor, sheriff, district attorney, school director, assessor treasurer, etc. Generally, ballots give a truncated preference ranking: voters are allowed to mention their k most-liked candidates (with \(k = 3\) typically). For reasons of computation time, we limit our dataset to the 162 elections with at most 11 candidates. Figure 1 gives an overview of the selected profiles, with elections ranging from 3 to 11 candidates and from 1560 to 299,107 voters.

Fig. 1 Overview of the profiles: FairVote dataset

Fig. 2 Overview of the profiles: Netflix dataset

Our second dataset is extracted from the “training set” of the Netflix prize (www.netflixprize.com). The original dataset consists of 100,480,507 integer grades, from 1 to 5 stars, that 480,189 users gave to 17,770 movies. For a given number of candidates C, we generate several preference profiles by the following greedy algorithm. We select the movie that was graded by most voters; then we select a second movie that maximizes the number of common voters with the first movie; a third movie, that maximizes the number of common voters with the first and second movies, etc. When C movies are selected, we save our first profile, defined by these C movies and their common voters (removing the voters who assign the same grade to all of them). Then we remove these C movies from the database, and we proceed similarly to generate the next profile with C candidates. We continue as long as the generated profile has at least 1000 voters. This whole algorithm is used for all \(C \in \{3, \ldots , 11\}\). Finally, this process generates 2243 profiles with 3 to 11 candidates (movies) and 1000 to 91,880 voters (users), as illustrated in Fig. 2. The interest of this dataset is threefold: it provides a large number of profiles; the preferences are cardinal, and not only ordinal; and voters (users) have an incentive to reveal their true preferences, because it helps Netflix's algorithm advise them about other movies that they may like.
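The greedy extraction described above can be sketched as follows. This is a minimal illustration and not the code that we actually ran: the function name, the in-memory layout `ratings[movie][user] = grade` and the choice of testing the minimal number of voters after removing the voters with constant grades are assumptions of the sketch.

```python
def generate_profiles(ratings, n_cand, min_voters):
    """Greedily build profiles with n_cand movies from ratings[movie][user]."""
    ratings = dict(ratings)          # shallow copy: used movies are removed
    profiles = []
    while len(ratings) >= n_cand:
        selected, common = [], None
        for _ in range(n_cand):
            def n_common(movie):
                users = set(ratings[movie])
                return len(users) if common is None else len(users & common)
            # Pick the movie sharing the most voters with those already selected.
            best = max((m for m in ratings if m not in selected), key=n_common)
            selected.append(best)
            users_best = set(ratings[best])
            common = users_best if common is None else common & users_best
        # Drop the voters who give the same grade to all the selected movies.
        voters = [u for u in sorted(common)
                  if len({ratings[m][u] for m in selected}) > 1]
        if len(voters) < min_voters:
            break
        profiles.append({u: [ratings[m][u] for m in selected] for u in voters})
        for m in selected:
            del ratings[m]
    return profiles


# Hypothetical miniature database: 4 movies, users identified by integers.
ratings = {
    "A": {1: 5, 2: 4, 3: 3, 4: 2, 5: 1},
    "B": {1: 4, 2: 4, 3: 5, 4: 1},
    "C": {1: 3, 2: 2, 5: 5},
    "D": {1: 1, 2: 5},
}
profiles = generate_profiles(ratings, n_cand=2, min_voters=2)
```

With this toy database, the first profile is built on movies A and B (user 2, who gives the same grade to both, is removed), and the second one on movies C and D.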

For the FairVote dataset, we convert the truncated rankings into cardinal preferences by considering an adapted Borda score (where c has 1 point for each d such that v ranks c higher than d, and 0.5 points for each d such that v’s ballot treats c and d equally). Note that this choice has only an impact on the cardinal voting rules.

For each profile of both datasets, for all cardinal ratings, we add i.i.d. uniform noise whose amplitude is negligible compared to the differences between cardinal ratings. As a consequence, if v declares preferring c to d in her original ballot, then it is the case in her noised ballot; but if v puts several candidates as tied, then they are in a uniformly random order after adding the noise. The objective is twofold: to conduct investigations in a space that is richer than the original profile alone; and to simplify the analysis by considering only strict preferences. For each profile, we actually draw several noised profiles: 62 for the FairVote dataset, and 5 for the Netflix dataset, so that the margin of uncertainty due to the random realization is of order \(1 / \sqrt{162 \cdot 62} < 1\%\) and \(1 / \sqrt{2243 \cdot 5} < 1\%\) respectively. By convention, this statistical uncertainty will not be represented in the figures.
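The noising step and the two orders of magnitude quoted above can be illustrated by the following minimal sketch. The function name, the amplitude `1e-6` and the seed are arbitrary choices of ours, made only for the illustration.

```python
import math
import random


def add_noise(utilities, amplitude=1e-6, seed=0):
    """Add i.i.d. uniform noise in [0, amplitude] to every cardinal rating."""
    rng = random.Random(seed)
    return [[u + rng.uniform(0, amplitude) for u in row] for row in utilities]


# Hypothetical ratings: the first voter strictly prefers candidate 0 and is
# indifferent between candidates 1 and 2; the second voter is fully indifferent.
noised = add_noise([[5, 3, 3], [1, 1, 1]])

# Order of magnitude of the statistical uncertainty in each dataset.
margin_fairvote = 1 / math.sqrt(162 * 62)    # about 0.0100
margin_netflix = 1 / math.sqrt(2243 * 5)     # about 0.0094
```

Since the noise is much smaller than the gap between two distinct grades, strict preferences are preserved, while ties are broken at random.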

For approval voting, we consider that a sincere voter will vote for all candidates who have a cardinal utility at least equal to the average possible value, i.e. \(\frac{C-1}{2}\) in the FairVote dataset and 3 stars in the Netflix dataset. For range voting, Star and majority judgment:

  • In the FairVote dataset, the authorized grades (or mentions) are the continuous interval [0, 1]; we consider that sincere voters will apply an affine transformation to their cardinal preferences so that their most (resp. least) liked candidate has a grade of 1 (resp. 0).

  • In the Netflix dataset, the authorized grades (or mentions) are the integer interval \(\{1, \ldots , 5\}\).
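These sincerity functions can be sketched as follows; the function names are our own and serve only as an illustration. The approval threshold is passed explicitly: \(\frac{C-1}{2}\) in the FairVote dataset and 3 in the Netflix dataset.

```python
def sincere_approval(utilities_v, threshold):
    """Approve every candidate whose utility is at least the threshold."""
    return [u >= threshold for u in utilities_v]


def normalize_grades(utilities_v):
    """Affine map sending the least liked candidate to 0 and the most liked
    one to 1 (utilities are assumed distinct, so the range is not zero)."""
    lowest, highest = min(utilities_v), max(utilities_v)
    return [(u - lowest) / (highest - lowest) for u in utilities_v]


# Hypothetical FairVote-style voter with C = 3 (noise omitted for readability).
approval_fairvote = sincere_approval([2.0, 0.5, 0.5], threshold=(3 - 1) / 2)
grades_fairvote = normalize_grades([2.0, 0.5, 1.25])
# Hypothetical Netflix-style voter: approve the movies with at least 3 stars.
approval_netflix = sincere_approval([5, 3, 2], threshold=3)
```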

5 Algorithms

In order to study the manipulation by coalition, we use the Python package SVVAMP 0.8.3: Simulator of Various Voting Algorithms in Manipulating Populations (Durand et al., 2016a).

A core feature of SVVAMP is to study the unweighted coalitional optimization problem (UCO): compute X(c), the minimal number for which there exist strategies \(\sigma _1, \ldots , \sigma _{X(c)}\) such that \(f\Big ( \big (s(u_v)\big )_{v \in \mathscr {V}_\text {NM}(c)}, \sigma _1, \ldots , \sigma _{X(c)} \Big )= c\). Roughly speaking, it is the minimal number of manipulators needed to make c win. Candidate c is a CM winner if and only if \(V_\text {M}(c) \ge X(c)\). Unfortunately, computing X(c) can be very expensive: for example, it is NP-hard for IRV (Bartholdi & Orlin, 1991), maximin and ranked pairs (Xia et al., 2009), and Borda, Baldwin and Nanson (Davies et al., 2014). For this reason, SVVAMP computes bounds \(\underline{X}(c)\) and \(\overline{X}(c)\) such that \(\underline{X}(c) \le X(c) \le \overline{X}(c)\). Table 1 indicates the type of algorithm used for each voting rule and their time complexity: “exact” means that \(\underline{X}(c) = X(c) = \overline{X}(c)\); “approximate” means that there is a theoretically proven guarantee on the ratio or the difference between \(\underline{X}(c)\) and \(\overline{X}(c)\); “heuristic” means that there is no such approximation guarantee. Table 1 also indicates what type of algorithm is used to compute UM.
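The way such bounds translate into a conclusion for a given candidate can be sketched as follows. This is only an illustration of the reasoning: the function below is hypothetical and does not reproduce the SVVAMP interface.

```python
def cm_status(v_m, x_lower, x_upper):
    """Conclude from V_M(c) and from bounds x_lower <= X(c) <= x_upper."""
    if v_m >= x_upper:
        # V_M(c) >= upper bound >= X(c): enough manipulators for sure.
        return "CM winner"
    if v_m < x_lower:
        # V_M(c) < lower bound <= X(c): too few manipulators for sure.
        return "not a CM winner"
    # The bounds are too loose to decide: algorithmic uncertainty.
    return "undecided"


# Hypothetical values: 40 voters prefer c to w.
statuses = [cm_status(40, 30, 35), cm_status(40, 45, 60), cm_status(40, 38, 52)]
```

With an exact algorithm, the two bounds coincide and the third case never occurs.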

Table 1 Algorithms used and their time complexity

Since even UM cannot always be computed exactly in polynomial time, we also use the notion of trivial manipulation (Durand, 2015). Let \(t(u_v, c, w)\) be the trivial strategy of voter v in favor of c against w, defined as follows:

  • If the voting rule is ordinal, v acts as if c was her most liked candidate, w her most disliked candidate, with other candidates in the same relative order as in her true preferences;

  • If the voting rule is cardinal, v gives the best grade or mention to c, and the worst one to all other candidates.

We say that the voting rule is trivially manipulable (TM) in a given profile if there exists \(c \ne w\) such that \(f\Big ( \big (s(u_v)\big )_{v \in \mathscr {V}_\text {NM}(c)}, \big (t(u_v, c, w)\big )_{v \in \mathscr {V}_\text {M}(c)} \Big )= c\). Firstly, this can always be computed in polynomial time (provided the winner can be computed in polynomial time, which is the case for all voting rules in this paper). Secondly, it is a relatively simple and natural manipulation heuristic, requiring little information about the whole profile; it is arguably more realistic for human manipulators than a sophisticated manipulation, like the one resulting from a non-polynomial algorithm.
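For an ordinal rule, the trivial strategy can be sketched as follows. The function name and the representation of a ballot as a list of candidates from best to worst are assumptions of this illustration.

```python
def trivial_ballot(utilities_v, c, w):
    """Ranking (best to worst) with c first, w last, and the other candidates
    in the same relative order as in the voter's true preferences."""
    others = [d for d in range(len(utilities_v)) if d not in (c, w)]
    others.sort(key=lambda d: utilities_v[d], reverse=True)
    return [c] + others + [w]


# Hypothetical voter whose sincere ranking is 1 > 2 > 3 > 0; she prefers
# candidate 3 to the sincere winner 1 and manipulates in favor of 3.
ballot = trivial_ballot([1.0, 4.0, 3.0, 2.0], c=3, w=1)   # [3, 2, 0, 1]
```

Testing TM then amounts to replacing the sincere ballots of the voters of \(\mathscr {V}_\text {M}(c)\) by such ballots and computing the winner once for each candidate \(c \ne w\).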

6 Results

6.1 Qualitative features of the profiles

Fig. 3 Qualitative features of the profiles: FairVote dataset

Fig. 4 Qualitative features of the profiles: Netflix dataset

Figures 3 and 4 represent the qualitative features of the profiles. In both datasets, more than 99% of the profiles have a Condorcet winner (CW), which qualifies previous theoretical work (such as Gehrlein (2006)), but confirms previous similar empirical findings (Tideman, 2006). Even having a Condorcet order (CO) happens very often: 99% in the FairVote dataset and 97% in the Netflix dataset. 41% of the profiles in the FairVote dataset, and 7% in the Netflix dataset, have a resistant Condorcet winner (RCW): no Condorcet-consistent rule can be CM in these profiles. Finally, 37% of the profiles in the FairVote dataset, and 5% in the Netflix dataset, have a majority favorite (MF): some rules such as plurality or IRV cannot be CM in these profiles. Since all these rates are higher in the FairVote dataset, we can already expect more possibilities of coalitional manipulation in the Netflix dataset.

6.2 CM rate

Fig. 5 CM rate: FairVote dataset

Fig. 6 CM rate: Netflix dataset

The CM rate of a voting rule is the proportion of profiles where the rule is CM (in a given dataset or probabilistic model). Figures 5 and 6 show the CM rates of the voting rules under study. In these figures and all the following bar plots, the solid bar gives a lower bound, and the upper end of the thin black line provides an upper bound. For example, for Benham rule (Ben) in Fig. 5: SVVAMP proves that Benham is CM in 3% of the profiles (solid blue bar), is unable to conclude in less than 1% of the profiles (thin black line, representing the algorithmic uncertainty), and proves that Benham is not CM in the remaining 96% of the profiles. These figures also indicate the RCW bound: no Condorcet-consistent rule can have a higher CM rate because of the profiles having an RCW. As we already suspected, all the values of CM rate are higher in the Netflix dataset than in the FairVote dataset. However, several qualitative conclusions are common.

Our main conclusion is that the seven rules of the IRV family have a lower CM rate than all the other ones. Their CM rates are very similar: the difference is lower than 3% in both datasets. In the Netflix dataset, it is not excluded that Tideman, Benham and Smith-IRV have a CM rate significantly lower than the other rules of the family (with a difference of at most 2%); this would deserve further investigation. Apart from the IRV family, the two-round system has the lowest CM rate. This can be partly explained by the fact that in both datasets, approximately one third of the profiles have 3 candidates, a case where the two-round system is equivalent to exhaustive ballot. We will discuss the performance of the two-round system depending on the number of candidates in Sect. 6.6.

As for the Condorcet-consistent rules that are not part of the IRV family, we can take maximin and Schulze as references, because their results are almost exact in practice (the algorithmic uncertainty is less than 0.1%), and the lower bound for their CM rate is identical, whatever the dataset. Compared to them:

  • Baldwin and Copeland show promising results (better lower bound) that would deserve further investigation;

  • Viennot, ranked pairs and split cycle have essentially the same lower bound, but more precise algorithms would be necessary to determine if they are as good or worse than maximin and Schulze;

  • Nanson and Black exhibit worse CM rates; in both datasets, Black has a CM rate that is close to the RCW bound, i.e. the worst possible CM rate for a Condorcet-consistent rule.

Plurality has a higher CM rate than maximin and Schulze, and other rules have an even higher CM rate, for example majority judgment, Bucklin, Kim-Roush, veto and Borda. Four rules have a higher CM rate than the RCW bound in both datasets: Star, range voting, approval voting and Coombs.

6.3 UM rate

Fig. 7 UM rate: FairVote dataset

Fig. 8 UM rate: Netflix dataset

Figures 7 and 8 represent the UM rate, defined similarly to the CM rate. Most conclusions are similar to Sect. 6.2, with the following clarifications or differences.

  • The Condorcet-consistent rules of the IRV family show the same results, with a UM rate of at most 1% in both datasets. This UM rate is strictly lower than for IRV or exhaustive ballot, but the difference is less than 1%.

  • Baldwin and Copeland confirm their promising results, compared to maximin and Schulze.

  • Veto has much better results for the UM rate than for the CM rate (for example, its UM rate is lower than maximin and Schulze). This is not surprising because a typical manipulation for c in veto consists in dividing the manipulators’ ballots between all the other candidates.

6.4 TM rate

Fig. 9 TM rate: FairVote dataset

Fig. 10 TM rate: Netflix dataset

Figures 9 and 10 present the TM rates. The conclusions are similar to Sect. 6.3, with the following clarifications.

  • Star performs significantly better in terms of TM rate than in terms of CM rate or UM rate, but with a dramatic difference between the FairVote dataset (0%) and the Netflix dataset (80%).

  • Viennot, ranked pairs and split cycle have essentially the same TM rate as maximin and Schulze; the TM rate of Viennot is slightly lower, but the difference is less than 1% in both datasets.

6.5 CM complexity index

Fig. 11 CM complexity index: FairVote dataset

The computational complexity of computing the strategic ballots has often been mentioned as a way to deter manipulation: for example, it is NP-hard for Borda (Davies et al., 2014). However, in practice, Borda has exactly the same CM rate and TM rate in both datasets, suggesting that strategic ballots are actually easy to compute. To formalize this idea, we introduce the CM complexity index as the share of the profiles where the rule is CM but neither UM nor TM, divided by the share of the profiles where the rule is CM. Since our margins of uncertainty are too high in the Netflix dataset to have interpretable results, we present only the results for the FairVote dataset in Fig. 11.
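As an illustration, the index can be computed from per-profile indicators as in the minimal sketch below. We read the numerator as the profiles that are CM without being UM or TM, which is our interpretation of the definition; the function name and the dictionary layout are hypothetical, and the bounds on CM and UM are assumed to be exact.

```python
def cm_complexity_index(profiles):
    """profiles: list of dicts with boolean entries 'cm', 'um' and 'tm'."""
    n_cm = sum(1 for p in profiles if p["cm"])
    n_hard = sum(1 for p in profiles
                 if p["cm"] and not p["um"] and not p["tm"])
    # The index is undefined when the rule is never CM in the dataset.
    return n_hard / n_cm if n_cm > 0 else None


# Hypothetical indicators for 4 profiles: 3 of them are CM, and only one of
# these is neither UM nor TM.
flags = [
    {"cm": True, "um": True, "tm": False},
    {"cm": True, "um": False, "tm": False},
    {"cm": True, "um": False, "tm": True},
    {"cm": False, "um": False, "tm": False},
]
index = cm_complexity_index(flags)   # 1 / 3
```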

All the rules of the IRV family, the two-round system and veto have a CM complexity index that is higher than 75%. Baldwin, Copeland, Viennot, ranked pairs, split cycle, Nanson, Kim-Roush, Black and Schulze have a CM complexity index that is lower. Approval voting, range voting, Coombs, majority judgment and plurality have a CM complexity index that can easily be proven equal to 0% in general, because they are CM if and only if they are UM. Star, Borda, Bucklin and maximin have no such theoretical property, but in practice, their CM complexity index is also equal to 0% in this dataset.

6.6 Number of CM winners

Fig. 12 Average ratio of CM winners: FairVote dataset

Fig. 13 Average ratio of CM winners: Netflix dataset

We now investigate the indeterminacy of the outcome that is due to strategic voting. Figures 12 and 13 represent the ratio of CM winners, defined as the number of CM winners divided by \(C - 1\). This confirms the good results of the IRV family, with 0–2% in both datasets, better than all the other rules.

Fig. 14 Average number of CM winners: Netflix dataset

In Fig. 14, we represent the average number of CM winners as a function of the number of candidates C, only for the voting rules whose algorithmic uncertainty is less than 1%. We omit the FairVote dataset, where we do not have enough different elections for each possible number of candidates. Globally, the average number of CM winners can roughly be described as an affine function of C. In increasing order, we have: Condorcet-IRV, IRV and exhaustive ballot, with barely 0.3 CM winners for \(C=11\); Schulze, veto and Bucklin (\(\approx 5\) CM winners for \(C=11\)); majority judgment and Borda (\(\approx 8\) CM winners for \(C=11\)); approval voting, range voting, Coombs, Star and plurality, which can lead to the election of almost any candidate on average. The case of the two-round system deserves a particular mention: almost 0 CM winners when \(C=3\) (when the rule is equivalent to exhaustive ballot), but performance degrades when C increases and reaches approximately 8 CM winners for \(C=11\).

6.7 Condorcet violation rate

Fig. 15 Condorcet violation rate: FairVote dataset

Fig. 16 Condorcet violation rate: Netflix dataset

The sincere Condorcet consistency rate is the share of the profiles where the sincere winner is the Condorcet winner, divided by the share of the profiles where a Condorcet winner exists. The Condorcet consistency rate with CM is the share of the profiles where the sincere winner is the Condorcet winner and the rule is not CM, divided by the share of the profiles where a Condorcet winner exists. The Condorcet violation rate, sincere or with CM, is 1 minus the corresponding Condorcet consistency rate. It is represented in Figs. 15 and 16.

By definition, all Condorcet-consistent rules have a sincere Condorcet violation rate equal to 0%. When coalitional manipulation is taken into account, the seven rules of the IRV family outperform all the others, and have relatively similar results between them: in the Netflix dataset, where the differences are larger, their Condorcet violation rates with CM span from 1% (lower bound for Tideman) to less than 4% (for Condorcet-IRV, IRV, exhaustive ballot and Woodall).

6.8 Loss of social welfare

Fig. 17 Loss of normalized social welfare: FairVote dataset

Fig. 18 Loss of normalized social welfare: Netflix dataset

The social welfare of candidate c is defined as \({{\,\textrm{SW}\,}}(c) = \sum _{v \in \mathscr {V}} u_{vc}\). The loss of normalized social welfare of candidate c is \(\frac{ \max _{d \in \mathscr {C}} {{\,\textrm{SW}\,}}(d) - {{\,\textrm{SW}\,}}(c) }{ \max _{d \in \mathscr {C}} {{\,\textrm{SW}\,}}(d) - \min _{d \in \mathscr {C}} {{\,\textrm{SW}\,}}(d) }\). Applied to \(c = w\), it yields the sincere loss of normalized social welfare. Applied to the CM winner c who minimizes \({{\,\textrm{SW}\,}}(c)\), it yields the loss of normalized social welfare with CM. Both values are represented in Figs. 17 and 18.
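The definition above translates directly into code, as in the following minimal sketch (the function name and the `utilities[v][c]` layout are our own assumptions; the candidates are assumed not to all have the same social welfare, so that the denominator is not zero).

```python
def normalized_welfare_loss(utilities, c):
    """(max SW - SW(c)) / (max SW - min SW), with SW(d) = sum over v of u_vd."""
    n_cand = len(utilities[0])
    sw = [sum(u[d] for u in utilities) for d in range(n_cand)]
    return (max(sw) - sw[c]) / (max(sw) - min(sw))


# Hypothetical profile with 3 voters: the social welfares are 12, 9 and 6.
utilities = [[5, 3, 1], [4, 2, 3], [3, 4, 2]]
losses = [normalized_welfare_loss(utilities, c) for c in range(3)]
```

The loss is 0 for the candidate who maximizes the social welfare, 1 for the one who minimizes it, and here 0.5 for candidate 1.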

For the sincere value, by definition, range voting is optimal. When coalitional manipulation is taken into account, once again, the seven rules of the IRV family outperform all the others, with a loss of 2% or lower.

6.9 CM power index

Strategic voting can generate an inequality of power between the naive, sincere voters and the sophisticated, strategic ones. To quantify this idea, we introduce the CM power index of a voting rule. In a given profile, it is defined as \(\max _{c \ne w} \frac{ V_\text {NM}(c) }{ X(c) }\). In a dataset or probabilistic model, it is the average value of the above quantity over the profiles. Roughly speaking, if a voting rule has a CM power index of x, then a strategic voter has x times as much power as a non-strategic voter.
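As an illustration, the index can be computed as in the minimal sketch below, assuming that X(c) is known exactly for every candidate \(c \ne w\) (in practice only bounds may be available). The function names and the dictionary layout are hypothetical.

```python
def cm_power_index_of_profile(v_nm, x):
    """max over c != w of V_NM(c) / X(c); v_nm and x are dicts indexed by c."""
    return max(v_nm[c] / x[c] for c in v_nm)


def cm_power_index(profiles):
    """Average of the per-profile value over a list of (v_nm, x) pairs."""
    values = [cm_power_index_of_profile(v_nm, x) for v_nm, x in profiles]
    return sum(values) / len(values)


# Hypothetical profile with sincere winner w = 0: 50 manipulators suffice to
# make candidate 1 win against 60 sincere voters who prefer w.
first = cm_power_index_of_profile({1: 60, 2: 70}, {1: 50, 2: 100})   # about 1.2
# Averaged with a hypothetical second profile whose value is 1.0.
average = cm_power_index([({1: 60, 2: 70}, {1: 50, 2: 100}),
                          ({1: 50}, {1: 50})])                        # about 1.1
```

In the first profile, 50 strategic voters outweigh 60 sincere ones, hence a ratio of 1.2.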

Fig. 19 CM power index: FairVote dataset

Fig. 20 CM power index: Netflix dataset

Figures 19 and 20 show, in both datasets, that the rules of the IRV family have a CM power index between 1.00 and 1.15, thus being close to the “one person, one vote” principle. All other voting rules have a higher CM power index, with values that can be as high as 5.93 (reached for Star in the Netflix dataset).

7 Conclusion

We have studied coalitional manipulability by computer simulations on the basis of two empirical datasets. For all the indicators, the seven rules of the IRV family (exhaustive ballot, IRV, Condorcet-IRV, Benham, Smith-IRV, Tideman and Woodall) outperform all the other rules of our study. Although the differences between these seven rules seem inconsequential, further studies with more precise algorithms would be interesting to evaluate their respective performances.