Numerical indicators to monitor and assess government performance have become an increasingly salient feature of international relations (Kelley, 2017; Kelley & Simmons, 2017). Quantifying and ranking government performance represents an important and emerging avenue of exercising power in the international system. The indicators represent a “technology of global governance” (Davis et al., 2012) and “tools of organizational power” (Cooley, 2015, p. 2) that allow diverse actors – nation states, international organizations, and nongovernmental organizations – to influence policymaking across the globe. Indicators can alter state behavior by triggering status competition among states, creating a focal point for domestic political activism, or encouraging international pressure from third parties, such as intergovernmental or nongovernmental organizations (Kelley & Simmons, 2020).

However, the impact of these indicators remains contested. Accumulating evidence suggests that indicators play an important role in influencing state behavior in areas like business regulations (Doshi et al., 2019), terrorist financing (Morse, 2019), and foreign aid (Honig & Weaver, 2019). Survey evidence suggests that rankings meaningfully affect citizen perceptions and may lead to demands for policy centralization to address low rankings (Davies et al., 2021). On the other hand, critics have noted that indicators may have limited impact due to the absence of formal enforcement mechanisms (Lee & Matanock, 2020). In addition, they may cause perverse consequences, such as the shifting of resources away from other compelling needs (Bisbee et al., 2019) or the establishment of arbitrary standards that entrench social hierarchies among states (Broome et al., 2017). Furthermore, the precise mechanisms through which indicators affect state behavior remain relatively underexplored.

In this article, we examine the role of numerical indicators in the field of education. Although international education is an often-neglected topic among international relations scholars, the substantive importance of education is unmistakable: on average, governments devote about 14% of their spending to education, about double the amount they invest in military expenditures (Footnote 1). A strong education system has also been widely recognized as a crucial precondition for economic development (Hanushek & Kimko, 2000; Hanushek & Woessmann, 2012; Rodrik, 1995). Education aid is a longstanding priority of bilateral and multilateral aid donors, with flows reaching about $15.6 billion in 2018 (UNESCO, 2020). Furthermore, education policy has been a major source of contestation since the end of World War II, as domestic and international proponents of common norms, policy alignment, and accountability have clashed repeatedly with defenders of national prerogatives to control access to education and its content (Kijima & Lipscy, 2022).

Cross-national assessments in education (CNAs), such as those promoted by the Organisation for Economic Co-operation and Development (OECD) and International Association for the Evaluation of Educational Achievement (IEA), are among the most prominent and visible numerical rankings of national performance. Assessment results receive widespread media coverage and make headlines all over the world following their release (Footnote 2). They are now widely followed as indicators for national educational quality and, more broadly, human capital and international competitiveness (Footnote 3).

At their best, CNAs can improve education outcomes by benchmarking education progress over time and against peer countries (Ramirez et al., 2018). Transparent testing can promote accountability and trigger calls for reform by citizens. However, CNAs are deeply contentious. Critics have accused the tests of imposing Western norms of education on developing countries, a form of “educational colonialism” (Meyer, 2014). Some scholars have criticized CNAs for generating perverse consequences through overreliance on quantitative metrics and petitioned multilateral agencies to halt their administration (Footnote 4).

Although CNAs have both vocal advocates and critics, the politics surrounding CNAs remain understudied. Existing research has predominantly focused on the role of global norms and culture institutionalized by Western countries and multilateral agencies (Ramirez et al., 2016; Smith, 2016), which have compelled more countries to participate in CNAs (Kamens & McNeely, 2010). Although this is a useful framework for explaining the general proliferation of CNAs, it is less useful for explaining their impact. Existing studies on the consequences of CNAs have largely focused on case studies (Abdul-Hamid et al., 2011; Addey, 2015; Grek, 2009; Takayama, 2008). To the best of our knowledge, this is the first systematic study of how CNA participation shapes the politics of education and impacts education outcomes.

We test three mechanisms through which CNA participation can influence education policy. First, CNA participation can alter the context of education policymaking at the elite level. Administering a CNA requires extensive preparation, which includes elite interaction with international assessment agencies, education experts, and foreign counterparts. This provides significant opportunities for technical transfers, learning, and socialization. CNA participation can also strengthen the hand of reformist elites by increasing the salience of education performance as a form of cross-national status competition. Second, education reforms could occur in response to test results, which trigger calls for change from domestic groups like civil society, parents, or teachers. Finally, CNA participation may lead to greater education aid inflows from intergovernmental organizations, which welcome accountability and numerical metrics.

Empirically, we adopt a mixed-methods approach. Our primary analysis draws on an original panel dataset covering all CNAs and all countries in the international system since 1959. We supplement this data using an elite survey of education officials responsible for the planning and administration of CNAs in 46 countries, along with personal interviews of 48 policymakers in both target states and assessment agencies. Difference-in-differences estimation based on the panel data provides strong evidence that CNA participation is associated with increases in education attainment and education aid inflows. We use several empirical strategies to address potential endogeneity, such as the possibility that preexisting education reforms are responsible for both CNA participation and changes to education outcomes.

An important contribution of our study is disentangling the mechanisms through which global performance indicators matter. Existing studies of global performance indicators generally examine their impact without explicitly testing specific mechanisms (Kelley & Simmons, 2015) or focus on testing one particular mechanism, such as international pressure (Lee & Matanock, 2020; Morse, 2019; Roberts & Tellez, 2017). Our empirical approach allows us to explicitly test three causal mechanisms against each other. Evidence from the panel analysis and survey suggests that elite mechanisms are primarily responsible for our findings, though the other two mechanisms may matter in specific circumstances.

1 The growing role of cross-national assessments in education

During the last sixty years, there has been a considerable expansion in the number of countries administering CNAs. Traditionally, governments treated information about their education systems and student performance as a domestic policy matter, limiting information sharing across borders (Anderson, 2006). CNAs, in contrast, disseminate information about education systems internationally, allowing for cross-national comparisons. Close to eighty countries now routinely participate in CNAs of some sort.

Countries have participated in CNAs for a variety of reasons. Participation can be motivated by technocratic priorities, such as a desire to better understand learning gaps and improve education outcomes. For example, the 1959 Pilot Twelve-Country Study – the first CNA – was motivated by the Soviet launch of Sputnik and Western concerns that better empirical measures were needed to assess and improve educational performance to remain competitive (Husén, 1979). CNA participation is also sometimes mandated or encouraged by international organizations for their members, or imposed as a condition for receiving aid (Benveniste, 2002; Kamens & McNeely, 2010). Participation may also be driven by normative factors such as a desire to emulate wealthy countries or regional peers (Kijima, 2010). For policymakers who anticipate a high ranking, CNA participation may be attractive as a marker of international prestige or as a mechanism to advertise a high-quality labor force to potential investors. Conversely, countries that anticipate or experience low performance may opt out of participation. Although the former is difficult to observe, countries like Botswana, South Africa, and the Kyrgyz Republic have dropped out of CNAs after experiencing low performance (Lockheed & Wagemaker, 2013). This introduces a potential selection problem that we will discuss at greater length in the empirical section.

CNAs include global assessments and regional assessments. Global assessments, such as the Trends in International Mathematics and Science Study (TIMSS) and Programme for International Student Assessment (PISA), are universalistic in principle. These assessments are open to all countries and place few restrictions on participation, though universality has not been achieved in practice (Footnote 5). Regional assessments target countries in a specific region. Examples include the Southern and Eastern Africa Consortium for Monitoring Educational Quality (SACMEQ) and the Regional Comparative and Explanatory Study (ERCE), which focuses on Latin American countries.

Since the first CNA was administered in 1959, numerous countries and economies have participated in global international assessments like PISA, TIMSS, and the Progress in International Reading Literacy Study (PIRLS) conducted by the International Association for the Evaluation of Educational Achievement (IEA). Figure 1 traces participation in CNAs since 1959. CNAs primarily attracted a handful of economically advanced nations from 1959 through the 1980s. Participation has expanded considerably since the 1990s, particularly among less developed countries. This can be attributed to the implementation of large-scale assessments like TIMSS (1995), SACMEQ (1995), and PISA (2000), along with greater confidence in their value among policymakers and other stakeholders. As Addey and Sellar note, during this period, “International assessments and data that had previously been questioned were accepted as valid and even understood as essential for policy making, benchmarking progress, setting standards, and identifying ‘what works’” (Addey & Sellar, 2019, p. 5). This also contributed to greater demands for participation from international organizations and domestic stakeholders concerned about education policymaking.

Fig. 1 The Number of Participants in Cross-National Assessments in Education (1959–2012). Note: Participation in cross-national assessments in education has grown over time. A country is recorded as participating if it took part in any CNA during a given year. Most CNAs are not administered annually, which creates gaps during some years. OECD status is as of 2012.

2 Theory: The impact of cross-national assessments in education

While the reach and visibility of CNAs have expanded dramatically, very little scholarship has systematically examined how they affect the political context of education policy. Theories of global performance assessments generally see impacts on state behavior occurring through three pathways of influence at the elite, domestic political, and transnational levels (Kelley, 2017; Kelley & Simmons, 2015). Adapting this framework, we propose three hypotheses about how CNAs affect education outcomes.

Although the benefits of high-quality education are numerous and seemingly self-evident, there are also many potential barriers to reforms. Assessment agencies and international organizations active in the area of education have often promoted objectives such as expanding enrollment and education access under initiatives like “Education for All,” providing equal opportunities for marginalized groups, and emphasizing quantifiable skills such as reading, math, and science (Martens & Niemann, 2013). Although these priorities may appear universal, they are often the subject of considerable domestic contestation and distributive conflict. The expansion of education access to marginalized groups such as ethnic minorities and girls is contentious in specific country contexts (Murphy-Graham & Lloyd, 2015; Nuamah, 2019). Education reforms are often resisted by teacher unions and administrators, who bear the burden of adapting to new policies, educating and administering more students, and updating pedagogical practices (Bruns et al., 2019). Education is not only a means of human capital development but also a mechanism to promote obedience and social order (Lü, 2014; Paglayan, 2022). Global priorities like quantifiable learning outcomes and accountability can threaten the status quo by exposing the shortcomings of curriculums based on nationalistic or ideological prerogatives. Without a clear impetus for reform, these sources of resistance can stymie change.

CNA participation can affect education outcomes by altering the behavior of policymaking elites. Existing research shows that global performance indicators exert significant influence at the elite level by shaping incentives and norms (Kelley & Simmons, 2020). For example, the Trafficking in Persons Report influences state behavior by raising concerns about reputation and status maintenance among political elites (Kelley, 2017). Analogously, quantitative and qualitative evidence suggests that the Aid Transparency Index enhances transparency by invoking normative and reputational concerns among political principals and bureaucrats in aid agencies (Honig & Weaver, 2019). Furthermore, survey evidence suggests that performance assessments that involve direct interaction between assessment agencies and government policymakers tend to yield greater influence (Masaki & Parks, 2020). This suggests elite-level mechanisms play an important role in linking performance indicators to policy outcomes.

CNA participation can alter the incentives, beliefs, and capacity of education policy elites. For the purposes of this article, we define education elites as government officials and experts who oversee education policymaking in a specific country. Such elites are primarily bureaucrats in national education ministries, but they also encompass politicians with an interest in education policy and experts such as former officials, academics, and consultants retained by governments to participate in the policymaking process. The definition excludes practitioners such as school administrators and teachers as well as members of civil society not directly involved in policymaking.

CNA participation can alter the preferences of education policy elites through socialization and the generation of status competition. Assessment agencies draw on their legitimacy as authoritative, “scientific” arbiters of education quality to shape the priorities of education officials. The OECD, which administers PISA, draws more broadly on the prestige associated with its status as a club of the most advanced industrialized economies in the world (Davis, 2016; Meyer & Benavot, 2013). Assessment agencies, along with international aid agencies, which are often involved in the administration of CNAs, construct “shared ideologies of an ‘imagined’ world order through a process of negotiation, diffusion, and sometimes contestation” (Mundy, 1999, p. 28). Assessment agencies and related international organizations leverage their authoritative status and access to education policy elites to promote priorities consistent with global values, such as greater and more equitable access to education and emphasis on economically important skills.

Participation in a CNA also generates status competition by placing a country within a global hierarchy based on quantified education performance. Education is a core function of the modern state that affects essentially all citizens, and it is intuitively associated with status competition as the first context in which children experience formal evaluation and relative comparisons. For some countries, education performance is intertwined with a sense of national pride, while for others, it is seen as a proxy for international economic or geopolitical competition (Martens & Niemann, 2013). Assessment agencies reinforce status competition by sponsoring conferences that regularize in-person, cross-national interaction among the education policy elites of participating countries. If these causal mechanisms are important, we should be able to observe a shift in elite preferences over education policymaking after CNA participation is initiated.

CNA participation also improves the capacity of education officials to implement effective reforms by increasing access to technical expertise, training, and feedback about shortcomings in their education systems. Unlike global performance indicators that rely primarily on existing data sources (Honig & Weaver, 2019), CNAs generate new data with the active participation of students, teachers, school administrators, and government officials. Thus, CNAs are conducted by countries in close cooperation with assessment agencies and other international organizations, which are actively involved in the development of test items, planning, sampling, field trials, and analysis (Henry et al., 2001; Lockheed, 2012). Education officials from participating countries are often invited to conferences where they receive detailed information about reforms and practices in other countries (Footnote 6). Hence, CNA participation can build the capacity of education policy elites through direct, intensive interaction with assessment agencies, international organizations, foreign counterparts, and technical experts (Lockheed, 2012).

Furthermore, CNA participation can strengthen the hand of reformers within the domestic education establishment to overcome resistance to change. Education reforms are often resisted by traditionalists who invoke factors like nationalism and culture to defend the status quo. For example, education access for girls is often curtailed in developing countries on cultural or religious grounds (Sarvarzade & Wotipka, 2017; Stromquist, 1990). Similarly, ethnic minority groups often face significant access barriers and receive lower quality education (Letchamanan, 2013). Coursework in foundational skills like reading, math, and science can be crowded out by the imperatives of ideological or nationalistic education. The international status competition associated with CNAs gives education reformers a potent counterargument against defenders of the status quo by tying national prestige and reputation to improvements in the education system. Hence, we propose:

  • H1 Elite Politics: CNA participation will lead to improvements in education outcomes through the action of elites.

Second, CNA participation may facilitate education reforms by providing a focal point for non-elite domestic constituents who favor changes, such as civil society, teachers, and parents. Many organizations that publish global performance indicators seek to impact policy outcomes through domestic political channels (Kelley & Simmons, 2020), and their impact can be magnified through dissemination in the media and civil society (Carpenter, 2007; McCombs & Shaw, 1972). However, non-elite mobilization may be limited in political contexts where governments have the ability to control information dissemination and suppress civil society (Kelley, 2017).

CNA results are often reported by media outlets as a definitive indicator of a country’s education performance, and lower-than-expected scores or declining scores can provide ammunition for critics of the status quo. Many countries have experienced PISA “shocks,” in which lower than expected scores become widely publicized by the media and lead to public criticism of existing policies (Breakspear, 2012; Wiseman, 2010). Even if overall scores are satisfactory, CNA results often identify specific areas of weakness that can become focal points for mobilization, such as reading scores for boys (Brozo et al., 2014) or the performance of girls in science and mathematics (OECD, 2015). Hence:

  • H2 Non-Elite Mobilization: CNA participation will lead to improvements in education outcomes through the mobilization of non-elite domestic actors.

The mechanisms associated with H1 and H2 have generally been challenging to tease apart in the existing literature on global performance indicators. When policymakers implement changes in response to a performance indicator, are they motivated intrinsically by reputation and status concerns or by domestic political pressures? In the empirical section, we will focus on the distinct mechanisms through which H1 and H2 operate in order to test them against each other. Specifically, H2 suggests education reforms should primarily occur after the release of CNA scores, which provide a focal point for domestic activism. In contrast, the learning, training, socialization, and status competition associated with H1 start earlier, as elites prepare for the administration of CNAs, interact with relevant experts, and participate in associated international conferences. We will also examine survey evidence to unpack policymaker motivations surrounding CNAs.

Finally, existing scholarship has theorized that transnational pressure from third-party state and non-state actors is an important avenue for the impact of global performance assessments (Kelley, 2017; Kelley & Simmons, 2015). For example, conditionality by aid agencies has been an important motivation for the adoption of the Millennium Development Goals (Skagerlind, 2020) and states that experience a decline in Freedom House’s Freedom in the World rankings tend to face public criticism from other states (Roberts & Tellez, 2017). However, the impact of transnational pressure on policy outcomes remains highly contested. Lee and Matanock see third-party enforcement as a “less likely” mechanism due to the extra steps involved between international assessment and policy implementation, and they find limited evidence that investors alter their behavior in reaction to the Corruption Perceptions Index (Lee & Matanock, 2020). Even when third-party pressure is present, policymakers may resist policy implementation or reallocate resources from other priorities, leading to unintended, perverse consequences (Bisbee et al., 2019).

In the case of CNAs, one plausible source of transnational pressure is education aid inflows from international donor agencies. CNAs provide a clear, quantifiable measure of a country’s education system, and repeated assessments can establish a track record of improvement or decline. This is an attractive feature for aid agencies, which can use CNAs in their monitoring and evaluation of education aid projects. As we discuss below, international organizations sometimes mandate participation in a CNA as a condition for membership or aid disbursement. Even if CNA participation reveals deficiencies in a country’s education system, it can provide justification for requesting greater aid to remedy the problems. Hence, CNA participation may make it more likely that a country receives foreign aid for the purposes of improving its education system:

  • H3 Transnational Pressure: CNA participation will lead to improvements in education outcomes by increasing education aid inflows.

If H3 is correct, there should be a clear association between CNA participation and education aid inflows. Furthermore, if transnational pressure is the mechanism through which CNAs improve education outcomes, we should also observe an association between education aid inflows and improved education outcomes.

3 Panel analysis

In this section, we will use a panel dataset to examine the impact of CNA participation on education outcomes. To do so, we constructed an original dataset covering all countries in the international system from 1950 to 2012. The dataset contains participation status by country for all CNAs, including major cross-national assessments and smaller-scale regional assessments. A full list of CNAs included in the panel and participating countries is available in the supporting information.

CNAs provide the best available measure for national education quality. One observable implication of our theory is that, ceteris paribus, CNA scores should be higher for countries that initiated participation in CNAs earlier, assuming enhancements to education quality accumulate over time. This is indeed the case: the correlation between average test scores in 2005 and number of years since first CNA participation is 0.43 (Footnote 7). However, a major limitation of the CNA data is that it is unavailable for non-participating states and participating states prior to their first test. Hence, it is impossible to rule out obvious alternative explanations such as self-selection: countries with better education quality likely selected into assessments early on. This necessitates an alternative variable with greater cross-national and temporal availability.

As an alternative measure, we use net secondary school enrollment, defined as the percentage of secondary-school-aged children who are enrolled in secondary school (Footnote 8). The measure is widely used in existing studies as a proxy for educational development and attainment (Footnote 9). This is intuitive: increases in net secondary enrollment tend to reflect stronger student performance at the primary level or policy initiatives to enhance access to higher levels of education. Secondary education attainment is widely cited as essential for the transition from school to the labor market in a knowledge-based economy, and many countries include secondary enrollment in benchmarking the progress of their education system (Bloom, 2006; Statistics, 2010; Stone et al., 2013). The measure is also useful because it is not susceptible to “gaming” by officials seeking to artificially inflate CNA scores: loosening standards to advance more unqualified students to secondary education will tend to lower, rather than increase, CNA performance (Footnote 10). As such, we can expect increases in net secondary enrollment associated with CNA participation to reflect genuine educational improvements, e.g. reductions in absenteeism or the removal of access barriers for qualified students, such as ethnic minorities or girls.

Net secondary enrollment also has important theoretical and empirical advantages over alternative measures. Most CNAs in our dataset, including those with the largest number of country participants, such as TIMSS and PISA, target youths in upper primary and secondary education (Footnote 11). As such, reforms motivated by CNAs will most likely have an impact at this level. In addition, unlike primary enrollment, which is universal in a large majority of countries, there is meaningful cross-national and temporal variation in net secondary enrollment rates (Footnote 12). Finally, among various potential education measures that we considered, net secondary enrollment exhibits the strongest correlation with available CNA scores (R = 0.67) (Footnote 13).

The key independent variable is CNA participation. This is a dichotomous variable coded as 0 for countries that have never participated in a CNA, and 1 after a country initiates participation for the first time. We use this coding scheme for several reasons. First, we view the first instance of participation in a CNA as a critical milestone. The first instance of participation generates a country-specific score and relative ranking for the first time. In addition, switching from non-participation to participation triggers several developments that could plausibly have a large impact on education policymaking – intensive elite interaction with assessment agencies for the first time, as well as the first revelation of a country’s score and relative ranking to domestic and international audiences. While subsequent participation will also plausibly produce an incremental impact, the clearest test of our propositions is the first shift from non-participation to participation. Second, CNAs are not usually conducted annually, and coding off years as non-participation is likely to produce misleading results. Third, it is impractical to code “dropouts” from CNAs because of the irregular timing of CNAs and the fact that we do not have a long track record since initiation of participation for most countries (Footnote 14). Our theoretical priors also suggest dropping out is unlikely to have a “reverse effect” of the same magnitude as participating for the first time – once a country participates in a CNA, a score and relative ranking will be available regardless of future behavior, and we would not expect the learning and socialization that took place during the initial round of participation to be reversed by a decision to drop out. We thus divide countries broadly into two categories, i.e. “non-treated” countries that have never participated in a CNA and “treated” countries that have.
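The coding rule described above can be stated compactly. The notation is illustrative and introduced here as an assumption; it does not appear in the original analysis. Let $T_i$ denote the first year in which country $i$ takes part in any CNA, and let $D_{it}$ denote the participation indicator for country $i$ in year $t$:

```latex
D_{it} =
\begin{cases}
1 & \text{if } t \ge T_i, \\
0 & \text{otherwise,}
\end{cases}
\qquad \text{with } T_i = \infty \text{ for countries that never participate.}
```

Under this rule the indicator never switches back to 0 once a country has participated, which matches the decision not to code dropouts.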

For the purposes of empirical analysis, it is important to be cognizant of the logistics of CNA administration. Figure 2 depicts a typical timeline for a country that participates in a CNA (Footnote 15). A country will typically declare its intention to participate in a CNA about two years in advance of the assessment date. The period immediately preceding the assessment involves preparations and interaction with the assessment agency and other relevant parties, such as donor agencies. This interaction continues after the assessment is concluded with technical consultations and international conferences to discuss the findings. Test scores are typically released about a year after the assessment takes place. This means that the effect of participation on education policymaking will tend to occur in two distinct phases, which gives us some leverage over causal mechanisms: (1) the runup to the assessment, primarily involving elite interaction with international actors, which provides opportunities for learning, socialization, and intra-elite contestation; and (2) the period after test scores are released. During this second period, we expect the elite mechanism to continue to matter, but the release of test scores can also trigger a response from domestic and international third-party audiences.

Fig. 2 The Logistics of Participation in a CNA. Note: Each T is one year. The specifics of this timeline are based on the administration of PISA 2015, but the timeline is typical based on our review of 23 additional CNAs.

For the purposes of empirical analysis, we will incorporate some flexibility regarding the specific timing of CNA initiation to account for these factors. It is also important to recognize that policy measures in response to CNA participation are unlikely to take effect immediately, i.e. we would not expect secondary enrollment to increase instantaneously upon the initiation of consultations with assessment agencies or the revelation of test scores. On the other hand, an impact within one or two years is plausible. Net secondary enrollment can be increased quickly by allowing or facilitating the advancement of traditionally underprivileged students, such as students with low socioeconomic status or girls in many developing countries. In addition, evidence from impact evaluations indicates that policymakers have tools at their disposal to rapidly and cheaply improve salient outcomes such as student and teacher absenteeism (Benhassine et al., 2013; Duflo et al., 2012; Kremer et al., 2013; Miguel & Kremer, 2004).

For our analysis, we use a generalized difference-in-differences estimation, controlling for year and country fixed effects across all OLS specifications. We also control for GDP/capita (PPP) to account for differences in education outcomes associated with levels of development and polity scores to account for the fact that democracies tend to invest greater resources in public goods, such as education (Baum & Lake, 2003).
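
As a minimal sketch of the estimation logic described above (not the authors' code), a two-way fixed-effects coefficient for a single binary treatment can be recovered on a balanced panel by removing country and year means from both the treatment and the outcome. The toy panel, the names `twfe_estimate`, `COUNTRY_EFFECT`, `FIRST_YEAR`, and `TRUE_EFFECT`, and the choice to code the participation year itself as treated are all assumptions made for illustration; the control variables (GDP per capita, polity score) are omitted for brevity.

```python
from collections import defaultdict


def twfe_estimate(rows):
    """Two-way fixed-effects slope for one binary regressor on a balanced panel.

    rows: list of (country, year, treated, outcome) tuples.
    Returns the OLS coefficient on `treated` after removing country and
    year means (the within transformation).
    """
    def demean(index):
        # Subtract the country mean and the year mean, add back the grand mean.
        by_country, by_year = defaultdict(list), defaultdict(list)
        for r in rows:
            by_country[r[0]].append(r[index])
            by_year[r[1]].append(r[index])
        grand = sum(r[index] for r in rows) / len(rows)
        c_mean = {c: sum(v) / len(v) for c, v in by_country.items()}
        y_mean = {y: sum(v) / len(v) for y, v in by_year.items()}
        return [r[index] - c_mean[r[0]] - y_mean[r[1]] + grand for r in rows]

    d = demean(2)
    y = demean(3)
    return sum(di * yi for di, yi in zip(d, y)) / sum(di * di for di in d)


# Hypothetical toy panel: outcome = country effect + year effect + 6.5 * treated.
COUNTRY_EFFECT = {"A": 50.0, "B": 60.0, "C": 40.0, "D": 70.0}
FIRST_YEAR = {"A": 2002, "B": 2004, "C": None, "D": None}  # None = never participates
TRUE_EFFECT = 6.5

panel = []
for country, base in COUNTRY_EFFECT.items():
    for year in range(2000, 2006):
        start = FIRST_YEAR[country]
        # Assumption: the participation year itself counts as treated.
        treated = 1 if start is not None and year >= start else 0
        outcome = base + (year - 2000) * 1.0 + TRUE_EFFECT * treated
        panel.append((country, year, treated, outcome))

estimate = twfe_estimate(panel)
```

Because the toy outcome contains no noise and the treatment effect is constant, the within transformation removes the country and year effects exactly and `estimate` recovers the built-in effect. With real data one would add the controls and compute standard errors with a regression library.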

In Table 1, we examine the association between CNA participation and secondary enrollment rates. CNA participation is positively associated with secondary school enrollment across specifications. The magnitude of this association is substantively large, representing about a 6–7 percentage point increase in the net secondary enrollment rate, or 27–32 million additional students annually on a global basis.Footnote 16 As we discussed earlier, the logistics of CNAs are such that effects on policy outcomes could somewhat lag behind administration – e.g., this would be the case if the main impact of CNAs occurs from policy responses to the publication of test scores. Alternatively, we might conceptualize participation differently and treat expressions of intent to participate as the true initiation of participation. To account for these possibilities, we reran our empirical specifications by leading and lagging the key independent variable by three years. As the second and third columns of Table 1 illustrate, this produces substantively similar results. One- and two-year lags also produced substantively similar results.

Table 1 Panel Analysis—Net Secondary Enrollment (OLS)

Self-selection is an important potential concern for our results. Participation in CNAs is usually voluntary. Leaders who fear the domestic political repercussions of publicizing the state of their education systems may choose not to participate, while countries with improving education performance may opt in. An important assumption of our empirical approach is that CNA participants and non-participants would have been subject to common trends in net secondary enrollment rates in the absence of CNA participation. If CNA participants exhibit a higher rate of increase in secondary enrollment prior to participation, it would be strongly indicative of endogeneity: e.g., countries with preexisting education reforms selecting into CNAs. Following the approach of Autor (2003), we reran our empirical specifications including indicator variables for leads and lags of participation to examine whether the participant group exhibits any distinct trends in secondary enrollment prior to participation. More specifically, we omit our key independent variable and instead include dummy variables for t-4, t-3, t-2, t-1, t = 0, t + 1, t + 2, t + 3, and t > 3, where t = 0 is the first time a country participated in a CNA. Each indicator variable is coded as 1 only in the relevant year, with the exception of t > 3, which is coded 1 for all years subsequent to year 3. The substantive results are presented in Fig. 3. The figure illustrates that, among countries that participated in CNAs, there is no statistically significant trend in secondary enrollment prior to the year they initiated participation, i.e., there does not appear to be an anticipatory increase in enrollment prior to CNA initiation.
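
The lead and lag coding described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function name `event_dummies` and the label strings are hypothetical, and the treatment of never-participants and of years earlier than t-4 (all indicators left at 0, so that they form the omitted reference group) is an assumption that follows from the stated coding rule.

```python
LABELS = ["t-4", "t-3", "t-2", "t-1", "t=0", "t+1", "t+2", "t+3", "t>3"]


def event_dummies(year, first_year):
    """Return a dict of lead/lag indicators for one country-year.

    first_year is the year of first CNA participation, or None for
    countries that never participate (all indicators stay 0).
    """
    dummies = {label: 0 for label in LABELS}
    if first_year is None:
        return dummies
    k = year - first_year  # event time relative to first participation
    if k > 3:
        dummies["t>3"] = 1        # single indicator covering all later years
    elif k == 0:
        dummies["t=0"] = 1
    elif -4 <= k < 0:
        dummies[f"t{k}"] = 1      # e.g. k = -2 gives "t-2"
    elif 0 < k <= 3:
        dummies[f"t+{k}"] = 1     # e.g. k = 3 gives "t+3"
    # Years earlier than t-4 switch on no indicator (reference period).
    return dummies
```

For a country that first participated in 2000, `event_dummies(1998, 2000)` switches on only `"t-2"`, while every year from 2004 onward switches on `"t>3"`. Regressing enrollment on these indicators (plus the fixed effects) yields one coefficient per event-time bin, and the pre-trend check inspects the coefficients on the four lead indicators.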

Fig. 3

Estimated Impact of CNA Participation on Net Secondary Enrollment (%) for Years Before, During, and After Participation. Note: There is no statistically significant pre-trend in secondary enrollment prior to the year of CNA participation

In order to interpret the findings in Fig. 3, it is helpful to refer back to the timeline we presented in Fig. 2. Elite interaction with international actors generally commences about two years prior to the administration of an assessment. Strictly speaking, this means that it would not be contrary to our theoretical propositions if we observed some shift in secondary enrollment between t-2 and t = 0 (it would be far more problematic if we observed a trend prior to t-2). However, it is not very realistic to expect that policy measures to boost secondary school enrollment would be developed and implemented immediately. The observation of an effect on secondary enrollment at t = 0 is broadly consistent with the proposition that CNAs exert an effect on education outcomes primarily through elite politics. Policy reforms in response to the revelation of test scores will occur after t + 1, when scores are typically released. The point estimates in Fig. 3 suggest that secondary enrollment continues to climb after t + 1, but subsequent, incremental increases above the level at t = 0 are not statistically distinguishable from zero. This evidence tends to favor H1 over H2: the improvement in secondary enrollment associated with CNA participation primarily occurs prior to the public release of test results.

Although we have shown that there is no difference in the pre-trend for CNA participants and non-participants, one residual concern is self-selection in the period immediately surrounding CNA participation. More specifically, some countries might initiate participation in CNAs in conjunction with an education reform, for example to help evaluate the efficacy of the reforms. Under these circumstances, we could potentially observe CNA participation coinciding with improvements in enrollment levels even if CNA participation per se has no effect. Furthermore, this would not show up as a difference in pre-trends.

To address this concern, we classify countries according to whether or not CNA participation was triggered by a domestic education reform. We are able to do this because one of the questions we administered in our survey explicitly asked officials if domestic reforms were an important reason for their country’s first-time participation in CNAs.Footnote 17 We distinguish countries that answered in the affirmative and negative to this question. Countries that answered in the affirmative are cases where the effects we are attributing to CNA participation could be in part due to education reforms that would have been implemented anyway. For countries that answered in the negative, we can be more confident that we are observing the effect of CNA participation per se.

Based on our identification of reform and non-reform countries, we recoded our key independent variable, CNA participation. As with the original variable, countries are coded as 1 for all years after participation is initiated, subject to being a non-reform country. Using an analogous procedure, we created a dummy variable for reform country, encompassing countries that indicated in our surveys that CNA participation was motivated by the onset of a new education reform. Finally, we analogously coded undetermined reform for countries that participated in CNAs but for which we could not determine whether their initial participation was due to a reform.Footnote 18
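
A minimal sketch of this recoding, under stated assumptions: the function name `code_participation`, the three status labels, and the variable names in the returned dictionary are hypothetical, and the participation year itself is treated as the first coded year.

```python
def code_participation(year, first_year, reform_status):
    """Split post-participation years into three mutually exclusive dummies.

    first_year: year of first CNA participation, or None if never.
    reform_status: "non_reform", "reform", or "undetermined"
    (hypothetical labels for the survey-based classification).
    """
    # Assumption: coded 1 from the participation year onward.
    active = first_year is not None and year >= first_year
    return {
        "cna_non_reform": int(active and reform_status == "non_reform"),
        "cna_reform": int(active and reform_status == "reform"),
        "cna_undetermined": int(active and reform_status == "undetermined"),
    }
```

At most one of the three indicators equals 1 in any country-year, so their coefficients can be read as separate associations for non-reform, reform, and undetermined participants relative to country-years without participation.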

The results are presented in Table 2. If selection bias is an important problem, we would expect to see a large difference between reform and non-reform countries. There is some evidence for this: the point estimate for reform countries is about twice as large as that for non-reform countries. In countries where education reforms are ongoing when CNA participation is initiated, secondary enrollment is boosted not only from the effects of CNA participation, but also due to existing reform initiatives. However, more importantly for our purposes, the coefficient for non-reform countries is also positive and statistically significant. Even among countries where no domestic education reform was ongoing, CNA participation is clearly associated with increases in secondary enrollment.

Table 2 Panel Analysis – Separating Countries by the Onset of Domestic Reform (OLS)

3.1 Robustness checks

We performed several additional robustness checks: the full results are available in the Supporting Information. To avoid overreliance on a single measure, we reran the analyses using two alternative dependent variables that should also respond to CNA participation based on our theoretical premises. The first is the survival rate to the last year of primary education: a precondition for higher education attainment is the completion of lower levels of education. The second variable is adolescents out of school as a share of secondary school aged children, which accounts for absenteeism: higher enrollment rates may not be very meaningful if formally enrolled students are not going to school consistently. These variables are associated with first-time participation as expected: the survival rate to the last year of primary education is positively associated with participation, while adolescents out of school as a share of secondary school aged children is negatively associated with participation.

As an alternative strategy to account for selection bias, we reran the analysis after separating countries for which CNA participation was mandated by an international organization, such as the OECD or an international aid agency. Self-selection is less of a concern for countries that were externally mandated into CNA participation. This information was coded from our survey data using a procedure analogous to the reform variable.Footnote 19 Again, first-time participation was positively and significantly associated with increases in net secondary enrollment even for countries that participated due to an external mandate. We also reran the analysis using two further approaches: counterfactual estimators for time-series cross-sectional (TSCS) data (Liu et al., 2022), which relax the key assumptions of constant treatment effects and absence of time-varying confounders in two-way fixed-effects models, and matching methods for causal inference in TSCS data (Imai et al., 2021). In both cases, the results were consistent with the reported findings.

As an additional test of H1 against H2 and H3, we examined the effect of CNA participation on wealthy autocracies, such as Kuwait, Qatar, and Saudi Arabia.Footnote 20 For these countries, with closed political systems and no need for support from international aid agencies, the impact of H2 and H3 should be relatively muted. Rerunning the models in Table 1 for this subset of countries shows a statistically significant increase in secondary enrollment after CNA participation. This provides additional evidence in favor of H1 over H2 and H3: even in autocratic countries with no reliance on foreign aid, CNA participation boosts enrollment.

We also tested several observable implications that follow from our theory. All three hypotheses are consistent with the proposition that CNAs administered by the OECD and IEA should be associated with a greater impact compared to minor assessments by less prominent agencies. The OECD and IEA are the two most prominent assessment agencies and thus plausibly linked to greater authority, technical sophistication, legitimacy, visibility, and credibility vis-à-vis foreign aid donors. We thus separated “major” assessments conducted by these two organizations from “minor” assessments conducted by other agencies. The results suggest that participation in major CNAs is meaningfully associated with increases in enrollment, while the association for minor CNAs is positive but attenuated.Footnote 21 Although we predict that CNA participation is particularly important for first-time participants, it is plausible that participating repeatedly creates a greater cumulative impact by, among other things, facilitating more opportunities for elite socialization and learning. We thus reran the analysis separating countries that only participated in CNAs once and those that have participated multiple times. The results are consistent with some cumulative effect – although one-time-only participation is meaningfully associated with higher enrollments, the coefficient for multiple-time participation is about twice as large.

To make sure that our results are not unduly influenced by any specific country in the dataset, we reran the analyses while sequentially omitting each country from the models. In all cases, the substantive results were unchanged. For developing countries in our sample, there is a potential concern that increases in secondary enrollment could be a statistical artifact. For example, CNA administration may lead to better census data on students in the education system, boosting official enrollment numbers.Footnote 22 We thus reran the analysis by separating out OECD countries, for which this is unlikely to be a concern. This produced similar substantive results. The results were also substantively similar when separating out non-OECD countries and separating out democracies and non-democracies.
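
The sequential-omission check in the first part of this paragraph can be sketched generically. The helper `leave_one_out` accepts any estimator function; `mean_gap` is a deliberately simple stand-in (a difference in means), not the two-way fixed-effects model used in the article, and the six data rows are hypothetical.

```python
def leave_one_out(rows, estimator):
    """Re-estimate after dropping each country in turn.

    rows: list of dicts with a "country" key.
    estimator: any function mapping a list of rows to a number.
    Returns {omitted_country: estimate}.
    """
    countries = sorted({r["country"] for r in rows})
    return {
        c: estimator([r for r in rows if r["country"] != c])
        for c in countries
    }


def mean_gap(rows):
    """Stand-in estimator: mean outcome of treated minus untreated rows."""
    treated = [r["y"] for r in rows if r["d"] == 1]
    control = [r["y"] for r in rows if r["d"] == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


# Hypothetical data, not taken from the article.
rows = [
    {"country": "A", "d": 0, "y": 50.0}, {"country": "A", "d": 1, "y": 57.0},
    {"country": "B", "d": 0, "y": 60.0}, {"country": "B", "d": 1, "y": 66.0},
    {"country": "C", "d": 0, "y": 40.0}, {"country": "C", "d": 1, "y": 47.0},
]
results = leave_one_out(rows, mean_gap)
```

If the sign and rough size of the estimate are stable across every entry in `results`, no single country is driving the finding, which is the pattern the paragraph reports.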

3.2 Transnational pressure

Thus far, our analysis suggests that participation in CNAs is associated with increases in net secondary enrollment. In addition, the evidence is more consistent with H1 (elite politics) than H2 (non-elite mobilization): increases in enrollment occur primarily during the period in which education elites are expected to respond to CNA participation but before the publication of test results. However, H3 (transnational pressure) still remains a potentially plausible explanation for our findings. For aid-dependent countries, relations with the international donor community are an important feature of their engagement with CNAs. CNAs can enhance a country’s accountability and reputation among international donors. If so, we might expect CNA participation to be associated with increased education aid inflows, which in turn facilitate education reforms. Aid agencies could plausibly reward CNA participation as a sign of improving accountability even before initial test results are publicly released.

We repeat the analysis from the prior section, substituting official development assistance to the education sector as the dependent variable.Footnote 23 As this is an aid inflow measure, there is no variation in the dependent variable for countries that receive no aid. All other details remain the same. The results are presented in Table 3. The first column presents our baseline model. CNA participation is strongly associated with increases in education aid. As with the previous section, we also reran the models with three-year leads and lags, and the results were broadly similar. We also used leads and lags to confirm that the parallel trends assumption is satisfied (see Supporting Information).

Table 3 Panel Analysis—Education Aid (OLS)

It appears that there is a positive association between CNA participation and education aid inflows: countries that participate are more likely to receive aid. For H3 to be supported, one more condition is necessary: higher education aid inflows need to be associated with higher net secondary enrollment. To test this, we return to our previous model that used net secondary enrollment as the dependent variable, but include education aid as an independent variable. The results are presented in the fourth column of Table 3. As the results show, there is no association between education aid inflows and net secondary enrollment. This contradicts H3: although CNA participation does lead to a bump in foreign aid inflows, these aid inflows are not associated with increases in enrollment.Footnote 24

In sum, our results indicate that CNA participation is associated with meaningful increases in net secondary enrollment, our proxy for education quality and attainment. Furthermore, these increases remain even after accounting for potential self-selection by participating states. CNA participation also facilitates greater inflows of foreign aid to education. Among our proposed hypotheses, we find the strongest support for H1 (elite politics). The timing of the observed effects is consistent with H1 but less supportive of H2 (non-elite mobilization): large increases in secondary enrollment occur before the release of CNA results and hence opportunities for mobilization by domestic groups around numerical rankings. In addition, there is no association between foreign aid inflows and enrollment, a necessary condition for H3 (transnational pressure).

4 Evidence from survey and interviews

To further evaluate our claims, we draw on evidence from an original survey of education elites in 46 countries and separate, in-depth interviews conducted with 48 officials from target countries, assessment agencies, and donor agencies. The evidence from the surveys and interviews supports our earlier findings. Respondents generally perceive a meaningful impact of CNA participation on education reforms. In addition, our survey and interview findings are largely consistent with the panel analysis regarding the impact of each mechanism. Respondents generally perceive elite mechanisms as an important pathway through which CNAs alter education policymaking. While we were able to identify specific cases where domestic non-elite and transnational factors likely mattered, they appear to be less common compared to policy reforms motivated by elite mechanisms.

4.1 Survey description

We collected data in 2011 and 2012 at two international conferences hosted by the IEA, a major assessment agency responsible for several CNAs including TIMSS. At these conferences, we collected data from delegates who were responsible for the implementation and analysis of CNA results in their countries. The conferences engaged the delegates in high-level policy discussions on topics such as assessment administration, participation, and analysis of results. The conferences included 150 delegatesFootnote 25 from 67 countries.Footnote 26 Seventy-seven delegates representing 46 countries responded to our survey.Footnote 27 To consider potential bias from self-selection into the survey, we ran bivariate comparisons on the following variables: GDP/capita, net secondary enrollment rate, polity score, and assessment scores for Reading, Mathematics and Science at the secondary level (see Supporting Information). There were no statistically significant differences between responding and non-responding countries across these variables.

This survey evidence is helpful for examining the plausibility of our findings from the panel analysis. We were able to acquire information from officials responsible for education policymaking in a diverse set of countries. It would be troubling if these policymakers reported impressions at odds with our earlier findings: for example, that CNAs have no impact on education policy or that elite mechanisms are unimportant. On its own, the affirmation of an impact by survey respondents and interviewees is not decisive – e.g., education officials might have an inflated perception of the influence of CNAs. However, the findings in this section complement our earlier findings by illustrating that education officials do in fact generally perceive a meaningful impact of CNAs on education policy outcomes in a manner consistent with the panel analysis.

4.2 Cross-national assessments and education reforms

We begin by considering whether survey respondents perceive a relationship between CNA participation and education policies in their countries. The delegates we surveyed are policy elites deeply embedded within their countries’ education policymaking establishments who are also responsible for the implementation of CNAs. They are predominantly policymakers and government bureaucrats on rotation as director or senior administrator to manage international assessments and liaise between their ministry and international organizations or assessment agencies. Therefore, they are not advocates or promoters of CNAs. Furthermore, the conference focused on largely technical topics such as assessment administration rather than the broader impact of CNAs on education policy. We therefore do not have a strong reason to believe that the respondents would have an exaggerated perception about the impact of CNAs. Nonetheless, the subjective nature of their responses should be noted and interpreted with appropriate caution.

One of the survey items asked for an open-ended response about the impact of CNAs on national education policy decisions. Omitting responses that indicated that it was too early to assess the impact of CNAs due to recent participation, 70% of the respondents described how CNA participation directly affected their country’s education policy or curriculum, and 7% indicated that participating in CNAs had other impacts such as gaining knowledge about how to administer national assessment systems, increases in education funding, or participation in other CNAs. About 22% of respondents noted that CNAs had no impact on education policies in their country.

Our findings reinforce existing, mostly qualitative work that identifies a link between CNA participation and education reforms (Abdul-Hamid et al., 2011; Takayama, 2008; Addey, 2015; Grek, 2009; Breakspear, 2012). Many respondents provided specific examples of how CNA participation influenced the course of education reform in their countries. For example, a delegate from New Zealand noted that “The impact of the early cycles of TIMSS was quite significant as a driver for math educational policy, with the establishment of a math and science taskforce and then the numeracy strategy.”Footnote 28 In Botswana, participation in the 2003 TIMSS resulted in “changes in the curriculum for [grades] 1 to 3… [and] introduction of a programme called SMASSE (Strengthening of Math and Science Programme in Education).”Footnote 29 In Iran, “policy-makers changed the content of textbooks in Science and Reading….and adapted our curriculum to the framework of TIMSS and PIRLS.”Footnote 30

To further examine the association between CNA participation and education reforms, we collected data for education reforms in all countries initiating participation in CNAs since 1980. It is challenging to quantify education reforms systematically due to variation in factors like definitions of reforms, policy contexts, and information availability. For consistency, we coded education reforms based on official World Bank and UNESCO documentsFootnote 31 and examined the 10-year window around each country’s first-time CNA participation. According to this data, countries implemented an average of 0.4 education reforms per year in the 5 years before CNA participation, and this increased to 0.8 education reforms per year in the 5 years after participation. We made an analogous comparison using a separately collected dataset of education reforms in developed European countries during 1929–2000 (Braga et al., 2013). The trends were similar: there were on average 0.1 education reforms per year in the 5 years prior to first-time CNA participation, and this increased to 0.3 reforms per year in the 5 years after. In years prior to the 10-year window, the rate was also 0.1 education reforms per year.
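
The before-and-after comparison above amounts to counting coded reforms on each side of the first participation year and dividing by the window length. The sketch below is illustrative only: the function name `reform_rates` and the example years are hypothetical, and excluding the participation year itself from both halves of the window is an assumption, since the text does not say how that year is assigned.

```python
def reform_rates(reform_years, first_year, window=5):
    """Average reforms per year in the `window` years before and after first_year.

    Assumption: "before" covers first_year - window .. first_year - 1 and
    "after" covers first_year + 1 .. first_year + window; the participation
    year itself is excluded from both.
    """
    before = sum(1 for y in reform_years if first_year - window <= y < first_year)
    after = sum(1 for y in reform_years if first_year < y <= first_year + window)
    return before / window, after / window


# Hypothetical country: one coded reform before and three after participating in 1995.
rate_before, rate_after = reform_rates([1992, 1996, 1998, 1999], first_year=1995)
```

Here `rate_before` is 0.2 and `rate_after` is 0.6 reforms per year. Averaging such pairs across countries gives figures comparable in form to those reported in the paragraph.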

4.3 Elite politics

Responses to the survey generally indicated agreement with questions related to elite politics mechanisms. 84% agreed that CNA participation “improves our capacity to conduct and evaluate our own assessments,” and 73% agreed that participation “facilitates exchange of information between countries/economies.” The respondents also highlighted how participating in CNAs enabled countries to gain knowledge from assessment agencies on how to develop, prepare, administer, and assess student learning through the administration of large-scale assessments. A Chilean official highlighted a large delegation visit to OECD that came as part of the country’s participation in PISA, “a very, very impressive experience… it was very important to have progress in our capabilities.”Footnote 32 A delegate from Trinidad and Tobago stated that participation in international assessments provided an opportunity to “interact with IEA to learn about best practices…and to validate our own standard system.”Footnote 33 According to a group of education delegates representing Botswana, the advantage of CNA participation “… is to improve our research skills.”Footnote 34 Participation in CNAs facilitates capacity building and acquisition of technical expertise to assess, monitor, and evaluate national education systems.

Some interview subjects also noted that CNA participation altered the discourse and norms surrounding education in favor of goals promoted by international assessment agencies, such as expanded enrollment (“education for all”), gender equality, and a focus on foundational skills like math and reading (Martens & Niemann, 2013). A delegate from Kuwait stated that, “Once our country was exposed to international meetings that talked about international assessments, the revolution of education began. We started many awareness campaigns to also educate the mass[es].”Footnote 35 Indeed in 1995, the year of Kuwait’s first-time CNA participation, the country initiated a major education reform that included efforts such as eliminating illiteracy in five years, focusing on girls’ education, enhancing teacher autonomy in classrooms, and instituting curriculum reforms that focus on basic skills.Footnote 36 A representative from Honduras similarly stated that participation in CNAs triggered a “big debate about what quality of education means” and the adoption of “very valid and reliable instruments” supported by assessment agencies strongly emphasizing core competencies in academic subjects such as math and reading.Footnote 37

Several interview subjects noted that political leaders support CNA participation to motivate education officials and strengthen domestic standards. In Vietnam, then Deputy Prime Minister Nguyen Thiện Nhân decided to participate in PISA for the first time although “lower-level ministry staff was unsure of what participation in PISA [meant] for Vietnam.”Footnote 38 Nguyen saw CNA participation as a mechanism to motivate education officials to pursue high standards and signal the country’s educational and economic competitiveness to a global audience.Footnote 39 In Brazil, President Cardoso made a decision to participate in PISA even though he suspected his country would “come out at the bottom,” based on an assessment that participation would boost performance of the education system.Footnote 40

Finally, participation in CNAs is clearly associated with status competition among policymaking elites. All surveyed officials responded affirmatively that a key motivation for countries to participate in CNAs is “to compare our education quality with other countries or economies.” The interviews corroborate the survey data: interviewees spoke frequently about status competition. For example, a delegate from Jordan stated that “When you participate in international studies, you know whether you are performing well or worse compared to others.”Footnote 41 Over half (56%) of the respondents agreed that participating in CNAs “improves our reputation/status in the international community,” despite our sample including many developing countries that have low scores.

To better understand status competition associated with CNAs, we asked respondents to list which countries they compare their own scores against. We observed three broad categories of comparison sets: 1. countries that typically rank toward the top, such as Finland, South Korea, and Singapore; 2. countries within the same region that share similar linguistic and/or cultural characteristics; and 3. countries at similar levels of economic development. Respondents who compare their own country with top performers were often representatives of relatively low-ranking countries. By providing transparent, comparative rankings of education performance, CNAs create an international status hierarchy, forcing countries to seek higher status by implementing reforms or accepting their ranking within a plausible peer group.

4.4 Non-elite mobilization and transnational pressure

Our survey and interview findings are largely consistent with the panel analysis regarding the impact of non-elite mobilization and transnational pressure: while some education policymakers see these factors as being important, they generally receive less support than elite politics factors. Respondents provided mixed views regarding the importance of domestic mobilization by non-elites in the administration of CNAs. In the survey, only 18% of respondents agreed that “Pressures from citizens about showing results in the education sector” was a factor in their decision to participate in CNAs, though 43% agreed that “Negative [test] results could result in public upheaval,” which suggests that policymakers are concerned about potential public backlash if test scores do not meet expectations. Similarly, only 12% of respondents agreed that CNAs result in “more resources/foreign aid to education by donors who credit our effort.”

To be sure, our interview subjects identified specific instances where non-elite mechanisms or donor influence appeared to play an important role in facilitating education reforms. An officer from the United States stated that “TIMSS results and PISA results provided data to justify the sweeping reforms of the No Child Left Behind Act,” although the legislation produced mixed results.Footnote 42 In Chile, there was a major backlash when the government decided to forgo one cycle of TIMSS and the public “…accused the government of skipping TIMSS because the results of the reform were so poor that we were hiding.”Footnote 43 In many developing countries, donor agencies are strong proponents of CNAs. A delegate from Yemen mentioned that donor agencies are “interested to see the results of their support for countries.”Footnote 44 In developing countries, where the national assessment system is weak, data from CNAs can be used as a reliable tool to audit education quality. An officer from Jordan mentioned that “[donor agencies] want to have access to indicators [to measure the progress] of the reform.”Footnote 45

Non-elite and transnational mechanisms appear to matter in some cases. However, our results provide a cautionary note about overgeneralizing from case studies, which have been the primary empirical approach in existing work on the impact of CNAs on the politics of education. Non-elite and transnational mechanisms do not receive strong support in the panel analysis, which includes all countries, or in our survey, which covers a broad subset of countries that participate in CNAs.

5 Conclusion

CNAs have proliferated rapidly over the past three decades, providing clear, transparent rankings of education systems. We have provided the first systematic analysis of how participation in CNAs shapes the politics of education. The rapid growth in the number of CNAs and participants represents an important shift in global education policymaking. Assessment agencies and international organizations play an increasingly influential role in how countries discuss, design, and evaluate education policy. The rapid adoption of CNAs worldwide has coincided with the evolution of education from a national to a global issue (Steiner-Khamsi, 2003) and an increasing recognition that education is a basic human right and global public good (Meyer et al., 2010; Ramirez et al., 2007; Tsutsui & Wotipka, 2004). Although CNAs have not been without critics, this article shows how participation can positively impact education outcomes. Our findings link CNA participation to meaningful increases in net secondary enrollment rates across a wide range of countries. The association is substantively large: an increase of 6–7 percentage points in the net secondary enrollment rate, which is equivalent to 27–32 million additional students globally on an annual basis. Participation in cross-national assessments is also associated with higher primary completion rates, lower levels of absenteeism at the secondary level, acceleration in reform efforts, and a greater influx of foreign aid to education.

Elite politics appear to be a particularly important mechanism for CNAs. The findings from both our panel and survey evidence support the importance of elite mechanisms. We find weaker evidence for the non-elite mobilization and transnational pressure mechanisms, though both the existing literature and our interviews identify specific instances where they likely mattered. CNAs inherently involve close coordination between target states and assessment agencies, international organizations, and experts, providing clear opportunities for learning, professionalization, and norm diffusion. The comparative and transparent nature of CNAs strongly evokes reputational and status concerns, motivating policymakers to improve education performance and climb international ranking tables.

Our findings present a mixed picture for the evolving literature on global performance indicators. They echo some existing studies that suggest elite channels are a particularly important pathway of influence for global performance indicators (Honig & Weaver, 2019). However, CNAs feature intensive involvement of the assessing agency and direct interaction among education policy elites and international counterparts, which is not typical for most global performance indicators. We find limited support for non-elite and transnational mechanisms, which are critical for indicators that do not directly involve elites. Quantifying and ranking may not be enough without complementary mechanisms, such as learning, socialization, financial incentives, or enforcement. Future work can build on this research by examining which of these specific mechanisms matter and under what conditions, for example by comparing global performance indicators that are otherwise similar but vary in the presence of specific mechanisms.

It is also important to recognize several limitations of our study. Net secondary enrollment is an important measure, and it exhibited meaningful variation across countries during the time period examined. However, future research can build on our results by collecting and leveraging more comprehensive, nuanced data on education reforms (Bromley et al., 2021). Enrollment may be easier to improve than more complex and contested outcomes, such as student learning and achievement. As some countries begin to approach universal enrollment, it may become more difficult to both measure and achieve improvements based on CNA participation. This article focused on the three mechanisms first proposed by Kelley and Simmons (2015) and sought to tease apart elite, domestic, and transnational channels of CNA influence. However, it may also be informative in future research to explore the interactions among these mechanisms, such as how elites may leverage or form coalitions with other domestic actors or international third parties to accelerate reforms.

Although our findings associate CNA participation with important benefits, it is important to acknowledge critics who see international assessments as a form of “education colonialism” (Meyer, 2014). Such critics argue that CNAs are administered by unaccountable, largely Western technocrats and that other indices focusing on national innovation or creativity better account for the ultimate consequences of education outcomes.Footnote 46 Indeed, our survey and interviews revealed considerable norm contestation over CNAs, such as concerns among officials from developing countries that assessments fail to reflect their ethnolinguistic diversity.Footnote 47 In some cases, pressure to adopt global norms may benefit underprivileged students, such as girls or minority groups underserved by traditional, local approaches to education. However, we are cognizant that our approach does not account for facets of education that are difficult to measure. Although we find large, substantively important increases in educational attainment associated with CNA participation, we cannot rule out deterioration in unquantified dimensions such as moral and civic education, prosocial behavior, or artistic expression. Policymakers should not focus solely on an international metric of quality but rather use CNAs as one of many inputs that can enhance the quality of education for all in various learning communities around the world.