In this paper we address the organization of criminal justice forecasting and the implications for its use in criminal justice policymaking. We use the terms forecasts, predictions, and simulations interchangeably (unless specified) to refer to estimates of future outcomes based upon a set of assumptions presumed to persist throughout the period to which they apply. We argue that forecasting is relatively widespread in criminal justice agency settings, but it is used primarily to inform decision-making and practice rather than to formulate and test new policy proposals. Using predictive policing and prison population forecasting as our main examples of the range of forecasting methods adopted in criminal justice practice, we describe their uses, how their use is organized, and the implications of the organizational arrangements for the transparent, reviewable, and consensual use of forecasting. We conclude that while correctional population forecasting is done in a relatively transparent manner, predictive policing has been adopted in ways that are not transparent and that raise concerns about the legitimacy of its use.

Our paper is organized as follows: Drawing upon literature on forecasting and policy in macroeconomics, we identify principles for using forecasting in criminal justice policy. We then provide a brief historical review of criminal justice forecasting before describing current criminal justice demands for forecasting, where we point out its common use in supporting practical decision-making. From there, we take a step back to discuss some of the methodological issues associated with predictive policing and correctional population forecasts before discussing how the field's policy and practice might move toward a more transparent, independent, and democratic approach to their use.

Forecasting and Policymaking: Some Lessons from Macroeconomics

Forecasting can be used to formulate and assess policy proposals and to inform practice. For our purposes, forecasting to formulate policy involves modeling future outcomes to inform decisions about future policies, where the forecasts of the implications of those policies are assessed for accuracy. Forecasting for policy analysis means assessing the impacts of existing policies on future outcomes. Policy formulation and analysis forecasts focus on policy targets of interest, such as the future size of prison populations, and can involve the use of individual- or aggregate-level data. Forecasting for criminal justice practice, on the other hand, typically involves predicting individual- or area-level outcomes such as recidivism or likely criminal hotspots. These forecasts aid criminal justice agencies in making case processing, treatment, and tactical response decisions, such as whether or not to detain defendants pretrial, what level of supervision is appropriate for parolees and probationers, or how to allocate limited police resources. In any case, choices about policy and practice are ultimately normative and can be informed by data from reliable forecasts, as well as by other qualitative information. To our knowledge, no agency mechanistically uses data from forecasts in making decisions; rather, these tools are used to inform (and sometimes justify) the decision-making process.

Relying on the relatively well-documented history of forecasting and policymaking in macroeconomics, we draw three main implications for the use of forecasting in criminal justice, concerning the theories underlying policymaking, forecast accuracy, and the translation from forecast to action.

Theory of Policymaking

Forecasting and macroeconomic policymaking have a long and well-developed joint history (Pagan & Robertson, 2004; Wieland & Wolters, 2013) dating back to Tinbergen's (1952) theory of economic policymaking and Theil (1958). This work suggests that a policymaker aims to maximize utility under conditions in which there are variables that the policymaker wants to influence but that are not directly under his or her control; these are the policy targets. To do this, policies are implemented that control or manipulate variables that are known or believed to influence the policy targets. For example, the Federal Reserve sets the maximization of employment, moderation of inflation, and price stability as targets. The short-term federal funds rate is the policy variable that gets manipulated to achieve these goals. Theory specifies the relationships among variables, especially how the policy variable to be manipulated is expected to influence the policy targets and the variables beyond the control of a policymaker, such as a central banker. More than one forecast approach may be used, including outcome-based rules, forecast-based rules, optimal control policies, and forecast targeting. Each involves complexity and requires assumptions about the forecast loss function (the cost attached to deviations of the predicted outcomes from subsequent observations) and whether it is symmetric or asymmetric (Wieland & Wolters, 2013). For example, forecasts of government budget deficits that underestimate the deficits can impose different (and higher) costs on governments than those that overestimate them. Hence, a symmetric loss function may not be desirable for these forecasts. We expand upon this concept below, but put simply, the loss function a policymaker chooses has substantial implications for whether directional prediction errors (e.g., false-positives/overestimations and false-negatives/underestimations) are treated equally (symmetric) or unequally (asymmetric).
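As a concrete, hypothetical illustration of the difference, the sketch below contrasts a symmetric squared-error loss with an asymmetric "linex" loss of the kind sometimes proposed for problems such as deficit forecasting, where underestimates are costlier than overestimates; the penalty parameter a is arbitrary, not drawn from any cited model.

```python
import numpy as np

def squared_loss(e):
    """Symmetric: over- and under-estimates of equal size cost the same."""
    return e ** 2

def linex_loss(e, a=1.0):
    """Asymmetric 'linex' loss: exp(a*e) - a*e - 1. Here e is defined as
    actual minus forecast, so e > 0 is an underestimate; with a > 0,
    underestimates are penalized far more heavily than overestimates."""
    return np.exp(a * e) - a * e - 1

for e in (-2.0, 2.0):  # overestimate by 2 units, then underestimate by 2
    print(f"e={e:+.1f}  squared={squared_loss(e):5.2f}  linex={linex_loss(e):5.2f}")
# squared loss is identical for both errors; linex charges ~4x more
# for the underestimate than for the overestimate of equal size.
```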

Criminal justice policymaking differs from macroeconomic policymaking in several ways. For example, criminal justice policymaking occurs at different levels of government among different agencies that have different instruments for policy implementation. Theories of policymaking may differ across these levels of government. In macroeconomic policy, when a single entity such as the Federal Reserve sets interest rates, all economic actors are affected. By contrast, in federal criminal justice policy, federal law applies to federal justice agencies and justice assistance grant programs for state and local agencies. State and local agencies are not necessarily required to adopt corresponding policy changes to receive funding, although federal agencies can exert influence by setting program priorities for grant awards. Criminal justice research on policymaking stresses the importance of understanding the policy-making environment, the information needs of stakeholders, and information gaps that can impact practice (Johnson et al., 2018; Ismaili, 2006; Garrison, 2009). This knowledge provides a basis for theorizing about forecasting and policymaking at different levels of criminal justice practice.

Forecast Accuracy

Assessments of the accuracy of economic forecasts focus on the loss function (i.e., predictions vs. observed outcomes). Complicating evaluation efforts, analysts must make substantive and methodological decisions about their forecast models before the actual policy target outcomes are observed and can be compared to model predictions. Assessing accuracy thus requires assuming that the future will resemble the past in some respects, even though the future is likely to differ from the past upon which the forecasts are based. The utility of these assumptions is, in the theoretical sense, determined by the accuracy of the forecasts.

Assessments of accuracy of macroeconomic forecasts are complicated. They depend upon the series being forecast, the period or time horizon of the forecast (short-run forecasts are generally more accurate than longer-run forecasts), the forecaster, the variables predicted, and the choice of the actual data used to measure what happened (e.g., preliminary unemployment vs. final unemployment statistics). In economics, there is a robust literature on forecast accuracy that encompasses different forms of the loss function (e.g., symmetric, nonsymmetric, quadratic), the nature of errors (Gaussian, serially correlated, etc.), and the types of tests to employ when evaluating their accuracy (Diebold & Mariano, 1995; Buturac, 2022). Singular conclusions about accuracy are hard to develop given the many factors affecting forecasts, but among these are the periods for the forecast (e.g., annual, quarterly, monthly), how long into the future the forecast runs (i.e., the length of the forecast period; McNees, 1992), external influences such as exogenous shocks, and the periodicity of updates to forecast models (Congressional Budget Office, 2018).
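As a hedged illustration of how such accuracy comparisons work, the sketch below computes a basic Diebold-Mariano statistic comparing two forecast series under a squared-error loss; real applications would correct the variance estimate for serial correlation at longer horizons, and the series here are synthetic.

```python
import numpy as np
from scipy import stats

def diebold_mariano(actual, forecast_a, forecast_b):
    """Basic DM test with squared-error loss and no autocorrelation
    correction (adequate for one-step-ahead forecasts). A large positive
    statistic suggests forecast_b is more accurate than forecast_a."""
    d = (actual - forecast_a) ** 2 - (actual - forecast_b) ** 2
    dm = d.mean() / np.sqrt(d.var(ddof=1) / len(d))
    p_value = 2 * (1 - stats.norm.cdf(abs(dm)))  # two-sided, asymptotic N(0,1)
    return dm, p_value

rng = np.random.default_rng(0)
y = rng.normal(size=100)                  # hypothetical outcome series
f1 = y + rng.normal(scale=0.5, size=100)  # two competing forecasts of y
f2 = y + rng.normal(scale=0.8, size=100)
print(diebold_mariano(y, f1, f2))
```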

Economic forecasting is reflective; it tries to figure out what went right and wrong, and how to improve models. For example, forecasters missed predicting the Great Recession; following that, in 2014 the Organisation for Economic Co-operation and Development (OECD) conducted a self-analysis of its forecasts (OECD, 2014), determining that the models tended to overestimate GDP growth, both before and after the financial crisis. Its review led to changes in forecasting methods and communication.

In criminal justice practice, much of the discussion about forecast accuracy, especially as it pertains to predictive policing, consists primarily of academic researchers evaluating the accuracy of methods they have developed or tested. For example, Berk (2011), Berk and Sorenson (2020), and Berk and Bleich (2013) have addressed a number of these issues, including accuracy in predictions of crime, the nature of the loss function, and approaches to making value-based choices about the forecasts to be produced. But there are currently no comprehensive reviews or catalogues of the performance of various prediction tools that police departments and local governments can use to make choices based on performance accuracy. This absence leaves decision-makers in an information void and forces them to rely too heavily on information provided by the producers of forecasting tools in deciding whether and how to use them.

In correctional population forecasting, most states use some form of a microsimulation model (Austin et al., 2007), and their methods and performance are relatively well documented by the agencies that produce them. Even though long-run forecasts are often required by state law, correctional population forecasters have much greater confidence in short-term (one-to-two-year) forecasts than in long-run forecasts (Sabol et al., 1998).

Engaging Forecasting in the Policy-Making Process

Economic forecasting and policymaking involve iterative processes, at least as used by central banks. This requires making decisions about such issues as: what variables need to be forecast and over what time horizon and intervals; whether a model should be developed (and, if so, what type) or whether a forecast can be produced without a formal model; if a model is developed, what types and how many; and whether forecasts are conditional (that is, predicated upon changes in a policy variable) or unconditional (that is, a forecast that does not attempt to account for a policy change).

Pagan and Robertson (2004) point out that some selling or marketing of ideas may often be required to convince policymakers of the value of the forecasts. For example, they argue that forecasters need to be able to tell policymakers a convincing story about the forecasts, and to do this many variables, including policy targets, may need to be forecast to demonstrate the validity of the forecasts. Alternatively, because there are many public and private sources of economic forecasts, combining the forecasts can help policymakers make more informed decisions. Consensus forecasts, which require summarizing information across a wide range of individual forecasts, are not without their own challenges (Clemen, 1989; Timmermann, 2006). For example, those that rely on groups or committees to make judgments must deliberate to reach agreement; such processes tend to be biased toward the views of the individual(s) developing the forecast (Chase, 2013) or subject to anchoring biases tied to the most recent previous forecasts or realized outcomes (Campbell & Sharpe, 2007). Regardless of the method used, convincing policymakers of the value of the forecasts also helps them sell their policy choices and justify them, at least in part, by reference to a forecast that then enters the decision-making processes of other economic agents (Carnot et al., 2011).

In criminal justice settings, an analogous, iterative forecasting process exists, at least in part, in state prison population forecasting (as we describe below), but we did not find evidence that it exists in other settings, especially those related to predictive policing. Rather, when it comes to predictive policing forecasting tools, the processes used to adopt them are relatively opaque; there are not multiple forecasts from different producers to draw upon; and the reliability of the forecasts is not well documented.

A Brief History of Criminal Justice Forecasting

Attempts at forecasting and prediction in criminal justice settings date at least to the early 1900s with the first “push pin” crime maps used by the New York Police Department (Boba, 2001). This place-based policing approach combined crime mapping with officer intuition to guide resource allocation and address the spatial clustering of crime. In the early 1990s, Bill Bratton transformed crime mapping methods with the CompStat system (Weisburd et al., 2003) by creating a demand for data-driven analytics (Susser, 2021). Advancements in data collection and computing replaced intuition with statistical and machine learning applications that attempt to predict when and where future crimes will occur (Fitzpatrick et al., 2019).

The modern development of methods to predict individual-level parolee outcomes dates to Ernest Burgess, whose work in the 1920s is widely credited as the first application of an actuarial assessment tool to forecast parolee success (Berk, 2008; Farrington, 1987; Harcourt, 2007). Building upon work by Warner (1923) on factors associated with successful completion of parole, Burgess (1928) created a prediction scale by summing 21 such factors (scored as one (1) if present and zero (0) otherwise) and then analyzed the predictive accuracy of his model among Illinois parolees using unit-weighted regression. This "Burgess Method" laid the foundation for the later proliferation of actuarial prediction methods in criminal justice settings (Harcourt, 2007). During the 1980s, split population survival models gained prominence in evaluating parole outcomes (Maltz, 1984; Schmidt & Witte, 1989). By taking exposure time into account, these models allowed for the possibility that a person would never recidivate, relaxing the assumption of eventual failure for all persons in the sample.
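To make the unit-weighted idea concrete, here is a minimal sketch of a Burgess-style score. The factor names are invented stand-ins, not Burgess's original 21 items, and the scale is illustrative only.

```python
# Hypothetical illustration of a Burgess-style unit-weighted scale.
# Each binary risk factor contributes exactly one point if present.
FACTORS = ["prior_record", "poor_work_history", "no_stable_residence"]

def burgess_score(case: dict) -> int:
    """Sum of binary factors: 1 if the risk factor is present, else 0."""
    return sum(1 for f in FACTORS if case.get(f, False))

parolee = {"prior_record": True, "poor_work_history": False,
           "no_stable_residence": True}
print(burgess_score(parolee))  # -> 2; higher totals imply higher predicted risk
```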

The September 11th terrorist attacks accelerated the demand for predictive analytics that made their way into predictive policing. The prevailing assessment held that the failure to prevent the attacks stemmed from poor data and intelligence-sharing across federal, state, and local agencies; that thwarting the next attack would require the development of massive, advanced data and computing systems; and that local police departments would likely have to take on front-line roles in the War on Terror (Brayne, 2018; Waxman, 2009). Subsequently, the newly created Department of Homeland Security (DHS) and the Department of Defense (DOD) used grant funds to develop risk assessment and spatiotemporal predictive analytics to aid in the War on Terror (Perry et al., 2014). DHS issued grants to develop predictive tools to identify people and places likely to be involved in future terrorist plots within the U.S. (Masse et al., 2007), while DOD's primary objectives involved the development of predictive mapping to identify future insurgency hotspots in-theater (Perry et al., 2014).

Prison population forecasting also has a long history with contributions from different disciplines. Stollmack’s (1973) mathematical model incorporated criminal justice system flows (e.g., arrest, conviction, recidivism rates) and length of stay to predict prison populations and represented a major improvement over forecasts based on linear extrapolation of trends from regression models that were common at the time. Blumstein et al. (1980) elaborated on this basic concept by considering variation in criminal justice system flows among subgroups in the population. Their demographically disaggregated flow models made forecasts for each subgroup that were added up to get a total population forecast. Barnett (1987) built upon the concept of the individual criminal career to develop a stochastic model that forecast populations under different assumptions about sentencing policy and the demographic structure of the population. It rested on the assumption that changes in prison populations arose primarily from changes in the population of chronic, or career offenders.
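The core arithmetic of these disaggregated flow models can be sketched simply: in steady state, each subgroup's expected prison population is its annual admissions times its average length of stay (an application of Little's law), and the subgroup forecasts are summed. The subgroup values below are invented for illustration.

```python
# Hypothetical subgroup inputs: annual admissions and mean length of stay (years).
subgroups = {
    "male_18_24":   {"admissions": 1200, "mean_los": 2.1},
    "male_25_34":   {"admissions": 1500, "mean_los": 2.8},
    "female_18_34": {"admissions":  300, "mean_los": 1.4},
}

# Steady-state stock per subgroup = flow * average time in the system
# (Little's law); the total forecast is the sum over subgroups.
total = sum(g["admissions"] * g["mean_los"] for g in subgroups.values())
print(f"Forecast steady-state population: {total:,.0f}")
```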

Microsimulation models to forecast prison populations also developed during this period. These models were based upon individual-level data on each person in prison and depended upon either estimates of how long each person would remain in prison (e.g., under a determinate system) or several probability distributions for lengths of stay for groups of inmates (e.g., under an indeterminate system), as well as reliable methods for forecasting future admissions and their lengths of stay. The California Department of Corrections and Rehabilitation (CDCR) developed a microsimulation model to forecast prison populations as early as the 1970s that has been the subject of several decennial reviews (Maltz & Chaiken, 2009). In the 1980s, the National Council on Crime and Delinquency (NCCD) developed a microsimulation model, based upon the CDCR model, that could be customized to other states' systems. Microsimulation models have grown to become the predominant method used in the states to forecast prison populations. By the mid-2000s, 36 states used some form of a microsimulation model to generate their forecasts (Austin et al., 2007).

Demands for Criminal Justice Forecasts and What Gets Forecast

Most of the demands for forecasts by state and local criminal justice agencies are practice-focused rather than focused on examining policy proposals. Police departments' interests in offender- and place-based forecasts are associated with their adoption of problem-oriented policing and their needs for information to help allocate resources (National Academies of Sciences, Engineering, and Medicine (NASEM), 2018). Although firm estimates of the number of departments that use predictive policing are difficult to find, the use of predictive analysis appears to be widespread geographically and is likely to increase. Friend (2013) reported that departments in California, Washington, South Carolina, Arizona, Tennessee, and Illinois used some variant of predictive policing. A 2014 survey by the Police Executive Research Forum (PERF) found that 38% of departments surveyed used predictive analysis and 70% said they planned to adopt it in the next five-to-ten years (Police Executive Research Forum, 2014).

Several prosecutors’ offices have adopted “quasi-predictive prosecution strategies” that involve identifying suspects who are deemed more at risk for future serious criminal behavior, and then using that information for making charging decisions, bail-release requests, and arguments about sentences (Ferguson, 2016, p. 752). According to Sidhu (2015), courts are increasingly moving towards using risk-assessment tools in sentencing, especially for assessing the risk of serious violence.

The use of risk instruments in pretrial release decisions appears to be widespread. A National Association of Criminal Defense Lawyers publication estimated that over 60 jurisdictions, including several states, used pretrial risk instruments (primarily actuarial instruments), and that these jurisdictions covered 25% of the U.S. resident population (Buskey & Woods, 2018). Advocacy organizations such as Mapping Pretrial InJustice and the Movement Alliance Project have attempted to measure the prevalence of use of risk instruments, and their estimates are much higher, showing that about 60% of U.S. residents are in jurisdictions using such tools (Movement Alliance Project, n.d.).

By contrast, all 50 states and the Federal Bureau of Prisons (FBP) routinely forecast prison and other correctional populations. A primary purpose of the forecasts is to inform state and federal budget offices about the costs of corrections for budget and planning purposes. Collectively among the states, these costs run to about $50 billion in direct expenditures (Buehler, 2021), or about 3% of total state direct expenditures. Many states have laws that mandate that correctional population forecasts be generated and used for budget and planning purposes (McDonald et al., 2019).

We could not find an estimate of the number of local jail jurisdictions, among the roughly 3,000 jurisdictions with jails, that produce forecasts of jail populations. The information needs of jail administrators are like those of state budget offices (current and capital expenditures), and the need for forecasting jail populations is widespread, especially as it relates to jail construction (Bower, 2015). Despite efforts by organizations such as the National Association of Counties to reduce jail expenditures nationwide, decisions about jails are made primarily at the county level. Surette et al. (2006) argue that despite the need, local jail administrators do not have ready access to applications of forecasting techniques.

Two notable omissions from this list of justice agencies’ demands for forecasting are city-level demands for forecasts of crime rates and federal government agency demands for forecasts in helping to allocate federal assistance to state and local agencies. We draw attention to these two sources because of the opportunities that exist for them to use forecasts in formulating policy decisions.

Few if any cities or police departments routinely forecast city-level crime rates either to formulate policy or to evaluate practices. The Major Cities Chiefs Association (an association of about 80 of the largest police departments in the nation) does not list crime forecasting as one of its priority issues. City-level forecasts have been notoriously difficult to produce. The Committee on National Statistics characterized efforts to develop city-level crime rates as "fraught with significant challenges" (NASEM, 2016, p. 96). This conclusion followed Eric Baumer's and John Pepper's earlier efforts to forecast city-level crime rates (National Research Council [NRC], 2008). Both found this to be extremely difficult and concluded that (a) the empirical literature on crime trends that could inform forecast models was underdeveloped, and (b) pure forecast (not causal) models based on past trends were fragile, and small changes to a model could produce qualitatively different forecasts (NRC, 2008).

The federal government provides several billion dollars in assistance to state and local justice agencies to "identify and address the most pressing challenges confronting the criminal and juvenile justice systems" (Office of Justice Programs, n.d.). For example, the President's fiscal year 2023 budget requested $6.2 billion for state and local assistance, a little more than half of which was for discretionary funding and a little less than half of which was for mandatory programs. The Department of Justice (DOJ) does not routinely produce, support the creation of, or explicitly use forecasts of criminal justice outcomes to assist in allocating funding, formulating policy, or assessing the impacts of its funding. For example, DOJ does not support the infrastructure to improve city-level crime rate forecasts. Rather, at least when it comes to DOJ formula funding, such as its Byrne/JAG grants, DOJ implicitly uses a forecast model that assumes that the crime conditions that existed during the past three years are the ones that will persist into the future and be addressed by the formula funding (Cooper, 2022).

An example of forecasting used in policy formulation occurred with the Justice Reinvestment Initiative (JRI), where prison population forecasts were used to demonstrate opportunities to save money by improving corrections practice (Austin et al., 2007). The Bureau of Justice Assistance (BJA) awarded funds to states under JRI with the aim of improving practices and reducing corrections costs. What was especially interesting about this effort was that it used each state's prison population forecast to develop its overall estimates of anticipated prison populations, used state-level forecasts of the size of prison populations under JRI-type interventions, and took the difference between the two as an indication of the potential impact of JRI reforms. But the effort fell short of best practices in forecasting, and its impacts have not been rigorously demonstrated (Austin & Coventry, 2014; Sabol & Baumann, 2020). For example, forecasts of the size of prison populations if no JRI-related changes were made were generally based on forecasts done before prison populations in the U.S. began to decline around 2008. And because the forecasts of the future size of prison populations were generally not updated as population growth slowed, the estimated impact of implementing reforms was overstated (Rhodes et al., 2015; Sabol & Baumann, 2020).

The absence of demand by cities or police departments for city-level crime forecasts affects the capacity of policymakers to assess the causal impacts of proposed policy changes on crime rates. In situations where predictive policing is not used, decisions about future crime and resource allocation are based on pattern recognition and the instincts of police chiefs. Hesitancy on the part of cities to develop and use systematic crime forecasts may arise for a variety of reasons, such as difficulties in justifying an investment in forecasting over additional officers on the beat; resistance by management or even police unions; concerns that forecasts might influence development (for better or worse); or because the technology to develop them is lacking. In his paper on city-level crime forecasting for the National Research Council, Pepper (2008) pointed out that no research program on crime rate forecasting exists that could lead to improvements in models and their applicability. The utility of systematic forecasts of crime is apparent, but it would be inefficient for many cities to invest in or undertake separate efforts to build crime forecasting models. For that reason, we see an opportunity for DOJ to use its grant-making authority to support crime forecasting infrastructure building efforts.

Issues with Accuracy in Criminal Justice System Forecasting

After providing an overview of forecast methods in predictive policing and prison population forecasting, we address their accuracy issues.

Varieties of Models

Advances in data and computing have resulted in many forecasting tools available to criminal justice agencies. True forecasting methods require relevant historical data to be collected at one or more points in time before outcomes are measured (Farrington, 1987). Typically, these tools are calibrated by splitting the available data into training and testing sets to optimize parameterization. A model or algorithm is trained to predict the outcome of interest on the first 70% or so of the data. Depending upon the methodology chosen, optimization may involve the minimization of bias (error) and/or variance (Berk, 2008). Once the optimal specification is identified, the tool's predictive accuracy is validated against the remaining 30% or so of the data. Effective forecasting tools should substantially improve predictive accuracy over existing practice. In practice, however, calibrations are deemed successful given any increase in accuracy above business as usual (Perry et al., 2014).
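A minimal sketch of this split-and-validate workflow on synthetic data, using scikit-learn's conventional utilities; the predictors, outcome, and model are placeholders, not any deployed tool.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 5))                          # hypothetical predictors
y = (X[:, 0] + rng.normal(size=1000) > 0).astype(int)   # synthetic binary outcome

# Fit on ~70% of the data, hold out ~30% to validate predictive accuracy.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)
model = LogisticRegression().fit(X_train, y_train)

# "Business as usual" baseline: always guess the majority class.
baseline = max(y_test.mean(), 1 - y_test.mean())
print(f"test accuracy: {accuracy_score(y_test, model.predict(X_test)):.3f} "
      f"vs. baseline: {baseline:.3f}")
```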

Broadly speaking, criminal justice-related predictive analytics are either model- or algorithm-based (Berk, 2021). Model-based forecasting generally involves some type of parametric regression analysis. While the many input variables and statistical methods now available provide near-limitless options for parameterization, these models all include some matrix of response variables and specified functions for the inputs (predictors), lagged time variable(s), and disturbances (errors; Berk, 2008). For example, one of the best-known predictive policing products, PredPol (Mohler et al., 2011), uses a self-exciting point process model to predict where future crimes will occur. Simply put, PredPol uses a type of Poisson clustering process to model the temporal clustering of criminal events in the same way that seismologists model aftershocks following earthquakes.
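To illustrate the self-exciting idea (a sketch of the general model family, not PredPol's proprietary implementation), the code below evaluates a Hawkes-type conditional intensity in which each past event temporarily elevates the expected rate of new events, decaying exponentially; the parameter values are arbitrary.

```python
import numpy as np

def hawkes_intensity(t, event_times, mu=0.5, k=0.8, omega=1.0):
    """Conditional intensity lambda(t) = mu + sum over past events of
    k * omega * exp(-omega * (t - t_i)). Each past event transiently
    boosts the event rate, mimicking crime 'aftershocks'."""
    past = event_times[event_times < t]
    return mu + np.sum(k * omega * np.exp(-omega * (t - past)))

events = np.array([1.0, 1.3, 4.0])  # hypothetical event times (e.g., in days)
for t in (0.5, 1.5, 5.0):
    print(f"t={t}: intensity={hawkes_intensity(t, events):.3f}")
# Intensity is highest shortly after a cluster of events and decays back
# toward the baseline rate mu as time passes without new events.
```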

Algorithmic approaches are computational methods that often use machine learning techniques to produce forecasts. Using (automated) iterative learning processes, machine learning methods identify the function linking predictors and outcomes that best balances error and bias without overfitting (Berk, 2021). Take, for example, random forests. Conceptually, a random forest is an ensemble of decision trees, each of which is itself a series of binary "if-then" statements. Each tree is grown on a bootstrap sample of the training data: at each split (branch), the tree divides cases by the value of whichever variable, from a randomly selected subset of the inputs, is most associated with the outcome of interest, and splitting continues until the cases within each branch are sufficiently homogeneous. Because each tree sees different data and different candidate predictors, the trees err in different ways; the forest then classifies a case by majority vote across the trees. Predictive accuracy is assessed on cases held out of the tree-growing process, either a test set or the "out-of-bag" cases each tree did not sample.
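A minimal working sketch on synthetic data, assuming scikit-learn's off-the-shelf random forest rather than any agency's deployed tool:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
X = rng.normal(size=(2000, 8))                                 # hypothetical predictors
y = (X[:, 0] - X[:, 1] + rng.normal(size=2000) > 0).astype(int)  # synthetic outcome

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 500 trees, each grown on a bootstrap sample with a random subset of
# predictors considered at every split; the forest votes across trees.
forest = RandomForestClassifier(n_estimators=500, oob_score=True,
                                random_state=0).fit(X_train, y_train)
print(f"out-of-bag accuracy: {forest.oob_score_:.3f}")
print(f"held-out accuracy:   {forest.score(X_test, y_test):.3f}")
```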

Microsimulation models in state correctional population forecasting consist of two main components: (1) a forecast of how long each person currently in a correctional population will stay in that population, and (2) a forecast of how many new arrivals are expected (Austin et al., 2007). Using individual-level data or data on homogeneous groups of persons in a correctional population and estimates of their length of stay, the models "age out" the current population and predict how many persons currently in a population will remain at future dates. Where systems are more determinate, such as the Federal Bureau of Prisons, relatively precise estimates of how long each individual can expect to remain in prison until release can be applied to the stock population. Where systems are less determinate, several probability distributions are applied to the individuals or groups. New arrivals are forecast using statistical models based on past trends, flow models based on relationships among criminal justice events (e.g., arrests to convictions), and other methods; estimates of length of stay are then applied to these arrivals to forecast their contribution to the future size of the prison population. The forecasts from the "aged-out" stock and new arrivals are summed to get the size of the total future prison population.
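A minimal sketch of the two components under an indeterminate-style system, with invented numbers: the current stock is "aged out" using each person's remaining time to serve, and simulated admission cohorts are added with lengths of stay drawn from an assumed distribution.

```python
import numpy as np

rng = np.random.default_rng(1)

# Component 1: age out the current stock. Hypothetical remaining times
# to serve (in years) for each of 10,000 people now in prison.
remaining_years = rng.exponential(scale=2.5, size=10_000)

# Component 2: new arrivals. Assume 4,000 admissions per year with
# lengths of stay drawn from an assumed exponential distribution.
ADMISSIONS_PER_YEAR = 4_000

def projected_population(year: int) -> int:
    """People left from today's stock plus survivors of each admission cohort."""
    from_stock = int(np.sum(remaining_years > year))
    from_admissions = 0
    for cohort in range(year):          # cohorts admitted in years 0..year-1
        los = rng.exponential(scale=2.0, size=ADMISSIONS_PER_YEAR)
        served = year - cohort - 0.5    # assume mid-year admission on average
        from_admissions += int(np.sum(los > served))
    return from_stock + from_admissions

for yr in range(1, 6):
    print(f"year {yr}: projected population = {projected_population(yr):,}")
```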

To produce the forecasts of state prison populations, assumptions about past practice and policy changes are needed. Assumptions about past practice can be derived from the analysis of patterns and trends in variables used in a forecast model. Assumptions about policy are generally based on the premise that the effects of policy will not change over the forecast period. For example, if a policy alters good-time credits, the effect of the credits on length of stay is presumed to be constant within eligible groups of persons. This is not to say that taking into account static effects of policy is necessarily a simple matter. For example, if a policy affects a class of offenses and that class is forecast to change, the effects of the policy may be multiplicative, as shown in the sketch below. Alternatively, some assumptions about policy may be based on anticipated growth or change in processes leading to prison admission, such as convictions or sentences. For example, in North Carolina, a Forecasting Technical Advisory Group sets the growth rates for convictions on an annual basis. This judgmentally determined rate is based on the experiences and expertise of representatives from several groups that are involved in court processing (North Carolina Sentencing & Policy Advisory Commission, 2021).
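A toy calculation of the multiplicative point, with invented numbers: a good-time change that cuts length of stay by 10% interacts with forecast growth of 20% in the affected offense class, so the net population effect is the product of the two adjustments, not their sum.

```python
# Hypothetical multiplicative interaction between a policy and a trend.
base_admissions = 1_000   # annual admissions for the affected offense class
base_los = 3.0            # average length of stay in years

baseline = base_admissions * base_los                        # 3,000 bed-years
with_both = (base_admissions * 1.20) * (base_los * 0.90)     # 3,240 bed-years

print(f"baseline: {baseline:,.0f}  with growth and policy: {with_both:,.0f}")
# The 10% reduction in stay does not offset the 20% growth one-for-one;
# the combined effect is 1.20 * 0.90 = 1.08, an 8% net increase.
```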

State prison population forecasters explicitly identify assumptions. For example, in its 2022 forecast report, the Colorado Division of Criminal Justice explicitly identified ten main assumptions, including that the legislature would not pass new legislation affecting time served or the number of persons receiving an incarceration sentence, and that "[d]ecision-makers in the justice system will not change the way they use their discretion, except in ways that are accounted for in models" (Harrison, 2022, p. 2). Further, in its table of estimates of expected length of stay for different classes of prisoners, the report contained 15 footnotes identifying different assumptions used to generate the estimates for each group of prisoners. CDCR also explicitly lists the assumptions it uses, and it further identifies boundaries and exclusions. For example, in its Spring 2022 forecast report it states, "The projections do not currently incorporate any assumptions about individuals awaiting trial and/or sentencing due to COVID-19 related court closures and related backlogs, which could generate a temporary increase in admissions to CDCR in the future" (California Department of Corrections and Rehabilitation [CDCR], 2021, p. 28). The aforementioned North Carolina Sentencing and Policy Advisory Commission identified 16 assumptions, including one about the impacts of the COVID-19 pandemic.

The explicit identification and elaboration of assumptions underlying forecast models and their implications, as in the examples above, is not merely a formality of academic research. Rather, attention to forecast assumptions is necessary to fully appreciate the outcome(s) actually being modeled and their applicability to reality. Indeed, the devil is in the details. Too often, these details are ignored, and tools are implemented in inappropriate contexts. That said, the practice of explicitly identifying, clarifying, enumerating, and reporting assumptions and limitations has generally improved over the past few decades. Sometimes, modifications to assumptions are made. For example, because of the COVID-19 pandemic, Texas forecasts were made after modifying some assumptions about the effects of current policies, procedures, and laws (Legislative Budget Board, State of Texas, 2021).

Accuracy in Prison Population Forecasting

At one level, accuracy in forecasting correctional populations is difficult to determine, as some outcomes are more difficult to predict than others. For example, predicting growth or decline is easier than forecasting, say, the actual amount of crowding expected to occur in a given prison facility (Berk, 2008). Inaccurate forecasts can also arise if, in response to a forecast, changes in practice occur that lead to a different outcome than was forecast. For example, Surette et al. (2006) and Blomberg et al. (2010) document cases in which forecasts of larger-than-anticipated populations led local jail administrators to alter length of stay, which in turn reduced the size of the population from what was forecast and "caused" the forecast to become inaccurate. However, that cause of inaccuracy falls outside the scope of the assumptions used in forecasting, namely that the conditions existing at the time of the forecast persist throughout the forecast period.

Most state reports on prison population forecasts report absolute and relative differences between the forecast and actual populations (or between different forecast populations) and assess the accuracy of prior forecasts. Forecasts over shorter terms, such as two-to-three years, tend to be accurate, conditional upon the assumptions used to generate them holding over the forecast horizon (e.g., Austin et al., 2007; Sabol et al., 1998) and upon no large, unexpected exogenous shocks, such as the responses to the COVID-19 pandemic, occurring. Accuracy is based on deviations between forecast and actual populations and a determination of the extent to which the assumptions of the forecast held. Many states' prison population forecast reports address this issue. For example, CDCR includes a section in its reports that reviews the assumptions used in the prior forecast and, if the assumptions did not hold, explains the implications for the forecast.
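In their simplest form, these accuracy comparisons reduce to absolute and relative (percentage) differences, as in this toy example with invented numbers:

```python
forecast, actual = 24_500, 23_900   # hypothetical year-end prison populations

abs_error = forecast - actual
rel_error = abs_error / actual
print(f"absolute error: {abs_error:+,}  relative error: {rel_error:+.1%}")
# -> absolute error: +600  relative error: +2.5% (an overestimate)
```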

Accuracy is also assessed by routine updates of forecasts. Some states generate forecasts twice per year; most states do one forecast per year. Updates are used to assess the extent to which forecast assumptions have changed and, if so, their impacts on the forecasts. Stability in forecasts over the same forecast horizon across multiple forecasts adds credibility to the accuracy of the forecasts. Even in states that generate one forecast per year, a similar process is used in comparing past with current forecasts. These comparisons provide policymakers with regular and routine updates about the performance of forecast models, the stability of assumptions, and the implications of changes in assumptions.

Long-term forecast accuracy, such as over a five-to-ten-year period, is important for budgeting and planning purposes, as long-term forecasts may imply the need for new capital expenditures. Still, the long-term accuracy of a forecast produced today that predicts prison populations ten years hence is less important than the trends revealed by the forecast and the stability of the forecasts under a regime of regular updates that incorporate new information about policy and practice. For both short- and long-term forecasts, states attempt to forecast populations under the most up-to-date assumptions about new policies implemented in law that could alter assumptions about how long people stay in prison or how many people will arrive. The forecasts focus on understanding how existing policy will contribute to the size of future populations.

Issues of Accuracy and Bias in Predictive Policing

Whereas population forecasts have received qualified support, evaluations of predictive policing tools currently in use have produced mixed results; however, relatively few evaluation studies have been conducted to date (Browning & Arrigo, 2021), and to our knowledge, none have examined the use of predictive policing technology to guide or evaluate broader policy (as opposed to guiding manpower decisions or evaluating temporary, local-area policing techniques). The three evaluations listed on the Center for Evidence-Based Crime Policy (Hunt et al., 2014; Mohler et al., 2015) and Crime Solutions (Ratcliffe et al., 2021) websites (as of this writing) come to different conclusions about the efficacy of predictive policing. Mohler et al. (2015) conducted a single-blind experiment using treatment and control periods to evaluate the effectiveness of the tool that would become PredPol in the Foothill, North Hollywood, and Southwest community areas patrolled by the Los Angeles Police Department. Over the treatment period, the researchers identified a 7.4% reduction in crime, concluding that PredPol did a better job of predicting crime than business as usual. In contrast, the evaluation of a joint predictive policing and intervention project in Shreveport, Louisiana conducted by Hunt et al. (2014) failed to find any differences in crime between treatment and control areas. The authors note that the null findings may have been due to lack of fidelity to the intervention model; however, discussions with district commanders revealed skepticism over the actual predictiveness of the forecasting tool.

The study reported by Ratcliffe's team (2021) also failed to identify any substantial property or violent crime reduction benefits from the joint use of a place-based forecasting tool, HunchLab, and specific police interventions. Using a randomized control design in Philadelphia, the team implemented three treatment interventions and one business-as-usual control across 20 of the city's 22 districts. The HunchLab algorithm generated predictions for randomly selected 500-foot-square mission grids in each study district. The treatments consisted of (1) making officers aware of the predictions at roll call, (2) dedicating marked police vehicles to the predicted areas, and (3) dedicating unmarked cars and plain-clothes officers to the predicted areas. For the first three months, police engaged in the prediction-driven interventions from 8 a.m. to 4 p.m. daily, focusing on property crime. After a break, the prediction-driven interventions focused on violent crime and were implemented from 6 p.m. to 2 a.m. daily for three months. Of the six interventions (i.e., awareness + property, awareness + violent, marked + property, etc.), only the dedication of marked cars during the property crime phase resulted in a reduction in crime; however, the study's authors suggest the magnitude of the decrease was marginal. Additionally, the small grid size (roughly a square block) likely contributed to low violent crime counts and poor predictive power.

Despite the limited availability of empirical evidence for or against forecasting tools used by police, concerns have been raised about issues that have direct bearing on these tools' predictive accuracy. The models or algorithms themselves may introduce bias by incorrectly handling error and variance. An assumption underlying many model-based prediction tools is that of a symmetric loss function. As Berk (2008, 2011) demonstrates, symmetric loss functions in predictive applications are often incorrect and can bias model output. Treating positive and negative errors as equal is akin to saying that incorrectly predicting that a crime will occur is no better or worse than failing to predict a crime that subsequently occurs. For decision-makers in the criminal justice system, the (real or perceived) costs of positive and negative errors are rarely equal and are often a matter of politics and philosophy. This situation is especially relevant for person-based forecasting, where misclassification could lead to treating low-risk individuals as high risk or vice versa (Wykstra, 2018).
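A toy sketch of how the choice of loss function changes decisions: under standard decision theory, the expected-cost-minimizing threshold on predicted risk is cost_fp / (cost_fp + cost_fn), so making false negatives ten times costlier (an illustrative ratio, not an empirical estimate) flags cases that a symmetric loss would not.

```python
def decision_threshold(cost_fp: float, cost_fn: float) -> float:
    """Expected-cost-minimizing rule: flag as high risk when the
    predicted probability p >= cost_fp / (cost_fp + cost_fn)."""
    return cost_fp / (cost_fp + cost_fn)

p = 0.20  # hypothetical predicted probability of reoffending

symmetric = decision_threshold(1, 1)     # equal error costs -> threshold 0.50
risk_averse = decision_threshold(1, 10)  # missed failures cost 10x -> ~0.09

print(f"symmetric loss:  flag? {p >= symmetric}")    # False
print(f"asymmetric loss: flag? {p >= risk_averse}")  # True
```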

For some more risk-averse criminal justice actors, many of whom answer to elected officials or are elected officials themselves, minimizing the false negative rate may justify an increase in false positives. The political consequences of the Willie Horton affair for Michael Dukakis's failed 1988 presidential bid illustrate why. In 1987, Horton (a Massachusetts prisoner on weekend furlough) raped a woman, assaulted her fiancé, and stole his vehicle before being apprehended. Even though Dukakis, who was governor at the time, did not start the furlough program, that single incident is widely credited as a major cause of Dukakis's campaign loss (Anderson, 1995). For criminal justice practitioners, the potential consequences of failing to identify a future Willie Horton may lead to a more conservative approach to prediction error. Other criminal justice agencies or individual practitioners may take a more civil libertarian approach to predictive technologies, leading to variability across and within jurisdictions in how cases are handled. For example, in cities that have elected district attorneys and judges with civil rights backgrounds, much more emphasis may be placed on avoiding unnecessary detention. Ultimately, which type of misclassification is to be considered more or less costly is a matter of policy and debate, as illustrated by a recent investigative report from ProPublica.

In 2016, ProPublica reported on an in-house analysis of potential bias in the criminal risk scores generated by Northpointe's person-based predictive algorithm, COMPAS (Angwin et al., 2016). Courts and corrections agencies use COMPAS to make individual pretrial, sentencing, and correctional decisions (Brennan et al., 2009). Using Northpointe's two-year prediction window for rearrest, Angwin and her colleagues analyzed the accuracy of COMPAS's predictions for more than 7,000 people arrested in Broward County, Florida in 2013 and 2014 and found that the algorithm was twice as likely to misclassify black defendants as high risk as it was white defendants. While refusing to disclose the details of its algorithm, Northpointe disputed ProPublica's findings (Angwin et al., 2016). In fact, Northpointe's algorithm was calibrated to produce risk scores with equal predictive accuracy for recidivism among black and white defendants. The Northpointe programmers decided that the cost of failing to identify high-risk defendants was greater than that of misclassifying low-risk defendants as high risk. The disparity occurred because black and white defendants whose data were used to train the COMPAS algorithm had different base rates of two-year rearrest, and as a result, it was impossible for the algorithm to completely avoid the resulting bias (Berk et al., 2021). Had the COMPAS algorithm been adjusted to misclassify high-risk defendants at equal rates, there would have been bias in the classification of low-risk defendants. The decision as to which type of misclassification is worse is a matter of policy that has real-world implications.
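The arithmetic behind this trade-off can be sketched directly: holding calibration (positive predictive value) and sensitivity equal across two groups forces the false positive rate to differ whenever base rates differ. The rates below are invented for illustration, not estimates from the Broward County data.

```python
def false_positive_rate(base_rate, ppv=0.6, tpr=0.7):
    """Implied FPR when PPV (calibration) and sensitivity (TPR) are held
    equal across groups. From PPV = TP / (TP + FP), it follows that
    FPR = TPR * (p / (1 - p)) * ((1 - PPV) / PPV), where p is the base rate."""
    return tpr * (base_rate / (1 - base_rate)) * ((1 - ppv) / ppv)

# Two hypothetical groups with different rearrest base rates.
for group, p in (("group A", 0.50), ("group B", 0.30)):
    print(f"{group} (base rate {p:.0%}): FPR = {false_positive_rate(p):.1%}")
# -> group A: 46.7%, group B: 20.0%. Equal calibration plus unequal base
# rates necessarily produces unequal false positive rates.
```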

Data quality, or the lack thereof, is also widely recognized as a significant barrier to fair and accurate prediction (Berk, 2008; Lee et al., 2021). This is especially relevant to policing, because the deployment of predictive analytics can occur with limited or no oversight, and the historical data these products use for calibration replicate historic trends. As policing practice in the U.S. is associated with racial disproportionality in involvement in crime and the criminal justice system (e.g., Donohue and Levitt, 2001), the uncritical use of historical data may taint the resulting output and lead to inferences about future crime that are biased or can degrade community-police relationships (Lum & Isaac, 2016). Evidence exists that uncritical use of data occurs, and that data based on documented histories of "dirty policing" practices get incorporated into predictive policing models. Richardson et al. (2019) identified 13 police departments that deployed predictive policing tools and had documented histories of "dirty policing" (widespread illegal or biased police practice) at some point between 2003 and 2017; at least 9 of the 13 jurisdictions engaged in dirty policing during the period from which historical training data were likely drawn. The authors were unable to quantify the extent to which dirty policing translated into dirty data and, subsequently, dirty algorithms (Richardson et al., 2019). Ideally, data and algorithmic transparency among predictive analytics companies and public disclosure by police departments would have revealed these issues.

Other examples of police agencies inappropriately or unethically employing predictive algorithms exist, such as a case in Pasco County, Florida (Pasco County Sheriff's Office, 2018; Taylor et al. v. Nocco, pending). Here, allegations have been made that, using BJA grant funds, the Sheriff adopted an intelligence-led, predictive policing program to identify "problem" residents and subsequently "[m]ake their lives miserable until they move or sue" (McGrory & Bedi, 2020). But our point here is not to document these cases; rather, it is to point to problems arising from the uncritical use of data in ways that perpetuate past or inappropriate practices.

Forecasting Processes for Transparency and Legitimacy

Our brief discussion of macroeconomic forecasting described the importance of a forecasting process in making policy choices and legitimizing them. We pointed out, for example, that the processes relied on multiple forecasts and aimed towards achieving some consensus about which to use. Such practices have been described as processes that lead to credible forecasts (Mears, 2002). Credible forecasts are based on reliable data and methods that have proven records of at least short-term accuracy and that are produced in a transparent and reviewable manner that leads them to be accepted as legitimate. In other words, the credibility of a forecast comes in part from transparent processes that are used to generate, explain, and justify their use in decision-making.

As discussed below, we find that there is much greater transparency in prison population forecasting than in the use of predictive policing. For example, state prison population forecasters routinely publish their forecast reports. By comparison, police departments do not routinely publish forecasts of local crime rates, nor do they share the results of the person- and place-based forecasting tools they employ. We argue that one reason for the greater transparency in prison population forecasting is the way it is organized. Specifically, the organization of prison population forecasting balances the competing interests of the departments responsible for managing prison populations and the departments responsible for state budgeting. We find no parallel in predictive policing.

Transparency in Policing and Corrections Forecasting

In predictive policing, the absence of information about forecasts and concerns over black box algorithms and secretive law enforcement deployments have spurred significant criticism from advocacy groups (e.g., Lau, 2020; Stop LAPD Spying Coalition, 2018; Stop LAPD Spying Coalition, & Free Radicals, 2020), investigative journalists (e.g., McGrory & Bedi, 2020; Sankin et al., 2021; Winston, 2018), practitioners (Institute for Justice, 2022; Lee et al., 2021), and some academics (e.g., Ferguson, 2017; Lum & Isaac, 2016; Richardson et al., 2019). The most fundamental problem these disparate groups have identified is the general unwillingness of for-profit predictive policing software companies and many of their law enforcement customers to publicly disclose even the most basic information, such as which departments deploy which systems. The tendency toward obfuscation among predictive policing firms and their clients has led to FOIA-related lawsuits around the country (Brennan Center for Justice, 2017; Stop LAPD Spying Coalition, 2018). Federal privacy laws may be driving this trend (Collins, 2018); however, high profile cases like Chicago’s Strategic Subject List (Stroud, 2016) and the legal and investigative saga currently taking place in Pasco County, Florida are reminders that law enforcement is not immune to secretive or even nefarious actors. Even if information about forecasts of crime is withheld unintentionally, the deployment of predictive policing tools in echo chambers limits the potential to build legitimacy and demand for the expansion of tools into proactive social welfare initiatives (e.g., using predictive software to identify vulnerable people or places for social program interventions and urban planning projects, respectively).

Even though police departments are often reluctant to release information on their predictive technology deployments, they do not always know what is going on under the hood themselves, even when they know what data make up the algorithms' inputs (Lally, 2021). This lack of awareness about how outputs are generated can have consequences, not the least of which is a poor understanding of what information these tools can really provide and how to use other information to adjust for or contextualize the data derived from the tools. As we cautioned above, predictive policing algorithms can replicate previous policing patterns rather than forecast crime (Lum & Isaac, 2016; Richardson et al., 2019). The problem is that, absent transparency about data, methods, and decision-making, the magnitude of this problem cannot be estimated.

By comparison, the way states have organized the production of correctional population forecasts shares characteristics of credible economic forecasting processes. The forecasts serve the needs of more than one state entity, and those entities' interests may conflict. Practically, state budget offices need to ensure that corrections departments are not "cooking the books" to request larger budgets than warranted; they need to ensure that forecasts are credible (Mears, 2002). Among the states, several methods are used to mitigate agency self-interest in generating forecasts. In addition to the twice-yearly forecasts that we mentioned above, each of which includes an assessment of the prior forecast's performance and explanations of differences (e.g., CDCR, Oregon), states also require that an entity other than the department of corrections produce the forecasts. State sentencing commissions (e.g., New Mexico, North Carolina), a state's office of policy and planning (Connecticut), or a separate estimating agency that generates official forecasts for several departments (e.g., Florida's Criminal Justice Estimating Conference) are examples. In Florida, not only does the Estimating Conference generate forecasts of the prison population, but Florida law requires the secretary of its department of corrections to develop a plan to manage prison populations when the forecast population exceeds capacity, even though the forecasts were generated by the independent Criminal Justice Estimating Conference and not by the department of corrections.

Several states require periodic external reviews of their forecast methods. California law requires that CDCR's model be reviewed every ten years by an external entity that is not affiliated with the department (Maltz & Chaiken, 2009). Nonpartisan legislative offices also conduct reviews (O'Neill & Koushmaro, 2020). At the federal level, Congress has asked the Government Accountability Office (GAO) to review the FBP's forecasting methods on several occasions (GAO, 1997, 2012).

In addition, many states use some form of committee of administrators, representatives from police agencies, prosecutors' offices, policy analysts, and other stakeholders inside and outside of government as part of the process of generating and using prison population forecasts (Klay & Vonaset, 2008; Mikesell & Ross, 2014; McDonald et al., 2019; Wan, 2013). The purposes of such committees are twofold: first, they gather input about policies, practices, and so forth that can affect future populations and be built into forecast models; and second, they communicate information about the forecasts back to the stakeholders.

Legitimacy and Barriers to Expansion in Policing

With so many competing voices and so little transparency at the local level, the public does not have readily available mechanisms to assess predictive policing tools. Civil rights organizations view predictive policing as an inherently dangerous enterprise, whereas proponents focus on the accuracy of the algorithms while downplaying or not disclosing data on the magnitude of the problems these technologies present. The fact that many police agencies do not know or do not seem to care how these systems work is problematic because it renders them incapable of effectuating the outreach necessary to build community buy-in.

When these predictive policing systems are deployed without transparency, as was the case with the secret partnership between the CIA-backed technology firm Palantir and the New Orleans Police Department that began in 2012 (Winston, 2018), it is not surprising that communities develop mistrust and resentment toward the police who are supposed to have their interests first in mind. Further damage to the legitimacy of the criminal justice system can occur when state and federal agencies exercise limited oversight of the public money provided to the companies developing these tools and the public safety departments that deploy them. In response to a recent inquiry from several members of the U.S. Senate (Wyden et al., 2021) into BJA's role in subsidizing the development or purchase of predictive policing tools for police departments across the country, the agency was unable to quantify the number of departments that acquired such tools using Byrne funds (Cameron, 2022; Hyun, 2022). This only reinforces the impression of corporate and police impropriety and bureaucratic fecklessness.

Ultimately, the viability of expanding front-end criminal justice forecasting technologies to effect policy change and evaluation seems limited in the current climate of operational opacity, lack of accountability, and budgetary management problems. Scandals have prompted some cities to move away from these predictive technologies (Bhuiyan, 2021; Foody, 2020), and some city governments are banning their use outright (e.g., Johnston, 2020; Sturgill, 2020). If these technologies are to be given a second chance to make and evaluate substantive change for hard-hit communities, the path forward will require significantly greater oversight.

Conclusions and Recommendations

The state of criminal justice forecasting for policymaking is underdeveloped compared with its counterpart in macroeconomic policymaking. The problem is less pronounced in prison population forecasting than in predictive policing, where the accuracy and use of forecasts are harder to determine because of a comparative lack of accountability and consensus orientation. There are also unmet demands for forecasts that, if produced credibly, could improve funding mechanisms, policy, and practice.

First, on the predictive policing side, it is unfortunate that the overall experience with place- and person-based predictive policing technologies has fallen short of their potential utility, but it is not inevitable that these systems will (or should) be relegated to the garbage heap of ill-conceived justice-related initiatives. Secretive use of black-box forecasting algorithms to augment business-as-usual policing erodes public confidence and police legitimacy, irrespective of any real or imagined public safety benefit, and should be abandoned. That said, we do not advocate throwing the baby out with the bathwater. Rather, we envision reconceptualizing these technologies as tools that promote transparency, collaborative problem solving and decision-making, and policy experimentation.

We recommend that policing agencies, cities, and counties interested in these technologies provide proactive (prior to procurement) public disclosures explaining (a) what systems are under consideration; (b) how much they cost and how they would be paid for; (c) their underlying inputs, weighting rules, and algorithms; (d) where and under what circumstances they would be used; (e) what any governmental responses based on their findings would entail; and (f) how residents can formally dispute their results. This requires officials to work collaboratively with affected neighborhood councils, advisory boards, and local nonprofits to ensure adequate information sharing. Community stakeholders should be included in any planning and decision-making processes, especially with regard to how the tools are used and how disputes might be independently resolved. This is particularly important for person-based forecasting products, which have primarily been used to support crime suppression efforts. Communities should determine how juveniles and adults identified as high risk are to be treated, whether with heightened surveillance and arrest (the usual approach), through social interventions (e.g., social services, crisis intervention programs), or some combination thereof. The interagency and public–private partnerships required to create, implement, and manage social interventions (whether individual-level case plans or community-level projects) should be in place before any forecasting technologies are implemented.

We also recommend that forecasts be published or communicated to affected communities on a regular basis, as state prison population forecasters already do. In the case of place-based forecasts, local governments should publish projected hotspots, and the data used to derive them, directly on their websites. This is not a large step beyond the crime maps that many police departments already make available through police data portals. Partnerships with neighborhood councils and news outlets could raise awareness of these forecasts and increase community-police dialogue, allowing residents and other community stakeholders to evaluate the forecasts and engage with policymakers on steps to address spatial vulnerabilities to crime.

Finally, we recommend that agencies use these forecasting tools to evaluate broader policy initiatives rather than simply to inform resource deployments or criminal investigations. Other commentators have already suggested a number of community-oriented uses for predictive policing tools, including informing multi-agency social services responses (e.g., Capotosto, 2017) or green space projects (e.g., Kutnowski, 2017). We would add that these forecasts should also be used to evaluate the performance of public policy interventions. The effectiveness of green spaces, for example, could be judged by comparing actual to forecast crime, demonstrating whether such urban planning projects revitalize high-crime areas. Over time, these forecasting tools may help identify public policy programs that are effective more broadly.
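As a minimal sketch of this forecast-as-counterfactual logic, the Python fragment below compares hypothetical observed crime counts in a treated area against the counts forecast before the intervention. All figures are invented for illustration, and the comparison is only as credible as the underlying forecast.

```python
# Sketch: evaluating an intervention by comparing observed crime counts in a
# treated area against the pre-intervention forecast (hypothetical numbers).

forecast_counts = [42, 45, 44, 47, 46, 48]  # monthly counts forecast before the project
observed_counts = [41, 40, 36, 33, 31, 30]  # counts observed after the green space opened

# Cumulative "crimes averted" relative to the counterfactual forecast.
averted = sum(f - o for f, o in zip(forecast_counts, observed_counts))
pct_reduction = 100 * averted / sum(forecast_counts)

print(f"Crimes averted vs. forecast: {averted}")
print(f"Reduction relative to forecast baseline: {pct_reduction:.1f}%")
```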

Second, comparatively little criminal justice forecasting is done for policy formulation and testing; most of it, in both predictive policing and prison population forecasting, focuses on practice issues. There are opportunities for federal leadership here, as the JRI example indicates: forecasts of prison populations under existing conditions and under conditions associated with policy reforms were used to identify potential cost savings and motivate states to consider changing practices. As we pointed out, even though the JRI effort fell short of best practices, it provides a model for examining the potential impacts of policy or practice reforms that can then be compared to actual impacts. One concern with using this approach to assess policy impacts is that nothing remains constant; policy and practice change continually. Even so, the discipline of explicitly stating, then reviewing and evaluating, the assumptions used to generate forecasts against actual practice provides a basis for determining how much a change in practice contributed to the error of a policy forecast. At the state or local level, and from a process-control perspective, the model and assumptions underlying a forecast can be used to identify the reasons for deviations from what was forecast, and those reasons can become part of policy discussions. At the federal level, because the government provides several billion dollars in state and local criminal justice assistance each year, using forecast models to compare expected with actual outcomes across grant recipients would yield information about the effectiveness of those investments.
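A stylized Python example of this error-attribution logic follows: a simple stock-flow projection is re-run with the admission levels actually observed, isolating how much of the total forecast error a single changed assumption explains. The model form and all numbers are hypothetical and far simpler than production forecast models.

```python
# Sketch: attributing prison population forecast error to a changed assumption.
# The original forecast assumed a monthly admission count; re-running the model
# with the admissions actually observed shows how much of the total error that
# single assumption change explains. All figures are hypothetical.

def project(start_pop, admissions, release_rate, months):
    """Simple stock-flow projection: pop(t+1) = pop(t) + admissions - releases."""
    pop = start_pop
    for _ in range(months):
        pop = pop + admissions - release_rate * pop
    return pop

start_pop, months, release_rate = 20000, 12, 0.04
assumed_admissions, actual_admissions = 800, 700  # practice changed mid-year
actual_end_pop = 19100                            # observed end-of-year population

original = project(start_pop, assumed_admissions, release_rate, months)
rerun    = project(start_pop, actual_admissions, release_rate, months)

total_error = original - actual_end_pop
explained   = original - rerun       # error due to the admissions assumption
residual    = rerun - actual_end_pop # remaining error from other sources

print(f"Total forecast error: {total_error:.0f}")
print(f"Explained by admissions change: {explained:.0f}")
print(f"Residual (other sources, may be negative): {residual:.0f}")
```

The point of the exercise is process, not precision: a documented assumption can be re-tested after the fact, so the forecast miss becomes interpretable rather than merely embarrassing.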

We see a federal role here tied to federal assistance grants to state and local criminal justice agencies, much of which goes to local police departments through formula grants based on past crime patterns rather than future crime problems. That approach rewards places that had higher crime; it does not direct funding toward future crime problems. City-level crime forecast models would provide a more logical mechanism for awarding funds to address those future problems. We are not advocating that thousands of police departments develop their own forecast models; rather, the federal government should invest in developing the data and methods for creating reliable city-level crime forecasts. As we have pointed out, doing so will not be easy; a multi-year investment in a program of research into city-level crime forecasting would be required. Such a program could examine and compare various approaches, including the prediction market approach we referenced briefly. Ultimately, whether a single federal entity generates the forecasts (as occurs with the data used to allocate law enforcement formula grants) or state and local entities do, the forecasts would come from a common model. Ancillary to developing such models would be data improvements. Improvements in technical capacity have clearly outstripped improvements in data, and it is not clear that the technical improvements themselves have produced better forecasts. A federal investment in improving the data infrastructure that supports forecasting would benefit many state and local agencies as well as the federal government itself.
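To illustrate the kind of baseline such a research program would need to beat, the sketch below fits a simple linear trend to hypothetical annual city crime counts and produces a one-year-ahead forecast. A credible program would compare far richer models; everything here, including the data, is assumed for illustration only.

```python
# Sketch: a deliberately simple city-level crime forecast baseline
# (linear trend on annual counts, hypothetical data). This shows only the
# forecast-then-compare workflow a common federal model would standardize.

years  = [2017, 2018, 2019, 2020, 2021, 2022]
counts = [5400, 5250, 5100, 5600, 5450, 5300]  # hypothetical annual index crimes

# Ordinary least squares slope and intercept, computed by hand.
n = len(years)
mean_x, mean_y = sum(years) / n, sum(counts) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(years, counts))
         / sum((x - mean_x) ** 2 for x in years))
intercept = mean_y - slope * mean_x

forecast_2023 = intercept + slope * 2023
print(f"Baseline 2023 forecast: {forecast_2023:.0f} offenses")
# A grant review a year later would compare this figure with the observed count.
```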