1 Introduction

In mid-March 2020, I was asked to be a voluntary (unpaid) member of the Los Angeles County Department of Health Services (DHS) Covid-19 Predictive Modeling Group, and I immediately agreed. In this paper I discuss some of the statistical issues that arose as a result in modeling and forecasting the spread of Covid-19 (SARS-CoV-2) in the United States, especially in Los Angeles County, California. Some of these issues are very elementary in nature, but are the types of things not conventionally studied in undergraduate or graduate training in Statistics, even though they must arise very frequently whenever statisticians collaborate with government agencies or industrial partners.

The Covid-19 Predictive Modeling Group, which was led by Dr. Roger J. Lewis, was charged with forecasting hospital demand, which simply meant the predicted number of people needing a hospital bed, Intensive Care Unit (ICU) bed, or ventilator or other medical equipment, due to Covid-19. An example of such a forecast, anticipating the number of daily new Covid-19 patients in Los Angeles County, made on July 14, 2020, is in Fig. 1. The main time horizon of interest was one week: the DHS Covid-19 Predictive Modeling Group would issue a press release at the beginning of each week, and Dr. Lewis would typically give a televised press conference each Tuesday or Wednesday morning summarizing the main points and showing a graphic or two. Seeing on television plots or results that I had somehow played even a miniscule part in helping create was very exciting.

Fig. 1
figure 1

Observed (white dots) and projected (white lines, with uncertainties in pink and red) daily new Covid-19 patients admitted to hospitals in Los Angeles County, California. The projection was made on July 14, 2020

2 SEIR versus Hawkes

The first and perhaps most important decision our group had to make was the choice of which type of model to use for the forecasts. The two main choices seemed to be a compartmental type model based on the SIR (Susceptible Infected Recovered) model of Kermack and McKendrick (1927), such as the SEIR (Susceptible, Exposed, Infected, Recovered) model used to model the spread of Ebola (Lekone and Finkenstädt 2006), or some version of the Hawkes model (Hawkes 1971), such as the HawkesN model of Rizoiu et al. (2018) or the Recursive model which had been used to describe the spread of Pertussis (Yang 2019) or Rocky Mountain Spotted Fever (Schoenberg et al. 2019).

Compartmental models like SEIR work by categorizing the population into categories or compartments, such as those who are Susceptible, Exposed, Infectious, or Recovered. In each time unit (usually 1 day), some fixed proportion of the susceptible become exposed, and some fixed proportion of the exposed become infectious, etc., and these proportions are to be estimated using the data. The recovered compartment includes those who have survived the disease and are no longer infectious as well as those who died.

Estimation of parameters in SEIR-type models involves finding an approximate solution to a closed loop of differential equations as shown in Fig. 2, which is taken from Kresin et al. (2021). Optimization may be performed by least squares, minimizing the sum of squared differences between the observed and predicted numbers of infections (Khan et al. 2020).

Fig. 2
figure 2

Diagram of the SEIR model, reproduced from Fig. 1 of Kresin et al. (2021). N is the number of susceptible individuals in the population, (β ⋅ I) is equal to the force of infection, Λ is the birth rate, µ is the death rate, γ is the mortality rate from the given epidemic, and α−1 is the average incubation period

Point process models like the Hawkes model take a different approach, viewing each reported case of the disease as an event, and allowing events to trigger other events, and those events to trigger still more events, and so on. Temporal Hawkes models are characterized by a conditional intensity

$$\lambda(t) = \mu + {\text{k}}\sum {{\text{g}}(t - {t_1})} ,$$

where µ represents the background rate at which infected people are immigrating into the location of interest, k is the productivity representing the expected number of people to whom each infected person spreads the disease, and g is the triggering density governing the time it takes for the disease to spread to the next person. Hawkes-type models have been proposed for earthquakes (Ogata 1988, 1998), and have proved to be the best-fitting models for earthquakes in earthquake forecasting experiments, the idea being that earthquakes can have aftershocks, and those aftershocks can trigger further aftershocks, and so on (Clements et al. 2011, 2012; Zechar et al. 2013; Bray et al. 2014; Gordon et al. 2015; Schorlemmer et al. 2018). When fitting Hawkes models to Covid-19 data, the productivity k is typically allowed to vary over time, and both the productivity and triggering density g may depend on parameters Θ to be estimated using the data. As with SEIR-type models, Hawkes models have parameters that may be estimated by minimizing the sum of squared differences between the daily observed numbers of infections, N(t), and the predicted numbers of infections according to the model (Schoenberg 2023), i.e. finding parameters Θ minimizing

$$\sum _{t=1}^{T}\left\{N\right(t) - [\mu +\sum _{u=1}^{t-1}k(u;\Theta)g(t-u;\Theta)]{\}}^{2}$$

.

The two types of models, SEIR-type and Hawkes-type models, have various pros and cons which are described in more detail in Kresin et al. (2021), but which can be summarized briefly here as follows. Compartmental models like SEIR are very widely used and certainly more standard for forecasting epidemics than Hawkes-type models. The SEIR-type models are also more physically sensible than Hawkes-type models, and seem often to forecast well for short-term forecasts during the outbreaks of epidemics. On the other hand, SEIR-type models seem to be prone to mis-specification, and depend critically on highly uncertain quantities such as estimates of the number of asymptomatic or unreported infections. I will discuss this latter problem more in Sect. 6, but the point for now is that it often results in large errors for SEIR forecasts. In addition to having relatively low accuracy, SEIR models are also typically poorly calibrated. For example, during the early stages of the Covid-19 pandemic, SEIR-based forecasts for Italy repeatedly under-estimated case counts, by approximately 14% on average. SEIR-based models have been known to have very low accuracy for longer-term forecasts; a prominent example is in 2003, when SEIR models applied to the spread of SARS in China estimated 30,000 to 10 million SARS cases would occur in the first 4 months of the spread of disease in China (Meyers 2007). Ultimately, only 782 cases were reported in China (World Health Organization 2003).

I argued in favor of the Hawkes-type models based on their superior forecasting performance for other contagious diseases (and perhaps secretly because I had much more experience with them in my own research). My colleague, Andrea Bertozzi, joined the group soon after me. Prof. Bertozzi is formally a professor of Applied Mathematics at UCLA whereas my appointment is in Statistics, but she is essentially a smarter version of whatever I am, and unsurprisingly, she agreed with me on the use of Hawkes models, and suggested the Covid-19 Predictive Modeling Group consider using both models.

To Prof. Bertozzi and me, as academics, the decision was clear: if Hawkes models forecast more accurately, and if our task is to do forecasting, then Hawkes models should be used. However, the committee decided to use a SEIR-type model instead. This decision was largely based on the fact that Hawkes models were newer and less accepted in the epidemiology community, whereas SEIR models were the industry standard. Dr. Lewis, a delightfully collegial and open-minded leader who is exceptionally eloquent on camera and extremely knowledgeable about a very wide array of matters, was also very humorously self-deprecating and said at one point that he did not want to risk issuing forecasts that might “make me look stupid, or maybe I should say, might correctly show the world how stupid I am.“ These kinds of differences in the approaches between academics and those in industry or government agencies are likely commonplace and familiar to those in other areas of application as well.

As it turns out, the decision to use a SEIR model was a small mistake. Although SEIR forecasts were reasonably accurate, Hawkes models forecast Covid-19 cases in California, and for most of the United States, a bit more accurately than SEIR-type models, with root-mean-square (RMS) errors 21–63% lower than SEIR models (Yuan 2020; Kresin et al. 2021). Hawkes models also outperformed a SEIR-type model called SQUIDER for forecasting doubling times of Covid-19 in California during three different surges (Kaplan et al. 2023).

3 Upper limits on the y-axis

Prof. Bertozzi and I were not always correct, however. One issue on which we were spectacularly wrong was the issue of scaling the y-axis in plots of daily new patient forecasts such as Fig. 1. The group initially put the ceiling of these forecasts rather arbitrarily at 1000 new patients per day in Los Angeles County hospitals. However, from March through November 2020, the observed number of daily new patients never even exceeded 350.

During the late Spring and early Summer of 2020, Prof. Bertozzi and I suggested perhaps lowering the upper limit on these plots from 1000 to 500, which would allow the viewer to see more clearly the weekly fluctuations in the forecasts as well as the data. Dr. Lewis very wisely declined, deciding that, if another surge should occur causing us to increase the upper bound of our forecasts above this upper limit of 1000 new cases/day, it could be cumbersome, embarrassing, and distracting to have to explain the rescaling of the y-axis in subsequent plots. Such a change would also make it difficult to compare earlier diagrams with later ones.

Dr. Lewis was right. In fact, as early as mid-December 2020, our SEIR model was producing forecasts whose 95% confidence bands extended so far above 1000 new cases/day in Los Angeles County, that in fact the upper bound on the y-axis had to be increased above 1000, as shown in Fig. 3, and Dr. Lewis unfortunately had to explain the change of scale of the y-axis, despite his earlier prescience.

Fig. 3
figure 3

Observed and projected daily new Covid-19 patients admitted to hospitals in Los Angeles County, California, made on December 16, 2020

4 Bayesian versus Frequentist SEIR model

The team’s modeling group, led by Kert Viele, decided on a Bayesian formulation of the SEIR model, following Clancy and O’Neil (2008), Frasso and Lambert (2016), and Ozanne et al. (2019). Although all my training and research has been frequentist, I had no objection at all to the team using a Bayesian framework, and Dr. Viele and his colleagues seemed much more experienced with Bayesian models, so the Bayesian option won out immediately without any opposition.

The decision to use a Bayesian versus frequentist SEIR model did not seem to be very consequential. The Bayesian formulation perhaps allows a bit more room for the user to adjust hyperparameters in order for the model to fit better and in particular to adjust the variability of the model forecasts, but the ratio of the number of adjustable parameters to datapoints in SEIR-type models is already so large that a Bayesian version of the model may have been unnecessary. The Bayesian model also seemed to facilitate adding in additional variability corresponding to uncertainty in whether future transmission behavior might change in the future (Figs. 1 and 3).

5 The lower limit in Spring 2021

An interesting moment for forecasting was in Spring 2021, when vaccines began to become widely used in California and the number of Covid-19 hospitalizations/day in Los Angeles County quickly plummeted (see Fig. 4). However, it was unclear whether the number of such cases per day would decrease all the way to zero, or would level off around 100 cases/day, or perhaps something in between. As seen in Fig. 4, when the summer 2020 surge abated, the number of hospitalizations per day decreased to approximately 100 before levelling off throughout August, September and October 2020. Thus in early March 2021, I suspected that perhaps the behavior would be similar and we would observe another levelling off around 100 hospitalizations/day, perhaps because when caseloads were low, hospitals might ease their admission requirements relative to when caseloads are higher, resulting in a stabilization in the number of hospital admissions.

Fig. 4
figure 4

Observed and projected daily new Covid-19 patients admitted to hospitals in Los Angeles County, California, made on March 22, 2021

The SEIR model, however, anticipated new daily Covid-19 hospitalizations to keep decreasing below 100 per day, albeit with sizable uncertainties. Impressively, the SEIR model was clearly right, and once again I was wrong. Daily hospitalizations from Covid-19 continued to decrease in April and May 2021, to a level below 50 new hospitalizations/day, as seen in Fig. 5.

Fig. 5
figure 5

Observed and projected daily new Covid-19 patients admitted to hospitals in Los Angeles County, California, June 1, 2021

6 Seroprevalence

As mentioned in Sect. 2, one problem with SEIR-type models is they seem to depend critically on estimates of the number of asymptomatic and unreported cases (Kresin et al. 2021), and these numbers are extremely difficult to estimate accurately. While there were some careful early studies of seroprevalence, or the percentage of the population with antibodies (Sood et al. 2020; Bendavid et al. 2021) including several conducted in Spring and early Summer of 2020 by the Centers for Disease Control and Prevention (CDC), those studies were terminated in mid-Summer 2020 when the Trump Administration cut off funding for the CDC (Wermer and Stein 2020; Bajema et al. 2021).

Although none of the CDC studies was focused on Los Angeles County, I thought it might be a good idea to use the results of these studies to help inform our SEIR model, and this was my main contribution to the Covid-19 Predictive Modeling Group. The product of the population size and the percent of total infections implied by the seroprevalence studies administered by the CDC would yield an estimate of the total number of infections having occurred in a location as of July 2020, and if one divides that total by the total number of confirmed cases in the location as of July 2020, the ratio was somewhat stable across locations, with a value of approximately 11 for Florida, Washington State, and Utah, 12 for New York State, and then there were two troubling outliers, Missouri with a ratio of 24 and Connecticut with a ratio of 6. If memory serves, I think Dr. Viele decided to have a prior for this ratio to be in the range of 10–15 for the purpose of estimating the SEIR model use for the LADHS forecasts, and of course the posterior was estimated using the data on case counts in Los Angeles County.

7 Transmission time density

One topic I investigated at this time but rather independently from the LADHS Covid-19 Predictive Modeling Group was the distribution of transmission times, which is a key component to a Hawkes-type model. Early reports by the CDC and other sources seemed contradictory, with unusually high uncertainties, and with the possibility of very large transmission times.

For instance, early studies of patients in Wuhan, China reported that the median time from the onset of SARS-CoV-2 illness to acute respiratory distress syndrome (ARDS) was 8–12 days and the median time from onset of illness to ICU admission was 9.5–12 days (Huang et al. 2020; Wang et al. 2020; Yang et al. 2020; Zhou et al. 2020). Based on such studies, the CDC summarized that the incubation period ranges from 2 to 14 days (CDC 2021a), that the contagious period for those with SARS-CoV-2 is typically up to 10 days following symptom onset (CDC 2021d) or 14 days following exposure (CDC 2021c, d), but may extend up to 20 days after exposure (CDC 2021a), and recommended infected individuals stay home for 14 days after last contact with someone with SARS-CoV-2 (CDC 2021b, 2021c). Similarly, the World Health Organization reported that because the incubation period averages 5–6 days but can range up to 14 days, infected individuals should quarantine for 14 days after exposure (WHO 2021a, 2021b).

Fitted Hawkes models with nonparametric triggering functions, allowing the data to determine the transmission time distribution, told quite a different story, however. Figure 6 below, which is a reproduction of Fig. 6a of Schoenberg (2023), shows fitted nonparametric estimates of the triggering density in Hawkes models fit to Covid-19 data from 1/23/20 to 8/25/21 for all 50 United States. For further details on the dataset and estimation methods, see Schoenberg (2023).

Fig. 6
figure 6

Nonparametric transmission time density estimates for Covid-19 for all 50 United States (thin grey curves), and mean of all 50 state estimates (thick black curve), from Schoenberg (2023)

The close agreement of the 50 transmission time estimates is evident, as is the peak at 7 days for the 50 transmission time curves. As noted in Schoenberg (2023), there is a surprising amount of mass in the estimated density corresponding to transmission times of 14 days, which is likely due to harmonic aliasing, and at 1 day, which might be attributable to contagion via physical contact. For example, if an individual’s hand comes in contact with virus, then hand-to-hand exposure or hand-surface-hand exposure might be a means of transmission with very short transmission time (Lotfi et al. 2020). Another possible explanation for the 1-day transmission density may be autocorrelation between one day’s case counts and the next for reasons such as continuity in human behaviors, policies, and recording decisions.

8 Summary

Studying a disease that is dominating the headlines while suffering from that disease, as I did in late March 2020, was truly a unique experience. Hopefully it will remain that way.

Looking back on the pandemic now that some time has passed since its inception, I feel that in modeling trends for Los Angeles County, we should have placed more emphasis on using more information from other locations and previous epidemics, rather than focusing so much on Los Angeles County data. Indeed, for very short-term forecasting, it makes relatively little difference whether one uses a compartmental model, a point process model like the Hawkes model, or simply does exponential smoothing or kernel smoothing, the latter two of which require no modeling and only a single tuning parameter. For longer-term forecasting, much more reliance should be put on trends in other locations and from past epidemics, in my opinion.

Most of the issues I have described here will sound familiar to statisticians who have worked with industry or government agencies in any discipline. Although some of the issues are rather elementary, it seems puzzling that Statistics courses and texts spend so little time discussing them, especially fundamental issues like how to choose a model or class of models to use for a given problem, and what criteria to use for choosing among competing models.

Since the pandemic has died down, too little attention has gone into comparing and evaluating the fit of the various models and forecasts. We need such studies so that we can learn from our mistakes and do better next time. I personally have had much more difficulty in the past publishing papers evaluating existing models and methods, and a much easier time when I have proposed some slightly new model or method, and this seems unfortunate.