1 Introduction

Motivation

The COVID-19 pandemic highlighted the need to anticipate the impact of a novel pathogen on healthcare [1–4] or the economy [5, 6]. One factor that determines this impact is the basic reproduction number \(\mathcal{R}_{0}\) [7], a demographic concept that has been repurposed for infectious disease epidemiology [8–12]. \(\mathcal{R}_{0}\) represents the average number of susceptible people a host infects in a completely susceptible population whilst that host is in its infected state [13, 14]. Based on estimates of \(\mathcal{R}_{0}\) for COVID-19’s causative agent SARS-CoV2, various categories of predictive [1, 15–18], forecast [19–21] and regression [22–25] models have been constructed to anticipate healthcare system demand.

The COVID-19 pandemic’s infections have been periodic [24–28]. Continuous feedback control loops, such as prevalence-dependent contact rates [29] and intervention fatigue [30], may contribute. Equally, irregular events such as the relaxation of previous restrictions, superspreader events and migration [30] perturb the rate of new infections or the number of active infections [26]. Cyclical events such as seasonal host behaviour, pathogen biology, migration or waning immunity [28, 30] result in periodic infection perturbations [25]. The superposition of these perturbations manifests as pandemic waves [31].

For the COVID-19 pandemic, some subsequent waves of infection have been associated with mutations of the ancestral (wild-type) SARS-CoV2 in some countries [30, 32–36]. Paradoxically, these distinct variant waves may be a consequence of SARS-CoV2’s slow virion mutation rate [35, 37–39]. Even if the virion mutation rate is constant, the time required to accumulate the appropriate number of mutations at the appropriate loci of a virion’s genome, in a sufficiently gregarious individual and a sufficiently connected geographical location, to collectively constitute a Variant of Concern (VOC) may vary [35, 37, 38]. Thus the timing and the impact of these events are treated as random [35], and prediction requires that the VOC has manifested in at least one region. This paper projects the impact of a random event such as a novel VOC from a region in which it has manifested to one in which it has not [32]. We propose that the distinct collection and distribution of clinical manifestations, pathologies and mortality of each of the SARS-CoV2 VOCs [40–42] can be treated as a novel pandemic and that COVID-19 is the collective manifestation of this family of overlapping pandemics [43]. These VOCs’ distinct transmission dynamics [32, 44–47] provide additional justification for this approach.

Implicitly, each VOC contender is a potential new pandemic [30, 39, 48–50]. For the particular case where the contender and incumbent VOC do not form a mixture and, instead, the contender rapidly replaces the incumbent, the reproduction number \(\mathcal{R}(t)\) for the SARS-CoV2 variant family at the time of transition \(t_{0}\) represents the challenger VOC’s \(\mathcal{R}_{0}\). This \(\mathcal{R}_{0}\) represents an upper bound of the challenger VOC’s impact in anticipation of it outcompeting and supplanting the incumbent [51]. It is an upper bound because, by definition, \(\mathcal{R}_{0}\) assumes complete susceptibility to the new variant.

The HI-STR [52] model is a deterministic compartment model prototype constructed to replace two assumptions of Kermack–McKendrick’s SIR prototype [53–55]. It replaces the assumption that the removal rate from a compartment is proportional to the size of that compartment with the more biologically appropriate assumption that the transmissible period is fixed and, consequently, that the removal rate equals the entrance rate one transmissible period earlier [52]. It also replaces Hamer’s mass action law with its chemistry precursor, the law of mass action [56]. The latter allows the derivation of a population density-dependent \(\mathcal{R}_{0}\) [22, 23, 52].

The HI-STR model differs from existing compartment models by predicting that \(\mathcal{R}_{0}\) is not only a pathogen property but also depends on the host population’s characteristics, including population size N, density \(\rho _{n}\) [23] and social behaviour [30, 57–59] common to that population. This paper describes a novel method of foretelling local \(\mathcal{R}_{0}\) in isolated populations with similar social behaviour. The method is designated projection. It proposes that if an estimate of \(\mathcal{R}_{0}\) exists for an isolated population \(y \; ({}^{y}\widehat{\mathcal{R}}_{0} )\), then the projection of \({}^{y}\widehat{\mathcal{R}}_{0}\) onto an isolated population z \(({}^{z}\widetilde{\mathcal{R}}_{0} )\) with similar social behaviour is

$$ \frac{{}^{z}\widetilde{\mathcal{R}}_{0}}{{}^{y}\widehat{\mathcal{R}}_{0}} = \sqrt[\mathfrak{B}]{ \frac{{}^{z}\hat{\rho}_{n}^{2}\,\times \,{}^{z}\widehat{N}}{{}^{y}\hat{\rho}_{n}^{2}\,\times \,{}^{y}\widehat{N}}}, $$
(1)

where \(\mathfrak{B}\) is specific to that pathogen variant’s transmission dynamics in populations with similar social behaviour. The symbol ˆ denotes an estimate and the symbol ˜ denotes a projection.
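The following minimal Python sketch shows one way Eq. (1) might be evaluated in practice. It computes \({}^{z}\widetilde{\mathcal{R}}_{0}\) from \({}^{y}\widehat{\mathcal{R}}_{0}\). The function name `project_r0` and the numerical values are hypothetical. \(\mathfrak{B}\) is assumed to be known already, because the equation itself does not supply it.

```python
def project_r0(r0_y: float, rho_y: float, n_y: float,
               rho_z: float, n_z: float, b: float) -> float:
    """Project an R0 estimate from population y onto population z, following Eq. (1).

    r0_y         : estimated R0 in the source population y
    rho_y, rho_z : population densities (same units for both populations)
    n_y, n_z     : population sizes
    b            : the exponent written as fraktur B in Eq. (1); assumed already known
    """
    ratio = (rho_z ** 2 * n_z) / (rho_y ** 2 * n_y)
    return r0_y * ratio ** (1.0 / b)


# Hypothetical illustration: z is twice as dense as y and has the same size.
print(project_r0(r0_y=2.0, rho_y=100.0, n_y=1e6, rho_z=200.0, n_z=1e6, b=2.0))  # 4.0
```

If the two populations are identical, the ratio is one and the estimate is returned unchanged, whatever the value of \(\mathfrak{B}\).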

Background

The omnipresent SIR compartment model for the temporal evolution of an infectious disease proposes that the individuals of an homogenous population can be grouped into three compartments: susceptible, infected and removed [53–55]. Susceptible implies capable of contracting a pathogen, infected implies capable of replicating and spreading the pathogen, and removed refers to either recovery (expulsion of the pathogen and immunity) or death. Additional compartments [1, 60] and stratified or heterogenous populations [61–64] result in sophisticated deterministic, compartment models.

An infectious epidemiology modelling taxonomy is proposed (Fig. 1) to distinguish between the common synonyms of “foretell” [65] in mathematical epidemiology. This taxonomy proposes that deterministic, compartment models are a subcategory of Differential Equation (DE), orthodox, predictive models. Predictive (mechanistic [66]) models presuppose that phenomena can be explained and that these explanations can be simulated. The orthodox predictive models consist of a three- or four-step process: explanation, abstraction into mathematics, the application of a numerical method and in silico simulation of the abstraction. The pioneering categories of orthodox models are stochastic and deterministic.

Figure 1: Epidemiology prophesy taxonomy

The deterministic compartment models are DE models. The DEs assume a homogenous population and simulate averaged phenomena. The Ordinary Differential Equation (ODE) models only simulate the rate of change of the compartment sizes. Historically, the Delay Differential Equation (DDE) compartment models [67, 68] are an alternative to the Exposed (E) compartment of the Susceptible-Exposed-Infectious-Removed (SEIR) ODE model [69–71]. Both the traditional delay term and the E compartment incorporate an incubation period into the SIR prototype. The HI-STR is a DDE model that reduces to an ODE for periodic phenomena [52]. The HI-STR’s delay is not due to incubation; it is intended to simulate a constant transmissible period. The transmissible period is another subtle difference from traditional ODE models. Similar to the infectious period, it is the period of time during which a host can transmit the disease, but it can be limited biologically (e.g. the incubation period), behaviourally (e.g. isolation, quarantine [72] or hospitalisation), technologically (e.g. face masks) or pharmacologically. An example of pharmacological restriction of a transmissible period is Human Immunodeficiency Virus (HIV) control, where antiretrovirals (ARVs) substantially reduce viral load and therefore transmissibility. Thus transmissibility may be idealised as a step function under appropriate circumstances [73]. Implicitly, transmissibility is a population characteristic, whereas infectivity is an individual characteristic. Partial Differential Equation (PDE) models typically model spatial spread as diffusion [74–76]. Algebraic formulae for thresholds such as \(\mathcal{R}_{0}\) and the proportion of the population to vaccinate are a consequence of deterministic models.

Stochastic, orthodox models translate to Binomial Chain Models (BCMs) [77–79] or Stochastic Differential Equation (SDE) models that superimpose uncertainty on ODE models [80–82]. They complement deterministic models in several ways. They can assign probabilities to outlier events [83] such as pathogen extinction, provide confidence intervals for their predictions and incorporate noise, and they remain applicable and useful in small samples. The distribution of the uncertainty is an assumption [84]. Note that the forecasting models (described below) are also statistical. The distinction is that, like the deterministic models, the stochastic models simulate a theory to prophesize the future, whereas the forecasting models extrapolate the past into the future.
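To illustrate how a binomial chain model assigns a probability to an outlier event such as early extinction, the sketch below implements the classic Reed–Frost chain binomial. It is offered as a generic example of the BCM family, not as the specific models of [77–79]. The function names, the final-size threshold that separates a minor from a major outbreak, and the parameter values are all assumptions.

```python
import numpy as np

def reed_frost(s0, i0, p, rng):
    """Simulate one Reed-Frost chain-binomial epidemic.

    In each generation every susceptible independently escapes each current
    infective with probability 1 - p; infectives are removed after one generation.
    Returns a list of (susceptible, infective) counts, one pair per generation.
    """
    s, i = s0, i0
    history = [(s, i)]
    while i > 0:
        p_infect = 1.0 - (1.0 - p) ** i          # chance a susceptible is infected
        new_i = int(rng.binomial(s, p_infect))   # binomial draw: the "chain binomial"
        s, i = s - new_i, new_i
        history.append((s, i))
    return history

def minor_outbreak_probability(s0, i0, p, threshold, runs=2000, seed=0):
    """Fraction of simulated epidemics whose final size stays below `threshold`."""
    rng = np.random.default_rng(seed)
    minor = 0
    for _ in range(runs):
        final_s = reed_frost(s0, i0, p, rng)[-1][0]
        if (s0 - final_s) + i0 < threshold:
            minor += 1
    return minor / runs

# Hypothetical example: 200 susceptibles, one index case, p = 0.01 (so s0 * p = 2).
print(minor_outbreak_probability(200, 1, 0.01, threshold=20))
```

Repeating the simulation many times yields an empirical probability for an event that a deterministic model cannot express. Here the event is an outbreak that dies out before it becomes large.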

Graph-based epidemiological models can be interpreted as an abstraction of an explanation (or a translation) into a branch of mathematics, graph theory [85, 86], before in silico simulation [87–90]. This interpretation provides the flexibility of graph theory or the heritage of an established application such as social network theory [91–94]. Here, graph- or network-based methods are therefore classified as orthodox predictive methods and ODE alternatives.

The unorthodox predictive methods also presume that phenomena can be explained, but the explanation is not translated into mathematics before simulation. Rather, direct in silico simulation of the explanation is performed. Thus some graph-based implementations can be interpreted as unorthodox [18, 95–98]. Graphs consist of vertices and edges, where (for infectious diseases and social networks) the vertices represent individuals and the edges represent relationships or interactions. Traditionally, the vertices have no geometric interpretation and do not simulate spatial spread [99], but the vertices can be mapped to locations [97].

Agent Based Models (ABMs) [6, 17, 100, 101] and Cellular Automata (CA) [102–104] are spatial, unorthodox, predictive models and PDE alternatives. CA are constructed on a regular lattice, and this restriction is removed for ABMs [105]. In these examples of Artificial Life [106], each agent (or node) acts independently, subject to simple rules applied to its local environment. The collective can prophesize complex phenomena that other predictive methods cannot [107]. These models simulate heterogeneity and mixing [108], but the PDEs that they represent are not apparent [52, 109]. CA can reduce to ODEs [110], and for at least one application (computational fluid dynamics), the PDEs that they represent have been derived [111]. Lattice Gas Cellular Automata (LGCA) [110] and Probabilistic Cellular Automata (PCA) or Stochastic Cellular Automata (SCA) [103, 104] are subclassifications of CA [52, 109].

The author considers Monte Carlo simulations of COVID-19 to be an unorthodox predictive method [112–115]. Note that the BCMs have been classified here as stochastic, orthodox, predictive models; they may require reclassification [116]. Other stochastic models that are less traditional in the infectious epidemiology context [113] include multivariate stochastic processes [117] and Brownian motion [118].

Forecasting presumes that phenomena have a recognisable and reproducible pattern. Forecasting fits a curve to a historical pattern and extrapolates the pattern into the foreseeable future [66, 119–124]. Fourier’s theorem states that any sufficiently well-behaved periodic curve can be reproduced by an infinite series of superimposed sinusoidal waves [24, 125–129]. Filtering refers to the attenuation (or omission) of frequencies that do not contribute substantively to the signal [24, 125, 130], resulting in a finite series. In electrical engineering, signal noise is presumed to have high frequency. A low-pass filter (which allows low frequencies to pass) attenuates the noise and smooths the resultant signal [125]. Generally, smoothing is a subset of filtering [125] that attenuates high-frequency signals.
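The following minimal sketch illustrates Fourier filtering. It assumes that the observed series can be treated as one period of a periodic signal. The discrete Fourier transform is truncated to its lowest frequencies, which acts as an idealised low-pass filter, so the result is a finite series and the curve is smoothed. The function `low_pass` and its `keep` parameter are hypothetical, and the series is synthetic.

```python
import numpy as np

def low_pass(signal, keep):
    """Ideal low-pass filter: retain only the `keep` lowest frequencies of a real signal.

    The signal is treated as one period of a periodic function, so truncating
    its discrete Fourier series leaves a finite series of sinusoids.
    """
    spectrum = np.fft.rfft(np.asarray(signal, dtype=float))
    spectrum[keep:] = 0.0                      # omit the higher-frequency components
    return np.fft.irfft(spectrum, n=len(signal))

# Hypothetical daily series: one slow annual wave plus a fast oscillation ("noise").
t = np.arange(365)
slow = 100.0 + 50.0 * np.sin(2 * np.pi * t / 365)
noisy = slow + 10.0 * np.sin(2 * np.pi * 60 * t / 365)
smoothed = low_pass(noisy, keep=5)
print(np.max(np.abs(smoothed - slow)))         # close to zero: the fast component is removed
```

A hard cut-off is the simplest choice. Practical filters usually attenuate frequencies gradually to avoid ringing artefacts.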

The Box–Jenkins forecasting models [121, 122, 131–133] also fit curves. The prototype is the Autoregressive Moving Average (ARMA) model, which forecasts weakly stationary behaviour. The Autoregressive Integrated Moving Average (ARIMA) model accommodates trend by differencing the series until it is stationary and then fitting an ARMA model to the differenced series [121, 131–136]. Seasonality (periodicity) can be incorporated into these time series models [132, 137]. The term autoregression refers to the use of historical data points of a curve to estimate the model parameters that predict future values of that same curve [138]. Autocorrelation, the correlation between a series and a lagged copy of itself, indicates how well past values may foretell future values.
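The sketch below illustrates the Box–Jenkins ideas above in their simplest form. It computes a sample autocorrelation and fits an AR(1) model by conditional least squares. It then produces an ARIMA(1,1,0)-style forecast by differencing once, forecasting the differences and undoing the differencing. It is an illustrative assumption, not a substitute for a full ARIMA implementation with model identification and diagnostic checking.

```python
import numpy as np

def autocorrelation(x, lag):
    """Sample autocorrelation of the series x at a non-negative lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[:len(x) - lag], x[lag:]) / np.dot(x, x))

def fit_ar1(x):
    """Conditional least-squares fit of x_t = c + phi * x_{t-1} + e_t; returns (c, phi)."""
    x = np.asarray(x, dtype=float)
    design = np.column_stack([np.ones(len(x) - 1), x[:-1]])
    (c, phi), *_ = np.linalg.lstsq(design, x[1:], rcond=None)
    return float(c), float(phi)

def forecast_arima_110(y, steps):
    """ARIMA(1,1,0)-style forecast: difference once, fit AR(1), then undo the differencing."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)                  # differencing removes a linear trend
    c, phi = fit_ar1(dy)
    level, step = y[-1], dy[-1]
    forecasts = []
    for _ in range(steps):
        step = c + phi * step        # forecast the next difference
        level = level + step         # integrate (undo the differencing)
        forecasts.append(level)
    return np.array(forecasts)

# Hypothetical trending series: the forecast continues the trend.
print(forecast_arima_110(3.0 * np.arange(50) + 10.0, steps=3))
```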

ARIMA models fit a linear combination of a finite number of earlier observations and their differences, yielding parsimonious models [131, 133, 137, 139–142]. See Mills [138, Chaps. 6 and 11] for an introduction to nonlinear functions. The curve-fitting method distinguishes the traditional time-series models [19, 122, 133, 135, 137, 140–145] from the Artificial Intelligence (AI) time-series models [143, 146–148]. Traditional statistical estimation methods include maximum likelihood, conditional sum of squares and ordinary least squares [131–133, 135, 145, 149, 150]. AI is an expanding topic with an evolving definition [151–156]. For the examples in this paper, AI is defined mechanistically as a collection of methods (techniques) that search a space for an adequate solution [152, 153, 155]. For COVID-19 forecasting, the AI methods search for a set of parameters that fits a curve adequately [20, 21, 146, 147, 157]. Although the parameters are not necessarily optimal, AI excels at nonlinear models with or without a priori knowledge or understanding of the system’s behaviour [158]. The search techniques used in COVID-19 are biologically inspired, modern AI methods [153, 155, 156]. These include swarming-inspired Particle Swarm Optimisation (PSO) [159], evolution-inspired Genetic Algorithms (GA) [160], the neurologically inspired Artificial Neural Network (ANN) [161] and deep learning methods such as Long Short-Term Memory (LSTM) [161].
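The sketch below illustrates an AI search for curve parameters. It uses a basic global-best Particle Swarm Optimisation to fit a hypothetical exponential growth curve \(y = a e^{rt}\). The model, the search box and the constants (inertia 0.7, acceleration 1.5) are assumptions chosen for illustration. The COVID-19 studies cited above use their own models and settings.

```python
import numpy as np

def pso_fit(t, y, lower, upper, n_particles=40, iters=500, seed=0):
    """Particle Swarm Optimisation search for (a, r) in the model y ~ a * exp(r * t).

    Minimises the sum of squared residuals within the box [lower, upper] and
    returns the best (a, r) found. The inertia weight w and acceleration
    constants c1, c2 are common textbook choices, not tuned values.
    """
    rng = np.random.default_rng(seed)
    t, y = np.asarray(t, float), np.asarray(y, float)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)

    def sse(p):
        # p has shape (n_particles, 2); column 0 is a, column 1 is r
        pred = p[:, [0]] * np.exp(p[:, [1]] * t[None, :])
        return np.sum((pred - y[None, :]) ** 2, axis=1)

    pos = rng.uniform(lower, upper, size=(n_particles, 2))
    vel = np.zeros_like(pos)
    best_pos, best_val = pos.copy(), sse(pos)      # each particle's best so far
    g = best_pos[np.argmin(best_val)].copy()       # swarm's best so far
    w, c1, c2 = 0.7, 1.5, 1.5
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, 1))
        vel = w * vel + c1 * r1 * (best_pos - pos) + c2 * r2 * (g - pos)
        pos = np.clip(pos + vel, lower, upper)     # keep particles inside the box
        val = sse(pos)
        improved = val < best_val
        best_pos[improved] = pos[improved]
        best_val[improved] = val[improved]
        g = best_pos[np.argmin(best_val)].copy()
    return g

# Hypothetical noiseless growth curve with a = 5 and r = 0.2 per day.
t = np.arange(21.0)
a_hat, r_hat = pso_fit(t, 5.0 * np.exp(0.2 * t), lower=[0.0, 0.0], upper=[20.0, 0.5])
print(a_hat, r_hat)
```

The swarm does not use gradients. This is why such searches can be applied to nonlinear models whose structure is only partly understood. The price is that the result is an adequate rather than a guaranteed optimal solution.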

Orthodox, predictive models also generate time-series that can be compared to field epidemiology time-series [162, 163]. Machine Learning (ML) [143, 164–166], ANN [167–169] and GA [168, 169] have found orthodox, predictive model parameters that adequately replicate COVID-19 field time-series. Deep learning is a neurologically inspired ML technique [146, 154, 156, 164] that has also been used in orthodox, predictive model parameter estimation [170, 171]. Forecasting provides a framework (e.g. ARIMA) and the orthodox, predictive models provide context (e.g. SIR) that constrain the AI search. AI also finds “black-box” associations [147, 161, 172]: associations found without a priori knowledge of mechanism (transmission dynamics in the infectious epidemiology context) and not providing a posteriori understanding of causation (pathophysiology here). ANN [173–175] and ML techniques such as Random Forest [176–179], decision tree [178, 180, 181], support vector machine [136, 178, 179, 182–184] and LSTM [136, 185, 186] have found such unbiased COVID-19 associations [147, 156]. A time series can also be an association [136, 186–189]. Unorthodox, predictive models such as CA and ABMs are swarming-inspired AI [153, 155], which are not search methods. Figure 2 classifies the AI methods discussed in citations of this paper.

Figure 2: Artificial intelligence methods used in COVID-19 as cited in this paper

State-space models are a subset of signal-plus-noise problems [130] and are introduced as a form of forecasting [132, 138]. Briefly, an observation (measurement) equation and a state equation are coupled. Each of these equations has a superimposed uncertainty that is commonly assumed to be Gaussian [190]. The output of the observation equation is the signal. In infectious epidemiology, reported new cases, disease mortality [191], wastewater surveillance [192] and combinations thereof are examples of signals. The signal can be a proxy [193, 194] that can be affected by both testing strategy and its implementation [36, 47, 191]. For example, South Korea’s strategy of significantly increasing access to testing [195] during COVID-19 may have affected the signal quality. Conceivably, universal testing is more effective [196, 197] but less efficient [198] than opportunistic, symptomatic testing [199–202]. Nevertheless, the effectiveness of these strategies should converge when asymptomatic infection is rare. Conceivably, a well-implemented track-and-trace policy can outperform a poorly promoted or poorly implemented universal testing policy [198, 203, 204].

The unobserved state function is based on a priori knowledge of a system’s behaviour and can be deterministic [205, 206] or empiric [207, 208]. The measured observation (signal) is coupled to an unknown state. Forward (filtering) and backward (smoothing) recursions estimate the state that corresponds to the signal [130]. The Kalman filter is a popular recursion method for implementing state-space models [192, 205–210].
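The following minimal sketch shows the forward (filtering) recursion. It assumes the simplest state-space model, a scalar local-level model with Gaussian noise. The signal `y` could be, for example, a series of reported new cases. The function name and the noise variances `q` and `r` are hypothetical, and the backward (smoothing) recursion is not shown.

```python
import numpy as np

def kalman_local_level(y, q, r, x0=0.0, p0=1e6):
    """Scalar Kalman filter for the local-level state-space model.

    State equation:       x_t = x_{t-1} + w_t,  w_t ~ N(0, q)
    Observation equation: y_t = x_t + v_t,      v_t ~ N(0, r)
    Returns the filtered state estimates and their variances.
    """
    x, p = x0, p0                 # vague prior: large initial variance
    xs, ps = [], []
    for obs in y:
        p = p + q                 # predict: uncertainty grows by the state noise
        k = p / (p + r)           # Kalman gain: weight given to the new observation
        x = x + k * (obs - x)     # update the state towards the observation
        p = (1.0 - k) * p         # updated (reduced) uncertainty
        xs.append(x)
        ps.append(p)
    return np.array(xs), np.array(ps)

# Hypothetical noisy signal around a constant level of 50.
rng = np.random.default_rng(0)
signal = 50.0 + rng.normal(0.0, 5.0, 500)
estimates, variances = kalman_local_level(signal, q=0.0, r=25.0)
print(estimates[-1], variances[-1])
```

With `q = 0` the filter reduces to a running mean. A positive `q` lets the estimated state drift, so that it can follow a changing epidemic signal.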

The above models require local, disease-specific data for prophesy, and the requirement for local estimates causes delay. Cardoso and Gonçalves [22] propose a form for a universal population size- or density-dependent scaling law and use regression [119] to determine its parameters for COVID-19. Their approach potentially circumvents the need to estimate modelling parameters locally. Rather, parameters from other centres can be projected, that is, adjusted for local conditions. Figure 3 illustrates the one-week delay [211] in the stage of spread of the ancestral SARS-CoV2 between the UK and the USA [212, 213]. Given the time dependence of intervention, the universal scaling law may prove more beneficial to regions less connected to the epicentre, such as India in Fig. 3’s COVID-19 example.
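The specific functional form used by Cardoso and Gonçalves [22] is not reproduced here. As a hypothetical stand-in, the sketch below assumes a power law \(\mathcal{R}_{0} = c\,\rho_{n}^{b}\). It estimates \(c\) and \(b\) by ordinary least squares on log-transformed data, after which the fitted law can be evaluated at a new population density. The data in the example are synthetic.

```python
import numpy as np

def fit_power_law(density, r0):
    """Fit R0 = c * density**b by least squares on log-log axes; returns (c, b)."""
    b, log_c = np.polyfit(np.log(density), np.log(r0), 1)
    return float(np.exp(log_c)), float(b)

def predict_r0(c, b, density):
    """Evaluate the fitted scaling law at another population density."""
    return c * density ** b

# Synthetic (not real) densities and R0 values generated from c = 1.5, b = 0.1.
density = np.array([10.0, 100.0, 1000.0, 5000.0])
r0 = 1.5 * density ** 0.1
c, b = fit_power_law(density, r0)
print(c, b, predict_r0(c, b, 300.0))
```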

Figure 3: Delay in confirmed cases per million population by country [212, 213]

Hu et al. [214] potentially circumvent the requirement of Cardoso and Gonçalves’ [22] regression for multiple pre-existing disease centres by repurposing formulae from the kinetic theory of ideal gases to derive population density-dependent contact rates. Hu et al.’s contact rates are an alternative to the HI-STR model’s law of mass action. The HI-STR prototype’s contact rate is population size- and density-dependent [52].

The orthodox, predictive models generally assume that the rate of change of the infectious compartment is directly proportional to the size of the infectious and susceptible compartments:

$$ \partial _{t}(S\ast I) =\dot{I} \propto S\times I, $$

where \(S\ast I\) is a state where members of the two compartments are sufficiently close to transmit the pathogen (or react in the chemistry analogue to follow). This assumption is Hamer’s mass action law. As its name suggests, it is modified from chemistry’s (empiric) law of mass action, which states that the reaction rate is proportional to the concentration of the reagents:

$$ \partial _{t}(s\ast i) \propto s \times i, $$

where s and i are the concentrations in the chemistry analogue and the densities (per unit surface area) in the epidemiology analogue. Superficially, if the volumes (or surfaces) remain constant, then these equations are the same. Intuitively, S and I molecules occupying 1 m³ are more likely to interact than the same number of molecules occupying 1000 m³ because they are closer to each other.

The HI-STR [52] proposes that a probability density function exists for the likelihood that an infected (or transmission-capable) individual in compartment I can be sufficiently physically close to an individual in compartment S to transmit enough pathogen for the individual in S to become infected and a member of I, that is, transmission. It further proposes that this probability density function is proportional to the population densities in the two compartments:

$$ P(t) \propto s \times i. $$

For a population \(N \gg 1\), the number of possible pairwise interactions, \(N(N-1)/2\), is \(\approxeq \frac{N^{2}}{2}\) [52], and the transmission rate is

$$ \dot{T} \propto N^{2} \times s \times i, $$

which resembles the law of mass action – the rate of the reaction is proportional to the concentration of the reactants.

The HI-STR further proposes that there are location-specific behavioural differences that either retard or promote the transmission of the pathogen [59] such that the rate at which individuals enter the transmissible compartment is

$$ \dot{T}_{\mathrm{in}} \propto \kappa (\mathbf{x})N^{2} s\times i, $$

where \(\kappa (\mathbf{x})\) is a metric reflecting behavioural difference with respect to the transmission of disease.
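A short sketch makes the contrast between Hamer’s mass action law and the HI-STR’s entry rate concrete. The proportionality constants `beta` and `k` are hypothetical placeholders, and \(\kappa(\mathbf{x})\) defaults to one. Compressing the same S and I into a smaller area leaves Hamer’s rate unchanged. The density-based rate, however, increases by the square of the area ratio. The exact pair count \(N(N-1)/2\) approaches \(N^{2}/2\) for large N.

```python
def hamer_rate(S, I, beta=1.0):
    """Hamer's mass action law: new infections per unit time proportional to S * I."""
    return beta * S * I

def histr_entry_rate(S, I, N, area, kappa=1.0, k=1.0):
    """HI-STR entry rate, proportional to kappa * N**2 * s * i with s, i per unit area."""
    s, i = S / area, I / area
    return k * kappa * N ** 2 * s * i

def pair_count(N):
    """Exact number of distinct pairs in a population of N individuals."""
    return N * (N - 1) // 2

# Same S and I occupying 1 m^2 versus 1000 m^2 (hypothetical numbers).
dense = histr_entry_rate(S=10, I=5, N=100, area=1.0)
sparse = histr_entry_rate(S=10, I=5, N=100, area=1000.0)
print(dense / sparse)                      # about 1e6, i.e. 1000 squared
print(pair_count(1000), 1000 ** 2 / 2)     # 499500 versus 500000.0
```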

Given that experimentally determined values of \(\kappa (\mathbf{x})\) are unavailable, HI-STR projection can only be applied across populations with the same behaviour. This paper creates the intuitive concept of Sufficiently Similar Social Behaviour (SSSB) with respect to pathogen transmission. Two populations have SSSB with respect to a disease’s transmission if transposing the behaviour of group A to group B does not result in an appreciable change in the transmission dynamics for that disease in group B. One possible formalism is as follows. Let \(\varepsilon (x)\) represent the uncertainty in x. Then the uncertainty in the measurement of disease transmission metric ζ for behaviour A is \(\varepsilon (\widehat{\zeta}(A))\). After fitting a prediction curve, let the predicted \(\zeta (A)\) be \(\breve{\zeta}(A)\). The prediction uncertainty is

$$ \varepsilon \bigl(\breve{\zeta}(A)\bigr) = \sqrt{ \varepsilon \bigl(\breve{ \zeta}(A) - \widehat{\zeta}(A)\bigr) ^{2} + \varepsilon \bigl( \widehat{\zeta}(A)\bigr)^{2}}. $$

Let SSSB exist if

$$\begin{aligned}& \bigl\vert \breve{\zeta}(A) - \widehat{\zeta}(B) \bigr\vert \leq \frac{\varepsilon (\breve{\zeta}(A))}{2}, \\& \varepsilon \bigl(\breve{\zeta}(A) - \widehat{\zeta}(B) \bigr) \leq \frac{\varepsilon (\breve{\zeta}(A)) }{2}. \end{aligned}$$
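The SSSB criterion above can be written as a predicate. In this sketch, the uncertainties are assumed to be supplied as standard uncertainties. \(\varepsilon(\breve{\zeta}(A) - \widehat{\zeta}(B))\) is taken as a given input, because how it is computed depends on how the comparison is set up. The function names and numbers are hypothetical.

```python
import math

def prediction_uncertainty(eps_fit, eps_measurement):
    """Combine the fit (prediction minus measurement) and measurement uncertainties in quadrature."""
    return math.hypot(eps_fit, eps_measurement)

def has_sssb(zeta_pred_a, eps_pred_a, zeta_hat_b, eps_diff):
    """True if both SSSB inequalities hold.

    zeta_pred_a : predicted transmission metric for behaviour A
    eps_pred_a  : its prediction uncertainty
    zeta_hat_b  : measured transmission metric for behaviour B
    eps_diff    : uncertainty of (zeta_pred_a - zeta_hat_b), supplied by the user
    """
    half = eps_pred_a / 2.0
    return abs(zeta_pred_a - zeta_hat_b) <= half and eps_diff <= half

# Hypothetical values: prediction uncertainty 0.5, so both tests use the bound 0.25.
eps = prediction_uncertainty(0.3, 0.4)
print(eps, has_sssb(2.0, eps, 2.2, 0.1))
```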

The rate at which individuals leave the transmissible compartment is not proportional to the size of the compartment. Instead, it equals the rate at which they left the susceptible compartment one transmissible period (\(\Delta \overline{\tau}\)) earlier. Thus

$$ \dot{T}_{\mathrm{out}}(t) = \dot{S}(t-\Delta \overline{\tau}) \approx - \dot{T}(t- \Delta \overline{\tau}). $$

The resultant rate of change in the T compartment (\(\dot{T}_{\mathrm{in}} + \dot{T}_{\mathrm{out}}\)) is a DDE,

$$ \dot{T}(t) \propto \kappa (\mathbf{x})N^{2} s(t) \times i(t) -\dot{T}(t- \Delta \overline{\tau}). $$
(2)

The population densities \(s(t)\) and \(i(t)\) emphasize that (2) has been constructed for an homogenous population on a surface. However, (2) is not a PDE and can predict neither spatial spread nor a spatial gradient.
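The following minimal numerical sketch models the fixed transmissible period with forward Euler time-stepping. Each host leaves T exactly \(\Delta\overline{\tau}\) after entering, so the removal at time t equals the entry at \(t - \Delta\overline{\tau}\). Several elements are assumptions, not values from [52]. These are the lumped constant `c`, the seeding of the initial hosts at t = 0 and all parameter values. The constant `c` stands in for the proportionality constant times \(\kappa(\mathbf{x})N^{2}\) divided by the squared area.

```python
import numpy as np

def histr_simulate(s0, t0, c, tau, dt=0.01, t_end=100.0):
    """Euler sketch of a compartment model with a fixed transmissible period tau.

    c is a hypothetical lumped constant, so the entry flux into T is c * S * T.
    Hosts leave T exactly tau after they entered (removal = entry one period ago).
    """
    steps = int(round(t_end / dt))
    lag = max(1, int(round(tau / dt)))         # transmissible period in time steps
    S, T, R = np.zeros(steps + 1), np.zeros(steps + 1), np.zeros(steps + 1)
    S[0], T[0] = s0, t0
    entered = np.zeros(steps + 1)              # hosts entering T during each step
    entered[0] = t0                            # assume the seed hosts became transmissible at t = 0
    for k in range(steps):
        new = min(c * S[k] * T[k] * dt, S[k])  # law-of-mass-action entry, capped by S
        entered[k + 1] = new
        j = k + 1 - lag
        leaving = entered[j] if j >= 0 else 0.0  # entries from one transmissible period ago
        S[k + 1] = S[k] - new
        T[k + 1] = T[k] + new - leaving
        R[k + 1] = R[k] + leaving
    return S, T, R

# Hypothetical run: 990 susceptibles, 10 transmissible hosts, a 5-day transmissible period.
S, T, R = histr_simulate(990.0, 10.0, c=5e-4, tau=5.0)
print(S[-1], T.max(), R[-1])
```

Because removal is scheduled rather than proportional to T, every host spends exactly the same time in the transmissible compartment. This is the step-function idealisation of transmissibility described earlier.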

The HI-STR creates another intuitive concept, the Sufficiently Isolated Population (SIP), to circumvent the problem that (2) is not a PDE. The SIP is related to the flow of information – in this instance, SARS-CoV2 Ribonucleic Acid (RNA) – onto and over a surface [215]. There are two requirements for a SIP to exist. For a completely isolated population, information never flows across the boundary of the SIP’s surface. For a completely connected region, information flows instantaneously across the boundary between regions A and B as soon as it appears in one. Clearly, each of these extremes is an idealisation. Incompletely isolated (or connected) regions are characterised by a delay in the transmission of information. This first SIP criterion can be identified retrospectively and phenomenologically, as illustrated in Fig. 3, where the daily per capita cases for six countries follow the same trajectory but with a delay. In Fig. 3 the UK is well connected to Germany, and the USA is well connected to Canada. It is also evident in Fig. 3 that India is more isolated from Italy than the UK is. In the opposite direction, it is not obvious that the UK is more isolated from India than the USA is. Restated, isolation may be directional. The sufficiency in SIP depends on the objective. For example, India may take longer than Germany to convert 10% of ward beds to intensive care unit (ICU) beds. In this context, the time to readiness may define the delay that equates to sufficient isolation. The second SIP criterion avoids a spatial gradient on the surface on which a SIP exists by requiring information to flow instantly over the surface. Effectively, this criterion is recursive (SIPs cannot exist within a SIP), and the probability of transmission is the same at all points on the surface. For HI-STR projection, the delay is of less significance; potential SIPs are instead recognised by demonstrable, distinct regional pandemic characteristics [23, 216–219].

The much broader topic of \(\mathcal{R}_{0}\) estimation is beyond the scope of this paper [7, 59, 220]. Let it suffice that Böckh’s original concept of \(\mathcal{R}_{0}\), the average number of girls produced by a female during her reproductive years, was calculated by counting births in the public record [8, 12, 220, 221]. This statistical approach to the estimation of \(\mathcal{R}_{0}\) is distinct from the stochastic models presented above. Part of the reason is that it makes no a priori assumptions about the mechanism by which pathogens (in the epidemiology analogue) propagate and spread. Here the (noun) estimate of a parameter refers to the metrics that quantify that parameter (e.g. \(\mathcal{R}_{0}\)) and the uncertainty in the quantification of that parameter. The verb estimate refers to the experimental or field epidemiology techniques used to generate an estimate. Implicitly, estimating does not prophesize. The latter also distinguishes estimating from stochastic modelling and forecasting. This paper has four aims. It reviews models that prophesize the proliferation of diseased individuals, and it contextualises the recently derived HI-STR model within a classification of these models. It uses \(\mathcal{R}_{0}\) estimates \(\widehat{\mathcal{R}}_{0}\) to validate the HI-STR’s prediction of population size- and density-dependent \(\mathcal{R}_{0}\). Finally, it demonstrates the HI-STR’s ability to project \(\mathcal{R}_{0}\) (\(\widetilde{\mathcal{R}}_{0}\)).

Restated, a spectrum of methods exists for estimating \(\mathcal{R}_{0}\). Direct counting makes no a priori assumptions about the mechanism of disease propagation, nor does it prophesize disease proliferation [222]. The most obvious and direct counting method would involve frequently, regularly, efficiently and effectively recording the state of every individual in a population with a perfect test of state (infection) [223]. At the other extreme, \(\mathcal{R}_{0}\) estimation assumes a model of disease proliferation and then uses curve-fitting to estimate parameters [215, 224–227]. In principle, the predictive models reviewed in this paper (and summarised in Fig. 1) constitute the a priori assumptions of the curve-fitting extreme. The forecasting models described above can appear as methods at either extreme, depending upon whether they make a priori assumptions about the mechanism of propagation. The curve-fitting can be either traditional (a collection of statistical methods) or AI-based, as described above.

Between the above estimation extremes are at least two types of indirect counting methods. The first counts the relevant parameter (e.g. \(\mathcal{R}_{0}\)) more efficiently than direct counting by assuming a recognised statistical distribution for a sample of that parameter [223, 228]. It assumes neither a mechanism of propagation nor a form of proliferation. The second indirect method assumes a model of propagation to relate the parameter being estimated to a proxy and then indirectly counts the proxy [211, 216, 219, 223, 226, 228–231]. For example, \(\mathcal{R}(t)\), the average number of secondary cases directly generated by a primary case, is mathematically related to the generation interval \(T_{c}\) by assuming a form for the proliferation. \(T_{c}\) is counted indirectly and efficiently by assuming a statistical distribution for the \(T_{c}\) sample [229, 232]. Finally, \(\mathcal{R}\) is inferred from the mathematical relationship between \(T_{c}\) and \(\mathcal{R}\). Several relationships \(T_{c} \sim \mathcal{R}\) exist [9, 225, 233, 234]. Two such relationships are [229]

$$ \mathcal{R}_{0} \geq \mathcal{R} = 1 + rT_{c}\quad \textrm{and}\quad \mathcal{R}_{0} \geq \mathcal{R} = e^{rT_{c}}, $$

where r is the rate of exponential growth; these relationships are a consequence of the Lotka–Euler equation [9, 229]. Other proxies include final epidemic size and equilibrium conditions such as age-independent and age-specific prevalence data [8].
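The two relationships can be evaluated directly. The linear form corresponds to an exponentially distributed generation interval with mean \(T_{c}\). The exponential form corresponds to a fixed generation interval \(T_{c}\). The doubling time and generation interval in the example are hypothetical.

```python
import math

def reproduction_number_linear(r, tc):
    """R = 1 + r * Tc."""
    return 1.0 + r * tc

def reproduction_number_exponential(r, tc):
    """R = exp(r * Tc)."""
    return math.exp(r * tc)

# Hypothetical: cases double every 3 days (r = ln 2 / 3 per day) and Tc = 5 days.
r = math.log(2) / 3
print(reproduction_number_linear(r, 5.0), reproduction_number_exponential(r, 5.0))  # roughly 2.16 and 3.17
```

The gap between the two values shows how strongly the inferred \(\mathcal{R}\) depends on the assumed shape of the generation-interval distribution, even when r and \(T_{c}\) are identical.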

In addition to the biases introduced by different estimation methods, the definitions of \(\mathcal{R}_{0}\) and their formalisations can differ. The Sharpe and Lotka [235] formalisation of Böckh’s demographic calculation [8, 12, 229] is

$$ \mathcal{R}_{0} = \int _{0}^{\infty} p(a)\beta (a)\,da. $$
(3)

In the demographic analogy, \(p(a)\) is the probability of a woman surviving to age a, and \(\beta (a)\) is the rate at which women of age a give birth to girls. In the epidemic analogue, a is the time since becoming infected and is designated the age of infection [222]. Here \(p(a)\) is the probability that a host remains a host at time a, and \(\beta (a)\) is the rate at which new hosts are generated at time a. The formalism can, in principle, be applied directly early in the epidemic when contact tracing is practical. In this context, (3) is the average number of secondary cases that a host would infect in a completely susceptible population, which is MacDonald’s epidemiological definition of \(\mathcal{R}_{0}\) [8, 12, 236]. The author has not seen MacDonald’s original paper [236], only quotations of it [8, 12].
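Equation (3) can be approximated numerically by quadrature. Purely for illustration, the sketch below assumes an exponentially declining survival \(p(a) = e^{-a/5}\) and a constant \(\beta(a) = 0.4\) per day. For these choices the exact value is \(0.4 \times 5 = 2\). The integral is truncated at a large infection age, and the function name is hypothetical.

```python
import numpy as np

def r0_sharpe_lotka(p, beta, a_max=200.0, n=200_001):
    """Trapezoidal approximation of R0 = int_0^inf p(a) beta(a) da, truncated at a_max.

    p(a)    : probability that a host is still a host at infection age a
    beta(a) : rate at which that host generates new hosts at infection age a
    Both must accept and return NumPy arrays.
    """
    a = np.linspace(0.0, a_max, n)
    f = p(a) * beta(a)
    return float(np.sum((f[1:] + f[:-1]) * np.diff(a)) / 2.0)

r0 = r0_sharpe_lotka(lambda a: np.exp(-a / 5.0), lambda a: np.full_like(a, 0.4))
print(r0)  # very close to 2.0
```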

For ODE compartment models, several formalisations of \(\mathcal{R}_{0}\) exist. For the SIR model [13, 237],

$$ \mathcal{R}_{0} = \frac{\xi (t_{0})}{\alpha}, $$

where \(\xi (t)\) is the contact rate [13] (the average number of adequate contacts that an infective makes per day [237]) or the horizontal transmission incidence [238] (infection rate of susceptible individuals through their contacts with infectives), α is the removal rate [13, 237], and \(1/{\alpha}\) is the duration of infection. This formalism and similar (e.g. the equivalent for the SEIR model) are sufficiently intuitive that the parameters can be estimated and, in turn, be used to calculate \(\widehat{\mathcal{R}}_{0}\) [8, 230, 237, 238].
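The following short sketch illustrates the SIR threshold. It assumes a constant contact rate \(\xi\) and a normalised population, with s and i expressed as fractions. Infections initially grow when \(\xi/\alpha > 1\) and decline when \(\xi/\alpha < 1\). The function names and parameter values are illustrative.

```python
def sir_r0(xi, alpha):
    """SIR basic reproduction number: contact rate divided by removal rate."""
    return xi / alpha

def early_growth(xi, alpha, i0=1e-4, dt=0.01, steps=1000):
    """Euler-integrate the normalised SIR model (s + i + r = 1) and return i at the end."""
    s, i = 1.0 - i0, i0
    for _ in range(steps):
        new_inf = xi * s * i * dt   # new infections in this step
        removed = alpha * i * dt    # removals in this step (mean duration 1/alpha)
        s, i = s - new_inf, i + new_inf - removed
    return i

print(sir_r0(0.3, 0.1), early_growth(0.3, 0.1))    # R0 about 3: infections grow
print(sir_r0(0.05, 0.1), early_growth(0.05, 0.1))  # R0 = 0.5: infections decline
```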

In a homogeneous population, one may anticipate a unimodal (and possibly symmetric) distribution of secondary infections from a sample of hosts. In such a population, a direct application of formalism (3) may calculate \(\widehat{\mathcal{R}}_{0}\). In a heterogeneous population, host samples with such unimodal distributions may be anticipated in carefully selected subpopulations, but this assumption may be implausible for the whole population [239]. Applying (3) directly would require both appropriately defined subpopulations and the appropriate formula for averaging over them [7, 59]. Diekmann et al. [240] define \(\mathcal{R}_{0}\) as the average number of new cases of an infection caused by one typical infected individual in a population consisting of susceptibles only [241]. A typical host is a distributed individual: a composite of the homogeneous subpopulations that form the heterogeneous population [240]. This \(\mathcal{R}_{0}\) is formalised as the spectral radius (dominant eigenvalue) of the Next Generation Matrix (NGM) [240]. Neither the definition nor the formalism is intuitive, and the NGM does not necessarily produce the same value as MacDonald’s definition and its corresponding formalisms [8, 240]. However, it identifies the average \(\mathcal{R}_{0}\) as the geometric mean of the subpopulations’ \(\mathcal{R}_{0}\)s [8, 239, 240], provides the formula relating the fitted parameters to \(\widehat{\mathcal{R}}_{0}\) in curve-fitting estimates [163, 227, 241], and retains the threshold (\(\mathcal{R}_{0} =1\)) between self-limitation and endemicity. Although Diekmann et al.’s definition is interpreted as applying to heterogeneous populations occupying the same geographic space [242] (like the 8-population gonorrhea model [239]), it can also be interpreted as applying to composite generations: super-generations consisting of sub-generations that contain unique hosts in the same spatial region. Examples include the two-gender gonorrhea model [239] and host-vector models [240].
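As an illustration of the NGM formalism (not taken from [239] or [240]), the sketch below computes the spectral radius of a hypothetical 2×2 NGM for two groups that only infect each other, as in a two-gender model. For such a criss-cross matrix with off-diagonal entries a and b, the eigenvalues are ±√(ab), so the spectral radius is the geometric mean of the two groups’ reproduction numbers. The matrix entries are made-up values.

```python
import cmath

def spectral_radius_2x2(m):
    """Largest eigenvalue magnitude of a 2x2 next generation matrix."""
    (a, b), (c, d) = m
    tr = a + d
    det = a * d - b * c
    disc = cmath.sqrt(tr * tr - 4 * det)  # complex sqrt handles negative discriminants
    return max(abs((tr + disc) / 2), abs((tr - disc) / 2))

# Hypothetical two-group NGM: entry [i][j] is the expected number of new
# cases in group i caused by one case in group j (groups only infect each other).
ngm = [[0.0, 4.0],
       [1.0, 0.0]]
print(spectral_radius_2x2(ngm))  # 2.0, the geometric mean of 4 and 1
```

A closed-form 2×2 solution is used instead of power iteration because the criss-cross matrix has eigenvalues of equal magnitude and opposite sign, for which plain power iteration oscillates rather than converges.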

For the HI-STR [52], two timescales are constructed. The transmissible timescale is constructed to reduce the DDE model to a more familiar ODE model. A rhythmic timescale is constructed to obviate the periodicity of the contact rate (e.g. diurnal variation). \(\mathcal{R}_{0}\) is defined as the average number of secondary cases that a host produces in a completely susceptible population while that primary host is in a transmissible state. It differs from MacDonald’s definition because the primary host does not necessarily directly infect the secondary hosts. Restated, Benjamin’s [52] definition of \(\mathcal{R}_{0}\) allows (incomplete) tertiary and quaternary generations of hosts to be included in the sum, provided that these infections occur while the primary host is still in the transmissible state. This definition was selected to be consistent with the \(\mathcal{R}_{0}\) formalism derived from the HI-STR’s system of ODEs in the transmissible timescale. Like MacDonald’s definition, it applies to a homogeneous population. Nevertheless, because the spectral radius of the NGM reduces to the ODE compartment models’ formulae for \(\mathcal{R}_{0}\) in homogeneous populations, the NGM represents an alternative formalism for the HI-STR’s \(\mathcal{R}_{0}\) in the transmissible timescale. A generation in the transmissible timescale is a compound generation consisting of \(\mathfrak{B}\) generations in the rhythmic timescale. A compound generation is a super-generation consisting of sub-generations having the same characteristics. The \(\mathcal{R}_{0}\) in the rhythmic timescale is the \(\mathfrak{B}\)th root of the \(\mathcal{R}_{0}\) in the transmissible timescale, i.e. the geometric mean of the \(\mathcal{R}_{0}\)s of the \(\mathfrak{B}\) rhythmic generations [52], which is similar to the NGM’s \(\mathcal{R}_{0}\) for composite generations.

This paper endeavours neither to compare \(\mathcal{R}_{0}\) definitions and formalisms nor to identify the most efficient and effective \(\mathcal{R}_{0}\) estimation methods. It accepts that every combination of definition, formalism and method is biased. What matters here is that the same definition-formalism-method combination is used across regions to validate HI-STR projection for COVID-19.

2 Methods

The HI-STR prototype is based on the SIR model but replaces two of its assumptions and is formulated for an isolated population on a surface [52]. Consequently,

1. the model explicitly applies only to SIPs,
2. the contact rate becomes population-density dependent, and
3. the PDE problem is replaced by SIP and SSSB recognition problems.

Hamer’s mass action law [243] assumption is replaced with the law of mass action [52], its chemistry precursor [244, 245]. A probability density function for a single successful transmission is constructed, which reflects the proximity of transmission-capable and susceptible hosts:

$$ P(t) = \eta \mu \kappa (\mathbf{x}) s(t)\tau (t), $$
(4)

where η is an infectious disease-specific variable that reflects avidity, μ is a function of the mode of transmission, \(\kappa (\mathbf{x})\) is a function of social behaviour, \(s(t)\) is the density of susceptible individuals, and \(\tau (t)\) is the density of hosts capable of transmitting the pathogen [52]. The total transmission (including transmissions by secondary hosts) in a population of size \(N \gg 1\) and population density \(\rho _{n}\), over the period Δτ̅ during which the primary host is transmission capable, is shown to be

$$ \int _{\Delta \overline{\tau}}\dot{T}(t)\,dt \approx \int _{\Delta \overline{\tau}} \eta \mu \kappa \frac{N^{2}}{2} s(t) \tau (t)\,dt = \int _{\Delta \overline{\tau}} \beta _{A} \rho _{n}^{2} S(t) T(t)\,dt, $$

where \(S(t)\) is the size of the susceptible population, \(T(t)\) is the size of the transmission-capable population, and \(\beta _{A} =\frac{\eta \mu \kappa}{2}\) [52].

The SIR model’s assumption of an exponentially distributed infectious period is replaced with the HI-STR prototype’s more biologically appropriate constant transmission period. This results in the SIR-like DDE system of equations [52]

$$\begin{aligned}& \dot{S}(t) = -\beta _{A}\rho _{n}^{2} (\mathbf{x}) S(t) T(t), \\& \dot{T}(t) = \beta _{A}\rho _{n}^{2} ( \mathbf{x}) S(t) T(t) - \dot{T}( t-\Delta \overline{\tau}), \\& \dot{R}(t) = \dot{T}(t-\Delta \overline{\tau}). \end{aligned}$$

The delay term reflects that the rate at which individuals leave a compartment is the same as that at which they entered one transmissible period (Δτ̅) ago.
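The delay term can be illustrated with a minimal forward-Euler sketch of the DDE system above. It assumes that \(\beta _{A}\rho _{n}^{2}\) is lumped into a single hypothetical constant `beta`, that the initial transmission-capable hosts all became transmission capable at t = 0, and that the step `dt` is much shorter than Δτ̅. The history list `entered` implements the delay: hosts leave T exactly one transmissible period after they entered it. This is an illustrative integrator, not the solver used in [52].

```python
def simulate_hi_str_dde(beta, s0, t0, delay_days, days, dt=0.01):
    """Forward-Euler sketch of the SIR-like DDE with a constant transmissible period.

    beta is a hypothetical lumped constant standing in for beta_A * rho_n**2.
    Hosts that become transmission capable at time t leave T at t + delay_days.
    """
    d = max(1, round(delay_days / dt))   # delay (transmissible period) in time steps
    steps = round(days / dt)
    s, t, r = float(s0), float(t0), 0.0
    entered = [float(t0)]                # assumption: initial hosts entered T at t = 0
    for k in range(1, steps + 1):
        incidence = beta * s * t * dt    # new transmission-capable hosts in this step
        leaving = entered[k - d] if k >= d else 0.0  # hosts that entered one period ago
        s -= incidence
        t += incidence - leaving
        r += leaving
        entered.append(incidence)
    return s, t, r

# Hypothetical values: 10,000 hosts, 10 initially transmission capable, 9-day period
s_end, t_end, r_end = simulate_hi_str_dde(beta=3e-5, s0=9_990, t0=10, delay_days=9, days=120)
print(round(s_end), round(t_end), round(r_end))
```

Setting `beta=0` is a quick check of the delay term: the initial hosts stay in T for exactly `delay_days` and then move to R.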

Selecting a timescale (the transmissible timescale) in which a unit of time Δt equates to Δτ̅ (\(1:\Delta t = 1: \Delta \overline{\tau}\)) renders the delay negligible, reducing the above DDE model to an ODE model [52]:

$$\begin{aligned}& {}_{\tau}\dot{S}(t) = -\beta _{A}\rho _{n}^{2} (\mathbf{x}) S(t) T(t), \\& {}_{\tau}\dot{T}(t) = \beta _{A}\rho _{n}^{2} (\mathbf{x}) S(t) T(t) - \dot{T}( t), \\& {}_{\tau}\dot{R}(t) = \dot{T}(t), \end{aligned}$$
(5)

where the subscript τ identifies the timescale being used. In keeping with the spatial scaling analogy’s terminology, the transmissible timescale is a short timescale with less temporal detail.

The short-term periodicity of infection opportunity in (4), e.g. diurnal variation in the contact rate in (5), is obviated by constructing a second timescale (the rhythmic timescale) in which the infection opportunity can be treated as constant. The rhythmic timescale is \(1:\Delta t = 1: \delta t\), where δt is the period of the infection opportunity cycle in Equation (4). For respiratory infectious diseases, δt is the host’s sleep-wake cycle (i.e. 1 day). A unit of time in the rhythmic timescale is necessarily shorter than the equivalent in the transmissible timescale because, for an infectious disease to spread, at least one cycle of infection opportunity must occur within the period that the host is infectious. In keeping with the same terminology, the rhythmic timescale is a long timescale with greater temporal detail.

Let there be \(\mathfrak{B}\in \mathbb{N}\) units of δt in Δτ̅. For temporal continuity between the timescales, the number of new transmissions after one compound generation in the transmissible timescale must equal the number after \(\mathfrak{B}\) generations in the rhythmic timescale. Equivalently, the new infections after \(\mathfrak{B}\delta t\) in the rhythmic timescale are the same as after Δτ̅ in the transmissible timescale [52]. The temporal continuity condition can be formalised as

$$\begin{aligned} {}_{\tau}T(t+ \Delta \overline{\tau}) - {}_{\tau}T(t) = {}_{\rho}T(t + \mathfrak{B}\delta t) -{}_{\rho}T(t)\quad \forall t, \end{aligned}$$

where ρ identifies the rhythmic timescale. Using a binomial expansion to enforce continuity, the transmissible timescale ODE system (5) reduces to

$$\begin{aligned}& {}_{\rho}\dot{S}(t) = - \sqrt[\mathfrak{B}]{\beta _{A}\rho _{n}^{2}N} \frac{S(t) T(t)}{N(\mathbf{x})}, \\& {}_{\rho}\dot{T}(t) = \sqrt[\mathfrak{B}]{\beta _{A} \rho _{n}^{2}N} \frac{S(t) T(t)}{N(\mathbf{x})} - \sqrt[ \mathfrak{B}]{{}_{\tau}\alpha } T(t), \\& {}_{\rho}\dot{R}(t) = \sqrt[\mathfrak{B}]{{}_{\tau} \alpha } T(t) \end{aligned}$$

in the rhythmic timescale [52], where \({}_{\tau}\alpha\) is the infection frequency in the transmissible timescale. The HI-STR model’s rhythmic timescale basic reproduction number for SIP z is then [52, 246]

$$ {}_{\rho}^{z}\mathcal{R}_{0} = \sqrt[\mathfrak{B}]{\frac{\beta _{A} \,\times \, {}^{z}\rho _{n}^{2} \,\times \, {}^{z}N}{{}_{\tau}\alpha}}. $$
(6)

Both \(\beta _{A}\) and \({}_{\tau}\alpha\) depend on social behaviour, which may be cultural [47, 57, 247]. They are assumed constant for populations with SSSB. There is a subtle difference between Böckh’s \(\mathcal{R}_{0}\) and its rhythmic timescale equivalent \({}_{\rho}\mathcal{R}_{0}\). The former counts only new infections in the second generation, whereas the latter also counts new infections in subsequent generations, provided that they are infected during the primary host’s transmissible period. Nevertheless, the two are used interchangeably here [52]. Equation (6) is recognised as the geometric mean of the \(\mathcal{R}_{0}\)s for the \(\mathfrak{B}\) generations in the rhythmic timescale that constitute a compound generation in the transmissible timescale.
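Equation (6) and its geometric-mean reading can be checked numerically. In the sketch below, the radicand of Equation (6) is read as the transmissible-timescale \(\mathcal{R}_{0}\); all parameter values and the per-generation \(\mathcal{R}_{0}\)s are hypothetical. If a compound generation’s \(\mathcal{R}_{0}\) is taken as the product of its \(\mathfrak{B}\) per-generation values (an assumption made only for this illustration), its \(\mathfrak{B}\)th root equals their geometric mean.

```python
import math
from statistics import geometric_mean

def transmissible_r0(beta_a, rho_n, n, alpha_tau):
    """Radicand of Equation (6): beta_A * rho_n**2 * N / alpha_tau."""
    return beta_a * rho_n ** 2 * n / alpha_tau

def rhythmic_r0(beta_a, rho_n, n, alpha_tau, b):
    """Equation (6): the B-th root of the transmissible-timescale R0."""
    return transmissible_r0(beta_a, rho_n, n, alpha_tau) ** (1.0 / b)

# Hypothetical per-generation R0s for B = 9 rhythmic (daily) generations
per_generation = [1.10, 1.25, 1.18, 1.20, 1.15, 1.22, 1.12, 1.19, 1.16]
compound_r0 = math.prod(per_generation)  # R0 of one compound generation
# The B-th root of the compound R0 and the geometric mean coincide
print(compound_r0 ** (1 / len(per_generation)), geometric_mean(per_generation))
```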

Dividing Equation (6) for SIP z by the same equation for an SIP y with SSSB gives

$$ \frac{{}^{z}\mathcal{R}_{0}}{{}^{y}\mathcal{R}_{0}} = \sqrt[\mathfrak{B}]{\frac{{}^{z}\rho _{n}^{2}\,\times \,{}^{z}N}{{}^{y}\rho _{n}^{2}\,\times \,{}^{y}N}} $$

This is the origin of Equation (1). It is assumed that the anglophone UK and USA have similar concepts of personal space and familiarity, with an associated hierarchy of physical interaction rituals [248], such that Equation (1) applies. A metric for SSSB was not identified. Conceivably, host social behaviour could be sufficiently similar across all SIPs. In that case, Equation (1) is a universal scaling law (independent of social behaviour) and should be compared with Cardoso and Gonçalves’ universal scaling law [22], which was obtained by regression. The UK and USA were selected to increase the likelihood of a successful validation. From Fig. 3, projection should give the states most connected to the UK an additional week to prepare, based on \(\widetilde{\mathcal{R}}_{0}\). The ancestral SARS-CoV2 pathogen was selected because transmission dynamics data were available, there was no interference from VOCs, and the ICL group used the same \(\mathcal{R}_{0}\) estimation method for the ancestral SARS-CoV2 in both the USA and the UK.

The ICL group’s statistical estimates of \(\mathcal{R}_{0}\) for the wild-type SARS-CoV2 in the UK and the states of the USA are used to validate the HI-STR’s projection from the UK to these USA states. The ICL group estimated the ancestral SARS-CoV2’s \(\widehat{\mathcal{R}}_{0}\) for the UK [211] and for the individual states of the USA [219] by counting a proxy: reported mortality. For the UK, the estimates counted reported COVID-19 deaths from February 2020 to 4 May 2020 [211]. The USA estimates extend the UK method by counting \(100{,}506\) deaths and \(479{,}422\) cases due to COVID-19 from 11 May 2020 to 1 June 2020 for each state [219]. Their semi-mechanistic Bayesian hierarchical model is sensitive to the generation interval [211], but the same gamma distribution with mode 6.5 days was used for the \(\widehat{\mathcal{R}}_{0}\) of both the UK and the USA’s states. Consequently, biases that different field and estimation methods would introduce are avoided.

From Equation (1),

$$ {}^{z}\widetilde{\mathcal{R}}_{0} = \sqrt[\mathfrak{B}]{\frac{{}^{z}\hat{\rho}_{n}^{2}\,\times \,{}^{z}\widehat{N}}{{}^{UK}\hat{\rho}_{n}^{2}\,\times \,{}^{UK}\widehat{N}}} \times{}^{UK}\widehat{ \mathcal{R}}_{0}. $$
(7)
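Equation (7) can be sketched in a few lines. The sketch assumes that source and target population densities are expressed in the same units (km⁻² here), since only their ratio enters the projection. The UK inputs are those reported in Sect. 3.1, with \(\mathfrak{B}(wt)=9\) from Appendix A; the target state’s population and density are hypothetical placeholders, and `project_r0` is an illustrative name.

```python
def project_r0(r0_source, n_source, rho_source, n_target, rho_target, b):
    """Equation (7): project a source region's R0 estimate onto a target SIP.

    Densities must share units; b is the scaling factor between timescales.
    """
    ratio = (rho_target ** 2 * n_target) / (rho_source ** 2 * n_source)
    return ratio ** (1.0 / b) * r0_source

# UK inputs from Sect. 3.1; the target state values are hypothetical placeholders
uk_r0, uk_n, uk_rho = 3.8, 67_886_011, 280.6
state_r0 = project_r0(uk_r0, uk_n, uk_rho, n_target=5_000_000, rho_target=40.0, b=9)
print(round(state_r0, 2))
```

Because the ratio enters through a \(\mathfrak{B}\)th root, large differences in population size and density translate into comparatively modest differences in the projected \(\widetilde{\mathcal{R}}_{0}\).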

The paired Student t-test is used to compare the USA basic reproduction number estimate for state z \(( {}^{z}\widehat{\mathcal{R}}_{0} )\) [219] to the UK’s projection on z \(( {}^{z}\widetilde{\mathcal{R}}_{0} )\).
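The analysis itself was run in R (e.g. `t.test(x, y, paired = TRUE)`). The stdlib-only Python sketch below shows the computation behind a paired t-test and returns the quantities reported in Table 1: the mean of differences \(\mu _{d}\), the t statistic and the 95% confidence interval \(CI_{d}\). The default critical value assumes \(N=40\) pairs (39 degrees of freedom); a p-value would additionally need a t-distribution CDF, such as `scipy.stats.t.sf`, which is omitted here.

```python
import math
from statistics import mean, stdev

def paired_t_test(estimated, projected, t_crit=2.0227):
    """Paired Student t statistic and 95% CI of the mean difference.

    t_crit defaults to the two-sided 95% critical value for 39 degrees of
    freedom (n = 40 pairs); pass another value for other sample sizes.
    """
    if len(estimated) != len(projected) or len(estimated) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [e - p for e, p in zip(estimated, projected)]
    mu_d = mean(diffs)
    se = stdev(diffs) / math.sqrt(len(diffs))  # standard error of the mean difference
    t_stat = mu_d / se
    return mu_d, t_stat, (mu_d - t_crit * se, mu_d + t_crit * se)

# Hypothetical example with four pairs (not the paper's data); t_crit for 3 df
print(paired_t_test([3.1, 2.8, 3.5, 2.9], [3.0, 2.6, 3.2, 3.1], t_crit=3.182))
```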

Fortuitously, the UK and USA subsequently embarked on similar COVID-19 interventions [247]. Having demonstrated that projection can be performed between the UK and the USA’s states, a contender VOC common to the UK and USA that rapidly replaced an incumbent was therefore selected to demonstrate that projection can also be applied to a VOC. For this implementation of projection, rapid replacement (minimal mixing) is required to ensure that the SARS-CoV2 family’s reproduction number \(\widehat{\mathcal{R}}(t)\), estimated when the contender VOC emerges at \(t = t_{0}\), represents that VOC’s \(\widehat{\mathcal{R}}_{0}\). The Delta (B.1.617.2) variant was selected because it replaced the incumbent Alpha (B.1.1.7) variant in both the UK [32, 249] and the USA [35, 47, 250] within a month, as described in Appendix D.

Vaccination protects against symptomatic SARS-CoV2 Delta infection [251] but is less effective against Delta infection itself [252–255], depending on the time since vaccination [256, 257]. Because vaccinated hosts could therefore still be infected, this finding supports the argument that \(\mathcal{R}(t_{0})\) is equivalent to \(\mathcal{R}_{0}\) for the SARS-CoV2 Delta variant when it emerged.

Variant sequencing data from a private organisation [250] were used to establish \(t_{0}\), the date of the transition to Delta, for each state of the USA. These data were selected because they provided state-level variant sequencing data for the USA. Variant prevalence data can be found at https://public.tableau.com/ [258]. The threshold prevalence for \(t_{0}\) was arbitrarily set as the Delta variant representing 20% of the sequenced SARS-CoV2 genomes. A threshold of 20% was considered small enough to satisfy the condition that the population is completely susceptible to the Delta variant. Conversely, given that most time series treat COVID-19 as one disease with one time series for all SARS-CoV2 variants, 20% was considered large enough for a substantial portion of a combined \(\widehat{\mathcal{R}}(t)\) to be due to the Delta variant. It is recognised that these criteria introduce bias.
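The threshold rule for \(t_{0}\) can be sketched as a scan over dated prevalence observations. Two behaviours here are assumptions, not the paper’s stated procedure: a crossing is treated as undeterminable if the first observation is already above the threshold, or if the gap before the crossing exceeds a hypothetical `max_gap_days`. This mirrors, only approximately, the exclusion of states whose data did not cover the 20% transition.

```python
from datetime import date

def transition_date(prevalence_by_date, threshold=0.20, max_gap_days=14):
    """First date on which the Delta share of sequenced genomes reaches the threshold.

    Returns None when the crossing is not bracketed by data (assumed rule).
    """
    days = sorted(prevalence_by_date)
    for prev, day in zip([None] + days, days):
        if prevalence_by_date[day] >= threshold:
            if prev is None or (day - prev).days > max_gap_days:
                return None  # crossing not observed closely enough to date it
            return day
    return None  # threshold never reached in the available data

# Hypothetical weekly Delta shares for one state (not real data)
shares = {date(2021, 5, 1): 0.05, date(2021, 5, 8): 0.12,
          date(2021, 5, 15): 0.22, date(2021, 5, 22): 0.41}
print(transition_date(shares))  # 2021-05-15
```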

For the Delta variant, \(\mathcal{R}_{0}\) was also estimated by counting a proxy, and these estimates were used to validate the HI-STR’s VOC \(\widehat{\mathcal{R}}_{0}\) projection from the UK to the USA. Yap and Yong [216] smooth the reported case time series with a moving average. A Poisson distribution then constructs a symptom onset time series from the reported case time series. A second Poisson distribution constructs a date-infected time series (\(k_{t}\)). A third Poisson distribution is fitted to the date-infected time series to estimate \(\lambda _{t}\), the expected infections for the day. Given the previous day’s estimated new infections (\(k_{t-1}\)), the formula relating \(\mathcal{R}\) to \(\lambda _{t}\) and \(k_{t-1}\) is

$$ \lambda _{t} = k_{t-1}e^{\alpha (\mathcal{R}-1)}, $$

where \(\alpha ^{-1}\) is the infectious period. The https://cv19.one [216, 259] data repository was selected to provide \(\widehat{\mathcal{R}}_{0}(\Delta )\), the Delta variant’s \(\widehat{\mathcal{R}}_{0}\). This selection ensures that the same method is used to estimate the UK’s \(\widehat{\mathcal{R}}(t,\Delta )\) and the state-level USA \(\widehat{\mathcal{R}}(t,\Delta )\) [216, 259], with \(\widehat{\mathcal{R}}(t_{0},\Delta ) \approx \widehat{\mathcal{R}}_{0}( \Delta )\) under the minimal-mixing assumption.
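Solving \(\lambda _{t} = k_{t-1}e^{\alpha (\mathcal{R}-1)}\) for \(\mathcal{R}\) gives \(\mathcal{R} = 1 + \ln (\lambda _{t}/k_{t-1})/\alpha \). The sketch below is a simplified illustration, not Yap and Yong’s full pipeline: it keeps the moving-average smoothing and this inversion, and replaces the three Poisson steps with the assumption \(\lambda _{t} \approx k_{t}\) (the maximum-likelihood estimate of a Poisson mean from a single count). The window length, infectious period and counts are hypothetical.

```python
import math

def moving_average(series, window=7):
    """Trailing moving average used to smooth a daily count series."""
    return [sum(series[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(series))]

def r_from_counts(k_prev, lambda_t, infectious_period):
    """Invert lambda_t = k_prev * exp(alpha * (R - 1)) with alpha = 1 / infectious period."""
    if k_prev <= 0 or lambda_t <= 0:
        raise ValueError("counts must be positive")
    alpha = 1.0 / infectious_period
    return 1.0 + math.log(lambda_t / k_prev) / alpha

# Hypothetical daily infection counts (not real data)
counts = [120, 130, 128, 140, 150, 149, 160, 170, 176, 181, 190]
smoothed = moving_average(counts, window=7)
# Assumption: lambda_t is approximated by the smoothed count for day t
r_series = [r_from_counts(prev, cur, infectious_period=5.0)
            for prev, cur in zip(smoothed, smoothed[1:])]
print([round(r, 2) for r in r_series])
```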

An independent \({}^{UK}\widehat{\mathcal{R}}_{0}(\Delta )\) of 1.44–1.5 [249, 260], with 95% CI 1.2–1.75 [260], is provided for comparison. The statistical analysis was conducted in the open-source R Project for Statistical Computing (https://www.r-project.org).

The values of \(\mathfrak{B}\) for the ancestral SARS-CoV2 and the Delta variant are estimated in Appendices A and D, respectively. The transmissible periods are estimated as the median of estimates from other studies, with neither weighting nor confidence intervals.
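The step from transmissible-period estimates to \(\mathfrak{B}\) can be sketched as follows: take the unweighted median of the period estimates as Δτ̅ and count how many infection-opportunity cycles δt fit in it. Rounding to the nearest whole number (Python’s `round`, which rounds halves to even) and the floor of 1 are assumptions, since the text only requires \(\mathfrak{B}\in \mathbb{N}\); the period estimates are hypothetical.

```python
from statistics import median

def scaling_factor(period_estimates_days, cycle_days=1.0):
    """B: whole number of infection-opportunity cycles (delta t) in the transmissible period."""
    delta_tau = median(period_estimates_days)   # unweighted median, no confidence intervals
    return max(1, round(delta_tau / cycle_days))  # assumed rounding rule; B >= 1

# Hypothetical transmissible-period estimates (days) and a 1-day sleep-wake cycle
print(scaling_factor([8.0, 9.0, 10.5, 7.0, 9.5]))  # 9
```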

3 Results

The parameter estimations performed for both the wild-type SARS-CoV2 and the SARS-CoV2 Delta variant assume no inherent immunity to COVID-19. In immunology, innate immunity refers to a static, generic immune system that non-specifically targets any pathogen invasion. The adaptive immune system generates variations that specifically target a particular invasive pathogen. Natural immunity to a pathogen is due either to previous exposure to that pathogen or to cross-reactivity from exposure to another pathogen, i.e. heterologous immunity [261]. To avoid confusion with these terms, inherent immunity here refers to a genetic (inborn) resistance to, or attenuation of, certain diseases. Examples of inherent immunity are mutations to the CCR5 receptor on cells that render some individuals immune to infection by HIV [262]; the haemoglobinopathy-malaria hypothesis, which proposes that the high haemoglobinopathy carrier prevalence in some populations may exist because this carrier state protects against malaria [263]; and cystic fibrosis, which may offer carriers protection against cholera [264]. Immunity can be natural, inherent or due to vaccination. Vaccination can also confer heterologous immunity.

3.1 The ancestral SARS-CoV2

The transmission dynamics, distribution of pathology, case fatality rate and other clinical, pathological and epidemiological characteristics associated with the ancestral (wild-type) SARS-CoV2 variant are collectively designated COVID-19(wt). Appendix A demonstrates that the transmissible timescale for COVID-19(wt) is \(1:9\) days, and the rhythmic timescale is \(1:1\) day. The scaling factor between these timescales (\(\mathfrak{B}(wt)\)) is 9.

For the UK, \({}^{UK}\widehat{\mathcal{R}}_{0}(wt) = 3.8\) (3.0–4.5) [211]; in 2020, \({}^{UK}\widehat{N} = 67{,}886{,}011\) and \({}^{UK}\hat{\rho}_{n} = 280.6~\mathrm{km^{-2}}\) [265]. Equation (7) projects this \({}^{UK}\widehat{\mathcal{R}}_{0}(wt)\) onto the states of the USA. Appendix B removes outliers among these projections, reducing the sample size to 40 states.

Figure 4 compares the estimated basic reproduction number \(\widehat{\mathcal{R}}_{0}(wt)\) density distribution [219] of the remaining 40 states to those projected from the UK’s wild type SARS-CoV2 estimate \({}^{UK}\widehat{\mathcal{R}}_{0}(wt)\) [211]. Figure 4(a) projects the median UK estimate \(({}^{UK}\widehat{\mathcal{R}}_{0}(wt) = 3.8 )\) for the wild-type SARS-CoV2, whereas Fig. 4(b) projects \({}^{UK}\widehat{\mathcal{R}}_{0}(wt) = 4.2\). The latter remains within the uncertainty of the UK’s \(\widehat{\mathcal{R}}_{0}(wt)\) estimate [211].

Figure 4

(a) Density distribution comparison between estimated wild-type SARS-CoV2 \(\mathcal{R}_{0}\) and the median estimated wild-type SARS-CoV2 \(\mathcal{R}_{0}\) for the UK projected onto the USA’s states. (b) Box-and-whisker plot comparison between estimated wild-type SARS-CoV2 \(\mathcal{R}_{0}\) and a wild-type SARS-CoV2 \(\widehat{\mathcal{R}}_{0} = 4.2\) for the UK projected onto the USA’s states

Table 1 summarises the results of the paired Student t-test comparing the \(\widehat{\mathcal{R}}_{0}(wt)\)s [219] of the 40 USA states \(({}^{z}\widehat{\mathcal{R}}_{0}(wt):1\leq z\leq 40, z \in \mathbb{N} )\) to the UK’s projections \((\widetilde{\mathcal{R}}_{0}(wt) )\) for \(3.0 \leq {}^{UK}\widehat{\mathcal{R}}_{0}(wt) \leq 4.5 \) [211] onto those 40 states. For \(4.2 \leq {}^{UK}\widehat{\mathcal{R}}_{0}(wt) \leq 4.5\), a statistically significant difference between the \({}^{z}\widehat{\mathcal{R}}_{0}(wt)\) and \({}^{z}\widetilde{\mathcal{R}}_{0}(wt)\) samples does not exist for those 40 states. For \(3.0 \leq {}^{UK}\widehat{\mathcal{R}}_{0}(wt) \leq 4.1\), although there is a statistically significant difference between the \(\widehat{\mathcal{R}}_{0}(wt)\) and \(\widetilde{\mathcal{R}}_{0}(wt)\) samples, this difference is not epidemiologically significant when compared to the uncertainty in \({}^{z}\widehat{\mathcal{R}}_{0}(wt) \) [219]. An epidemiologically significant change is one greater than the uncertainty in the estimate.

Table 1 Comparison of paired Student t-test results between estimated and projected wild-type SARS-CoV2 basic reproduction numbers in the USA for various UK basic reproduction number estimates for the wild-type SARS-CoV2. \(\mu _{d}\) is the mean of differences. \(CI_{d}\) is the 95% confidence interval of the differences. \(N=40\)

The parameter estimates underlying \(\mathfrak{B}(wt)\) vary considerably (Tables 2 and 3). Appendix C is a sensitivity analysis demonstrating that, up to an inherently immune fraction of 50%, the change in the \({}^{z}\mathcal{R}_{0}(wt)\) projections is not epidemiologically significant (Fig. 5(b)). Similarly, changing the symptomatic fraction causes no significant change in \(\mathcal{R}_{0}(wt)\) relative to the uncertainty in the estimate (Fig. 5(a)). The infectious and transmissible periods are varied in Figs. 5(c) and (d), respectively.

Figure 5

Sensitivity analysis for COVID-19(wt): (a) Symptomatic fraction; (b) Inherently immune fraction; (c) Infectious period; (d) Transmissible period

Table 2 Asymptomatic prevalence for COVID-19(wt)
Table 3 Median latent, incubation, infectious and transmission periods for symptomatic and asymptomatic COVID-19(wt) patients from early 2020. \(A = \mathit{asymptomatic}\), \(d=\mathit{days}\)

Although neither the symptomatic fraction nor the inherently immune fraction has a significant effect on \(\widetilde{\mathcal{R}}_{0}(wt)\) relative to the uncertainty in the ICL’s \(\widehat{\mathcal{R}}_{0}(wt)\) for the UK and the states of the USA, the HI-STR predicts that increasing the symptomatic fraction decreases \(\mathfrak{B}(wt)\) and, consequently, \(\mathcal{R}_{0}(wt)\) by increasing the contribution of those with a shorter transmissible period (Fig. 5(a)). It confirms that increasing the inherently immune fraction reduces \(\mathcal{R}_{0}(wt)\) (Fig. 5(b)). As expected, increasing the infectious or transmissible period increases \(\mathfrak{B}(wt)\) and therefore \(\mathcal{R}_{0}(wt)\) (Figs. 5(c) and (d)).

3.2 The SARS-CoV2 Delta variant

Appendix D calculates that \(\mathfrak{B}(\Delta )=6\) when 40% of the UK and USA populations were fully vaccinated against COVID-19.

The ‘COVID-19 HeatMap’ dashboard [216, 259] was used to look up \(\widehat{\mathcal{R}}_{0}(\Delta )\). This database relates reported cases to infection dates via Poisson distributions, and the infection date time series is related to \(\mathcal{R}(t,\Delta )\). It was selected because it provides both state- and country-level \(\widehat{\mathcal{R}}(t,\Delta )\) using the same method, avoiding method biases in \(\mathcal{R}(t,\Delta )\) estimation.

The Delta variant transition date for an SIP was arbitrarily set as the date on which the Delta variant represents 20% of the SARS-CoV2 Reverse Transcriptase Polymerase Chain Reaction (RT-PCR) sequences for that SIP. From Fig. 10(a) in Appendix D, the UK’s transition date (\(t_{0}\)) is May 15, 2021 [249, 260]. On May 15, 2021, the ‘COVID-19 HeatMap’ dashboard [216, 259] gives the UK’s \(\widehat{\mathcal{R}}(t_{0}) = 1.5 \approx {}^{UK} \widehat{\mathcal{R}}_{0}(\Delta )\). The ‘our world in data’ dashboard [212] and ‘COVID-19 HeatMap’ dashboard [216, 259] estimates for May 2021 are within the 95% confidence interval of \({}^{UK}\widehat{\mathcal{R}}_{0}(\Delta )\) [249, 260]. The population size and density estimates for the UK and the USA states used for the ancestral SARS-CoV2 were reused for the SARS-CoV2 Delta variant’s projection.

The date of the transition to the Delta variant could be established for only 11 states of the USA. The sample is small because the private organisation’s [250] variant sequencing is opportunistic (not from a screening program), and several states had no data for the periods that included the transition prevalence of 20%. \({}^{UK}\widehat{\mathcal{R}}_{0}(\Delta )\) was projected onto these states using (7), with \({}^{UK}\widehat{\mathcal{R}}_{0}(\Delta )\) as the independent variable. These projections are compared with the ‘COVID-19 HeatMap’ estimates [216, 259] in Fig. 6.

Figure 6

Comparison of the UK’s effective reproduction number projection at the start of the Delta wave onto 11 states of the USA with the Delta wave estimates for those states

4 Discussion

COVID-19 is the collective pathological manifestation of the ancestral SARS-CoV2 and its variants. New variants have the potential to supplant pre-existing variants. Projection provides an efficient method to anticipate location- and variant-specific resource requirements. The HI-STR has demonstrated that projection can foretell the impact of a pathogen variant (the ancestral SARS-CoV2) on the individual states of the USA, provided that an estimate exists for the UK. This was possible because the HI-STR accounts for the effect of population characteristics on the basic reproduction number \(\mathcal{R}_{0}\). These regions were selected because it is assumed that the individual states of the USA can be approximated as SIPs and that these anglophone regions possess SSSB.

The HI-STR prototype does not include the effect of demography [266] on \(\mathcal{R}_{0}\) estimation and projection, but age-stratified SIR models can be adapted for the HI-STR. Genetically or behaviourally predisposed individuals also represent subpopulations that affect the average transmission period. For COVID-19, diabetics are a subpopulation at increased risk of severe disease and death [267, 268]. The opportunity exists to extend the HI-STR to heterogeneous populations. Clearly, neither the UK nor the individual states of the USA are homogeneous, but an accepted collection of public social behavioural norms must exist in each instance.

Hawaii, Montana, Alaska and Wyoming are among the outliers, and the HI-STR model’s \(\widetilde{\mathcal{R}}_{0}\) is an overestimate for these states. For Hawaii, the sea acts as a natural barrier between SIPs. Because the HI-STR is nonlinear, such SIPs cannot simply be combined; combining SIPs results in \(\widetilde{\mathcal{R}}_{0}\) overestimates. Communities within Alaska, Montana and Wyoming may be sufficiently isolated to be treated as separate SIPs. Conversely, states like New York and Washington, DC, may be insufficiently isolated. Neither the method for averaging \(\widehat{\mathcal{R}}_{0}\) across SIPs nor the determination of \(\widehat{\mathcal{R}}_{0}\) across SIPs is obvious.

The transition to the SARS-CoV2 Delta variant could only be identified for 11 states of the USA, and the statistical significance of the comparison between the estimated and projected basic reproduction numbers for these states could not be determined. Nevertheless, it has been demonstrated that, for minimal-mixing VOC projection, the SARS-CoV2 Delta variant’s \(\widehat{\mathcal{R}}_{0}\) for the UK could have been projected onto at least 11 states of the USA.

This paper’s motivation is to anticipate, and prepare for, the local impact of a novel pathogen or a new VOC. Implicitly, each variant is treated as a new pathogen to which the local population is completely susceptible. The SARS-CoV2 variants are sufficiently closely related that both vaccination and previous infection by the incumbent may confer immunity to the new variant in some individuals. Thus the projection represents an upper bound in which the challenger VOC replaces the incumbent [51]. This model does not address equilibrium states in which VOCs form a mixture [39]. Intuitively and theoretically, inherently and naturally immune individuals should affect the transmission dynamics of the variant and the transmissible period. Given the uncertainty in the wild-type SARS-CoV2 \(\widehat{\mathcal{R}}_{0}\), an epidemiologically significant impact could not be demonstrated here.

Intuitively, asymptomatic carriers increase the reproduction number [269]. Uniquely, the HI-STR predicts this phenomenon (see Appendix C and Fig. 5(a)), but given the uncertainty in the ancestral SARS-CoV2’s \(\mathcal{R}_{0}\) estimates, epidemiological significance could not be demonstrated. For diseases in which symptoms correlate with mortality, an intervention that only converts symptomatic individuals into asymptomatic individuals may reduce mortality. Ironically, the theory predicts that such an intervention will increase \(\mathcal{R}_{0}\) [270]. Conversely, the HI-STR model explains why the SARS-CoV2 Delta variant’s higher symptomatic ratio [270] is associated with lower reproduction numbers.

5 Conclusion

When confronting a novel pathogen, the impact of the disease has to be foretold to prepare accordingly. Some of these impacts are the basic reproduction numbers (a proxy for how fast the disease will spread), mortality and morbidity. Some of the impacts that are beyond the scope of this document are the economic and socio-political instabilities caused by the disease and interventions, as well as pathogen evolution.

The HI-STR prototype is a deterministic alternative to the SIR prototype. In principle, it has two advantages over the more mature deterministic compartment models: it incorporates the population size and density in the model, and it acknowledges and includes the impact of social behaviour. The inclusion of social behaviour is controversial, but the HI-STR has the flexibility to include it. It may be that physical interaction across regions and cultures is sufficiently similar (from an infectious disease perspective) for this variable to be treated as a constant. In that case, the HI-STR derives a population size- and density-dependent universal scaling law for \(\mathcal{R}_{0}\).

Projection allows region-specific planning and pre-emptive resource allocation. This region-specific \(\mathcal{R}_{0}\) provides a baseline to compare interventions. This paper demonstrates that the HI-STR model can project the UK’s ancestral SARS-CoV2’s \(\widehat{\mathcal{R}}_{0}\) and the SARS-CoV2 Delta variant’s \(\widehat{\mathcal{R}}_{0}\) onto the states of the USA. Applicability in other anglophone and non-anglophone regions remains to be demonstrated.

The long-term success of an intervention depends on both the policy [271, 272] and its implementation [273, 274]. Policies and strategies can only be evaluated retrospectively [275–278] because of unforeseen long-term risks [279–283], low-probability high-impact events [284–287] and unintended consequences [288–294]. Nevertheless, projection provides timeous local baselines for comparing the implementation of similar policies across SIPs with SSSB.

The HI-STR model, like other ODE compartment models, does not predict COVID-19’s waves. COVID-19 has demonstrated that some of these waves may be due to new variants outcompeting incumbents [30]. Although the HI-STR model does not incorporate pathogen evolution and random events like VOCs, here (for the SARS-CoV2 Delta variant) it has been demonstrated that an impact of such a random event can still be projected timeously.