Increasing heroin use is a major contributor to the global opioid crisis. Rates of heroin use, dependence, and death have grown dramatically over the past decade, and the burden of disease associated with opioid use disorder is greater than any other illicit drug class (Degenhardt et al., 2013, 2019; National Institute on Drug Abuse, 2021). In the USA, national opioid-related deaths more than doubled between 2010 and 2019 (WONDER, 2020) and more than 28% of overdose deaths in 2019 involved heroin (WONDER, 2021). While the prevalence of opioid-related incidents across Europe are comparatively lower than the USA, opioid-related deaths in Scotland surged between 2009 and 2018, exceeding those in the USA (29% vs. 14% in 2018), and 45% of drug-related deaths involved heroin (van Amsterdam et al., 2021). In Australia, opioids (including heroin) have been the leading cause of drug-related deaths for the past two decades (Australian Institute of Health & Welfare, 2021b). One in four drug-induced Australian deaths in 2019 was due to heroin; the highest since 1997 (Chrzanowska et al., 2021). This global trend has continued despite decades of research demonstrating the effectiveness of a range of treatments available for heroin dependence, supported by numerous high-quality randomised-controlled trials (RCTs) and naturalistic longitudinal studies (Degenhardt et al., 2019; Hser et al., 2015; Mattick et al. 2009a; Mattick et al., 2014).

Research has also revealed a range of risk and protective factors for relapse, remission, and other outcomes (Chen et al., 2019, 2020; Darke, 2011; Mattick et al. 2009b). For example, poorer treatment outcomes have been associated with more severe heroin use, more extensive polydrug use (Chen et al., 2019; Darke et al., 2007; Hser et al., 1999), criminal involvement (Flynn et al., 2003; Strang et al., 2020), psychological distress, and poorer mental health (Hser, 2007; Schaar & Öjehagen, 2001; Strang et al., 2020). In contrast, ‘treatment readiness’ has been linked to improved client outcomes (Darke et al., 2005; Flynn et al., 2003). Long-term stable treatment, across fewer treatment episodes, has been associated with improved outcomes, regardless of treatment modality (Flynn et al., 2003; Gossop et al., 1999; Teesson et al., 2015), with the exception of detoxification. Due to its nature as a short-term intervention, detoxification is not associated with improved outcomes; on the contrary, repeated episodes of detoxification have been associated with poorer outcomes (Teesson et al., 2015). As with other long-term, stable treatment approaches, the evidence regarding residential rehabilitation demonstrates its efficacy across a range of outcomes (Darke et al., 2006; de Andrade et al., 2019; Gossop et al., 1999).

However, with the expansion in new knowledge comes the complex challenge for clinicians in how to synthesise and integrate the evolving evidence base to guide clinical decision-making in the provision of personalised healthcare—that is, the selection and tailoring of interventions to ensure that the ‘right treatment’ is delivered to the ‘right person’ at the ‘right time’ (Bates, 2010; Primary Health Advisory Group 2016). There is a critical need for easy-to-use clinical tools that synthesise research evidence and provide clinically meaningful information that can be applied in busy practice settings.

Delivering the right treatment to a client based on individual characteristics and personalised decision-making maximises the likelihood of positive outcomes and reduces the costs and negative side effects associated with using inappropriate treatments (Bates, 2010). Personalised healthcare is based on a solid foundation of evidence-based clinical practice that acknowledges variability in patient response to treatments and seeks to incorporate client-specific factors as prognostic indicators (Bates, 2010). Prediction models support clinical decision-making by estimating the risk that a positive (or negative) outcome will occur within a designated timeframe, in a person with particular objective risk factors (Moons et al., 2009). They are fundamental to guiding shared-decision-making between clinicians and healthcare consumers (Moons et al., 2009; Steyerberg, 2019), in the selection of personalised, well-timed, and targeted interventions, care coordination, and referral.

The past two decades have seen rapid growth in the development and advancement in machine learning, with > 600 risk prediction models across medical fields including outcomes related to cardiovascular disease (e.g., Damen et al., 2016), falls after stroke (e.g., Walsh et al., 2016), diabetes (e.g., Yang et al., 2021; Zhang et al., 2016), cancer (e.g., Lee et al., 2019; Usher-Smith et al., 2016), kidney injury (Wilson et al., 2016), and heart failure (e.g., Adler et al., 2020; Sahle et al., 2017). A small number of risk-prediction models have been developed in relation to opioid use; however, these have largely been restricted to examinations of the risk of overdose among people using prescription opioids for pain (Glanz et al., 2018; Klimas et al., 2019; Lo-Ciganic et al., 2019). Despite the wealth of evidence regarding risk and protective factors that contribute to the maintenance and recovery from heroin dependence, as well as other outcomes including fatal and non-fatal overdose, mortality, and co-occurring psychopathology (Chen et al., 2019, 2020; Darke et al., 2014, 2016; Davoli et al., 2007; Hser et al., 2001, 2017; Jimenez-Treviño et al., 2011; Strang et al., 2020; Teesson et al., 2017), there are currently no existing models to assist clinicians utilise and integrate this critical information into their practice and guide treatment selection.

The present study aimed to address this gap using data from the 11-year follow-up of the Australian Treatment Outcome Study (ATOS), the largest and longest prospective longitudinal study of heroin dependence conducted in Australia, with a view to developing a clinical risk prediction model to assist clinicians calculate the risk of a range of heroin-related outcomes at varying follow-up intervals for their clients, depending on baseline information. This information is critical to guiding clinicians in selecting the right treatment, for the right person at the right time, potentially transforming healthcare for people with heroin dependence.

Method

Procedure

The ATOS cohort comprised 615 people with heroin dependence. Baseline data were collected between 2001 and 2002 from 615 people with heroin dependence, 535 of whom were recruited from 19 agencies treating heroin dependence in the greater Sydney region, Australia (201 entering maintenance therapies, 201 entering detoxification, 133 entering residential rehabilitation) (Ross et al., 2005). A comparison group of 80 people who were not entering treatment were recruited from needle and syringe programs. Follow-up interviews were conducted with 549 (89.3%), 495 (80.5%), 469 (76.3%), 429 (69.8%), and 431 (70.2%) people at 3, 12, 24, 36 months and 11 years, respectively, with 94.5% completing at least one follow-up interview. A search of the Australian National Death Index, a statutory register of all deaths in Australia administered by the Australian Institute of Health and Welfare (AIHW), was conducted in July, 2015, and details of deaths that had occurred among participants since 2001 obtained. Participants were matched by full name, sex, and date of birth.

Factors associated with study retention were examined by Teesson et al (2015), and included index treatment modality, age, sex, previous treatment history, number of heroin use days in the preceding month, number of drug types used in the preceding month, major depression, post-traumatic stress disorder (PTSD), antisocial personality disorder (ASPD), and borderline personality disorder (BPD). The final model revealed that the sample re-interviewed at 11 years was broadly representative of the initial cohort; the only independent predictors of loss to follow-up were male sex (67.8 vs. 74.5%; OR = 0.65, 95% CI = 0.44, 0.99) and a diagnosis of major depression at baseline (64.9 vs. 71.8%; OR = 0.67, 95% CI = 0.45, 0.99). ASPD was retained in the final model but was not related significantly to retention at 11 years (p = 0.074). Ethical approval was obtained from the University of New South Wales Human Research Ethics Committees and participating area health services.

Measures

Structured interview

Structured interviews were administered to participants at baseline and at each follow-up utilising measures with established psychometric properties and have been described in detail previously (Ross et al., 2005). In brief, baseline and follow-up interviews included demographic (age, sex, past month main source of income, and past month accommodation), treatment history, drug use history, injection-related health, physical and mental health, and psychopathology. Past-month drug use and criminal involvement were assessed using the opiate treatment index (OTI) (Darke et al., 1992). DSM-IV diagnoses of current heroin dependence and past-month major depression were obtained using the Composite International Diagnostic Interview version 2.1 (CIDI) (World Health Organisation 1993). Lifetime diagnoses of PTSD, ASPD, and BPD were assessed at baseline. DSM-IV diagnoses of ASPD were obtained using a modified version of the Diagnostic Interview Schedule (Robins et al., 1981); participants were screened for ICD-10 BPD using the International Personality Disorders Examination Questionnaire (Loranger et al., 1994); general mental health was measured using the Short Form-12 (SF-12) (Ware et al., 1996); and DSM-IV diagnoses of lifetime PTSD were obtained using the CIDI (World Health Organisation 1993). Demographics, drug use, criminal involvement, general health, and depression were reassessed at each follow-up, in addition to the number of times participants had commenced treatment for heroin dependence and the treatment duration (Teesson et al., 2008). To maximise participant recall, the time-line follow-back (TLFB) method was used with the 11-year interviews, which anchors interview questions to significant events in participants’ lives (e.g., significant relationships or separations, births of children, deaths) (Hunt & Andrews, 1995). The life chart was conducted at the beginning of the interview, and participants were referred back to the chart when further questions were asked that required the recall of dates.

Statistical analyses

Machine-learning algorithms include a set of predictive techniques, which are particularly useful when there are several predictor variables (i.e., high-dimensional) and the focus is on the generalisability and reliability of prediction. Guidelines from the National Institute on Drug Abuse (NIDA) highlight machine learning as ‘the most promising approach to identify predictive markers for psychiatric disorders and classify psychiatric populations with high-dimensional data’. For the prediction modelling, data preparation included standardisation of continuous predictors and dummy coding of categorical variables. Moreover, as several outcomes presented in the current study are considered ‘rare events’ even in a high-risk sample, an adapted strategy is necessary to address the potential imbalance among those with and without the outcome. Hence, to address the class imbalance, an oversampling strategy was implemented for each outcome. Random oversampling involves randomly duplicating examples from the minority class and adding them to the training dataset. Examples from the training dataset are selected randomly with replacement. This means that examples from the minority class can be chosen and added to the new ‘more balanced’ training dataset multiple times; they are selected from the original training dataset, added to the new training dataset, and then returned or ‘replaced’ in the original dataset, allowing them to be selected again. This technique can be effective for those machine learning algorithms that are affected by a skewed distribution and where multiple duplicate examples for a given class can influence the fit of the model. This was followed by a three-step analysis plan to establish the performance of an ensemble learning model using cross-validation and investigate the importance of each predictor variable for each outcome.

In the first step, we trained and fine tuned an ensemble machine learning model for each outcome separately. Ensemble learning is a machine learning method that produces a single prediction function based on a weighted average of all individual algorithms considered for prediction (i.e., ‘weak learners’) with an optimal trade-off between bias and variance (Pavlyshenko, 2018). The ensemble learning model used here was a combination of random forests (RF), support vector machine (SVMs), and elastic-net algorithms. Using the tidymodels package in R statistical software, a nested iterative cross-validation procedure was used to tune the hyperparameters of each model and to evaluate the predictive power of models on independent observations. The cross-validation procedure partitioned the full sample into subsamples of data, using an iterative process where 25% of data was set aside as the test sample, and a ‘training model’ was fine tuned on the tenfold partitioned observations in the remaining data (training and validation sets). This fine-tuned training model was then used, in the second step, to predict the observations in the set-aside test fold, thereby ensuring the independence of the test fold sample.

In the second step, to measure the predictive performance of each algorithm, a set of prediction indices were calculated based on the model’s ability to predict later adverse outcomes in the test set (Šimundić, 2009). The performance of the machine-learning algorithms was evaluated using sensitivity, specificity, F1 prediction score, and accuracy (ACC). Briefly, sensitivity is defined as the number of true positives divided by the number of true positives plus false positives. Recall is defined as the number of true negatives divided by the number of true negatives plus the number of false negatives. The F1 prediction score is defined as the balance between the precision and the recall. Finally, ACC is defined as the number of true positives plus the number of true negatives divided by the total sample. For all indices, values closer to 1.0 demonstrate better predictive power. Previous cut-points were used to determine strength of prediction with acceptable ranging between 0.6 and 0.75, good ranging between 0.75 and 0.9, and excellent ranging between 0.9 and 1.

To improve the interpretability of our results, the third step focused on determining the relative feature importance of the predictive coefficients for each algorithm. For this, feature importance measures for each of the individual weak learners, namely RF, elastic net, and the SVM models, were extracted using the vip package in R. The Gini importance (or mean decrease impurity), a measure of information gain for each predictor, was used for the RF model, whereas the magnitude of non-shrinked coefficients was used for the elastic net model. As the SVM creates a hyperplane to maximise the distance between the predicted classes, the feature importance can be conceptualised as the vector of coordinates orthogonal to the hyperplane. Finally, the features were ranked from the least to the most important and the mean rank of feature importance was calculated across the three weak learners.

Outcome variables: heroin use, remission, overdose, mortality

Four outcome domains were examined: heroin use, remission from heroin use, non-fatal overdose, and mortality. The risk of experiencing these outcomes over the short (1 year), medium (5 years), and long (10 years) term was modelled for all variables (degree of balance illustrated in Table 1 of the supplementary materials). Due to the availability of data, risk of mortality was additionally modelled over 15 years (Table 1).

Table 1 Outcomes: heroin use, remission, overdose, and death

Past-month heroin use was defined as any heroin use occurring in the month prior to 1-year post-baseline, 5-year post-baseline, and 10-year post-baseline. Consistent with definitions of remission utilised in other areas of public health and community medicine, remission from heroin use was operationalised into three groups: short-term remission (defined as ≥ 3 and < 12 months of continual abstinence from heroin use); medium-term remission (defined as ≥ 12 months and < 5 years of continual abstinence from heroin use); and long-term remission (defined as ≥ 5 years of continual abstinence from heroin use) (Worley, 2017). Non-fatal overdose was defined as any self-reported heroin-related overdose that occurred between baseline and 1-, 5-, and 10-year post-baseline. Mortality from baseline to 1-, 5-, 10-, and 15-year post-baseline was based on data obtained from the AIHW.

Predictor variables

Informed by the literature, 41 baseline predictor variables were selected and included in the analysis, and included predictors relating to demographics, psychopathology, drug use, and treatment (Table 2). With the exception of age and years of school completed which were continuous, and drug used for first high which was categorical, all variables were binary (yes/no, male/female).

Table 2 Summary of predictor variables (n = 41) included in analysis

Results

Cohort characteristics

The baseline characteristics of the ATOS cohort have been described in detail previously and are provided in Table 2 of the Supplementary Materials (Ross et al., 2005). In brief, the mean age at baseline was 29.3 years (standard deviation [SD] 7.8) and 66.2% were male. Participants reported completing a mean of 10.0 years (SD 1.7) of school education, and just under half (45.7%) reported government allowances as their main source of income. More than half (54.6%) had been involved in past month crime and 40.8% had spent time in prison. Participants reported a mean length of heroin-using career of 9.6 years (SD 7.4), and the mean number of drugs used in the past month was 9.0 (SD 1.7). There were high rates of psychopathology, with 24.6% meeting criteria for current major depression, 41.1% for lifetime PTSD, 71.5% for ASPD, and 45.5% screening positive for BPD.

Model performance

The model performance for all outcomes is illustrated in Table 3 (area under the curves [AUC] and receiver operating characteristic curves [ROC] are provided in the Supplementary Materials). For the past-month heroin use outcome, the ensemble learning model showed an acceptable performance (ACC = 0.73, F1 = 0.76) for 1-year post-baseline, a good performance (ACC = 0.83, F1 = 0.84) for 5-year post-baseline, and a good performance (ACC = 0.85, F1 = 0.86) for 10-year post-baseline. For the short-term remission outcome, the ensemble learning model showed a good performance (ACC = 0.87, F1 = 0.87) for 1-year post-baseline, an excellent performance (ACC = 0.91, F1 = 0.91) for 5-year post baseline, and a good performance (ACC = 0.88, F1 = 0.89) for 10-year post baseline. For the medium-term remission outcome, the ensemble learning model showed an acceptable performance (ACC = 0.75, F1 = 0.78) for 5-year post-baseline and a good performance (ACC = 0.82, F1 = 0.85) for 10-year post-baseline. For the long-term remission outcome, the ensemble learning model showed an acceptable performance (ACC = 0.75, F1 = 0.75) for 10-year post baseline. For the overdose outcome, the ensemble learning model showed an excellent performance (ACC = 0.91, F1 = 0.91) for 1-year post-baseline, a good performance (ACC = 0.86, F1 = 0.86) for 5-year post-baseline, and an acceptable performance (ACC = 0.76, F1 = 0.77) for 10-year post-baseline. For the mortality outcome, the ensemble learning model showed a good performance (ACC = 0.85, F1 = 0.85) for 10-year post baseline and a good performance (ACC = 0.89, F1 = 0.89) for 15-year post-baseline.

Table 3 Details of the model performance measures for each outcome

Feature importance

The top 10 most important predictors for all outcomes identified by the ensemble learning model are shown in Figs. 1, 2, and 3. As illustrated, for heroin use occurring in the month prior to 1-, 5-, and 10-year post-baseline, the feature importance analysis highlighted the role of being Australian born, age, age when first got high, daily use of heroin (past month), and overdosed in past 12 months at baseline. Similarly, for short-term remission occurring 1-, 5-, and 10-year post-baseline, the feature importance analysis identified age, age of first heroin use, age when first got high, receiving a government benefit as main source of income, past month cocaine use, past month criminal involvement, and prison history. For medium-term remission occurring 5- and 10-year post-baseline, the feature importance analysis highlighted the important role of age first used heroin, drug used for first high, age when first experienced a high, and residential rehabilitation. Unstable housing and age when first used heroin were identified as prominent features associated with long-term remission.

Fig. 1
figure 1

Top 10 performance and feature importance predictors of heroin use and overdose. Past-month heroin use defined as any heroin use occurring in the month prior to 1-, 5-, and 10-years post-baseline. Non-fatal overdose defined as any self-reported heroin-related overdose that occurred up to 1-, 5-, and 10-year post-baseline. Age, age at baseline; yrs school completed, years of school completed; age first heroin, age when first used heroin; major depression, past-month major depression; pm oth opioids, past month other opioid use; prison hist, prison history at baseline; drug used first high, drug used for first high; resi rehab, in residential rehabilitation; ever od, ever overdosed; pm cannabis, past-month cannabis use; age first heroin, age when first used heroin; sexual trauma, experienced sexual trauma; aspd, antisocial personality disorder; pm cocaine, past-month cocaine use; sev mh disability, severe mental health disability; sev ph disability, severe physical health disability; aust born, born in Australian; first high > 13 years, first got high when aged over 13 years; od past 12mths, overdosed in the past 12 months; pm alcohol, past-month alcohol use; age first heroin 17yrs, aged over 17 years when first injected; age first heroin, age when first used heroin; bpd, borderline personality disorder; interprsnl trauma, experienced interpersonal trauma; ptsd, post-traumatic stress disorder; pm benzos, past-month benzodiazepine use; pm amphetamines, past-month amphetamine use; mmt, in methadone maintenance therapy

Fig. 2
figure 2

Top 10 performance and feature importance predictors of remission. Short-term remission was defined as ≥ 3 and < 12 months of continual abstinence from heroin use; medium-term remission was defined as ≥ months and < 5 years of continual abstinence from heroin use; long-term remission was defined as ≥ years of continual abstinence from heroin use. As with heroin use, remission was modelled over the short (1 year), medium (5 years), and long (10 years) term. Pm crime, involved in past month criminal activity; pm cannabis, past-month cannabis use; age first high, age when first experienced a high; pm cocaine, past-month cocaine use; pm antidepressant, past-month antidepressant use

Fig. 3
figure 3

Top 10 performance and feature importance predictors of mortality. Mortality based on data obtained from the AIHW, and modelled up to 1-, 5-, 10-, and 15-year post-baseline

For non-fatal heroin-related overdose that occurred up to 1-, 5-, and 10-year post-baseline, the feature importance analysis identified age, age when first used heroin, overdose history, experienced interpersonal trauma, past-month use of benzodiazepines, past-month criminal involvement, sex, and years of school completed.

In regard to mortality, the feature importance analysis highlighted age, age of first high, overdosed in the 12 months prior to baseline, and experienced sexual trauma. More detailed graphs illustrating the performance of all predictors in the model across all outcomes are included in the Supplementary Materials.

Discussion

Using 11-year follow-up data from ATOS, the largest and longest prospective longitudinal study of heroin dependence conducted in Australia, this study developed machine learning models to predict the risk of heroin use, remission from heroin use, non-fatal heroin-related overdose, and mortality, at varying follow-up intervals. Despite decades of research and significant economic investment, high levels of health service utilisation (Australian Institute of Health & Welfare, 2021a; Darke et al., 2015), and a large body of research regarding evidence-based treatment (Degenhardt et al., 2019; Hser et al., 2015; Mattick et al. 2009a; Mattick et al., 2014), many people with heroin dependence do not receive the help they need.

One of the complex challenges for clinicians is the synthesis and integration of the rapidly evolving evidence base into information that can guide clinical decision-making, yet many clinicians lack time, resources, guidance, or procedures on how to best synthesise, analyse, or utilise this information in practice to deliver personalised care (DeRubeis et al., 2014; Grol & Grimshaw, 2003; Oxman & Flottorp, 2001). Easy-to-use clinical tools can synthesise vast amounts of research evidence and provide meaningful information to guide clinicians in selecting the right treatment, for the right person at the right time.

This study builds on the strengths of the stacking approach to predictive modelling, which is combining different predictive models into one ensemble model. Ensemble modelling is a recently developed yet a well-established and widely used technique to improve model performance that enables averaging out modelling error from diverse models and thereby enhances the generalisable signal detection (Gunes et al., 2017). The model performances across all outcomes over all follow-up intervals were predominantly good. The exceptions were past-month heroin use at 1-year post-baseline, which was of acceptable performance; short-term remission that occurred 5-year post-baseline, and non-fatal overdose that occurred 1-year post-baseline, which were both of excellent performance. The stacking ensemble learning approach combines predictions from multiple machine learning algorithms and uses these predictions as inputs to a higher-level learning model. Recent studies highlighted that stacking-based predictive modelling is highly effective for time series forecasting and classification problems with highly imbalanced data (Pavlyshenko, 2018).

A strength of the current study is the ability to examine a wide range of predictors. The variables most consistently ranked in the top 10 in terms of their level of importance across outcomes included age; age first got high, used heroin, or injected; sexual trauma; years of school completed; prison history; severe mental health disability; past-month crime; and past-month benzodiazepine use. Somewhat surprisingly given the consistent association of major depression and poor outcome in analysis of the ATOS data and other longitudinal studies of heroin dependence (Malik et al., 2019; Marel et al., 2023), severe mental health disability and the experience of sexual trauma were the only psychiatric variables identified among the most predictive factors in this study. In contrast, the absence of a mood disorder was found to reduce the likelihood of prescription opioid use disorder in a recent systematic review among those commencing prescription opioids for pain (Klimas et al., 2019).

These findings are of particular clinical importance for identifying those most at risk across a number of outcome domains, with implications for prevention and early intervention (for example, school-based programs aimed at delaying age of onset (Gardner et al., 2023), early intervention for people who have experienced sexual trauma).

Findings from this study are broadly consistent with the few studies that have undertaken similar modelling procedures to assess opioid-related risk. The majority of this work has been undertaken to estimate non-fatal and/or fatal overdose risk, using administrative and linked datasets to develop models, such as Medicare or prescription claims (Chang et al., 2019; Heo et al., 2022; Lo-Ciganic et al., 2019), insurance plans (Sun et al., 2020), pharmacy-based opioid therapy (Glanz et al., 2018), prescription claims (Chang et al., 2019; Heo et al., 2022), hospital or public health services (Nguyen et al., 2023; Ripperger et al., 2021; Saloner et al., 2020), the criminal justice system (Ripperger et al., 2021), and death certificates (Nguyen et al., 2023; Saloner et al., 2020). Age, mental health disorder, substance abuse/dependence, tobacco use/abuse/dependence, and long-acting or extended-release opioid formulation were associated with the risk of two-year fatal and non-fatal opioid overdose in a US study utilising data from patients enrolled in pharmacy-based opioid therapy (Glanz et al., 2018). Similarly, being male, younger, and a greater number of benzodiazepine prescriptions were associated with higher odds of non-fatal opioid overdose in a retrospective cohort study, which utilised prescription and patient data in Maryland (Chang et al., 2019). However, the exclusive use of administrative data in these studies prevented the capturing of non-fatal overdose that did not result in hospitalisation, and the risk of non-prescribed opioid use on overdose was not included.

To our knowledge, this is the first study to predict a range of heroin-related outcomes across different follow-up intervals. While recent advances in modelling and machine learning have led to the development of models to assess single outcomes associated with the use of opioids, these have predominantly been limited to prescription opioid use (Glanz et al., 2018), chronic pain-care patients (Klimas et al., 2019), or exclusively based on administrative data (Lo-Ciganic et al., 2019). While these studies have been able to generate large sample sizes from constructed cohorts and incorporate other health outcomes depending on the data sources, the exclusive use of administrative data can be problematic. The sole reliance on administrative data for investigating outcomes associated with highly disadvantaged and vulnerable population groups is likely to result in systematic bias, particularly when data are dependent on self-disclosure in environments that are typically highly stigmatising and marginalising toward people with mental and substance use disorders (Calderwood & Lessof, 2009; Marel & Mills, 2018). These cohorts are also subject to Berkson’s bias (Westreich, 2012), whereby the combination of exposure to a risk (e.g., heroin dependence) and occurrence of disease make it more likely that an individual will present to health services, leading to elevated associations between the exposure and the health outcomes under investigation relative to that which would be found among the broader population of people with heroin dependence.

The findings from this study should be interpreted with several caveats in mind. Firstly, the study was based on self-report measures. While there has been debate surrounding self-report as a tool for measuring substance use, there has been extensive literature consistently demonstrating its validity and reliability in research settings (Jackson et al., 2005; Napper et al., 2010; Teesson et al., 2015). However, participants were asked to report on events up to a 10- to 11-year period, and findings may be subject to recall bias. Secondly, while the demographic characteristics and substance use histories of the ATOS cohort were consistent with previous international and Australian studies of people with heroin dependence (Darke et al., 2002; Gossop et al., 2000; Hubbard et al., 1997), care should be taken in generalising these results to other treatment systems outside Australia. Despite the strengths of using the modelling techniques in this study, one limitation is that direction of effects cannot be attributed to individual variables as they are derived based on two- to three-level interactions across the myriad variables and averaged over three models in the ensemble. As such the ‘direction’ of each predictor cannot be ascertained independently of the numerous other predictors. In addition, the current study used between-group data to assume individual level predictions. While these results might provide some indication of outcomes for at-risk individuals determined at one time point, they do not reveal the potential impact of time-varying covariates and whether within-person trajectories might result in differing outcomes or more accurate predictions. Furthermore, variables used in this model were examined as static predictors. Future research should examine the effects of time-varying risk and protective factors in model development that may further inform the timing of interventions.

In spite of these limitations, the current study provides clinically relevant information on key risk factors associated with heroin use, remission, non-fatal overdose, and mortality among people with heroin dependence. The ability to identify such clinically meaningful outcomes has vital implications for healthcare practitioners and policymakers who lack access to accurate information on which to base decisions on the delivery and timing of targeted interventions.