1 Introduction

Urban rail systems play a central role in the economic growth and development of cities. However, unplanned disruptions have become more frequent due to aging railway infrastructure, fast-growing passenger demand, and overcrowding. Currie et al. [1] investigated unplanned railway disruptions and found that passengers are most concerned about the time when the impacts are over after train delays occur.

Figure 1 illustrates the typical relationship between train delays and the duration of the disruption impacts on passenger trips. The train delay is the time elapsed from the occurrence of the incident until the incident is cleared. The duration of the disruption impact on passenger trips is the time from the occurrence of the incident until passengers no longer experience any residual effects from the incident. The time when the disruption impact on passenger trips is cleared can be later than the time when the train service resumes. Some of the impacted passengers may have started their trips even before the incident began.

Fig. 1 Durations of train delay and impact on passenger trips.

Because of the complex and dynamic structure of urban rail systems, the duration of disruption impacts on passenger trips is difficult to predict in the initial phase of the incident. Even after the train service has returned to normal, many passengers could still experience delays, for example, being left behind due to overcrowding inside the system. Therefore, the duration of disruption impacts on passenger trips is not necessarily the same as the duration of train delays, and often the impact duration is longer than the duration of train delays. For example, Liu et al. [2] reported that the duration of disruption impacts to passengers is longer than the duration of train delays by up to 90 minutes.

Generally, disruption studies focus on the duration of incidents or their impact on operations, such as train delays. Such studies, developing predictive models and investigating important factors that impact incident duration, are common in car traffic cases. Valenti et al. [3] summarized recent studies on predicting the duration of traffic incidents. They compared the performance of different incident duration prediction models, including multiple linear regression, decision trees, and artificial neural network models. The results showed that all models achieved good performance in terms of accuracy. Ozbay et al. [4] applied Bayesian networks to model the length of time necessary to clear traffic congestion after an incident. They also examined incident clearance patterns and found that the number of vehicles and road type were important factors that impacted duration. Nan et al. [5] used hazard-based duration models to predict the duration of incidents and found that a wide variety of factors affected the incident duration, including the detection time, weather, and the number of vehicles involved. Wei et al. [6] developed neural network models to predict the duration of traffic accidents, where one model is used to predict the duration at the time when the incident is known, and the other updates the remaining duration continuously until the incident is cleared.

On the rail side, several studies have forecast the duration of train delays (i.e., the incident itself) in urban rail systems. For example, Weng et al. [7] developed accelerated failure time (AFT) models to predict the duration of train delays and found that the log-logistic AFT model performed best for the given dataset. Weng et al. [8] developed a maximum likelihood regression tree (MLRT) model and showed that the incident type, especially power and equipment failures, significantly impacts train delays. Wang et al. [9] used a gradient-boosted regression tree model to predict train delays at each station and achieved reasonable prediction results; the authors found that the main factors causing train delays were severe weather and equipment failures. Lapamonpinyo et al. [10] proposed real-time passenger train delay prediction models using random forests, gradient boosting machines, and multilayer perceptrons. They found that the station location and precipitation contribute significantly to the duration of train delays.

Very few studies have focused on predicting the duration of disruption impacts on passenger trips. Conceptually, the impact duration can be different when measured using different types of impacts, such as journey times or crowding. As explained above, passengers can continue to be impacted by the disruption even if the incident is cleared. In a study similar to our paper, Shi et al. [11] used a hazard-based model to predict the duration of passenger flow congestion in urban rail systems under typical conditions (i.e., no disruption). The duration of passenger flow congestion is defined as the time that the system experiences a state of congestion (i.e., the passenger flow volume exceeds a predefined threshold). Our paper focuses on the modeling of the duration of impacts of unplanned disruptions on passenger trips. It deals with both the inference of the impact duration and understanding complex factors that influence it.

An important constraint in developing predictive models of impact duration on passenger trips is that the duration of the impacts is not directly observed or reported by automated data collection systems such as automatic vehicle location (AVL) and automated fare collection (AFC). This may partially explain why the literature on predicting the impact duration on passenger trips is limited, despite its importance. Some studies focus on determining (but not predicting) the duration of disruption impacts on passengers from various perspectives. For example, Gu et al. [12] developed a framework that combines offline passenger flow with online detection methods for real-time detection of anomalous demand using smartcard data, which aids decision-making in pre-warning management. Chen et al. [13] used the number of exiting passengers at stations to detect system anomalies with a random matrix theory (RMT) approach. The duration of disruption impact on passengers is then estimated as the duration of the abnormal exiting passenger patterns identified. Wang et al. [14] used robust principal component analysis (rPCA) to detect abnormality in entry and exit passenger flows, and then applied the ST-DBSCAN algorithm to determine the duration of disruption impacts for different events. Wang et al. [9] proposed a real-time model using passenger outflows to detect abnormalities. Malandri et al. [15] used the link volume over capacity (link load/link capacity) at different time windows to infer abnormalities, where the duration of disruption impacts on passengers corresponds to the period of the abnormality. Yap et al. [16] used passenger delays, defined as the difference between actual journey time and scheduled journey time (calculated from the timetable), to predict how often different types of disruptions occur at different stations. Lastly, Webb et al. [17] used AFC, AVL, and General Transit Feed Specification (GTFS) data to estimate passenger waiting time when buses are delayed.

While these approaches can be used to infer the duration of disruption impacts on passengers, they have some limitations. For example, some studies require both AFC and AVL data, while others use scheduled journey times for all origin–destination (OD) pairs. However, AVL data may not be accurate or may be mostly unavailable during disruptions (depending on the incident type). In addition, some studies use model-derived attributes to infer the impact duration (e.g., link loads) which may lead to estimation bias. A more robust approach is needed for estimating the disruption impact duration on passenger trips. Lastly, these approaches focus on a specific type of performance metric (e.g., journey time) in defining and estimating the disruption impact on passenger trips. A deeper understanding of factors contributing to disruption impact duration defined using different performance metrics is useful in practice.

In view of these limitations, the main contributions of this work are as follows:

  • Proposal of a robust method to estimate the duration of disruption impact on passenger trips using AFC data. It uses a piecewise linear regression model and a probabilistic model to estimate the start and end times of the disruption impact. An important feature of the method is that the identification of the duration is automatic and relies only on AFC data (which are consistently available).

  • Development (and comparison) of statistical and machine learning approaches (multiple linear regression [MLR], accelerated failure time [AFT], and random forest models) to predict the duration of unplanned disruption impacts in terms of important performance metrics (from the passenger’s perspective). It examines the problem from two perspectives: predictions of actual duration and the interval that the duration belongs to. It also explores important factors influencing disruption impact durations on passenger trips.

The remainder of the paper is organized as follows. The methodology section presents methods used to identify the duration of disruption impacts on passenger trips, the explanatory variables for model development, and the identified candidate models. The case study section develops impact duration prediction models using real-world data from a major metro system and discusses important factors that influence the disruption impact duration. The final section concludes the paper and discusses future research directions.

2 Methodology

This section describes the research framework and methods for computing response variables and prediction models. It also discusses variables that capture important factors that influence impact duration.

2.1 Research Framework

Figure 2 shows the research framework for predicting the duration of disruption impacts on passenger trips. It consists of four modules: input data, response variable inference, explanatory variable development, and model building.

  • Input data: includes AFC data for closed systems (tap-in/tap-out stations and times are recorded), incident records (information on when, where, and what type of disruptions occurred), and network structure.

  • Response variable inference: involves inferring the disruption impact duration on passenger trips from multiple perspectives (passenger accumulation and average journey time at both system and line levels) using smartcard data. This variable is not directly observed and has to be inferred from related information, as explained later in the section.

  • Explanatory variable development: identifies broad categories of explanatory variables that may influence the disruption impact duration, including incident characteristics, operating conditions (e.g., headways), infrastructure (e.g., above-/underground), external factors (e.g., weather), and demand. AFC data are used to estimate passenger demand such as passenger arrivals at a station and the number of passengers inside the system before the disruption occurs.

  • Model building: involves the explanatory variable selection, predictive models, and performance evaluation. Several models are developed to predict the actual duration of the impacts and the interval to which the impact duration belongs.

Fig. 2 Research framework.

The input data are AFC, incident logs, and network structure (path choice fractions for OD pairs). The disruption impacts on passenger trips (response variables) are measured using the following metrics: (1) passenger accumulation (number of passengers in the system), (2) passenger accumulation on the impacted line, (3) system-wide average journey times, and (4) average journey time on the impacted line. The first two metrics are measures of crowding based on the number of passengers both at stations and in trains. The last two are measures of delays that passengers experience under disruptions. As mentioned above, the duration of disruption impacts on passenger trips may be different from the duration of the train delay since the former may last longer than the time it takes for the train service to resume.

However, the actual duration of the impact is not observed. This is a critical input to any predictive model, since it is the dependent variable, and training of any model requires observations of this quantity along with corresponding explanatory variables. We propose a method to automatically infer the actual duration from AFC data, based on the identification of breakpoints in the time series representing the metric of interest. Candidate explanatory variables are incident characteristics, operating conditions, infrastructure, external factors, and passenger demand levels. Lastly, various predictive models are examined for their potential, including MLR, AFT, and random forest.

2.2 Response Variables

This module consists of two components: (a) the definition of metrics for the disruption impacts on passenger trips, and (b) the automated inference of the duration of the impact from AFC data.

2.2.1 Impact on Passenger Metrics

As mentioned in the previous section, four metrics are proposed to capture the impact of incidents on passengers from different perspectives using AFC data: (1) passenger accumulation (number of passengers) in the system, (2) passenger accumulation on the impacted line, (3) system-wide average journey time, and (4) average journey time on the impacted line.

Passenger Accumulation After an incident occurs, the number of passengers inside the system typically increases due to long waiting times. Accumulation mainly captures the level of crowding on the platforms and trains. Passenger accumulation in the system at time T is defined as the number of passengers in the system at that time (either in stations or on trains). This metric is useful for understanding how a disruption impacts crowding in the system in general or on the impacted line specifically. It is calculated as the difference between the actual number of passengers who have entered the system and the actual number of passengers who have exited the system by time T [2].

$${N}^{acc}\left(T\right)={\int }_{0}^{T}\left[{N}^{ent}\left(t\right)-{N}^{ext}\left(t\right)\right]dt$$
(1)

where \({N}^{acc}(T)\) is the number of passengers in the system at time T, \({N}^{ent}(t)\) is the number of passengers entering the system at time t, and \({N}^{ext}(t)\) is the number of passengers exiting the system at time t.
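In practice, AFC records are aggregated into discrete time intervals, so Eq. 1 becomes a running sum of entries minus exits. The following is a minimal sketch of that discrete form; the function name, the interval length, and the counts are hypothetical and are not taken from the study.

```python
from itertools import accumulate


def passenger_accumulation(entries, exits):
    """Discrete form of Eq. 1: running total of (entries - exits).

    entries[k] and exits[k] are tap-in and tap-out counts in interval k
    (for example, 15-minute bins). result[k] is the number of passengers
    inside the system at the end of interval k.
    """
    if len(entries) != len(exits):
        raise ValueError("entries and exits must cover the same intervals")
    return list(accumulate(e - x for e, x in zip(entries, exits)))


# Hypothetical counts for four consecutive intervals.
entries = [120, 200, 180, 90]
exits = [30, 150, 210, 160]
print(passenger_accumulation(entries, exits))  # [90, 140, 110, 40]
```

The sketch assumes the system is empty at the start of the first interval, which corresponds to the lower integration limit of 0 in Eq. 1.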

Passenger accumulation at the line level is the total number of passengers in stations and trains related to the line. It is estimated as the difference between the number of passengers who enter or transfer into any station of the line and the total number of passengers who exit or transfer out of any station of the line [2].

$${N}_{l}^{acc}\left(T\right)={\int }_{0}^{T}{X}_{l}^{+}(t)dt-{\int }_{0}^{T}{X}_{l}^{-}\left(t\right)dt+{\int }_{0}^{T}{Y}_{l}^{+}\left(t\right) dt-{\int }_{0}^{T}{Y}_{l}^{-} \left(t\right) dt$$
(2)

where \({N}_{l}^{acc}(T)\) is the number of passengers accumulated on line \(l\) at time \(T\). \({X}_{l}^{+}\left(t\right)\) is the number of passengers entering any station of line \(l\) at time \(t\), and \({X}_{l}^{-}(t)\) is the total number of passengers exiting line \(l\) at time \(t\). \({Y}_{l}^{+}\left(t\right)\) is the number of passengers transferring to line \(l\) at time \(t\). \({Y}_{l}^{-}(t)\) is the number of passengers transferring out of line \(l\) at time \(t\).

\({X}_{l}^{+}\left(t\right)\) and \({X}_{l}^{-}(t)\) can be calculated directly from AFC records. The calculation of transferring passengers \({Y}_{l}^{+}\left(t\right)\) and \({Y}_{l}^{-}\left(t\right)\) is not straightforward. We estimate the number of transfer passengers at time \(t\) by assigning OD flows to paths using path choice fractions (estimated from surveys) and the journey times from the origin to the transfer stations of the line [2].

$${Y}_{l}^{+}\left(t\right)={\sum }_{od}{\sum }_{p\in {P}_{od}}{\sum }_{s\in {S}_{l}}{\pi }_{odh}^{p}{q}_{odh}{\delta }_{sp}^{l+}$$
$${Y}_{l}^{-}\left(t\right)={\sum }_{od}{\sum }_{p\in {P}_{od}}{\sum }_{s\in {S}_{l}}{\pi }_{odh}^{p}{q}_{odh}{\delta }_{sp}^{l-}$$
(3)

where \({q}_{odh}\) is the passenger flow of OD pair \(\left(o,d\right)\) in time period \(h\), and \({\pi }_{odh}^{p}\) is the fraction of passengers of OD pair \(\left(o,d\right)\) using path \(p\in {P}_{od}\) at entry time period \(h\). \({\delta }_{sp}^{l+}=1\) if station \(s\) is the transfer station into line \(l\) along path \(p\); 0 otherwise. \({\delta }_{sp}^{l-}=1\) if station \(s\) is the transfer station out of line \(l\) along path \(p\); 0 otherwise. The time periods of interest \(h\) are defined as \(h=t-{\tau }_{os}^{p}\), where \({\tau }_{os}^{p}\) is the journey time from origin \(o\) to station \(s\in {S}_{l}\).
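The transfer-flow calculation in Eq. 3 can be sketched as a sum over path shares, keeping only passengers whose entry period satisfies \(h=t-{\tau }_{os}^{p}\). This is an illustrative sketch only: the dictionary layouts, station names, and values below are assumptions made for the example, and journey times are expressed in whole periods for simplicity.

```python
def transfers_into_line(t, od_flows, path_fractions, transfer_in, journey_time):
    """Eq. 3 for Y_l^+(t): passengers transferring into the line in period t.

    od_flows[(o, d, h)]          -> q_odh, OD flow entering in period h
    path_fractions[(o, d, h, p)] -> share of that flow using path p
    transfer_in[p]               -> stations where path p transfers into the line
    journey_time[(o, s, p)]      -> periods needed to travel from o to s on path p
    """
    total = 0.0
    for (o, d, h, p), share in path_fractions.items():
        for s in transfer_in.get(p, ()):
            # A passenger reaches s in period t if they entered in h = t - tau.
            if h == t - journey_time[(o, s, p)]:
                total += share * od_flows.get((o, d, h), 0.0)
    return total


# Hypothetical inputs: 100 passengers enter at A in period 3 bound for C;
# 60% use path p1, which transfers into the line at B after 2 periods.
od_flows = {("A", "C", 3): 100.0}
path_fractions = {("A", "C", 3, "p1"): 0.6, ("A", "C", 3, "p2"): 0.4}
transfer_in = {"p1": ["B"]}
journey_time = {("A", "B", "p1"): 2}

# Roughly 60 passengers transfer into the line in period 5, none in period 4.
print(transfers_into_line(5, od_flows, path_fractions, transfer_in, journey_time))
print(transfers_into_line(4, od_flows, path_fractions, transfer_in, journey_time))
```

The same function applies to \({Y}_{l}^{-}(t)\) if `transfer_in` is replaced by the stations where each path transfers out of the line.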

Average Journey Time Journey time is an important level of service metric. In general, passengers experience longer journey times than usual when an incident occurs due to longer waiting times and a high probability of being left behind due to overcrowding. The system-level journey time is the average journey time of all trips in the system [2].

$$JT(t)=\frac{1}{n(t)}\sum_{i=1}^{n(t)}[{tt}_{i}^{o}-{tt}_{i}^{in}]$$
(4)

where JT(t) is the average journey time in the system in time period t, \(n(t)\) is the total number of trips during period t, and \({tt}_{i}^{o}\) and \({tt}_{i}^{in}\) are tap-out and tap-in times, respectively, of trip i. The line-level journey time is the average journey time of all trips on a line [2].

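$${AJT}_{e}(t)=\frac{1}{{n}_{e}(t)}\sum_{i=1}^{{n}_{e}(t)}[{tt}_{i,e}^{o}-{tt}_{i,e}^{in}]$$
(5)

where \(e\) is the line of interest, \({AJT}_{e}(t)\) is the average journey time on line \(e\) in time period \(t\), \({n}_{e}(t)\) is the number of trips on line \(e\) during period \(t\), and \({tt}_{i,e}^{o}\) and \({tt}_{i,e}^{in}\) are the tap-out and tap-in times, respectively, of trip \(i\) on line \(e\).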
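As a small illustration of Eq. 4, the average journey time per period can be computed directly from tap-in and tap-out times. The sketch below assumes that each trip is assigned to the period containing its tap-out time; the text does not specify this convention, so it is an assumption, as are the function name and the sample trips.

```python
from collections import defaultdict


def average_journey_time(trips, period_minutes=15):
    """Mean journey time (minutes) per period, in the spirit of Eq. 4.

    trips: iterable of (tap_in, tap_out) in minutes after midnight.
    Assumption: a trip is counted in the period containing its tap-out.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for tap_in, tap_out in trips:
        period = tap_out // period_minutes
        totals[period] += tap_out - tap_in
        counts[period] += 1
    return {p: totals[p] / counts[p] for p in sorted(totals)}


# Hypothetical trips: two exit in period 29 (7:15-7:30), one in period 31.
trips = [(420, 445), (425, 449), (440, 470)]
print(average_journey_time(trips))  # {29: 24.5, 31: 30.0}
```

The line-level metric in Eq. 5 follows from the same function after filtering `trips` to those that use the line of interest.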

In summary, these four metrics capture how disruption impacts passengers’ travel from different aspects. Passenger accumulation mainly captures the crowding level inside the system. Journey times capture both the status of the train service and the crowding levels. The metrics can be calculated directly using data from closed AFC systems (with both tap-in/tap-out information).

2.2.2 Impact Duration Inference

The duration of the disruption impact on passenger trips for each of the above metrics is not observed directly. An automated inference method for the disruption impact duration is proposed based on a comparison of the values of the metric of interest on the incident day to a baseline representing typical days. A typical day is a day with normal operations, regular demand patterns, and no disruptions. The baseline performance metric is defined as the average value of the metric over several typical days having similar characteristics to the incident day (e.g., time of day, day of the week). The approach consists of two steps: (a) a piecewise linear regression model to identify breakpoints in the time series of the performance metric of interest, and (b) a probabilistic model to associate breakpoints with the time when the disruption starts to impact passengers and the time the impact is over, by comparison to the values on typical days.

(a) Breakpoint analysis A piecewise regression model is used to identify breakpoints in the time series of the difference between the incident and typical days for each metric [18]. Breakpoints \({b}_{i}\) are defined by the time where the slope of the linear regression changes. \(B\) is the set of all breakpoints identified, and \(B=\left\{{b}_{i}, \forall i\in {1,2},\dots ,n\right\}\). If \(n\) breakpoints are used, the piecewise linear regression model is defined as

$${Y}_{1}={\beta }_{1}+{\alpha }_{1}t \,\,for\,\, t\le {C}_{1}$$
$${Y}_{2}={\beta }_{2}+{\alpha }_{2}t \,\,for\,\, {C}_{1}<t\le {C}_{2}$$
$$\ldots$$
$${Y}_{n}={\beta }_{n}+{\alpha }_{n}t \,\,for\,\, {C}_{n-1}<t\le {C}_{n}$$
$${Y}_{n+1}={\beta }_{n+1}+{\alpha }_{n+1}t \,\,for\,\, t>{C}_{n}$$
(6)

where Y is the difference between the baseline and the incident day for the corresponding metric (e.g., passenger accumulation), t is the time of day (minutes after midnight), \({C}_{i}\) is the time of breakpoint \({b}_{i}\) (\(i = {1,2}, \dots , n\)), and \({\alpha }_{i}\) and \({\beta }_{ i}\) are parameters to be estimated (\(i = {1,2}, \dots , n+1\)). Note that other kernel functions could be used for the breakpoint analysis, for example, a polynomial function to capture a nonlinear time series trend (with the breakpoint taken as the point where the gradient, or rate of change, is nearly zero) [19].
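To make the idea behind Eq. 6 concrete, the sketch below fits a two-segment piecewise linear model by trying every admissible split of the series and keeping the split with the smallest total squared error. It is a simplified illustration with a single breakpoint and made-up data, not the estimation procedure of [18]; the function names are hypothetical.

```python
def ols_line(ts, ys):
    """Least-squares fit y = beta + alpha * t; returns (beta, alpha, sse)."""
    n = len(ts)
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    stt = sum((t - t_mean) ** 2 for t in ts)
    sty = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    alpha = sty / stt if stt > 0 else 0.0
    beta = y_mean - alpha * t_mean
    sse = sum((y - (beta + alpha * t)) ** 2 for t, y in zip(ts, ys))
    return beta, alpha, sse


def best_single_breakpoint(ts, ys, min_points=2):
    """Return (sse, C_1, (beta_1, alpha_1), (beta_2, alpha_2)) for the best split.

    C_1 is the last time of the left segment, matching "t <= C_1" in Eq. 6.
    """
    best = None
    for k in range(min_points, len(ts) - min_points + 1):
        left = ols_line(ts[:k], ys[:k])
        right = ols_line(ts[k:], ys[k:])
        total = left[2] + right[2]
        if best is None or total < best[0]:
            best = (total, ts[k - 1], left[:2], right[:2])
    return best


# Made-up series: flat until t = 40, then rising with slope 0.5.
ts = list(range(0, 100, 10))
ys = [0.0] * 5 + [2.5, 7.5, 12.5, 17.5, 22.5]
sse, c1, left, right = best_single_breakpoint(ts, ys)
print(c1)        # 40
print(right[1])  # 0.5
```

Several breakpoints can be handled by applying the same search recursively within each segment or by searching over combinations of splits, at a higher computational cost.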

(b) Inference of impact start and end times Among all detected breakpoints on the incident day, it is necessary to identify the ones that most likely represent the time when the disruption starts to impact passengers and the time when the impact ends. A probabilistic method is used to infer the breakpoints associated with the start and end times of the impact.

We assume that values of the metric of interest \({Z}_{t}\) at time interval \(t\) on typical days are normally distributed \({Z}_{t}\sim \mathcal{N}\left({\mu }_{t},{\sigma }_{t}\right)\) with the mean \({\mu }_{t}\) and standard deviation \({\sigma }_{t}\). \({\mu }_{t}\) and \({\sigma }_{t}\) are estimated using observations from all typical days for the time period \(t\) (e.g., 7:00–7:15). Let \(P\left({Z}_{t}>{z}_{t}^{inc}\right)\) be the probability that the value of metric \({Z}_{t}\) on a typical day is greater than the value observed on the incident day, \({z}_{t}^{inc}\).

$$P\left({Z}_{t}>{z}_{t}^{inc}\right)=1-\Phi \left(\frac{{z}_{t}^{inc}-{\mu }_{t}}{{\sigma }_{t}}\right)$$
(7)

where \(\Phi \left(z\right)\) is the cumulative distribution function (CDF) of the standard normal distribution. A low value of \(P\left({Z}_{t}>{z}_{t}^{inc}\right)\) means that the performance metric (e.g., delay times) on the incident day is considerably larger than that on typical days, indicating that it is likely that passengers have been experiencing the disruption impact at the corresponding breakpoint time. Based on this probability, a label describing the state of the breakpoint is assigned:

$${l}_{b}= \left\{\begin{array}{l}normal, \quad if P \left({Z}_{{t}_{b}}>{z}_{{t}_{b}}^{inc}\right)\ge \epsilon \\ abnormal, \quad otherwise\end{array}\right.$$
(8)

where \({l}_{b}\) is the label of the state of breakpoint \(b\), and \(\epsilon\) is a predetermined threshold.

Note that the normality assumption underlying Eq. 7 is based on the observed distribution of the metric of interest (e.g., OD journey time) under normal operating conditions (without disruption). However, the inference of impact start and end times does not depend on this assumption, since the detection in Eqs. 7 and 8 only requires a CDF. Given enough sample data, the empirical CDF could be used instead, without any distributional assumption.
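The labeling step can be sketched as follows: compute the upper-tail probability \(P\left({Z}_{t}>{z}_{t}^{inc}\right)\) from typical-day observations and compare it with the threshold \(\epsilon\) of Eq. 8. Both a normal-based and an empirical version are shown. The function names and sample values are hypothetical, and the sample standard deviation is used as the estimate of \({\sigma }_{t}\), which is an assumption.

```python
from statistics import NormalDist, mean, stdev


def exceedance_probability(typical_values, z_inc):
    """P(Z_t > z_inc) under a normal fit to typical-day observations."""
    mu, sigma = mean(typical_values), stdev(typical_values)
    return 1.0 - NormalDist(mu, sigma).cdf(z_inc)


def empirical_exceedance(typical_values, z_inc):
    """Same probability from the empirical distribution (no normal assumption)."""
    return sum(v > z_inc for v in typical_values) / len(typical_values)


def label_breakpoint(p, epsilon=0.05):
    """Eq. 8: 'normal' if the exceedance probability is at least epsilon."""
    return "normal" if p >= epsilon else "abnormal"


# Hypothetical average journey times (minutes) in one period on typical days.
typical = [28, 30, 32, 29, 31]
p_ok = exceedance_probability(typical, 30)   # incident day equals the mean
p_bad = exceedance_probability(typical, 40)  # incident day far above typical
print(label_breakpoint(p_ok))   # normal
print(label_breakpoint(p_bad))  # abnormal
```

With only a handful of typical days the empirical version is coarse (its values are multiples of 1/5 here), which is why the normal fit may be preferable for small samples.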

To infer the disruption impact start time, \({t}_{s}\), and end time, \({t}_{e}\), the reported incident (train delay) start time \({t}_{s}^{inc}\) and end time \({t}_{e}^{inc}\) are used as a reference. Let \({B}_{s}\) be the set of candidate breakpoints that are labeled as normal and whose corresponding times are earlier than the reported incident start time \({t}_{s}^{inc}\) by no more than a threshold \({\tau }_{s}\).

$${B}_{s}=\left\{b|0\le {t}_{s}^{inc}-{t}_{b}\le {\tau }_{s} \,\,and\,\,{l}_{b} = normal, \forall b\in B\right\}$$
(9)

If set \({B}_{s}\) is not empty, the disruption impact start time \({t}_{s}\) is inferred as the time of the candidate breakpoint in \({B}_{s}\) which is closest to the reported incident start time \({t}_{s}^{inc}\). If \({B}_{s}\) is empty, the start time of the impact is set equal to the start time of the incident.

$$\begin{array}{c}{t}_{s}=\left\{\begin{array}{l}{t}_{b}^{*},\quad such \,that {t}_{b}^{*}\ge {t}_{b}, \forall b\in {B}_{s}, if {B}_{s}\ne \varnothing \\ {t}_{s}^{inc}, \quad otherwise\end{array}\right. \\ \end{array}$$
(10)

To identify the end time of the impacts, the set of candidate breakpoints \({B}_{e}\) consists of breakpoints that are labeled as normal and whose corresponding times are later than the reported incident end time \({t}_{e}^{inc}\) by no more than a threshold \({\tau }_{e}\).

$${B}_{e}=\left\{b|{0\le {t}_{b}-t}_{e}^{inc}\le {\tau }_{e}\,and\, {l}_{b}=normal, \forall b\in B\right\}$$
(11)

If \({B}_{e}\) is not empty, the disruption impact end time \({t}_{e}\) is inferred as the time of the breakpoint in \({B}_{e}\) which is closest to the reported incident end time \({t}_{e}^{inc}\). If \({B}_{e}\) is empty, the impact end time is set equal to the incident end time.

$$\begin{array}{c}{t}_{e} = \left\{\begin{array}{c}{t}_{b}^{*}, \quad such \,that {t}_{b}^{*}\le {t}_{b}, \forall b\in {B}_{e}, if {B}_{e}\ne \varnothing \\ {t}_{e}^{inc}, \quad otherwise\end{array}\right. \\ \end{array}$$
(12)

Figure 3 illustrates an example of the inference approach, using the average journey time on the impacted line as the metric of interest.

Fig. 3 Example of the inference of times when the disruption impact starts/ends. For example, at 18:50, the difference of average journey time between incident and baseline days is close to 0, which indicates high confidence in the normal operation condition (with \(P\left({Z}_{{t}_{b}}>{z}_{{t}_{b}}^{inc}\right)\)=0.85 using Eq. 7).

The set \(B\) of the eight breakpoints identified from the piecewise regression model is indicated using the red line. The two black lines show the reported times when the incident starts/ends in the incident log. For a breakpoint \(b\in B\), the associated probability \(P\left({Z}_{{t}_{b}}>{z}_{{t}_{b}}^{inc}\right)\) that the average journey time \({Z}_{{t}_{b}}\) on a typical day exceeds the observed average journey time \({z}_{{t}_{b}}^{inc}\) on an incident day is reported in the table (right side of Fig. 3). Without loss of generality, we assume that the breakpoint state label threshold is \(\epsilon =0.05\), and the time threshold \({\tau }_{s}={\tau }_{e}=60\,{\rm min}\).

Based on Eqs. 9 and 10, the set \({B}_{s}\) of the candidate start points consists of breakpoint \({b}_{1}\) (it has a time 40 minutes before the reported incident start time of 11:00 and is labeled as normal with a probability \(P\left({Z}_{{t}_{b}}>{z}_{{t}_{b}}^{inc}\right)\) of 0.68). The impact start time \({t}_{s}\) is inferred as the time of breakpoint \({b}_{1}\) (i.e., 10:20).

Based on Eqs. 11 and 12, the set \({B}_{e}\) of the candidate end points consists of breakpoints \({b}_{6}\) and \({b}_{8}\) (they have times later than the reported incident end time and are labeled as normal observations with probability > 0.05). The impact end time \({t}_{e}\) is inferred as the time of the breakpoint \({b}_{6}\) which is closest to the incident end time (i.e., 15:50).

Given the inferred impact start and end times, the duration of the disruption impact is estimated as 330 minutes (with respect to the average journey time on the impacted line), compared to 270 minutes for the duration of the incident reported in the incident log (train delays). That is as expected, as the duration of the impact on passengers is usually longer than that on trains.
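The selection rules in Eqs. 9 to 12 can be sketched in a few lines. The example input loosely mirrors the case above: the incident runs from 11:00 to 15:30 (implied by the 270-minute incident duration), \({b}_{1}\) lies at 10:20, and \({b}_{6}\) lies at 15:50. All other breakpoint times and labels are invented for illustration, and the function name is hypothetical.

```python
def infer_impact_window(breakpoints, t_s_inc, t_e_inc, tau_s=60, tau_e=60):
    """Infer (t_s, t_e) from labeled breakpoints, following Eqs. 9-12.

    breakpoints: list of (t_b, label), times in minutes after midnight.
    """
    # Eq. 9: normal breakpoints at most tau_s before the reported start.
    starts = [t for t, lab in breakpoints
              if lab == "normal" and 0 <= t_s_inc - t <= tau_s]
    # Eq. 11: normal breakpoints at most tau_e after the reported end.
    ends = [t for t, lab in breakpoints
            if lab == "normal" and 0 <= t - t_e_inc <= tau_e]
    t_s = max(starts) if starts else t_s_inc  # Eq. 10: closest to the start
    t_e = min(ends) if ends else t_e_inc      # Eq. 12: closest to the end
    return t_s, t_e


breakpoints = [
    (620, "normal"),     # b_1 at 10:20
    (700, "abnormal"),   # invented
    (800, "abnormal"),   # invented
    (900, "abnormal"),   # invented
    (950, "normal"),     # b_6 at 15:50
    (1130, "normal"),    # invented late breakpoint
]
t_s, t_e = infer_impact_window(breakpoints, t_s_inc=660, t_e_inc=930)
print(t_s, t_e, t_e - t_s)  # 620 950 330
```

When no candidate breakpoint falls inside a window, the function falls back to the reported incident time, as stated in the text.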

2.3 Explanatory Variables

Potential explanatory variables that influence the disruption impact duration belong to five main categories: (a) incident characteristics, (b) operating conditions, (c) infrastructure, (d) external factors, and (e) demand [7, 8, 11].

(a) Incident Characteristics

  • Incident type There are six main causes of incidents according to previous studies [20]: power failure, rolling stock failures, signaling and telecommunications failures, operating failures (e.g., personnel, passengers), subway turnout failures, and crashes.

  • Incident location Incidents can impact a single station or multiple stations. Normally, if an incident affects multiple stations, train delays and the duration of disruption impacts on passengers tend to be longer.

  • The severity of incidents Since most metro systems have experienced many incidents, transit agencies may have a reasonable estimate of the severity of incidents immediately after they occur. The severity of incidents can be characterized as minor, moderate, or severe.

  • Time of day During the morning or afternoon peak hour, the number of passengers in the system and the service frequency are typically at their highest. Therefore, incidents during peak hours could impact more passengers and have cascading effects.

  • Day of the week The frequency of train service is different between weekdays and weekends. The characteristics of the passengers are different as well (e.g., more commuters use the system on weekdays, whereas more non-commuters use it on weekends).

  • The number of trains involved Incidents involving more than one train typically have a longer duration of disruption impacts since more time is required to restore service.

(b) Operating Conditions

  • Frequency of service The frequency of service varies based on the time of day, day of the week, and line. If headways are small (high frequency), even short delays can propagate easily throughout the entire line.

  • Resources Available resources (e.g., staff) when an incident occurs may influence the duration of the disruption impact. If the incident occurred during the day, it may be more easily cleared than during the night.

(c) Infrastructure

  • Control system Traditional signaling systems are based on the fixed block concept: the track is divided into multiple blocks that govern train separation and speed. As signaling control technology improves, more systems adopt moving block systems, which use communications to control trains and keep them separated by a braking distance. Moving block systems increase line capacity. Signal system design, therefore, may affect the duration of impacts.

  • Line length As the impacted line becomes longer, the duration of the disruption impacts may increase, as time to achieve clearance may be longer.

  • Transfer station Transfer stations typically have higher demand than other stations. If an incident occurs at a transfer station, it impacts multiple lines. Furthermore, an incident at a transfer station likely has a more severe (longer) impact on passengers than at a non-transfer station.

  • Terminal station If an incident occurs at a terminal, the duration of the disruption impacts might not be long, as a train can be moved to an alternative track at a crossover and a new train can be dispatched immediately. Thus, the impact is isolated and does not propagate.

(d) External Factors

External factors, mainly weather conditions, may impact the duration of a disruption impact, especially in a system with sections of the railroad aboveground. As a result, incidents occurring under extreme weather conditions may have different impact durations depending on the section of the line on which they occur. The percentage of stations aboveground can be used as a proxy to capture the impact of weather conditions.

(e) Demand

The level of demand when an incident occurs affects the duration of the disruption impact on passengers. Generally, high demand and high number of passengers inside the system lead to longer durations. The impact of demand is captured as the difference between the demand in the time periods before the incident occurred and the typical day demand.

  • Entry passenger flow The number of passengers who enter the system/impacted line reflects the status of passenger demand at the time of the incident. Higher demand is expected to increase the incident impacts.

  • Exit passenger flow Exit flows provide a measure of system status. High values of exit flow tend to indicate that the system may be in the recovery phase.

  • Passenger accumulation The number of passengers in the system at the time when the incident occurs may impact the duration. Incidents that occur when the passenger accumulation is high might be associated with a longer duration.

  • Urbanization level This is defined as the daily demand at the impacted station divided by the average daily demand at all stations. It measures how much the demand at the impacted station is above the average station level and captures the urbanization level of the station.

Most of the data required to calculate the explanatory variables listed above are available in databases maintained by transit authorities. Google Maps and other application programming interfaces (APIs) can be used to obtain information regarding the characteristics of metro systems, weather, and other factors of interest.
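The demand-related variables described above can be illustrated with a minimal sketch. The function names and the numbers are hypothetical, and computing passenger accumulation as cumulative tap-ins minus cumulative tap-outs is an assumption made here for illustration, not necessarily the procedure used in the study.

```python
from statistics import mean


def urbanization_level(daily_demand_by_station, impacted_station):
    """Daily demand at the impacted station divided by the
    average daily demand over all stations."""
    average_demand = mean(daily_demand_by_station.values())
    return daily_demand_by_station[impacted_station] / average_demand


def demand_deviation(observed_flows, typical_flows):
    """Demand in the periods before the incident minus the
    typical-day demand in the same periods."""
    return sum(observed_flows) - sum(typical_flows)


def passenger_accumulation(entries, exits):
    """Assumed proxy: passengers inside the system equal
    cumulative tap-ins minus cumulative tap-outs."""
    return sum(entries) - sum(exits)


# Hypothetical values, for illustration only.
daily_demand = {"A": 30000, "B": 10000, "C": 20000}
level = urbanization_level(daily_demand, "A")
deviation = demand_deviation([1200, 1300], [1000, 1100])
accumulation = passenger_accumulation([500, 600, 700], [300, 400, 450])
```

With these inputs, station A carries 1.5 times the average station demand, so its urbanization level is above 1.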

2.4 Model Building

This study examines two cases of impact duration prediction. The first aims to predict the actual duration of the disruption impacts, while the second focuses on predicting the interval that the impact duration belongs to (e.g., 10–20 minutes). The latter, from a practical point of view, is adequate for applications.

2.4.1 Duration Prediction

For the duration prediction model, the MLR and AFT models are considered. The MLR model, which is widely used, models the linear relationship between explanatory and response variables. The AFT model is a type of hazard-based duration model; common AFT duration distributions include Weibull, log-normal, and log-logistic. The AFT model has been used extensively in the transportation field, especially to predict the duration of traffic accidents [5], and it has shown good performance in predicting the duration of train delays [7, 8, 11]. The dependent variable is the inferred disruption impact duration (Fig. 3), and the independent variables are described in the explanatory variables section.
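For reference, the two model families can be written in their common textbook forms. The notation below is assumed here for illustration and is not taken from the original study:

$$T_i = \beta_0 + \sum_{k} \beta_k x_{ik} + u_i \quad \text{(MLR)}$$

$$\ln T_i = \beta_0 + \sum_{k} \beta_k x_{ik} + \sigma \varepsilon_i \quad \text{(AFT)}$$

where \(T_i\) is the impact duration of incident i, \(x_{ik}\) is the value of explanatory variable k, \(u_i\) is a zero-mean error term, and \(\sigma\) is a scale parameter. In the AFT form, the distribution assumed for \(\varepsilon_i\) determines the model: an extreme value distribution gives the Weibull model, a normal distribution the log-normal model, and a logistic distribution the log-logistic model. Under this form, a one-unit increase in \(x_{ik}\) multiplies the duration by \(e^{\beta_k}\), whereas in the MLR form it adds \(\beta_k\) minutes.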

2.4.2 Interval Prediction

For predicting the specific time interval that the impact duration belongs to, random forest models are considered. Random forests are an ensemble method for classification and regression problems. They extend decision tree methods by generating trees based on a random sample of available explanatory variables [21]. As a result, the trees used for the prediction are only weakly correlated, which makes the method robust and reduces problems with overfitting.

The dependent variable is the duration interval based on the inferred incident impact duration, and the independent variables are described in the explanatory variables section. Previous studies indicate that random forest models predict accident duration satisfactorily [3].
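As a minimal sketch of how the dependent variable for interval prediction could be constructed, the function below maps an inferred duration to a fixed-width interval. The function name, the half-open convention [lower, upper), and the sample durations are assumptions made here for illustration.

```python
def duration_to_interval(duration_min, width_min):
    """Map a duration in minutes to the half-open interval
    [lower, upper) of the given width that contains it."""
    if duration_min < 0 or width_min <= 0:
        raise ValueError("duration must be >= 0 and width must be > 0")
    lower = int(duration_min // width_min) * width_min
    return (lower, lower + width_min)


# Hypothetical inferred impact durations (minutes), binned at 20 minutes.
durations = [12.5, 47.0, 60.0, 118.3]
labels = [duration_to_interval(d, 20) for d in durations]
```

Each resulting label is then treated as a class for the classifier, so a wider interval yields fewer classes with more incidents in each.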

2.5 Model Validation

The fivefold cross-validation method is used to validate the MLR and AFT models. Typical performance metrics are used to measure the model accuracy, including mean absolute error (MAE), root mean square error (RMSE), and mean absolute percentage error (MAPE). The fivefold cross-validation approach is also used to validate the performance of the random forest model for interval prediction.
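The validation procedure can be sketched as follows, using the standard definitions of the three error metrics and a simple shuffled five-way split. The helper names, the random seed, and the sample values are hypothetical; the study does not specify how the folds were formed.

```python
import math
import random


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    ratios = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * ratios / len(actual)


def k_fold_indices(n, k=5, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]


# Hypothetical actual and predicted impact durations (minutes).
actual = [50.0, 100.0, 200.0]
predicted = [60.0, 90.0, 200.0]
scores = (mae(actual, predicted), rmse(actual, predicted), mape(actual, predicted))
```

In a full run, a model would be fitted on each training split and scored on the matching test split, and the reported metric would be the average over the five folds.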

3 Case Study

A major metro system with several lines and almost 100 stations is used in this case study. The control system for most of the lines uses the moving block technology, while some of the lines use a fixed block signaling system. Some heavy rail lines are located exclusively underground, one is located aboveground, and the rest are both above and below ground.

3.1 Data

AFC data, incident logs, and path choice fractions for a period of 6 years are used for the analysis. The anonymized AFC data include tap-in time/station, tap-out time/station, and card type. As such, the AFC data provide direct information about OD flows by time of day. Path fractions determine the percentage of passengers that use alternative paths for a given OD and time period. The incident logs record incident information including date, start time, impacted line/station, incident cause, and a description of the incident. There are a few extreme incidents caused by train collisions, large-scale signaling failures, and other equipment failures. These extreme incidents were not included in the analysis, resulting in a dataset of 327 incidents.

The incident types are organized into six groups based on their characteristics. Figure 4a shows the frequency of train delays (time elapsed from the occurrence of the incident until the service returns to normal) by type. Figure 4b shows the boxplot of train delays by type, including the median and the 25th and 75th percentile values. The severity of the incidents varies significantly. For example, there are only 10 incidents caused by power distribution issues, but the median duration of these incidents is 107 minutes. On the other hand, there are 157 incidents caused by signaling and telecommunications issues, but the median duration is 56 minutes.

Fig. 4 Number of incidents (a) and incident delay (b) by type.

3.2 Model Estimation

Detailed results are presented for the average journey time in the system (Sys_tt) metric. For the rest of the metrics, the model performance is similar. Table 1 summarizes the explanatory variables used in the final model specification.

Table 1 Explanatory variables (in parentheses the name used in the discussion).

Demand-related variables were calculated from AFC data. The rest of the variables were obtained from incident logs, websites [22], and a previous study [23]. The response variable impact duration is estimated from the AFC data using the method described in the previous section. As discussed in the previous section, two cases of prediction are considered: actual impact duration prediction and interval prediction.

3.2.1 Actual Duration Prediction

Actual duration prediction models are developed using MLR and AFT approaches. Table 2 summarizes the estimation results of various models for the system-level average journey time (Sys_tt) metric. The column “parameter” shows the estimated coefficients, and the column “p-value” indicates their corresponding significance levels. A variable is interpreted as significant in explaining the variability in the dependent variable if its p-value is less than or equal to 0.05.

Table 2 Summary of MLR and AFT model estimation results.

Of the various incident types, public (staff or passenger error) issues contribute the least to the duration, while power failures have the largest impact. This result is consistent with a previous study by Weng et al. [7], reporting that incidents caused by power failure cause the longest train delays. Incidents on lines with a moving block system tend to have a shorter impact duration than those on lines with a fixed block system. Moving block systems can automatically maintain the distance between trains, which may help to achieve faster service recovery. Disruption impacts last longer on weekends, probably due to lower frequency and lack of transit staff. Incidents during heavy rain result in longer impact duration, probably because part of the system is aboveground, and braking and acceleration are impacted when the rail track is wet. As expected, the initial conditions, i.e., the severity of the incident, have a major effect on the duration of the impacts on passenger trips.

Figure 5 compares the actual and predicted values of the impact duration based on system travel time (Sys_tt) for each model. The red line shows the 45-degree reference line. The results show that there are some incidents (actual impact duration between 30 and 80 minutes) that were overestimated by the models. The log-logistic AFT model seems to cluster the impact duration in two separate groups, with few points predicted in the middle range.

Fig. 5 Sys_tt prediction results for MLR, Weibull AFT, log-logistic AFT, and log-normal AFT models.

Table 3 shows the model performance in terms of MAE, RMSE, and MAPE. These metrics are calculated from the average score of the fivefold cross-validation. The three AFT-based models have similar prediction performance. The MLR model has slightly better prediction accuracy than the AFT models.

Table 3 MAE, RMSE, and MAPE of Sys_tt.

3.2.2 Duration Interval Prediction

A random forest model is used to predict the duration interval of disruption impacts for various interval widths (10, 15, 20, and 30 minutes). The model utilizes the same set of explanatory variables as the actual duration models in the previous section. For evaluation, a metric based on the confusion matrix in classification problems is used, as reported by Stehman [24], providing information about prediction errors for each class. The score \({r}_{ij}\) for each pair of actual and predicted duration intervals is defined as

$${r}_{ij}=\frac{{m}_{ij}}{{n}_{i}}$$
(11)

where \({n}_{i}\) is the number of incidents that belong to interval i, and \({m}_{ij}\) is the number of incidents in interval i that are predicted to belong to interval j. A value of \({r}_{ii}=1.0\) means that all incidents in interval i have been predicted by the model to belong to their actual interval.
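The score in Eq. (11) amounts to a row-normalized confusion matrix. A minimal sketch follows; the function name and the interval labels are hypothetical and used only for illustration.

```python
from collections import Counter


def interval_scores(actual, predicted):
    """Return r[(i, j)] = m_ij / n_i, where n_i counts incidents whose
    actual interval is i and m_ij counts those among them predicted as j."""
    n = Counter(actual)
    m = Counter(zip(actual, predicted))
    return {(i, j): count / n[i] for (i, j), count in m.items()}


# Hypothetical actual and predicted 20-minute intervals for six incidents.
actual = ["0-20", "0-20", "20-40", "20-40", "20-40", "20-40"]
predicted = ["0-20", "20-40", "20-40", "20-40", "20-40", "0-20"]
r = interval_scores(actual, predicted)
```

Here three of the four incidents in the 20-40 interval are predicted correctly, so the diagonal score for that interval is 0.75, and the scores for each actual interval sum to 1.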

Figure 6 shows the score \({r}_{ij}\) results of the random forest model for 10-, 15-, 20-, and 30-minute intervals. The x-axis represents the predicted duration interval and the y-axis the actual duration interval. The color scheme indicates the fraction of incidents in interval i that have been predicted to belong to interval j. The darker the color of the cells on the diagonal, the higher the prediction accuracy is. The accuracy increases marginally as the duration interval becomes larger. Consistent with the actual duration prediction results, the model has a relatively low prediction accuracy when the actual impact duration interval is between 60 and 140 minutes.

Fig. 6 Prediction accuracy of the random forest model: (a) 10-minute; (b) 15-minute; (c) 20-minute; (d) 30-minute intervals.

The duration and duration interval models are complementary and serve different application contexts when providing information to passengers or operators under disruptions. For example, the duration model produces the expected delay time (a single quantitative value, e.g., 15 minutes), while the duration interval model gives the estimate as a delay time interval (a range, e.g., 10–20 minutes). From the passenger’s perspective, knowing the average delay time is as important as knowing the range of the delay when making travel decisions under disruptions.

4 Conclusion

This paper studies the problem of inferring and predicting the duration of disruption impacts on passenger trips using smartcard data in urban rail systems. Regression and machine learning models are developed to explore the contributing factors, including incident characteristics, infrastructure, demand, and external factors. The methods are validated in a case study with data from a crowded metro system.

The results show that the MLR model performs better than AFT models in predicting the impact duration, while the random forest model performs well in predicting the duration interval. Generally, the model prediction error is acceptable, though in a few cases the error is high. The factor analysis shows that disruptions caused by power failures have longer impact durations on passenger trips. It also indicates that the type of the signal control system affects the duration. Incidents on lines with a moving block signaling system had shorter durations than those on lines with a fixed block system. Other important factors include the time of day, day of the week, weather, area affected, disruption severity, and disruption cause (e.g., platform, rolling stock, and communication systems). Future studies will employ more data to identify the main causes of high prediction errors in impact duration intervals. Also, it would be interesting to analyze how these factors affect the gap between the time when the incident is cleared and the time when the impact on passengers ceases.