1 Introduction

At the beginning of 2021, the CPC Central Committee and the State Council issued the Outline of the National Comprehensive Three-Dimensional Transportation Network Planning, which lays out the blueprint for the national comprehensive three-dimensional transportation network and positions rail transit as its backbone.

In recent years, rail transit failures have occurred frequently. For example, a sudden power supply failure on the section from Century Park to Zhangjiang High Tech of Shanghai Metro Line 2 paralyzed the section for up to 5 h and triggered a Class I large-passenger-flow warning; an escalator failure on Beijing Metro Line 4 caused passengers to fall and set off a stampede that left one person dead, two seriously injured, and 26 slightly injured; and when a train of Guangzhou Metro Line 8 stopped in a tunnel because of sparks in a carriage, frightened passengers evacuated into the tunnel on their own. In-depth analysis of these accident cases shows that operation risk is transmitted in a chain from the initial hidden danger to the final accident, so exploring the risk chain and diagnosing and breaking it in time is critically important. When a failure or accident occurs during operation, the transmission process exhibits a chain structure, and the safety of metro operation has become a difficult problem that must be faced.

The urban rail transit operation risk chain refers to an ordered sequence of hidden dangers that arise during operation, are not detected and controlled in time, and are then transmitted successively, producing a chain transmission effect that ultimately leads to accidents. Rail transit hazard sources are markedly transmissible: for example, a power failure of the overhead contact system at an elevated station will disrupt the normal operation plan; if the operation plan changes and the passenger flow organization of the station changes abruptly, the station may develop major hidden dangers such as sudden large passenger flow and platform congestion, forming a risk transmission chain, as shown in Fig. 1.

Fig. 1
figure 1

Risk chain transmission development demonstration

Different risk chains are intertwined and superimposed to form a risk chain network, and the impact of accidents will expand in series, resulting in unpredictable serious disasters. Therefore, the importance of hazard control and governance of rail transit operation under super large-scale network is self-evident.

This study presents a novel approach that integrates data mining and information fusion technologies to analyze the safety status of metro operation based on accumulated data. A systematic multi-source, high-dimensional, and heterogeneous collection and analysis experiment is designed to mine and integrate safety-related data, reveal the correlation mechanisms behind hazard sources, and form risk chain groups. By identifying key factors in the risk chain, this study provides a methodological framework for breaking the chain in a timely manner and establishing special emergency and response plans. It also serves as a theoretical and decision-making basis for improving metro operation safety and service level under a super-network mode, enhancing social image, and formulating and revising relevant industrial standards and specifications.

2 Literature Review

At present, academia offers several accident-based hazard identification methods, such as grey relational analysis, the analytic hierarchy process, probability statistics, and subjective experience. Because of their pervasive subjectivity, these methods cannot scientifically determine the key hazard sources.

Various studies have investigated aspects of hazard identification. For example, He [1] studied a hierarchical optimization method for hazard identification and evaluation in metro operation, which strengthened the quantification of the index system and reduced the error caused by subjectivity. Some scholars searched for hazard sources with the fault tree method: Yang [2] applied it to identify the hazard sources of train operation and determine the risk factors, which has a certain application value. However, the fault tree method supports only simple causal reasoning and is not suitable for analyzing complex systems. To overcome this defect, Yang et al. [3] introduced the Bayesian network method into hazard identification for railway passenger stations and incorporated fuzzy uncertainty theory as a powerful supplement to and improvement of the safety analysis of complex systems. Other scholars tried to remedy the shortcomings of the above algorithms with rough set theory, whose basic idea is to screen the key factors from among several attributes and thereby identify the hazard sources. Wang et al. [4] used rough set theory to identify construction risk factors; on this basis, Jia et al. [5] combined rough sets with a genetic algorithm to study the key risk factors of identified hazard sources, which proved more objective and accurate than other methods. With the development of big data mining technology, Ding [6] studied data mining algorithms in depth, modeled the rail transit dispatch logs of a mega city, mined the main hazard sources of rail transit, and developed an intelligent hazard identification system based on a data warehouse. Zhou et al. [7] created a subway construction safety risk early warning index system based on historical data. Zhang et al. [8] determined the causes of construction accidents from 571 collected construction accident investigation reports combined with grey relational analysis. Hei [9] used text mining to analyze the hidden dangers of subway construction and visualized the redundant text data, providing strong support for hidden danger investigation. Xu et al. [10] added information entropy term weighting to keyword importance evaluation to offset the impact of accident reports of different lengths on text mining results. Other scholars have studied the comprehensive application and design of machine learning (ML) algorithms in construction safety: Bugalia et al. [11] used an extensive experimentation strategy consisting of input data processing, n-gram modeling, and sensitivity analysis, which contributed greatly to data classification, but their ML classifiers struggled to distinguish between “unsafe act (UA)” and “unsafe condition (UC),” so they made improvements based on the research object.

The concept of risk chain transmission comes from the fields of project management, trade finance, media, public opinion, and culture; there is little research in the field of rail transit safety management. Chapman [12] first put forward the idea of the risk chain, after which risk management theory was applied in engineering. Kangari et al. [13] studied and verified the risk chain from the perspective of the supply chain and proposed that a risk chain is a series of chain effects arising from a logical relationship, forming interrelated risks. Building on previous risk chain studies, Liu [14] analyzed derailment data from the FRA rail equipment accident database for 2001–2010, tabulating frequency of occurrence by cause and number of trains derailed; the statistical analyses of accident causes showed that broken rails or welds were the leading derailment cause on main, yard, and side tracks, providing guidance for mining the causes of rail traffic accidents. Liu [15] reported, based on the transmission relationships among supply chain risks, that risks are mutually coupled and transmitted. Ma [16] divided the risk chain into four types: cause-and-effect chain, combination chain, replication chain, and migration chain. Wang [17] constructed an unsafe event data analysis model with regular expressions and pattern matching technology, established a matching model of external environment risk factors for high-speed railway derailments, and applied both models to the occurrence of unsafe events. Moradi [18] noted that a challenging problem in the risk and reliability analysis of complex engineering systems (CES) is performing and updating system-wide risk and reliability assessments with sufficiently high frequency. Li et al. [19] studied risk element transmission theory, laying a solid theoretical basis for risk identification. Cao et al. [20] used Monte Carlo simulation to model the interaction of risk chains and constructed a risk chain evaluation model. In the engineering field, there is abundant research on the chain transmission of faults and accidents, reflected in accident cause analysis, causal factors, and their relationships, but few studies can inform the construction of a rail transit operation risk chain. Li et al. [21] proposed a knowledge reasoning method based on ontology modeling of accident chain scenarios to analyze correlations among hidden dangers and clarify the causal mechanism of accidents. Qiu et al. [22] handled unstructured accident cases in a structured way and proposed a grid operation method to mine possible accident cause chains for targeted prevention and emergency disposal. Fu [23] developed a general modeling and analysis procedure for risk interactions based on association rule mining and weighted network theory.

To sum up, many domestic and international scholars have conducted extensive research on hazard identification algorithms, risk classification, and risk control in the engineering field, but there are few research achievements on identifying hazard sources in metro operation. In railway transportation and mining engineering, the sporadic hazard source identification algorithms concentrate mainly on fault tree analysis and association rule algorithms; these methods can only identify the cause combination of an accident and cannot uncover the transmission chain between hidden dangers. Other studies focus mainly on accident causation models of hazard sources, and there is almost no research on the chain coupling mechanism and risk chain among the causal factors of transportation failures (accidents). Research on the risk chain based on major hazard sources therefore has important practical significance. Big data and data mining algorithms have been widely used in China, but mainly for commercial purposes or bank loan risk assessment, with less focus on rail transit safety management. Metro operation safety data are complex in structure, heterogeneous, multi-source, and extremely difficult to collect. It is necessary to set up acquisition equipment, build data transmission networks, design high-speed algorithms with parallel computing capabilities, and explore the construction of data mining algorithm groups to mine and identify rail transit operation hazards. Current metro operation safety management and control lack pertinence, making it difficult to handle accidents and faults quickly. Metro operation is a complex, large-scale engineering system involving many hazard sources, each of which requires an accurate emergency or disposal plan together with accurate triggering mechanisms and conditions.

As an emerging technology of the big data era, text mining combined with risk analysis is the general trend. Each directed edge in the risk chain carries a certain probability, which depends on the likely development direction of the event and indicates the direction of risk occurrence and transfer. This has great research value for accident prediction and decision support.

3 Problem Description

3.1 The Basic Problem

With the expansion of the operation scale of urban rail transit, super-large-scale network operation has become a trend: operation complexity has increased, cascade effects among lines are obvious, and potential safety hazards in the operation process have continuously increased. The operation process mainly generates data on line passenger flow (covering stations, platforms, elevators, and transfer channels), dispatch logs, equipment operation status, account records, and rail damage, mostly stored in text form and partially unstructured. It is particularly important to analyze these data and explore the key hazard sources leading to accidents and the relationships among them.

In this paper, we explore how to deeply mine the risk transmission mechanism behind massive rail transit hidden danger data and support the identification and chain-based management and control of operation hazard sources, an unavoidable problem that metro operation enterprises must solve. For the operation risk management of urban rail transit, many domestic and foreign scholars have done considerable basic research on hazard identification algorithms, risk classification, and risk control in the engineering field, and have preliminarily explored hazard source mining and risk chain transmission mechanisms in the transportation field. Building on an analysis of the existing research, this study comprehensively applies data mining, machine learning, and natural language processing (NLP) to study the key risk sources of rail transit operation in depth and to reveal the risk transmission among them.

3.2 Research Technology Route Structure

This paper proposes a method for mining metro operation hazard sources and constructing their risk chain based on text mining and causality extraction. The method identifies urban rail transit operation hazard sources from unstructured text, takes the hazard sources as the core, and explores the law of risk transmission from the perspective of chain risk transmission to build a causal risk chain. By identifying the key cause points and controlling them, the development direction of an accident can be changed, providing strong support for the early warning and prediction of urban rail transit operation safety. The research plan is shown in Fig. 2.

Fig. 2
figure 2

Research technology route structure

4 Hazard Source Mining Based on Dispatch Logs and Accident Reports

Data from each department of rail transit operation have both semi-structured and unstructured characteristics, which makes data collection, standardization and informatization difficult. We used a collection of multi-source, high-dimensional and heterogeneous data, mainly including passenger flow videos, public works maintenance data, dispatch logs, station accounts (paper version), delay data of more than 15 minutes, and automated fare collection (AFC) passenger flow data. Mining rail transit operation safety hazards involves a data acquisition scheme, data flow algorithms, data semantic definition and standardization, in-depth study of rail transit operation data structure, conceptual description and definition of data, and establishing an abstract structure that can describe fault data and a semantic description framework that specifically expresses internal associations. Finally, a data warehouse that can be used for rail transit operation safety data mining is constructed.

4.1 The collection of operation safety data

  • (1) Dispatch log

The Metro enterprise has stored a large amount of train fault information and rescue information in their daily operation. With the continuous accumulation of operation, a large number of dispatch logs have been formed, which has high guiding value for operation practice, as shown in Table 1.

  • (2) Accident reports

Table 1 Operational scheduling failure log record form (partial)

In order to make up for the possible limitations and shortcomings of the data in the operation dispatch logs, 110 urban rail transit operation accident reports at home and abroad are collected and stored in a folder as a text file (txt) for data mining supplement.

There are three ways to obtain accident reports:

  i. Search for and collect reports related to urban rail transit operation accidents through relevant safety management websites;

  ii. Collect the accident data analyzed in relevant references and obtain the key information and related materials of each accident through network queries;

  iii. Search for reports related to urban rail transit operation accidents on the official websites of metro operation units.

The collected data cannot fully cover rail transit operation accidents, so representative data were selected for mining and analysis; some accident reports are shown in Table 2.

Table 2 Accident report (partial)

4.2 Text Data Preprocessing

  • (1) Construction of word segmentation lexicon

For Chinese word segmentation, professional terms that are missing from the segmentation lexicon will be split apart, degrading the segmentation result. The third-party Jieba package is loaded for word segmentation in the Python environment. To ensure segmentation quality, a professional rail transit lexicon and a stop-word lexicon are established. Once the user-defined lexicon and stop-word lexicon are configured, the specific word segmentation process is as shown in Fig. 3.

  • (i) Custom lexicon configuration

Fig. 3
figure 3

Word segmentation flowchart

The Jieba package can recognize common words, but its segmentation of proper nouns is not ideal. For example, “door fault” is often split into “door” and “fault,” yet “door fault,” as a proper noun for urban rail transit operation faults, should be extracted as a whole rather than divided further. In addition, some colloquial expressions are difficult to recognize. Therefore, before word segmentation, a user-defined lexicon is built: 804 urban rail transit hazard-related words and 406 urban rail transit professional terms together form the user-defined thesaurus.
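To see why compounds such as “door fault” must be lexicon entries, consider the following simplified, stdlib-only sketch of dictionary-based segmentation using forward maximum matching. Jieba's real algorithm is more sophisticated (a prefix dictionary plus an HMM for unknown words), and in practice the custom lexicon is simply loaded with `jieba.load_userdict(...)`; this toy function only illustrates the principle.

```python
# Forward maximum matching: a simplified stand-in for dictionary-based
# segmentation, showing why "车门故障" ("door fault") must appear in the
# lexicon as one entry to survive segmentation intact.
def fmm_segment(text, lexicon, max_len=4):
    words, i = [], 0
    while i < len(text):
        # try the longest candidate first, shrink until a lexicon hit
        for n in range(min(max_len, len(text) - i), 0, -1):
            cand = text[i:i + n]
            if n == 1 or cand in lexicon:  # single chars are the fallback
                words.append(cand)
                i += n
                break
    return words

base = {"车门", "故障"}
print(fmm_segment("车门故障", base))                 # ['车门', '故障']
print(fmm_segment("车门故障", base | {"车门故障"}))  # ['车门故障']
```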

  • (ii) Stop-word lexicon configuration

Stop-words interfere with text segmentation: they occur in large numbers and obscure the truly meaningful feature words. Removing stop-words and retaining highly correlated feature words improves the efficiency of text mining. The common stop-words involved are as follows:

  (i) Common adverbs, conjunctions, and exclamations, such as “of,” “make,” and “after,” which occur with high frequency but carry no real meaning;

  (ii) Words in the selected corpus (the failure logs and accident report texts of urban rail transit operation) that appear frequently but have no practical significance, such as “urban rail transit,” “safety,” and the names of metro stations; these are also merged into the stop-words.

The stop-word lists of the Machine Intelligence Laboratory of Sichuan University and of the Harbin Institute of Technology were also incorporated, yielding a final stop-word lexicon of 3050 words.
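The filtering step itself is straightforward; a minimal sketch follows, with illustrative English stand-ins for the tokens and stop-words (the actual lexicon contains 3050 Chinese entries):

```python
from collections import Counter

# Drop high-frequency but meaningless tokens after segmentation, then
# count the remaining feature words for the later statistics.
def remove_stopwords(tokens, stopwords):
    return [t for t in tokens if t not in stopwords]

stopwords = {"of", "after", "safety", "urban rail transit"}
tokens = ["door fault", "of", "urban rail transit", "train delay",
          "door fault", "safety"]
filtered = remove_stopwords(tokens, stopwords)
print(Counter(filtered))  # Counter({'door fault': 2, 'train delay': 1})
```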

  • (2) Word segmentation result

The configured word segmentation system is used to process the rail transit operation dispatch log and accident report text. Some word segmentation results are shown in Table 3, resulting in 20,876 words in total.

Table 3 Word segmentation result (part)

The first round of word segmentation yields too many original feature items and a great deal of useless information, which significantly affects the statistical analysis of the keywords of operation hazard sources.

Previous research [20] shows that when strongly discriminative words account for less than 1/10 of the mined document set, the original data set is too large and eigenvalue selection must be performed on the raw segmentation results. As Table 3 shows, words such as “traffic dispatching,” “fault,” and “station” cannot serve as semantic expressions of factors affecting the operation safety of urban rail transit. On this basis, eigenvalues must be selected and words unrelated to metro operation hazard sources screened out, which is an essential step for further text mining.

4.3 Selection of Key Eigenvalues

Text feature representation methods mainly include square root function, Boolean function, and term frequency-inverse document frequency (TF-IDF) function. The TF-IDF function is the most widely used vector space model, which can take into account the change of word frequency and the distinction of text semantic features. A TF-IDF algorithm is selected to screen the eigenvalues.

The TF-IDF algorithm combines TF and IDF. TF (term frequency) is the ratio of the number of occurrences of a word in a text to the total number of words in that text after stop-word removal; the higher the frequency, the greater the word's weight in the text. IDF (inverse document frequency) captures the idea that if almost all texts in the corpus contain word w, then w contributes little to distinguishing them, whereas if only a few texts contain w, then w can express the theme of those texts. By combining the two, TF-IDF filters out common words that are meaningless to the topic and retains the content that reflects it. The TF-IDF calculation is shown in Eq. (1):

$$\left\{ \begin{gathered} {\text{TF-IDF}}_{i,j} = tf_{i,j} \times idf_{i} \hfill \\ idf_{i} = \log \frac{\left| D \right|}{{DF_{i} }} \hfill \\ \end{gathered} \right.$$
(1)

in which \(\left| D \right|\) refers to the total number of documents; \({\text{DF}}_{i}\) refers to the number of documents containing the eigenvalue \(t_{i}\); and \(tf_{i,j}\) indicates the frequency of occurrence of eigenvalue \(t_{i}\) in class-j documents.
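Eq. (1) can be sketched with the standard library alone. The +1 on \(DF_{i}\) below is one common smoothing choice; the text mentions that smoothing is applied to keep \(DF_{i}\) from being 0 but does not specify the exact scheme, so this detail is an assumption.

```python
import math
from collections import Counter

# Minimal sketch of Eq. (1). Documents are pre-segmented token lists;
# df[t] is DF_i, and the +1 is an assumed add-one smoothing.
def tf_idf(docs):
    n_docs = len(docs)
    df = Counter()                      # DF_i: documents containing term i
    for doc in docs:
        df.update(set(doc))
    scores = []
    for doc in docs:
        tf, total = Counter(doc), len(doc)
        scores.append({t: (c / total) * math.log(n_docs / (df[t] + 1))
                       for t, c in tf.items()})
    return scores

scores = tf_idf([["psd", "fault"], ["psd", "delay"], ["catenary", "fault"]])
```

Note that with this smoothing a term appearing in every document gets a slightly negative score, which still ranks it below discriminative terms, consistent with the filtering purpose described above.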

After Jieba word segmentation, 20,876 words are obtained, many of which have no significant effect. The TF-IDF algorithm is then used to compute the text eigenvalues and delete words with low TF-IDF values. Smoothing is first applied to Eq. (1) to prevent \(DF_{i}\) from being 0, and each word's TF-IDF value is computed, as shown in Tables 4 and 5:

Table 4 High TF-IDF characteristic value
Table 5 Low TF-IDF characteristic value

Calculation shows that words with higher TF-IDF values better represent the hazard sources of rail transit operation. The TF-IDF values of the eigenvalues are sorted, and the top 5% are selected as key feature words. Because of the large data volume, the mining process cannot be completed in a single pass; the relevant steps, including updating the segmentation thesaurus and selecting keywords, must be iterated to improve the accuracy of key eigenvalue selection.

According to the above process, through three rounds of mining analysis, the final mining statistical results of 76 key feature words are obtained for subsequent cluster analysis of key hazard sources.

After obtaining the key feature words, Tableau is used for visualization, and the final statistical results are displayed as a word cloud diagram, which makes the key subject words more obvious and intuitive. The word cloud diagram is shown in Fig. 4.

Fig. 4
figure 4

Key feature value word cloud

The causes and severities of faults differ across dispatch fault logs and accident reports, but word cloud visualization reveals the common, universal key eigenvalues in these text data; they greatly affect the operation safety of urban rail transit and should be analyzed as key risk factors.

In Fig. 4, the 75 key features obtained from text mining according to certain rules are arranged from the inside outward by text size and color depth: the higher the weight of a key feature item, the larger its font, the darker its color, and the more central its position. As can be seen from Fig. 4, the top five risk factors are large passenger flow, overhead contact system power failure, platform screen door (PSD) failure, foreign matter intrusion, and train delay, which is consistent with the actual situation.

4.4 Identification of Key Hazard Sources

Analysis of the 75 hazard source keywords identified by text mining shows that the same keyword has multiple expressions; further classification and summarization yield the urban rail transit operation hazard sources shown in Table 6.

Table 6 Key hazards in urban rail transit operations

5 Risk Chain Construction Method Based on Key Hazard Sources

5.1 Parameter Construction of Risk Chain Edge

5.1.1 Determining the Direction of the Risk Chain

There is no clear definition of risk chain in academic circles. Risk chains usually spread in a chain-like manner, which is the sum of a series of explicit and implicit risks [24, 25]. Because of the particularity and complexity of urban rail transit operation, the internal risk relationship of the system is complex and has the characteristics of obvious chain transmission. Usually, the occurrence of accidents is caused by the continuous diffusion after the occurrence of the original risk event. The risk chain expresses the correlation between risk elements and the risk transmission path, provides strong support for the control of key risk points, and is conducive to preventing the source of risk and reducing the possibility of risk transmission.

Only some key hazard sources are obtained through the calculation in Section 4.4. Faced with massive operation data, obtaining the disaster risk chain of key hidden dangers requires information reduction and extraction over the whole life cycle of each key hidden danger. The flow chart shown in Fig. 5 is designed to construct the direction of the risk chain.

Fig. 5
figure 5

The flow chart of the direction construction of the side of the risk chain

The specific operation steps are designed as follows:

Step 1. Build risk pairs into the data set E[n] = {[R1, R2], …, [Ri, Rj], …}, which contains n items; for example, E[1] = [R1, R2] indicates that risk R2 is caused by risk R1, so R1 and R2 constitute one risk pair. The construction method of the risk data set is described in detail in Section 5.2;

Step 2. Execute the risk chain construction function get_RiskChain();

Step 3. Obtain a total of n causal data sets E[n] = [R1, R2], …, [Ri, Rj];

Step 4. Set the hazard source S0, which is one of the events R1, …, Rj; then search all events related to S0 and build the risk chain it causes;

Step 5. Traverse all n data sets and search for every data set related to S0;

Step 6. When a data set containing S0 is found, the other member R of the causal pair is determined to be a new hazard source S1;

Step 7. Take S1 as the search object and traverse the n-item data set again to find the next risk point related to S1, until all risk points with causality are found;

Step 8. Save and output all S; the risk chain rooted at S0 is now complete;

Step 9. End the risk chain construction function, return the constructed risk chain, and output it.
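The steps above amount to a breadth-first traversal of the causal pair set. A minimal sketch of get_RiskChain() under that reading (function and variable names are illustrative, not from the paper's code):

```python
from collections import deque

# Breadth-first traversal of the causal pair set E (Steps 1-9),
# starting from hazard source s0; returns the directed edges of
# the risk chain rooted at s0.
def get_risk_chain(pairs, s0):
    chain, visited, queue = [], {s0}, deque([s0])
    while queue:
        s = queue.popleft()
        for cause, effect in pairs:            # traverse all n pairs
            if cause == s and effect not in visited:
                chain.append((cause, effect))  # Step 6: effect is a new source
                visited.add(effect)
                queue.append(effect)           # Step 7: search from it next
    return chain

E = [("R1", "R2"), ("R2", "R3"), ("R4", "R5"), ("R1", "R4")]
print(get_risk_chain(E, "R1"))
# [('R1', 'R2'), ('R1', 'R4'), ('R2', 'R3'), ('R4', 'R5')]
```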

5.1.2 Determination of Risk Chain Edge Transfer Probability

The main purpose of constructing the risk chain based on key hazard sources is to reveal the laws and modes of risk development. To measure the possibility of causal risk transmission, the risk transfer probability is marked on each edge of the risk chain. The transfer probability is the possibility that risk Rj occurs given that risk Ri has occurred, that is, the probability that risk Ri causes risk Rj, as shown in Eq. (2):

$$P(R_{j} \mid R_{i} ) = \frac{{{\text{count}}(R_{i} ,R_{j} )}}{{\sum\nolimits_{k} {{\text{count}}(R_{i} ,R_{k} )} }}$$
(2)

in which \({\text{count}}(R_{i} ,R_{j} )\) indicates the co-occurrence frequency of \(R_{i}\) followed by \(R_{j}\), and \(\sum\nolimits_{k} {{\text{count}}(R_{i} ,R_{k} )}\) indicates the total count over all risk events that may occur after \(R_{i}\).
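Eq. (2), together with the pruning step described below, can be sketched as follows (the threshold default reflects the 0.48 value used in this paper):

```python
from collections import Counter

# Estimate edge transfer probabilities from observed risk pairs per
# Eq. (2) and drop edges whose probability falls below the threshold.
def transfer_probs(pairs, threshold=0.48):
    pair_count = Counter(pairs)
    out_count = Counter(src for src, _ in pairs)  # sum_k count(Ri, Rk)
    return {(src, dst): c / out_count[src]
            for (src, dst), c in pair_count.items()
            if c / out_count[src] >= threshold}

E = [("R1", "R2")] * 6 + [("R1", "R3")] * 3 + [("R1", "R4")]
print(transfer_probs(E))  # {('R1', 'R2'): 0.6} - the R3 and R4 edges are pruned
```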

Because some risk tuples in the extracted causality pairs are not strongly related to the occurrence of hazard sources, these events should not be linked to the causality risk chain and will become isolated points [26].

Experimental calculation shows that the edge transfer probabilities of 70% of the data are greater than 0.48, while the other 30% fall below this value. Moreover, the data are strongly dispersed: most of the remaining risk tuples have an edge transfer probability below 0.1, indicating low correlation with the hazard sources, so they do not form edges of the risk chain. The pruning threshold is therefore set to 0.48, and risk event pairs of low importance are deleted to improve the accuracy of risk chain construction. Finally, the risk chain based on key hazard sources is obtained as shown in Fig. 6, where R* represents the key hazard source; R1, R2, …, R8 represent the associated risk factors leading to R*; the arrows represent causal relationships between risk factors; and the dotted lines indicate that the edge transfer probability between a risk pair is below the pruning threshold.

Fig. 6
figure 6

Construction of a causal risk chain based on key hazards

5.2 The Construction of Risk Data Set

Extracting causal relationships between risk points is the basic work of risk chain construction. The collected operation accident reports and dispatch fault logs are fused into a data warehouse for mining, risk events with causality are extracted, and risk pairs are formed to construct the risk chain candidate set. When extracting causality from the text, some sentences are found to contain explicit causal connectives or cues, such as “because,” “because of,” “so,” “cause,” and “therefore”; this is called explicit causality. Text that expresses causality without causal cues accounts for about 2/3 of the data and is called implicit causality. This paper extracts the causal relationships of risk events in the text to form risk pair data sets; the construction process is shown in Fig. 7.

Fig. 7
figure 7

Causality extraction flowchart

5.2.1 Explicit Causality Extraction Based on Pattern Matching

Explicit causality is the most common form in text and the main source of causality extraction. Extracting explicit causality from metro operation safety accident texts mainly targets the event tuples in causal clauses and sentences, expressing the causality between events from input explicit causal sentences. The extraction requires building causality matching rules and templates to match the cause clause and the result clause in a sentence, then splitting each clause further and expressing the results in a structured form as event pairs. The explicit causality extraction framework is shown in Fig. 8.

Fig. 8
figure 8

An explicit causality risk pair extraction framework for urban rail transit operation accidents

Sentences with explicit causality have obvious causal cues. The causal semantics are formally expressed through language patterns. Firstly, several specific causal patterns are determined, and then the sentences in the text are matched with those associated with the same semantics one by one. The advantage of this method is that it is relatively accurate and consistent with people's habits.

The text contains a large number of causal cue words that characterize causal sentence patterns. Sentences with explicit causality can be divided into terminal type, middle type and supporting type. Based on these three syntactic patterns and the features of the data text, this paper summarizes five categories of causal cue words suitable for causality extraction in the rail transit domain. Table 7 shows the syntactic pattern matched by each causal cue word type, where Pi denotes the i-th syntactic pattern.

Table 7 Causal syntactic patterns and causal cue word correspondence table

The causality in the text is abstracted into five specific syntax patterns, and the corresponding extraction matching rules are designed as follows:

$$\begin{aligned} &{\text{(i) Rule 1: if }}w_{1} \in cue1{\text{ and }}w_{i} \in sign,{\text{ then }}s_{i} \in P_{1} ,\\ &\qquad {\text{cause}}_{{P_{1} }} = \{ w_{2} ,\ldots,w_{i} \} {\text{ and effect}}_{{P_{1} }} = \{ w_{i + 1} ,\ldots,w_{n} \} \\ &{\text{(ii) Rule 2: if }}w_{i} \in cue2{\text{ and }}w_{1} \notin cue4,{\text{ then }}s_{i} \in P_{2} ,\\ &\qquad {\text{cause}}_{{P_{2} }} = \{ w_{1} ,\ldots,w_{i - 1} \} {\text{ and effect}}_{{P_{2} }} = \{ w_{i + 1} ,\ldots,w_{n} \} \\ &{\text{(iii) Rule 3: if }}w_{i} \in cue3{\text{ and }}w_{1} \notin cue6,{\text{ then }}s_{i} \in P_{3} ,\\ &\qquad {\text{effect}}_{{P_{3} }} = \{ w_{1} ,\ldots,w_{i - 1} \} {\text{ and cause}}_{{P_{3} }} = \{ w_{i + 1} ,\ldots,w_{n} \} \\ &{\text{(iv) Rule 4: if }}w_{1} \in cue4{\text{ and }}w_{i} \in cue5,{\text{ then }}s_{i} \in P_{4} ,\\ &\qquad {\text{cause}}_{{P_{4} }} = \{ w_{2} ,\ldots,w_{i - 1} \} {\text{ and effect}}_{{P_{4} }} = \{ w_{i + 1} ,\ldots,w_{n} \} \\ &{\text{(v) Rule 5: if }}w_{1} \in cue6{\text{ and }}w_{i} \in cue7,{\text{ then }}s_{i} \in P_{5} ,\\ &\qquad {\text{effect}}_{{P_{5} }} = \{ w_{2} ,\ldots,w_{i - 1} \} {\text{ and cause}}_{{P_{5} }} = \{ w_{i + 1} ,\ldots,w_{n} \} \end{aligned}$$

in which sign denotes a punctuation mark in the sentence, cause denotes the cause clause, and effect denotes the result clause; \(s_{i}\) refers to the i-th sentence in the text, and \(\{ w_{m} ,\ldots,w_{n} \}\) denotes the text content between the m-th word and the n-th word of the sentence.
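The five rules above can be sketched as simple token-level matchers. Below is a minimal Python sketch of Rule 1 and Rule 2; the cue sets are hypothetical English stand-ins, since the paper's actual cue-word tables (cue1–cue7, Table 7) are domain-specific and in Chinese:

```python
# Hypothetical cue sets standing in for the Chinese cue-word tables.
CUE1 = {"because", "since"}       # leading cause cues (pattern P1)
CUE2 = {"so", "therefore"}        # middle result cues (pattern P2)
CUE4 = {"as"}                     # cues excluded by Rule 2 (assumed)
SIGN = {",", ";"}                 # clause-separating punctuation

def match_rule1(words):
    """Rule 1: w1 in cue1 and w_i in sign -> s in P1,
    cause = w2..w_i, effect = w_{i+1}..w_n (1-based indices)."""
    if words and words[0] in CUE1:
        for i in range(2, len(words)):        # 1-based position of w_i
            if words[i - 1] in SIGN:
                return {"pattern": "P1",
                        "cause": words[1:i],  # w2..w_i (per the paper's rule)
                        "effect": words[i:]}  # w_{i+1}..w_n
    return None

def match_rule2(words):
    """Rule 2: w_i in cue2 and w1 not in cue4 -> s in P2,
    cause = w1..w_{i-1}, effect = w_{i+1}..w_n."""
    if words and words[0] not in CUE4:
        for i in range(1, len(words) + 1):
            if words[i - 1] in CUE2:
                return {"pattern": "P2",
                        "cause": words[:i - 1],
                        "effect": words[i:]}
    return None
```

A matched sentence such as “because of heavy rain, the train stopped” yields the cause span “of heavy rain” and the effect span “the train stopped” under Rule 1.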

The cause and result clauses extracted from the text are expressed as tuples, forming causal risk event tuple pairs in preparation for sorting out the subsequent risk transmission relations and constructing the risk chain. Urban rail transit operation accidents often involve several hazard sources, and representing them as a risk chain better describes the sequence of risks. The trigger word in a sentence is usually the predicate, that is, the event trigger word, which clearly expresses the triggering of the event. The words before and after the event trigger word are the event arguments [27], which describe the object, participants, time and place of the event.

The pattern matching method is used to extract the cause and result clause, and the extraction results are shown in Table 8.

Table 8 Examples of reason and result clauses under pattern matching

The extracted cause and result clauses need to be expressed in a structured form to build a causality diagram. Therefore, the event triplet [24] is introduced to express cause and result events. The most common event representation is the event triplet e = (S, P, O), expressed as the text event [subject, predicate, object]. This representation is often used in ontology description structures in the safety field. P, S and O represent, respectively, the action, the role, and the object on which the role performs the action. In the causality extraction process, each event contains at least one event trigger word (i.e., P).

Event tuple extraction usually uses the semantic role labeling of the Language Technology Platform (LTP) [25] to annotate each cause and result sentence and check whether an (S, P, O) triple exists. If it does, the event tuple is extracted directly. For cause and result sentences that cannot be directly annotated with an event triple, a dependency parsing framework is used; the extraction steps based on dependency syntax are as follows:

Step1: Use a lexicon to store the syntactic dependency child nodes of each word in the sentence, with the storage location encoding the relation between the word and the corresponding child nodes; then store the parent node and extract the dependency relation;

Step2: For words not covered by the generated dependency structure, record the part of speech and the dependency relation between the word and its parent node;

Step3: Cyclically extract words in verb-object and postpositive-attribute relations;

Step4: Among the words extracted as subject and object, find the words with related dependency structures and delete the irrelevant words;

Step5: Extract the identified subject or object as an event tuple.
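Assuming the dependency arcs have already been produced (e.g., by the LTP parser), the core of these steps can be sketched as follows; the input format and example are illustrative, while the relation labels follow the LTP convention:

```python
def extract_event_triple(words, arcs):
    """Extract an (S, P, O) event triple from a dependency parse.
    `arcs` is a list of (head_index, relation) per word, 0-based,
    with -1 for the root's head. Relation labels follow the LTP
    convention: SBV = subject-verb, VOB = verb-object, HED = root."""
    subj = pred = obj = None
    for i, (head, rel) in enumerate(arcs):
        if rel == "SBV":            # word i is the subject of its head verb
            subj, pred = words[i], words[head]
        elif rel == "VOB":          # word i is the object of its head verb
            if pred is None:
                pred = words[head]
            obj = words[i]
    return (subj, pred, obj) if pred else None
```

For an illustrative parse of “catenary caused outage,” with “catenary” as subject of “caused” and “outage” as its object, the function returns the triple (“catenary”, “caused”, “outage”).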

5.2.2 Implicit Causality Extraction Based on Machine Learning

There are still many causal relationships without causal cues in the text, and the text of urban rail transit operation safety accidents is no exception; these are called implicit causality. For example: “As soon as the train of urban rail transit Line 10 bound for Jinsong failed, it stopped at Suzhou Street station for repair. The whole line was affected, resulting in passenger detention and overcrowding. Therefore, Haidian Huang Zhuang station was forced to take current-limiting measures.” Although the causality is not stated explicitly, a causal chain can be summarized: train fault → line affected → congestion → current-limiting measures.

In contrast to explicit causality, extracting implicit causality requires analyzing the context to judge whether a causal relation exists, because there is no causal cue word. Accordingly, implicit causality extraction first extracts the events in a sentence, then filters them into candidate risk event pairs, and finally completes the extraction with a deep learning framework. A bidirectional LSTM (long short-term memory) network based on the self-attention mechanism is used; its input is the sentence set annotated with event trigger words. Implicit causality extraction is treated as a classification problem: based on the classification model, the qualified causal event pairs are selected from the candidate event pairs. This is a machine learning method built on a statistical model. The process of extracting implicit causality between event tuples is shown in Fig. 9:

Fig. 9
figure 9

Event tuple implicit causality extraction process

For the extraction of implicit causality, we should first extract the event tuple, select the verb as the event trigger word through part of speech filtering, then extract the event triplet, and finally get the candidate event pair. The generation steps of candidate event pairs are as follows:

Step1: Filter word classes of the text sentence to be mined, and keep the verb as the trigger word of the event;

Step2: Using the dependency parsing method, keep the subject part and object part corresponding to the trigger word as the subject and object of the event tuple;

Step3: Screen out the components related to the subject and object of the event trigger words in the dependency parsing results;

Step4: Express the event tuple in a structured form: {subject and subject related components, trigger word and trigger word related components, object and object related components}, which can also be expressed in the form of similar event triples:\(E = [s + s_{0} ,t + t_{0} ,o + o_{0} ];\)

Step5: Generate candidate causal pairs: suppose sentence s contains m events, \(s = \{ E_{1} ,E_{2} ,\ldots,E_{m} \} ,\) and pair all events in the sentence into event pairs \(\langle E_{i} ,E_{j} ,{\text{direction}}\rangle ,\) where direction indicates the type of causal relationship, \({\text{direction}} \in \{ 1, - 1,0\} ,\;1 \le i \ne j \le m\): direction = 1 indicates that Ei is the cause and Ej the effect; direction = -1 indicates the opposite; and direction = 0 means there is no causal relationship between Ei and Ej. The number of event pairs that can be generated from each sentence is \(\frac{m(m - 1)}{2},\) where m is the number of events in the sentence. For instance, from “Through video monitoring, the staff found that there was water seepage in the ceiling of the substation. The station management immediately closed the substation below Chongwenmen station, and the normal operation of some sections of Line 5 was forced to be interrupted,” we obtain E = {e1: water seepage in the substation ceiling; e2: close the substation under Chongwenmen station; e3: normal operation forced to be interrupted}; the events in E are then combined in pairs to form candidate causal pairs, such as \(\langle e_{2} ,e_{3} ,1\rangle .\)
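The pairing step can be sketched directly with the standard library; the helper name is illustrative:

```python
from itertools import combinations

def candidate_pairs(events):
    """<Ei, Ej, direction>: direction starts at 0 (no causality assumed)
    and is later set by the classifier to 1 (Ei -> Ej) or -1 (Ej -> Ei)."""
    return [(ei, ej, 0) for ei, ej in combinations(events, 2)]

# Events from the Chongwenmen substation example in the text
events = ["water seepage in substation ceiling",
          "close the substation under Chongwenmen station",
          "normal operation forced to be interrupted"]
pairs = candidate_pairs(events)
# m = 3 events yield m(m-1)/2 = 3 candidate pairs
```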

With the development of computer technology, attention mechanism [28, 29] is widely used to deal with natural language analysis tasks. Attention mechanism combines the internal experience of biological observation with the external feeling and can be applied to the extraction of sparse data features. By calculating the probability distribution of attention, we can not only highlight the key inputs that have a great impact on the final output results, but also analyze the relationship between model input and output.

To improve the experimental results, the location identifier of the event trigger word is added to the self-attention mechanism model, and a bidirectional LSTM model based on the self-attention mechanism is established to increase the accuracy of implicit causality extraction. The location identifier of the event trigger word is denoted pi, and the model is denoted Self-Attention BiLSTM+PI. On this basis, the implicit causality extraction model includes the following layers:

  1. (1)

    Word embedding layer: transforms the text into a vector representation. For a sentence sequence s containing m words, the trained sequence vector of the sentence is \(s = \{ X_{1} ,X_{2} ,\ldots,X_{m} \} .\) The word vector of the i-th word is \(X_{i} = [W_{i}^{{{\text{word}}}} ,P_{i}^{{{\text{position}}}} ],\) where \(W_{i}^{{{\text{word}}}}\) is the text word vector and \(P_{i}^{{{\text{position}}}}\) is the position identification (PI), \(P_{i}^{{{\text{position}}}} = [d_{i} ,v_{i} ],\) in which \(d_{i}\) is the vector formed by the distances between the trigger word and the other words in the text, and \(v_{i} \in \{ 0,1\}\): \(v_{i} = 1\) marks a trigger word and \(v_{i} = 0\) marks any other word.

  1. (2)

    BiLSTM layer: consists of a forward LSTM network and a reverse LSTM network, which process the sentence in forward and reverse order, respectively, so that contextual information on both sides can be used. The layer is composed of many neural network modules that are combined to transmit and share information; in addition, a memory cell stores historical information to alleviate the long-term dependency problem. The LSTM has a “gate” structure, including a tanh layer and a sigmoid layer, which filter useful information and pass it to the next step, as shown in Eqs. (3) and (4):

    $$h_{i} = [\overrightarrow{{{\text{LSTM}}}} \oplus \overleftarrow{{{\text{LSTM}}}} ]$$
    (3)
    $$H = [h_{1} ,h_{2} ,...,h_{m} ]$$
    (4)
  1. (3)

    Self-attention layer: a linear combination of the m LSTM hidden vectors in H is used to represent the variable-length sentence, which is encoded into a fixed vector or matrix. The self-attention mechanism assigns different importance to different words, as shown in Eqs. (5) and (6):

    $$A = soft\max (w_{1} \tanh (w_{2} H^{T} ))$$
    (5)
    $$M = HA$$
    (6)

in which \(H \in \mathbb{R}^{m \times 2u}\) is the matrix of all hidden states, \(w_{1}\) is the weight matrix, \(w_{2}\) is the parameter matrix, and M is the matrix-level sentence representation obtained from Eqs. (5) and (6).

  1. (4)

    Output layer: a softmax classifier assigns each candidate causal event pair <ei, ej> obtained from sentence s to a relationship category and outputs the classification label:

    $$h^{*} = \tanh (M)$$
    (7)
    $$\hat{p}(y\left| s \right.) = soft\max (w^{(s)} h^{*} + b^{(s)} )$$
    (8)
    $$\hat{y} = \arg \, \max \hat{p}(y\left| s \right.)$$
    (9)

The direction finally output by the model of Eqs. (7)–(9) gives the relationship category of each event tuple pair; pairs whose label is not 0 are kept, yielding the risk event tuple pairs with a causal relationship.
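A minimal NumPy sketch of the position identifier and of the self-attention computation in Eqs. (5) and (6) is given below. Dimensions and weights are toy, randomly initialized assumptions; note that the product in Eq. (6) is taken here as A·H so that the matrix shapes conform, as in structured self-attention:

```python
import numpy as np

def position_identifiers(tokens, trigger_idx):
    # PI feature of the word-embedding layer: d_i is the signed distance
    # from each word to the trigger word, v_i flags the trigger itself.
    return [(i - trigger_idx, 1 if i == trigger_idx else 0)
            for i in range(len(tokens))]

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Toy dimensions (assumed): m tokens, hidden size u per LSTM direction,
# attention dimensions d and r.
rng = np.random.default_rng(0)
m, u, d, r = 6, 4, 8, 2
H = rng.standard_normal((m, 2 * u))    # BiLSTM hidden states h_1..h_m
w2 = rng.standard_normal((d, 2 * u))   # parameter matrix
w1 = rng.standard_normal((r, d))       # weight matrix
A = softmax(w1 @ np.tanh(w2 @ H.T))    # Eq. (5): attention weights, (r, m)
M = A @ H                              # Eq. (6): sentence matrix, (r, 2u)
```

Each row of A is a probability distribution over the m words, so the rows sum to one and M is a fixed-size representation regardless of sentence length.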

5.3 Risk Pair Extraction and Analysis

This section uses the metro operation safety accident investigation reports and operation dispatch fault logs, comprising 110 accident reports and 30,027 dispatch fault log records. After configuring the word segmentation lexicon and performing the word segmentation preprocessing described in Section 4, the explicit causal sentences in the text data were identified with the help of the LTP 4.0 platform and the “Jieba,” “Chon” and “re” packages with regular expressions. From the studied accident report and dispatch log texts, 95,601 sentences were collected, including 21,988 sentences with causal cues and 73,612 sentences without, accounting for 23% and 77%, respectively.

5.3.1 Explicit Causality Risk Pair Extraction and Results Analysis

Because the raw data consist primarily of operational dispatch fault logs and accident reports, the sentences matched by the causal patterns P1–P5 may contain only causal keywords without explicit causal information about hazards. To extract valid matching risk events more accurately and remove invalid data, this paper defines matching sentences that contain hazards and related information as valid matching sentences. Specifically, the hazard information extracted in Section 4.4 is looped through the matching sentences using SQL statements; if a hazard is present, the sentence is identified as a valid matching sentence. Partial matching cases are shown in Table 9.

Table 9 Results of identification of valid matching sentences (partial)

Valid matching risk event pairs refer to risk event pairs that can clearly represent the causal relationship after the effective matching sentences are extracted. Table 10 shows partial results of effective matching risk event pairs extraction. The reason clause is stored in the “reason” column, the result clause is stored in the “result” column, and the subject and object of the effective risk event pair are stored in the “cause” and “effect” columns, respectively.

Table 10 Explicit causality of accident report extraction results (partial)

The effectiveness of the proposed pattern matching-based explicit causality extraction method is evaluated using precision as the metric, calculated as shown in Eq. (10):

$$E_{{{\text{Accuracy}}}} = \frac{{E_{{{\text{match}}}} }}{{{\text{Total}}_{{{\text{matching}}}} }} \times 100\%$$
(10)

in which:

\(E_{{{\text{Accuracy}}}}\): the precision of causality extraction;

\(E_{{{\text{match}}}}\): the number of valid risk event pairs;

\({\text{Total}}_{{{\text{matching}}}}\): the total number of matched sentences.

The extraction results for explicit causality are shown in Table 11.

Table 11 Analysis on the extraction effect of explicit causality

5.3.2 Implicit Causality Risk Pair Extraction and Results Analysis

To validate the effectiveness and accuracy of the proposed Self-Attention BiLSTM+PI method for extracting implicit causal relationships, experiments were conducted on both the publicly available SemEval_Task8 data set and the urban rail transit operation dispatch fault log and accident report data set. First, the 73,612 sentences without causal cues in the corpus studied in this paper were divided into training and testing sets in a 7:3 ratio. Additionally, 1331 annotated causal sentences from the public SemEval_Task8 data set were selected as contrast data and divided in the same proportion. The Self-Attention BiLSTM+PI method was then run on the training set to obtain the causal relationship matrix of risk events; partial causal relationship matrices are shown in Table 12.

Table 12 Matrix of causal relationships between risk events (partial)

Then, the risk event pairs were extracted from the testing set and matched with the corresponding directions in the causal relationship matrix using the softmax classifier. Finally, the event pairs with non-zero directions were selected to obtain the causally related risk event pairs in the testing set. Part of the risk event pairs extracted by implicit causality are shown in Table 13, where e1 and e2 are the two risk event tuples extracted from the original sentence in the sentence column. The direction column gives the relationship between the two tuples: a direction of 1 means e1 is the cause event of e2, and a direction of -1 means e1 is the result event of e2.

Table 13 Risk event pairs extracted from implicit causality

The experiments were run on an Intel(R) Core(TM) i5-1035G1 CPU @ 1.00 GHz (1.19 GHz); the software environment was the Windows 10 operating system with Anaconda 3.0. The extraction effect is evaluated using the standard indicators of the relation extraction field: precision (P), recall (R) and their harmonic mean F, calculated by Eqs. (11)–(13):

$$P = \frac{a}{b}$$
(11)
$$R = \frac{a}{c}$$
(12)
$$F = \frac{2PR}{{P + R}}$$
(13)

in which the precision P is the ratio of the number of correctly extracted causal relationships (a) to the number of all extracted relationships (b), the recall R is the ratio of the number of correctly extracted causal relationships to the total number of such relationships in the test sample (c), and the F value is the comprehensive evaluation index of the overall experimental effect. The experimental results are shown in Table 14.

Table 14 Analysis of the effect of implicit causality
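The three indicators of Eqs. (11)–(13) can be computed with a small helper; the counts below are illustrative, not the paper's results:

```python
def prf(a, b, c):
    """Eqs. (11)-(13): precision P = a/b, recall R = a/c, F = 2PR/(P+R),
    where a = correctly extracted causal pairs, b = all extracted pairs,
    c = causal pairs actually present in the test sample."""
    p, r = a / b, a / c
    return p, r, 2 * p * r / (p + r)

# e.g. 80 correct pairs out of 100 extracted, against 160 gold pairs
p, r, f = prf(80, 100, 160)
```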

As can be seen from Table 14, in public data set SemEval_2010_Task8, the F value of the Self-Attention BiLSTM method is 4.64% higher than that of the BiLSTM method, while the F value after adding the location identifier is 15.27% and 10.63% higher than that of BiLSTM and Self-Attention BiLSTM, respectively, which proves that the extraction effect has been significantly improved after adding the self-attention mechanism.

Generally, a risk event R can directly or indirectly trigger other risk events: when the state of R changes, other risk events may occur, so there is an obvious correlation between these risk events. In studying the transmission process of the causal risk chain, the key to controlling the chain is therefore the source event, and protective measures should be taken in time to reduce the transmission possibility of the whole risk chain.

5.3.3 Analysis and Discussion on the Robustness of Input Data Characteristics for Hazard Sources

The robustness of natural language processing must be considered: if a model is not robust, serious problems will arise in practical applications. Therefore, the robustness of the NLP methods proposed in this section is discussed and analyzed below.

In processing the text of the rail transit operation dispatch logs, the structure, carrier form and processing flow of the original data all affect the data set. For text data, replacing even a single character or position may change the semantics of the original text or produce text that violates the grammatical structure, so the input no longer fits the established model and its robustness suffers [30]. To improve robustness, we use a robust loss function realized by constraining the training log data samples: data within a given distance of a sample are assigned to the same class as the original sample, as shown in Fig. 10. This counters the classification errors caused by small disturbances and thus improves the robustness of the model, yielding a more accurate classification curve.

Fig. 10
figure 10

Data division under interference situation
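The distance-based constraint described above, in which points within a given distance inherit the class of the original sample, can be illustrated with a hypothetical augmentation helper; this is a sketch of the idea, not the paper's implementation:

```python
import random

def augment_with_neighbors(sample, label, radius, k=5, seed=0):
    """Consistency augmentation: k perturbed copies within `radius`
    of `sample` inherit its label, so that small disturbances to the
    input cannot flip the class during training."""
    rnd = random.Random(seed)
    out = [(sample, label)]
    for _ in range(k):
        noisy = [x + rnd.uniform(-radius, radius) for x in sample]
        out.append((noisy, label))
    return out

aug = augment_with_neighbors([0.2, 0.7], label=1, radius=0.05)
```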

In addition, regularization constraints on the model parameters can further improve robustness. Adding regularization improves the generalization of the model; however, the higher the feature dimension of the model, the worse its generalization performance and, correspondingly, its robustness.

5.4 Visualization and Analysis of Risk Chain

Some key hazard sources have already been mined in Section 4. To obtain the disaster risk chain of a key hazard source, the causal relationships must be extracted from the data set with the risk pair mining method constructed in Section 5.3 to express the risk causes; the key hazard sources in the risk data are then screened, their full life cycle information is extracted, and the risk cause sequence and risk transmission mechanism are revealed through visualization.

There are 16 key hazard sources calculated in Section 4.4. Taking the first five hazard sources in Table 4 as an example, according to the risk chain construction method described in 5.2, the risk chains obtained are shown in Figs. 11, 12, 13, 14 and 15.

Fig. 11
figure 11

The risk chain of catenary failure

Fig. 12
figure 12

The risk chain of PSD failure

Fig. 13
figure 13

Fire hazard risk chain

Fig. 14
figure 14

The risk chain of turnout failure

Fig. 15
figure 15

Traction failure of risk chain

Taking the risk chain with “catenary failure” as the key hazard source as an example, catenary failures have many causes: in windy weather, plastic bags, kites and other debris entangle the catenary, which in traditional hazard disposal schemes is usually attributed simply to a weather node. As another example, an excessive conductor height difference in the design causes frequent pantograph arcing, which produces sparks on the catenary and thus a catenary fault. As for the disposal scheme, the traditional method simply attributes the risk chain to an equipment problem, without detailing which equipment failed and why, let alone the detailed process of risk transmission.

In the traditional handling of metro accidents, operating enterprises usually trace the causes of an accident after it has occurred and, after a period of analysis, write a report to rectify those causes and control the hidden dangers. This kind of treatment amounts to “treating the head when the head aches and the foot when the foot hurts”; it lacks a genuinely holistic view and cannot accurately control and manage the hazard sources of rail transit operation. From the risk chain obtained in this paper, the subsequent events affected by a key hazard source can also be identified, for example the follow-on events caused by a catenary failure: failure to leave the depot, train outage, train braking failure, etc. The risk chain reflects the general evolution mechanism and transmission mode between risk events. Guided by the risk chain, the direction of risk transmission can be changed by changing the state in which a risk occurs, providing precise guidance for emergency disposal and for the management and control of accidents.

By contrast, the advantage of handling hazard sources from the perspective of refined hazard identification proposed in this paper is that the algorithm can accurately identify the hazard sources that cause accidents and mine the causative chains from past multi-dimensional, massive accident data to form accident chains. The resulting visual risk chain allows metro operators to quickly locate the chain-breaking point and control the development of accidents at the fastest speed and the lowest cost.

6 Conclusions and Further Studies

Based on the theory of event causality extraction in natural language processing, this paper proposes a text mining framework for hazard identification that can visualize the whole life cycle information of key hazard sources. Taking more than 100,000 metro operation dispatch logs and 110 operation safety accident reports collected in a city from 2019 to 2020 as the data set, the main results are as follows:

  1. (i)

    A framework for hazard source text extraction is proposed, and a professional domain lexicon is constructed. Combining Chinese word segmentation technology with the TF-IDF algorithm, the hazard sources of urban rail transit operation safety accidents are mined: 75 key feature items are obtained, from which 15 key hazard sources are further mined.

  2. (ii)

    A combined method based on pattern matching and machine learning is used to extract causality from the text data, with an extraction precision of 87.53%. In terms of F value on the text data, the Self-Attention BiLSTM+PI method outperforms Self-Attention BiLSTM by about 10%.

  3. (iii)

    A risk chain construction method based on key hazard sources is proposed to extract their full life cycle information. It realizes the visual display of the risk chain and helps find the “chain-breaking” points quickly, improving accident management and control capability.

The concept of the risk chain is introduced into the research on metro operation risk management and control, providing new ideas for risk management and enabling the transformation of metro operation safety from “passive aftermath” to “active defense.” This is of far-reaching significance for identifying the key hazard sources of metro operation and their chain propagation mechanism, and it can provide a decision-making basis for the scientific formulation of relevant industry standards and specifications.

Through the research in this paper, a systematic multi-source, high-dimensional, heterogeneous rail transit operation and production data center can be established, integrating the production data of operation, public works and maintenance to form an intelligent hazard identification data warehouse and a risk chain library, changing the current situation in which the utilization rate of rail transit production data is low while the failure rate is high. From a new perspective, the accident-cause transmission mode of hazard sources, namely the risk chain, is clarified, and the key nodes in the chain are identified to provide decision and theoretical support for “chain breaking.” A group of data mining methods and application directions was formed for the massive operation data of urban rail transit, with metro operation safety as the core and operation demand as the orientation. A set of methods applicable to the intelligent identification of hazard sources in rail transit operation was developed, standardizing the safety management of rail transit, realizing the refined management of hazard sources, and providing a methodological basis for formulating national, local and industrial standards and specifications related to rail transit operation safety.