1 Introduction

To adapt to the rapid development of the national economy and meet the growing freight transport needs, freight overloading has gradually become an important measure to solve the current transport capacity shortage. Therefore, special attention has been paid to the development of the heavy-haul railway, with increasing axle load and traction weight. This brings severe challenges to the service performance and safety maintenance of the heavy-haul railway. In particular, concrete sleepers and ballast are important components of railway track structures, and their performance has a key influence on the safety and reliability of heavy-haul trains. Consequently, damage recognition for concrete sleepers and ballast based on big data is very important.

At present, the axle load of heavy-haul trains has reached 40 tons in Australia [1]. The Ministry of Railways in China has put forward a plan to improve the axle load to 30 tons [2]. Under this load, the bearing characteristics of the track and subgrade inevitably change significantly. Compared with the existing load, the load acting on the track structure is larger, so the vertical load transferred to the subgrade surface increases considerably. Dynamic stress on the subgrade surface exceeded 120 kPa in the Chinese Shuo-Huang railway line when the axle load of trains exceeded 30 tons [3], which is quite close to the limit of 150 kPa proposed by the Department of Chinese Railway for guaranteeing the safety of the subgrade.

Furthermore, under the periodic train load that frequently acts on the track structure, the granular ballast, which is the main support structure in the track, inevitably breaks and is crushed and pulverized [4]. Differential settlement [5, 6] inevitably occurs in some sections of railway lines, which directly causes the separation of sleepers from the ballast and then leads to the loss of ballast support. In addition, under the long-term wheel–rail load, uneven settlement develops between the subgrade and foundation [7], especially in areas with abundant water resources, which damages the supporting effect of the ballast, as shown in Fig. 1.

Fig. 1 Damage of ballast support in the track: (a) early development stage and (b) heavy damage

Compared with the passenger traffic railway, the existence of absent sleeper support further aggravates the loading characteristics of the track and subgrade in the heavy-haul railway because of the heavy axle load and brings severe challenges to the safe operation of the existing track and subgrade structure. It has been found that the vibration of the vehicle and the track change due to the influence of absent sleeper support. The sleeper vibration is inevitably affected by the loss of ballast support [8, 9]. Eigenfrequencies and the pertinent eigenmodes of the railway sleeper vibration change due to the reduction of the support stiffness under the sleeper when the ballast is damaged [10]. In addition, the loss of ballast support affects the wheel–rail forces acting on the rail [11]. For example, the gap size and the number of absent sleeper supports have a significant effect on the peak of wheel–rail force [12]. In some specific situations, the wheel–rail contact forces can increase by 30% [13]. Bezin et al. [14] found that loss of sleeper support had a more obvious influence on the track force than the wheel–rail force, resulting in the reduction of rail life. Sleeper–ballast contact forces at the sleeper adjacent to the hanging one increased by 70%. The uneven loading of the ballast bed induced an irregular settlement of the track bed [15,16,17].

In summary, the existence of absent sleeper support leads to deterioration of the wheel–rail force and track force and the failure of track structures. Therefore, detecting the existence of absent sleeper support is critical. There are mainly two ways to detect sleepers with loss of support. One is a traditional method based on the subjective judgement and experience of engineers, which is not entirely reliable. The other is a monitoring method based on on-board or track-side data, which is relatively more credible. For example, Balouchi et al. [18] identified voids underneath the track from the on-board acceleration response measured in the vehicle cab, and also identified the type of track asset (e.g. S&C, structure or plain-line track) that the void is located under. Additionally, Lam et al. [8] studied the feasibility of using the track-side vibration characteristics of the in situ sleeper to quantify the “health” status of the underlying ballast using the model updating approach. Clark et al. [19] presented an application of non-destructive acoustic emission technology for damage detection in railway concrete sleepers. Kaewunruen [20] analysed the dynamic mode shape to evaluate curvature ratios under different types of ballast losses using the sleeper finite element model and then developed a curvature-based damage detection method to identify ballast voids under railway track sleepers; this is also a track-side method. Lee [21] proposed a method using the spectral velocity of flexural waves and mobility function from the impulse–response test to investigate the quality of a railway sleeper. All of these investigations require criteria for loose sleepers to be set by theoretical or numerical methods, and such criteria may give quite different results for different vehicles and tracks.

Therefore, data-driven damage recognition has gradually become an effective method in engineering systems. Ankrah et al. [22] built a supervised machine learning model to predict faulty and healthy states of suspension system components based on a support vector machine method by using vertical accelerations of the railway vehicle. Janssens et al. [23] developed a feature-learning system based on convolutional neural networks (CNNs) for bearing fault detection. Wen et al. [24] proposed a new CNN based on LeNet-5 for fault diagnosis through a conversion method converting signals into two-dimensional images. Cha et al. [25] proposed a vision-based method using a deep architecture of CNNs for detecting concrete cracks without calculating the defect features. In addition, a novel hierarchical adaptive deep convolution neural network based on an improved algorithm was proposed by Guo et al. [26] to diagnose bearing faults and determine their severity. Hu et al. [27] also adopted the deep neural network to recognize faults in the bogies of high-speed trains.

For track defect monitoring, Sresakoolchai and Kaewunruen [28] used a deep neural network, convolutional neural network and recurrent neural network to detect and evaluate the severity of rail combined defects; they found that the accuracy of CNN models can exceed 99%. They [29] also adopted axle box accelerations as key indicators for machine learning models to achieve the prognostics of absent sleeper support. A deep neural network with an accuracy of 94.31% established by them [30] was used to detect rail, switch and crossing, fastener and rail joint defects. Huang [31] proposed a phone-based hybrid machine learning model combining a pre-trained convolutional neural network as a feature extractor and a support vector regressor as a predictor to estimate the on-board passenger comfort level. Sysyn et al. [32] collected statistical information from voided zones of different extent and the corresponding reference zones without voids by multipoint track-side measurements of rail dynamic displacements using high-speed video records and digital image correlation methods. They achieved exact recent void identification using wavelet scattering feature extraction from track-side measurements by machine learning methods. In addition, machine learning has the potential to predict the weight of a train effectively [33], identify dynamic properties of railway track components [34], detect wheel flats [35] and monitor rail corrugation for railway track maintenance [36]. It can be seen that machine learning has gradually become a valuable way to achieve fault recognition for railway trains and tracks.

Hence, this paper focuses on the detection of absent sleeper support in ballasted track, with an emphasis on the integration of model-based and data-driven methods. The study is structured as follows. A mathematical model based on vehicle–track coupled theory was first built for the acquisition of the training dataset. Then, an architectural hierarchy of the convolutional neural network with three convolutional layers (TLCNN, Three Layers Convolutional Neural Network) was developed. Thirdly, a verification process of the mathematical model was carried out. Lastly, from the data-driven perspective, the TLCNN was trained on the raw data of the sleeper vertical displacement and used to detect the existence of absent sleeper support.

2 Model-Based Method

2.1 Vehicle–Track Coupled Model Considering Hanging Sleepers

To acquire the raw data needed by the data-driven method to detect absent sleeper support, a mathematical model was first built, as shown in Fig. 2. This model consists of three subsystems, i.e., the wagon subsystem, track subsystem and wheel–rail contact subsystem. The wagon subsystem was established based on the multi-body system theory. The track subsystem was established as a multi-layer discrete-elastic support beam subsystem considering absent sleeper support, in which rails, sleepers and ballast were considered in detail.

Fig. 2 Schematic diagram of the mathematical model consisting of the wagon subsystem, track subsystem and wheel–rail contact subsystem

Vehicle–track coupled theory [37,38,39,40,41] was used to clarify the dynamics between the vehicle and the track. A wagon with a three-piece bogie widely used in China is shown in Fig. 3. The wagon consists of 13 rigid bodies, i.e., four wheelsets, four side frames, two bolsters and one car body. According to the vehicle structures, a multi-body model with 47 degrees of freedom (DOF) was established [2].

Fig. 3 Freight vehicle–track coupled model: (a) end view; (b) secondary suspension with friction wedges; (c) primary suspension with a rubber pad

Before the model was established, the coordinate system was first defined, where the x-axis represents the running direction of trains, the y-axis located in the plane of the track is directed to the right side of the running direction, and the z-axis is perpendicular to the track plane and points down.

For the wagon, it was regarded as a 47-DOF multi-body system taking both the primary suspension and the secondary suspension into account. Hence, the motion equation of the wagon model is expressed as

$${\mathbf{M}}_{{\text{v}}} {\mathbf{\ddot{u}}}_{{\text{v}}} { + }{\mathbf{C}}_{{\text{v}}} {\dot{\mathbf{u}}}_{{\text{v}}} { + }{\mathbf{K}}_{{\text{v}}} {\mathbf{u}}_{{\text{v}}} { = }{\mathbf{F}}_{{\text{v}}} ,$$
(1)

where Mv, Kv and Cv are matrices of the generalized mass, the generalized stiffness and the generalized damping, respectively [2], and Fv is the vector of the generalized force; uv is the vector of the displacement.

As shown in Fig. 4, for the ballasted track, only vertical motion of rails, sleepers and the ballast bed was considered. Euler–Bernoulli beams on the discrete-elastic fastening systems were used to describe the motion of the rails. Hence, the equation of rails is expressed as

$$EI\frac{{\partial^{4} y}}{{\partial x^{4} }} + \rho A\frac{{\partial^{2} y}}{{\partial t^{2} }} = \sum\limits_{i = 1}^{4} {P_{i} (t)\delta (x,x_{wi} (t))} - \sum\limits_{k = 1}^{{N_{s} }} {F_{sk} (t)\delta (x,x_{sk} )} ,$$
(2)

where EI is the bending stiffness of rails, ρ is the density of rails, A is the cross-sectional area of rails, Pi is the wheel–rail force at the ith wheel position xwi(t), Fsk is the rail–sleeper force at the kth sleeper, xsk is the longitudinal coordinate of the kth sleeper, δ(·) represents the delta function, and Ns is the number of sleepers. The set U represents the indices of the absent sleeper support, and Fsk = 0 if k∈U.

Fig. 4 Side view of the wagon–track coupled model

For sleepers, vertical motions are written as

$$M_{s} \ddot{z}_{srk} + K_{p} z_{srk} - K_{s} z_{sbk} + C_{p} \dot{z}_{srk} - C_{s} \dot{z}_{sbk} = 0,\quad k \notin U,$$
(3)

where Ms is the mass of the sleeper, Kp and Cp are the vertical stiffness and damping of the rubber pad, respectively, Ks and Cs are the vertical stiffness and damping of the ballast bed, respectively, zsrk is the relative displacement between the kth sleeper and the rail, and zsbk is the relative displacement between the kth sleeper and the kth ballast bed.

For the ballast bed, it was treated as a series of discrete mass blocks, and only vertical motion was considered [42]. Hence, motions can be described as

$$M_{{\text{b}}} \ddot{z}_{sbk} + K_{s} z_{sbk} + C_{s} \dot{z}_{sbk} = 0,$$
(4)

where Mb is the mass of the ballast block, and Kf and Cf are the vertical stiffness and damping of the subgrade, respectively.

As shown in Fig. 5a, the interaction force between the sleepers and the ballast bed changes when absent sleeper support appears in the ballasted track. The resulting non-linear characteristic of the sleeper–ballast force is shown in Fig. 5b: if the relative displacement between the sleeper and the ballast bed is smaller than the gap, the sleeper–ballast interaction force vanishes. In the current study, we assumed that the gap is large enough that the sleeper–ballast interaction force is always 0.

Fig. 5 Absent sleeper support in the track: (a) sketch map and (b) the sleeper–ballast interaction force in the void zone

In addition, the wheel–rail contact relationship was used to couple the wagon and track models. In order to accurately describe the dynamic interaction between wheels and rails, non-linear Hertz theory was adopted, and the vertical wheel–rail force is expressed as

$$P_{i} = [G \cdot \max (0,z_{wi} (t) - z_{r} (x_{wi} ,t) - z_{g} (t))]^{1.5} ,$$
(5)

where zwi(t) represents the wheel vertical displacement at time t, zr(xwi,t) represents the rail vertical displacement under the wheel at position xwi at time t, zg(t) is the geometrical irregularity of the track, and G is a constant depending on the wheel/rail profile and the rail mechanical/material properties; herein, G = 2.3 × 10^7 N^(2/3)/m.
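Equation (5) can be evaluated directly. The following is a minimal sketch rather than the authors' implementation; the function and argument names are chosen here for illustration, and displacements are assumed to be in metres and positive downward, in line with the coordinate system defined above.

```python
G_HERTZ = 2.3e7  # contact constant quoted in the text, N^(2/3)/m


def wheel_rail_force(z_wheel, z_rail, z_irregularity, g=G_HERTZ):
    """Vertical wheel-rail force in N according to Eq. (5).

    All displacements are in metres (assumed unit), positive downward.
    """
    # The compression is clipped at zero, so a wheel that has lost
    # contact with the rail transmits no force.
    compression = max(0.0, z_wheel - z_rail - z_irregularity)
    return (g * compression) ** 1.5
```

With these values, a wheel–rail compression of 0.1 mm gives a force of roughly 110 kN, while any negative compression returns zero.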

2.2 Random Track Irregularities

The track irregularity is a main excitation for the vehicle–track coupled dynamics system, which is generally obtained by the inversion of its power spectral density (PSD) function. Hence, it is vital to acquire the real PSD function. Based on the long-term inspection of the track irregularities for a domestic heavy-haul line [43], the PSD function can be modelled as

$$S_{v} { = }\frac{A}{{f^{n} }},$$
(6)

where Sv is the power spectral density [mm²/(1/m)], f is the spatial frequency (m⁻¹), A is the roughness constant, and n is a constant coefficient. Two power functions of this form are used to fit the PSD piecewise. Herein, A1 = 0.1686, n1 = 1.6552, A2 = 0.0049, and n2 = 3.0789. The piecewise point between the two functions is 12.02 m for the vertical surface profile.

With the PSD function of the track, the track irregularity excitation is generated as follows. The spectrum X(k) in the frequency domain is constructed by discrete sampling on the PSD function, and then the inverse Fourier transform is carried out to obtain the track irregularity excitation function x(n) in the time domain, which is defined as

$$\left\{ {\begin{array}{*{20}l} {X(k) = N_{r} \xi (k)\sqrt {S_{v(f = k\Delta f)} \Delta f} } \hfill \\ {\xi (k) = e^{{(i\Phi_{n} )}} } \hfill \\ {x(n) = \frac{1}{{N_{r} }}\sum\limits_{k = 0}^{{N_{r} - 1}} {X(k)e^{{\left( {\frac{2\pi kn}{{N_{r} }}i} \right)}} } } \hfill \\ {k,n = 0,1, \ldots ,N_{r} - 1} \hfill \\ \end{array} } \right.,$$
(7)

where X(k) is a complex number whose modulus is obtained from the PSD function at a frequency of kΔf, Nr is the number of frequency samples, Δf is the frequency sampling interval, and i is the imaginary unit. Φn is the phase angle, which obeys a uniform distribution on [0, 2π].
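The inversion in Eqs. (6) and (7) can be sketched with NumPy, whose `np.fft.ifft` uses the same 1/Nr normalization and exponent sign as Eq. (7). Several points below are assumptions that the text does not settle: the first parameter pair (A1, n1) is applied at spatial frequencies below 1/12.02 m⁻¹ and the second pair above it, the zero-frequency term is set to zero, and the spectrum is mirrored with conjugate symmetry so that the profile is real. The absolute amplitude depends on whether the PSD is treated as one-sided or two-sided, which is also left open here. The two parameter pairs do give nearly the same PSD value (about 10.3 mm²/(1/m)) at the 12.02 m point, which supports reading that value as a wavelength.

```python
import numpy as np

A1, N1 = 0.1686, 1.6552
A2, N2 = 0.0049, 3.0789
F_BREAK = 1.0 / 12.02  # breakpoint as spatial frequency, 1/m (assumed reading)


def psd_vertical(f):
    """Piecewise PSD of Eq. (6) for strictly positive spatial frequencies."""
    f = np.asarray(f, dtype=float)
    # Assumption: segment 1 covers long wavelengths (f below the breakpoint).
    return np.where(f < F_BREAK, A1 / f**N1, A2 / f**N2)


def generate_irregularity(n_r, delta_f, rng):
    """Return one random profile x(n) of length n_r following Eq. (7).

    The spatial sampling step of the result is 1 / (n_r * delta_f) metres.
    """
    half = (n_r - 1) // 2                # bins strictly below Nyquist
    k = np.arange(1, half + 1)
    phase = rng.uniform(0.0, 2.0 * np.pi, size=half)
    spectrum = np.zeros(n_r, dtype=complex)   # X(0) = 0: no mean offset
    spectrum[1:half + 1] = (
        n_r * np.exp(1j * phase) * np.sqrt(psd_vertical(k * delta_f) * delta_f)
    )
    # Mirror the positive bins, X(n_r - k) = conj(X(k)), so that the
    # inverse transform is real up to round-off.
    spectrum[n_r - half:] = np.conj(spectrum[1:half + 1][::-1])
    return np.fft.ifft(spectrum).real
```

A call such as `generate_irregularity(1024, 0.001, np.random.default_rng(0))` returns one 1024-point sample; a new random generator state gives a new, statistically equivalent profile.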

2.3 Random Sample Generation

In this study, the detection of absent sleeper support is based on a convolutional neural network. It needs a large amount of training data to achieve a reliable recognition effect. Due to the difficulties in field tests of the vehicle–track system with absent sleeper support, it is difficult to obtain enough data to support machine learning. Therefore, we obtained the raw data for the training of a convolutional neural network from the perspective of the model-based method. As shown in Fig. 6, the procedure of the model-based data acquisition is described in detail.

  (1) Input the vehicle and track parameters of the above mathematical model.

  (2) Generate a sample of the track irregularity based on the power spectral density curve and input it into the above mathematical model as the system excitation.

  (3) Randomly generate the positions of absent sleeper support in the track and the train running speed.

  (4) Carry out the dynamics calculation of the vehicle–track coupled system.

  (5) Obtain a time–space sample of the sleeper vertical displacement.

  (6) Return to step (2) until the Nth time–space sample of the sleeper vertical displacement is obtained.

Fig. 6 Procedure for the model-based data acquisition

It should be noted that the running speed, track irregularity and position of absent sleeper support are regenerated for each experiment. The running speed varies from 60 to 80 km/h. Hanging sleepers next to each other as a pack are randomly located in the track section between the first sleeper and the 21st sleeper.
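The random part of this procedure can be sketched as follows. The text gives only the ranges, so the uniform distributions, the requirement that the whole pack fits between sleepers 1 and 21, and all names used here are assumptions made for illustration.

```python
import numpy as np

N_SLEEPERS = 21   # sleepers in the monitored track section
MAX_HANGING = 6   # largest number of hanging sleepers considered


def draw_scenario(rng):
    """Draw one running speed and one pack of adjacent hanging sleepers."""
    speed_kmh = float(rng.uniform(60.0, 80.0))
    n_hanging = int(rng.integers(1, MAX_HANGING + 1))   # upper bound exclusive
    # First hanging sleeper, chosen so that the pack ends at sleeper 21
    # at the latest (sleepers are numbered from 1).
    first = int(rng.integers(1, N_SLEEPERS - n_hanging + 2))
    hanging = list(range(first, first + n_hanging))
    return speed_kmh, hanging
```

The length of the returned list is the class label of the sample, and the list itself plays the role of the index set U in Eq. (2).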

3 Data-Driven Method

Because the time series signal of sleeper vertical displacement is one-dimensional, it is necessary to reconstruct this data into two-dimensional time–space data. Figure 7 shows how to achieve the data conversion. The abscissa axis of this two-dimensional data represents the sleeper number, of which 21 sleepers are considered. The ordinate axis represents the time history.

Fig. 7 Establishment process of time–space sample of the sleeper vertical displacements
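The conversion amounts to placing the displacement history of each sleeper in its own column. A minimal sketch, assuming that all sleepers are sampled at the same instants (the function name is hypothetical):

```python
import numpy as np


def build_time_space_sample(displacements):
    """Stack per-sleeper time series into a (time, sleeper) matrix.

    displacements: sequence with one 1-D displacement history per sleeper,
    ordered by sleeper number.
    """
    columns = [np.asarray(series, dtype=float) for series in displacements]
    if len({col.shape for col in columns}) != 1:
        raise ValueError("all sleepers need the same number of time steps")
    # Rows follow the time history, columns follow the sleeper number.
    return np.column_stack(columns)
```

For 21 sleepers with 70 time steps each, the result has shape (70, 21).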

With the input data, the convolutional neural network was built as shown in Fig. 8, which mainly includes three convolutional layers and two pooling layers. The output label i means that i hanging sleeper(s) exists in the ballasted track, which is in the range from 1 to 6.

Fig. 8 Architectural hierarchy of the TLCNN

The output V(1) of the first convolutional layer can be expressed as

$${\mathbf{V}}^{(1)} = conv2d\left( {{\mathbf{W}}^{(1)} ,{\mathbf{X}}^{(1)} ,n^{(1)} ,\hbox{``}relu\hbox{''}} \right),$$
(8)

where X is the input matrix, W is the convolution kernel, and the superscript (1) indicates the layer number. n is the number of convolution kernels; herein n(1) = 32. The size of the convolution kernel W(1) is 3 × 3. The activation function is the “ReLU” function, which is defined as

$$f(x) = \max (0,x).$$
(9)

The stride of the first pooling layer is 2 × 2. The pooling function is defined as

$$V_{ij} = max(x_{i,j} ,x_{i,j + 1} ,x_{i + 1,j} ,x_{i + 1,j + 1} ).$$
(10)

Through the pooling process, V(1) becomes V(2).
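Equations (8) to (10) can be illustrated with a plain NumPy forward pass for a single-channel input. This is a sketch for clarity, not the implementation used in the study: the bias terms and the absence of zero padding are assumptions, and, as in common deep learning libraries, the kernel is applied without flipping.

```python
import numpy as np


def conv2d_relu(x, kernels, biases):
    """Eq. (8) for one input channel.

    x: (H, W) input; kernels: (n, kh, kw); biases: (n,).
    Returns an array of shape (n, H - kh + 1, W - kw + 1).
    """
    n, kh, kw = kernels.shape
    out_h, out_w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((n, out_h, out_w))
    for f in range(n):
        for i in range(out_h):
            for j in range(out_w):
                patch = x[i:i + kh, j:j + kw]
                out[f, i, j] = np.sum(patch * kernels[f]) + biases[f]
    return np.maximum(out, 0.0)  # ReLU, Eq. (9)


def max_pool_2x2(v):
    """Eq. (10) with stride 2: (n, H, W) -> (n, H // 2, W // 2)."""
    n, h, w = v.shape
    h2, w2 = h // 2, w // 2
    trimmed = v[:, :h2 * 2, :w2 * 2]      # drop an odd last row or column
    return trimmed.reshape(n, h2, 2, w2, 2).max(axis=(2, 4))
```

Under these assumptions, a 70 × 21 input with 32 kernels of size 3 × 3 yields V(1) of shape (32, 68, 19) and V(2) of shape (32, 34, 9).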

The output V(3) of the second convolutional layer can be expressed as

$${\mathbf{V}}^{(3)} = conv2d\left( {{\mathbf{W}}^{(3)} ,{\mathbf{V}}^{(2)} ,n^{(3)} ,\hbox{``}relu\hbox{''}} \right),$$
(11)

where the size of the convolution kernel W(3) is 3×3; herein, n(3) = 32.

The output V(4) of the third convolutional layer is

$${\mathbf{V}}^{(4)} = conv2d\left( {{\mathbf{W}}^{(4)} ,{\mathbf{V}}^{(3)} ,n^{(4)} ,\hbox{``}relu\hbox{''}} \right),$$
(12)

where the size of the convolution kernel W(4) is also 3×3; herein, n(4) = 32.

The stride of the second pooling layer is also 2 × 2. Through the pooling process, V(4) becomes V(5). The next two layers are fully connected layers; herein, V(5) → V(6) → V(7). V(6) and V(7) are column vectors. The last output is V(8).
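To keep track of the tensor sizes through the layer sequence, the shapes can be traced with a few lines of Python. The input size of 70 × 21, the 128 hidden neurons and the 6 outputs are the values reported for the TLCNN; the padding mode is not stated in the text, so the sketch accepts either "valid" (no padding) or "same" as an explicit assumption, and pooling is assumed to discard an odd last row or column.

```python
def conv_out(size, kernel=3, padding="valid"):
    """Output length of a stride-1 convolution along one axis."""
    return size if padding == "same" else size - kernel + 1


def pool_out(size, window=2):
    """Output length of non-overlapping pooling along one axis."""
    return size // window


def tlcnn_shapes(height=70, width=21, filters=32, padding="valid"):
    """List (layer name, output shape) pairs for the TLCNN layer sequence."""
    shapes = [("input", (height, width, 1))]
    h, w = height, width
    for name in ("conv1", "pool1", "conv2", "conv3", "pool2"):
        if name.startswith("conv"):
            h, w = conv_out(h, padding=padding), conv_out(w, padding=padding)
        else:
            h, w = pool_out(h), pool_out(w)
        shapes.append((name, (h, w, filters)))
    shapes.append(("flatten", (h * w * filters,)))
    shapes.append(("dense", (128,)))
    shapes.append(("output", (6,)))
    return shapes
```

Without padding, the second pooling layer outputs 15 × 2 × 32 values, i.e. a flattened vector of length 960; with "same" padding, the corresponding length is 2720.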

The categorical cross-entropy is chosen as the loss function, which can be expressed as

$${\text{loss}} = - \frac{1}{m}\sum\limits_{i = 1}^{m} {\left[ {y\ln V_{i}^{(8)} + (1 - y)\ln \left( {1 - V_{i}^{(8)} } \right)} \right]} ,$$
(13)

where y is the expected value.

In this paper, sleeper vertical displacement is treated as a performance index to detect whether absent sleeper support exists. The number of absent sleeper support is in the range of 1 to 6. The relationship between the label and the track supporting failure is listed in Table 1.

Table 1 Classes of track supporting failure

4 Validation of the Mathematical Model

According to the established model above, the response of the track structure was calculated and then compared with that from the field experiment. A target wagon with a 30-ton axle load was chosen, and its dimensions are shown in Fig. 9a, where L1 represents the length between wheel axles, L2 represents the length between bogies, and L3 represents the wagon length. Herein, L1 = 1.86 m, L2 = 9.8 m and L3 = 13.6 m. The main parameters of the wagon and the track are listed in Table 2. For further detail, refer to the literature [44]. Test instruments are shown in Fig. 9b.

Fig. 9 Schematic diagram of (a) test wagon and (b) test instruments

Table 2 Parameters of the wagon–track coupled subsystem as considered in the numerical model

The sleeper vertical displacement was treated as a main object of comparison. As shown in Fig. 10a, the results of sleeper vertical displacement from both the measurement and simulation are given. Through the comparison, it was found that both results show a relatively consistent tendency in terms of the waveform and the peak. Relative error at peaks is also given in Fig. 10b and indicates that the maximum error is only 7.7%. Therefore, the mathematical model shows relatively reliable accuracy.

Fig. 10 Comparison of sleeper vertical displacements between the simulation and the experiment: (a) time history and (b) error analysis

To illustrate the accuracy of our model considering the absent sleeper support, results from other literature [32] were used for comparison. Sysyn et al. measured the rail deflections for different void damage by using the DIC method. Corresponding vehicle and track parameters remained consistent. The vehicle used was ICE1, and its weight was 70 tons. The running speed selected was 110 km/h. The void size was 2 mm. Results are shown in Fig. 11. Rail deflections under normal conditions are displayed in Fig. 11a. Rail deflections under the condition of absent sleeper support are given in Fig. 11b. It can be seen that the calculated results for rail vertical displacement are consistent with the experimental results acquired by Sysyn. The above results illustrate the effectiveness of the proposed model again when the absent sleeper support in the track is considered.

Fig. 11 Comparison of experimental and simulation results of rail deflections: (a) under normal condition and (b) under the condition of considering absent sleeper support

5 Dynamic Behaviour Due to Absent Sleeper Support Using the Model-Based Method

To clarify the dynamic behaviour of the track with absent sleeper support, the sleeper vertical displacement was analysed when a wagon with the 30-ton axle load passes through the site with 1–6 hanging sleepers. The running speed of the wagon is 80 km/h.

Aimed at understanding the vibration characteristics of sleepers under the influence of absent sleeper support, we analysed the sleeper vertical displacement located at the centre of the void area. As shown in Fig. 12, the changes of the sleeper vertical displacements are given when absent sleeper support is considered in the track. It indicates that the sleeper vertical displacement increases significantly when absent sleeper support occurs in the track. As the number of absent sleeper support increases from 1 to 5, its peak increases from 0.85 mm to 9.83 mm. Furthermore, the spatial distribution of the peak of the sleeper vertical displacement along the track is given in Fig. 12b. It indicates that sleepers in the void area no longer bear the load from the wagon, and displacements of hanging sleepers increase significantly.

Fig. 12 Responses of the sleeper vertical displacement in the numerical model: (a) time history of vertical displacements of the hanging sleeper and (b) the spatial distribution of the peak of the sleeper vertical displacement along the track

To understand the load-bearing characteristics of sleepers under the influence of absent sleeper support, we further analysed the loads acting on the sleepers directly adjacent to the hanging ones. Figure 13 shows the load acting on the sleepers with and without absent sleeper support. The force acting on the first sleeper adjacent to the void area was selected. Under normal conditions, the rail–sleeper force is about 51.8 kN. However, when absent sleeper support occurs in the track, the rail–sleeper force adjacent to the void area increases significantly. As the number of absent sleeper supports increases from 1 to 5, the peak of the rail–sleeper force increases from 73.8 to 225.1 kN, i.e. by 42.5% to 335% relative to the normal state. In conclusion, the increase of the rail–sleeper force is quite significant and subsequently leads to damage to the track.
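The quoted percentages follow from the peak forces relative to the normal-state value of 51.8 kN:

$$\frac{73.8 - 51.8}{51.8} \approx 42.5\% ,\qquad \frac{225.1 - 51.8}{51.8} \approx 334.6\% .$$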

Fig. 13 Time history of the rail–sleeper force with and without absent sleeper support in the track in the numerical model

As mentioned above, the bearing load of normal sleepers adjacent to the void area increased significantly due to the existence of absent sleeper support. This load is inevitably transmitted downward to the track and the subgrade. As shown in Fig. 14, time-domain variations of vertical stress on the ballast surface and the subgrade surface are given when the wagon passes through the track with 3–5 hanging sleepers, respectively. According to the Chinese standard, the vertical stress on the ballast surface should be controlled within 0.5 MPa, and the vertical stress on the subgrade surface should be less than 0.15 MPa.

Fig. 14 Time history of vertical stress loading on (a) the ballast and (b) the subgrade surface when the wagon passes through the track with absent sleeper support in the numerical model

It can be found that the stress increases markedly under the influence of absent sleeper support. Even where the stress still meets the requirement of the maximum allowable stress, sufficient attention should be paid, because the increasing stress on the track may damage track structures and affect the running safety of wagons. Moreover, for this particular case study, with the track properties and model considered, the surface stress of the subgrade reached 170 kPa when 5 hanging sleepers exist in the ballasted track, which exceeds the maximum allowable stress of 150 kPa for the subgrade. Therefore, sufficient attention should be paid when a certain number of continuous absent sleeper supports occur in the track; otherwise, excessive stress acts on the subgrade and may even damage the subgrade structure. The number of continuous absent sleeper supports should be kept below 5 to guarantee subgrade safety.

6 Discussion of the Detection of Absent Sleeper Support

From the above analysis, it is found that the absence of sleeper support has a significant impact on the dynamic response of the vehicle–track coupled system. Under harsh conditions, it causes serious damage to the track structure and threatens the safety of the train operation. Therefore, to ensure the running safety of trains, accurate identification and detection of absent sleeper support are quite vital.

Because the vibration frequency of the sleeper vertical displacement is relatively low, the sampling interval of the data is set to 0.05 s. Herein, the size of the two-dimensional time–space data established based on the method in Fig. 7 is 70 × 21. A total of 60,000 samples were obtained through the model-based method. Eighty percent of these data were used as the training dataset, and the remaining 20% were used as the test dataset.

Due to the varying speed and the random track irregularity, the two-dimensional time–space data for different samples vary widely. Hence, the normalization should be carried out on the raw data before training the TLCNN. The linear normalization conversion was adopted here, which is described as

$$x_{ij}^{^{\prime}} = \frac{{x_{ij} - \min (x_{ij} )}}{{\max (x_{ij} ) - \min (x_{ij} )}},\quad i = 1\sim 70,\quad j = 1\sim 21,$$
(14)

where i and j represent the row and column of the input matrix, respectively. Through this treatment, all data can be compressed in the range of [0, 1], so that the mathematical units between the data are consistent. It is beneficial for machine learning to improve the model accuracy and increase the convergence speed.
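Equation (14) can be sketched as a per-sample min–max scaling. It is assumed here that the minimum and maximum are taken over all entries of one 70 × 21 sample; the handling of a completely flat sample is also an assumption, added only to avoid a division by zero.

```python
import numpy as np


def min_max_normalize(sample):
    """Scale one time-space sample to the range [0, 1] as in Eq. (14)."""
    sample = np.asarray(sample, dtype=float)
    lo, hi = sample.min(), sample.max()
    if hi == lo:
        # Assumption: a constant sample carries no contrast and maps to 0.
        return np.zeros_like(sample)
    return (sample - lo) / (hi - lo)
```

After this step, every sample has a minimum of 0 and a maximum of 1, regardless of the running speed or the amplitude of the track irregularity.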

After the normalization treatment, heat maps of the sleeper vertical displacements under conditions of absent sleeper support are shown in Fig. 15. The dark area in the figure represents the displacement response of the hanging sleepers. It can be concluded that as the number of absent sleeper supports rises, the dark area increases significantly. It should be noted that the dark area changes for each sample in the heat map because of the randomness of the position of the hanging sleepers. In the process of data learning in this section, the input is the original two-dimensional data rather than the heat map.

Fig. 15 Heat maps of the sleeper vertical displacement after the normalization treatment under conditions of (a) 3 hanging sleepers, (b) 4 hanging sleepers, (c) 5 hanging sleepers and (d) 6 hanging sleepers

In this study, the proposed TLCNN classifier was implemented in 64-bit Python 3.6 using Anaconda3. All experiments were conducted on a computer with an Intel® Core™ i5-9400F CPU (2.90 GHz), 24 GB of RAM and an NVIDIA GeForce 1660 GPU, running Windows 10 Pro. It should be noted that all of the following experiments were computed on the GPU rather than the CPU.

The performance of our proposed method was appraised using the collected sleeper vibration dataset. To obtain enough failure features beneficial for classification from raw vibration signals, 60,000 samples were considered in our proposed TLCNN method. The size of the input matrix for TLCNN was 70 × 21. The row and column of the input matrix represent the time history and the sleeper number, respectively. The number of filters was 32 for all three convolutional layers. The same filter length of 3 for all filters was used for capturing enough information from a local region of the input signal.

The length for both pooling layers was set as 2. In the classification stage, the number of neurons in a fully connected hidden layer was chosen as 128. The output size of TLCNN was 6, corresponding to the number of considered failure conditions of the sleepers. The proposed TLCNN was trained from scratch using the training set described in Sect. 2. The TLCNN was optimized by the Adam gradient descent optimization algorithm with a mini-batch size of 128 samples. The number of training epochs was 50. The learning rate was initialized to 0.0001 with no decay on each update. A dropout rate of 0.25 was used for fully connected layers to avoid the risk of overfitting.
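The reported hyperparameters imply a simple training schedule, which can be worked out as follows. The sketch assumes that all training samples are used for gradient updates in every epoch; whether part of them was held out for validation is not stated, and the class and attribute names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Hyperparameters reported for the TLCNN."""
    n_samples: int = 60_000
    train_fraction: float = 0.8
    batch_size: int = 128
    epochs: int = 50
    learning_rate: float = 1e-4
    dropout: float = 0.25

    @property
    def n_train(self):
        return round(self.n_samples * self.train_fraction)

    @property
    def n_test(self):
        return self.n_samples - self.n_train

    @property
    def steps_per_epoch(self):
        # The last mini-batch of an epoch may be smaller than batch_size.
        return math.ceil(self.n_train / self.batch_size)

    @property
    def total_updates(self):
        return self.steps_per_epoch * self.epochs
```

With the default values, this gives 48,000 training and 12,000 test samples, 375 mini-batches per epoch and 18,750 parameter updates over 50 epochs.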

Training results after 50 epochs are shown in Fig. 16, which reveal the global accuracy curves of the training dataset and the validation dataset for the TLCNN method. It can be seen that the training accuracy and the validation accuracy reach stable values after 20 epochs; no overfitting was observed. This demonstrated that the training dataset utilized here is sufficiently large to train the model with the three-layer network structure. The accuracy of training data was up to 98%. Testing classification results with the TLCNN method are shown in Fig. 17 by using the confusion matrix. It gives correctly classified samples and misclassified samples. The x-axis and y-axis represent predicted labels and true labels, respectively. It is easily found that for each condition, the TLCNN shows excellent diagnostic performance.

Fig. 16 Training result of the training dataset with respect to the epoch: (a) accuracy and (b) loss

Fig. 17 Confusion matrix of the test dataset

The dimensionality reduction technique t-distributed stochastic neighbour embedding (t-SNE) was also used to visualize the test dataset, as shown in Fig. 18. This technique has previously been used to determine the damage zone in railway tracks [45, 46]. The visualization shows that the test dataset is efficiently separated by the trained TLCNN. These results illustrate that the TLCNN is a reliable way to detect absent sleeper support with relatively high accuracy.

Fig. 18 The results of t-SNE analysis for the detection of the number of absent sleeper supports (Pc1 = first principal component, Pc2 = second principal component)

6.1 Robustness Against Noise

In practical applications, railway trains and tracks often operate in complex environments, and the acquired track vibration signals are easily corrupted by strong background noise. Hence, the robustness of the proposed TLCNN against noise needs to be assessed. To this end, Gaussian white noise was added to the raw vibration signals to generate noisy signals with different signal-to-noise ratios (SNRs).

The SNR is defined as

$${\text{SNR}} = 10\log_{10} \left( {\frac{{P_{signal} }}{{P_{noise} }}} \right),$$
(15)

where $P_{signal}$ is the power of the signal and $P_{noise}$ is the power of the noise. The SNR is expressed in dB.
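Inverting Eq. (15) gives the noise power needed for a target SNR: the signal power divided by 10^(SNR/10). At −5 dB, for example, the noise power is about 3.16 times the signal power. The following minimal sketch generates Gaussian white noise at a requested SNR in this way. It uses only the Python standard library, and the function names and the sinusoidal test signal are illustrative assumptions rather than the authors' implementation.

```python
import math
import random

# Sketch of adding Gaussian white noise at a target SNR, following Eq. (15).
# Power is taken as the mean square of the samples (an assumption; the paper
# does not state how signal power was estimated).


def mean_power(samples):
    """Mean-square power of a sequence of samples."""
    return sum(v * v for v in samples) / len(samples)


def add_white_noise(signal, target_snr_db, rng):
    """Return (noisy signal, noise) with noise scaled to the target SNR."""
    noise_power = mean_power(signal) / (10.0 ** (target_snr_db / 10.0))
    sigma = math.sqrt(noise_power)  # zero-mean noise: power equals variance
    noise = [rng.gauss(0.0, sigma) for _ in signal]
    noisy = [s + n for s, n in zip(signal, noise)]
    return noisy, noise


def snr_db(signal, noise):
    """Eq. (15): SNR in dB from signal and noise sample sequences."""
    return 10.0 * math.log10(mean_power(signal) / mean_power(noise))


if __name__ == "__main__":
    rng = random.Random(0)  # seeded for repeatability
    clean = [math.sin(0.01 * i) for i in range(20000)]  # stand-in signal
    for target in (-5.0, 0.0, 5.0, 10.0):
        _, noise = add_white_noise(clean, target, rng)
        print(f"target {target:+.0f} dB, measured {snr_db(clean, noise):+.2f} dB")
```

Because the noise is random, the measured SNR of a finite record only approximates the target; the agreement improves as the number of samples grows.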

The damage identification problem studied in this paper is a multiclass classification problem (six classes in our case). To evaluate and compare performance, the F1 score [47] was used in the current study. It is a common index for quantifying the performance of classification methods and is written as

$$F_{1} = \frac{2TP}{{2TP + FP + FN}},$$
(16)

where TP (true positives) is the number of positive samples correctly classified as positive, FP (false positives) is the number of negative samples misclassified as positive and FN (false negatives) is the number of positive samples misclassified as negative. TN (true negatives), the number of negative samples correctly classified as negative, does not appear in Eq. (16).
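Eq. (16) is defined for a single positive class, so in a six-class problem it has to be evaluated per class and then combined. The following minimal sketch computes the per-class F1 score from a confusion matrix (rows as true labels, columns as predicted labels) and takes the unweighted mean. The macro-averaging step is an assumption, since the text does not state how the per-class scores were combined, and the function names and the toy matrix are hypothetical.

```python
# Per-class F1 from a confusion matrix, using Eq. (16) in one-vs-rest form.
# Rows index the true class, columns the predicted class.
# ASSUMPTION: the multiclass score is the unweighted (macro) mean.


def per_class_f1(matrix):
    """Return the F1 score of each class treated as the positive class."""
    n = len(matrix)
    scores = []
    for k in range(n):
        tp = matrix[k][k]
        fp = sum(matrix[r][k] for r in range(n)) - tp  # predicted k, true other
        fn = sum(matrix[k]) - tp                       # true k, predicted other
        denominator = 2 * tp + fp + fn
        scores.append(2 * tp / denominator if denominator else 0.0)
    return scores


def macro_f1(matrix):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(matrix)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy three-class confusion matrix, purely illustrative.
    cm = [[5, 1, 0],
          [0, 4, 1],
          [1, 0, 6]]
    print("per-class F1:", per_class_f1(cm))
    print("macro F1:", macro_f1(cm))
```

With balanced classes, as in a simulated dataset with equal samples per condition, macro and sample-weighted averages coincide, so the choice of averaging matters mainly for unbalanced test sets.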

In this study, the proposed TLCNN approach was evaluated using noisy signals with SNRs ranging from −5 to 10 dB. The sample size was 20,000. The assessment results for the TLCNN are listed in Table 3. The proposed TLCNN shows strong anti-interference capability, with an F1 score above 98% on the test data at all considered SNR levels.

Table 3 F1 score for noisy signals with different SNRs

According to the results, the proposed TLCNN exhibits excellent robustness against noise even without additional denoising preprocessing. In other words, the TLCNN can extract robust characteristics in a noisy environment. It is therefore well suited to engineering environments where measurement interference occurs.

6.2 Effect of the Number of Samples on Recognition Accuracy

Generally, the accuracy of damage recognition increases progressively with the number of signal samples but stops improving once the sample size exceeds a certain value. Consequently, the effect of the number of samples on the recognition results of the TLCNN was further investigated, as shown in Fig. 19.

Fig. 19 Effect of the number of samples on the accuracy

The figure indicates that the average recognition accuracy exceeds 99.6% when the sample size is 40,000, even though the signals are badly corrupted by background noise. The figure also shows that the recognition accuracy is relatively low when the sample size is small. When the sample size is 5000, the recognition accuracy is only 92.0% for the noisy signal with an SNR of −5 dB. As the sample size increases to 20,000, the recognition accuracy reaches an average of 99.0% over all four noisy signals. The performance of the proposed TLCNN therefore depends on the sample size, and the method is widely applicable if the sample size exceeds 20,000.

6.3 Effect of Depth on Recognition Accuracy

The depth of the CNN affects the abstraction level of the extracted features. Low-level features obtained from the raw vibration signals may be significantly influenced by background noise, and the abstraction level of the damage features can meaningfully affect the recognition performance. To identify the influence of CNN depth on the recognition performance, CNNs with one to three layers were evaluated using noisy input signals with SNRs ranging from −5 to 10 dB. The sample size was 48,000. The results are given in Fig. 20.

Fig. 20 Effect of CNN depth on the accuracy

The recognition performance of the CNN increases with depth. This is because a CNN with more layers can extract more abstract and robust characteristics at higher levels. In the figure, the CNN models with three layers have the best performance in terms of F1 score (99.7% on average for the four noisy signals with different SNRs), which is significantly better (an improvement of 1–2%) than the CNN models with one and two layers. In addition, the shallower CNNs are more easily influenced by background noise, whereas the deeper CNNs resist interference better. Therefore, to handle more complex recognition tasks and further improve performance, deeper CNN models should be preferred in practical applications.

6.4 Comparison with Other CNNs

To illustrate the recognition performance of the proposed TLCNN, comparisons with other CNNs, i.e., VGG16, Resnet50 [48] and Xception [49], were carried out. In this comparison, the sample size was 60,000, and noisy signals with an SNR of 10 dB were chosen. The results are listed in Table 4.

Table 4 Comparison with other CNNs

VGG16 is a convolutional neural network model proposed by Simonyan and Zisserman [50], which includes 13 convolutional layers, three fully connected layers and five pooling layers. All of its convolutional layers use the same convolution kernel size of 3 × 3. Small filters and deeper networks are the typical features of VGG16. Resnet50 is a residual network in which a convolution operation is first applied to the input, followed by four residual blocks, and finally a fully connected operation performs the classification task. In total, Resnet50 contains 50 convolutional layers, which shows that CNNs can be developed towards deeper models (with more hidden layers). The Xception architecture has 36 convolutional layers structured as 14 modules, with linear residual connections around all but the first and last modules.

The table shows that the F1 scores are essentially the same (above 99% for all methods), but the training time, test time and total number of parameters differ markedly. The TLCNN requires the least training time (only 3 s per epoch), while Resnet50 requires the most (59 s per epoch). The TLCNN also has the shortest test time, only 0.06 ms per sample, which is 10–16% of that of the other methods. In addition, the proposed TLCNN has the fewest trainable parameters of the four methods. These results illustrate that the proposed TLCNN achieves comparable recognition accuracy at the lowest computational cost.

7 Conclusions

The purpose of this paper was to detect absent sleeper support in a ballasted railway line, with an emphasis on the integration of model-based and data-driven methods. To this end, a mathematical model based on vehicle–track coupled dynamics theory was established considering absent sleeper support, and an architectural hierarchy of the TLCNN was developed. The accuracy of the mathematical model was then verified by comparing simulation and experimental results. The dynamic behaviour of the ballasted track with absent sleeper support was investigated. Lastly, the TLCNN was trained on the raw sleeper vertical displacement data and used to detect absent sleeper support. As the model used is a one-dimensional set of mass-spring-damper systems for track and train, any conclusions concerning the dynamic behaviour or kinematic responses of the track should be handled with care. This model cannot account for vibration wave propagation and relies heavily on the mass-spring-damper parameters assumed for the elements. The main conclusions are as follows.

(1) The sleeper vertical displacement and the rail–sleeper force increase markedly in the absence of sleeper support in the track. For this particular case study (with the simplifying hypotheses assumed by the model and the track properties considered), as the number of absent sleeper supports increases from 1 to 5, the peak sleeper vertical displacement increases from 0.85 to 9.83 mm, and the peak rail–sleeper force adjacent to the void zone increases from 73.8 to 225.1 kN. Such a significant variation in dynamic behaviour would in turn lead to damage to the track. Therefore, the number of consecutive absent sleeper supports should be controlled to guarantee railway safety, and the detection of absent sleeper support is vital.

(2) The TLCNN is an effective way to detect absent sleeper support, with a high accuracy of up to 98%. It can extract robust characteristics in a noisy environment, retaining an F1 score above 98% on the test data for noisy signals with SNRs ranging from −5 to 10 dB.

(3) The TLCNN is influenced by sample size. When the sample size is 5000, the recognition accuracy is only 92.0% for the noisy signal with an SNR of −5 dB. The method is widely applicable if the sample size exceeds 20,000. Meanwhile, to handle more complex recognition tasks and further improve performance, deeper CNN models should be preferred in practical applications. CNN models with three layers show good performance and are recommended for the detection of absent sleeper support.

The current study focuses mainly on heavy-haul railways. Trains operating on heavy-haul lines usually travel at speeds below 100 km/h. Therefore, we did not analyse the dynamic response or identify absent sleeper support at speeds of up to 250 km/h. However, the proposed method is general, as it does not depend on the eigenvalues used by traditional methods, and it can be readily extended. In the future, we will study the dynamic response at higher speeds exceeding 250 km/h.