1 Introduction

As Bitcoin gains recognition from a growing number of organisations, institutes, and governments, it is booming in an increasing number of areas [1]. Currently, many businesses, such as PayPal, Microsoft, and Overstock, have embraced Bitcoin as a method of payment. Meanwhile, various online cryptocurrency trading platforms, such as Coinbase, Gemini, and PayPal, enable users to purchase, sell, store, and transfer Bitcoins. As a result, ever more Bitcoin transactions are expected to enter the Bitcoin blockchain. Unfortunately, due to the confirmation mechanism in the system, only a limited number of transactions (restricted by the capacity of a block) can be confirmed at a time. Therefore, many transactions cannot be confirmed immediately, and confirmation delays commonly occur in the Bitcoin system. Given this concern, it becomes vital to help a user understand (if possible) how long it may take for a transaction to be confirmed in the Bitcoin blockchain.

Most previous attempts at estimating the confirmation time of a transaction focus on predicting a specific timestamp or the number of blocks a transaction must wait before it is confirmed [2,3,4,5,6,7,8,9]. However, it is usually more practical to predict the confirmation time as falling into one of several predefined time intervals (e.g., within 1 hour, between 1 hour and 4 hours, and more than 4 hours). This is motivated by the following considerations. On one hand, when estimating a specific timestamp, the estimation performance can be affected by the submission time, especially for transactions scheduled for confirmation in the next block, whose confirmation time depends on the time remaining before that block is produced. Consequently, owing to delayed submission, a transaction with a significantly higher fee can experience a longer delay than a lower-fee transaction simply because it was submitted later. The second issue arises from the unpredictable nature of block generation time, which can span from mere seconds to several hundred seconds. As a result, the confirmation times of two transactions submitted at different block heights but confirmed within the same block interval can differ unpredictably, which may undermine users’ satisfaction when using a client-side transaction system.

On the other hand, by using the block as the unit of measurement for confirmation time, the variance in confirmation time can be significantly reduced. A challenge remains, however, in that the estimation result can be heavily influenced by a small proportion of transactions, especially when historical transactions for an interval are scarce; in such cases, the estimate may hinge on a single transaction or a few transactions. Moreover, when the estimated confirmation time (whether a specific time or a block interval) exceeds a certain level, users tend to pay a higher transaction fee to prioritize confirmation. In conclusion, we suggest that as long as the confirmation time falls within an acceptable range, it is more practical and reasonable to present system users with a confirmation time range rather than a confirmation timestamp. Against this background, if we divide the future into a number of block intervals (representing a number of classes), the confirmation time prediction problem can be treated as a classification problem.

The accuracy of transaction confirmation time estimation is crucial for blockchain-based applications. However, existing efforts suffer from four key drawbacks: (1) Existing methods do not provide tailored estimates for individual transactions; instead, most of them estimate the confirmation time for a group of transactions. For example, works such as [5, 6] estimate the average confirmation time of high-feerate and low-feerate transaction classes, while others like [8] estimate the average confirmation time of all unconfirmed transactions. (2) Models proposed in [3, 10] predict only whether a transaction can be confirmed in the next block, treating the problem as a binary classification task. Such models may not be sufficient in practice, as they provide no confirmation information beyond a simple yes or no. (3) Some of the assumptions made in existing approaches [2, 4,5,6,7,8] are unrealistic. For example, the confirmation process in blockchain systems is often modeled as a steady-state queueing system [11], assuming that transactions are submitted more slowly than they are confirmed and that a fixed number of transactions is confirmed each time. In reality, the number of transactions in a block varies, and the rate of transaction submission can exceed the rate of confirmation. (4) The information on transactions, blocks, and the mempool, which reflects the current state of the blockchain system, is underutilised. For example, information in the block sequence can signal the size and generation rate of future blocks, which can help improve estimation accuracy.

To address the limitations in the previous works, we propose a framework based on neural networks and XGBoost (Extreme Gradient Boosting) [12] to estimate transaction confirmation time for a specific transaction. This framework draws upon three different sources of information: historical transactions within the blockchain, unconfirmed transactions in the mempool, and the estimated transaction itself. In this framework, neural networks are applied to identify complex structures and generate high-level concepts from these inputs. Subsequently, XGBoost is employed to perform classification based on the hidden patterns derived from the neural networks.

To summarize, we have made the following contributions:

  • We comprehensively examine the features related to transaction confirmation drawn from historical transactions in the blockchain, unconfirmed transactions in the mempool, and the estimated transaction itself.

  • We design a strategy to discretize confirmation time into non-overlapping intervals based on transaction distribution.

  • We develop a transaction confirmation time estimation framework Hybrid-CTEN based on neural networks and XGBoost to analyze transaction confirmation time in the Bitcoin blockchain.

  • We demonstrate the efficiency and effectiveness of the proposed framework Hybrid-CTEN in handling complex estimation tasks using real-world blockchain data.

This work builds upon our prior work [13] by proposing a new transaction confirmation time estimation framework based on neural networks and XGBoost. In this work, we also incorporate the transformer [14] as an alternative feature extraction technique, leveraging its well-established efficiency in handling sequential data. Furthermore, during the feature construction process, we have optimized our approach to modeling transaction distribution to better align with the characteristics of real-world blockchain data in both block and mempool analyses. Moreover, we have adopted the strategy of training a single model for the entire testing dataset, instead of retraining at each block height, and we have meticulously fine-tuned hyperparameters to achieve improved performance.

The rest of this paper is organized as follows: Section 2 provides a review of the related work and Section 3 defines the transaction confirmation time estimation problem. Section 4 presents our proposed framework Hybrid-CTEN for transaction time estimation. Section 5 details our experimental setup and evaluation. Finally, Section 6 concludes this paper.

2 Related work

Many approaches have been proposed to estimate the transaction confirmation time in the Bitcoin blockchain. In [3, 10], the authors approached the confirmation time estimation problem as a binary classification task, focusing on predicting whether a transaction can be confirmed in the upcoming block. They employed supervised learning models such as SVM, random forest, and AdaBoost to estimate the confirmation delay. This estimation was based on two key factors: the characteristics of the transaction itself and the characteristics of the unconfirmed transactions in the mempool. The latter was described through metrics such as transaction count and transaction feerate distribution.

Other studies such as [2, 4,5,6,7,8] have adopted a different approach by analyzing the distribution of transaction submission and confirmation. Among these, [5, 6] model the estimation problem as a bulk service queueing system denoted as \(M/G^{B}/1\), where transaction arrivals follow the Poisson distribution and batches of transactions (B) are confirmed at a rate with a specified distribution. Balsamo et al. [2] describe it as another type of bulk service queueing system, \(M/M^{B}/1\), with transaction arrivals following the Poisson distribution and the confirmation of batches following an exponential distribution. Zhao et al. [8] introduce the concept of a zero-transaction service within the traditional bulk queueing system, accounting for the possibility of a block containing zero transactions; they assume that transaction arrivals follow a Poisson distribution and that batch confirmations adhere to a stochastic density function. Apart from these queueing system-based solutions, [4, 7] model the confirmation process as a Cramér-Lundberg process with a fixed rate of transaction arrivals and an exponential distribution for the confirmation of a predetermined number of transactions. In [13], transaction confirmation time estimation was approached as a classification problem, with an extensive comparison of various models, including neural networks, ensemble learning models, and a method that considers only transaction fees.

3 Problem definition

Previous studies have demonstrated that transaction confirmation is a complex process that depends on various factors, such as the characteristics of the transaction, the competition among unconfirmed transactions in the mempool, mining policy, and system resources.

Given a newly submitted transaction \(\hat{tx}\), the goal is to predict its confirmation time (interval) denoted as \(y \in \{y_{1}, \cdots , y_{n} \}\), where \(\{y_{1}, \cdots , y_{n} \}\) constitute a collection of non-overlapping confirmation time intervals, collectively representing the future timeline. \(\mathcal {F}\) is the designed function to estimate the confirmation time within which the submitted transaction will likely be confirmed, relying on various sources of information.

$$\begin{aligned} y = \mathcal {F}(\text {TxInf}(\hat{tx}), \text {BlockInfo}, \text {MemInfo}) \end{aligned}$$

where,

  • TxInf(\(\hat{tx}\)) contains the information of the transaction \(\hat{tx}\) related to its validation and confirmation in the network, such as transaction feerate, transaction weight, transaction inputs, and submission time.

  • BlockInfo provides information on the characteristics of mined blocks, such as historical transaction feerates, block size, and transaction distribution. These characteristics implicitly reflect the volume of mining activity and miners’ preferences, which can be helpful in predicting future confirmation time.

  • MemInfo provides information about unconfirmed transactions in the mempool, which is a temporary storage area for transactions that have been broadcasted to the Bitcoin network but are yet to be included in a block. As the capacity of a block is limited in the Bitcoin system, submitted transactions compete with each other to be confirmed in the next block.
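To make the interface concrete, the following minimal Python sketch spells out the signature of \(\mathcal {F}\); the field names and container types are illustrative assumptions, not definitions from this paper.

```python
from dataclasses import dataclass
from typing import Sequence

@dataclass
class TxInf:
    # Hypothetical field names for the transaction-level inputs.
    weight: float
    feerate: float
    n_inputs: int
    n_outputs: int
    first_seen: float
    mempool_position: float

def estimate_interval(tx: TxInf,
                      block_info: Sequence[Sequence[float]],
                      mem_info: Sequence[Sequence[float]]) -> int:
    """F: maps the three information sources to the index i of the
    predicted confirmation interval y_i (a k-way classification)."""
    raise NotImplementedError  # realized by Hybrid-CTEN in Section 4
```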

4 Methodology

Hybrid-CTEN is a framework based on neural networks and XGBoost for estimating the transaction confirmation time in the Bitcoin blockchain. It takes the estimated transaction itself, confirmed transactions in the blockchain, and unconfirmed transactions in the mempool as inputs, and generates the estimated confirmation time for this transaction.

Figure 1 presents the overall workflow of Hybrid-CTEN. It consists of three components: a Data Preprocessing Module, a Feature Extraction Module, and a Confirmation Time Estimation Module. The Data Preprocessing Module constructs features from the blockchain (which holds records of transactions confirmed in the blocks), from the mempool (representing unconfirmed transactions), and from the transaction data (transaction-specific characteristics). Within this module, the entire confirmation time spectrum is also discretized into distinct block intervals. In the subsequent Feature Extraction Module, the constructed features from blocks, mempool, and transactions are combined to unveil underlying patterns. These extracted patterns are then relayed to the Confirmation Time Estimation Module for the final estimation.

4.1 Data preprocessing module

This module constructs features from transactions, blocks, and the mempool. It also outlines the procedure for discretizing continuous confirmation time into intervals.

4.1.1 Feature construction

By analyzing the confirmation process in the Bitcoin blockchain, we identified three factors that contribute to transaction confirmation, namely, transaction features, block states, and mempool states.

  • Transaction features describe the unique details of a submitted transaction. We select the features that we believe may affect a transaction’s validation and confirmation.

    • transaction weight: measures transaction size.

    • transaction feerate: refers to the transaction fee per size unit, where a transaction’s size in these units is approximately a quarter of its weight. Generally, transactions with higher feerates are confirmed earlier than those with lower feerates.

    • number of inputs and number of outputs: mainly contribute to the validation cost of miners, as they need to check the legitimacy of the assets stated in each transaction input by tracking the previous transactions in the blockchain.

    • transaction first-seen time: refers to the moment when a Bitcoin transaction is initially observed by a node in the Bitcoin network. In this work, it is treated as an approximation of the transaction submission time. This is because it can be challenging to accurately determine the exact time when a transaction was submitted to the network, as well as the transaction propagation time in the network before the Bitcoin system observes it.

    • mempool position: indicates the competition among unconfirmed transactions in the system. As higher-feerate transactions are typically expected to be processed earlier than the transaction in question, we compute this position by summing the weights of all unconfirmed transactions with higher feerates and then dividing the sum by the maximum block size.

  • Block states encompass attributes of the blocks that have already been mined, such as block size and the transaction confirmation distribution. These characteristics are used to infer future transaction confirmation and mining preferences from the historical transactions confirmed in each block.

Miners typically prefer to include transactions with higher fees to maximize their profits. Therefore, we construct the block states by characterizing the transaction distribution within a block according to their respective feerates. During the distribution construction process, it is infeasible to represent every individual feerate value due to the continuous nature of feerates and the uneven distribution of transactions across them. Hence, we make a trade-off between scale precision and scale dimension, assigning smaller scales to the lower feerate zones and larger scales to the higher feerate zones, based on the distribution characteristics. Figure 2 illustrates the confirmed transaction distribution across different feerates in the blockchain, revealing that 94% of transactions have a feerate lower than 84 and that transactions are packed more densely at lower feerate levels than at higher levels.

Fig. 1: An overview of Hybrid-CTEN

Fig. 2: Cumulative distribution function (CDF) of transactions across different feerates

Fig. 3: The distribution of transaction confirmation times within a 2-block interval

Specifically, we partition the complete feerate spectrum into 36 non-overlapping intervals with exponentially increasing interval sizes. The first interval is defined with a feerate range of 0-3, and for each subsequent interval, the maximum feerate is expanded by 10% of the prior interval’s maximum feerate. This progression continues until the final interval, which encompasses all transactions with feerates surpassing the maximum feerate (approximately 84). The division criteria for every three intervals are depicted by the dotted line in Figure 2, using smaller scales for the lower feerate range and larger scales for the higher feerate range.
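The interval construction can be sketched in a few lines of Python. This is a minimal reading of the scheme above, assuming the 10% growth applies multiplicatively to each successive upper bound; the exact number of finite bounds needed to reach the quoted maximum (approximately 84) depends on how the open-ended final interval is counted.

```python
import numpy as np

def feerate_bounds(first_max=3.0, growth=1.10, n_bins=36):
    # Finite upper bounds of the feerate intervals: 3, 3.3, 3.63, ...
    # The first interval is [0, 3); each upper bound grows by 10% of
    # the previous one; the last of the n_bins intervals is open-ended
    # and catches every feerate above the largest finite bound.
    bounds = [first_max]
    while len(bounds) < n_bins - 1:
        bounds.append(bounds[-1] * growth)
    return np.array(bounds)

bounds = feerate_bounds()

def feerate_bin(feerate):
    # Index in 0..n_bins-1 of the interval containing `feerate`;
    # anything above the largest bound lands in the final interval.
    return int(np.digitize(feerate, bounds))
```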

Fig. 4: The distribution of transaction confirmation in the Bitcoin blockchain

Finally, the block states of a block, denoted as \(dist\_b\), are modeled by the distribution of transactions in this block b across each feerate interval u. It is calculated by summing the weight w of every transaction tx in this block with its feerate r falling into that interval according to (1):

$$\begin{aligned} dist\_b(u)=\sum _{{tx \in b, r\in u}}w(tx) \end{aligned}$$
(1)
  • Mempool states indicate the competition among the unconfirmed transactions in the mempool Mem, where transactions compete to be included in the next block. Similar to the block states, we model the mempool states mem as the distribution of unconfirmed transactions across each feerate interval u. It is calculated by summing the weight of each transaction tx whose feerate r falls into that interval, as defined in (2) (see the sketch after this list):

    $$\begin{aligned} mem(u)=\sum _{{tx \in Mem, r\in u}}w(tx) \end{aligned}$$
    (2)
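Reusing `bounds` and `feerate_bin` from the earlier sketch, equations (1) and (2) reduce to a single weighted histogram, and the mempool position feature of Section 4.1.1 follows directly. Representing transactions as (weight, feerate) pairs is our assumption.

```python
import numpy as np  # `bounds` and `feerate_bin` come from the earlier sketch

def feerate_distribution(txs):
    # Weight mass per feerate interval, i.e., eqs. (1) and (2); `txs`
    # is an iterable of (weight, feerate) pairs and works unchanged
    # for a mined block (dist_b) or a mempool snapshot (mem).
    dist = np.zeros(len(bounds) + 1)
    for weight, feerate in txs:
        dist[feerate_bin(feerate)] += weight
    return dist

def mempool_position(feerate, mempool, max_block_weight=4_000_000):
    # Competition proxy from Section 4.1.1: total weight of unconfirmed
    # transactions paying a strictly higher feerate, expressed in units
    # of full blocks (4,000,000 weight units is Bitcoin's block limit).
    ahead = sum(w for w, r in mempool if r > feerate)
    return ahead / max_block_weight
```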

4.1.2 Confirmation time discretization

This work discretizes future time into block intervals rather than time intervals. This choice is mainly due to the unpredictability of block generation time, which can introduce significant variance into the estimation outcomes. Figure 3 illustrates the confirmation time of transactions with a 2-block confirmation interval; the duration of the 2-block interval can vary from a few seconds to hundreds of seconds. A fixed time range could therefore span different block intervals, covering a one-block interval in some cases and a 2-block interval in others, leading to inevitable classification errors. Discretizing future time into block intervals mitigates the classification errors that would otherwise arise from this variation in confirmation time.

Specifically, we adhere to two discretization rules when dividing the confirmation time (block intervals) into multiple classes: first, transactions with the same confirmation block interval are grouped into the same class; second, we seek a balance in the number of transaction samples in each class. According to Figure 4, the Bitcoin blockchain exhibits a long tail in the distribution of transaction confirmation time. Confirmation time ranges from a few blocks to over a hundred blocks, with the majority of transactions confirmed within 10 blocks. Beyond 10 blocks, the distribution becomes sparser across each confirmation block interval.

The discretization process, guided by the two rules, is outlined in Algorithm 1. It comprises the following steps: (1) Set the lower bound \(e^{low}\) of a new discretized interval to the smallest block elapsed time (starting from a 1-block interval) in the unclassified time range. (2) Determine the split ratio \(p_{split}\) used to discretize the unclassified confirmation time range, aiming for an equal division of the remaining block elapsed time range (lines 4–7). (3) Determine the upper bound \(e^{upp}\) of the new discretized interval by accumulating progressively larger block elapsed times until the split ratio is reached (lines 9–11), yielding the new discretized interval with the time range [\(e^{low}\), \(e^{upp}\)]. (4) Repeat (1)–(3) iteratively until k intervals (E) have been defined.

Algorithm 1: Confirmation Time Discretization Algorithm
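The following Python sketch captures our reading of Algorithm 1: each pass claims the smallest unclassified block delays until the new class holds an equal share of the transactions that remain. It assumes the training data spreads enough mass across distinct delay values for every class to receive at least one.

```python
def discretize_confirmation_time(counts, k):
    # `counts[d]` is the number of training transactions confirmed
    # after exactly d blocks (d >= 1); returns k contiguous intervals
    # [e_low, e_upp] with roughly equal transaction mass.
    delays = sorted(counts)
    remaining = sum(counts.values())
    intervals, i = [], 0
    for classes_left in range(k, 0, -1):
        target = remaining / classes_left       # split ratio p_split
        low, acc = delays[i], 0
        while i < len(delays) and (acc < target or classes_left == 1):
            acc += counts[delays[i]]
            i += 1
        intervals.append((low, delays[i - 1]))  # new interval [e_low, e_upp]
        remaining -= acc
    return intervals
```

For example, with roughly 60% of transactions confirmed after one block, k = 2 reproduces the binary split of Section 5.2: ‘within 1 block’ versus ‘2 or more blocks’.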

4.2 Feature extraction module

After embedding these three sources of information, the Feature Extraction Module is applied to generate high-level concepts from these inputs. Specifically, it takes inputs from sequences of block states and mempool states, as well as the transaction features and then outputs the extracted patterns. In this module, the incorporation of block states is intended to assist in inferring transaction confirmation by considering the volume of future blocks and mining preferences based on confirmed transactions in each block. The sequence of mempool states is utilized to infer the competition among unconfirmed transactions in the future.

As shown in Figure 1, sequence models are first applied to the block sequence and mempool sequence to extract trend information. The generated sequential trend patterns are then combined with the transaction features and passed through a series of stacked fully-connected layers for further pattern extraction. Finally, a softmax layer produces a classification result, which is used to train the Feature Extraction Module. To mitigate overfitting, each fully-connected layer is followed by a ReLU activation and a dropout operation.

In addition to the Long Short-Term Memory (LSTM) sequence model [15], which aggregates information on a token-by-token basis in sequential order, we have also implemented alternative sequence models that attempt to capture the relationships between different positions of a single sequence to generate a representation for the sequence. These models encompass additive attention [16], self-attention [14], weighted attention [17], and transformer encoder [14]. We compare their performance in processing sequential data, particularly given their significant achievements in handling time series data [18,19,20,21].
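A minimal PyTorch sketch of the LSTM variant of this module is given below (the paper does not name a framework, so PyTorch is our assumption); the dimensions follow the configuration reported in Section 5.3.1, with 36 feerate intervals per state vector and the six transaction features of Section 4.1.1.

```python
import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    # Two LSTMs (8 hidden units, sequence length 3) summarize the block
    # and mempool state sequences; the summaries are concatenated with
    # the raw transaction features and passed through fully-connected
    # layers ([64, 8] hidden units, ReLU + dropout 0.2) before a
    # softmax classification head.
    def __init__(self, n_tx=6, n_bins=36, n_classes=4, hidden=8):
        super().__init__()
        self.block_lstm = nn.LSTM(n_bins, hidden, batch_first=True)
        self.mem_lstm = nn.LSTM(n_bins, hidden, batch_first=True)
        self.fc = nn.Sequential(
            nn.Linear(2 * hidden + n_tx, 64), nn.ReLU(), nn.Dropout(0.2),
            nn.Linear(64, 8), nn.ReLU(), nn.Dropout(0.2))
        self.head = nn.Linear(8, n_classes)   # softmax applied in the loss

    def forward(self, tx, blocks, mempool):
        _, (hb, _) = self.block_lstm(blocks)  # blocks: (batch, 3, n_bins)
        _, (hm, _) = self.mem_lstm(mempool)   # mempool: (batch, 3, n_bins)
        feats = self.fc(torch.cat([hb[-1], hm[-1], tx], dim=1))
        return self.head(feats), feats        # logits and layer output
```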

4.3 Confirmation time estimation module

In this module, the output of a chosen fully-connected layer in the Feature Extraction Module (the third layer in Figure 1), which encodes both specific and conceptual attributes of the transaction features, block states, and mempool states, is obtained. XGBoost is then employed for the final classification, drawing on its well-established effectiveness and efficiency within the machine learning community.
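The hand-off between the two modules can be sketched as follows, reusing the FeatureExtractor from Section 4.2; the training tensors (tx_train and so on) are hypothetical placeholders, and the extractor is assumed to have already been trained end-to-end with a cross-entropy loss.

```python
from xgboost import XGBClassifier

model = FeatureExtractor(n_classes=4)
# ... train `model` with a cross-entropy loss, then freeze it ...
model.eval()
with torch.no_grad():
    _, train_feats = model(tx_train, blocks_train, mem_train)
    _, test_feats = model(tx_test, blocks_test, mem_test)

# XGBoost performs the final k-way classification on the activations
# of the chosen fully-connected layer.
clf = XGBClassifier(n_estimators=300, objective="multi:softmax")
clf.fit(train_feats.numpy(), y_train)
pred = clf.predict(test_feats.numpy())
```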

5 Experiments

5.1 Datasets

We collected transaction data for the block range 621001–622500 via Blockchain.com. Each dataset consists of 225 continuous blocks selected from every 250 blocks. The first 80% of blocks in each dataset (approximately 400000 transactions) are used for training, while the remaining 20% (approximately 100000 transactions) are used for testing. Only newly submitted transactions are selected as instances for training and testing. Once a model is trained on the training dataset, we evaluate its performance on the corresponding testing dataset, as shown in Table 1.

5.2 Confirmation time discretization

In this work, we discretize the range of transaction confirmation time into four different class sizes, as indicated in Table 2: \(k=2\), \(k=4\), \(k=6\), and \(k=8\). For \(k=2\), the confirmation time range is divided into two classes: ‘confirmed within 1 block interval’ and ‘confirmed within more than 1 block interval (\(\ge 2\) blocks)’. In this case, the problem can be viewed as the binary classification problem of predicting whether a transaction will be confirmed in the next block, as studied in [3, 10].

In this work, we choose not to further discretize the confirmation time into more than 8 classes, as transactions confirmed beyond 50 blocks are rare at each block interval, as depicted in Figure 4. Meanwhile, Class 8 in the set of 8 classes (k = 8) already encompasses transactions confirmed beyond a 59-block interval, as shown in Table 2.

Table 1 Experimental dataset for transaction confirmation time estimation
Table 2 Transaction confirmation time discretization under class k=2, 4, 6, 8

5.3 Evaluation metrics

In this work, we select precision, recall, and f1-score as our primary evaluation metrics, with accuracy as a secondary metric. This choice is due to the nature of the data and the distribution among the prediction classes. Given the uneven distribution of transactions across confirmation block intervals, with approximately 60% confirmed within a 1-block interval and each remaining class representing a much smaller fraction (as indicated in Table 2), a classification model could achieve a high accuracy score simply by predicting the majority class, even if its performance on the minority classes is unsatisfactory.

  • Primary metric:

    $$\begin{aligned} \text {recall}&=\frac{\text {TP}}{\text {TP+FN}}\end{aligned}$$
    (3)
    $$\begin{aligned} \text {precision }&=\frac{\text {TP}}{\text {TP+FP}}\end{aligned}$$
    (4)
    $$\begin{aligned} \text {f1-score}&=2 \cdot \frac{\text {recall}\cdot \text {precision}}{\text {recall+precision}} \end{aligned}$$
    (5)
  • Secondary metric:

    $$\begin{aligned} \text {accuracy}= \frac{\text {TP+TN}}{\text {TP+TN+FP+FN}} \end{aligned}$$
    (6)

where TP (true positive), FP (false positive), FN (false negative), and TN (true negative) are observed classification results.
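Given the class imbalance just discussed, a macro-averaged computation of these metrics (an assumption on our part, since the paper does not state the averaging scheme) prevents the dominant 1-block class from masking weak performance on the minority classes; y_true and y_pred are placeholder label arrays.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Macro averaging weights every confirmation-interval class equally,
# regardless of how many transactions fall into it.
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0)
accuracy = accuracy_score(y_true, y_pred)
```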

5.3.1 Compared methods

We compare the estimation performance of neural network models, ensemble learning models, a baseline model as well as the proposed Hybrid-CTEN framework.

  • Neural Network (NN) refers to neural network models, which have gained prominence in the machine learning community for their effectiveness in addressing classification tasks [22, 23].

    • MLP, Multi-Layer Perceptron, is composed of stacked fully-connected layers. In this work, it takes only transaction features as input.

    • Lstm, Lstm\(^{+}\) and Lstm_prev utilise LSTM as a sequence model to process both block states and mempool states. The output of the LSTM is then combined with transaction features and passed through a fully-connected layer network. The difference lies in that Lstm\(^{+}\) applies a deeper 7-layer fully-connected network, while Lstm_prev models the block states and mempool states as in the work [13].

    • Adv, Wht, Self, and Transf correspond to models that utilize different attention techniques to extract features from block states and mempool states. These techniques include additive attention [16], weighted attention [17], self-attention [14], as well as the transformer model [14]. The extracted sequential features are then combined with transaction features and fed into a fully-connected layers network.

    In the neural network models, the sequence processing model is configured with 8 hidden units and, where applicable, a sequence length of 3. The fully-connected layers form a three-layer network with hidden unit configurations of [64, 8] followed by an output layer of the specified class size, and a dropout rate of 0.2 is applied. For the transformer model, we use 2 attention heads and 32 units in its fully-connected layer. The batch size is set to 1000 where applicable, and the models are optimized with mini-batch gradient descent using the Adam optimizer.

  • Ensemble Learning (EL) models enhance prediction performance by training multiple estimators and integrating their predictions [12, 24,25,26,27]. In this work, we study the classification performance of four state-of-the-art ensemble approaches: XGBoost [12], LightGBM [25], Random Forest (RF) [26], and Rotation Forest (RoF) [27], all of which are well-known for their outstanding performance in handling classification tasks. XGBoost is a cutting-edge gradient boosting framework of decision trees that gained popularity in the 2015 Kaggle classification challenge. Compared to XGBoost, LightGBM employs histogram-based algorithms to reduce execution time and memory consumption. RF, which ensembles decision trees using the bagging technique, is popular owing to its generalized performance, high prediction accuracy, and quick operation speed. Meanwhile, RoF has been shown to score much better on classification tests than other ensemble approaches such as Bagging, AdaBoost, and Random Forest [27]. In addition, we also study the classification performance of deep forest (DF) [28], which maintains the layer structure of a neural network while replacing the neurons in the fully-connected layers with base estimators (ensemble learning models). In this work, the base estimators consist of two random forest models and two extremely randomized trees classifiers, as introduced in [29]. We also investigate its variant DF_Cost [30], which introduces a misclassification penalty. In both the ensemble learning models and the deep forest models, the number of estimators is set to 100 by default. For XGBoost, however, the number of estimators is set to 300 and the booster is set to gbtree; additionally, we set \(\gamma =0.1\), \(max\_depth=6\), \(\lambda =2\), and \(colsample\_bytree=0.7\), with all other parameters left at their default values (see the configuration sketch after this list).

  • Hybrid-CTEN is our proposed framework that combines neural networks and XGBoost. We implemented two feature extraction frameworks, Lstm and Adv, and applied XGBoost to analyze the extracted features, resulting in two variants: HybridLstm and HybridAdv. Additionally, we introduced a variant HybridLstm\(^{+}\) by stacking a seven-layer neural network with hidden unit configurations of [64, 48, 36, 24, 18, 12] for the first 6 layers, along with a final layer sized for the specified class count, in contrast to the shallower network used in HybridLstm.
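For reference, the XGBoost settings listed above translate directly into the scikit-learn wrapper as the following sketch; reg_lambda is that wrapper's name for \(\lambda \), and everything not listed stays at its default.

```python
from xgboost import XGBClassifier

xgb = XGBClassifier(n_estimators=300, booster="gbtree", gamma=0.1,
                    max_depth=6, reg_lambda=2, colsample_bytree=0.7)
```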

Table 3 Comparison of model performance across different class sizes

5.4 Result analysis

We compare the prediction performance of our proposed Hybrid-CTEN framework with the other models across different class sizes. Additionally, we conduct a further study on the effectiveness of the feature construction proposed in this work.

5.4.1 Evaluation of classification performance

Table 3 presents the overall performance of the classification models across different class sizes, obtained by averaging the performance over the datasets. We observe that as the class size increases, the performance of every model decreases on every evaluation metric. Meanwhile, both the ensemble learning models and our proposed Hybrid-CTEN models outperform the neural network models on this classification task.

Among all the models, XGBoost achieves the most competitive performance in predicting whether a transaction can be confirmed in the next block (k = 2). However, on the more complicated classification tasks, our proposed Hybrid-CTEN variants, HybridAdv, HybridLstm, and HybridLstm\(^{+}\), which incorporate the features of block states and mempool states, gradually dominate in precision, recall, and f1-score, outperforming XGBoost. In the classification case k = 4, HybridLstm\(^{+}\) achieves the best precision and f1-score, while HybridLstm achieves the best recall. When the class size increases to 6 and 8, HybridLstm\(^{+}\) and HybridLstm dominate the other models on all of precision, recall, and f1-score. Furthermore, it is worth noting that HybridLstm\(^{+}\), with its deeper feature extraction, outperforms HybridLstm, highlighting the significance of block states and mempool states in handling complicated classification tasks. This conclusion is consistent with the observation in the neural network models, where incorporating block states and mempool states yields better performance than relying solely on transaction features.

XGBoost dominates the other models in terms of accuracy. This is due to its strength in binary classification and the unbalanced nature of the classification tasks, which favor predicting the majority class. Figure 5 presents the confusion matrices generated by XGBoost and HybridLstm\(^{+}\) for different classes on dataset S3. For Class 1, which contains the majority of instances, XGBoost outperforms HybridLstm\(^{+}\) with a significantly higher proportion of correct predictions; however, XGBoost loses its dominance in the other classes. The same conclusion holds for the other datasets.

In the context of our Hybrid-CTEN framework, the superior performance of HybridLstm and HybridLstm\(^{+}\) over HybridAdv suggests that LSTM-based models are more effective in estimating transaction confirmation time.

Fig. 5: Confusion matrix of XGBoost and HybridLstm\(^{+}\) on different classes in dataset S3

Fig. 6: Prediction performance evaluation of feature construction

5.4.2 Evaluation of optimization on feature construction

Compared to our previous work [13], we made two adjustments to the feature extraction framework: (1) We updated the transaction distribution modeling in the block states and mempool states based on the feerate characteristics of the blockchain, assigning smaller scales to lower feerate zones and larger scales to higher feerate zones, with a 10% increase in each subsequent interval. (2) We increased the number of fully-connected layers in the neural network solutions from 3 to 7. The results in Figure 6 demonstrate the effect of these optimizations. First, regarding the feature construction for block states and mempool states, we verify its effectiveness by comparing Lstm models with different feature constructions: the Lstm model with the updated features outperforms the previous one, Lstm_prev. Second, we demonstrate the effectiveness of deeper feature extraction layers through the superior classification performance of HybridLstm\(^{+}\) over HybridLstm, especially on complex classification tasks with class size \(k\in \{4,6,8\}\).

Fig. 7: Prediction performance evaluation of HybridLstm\(^{+}\) with different extraction layers

Furthermore, we evaluated the classification performance of HybridLstm\(^{+}\) using features extracted from different layers. Starting from the output of Layer_0, which contains the original concatenated features from the outputs of the sequential model and the raw transaction features, we compared its prediction performance with features extracted from each of the seven subsequent fully-connected layers. Our results, shown in Figure 7, indicate that Layer_0 outperforms the other layers, suggesting that the subsequent layers may discard essential information required by XGBoost.

In conclusion, these findings emphasize the importance of feature selection and extraction in developing effective classification models for transaction confirmation time estimation tasks.

6 Conclusion

This paper approaches transaction confirmation time estimation by framing it as a classification problem and proposes a transaction confirmation time estimation framework, Hybrid-CTEN, to solve it. This framework combines historical confirmation information in the blocks, unconfirmed transactions in the mempool, and the estimated transaction itself to improve estimation performance over multiple classification tasks. Experiments on real-world blockchain data demonstrate that, apart from the binary classification case (predicting whether a transaction will be confirmed in the next generated block), where XGBoost excels, Hybrid-CTEN surpasses state-of-the-art methods in terms of precision, recall, and f1-score across all multiclass classification scenarios (4-class, 6-class, and 8-class).

Our future research will concentrate on two main directions. Firstly, we will optimize our framework for the Bitcoin blockchain. This will involve enhancing the extraction of transaction features and addressing the issue of handling imbalanced data distribution in estimating transaction confirmation time, which is a common challenge encountered in real-world datasets. Secondly, we plan to adapt our proposed framework to different blockchain systems. For example, we will explore its applicability to predicting gas usage in the Ethereum network. This adjustment will involve considering the specific characteristics and requirements of other blockchain platforms to ensure the effectiveness of our framework across diverse contexts.