Few-shot EEG sleep staging based on transductive prototype optimization network

Li, Jingcong; Wu, Chaohuang; Pan, Jiahui; Wang, Fei

doi:10.3389/fninf.2023.1297874

ORIGINAL RESEARCH article

Front. Neuroinform., 06 December 2023
Volume 17 - 2023 | https://doi.org/10.3389/fninf.2023.1297874


School of Software, South China Normal University, Guangzhou, China

Electroencephalography (EEG) is a commonly used technology for monitoring brain activity and diagnosing sleep disorders. Clinically, doctors must stage sleep manually from EEG signals, which is time-consuming and laborious. In this study, we propose a few-shot EEG sleep staging method, termed the transductive prototype optimization network (TPON), which aims to improve the performance of EEG sleep staging. Unlike traditional deep learning methods, TPON uses a meta-learning algorithm that generalizes the classifier to new classes that do not appear in the training set and for which only a few examples are available. We learn the prototypes of existing subjects through meta-training and capture the sleep features of new subjects through the “learn to learn” approach of meta-learning. The prototype distribution of each class is optimized using the support set and high-confidence unlabeled samples, which makes the prototypes more representative. Compared with traditional prototypical networks, TPON effectively alleviates the shortage of samples in few-shot learning and improves how well the prototypes match their classes. Experimental results on the public SleepEDF-2013 dataset show that the proposed algorithm outperforms most state-of-the-art algorithms in overall performance. In addition, we experimentally demonstrate the feasibility of cross-channel recognition, which indicates that different channels share many similar sleep EEG features. In future research, we can further explore the common features among different channels and investigate the combination of universal features in sleep EEG. Overall, our method achieves high accuracy in sleep stage classification, demonstrating the effectiveness of this approach and its potential applications in other medical fields.

1 Introduction

Electroencephalogram (EEG) is a method for detecting brain signals (Ismail et al., 2016). It uses tiny electrodes attached to the scalp to detect electrical activity in the brain. The EEG signals generated by brain thinking activity can be analyzed and processed by corresponding analysis algorithms, and then converted into corresponding commands to control computers or electronic devices (Hramov et al., 2021).

In recent years, non-invasive brain–computer interfaces (BCI) have achieved significant results in the acquisition of EEG signals (Galán et al., 2008). BCI have been applied in many fields such as sleep signal acquisition, disease diagnosis, emotion analysis, and robot control, which have broad application prospects (Allison et al., 2007).

The application of EEG in monitoring sleep quality is also essential (Sadeh, 2015). Sleep staging can monitor the quality of each sleep segment and determine a person's sleep quality (Carskadon et al., 2005). In the auxiliary diagnosis of some sleep-related diseases such as epilepsy and sleep apnea, sleep staging plays an important role in the diagnosis of the disease (Samy et al., 2013) and can help improve our analysis of these related diseases. The identification of sleep stages is crucial for the diagnosis of sleep disorders, among which obstructive sleep apnea (OSA) is one of the most common diseases (Korkalainen et al., 2019). Traditional manual sleep staging using EEG signals is time-consuming and laborious since it requires analyzing sleep stages from the entire night's sleep signal.

In recent years, a research method for EEG sleep staging using deep learning algorithms has been proposed. Deep learning is a new research direction in the field of machine learning (Arel et al., 2010). It is introduced into machine learning to make it closer to its original goal of artificial intelligence (AI) (Arrieta et al., 2020). Deep learning is a complex machine learning algorithm that learns the internal rules and representation levels of sample data. The information obtained during the learning process is very helpful for interpreting data such as text, images, and sound (LeCun et al., 2015; Tsinalis et al., 2016). The ultimate goal is to enable machines to analyze and learn like humans and recognize data such as text, images, and sound. Deep learning is a powerful tool in the processing of EEG signals and has shown excellent performance in speech and image recognition (Amin et al., 2019; Sun et al., 2019). Traditional EEG sleep staging algorithms include deep learning algorithms or end-to-end trained deep learning algorithms, including convolutional neural network (CNN) or recurrent neural network (RNN) algorithms (Dong et al., 2017; Chambon et al., 2018; Phan et al., 2018; Perslev et al., 2019; Qu et al., 2020), involving state-of-the-art sleep staging networks such as DeepSleepNet (Supratak et al., 2017) and SeqSleepNet (Phan et al., 2019).

The traditional approach to deep learning research involves obtaining a large dataset for a specific task and training a model from scratch using that dataset (Dietterich, 1997). Although deep learning models can achieve high accuracy, the training time and computational cost of this method are significant due to the requirement for large amounts of data (Alzubaidi et al., 2021). In the case of unfamiliar subjects, using deep learning algorithms would require re-calculating, which would consume a considerable amount of computational time and resources. Furthermore, data from different cohorts may come from varying sources due to variations in the number and location of EEG channels, sampling frequency, experimental paradigms, and subject variability, making models trained on one cohort not directly applicable to another, limiting their applicability in clinical settings (Boostani et al., 2017; Andreotti et al., 2018).

Meta-learning, also referred to as learning to learn, involves a systematic examination of the performance of various machine learning methodologies across a diverse spectrum of learning tasks. This process enables the acquisition of knowledge from the amassed meta-data, allowing for significantly accelerated learning of novel tasks beyond conventional capabilities (Vanschoren, 2019). This not only expedites and enhances the development of machine learning workflows and neural network architectures but also facilitates the replacement of manually engineered algorithms with innovative data-driven approaches. Zhu et al. (2020) proposed a multi-attention meta-learning (MattML) method for few-shot fine-grained image recognition (FSFGIR). Instead of using only the base learner for general feature learning, their method uses the attention mechanisms of the base learner and the task learner to capture discriminative parts of images.

Meta-learning algorithms can enable cross-subject EEG sleep staging, greatly reducing the training time required for sleep staging. Banluesombatkul et al. (2020) proposed MetaSleepLearner, a method for staging sleep from EEG signals that is based on model-agnostic meta-learning (MAML) (Finn et al., 2017). They introduced a transfer learning framework based on MAML to transfer acquired sleep stage knowledge from a large dataset to new individual subjects. The accuracy achieved on the Fpz-Cz validation channel was 72.1% and the MF1 score was 64.8, demonstrating the feasibility of cross-subject EEG sleep staging. Shi et al. (2023) proposed MTSL, which uses meta-transfer learning: it improves the feature extractor within the meta-learning framework and introduces transfer learning to improve sleep staging in small-sample scenarios, achieving 79.8% ACC on Sleep-EDF. MetaSleepLearner uses many training sets for hybrid training and requires fine-tuning on unused subject data, so its complexity hinders practical implementation. MTSL uses a multi-stream parallel CNN to extract EEG features at each of three scales and then fuses the multi-scale features through feature splitting to obtain the final EEG feature representation. Because this network is complex, its running time and computational cost are high, which makes it difficult to implement.

In this study, we propose a few-shot EEG sleep staging method based on a transductive prototype optimization network (TPON). The aim is to improve the performance of cross-subject EEG sleep staging and also to achieve cross-channel sleep recognition with good performance. Our experiments are carried out with 20 subjects from the Sleep-EDF dataset, using a small amount of data and a network of moderate complexity. We use the prototypical network model from meta-learning. Compared with traditional machine learning and other meta-learning methods, our approach requires less training time and achieves higher accuracy. It builds on the prototypical network method proposed by Snell et al. (2017), which we improve with the Transductive Distribution Optimization (TDO) algorithm proposed by Liu et al. (2023). Our experiments are conducted on the Fpz-Cz and Pz-Oz channels and use the AASM (Berry et al., 2012) scoring standard, which classifies sleep into five stages.

The main contributions of this study are as follows:

• We propose a few-shot EEG sleep staging method based on a transductive prototype optimization network (TPON) to improve the performance of cross-subject EEG sleep staging.

• By using few-shot learning and TPON method, we effectively alleviated the problem of too few samples in sleep staging and improved the generalization ability to new subjects.

• In the five-way 15-shot scenario, TPON raises the cross-subject sleep staging accuracy to 87.1% and the MF1 to 81.7, and cross-channel sleep staging also achieves an accuracy of 82.4%. Additionally, we are the first to test and discuss the feasibility of cross-channel sleep staging.

2 Proposed method

In this study, we propose a few-shot EEG sleep staging method based on a transductive prototype optimization network (TPON) to improve the accuracy of cross-subject EEG sleep staging. Our experiments are based on the prototypical network approach to meta-learning, in which prototypes are combined with high-confidence unlabeled samples to achieve subject transfer.

2.1 Overall framework of TPON

The overall framework of TPON is depicted in Figure 1. Different subjects are used for meta-training and meta-testing, as shown at stage A in Figure 1: 19 subjects for meta-training and the remaining one for meta-testing. In the meta-training phase, we combine the sleep data of the 19 meta-training subjects; each subject has two nights of data. In the meta-testing phase, the data from the two nights of the single meta-testing subject are combined. We repeat this procedure 20 times, holding out each subject once, to obtain average results over all subjects. The sleep network shares its weights between meta-training and meta-testing. In the meta-training stage, prototypes of the five sleep stages are obtained through 50 experiments with random sampling and averaging. In the meta-testing phase, unlike in meta-training, only a few labeled samples are available because of the few-shot size of the meta-testing set.

FIGURE 1

Figure 1. The overall framework of TPON.

Stage B in Figure 1 is the backbone network we use, which combines SleepNet with a Transformer. Stages C and D in Figure 1 are the improved prototypical network and the transductive distribution optimization (TDO) method, respectively. These three stages are discussed in detail in the following sections.

We evaluate four distance metric functions: the Cosine, Manhattan, Euclidean, and Chebyshev distances. By comparison, we select the one that yields the highest accuracy. We assign a test EEG segment to the class whose prototype is closest to it. We then compute the average accuracy, precision, loss, and F score for each of the five categories.

2.2 Prototypical networks

In this study, we focus on the prototypical network approach, which is shown in Figure 2 and stage C in Figure 1. The method is based on the classification of sleep EEG signals. We have developed our own prototypical network model, which maps sleep EEG signals to embedding vectors and uses their clustering for classification (Hori et al., 2001). The novel feature of our model is that it constructs a richer embedding space through a learned prototypical network for EEG sleep data, into which EEG signals can be projected. Figure 2 shows the five-way five-shot case for the five sleep stages. The embeddings are clustered under a distance metric of orientation and class relevance, which is then used for classification (Schultz and Joachims, 2003; Chen et al., 2009).

FIGURE 2

Figure 2. Prototypical network.

In few-shot learning (Sung et al., 2018; Wang et al., 2020), if our task is an N-way K-shot, then the support set S, with K-labeled samples, can be expressed as follows:

\begin{array}{l} S = \{ (X_{1}, Y_{1}), (X_{2}, Y_{2}), \ldots, (X_{N}, Y_{N}) \} & (1) \end{array}

Each prototype is the mean vector of the embedded support points belonging to its class. To represent each class, the backbone network F embeds the support samples of that class, and the mean of these embeddings is taken as the prototype C_q. For a K-shot task, x_i, i ∈ {1, …, K}, is the i-th support sample (feature vector) of a class, which may be any one of the N classes, and y_i is the corresponding class label. S_q denotes the support set of class q, and $| S_{q} |$ denotes the number of samples in S_q. The calculation formula is as follows:

\begin{array}{l} C_{q} = \frac{1}{| S_{q} |} \sum_{(x_{i}, y_{i}) \in S_{q}} F (x_{i}), & (2) \end{array}

Given a distance metric function d, the prototypical network produces a distribution over classes for a query point x by applying a softmax to the negative distances between the embedding of x and the prototypes:

\begin{array}{l} P (y = q \mid x) = \frac{\exp (- d (F (x), C_{q}))}{\sum_{q^{'}} \exp (- d (F (x), C_{q^{'}}))}, & (3) \end{array}

Specifically, P(y = q|x) is the probability that the query sample x belongs to class q, obtained by comparing x with all prototypes $C_{q^{'}}$.

A common prototypical network consists of a backbone network that maps sleep EEG signals to embedding vectors. One batch contains a subset of the available training EEG signals. The EEG data from each class are randomly split into support and query sets. The embeddings of the support set define the class prototype, i.e., the prototype embedding vector of the class. The query set is then classified by using a metric function to measure the distance between each query sample and the prototypes.
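The prototype computation of Equation 2 and the softmax classification of Equation 3 can be sketched in a few lines of plain Python. This is only an illustrative sketch: the vectors stand in for embeddings already produced by the backbone F, the numbers are made up, and the squared Euclidean distance is an assumption used for simplicity (the experiments in this paper select the Cosine distance).

```python
import math

def class_prototypes(embeddings, labels):
    """Eq. (2): mean embedding of the support samples of each class."""
    sums, counts = {}, {}
    for vec, label in zip(embeddings, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {q: [s / counts[q] for s in sums[q]] for q in sums}

def squared_euclidean(a, b):
    # Assumed distance for this sketch; any distance function can be passed in.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def class_probabilities(query, prototypes, dist=squared_euclidean):
    """Eq. (3): softmax over negative distances to every prototype."""
    logits = {q: -dist(query, c) for q, c in prototypes.items()}
    top = max(logits.values())  # subtract the max for numerical stability
    exps = {q: math.exp(v - top) for q, v in logits.items()}
    total = sum(exps.values())
    return {q: e / total for q, e in exps.items()}

# Toy 2-way 2-shot example with 2-D embeddings (made-up numbers).
support = [[0.0, 0.0], [0.0, 2.0], [4.0, 4.0], [6.0, 4.0]]
support_labels = ["W", "W", "N2", "N2"]
protos = class_prototypes(support, support_labels)
probs = class_probabilities([0.5, 1.0], protos)
predicted = max(probs, key=probs.get)
```

In this toy case the query embedding lies much closer to the "W" prototype, so that class receives almost all of the probability mass.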

2.3 Distance metric

Prototypical networks and matching networks admit any distance function. In our experiments, we compared the Cosine distance d_Cos, the Manhattan distance d_Man, the Euclidean distance d_Euc, and the Chebyshev distance d_Che. The Cosine distance gave the best match and the highest accuracy.

For a query sample Z_f, the high-dimensional feature representation is F(Z_f), n is the dimension of this vector, and the subscript k denotes the k-th component. The distance functions give the distance between the embedding F(Z_f) of the query sample and the prototype C_q. The distance calculation formulas are as follows:

\begin{array}{l} d_{Cos} (F (Z_{f}), C_{q}) = - \cos θ = - \frac{F (Z_{f}) \cdot C_{q}}{\| F (Z_{f}) \| \, \| C_{q} \|}, & (4) \end{array}

\begin{array}{l} d_{Man} (F (Z_{f}), C_{q}) = \sum_{k = 1}^{n} | F (Z_{f})_{k} - C_{q, k} |, & (5) \end{array}

\begin{array}{l} d_{Euc} (F (Z_{f}), C_{q}) = \left( \sum_{k = 1}^{n} | F (Z_{f})_{k} - C_{q, k} |^{2} \right)^{\frac{1}{2}}, & (6) \end{array}

\begin{array}{l} d_{Che} (F (Z_{f}), C_{q}) = \max_{1 \le k \le n} | F (Z_{f})_{k} - C_{q, k} |, & (7) \end{array}
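The four distance functions of Equations 4 to 7 can be written compactly as follows. This is a minimal sketch on plain Python lists; the function names and the example vectors are illustrative assumptions, and zero-norm vectors are not handled in the Cosine case.

```python
import math

def cosine_distance(u, v):
    """Eq. (4): negative cosine of the angle between u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return -dot / (norm_u * norm_v)  # assumes neither vector is all zeros

def manhattan_distance(u, v):
    """Eq. (5): sum of absolute component differences."""
    return sum(abs(a - b) for a, b in zip(u, v))

def euclidean_distance(u, v):
    """Eq. (6): square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def chebyshev_distance(u, v):
    """Eq. (7): largest absolute component difference."""
    return max(abs(a - b) for a, b in zip(u, v))

# Made-up 2-D query embedding and prototype.
query_embedding = [3.0, 4.0]
prototype = [4.0, 3.0]
distances = {
    "cos": cosine_distance(query_embedding, prototype),
    "man": manhattan_distance(query_embedding, prototype),
    "euc": euclidean_distance(query_embedding, prototype),
    "che": chebyshev_distance(query_embedding, prototype),
}
```

Note that the Cosine distance as defined here ranges from -1 (same direction) to 1 (opposite direction), whereas the other three are non-negative.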

After the Cosine distance between the query sample Z_f and the prototype C_q is obtained, its negative value is converted into a probability distribution over the classes through the softmax function. The calculation formula is as follows:

\begin{array}{l} P (y_{n} \mid Z_{f}) = \frac{\exp (- dist (F (Z_{f}), C_{n}))}{\sum_{n^{'} = 1}^{Q} \exp (- dist (F (Z_{f}), C_{n^{'}}))}, & (8) \end{array}

where C_n (n = 1, 2, …, Q) is the prototype of class n and dist(·) can be any of the four distance functions above.

To provide a suitable training objective, we use the cross-entropy loss, which is minimized during model training. The calculation formula is as follows:

\begin{array}{l} Loss = - \frac{1}{n} \sum_{i = 1}^{n} \sum_{c = 1}^{Q} y_{i c} \log p (y_{c} \mid Z_{i}), & (9) \end{array}

where n is the number of query samples, Z_i is the i-th query sample, Q is the number of classes, and y_{ic} equals 1 if class c is the actual label of sample i and 0 otherwise.
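As a sketch of how the softmax over negative distances and the cross-entropy loss fit together, the snippet below computes the loss from a table of query-to-prototype distances. The function names and toy distances are illustrative assumptions. With one-hot labels, the inner sum over classes in the loss reduces to the log-probability of the true class, which is what the code uses.

```python
import math

def log_softmax_neg_dist(distances):
    """Log-probabilities from a softmax over negative distances to the prototypes."""
    logits = [-d for d in distances]
    top = max(logits)  # log-sum-exp trick for numerical stability
    log_norm = top + math.log(sum(math.exp(z - top) for z in logits))
    return [z - log_norm for z in logits]

def prototype_cross_entropy(distance_rows, true_classes):
    """Mean negative log-probability of the true class over all query samples."""
    total = 0.0
    for distances, true_class in zip(distance_rows, true_classes):
        total -= log_softmax_neg_dist(distances)[true_class]
    return total / len(distance_rows)

# Two query samples and three prototypes (toy distances).
confident = prototype_cross_entropy([[0.1, 5.0, 6.0]], [0])
uniform = prototype_cross_entropy([[1.0, 1.0, 1.0]], [2])
both = prototype_cross_entropy([[0.1, 5.0, 6.0], [1.0, 1.0, 1.0]], [0, 2])
```

A query that is equally far from all three prototypes contributes a loss of log 3, while a query that is close to its own prototype contributes a loss near zero.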

2.4 Transductive distribution optimization

Because only a small number of samples is available for each sleep stage in the SleepEDF-2013 data, the selected sleep segments may not accurately describe the actual class prototypes. To solve this problem, we apply the TDO method on top of the original prototypical network.

We propose to use a prototypical network with TDO to capture the features of new classes. We first apply the original method, using labeled samples from the five sleep stages as a support set, to obtain prototypes of the five sleep stages. However, because so few samples are available, the true distribution of each class cannot be estimated accurately. Therefore, we introduce TDO, which combines the support set with high-confidence unlabeled samples from the query set to improve how well the prototypes match their classes, as depicted in Figure 3 and stage D in Figure 1. Algorithm 1 summarizes the prediction process of our proposed method.

FIGURE 3

Figure 3. The transductive distribution optimization (TDO) method.

ALGORITHM 1

Algorithm 1. Transductive prototype optimization network (TPON) algorithm.

The main steps are illustrated in the figure above, which include the following three parts:

2.4.1 Generate an original class boundary

In the first stage, we generate an original class boundary using the backbone network and the labeled data. The feature extraction prototype network extracts an original prototype for a five-way N-shot task by using the backbone network, generating an original class boundary, where all the support set samples come from labeled data. For an N-way K-shot task, to find out the similarity scores between all query samples and the support classes, we use the sample-to-class metric measure to get the relational matrix R^{(N × q) × N} of similarity probability scores.

2.4.2 Generate a new class boundary

The class distribution is optimized by using a robust feature extractor to capture the feature distribution of each class. It is based on the original support set and some highly confident unlabeled query samples to obtain a ground-truth prototype of the transformed distribution. New class boundaries are generated by combining the original support set and some highly confident unlabeled query samples. The goal is to generate a new classifier that predicts the labels of all remaining query samples.

Then, we obtain a similarity probability score matrix B∈R^{(N × q) × N} between each prototype C_q and all query samples Z_f. For each class prototype C_q, we select the top k query samples with the highest similarity probability score as the prototype class candidate set:

\begin{array}{l} K_{q} = \{ f \mid b_{q f} \in \mathrm{topk} (B_{q *}) \}, \\ C_{q} = \{ x_{f} \mid x_{f} \in Q, f \in K_{q} \}, & (10) \end{array}

Here topk(·) is an operator that selects the top k elements from each row of the matrix B, k is a hyperparameter that denotes the number of samples in the prototype class candidate set for each class, and Q denotes the query set after Tukey's transformation. K_q stores the indices of the k query samples most similar to class q, and C_q stores the samples corresponding to K_q. Note that in Equations 10 and 11, C_q denotes this candidate set rather than the original prototype.

2.4.3 Generate a new classifier

A new classifier is generated by combining the original support set and some highly confident unlabeled query samples to generate new class boundaries. With the goal of predicting the labels of all remaining query samples, the mean of the feature distribution for each class is then computed using the support set and the candidate set of prototype classes:

\begin{array}{l} C_{q}^{'} = \frac{\sum_{x \in S_{q} \cup C_{q}} x}{| S_{q} | + k}, & (11) \end{array}

where $| S_{q} |$ denotes the number of samples in S_q and k denotes the number of samples in C_q. This reduces the distribution bias caused by the category mismatch. Our method does not introduce any additional learnable parameters and can be paired with most classification models and feature extractors. It adds little computation, yet it can considerably improve the classification accuracy.
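The candidate selection of Equation 10 and the prototype update of Equation 11 can be sketched for a single class as follows. This is a simplified illustration rather than the authors' implementation: the function names and toy numbers are assumptions, the vectors stand in for backbone features, and Tukey's transformation of the query set is omitted.

```python
def top_k_indices(scores, k):
    """Indices of the k largest similarity scores (one row B_q* in Eq. 10)."""
    order = sorted(range(len(scores)), key=lambda f: scores[f], reverse=True)
    return order[:k]

def refine_prototype(support_vectors, query_vectors, scores, k):
    """Eq. (11): mean over the support set plus the top-k query samples."""
    chosen = [query_vectors[f] for f in top_k_indices(scores, k)]
    pooled = support_vectors + chosen  # |S_q| + k vectors when k <= number of queries
    dim = len(pooled[0])
    return [sum(vec[j] for vec in pooled) / len(pooled) for j in range(dim)]

# Toy example for one class: 2 support vectors, 3 unlabeled query vectors.
support_w = [[0.0, 0.0], [2.0, 0.0]]
queries = [[1.0, 3.0], [9.0, 9.0], [1.0, 0.0]]
scores_w = [0.7, 0.05, 0.9]  # similarity of each query sample to this class
selected = top_k_indices(scores_w, k=2)
new_prototype = refine_prototype(support_w, queries, scores_w, k=2)
```

The low-scoring query sample at (9, 9) is excluded, so it does not pull the refined prototype away from the class.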

3 Dataset and experimental setup

In this section, we present the details of our experiments on the proposed method, including the dataset and experimental setup. The experiments were run on a platform with an Nvidia RTX 3090Ti, Ubuntu 16.04, and PyTorch 1.9.0.

3.1 Datasets

This section describes the use and preprocessing of the experimental data. The experiment used the benchmark Sleep-EDF sleep data published on PhysioNet, which included 20 healthy subjects (26–35 years old), 10 men and 10 women. The polysomnography (PSG) recording time of each person is about 20 h. We use the sleep cassette (SC) recordings of the healthy subjects. Files with the suffix *PSG.edf contain the EEG (from the Fpz-Cz and Pz-Oz electrode positions), and the *Hypnogram.edf files contain the sleep-stage annotations corresponding to each PSG. The sampling rate was 100 Hz for all EEG.

Sleep experts manually scored these records into eight categories, stored in the hypnogram: the sleep stages W, N1, N2, N3, N4, and REM, plus M (body movement time) and “?” (unknown time). Each PSG is segmented into 30-s epochs, which are then classified into different sleep stages by the experts according to sleep manuals such as Rechtschaffen and Kales (R&K).

We combined the N3 and N4 stages into a single stage N3 to conform to the AASM standard (Berry et al., 2012). At the beginning and end of each recording, there is a long W period in which the subject is not sleeping, which we cut off. We keep only 30 min before and after the sleep time, and delete M (body movement time) and “?” (unknown time). The sleep annotations are given separately in the hypnogram files available in the database; they label every 30 s of each EEG signal with the sleep stage it belongs to. We divided the sleep EEG of the 20 healthy subjects into meta-training subjects and meta-testing subjects. One of the 20 subjects was used as the meta-testing subject, and the remaining 19 subjects were used as meta-training subjects, which gives a 20-fold cross-validation. We use a time window of 30 s to extract the sleep samples. The sleep data from the two nights of each subject were merged into one recording for that subject.
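The label-cleaning steps described above can be sketched as follows. This is a minimal illustration under stated assumptions: the stage codes are the short names used in this section rather than the exact annotation strings stored in the hypnogram files, the function name is hypothetical, and the order of the steps (merge, trim, then drop M and "?") is one reasonable reading of the text.

```python
EPOCHS_PER_30_MIN = 60  # 30 min divided by 30 s per epoch
SLEEP_STAGES = {"N1", "N2", "N3", "REM"}

def clean_hypnogram(stages, margin=EPOCHS_PER_30_MIN):
    """Return the (epoch index, stage) pairs kept after preprocessing."""
    # Merge N4 into N3, as required by the AASM standard.
    stages = ["N3" if s == "N4" else s for s in stages]
    asleep = [i for i, s in enumerate(stages) if s in SLEEP_STAGES]
    if not asleep:
        return []
    # Keep at most `margin` epochs before the first and after the last sleep epoch.
    start = max(0, asleep[0] - margin)
    stop = min(len(stages), asleep[-1] + 1 + margin)
    # Drop body movement (M) and unknown (?) epochs.
    return [(i, stages[i]) for i in range(start, stop) if stages[i] not in {"M", "?"}]

# Toy night: long wake periods around a short stretch of sleep.
night = ["W"] * 100 + ["N1", "N2", "N4", "M", "REM", "?"] + ["W"] * 100
kept = clean_hypnogram(night)
kept_stages = [stage for _, stage in kept]
```

Returning the original epoch indices alongside the stages makes it straightforward to select the matching 30-s EEG windows afterwards.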

3.2 Data enhancement

The SleepEDF-2013 dataset has imbalanced classes, so some categories have too few training samples during meta-training. To solve this problem, we oversampled the meta-training dataset: EEG epochs of the under-represented categories were randomly copied until the five sleep stages had the same number of meta-training samples. This lets the backbone network learn the category information in an efficient and balanced way, without the problem of class imbalance.
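Random oversampling of this kind can be sketched as below. The function name, the fixed seed, and the choice to match the size of the largest class are assumptions made for illustration; the integers stand in for 30-s EEG epochs.

```python
import random
from collections import Counter

def oversample(samples, labels, seed=0):
    """Randomly duplicate samples until every class matches the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = max(len(group) for group in by_class.values())
    out_samples, out_labels = [], []
    for label, group in by_class.items():
        # Draw the missing samples with replacement from the original class members.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for sample in group + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels

# Imbalanced toy set: N1 is rare, N2 is common.
labels = ["W"] * 5 + ["N1"] * 1 + ["N2"] * 8 + ["N3"] * 2 + ["REM"] * 3
samples = list(range(len(labels)))  # stand-ins for EEG epochs
balanced_samples, balanced_labels = oversample(samples, labels)
counts = Counter(balanced_labels)
```

Oversampling is applied to the meta-training data only, so the meta-testing subject keeps its natural class distribution.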

3.3 Backbone network

In this study, we propose a feature extraction network to analyze sleep EEG signals. The network consists of two main components, a convolutional neural network (SleepNet) and a multilayer transformer encoder module, as shown in Figure 4 and stage B in Figure 1. The CNN is used to extract local features from the signals, while the transformer is used to capture global correlations between different parts of the signals.

FIGURE 4

Figure 4. Backbone network.

3.3.1 SleepNet features extraction

The CNN component comprises three convolutional layers with batch normalization and dropout, followed by a linear layer, as shown in Table 1. The first layer has 64 filters with a kernel size of 64 and a stride of 16. The second layer has 128 filters with a kernel size of 8 and a pooling layer with a kernel size of 4. The third layer has 256 filters with a kernel size of 8 and a pooling layer with a kernel size of 4.

TABLE 1

Table 1. CNN feature extraction.

Formally, SleepNet extracts the i-th feature f_{(X_i)} from one EEG epoch X_i, where CNN_{(θ_r)} denotes the CNN that converts a single-channel EEG epoch into a feature vector and θ_r denotes the learnable parameters of the CNN. The size of f_{(X_i)} depends on the sampling rate of the input EEG. In the formula, f represents the CNN we use, as shown below:

\begin{array}{l} f_{(X_{i})} = {CNN}_{(θ_{r})} (X_{i}), & (12) \end{array}

the network is trained using the NAdam optimizer to minimize the cross-entropy loss.

3.3.2 Transformer encoder module

The output of the third pooling layer is fed to the transformer component, which consists of encoder layers and transformer encoder. The encoder layer has a dimensionality of 128 and an attention mechanism. The encoder is applied to the input signals, and the output is averaged along the time axis before being fed to a fully connected layer with a 128-dimensional output. After the feedforward layer, our output feature vector is fed into the prototype network. After feature extraction of transformer encoder module, feature output formula is as follows:

\begin{array}{l} F_{(X_{i})} = Encoder (f (X_{i})), & (13) \end{array}

where F_{X_i} represents features extracted by CNN and Transformer.

3.4 Model settings

Before using a prototypical network, we need to extract features from the collected data. We can construct a prototypical network architecture with five-way (1-shot, 3-shot, 5-shot, 10-shot, 15-shot, 20-shot, 25-shot). We randomly select 1, 3, 5, 10, 15, 20, and 25 epochs from the W, N1, N2, N3, and R phases of the meta-training set, respectively. The five sleep phases are preprocessed and SleepNet with transformer is used as our pre-trained neural network to compute prototypes for each sleep phase.
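Drawing one five-way K-shot episode can be sketched as follows. This is an illustrative sketch only: the function name, the per-class query count `n_query`, and the string placeholders for EEG epochs are assumptions that do not come from the paper.

```python
import random

STAGES = ["W", "N1", "N2", "N3", "REM"]

def sample_episode(epochs_by_stage, k_shot, n_query, rng):
    """Draw K support and n_query query epochs per stage, without overlap."""
    support, query = [], []
    for stage in STAGES:
        picked = rng.sample(epochs_by_stage[stage], k_shot + n_query)
        support += [(x, stage) for x in picked[:k_shot]]
        query += [(x, stage) for x in picked[k_shot:]]
    return support, query

rng = random.Random(0)
# Hypothetical pool of 40 epochs per stage; strings stand in for EEG epochs.
pool = {stage: [f"{stage}_{i}" for i in range(40)] for stage in STAGES}
support, query = sample_episode(pool, k_shot=15, n_query=10, rng=rng)
```

Changing `k_shot` to 1, 3, 5, 10, 15, 20, or 25 gives the shot settings compared in this study.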

We divided a batch process into a support set and a query set, utilizing the embedding vectors of the support set to establish a class prototype. This prototype represented a typical embedding vector of a given class, and we then utilized values closely related to it for classification to compare the performance of our approach. In our experiment, Cosine distance, Manhattan distance, Euclidean distance, and Chebyshev distance were used as comparisons, and we obtain the Cosine distance as our best matching and most accurate measurement function. As such, we adopted the Cosine distance function as our distance evaluation metric.

To train the backbone network, we use the NAdam optimizer. To learn feature centers for distinct classes, we employ a stochastic gradient descent optimizer with a learning rate of 0.0009 and a center-loss weight of 0.0009. Specifically, we use the pre-trained neural network to extract feature vectors, take their average, normalize it, and use softmax for prediction. The procedure was repeated 50 times, and the network was then fine-tuned by gradient descent on the average loss.

In meta-testing, the subject held out from meta-training was used as the meta-testing set and treated as previously unseen data for cross-subject EEG sleep staging. We randomly selected N samples from each of the five sleep stages in the meta-testing set, and these five groups of samples were used as the meta-testing support set. As in meta-training, after building the initial prototypes, we applied the TDO method: high-confidence unlabeled query samples were added to the support set, and the prototypes were recalculated. All remaining sleep data were used as the meta-testing query set, and the task was constructed and evaluated repeatedly.

Since the experimental results may vary with the chosen support set samples, the experiment is repeated 50 times with randomly drawn support sets, and the average is taken as the final result. We report the overall accuracy (ACC), the F1 values, and the accuracy of each category.
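The evaluation protocol (hold out each of the 20 subjects once, repeat each test 50 times with a fresh random support set, then average) can be sketched as below. The helper names are hypothetical, and `dummy_episode` is only a placeholder that returns a random number where a real meta-testing run would return an accuracy.

```python
import random
from statistics import mean

def leave_one_subject_out(subject_ids):
    """Yield (meta-training subjects, held-out meta-testing subject) pairs."""
    for held_out in subject_ids:
        yield [s for s in subject_ids if s != held_out], held_out

def averaged_accuracy(run_episode, subject_ids, repeats=50, seed=0):
    """Average the accuracy over repeats per subject, then over subjects."""
    rng = random.Random(seed)
    per_subject = {}
    for train_ids, test_id in leave_one_subject_out(subject_ids):
        runs = [run_episode(train_ids, test_id, rng) for _ in range(repeats)]
        per_subject[test_id] = mean(runs)
    return mean(list(per_subject.values())), per_subject

def dummy_episode(train_ids, test_id, rng):
    # Placeholder only: a real run would train on train_ids and test on test_id.
    return rng.uniform(0.8, 0.9)

subjects = list(range(20))
folds = list(leave_one_subject_out(subjects))
overall, per_subject = averaged_accuracy(dummy_episode, subjects)
```

Keeping the per-subject averages makes it possible to inspect how much the result varies between held-out subjects.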

3.5 Evaluation

In our experiments, different subjects were used for cross-subject validation for meta-training and meta-testing subjects, and the meta-testing query set did not include meta-training subjects. Our experiments were conducted for 20 rounds, thus validating our experimental results. The experiment is a multi-class classification task for sleep staging. Accuracy, F-measure, recall, and kappa values are used to evaluate the performance of sleep staging. The overall performance is evaluated in terms of accuracy and Cohen's Kappa coefficient. The above evaluation metrics are formulated as follows:

\begin{array}{l} Accuracy = \frac{TP + TN}{TP + FP + TN + FN}, & (14) \end{array}

\begin{array}{l} κ = \frac{Accuracy - P_{e}}{1 - P_{e}}, & (15) \end{array}

\begin{array}{l} Recall = \frac{TP}{TP + FN}, & (16) \end{array}

\begin{array}{l} Precision = \frac{TP}{TP + FP}, & (17) \end{array}

\begin{array}{l} F_{1} = \frac{2 \times Recall \times Precision}{Precision + Recall}, & (18) \end{array}

where TP is true positive, TN is true negative, FP is false positive, FN is false negative, and P_e is the hypothetical probability of chance agreement.
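These metrics can be computed from a confusion matrix as sketched below. The function name and the toy matrix are illustrative assumptions. For the multi-class case, the sketch takes overall accuracy as the fraction of correctly classified epochs, computes P_e from the row and column totals, and reports the unweighted mean of the per-class F1 scores, which we assume corresponds to the MF1 reported here.

```python
def evaluate(confusion):
    """confusion[i][j] = number of epochs of true class i predicted as class j."""
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    accuracy = sum(confusion[i][i] for i in range(n_classes)) / total
    row_sums = [sum(row) for row in confusion]
    col_sums = [sum(confusion[i][j] for i in range(n_classes)) for j in range(n_classes)]
    # Chance agreement P_e from the marginal totals, then Cohen's kappa.
    p_e = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
    kappa = (accuracy - p_e) / (1 - p_e)
    f1_scores = []
    for c in range(n_classes):
        tp = confusion[c][c]
        recall = tp / row_sums[c] if row_sums[c] else 0.0
        precision = tp / col_sums[c] if col_sums[c] else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    macro_f1 = sum(f1_scores) / n_classes
    return accuracy, kappa, macro_f1

# Toy two-class confusion matrix (made-up counts).
toy = [[8, 2], [1, 9]]
acc, kappa, mf1 = evaluate(toy)
```

The same function applies unchanged to the 5 × 5 confusion matrix of the five sleep stages.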

4 Results and discussion

In this section, we present a detailed analysis of the experimental results, including the performance of the sleep stage and the confusion matrix, and compare them with state-of-the-art experiments. We also conducted qualitative and quantitative experiments, including using different shots, different distance metric functions, and different learning rates. The t-SNE plots of the ablation experiments are also compared in the experiment to demonstrate the effectiveness of our experiment. We also discuss the feasibility of cross-channel sleep analysis and the limitations of our experiments.

4.1 Sleep stage scoring performance

In Table 2, we show the performance difference between the most advanced algorithm and our proposed prototypical network TPON on the Sleep-EDF-2013 dataset. It included two channels, Fpz-Cz and Pz-Oz (compared using five-way 15-shot method).

TABLE 2

Table 2. The performance difference between the most advanced algorithm and our proposed TPON method.

Tables 2 and 3 report the performance of TPON, our proposed meta-learning algorithm for cross-subject sleep staging. On the Fpz-Cz channel, we use fewer subjects and samples than other meta-learning algorithms such as MetaSleepLearner. Nevertheless, accuracy improves by 15%, and both the F1 scores for the five sleep stages and the MF1 are higher than those of the MAML-based MetaSleepLearner. Compared with the traditional deep learning algorithm DeepSleepNet, the overall accuracy of TPON on the Fpz-Cz channel also improves by 5.1%. This is a pioneering use of meta-learning algorithms that makes them comparable in accuracy to traditional machine learning algorithms. Cohen's kappa increased to 0.82 and the MF1 score to 81.7. On the Pz-Oz channel, accuracy reached 83.3%, MF1 reached 73.5, and Cohen's kappa reached 0.77.

TABLE 3

Table 3. PR, RE, and F1 performance in Fpz-Cz and Pz-Oz channels from the SleepEDF-2013 dataset.

TPON is the prototype network algorithm that we propose. After the backbone network extracts sleep features from the EEG, TPON maps them into a space in which the classes are easier to separate, which improves classification. In cross-subject recognition, the model can likewise be trained across subjects and then applied to unfamiliar subjects, classifying their sleep stages more accurately.

To analyze the proposed TPON comprehensively, we performed ablation experiments that compare the full TPON procedure with reduced variants. The first method is the full TPON. The second removes the attention mechanism. The third removes the TDO mechanism. The fourth removes both the attention mechanism and the TDO mechanism. The fifth removes the TDO mechanism and the prototype network and adopts a traditional deep learning approach. The results are shown in Table 4, from which it can be seen that the full TPON is the best sleep staging method among the variants tested.

TABLE 4

Table 4. Ablation study on TPON.

Figure 5 shows the confusion matrices on the Fpz-Cz and Pz-Oz channels of the Sleep-EDF dataset. Rows and columns give the number of 30-s EEG epochs of each sleep stage as classified by the sleep expert and by our model, respectively. The last three columns in each row give the per-class performance metrics computed from the confusion matrix. Table 3 shows the PR, RE, and F1 performance on the Fpz-Cz and Pz-Oz channels of the SleepEDF-2013 dataset, where PR denotes precision, RE denotes recall, and F1 denotes the F-measure.

FIGURE 5

Figure 5. Confusion matrices obtained on the Fpz-Cz and Pz-Oz channels from the SleepEDF-2013 dataset.

4.2 Training time performance

In this section, we report the training time of TPON, our proposed few-shot EEG sleep staging algorithm based on the prototype network. This covers the training time for each validation fold on each node, with 20 validation folds in total.

Our proposed few-shot EEG sleep classification algorithm mitigates the long training times of traditional deep learning. With the “learning to learn” ability acquired in meta-training, the initialized network adapts quickly to new tasks and can be trained on a small number of samples. TPON introduces a prototype-based meta-learning algorithm into the training process, which greatly reduces the time needed for a single validation run. Training takes ~22 min per validation fold, which saves computation and reduces the time that doctors would otherwise spend staging patients manually.

4.3 Consideration of N-shot learning

In the five-way N-shot setting, we analyze how the number of shots affects five-class sleep staging accuracy and MF1. Figure 6 shows the accuracy for different numbers of shots on the Fpz-Cz and Pz-Oz channels. Accuracy generally increases with the number of shots. On the Fpz-Cz channel, the highest accuracy is 87.1% at 10-shot; on the Pz-Oz channel, the highest accuracy is 83.3% at 15-shot.

FIGURE 6

Figure 6. Accuracy for different numbers of shots in the Fpz-Cz and Pz-Oz channels.

We can see that five-way 15-shot few-shot learning achieves results similar to those of traditional deep learning trained on a large number of data samples.
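To make the five-way N-shot setting concrete, the following sketch draws one episode from a labeled pool of epochs: N labeled support epochs per stage plus a disjoint set of query epochs. It is only a minimal illustration; the function name `sample_episode`, the `n_query` parameter, and the synthetic labels are assumptions and do not describe the sampling code used in this study.

```python
import numpy as np

def sample_episode(labels, n_way=5, n_shot=15, n_query=10, rng=None):
    """Return (support_idx, query_idx) index arrays for one N-way K-shot episode."""
    rng = np.random.default_rng() if rng is None else rng
    labels = np.asarray(labels)
    # Pick which classes take part in this episode
    classes = rng.choice(np.unique(labels), size=n_way, replace=False)
    support, query = [], []
    for c in classes:
        # Shuffle the epochs of class c, then split into support and query
        idx = rng.permutation(np.flatnonzero(labels == c))
        if len(idx) < n_shot + n_query:
            raise ValueError(f"class {c} has too few epochs for this episode")
        support.extend(idx[:n_shot])
        query.extend(idx[n_shot:n_shot + n_query])
    return np.array(support), np.array(query)

# Synthetic labels: 5 stages with 40 epochs each (not Sleep-EDF data)
labels = np.repeat(np.arange(5), 40)
s_idx, q_idx = sample_episode(labels, n_shot=15, n_query=10,
                              rng=np.random.default_rng(0))
print(len(s_idx), len(q_idx))  # 75 50
```

Changing `n_shot` between 1 and 25 reproduces the kind of sweep over shot counts discussed in this section, under the assumptions above.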

4.4 Different metric distance functions

Figure 7 compares different distance metric functions on the Fpz-Cz channel. Within the prototypical network, we benchmark the cosine, Manhattan, Euclidean, and Chebyshev distances. Figure 7 shows that, under the same five-way setting (1-shot to 25-shot), the prototypical network performs differently depending on the distance metric function. The cosine distance performs best, so we choose it as the metric function for our prototypical network.
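The four distance functions compared here can be sketched as follows for a plain prototypical classifier, in which each class prototype is the mean of its support embeddings and each query is assigned to the nearest prototype. This is a simplified illustration only: it leaves out the backbone network and the TDO step of TPON, and the function names and toy two-dimensional embeddings are hypothetical.

```python
import numpy as np

def pairwise_distance(queries, prototypes, metric="cosine"):
    """Distances from queries (Q, D) to prototypes (C, D), returned as (Q, C)."""
    diff = queries[:, None, :] - prototypes[None, :, :]
    if metric == "euclidean":
        return np.sqrt((diff ** 2).sum(axis=-1))
    if metric == "manhattan":
        return np.abs(diff).sum(axis=-1)
    if metric == "chebyshev":
        return np.abs(diff).max(axis=-1)
    if metric == "cosine":
        # Cosine distance = 1 - cosine similarity (assumes non-zero vectors)
        q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
        p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
        return 1.0 - q @ p.T
    raise ValueError(f"unknown metric: {metric}")

def classify(support_x, support_y, query_x, metric="cosine"):
    """Assign each query to the class of its nearest prototype."""
    classes = np.unique(support_y)
    # Prototype of a class = mean of its support embeddings
    prototypes = np.stack([support_x[support_y == c].mean(axis=0) for c in classes])
    dist = pairwise_distance(query_x, prototypes, metric)
    return classes[dist.argmin(axis=1)]

# Toy 2-D embeddings for two classes (invented for illustration)
support_x = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
support_y = np.array([0, 0, 1, 1])
query_x = np.array([[2.0, 0.1], [0.2, 3.0]])
for metric in ("cosine", "manhattan", "euclidean", "chebyshev"):
    print(metric, classify(support_x, support_y, query_x, metric))
```

On this toy input all four metrics agree; on real embeddings they can rank prototypes differently, which is what the comparison in Figure 7 measures.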

FIGURE 7

Figure 7. Comparison of different distance metric functions for different numbers of shots.

4.5 Performance of t-SNE visualization in the attention mechanism ablation experiment

In this section, we present three t-SNE plots, shown in Figure 8. We use ablation experiments to compare the full TPON with variants without the TDO mechanism and without the attention mechanism. Comparing the plots, we observe that using both the TDO and attention mechanisms yields clearer clusters, a more reasonable sample distribution, and better separation between samples of different classes. This indicates that the TDO and attention mechanisms capture the differences between classes more accurately. We therefore conclude that, in sleep EEG staging, the TDO and attention mechanisms improve classification performance.

FIGURE 8

Figure 8. Performance of t-SNE visualization in the TDO and attention mechanism ablation experiment. (A) Remove the TDO mechanism. (B) Remove the attention mechanism. (C) Transductive prototype optimization network (TPON).

4.6 The effect of learning rate

Figure 9 shows the impact of the learning rate on feature extraction by our backbone network. To reduce training time, we use the 15th subject as the meta-testing subject. In both the meta-training and meta-testing phases, we searched for the most suitable learning rate over η∈{1 × 10⁻¹, 1 × 10⁻², 1 × 10⁻³, 1 × 10⁻⁴, 1 × 10⁻⁵, 1 × 10⁻⁶, 0}, as well as for the number of training iterations. The default maximum number of training iterations was set to 50. The quality of feature extraction differs significantly across learning rates, and the best results are obtained at a learning rate of 1 × 10⁻³. After a finer search around this value, we settled on a learning rate of 0.0009.
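The selection over the candidate set of learning rates amounts to a simple grid search, sketched below. The helper `select_learning_rate` and the scoring function `toy_evaluate` are hypothetical: `toy_evaluate` is a made-up stand-in that peaks near 1 × 10⁻³ and takes the place of an actual meta-training and meta-testing run, so its scores carry no experimental meaning.

```python
import math

def select_learning_rate(evaluate, candidates):
    """Score every candidate with evaluate(lr) and return (best_lr, scores)."""
    scores = {lr: evaluate(lr) for lr in candidates}
    best_lr = max(scores, key=scores.get)
    return best_lr, scores

# Candidate set from the text
candidates = [1e-1, 1e-2, 1e-3, 1e-4, 1e-5, 1e-6, 0.0]

def toy_evaluate(lr):
    """Made-up validation score; a real run would train and test the model."""
    if lr == 0:
        return 0.0  # no learning happens with a zero learning rate
    return 1.0 / (1.0 + abs(math.log10(lr) + 3))

best, scores = select_learning_rate(toy_evaluate, candidates)
print(best)  # 0.001
```

A finer second pass would call the same helper with candidates clustered around the best coarse value.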

FIGURE 9

Figure 9. The effect of learning rate.

4.7 Cross-channel sleep staging

In meta-learning, the meta-training stage trains the model's ability to “learn to learn.” Consider a scenario in which the physician has manually scored sleep data from only one channel, whereas we need to classify sleep EEG signals from another channel. In this case, we can perform meta-training on the sleep data of the available channel. The model then not only has the ability to “learn to learn” but can also, to a certain extent, recognize data from unfamiliar channels.

Therefore, we propose a cross-channel EEG recognition network. The key idea is to train on EEG sleep data from known channels and then perform sleep staging on EEG sleep data from unfamiliar channels. The experiment uses the same mechanism as TPON; the only difference is that the meta-training and meta-testing subjects use different sleep channels. This setup simulates the real-life situation described above.

Our experiment used Pz-Oz channel data as the meta-training set and Fpz-Cz channel data as the meta-testing set. The Pz-Oz data come from 19 subjects, while the Fpz-Cz data come from the remaining subjects. To ensure the rigor of the experiment, we repeated it 20 times and report the average.
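One way to organize such a cross-channel, cross-subject protocol is sketched below. It assumes that each of the 20 repetitions holds out a different subject, which is consistent with 19 meta-training subjects but is not stated explicitly in the text. The function name `cross_channel_folds` is hypothetical, and the channel names are plain string labels supplied by the caller.

```python
def cross_channel_folds(subject_ids, train_channel, test_channel):
    """Yield (train, test) lists of (subject, channel) pairs, one fold per subject.

    Training uses train_channel recordings of all other subjects; testing uses
    the test_channel recording of the held-out subject.
    """
    for held_out in subject_ids:
        train = [(s, train_channel) for s in subject_ids if s != held_out]
        test = [(held_out, test_channel)]
        yield train, test

# 20 subjects indexed 0..19; meta-train on Pz-Oz, meta-test on Fpz-Cz
folds = list(cross_channel_folds(list(range(20)), "Pz-Oz", "Fpz-Cz"))
print(len(folds), len(folds[0][0]), folds[0][1])  # 20 19 [(0, 'Fpz-Cz')]
```

Keeping the subject and the channel together in each pair makes it easy to check that no meta-testing recording shares a subject with the meta-training set.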

The experimental results show that our cross-channel EEG sleep staging performs well. In particular, in the five-way 25-shot setting, the accuracy reaches 82.3%.

4.8 Limitations of the study

Following DeepSleepNet, we use the data of the 20 SC subjects in Sleep-EDF, all of whom are healthy (26 to 35 years old; 10 men and 10 women). In real-world use, however, the method may be applied to people with sleep disorders, so our results could be biased.

Another limitation is that performance is slightly worse when the labeled and unlabeled data differ substantially. This occurs, for example, when training on one type of data and testing on another that is very different, so our method may be slightly less effective in cross-data testing.

In the t-SNE plots of TPON (Stage A and Stage B in Figure 8), the W, N2, N3, and R stages occupy clearly distinct regions. However, the N1 stage is not well separated and is mixed with the R stage, which makes staging difficult, and many N1 epochs are misclassified as R.

5 Conclusion

In this study, we propose a few-shot EEG sleep staging method based on a transductive prototype optimization network (TPON). A modified version of the prototypical network algorithm was used in the experiments, with the cosine distance as the distance metric function. Given the diversity of EEG sleep data across subjects, efficient adaptation and training with new data from previously unseen subjects remains a significant challenge. In future work, we will address the problem of having too few N1 epochs in the meta-testing dataset. The low accuracy for N1 can be explained by the fact that most disagreements occur during transitions between sleep stages and that N1 typically has a shorter bout length (number of consecutive 30-s epochs scored as N1) than the other stages (Rosenberg and Van Hout, 2014). Although the scarcity of N1 epochs reflects the small proportion of N1 in a whole night of human sleep, we can introduce a suitable proportionality coefficient to compensate for the low fraction of N1. Our future research directions also include adopting more advanced meta-learning algorithms, improving our backbone network, and adopting dynamic convolutional neural networks to address imbalanced sample distributions and the small number of support set samples in few-shot learning.

Data availability statement

Publicly available datasets were analyzed in this study. This data can be found at: https://physionet.org/content/sleep-edfx/1.0.0/.

Ethics statement

Ethical approval was not required for the study involving humans in accordance with the local legislation and institutional requirements. Written informed consent to participate in this study was not required from the participants or the participants' legal guardians/next of kin in accordance with the national legislation and the institutional requirements.

Author contributions

JL: Formal analysis, Investigation, Methodology, Writing—original draft. CW: Formal analysis, Investigation, Methodology, Software, Writing—original draft. JP: Conceptualization, Project administration, Validation, Writing—review & editing. FW: Methodology, Supervision, Validation, Writing—review & editing.

Funding

This work was supported by the STI 2030-Major Projects 2022ZD0208900, the National Natural Science Foundation of China (Grant Nos. 62006082 and 61906019), the Key Realm R and D Program of Guangzhou (Grant No. 202007030005), and the Guangdong Basic and Applied Basic Research Foundation (Grant Nos. 2021A1515011600, 2020A1515110294, and 2021A1515011853).

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher's note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

Allison, B. Z., Wolpaw, E. W., and Wolpaw, J. R. (2007). Brain-computer interface systems: progress and prospects. Expert Rev. Med. Devices 4, 463–474. doi: 10.1586/17434440.4.4.463

Alzubaidi, L., Zhang, J., Humaidi, A. J., Al-Dujaili, A., Duan, Y., Al-Shamma, O., et al. (2021). Review of deep learning: concepts, cnn architectures, challenges, applications, future directions. J. Big Data 8, 1–74. doi: 10.1186/s40537-021-00444-8

Amin, S. U., Alsulaiman, M., Muhammad, G., Mekhtiche, M. A., and Hossain, M. S. (2019). Deep learning for EEG motor imagery classification based on multi-layer cnns feature fusion. Future Gener. Comput. Syst. 101, 542–554. doi: 10.1016/j.future.2019.06.027
