1 Introduction

Infectious diseases such as Middle East respiratory syndrome (MERS), Zika virus (ZIKV), and coronavirus disease 2019 (COVID-19) have spread worldwide, causing numerous casualties [1, 2]. Early detection is critical for preventing the rapid spread of such diseases, and various diagnostic technologies, including immunoassays, enzyme-linked immunosorbent assay (ELISA), and lateral flow assay (LFA), have been reported [3,4,5,6]. Among these methods, the polymerase chain reaction (PCR) has attracted a great deal of attention owing to its high precision and superior specificity compared with other detection techniques. For these reasons, quantitative real-time PCR (qPCR) is currently used as the gold standard for COVID-19 diagnosis [7, 8]. Although qPCR has advanced considerably, difficult quantification and non-specific amplification remain limitations [9].

Digital PCR (dPCR) has been demonstrated as a solution to the abovementioned limitations [10]. Unlike qPCR, dPCR partitions the reaction mixture into tens of thousands of compartments, such as microwells or droplets [11,12,13]. Because each PCR reaction is carried out in a small volume (pL to nL), dPCR offers better inhibitor tolerance than qPCR. The compartments are classified as positive or negative according to the presence of target nucleic acids (NAs). After DNA amplification in each compartment, the positive compartments exhibit distinct fluorescence signals [14]. The fraction of positive compartments is determined by end-point fluorescence detection, and the initial concentration of the NAs is calculated using the Poisson distribution. Owing to these properties, dPCR enables absolute quantification without any calibration or reference. dPCR analysis therefore requires a thresholding step, which identifies the positive compartments based on fluorescence intensity. For precise quantification of target NAs, accurate determination of the threshold is critically important [15]. In general, threshold segmentation is widely used to set the threshold value [16]. By applying a single threshold value to all dPCR images, the fraction of positive compartments can be estimated simply. However, this method requires the parameters to be modified for each analysis, which is labor-intensive and time-consuming [17, 18]. More importantly, such manual thresholding can cause significant errors at low concentrations, where positive signals are rarely observed [19].

One way to overcome these limitations is to distinguish the positive compartments automatically, without any manual intervention [20]. To this end, various methods, such as segmentation algorithms [21], clustering [22], and automatic thresholding [23], have been applied to dPCR analysis. These methods allow fully automated analysis of dPCR images and enable high-throughput, accurate, and rapid detection. With recent advances in deep learning, image processing methods have improved further [24]. After training on an image dataset, a convolutional neural network (CNN) predicts the location of the compartments and performs segmentation with remarkable accuracy. In particular, the mask region-based CNN (Mask R-CNN) exhibits outstanding performance compared with other image analysis algorithms [25]. However, most Mask R-CNN-based methods only detect positive compartments, and their capability for absolute quantification remains relatively unexplored.

In this paper, we present a deep learning-assisted droplet dPCR (ddPCR) analysis that combines Mask R-CNN-based image processing with Gaussian mixture model (GMM) clustering. The Mask R-CNN is built with open-source libraries, and data augmentation is performed to improve accuracy. Next, the image processing is characterized and the parameters for training and validation are optimized. The Mask R-CNN algorithm is compared with conventional methods, and its detection performance is validated using homogeneous and non-homogeneous droplet images. Furthermore, absolute quantification of the target is demonstrated with various concentrations of human coronavirus DNA. Our ddPCR analysis method could therefore be a powerful tool in molecular diagnosis and digital healthcare.

2 Materials and Methods

2.1 ddPCR Experiment

For the ddPCR experiment, the sample mixtures were prepared in 50 µL of master mix (one-step RocketScript™ Reverse Transcriptase, Bioneer, Korea) containing 1 µL of the template RNA, 1 µL of primers (10 µM each), and 1 µL of probes (10 µM), following the manufacturer’s protocol. Highly concentrated samples were diluted in nuclease-free water (Sigma-Aldrich, USA). The initial concentration was quantified by UV spectrophotometry (NanoDrop 2000, Thermo Scientific, USA). The samples were partitioned using a droplet-based microfluidic chip as previously reported by our research group [26]. The droplets were collected in a tube and isothermal ddPCR was conducted at 39 °C for 20 min. The resulting droplets were pipetted onto a glass slide and fluorescence images were taken with a fluorescence microscope (IX73, Olympus, Japan).

2.2 Computation Platform and Structure

The software environment was based on Google Colab Pro, and all steps were performed in Python 3.7.12 (Python Software Foundation, Wilmington, NC, USA). The frameworks used to train the mask region-based convolutional neural network (Mask R-CNN) were TensorFlow 1.15.0 and Keras 2.2.5.

2.3 Data Preparation

The image dataset consisted of fluorescence images taken in the ddPCR experiments, from which 20 images were selected for training and validation. The initial image resolution of 1920 × 1200 pixels was changed to 1200 × 1200 pixels. The images were annotated with the VGG Image Annotator for masking and labeling. After annotation, the images were augmented by horizontal and vertical flipping to quadruple the dataset. Through the augmentation process, 64 training images and 16 validation images were finally obtained.
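The flip-based augmentation can be illustrated with a minimal NumPy sketch. The function name `flip_augment`, the mask layout (one boolean channel per annotated droplet), and the synthetic image are assumptions for illustration. The text specifies horizontal and vertical flipping with a fourfold increase, so the fourth sample is assumed here to be the combined flip.

```python
import numpy as np

def flip_augment(image, masks):
    """Return the original sample plus three flipped copies (4 samples in total).

    image: array of shape (H, W) or (H, W, C)
    masks: boolean array of shape (H, W, N), one channel per annotated droplet
    Masks are flipped together with the image so the annotations stay aligned.
    """
    both_image = np.flipud(np.fliplr(image))
    both_masks = np.flipud(np.fliplr(masks))
    return [
        (image, masks),                          # original
        (np.fliplr(image), np.fliplr(masks)),    # horizontal flip (left-right)
        (np.flipud(image), np.flipud(masks)),    # vertical flip (top-bottom)
        (both_image, both_masks),                # assumed fourth sample: both flips
    ]

# Synthetic 1200 x 1200 image with a single square "annotation" (hypothetical data).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(1200, 1200), dtype=np.uint8)
masks = np.zeros((1200, 1200, 1), dtype=bool)
masks[100:140, 200:240, 0] = True

augmented = flip_augment(image, masks)
print(len(augmented))  # 4 samples per annotated image
```

Applied to 20 annotated images, this scheme yields 80 samples, which matches the 64 training and 16 validation images reported above.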

2.4 Comparative Methods and Training

A simple detection algorithm and a comprehensive detection algorithm were selected as comparative methods for droplet detection. For the simple detection algorithm, the images were converted to 8-bit images and a Gaussian blur was applied. The preprocessed images were analyzed with the watershed binarization and particle analysis tools in ImageJ, which quantify the number of droplets by size and circularity. For the comprehensive detection algorithm, the HoughCircles function in the OpenCV library was used to obtain the coordinates and radii of the targets.

2.5 Evaluation of Detection Performance

To compare the detection performance of the algorithms, the accuracy (ACC) and false positive rate (FPR) were calculated with the following equations:

$$ {\text{ACC}} = \frac{{{\text{TP}}}}{{{\text{TP}} + {\text{FP}} + {\text{TN}}}}, {\text{FPR}} = \frac{{{\text{FP}}}}{{{\text{TP}} + {\text{FP}} + {\text{TN}}}} $$

where true positive (TP) is the number of positive targets detected accurately, true negative (TN) is the number of negative targets detected accurately, and false positive (FP) is the number of negative targets detected inaccurately.
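The two metrics, exactly as defined in the equations above, can be computed with a short sketch. The function name and the counts in the example are hypothetical and are not measured values.

```python
def detection_metrics(tp, tn, fp):
    """Return (ACC, FPR) using the definitions given in the equations above."""
    total = tp + fp + tn
    if total == 0:
        raise ValueError("at least one detection is required")
    acc = tp / total
    fpr = fp / total
    return acc, fpr

# Hypothetical counts for illustration only.
acc, fpr = detection_metrics(tp=90, tn=4, fp=6)
print(f"ACC = {acc:.2%}, FPR = {fpr:.2%}")  # ACC = 90.00%, FPR = 6.00%
```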

2.6 Absolute Quantification

After the detection process, the augmented image dataset was processed by the absolute quantification algorithm. GMM clustering was adopted to divide the droplets into two classes: the fluorescence intensity distribution of the droplets was classified into positive and negative populations. The mean radius of the droplets was obtained from the image and used to calculate the mean droplet volume. The initial concentration of the samples was calculated with the Poisson distribution formula [14]:

$$ P\left( {k, \lambda } \right) = \frac{{e^{ - \lambda } \cdot \lambda^{k} }}{k!} $$
$$ P\left( {k \ge 1} \right) = 1 - P\left( {k = 0} \right) = 1 - e^{ - \lambda } $$
$$ c = \frac{\lambda }{{\overline{V}}} = \frac{{ - {\text{ln}}\left( {1 - P\left( {k \ge 1} \right)} \right)}}{{\overline{V}}} $$

where \(\lambda\) is the average number of target NAs per droplet, \(k\) is the number of target NAs per droplet, \(c\) is the initial concentration of the target NAs, and \(\overline{V}\) is the average volume of the droplet.
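The Poisson calculation can be followed numerically with the sketch below. It assumes spherical droplets, so that the mean volume follows from the mean radius as \(\frac{4}{3}\pi r^{3}\), and it uses the conversion 1 µL = 10^9 µm^3. The function names and the example counts are hypothetical.

```python
import math

def droplet_volume_ul(radius_um):
    """Volume of a droplet in microliters, assuming a sphere of the given radius."""
    volume_um3 = 4.0 / 3.0 * math.pi * radius_um ** 3
    return volume_um3 * 1e-9  # 1 µL = 1e9 µm^3

def concentration_copies_per_ul(n_positive, n_total, mean_radius_um):
    """Initial target concentration c = -ln(1 - P(k >= 1)) / V in copies/µL."""
    if not 0 <= n_positive < n_total:
        raise ValueError("need 0 <= n_positive < n_total (all-positive is undefined)")
    p_positive = n_positive / n_total       # P(k >= 1), fraction of positive droplets
    lam = -math.log(1.0 - p_positive)       # mean number of targets per droplet
    return lam / droplet_volume_ul(mean_radius_um)

# Hypothetical example: 120 positive droplets out of 1200, mean radius 50 µm.
c = concentration_copies_per_ul(120, 1200, 50.0)
print(f"{c:.1f} copies/µL")  # about 201 copies/µL
```

The estimate is undefined when every droplet is positive, because \(\ln(1 - 1)\) diverges. The sketch rejects that case explicitly.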

3 Results and Discussion

3.1 Workflow of the ddPCR Analysis Method

We developed a ddPCR analysis method that combines a detection algorithm and a quantification algorithm. The workflow of the method is shown in Fig. 1. The ddPCR images were used to train the three modules of the Mask R-CNN: box regression, classification, and binary masking. The images were then processed by the trained Mask R-CNN and the detection performance was evaluated. During detection, the droplet information, comprising location, radius, and fluorescence intensity, was extracted and transferred to the quantification algorithm. The intensities were plotted as a histogram and the signals were segmented by GMM clustering. Finally, the concentration of target NAs was calculated from the Poisson distribution using the fraction of positive compartments and the droplet volume.
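The hand-off between detection and quantification can be pictured with a small sketch that derives location, radius, and mean intensity from one predicted binary mask. The function name `droplet_features`, the equivalent-circle radius, and the use of the mean pixel value as the fluorescence intensity are assumptions for illustration. The text does not specify how these quantities were computed.

```python
import numpy as np

def droplet_features(image, mask):
    """Extract location, radius, and mean intensity of one droplet from its mask."""
    mask = np.asarray(mask, dtype=bool)
    ys, xs = np.nonzero(mask)
    area = ys.size
    if area == 0:
        raise ValueError("empty mask")
    return {
        "center": (float(xs.mean()), float(ys.mean())),  # centroid (x, y) in pixels
        "radius": float(np.sqrt(area / np.pi)),          # radius of the equal-area circle
        "intensity": float(image[mask].mean()),          # mean fluorescence inside the mask
    }

# Synthetic example: one disk of radius 20 px centered at (x=50, y=40).
yy, xx = np.mgrid[0:100, 0:100]
mask = (xx - 50) ** 2 + (yy - 40) ** 2 <= 20 ** 2
image = np.where(mask, 200, 10).astype(np.uint8)

features = droplet_features(image, mask)
print(features)
```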

Fig. 1

Workflow of deep learning-assisted quantitative analysis of ddPCR images. The method combines two algorithms: (1) In the detection algorithm, the three models of the Mask R-CNN are trained using the training dataset. Based on the trained weights, the droplets are detected in the ddPCR images and the extracted data are transferred to the quantification process. (2) In the quantification algorithm, a histogram is plotted from the extracted data and absolute quantification is performed with the parameters obtained through GMM clustering

3.2 Design and Optimization of the Mask R-CNN Method

For the detection process, we adopted the Mask R-CNN because it is an effective object detection algorithm that can achieve sufficient accuracy even with a small dataset [27]. Before training, the architecture was implemented and the training dataset was augmented using Python and the open-source TensorFlow and Keras libraries (Fig. 2). The overall architecture of the Mask R-CNN is shown in Fig. 2a. As previously reported [28], a classification branch predicts the object class for each region of interest (RoI) obtained from the region proposal network (RPN), and a mask branch predicts the segmentation mask in parallel with the bounding box regression branch. The Mask R-CNN consisted of four parts: the backbone, the RPN, fully connected (FC) layers, and a fully convolutional network (FCN). The backbone was built from ResNet-101 and a feature pyramid network (FPN) to extract feature maps of various sizes from each specified layer. The feature pyramid was constructed through a bottom-up pathway, a top-down pathway, and lateral connections. The RPN, a lightweight neural network, received the feature maps generated by the backbone and proposed RoIs that contained an object with high probability. The resulting RoIs were sent to the FC layers and the FCN to generate three outputs: the bounding box, the RoI class, and the mask. To perform the segmentation task effectively, an RoIAlign layer was added to the FC layers and the FCN to preserve the spatial location of objects. In general, a deep neural network requires a large training dataset to yield an accurate model. However, preparing a large ddPCR dataset is difficult and time-consuming. To achieve higher performance with a small dataset, data augmentation was conducted [29]. Instead of traditional augmentation methods, such as adjusting fluorescence, saturation, and RGB channels, we performed the data augmentation through spatial transformation, namely flipping the training images horizontally or vertically (Fig. 2b). Through augmentation, we obtained a sufficient training dataset by increasing the number of droplets fourfold. The detection accuracy, which was insufficient before augmentation, reached an adequate value afterward.

Fig. 2

Design of the Mask R-CNN-based detection algorithm. a The architecture of Mask R-CNN, b Annotated image and augmented results. The scale bars are 100 μm

3.3 Characterization of the Mask R-CNN

To achieve the best performance, the Mask R-CNN was trained with the augmented image dataset and the optimum weights were determined. Before training, the ddPCR images were divided into training and validation sets at a ratio of 8:2. Training was performed for up to 2,000 iterations using the prepared dataset and the loss value was recorded every 100 iterations (Fig. 3). During training, the box regression, classification, and binary mask models of the Mask R-CNN were trained simultaneously. As shown in Fig. 3a, the detection performance improved as the number of iterations increased. After 10 iterations, the performance of the box regression model was poor, whereas improved results were observed after 100 iterations. After 1,000 iterations, the masking model showed excellent performance and most of the droplets were detected. These results indicate that the training successfully enabled the model to predict the location of the droplets. We further evaluated the algorithm based on the loss values. The loss values of the training and validation datasets are plotted in Fig. 3b. The training box loss decreased as the number of iterations increased, and the validation loss converged to 0.1 from 1,000 iterations onward. In general, excessive training results in overfitting, which decreases the detection performance [30]. Overfitting can be identified from the difference between the training and validation loss values. As shown in the results, the trend of the loss values was consistent in both cases, indicating that the augmentation process successfully prevented overfitting. We further validated the segmentation performance using the validation dataset and measured the accuracy every 500 iterations (Fig. 3c). The accuracy increased with the number of iterations up to a certain limit. After 1,000 iterations, the accuracy exceeded 90%, and it eventually converged to 93.56% at 2,000 iterations. Based on these results, we selected the weights obtained at 2,000 iterations as the optimum, and the trained parameters were used in the subsequent detection process.
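The 8:2 division can be sketched as a seeded random split. The identifiers, the seed, and the choice of a simple shuffle are assumptions for illustration. The text does not state how images were assigned to the two sets.

```python
import random

def split_dataset(items, train_fraction=0.8, seed=0):
    """Shuffle the items reproducibly and split them into training and validation sets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)   # local generator; global state is untouched
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical identifiers for the 80 augmented images.
image_ids = [f"img_{i:03d}" for i in range(80)]
train_ids, val_ids = split_dataset(image_ids)
print(len(train_ids), len(val_ids))  # 64 16
```

One design point is worth noting under these assumptions. If flipped copies of the same source image can fall on both sides of the split, the validation loss may look more favorable than it would on unseen images. Splitting by source image before augmentation avoids this.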

Fig. 3

Mask R-CNN training and performance. a Original image and processed images after 10, 100, and 1,000 iterations, b Loss curves of the training and validation datasets by iteration (red: training; black: validation), c Accuracy curve of the validation dataset by iteration. The scale bars are 100 μm

3.4 Validation of Mask R-CNN with Comparative Methods

Since detection performance generally depends on the size distribution of the droplets, we classified the image dataset by droplet size distribution (Fig. 4) [31]. Droplet size is one of the main parameters in a ddPCR assay because the concentration of target DNA is calculated from the volume of the partitions. In general, detection performance decreases as the variance of the droplet size increases. For comparative analysis, we prepared droplet images with various size distributions and categorized them into homogeneous and non-homogeneous groups using a coefficient of variation (CV) of 10% as the cutoff. The CV in the non-homogeneous group ranged from 10 to 30%, whereas the CV in the homogeneous group was less than 10%. In both groups, 1200 droplets obtained from 8 images were used for evaluation, and the total droplet area was slightly higher in the non-homogeneous group. We compared the Mask R-CNN method with the conventional detection algorithms (Fig. 4a). In the homogeneous group, the positive droplets were successfully detected by all three methods. However, the simple method failed to detect negative droplets because it segmented the droplets with a single threshold. As negative droplets generally exhibit negligible fluorescence, similar to the background, the simple method is not suitable for ddPCR applications. In contrast, the comprehensive method performed well because it detected droplets using various parameters, such as size, distance, and sensitivity. However, its accuracy decreased in the non-homogeneous group owing to the wide size distribution (Fig. 4b). In the ddPCR assay, droplet merging frequently occurs during the amplification process [32].
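The grouping by size distribution can be sketched with the standard library. The sketch assumes the CV is computed on droplet radii using the population standard deviation. The text states only the 10% cutoff, so both choices, the function names, and the example radii are assumptions.

```python
import statistics

def coefficient_of_variation(radii):
    """CV in percent: population standard deviation divided by the mean."""
    return statistics.pstdev(radii) / statistics.mean(radii) * 100.0

def size_group(radii, cutoff_percent=10.0):
    """Assign an image to a group using the 10% CV cutoff described above."""
    cv = coefficient_of_variation(radii)
    return "homogeneous" if cv < cutoff_percent else "non-homogeneous"

# Hypothetical droplet radii in micrometers.
uniform_radii = [50, 51, 49, 50, 52, 48]
mixed_radii = [30, 50, 70, 40, 60]

print(size_group(uniform_radii))  # homogeneous (CV of about 2.6%)
print(size_group(mixed_radii))    # non-homogeneous (CV of about 28.3%)
```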

Fig. 4

Comparative analysis of the detection performance of three detection methods: Simple, Comprehensive, and Mask R-CNN. a Processed images of the homogeneous (CV < 10%) and non-homogeneous (CV > 10%) datasets, b Bar graph of detection accuracy evaluated from the detection process, c Bar graph of detection FPR evaluated from the detection process, d Confusion matrix of the Mask R-CNN detection results. The scale bars are 100 μm

For universal application, ddPCR analysis needs to cope with droplets of irregular size. The Mask R-CNN showed higher accuracy than the other methods in both the homogeneous and non-homogeneous groups because it detects objects in feature maps of various sizes through the FPN. The detection accuracy of the Mask R-CNN was 94.42% for the homogeneous group and 93.12% for the non-homogeneous group, with a standard deviation of less than 2%. The difference in accuracy between the two groups was 1.38%, indicating the robustness of the Mask R-CNN method. The FPR is another key parameter for ddPCR because false positives can cause serious errors in quantitative analysis [33]. As shown in Fig. 4c, the Mask R-CNN showed a lower FPR (< 4%) and higher reproducibility (CV < 2%) in both cases compared with the other methods. The differences in FPR between the groups for the simple, comprehensive, and Mask R-CNN methods were 13.5, 6.7, and 1.15%, respectively. Relative to the Mask R-CNN method, the increase in FPR was 11.7 times higher for the simple method and 5.8 times higher for the comprehensive method. Unlike the traditional detection methods, the Mask R-CNN exhibited excellent detection performance regardless of the size variance, which shows the robustness of the Mask R-CNN-based detection algorithm. The detection results of the Mask R-CNN method are displayed as a confusion matrix (Fig. 4d). The confusion matrix consisted of 93.84% true positives, 2.94% true negatives, and 3.22% false positives. These results indicate the high precision and robustness of the Mask R-CNN method. In addition, it is worth mentioning that the Mask R-CNN does not require any manual intervention in the detection process.

3.5 Absolute Quantification of Human Coronavirus

ddPCR enables absolute quantification of target NAs based on the fraction of positive droplets. To evaluate the performance of quantitative analysis, the three methods were compared using human coronavirus DNA at a concentration of 100 copies/µL (Fig. 5a, c). The simple method yielded a value 75.6% higher than the expected concentration because the negative droplets were not effectively detected. The comprehensive method showed relatively higher precision; however, its error relative to the expected concentration was still significant (26.9%). This error appears to be caused by a high FPR and errors in the measurement of droplet size. In contrast, the Mask R-CNN predicted the concentration with high accuracy and a low standard deviation owing to its high detection performance and low FPR. To further demonstrate quantitative analysis with the Mask R-CNN, a ddPCR assay was conducted using various concentrations of human coronavirus DNA. Precise quantitative analysis requires an accurate thresholding process that can distinguish positive droplets from negative ones. Here, we adopted GMM clustering to classify the fluorescence intensities [34]. The intensity data were modeled as the sum of two Gaussian functions representing the positive and negative partitions. The parameters of the Gaussian functions were obtained by maximum likelihood estimation using the expectation–maximization algorithm. Through this process, the fractions of the positive and negative partitions were calculated, and absolute quantification was performed using the Poisson distribution. The images of the droplet segmentation are shown in Fig. 5b. To confirm the performance visually, the droplets were labeled as positive or negative. As expected, the fraction of positive droplets increased as the concentration of the target DNA increased. The fluorescence data for each concentration were plotted as histograms and the positive partitions were segmented through GMM clustering (Fig. 5d). After thresholding, the measured concentrations were compared with the initial concentrations (Fig. 5e). The results showed high linearity over all concentrations (R² = 0.9973), demonstrating the excellent performance of the combination of Mask R-CNN-based detection and GMM clustering. Therefore, our analysis method, with its robustness, high accuracy, and low FPR, can enhance the precision of the ddPCR assay.
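The two-Gaussian model and its expectation–maximization fit can be sketched in NumPy as follows. This is a hand-written illustration and not the implementation used in this study. The function name, the initialization from the minimum and maximum intensity, the iteration limits, and the synthetic intensities are all assumptions.

```python
import numpy as np

def fit_two_gaussians(x, n_iter=200, tol=1e-8):
    """Fit a two-component 1-D Gaussian mixture by expectation-maximization."""
    x = np.asarray(x, dtype=float)
    mu = np.array([x.min(), x.max()])            # start the components at the extremes
    sigma = np.full(2, x.std() / 2 + 1e-6)
    weight = np.array([0.5, 0.5])
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: responsibility of each component for each intensity value.
        z = (x[:, None] - mu) / sigma
        dens = weight * np.exp(-0.5 * z ** 2) / (sigma * np.sqrt(2 * np.pi))
        total = dens.sum(axis=1, keepdims=True) + 1e-300
        resp = dens / total
        # M-step: maximum likelihood update of weights, means, and widths.
        nk = resp.sum(axis=0) + 1e-12
        weight = nk / x.size
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk) + 1e-6
        ll = np.log(total).sum()                 # log-likelihood, used to test convergence
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return weight, mu, sigma, resp

# Synthetic intensities: 900 dim (negative) and 100 bright (positive) droplets.
rng = np.random.default_rng(1)
intensity = np.concatenate([rng.normal(20, 2, 900), rng.normal(100, 5, 100)])

weight, mu, sigma, resp = fit_two_gaussians(intensity)
positive_component = int(np.argmax(mu))          # the brighter component is "positive"
is_positive = resp.argmax(axis=1) == positive_component
positive_fraction = is_positive.mean()
print(f"positive fraction = {positive_fraction:.3f}")
```

The resulting positive fraction plays the role of \(P(k \ge 1)\) in the Poisson formula of Sect. 2.6. A sample with no positive droplets, such as a no-template control, does not fit the two-population assumption and would need separate handling.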

Fig. 5

Quantitative analysis of human coronavirus DNA. a Bar graph comparing the positive and negative droplet fractions among the three detection methods for the 100 copies/μL dataset, b Images segmented with the Mask R-CNN algorithm at various concentrations (NTC, 10, 50, 100, 500, and 1000 copies/μL), c Bar graph of absolute quantification results; the dotted line indicates the expected concentration, d Histograms of Mask R-CNN-based detection and segmentation results at various concentrations (5, 10, 50, 100, and 500 copies/μL), e Linearity of the ddPCR assay analyzed with our method (R² = 0.9973). The scale bars are 100 μm

4 Conclusions

In this paper, we developed a deep learning-assisted ddPCR analysis for absolute quantification of target DNA. The analysis combined image processing for droplet detection with a clustering algorithm for distinguishing positive droplets. The number of training iterations was optimized to improve the Mask R-CNN model. Over 2,000 iterations, the loss value decreased to less than 0.1 and the accuracy increased to more than 90%. We further compared the Mask R-CNN with conventional methods using homogeneous and non-homogeneous droplets. The Mask R-CNN exhibited higher accuracy (> 93%) and lower FPR (< 4%) than the other methods in both cases. Additionally, the capability for absolute quantification was validated using human coronavirus DNA. The positive and negative droplets were separated using GMM clustering, and the results were in good agreement with the fluorescence images. Furthermore, the estimated concentration of target DNA agreed well with the actual concentration over the range of 10 to 1,000 copies/µL (R² = 0.9973). The accuracy of the analysis could be improved further by increasing the number of droplets. Our machine learning algorithm could therefore be an effective tool for digital analysis of various infectious diseases.