Abstract

Refrigerants are widely used in organic Rankine cycles, heat pumps, and refrigeration systems, so engineers and researchers are increasingly interested in their properties. One of the most important thermophysical properties of these fluids is the normal boiling point (Tb). In the current study, a novel extreme learning machine (ELM) model and an ensemble boosted decision tree (EDT Boosted) algorithm are proposed to predict the normal boiling point from 16 molecular groups and one topological index. To this end, a total of 334 Tb data points are gathered to train and test the ELM and EDT Boosted algorithms. Visual and statistical comparisons of the model outputs with the experimental Tb values indicate that the proposed models can predict the Tb of refrigerants accurately. Moreover, a sensitivity analysis is applied to quantify the influence of each input parameter on the Tb of refrigerants.

1. Introduction

The study of refrigerants is fundamental to investigations of organic Rankine cycle (ORC), heat pump, and refrigeration systems [1], and it has received considerable attention recently [2–4]. Accurate thermophysical properties of materials are a key input to process design [2, 5, 6], and a large number of refrigerant physical properties have been reported in previous studies [7–9]. Nevertheless, developing alternative refrigerants is necessary because of growing concern about the greenhouse effect and depletion of the ozone layer [10]. In the computer-aided molecular design (CAMD) of refrigerants, property estimation methods are essential, so engineers and scientists working in this area need highly accurate prediction models [11–15]. One of the important thermal properties of refrigerants is the normal boiling point (Tb), which is also used in the prediction of other thermal properties. The normal boiling point is the temperature at which the vapor pressure of a liquid equals atmospheric pressure. There are several approaches to estimate Tb [16]. Joback and Reid suggested a group contribution approach that approximates Tb for aromatic and aliphatic hydrocarbons [17]. In this approach, the property is predicted by summing the contributions of all molecular groups present in the structure. The approach is not highly accurate, but it is acceptable for a preliminary determination of Tb. Devotta and Pendyala then modified the Joback approach to estimate the normal boiling point of halogenated mixtures more accurately [18]. Later, Constantinou and Gani proposed a new group contribution approach based on UNIQUAC functional-group activity coefficients (UNIFAC) groups [19]. Marrero-Morejón and Pardillo-Fontdevila presented a group interaction contribution method [20].
After that, Wang and coworkers estimated the Tb of organic materials with a position group contribution method [21]. Abooali and Saboti improved several group contribution prediction methods by adding molecular descriptors to them [22]. Deng et al. used an artificial neural network to estimate the Tb of refrigerants from a topological index and 16 molecular groups [23].

The prediction of normal boiling points is vitally important for modeling processes that involve refrigerants. Because experimental studies are costly and time-consuming, computational and modeling studies are valuable. Moreover, artificial intelligence methods have performed well in many fields. In the present study, two novel models, an extreme learning machine (ELM) and ensemble boosted trees (EDT Boosted), are suggested to predict the normal boiling point from 16 molecular groups and one topological index. Furthermore, a sensitivity analysis is applied to quantify the influence of each input parameter on the Tb of refrigerants.

2. Methodology

2.1. Experimental Dataset Collection

In this work, a comprehensive dataset of normal boiling points for 334 refrigerant compounds is gathered from previous works. The data points come from three reliable sources, namely, SciFinder, Molbase, and Chemical Abstracts Service, as compiled in the paper by Deng et al. [23]. In some research studies, refrigerants were classified in order to identify suitable properties, that is, as alcohols, amines, halogenated hydrocarbons, ethers, organic compounds-alkanes, and alkenes [11, 24]. To classify the molecules better, the functional groups are selected on the basis of this classification (see Figure 1).

2.2. Ensemble Boosted Trees (EDT Boosted)

The decision tree (DT), proposed by Quinlan [25], is one of the popular machine learning methods and has been applied to many practical problems, such as short-term photovoltaic power prediction, landslide spatial estimation, flash flood forecasting, and risk factor determination for drug use [26–29]. A DT applies a series of rules to partition the input space into regions in which the outputs are as homogeneous as possible, and a value is fitted to each region. The DT has the following advantages:
(i) It expresses information in an easy and intuitive manner for visualization
(ii) It is a reliable tool for mining interactions and nonlinear effects of various variables
(iii) No mathematical assumption is required between input and output variables
(iv) It has the ability to handle outliers and suspected values

On the other hand, the DT has the following drawbacks:
(i) It has difficulty modeling smooth functions
(ii) It is highly sensitive to the training data, so a small change in the training data can produce different outputs
(iii) It has low bias and high variance [30, 31]

Hence, many strategies have been suggested to improve the predictive ability of the DT, such as ensemble boosted trees (EDT Boosted).

EDT Boosted is an additive regression algorithm in which each individual term is a simple tree. The combination of boosting and regression trees is known as an ensemble method, and it uses recursive binary splits to relate the outputs to the input variables. EDT Boosted keeps the advantages of DT-based approaches while overcoming their disadvantages [31, 32]. It has been used in different fields, such as medicine [33], ecology [34], and banking [32]. In the current work, EDT Boosted is used for the estimation of the normal boiling point of refrigerants for the first time.

Let x = (x1, x2, …, xn) denote the predictor vector and y the response. The EDT Boosted model can be written as follows [35]:

$$f(x) = \sum_{m} f_m(x) = \sum_{m} \beta_m \, b(x; \gamma_m), \tag{1}$$

in which $\beta_m$, $\gamma_m$, and $b(x; \gamma_m)$ are the weights of nodes, split variables, and single decision trees, respectively.
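The additive structure described above can be illustrated with a minimal sketch of boosting for squared error, in which every single tree is a one-split regression stump fitted to the current residuals. The helper names (`fit_stump`, `fit_boosted`, `predict_boosted`), the learning rate, the number of trees, and the toy data are all assumptions made for illustration; they are not the settings or the implementation used in the study.

```python
from statistics import mean

def fit_stump(X, residuals):
    """Return (feature, threshold, left_value, right_value) of the best single split."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            lm, rm = mean(left), mean(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, thr, lm, rm)
    if best is None:  # every row identical: fall back to a constant
        m = mean(residuals)
        return (0, float("inf"), m, m)
    return best[1:]

def stump_predict(stump, row):
    j, thr, left_value, right_value = stump
    return left_value if row[j] <= thr else right_value

def fit_boosted(X, y, n_trees=100, learning_rate=0.1):
    """Add one stump per round, each fitted to the residuals of the current model."""
    base = mean(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        residuals = [t - p for t, p in zip(y, preds)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, row)
                 for p, row in zip(preds, X)]
    return base, learning_rate, stumps

def predict_boosted(model, row):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, row) for s in stumps)

def mse(actual, predicted):
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Synthetic toy data (not from the study): two descriptors, Tb-like targets in K.
X = [[0, 1.0], [1, 1.2], [2, 0.9], [3, 1.5], [4, 1.1], [5, 1.4]]
y = [250.0, 262.0, 275.0, 290.0, 301.0, 315.0]

model = fit_boosted(X, y)
mse_before = mse(y, [mean(y)] * len(y))
mse_after = mse(y, [predict_boosted(model, row) for row in X])
print(mse_before, mse_after)
```

Each round shrinks the training error a little, which is why many weak trees with a small learning rate are usually preferred over a few deep ones; a production model would normally rely on an established library instead of this hand-written loop.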

2.3. Extreme Learning Machine

The neurons of a single hidden layer feedforward neural network (SLFN) are arranged in an input layer, a hidden layer, and an output layer according to their function. The input layer receives the input data and passes it to the hidden layer. As in other ANNs, the hidden layer acts as the processor, and the output layer then produces the output [36–39].

Conventional SLFN algorithms require differentiable activation functions and have a layered structure, which makes them complex and inefficient. Figure 2 shows the scheme of the ELM, which includes the three aforementioned layers.

Consider a training set {(xi, yi), i = 1, …, N} in Rn × Rm, in which xi and yi are the training inputs and targets, and n and m are the dimensions of the input and output data, respectively. If the number of hidden nodes is L, the ELM output can be written as follows:

$$f_L(x) = \sum_{i=1}^{L} \beta_i h_i(x) = h(x)\beta, \tag{2}$$

in which $\beta = [\beta_1, \ldots, \beta_L]^T$ and $h(x) = [h_1(x), \ldots, h_L(x)]$ are the output weight vector and the nonlinear feature mapping, and $f_L(x)$ denotes the estimated value. Different kinds of functions can be used in the neurons of the hidden layer, such as multiquadric, cosine, hyperbolic tangent, and hard limit functions. For real-valued conditions, $h_i(x)$ is written as follows:

$$h_i(x) = G(a_i, b_i, x), \quad a_i \in R^d, \; b_i \in R, \tag{3}$$

in which $G(a_i, b_i, x)$ is a nonlinear function of $a_i$ and $b_i$, the hidden node parameters of the ELM estimation process. The ELM trains the SLFN in two steps: random feature mapping and linear parameter solution. In the first step, the ELM uses the input weights and hidden biases to compute the hidden layer mapping matrix, which transfers the inputs to the feature space. In the second step, the hidden layer is connected to the outputs through the weights β. These weights are determined by minimizing the squared error as follows:

$$\min_{\beta \in R^{L \times m}} \| u\beta - y \|^2, \tag{4}$$

in which u is the matrix of hidden layer outputs and ||·|| is the Frobenius norm. The matrix u is defined as follows:

$$u = \begin{bmatrix} h(x_1) \\ \vdots \\ h(x_N) \end{bmatrix} = \begin{bmatrix} h_1(x_1) & \cdots & h_L(x_1) \\ \vdots & \ddots & \vdots \\ h_1(x_N) & \cdots & h_L(x_N) \end{bmatrix}. \tag{5}$$

Also, y is the target matrix, which is expressed as follows:

$$y = \begin{bmatrix} y_1^T \\ \vdots \\ y_N^T \end{bmatrix}. \tag{6}$$

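Equation (4) can be solved by the following formulation:

$$\beta^{*} = u^{\dagger} y, \tag{7}$$

in which $u^{\dagger}$ denotes the Moore-Penrose inverse of u, which can be written as follows when $u^T u$ is nonsingular [40, 41]:

$$u^{\dagger} = (u^T u)^{-1} u^T, \tag{8}$$

in which $u^T$ is the transpose of u.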
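The two-step ELM procedure (random feature mapping, then a linear solve for β) can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the implementation used in the study: the function names, the range of the random biases, the very small ridge term added to $u^T u$ for numerical stability, and the synthetic data are all hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def hidden_row(x, A, b):
    """h(x) = [G(a_1, b_1, x), ..., G(a_L, b_L, x)] with a sigmoid G."""
    return [sigmoid(sum(w * v for w, v in zip(a_i, x)) + b_i)
            for a_i, b_i in zip(A, b)]

def solve(M, v):
    """Solve M z = v by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [row[:] + [v_i] for row, v_i in zip(M, v)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for r in range(c + 1, n):
            factor = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= factor * aug[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][k] * z[k] for k in range(r + 1, n))
        z[r] = (aug[r][n] - tail) / aug[r][r]
    return z

def fit_elm(X, y, L=10, ridge=1e-8, seed=0):
    rng = random.Random(seed)
    n = len(X[0])
    # Step 1: random input weights and biases in [-1, 1] (bias range is an assumption).
    A = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(L)]
    b = [rng.uniform(-1, 1) for _ in range(L)]
    U = [hidden_row(x, A, b) for x in X]
    # Step 2: beta = (U^T U)^{-1} U^T y, with a tiny ridge on the diagonal (assumption).
    UtU = [[sum(row[i] * row[j] for row in U) + (ridge if i == j else 0.0)
            for j in range(L)] for i in range(L)]
    Uty = [sum(row[i] * t for row, t in zip(U, y)) for i in range(L)]
    beta = solve(UtU, Uty)
    return A, b, beta

def predict_elm(model, x):
    A, b, beta = model
    return sum(w * h for w, h in zip(beta, hidden_row(x, A, b)))

# Synthetic toy data (not from the study): two descriptors, smooth Tb-like target.
X = [[float(i), float(j)] for i in range(4) for j in range(4)]
y = [250.0 + 15.0 * x1 - 8.0 * x2 + 2.0 * x1 * x2 for x1, x2 in X]

model = fit_elm(X, y, L=10)
pred = [predict_elm(model, x) for x in X]
rmse = math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y))
mean_y = sum(y) / len(y)
spread = math.sqrt(sum((t - mean_y) ** 2 for t in y) / len(y))
print(rmse, spread)
```

Only β is learned; the hidden weights stay at their random values, which is what reduces training to a single linear least-squares problem.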

3. Results and Discussion

In this study, ELM and EDT Boosted algorithms were developed to estimate Tb for different refrigerants based on the molecular groups and one topological index. To this end, the sigmoid function is chosen as the activation function of the ELM algorithm, and the initial input weights are generated randomly in the range of −1 to 1. Moreover, cross-validation is used to determine the number of hidden layer neurons [42], which is equal to 10 for the normal boiling point calculation. The general performance of the ELM and EDT Boosted algorithms in the prediction of the normal boiling point is evaluated by the following error measurements: the coefficient of determination (R2), mean relative error (MRE), mean squared error (MSE), root mean square error (RMSE), and standard deviation (STD) [43, 44].
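As a minimal sketch, these error measurements could be computed as shown below. The definitions are commonly used ones and are assumptions here: MRE is taken as a percentage of the experimental value and STD as the sample standard deviation of the errors, which may differ from the exact formulas in the cited references. The function name and the four data points are hypothetical.

```python
import math

def error_metrics(actual, predicted):
    """R2, MRE (%), MSE, RMSE and STD under the assumed definitions."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    sse = sum(e * e for e in errors)
    mse = sse / n
    rmse = math.sqrt(mse)
    # Mean relative error, expressed in percent of the experimental value.
    mre = 100.0 / n * sum(abs(e) / abs(a) for e, a in zip(errors, actual))
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - sse / ss_tot
    # Sample standard deviation of the errors around their mean.
    mean_error = sum(errors) / n
    std = math.sqrt(sum((e - mean_error) ** 2 for e in errors) / (n - 1))
    return {"R2": r2, "MRE": mre, "MSE": mse, "RMSE": rmse, "STD": std}

# Hypothetical values in K, for illustration only.
actual = [250.0, 300.0, 350.0, 400.0]
predicted = [252.0, 297.0, 351.0, 404.0]
metrics = error_metrics(actual, predicted)
print(metrics)
```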

The results of the abovementioned error measurements are reported in Table 1. In the prediction of Tb, R2 = 0.995, MRE = 1.08, MSE = 25.96, RMSE = 5.16, and STD = 3.76 are obtained for the overall ELM dataset. Furthermore, the details of the training and testing phases are reported in this table. This error analysis shows the high ability of the ELM to estimate the normal boiling point of refrigerants. Another powerful way to assess the suitability of the ELM for estimating Tb is a visual comparison of the model outputs with the experimental Tb values, as shown in Figure 3.

Moreover, the cross plot of the predicted normal boiling point versus the experimental value is depicted in Figure 4, which shows that the Tb points lie close to the bisector line.

In other words, the predicted Tb values agree closely with the experimental ones. In addition, the relative deviations between the model outputs and the experimental normal boiling points are calculated (see Figure 5). The deviations lie close to the x axis, which confirms that the proposed models can be used to estimate the normal boiling point of refrigerants. These models are also easy to apply; therefore, building accurate software based on them is possible and reasonable.

In the current work, a sensitivity analysis is used to show the effects of the different molecular descriptors on the normal boiling point of refrigerants. In this method, the relevancy factor (r) is defined for each molecular group as follows [45]:

$$r = \frac{\sum_{i=1}^{N} (X_{k,i} - \bar{X}_k)(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{N} (X_{k,i} - \bar{X}_k)^2 \sum_{i=1}^{N} (Y_i - \bar{Y})^2}},$$

where $Y_i$, $\bar{Y}$, $X_{k,i}$, and $\bar{X}_k$ are the experimental Tb, the average of Tb, the kth input, and the average of that input, respectively. Figure 6 illustrates the value of r for each molecular descriptor. A negative value of r means that the normal boiling point decreases as the number of the associated molecular group in the refrigerant molecule increases.

Furthermore, the range of this parameter lies between −1 and 1. The absolute value of r represents the intensity of the effect of the molecular group on the normal boiling point.
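The relevancy factor can be sketched in a few lines of Python, assuming it takes the form of the Pearson correlation coefficient between one input and the experimental Tb, which is consistent with the stated range of −1 to 1. The function name and the toy group counts are hypothetical.

```python
import math

def relevancy_factor(x, y):
    """Correlation between one input (e.g. a group count) and the experimental Tb."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: Tb rises with the count of group A and falls with group B.
tb = [250.0, 260.0, 270.0, 280.0]
group_a = [0, 1, 2, 3]
group_b = [3, 2, 1, 0]

r_a = relevancy_factor(group_a, tb)
r_b = relevancy_factor(group_b, tb)
print(r_a, r_b)
```

In this toy case the two groups have equally strong but opposite effects, so their relevancy factors have the same absolute value and opposite signs.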

The correctness of a predictive model depends on the accuracy of the experimental data used to build it [46, 47]. The present study employs experimental normal boiling points of refrigerants, which may contain measurement errors. For these reasons, the identification of suspected experimental data becomes necessary [48]. The leverage method is one of the applicable approaches to this problem. This method uses a matrix called the Hat matrix, which is defined as follows [49, 50]:

$$H = X (X^T X)^{-1} X^T.$$

The Hat matrix is a function of X, an s × t matrix in which s and t denote the number of samples and model variables, respectively. Another major parameter of this method is the critical leverage value ($H^*$), which is given as follows [51]:

$$H^* = \frac{3(t + 1)}{s}.$$

As shown in Figure 7, the reliable area for the experimental data is the zone bounded by the red and green lines. The majority of the normal boiling points lie within this reliable zone, so the overall databank is dependable for developing the models.
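The leverage part of this check can be sketched as follows, assuming the usual definitions: the leverage of sample i is the ith diagonal entry of the Hat matrix, $h_{ii} = x_i^T (X^T X)^{-1} x_i$, and the critical leverage is 3(t + 1)/s. The function names and the 12 × 2 toy matrix are hypothetical, and X is assumed to contain only the model variables (no intercept column).

```python
def solve(M, v):
    """Solve M z = v by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [row[:] + [v_i] for row, v_i in zip(M, v)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for r in range(c + 1, n):
            factor = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= factor * aug[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][k] * z[k] for k in range(r + 1, n))
        z[r] = (aug[r][n] - tail) / aug[r][r]
    return z

def leverages(X):
    """Diagonal of the Hat matrix: h_ii = x_i^T (X^T X)^{-1} x_i."""
    t = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(t)] for i in range(t)]
    return [sum(a * b for a, b in zip(row, solve(XtX, row))) for row in X]

def critical_leverage(s, t):
    return 3.0 * (t + 1) / s

# Hypothetical descriptor matrix: s = 12 samples, t = 2 variables; the last row is extreme.
X = [[1.0, 2.0], [2.0, 1.0], [1.0, 1.0], [2.0, 2.0], [1.0, 3.0], [3.0, 1.0],
     [2.0, 3.0], [3.0, 2.0], [1.0, 2.0], [2.0, 1.0], [3.0, 3.0], [20.0, 1.0]]

h = leverages(X)
h_star = critical_leverage(len(X), len(X[0]))
suspected = [i for i, h_i in enumerate(h) if h_i > h_star]
print(h_star, suspected)
```

Samples whose leverage exceeds the critical value sit far from the bulk of the input data; in this toy matrix only the last row is flagged.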

4. Conclusions

In the present study, novel ELM and EDT Boosted algorithms are used to predict the normal boiling point of refrigerants in terms of 16 molecular groups and the topological index EATII. The model outputs and the experimental Tb values have been compared by different methods, including simultaneous visual comparison, cross plots, relative error plots, and statistical analysis. These comparisons confirm the high ability of the algorithms to estimate Tb. In addition, a sensitivity analysis is applied to distinguish the effect of each molecular descriptor on the Tb of refrigerants. Finally, it is recommended that other available machine learning models be trained on a wider databank before being deployed as software, and that these models be compared with each other to choose the most accurate one.

Nomenclature

Tb:Normal boiling point
ELM:Extreme learning machine
EDT:Ensemble decision tree
CAMD:Computer-aided molecular design
UNIFAC:UNIQUAC functional-group activity coefficients
ORC:Organic Rankine cycle
SLFN:Single hidden layer feedforward neural network
MSE:Mean squared error
MRE:Mean relative error
RMSE:Root mean square error
STD:Standard deviation
R2:R-squared
r:Relevancy factor.

Data Availability

The references of experimental data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the 2021 Scientific Research Fund Project of Liaoning Provincial Education Department “Study on the development of compound juice of small berry and its quality change during storage” (LJKZ1127).