Abstract
Due to industrial development, the design and optimal operation of processes in chemical and petroleum processing plants require accurate estimation of the hydrogen solubility in various hydrocarbons. Equations of state (EOSs) are limited in accurately predicting hydrogen solubility, especially at high-pressure and/or high-temperature conditions, which may lead to energy waste and a potential safety hazard in plants. In this paper, five robust machine learning models, including extreme gradient boosting (XGBoost), adaptive boosting support vector regression (AdaBoost-SVR), gradient boosting with categorical features support (CatBoost), light gradient boosting machine (LightGBM), and multi-layer perceptron (MLP) optimized by the Levenberg–Marquardt (LM) algorithm, were implemented for estimating the hydrogen solubility in hydrocarbons. To this end, a databank including 919 experimental data points of hydrogen solubility in 26 various hydrocarbons was gathered from 48 different systems over a broad range of operating temperatures (213–623 K) and pressures (0.1–25.5 MPa). The hydrocarbons come from six different families: alkane, alkene, cycloalkane, aromatic, polycyclic aromatic, and terpene. The carbon number of the hydrocarbons ranges from 4 to 46, corresponding to a molecular weight range of 58.12–647.2 g/mol. Molecular weight, critical pressure, and critical temperature of the solvents, along with the operating pressure and temperature, were selected as input parameters to the models. The XGBoost model best fits all the experimental solubility data with a root mean square error (RMSE) of 0.0007 and an average absolute percent relative error (AAPRE) of 1.81%. Also, the proposed models for estimating the solubility of hydrogen in hydrocarbons were compared with five EOSs: Soave–Redlich–Kwong (SRK), Peng–Robinson (PR), Redlich–Kwong (RK), Zudkevitch–Joffe (ZJ), and perturbed-chain statistical associating fluid theory (PC-SAFT).
The XGBoost model introduced in this study is a promising model that can be applied as an efficient estimator for hydrogen solubility in various hydrocarbons and is capable of being utilized in the chemical and petroleum industries.
Introduction
One of the fundamental properties for designing gas absorption and stripping columns in chemical industries is the solubility of gases in liquids1. While the basic principles of solubility thermodynamics are well known, it is still challenging to accurately predict solubility for important industrial systems by applying molecular thermodynamics alone. Nowadays, hydrogen is an eminent substance in industry. Hydrogen plays a substantial role in industrial processes, hence its solubility in various hydrocarbon solutions such as fuels is very important for the design and optimal operation of these processes2. Hydrogen is a useful compound in the chemical and petroleum industries. The quality of heavy petroleum fractions can be upgraded through hydrovisbreaking or hydrocracking processes by adding hydrogen to them, thereby increasing the hydrogen-to-carbon (H/C) ratio. In the oil refining industry, large amounts of hydrogen are consumed in desulfurization plants to produce low-sulfur fuels3,4,5. Designing and operating processes such as hydrogenation and hydrocracking, along with the corresponding kinetic models, requires hydrogen solubility data6. Pressure, temperature, and composition of solvents can remarkably affect hydrogen solubility as a thermodynamic quantity. Increasing pressure and temperature increases the solubility of gases. Also, in terms of mole fraction, hydrogen solubility increases with the hydrocarbon carbon number, as demonstrated by experimental tests2,7,8,9. It is well known that traditional equations of state (EOSs) are limited in accurately predicting the solubility of hydrogen for the modeling of hydrogenation processes. There is a potential for energy waste and even a safety hazard in the hydrogenation process due to the overuse of hydrogen.
Therefore, solubility data are very significant for predicting the optimal amount of hydrogen in this process and can lead to improved plant safety. Performing experiments on heavy hydrocarbons is particularly difficult due to their complexity. Also, the risks associated with high-pressure and/or high-temperature conditions in industrial processes do not make extensive testing an attractive choice. Hence, modeling based on experimental data can be a good alternative.
The methods for predicting hydrogen solubility in solvents such as hydrocarbons or petroleum mixtures are mostly based on empirical and semi-empirical models such as EOSs and are similar to those applied for the solubility of other gases such as methane and CO210,11,12,13,14,15. Shaw16 proposed a correlation for estimating the solubility of hydrogen in hydrocarbon solvents, including heterocyclic, aromatic, and alicyclic types, by applying corresponding-states theory16. Yuan et al.17 used molecular dynamics simulations to estimate the hydrogen solubility in heavy hydrocarbons for a range of pressures and temperatures. They concluded that a combination of EOSs and molecular dynamics simulations can lead to more accurate and practical predictions of hydrogen solubility at high pressures and temperatures17. Riazi and Roomi5 proposed a method for predicting the hydrogen solubility in hydrocarbons and their mixtures based on regular solution theory. Their procedure was based on calculating the hydrogen solubility parameter according to the type of solvent or its molecular weight. The advantage of their method was that, unlike EOSs or other models, critical properties of the solvent were not needed to calculate the hydrogen solubility. However, the need for other calculations in this method can still be considered a drawback5. Torres et al.18 applied the augmented Grayson–Streed method19 to better model the solubility of hydrogen in heavy oil cuts. However, they noted that homogeneous EOS models could provide better results. The solubility of hydrogen in n-alcohols has been measured and modeled by d’Angelo and Francesconi20. Also, in their work, individual correlations as pseudo-Henry’s constants were used to better estimate hydrogen solubility20. Luo et al.21 experimentally investigated the hydrogen solubility in coal liquid and several hydrocarbons. They also proposed a mathematical model based on Henry’s law and the Pierotti method21.
Yin and Tan22 obtained hydrogen solubility data in toluene in the presence of CO2 (i.e., the ternary system H2 + toluene + CO2). The Peng–Robinson EOS combined with the van der Waals mixing rule was used to model the vapor–liquid equilibrium (VLE) data22. Qian et al.23 used the Peng–Robinson EOS to model a large dataset of various hydrogen-containing binary systems, implementing the group-contribution method for calculating temperature-dependent binary interaction parameters23. This method had previously been proposed by Jaubert and Mutelet to predict the VLE of binary hydrocarbon mixtures24. The solubility of hydrogen in several heavy normal alkanes has been measured and modeled by Florusse et al.2. They used the statistical associating fluid theory (SAFT) approach to model the hydrogen solubility after the experiments. However, this method is complex due to the adjustable parameters required for each potential function2. The perturbed-chain SAFT (PC-SAFT) EOS25 is another method that can be used to estimate the solubility of hydrogen in hydrocarbons. This method has been utilized to propose several models for predicting the solubility of hydrogen in hydrocarbons and heavy oils6,26,27,28. The classical EOSs, activity models, etc. require adjustable parameters, proper mixing rules, iterative calculations, etc. Traditional EOSs are only reliable in specific temperature and pressure ranges and have limited flexibility with respect to the substances used.
Complex calculations in the chemical and petroleum sciences have been facilitated by artificial intelligence (AI) methods in recent years. Regarding the use of artificial intelligence for hydrogen solubility, Safamirzaei et al.29 considered the hydrogen solubility in primary n-alcohols and then applied artificial neural networks (ANNs) to overcome the constraints of EOSs and simple correlations in achieving the best modeling29. Nasery et al.30 implemented an Adaptive Neuro-Fuzzy Inference System (ANFIS) to estimate the solubility of hydrogen in heavy oil fractions30. Safamirzaei and Modarress31 modeled hydrogen solubility in heavy n-alkanes (C46H94, C36H74, C28H58, C16H34, and C10H22) by ANNs31. As can be seen from the literature, modeling hydrogen solubility in different solvents, especially hydrocarbons, has always been a focus of researchers. Also, according to the classification scheme of van Konynenburg and Scott32 and the updated version by Privat and Jaubert33, hydrogen-containing systems systematically show type III phase behavior, and such systems are acknowledged to be particularly difficult to correlate. Hence, there is a window for developing a more general model for estimating hydrogen solubility in hydrocarbons using AI methods, one that accounts for more influential variables with higher precision. Given the nature of data-driven soft computing techniques, such a comprehensive model can be developed by combining more data points and various operating conditions.
In the current work, we apply a total of 919 experimental hydrogen solubility data points for 26 different hydrocarbons accumulated at different operating conditions1,2,8,11,14,21,34,35,36,37,38,39,40,41,42,43,44. Advanced machine learning methods namely extreme gradient boosting (XGBoost), adaptive boosting support vector regression (AdaBoost-SVR), gradient boosting with categorical features support (CatBoost), light gradient boosting machine (LightGBM), and multi-layer perceptron (MLP) optimized by Levenberg–Marquardt (LM) algorithm are utilized to develop models for estimating the hydrogen solubility in hydrocarbons. Moreover, the validity of the proposed models is checked by applying statistical parameters and graphical error analyses. Also, several hydrogen solubility systems are estimated by the models developed in this work and five EOSs including Soave–Redlich–Kwong (SRK), Peng–Robinson (PR), Redlich–Kwong (RK), Zudkevitch–Joffe (ZJ), and perturbed-chain statistical associating fluid theory (PC-SAFT) to make a comparison between these models and EOSs.
Data gathering
To accurately model hydrogen solubility in hydrocarbons, 919 experimental hydrogen solubility data were gathered from the literature1,2,8,11,14,21,34,35,36,37,38,39,40,41,42,43,44. Table 1 represents the sources of the experimental hydrogen solubility data used in this work along with the pressure range, temperature range, and uncertainty values for each system. Since the type of hydrocarbon dictates hydrogen solubility, a broad range of hydrocarbons was selected with properties represented in Table S1. Hydrocarbon families used in this study include alkane, alkene, cycloalkane, aromatic, polycyclic aromatic, and terpene.
To model hydrogen solubility in hydrocarbons, thermodynamic properties were considered for model development. In this work, molecular weight, critical pressure, and critical temperature of solvents, along with pressure and temperature, were selected as input parameters to the models. The hydrogen solubility (in terms of mole fraction) at different pressures and temperatures is set as the model output. Moreover, a short statistical description of the input and target parameters of the data bank applied for modeling is listed in Table 2. Using the uncertainty values of the experimental data in data-driven modeling can make the model more reliable. However, because uncertainty values (for test conditions and results of solubility tests) were not reported, or not fully reported, in some papers, it was not possible to use them in the modeling.
It is very important to apply different systems to achieve a comprehensive model for predicting hydrogen solubility in hydrocarbons. The characterization data for the 26 various hydrocarbons from 6 hydrocarbon families utilized for modeling are presented in Table S1. A databank including 919 data points was gathered from 48 different systems in the literature1,2,8,11,14,21,34,35,36,37,38,39,40,41,42,43,44, the statistical information of which is reported in Table 2. The carbon number of the hydrocarbons ranges from 4 to 46, corresponding to a molecular weight range of 58.12–647.2 g/mol. Also, the experimental hydrogen solubility data were collected over a broad range of operating temperatures (213–623 K) and pressures (0.1–25.5 MPa). According to the statistics reported in Table 2, the variation range and distribution of the model input parameters are wide enough to provide a general model for estimating hydrogen solubility in hydrocarbons.
Models implementation
Extreme gradient boosting (XGBoost)
The main idea behind a tree-based ensemble technique is to utilize an ensemble of classification and regression trees (CARTs) such that the training data are fitted by minimizing a regularized objective function. XGBoost is one of these tree-based models under the framework of the gradient boosting decision tree (GBDT). To elaborate on the CART structure, every CART consists of (I) a root node, (II) internal nodes, and (III) leaf nodes, as shown in Fig. 1. According to the binary decision practice, the root node, which embodies the whole data set, is split into internal nodes, while the leaf nodes represent the ultimate classes. To build a robust ensemble in gradient boosting, a series of base CARTs are consecutively constructed, where the weight of every individual CART needs to be tuned through the training process45.
To model the output y for a given dataset, where m and n are the number of features and examples, respectively, an ensemble of K trees needs to be trained:
where the example x is mapped by the decision rule q(x) to the binary leaf index. In Eqs. (1) and (2), f represents the space of regression trees, fk is the kth independent tree, T denotes the number of leaves on the tree, and ω is the weight of the leaf.
The determination of the ensemble of trees is performed by the minimization of regularized objective function L:
where Ω is the regularization term limiting the model intricacy, assisting the reduction of the overfitting; l denotes a differentiable convex loss function; γ stands for the minimum loss reduction which is needed to split a new leaf, and λ shows the regulation coefficient. It should be noted that γ and λ in these sets of equations help to soar the model variance and decrease the overfitting46.
In the gradient boosting approach, the objective function for every individual leaf is minimized, through which more branches are added iteratively.
where t represents the t-th iteration in the aforementioned training process. To notably ameliorate the ensemble model, the XGBoost’s approach greedily adds the space of regression trees which is usually referred to as “greedy algorithm”. Therefore, the model output is iteratively updated through the minimization of the objective function:
XGBoost benefits from the shrinkage strategy, in which newly added weights are scaled after every boosting step by a learning rate factor. This diminishes the influence of each individual tree and leaves room for future trees to improve the model, thereby reducing the risk of overfitting47.
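The leaf-weight and split-gain formulas at the heart of this regularized objective can be sketched in a few lines. This is a minimal illustration for a squared-error loss (so g_i = pred_i − y_i and h_i = 1); the function names are ours, not from the xgboost library.

```python
import numpy as np

def leaf_weight(g, h, lam=1.0):
    # Optimal leaf weight: w* = -G / (H + lambda)
    return -g.sum() / (h.sum() + lam)

def split_gain(g, h, left_mask, lam=1.0, gamma=0.5):
    # Gain = 1/2 [GL^2/(HL+lam) + GR^2/(HR+lam) - (GL+GR)^2/(HL+HR+lam)] - gamma
    score = lambda gs, hs: gs.sum() ** 2 / (hs.sum() + lam)
    left, right = left_mask, ~left_mask
    return 0.5 * (score(g[left], h[left]) + score(g[right], h[right])
                  - score(g, h)) - gamma

y = np.array([1.0, 2.0, 3.0, 4.0])
pred = np.full_like(y, y.mean())      # current ensemble prediction
g, h = pred - y, np.ones_like(y)      # gradients and hessians of the loss
mask = np.array([True, True, False, False])  # a candidate split
```

Note how γ directly penalizes each new leaf (a split with a gain below γ is rejected) while λ shrinks the leaf weights, which is exactly how these two terms curb overfitting in the objective above.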
Light gradient boosting machine (LightGBM)
Another new gradient learning framework built upon the decision tree idea is LightGBM48. The salient features of LightGBM that give it an edge over XGBoost are lower memory consumption, a leaf-wise growth approach with depth restrictions, and a histogram-based algorithm that expedites the training process49. Using the aforementioned histogram algorithm, LightGBM discretizes continuous floating-point eigenvalues into k bins, leading to a k-width histogram. In addition, extra storage of pre-sorted results is not required in the histogram algorithm, and values can be stored as 8-bit integers after feature discretization, which reduces memory consumption to 1/8. Nevertheless, this rough partitioning approach does decrease the model accuracy. LightGBM also uses a leaf-wise approach, which is more effective than the traditional growth strategy known as level-wise. The rationale behind the inefficiency of the level-wise strategy is that all leaves in the same layer are considered at each step, leading to gratuitous memory allocation. Instead, in the leaf-wise approach, the leaf with the highest branching gain is found at every step, after which the algorithm continues the branching cycle. Thus, errors can be diminished and higher precision achieved with the same number of segmentations compared to the level-wise direction. In Fig. 2, the strategy of leaf-wise tree growth is depicted. The downside of leaf orientation is growing deeper decision trees, which unavoidably results in overfitting. However, LightGBM precludes this overfitting while furnishing high efficiency by applying a maximum depth limit on leaf-wise growth48,49.
In the following, the calculations for LightGBM are shown50:
For a given training dataset \(X = \left\{ {(x_{i} ,y_{i} )} \right\}_{{_{i = 1} }}^{m}\), LightGBM searches an approximation \(\widehat{f}(x)\) to the function f*(x) to minimize the expected values of specific loss functions L(y, f (x)):
LightGBM ensembles T regression trees \(\sum_{t=1}^{T}{f}_{t }(x)\) to approximate the model. The regression trees are defined as wq(x), \(q \in \left\{ {1, \, 2,...,N} \right\}\), where w is a vector representing the sample weights of leaf nodes, N stands for the number of tree leaves, and q represents the decision rule of the trees. The model is trained in the additive form at step t:
Newton's approach is used to approximate the objective function.
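The histogram discretization described above can be sketched as follows. This is a toy illustration of the binning idea only (LightGBM's internal binning is more elaborate, e.g. it uses feature-value distributions to place bin boundaries); note that for k ≤ 256 the bin indices indeed fit in an 8-bit integer, as the text states.

```python
import numpy as np

def histogram_binning(x, k=4):
    """Discretize a continuous feature into k equal-width bins and build
    the k-width histogram used to evaluate split candidates."""
    edges = np.linspace(x.min(), x.max(), k + 1)
    # digitize against the interior edges yields bin indices 0..k-1,
    # stored as 8-bit integers (valid for k <= 256)
    bins = np.digitize(x, edges[1:-1]).astype(np.uint8)
    hist = np.bincount(bins, minlength=k)
    return bins, hist

x = np.array([0.0, 0.1, 0.25, 0.5, 0.75, 1.0])
bins, hist = histogram_binning(x, k=4)
```

Split candidates are then evaluated per bin boundary rather than per distinct feature value, which is where the training speed-up comes from.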
Gradient boosting with categorical features support (CatBoost)
CatBoost (categorical boosting) natively handles categorical columns, using permutation techniques together with one_hot_max_size (OHMS) and target-based statistics. In this technique, a greedy method is used for each new split of the current tree, which enables CatBoost to exploit the exponential growth of feature combinations51. The following steps are applied in CatBoost for every feature possessing more categories than OHMS:
1. Random subset formation of the records.
2. Label conversion to integers.
3. Categorical feature transformation to numeric, as follows:
$$avgTarget = \frac{countInClass + prior}{totalCount + 1}$$(7)
where countInClass counts targets with the value of one for a given categorical feature, and totalCount counts previous objects (the starting parameters determine the prior to count the objects)52,53.
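Equation (7) can be sketched as an ordered target statistic over one permutation of the records. This is our own minimal reading of the formula (CatBoost internally averages over several random permutations); the function name is hypothetical.

```python
def ordered_target_encoding(categories, targets, prior=0.5):
    """Replace each categorical value by (countInClass + prior) / (totalCount + 1),
    where both counts are taken over the PRECEDING rows only, so a row never
    sees its own target (this is what prevents target leakage)."""
    counts = {}   # category -> (countInClass, totalCount) over preceding rows
    encoded = []
    for c, t in zip(categories, targets):
        in_class, total = counts.get(c, (0, 0))
        encoded.append((in_class + prior) / (total + 1))
        counts[c] = (in_class + (1 if t == 1 else 0), total + 1)
    return encoded
```

For example, the third occurrence of a category whose two preceding occurrences both had target 1 is encoded as (2 + 0.5) / (2 + 1) ≈ 0.83.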
Adaptive boosting (AdaBoost)
Freund and Schapire54 proposed the AdaBoost algorithm for supervised classification. In this model, reweighted data, whose weights are chosen according to the reliability (i.e., the consistency of the output) of the learners, are sequentially fed to the weak learners. This trick forces each new weak learner to concentrate on the hard cases55. The following are the key steps of the AdaBoost technique:
- Define the weights: \({w}_{j}=\frac{1}{n}, j=1,2,\dots ,n\);
- For each i, fit a weak learner \({Wl}_{i}(x)\) to the training data using the weights and obtain the weighted error
$${Err}_{i}=\frac{{\sum }_{j=1}^{n}{w}_{j}I({t}_{j}\ne {Wl}_{i}\left({x}_{j}\right))}{{\sum }_{j=1}^{n}{w}_{j}},\quad I\left(x\right)=\begin{cases}0 & \text{if } x \text{ is false}\\ 1 & \text{if } x \text{ is true}\end{cases}$$
- For each i, determine the predictor weights as \({\beta }_{i}=log\left(\frac{(1-{Err}_{i})}{{Err}_{i}}\right)\);
- Modify the data weights for each i up to N (N denotes the number of learners);
- Apply the adjusted weak learners to the test data (x) to obtain the output.
In this study, support vector regressors (SVR) were applied as the weak learners in the AdaBoost system.
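The weighted-error and predictor-weight arithmetic from the steps above can be sketched as follows. This is a minimal illustration of one boosting round only (the weak learner itself, here an SVR in the actual study, is abstracted away into its misprediction indicator); the function name is ours.

```python
import numpy as np

def adaboost_step(weights, mispredicted):
    """One AdaBoost weighting round: weighted error Err_i of the current
    weak learner, its predictor weight beta_i = log((1 - Err) / Err),
    and the renormalized sample weights for the next round."""
    err = np.sum(weights * mispredicted) / np.sum(weights)
    beta = np.log((1.0 - err) / err)
    # boost the weights of the hard (mispredicted) cases
    new_weights = weights * np.exp(beta * mispredicted)
    return err, beta, new_weights / new_weights.sum()

n = 5
w = np.full(n, 1.0 / n)                       # initial uniform weights w_j = 1/n
miss = np.array([1.0, 0.0, 0.0, 0.0, 0.0])    # indicator I(t_j != Wl_i(x_j))
err, beta, w_new = adaboost_step(w, miss)
```

With one of five samples mispredicted, Err = 0.2 and β = log(4) > 0, so the mispredicted sample's weight grows and the next weak learner concentrates on it.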
Support vector regression (SVR)
A support vector machine (SVM) is one of a group of similar supervised machine learning algorithms that can be applied to both regression and classification tasks56. SVR is a systematic technique for soft computation with a well-established mathematical formulation. As it has been shown to be very stable for modeling multiple complex structures, this approach has gained significant interest. The fundamental concept behind SVR is commonly presented in the literature57; therefore, we give only a short description of SVR for the sake of brevity. SVR attempts to obtain a regression function f(x) for a given dataset \([\left({x}_{1}, {y}_{1}\right),\dots ,\left({x}_{n}, {y}_{n}\right)]\) with \(x\in {R}^{d}\) as the d-dimensional input space and \(y\in R\) as the output vector dependent on the input data to estimate the output as below:
where \(b\) denotes bias vectors, \(w\) shows the weight, and \(\phi \left(x\right)\) refers to the function of the kernel. The following minimization problem proposed by Vapnik should be solved in order to achieve the right values of the weight and bias vectors58:
where \(T\) represents the transpose operator, \(\varepsilon\) shows the error tolerance, C represents a positive regularization parameter that penalizes deviations larger than \(\varepsilon\), and \({\zeta }_{j}^{+}\) and \({\zeta }_{j}^{-}\) are positive slack variables accounting for the lower and upper excess deviations, respectively.
By means of the Lagrange multipliers, the previously discussed constrained optimization problem is taken into a dual function. This move then leads to the final solution, which is presented as follows:
where \(K({x}_{k},{x}_{l})\) represents the kernel function; \({a}_{k}\) and \({a}_{k}^{*}\) represent the Lagrange multipliers, which satisfy the constraints \(0\le {a}_{k},{a}_{k}^{*}\le C\).
Multilayer perceptron (MLP) neural network
MLP is a class of feedforward ANNs that consists of several layers. The first layer, which receives the input data, is the input layer; the last layer, which corresponds to the output of the model, is the output layer; and the middle layers, which process the information, are the hidden layers59. In the hidden layers, each neuron connects to every neuron in the next and prior layers. The value of every neuron in the output or hidden layers is calculated as follows: the value of every neuron in the prior layer is multiplied by its corresponding weight, these products are summed, and a bias factor is added. Then, the resulting value passes through an activation function60. Table S2 summarizes different activation functions along with their corresponding mathematical equations. The number of hidden layers and the number of neurons in each hidden layer should be optimized to acquire a highly efficient and accurate model, usually by an empirical method. The performance of the MLP model depends on the optimization algorithm, such as Levenberg–Marquardt (LM)61, applied to train this intelligent model. In this work, the MLP model developed on the basis of the LM optimization algorithm is dubbed MLP-LM. Figure 3 represents a schematic of the MLP developed in this work.
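The neuron computation just described (weighted sum, bias, activation) can be sketched for a single hidden layer. This is a minimal numpy forward pass with hand-picked weights, not the LM-trained network of this study; the function name and the toy weights are ours.

```python
import numpy as np

def mlp_forward(x, W1, b1, W2, b2, activation=np.tanh):
    # Hidden layer: each neuron sums weighted inputs plus a bias,
    # then passes the result through the activation function.
    h = activation(W1 @ x + b1)
    # Linear output layer (single output neuron)
    return W2 @ h + b2

relu = lambda z: np.maximum(z, 0.0)
out = mlp_forward(np.array([1.0, -2.0]),   # two inputs
                  np.eye(2), np.zeros(2),  # hidden weights and biases
                  np.array([1.0, 1.0]), 0.0,
                  activation=relu)
```

Training (whether by LM or gradient descent) then amounts to adjusting W1, b1, W2, b2 so that this forward pass reproduces the targets.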
The procedure of model development
To develop each model and guard against overfitting, we used grid search to optimize the hyperparameters of the models. The hyperparameters used in the grid search differed for each model; the importance of the hyperparameters was judged on both theoretical and practical grounds. The following hyperparameters were used for each model:
- For XGBoost: max_depth, n_estimators, learning_rate, min_child_weight, base_score.
- For LightGBM: boosting type, objective, metric, learning rate, feature fraction, bagging fraction.
- For AdaBoost-SVR: learning rate, loss, epsilon, n_estimators, γ, C.
- For CatBoost: n_estimators, max_depth, learning rate.
- For MLP-LM: learning rate, epochs.
The empirical method is also applied to determine the optimal number of hidden layers and neurons in any hidden layer for the MLP neural network.
In this work, we used k-fold cross-validation on our training dataset because it ensures that every observation from the dataset has the chance of appearing in both training and validation. For all models, we used k = 6 (k should be neither too small nor too large, and it depends on the data size), a value chosen based on our data. This means we split the training data randomly into 6 folds, then fit the model using k − 1 folds (5 folds) and validate the model using the remaining fold.
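The 6-fold index bookkeeping can be sketched as follows. This is a plain-numpy illustration of the split logic only (in practice a library routine such as scikit-learn's KFold does the same job); the function name and the seed are ours.

```python
import numpy as np

def kfold_indices(n_samples, k=6, seed=0):
    """Randomly partition sample indices into k folds; each fold serves
    once as the validation set while the other k-1 folds form the
    training set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    folds = np.array_split(idx, k)
    splits = []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        splits.append((train, val))
    return splits

splits = kfold_indices(60, k=6)  # e.g. 60 samples -> 6 folds of 10
```

Each of the 60 samples appears in exactly one validation fold, which is the property that motivates using cross-validation here.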
Equations of state (EOSs)
The analytical description of the relationship between volume, temperature, and pressure of a substance can be expressed by an EOS. The vapor–liquid equilibria (VLE), volumetric behavior, and thermal properties of mixtures and pure substances can be described by this expression. The phase behavior of petroleum fluids is widely predicted by EOSs. As already mentioned, traditional EOSs offer poor predictions of the solubility of gases in solvents, especially at complex operating conditions. In this study, four cubic EOSs, including SRK, PR, RK, and ZJ, along with PC-SAFT as a type of SAFT EOS, are implemented to calculate the hydrogen solubility in hydrocarbons, and their precision in estimating the hydrogen solubility is compared with that of the proposed machine learning models. Conventional van der Waals one-fluid mixing rules are utilized in the cubic EOSs. Table S3 shows the PVT relationships of the cubic EOSs and the PC-SAFT equation in terms of the residual Helmholtz free energy. Furthermore, the parameters and mixing rules for the EOSs are presented in Table S4. Also, the pure-component PC-SAFT parameters for the substances used in this work are reported in Table S5. The binary interaction parameter (kij) in the van der Waals mixing rules, characterizing molecular interactions between molecules of two components, can be a key parameter in estimating the solubility of a solute in a solvent with cubic EOSs. A similar kij parameter is introduced by applying the van der Waals one-fluid mixing rules to the perturbation terms in the PC-SAFT EOS, correcting the segment–segment interactions of unlike chains. The optimized values of the kij parameter for all EOSs in the different hydrogen solubility systems are reported in Table S6.
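As a small illustration of the cubic-EOS machinery involved, the Peng–Robinson compressibility factor for a pure component can be obtained by solving the cubic in Z. This is a sketch only: the hydrogen critical constants below are standard literature values supplied by us (not taken from this paper's tables), and in practice quantum gases like hydrogen are often treated with modified alpha functions.

```python
import numpy as np

R = 8.314  # J/(mol*K)

def pr_compressibility(T, P, Tc, Pc, omega):
    """Largest real root of the Peng-Robinson cubic in Z (vapor phase)."""
    kappa = 0.37464 + 1.54226 * omega - 0.26992 * omega**2
    alpha = (1 + kappa * (1 - np.sqrt(T / Tc)))**2
    a = 0.45724 * R**2 * Tc**2 / Pc * alpha
    b = 0.07780 * R * Tc / Pc
    A = a * P / (R * T)**2
    B = b * P / (R * T)
    # Z^3 - (1-B)Z^2 + (A - 3B^2 - 2B)Z - (AB - B^2 - B^3) = 0
    coeffs = [1.0, -(1.0 - B), A - 3.0 * B**2 - 2.0 * B,
              -(A * B - B**2 - B**3)]
    roots = np.roots(coeffs)
    real = roots[np.abs(roots.imag) < 1e-9].real
    return real.max()

# hydrogen near ambient conditions (assumed literature constants)
Z = pr_compressibility(T=298.15, P=1.0e5, Tc=33.19, Pc=1.313e6, omega=-0.216)
```

For VLE, the same roots feed fugacity-coefficient expressions for both phases, with kij entering through the mixture a-parameter.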
Model assessment
Statistical error analysis
The following definitions have been implemented for the statistical parameters of standard deviation (SD), average absolute percent relative error (AAPRE), root mean square error (RMSE), coefficient of determination (R2), and average percent relative error (APRE) to assess the validation and accuracy of the models:
In these formulas, HSi,e, HSi,p, and N represent the experimental hydrogen solubility data, the values of hydrogen solubility in hydrocarbons predicted by the developed models, and the number of data points, respectively. The coefficient of determination, commonly denoted R2, is one of the most well-known criteria for the goodness of fit of a model. R2 is an important statistical parameter that shows how well the model output corresponds to the experimental data and how valid the model is. The closer the R2 value is to 1, the better the model response fits the experimental values. The scattering of the data around zero deviation is assessed by RMSE. APRE and AAPRE measure the relative deviation and the relative absolute deviation from the target data, respectively. Scattering is also assessed by SD, where a lower value demonstrates a lower degree of dispersion.
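Since these metrics are standard, they can be sketched directly in code (our own implementation of the usual definitions; the SD definition in particular varies between papers, and the relative-error-based form used here is an assumption):

```python
import numpy as np

def apre(y_exp, y_pred):
    # average percent relative error (signed)
    return 100.0 / len(y_exp) * np.sum((y_exp - y_pred) / y_exp)

def aapre(y_exp, y_pred):
    # average absolute percent relative error
    return 100.0 / len(y_exp) * np.sum(np.abs((y_exp - y_pred) / y_exp))

def rmse(y_exp, y_pred):
    return np.sqrt(np.mean((y_exp - y_pred)**2))

def r2(y_exp, y_pred):
    ss_res = np.sum((y_exp - y_pred)**2)
    ss_tot = np.sum((y_exp - y_exp.mean())**2)
    return 1.0 - ss_res / ss_tot

def sd(y_exp, y_pred):
    # one common form: sample std. dev. of the relative errors (assumed here)
    rel = (y_exp - y_pred) / y_exp
    return np.sqrt(np.sum(rel**2) / (len(y_exp) - 1))
```

For instance, a prediction that is 10% high on one point, 5% low on another, and exact on a third gives AAPRE = 5%.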
Graphical error analysis
Besides the statistical error analysis that has already been mentioned, visual graphical analysis can also help to understand the validity of the models developed in this work. The significant items are classified as follows:
Crossplot: in this graph, the estimated values of a model are plotted versus the experimental values. If the best-fit line of the model estimates shows no deviation from the 45° line and the computed data are mostly concentrated near the unit-slope line (Y = X), the performance of the model is excellent.
Error distribution plot: the presence or absence of error trend is checked by measuring the error scattering around the zero-error line. Here, the relative error (Ei) is calculated through Eq. (16):
Cumulative frequency graph: the cumulative frequency of the data is sketched versus the absolute relative error (Ea). A higher cumulative frequency curve reveals that most of the estimations fall within the usual error range. In other words, the closer the curve is to the vertical axis, the smaller the model error in estimating a high percentage of the data. In this work, Ea is calculated through Eq. (17):
Group error diagram: the data are divided into diverse ranges and their error at each range is calculated and sketched.
Trend plot: in this diagram, both target data and estimated values by the proposed model are sketched against the index of data points and their coverage and trend are tracked.
Results and discussion
Description of model development
The optimal values of the important hyperparameters along with the search interval of the hyperparameters tuned for the machine learning models implemented in this work are presented in Table 3.
In Table 3, n_estimators shows the number of trees; subsample is the subsample ratio of the training instances; C denotes the degree of importance given to misclassifications; max_depth represents the maximum depth of a tree; min_child_weight is the minimum sum of instance weight (hessian) needed in a child; bagging_fraction shows the fraction of data to be used for each iteration; feature_fraction is the fraction of parameters randomly selected in each iteration for building trees; learning_rate controls the impact of each tree on the final outcome; base_score represents the initial prediction score of all instances; epsilon is a parameter that affects the number of support vectors applied to construct the regression function; γ shows the kernel coefficient; and epochs shows the number of times the learning algorithm passes through the full training dataset.
Statistical assessment of the developed models
To identify the most accurate model, we compare the developed models using statistical factors including R2, AAPRE (%), SD, APRE (%), and RMSE. The calculated values for these parameters are reported in Table 4. The results reveal that, among all the developed models, XGBoost provides the most accurate predictions, followed by the AdaBoost-SVR, LightGBM, CatBoost, and MLP-LM models, respectively. Based on Table 4, AAPRE values of 2.14% for the testing set, 1.71% for the training set, and 1.81% for the total set of data suggest that the XGBoost model gives the most accurate estimation of hydrogen solubility in hydrocarbons. However, Table 4 reveals that the other models also display good accuracy.
For a comparative evaluation of the models developed in this work against the five EOSs, 30 hydrogen solubility data points from three different systems, covering hydrocarbons with low, medium, and high molecular weight, collected from the literature8,11,39 were estimated by these models. The predictions of the models along with the results calculated by the EOSs are presented in Table 5. The AAPRE reported in Table 5 is much higher for the EOSs than for the machine learning models. The ZJ EOS, with an AAPRE of 15.78%, gives the best calculations of hydrogen solubility in hydrocarbons among the cubic EOSs. Also, PC-SAFT, as a modern type of EOS, shows good estimates with an AAPRE of 9.56% and has superior performance compared to the traditional cubic EOSs. All the machine learning models give good predictions and show a significant advantage over the EOSs. The XGBoost model has the best performance among all models and EOSs, with an AAPRE of 1.92%. It is noteworthy that the uncertainty values differ between systems. According to our studies, the AAPRE values reported in Table 5 can vary by about 5–10% due to the uncertainty values, but it is better to trust the experimental values reported in the literature.
To further evaluate the validity and reliability of the XGBoost model, an external validation dataset containing 413 hydrogen solubility data points in 18 different hydrocarbons, including 6 hydrocarbons absent from the modeling data (i.e. ethane, propane, ethene, 1-hexene, 1-heptene, and diphenylmethane), over a wide range of operating temperatures (98–701 K) and pressures (1.03–78.45 MPa), was collected from the literature. The properties of all hydrocarbons used in this work are presented in Table S1. Table 6 describes this validation dataset. This dataset is completely disjoint from the training and testing sets used for modeling in this paper, so it allows the performance of the model to be evaluated outside the modeling data. AAPRE values for each system were calculated from the experimental data and the predictions of the XGBoost model. The AAPRE values reported in Table 6 show that the XGBoost model gives good predictions for all systems, even for hydrocarbons not used in modeling. The overall AAPRE of 1.78% on this validation dataset demonstrates the high validity of the XGBoost model for predicting hydrogen solubility in hydrocarbons.
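The per-system AAPRE values of Table 6 amount to grouping the validation errors by the hydrogen/hydrocarbon system each point belongs to. A minimal sketch (the helper name and label scheme are hypothetical, not from the paper):

```python
import numpy as np

def aapre_by_system(system_ids, y_exp, y_pred):
    """Per-system AAPRE (%). `system_ids` labels each data point with the
    hydrogen/hydrocarbon system it belongs to (hypothetical labeling)."""
    system_ids = np.asarray(system_ids)
    y_exp, y_pred = np.asarray(y_exp, float), np.asarray(y_pred, float)
    err = 100.0 * np.abs((y_exp - y_pred) / y_exp)  # pointwise |relative error| in %
    return {s: float(err[system_ids == s].mean()) for s in np.unique(system_ids)}
```

The overall AAPRE is simply the mean of `err` over all points, so a system-level table and the headline 1.78% figure come from the same error vector.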
Visual error analysis
For a more detailed assessment of the accuracy of the proposed models, a visual analysis using crossplots of predicted hydrogen solubility against the corresponding experimental values is presented in Fig. 4. In addition, Fig. 5 presents the error distribution diagrams for the training and testing sets of all models. Figure 4 demonstrates the high concentration of data points around the 45° line for all models; however, the XGBoost model performs much better than the others, indicating its high reliability for predicting hydrogen solubility in hydrocarbons. The relative errors between the experimental hydrogen solubility and the values estimated by the proposed models, plotted against the experimental data for the training and testing sets, are illustrated in Fig. 5. This figure demonstrates that the relative errors of the XGBoost and AdaBoost-SVR models lie very close to the zero-error line, whereas the errors of the CatBoost, LightGBM, and MLP-LM predictions are not as low. The maximum percent relative error between the estimated hydrogen solubility values and the experimental data for the XGBoost model is 19%. Figures 4 and 5 reflect the strong agreement between the experimental hydrogen solubility data and the XGBoost model predictions.
Figure S1 presents the trend plot of the predicted hydrogen solubility values for all proposed models, together with the experimental data, versus the data point index. As shown in Fig. S1 in the Supplementary file, all models exhibit good overlap between the estimated and experimental hydrogen solubility values, with the overlap being excellent for the XGBoost model.
Figure S2 depicts the cumulative frequency of the data versus Ea for all developed models. Based on this figure, more than 70% of the hydrogen solubility values estimated by the XGBoost model have an absolute relative error below 1.3%, and more than 90% have an absolute relative error below 3.6%. By comparison, for the AdaBoost-SVR, LightGBM, CatBoost, and MLP-LM models, respectively 81%, 79%, 73%, and 48% of the predicted values have an absolute relative error below 3.6%, confirming the high validity of the XGBoost model.
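Each point on a cumulative-frequency curve like Fig. S2 is just the fraction of predictions whose absolute relative error falls below a threshold. A one-function sketch:

```python
import numpy as np

def cumulative_frequency(abs_rel_errors, threshold):
    """Percentage of data points whose absolute relative error (%) is
    below `threshold` (%) -- one point on a Fig. S2-style curve."""
    e = np.asarray(abs_rel_errors, float)
    return float(100.0 * np.mean(e < threshold))
```

Sweeping `threshold` over a grid and plotting the results reproduces the full curve; a model whose curve rises fastest (like XGBoost here) concentrates its errors near zero.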
Operating pressure and temperature greatly affect the solubility of hydrogen in hydrocarbons. As mentioned earlier, predicting hydrogen solubility under high-pressure/high-temperature conditions is very important in various industries, and the safety and efficiency of industrial processes depend on it. Figure 6 assesses the validity of the models over selected pressure and temperature ranges using group error plots. The group error analysis is performed by splitting all data into ranges of pressure (i.e. 0–5, 5–10, 10–15, 15–20, and 20–25 MPa) and temperature (i.e. 210–294, 294–378, 378–462, 462–546, and 546–630 K) to investigate the validity of the proposed models across these important parameters. AAPRE was calculated for each interval and plotted in Fig. 6a for pressure and Fig. 6b for temperature. As can be seen in Fig. 6, the LightGBM and MLP-LM models show relatively higher errors at low and high pressures and temperatures, while the CatBoost and AdaBoost-SVR models show relatively higher errors at low pressures and temperatures. The XGBoost model has the lowest error among all models across the different temperature and pressure operating conditions, which supports the earlier findings of its good performance.
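The group error analysis above reduces to binning the data by an operating variable and averaging the absolute relative errors within each bin. A sketch, assuming half-open bins [lo, hi) defined by a list of edges:

```python
import numpy as np

def grouped_aapre(values, y_exp, y_pred, edges):
    """AAPRE (%) within intervals of an operating variable (e.g. pressure in
    MPa or temperature in K). `edges` gives the bin boundaries, e.g.
    [0, 5, 10, 15, 20, 25] for the pressure groups of Fig. 6a."""
    values = np.asarray(values, float)
    y_exp, y_pred = np.asarray(y_exp, float), np.asarray(y_pred, float)
    err = 100.0 * np.abs((y_exp - y_pred) / y_exp)
    out = {}
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (values >= lo) & (values < hi)   # half-open bin [lo, hi)
        out[(lo, hi)] = float(err[mask].mean()) if mask.any() else float("nan")
    return out
```

Plotting the resulting per-bin AAPREs as bars, one series per model, reproduces the layout of Fig. 6.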
Trend analysis
At the next stage, several analyses were performed to assess the performance of the XGBoost model on different hydrogen/hydrocarbon systems. First, the effect of pressure on hydrogen solubility in n-decane at a high temperature of 432 K2 is evaluated in Fig. 7. The hydrogen solubility values predicted by the XGBoost model for this system, along with the values calculated by the EOSs, are shown in Fig. 7. As indicated in this figure, at high-temperature conditions the deviation between the calculations of the traditional RK EOS and the experimental data is large, whereas the other EOSs and the XGBoost model reproduce the experimental data excellently. As expected, the solubility of hydrogen in n-decane increases with pressure. However, the cubic EOSs slightly over- or underestimate this increase at high temperatures, while the XGBoost model follows the trend very well. The PC-SAFT EOS also gives good predictions with low deviation from the experimental data and outperforms the traditional cubic EOSs.
Next, the hydrogen solubility data in diphenylmethane76, with a molecular weight of 168.23 g/mol and a carbon number of 13, are predicted by the XGBoost model at high-temperature and high-pressure conditions (Fig. 8). Again, as depicted in Fig. 8, the XGBoost model correctly captures the data trends and provides excellent predictions. As can be seen, the combined effect of increasing temperature and pressure on hydrogen solubility is correctly predicted by the XGBoost model.
As mentioned earlier, hydrogen solubility increases with the carbon number of the hydrocarbon2,7,8,9. Therefore, the predictions of the XGBoost model for hydrogen solubility in several hydrocarbons with different carbon numbers (decane, eicosane, octacosane, and hexatriacontane) at 373 K, which have been studied experimentally in the literature8, are presented in Fig. 9. In this case as well, the estimations of the XGBoost model are in good agreement with the reported experimental hydrogen solubility data for all these hydrocarbons.
Conclusions
In this work, five robust machine learning models were introduced for estimating hydrogen solubility in hydrocarbons as a function of the critical pressure, critical temperature, and molecular weight of the solvents along with the operating pressure and temperature. A databank including 919 data points gathered from 48 different systems of 26 various hydrocarbons was applied to model the hydrogen solubility. Implementing the XGBoost, CatBoost, LightGBM, AdaBoost-SVR, and MLP-LM techniques showed that the estimations of hydrogen solubility in hydrocarbons reached AAPREs of 1.81%, 3.40%, 3.52%, 4.70%, and 6.01% for XGBoost, AdaBoost-SVR, LightGBM, CatBoost, and MLP-LM, respectively. Based on graphical and statistical error analyses, XGBoost is introduced as the best model proposed in this work. Evaluation of the XGBoost model with an external validation dataset containing 413 hydrogen solubility data points in 18 different hydrocarbons over a wide range of operating temperatures (98–701 K) and pressures (1.03–78.45 MPa) also proved its validity and reliability in predicting hydrogen solubility in hydrocarbons. In addition, calculating hydrogen solubility for several different systems with the EOSs showed that PC-SAFT gives the best predictions among the EOSs, while the ZJ EOS performed best among the cubic EOSs.
Abbreviations
- ZJ: Zudkevitch–Joffe EOS
- XGBoost: eXtreme gradient boosting
- VLE: Vapor–liquid equilibrium
- SVR: Support vector regression
- SD: Standard deviation
- SVM: Support vector machine
- SAFT: Statistical associating fluid theory
- SRK: Soave–Redlich–Kwong EOS
- ReLU: Rectified linear unit
- RMSE: Root mean square error
- RK: Redlich–Kwong EOS
- pred: Predicted
- PC-SAFT: Perturbed-chain statistical associating fluid theory
- PR: Peng–Robinson EOS
- OHMS: One_hot_max_size
- MLP-LM: Multilayer perceptron optimized by Levenberg–Marquardt algorithm
- LightGBM: Light gradient boosting machine
- HS: Hydrogen solubility
- exp: Experimental
- EOSs: Equations of state
- EOS: Equation of state
- CARTs: Classification and regression trees
- CatBoost: Gradient boosting with categorical features support
- AAPRE: Average absolute percent relative error
- ANFIS: Adaptive neuro-fuzzy inference system
- APRE: Average percent relative error
- AdaBoost-SVR: Adaptive boosting support vector regression
- R2: Coefficient of determination
- Ei: Relative error
- Ea: Absolute relative error
References
Katayama, T. & Nitta, T. Solubilities of hydrogen and nitrogen in alcohols and n-hexane. J. Chem. Eng. Data 21, 194–196 (1976).
Florusse, L., Peters, C., Pamies, J., Vega, L. F. & Meijer, H. Solubility of hydrogen in heavy n-alkanes: Experiments and saft modeling. AIChE J. 49, 3260–3269 (2003).
Pacheco, M. A. & Dassori, C. G. Hydrocracking: An improved kinetic model and reactor modeling. Chem. Eng. Commun. 189, 1684–1704 (2002).
Alves, J. J. & Towler, G. P. Analysis of refinery hydrogen distribution systems. Ind. Eng. Chem. Res. 41, 5759–5769 (2002).
Riazi, M. & Roomi, Y. A method to predict solubility of hydrogen in hydrocarbons and their mixtures. Chem. Eng. Sci. 62, 6649–6658 (2007).
Saajanlehto, M., Uusi-Kyyny, P. & Alopaeus, V. Hydrogen solubility in heavy oil systems: Experiments and modeling. Fuel 137, 393–404 (2014).
Lal, D., Otto, F. & Mather, A. Solubility of hydrogen in Athabasca bitumen. Fuel 78, 1437–1441 (1999).
Park, J., Robinson, R. L. J. & Gasem, K. A. Solubilities of hydrogen in heavy normal paraffins at temperatures from 323.2 to 423.2 K and pressures to 17.4 MPa. J. Chem. Eng. Data 40, 241–244 (1995).
Cai, H.-Y., Shaw, J. & Chung, K. Hydrogen solubility measurements in heavy oil and bitumen cuts. Fuel 80, 1055–1063 (2001).
Schwarz, B. J. & Prausnitz, J. M. Solubilities of methane, ethane, and carbon dioxide in heavy fossil-fuel fractions. Ind. Eng. Chem. Res. 26, 2360–2366 (1987).
Tsuji, T., Shinya, Y., Hiaki, T. & Itoh, N. Hydrogen solubility in a chemical hydrogen storage medium, aromatic hydrocarbon, cyclic hydrocarbon, and their mixture for fuel cell systems. Fluid Phase Equilib. 228, 499–503 (2005).
Moysan, J., Huron, M., Paradowski, H. & Vidal, J. Prediction of the solubility of hydrogen in hydrocarbon solvents through cubic equations of state. Chem. Eng. Sci. 38, 1085–1092 (1983).
Li, H. & Yan, J. Evaluating cubic equations of state for calculation of vapor–liquid equilibrium of CO2 and CO2-mixtures for CO2 capture and storage processes. Appl. Energy 86, 826–836 (2009).
Park, J., Robinson, R. L. & Gasem, K. A. Solubilities of hydrogen in aromatic hydrocarbons from 323 to 433 K and pressures to 21.7 MPa. J. Chem. Eng. Data 41, 70–73 (1996).
Jamali, M., Izadpanah, A. A. & Mofarahi, M. Correlation and prediction of solubility of hydrogen in alkenes and its dissolution properties. Appl. Petrochem. Res. 20, 1–10 (2021).
Shaw, J. A correlation for hydrogen solubility in alicyclic and aromatic solvents. Can. J. Chem. Eng. 65, 293–298 (1987).
Yuan, H., Gosling, C., Kokayeff, P. & Murad, S. Prediction of hydrogen solubility in heavy hydrocarbons over a range of temperatures and pressures using molecular dynamics simulations. Fluid Phase Equilib. 299, 94–101 (2010).
Torres, R., De Hemptinne, J.-C. & Machin, I. Improving the modeling of hydrogen solubility in heavy oil cuts using an augmented Grayson Streed (AGS) approach. Oil Gas Sci. Technol. Rev. IFP Energies Nouvelles 68, 217–233 (2013).
Streed, G. G. In 6th World Petroleum Congress. (World Petroleum Congress).
d'Angelo, J. V. H. & Francesconi, A. Z. Gas–liquid solubility of hydrogen in n-alcohols (1 ≤ n ≤ 4) at pressures from 3.6 MPa to 10 MPa and temperatures from 298.15 K to 525.15 K. J. Chem. Eng. Data 46, 671–674 (2001).
Luo, H., Ling, K., Zhang, W., Wang, Y. & Shen, J. A model of solubility of hydrogen in hydrocarbons and coal liquid. Energy Sources Part A Recov. Util. Environ. Effects 33, 38–48 (2010).
Yin, J.-Z. & Tan, C.-S. Solubility of hydrogen in toluene for the ternary system H2+ CO2+ toluene from 305 to 343 K and 1.2 to 10.5 MPa. Fluid Phase Equilib. 242, 111–117 (2006).
Qian, J.-W., Jaubert, J.-N. & Privat, R. Phase equilibria in hydrogen-containing binary systems modeled with the Peng–Robinson equation of state and temperature-dependent binary interaction parameters calculated through a group-contribution method. J. Supercrit. Fluids 75, 58–71 (2013).
Jaubert, J.-N. & Mutelet, F. VLE predictions with the Peng–Robinson equation of state and temperature dependent kij calculated through a group contribution method. Fluid Phase Equilib. 224, 285–304 (2004).
Gross, J. & Sadowski, G. Perturbed-chain SAFT: An equation of state based on a perturbation theory for chain molecules. Ind. Eng. Chem. Res. 40, 1244–1260 (2001).
Saajanlehto, M., Uusi-Kyyny, P. & Alopaeus, V. A modified continuous flow apparatus for gas solubility measurements at high pressure and temperature with camera system. Fluid Phase Equilib. 382, 150–157 (2014).
Ghosh, A., Chapman, W. G. & French, R. N. Gas solubility in hydrocarbons—a SAFT-based approach. Fluid Phase Equilib. 209, 229–243 (2003).
Ma, M., Chen, S. & Abedi, J. Modeling the solubility and volumetric properties of H2 and heavy hydrocarbons using the simplified PC-SAFT. Fluid Phase Equilib. 425, 169–176 (2016).
Safamirzaei, M., Modarress, H. & Mohsen-Nia, M. Modeling the hydrogen solubility in methanol, ethanol, 1-propanol and 1-butanol. Fluid Phase Equilib. 289, 32–39 (2010).
Nasery, S., Barati-Harooni, A., Tatar, A., Najafi-Marghmaleki, A. & Mohammadi, A. H. Accurate prediction of solubility of hydrogen in heavy oil fractions. J. Mol. Liq. 222, 933–943 (2016).
Safamirzaei, M. & Modarress, H. Hydrogen solubility in heavy n-alkanes; modeling and prediction by artificial neural network. Fluid Phase Equilib. 310, 150–155 (2011).
Van Konynenburg, P. & Scott, R. Critical lines and phase equilibria in binary van der Waals mixtures. Philos. Trans. R. Soc. Lond. Ser. A Math. Phys. Sci. 298, 495–540 (1980).
Privat, R. & Jaubert, J.-N. Classification of global fluid-phase equilibrium behaviors in binary systems. Chem. Eng. Res. Des. 91, 1807–1839 (2013).
Ronze, D., Fongarland, P., Pitault, I. & Forissier, M. Hydrogen solubility in straight run gasoil. Chem. Eng. Sci. 57, 547–553 (2002).
Gao, W., Robinson, R. L. & Gasem, K. A. High-pressure solubilities of hydrogen, nitrogen, and carbon monoxide in dodecane from 344 to 410 K at pressures to 13.2 MPa. J. Chem. Eng. Data 44, 130–132 (1999).
Gao, W., Robinson, R. L. & Gasem, K. A. Solubilities of hydrogen in hexane and of carbon monoxide in cyclohexane at temperatures from 344.3 to 410.9 K and pressures to 15 MPa. J. Chem. Eng. Data 46, 609–612 (2001).
Sebastian, H. M., Simnick, J. J., Lin, H.-M. & Chao, K.-C. Gas-liquid equilibrium in the hydrogen+ n-decane system at elevated temperatures and pressures. J. Chem. Eng. Data 25, 68–70 (1980).
Kim, K. J., Way, T. R., Feldman, K. T. & Razani, A. Solubility of hydrogen in octane, 1-octanol, and squalane. J. Chem. Eng. Data 42, 214–215 (1997).
Brunner, E. Solubility of hydrogen in 10 organic solvents at 298.15, 323.15, and 373.15 K. J. Chem. Eng. Data 30, 269–273 (1985).
Aslam, R. et al. Measurement of hydrogen solubility in potential liquid organic hydrogen carriers. J. Chem. Eng. Data 61, 643–649 (2016).
Phiong, H.-S. & Lucien, F. P. Solubility of hydrogen in α-methylstyrene and cumene at elevated pressure. J. Chem. Eng. Data 47, 474–477 (2002).
Peramanu, S. & Pruden, B. B. Solubility study for the purification of hydrogen from high pressure hydrocracker off-gas by an absorption-stripping process. Can. J. Chem. Eng. 75, 535–543 (1997).
Klink, A., Cheh, H. & Amick, E. Jr. The vapor-liquid equilibrium of the hydrogen—n-butane system at elevated pressures. AIChE J. 21, 1142–1148 (1975).
Nelson, E. & Bonnell, W. Solubility of hydrogen in n-butane. Ind. Eng. Chem. 35, 204–206 (1943).
Chen, T. & Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 785–794 (2016).
Zhang, J. et al. A unified intelligent model for estimating the (gas+ n-alkane) interfacial tension based on the eXtreme gradient boosting (XGBoost) trees. Fuel 282, 118783 (2020).
Dev, V. A. & Eden, M. R. Computer Aided Chemical Engineering Vol 47 113–118 (Elsevier, 2019).
Ke, G. et al. Lightgbm: A highly efficient gradient boosting decision tree. Adv. Neural. Inf. Process. Syst. 30, 3146–3154 (2017).
Yang, X., Dindoruk, B. & Lu, L. A comparative analysis of bubble point pressure prediction using advanced machine learning algorithms and classical correlations. J. Petrol. Sci. Eng. 185, 106598 (2020).
Sun, X., Liu, M. & Sima, Z. A novel cryptocurrency price trend forecasting model based on LightGBM. Financ. Res. Lett. 32, 101084 (2020).
Prokhorenkova, L., Gusev, G., Vorobev, A., Dorogush, A. V. & Gulin, A. CatBoost: Unbiased boosting with categorical features. arXiv:1706.09516 (arXiv preprint) (2017).
Dorogush, A. V., Ershov, V. & Gulin, A. CatBoost: Gradient boosting with categorical features support. arXiv:1810.11363 (arXiv preprint) (2018).
Meng, Q. et al. A communication-efficient parallel algorithm for decision tree. arXiv:1611.01276 (arXiv preprint) (2016).
Freund, Y. & Schapire, R. E. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 55, 119–139 (1997).
Dargahi-Zarandi, A., Hemmati-Sarapardeh, A., Shateri, M., Menad, N. A. & Ahmadi, M. Modeling minimum miscibility pressure of pure/impure CO2-crude oil systems using adaptive boosting support vector regression: Application to gas injection processes. J. Petrol. Sci. Eng. 184, 106499 (2020).
Smola, A. J. & Schölkopf, B. A tutorial on support vector regression. Stat. Comput. 14, 199–222 (2004).
Schölkopf, B., Smola, A. J., Williamson, R. C. & Bartlett, P. L. New support vector algorithms. Neural Comput. 12, 1207–1245 (2000).
Vapnik, V., Golowich, S. E. & Smola, A. Support vector method for function approximation, regression estimation, and signal processing. Adv. Neural Inf. Process. Syst. 20, 281–287 (1997).
Lashkarbolooki, M., Hezave, A. Z. & Ayatollahi, S. Artificial neural network as an applicable tool to predict the binary heat capacity of mixtures containing ionic liquids. Fluid Phase Equilib. 324, 102–107 (2012).
Mohammadi, M.-R., Hemmati-Sarapardeh, A., Schaffie, M., Husein, M. M. & Ranjbar, M. Application of cascade forward neural network and group method of data handling to modeling crude oil pyrolysis during thermal enhanced oil recovery. J. Petrol. Sci. Eng. 20, 108836 (2021).
Hagan, M. T. & Menhaj, M. B. Training feedforward networks with the Marquardt algorithm. IEEE Trans. Neural Netw. 5, 989–993 (1994).
Sagara, H., Arai, Y. & Saito, S. Vapor-liquid equilibria of binary and ternary systems containing hydrogen and light hydrocarbons. J. Chem. Eng. Jpn. 5, 339–348 (1972).
Trust, D. & Kurata, F. Vapor-liquid phase behavior of the hydrogen-propane and hydrogen-carbon monoxide-propane systems. AIChE J. 17, 86–91 (1971).
Aroyan, H. J. & Katz, D. L. Low temperature vapour–liquid equilibria in hydrogen-n-butane system. Ind. Eng. Chem. 43, 185–189 (1951).
Sattler, H. Solubility of hydrogen in liquid hydrocarbons. Z. Tech. Phys. 21, 410–413 (1940).
Peter, S. & Reinhartz, K. Das Phasengleichgewicht in den Systemen H2–n-Heptan, H2–Methylcyclohexan und H2–2,2,4-Trimethylpentan bei höheren Drucken und Temperaturen [Phase equilibrium in the systems H2–n-heptane, H2–methylcyclohexane, and H2–2,2,4-trimethylpentane at elevated pressures and temperatures]. Z. Phys. Chem. 24, 103–118 (1960).
Sokolov, V. & Polyakov, A. Solubility of H2 in n-decane, n-tetradecane, 1-hexene, 1-octene, isopropylbenzene, 1-methylnaphthalene and decalin. Zh. Prikl. Khim. 50, 1403–1405 (1977).
Schofield, B., Ring, Z. & Missen, R. Solubility of hydrogen in a white oil. Can. J. Chem. Eng. 70, 822–824 (1992).
Dean, M. & Tooke, J. Vapor-liquid equilibria in three hydrogen-paraffin systems. Ind. Eng. Chem. 38, 389–393 (1946).
Lin, H.-M., Sebastian, H. M. & Chao, K.-C. Gas-liquid equilibrium in hydrogen+ n-hexadecane and methane+ n-hexadecane at elevated temperatures and pressures. J. Chem. Eng. Data 25, 252–254 (1980).
Berty, T., Reamer, H. & Sage, B. Phase behavior in the hydrogen-cyclohexane system. J. Chem. Eng. Data 11, 25–30 (1966).
Simnick, J. J., Sebastian, H. M., Lin, H.-M. & Chao, K.-C. Solubility of hydrogen in toluene at elevated temperatures and pressures. J. Chem. Eng. Data 23, 339–340 (1978).
Connolly, J. Thermodynamic properties of hydrogen in benzene solutions. J. Chem. Phys. 36, 2897–2904 (1962).
Krichevskii, I. & Efremova, G. Phase and volumetric relations in liquid–gas systems at high pressures. Zh. Fiz. Khim. 22, 1116–1125 (1948).
Malone, P. V. & Kobayashi, R. Light gas solubility in phenanthrene: The hydrogen—phenanthrene and methane—phenanthrene systems. Fluid Phase Equilib. 55, 193–205 (1990).
Simnick, J. J., Liu, K. D., Lin, H.-M. & Chao, K.-C. Gas-liquid equilibrium in mixtures of hydrogen and diphenylmethane. Ind. Eng. Chem. Process. Des. Dev. 17, 204–208 (1978).
Simnick, J., Lawson, C., Lin, H. & Chao, K. Vapor-liquid equilibrium of hydrogen/tetralin system at elevated temperatures and pressures. AIChE J. 23, 469–476 (1977).
Author information
Authors and Affiliations
Contributions
M.-R.M.: investigation, data curation, visualization, writing-original draft. F.H.: conceptualization, validation, modeling. M.P.: writing-review and editing, validation. S.A.: writing-review and editing, methodology, validation. M.T.M.: writing-original draft, supervision. A.H.-S.: methodology, validation, supervision, writing-review and editing. A.M.: writing-review and editing, validation, funding. A.M.: writing-review and editing, validation.
Ethics declarations
Competing interests
The authors declare no competing interests.
Supplementary Information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Mohammadi, MR., Hadavimoghaddam, F., Pourmahdi, M. et al. Modeling hydrogen solubility in hydrocarbons using extreme gradient boosting and equations of state. Sci Rep 11, 17911 (2021). https://doi.org/10.1038/s41598-021-97131-8