Abstract
Accurate prediction of the solubility of gases in hydrocarbons is crucial for designing enhanced oil recovery (EOR) operations by gas injection, as well as separation and chemical reaction processes in a petroleum refinery. In this work, nitrogen (N2) solubility in normal alkanes, the major constituents of crude oil, was modeled using five representative machine learning (ML) models, namely gradient boosting with categorical features support (CatBoost), random forest, light gradient boosting machine (LightGBM), k-nearest neighbors (k-NN), and extreme gradient boosting (XGBoost). A large solubility databank containing 1982 data points was used to establish models that predict N2 solubility in normal alkanes as a function of pressure, temperature, and molecular weight of the normal alkane over broad ranges of operating pressure (0.0212–69.12 MPa) and temperature (91–703 K). The molecular weight of the normal alkanes ranged from 16 to 507 g/mol. In addition, five equations of state (EOSs), namely Redlich–Kwong (RK), Soave–Redlich–Kwong (SRK), Zudkevitch–Joffe (ZJ), Peng–Robinson (PR), and perturbed-chain statistical associating fluid theory (PC-SAFT), were compared with the ML models in estimating N2 solubility in normal alkanes. The results revealed that the CatBoost model is the most precise model in this work, with a root mean square error of 0.0147 and a coefficient of determination of 0.9943. Among the EOSs, the ZJ EOS provided the best estimates of N2 solubility in normal alkanes. Lastly, relevancy factor analysis indicated that pressure has the greatest influence on N2 solubility in normal alkanes and that N2 solubility increases with the molecular weight of the normal alkane.
Introduction
Gas–fluid interactions are an integral part of many industrial procedures and play major roles in industries such as petrochemical1,2,3, oil and gas4,5,6,7,8,9, medicine10, food11,12, environment13,14, and polymer15,16. Among the gaseous phases normally present in these environments, colorless, odorless nitrogen (N2) is one of the most common gases appearing as a feed or product in many processes. Moreover, as the dominant component of the atmosphere, it is an important gas to characterize accurately. The oil and gas industry is no exception, and N2 applications are found throughout this industry, from upstream to downstream. As a clear example, N2 and its related treatments have been used for several decades in enhanced oil recovery (EOR) operations because of the unique properties of this gas17,18,19. Usually, carbon dioxide (CO2) or N2 is continuously injected into the oil reservoir for miscible/immiscible oil displacement. These gases are extracted back out with the recovered oil, recaptured, and reinjected along with new gas until as much oil as possible is produced20. Cost efficiency and higher feasibility give N2 some advantages over CO2 and methane (CH4)21,22. However, N2 has commonly been used in deep reservoirs because it needs a higher injection pressure than CO2 to achieve miscibility with the reservoir fluids20. In the midstream, N2 is also used in pipeline drying, an essential part of pipeline commissioning that prevents unwanted aerosols by displacing contaminants23.
There are many significant instances of N2 usage in the downstream sector. One is nitrogen purging, a technique that avoids unintended reactions between hazardous gases and hydrocarbons by reducing the oxygen content in environments susceptible to explosion24; a similar technique, nitrogen blanketing25, is used in hydrocarbon storage tanks. Crude oil is a complex mixture of hydrocarbons, so achieving reliable predictions of the thermodynamic and phase equilibrium data of N2/oil systems is difficult. Alkanes are the major constituents of crude oil and most petroleum products. Therefore, in many studies, the behavior of alkanes with the gas of interest, such as N2, is studied first, and the information obtained is later generalized to crude oil.
Solubility is one of the most important thermodynamic properties; it represents the amount of a gas dissolved in a liquid at a specific pressure and temperature. Although many analytical methods are used to calculate the solubilities of gases in liquids, mainly through equations of state (EOSs)26,27,28,29, the accuracy of their predictions remains a serious challenge, especially in some critical industrial applications. Previous experiments show that the solubility of N2 in hydrocarbons increases with pressure and temperature26,27,28. Furthermore, laboratory experiments show that N2 solubility increases as the molecular weight of the hydrocarbon rises29. Properly estimating phase equilibrium data in binary systems containing N2 and a hydrocarbon is difficult because, based on the classification scheme of Van Konynenburg and Scott30,31, binary systems of a hydrocarbon and N2 exhibit type III phase diagrams, except for the binary system of N2 + CH4, which is a type I system30,31. Operations that use N2 carry risks of energy waste and potential hazards. Solubility data are therefore critical for predicting the appropriate quantity of N2 to use in such operations and can improve plant safety. Studies with heavy hydrocarbons are particularly challenging due to their complexity. Furthermore, the dangers of high-temperature and/or high-pressure conditions in industrial operations make extensive experiments an undesirable option. As a result, modeling based on existing experimental data is an attractive alternative.
The strategies for predicting N2 solubility in hydrocarbon solvents or petroleum blends rely mainly on experiments and semi-empirical models such as EOSs, and are comparable to those used to estimate the solubility of other gases such as CH4, CO2, and hydrogen32,33,34,35,36,37. Davila et al.38 determined the vapor-phase solubilities of n-decane, tert-butylbenzene, 2,2,5-trimethylhexane, and n-dodecane in compressed N2 and used these data to compute the second virial cross coefficients (\(B_{12}\))38. Tong et al.29 used a static equilibrium cell to measure the solubilities of N2 in four n-paraffin hydrocarbons (decane, eicosane, octacosane, and hexatriacontane). The Soave–Redlich–Kwong (SRK) and Peng–Robinson (PR) EOSs were applied to analyze the data. The results show that N2 solubility grows with rising pressure, temperature, and n-paraffin chain length29. Gao et al.26 used a static cell to determine N2 solubilities in various naphthenic (trans-decalin and cyclohexane) and aromatic (naphthalene, 1-methylnaphthalene, benzene, phenanthrene, pyrene) solvents. The PR EOS was shown to represent the data when a single interaction parameter (\(C_{ij}\)) was employed for each binary system26. Privat et al.39,40 used the PR EOS combined with a group contribution method, called the PPR78 model, to predict phase equilibrium data of mixtures containing various hydrocarbons and N2. This model is able to predict temperature-dependent binary interaction parameters (\(k_{ij}\)). It provided satisfactory results, with an overall deviation lower than 10%. They also noted that for hydrocarbon + N2 systems (except CH4), \(k_{ij}\) is a decreasing function of temperature39,40. At low temperatures, Justo-Garcia et al.41 modeled vapor–liquid–liquid equilibria (VLLE) for N2 and alkanes in three distinct ternary systems.
The findings demonstrate that both the SRK and PC-SAFT EOSs estimate the experimentally observed values with reasonable accuracy41. In another study, Justo-Garcia et al.42 used the SRK and PC-SAFT EOSs to model three-phase vapor–liquid–liquid equilibria for a natural gas mixture with high N2 content. The results revealed that the PC-SAFT EOS accurately predicts the phase behavior, whereas the SRK EOS predicts a three-phase region larger than that observed experimentally42. Zirrahi et al.27 used the Krichevsky–Ilinskaya equation to estimate the solubility of light solvents (CO2, N2, CH4, C2H6, and CO) in bitumens from five Alberta reservoirs, with the gas phase described by the PR EOS. The suggested model was then validated against experimental data on light solvent solubility. The results demonstrated that the proposed model accurately reproduces known solubility data in bitumen for light hydrocarbons (CH4 and C2H6) and non-hydrocarbon solvents (N2, CO2, and CO)27. Haghbakhsh et al.43 investigated the vapor–liquid equilibria (VLE) of binary N2–hydrocarbon mixtures across an extensive range of temperature and pressure using the PR and ER EOSs. They introduced a new correlative mode for these equations, which improved accuracy by up to three times43. Haddadnia et al.28 studied the thermophysical characteristics of CO2/bitumen and N2/bitumen solutions and used the PR EOS to describe the solubility data28. Wu et al.44 employed the PC-SAFT and SRK EOSs to estimate gas solubilities in n-alkanes. The PC-SAFT EOS was found to accurately predict an empirically observed linear relationship between gas solubilities in n-alkanes and their carbon number. Despite its satisfactory accuracy for gas solubility in lighter n-alkanes, the SRK EOS typically produces significantly poorer results than the PC-SAFT EOS44.
Tsuji et al.45 investigated N2 and oxygen gas solubilities in benzene, divinylbenzene, and styrene. Along a given isotherm, gas solubility in the liquids depended linearly on pressure, and it declined with rising temperature. The PR EOS was then used to predict the gas solubilities45. Aguilar-Cisneros et al.46 determined the solubility of N2, CO2, and CH4 in petroleum fluids using the PR EOS in conjunction with various mixing rules in systems including bitumens, heavy oils, refinery cuts, and coal liquids. The universal and van der Waals mixing rules gave satisfactory agreement between experimental data and predicted values, while the modified Huron–Vidal mixing rule of order one produced large discrepancies46.
During the last decade, alongside the development of intelligent methods based on machine learning (ML) techniques, many attempts have been made to predict thermodynamic properties more accurately from reliable experimental data. Abdi-Khanghah et al.47 studied alkane solubility in supercritical CO2 using two kinds of artificial neural network (ANN): radial basis function (RBF) and multi-layer perceptron (MLP). The MLP-ANN outperformed the RBF-ANN in predicting n-alkane solubility in supercritical CO247. Songolzadeh et al.48 employed the least-squares support vector machine (LSSVM), tuned with two different optimization algorithms, particle swarm optimization (PSO) and the cross-validation-assisted Simplex algorithm (CV-Simplex), and demonstrated that the PSO–LSSVM model predicts n-alkane solubility in supercritical CO2 with high accuracy48. Chakraborty et al.49 developed a set of data-driven models capable of predicting VLE for the binary systems C10–N2 and C12–N2. Compared with the VLE modeled using the PR EOS, the data-driven models significantly improved the estimate of the binary mixture equilibrium pressure49. Mohammadi et al.50 implemented different ML models to predict hydrogen solubility in various pure hydrocarbons over wide pressure and temperature ranges and compared them with some common EOSs. Their results showed that the intelligent models estimate hydrogen solubility more precisely than the commonly used EOSs50. To predict nitrogen solubility in unsaturated, cyclic, and aromatic hydrocarbons, Mohammadi et al.51 employed a convolutional neural network (CNN); the results showed that pressure is the most significant factor for nitrogen solubility in unsaturated hydrocarbons. In general, semi-analytical EOS-based prediction has been the common way to estimate N2 solubilities in alkanes.
However, this approach is case-specific and limited to particular hydrocarbons, with specific parameters required for each EOS. Hence, combining suitable ML algorithms with reliable experimental data may yield a model that predicts N2 solubility in normal alkanes with high accuracy and also accelerates predictions.
In this study, we use a dataset containing 1982 experimental N2 solubility data points for 19 distinct normal alkanes gathered under various operating conditions. Models for estimating N2 solubility in normal alkanes are constructed using well-known ML algorithms, namely k-nearest neighbors (k-NN) and random forest (RF), as well as more recent ML methods, namely extreme gradient boosting (XGBoost), gradient boosting with categorical features support (CatBoost), and light gradient boosting machine (LightGBM). Statistical parameters and graphical error assessments are used to verify the validity of the suggested models. N2 solubility in several systems is then predicted with the proposed models and with five EOSs, namely perturbed-chain statistical associating fluid theory (PC-SAFT), Redlich–Kwong (RK), Peng–Robinson (PR), Soave–Redlich–Kwong (SRK), and Zudkevitch–Joffe (ZJ). Finally, the relevancy factor is used to assess the relative impact of the input parameters on N2 solubility in normal alkanes.
Data collection
The modeling of N2 solubility in normal alkanes was performed using a large solubility databank containing 1982 data points collected from the literature29,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91. The properties of 19 normal alkanes (nC1 to nC36) utilized in this survey are presented in Table 1.
The inputs of the models were chosen to be temperature (K), pressure (MPa), and molecular weight (g/mol) of normal alkanes, whereas N2 solubility (in terms of mole fraction) was the desired output. The statistical details of the N2 solubility databank used for modeling are tabulated in Table 2. The validity, accuracy, and applicability of the model depend on the quantity and variety of N2 solubility data collected in different systems. The broad ranges of pressure (0.0212–69.12 MPa), temperature (91.21–703.4 K), and normal alkanes (nC1 to nC36) can lead to a reliable general model for estimating the solubilities of N2 in normal alkanes.
Models’ implementation
Algorithms’ selection
Owing to recent advances in computational capacity and the advent of new machine learning algorithms, many algorithms are available for the problem under consideration. Because the dataset contains a relatively small number of instances and only a few features, non-parametric ML models, which rely mainly on the data themselves and do not suffer from the small size of the dataset, were considered the best choices in this case.
K-nearest neighbors (k-NN)
The k-NN method is an ML technique used to solve both classification and regression problems. This supervised algorithm is widely used as a non-parametric technique in various applications92. Here, k is the number of neighbors assigned to a new sample: the target of the new sample is predicted from the k training samples closest to it, using either uniform weights or weights derived from a distance function93. The distance function measures how close each training sample is to the new sample and can also be used to weight the contribution of each of the k neighbors to the final predicted value. The Minkowski distance is the typical choice for the distance function. Its general form is given in Eq. (1), where X and Y are the feature sets of two samples. In most cases this function reduces to the Manhattan or Euclidean distance by setting p = 1 or p = 2, respectively. Selecting the optimal value of the hyperparameter k is the most crucial stage in training this algorithm to achieve satisfactory accuracy. Hence, the algorithm is run over a wide range of k values, and the optimal case is identified by comparing statistical accuracy measures among the explored cases.
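As an illustration of the prediction step, the following minimal sketch implements k-NN regression in plain Python. It assumes the standard Minkowski form \(d(X, Y) = \left( \sum_{i} |x_{i} - y_{i}|^{p} \right)^{1/p}\), which is taken here to correspond to Eq. (1); the function names and the inverse-distance weighting option are illustrative choices, not the settings used in this work.

```python
def minkowski(x, y, p=2):
    """Minkowski distance between two equal-length feature vectors."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def knn_predict(train_X, train_y, query, k=3, p=2, weighted=False):
    """Predict a target as the (optionally distance-weighted) mean of the k nearest samples."""
    # Sort training samples by distance to the query and keep the k closest.
    nearest = sorted(
        (minkowski(x, query, p), y) for x, y in zip(train_X, train_y)
    )[:k]
    if not weighted:
        return sum(y for _, y in nearest) / len(nearest)
    # Inverse-distance weights (an assumed scheme); an exact match returns its own target.
    for d, y in nearest:
        if d == 0.0:
            return y
    weights = [1.0 / d for d, _ in nearest]
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)
```

Because temperature, pressure, and molecular weight span very different numerical ranges, the inputs would normally be scaled before distances are computed.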
Random forest
Random forest is a bagging-based supervised learning technique for classification and regression that uses the ensemble learning approach based on CART (Classification and Regression Trees)94. This algorithm reduces the high prediction variance that is a common issue with a single decision tree. The trees of a random forest are built in parallel and do not interact with each other during the forest construction. The method works by training a large number of decision trees and then returning the mean prediction of the individual trees in regression cases. At each node, the number of attributes that may be considered for splitting is limited to a certain proportion of the total, which is a hyperparameter. This ensures that the ensemble model does not depend too strongly on any specific attribute and that all potentially predictive variables are considered. For the training of each CART tree, the random forest technique draws the training dataset Ti randomly from the complete training set T with replacement (i.e., bootstrap sampling). The data not included in a random sample are referred to as "out-of-bag" data. While building each CART tree, the random forest technique also picks N features (input variables) randomly from the set of M independent input variables (N < M). The best split for each CART tree is calculated from the randomly picked Ti and N features. The final regression result is determined by averaging the outputs of the individual trees (majority voting is used only for classification). Averaging reduces the mean squared error relative to the estimate produced by an individual CART tree and thus increases the estimation precision. The resulting ensemble of trees is given by Eq. 2.
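The sampling steps described above can be sketched in a few lines of Python. This is only an illustration of bootstrap sampling, out-of-bag data, random feature selection, and prediction averaging; the helper names are hypothetical, and the tree-fitting step itself is omitted.

```python
import random


def bootstrap_indices(n_samples, rng):
    """Draw n_samples indices with replacement; return (in-bag, out-of-bag) indices."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    out_of_bag = sorted(set(range(n_samples)) - set(in_bag))
    return in_bag, out_of_bag


def pick_features(n_features, n_pick, rng):
    """Randomly pick n_pick of the n_features input variables (n_pick < n_features)."""
    return sorted(rng.sample(range(n_features), n_pick))


def forest_predict(tree_predictions):
    """Regression output of the forest: the mean of the individual tree predictions."""
    return sum(tree_predictions) / len(tree_predictions)


rng = random.Random(0)                    # fixed seed so the sketch is repeatable
in_bag, oob = bootstrap_indices(10, rng)  # training rows for one tree and its OOB rows
features = pick_features(3, 2, rng)       # e.g. 2 of the 3 inputs (T, P, molecular weight)
```

Each tree would be fitted on its own in-bag rows and feature subset, and the out-of-bag rows can serve as a built-in validation set for that tree.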
Extreme gradient boosting (XGBoost)
The fundamental concept behind a tree-based ensemble method is to fit the training data with an ensemble of classification and regression trees (CARTs) by minimizing a regularized objective function. One such tree-based model is XGBoost, which belongs to the gradient boosting decision tree (GBDT) framework. Each CART is made up of (I) a root node, (II) internal nodes, and (III) leaf nodes, as illustrated in Fig. 1. The root node, which represents the entire dataset, is split into internal nodes by binary decisions, whilst the leaf nodes hold the final outputs. In gradient boosting, a sequence of basic CARTs is created one after another, with the weight of each individual CART adjusted during the training process95.
An ensemble of trees must be trained to predict y for a given dataset, where m and n denote the numbers of features and instances, respectively.
where the decision rule \(q\left( x \right)\) maps an example to the index of its leaf, \(n\) denotes the space of regression trees, \(f_{k}\) is the k-th independent tree, T is the number of leaves in a tree, and w is the vector of leaf weights in Eqs. 3 and 4.
The minimization of the regularized objective function \(L\) is used to determine the ensemble of trees:
where Ω is the regularization term, which helps to reduce overfitting by penalizing the model's complexity; l is a differentiable, convex loss function; γ is the minimum loss reduction required to split a leaf; and λ is the regularization coefficient. It is worth noting that in these equations λ and γ help to reduce model variance and avoid overfitting.
In the gradient boosting technique, the objective function is minimized for each individual leaf, and additional branches are added sequentially.
Here, t denotes the t-th iteration of the above-mentioned training procedure. The XGBoost method greedily adds the regression tree that most improves the ensemble model, which is why it is sometimes dubbed a "greedy algorithm". As a result, the model output is updated continuously by minimizing the objective function:
XGBoost makes use of a shrinkage technique in which newly added weights are scaled by a learning rate after each stage of boosting. This lowers the risk of overfitting by reducing the influence of each individual tree and leaving room for future trees to improve the model96.
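To make the roles of λ, γ, and the learning rate concrete, the sketch below evaluates the optimal leaf weight and the split gain as they are commonly written in the XGBoost literature, \(w^{*} = -G/(H + \lambda)\) and \(\tfrac{1}{2}\left[ G_{L}^{2}/(H_{L} + \lambda) + G_{R}^{2}/(H_{R} + \lambda) - (G_{L} + G_{R})^{2}/(H_{L} + H_{R} + \lambda) \right] - \gamma\), where G and H are sums of first and second derivatives of the loss. These expressions are assumed to match the equations of this section, and the numerical values are made up for illustration.

```python
def leaf_weight(grads, hess, lam):
    """Optimal leaf weight w* = -G / (H + lambda)."""
    return -sum(grads) / (sum(hess) + lam)


def split_gain(g_left, h_left, g_right, h_right, lam, gamma):
    """Loss reduction from splitting one leaf into two, minus the penalty gamma."""
    def score(g, h):
        return sum(g) ** 2 / (sum(h) + lam)
    parent = score(g_left + g_right, h_left + h_right)
    return 0.5 * (score(g_left, h_left) + score(g_right, h_right) - parent) - gamma


# Squared-error loss l = 0.5 * (y - y_hat)^2 gives g = y_hat - y and h = 1.
y = [0.10, 0.12, 0.40, 0.42]          # illustrative targets (mole fractions)
y_hat = [0.0] * len(y)                # current ensemble prediction
grads = [p - t for p, t in zip(y_hat, y)]
hess = [1.0] * len(y)

gain = split_gain(grads[:2], hess[:2], grads[2:], hess[2:], lam=1.0, gamma=0.0)
eta = 0.3                             # learning rate used for shrinkage
w_left = leaf_weight(grads[:2], hess[:2], lam=1.0)
y_hat_left = 0.0 + eta * w_left       # shrunken update for samples in the left leaf
```

A split is kept only when its gain is positive, so a larger γ prunes more aggressively, while a larger λ pulls the leaf weights towards zero.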
Light gradient boosting machine (LightGBM)
LightGBM is a gradient boosting framework based on the decision tree concept. Its main advantages over XGBoost are that it uses less memory, grows trees leaf-wise with a depth constraint, and uses a histogram-based technique to speed up the training process. With the histogram technique, LightGBM discretizes continuous floating-point feature values into k bins, resulting in a histogram of width k. Furthermore, the histogram technique does not require additional storage of pre-sorted results, and after feature discretization the values may be stored as 8-bit integers, reducing memory usage to 1/8. The price of this coarse partitioning is some loss of accuracy. LightGBM also employs a leaf-wise growth strategy, which is more efficient than the usual level-wise strategy. The level-wise approach is inefficient because, at each step, it splits every leaf of the same layer, including those with low gain, resulting in unnecessary memory allocation. In contrast, at each stage of the leaf-wise method, the algorithm finds the leaf with the largest branching gain and then splits it. Compared with level-wise growth, errors can be reduced and greater precision attained with the same number of splits. The leaf-wise tree development technique is illustrated in Fig. 2. The disadvantage of leaf-wise growth is that it tends to build deeper decision trees, which can lead to overfitting. LightGBM counters this, while maintaining high efficiency, by imposing a maximum depth restriction on leaf-wise growth97,98.
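The histogram idea can be illustrated with a short Python sketch. Equal-width bins are assumed here purely for simplicity (LightGBM's own bin construction is more elaborate), and the function names are hypothetical.

```python
def build_bins(values, k):
    """Return k equal-width bin edges (k + 1 numbers) spanning the feature values."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(k + 1)]


def to_bin(value, edges):
    """Map a continuous value to an integer bin index in [0, k - 1]."""
    k = len(edges) - 1
    width = (edges[-1] - edges[0]) / k
    index = int((value - edges[0]) / width)
    return min(max(index, 0), k - 1)


def histogram(values, targets, edges):
    """Accumulate per-bin sample counts and target sums, the statistics scanned for splits."""
    k = len(edges) - 1
    counts, sums = [0] * k, [0.0] * k
    for v, t in zip(values, targets):
        b = to_bin(v, edges)
        counts[b] += 1
        sums[b] += t
    return counts, sums
```

With at most 256 bins, each bin index fits in an 8-bit integer, and candidate splits only need to be evaluated at the bin boundaries rather than at every distinct feature value.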
For a given training dataset \(X = \left\{ (x_{i}, y_{i}) \right\}_{i = 1}^{m}\), LightGBM searches for an approximation \(\hat{f}\left( x \right)\) to the function \(f^{*}(x)\) that minimizes the expected value of a specific loss function \(L(y, f(x))\):
LightGBM combines T regression trees, \(\sum_{t = 1}^{T} f_{t}\left( x \right)\), to approximate the model. Each regression tree is defined as \(w_{q(x)}\), \(q \in \left\{ 1, 2, \ldots, N \right\}\), where q is the decision rule of the tree, N is the number of tree leaves, and w is a vector of the sample weights of the leaf nodes. The model is trained in additive form at step t:
Newton's method is employed to approximate the objective function.
Gradient boosting with categorical features support (CatBoost)
CatBoost handles categorical columns directly for categorical boosting; it employs a one-hot max size (OHMS) threshold together with a permutation technique and target-based statistics. For each new split of the current tree, a greedy approach is used to build feature combinations, which allows CatBoost to cope with the exponential growth in the number of possible feature combinations99. In CatBoost, for each feature with more categories than OHMS, the following steps are applied:
1. Records are divided into subsets at random.
2. Labels are converted to integers.
3. Categorical features are converted to numeric values as follows:
$$avgTarget = \frac{countInClass + prior}{totalCount + 1}$$ (9)
where \(countInClass\) is the number of preceding objects in the same category whose target equals one, \(totalCount\) is the number of preceding objects in that category, and \(prior\) is a preliminary value for the numerator specified by the starting parameters100,101.
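A minimal sketch of how Eq. (9) can be applied to one categorical column is given below. It assumes binary integer labels and a single random permutation; the function name, the default prior of 0.5, and the seed are illustrative choices rather than CatBoost's actual settings.

```python
import random


def ordered_target_stats(categories, labels, prior=0.5, seed=0):
    """Encode a categorical column with Eq. (9), using only the preceding records."""
    order = list(range(len(categories)))
    random.Random(seed).shuffle(order)  # step 1: random permutation of the records
    count_in_class = {}  # preceding records of each category with label 1
    total_count = {}     # preceding records of each category
    encoded = [0.0] * len(categories)
    for i in order:
        c = categories[i]
        encoded[i] = (count_in_class.get(c, 0) + prior) / (total_count.get(c, 0) + 1)
        # Update the running counts only after encoding, so a record never sees its own label.
        count_in_class[c] = count_in_class.get(c, 0) + labels[i]
        total_count[c] = total_count.get(c, 0) + 1
    return encoded
```

Because each record is encoded from the records that precede it in the permutation, its own target value never leaks into its encoded feature.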
Equations of state (EOSs)
An EOS is a mathematical expression relating a substance's volume, temperature, and pressure. It may be used to describe VLE, volumetric behavior, and thermodynamic properties of mixtures and pure substances, and EOSs are routinely used to estimate the phase behavior of petroleum fluids. As previously stated, EOSs are poor predictors of gas solubility in solvents, particularly under complicated operating conditions. In this research, five EOSs were used to estimate N2 solubility in hydrocarbons, and their reliability in predicting N2 solubility is compared with that of the ML algorithms. The mathematical forms of the implemented EOSs are shown in Table 3, and their parameters are given in Table 4. The molecular parameters required by the PC-SAFT EOS for each substance investigated are provided in Table 5. In addition, a proper mixing rule is needed to estimate the parameters of each mixture. In this study, the van der Waals one-fluid mixing rules were used, and their mathematical expressions are provided in Table 4.
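For orientation, the van der Waals one-fluid mixing rules are usually written as \(a_{m} = \sum_{i} \sum_{j} x_{i} x_{j} \sqrt{a_{i} a_{j}} \left( 1 - k_{ij} \right)\) and \(b_{m} = \sum_{i} x_{i} b_{i}\). The sketch below evaluates this common form; it is assumed, not confirmed, to be the expression listed in Table 4, and the numbers in the usage line are arbitrary.

```python
from math import sqrt


def vdw_one_fluid(x, a, b, k):
    """Return mixture parameters (a_mix, b_mix) from pure-component a_i, b_i and k_ij."""
    n = len(x)
    # Quadratic rule for the attraction parameter with binary interaction parameters k_ij.
    a_mix = sum(
        x[i] * x[j] * sqrt(a[i] * a[j]) * (1.0 - k[i][j])
        for i in range(n)
        for j in range(n)
    )
    # Linear rule for the co-volume parameter.
    b_mix = sum(xi * bi for xi, bi in zip(x, b))
    return a_mix, b_mix


# Arbitrary illustrative values for an equimolar binary mixture with k_12 = 0.1.
a_mix, b_mix = vdw_one_fluid(
    [0.5, 0.5], [2.0, 8.0], [0.1, 0.3], [[0.0, 0.1], [0.1, 0.0]]
)
```

The binary interaction parameter \(k_{ij}\) is the adjustable quantity that is typically fitted to experimental phase equilibrium data for each gas and solvent pair.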
Evaluation of models
The following statistical parameters, namely root mean square error (RMSE), standard deviation (SD), and coefficient of determination (R2) were used in this survey to evaluate the performance of models:
where Z, \(NS_{i,exp}\), and \(NS_{i,pred}\) are the number of data points, the experimental N2 solubility, and the predicted N2 solubility in normal alkanes, respectively.
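As a worked illustration, RMSE and R2 can be computed as follows. The sketch uses the standard textbook definitions, which are assumed to agree with the expressions used in this work; SD is left out because its exact definition is not reproduced in the text.

```python
from math import sqrt


def rmse(exp, pred):
    """Root mean square error between experimental and predicted solubilities."""
    return sqrt(sum((e - p) ** 2 for e, p in zip(exp, pred)) / len(exp))


def r_squared(exp, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_exp = sum(exp) / len(exp)
    ss_res = sum((e - p) ** 2 for e, p in zip(exp, pred))
    ss_tot = sum((e - mean_exp) ** 2 for e in exp)
    return 1.0 - ss_res / ss_tot
```

A perfect model gives an RMSE of 0 and an R2 of 1, whereas a model that always predicts the mean experimental solubility gives an R2 of 0.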
On the other hand, the following graphical tools were utilized simultaneously to evaluate the performance of the ML models:
Cross plot: The most widely used graphical analysis, in which the predicted values are plotted against the measured values and the accuracy of a model is judged by the proximity of the data points to the unit-slope line.
Trend plot: This plot helps to check the validity of the model by sketching both real data and the model's estimation versus the specific property or data index.
Error distribution plot: The error (measured value − predicted value) is plotted against the real data to assess the scatter of data around the zero-error line and to explore the possible error trend.
Histogram plot of errors: This graph shows how the errors from the model are distributed. This statistical tool indicates the discrepancy between the measured and predicted values, in which a normal distribution centered at zero error is expected for a good model.
Results and discussion
Model optimization and tuning
To find the best model for each of the aforementioned algorithms, a routine procedure was followed to tune the hyperparameters and other functional settings of each model. The models were implemented in Python using several libraries: scikit-learn for k-NN and random forest110, xgboost for XGBoost, lightgbm for LightGBM98, and catboost99 for CatBoost. Each of these libraries exposes parameters that can either be set by the user or left at their default values. To find the best configuration for each algorithm, a wide range of parameter values was explored, and the best model was chosen based on the RMSE on the training and test data. The search space and the final settings of each model are provided in Table 6.
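The tuning procedure can be sketched as an exhaustive search over a parameter grid. The code below is a generic, library-free illustration: the `fit` callback, the grid layout, and the choice to rank candidates by test RMSE while reporting the training RMSE alongside are assumptions, not the exact procedure behind Table 6.

```python
from itertools import product
from math import sqrt


def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def grid_search(fit, grid, train, test):
    """Try every combination in grid; return (params, train_rmse, test_rmse) of the best one.

    fit(params, X, y) must return a function that predicts y for a single x.
    """
    names = sorted(grid)
    best = None
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        predict = fit(params, *train)
        train_rmse, test_rmse = (
            rmse(y, [predict(x) for x in X]) for X, y in (train, test)
        )
        # Assumption: rank candidates by test RMSE and report the training RMSE alongside.
        if best is None or test_rmse < best[2]:
            best = (params, train_rmse, test_rmse)
    return best


# Toy usage with a hypothetical one-parameter model y = slope * x.
def fit_line(params, X, y):
    return lambda x: params["slope"] * x


best_params, train_err, test_err = grid_search(
    fit_line, {"slope": [1.0, 2.0, 3.0]}, ([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]), ([4.0], [8.0])
)
```

In practice the `fit` callback would wrap the training call of the chosen library, and comparing the two RMSE values for the selected configuration gives a quick check for overfitting.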
Statistics and performance metrics of the models
The precision of the models in predicting N2 solubility in normal alkanes was assessed using several statistical criteria, namely RMSE, R2, and SD. Table 7 reports the calculated values of these criteria for the training subset, the testing subset, and the entire dataset for all ML models. Overtraining is ruled out, given that no meaningful difference was seen between the testing and training subsets for any of the models. Based on Table 7, the CatBoost model has the lowest prediction errors among the developed ML models, with RMSE values of 0.0125, 0.0213, and 0.0147 for the training subset, testing subset, and the entire dataset, respectively. The CatBoost model also has the highest overall R2 (0.9943) and a lower SD than the other models, indicating that it fits the experimental data best. The random forest, XGBoost, LightGBM, and k-NN models rank after the CatBoost model, in that order.
As mentioned earlier, several EOSs were compared with the ML models in estimating N2 solubility in normal alkanes. To this end, the solubilities of N2 in several normal alkanes, namely hexadecane, eicosane, octacosane, and hexatriacontane, whose experimental values have been reported in the literature29,90, were estimated using the ML models and the EOSs. Tables 8, 9, 10 and 11 present the N2 solubility data and the predictions of the EOSs and ML models, along with the RMSE values for each of them. As can be seen, the CatBoost model provides the best estimates of N2 solubility among the ML models and EOSs for all of the normal alkanes considered. The ZJ EOS also gave precise solubility estimates and outperformed the other EOSs. In addition, as shown in Table 3, the Péneloux-type volume translation (c) was included in the PR and SRK EOSs for the sake of investigation. Based on our studies, the Péneloux-type volume translation does not have any effect on the obtained solubility values111,112.
Graphical analysis of the models
In the next step, the evaluation of the ML models is performed by graphical analysis. First, cross plots of the experimental N2 solubility data versus predicted values by the ML models for the training and testing stages are presented in Fig. 3. All five ML models performed well in both training and testing stages and most of the data points are accumulated around the X = Y line, although the scatter of points is much less for the CatBoost model and is more concentrated around the X = Y line, indicating the excellent performance of this model in estimating N2 solubility in normal alkanes.
Next, the distributions of the N2 solubility prediction errors (measured − predicted) of the ML models versus the experimental data are shown in Fig. 4. A high concentration of near-zero error points indicates that a predictive tool performs well in predicting N2 solubility in normal alkanes. Again, the CatBoost model resulted in near-zero errors, verifying its accuracy and reliability. Nevertheless, the other ML models, especially random forest, also show good predictions with low errors for N2 solubility in normal alkanes.
The next step of the graphical assessment of the ML models for predicting N2 solubility in normal alkanes concerns the frequency of errors. Figure 5 depicts the histograms of errors for the ML models proposed in this work. Symmetric distributions are clearly seen in the histograms of all ML models. Also, the sharp peaks at the zero-error value for all developed models confirm the close match between estimated and experimental N2 solubility data in normal alkanes. However, the percentage frequency of errors at the zero-error value is about 85% for the CatBoost model, much higher than for the other ML models, indicating the high reliability of this model in estimating N2 solubility in normal alkanes.
All the models used in this study show satisfactory performance. As is evident from the statistical and graphical analyses, the CatBoost model shows the best performance among the implemented ML models. The performance of a model depends on many factors, such as the case under study and the structure of the dataset, and the superiority of this model stems from two main reasons. The first is the structure of the dataset used in this work: many instances have equal values in n − 1 features and differ in only one feature. This property enables the tree-based models to split the data more effectively, which ultimately yields higher accuracy. Second, CatBoost models use symmetric trees, which gives faster inference. Also, its boosting scheme is the main factor that avoids overfitting and increases model quality after the training process. Finally, it should be noted that these advantages of CatBoost depend strongly on the dataset and cannot be generalized to all problems.
Pressure and temperature trend analysis
As the final assessment step, several visual evaluations were carried out to appraise the capability of the CatBoost model for various N2–hydrocarbon systems. Figure 6 presents the effect of pressure on N2 solubility in the n-decane system at a temperature of 503 K. It shows the N2 solubilities estimated by the CatBoost model for this case, together with the values determined by the EOSs and the experimental results from the literature87. The mismatch between the standard EOS estimates and the experimental data is quite significant at high temperatures. As seen in this figure, the CatBoost model predicts the experimental data quite well. As expected, the solubility of N2 in n-decane rises as the pressure increases. The EOSs, however, overestimate or underestimate the growth of N2 solubility with pressure, while the CatBoost model closely traces the trend.
The predictions of CatBoost and the other proposed ML models for N2 solubility in a light hydrocarbon (methane)61 at various pressures and a constant temperature of 180 K are provided in Fig. 7. All the intelligent models follow the trend well and show an increase in N2 solubility as pressure increases. As shown in this figure, the CatBoost model accurately recognizes the data pattern and provides excellent estimates at all pressures.
Finally, a similar trend analysis was performed to investigate the performance of the different ML models in estimating the N2 solubility in n-hexane at various temperatures and a constant pressure of 27.57 MPa74. Based on Fig. 8, as in the previous case, all the intelligent models capture the trend satisfactorily. However, the CatBoost model provides more accurate predictions. The figure also indicates an increase in N2 solubility as temperature rises.
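A trend analysis of this kind amounts to sweeping one input while holding the others constant and inspecting the predicted curve. The sketch below shows one way to do this for any model exposing a prediction function over the three inputs. The helper `sweep_trend` and the stand-in `toy_model` are hypothetical; `toy_model` is a made-up placeholder and is not the CatBoost model of this work.

```python
import numpy as np

def sweep_trend(predict, pressure, temperature, mol_weight):
    """Predict along a sweep and report whether the curve is monotonically increasing.

    One of the inputs may be an array (the swept variable); the others are scalars.
    """
    p, t, mw = np.broadcast_arrays(np.asarray(pressure, dtype=float),
                                   np.asarray(temperature, dtype=float),
                                   np.asarray(mol_weight, dtype=float))
    X = np.column_stack([p.ravel(), t.ravel(), mw.ravel()])
    y = np.asarray(predict(X), dtype=float)
    return y, bool(np.all(np.diff(y) >= 0))

def toy_model(X):
    # Hypothetical placeholder for a trained regressor, NOT fitted to any data.
    p, t, mw = X[:, 0], X[:, 1], X[:, 2]
    return 1.0 - np.exp(-p * t * np.sqrt(mw) / 1.0e5)

# Pressure sweep at 503 K for n-decane (Mw about 142.28 g/mol).
pressures = np.linspace(1.0, 30.0, 8)
solubility, increasing = sweep_trend(toy_model, pressures, 503.0, 142.28)
print(increasing)
```

With a trained model, `toy_model` would be replaced by that model's prediction function, and the returned curve can be plotted against the experimental points.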
Sensitivity analysis
Using the CatBoost model, the best model developed in the current study, a sensitivity analysis was performed. To this end, the relevancy factor (r)113 was calculated for each input parameter using the following equation; the higher the absolute r-value, the greater the impact of that parameter on the model's output. A positive r-value for a parameter indicates a direct effect on the output of the model, and a negative value an inverse effect114.
$$ r(I_i, NS) = \frac{\sum_{j=1}^{n} \left(I_{i,j} - I_{m,i}\right)\left(NS_j - NS_m\right)}{\sqrt{\sum_{j=1}^{n} \left(I_{i,j} - I_{m,i}\right)^2 \sum_{j=1}^{n} \left(NS_j - NS_m\right)^2}} $$
where $I_{i,j}$ represents the jth value of the ith input variable (i is molecular weight of normal alkanes, pressure, or temperature); $I_{m,i}$ is the mean value of the ith input; $NS_m$ and $NS_j$ denote the mean value and the jth value of the predicted N2 solubility in normal alkanes, respectively; and n is the number of data points. The outcomes of the relevancy factor analysis are depicted in Fig. 9. According to Fig. 9, all input parameters, namely temperature, pressure, and molecular weight of normal alkanes, have a positive effect on N2 solubility in normal alkanes. The results reveal that pressure has the greatest impact on N2 solubility in normal alkanes and that the N2 solubility increases with increasing molecular weight of the normal alkanes.
According to Henry's law, the amount of gas dissolved in a liquid is proportional to its partial pressure in equilibrium with that liquid. When the gas is at a higher pressure, its molecules collide more often with each other and with the surface of the liquid. As the molecules collide more with the surface of the liquid, they can squeeze between the liquid molecules and thus become part of the solution115,116.
On the other hand, the sensitivity analysis shows that, overall, the solubility of N2 in normal alkanes increases when the temperature increases. This is the reverse order solubility phenomenon, the opposite of what commonly happens for a binary mixture of a supercritical component and a subcritical component73,81. It may be due to the repulsive nature of the N2–N2 interaction: the N2–N2 repulsive force decreases with an increase in temperature, which results in increased solubility of N2 at higher temperatures. However, the increase of N2 solubility with temperature may not hold for all normal alkanes, and a literature survey shows that the N2 solubility in methane and ethane decreases with increasing temperature117.
Normal alkanes are nonpolar, as they contain nothing but C–C and C–H bonds. N2 is also a nonpolar molecule, and nonpolar substances tend to dissolve in nonpolar solvents such as normal alkanes. The molecular weight of the normal alkanes increases mainly through the addition of C–C and C–H bonds. Consequently, the N2 solubility increases as the number or length of the nonpolar chains increases.
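The relevancy factor described in this section is a Pearson-type correlation between each input variable and the predicted solubility, so it can be computed in a few lines. The following is a minimal sketch under that reading; the function name `relevancy_factor` and the sample values are assumptions made for illustration and are not data from the study.

```python
import numpy as np

def relevancy_factor(inputs, output):
    """Relevancy factor r between each input column and the model output."""
    X = np.asarray(inputs, dtype=float)   # shape (n_points, n_inputs)
    y = np.asarray(output, dtype=float)   # shape (n_points,)
    dx = X - X.mean(axis=0)               # deviation of each input from its mean
    dy = y - y.mean()                     # deviation of the output from its mean
    numerator = (dx * dy[:, None]).sum(axis=0)
    denominator = np.sqrt((dx ** 2).sum(axis=0) * (dy ** 2).sum())
    return numerator / denominator

# Hypothetical rows of [Mw (g/mol), pressure (MPa), temperature (K)] and predicted solubility.
X = [[16.0, 1.0, 100.0], [30.0, 2.0, 90.0], [44.0, 3.0, 120.0], [58.0, 4.0, 80.0]]
ns_pred = [0.1, 0.2, 0.3, 0.4]
r = relevancy_factor(X, ns_pred)
print(dict(zip(["Mw", "P", "T"], np.round(r, 3))))
```

Values of r lie between −1 and +1; the sign gives the direction of the effect and the magnitude its strength.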
Conclusions
In the present work, N2 solubility in normal alkanes (nC1 to nC36) was modeled using five representative ML models, namely CatBoost, k-NN, LightGBM, random forest, and XGBoost, by utilizing a large N2 solubility databank covering wide ranges of operating temperature (91.21–703.4 K) and pressure (0.0212–69.12 MPa). Also, five EOSs, namely RK, SRK, ZJ, PR, and PC-SAFT, were used comparatively with the ML models to estimate N2 solubility in normal alkanes. The developed CatBoost model was superior to all the other ML models and the EOSs, with an overall RMSE of 0.0147 and R2 of 0.9943. The random forest, XGBoost, LightGBM, and k-NN models ranked after the CatBoost model, in that order. Furthermore, the ZJ EOS showed the best performance among the EOSs. Finally, the relevancy factor analysis indicated that all input variables of the models, namely temperature, pressure, and molecular weight of normal alkanes, have a positive effect on N2 solubility in normal alkanes, with pressure having the greatest effect. The solubility of N2 increases with increasing molecular weight of the normal alkanes.
Abbreviations
- CARTs: Classification and regression trees
- CNN: Convolutional neural network
- EOR: Enhanced oil recovery
- EOS: Equation of state
- exp: Experimental
- k-NN: K-nearest neighbors
- ML: Machine learning
- Mw: Molecular weight
- NS: Nitrogen solubility
- PC-SAFT: Perturbed-chain statistical associating fluid theory
- PR: Peng–Robinson EOS
- pred: Predicted
- RMSE: Root mean square error
- RK: Redlich–Kwong EOS
- SAFT: Statistical associating fluid theory
- SRK: Soave–Redlich–Kwong EOS
- SD: Standard deviation
- SW: Schmidt–Wenzel EOS
- VLE: Vapor–liquid equilibria
- XGBoost: Extreme gradient boosting
- N2: Nitrogen
- R2: Coefficient of determination
- Pc: Critical pressure
- Tc: Critical temperature
References
Baukal, C. E., Hayes, R., Grant, M., Singh, P. & Foote, D. Nitrogen oxides emissions reduction technologies in the petrochemical and refining industries. Environ. Prog. 23(1), 19–28 (2004).
Hodges, A., Fica, Z., Wanlass, J., VanDarlin, J. & Sims, R. Nutrient and suspended solids removal from petrochemical wastewater via microalgal biofilm cultivation. Chemosphere 174, 46–48 (2017).
Carvalho, M. A. F. D. et al. A potential material for removal of nitrogen compounds in petroleum and petrochemical derivates. Chem. Eng. Commun. 208, 1564–1579 (2020).
Ahmed, T., Menzie, D. & Crichlow, H. Preliminary experimental results of high-pressure nitrogen injection for EOR systems. Soc. Petrol. Eng. J. 23(02), 339–348 (1983).
Rezaei, M., Shadizadeh, S., Vosoughi, M. & Kharrat, R. An experimental investigation of sequential CO2 and N2 gas injection as a new EOR method. Energy Sources A 36(17), 1938–1948 (2014).
Heucke, U. Nitrogen injection as IOR/EOR solution for North African oil fields. In SPE North Africa Technical Conference and Exhibition, OnePetro (2015).
Tovar, F. D., Barrufet, M. A. & Schechter, D. S. Enhanced oil recovery in the wolfcamp shale by carbon dioxide or nitrogen injection: An experimental investigation. SPE J. 26(01), 515–537 (2021).
Ameli, F., Hemmati-Sarapardeh, A., Schaffie, M., Husein, M. M. & Shamshirband, S. Modeling interfacial tension in N2/n-alkane systems using corresponding state theory: Application to gas injection processes. Fuel 222, 779–791 (2018).
Barati-Harooni, A. et al. Estimation of minimum miscibility pressure (MMP) in enhanced oil recovery (EOR) process by N2 flooding using different computational schemes. Fuel 235, 1455–1474 (2019).
De Santis, L., Parmegiani, L. & Scarica, C. Changing perspectives on liquid nitrogen use and storage. J. Assist. Reprod. Genet. 38(4), 783–784 (2021).
Prandi, B. et al. Food wastes from agrifood industry as possible sources of proteins: A detailed molecular view on the composition of the nitrogen fraction, amino acid profile and racemisation degree of 39 food waste streams. Food Chem. 286, 567–575 (2019).
Wang, H. et al. Improving the functionality of proso millet protein and its potential as a functional food ingredient by applying nitrogen fertiliser. Foods 10(6), 1332 (2021).
Winkler, M. K. & Straka, L. New directions in biological nitrogen removal and recovery from wastewater. Curr. Opin. Biotechnol. 57, 50–55 (2019).
Vollmer, A. C. & Bark, S. J. Twenty-five years of investigating the universal stress protein: Function, structure, and applications. Adv. Appl. Microbiol. 102, 1–36 (2018).
Han, A. et al. A polymer encapsulation strategy to synthesize porous nitrogen-doped carbon-nanosphere-supported metal isolated-single-atomic-site catalysts. Adv. Mater. 30(15), 1706508 (2018).
Vandenbossche, M. & Hegemann, D. Recent approaches to reduce aging phenomena in oxygen-and nitrogen-containing plasma polymer films: An overview. Curr. Opin. Solid State Mater. Sci. 22(1), 26–38 (2018).
Fahandezhsaadi, M. et al. Laboratory evaluation of nitrogen injection for enhanced oil recovery: Effects of pressure and induced fractures. Fuel 253, 607–614 (2019).
Fathinasab, M., Ayatollahi, S. & Hemmati-Sarapardeh, A. A rigorous approach to predict nitrogen-crude oil minimum miscibility pressure of pure and nitrogen mixtures. Fluid Phase Equilib. 399, 30–39 (2015).
Hemmati-Sarapardeh, A., Mohagheghian, E., Fathinasab, M. & Mohammadi, A. H. Determination of minimum miscibility pressure in N2–crude oil system: A robust compositional model. Fuel 182, 402–410 (2016).
Zhao, H., Morgado, P., Gil-Villegas, A. & McCabe, C. Predicting the phase behavior of nitrogen+ n-alkanes for enhanced oil recovery from the SAFT-VR approach: Examining the effect of the quadrupole moment. J. Phys. Chem. B 110(47), 24083–24092 (2006).
Liang, S. et al. Study on EOR method in offshore oilfield: Combination of polymer microspheres flooding and nitrogen foam flooding. J. Petrol. Sci. Eng. 178, 629–639 (2019).
Burrows, L. C. et al. A literature review of CO2, natural gas, and water-based fluids for enhanced oil recovery in unconventional reservoirs. Energy Fuels 34(5), 5331–5380 (2020).
Xiaofeng, D., Yongchun, H. & Weimao, P. Nitrogen dry replacement technology in natural gas pipeline and its practical application. Chem. Eng. Oil Gas/Shi You Yu Tian Ran Qi Hua Gong 40(3), 325–328 (2011).
Kameya, T. et al. Nitrogen purge condition for simultaneous GC/MS measurement of chemicals. J. Water Environ. Technol. 12(2), 161–175 (2014).
Yanisko, P., Zheng, S., Dumoit, J. & Carlson, B. Nitrogen: A security blanket for the chemical industry. Chem. Eng. Prog. 107(11), 50–55 (2011).
Gao, W., Gasem, K. A. & Robinson, R. L. Solubilities of nitrogen in selected naphthenic and aromatic hydrocarbons at temperatures from 344 to 433 K and pressures to 22.8 MPa. J. Chem. Eng. Data 44(2), 185–189 (1999).
Zirrahi, M., Hassanzadeh, H., Abedi, J. & Moshfeghian, M. Prediction of solubility of CH4, C2H6, CO2, N2 and CO in bitumen. Can. J. Chem. Eng. 92(3), 563–572 (2014).
Haddadnia, A., Zirrahi, M., Hassanzadeh, H. & Abedi, J. Solubility and thermo-physical properties measurement of CO2-and N2-Athabasca bitumen systems. J. Petrol. Sci. Eng. 154, 277–283 (2017).
Tong, J., Gao, W., Robinson, R. L. & Gasem, K. A. Solubilities of nitrogen in heavy normal paraffins from 323 to 423 K at pressures to 18.0 MPa. J. Chem. Eng. Data 44(4), 784–787 (1999).
Van Konynenburg, P. & Scott, R. Critical lines and phase equilibria in binary van der Waals mixtures. Philos. Trans. R. Soc. Lond. A 298(1442), 495–540 (1980).
Privat, R. & Jaubert, J.-N. Classification of global fluid-phase equilibrium behaviors in binary systems. Chem. Eng. Res. Des. 91(10), 1807–1839 (2013).
Jamali, M., Izadpanah, A. A. & Mofarahi, M. Correlation and prediction of solubility of hydrogen in alkenes and its dissolution properties. Appl. Petrochem. Res. 11, 89–98 (2021).
Park, J., Robinson, R. L. & Gasem, K. A. Solubilities of hydrogen in aromatic hydrocarbons from 323 to 433 K and pressures to 21.7 MPa. J. Chem. Eng. Data 41(1), 70–73 (1996).
Li, H. & Yan, J. Evaluating cubic equations of state for calculation of vapor–liquid equilibrium of CO2 and CO2-mixtures for CO2 capture and storage processes. Appl. Energy 86(6), 826–836 (2009).
Schwarz, B. J. & Prausnitz, J. M. Solubilities of methane, ethane, and carbon dioxide in heavy fossil-fuel fractions. Ind. Eng. Chem. Res. 26(11), 2360–2366 (1987).
Tsuji, T., Shinya, Y., Hiaki, T. & Itoh, N. Hydrogen solubility in a chemical hydrogen storage medium, aromatic hydrocarbon, cyclic hydrocarbon, and their mixture for fuel cell systems. Fluid Phase Equilib. 228, 499–503 (2005).
Twu, C. H., Coon, J. E., Harvey, A. H. & Cunningham, J. R. An approach for the application of a cubic equation of state to hydrogen−hydrocarbon systems. Ind. Eng. Chem. Res. 35(3), 905–910 (1996).
D’Avila, S. G., Kaul, B. K. & Prausnitz, J. M. Solubilities of heavy hydrocarbons in compressed methane and nitrogen. J. Chem. Eng. Data 21(4), 488–491 (1976).
Privat, R., Jaubert, J.-N. & Mutelet, F. Addition of the nitrogen group to the PPR78 model (predictive 1978, Peng Robinson EOS with temperature-dependent kij calculated through a group contribution method). Ind. Eng. Chem. Res. 47(6), 2033–2048 (2008).
Privat, R., Jaubert, J.-N. & Mutelet, F. Use of the PPR78 model to predict new equilibrium data of binary systems involving hydrocarbons and nitrogen. Comparison with other GCEOS. Ind. Eng. Chem. Res. 47(19), 7483–7489 (2008).
Justo-García, D. N., García-Sánchez, F., Stateva, R. P. & García-Flores, B. E. Modeling of the multiphase behavior of nitrogen-containing systems at low temperatures with equations of state. J. Chem. Eng. Data 54(9), 2689–2695 (2009).
Justo-García, D. N., García-Sánchez, F., Díaz-Ramírez, N. L. & Díaz-Herrera, E. Modeling of three-phase vapor–liquid–liquid equilibria for a natural-gas system rich in nitrogen with the SRK and PC-SAFT EoS. Fluid Phase Equilib. 298(1), 92–96 (2010).
Haghbakhsh, R., Parvaneh, K. & Esmaeilzadeh, F. New models for the binary interaction parameters of nitrogen–alkanes mixtures based on the cubic equations of state. Chem. Eng. Commun. 205(7), 914–928 (2018).
Wu, H., Zheng, K., Wang, G., Yang, Y. & Li, Y. Modeling of gas solubility in hydrocarbons using the perturbed-chain statistical associating fluid theory equation of state. Ind. Eng. Chem. Res. 58(27), 12347–12360 (2019).
Tsuji, T. et al. Gas solubilities of nitrogen or oxygen in benzene, divinylbenzene, styrene and of an equimolar (N2: O2) mixture in styrene at (293–313) K. Fluid Phase Equilib. 492, 34–40 (2019).
Aguilar-Cisneros, H., Uribe-Vargas, V. & Carreon-Calderon, B. Estimation of gas solubility in petroleum fractions using PR-UMR and group contributions methods. Fuel 275, 117911 (2020).
Abdi-Khanghah, M., Bemani, A., Naserzadeh, Z. & Zhang, Z. Prediction of solubility of N-alkanes in supercritical CO2 using RBF-ANN and MLP-ANN. J. CO2 Util. 25, 108–119 (2018).
Songolzadeh, R., Shahbazi, K. & Madani, M. Modeling n-alkane solubility in supercritical CO2 via intelligent methods. J. Pet. Explor. Prod. 11(1), 279–287 (2021).
Chakraborty, S., Sun, Y., Lin, G. & Qiao, L. Vapor-liquid equilibrium predictions of n-alkane/nitrogen mixtures using neural networks. arXiv preprint (2020).
Mohammadi, M.-R. et al. Modeling hydrogen solubility in hydrocarbons using extreme gradient boosting and equations of state. Sci. Rep. 11(1), 1–20 (2021).
Mohammadi, M.-R. et al. Modeling of nitrogen solubility in unsaturated, cyclic, and aromatic hydrocarbons: Deep learning methods and SAFT equation of state. J. Taiwan Inst. Chem. Eng. https://doi.org/10.1016/j.jtice.2021.10.024 (2021).
Makranczy, J., Megyery-Balog, K. M., Rusz, L. & Patyi, L. Solubility of gases in normal-alkanes. Hung. J. Ind. Chem. 4(1), 269–280 (1976).
Wilcock, R. J., Battino, R., Danforth, W. F. & Wilhelm, E. Solubilities of gases in liquids II. The solubilities of He, Ne, Ar, Kr, O2, N2, CO, CO2, CH4, CF4, and SF6 in n-octane 1-octanol, n-decane, and 1-decanol. J. Chem. Thermodyn. 10(9), 817–822 (1978).
Tremper, K. K. & Prausnitz, J. M. Solubility of inorganic gases in high-boiling hydrocarbon solvents. J. Chem. Eng. Data 21(3), 295–299 (1976).
Bloomer, O. T. & Rao, K. N. Thermodynamic Properties of Nitrogen (Institute of Gas Technology, 1952).
Cheung, H. & Wang, D.-J. Solubility of volatile gases in hydrocarbon solvents at cryogenic temperatures. Ind. Eng. Chem. Fundam. 3(4), 355–361 (1964).
Chang, S.-D. & Lu, B. C. Vapor-Liquid Equilibriums in the Nitrogen-Methane-Ethane System (University of Ottawa, 1967).
Miller, R., Kidnay, A. & Hiza, M. Liquid-vapor equilibria at 112.00 K for systems containing nitrogen, argon, and methane. AIChE J. 19(1), 145–151 (1973).
Parrish, W. & Hiza, M. Liquid-vapor equilibria in the nitrogen-methane system between 95 and 120 K. In Advances in Cryogenic Engineering 300–308 (Springer, 1995).
Stryjek, R., Chappelear, P. S. & Kobayashi, R. Low-temperature vapor-liquid equilibriums of nitrogen-methane system. J. Chem. Eng. Data 19(4), 334–339 (1974).
Kidnay, A., Miller, R., Parrish, W. & Hiza, M. Liquid-vapour phase equilibria in the N2-CH4 system from 130 to 180 K. Cryogenics 15(9), 531–540 (1975).
Eakin, B. E., Ellington, R. & Gami, D. Physical-Chemical Properties of Ethane-Nitrogen Mixtures (Institute of Gas Technology, 1955).
Stryjek, R., Chappelear, P. S. & Kobayashi, R. Low-temperature vapor-liquid equilibriums of nitrogen-ethane system. J. Chem. Eng. Data 19(4), 340–343 (1974).
Grausø, L., Fredenslund, A. & Mollerup, J. Vapour-liquid equilibrium data for the systems C2H6+ N2, C2H4+ N2, C3H8+ N2, and C3H6+ N2. Fluid Phase Equilib. 1(1), 13–26 (1977).
Gupta, M. K., Gardner, G. C., Hegarty, M. J. & Kidnay, A. J. Liquid-vapor equilibriums for the N2+ CH4+ C2H6 system from 260 to 280 K. J. Chem. Eng. Data 25(4), 313–318 (1980).
Schindler, D., Swift, G. & Kurata, F. More low temperature VL design data. Hydrocarb. Process. 45(11), 205 (1966).
Poon, D. & Lu, B.-Y. Phase equilibria for systems containing nitrogen, methane, and propane. In Advances in Cryogenic Engineering 292–299 (Springer, 1995).
Frolich, P. K., Tauch, E., Hogan, J. & Peer, A. Solubilities of gases in liquids at high pressure. Ind. Eng. Chem. 23(5), 548–550 (1931).
Akers, W., Attwell, L. & Robinson, J. Nitrogen-butane system. Ind. Eng. Chem. 46(12), 2539–2540 (1954).
Roberts, L. & McKetta, J. J. Vapor-liquid equilibrium in the n-butane-nitrogen system. AIChE J. 7(1), 173–174 (1961).
Skripka, V., Barsuk, S., Nikitina, I., Gubkina, G. & Benyaminovich, O. Liquid-vapor equilibriums in a nitrogen-n-butane system. GazoV. Promst 14(4), 41–45 (1969).
Kalra, H., Robinson, D. B. & Besserer, G. J. The equilibrium phase properties of the nitrogen-n-pentane system. J. Chem. Eng. Data 22(2), 215–218 (1977).
Silva-Oliver, G., Eliosa-Jiménez, G., García-Sánchez, F. & Avendaño-Gómez, J. R. High-pressure vapor–liquid equilibria in the nitrogen–n-pentane system. Fluid Phase Equilib. 250(1–2), 37–48 (2006).
Poston, R. & McKetta, J. Vapor-liquid equilibrium in the methane-n-hexane system. J. Chem. Eng. Data 11(3), 362–363 (1966).
Baranovich, Z., Bogdanova, L. & Smirnova, A. Solubility of argon in n-hexane at low temperatures. Russ. J. Appl. Chem. 42(6), 1393–1396 (1969).
Eliosa-Jiménez, G., Silva-Oliver, G., García-Sánchez, F. & de Ita de la Torre, A. High-pressure vapor–liquid equilibria in the nitrogen+ n-hexane system. J. Chem. Eng. Data 52(2), 395–404 (2007).
Boomer, E., Johnson, C. & Piercey, A. Equilibria in two-phase, gas-liquid hydrocarbon systems: IV. Methane and heptane. Can. J. Res. 16(11), 396–410 (1938).
Akers, W., Kehn, D. & Kilgore, C. Volumetric and phase behavior of nitrogen-hydrogen systems: Nitrogen-n-heptane system. Ind. Eng. Chem. 46(12), 2536–2539 (1954).
Peter, S. & Eicke, H. Phase equilibrium in the systems nitrogen-n-heptane, nitrogen-2,2,4-trimethylpentane, and nitrogen-methylcyclohexane at higher pressures and temperatures. Ber. Bunsen-Ges 74(3), 190–194 (1970).
Brunner, G., Peter, S. & Wenzel, H. Phase equilibrium in the systems n-heptane-nitrogen, methylcyclohexane-nitrogen and n-heptane-methylcyclohexane-nitrogen at high pressures. Chem. Eng. J. 7(2), 99–104 (1974).
García-Sánchez, F., Eliosa-Jiménez, G., Silva-Oliver, G. & Godínez-Silva, A. High-pressure (vapor+ liquid) equilibria in the (nitrogen+ n-heptane) system. J. Chem. Thermodyn. 39(6), 893–905 (2007).
Graham, E. & Weale, K. The Solubility of Compressed Gases in Non-Polar Liquids. In Progress in International Research on Thermodynamic and Transport Properties 153–158 (Elsevier, 1962).
Baranovich, Z. SOLUBILITE DE N2 DANS LE N-HEXANE ET LE N-OCTANE A BASSES T. (1972).
Eliosa-Jiménez, G., García-Sánchez, F., Silva-Oliver, G. & Macías-Salinas, R. Vapor–liquid equilibrium data for the nitrogen+ n-octane system from (344.5 to 543.5) K and at pressures up to 50 MPa. Fluid Phase Equilib. 282(1), 3–10 (2009).
Silva-Oliver, G., Eliosa-Jiménez, G., García-Sánchez, F. & Avendaño-Gómez, J. R. High-pressure vapor–liquid equilibria in the nitrogen–n-nonane system. J. Supercrit. Fluids 42(1), 36–47 (2007).
Azarnoosh, A. & McKetta, J. Nitrogen-n-decane system in the two-phase region. J. Chem. Eng. Data 8(4), 494–496 (1963).
García-Sánchez, F., Eliosa-Jimenez, G., Silva-Oliver, G. & Garcia-Flores, B. E. Vapor−liquid equilibrium data for the nitrogen+ n-decane system from (344 to 563) K and at pressures up to 50 MPa. J. Chem. Eng. Data 54(5), 1560–1568 (2009).
Rupprecht, S. D. & Faeth, G. Investigation of Air Solubility in Jet A Fuel at High Pressures (NASA, 1981).
García-Córdova, T., Justo-García, D. N., García-Flores, B. E. & García-Sánchez, F. Vapor−liquid equilibrium data for the nitrogen+ dodecane system at temperatures from (344 to 593) K and at pressures up to 60 MPa. J. Chem. Eng. Data 56(4), 1555–1564 (2011).
Sultanov, R., Skripka, V. & Namiot, A. Phase equilibria in the systems methane–n-hexadecane and nitrogen–n-hexadecane at high temperatures and pressures. Deposited Doc. VINITI 2888-71 (1971).
Lin, H.-M., Kim, H. & Chao, K.-C. Gas-liquid equilibria in nitrogen+ n-hexadecane mixtures at elevated temperatures and pressures. Fluid Phase Equilib. 7(2), 181–185 (1981).
Altman, N. S. An introduction to kernel and nearest-neighbor nonparametric regression. Am. Stat. 46(3), 175–185 (1992).
Thanh Noi, P. & Kappas, M. Comparison of random forest, k-nearest neighbor, and support vector machine classifiers for land cover classification using Sentinel-2 imagery. Sensors 18(1), 18 (2018).
Breiman, L. Bagging predictors. Mach. Learn. 24(2), 123–140 (1996).
Chen, T. & Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 785–794 (2016).
Dev, V. A. & Eden, M. R. Gradient boosted decision trees for lithology classification. Comput. Aided Chem. Eng. 47, 113–118 (2019).
Yang, X., Dindoruk, B. & Lu, L. A comparative analysis of bubble point pressure prediction using advanced machine learning algorithms and classical correlations. J. Pet. Sci. Eng. 185, 106598 (2020).
Ke, G. et al. Lightgbm: A highly efficient gradient boosting decision tree. Adv. Neural. Inf. Process. Syst. 30, 3146–3154 (2017).
Prokhorenkova, L., Gusev, G., Vorobev, A., Dorogush, A. V. & Gulin, A. CatBoost: Unbiased boosting with categorical features. arXiv preprint (2017).
Dorogush, A. V., Ershov, V. & Gulin, A. CatBoost: Gradient boosting with categorical features support. arXiv preprint (2018).
Meng, Q. et al. A communication-efficient parallel algorithm for decision tree. arXiv preprint (2016).
Ronze, D., Fongarland, P., Pitault, I. & Forissier, M. Hydrogen solubility in straight run gasoil. Chem. Eng. Sci. 57(4), 547–553 (2002).
Pedersen, K. S., Christensen, P. L. & Shaikh, J. A. Phase Behavior of Petroleum Reservoir Fluids (CRC Press, 2014).
Péneloux, A., Rauzy, E. & Fréze, R. A consistent correction for Redlich-Kwong-Soave volumes. Fluid Phase Equilib. 8(1), 7–23 (1982).
Gross, J. & Sadowski, G. Perturbed-chain SAFT: An equation of state based on a perturbation theory for chain molecules. Ind. Eng. Chem. Res. 40(4), 1244–1260 (2001).
Chen, Y., Mutelet, F. & Jaubert, J.-N. Modeling the solubility of carbon dioxide in imidazolium-based ionic liquids with the PC-SAFT equation of state. J. Phys. Chem. B 116(49), 14375–14388 (2012).
Kwak, T. & Mansoori, G. Van der Waals mixing rules for cubic equations of state. Applications for supercritical fluid extraction modelling. Chem. Eng. Sci. 41(5), 1303–1309 (1986).
Florusse, L., Peters, C., Pamies, J., Vega, L. F. & Meijer, H. Solubility of hydrogen in heavy n-alkanes: Experiments and saft modeling. AIChE J. 49(12), 3260–3269 (2003).
Tihic, A., Kontogeorgis, G. M., von Solms, N. & Michelsen, M. L. Applications of the simplified perturbed-chain SAFT equation of state using an extended parameter table. Fluid Phase Equilib. 248(1), 29–43 (2006).
Pedregosa, F. et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
Jaubert, J.-N., Privat, R., Le Guennec, Y. & Coniglio, L. Note on the properties altered by application of a Péneloux-type volume translation to an equation of state. Fluid Phase Equilib. 419, 88–95 (2016).
Privat, R., Jaubert, J.-N. & Le Guennec, Y. Incorporation of a volume translation in an equation of state for fluid mixtures: Which combining rule? Which effect on properties of mixing?. Fluid Phase Equilib. 427, 414–420 (2016).
Chen, G. et al. The genetic algorithm based back propagation neural network for MMP prediction in CO2-EOR process. Fuel 126, 202–212 (2014).
Mohammadi, M.-R., Hemmati-Sarapardeh, A., Schaffie, M., Husein, M. M. & Ranjbar, M. Application of cascade forward neural network and group method of data handling to modeling crude oil pyrolysis during thermal enhanced oil recovery. J. Pet. Sci. Eng. 205, 108836 (2021).
Vallero, D. Fundamentals of Air Pollution (Academic Press, 2014).
Battino, R. The Ostwald coefficient of gas solubility. Fluid Phase Equilib. 15(3), 231–240 (1984).
Kumar, P. & Chevrier, V. F. Solubility of nitrogen in methane, ethane, and mixtures of methane and ethane at Titan-like conditions: A molecular dynamics study. ACS Earth Space Chem. 4(2), 241–248 (2020).
Author information
Contributions
S.A.M.: Investigation, Modeling, Visualization, Writing-Original Draft, M.-R.M.: Investigation, Data curation, Visualization, Writing-Original Draft, S.A.: Writing-Review & Editing, Methodology, Validation, A.A.: Writing-Review & Editing, Validation, A.H.-S.: Methodology, Validation, Supervision, Writing-Review & Editing, A.M.: Writing-Review & Editing, Validation.
Ethics declarations
Competing interests
The authors declare no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Madani, S.A., Mohammadi, MR., Atashrouz, S. et al. Modeling of nitrogen solubility in normal alkanes using machine learning methods compared with cubic and PC-SAFT equations of state. Sci Rep 11, 24403 (2021). https://doi.org/10.1038/s41598-021-03643-8
DOI: https://doi.org/10.1038/s41598-021-03643-8