Modeling of nitrogen solubility in normal alkanes using machine learning methods compared with cubic and PC-SAFT equations of state

Madani, Seyed Ali; Mohammadi, Mohammad-Reza; Atashrouz, Saeid; Abedi, Ali; Hemmati-Sarapardeh, Abdolhossein; Mohaddespour, Ahmad

doi:10.1038/s41598-021-03643-8


Article
Open access
Published: 22 December 2021

Volume 11, article number 24403, (2021)

Scientific Reports


Seyed Ali Madani¹,
Mohammad-Reza Mohammadi²,
Saeid Atashrouz³,
Ali Abedi⁴,
Abdolhossein Hemmati-Sarapardeh^2,5,6 &
…
Ahmad Mohaddespour⁴


Abstract

Accurate prediction of the solubility of gases in hydrocarbons is a crucial factor in designing enhanced oil recovery (EOR) operations by gas injection as well as separation, and chemical reaction processes in a petroleum refinery. In this work, nitrogen (N₂) solubility in normal alkanes as the major constituents of crude oil was modeled using five representative machine learning (ML) models namely gradient boosting with categorical features support (CatBoost), random forest, light gradient boosting machine (LightGBM), k-nearest neighbors (k-NN), and extreme gradient boosting (XGBoost). A large solubility databank containing 1982 data points was utilized to establish the models for predicting N₂ solubility in normal alkanes as a function of pressure, temperature, and molecular weight of normal alkanes over broad ranges of operating pressure (0.0212–69.12 MPa) and temperature (91–703 K). The molecular weight range of normal alkanes was from 16 to 507 g/mol. Also, five equations of state (EOSs) including Redlich–Kwong (RK), Soave–Redlich–Kwong (SRK), Zudkevitch–Joffe (ZJ), Peng–Robinson (PR), and perturbed-chain statistical associating fluid theory (PC-SAFT) were used comparatively with the ML models to estimate N₂ solubility in normal alkanes. Results revealed that the CatBoost model is the most precise model in this work with a root mean square error of 0.0147 and coefficient of determination of 0.9943. ZJ EOS also provided the best estimates for the N₂ solubility in normal alkanes among the EOSs. Lastly, the results of relevancy factor analysis indicated that pressure has the greatest influence on N₂ solubility in normal alkanes and the N₂ solubility increases with increasing the molecular weight of normal alkanes.


Introduction

Interactions between gases and liquids are an integral part of many industrial processes and play major roles in industries such as petrochemicals^1,2,3, oil and gas^4,5,6,7,8,9, medicine¹⁰, food^11,12, environment^13,14, and polymers^15,16. Among the gases commonly present in these settings, colorless and odorless nitrogen (N₂) is one of the most frequent feeds or products. Its presence as the dominant component of the atmosphere also makes it an important case to investigate accurately. The oil and gas industry is no exception, and N₂ applications are found across this industry, from upstream to downstream. As a clear example, N₂ has been used for several decades in enhanced oil recovery (EOR) operations because of its unique properties^17,18,19. Usually, carbon dioxide (CO₂) or N₂ is continuously injected into the oil reservoir for miscible/immiscible oil displacement. These gases are extracted back out with the recovered oil, recaptured, and reinjected along with new gas until as much oil as possible is produced²⁰. Lower cost and higher feasibility give N₂ some advantages over CO₂ and methane (CH₄)^21,22. However, N₂ has been commonly utilized in deep reservoirs, as it needs a higher injection pressure than CO₂ to become miscible with the reservoir fluids²⁰. Also, in the midstream, N₂ is used in pipeline drying, which is an essential part of pipeline commissioning to prevent unwanted aerosols by displacing contaminants²³. 
There are many significant instances of N₂ usage downstream, such as nitrogen purging, a technique that avoids unintended reactions between hazardous gases and hydrocarbons by reducing the oxygen content of environments susceptible to explosion²⁴. A similar technique is used in nitrogen blanketing²⁵ of hydrocarbon storage tanks. Crude oil is a complex mixture of hydrocarbons, so achieving reliable predictions of the thermodynamic and phase equilibrium data of N₂/oil systems is difficult. Alkanes are the major constituents of crude oil and most petroleum products. Therefore, in many studies, the behavior of alkanes with the gas of interest, such as N₂, is studied first, and the obtained information is later generalized to crude oil.

Solubility is one of the most important thermodynamic quantities, representing the amount of a gas that dissolves in a liquid at a specific pressure and temperature. While many analytical methods are used to calculate the solubilities of gases in liquids, mainly through equations of state (EOSs)^26,27,28,29, the accuracy of their predictions, especially in some critical industrial applications, remains a serious challenge. Based on previous experiments, the solubility of N₂ in hydrocarbons increases with increasing pressure and temperature^26,27,28. Furthermore, as the molecular weight rises, N₂ solubility increases, as evidenced by laboratory experiments²⁹. Properly estimating phase equilibrium data in binary systems containing N₂ and a hydrocarbon is difficult because, based on the classification scheme of Van Konynenburg and Scott^30,31, such binary systems are recognized as type III phase diagrams, except the binary system of N₂ + CH₄, which is recognized as a type I system^30,31. Operations that use N₂ carry a risk of energy waste and potential hazards. As a result, solubility data are critical for predicting an appropriate quantity of N₂ to use in these operations and can improve plant safety. Studies with heavy hydrocarbons are particularly challenging due to their complexity. Furthermore, the dangers of high-temperature and/or high-pressure conditions in industrial operations make extensive experiments an undesirable option. As a result, modelling based on existing experimental data is an alternative.

The strategies for predicting N₂ solubility in hydrocarbon solvents or petroleum blends rely mainly on experiments and semi-empirical models like EOSs, and are comparable to those utilized to estimate the solubility of other gases like CH₄, CO₂, and hydrogen^{32,33,34,35,36,37}. Davila et al.³⁸ determined the vapor-phase solubilities of n-decane, tert-butylbenzene, 2,2,5-trimethylhexane, and n-dodecane in compressed N₂, and the second virial cross coefficients ($B_{12}$) were computed using these data³⁸. A static equilibrium cell was used by Tong et al.²⁹ to measure the solubilities of N₂ in four n-paraffin hydrocarbons (decane, eicosane, octacosane, and hexatriacontane). The Soave–Redlich–Kwong (SRK) and Peng–Robinson (PR) EOSs were applied to analyze the data. The results show a growing trend in N₂ solubility with rising pressure, temperature, and n-paraffin chain length²⁹. N₂ solubilities in various naphthenic (trans-decalin and cyclohexane) and aromatic (naphthalene, 1-methylnaphthalene, benzene, phenanthrene, pyrene) solvents were determined by Gao et al.²⁶ using a static cell. When a single interaction parameter ($C_{ij}$) is employed in each binary system, the PR EOS was demonstrated to fit the data²⁶. Privat et al.^39,40 used the PR EOS combined with a group contribution method, called the PPR78 model, for predicting phase equilibrium data of mixtures containing various hydrocarbons and N₂. This model is able to predict temperature-dependent binary interaction parameters ($k_{ij}$). The model provided satisfying results, with an overall deviation lower than 10%. They also mentioned that for the hydrocarbon + N₂ systems (except CH₄), $k_{ij}$ is a decreasing function of temperature^39,40. At low temperatures, Justo-Garcia et al.⁴¹ modeled vapor–liquid–liquid equilibria (VLLE) for N₂ and alkanes in three distinct ternary systems. 
The findings demonstrate that both SRK and PC-SAFT EOSs estimate the experimentally observed values with reasonable accuracy⁴¹. In another study, Justo-Garcia et al.⁴² used the SRK and PC-SAFT EOSs to model three-phase vapor–liquid–liquid equilibria for a combination of natural gas having high N₂ content. The results revealed that the PC-SAFT EOS accurately predicts phase behavior, but the SRK EOS suggests a three-phase region that is larger than what was observed experimentally⁴². The Krichevsky–Ilinskaya equation was used by Zirrahi et al.²⁷ to estimate the solubility of light solvents (CO₂, N₂, CH₄, C₂H₆, and CO) in bitumens from five Alberta reservoirs. The gas phase is analyzed applying the PR-EOS. The suggested model is then validated using experimental data on light solvent solubility. The results demonstrated that the proposed model accurately reflects known solubility data in bitumen for light hydrocarbons (CH₄ and C₂H₆) and non-hydrocarbon solvents (N₂, CO₂, and CO)²⁷. Haghbakhsh et al.⁴³ investigated the vapor–liquid equilibria of binary N₂–hydrocarbon mixtures across an extensive range of temperature and pressure applying PR and ER EOSs. They introduced a new correlative mode for the proposed equations to improve accuracy, which was likely to be effective, improving accuracy by up to three times⁴³. Thermo-physical characteristics of CO₂ and N₂/bitumen solutions were studied by Haddadnia et al.²⁸. Furthermore, PR-EOS was used to describe the calculated solubility²⁸. PC-SAFT and SRK EOSs were employed by Wu et al.⁴⁴ to estimate gas solubilities in n-alkanes. The PC-SAFT EOS was found to be able to accurately predict an empirically observed linear connection between gas solubilities in n-alkanes and their carbon number. Despite its satisfactory accuracy for gas solubility in lighter n-alkanes, the SRK EOS typically produces significantly poorer results than the PC-SAFT EOS⁴⁴. 
Tsuji et al.⁴⁵ investigated N₂ and oxygen gas solubilities in benzene, divinylbenzene, and styrene. For a particular isotherm, gas solubility in liquids had a linear pressure dependency and declined with rising temperature. Ultimately, PR-EOS was implemented to predict gas solubilities⁴⁵. Aguilar-Cisneros et al.⁴⁶ determined the solubility of N₂, CO₂, and CH₄ in petroleum fluids using the PR-EOS in conjunction with various mixing rules in systems including bitumens, heavy oils, refinery cuts, and coal liquids. The universal and van der Waals mixing rules revealed satisfactory outcome between experimental data and predicted values, while the modified Huron-Vidal of order one mixing rule produced large discrepancies⁴⁶.

During the last decade, alongside the development of intelligent methods based on machine learning (ML) techniques, many attempts have been made to predict thermodynamic properties with higher accuracy based on reliable experimental data. Abdi-Khanghah et al.⁴⁷ studied alkane solubility in supercritical CO₂. Two kinds of artificial neural networks (ANNs) were used in their study: radial basis function (RBF) and multi-layer perceptron (MLP). The MLP-ANN outperformed the RBF-ANN in predicting n-alkane solubility in supercritical CO₂⁴⁷. Songolzadeh et al.⁴⁸ employed the least-squares support vector machine (LSSVM), tuned using two different optimization algorithms: particle swarm optimization (PSO) and the cross-validation-assisted Simplex algorithm (CV-Simplex). They demonstrated that the PSO–LSSVM model is an effective technique for predicting n-alkane solubility in supercritical CO₂ with high accuracy⁴⁸. Chakraborty et al.⁴⁹ developed a set of data-driven models capable of predicting vapor–liquid equilibria (VLE) for the binary systems of C₁₀-N₂ and C₁₂-N₂. In comparison to the VLE modeled using the PR-EOS, both models significantly improved the estimated value of binary mixture equilibrium pressure⁴⁹. Mohammadi et al.⁵⁰ implemented different ML models to predict hydrogen solubility in various pure hydrocarbons over wide pressure and temperature ranges and compared them with some of the common EOSs. Their results showed that intelligent models give more precise hydrogen solubility estimates than the commonly used EOSs⁵⁰. To predict nitrogen solubility in unsaturated, cyclic and aromatic hydrocarbons, Mohammadi et al.⁵¹ employed a convolutional neural network (CNN), and the results showed that pressure is the most significant factor for nitrogen solubility in unsaturated hydrocarbons. In general, semi-analytical prediction based on EOSs has been the common way to estimate N₂ solubilities in alkanes. 
On the other hand, the mentioned method is case-specific and it is limited to some defined hydrocarbons with specific parameters for each EOS. Hence, using intelligent models like proper ML algorithms and reliable experimental data may lead to a model for predicting N₂ solubility in normal alkanes with high accuracy and this helps to accelerate predictions.

In this study, we use a dataset containing 1982 experimental N₂ solubility data points for 19 distinct normal alkanes gathered under various operating conditions. Models for estimating N₂ solubility in normal alkanes are constructed using well-known ML algorithms, namely k-nearest neighbors (k-NN) and random forest (RF), as well as more recent ML methods such as extreme gradient boosting (XGBoost), gradient boosting with categorical features support (CatBoost), and light gradient boosting machine (LightGBM). Furthermore, statistical parameters and graphical error assessments are used to verify the validity of the suggested models. Several N₂ solubility systems are predicted by the methods proposed in this research and by five EOSs, namely perturbed-chain statistical associating fluid theory (PC-SAFT), Redlich–Kwong (RK), Peng–Robinson (PR), Soave–Redlich–Kwong (SRK), and Zudkevitch–Joffe (ZJ). Finally, the relevancy factor is utilized to assess the relative impact of the input parameters on N₂ solubility in normal alkanes.

Data collection

The modeling of N₂ solubility in normal alkanes was performed using a large solubility databank containing 1982 data points collected from the literature^{29,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91}. The properties of 19 normal alkanes (nC₁ to nC₃₆) utilized in this survey are presented in Table 1.

Table 1 The normal alkanes utilized in this survey.


The inputs of the models were chosen to be temperature (K), pressure (MPa), and molecular weight (g/mol) of normal alkanes, whereas N₂ solubility (in terms of mole fraction) was the desired output. The statistical details of the N₂ solubility databank used for modeling are tabulated in Table 2. The validity, accuracy, and applicability of the model depend on the quantity and variety of N₂ solubility data collected in different systems. The broad ranges of pressure (0.0212–69.12 MPa), temperature (91.21–703.4 K), and normal alkanes (nC₁ to nC₃₆) can lead to a reliable general model for estimating the solubilities of N₂ in normal alkanes.

Table 2 The statistical information of the N₂ solubility databank used in this paper.


Models’ implementation

Algorithms’ selection

Owing to recent advances in computational capacity and the advent of new machine learning algorithms, many algorithms are available for the problem under consideration. Because the dataset is relatively small and has only a few features, non-parametric ML models, which rely mainly on the data themselves and do not suffer from a small dataset size, were identified as the best choices in this case.

K-nearest neighbors (k-NN)

The k-NN method is an ML technique that is employed to solve both classification and regression problems. This supervised algorithm is widely used as a non-parametric technique for various applications⁹². In this algorithm, k is the number of neighbors assigned to a new sample; the target of the new sample is predicted from the k training samples closest to it, using either uniform weights or weights derived from a specific distance function⁹³. The distance function measures how close each of the k samples is to the new sample and thereby determines its contribution to the final predicted value. The Minkowski distance is the typical choice for the distance function. Its general form is given in Eq. (1), where X and Y are the feature sets of two samples. This function reduces to the Manhattan or Euclidean distance when p = 1 or p = 2, respectively. Selecting the optimal value of the hyperparameter k is the most crucial stage in training this algorithm to achieve satisfactory accuracy. Hence, the algorithm is run over a wide range of k values, and the optimal case is identified by comparing statistical accuracy measures among the explored cases.

$$\begin{aligned} D\left( {X,Y} \right) & = \left( {\mathop \sum \limits_{i = 1}^{n} \left| {x_{i} - y_{i} } \right|^{p} } \right)^{\frac{1}{p}} \\ X & = \left( {x_{1} ,x_{2} , \ldots ,x_{n} } \right) \;{\text{and}}\;Y = \left( {y_{1} ,y_{2} , \ldots , y_{n} } \right) \in {\mathbb{R}}^{n} \\ \end{aligned}$$

(1)
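As an illustration of Eq. (1) and the neighbor-averaging step, the following is a minimal sketch in plain Python, not the scikit-learn implementation used in the study. The function names, the toy rows, and the inverse-distance weighting option are assumptions made for this example only.

```python
def minkowski(x, y, p=2):
    """Minkowski distance of Eq. (1); p=1 is Manhattan, p=2 is Euclidean."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

def knn_predict(train_X, train_y, query, k=3, p=2, weighted=False):
    """Predict a target as the (optionally distance-weighted) mean of the k nearest samples."""
    ranked = sorted(zip(train_X, train_y), key=lambda s: minkowski(s[0], query, p))[:k]
    if not weighted:
        return sum(y for _, y in ranked) / len(ranked)
    # Inverse-distance weights; a tiny epsilon avoids division by zero on exact matches.
    weights = [1.0 / (minkowski(x, query, p) + 1e-12) for x, _ in ranked]
    return sum(w * y for w, (_, y) in zip(weights, ranked)) / sum(weights)

# Hypothetical, already-scaled (temperature, pressure, molecular weight) rows; not from the databank.
train_X = [(0.1, 0.2, 0.1), (0.2, 0.4, 0.1), (0.8, 0.9, 0.7), (0.9, 0.8, 0.7)]
train_y = [0.05, 0.10, 0.40, 0.50]
prediction = knn_predict(train_X, train_y, query=(0.15, 0.3, 0.1), k=2)
```

Because the three inputs have very different numerical ranges, some form of feature scaling is usually applied before computing distances; whether and how the authors scaled the inputs is not stated here.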

Random forest

Random forest is a bagging-based supervised learning technique for classification and regression that uses the ensemble learning approach with CART (Classification and Regression Trees) as the base learner⁹⁴. This algorithm avoids the high prediction variance that is a common issue in single decision trees. The trees of a random forest are built in parallel and do not interact with each other during the forest construction. The method works by training a large number of decision trees and then returning the mean prediction of the individual trees in regression cases (or the majority class in classification cases). At each node, the number of attributes that may be considered for splitting is limited to a certain proportion of the total, which is a hyperparameter. This guarantees that the ensemble model does not depend too strongly on any specific attribute and that all potentially predictive variables are considered fairly. For each CART tree, the random forest technique picks the training dataset T_i randomly from the complete training set T with replacement (i.e., bootstrap sampling). The data not included in the random sample are referred to as "out-of-bag" data. The random forest technique also picks N features or input variables randomly from the set of M independent input variables (N < M) while building each CART tree. Based on the randomly picked T_i and N features, the best split for each CART tree is calculated. The final regression result is obtained by averaging the predictions of the individual trees. Averaging the predictions reduces the squared error relative to that of the individual CART trees, which increases the estimation precision. The resulting ensemble of trees is designated as follows (Eq. 2):

$$\begin{gathered} \left\{ {\phi_{{T_{b} ,m}} \left| {b = 1, \ldots ,B} \right.} \right\} \hfill \\ \hat{Y} = \phi_{T,P} \left( X \right) = \frac{1}{B}\mathop \sum \limits_{b = 1}^{B} \phi_{{T_{b} ,m}} \left( X \right) \hfill \\ \end{gathered}$$

(2)
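The bootstrap sampling, random feature selection, and averaging of Eq. (2) can be sketched in plain Python as below. To keep the example short, each tree is a one-split regression stump, so choosing features per tree and per node coincide. The helper names and toy data are hypothetical, and this is not the scikit-learn implementation used in the study.

```python
import random

def fit_stump(X, y, features):
    """Best single-split regression stump over the allowed feature indices (squared error)."""
    best = None
    for j in features:
        for threshold in sorted({row[j] for row in X}):
            left = [t for row, t in zip(X, y) if row[j] <= threshold]
            right = [t for row, t in zip(X, y) if row[j] > threshold]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
            if best is None or sse < best[0]:
                best = (sse, j, threshold, lm, rm)
    if best is None:  # no valid split in this bootstrap sample: fall back to the mean
        mean = sum(y) / len(y)
        return lambda row: mean
    _, j, threshold, lm, rm = best
    return lambda row: lm if row[j] <= threshold else rm

def fit_forest(X, y, n_trees=25, n_features=2, seed=0):
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]          # bootstrap sample T_b (with replacement)
        feats = rng.sample(range(len(X[0])), n_features)  # N of the M input features
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return trees

def predict_forest(trees, row):
    return sum(tree(row) for tree in trees) / len(trees)  # average of Eq. (2)

# Hypothetical (temperature, pressure, molecular weight) rows and solubilities.
X = [(300.0, 1.0, 16.0), (320.0, 5.0, 30.0), (340.0, 10.0, 44.0),
     (360.0, 20.0, 58.0), (380.0, 30.0, 72.0), (400.0, 40.0, 86.0)]
y = [0.01, 0.04, 0.09, 0.18, 0.26, 0.35]
forest = fit_forest(X, y)
estimate = predict_forest(forest, (350.0, 15.0, 50.0))
```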

Extreme gradient boosting (XGBoost)

The fundamental concept behind a tree-based ensemble method is to use an ensemble of classification and regression trees (CARTs) to fit the training data by minimizing a regularized objective function. One such tree-based model is XGBoost, which belongs to the gradient boosting decision tree (GBDT) framework. Each CART is made up of (I) a root node, (II) internal nodes, and (III) leaf nodes, as illustrated in Fig. 1. The root node, which represents the entire dataset, is split into internal nodes by binary decisions, whilst the leaf nodes hold the final predictions. In gradient boosting, a sequence of basic CARTs is created one after another, with the weight of each individual CART being adjusted via the training process⁹⁵.

An ensemble of N trees must be trained to predict the target y for a given dataset, in which m and n denote the numbers of features and instances, respectively.

$$\begin{aligned} \hat{y}_{i} = \sum\limits_{k = 1}^{N} {f_{k} \left( {X_{i} } \right),\;\;\,f_{k} \in f} \\ & With\;f= \left\{ {f(X) = \omega_{q(x)} } \right\},\;\left( {q:{\mathbb{R}}^{m} \to T,\;\omega \in {\mathbb{R}}^{T} } \right) \\ \end{aligned}$$

(3)

where the decision rule $q\left( x \right)$ maps an example to its leaf index, $f$ (the set defined on the second line of Eq. 3) shows the space of regression trees, $f_{k}$ shows the kth independent tree, T represents the number of leaves of a tree, and ω shows the leaf weights in Eqs. 3 and 4.

The minimization of the regularized objective function $L$ is used to determine the ensemble of trees:

$$\begin{aligned} & L = \sum\limits_{i}^{n} {l(\hat{y}_{i} ,y_{i} ) + \sum\limits_{k}^{N} {\Omega \left( {f_{k} } \right)} } \\ & With\;\Omega (f) = \gamma T + \frac{1}{2}\lambda \left\| \omega \right\|^{2} \\ \end{aligned}$$

(4)

where Ω is the regularization term, which helps to reduce overfitting by penalizing the model's complexity; l stands for a loss function that is differentiable and convex; γ is the minimum loss reduction required to split a leaf; and λ is the regularization coefficient. It is worth noting that in these equations λ and γ help to reduce model variance and avoid overfitting.

The objective function for each individual leaf is reduced in the gradient boosting technique, and additional branches are added sequentially.

$$L^{(t)} = \sum\limits_{i = 1}^{n} {l\left( {y_{i} ,\hat{y}_{i}^{(t - 1)} + f_{t} (X_{i} )} \right)} + \Omega (f_{t} )$$

(5)

Here, t denotes the t-th iteration of the above-mentioned training procedure. At each iteration, XGBoost greedily adds the regression tree $f_{t}$ that most improves the ensemble model, which is why the procedure is sometimes dubbed a "greedy algorithm". As a result, the model output is updated iteratively by minimizing the objective function:

$$\hat{y}_{i}^{(t)} = \hat{y}_{i}^{(t - 1)} + f_{t} (X_{i} )$$

(6)

XGBoost makes use of a shrinkage technique in which newly added weights are scaled by a learning rate after each stage of boosting. This minimizes the risk of overfitting by reducing the influence of each individual tree and leaving room for future trees to improve the model⁹⁶.
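The additive update of Eq. (6) combined with shrinkage can be sketched as follows. This is a deliberately simplified illustration, assuming a squared-error loss, one-split stumps on a single feature, and no regularization term Ω; XGBoost itself uses second-order gradient statistics and the γ and λ penalties of Eq. (4). All names and data are hypothetical.

```python
def fit_stump(x, residuals):
    """Best one-split regression stump on a single feature (squared error)."""
    best = None
    for threshold in sorted(set(x))[:-1]:  # drop the maximum so the right side is never empty
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, lm, rm)
    _, threshold, lm, rm = best
    return lambda xi: lm if xi <= threshold else rm

def boost(x, y, n_rounds=50, learning_rate=0.1):
    base = sum(y) / len(y)          # initial constant prediction
    preds = [base] * len(y)
    trees, history = [], []
    for _ in range(n_rounds):
        # For the loss 0.5 * (y - y_hat)^2 the negative gradient is the residual.
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        tree = fit_stump(x, residuals)
        trees.append(tree)
        # Eq. (6) with the new tree scaled by the learning rate (shrinkage).
        preds = [pi + learning_rate * tree(xi) for pi, xi in zip(preds, x)]
        history.append(sum((yi - pi) ** 2 for yi, pi in zip(y, preds)) / len(y))
    predict = lambda xi: base + learning_rate * sum(t(xi) for t in trees)
    return predict, history

# Hypothetical pressure (MPa) versus solubility (mole fraction) points.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [0.05, 0.08, 0.15, 0.22, 0.30, 0.33, 0.41, 0.47]
predict, history = boost(x, y)
```

The list `history` holds the training mean squared error after each round; a smaller learning rate makes this curve fall more slowly, which is the trade-off that shrinkage introduces.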

Light gradient boosting machine (LightGBM)

LightGBM is a novel gradient boosting framework based on the decision tree concept. Its main advantages over XGBoost are that it uses less memory, grows trees leaf-wise with a depth constraint, and uses a histogram-based technique to speed up the training process. Using the histogram technique, LightGBM discretizes continuous floating-point feature values into k bins, resulting in a histogram of width k. The histogram technique does not require additional storage of pre-sorted results, and after discretization the values can be stored as 8-bit integers, reducing memory usage to 1/8. This coarse partitioning can, however, cost the model some accuracy. LightGBM also employs a leaf-wise growth strategy, which is more efficient than the usual level-wise strategy. The level-wise approach is inefficient because, at each step, the leaves of the same layer are all processed together, resulting in unnecessary memory allocation. In contrast, at each stage of the leaf-wise method, the algorithm finds the leaf with the largest splitting gain and splits it. Compared with level-wise growth, errors can be reduced and greater precision attained with the same number of splits. The leaf-wise tree growth technique is illustrated in Fig. 2. The disadvantage of leaf-wise growth is that it tends to build deeper decision trees, which can lead to overfitting. LightGBM mitigates overfitting while maintaining high efficiency by imposing a maximum depth restriction on top of leaf-wise growth^97,98.
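The histogram idea can be illustrated with a short sketch that maps continuous values to a small number of integer bins, so that split candidates only need to be evaluated at bin boundaries. Equal-width bins are an assumption chosen for simplicity; LightGBM's actual bin construction is more elaborate. The function names and values are hypothetical.

```python
import bisect

def bin_edges(values, k):
    """Return the k-1 interior edges of k equal-width bins spanning the data range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(1, k)]

def to_bins(values, edges):
    """Map each value to its bin index in 0..k-1 (small enough for an 8-bit integer when k <= 256)."""
    return [bisect.bisect_right(edges, v) for v in values]

# Hypothetical pressures in MPa spanning the range reported for the databank.
pressures = [0.0212, 0.5, 1.0, 10.0, 35.0, 69.12]
edges = bin_edges(pressures, k=4)
bins = to_bins(pressures, edges)
```

With k bins, only k − 1 thresholds per feature are candidates for a split, instead of one per distinct value.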

For a specific training dataset $X = \left\{ {(x_{i} ,y_{i} )} \right\}_{{_{i = 1} }}^{m}$, LightGBM searches an approximation $\hat{f}\left( x \right)$ to the function f*(x) to minimize the expected values of specific loss functions L (y, f (x)):

$$\hat{f}\left( x \right) = \arg \mathop {\min }\limits_{f} E_{y,x} L(y,f(x))$$

(7)

LightGBM combines T regression trees $\mathop \sum \limits_{t = 1}^{T} f_{t } \left( x \right)$ to approximate the model. The regression trees are defined as w_q(x), $q \in \left\{ {1, \, 2, \ldots ,N} \right\}$, where q denotes the decision rule of the trees, N is the number of tree leaves, and w is a vector of the sample weights of the leaf nodes. The model is trained in additive form at step t:

$$G_{t} \cong \sum\limits_{i = 1}^{N} {L(y_{i} ,F_{t - 1} (x_{i} ) + f_{t} (x_{i} ))}$$

(8)

Newton's method is employed to approximate the objective function.
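As a hedged illustration of what this approximation typically looks like in gradient boosting frameworks, the loss in Eq. (8) can be expanded to second order around the current prediction $F_{t-1}(x_i)$. The symbols $g_i$ and $h_i$ are introduced here for the first and second derivatives of the loss and do not appear in the original text:

$$G_{t} \approx \sum\limits_{i = 1}^{N} {\left[ {L\left( {y_{i} ,F_{t - 1} (x_{i} )} \right) + g_{i} f_{t} (x_{i} ) + \frac{1}{2}h_{i} f_{t}^{2} (x_{i} )} \right]} ,\quad g_{i} = \frac{{\partial L\left( {y_{i} ,F_{t - 1} (x_{i} )} \right)}}{{\partial F_{t - 1} (x_{i} )}},\quad h_{i} = \frac{{\partial^{2} L\left( {y_{i} ,F_{t - 1} (x_{i} )} \right)}}{{\partial F_{t - 1} (x_{i} )^{2} }}$$

The first term does not depend on $f_{t}$, so only the terms in $g_{i}$ and $h_{i}$ matter when choosing the new tree.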

Gradient boosting with categorical features support (CatBoost)

CatBoost performs categorical boosting: it handles categorical columns through a permutation technique, one-hot encoding up to a threshold called one hot max size (OHMS), and target-based statistics. For each new split of the current tree, a greedy approach is utilized, allowing CatBoost to cope with the exponential growth of feature combinations⁹⁹. In CatBoost, for each feature with more categories than OHMS, the following steps are applied:

1. Records are divided into subsets at random.
2. Labels are converted to integers.
3. Categorical features are converted to numeric values as follows:

$$avg\;Target = \frac{countInClass + prior}{{totalCount + 1}}$$

(9)

where $countInClass$ is the number of preceding objects in the same category whose target equals one, $totalCount$ is the number of preceding objects in that category, and $prior$ is a constant specified by the starting parameters^100,101.
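Eq. (9) can be sketched as a running statistic over an already shuffled sequence of records, as below. This is a minimal illustration of the formula rather than CatBoost's internal code; the function name, the `prior` default, and the toy data are assumptions. It is included for completeness, since the three inputs used in this work (temperature, pressure, and molecular weight) are all numeric.

```python
def ordered_target_statistic(categories, targets, prior=0.5):
    """Encode each categorical value using only the objects that precede it in the given order."""
    count_in_class = {}  # per category: preceding objects whose target equals 1
    total_count = {}     # per category: all preceding objects
    encoded = []
    for cat, target in zip(categories, targets):
        # Eq. (9), evaluated before the current object is counted.
        encoded.append((count_in_class.get(cat, 0) + prior) / (total_count.get(cat, 0) + 1))
        count_in_class[cat] = count_in_class.get(cat, 0) + (1 if target == 1 else 0)
        total_count[cat] = total_count.get(cat, 0) + 1
    return encoded

# Hypothetical records, assumed to be in random (permuted) order already.
categories = ["a", "b", "a", "a", "b"]
targets = [1, 0, 0, 1, 1]
encoded = ordered_target_statistic(categories, targets)
```

Using only preceding objects means that a record's own target never enters its encoding, which is the purpose of the random ordering in step 1.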

Equations of state (EOSs)

An EOS is a mathematical expression relating a substance's volume, temperature, and pressure. Such an equation may be used to describe VLE, volumetric behavior, and thermodynamic properties of mixtures and pure substances. EOSs are used to estimate the phase behavior of petroleum fluids. As previously stated, EOSs are poor predictors of gas solubility in solvents, particularly under complicated operating conditions. Five EOSs were used to estimate N₂ solubility in hydrocarbons in this research, and their reliability in predicting N₂ solubility is compared to that of the ML algorithms. The mathematical forms of the implemented EOSs are shown in Table 3, and Table 4 shows their parameters. The molecular parameters required by the PC-SAFT EOS for each investigated substance are provided in Table 5. In addition, a proper mixing rule is needed to estimate the parameters of each mixture. In this study, the van der Waals one-fluid mixing rules have been utilized, and their mathematical expressions are provided in Table 4.

Table 3 EOSs Formulas utilized in this study.


Table 4 Parameters of EOSs and mixing rules.


Table 5 Parameters of PC-SAFT EOS^105,108,109.


Evaluation of models

The following statistical parameters, namely root mean square error (RMSE), standard deviation (SD), and coefficient of determination (R²) were used in this survey to evaluate the performance of models:

$$RMSE = \sqrt {\frac{1}{Z}\sum\limits_{i = 1}^{Z} {\left( {NS_{i,\exp } - NS_{i,pred} } \right)}^{2} }$$

(10)

$$R^{2} = 1 - \frac{{\sum\limits_{i = 1}^{Z} {(NS_{i,\exp } - NS_{i,pred} )^{2} } }}{{\sum\limits_{i = 1}^{Z} {(NS_{i,\exp } - \overline{{NS_{\exp } }} )^{2} } }}$$

(11)

$$SD = \sqrt {\frac{1}{Z - 1}\sum\limits_{i = 1}^{Z} {\left( {\frac{{NS_{i,\exp } - NS_{i,pred} }}{{NS_{i,\exp } }}} \right)}^{2} }$$

(12)

where Z, $NS_{i,\exp}$, and $NS_{i,pred}$ are the number of data points, the experimental N₂ solubility, and the predicted N₂ solubility in normal alkanes, respectively.
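Eqs. (10) to (12) translate directly into code. The sketch below uses plain Python with hypothetical solubility values; note that SD, as defined in Eq. (12), is based on relative errors and therefore requires non-zero experimental values.

```python
import math

def rmse(exp, pred):
    """Root mean square error, Eq. (10)."""
    return math.sqrt(sum((e - p) ** 2 for e, p in zip(exp, pred)) / len(exp))

def r2(exp, pred):
    """Coefficient of determination, Eq. (11)."""
    mean_exp = sum(exp) / len(exp)
    ss_res = sum((e - p) ** 2 for e, p in zip(exp, pred))
    ss_tot = sum((e - mean_exp) ** 2 for e in exp)
    return 1.0 - ss_res / ss_tot

def sd(exp, pred):
    """Standard deviation of the relative errors, Eq. (12); needs Z > 1 and non-zero exp values."""
    return math.sqrt(sum(((e - p) / e) ** 2 for e, p in zip(exp, pred)) / (len(exp) - 1))

# Hypothetical experimental and predicted mole fractions, for illustration only.
ns_exp = [0.10, 0.20, 0.40]
ns_pred = [0.11, 0.19, 0.42]
scores = (rmse(ns_exp, ns_pred), r2(ns_exp, ns_pred), sd(ns_exp, ns_pred))
```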

On the other hand, the following graphical tools were utilized simultaneously to evaluate the performance of the ML models:

Cross plot: The most well-known graphical analysis, in which the predicted values are plotted against the measured values and the accuracy of the models is evaluated by examining the proximity of the data points to the unit-slope line.

Trend plot: This plot helps to check the validity of the model by sketching both real data and the model's estimation versus the specific property or data index.

Error distribution plot: The error (measured value − predicted value) is plotted against the real data to assess the scatter of data around the zero-error line and to explore the possible error trend.

Histogram plot of errors: This graph shows how the errors from the model are distributed. This statistical tool indicates the discrepancy between the measured and predicted values, in which a normal distribution centered at zero error is expected for a good model.

Results and discussion

Model optimization and tuning

To find the best model for each of the aforementioned algorithms, a routine procedure was followed to tune the hyperparameters and other functional settings of each model. The models were implemented in Python using the following libraries: scikit-learn for k-NN and random forest¹¹⁰, xgboost for XGBoost, lightgbm for LightGBM⁹⁸, and catboost⁹⁹ for CatBoost. Each of these libraries exposes parameters that can be set by the user or left at their default values. To find the best configuration of each algorithm, a wide range of parameter values was explored, and the best model was chosen based on the RMSE on the training and test data. The search space and the final settings of the models are provided in Table 6.
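A search of this kind can be sketched generically as an exhaustive loop over a parameter grid that keeps the combination with the lowest RMSE. The example below is an assumption-laden illustration: the grid, the one-dimensional k-NN scorer, and the data are hypothetical, only the hold-out RMSE is used for selection, and the actual search space is the one reported in Table 6.

```python
import itertools
import math

def grid_search(param_grid, fit_and_score):
    """Try every parameter combination and keep the one with the lowest score (e.g. test RMSE)."""
    names = list(param_grid)
    best_params, best_score = None, math.inf
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = fit_and_score(**params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical one-feature data: (x, y) pairs for training and hold-out testing.
train = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9), (5.0, 5.1)]
test = [(1.5, 1.5), (3.5, 3.5)]

def knn_test_rmse(k):
    """Hold-out RMSE of a uniform-weight k-NN regressor on the toy data."""
    errors = []
    for xq, yq in test:
        nearest = sorted(train, key=lambda s: abs(s[0] - xq))[:k]
        pred = sum(y for _, y in nearest) / k
        errors.append((yq - pred) ** 2)
    return math.sqrt(sum(errors) / len(errors))

best_params, best_score = grid_search({"k": [1, 2, 3, 5]}, knn_test_rmse)
```

The same `grid_search` helper accepts several parameters at once, for example a hypothetical `{"k": [...], "p": [1, 2]}`, provided the scoring function takes matching keyword arguments.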

Table 6 Models' tuning search space and selected model based on RMSE.


Statistics and performance metrics of the models

The models' precision in predicting N₂ solubility in normal alkanes was assessed using several statistical criteria, namely RMSE, R², and SD. Table 7 reports the calculated values of these statistical factors for the training subset, testing subset, and the entire dataset for all ML models. The possibility of overtraining is completely rejected given that no meaningful difference was seen between the testing and training subsets for all models. Based on Table 7, the CatBoost model has the lowest prediction errors among the developed ML models, with RMSE values of 0.0125, 0.0213, and 0.0147 for the training subset, testing subset, and the entire dataset, respectively. Also, the CatBoost model has a higher overall R² (0.9943) and a lower SD than the other models, indicating a better fit to the experimental data. The random forest, XGBoost, LightGBM, and k-NN models rank after the CatBoost model, in that order of performance.

Table 7 ML models’ statistics and performance metrics.
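As a hedged illustration of how two of these metrics are commonly computed, the sketch below evaluates RMSE and R² for a handful of made-up values. The definitions used (R² as one minus the ratio of the residual to the total sum of squares) are the standard ones and are assumed, not confirmed, to match those applied in this work. SD is left out because its exact definition is not restated in this section.

```python
import math


def rmse(measured, predicted):
    """Root mean square error of the predictions."""
    n = len(measured)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)


def r_squared(measured, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_m = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_m) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


# Made-up mole fractions, not data from this study.
measured = [0.10, 0.20, 0.30, 0.40]
predicted = [0.11, 0.19, 0.31, 0.39]

print(f"RMSE = {rmse(measured, predicted):.4f}")
print(f"R2   = {r_squared(measured, predicted):.4f}")
```

Computing the same two functions separately on the training and testing subsets gives the kind of side-by-side comparison used to judge overtraining.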


As mentioned earlier, several EOSs were used alongside the ML models to estimate N₂ solubility in normal alkanes. To compare them, the solubilities of N₂ in hexadecane, eicosane, octacosane, and hexatriacontane, for which experimental values have been reported in the literature^29,90, were estimated using both the ML models and the EOSs. Tables 8, 9, 10 and 11 present the N₂ solubility data and the predictions of the EOSs and ML models, along with the RMSE of each. As can be seen, the CatBoost model provides the best estimates among the ML models and EOSs for all of the considered normal alkanes. The ZJ EOS also gave precise solubility estimates and outperformed the other EOSs. In addition, as shown in Table 3, the Péneloux-type volume translation (c) was applied to the PR and SRK EOSs for the sake of investigation. Based on our studies, the Péneloux-type volume translation has no effect on the obtained solubility values^111,112.
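A short argument, sketched here under stated assumptions rather than taken from this work, shows why a volume translation is not expected to change the solubilities. It assumes that the translation parameter of each phase follows the linear mixing rule of the original Péneloux formulation with composition-independent component parameters c_i. The symbols z_i (mole fraction of component i in the phase considered), x_i and y_i (liquid and vapor mole fractions), and \hat{\varphi}_i (fugacity coefficient of component i) are introduced for illustration.

$$v^{\mathrm{trans}} = v - \sum_i z_i c_i \quad\Rightarrow\quad \ln \hat{\varphi}_i^{\mathrm{trans}} = \ln \hat{\varphi}_i - \frac{c_i P}{RT}$$

$$x_i \hat{\varphi}_i^{L,\mathrm{trans}} = y_i \hat{\varphi}_i^{V,\mathrm{trans}} \iff x_i \hat{\varphi}_i^{L} e^{-c_i P/RT} = y_i \hat{\varphi}_i^{V} e^{-c_i P/RT} \iff x_i \hat{\varphi}_i^{L} = y_i \hat{\varphi}_i^{V}$$

Because both phases are at the same temperature and pressure, the correction factor cancels from the equilibrium condition. The computed phase compositions, and hence the solubilities, are therefore unchanged, while the molar volumes are shifted.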

Table 8 Estimations of different EOSs and ML models for N₂ solubility in Hexadecane.


Table 9 Estimations of different EOSs and ML models for N₂ solubility in Eicosane.


Table 10 Estimations of different EOSs and ML models for N₂ solubility in Octacosane.


Table 11 Estimations of different EOSs and ML models for N₂ solubility in Hexatriacontane.


Graphical analysis of the models

In the next step, the ML models are evaluated by graphical analysis. First, cross plots of the experimental N₂ solubility data versus the values predicted by the ML models for the training and testing stages are presented in Fig. 3. All five ML models performed well in both stages, with most data points clustered around the X = Y line. The scatter is smallest for the CatBoost model, whose points lie closest to the X = Y line, indicating the excellent performance of this model in estimating N₂ solubility in normal alkanes.

Next, the distributions of the N₂ solubility prediction errors (measured − predicted) of the ML models versus the experimental data are shown in Fig. 4. A high concentration of points near zero error indicates better predictive performance. Again, the CatBoost model yields near-zero errors, verifying its accuracy and reliability. The other ML models, especially random forest, also show good predictions with low errors for the N₂ solubility in normal alkanes.

The next step of the graphical assessment concerns the frequency of errors. Figure 5 depicts the histograms of errors for the proposed ML models. The histograms of all ML models are symmetric, and the sharp peak at zero error in each of them confirms the close match between estimated and experimental N₂ solubility data. The percentage frequency of errors at the zero-error value is about 85% for the CatBoost model, much higher than for the other ML models, which supports the reliability of this model in estimating N₂ solubility in normal alkanes.
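The percentage-frequency histogram discussed above can be reproduced in a few lines. The sketch below is an illustration only: the bin width of 0.02, the function name, and the sample values are assumptions, and the binning used for Fig. 5 is not stated in this section.

```python
def error_histogram(measured, predicted, bin_width=0.02):
    """Return {bin_center: percentage} for the errors (measured - predicted).

    Bins are centered on integer multiples of bin_width, so one bin is
    centered exactly on zero error.
    """
    counts = {}
    for m, p in zip(measured, predicted):
        index = round((m - p) / bin_width)  # nearest bin center
        counts[index] = counts.get(index, 0) + 1
    n = len(measured)
    return {index * bin_width: 100.0 * c / n for index, c in sorted(counts.items())}


# Made-up values, not data from this study.
measured = [0.10, 0.20, 0.30, 0.40, 0.50]
predicted = [0.101, 0.199, 0.302, 0.42, 0.46]

hist = error_histogram(measured, predicted)
for center, pct in hist.items():
    print(f"bin centered at {center:+.2f}: {pct:.0f}% of points")
```

A model whose errors concentrate in the zero-centered bin, with the remaining mass spread evenly on both sides, corresponds to the symmetric, sharply peaked histograms described for Fig. 5.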

All the models used in this study show satisfactory performance, but the statistical and graphical analyses make clear that the CatBoost model performs best among the implemented ML models. The performance of a model depends on many factors, such as the case under study and the structure of the dataset, and the superiority of CatBoost here stems from two main reasons. The first is the structure of the dataset used in this work: many instances have equal values in n − 1 features and differ in only one feature. This property enables tree-based models to perform better splits and ultimately yields higher accuracy. Second, CatBoost uses symmetric trees, which gives faster inference, and its boosting scheme is the main reason it avoids overfitting and improves model quality after training. Finally, it should be noted that these advantages of CatBoost depend strongly on the dataset and cannot be generalized to all problems.
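To make the remark about symmetric trees concrete, the sketch below evaluates a single symmetric (oblivious) tree, in which every node at a given depth applies the same feature and threshold. It is a simplified illustration and not CatBoost's internal code; the function name, the split thresholds, and the leaf values are hypothetical.

```python
def oblivious_tree_predict(x, splits, leaf_values):
    """Evaluate one symmetric tree.

    splits[d] = (feature_index, threshold) is shared by all nodes at depth d,
    so the leaf index is simply the bit pattern of the d comparisons.
    """
    leaf = 0
    for depth, (feature, threshold) in enumerate(splits):
        if x[feature] > threshold:
            leaf |= 1 << depth  # set bit `depth` when the test passes
    return leaf_values[leaf]


# Hypothetical depth-2 tree on (pressure in MPa, temperature in K).
splits = [(0, 10.0), (1, 300.0)]
leaf_values = [0.05, 0.20, 0.08, 0.30]  # 2**2 leaves, made-up outputs

print(oblivious_tree_predict((15.0, 350.0), splits, leaf_values))
```

Because the leaf is found with a fixed number of comparisons and no data-dependent branching between levels, evaluating many such trees is cheap, which is the source of the fast inference mentioned above.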

Pressure and temperature trend analysis

As the final assessment step, various visual evaluations were carried out to appraise the capability of the CatBoost model for different N₂–hydrocarbon systems. Figure 6 shows the effect of pressure on N₂ solubility in n-decane at a temperature of 503 K, comparing the N₂ solubilities estimated by the CatBoost model with the values determined by the EOSs and the experimental results from the literature⁸⁷. The mismatch between the standard EOS estimates and the experimental data is quite significant at high temperatures. As seen in this figure, the CatBoost model predicts the experimental data quite well. As expected, the solubility of N₂ in n-decane rises as the pressure increases. The EOSs overestimate or underestimate the growth of N₂ solubility with rising pressure, while the CatBoost model closely follows the trend.

The predictions of the CatBoost and other proposed ML models for N₂ solubility in a light hydrocarbon (methane)⁶¹ at various pressures and a constant temperature of 180 K are provided in Fig. 7. All the intelligent models follow the trend well and show that N₂ solubility increases with pressure. The CatBoost model, as shown in this figure, accurately recognizes the data pattern and provides excellent estimates at all pressures.

Finally, a similar trend analysis was performed to investigate the performance of the different ML models in estimating the N₂ solubility in n-hexane at various temperatures and a constant pressure of 27.57 MPa⁷⁴. Based on Fig. 8, as in the previous case, all the intelligent models capture the trend satisfactorily; however, the CatBoost model provides more accurate predictions. The figure also indicates an increase in N₂ solubility as temperature rises.

Sensitivity analysis

Using the CatBoost model as the best-developed model in the current study, a sensitivity analysis was performed. To this end, the relevancy factor (r)¹¹³ was calculated for each input parameter using the following equation; the larger the absolute value of r, the greater the impact of that parameter on the model's output. A positive r-value for a parameter indicates that the output increases with that parameter, and a negative value indicates the opposite¹¹⁴.

$$r(I_{i} ,NS) = \frac{{\sum\limits_{j = 1}^{n} {\left( {I_{i,j} - I_{m,i} } \right)\left( {NS_{j} - NS_{m} } \right)} }}{{\left( {\sum\limits_{j = 1}^{n} {\left( {I_{i,j} - I_{m,i} } \right)^{2} \sum\limits_{j = 1}^{n} {\left( {NS_{j} - NS_{m} } \right)^{2} } } } \right)^{0.5} }}$$

(13)

where I_i,j represents the jth value of the ith input variable (the inputs being molecular weight of normal alkanes, pressure, and temperature); I_m,i is the mean value of the ith input; and NS_m and NS_j denote the mean value and the jth value of the predicted N₂ solubility in normal alkanes, respectively. The outcomes of the relevancy factor analysis are depicted in Fig. 9. According to Fig. 9, all input parameters, namely temperature, pressure, and molecular weight of normal alkanes, have a positive effect on N₂ solubility in normal alkanes. The results reveal that pressure has the greatest impact on N₂ solubility and that the N₂ solubility increases with increasing molecular weight of the normal alkanes.

Based on Henry's law, the amount of gas dissolved in a liquid is proportional to the partial pressure of that gas in equilibrium with the liquid. When the gas is at a higher pressure, its molecules collide more often with each other and with the liquid's surface. As the molecules collide more with the surface of the liquid, they can squeeze between the liquid molecules and thus become part of the solution^115,116.

On the other hand, the sensitivity analysis shows that, overall, the solubility of N₂ in normal alkanes increases when the temperature increases. This is the reverse-order solubility phenomenon, the opposite of what commonly happens for a binary mixture of a supercritical component and a subcritical component^73,81. It may be due to the repulsive nature of the N₂–N₂ interaction: the N₂–N₂ repulsive force decreases with increasing temperature, which results in increased solubility of N₂ at higher temperatures. However, this increase of N₂ solubility with temperature may not hold for all normal alkanes; a literature survey shows that the N₂ solubility in methane and ethane decreases with increasing temperature¹¹⁷.

Normal alkanes are nonpolar, as they contain nothing but C–C and C–H bonds. N₂ is also a nonpolar molecule, and nonpolar substances tend to dissolve in nonpolar solvents such as normal alkanes. The molecular weight of the normal alkanes increases mainly through the addition of C–C and C–H bonds. Consequently, the N₂ solubility increases as the number or length of the nonpolar chains increases.
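Equation (13) can be evaluated directly, as in the minimal sketch below. The function name and the sample values are hypothetical and serve only to show the sign convention; they are not data or results from this study.

```python
import math


def relevancy_factor(inputs, outputs):
    """Relevancy factor r of Eq. (13) between one input variable and the model output."""
    mean_i = sum(inputs) / len(inputs)
    mean_o = sum(outputs) / len(outputs)
    numerator = sum((i - mean_i) * (o - mean_o) for i, o in zip(inputs, outputs))
    sum_sq_i = sum((i - mean_i) ** 2 for i in inputs)
    sum_sq_o = sum((o - mean_o) ** 2 for o in outputs)
    return numerator / math.sqrt(sum_sq_i * sum_sq_o)


# Made-up illustration: an output that rises with the input gives r close to +1,
# and one that falls with the input gives r close to -1.
pressure = [1.0, 2.0, 3.0, 4.0, 5.0]
solubility_up = [0.1, 0.2, 0.3, 0.4, 0.5]
solubility_down = [0.5, 0.4, 0.3, 0.2, 0.1]

print(relevancy_factor(pressure, solubility_up))
print(relevancy_factor(pressure, solubility_down))
```

Applying the function once per input column (temperature, pressure, molecular weight) against the predicted solubilities gives the three r-values that a plot such as Fig. 9 summarizes.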

Conclusions

In the present work, N₂ solubility in normal alkanes (nC₁ to nC₃₆) was modeled using five representative ML models, namely CatBoost, k-NN, LightGBM, random forest, and XGBoost, by utilizing a large N₂ solubility databank covering a wide range of operating temperature (91.21–703.4 K) and pressure (0.0212–69.12 MPa). Five EOSs, namely RK, SRK, ZJ, PR, and PC-SAFT, were also used alongside the ML models to estimate N₂ solubility in normal alkanes. The developed CatBoost model was superior to all the other ML models and the EOSs, with an overall RMSE of 0.0147 and R² of 0.9943. The random forest, XGBoost, LightGBM, and k-NN models ranked after the CatBoost model in terms of performance, in that order. Furthermore, the ZJ EOS showed the best performance among the EOSs. Finally, the relevancy factor analysis indicated that all input variables of the models, namely temperature, pressure, and molecular weight of normal alkanes, have a positive effect on N₂ solubility in normal alkanes, with pressure having the greatest effect. The solubility of N₂ increases with increasing molecular weight of the normal alkanes.

Abbreviations

CARTs: Classification and regression trees
CNN: Convolutional neural network
EOR: Enhanced oil recovery
EOS: Equation of state
exp: Experimental
k-NN: K-nearest neighbors
ML: Machine learning
Mw: Molecular weight
NS: Nitrogen solubility
PC-SAFT: Perturbed-chain statistical associating fluid theory
PR: Peng–Robinson EOS
pred: Predicted
RMSE: Root mean square error
RK: Redlich–Kwong EOS
SAFT: Statistical associating fluid theory
SRK: Soave–Redlich–Kwong EOS
SD: Standard deviation
SW: Schmidt–Wenzel EOS
VLE: Vapor–liquid equilibria
XGBoost: Extreme gradient boosting
N₂: Nitrogen
R²: Coefficient of determination
P_c: Critical pressure
T_c: Critical temperature

References

Baukal, C. E., Hayes, R., Grant, M., Singh, P. & Foote, D. Nitrogen oxides emissions reduction technologies in the petrochemical and refining industries. Environ. Prog. 23(1), 19–28 (2004).
Hodges, A., Fica, Z., Wanlass, J., VanDarlin, J. & Sims, R. Nutrient and suspended solids removal from petrochemical wastewater via microalgal biofilm cultivation. Chemosphere 174, 46–48 (2017).
Carvalho, M. A. F. D. et al. A potential material for removal of nitrogen compounds in petroleum and petrochemical derivates. Chem. Eng. Commun. 208, 1564–1579 (2020).
Ahmed, T., Menzie, D. & Crichlow, H. Preliminary experimental results of high-pressure nitrogen injection for EOR systems. Soc. Petrol. Eng. J. 23(02), 339–348 (1983).
Rezaei, M., Shadizadeh, S., Vosoughi, M. & Kharrat, R. An experimental investigation of sequential CO2 and N2 gas injection as a new EOR method. Energy Sources A 36(17), 1938–1948 (2014).
Heucke, U. Nitrogen injection as IOR/EOR solution for North African oil fields. In SPE North Africa Technical Conference and Exhibition, OnePetro (2015).
Tovar, F. D., Barrufet, M. A. & Schechter, D. S. Enhanced oil recovery in the wolfcamp shale by carbon dioxide or nitrogen injection: An experimental investigation. SPE J. 26(01), 515–537 (2021).
Ameli, F., Hemmati-Sarapardeh, A., Schaffie, M., Husein, M. M. & Shamshirband, S. Modeling interfacial tension in N2/n-alkane systems using corresponding state theory: Application to gas injection processes. Fuel 222, 779–791 (2018).
Barati-Harooni, A. et al. Estimation of minimum miscibility pressure (MMP) in enhanced oil recovery (EOR) process by N2 flooding using different computational schemes. Fuel 235, 1455–1474 (2019).
De Santis, L., Parmegiani, L. & Scarica, C. Changing perspectives on liquid nitrogen use and storage. J. Assist. Reprod. Genet. 38(4), 783–784 (2021).
Prandi, B. et al. Food wastes from agrifood industry as possible sources of proteins: A detailed molecular view on the composition of the nitrogen fraction, amino acid profile and racemisation degree of 39 food waste streams. Food Chem. 286, 567–575 (2019).
Wang, H. et al. Improving the functionality of proso millet protein and its potential as a functional food ingredient by applying nitrogen fertiliser. Foods 10(6), 1332 (2021).
Winkler, M. K. & Straka, L. New directions in biological nitrogen removal and recovery from wastewater. Curr. Opin. Biotechnol. 57, 50–55 (2019).
Vollmer, A. C. & Bark, S. J. Twenty-five years of investigating the universal stress protein: Function, structure, and applications. Adv. Appl. Microbiol. 102, 1–36 (2018).
Han, A. et al. A polymer encapsulation strategy to synthesize porous nitrogen-doped carbon-nanosphere-supported metal isolated-single-atomic-site catalysts. Adv. Mater. 30(15), 1706508 (2018).
Vandenbossche, M. & Hegemann, D. Recent approaches to reduce aging phenomena in oxygen-and nitrogen-containing plasma polymer films: An overview. Curr. Opin. Solid State Mater. Sci. 22(1), 26–38 (2018).
Fahandezhsaadi, M. et al. Laboratory evaluation of nitrogen injection for enhanced oil recovery: Effects of pressure and induced fractures. Fuel 253, 607–614 (2019).
Fathinasab, M., Ayatollahi, S. & Hemmati-Sarapardeh, A. A rigorous approach to predict nitrogen-crude oil minimum miscibility pressure of pure and nitrogen mixtures. Fluid Phase Equilib. 399, 30–39 (2015).
Hemmati-Sarapardeh, A., Mohagheghian, E., Fathinasab, M. & Mohammadi, A. H. Determination of minimum miscibility pressure in N2–crude oil system: A robust compositional model. Fuel 182, 402–410 (2016).
Zhao, H., Morgado, P., Gil-Villegas, A. & McCabe, C. Predicting the phase behavior of nitrogen+ n-alkanes for enhanced oil recovery from the SAFT-VR approach: Examining the effect of the quadrupole moment. J. Phys. Chem. B 110(47), 24083–24092 (2006).
Liang, S. et al. Study on EOR method in offshore oilfield: Combination of polymer microspheres flooding and nitrogen foam flooding. J. Petrol. Sci. Eng. 178, 629–639 (2019).
Burrows, L. C. et al. A literature review of CO2, natural gas, and water-based fluids for enhanced oil recovery in unconventional reservoirs. Energy Fuels 34(5), 5331–5380 (2020).
Xiaofeng, D., Yongchun, H. & Weimao, P. Nitrogen dry replacement technology in natural gas pipeline and its practical application. Chem. Eng. Oil Gas/Shi You Yu Tian Ran Qi Hua Gong 40(3), 325–328 (2011).
Kameya, T. et al. Nitrogen purge condition for simultaneous GC/MS measurement of chemicals. J. Water Environ. Technol. 12(2), 161–175 (2014).
Yanisko, P., Zheng, S., Dumoit, J. & Carlson, B. Nitrogen: A security blanket for the chemical industry. Chem. Eng. Prog. 107(11), 50–55 (2011).
Gao, W., Gasem, K. A. & Robinson, R. L. Solubilities of nitrogen in selected naphthenic and aromatic hydrocarbons at temperatures from 344 to 433 K and pressures to 22.8 MPa. J. Chem. Eng. Data 44(2), 185–189 (1999).
Zirrahi, M., Hassanzadeh, H., Abedi, J. & Moshfeghian, M. Prediction of solubility of CH4, C2H6, CO2, N2 and CO in bitumen. Can. J. Chem. Eng. 92(3), 563–572 (2014).
Haddadnia, A., Zirrahi, M., Hassanzadeh, H. & Abedi, J. Solubility and thermo-physical properties measurement of CO2-and N2-Athabasca bitumen systems. J. Petrol. Sci. Eng. 154, 277–283 (2017).
Tong, J., Gao, W., Robinson, R. L. & Gasem, K. A. Solubilities of nitrogen in heavy normal paraffins from 323 to 423 K at pressures to 18.0 MPa. J. Chem. Eng. Data 44(4), 784–787 (1999).
Van Konynenburg, P. & Scott, R. Critical lines and phase equilibria in binary van der Waals mixtures. Philos. Trans. R. Soc. Lond. A 298(1442), 495–540 (1980).
Privat, R. & Jaubert, J.-N. Classification of global fluid-phase equilibrium behaviors in binary systems. Chem. Eng. Res. Des. 91(10), 1807–1839 (2013).
Jamali, M., Izadpanah, A. A. & Mofarahi, M. Correlation and prediction of solubility of hydrogen in alkenes and its dissolution properties. Appl. Petrochem. Res. 11, 89–98 (2021).
Park, J., Robinson, R. L. & Gasem, K. A. Solubilities of hydrogen in aromatic hydrocarbons from 323 to 433 K and pressures to 21.7 MPa. J. Chem. Eng. Data 41(1), 70–73 (1996).
Li, H. & Yan, J. Evaluating cubic equations of state for calculation of vapor–liquid equilibrium of CO2 and CO2-mixtures for CO2 capture and storage processes. Appl. Energy 86(6), 826–836 (2009).
Schwarz, B. J. & Prausnitz, J. M. Solubilities of methane, ethane, and carbon dioxide in heavy fossil-fuel fractions. Ind. Eng. Chem. Res. 26(11), 2360–2366 (1987).
Tsuji, T., Shinya, Y., Hiaki, T. & Itoh, N. Hydrogen solubility in a chemical hydrogen storage medium, aromatic hydrocarbon, cyclic hydrocarbon, and their mixture for fuel cell systems. Fluid Phase Equilib. 228, 499–503 (2005).
Twu, C. H., Coon, J. E., Harvey, A. H. & Cunningham, J. R. An approach for the application of a cubic equation of state to hydrogen−hydrocarbon systems. Ind. Eng. Chem. Res. 35(3), 905–910 (1996).
D’Avila, S. G., Kaul, B. K. & Prausnitz, J. M. Solubilities of heavy hydrocarbons in compressed methane and nitrogen. J. Chem. Eng. Data 21(4), 488–491 (1976).
Privat, R., Jaubert, J.-N. & Mutelet, F. Addition of the nitrogen group to the PPR78 model (predictive 1978, Peng Robinson EOS with temperature-dependent k_ij calculated through a group contribution method). Ind. Eng. Chem. Res. 47(6), 2033–2048 (2008).
Privat, R., Jaubert, J.-N. & Mutelet, F. Use of the PPR78 model to predict new equilibrium data of binary systems involving hydrocarbons and nitrogen. Comparison with other GCEOS. Ind. Eng. Chem. Res. 47(19), 7483–7489 (2008).
Justo-García, D. N., García-Sánchez, F., Stateva, R. P. & García-Flores, B. E. Modeling of the multiphase behavior of nitrogen-containing systems at low temperatures with equations of state. J. Chem. Eng. Data 54(9), 2689–2695 (2009).
Justo-García, D. N., García-Sánchez, F., Díaz-Ramírez, N. L. & Díaz-Herrera, E. Modeling of three-phase vapor–liquid–liquid equilibria for a natural-gas system rich in nitrogen with the SRK and PC-SAFT EoS. Fluid Phase Equilib. 298(1), 92–96 (2010).
Haghbakhsh, R., Parvaneh, K. & Esmaeilzadeh, F. New models for the binary interaction parameters of nitrogen–alkanes mixtures based on the cubic equations of state. Chem. Eng. Commun. 205(7), 914–928 (2018).
Wu, H., Zheng, K., Wang, G., Yang, Y. & Li, Y. Modeling of gas solubility in hydrocarbons using the perturbed-chain statistical associating fluid theory equation of state. Ind. Eng. Chem. Res. 58(27), 12347–12360 (2019).
Tsuji, T. et al. Gas solubilities of nitrogen or oxygen in benzene, divinylbenzene, styrene and of an equimolar (N2: O2) mixture in styrene at (293–313) K. Fluid Phase Equilib. 492, 34–40 (2019).
Aguilar-Cisneros, H., Uribe-Vargas, V. & Carreon-Calderon, B. Estimation of gas solubility in petroleum fractions using PR-UMR and group contributions methods. Fuel 275, 117911 (2020).
Abdi-Khanghah, M., Bemani, A., Naserzadeh, Z. & Zhang, Z. Prediction of solubility of N-alkanes in supercritical CO2 using RBF-ANN and MLP-ANN. J. CO2 Util. 25, 108–119 (2018).
Songolzadeh, R., Shahbazi, K. & Madani, M. Modeling n-alkane solubility in supercritical CO2 via intelligent methods. J. Pet. Explor. Prod. 11(1), 279–287 (2021).
Chakraborty, S., Sun, Y., Lin, G. & Qiao, L. Vapor-liquid equilibrium predictions of n-alkane/nitrogen mixtures using neural networks. arXiv preprint (2020).
Mohammadi, M.-R. et al. Modeling hydrogen solubility in hydrocarbons using extreme gradient boosting and equations of state. Sci. Rep. 11(1), 1–20 (2021).
Mohammadi, M.-R. et al. Modeling of nitrogen solubility in unsaturated, cyclic, and aromatic hydrocarbons: Deep learning methods and SAFT equation of state. J. Taiwan Inst. Chem. Eng. https://doi.org/10.1016/j.jtice.2021.10.024 (2021).
Makranczy, J., Megyery-Balog, K. M., Rusz, L. & Patyi, L. Solubility of gases in normal-alkanes. Hung. J. Ind. Chem. 4(1), 269–280 (1976).
Wilcock, R. J., Battino, R., Danforth, W. F. & Wilhelm, E. Solubilities of gases in liquids II. The solubilities of He, Ne, Ar, Kr, O2, N2, CO, CO2, CH4, CF4, and SF6 in n-octane 1-octanol, n-decane, and 1-decanol. J. Chem. Thermodyn. 10(9), 817–822 (1978).
Tremper, K. K. & Prausnitz, J. M. Solubility of inorganic gases in high-boiling hydrocarbon solvents. J. Chem. Eng. Data 21(3), 295–299 (1976).
Bloomer, O. T. & Rao, K. N. Thermodynamic Properties of Nitrogen (Institute of Gas Technology, 1952).
Cheung, H. & Wang, D.-J. Solubility of volatile gases in hydrocarbon solvents at cryogenic temperatures. Ind. Eng. Chem. Fundam. 3(4), 355–361 (1964).
Chang, S.-D. & Lu, B. C. Vapor-Liquid Equilibriums in the Nitrogen-Methane-Ethane System (University of Ottawa, 1967).
Miller, R., Kidnay, A. & Hiza, M. Liquid-vapor equilibria at 112.00 K for systems containing nitrogen, argon, and methane. AIChE J. 19(1), 145–151 (1973).
Parrish, W. & Hiza, M. Liquid-vapor equilibria in the nitrogen-methane system between 95 and 120 K. In Advances in Cryogenic Engineering 300–308 (Springer, 1995).
Stryjek, R., Chappelear, P. S. & Kobayashi, R. Low-temperature vapor-liquid equilibriums of nitrogen-methane system. J. Chem. Eng. Data 19(4), 334–339 (1974).
Kidnay, A., Miller, R., Parrish, W. & Hiza, M. Liquid-vapour phase equilibria in the N2-CH4 system from 130 to 180 K. Cryogenics 15(9), 531–540 (1975).
Eakin, B. E., Ellington, R. & Gami, D. Physical-Chemical Properties of Ethane-Nitrogen Mixtures (Institute of Gas Technology, 1955).
Stryjek, R., Chappelear, P. S. & Kobayashi, R. Low-temperature vapor-liquid equilibriums of nitrogen-ethane system. J. Chem. Eng. Data 19(4), 340–343 (1974).
Grausø, L., Fredenslund, A. & Mollerup, J. Vapour-liquid equilibrium data for the systems C2H6+ N2, C2H4+ N2, C3H8+ N2, and C3H6+ N2. Fluid Phase Equilib. 1(1), 13–26 (1977).
Gupta, M. K., Gardner, G. C., Hegarty, M. J. & Kidnay, A. J. Liquid-vapor equilibriums for the N2+ CH4+ C2H6 system from 260 to 280 K. J. Chem. Eng. Data 25(4), 313–318 (1980).
Schindler, D., Swift, G. & Kurata, F. More low temperature VL design data. Hydrocarb. Process. 45(11), 205 (1966).
Poon, D. & Lu, B.-Y. Phase equilibria for systems containing nitrogen, methane, and propane. In Advances in Cryogenic Engineering 292–299 (Springer, 1995).
Frolich, P. K., Tauch, E., Hogan, J. & Peer, A. Solubilities of gases in liquids at high pressure. Ind. Eng. Chem. 23(5), 548–550 (1931).
Akers, W., Attwell, L. & Robinson, J. Nitrogen-butane system. Ind. Eng. Chem. 46(12), 2539–2540 (1954).
Roberts, L. & McKetta, J. J. Vapor-liquid equilibrium in the n-butane-nitrogen system. AIChE J. 7(1), 173–174 (1961).
Skripka, V., Barsuk, S., Nikitina, I., Gubkina, G. & Benyaminovich, O. Liquid-vapor equilibriums in a nitrogen-n-butane system. GazoV. Promst 14(4), 41–45 (1969).
Kalra, H., Robinson, D. B. & Besserer, G. J. The equilibrium phase properties of the nitrogen-n-pentane system. J. Chem. Eng. Data 22(2), 215–218 (1977).
Silva-Oliver, G., Eliosa-Jiménez, G., García-Sánchez, F. & Avendaño-Gómez, J. R. High-pressure vapor–liquid equilibria in the nitrogen–n-pentane system. Fluid Phase Equilib. 250(1–2), 37–48 (2006).
Poston, R. & McKetta, J. Vapor-liquid equilibrium in the methane-n-hexane system. J. Chem. Eng. Data 11(3), 362–363 (1966).
Baranovich, Z., Bogdanova, L. & Smirnova, A. Solubility of argon in n-hexane at low temperatures. Russ. J. Appl. Chem. 42(6), 1393–1396 (1969).
Eliosa-Jiménez, G., Silva-Oliver, G., García-Sánchez, F. & de Ita de laTorre, A. High-pressure vapor–liquid equilibria in the nitrogen+ n-hexane system. J. Chem. Eng. Data 52(2), 395–404 (2007).
Boomer, E., Johnson, C. & Piercey, A. Equilibria in two-phase, gas-liquid hydrocarbon systems: IV. Methane and heptane. Can. J. Res. 16(11), 396–410 (1938).
Akers, W., Kehn, D. & Kilgore, C. Volumetric and phase behavior of nitrogen-hydrogen systems: Nitrogen-n-heptane system. Ind. Eng. Chem. 46(12), 2536–2539 (1954).
Peter, S. & Eicke, H. Phase equilibrium in the systems nitrogen-n-heptane, nitrogen-2, 2, 4-trimethylpentane, and nitrogen-methylcyclohexane at higher pressures and temperatures. Ber. Bunsen-Ges 74(3), 190–194 (1970).
Brunner, G., Peter, S. & Wenzel, H. Phase equilibrium in the systems n-heptane-nitrogen, methylcyclohexane-nitrogen and n-heptane-methylcyclohexane-nitrogen at high pressures. Chem. Eng. J. 7(2), 99–104 (1974).
García-Sánchez, F., Eliosa-Jiménez, G., Silva-Oliver, G. & Godínez-Silva, A. High-pressure (vapor+ liquid) equilibria in the (nitrogen+ n-heptane) system. J. Chem. Thermodyn. 39(6), 893–905 (2007).
Graham, E. & Weale, K. The Solubility of Compressed Gases in Non-Polar Liquids. In Progress in International Research on Thermodynamic and Transport Properties 153–158 (Elsevier, 1962).
Baranovich, Z. SOLUBILITE DE N2 DANS LE N-HEXANE ET LE N-OCTANE A BASSES T. (1972).
Eliosa-Jiménez, G., García-Sánchez, F., Silva-Oliver, G. & Macías-Salinas, R. Vapor–liquid equilibrium data for the nitrogen+ n-octane system from (344.5 to 543.5) K and at pressures up to 50 MPa. Fluid Phase Equilib. 282(1), 3–10 (2009).
Silva-Oliver, G., Eliosa-Jiménez, G., García-Sánchez, F. & Avendaño-Gómez, J. R. High-pressure vapor–liquid equilibria in the nitrogen–n-nonane system. J. Supercrit. Fluids 42(1), 36–47 (2007).
Azarnoosh, A. & McKetta, J. Nitrogen-n-decane system in the two-phase region. J. Chem. Eng. Data 8(4), 494–496 (1963).
García-Sánchez, F., Eliosa-Jimenez, G., Silva-Oliver, G. & Garcia-Flores, B. E. Vapor−liquid equilibrium data for the nitrogen+ n-decane system from (344 to 563) K and at pressures up to 50 MPa. J. Chem. Eng. Data 54(5), 1560–1568 (2009).
Rupprecht, S. D. & Faeth, G. Investigation of Air Solubility in Jet a Fuel at High Pressures (NASA, 1981).
García-Córdova, T., Justo-García, D. N., García-Flores, B. E. & García-Sánchez, F. Vapor− liquid equilibrium data for the nitrogen+ dodecane system at temperatures from (344 to 593) K and at pressures up to 60 MPa. J. Chem. Eng. Data 56(4), 1555–1564 (2011).
Sultanov, R., Skripka, V. & Namiot, A. Phase equilibria in the systems methane–n-hexadecane and nitrogen–n-hexadecane at high temperatures and pressures. Deposited Doc. VINITI 2888-71 (1971).
Lin, H.-M., Kim, H. & Chao, K.-C. Gas-liquid equilibria in nitrogen+ n-hexadecane mixtures at elevated temperatures and pressures. Fluid Phase Equilib. 7(2), 181–185 (1981).
Altman, N. S. An introduction to kernel and nearest-neighbor nonparametric regression. Am. Stat. 46(3), 175–185 (1992).
Thanh Noi, P. & Kappas, M. Comparison of random forest, k-nearest neighbor, and support vector machine classifiers for land cover classification using Sentinel-2 imagery. Sensors 18(1), 18 (2018).
Breiman, L. Bagging predictors. Mach. Learn. 24(2), 123–140 (1996).
Chen, T. & Guestrin, C. In Xgboost: A Scalable Tree Boosting System, Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 785–794 (2016).
Dev, V. A. & Eden, M. R. Gradient boosted decision trees for lithology classification. Comput. Aided Chem. Eng. 47, 113–118 (2019).
Yang, X., Dindoruk, B. & Lu, L. A comparative analysis of bubble point pressure prediction using advanced machine learning algorithms and classical correlations. J. Pet. Sci. Eng. 185, 106598 (2020).
Ke, G. et al. Lightgbm: A highly efficient gradient boosting decision tree. Adv. Neural. Inf. Process. Syst. 30, 3146–3154 (2017).
Prokhorenkova, L., Gusev, G., Vorobev, A., Dorogush, A. V. & Gulin, A. CatBoost: Unbiased boosting with categorical features. arXiv preprint (2017).
Dorogush, A. V., Ershov, V. & Gulin, A. CatBoost: Gradient boosting with categorical features support. arXiv preprint (2018).
Meng, Q. et al. A communication-efficient parallel algorithm for decision tree. arXiv preprint (2016).
Ronze, D., Fongarland, P., Pitault, I. & Forissier, M. Hydrogen solubility in straight run gasoil. Chem. Eng. Sci. 57(4), 547–553 (2002).
Pedersen, K. S., Christensen, P. L. & Shaikh, J. A. Phase Behavior of Petroleum Reservoir Fluids (CRC Press, 2014).
Péneloux, A., Rauzy, E. & Fréze, R. A consistent correction for Redlich-Kwong-Soave volumes. Fluid Phase Equilib. 8(1), 7–23 (1982).
Gross, J. & Sadowski, G. Perturbed-chain SAFT: An equation of state based on a perturbation theory for chain molecules. Ind. Eng. Chem. Res. 40(4), 1244–1260 (2001).
Chen, Y., Mutelet, F. & Jaubert, J.-N. Modeling the solubility of carbon dioxide in imidazolium-based ionic liquids with the PC-SAFT equation of state. J. Phys. Chem. B 116(49), 14375–14388 (2012).
Kwak, T. & Mansoori, G. Van der Waals mixing rules for cubic equations of state. Applications for supercritical fluid extraction modelling. Chem. Eng. Sci. 41(5), 1303–1309 (1986).
Florusse, L., Peters, C., Pamies, J., Vega, L. F. & Meijer, H. Solubility of hydrogen in heavy n-alkanes: Experiments and saft modeling. AIChE J. 49(12), 3260–3269 (2003).
Tihic, A., Kontogeorgis, G. M., von Solms, N. & Michelsen, M. L. Applications of the simplified perturbed-chain SAFT equation of state using an extended parameter table. Fluid Phase Equilib. 248(1), 29–43 (2006).
Pedregosa, F. et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
Jaubert, J.-N., Privat, R., Le Guennec, Y. & Coniglio, L. Note on the properties altered by application of a Péneloux-type volume translation to an equation of state. Fluid Phase Equilib. 419, 88–95 (2016).
Privat, R., Jaubert, J.-N. & Le Guennec, Y. Incorporation of a volume translation in an equation of state for fluid mixtures: Which combining rule? Which effect on properties of mixing? Fluid Phase Equilib. 427, 414–420 (2016).
Chen, G. et al. The genetic algorithm based back propagation neural network for MMP prediction in CO2-EOR process. Fuel 126, 202–212 (2014).
Mohammadi, M.-R., Hemmati-Sarapardeh, A., Schaffie, M., Husein, M. M. & Ranjbar, M. Application of cascade forward neural network and group method of data handling to modeling crude oil pyrolysis during thermal enhanced oil recovery. J. Pet. Sci. Eng. 205, 108836 (2021).
Vallero, D. Fundamentals of Air Pollution (Academic Press, 2014).
Battino, R. The Ostwald coefficient of gas solubility. Fluid Phase Equilib. 15(3), 231–240 (1984).
Kumar, P. & Chevrier, V. F. Solubility of nitrogen in methane, ethane, and mixtures of methane and ethane at Titan-like conditions: A molecular dynamics study. ACS Earth Space Chem. 4(2), 241–248 (2020).

Author information

Authors and Affiliations

Department of Chemical and Petroleum Engineering, Sharif University of Technology, Tehran, Iran
Seyed Ali Madani
Department of Petroleum Engineering, Shahid Bahonar University of Kerman, Kerman, Iran
Mohammad-Reza Mohammadi & Abdolhossein Hemmati-Sarapardeh
Department of Chemical Engineering, Amirkabir University of Technology (Tehran Polytechnic), Tehran, Iran
Saeid Atashrouz
College of Engineering and Technology, American University of the Middle East, Egaila, Kuwait
Ali Abedi & Ahmad Mohaddespour
College of Construction Engineering, Jilin University, Changchun, 130012, China
Abdolhossein Hemmati-Sarapardeh
Key Laboratory of Continental Shale Hydrocarbon Accumulation and Efficient Development, Ministry of Education, Northeast Petroleum University, Daqing, 163318, China
Abdolhossein Hemmati-Sarapardeh

Authors

Seyed Ali Madani
Mohammad-Reza Mohammadi
Saeid Atashrouz
Ali Abedi
Abdolhossein Hemmati-Sarapardeh
Ahmad Mohaddespour

Contributions

S.A.M.: Investigation, Modeling, Visualization, Writing-Original Draft. M.-R.M.: Investigation, Data curation, Visualization, Writing-Original Draft. S.A.: Writing-Review & Editing, Methodology, Validation. A.A.: Writing-Review & Editing, Validation. A.H.-S.: Methodology, Validation, Supervision, Writing-Review & Editing. A.M.: Writing-Review & Editing, Validation.

Corresponding authors

Correspondence to Saeid Atashrouz, Abdolhossein Hemmati-Sarapardeh or Ahmad Mohaddespour.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Madani, S.A., Mohammadi, M.-R., Atashrouz, S. et al. Modeling of nitrogen solubility in normal alkanes using machine learning methods compared with cubic and PC-SAFT equations of state. Sci Rep 11, 24403 (2021). https://doi.org/10.1038/s41598-021-03643-8

Received: 28 August 2021
Accepted: 07 December 2021
Published: 22 December 2021
DOI: https://doi.org/10.1038/s41598-021-03643-8
Springer Nature Limited

This article is cited by

Modeling crude oil pyrolysis process using advanced white-box and black-box machine learning techniques
- Fahimeh Hadavimoghaddam
- Alexei Rozhenko
- Abdolhossein Hemmati-Sarapardeh
Scientific Reports (2023)
Modeling solubility of CO2–N2 gas mixtures in aqueous electrolyte systems using artificial intelligence techniques and equations of state
- Reza Nakhaei-Kohani
- Ehsan Taslimi-Renani
- Abdolhossein Hemmati-Sarapardeh
Scientific Reports (2022)
