1 Introduction

Oil products usually retain a certain quantity of water during processing, which affects their storage, transportation and processing and degrades product quality. Generally, free water forms if the water content of petroleum exceeds its solubility, which may cause equipment corrosion, catalyst damage, and so on [1,2,3,4]. Water in oil products mainly comes from the following sources. Firstly, a large amount of water is injected into crude oil during oil recovery. As crude oils become heavier and poorer in quality, the water injection rate tends to increase significantly, which makes it difficult for the electric desalination and dewatering unit to control the water content of the oil (≤ 0.5 wt%) [5]. Secondly, in distillation operations such as atmospheric and vacuum distillation, steam is usually introduced for stripping to decrease the partial pressure of the oil. In addition, some components of the oil may be oxidized during storage, transportation and processing, yielding water that cannot be ignored.

Petroleum is a complex organic mixture in which organic acids, some containing sulfur and nitrogen, are widely distributed in the gasoline, diesel and kerosene fractions. Trace water provides an ionizing environment for these organic acids, and the released H+ can corrode equipment and reduce the stability of oil products. For example, alkenes are readily oxidized under acidic conditions. Chloride (Cl−), a common accelerant of electrochemical corrosion, can combine with water to cause serious corrosion and damage equipment at the dew point [6]. Furthermore, below the freezing point of water, dissolved water readily precipitates and freezes, which impairs the fluidity of oil products [7]. As a result, atomization of the fuel through the engine nozzle deteriorates and black smoke appears in the exhaust. The requirement for aviation kerosene is even stricter, because ice crystals can block the filter and cause the engine to stall [8]. In addition, surfactants in jet fuel promote the emulsification of oil and water [9]. Because water is then difficult to separate from the oil, trace impurities accumulate on the filter surface, which may shorten the service life of the filter. Water may also impair the lubricity of the fuel [10] and promote flocculation and microbial growth. In summary, to control the water content of oil products, it is necessary to predict the solubility of water accurately. Such predictions not only help control the quality of fractions but also play an important role in process selection and design.

In view of the compositional complexity of oil products, it is feasible to study the solubilities of water in hydrocarbons by hydrocarbon group. Note that the solubilities of water in hydrocarbons are very small, so the measurements have low accuracy, limited by the precision of experimental methods and instruments. For example, the reported solubility of water in benzene at 298.15 K ranges from 0.0026 to 0.0049 [11]. Moreover, experiments on bulk hydrocarbons are difficult to carry out, so calculation methods are needed to predict the solubility.

At present, four kinds of methods are used for modelling: equations of state, activity coefficient models, empirical correlations and COSMO-RS.

1.1 State Equation Methods

Kraska et al. [12] used the revised LJ-SAFT equation of state to predict the solubilities of water in hydrocarbons. The results show that it is applicable only to hydrocarbons with a carbon number (CN) below 9. In 2006, Karakatsani et al. [13] employed the tPC-SAFT model to predict the solubilities of water in hydrocarbons; except for n-hexane and cyclohexane, the predictions do not agree well with the experimental data. In 2007, Oliveira et al. [14] calculated the solubilities using the CPA equation of state with the van der Waals mixing rule. Its accuracy, however, is poor at low temperature.

1.2 Activity Coefficient Methods

Magnussen et al. [15] developed the UNIFAC-LLE model to calculate the solubilities of water in hydrocarbons, but the deviations are large and parameters for naphthenic hydrocarbon systems are lacking. In 2013, Satyro et al. [16] proposed an infinite dilution activity coefficient model, based on the NRTL equation, to calculate the solubilities at different temperatures. The mean relative deviation for 78 different hydrocarbons is about ± 34%. In 2016, Possani et al. [17] adopted the F-SAC model, combined with the SRK equation of state, to calculate the solubility using the SCMR mixing rule presented by Staudt and Soares [18]. This model is not suitable for hydrocarbons with CN < 12.

1.3 Empirical Correlation Methods

In 1952, Hibbard and Schalla [8] proposed a general correlation for predicting the solubilities of water in various hydrocarbons. However, its error is as high as ± 56%, and it is not suitable for alkenes. In 2001, Tsonopoulos [19] measured the solubilities of water in C5–C10 hydrocarbons, including n-alkanes, alkylcyclohexanes, alkylbenzenes and 1-alkenes, and established a correlation with temperature as the independent variable. Nevertheless, the accuracy of this correlation was reduced by the assumption that the difference between the enthalpy of water in the various hydrocarbons and that of pure water is constant. In 2013, new correlations for the solubilities of water in hydrocarbons were published in the American Petroleum Institute (API) handbook [20] (see Eqs. 1 and 2), where xw is the solubility of water in the hydrocarbon, a1 and a2 are component-specific parameters, and T is the absolute temperature. Note that Eq. 1 can only be applied to hydrocarbons for which parameters are given, while Eq. 2 is used for hydrocarbons not covered by Eq. 1. In particular, Eq. 2 is not suitable for alkenes or cycloalkanes.

$$\ln x_{w} = a_{1} + \frac{a_{2}}{T} \tag{1}$$
$$\ln x_{w} = - \left( 4200\frac{H}{C} + 1050 \right)\left( \frac{1.8}{T} - 0.0016 \right) \tag{2}$$
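
As a minimal sketch, Eqs. 1 and 2 can be evaluated as follows. The function names are illustrative, T is taken in kelvin, and H/C is assumed here to be the hydrogen-to-carbon weight ratio of the hydrocarbon; the text does not define it, and the atomic ratio would give implausibly small solubilities. No a1 and a2 values are supplied, since they are tabulated in [20].

```python
import math

# Standard atomic masses (g/mol), used only to form the H/C weight ratio.
M_C, M_H = 12.011, 1.008


def h_to_c_weight_ratio(n_carbon: int, n_hydrogen: int) -> float:
    """H/C weight ratio of a hydrocarbon CnHm (assumed meaning of H/C in Eq. 2)."""
    return (n_hydrogen * M_H) / (n_carbon * M_C)


def ln_xw_eq1(a1: float, a2: float, temperature_k: float) -> float:
    """Eq. 1: ln x_w = a1 + a2 / T, with component-specific a1 and a2."""
    return a1 + a2 / temperature_k


def ln_xw_eq2(h_to_c: float, temperature_k: float) -> float:
    """Eq. 2: ln x_w = -(4200 H/C + 1050) (1.8/T - 0.0016)."""
    return -(4200.0 * h_to_c + 1050.0) * (1.8 / temperature_k - 0.0016)


# Example: n-hexane (C6H14) at the two ends of the 298.15-323.15 K range.
hc = h_to_c_weight_ratio(6, 14)
x_298 = math.exp(ln_xw_eq2(hc, 298.15))
x_323 = math.exp(ln_xw_eq2(hc, 323.15))
print(f"H/C = {hc:.4f}, x_w(298.15 K) = {x_298:.2e}, x_w(323.15 K) = {x_323:.2e}")
```

Under these assumptions, Eq. 2 predicts a mole-fraction solubility that rises with temperature, because the factor (1.8/T − 0.0016) shrinks as T increases.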

1.4 COSMO-RS Method

COSMO-RS is a molecular simulation method that can be used to calculate the thermodynamic properties of a system, including solubility. Specifically, it calculates the molecular geometry and relevant parameters based on quantum chemical density functional theory (DFT), and then predicts the solubilities of water in hydrocarbons from the changes of thermodynamic quantities in the dissolution process. In 2003, Klamt [21] predicted solubility data with COSMO-RS and compared them with the experimental data of Tsonopoulos [19]. This demonstrated the ability of COSMO-RS to calculate the solubilities of water in hydrocarbons. However, limited by the calculation technology at that time, the solubility could not be obtained directly and had to be derived from the Gibbs energy difference, ΔGw. Moreover, only the TZVP basis set was used in the calculations, which is not applicable to systems with hydrogen bonding between water and hydrocarbons.

In this study, the solubilities of different water–hydrocarbon systems were determined with different basis sets, which improved the accuracy of calculation. That is, the TZVP basis set was used for n-alkanes, alkylcyclohexanes and 1-alkenes in the range of C5–C10, while the TZVPD-FINE basis set was used for alkylbenzenes in the range of C5–C10. The predictions of COSMO-RS were compared with the experimental data [11, 22,23,24,25,26,27,28,29,30,31,32], and the predictions of API correlations [20] and the Tsonopoulos correlation [19], respectively, where the effects of CN and T on the solubilities of water–hydrocarbon systems were analyzed systematically.

2 Method

COSMO-RS is an a priori prediction method for the thermodynamic properties of mixtures based on the results of quantum chemical calculations. It characterizes the interactions between molecules by the surface screening charge density [33], which is calculated by COSMO. More details are available in the literature [34,35,36].

A COSMO-RS calculation is a two-step procedure. First, the distribution of molecular surface polarity, Pi(σ), is calculated for all compounds; it gives the relative amount of surface with polarity σ on the surface of the molecule. The solute molecules are calculated in a virtual conductor environment, that is, a continuum whose dielectric constant is infinite and which can be regarded as an ideal conductor [34]. In this environment, the solute molecule induces a polarization charge density, σ, on the molecular surface, which is a good local descriptor of the molecular surface polarity. During the quantum chemical (QC) self-consistency algorithm, the solute molecule is converged to its energetically optimal state in the conductor with respect to electron density and geometry [21].

Second, the molecular interactions are simulated by a statistical thermodynamic method, in which the interaction energies of pair-wise interacting surface segments are quantified by their polarization charge densities. Hydrogen bonding and electrostatic interactions are treated as the most important molecular interaction modes, while van der Waals interactions are less specific and are taken into account in a slightly more approximate way. On the basis of the molecular interactions and a coupled set of non-linear equations, the chemical potential can be calculated and used to obtain thermodynamic properties, e.g. water solubilities in hydrocarbons. The calculated solubilities are output as the logarithmic mole fraction, log10(x).
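
Because the result is reported as log10(x), a small conversion step is needed before comparing with mole-fraction or weight-based data. The sketch below is illustrative only: the function names are hypothetical, the example value of log10(x) is a placeholder rather than a result from this work, and the molar masses are standard values.

```python
M_WATER = 18.015  # molar mass of water, g/mol


def mole_fraction_from_log10(log10_x: float) -> float:
    """Convert a log10(x) value into the mole fraction x."""
    return 10.0 ** log10_x


def water_mass_ppm(x_water: float, molar_mass_hc: float) -> float:
    """Convert a water mole fraction in a single hydrocarbon to mass ppm."""
    mass_water = x_water * M_WATER
    mass_hc = (1.0 - x_water) * molar_mass_hc
    return 1.0e6 * mass_water / (mass_water + mass_hc)


# Placeholder value of log10(x), chosen only to show the conversion.
log10_x = -3.3
x = mole_fraction_from_log10(log10_x)
ppm = water_mass_ppm(x, molar_mass_hc=86.18)  # 86.18 g/mol: n-hexane
print(f"x = {x:.3e}, water content = {ppm:.0f} ppm by mass")
```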

In this study, molecular structures are optimized with the BP-RI-DFT method [37,38,39] together with the TZVP [40] and TZVPD-FINE basis sets in TURBOMOLE. However, the TZVP basis set of the COSMO-RS model does not take the effects of hydrogen bonding into account. Consequently, for systems with hydrogen bonding this study uses the TZVPD-FINE basis set proposed in 2012, which is based on a TURBOMOLE BP-RI-DFT COSMO single-point calculation with the TZVPD basis set on top of an optimized BP/TZVP/COSMO geometry, with the molecular surface cavity constructed as a fine-grid marching tetrahedron cavity. Its parameters include a novel hydrogen bonding interaction term (HB2012). Once the COSMO files of all components have been generated with TURBOMOLE, they can be reused for a series of calculations in COSMOtherm (version C30_1201) [41].

3 Results and Discussion

3.1 Solubilities of Water in Hydrocarbons at 298.15 K

3.1.1 Solubilities of Water in n-Alkanes at 298.15 K

In this study, the TZVP and TZVPD-FINE basis sets were employed to calculate the solubilities of water in C5–C10 hydrocarbons at 298.15 K. The results were compared with the experimental data compiled by Maczynski et al. [11, 22,23,24,25,26,27,28,29,30,31,32] in 2005 and with the predictions of the API correlations [20] and the Tsonopoulos correlation [19].

Figure 1 shows that the average relative deviation between the COSMO-RS predictions and the experimental data [23,24,25, 29, 31, 32] is ± 9.83%, which is better than those of the API correlations [20] (± 19.65%) and the Tsonopoulos correlation [19] (± 19.32%). The data of the plots in Fig. 1 are listed in Table 1 (see “Appendix”). In addition, the experimental solubility first decreases with increasing CN when CN ≤ 7 and then increases when CN > 7, that is, there is a minimum at CN = 7. However, the curves of both the API correlations [20] and the Tsonopoulos correlation [19] rise monotonically with CN, which is inconsistent with the experimental curve.
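
The deviations quoted in this section can be recomputed from the tabulated data with a short script. The sketch below assumes the usual definition of the average relative deviation, ARD = (100/N) Σ |x_calc − x_exp| / x_exp; the text does not spell out the formula, and the numbers in the example are placeholders rather than values from Table 1.

```python
from typing import Sequence


def relative_deviations(calc: Sequence[float], exp: Sequence[float]) -> list:
    """Point-wise relative deviations |x_calc - x_exp| / x_exp, in percent."""
    if len(calc) != len(exp) or len(calc) == 0:
        raise ValueError("calc and exp must be non-empty and of equal length")
    return [100.0 * abs(c - e) / e for c, e in zip(calc, exp)]


def average_relative_deviation(calc: Sequence[float], exp: Sequence[float]) -> float:
    """Mean of the point-wise relative deviations, in percent."""
    devs = relative_deviations(calc, exp)
    return sum(devs) / len(devs)


def maximum_relative_deviation(calc: Sequence[float], exp: Sequence[float]) -> float:
    """Largest point-wise relative deviation, in percent."""
    return max(relative_deviations(calc, exp))


# Placeholder mole fractions, for illustration only.
x_exp = [4.0e-4, 5.0e-4]
x_calc = [4.4e-4, 4.5e-4]
ard = average_relative_deviation(x_calc, x_exp)
mrd = maximum_relative_deviation(x_calc, x_exp)
print(f"ARD = {ard:.2f}%, maximum = {mrd:.2f}%")
```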

Fig. 1 Solubilities of water in n-alkanes at 298.15 K

The non-monotonic profile of the solubility of water in n-alkanes with increasing CN can be explained as follows. Generally, the Gibbs energy change of the system, ΔG, determines the solubilities of water in hydrocarbons. According to the Gibbs equation, \(\Delta G = \Delta H - T\Delta S\), at constant T the solubilities of water in n-alkanes are affected by both the enthalpy change, ΔH, and the entropy change, ΔS. If the carbon chain is short, ΔG is governed mainly by ΔH rather than ΔS. With increasing CN, the number of methylene groups grows, and the dissolution of water in n-alkanes requires more energy (\(\Delta H > 0\)) because the methylene group repels water more strongly than the methyl group. As a result, the solubility decreases with increasing CN when CN ≤ 7.

For CN > 7, the effect of ΔS becomes stronger. The Newman projections of n-hexane (CN < 7), viewed along the bond between carbon atoms No. 3 and No. 4, and of n-octane (CN > 7), viewed along the bond between carbon atoms No. 4 and No. 5, are illustrated in Fig. 2. There are two ethyl groups in the projection formulae of n-hexane and two propyl groups in those of n-octane. In addition to the four extreme conformations (staggered, partially staggered, partially eclipsed and eclipsed; see Fig. 2), the extreme conformations of n-octane are more diverse than those of n-hexane because the propyl group has more conformational isomers than the ethyl group. This shows that the randomness of the n-octane system is greater than that of the n-hexane system. Consequently, we can infer that with increasing CN, n-alkanes have more conformational isomers [42], and the randomness of the system increases greatly. Therefore, when CN > 7, the impact of ΔS gradually becomes greater than that of ΔH, leading to a minimum in ΔG.
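
The link between the Gibbs equation and the solubility can be made explicit under an ideal-dilute assumption that is not stated in the text. If ΔG is read as the molar Gibbs energy of transferring water from the pure liquid into the hydrocarbon, and ΔH and ΔS are treated as independent of temperature, then approximately

$$\ln x_{w} \approx -\frac{\Delta G}{RT} = \frac{\Delta S}{R} - \frac{\Delta H}{RT}$$

where R is the gas constant. Compared with Eq. 1, this suggests reading a1 as ΔS/R and a2 as −ΔH/R. A more positive ΔH therefore lowers xw, whereas a larger ΔS raises it, which is the competition invoked above to explain the minimum at CN = 7.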

Fig. 2 Conformational isomerizations of n-hexane and n-octane

3.1.2 Solubilities of Water in Alkylcyclohexanes at 298.15 K

Figure 3 shows the solubilities of water in alkylcyclohexanes at 298.15 K predicted by COSMO-RS, compared with the experimental data [23, 30, 32] and the predictions of the API correlations [20] and the Tsonopoulos correlation [19]. The solubilities of water in alkylcyclohexanes increase significantly when CN changes from 7 to 8. The values predicted by the API correlations [20] and the Tsonopoulos correlation [19] are clearly lower than the experimental ones [23, 30, 32], and the profiles of the two predicted curves are approximately linear and quite distinct from the experimental curve. In contrast, the values predicted by COSMO-RS are more consistent with the experimental ones, with a mean relative deviation of ± 13.21%, much lower than those of the API correlations [20] (± 23.64%) and the Tsonopoulos correlation [19] (± 27.72%). The data of the plots in Fig. 3 are shown in Table 2 (see “Appendix”).

Fig. 3 Solubilities of water in alkylcyclohexanes at 298.15 K

The dissolution mechanism of water in alkylcyclohexanes can be explained as follows. Cyclohexane contains six methylene groups, and compared with n-alkanes of the same carbon number, the weaker polarity of the methylene groups results in a lower solubility. Compared with cyclohexane, the substituent R in alkylcyclohexanes has little effect on ΔH of the system but greatly increases the randomness, ΔS. Therefore, the solubilities of water in alkylcyclohexanes increase with increasing CN. For alkylcyclohexanes, the solubility depends on the substituent group R and on the effect of R on the overall molecular conformations. Note that although methylcyclohexane (CN = 7) has one fewer methylene group than ethylcyclohexane (CN = 8), the ethyl group has more conformations than the methyl group, which leads to a significant increase of \(\Delta S\) and thereby promotes the increase of solubility.

As illustrated in Fig. 4, methylcyclohexane has two conformations: the substituent can be in an axial position (R on an a-bond) or in an equatorial position (R on an e-bond). The e-bond substitution is more stable than the a-bond substitution because of steric hindrance. With increasing CN of the R group, the proportion of conformations with R on the e-bond increases. When CN = 9, the e-bond substitution accounts for 97% [43]. Because the proportion of a-bond substitution decreases, the randomness is reduced, which offsets the increase of ΔS caused by the conformational isomerization of the propyl group. Thus, the increase of solubility slows down from ethylcyclohexane (CN = 8) to propylcyclohexane (CN = 9).

Fig. 4 Two conformations of methylcyclohexane

3.1.3 Solubilities of Water in Alkylbenzenes at 298.15 K

Since alkylbenzenes contain a benzene ring, water molecules form weak hydrogen bonds with the π-electrons of the ring during the dissolution process. The calculations therefore use the TZVPD-FINE basis set of the COSMO-RS model. Figure 5 shows the solubilities of water in alkylbenzenes predicted by COSMO-RS. The mean relative deviation is ± 5.60%, which is better than that of the API correlations [20] (± 9.52%; parameters are available only for C6–C9) and approximately equal to that of the Tsonopoulos correlation [19] (± 5.48%). However, at higher carbon numbers the values predicted by COSMO-RS are closer to the experimental values [11, 22, 24, 25, 32]. The data of the plots in Fig. 5 are listed in Table 3 (see “Appendix”).

Fig. 5 Solubilities of water in alkylbenzenes at 298.15 K

The solubility of water in benzene is over 10 times higher than that in n-alkanes. This is mainly due to the formation of π-type hydrogen bonds between the π-electrons of the benzene ring and water. As a result, the attraction between alkylbenzene and water greatly reduces the value of ΔH. In addition, benzene has seven resonance structures, which increases randomness as well. In particular, if there is a substituent R on the benzene ring, R delocalizes the π-electrons of the ring and evens out the electric charge density over the whole molecule. The greater the CN, the more pronounced the delocalization. The delocalization of the π-electrons weakens the hydrogen bond strength. Additionally, the increase of CN in R leads to an increase of ΔS. Although the increased randomness raises the solubility, it is insufficient to compensate for the decrease of solubility caused by the weakened hydrogen bonds. Therefore, the solubility decreases with increasing CN of R for alkylbenzenes.

3.1.4 Solubilities of Water in 1-Alkenes at 298.15 K

Owing to the lack of experimental data for the solubilities of water in 1-alkenes at 298.15 K, the values calculated by COSMO-RS are compared only with those of the API correlations [20] in Fig. 6. The values calculated by COSMO-RS are greater than those predicted by the API correlations [20], while the profiles of the two curves are similar. The data of the plots in Fig. 6 are shown in Table 4 (see “Appendix”). The error of the experimental data recommended by IUPAC is within ± 30%. Therefore, this confirms the reliability of COSMO-RS to some extent.

Fig. 6 Solubilities of water in 1-alkenes at 298.15 K

3.2 Solubilities of Water in Hydrocarbons at Different Temperatures

3.2.1 Solubilities Calculated by COSMO-RS at Different Temperatures

As shown in Figs. 7, 8, 9 and 10, the solubilities of water in n-hexane, cyclohexane, 1-hexene and benzene in the range 298.15–323.15 K were calculated by COSMO-RS and compared with the experimental values [11, 30, 31] and the predictions of the API correlations [20]. The COSMO-RS calculations for n-hexane, cyclohexane and 1-hexene were based on the TZVP basis set, while those for benzene were based on the TZVPD-FINE basis set.

Fig. 7 Solubilities of water in n-hexane at different temperatures

Fig. 8 Solubilities of water in cyclohexane at different temperatures

Fig. 9 Solubilities of water in 1-hexene at different temperatures

Fig. 10 Solubilities of water in benzene at different temperatures

Figure 7 shows the solubilities of water in n-hexane in the range 298.15–323.15 K. The COSMO-RS prediction is consistent with that of the API correlations [20] at 298.15 K and then gradually approaches the experimental values [31] as the temperature increases. On the whole, the accuracy of COSMO-RS for n-hexane (± 12.13%) is higher than that of the API correlations [20] (± 25.31%) and the Tsonopoulos correlation [19] (± 25.36%). The data of the plots in Fig. 7 are reported in Table 5 (see “Appendix”).

Figure 8 shows the solubilities of water in cyclohexane at 298.15–323.15 K. The solubility calculated by COSMO-RS is consistent with the prediction of the API correlations [20] at relatively low temperatures. However, as the temperature increases, the COSMO-RS prediction becomes progressively higher than the experimental data [30]. Its mean relative deviation is ± 22.68%, somewhat higher than those of the API correlations [20] (± 16.41%) and the Tsonopoulos correlation [19] (± 16.45%), i.e. its accuracy is slightly lower. The data of the plots in Fig. 8 are reported in Table 6 (see “Appendix”).

Figure 9 shows the solubilities of water in 1-hexene at 298.15–323.15 K. Owing to the scarcity of experimental data for the solubility of water in 1-hexene, the values predicted by COSMO-RS are mainly compared with those of the API correlations [20]. The solubility calculated by COSMO-RS is in good agreement with the prediction of the API correlations [20] at 298.15 K and then gradually approaches the experimental values [30] as the temperature increases. On the whole, the accuracy of COSMO-RS for 1-hexene is higher than that of the correlations: its mean relative deviation is ± 24.78%, much lower than those of the API correlations [20] (± 42.69%) and the Tsonopoulos correlation [19] (± 43.18%), which are nearly equal to each other. The data of the plots in Fig. 9 are reported in Table 7 (see “Appendix”).

Figure 10 shows the solubilities of water in benzene at 298.15–323.15 K. The solubilities calculated by COSMO-RS agree reasonably well with the experimental values [11, 22,23,24,25,26,27,28,29,30,31,32]. The mean relative deviation obtained with the TZVPD-FINE basis set of COSMO-RS is ± 19.83%, which is comparable to that of the API correlations [20] (± 17.42%) but higher than that of the Tsonopoulos correlation [19] (± 3.51%). However, COSMO-RS has a wider range of applicability, being applicable to any hydrocarbon. The data of the plots in Fig. 10 are shown in Table 8 (see “Appendix”).

In general, COSMO-RS can quantitatively calculate the solubilities of water in C5–C10 hydrocarbons in the above temperature range. However, at higher temperatures, the deviation of COSMO-RS is slightly larger. This is mainly due to the decrease of molecular density, caused by the increase of intermolecular distance and the weakening of hydrogen bonds as the temperature increases. Moreover, among the hydrocarbons mentioned above, cyclohexane has the lowest water solubility while benzene has the highest, so the COSMO-RS calculation is more accurate for benzene and less accurate for cyclohexane.

3.2.2 Analysis of the Effect of Temperature

From Figs. 7, 8, 9 and 10, we find that temperature has a great influence on the solubilities of water in hydrocarbons, which can be interpreted as follows. Firstly, according to the equation \(E = \frac{3}{2}kT = \frac{1}{2}m\overline{v}^{2}\), where E is the average kinetic energy of a molecule, k is the Boltzmann constant, T is the temperature, m is the molecular mass and \(\overline{v}\) is the average molecular velocity, the average kinetic energy of a molecule rises with temperature. For liquid/liquid systems, the increase of the average kinetic energy is insufficient to separate solute from solvent, but the increase of irregular motion increases the randomness of the system. That is, the heat Q supplied to the system not only raises its temperature but also increases its entropy. As mentioned above, an increase of ΔS with little change of ΔH leads to a decrease of ΔG and hence to an increase of the solubility. Secondly, as the temperature rises, the distance between hydrocarbon molecules increases, which makes the interaction between hydrocarbon molecules weaker than at lower temperatures. In addition, the water molecule, whose volume is much smaller than those of hydrocarbons, can easily occupy the interstices between hydrocarbon molecules. Therefore, the solubility of water increases because of the enhanced attraction between water and hydrocarbon molecules. Finally, as the temperature rises, the hydrogen bonds formed between water molecules become weaker, and the network structure is likely to break. As Klein indicated, the hydrogen bond at 303.15 K is weaker than that at 293.15 K [42], which weakens the interaction between water molecules and increases the randomness. Consequently, the solubilities of water in hydrocarbons increase sharply with temperature.
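
To give a sense of scale for the kinetic argument, the sketch below evaluates \(E = \frac{3}{2}kT\) and the corresponding speed for a water molecule at the two ends of the studied temperature range. It is illustrative only: \(\overline{v}\) is interpreted here as the root-mean-square speed, and the ideal kinetic relation is applied as in the text, without any liquid-phase correction.

```python
import math

K_B = 1.380649e-23    # Boltzmann constant, J/K
N_A = 6.02214076e23   # Avogadro constant, 1/mol
M_WATER = 18.015e-3   # molar mass of water, kg/mol


def mean_kinetic_energy(temperature_k: float) -> float:
    """Average translational kinetic energy per molecule, E = 3/2 k T (J)."""
    return 1.5 * K_B * temperature_k


def rms_speed(temperature_k: float, molar_mass_kg: float) -> float:
    """Speed v that satisfies 1/2 m v^2 = 3/2 k T (m/s)."""
    mass = molar_mass_kg / N_A  # mass of one molecule, kg
    return math.sqrt(2.0 * mean_kinetic_energy(temperature_k) / mass)


v_298 = rms_speed(298.15, M_WATER)
v_323 = rms_speed(323.15, M_WATER)
energy_ratio = mean_kinetic_energy(323.15) / mean_kinetic_energy(298.15)
print(f"v(298.15 K) = {v_298:.0f} m/s, v(323.15 K) = {v_323:.0f} m/s")
print(f"E(323.15 K) / E(298.15 K) = {energy_ratio:.3f}")
```

The kinetic energy grows by only about 8% over this range, which is consistent with the argument that the temperature effect on solubility acts mainly through entropy and weakened hydrogen bonding rather than through kinetic energy alone.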

3.3 Validation

Figure 11 compares the calculated and experimental solubilities of water in C5–C10 hydrocarbons at 283.15–323.15 K. The average relative deviation over 111 data points is ± 17.82%, with a maximum of ± 29.41%. The results show that COSMO-RS has good accuracy in application; the deviation arises mainly from the relatively high predicted values at high temperature.

Fig. 11 Error analysis of the solubilities of water in hydrocarbons (COSMO-RS)

4 Conclusions

(1) COSMO-RS, based on quantum chemical density functional theory (DFT), can predict the solubilities of water in various hydrocarbons from the changes of thermodynamic quantities during the dissolution process. The method is convenient and time-saving and has great potential for applications, even for complex mixtures. As a theoretical molecular simulation method, it can to some extent make up for the lack of data caused by experimental limitations and play an active role in process selection and in controlling the quality of oil products.

(2) The solubilities of water in C5–C10 n-alkanes, alkylcyclohexanes, alkylbenzenes and 1-alkenes in the range 283.15–323.15 K were calculated by COSMO-RS and compared with the experimental data [11, 22,23,24,25,26,27,28,29,30,31,32] and with the predictions of the API correlations [20] and the Tsonopoulos correlation [19]. The results show that COSMO-RS is generally more accurate, with a maximum deviation within ± 29.41%.

(3) COSMO-RS predicted the effects of carbon number on the solubilities of water in hydrocarbons, in good agreement with the experimental curves. In particular, it reproduced the minimum (CN = 7) in the solubility curve of water in n-alkanes with increasing carbon number. The dissolution mechanisms of water in the various hydrocarbons were explained on the basis of the contributions of ΔH and ΔS to ΔG of the system and of the intermolecular forces.

(4) COSMO-RS predicted the effects of temperature on the solubilities of water in hydrocarbons, in good agreement with the experimental data on the whole, and the dissolution mechanism was interpreted systematically. However, at high temperature the predicted values were slightly too high, because the decrease of molecular density, caused by the increase of intermolecular distance and the weakening of hydrogen bond strength as the temperature rises [21], was neglected; this effect should be taken into account.