INTRODUCTION

Natural waters and wastewaters contain numerous phenolic compounds, many of which are dangerous toxicants. Phenols in water are rarely determined individually; instead, their total content (cΣ) is monitored, expressed as the phenol index (PhI) [1–3]. Unfortunately, determining the PhI often leads to significantly underestimated assessments of the total content [4–7]. This is due to the passivity of certain phenols in the reaction with 4-aminoantipyrine, intragroup signal selectivity, and the improper choice of the standard substance. A good alternative to measuring the PhI is a recently developed method [8] involving the conversion of phenols into azo dyes by another reagent: diazotized sulfanilic acid (DSA). After a 10-min or 60-min exposure, the generalized signal of phenols (AΣ) is measured at 360 nm and expressed relative to a standard substance (Xst) using univariate calibration. This method allows for the assessment of the total content at levels of 10⁻⁶–10⁻⁴ mol/L; the errors of analysis are systematic and, with the correct choice of Xst, do not exceed 30 rel % in magnitude.

It is known that any assessments of cΣ obtained by recalculating AΣ to a standard substance Xst are metrologically incorrect [9, 10]. Measuring the total content of similar analytes in units of another physical quantity (Xst concentration) not only violates the principle of measurement unity but also increases the uncertainty of the results of analysis. This applies to the method [8] as well. More accurate methods for assessing cΣ are needed that do not require recalculating to Xst. In particular, generalized signals can be measured at multiple wavelengths and then cΣ can be determined using inverted multivariate calibration [11, 12]. In recent years, this approach has been successfully applied to determine the total content of carbohydrates [13], hydrocarbons [14], anthocyanins [15], and other groups of similar organic compounds. Multivariate calibrations (MC) have not been used previously to determine the total content of toxic phenols in natural waters or wastewaters. The possibility of accurately assessing cΣ using this method requires experimental verification, which was the aim of our research. Generalized signals were measured according to the procedure [8], multivariate calibrations were calculated and optimized, and then multicomponent aqueous solutions of phenols with different but known compositions were analyzed. The application of the optimized method in the analysis of different types of natural waters and wastewaters will be discussed in the next article.

EXPERIMENTAL

Objects of research. We used eight individual phenolic compounds containing from one to three hydroxyl groups (Table 1).

Table 1.   Individual phenol compounds used in the experiment

Initial aqueous or aqueous-alcoholic solutions of phenols were prepared from accurately weighed portions of chemically pure reagents without additional purification. Working solutions were prepared on the day of the experiment by diluting the initial solutions with distilled water.

Model mixtures (colored multicomponent aqueous solutions) were prepared by mixing the calculated volumes of initial solutions of different phenols and diazotizing reagents. Individual mixtures contained from two to five phenols, with molar ratios of different phenols in a mixture not exceeding 10 : 1 and total phenolic contents (cΣ) ranging from 15 to 70 μmol/L. Hereinafter, cΣ values are given in final dilution, i.e., after conversion of phenols to azo dyes. In total, more than 60 colored solutions with known cΣ values were prepared. A part of the mixtures (the training set) was used to construct multivariate calibrations. These mixtures contained Ph, N1, G, and R (Table 2). The mixtures of the first test sample had the same qualitative composition but different ratios of components; they were used to check and compare the efficiency of different calibrations.

Table 2.   Composition of some model mixtures used to form training sets

The composition of the mixtures for the formation of the second test sample included both the above and other phenols, namely, MC, N2, P, and PG. Mixtures from the third test sample contained only the last four phenols (Table 3). Each test sample contained seven mixtures of known composition. Thus, in contrast to a number of analogous studies, when testing the new methodology of group analysis, we deliberately used mixtures of not only the same but also a different qualitative composition than in the formation of the mathematical model.

Table 3.   Results of analysis of phenol mixtures from various test samples

Experimental technique. To convert individual phenols into azo dyes, 5.0 cm³ of 0.1 M NaHCO3 solution was introduced into a 50.00 cm³ volumetric flask (to create pH 7.4), along with distilled water to 2/3 of the volume of the flask and V (cm³) of the working solution of the phenol under study. Then 1.0 cm³ of DSA solution with a concentration of 5.0 × 10⁻³ mol/L, prepared from an analytical-grade reagent according to the method [17], was added; the volume of the solution was brought to the mark with distilled water and mixed. The values of V were chosen so that the optical densities of the photometered solutions (AΣ) in the region of 350–410 nm were in the range from 0.1 to 1.0 units. At τ = 10 min after the addition of DSA, the absorption spectrum of the prepared solutions was recorded using an SF-2000 spectrophotometer in quartz cuvettes (l = 1.00 cm); the blank solution served as the reference solution. The AΣ values were measured at several (m) preselected analytical wavelengths (AWL). Multicomponent colored solutions were prepared and their generalized signals were measured in the same way. The spectrum of each colored solution was recorded three times, and the AΣ values obtained at the same AWL were averaged. The generalized signals had good precision: when remeasuring the optical density of one solution at any AWL, Sr < 1%, and when repreparing solutions, Sr < 3%. The formation and measurement of generalized signals are described in more detail in [8]. The additivity of generalized signals was checked using the 3S criterion [18].
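The averaging of triplicate spectra and the precision estimate (Sr) described above can be sketched as follows. This is only an illustration: the absorbance values are hypothetical, and the function names are introduced here rather than taken from the procedure [8].

```python
from statistics import mean, stdev

def relative_sd_percent(readings):
    """Relative standard deviation Sr (in %) of replicate readings."""
    return 100.0 * stdev(readings) / mean(readings)

def average_spectrum(replicates):
    """Average replicate spectra wavelength by wavelength."""
    return [mean(values) for values in zip(*replicates)]

# Hypothetical triplicate absorbances of one solution at three AWLs
replicates = [
    [0.412, 0.500, 0.377],
    [0.415, 0.505, 0.380],
    [0.409, 0.495, 0.374],
]

averaged = average_spectrum(replicates)            # one A value per AWL
sr_by_wavelength = [relative_sd_percent(column) for column in zip(*replicates)]
```

The averaged values are the ones that would enter the calibration; the Sr values serve only as a precision check at each wavelength.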

Construction of multivariate calibrations. Inverted multivariate calibrations were constructed in Microsoft Excel using the formula

$${{c}_{{{{\Sigma }_{i}}}}} = \sum\limits_{j = 1}^m {{{b}_{j}}{{A}_{{ij}}}} ,$$
(1)

where cΣi is the total concentration of phenols in the ith mixture, Aij is the optical density of the ith mixture at the jth AWL, and bj is the regression coefficient for the jth AWL. Summation was carried out over all AWLs, the number of which was purposefully varied from three to ten during the experiment. The experimental data for n mixtures of the training set formed an overdetermined system of linear equations, which was solved for the coefficients bj by the least squares method (OLS algorithm [11]). The values of bj thus determined were substituted into Eq. (1), giving the desired calibration. Thus, with m = 7 and n = 10, we obtained the following regression:

$$\begin{gathered} {{c}_{\Sigma }} = 115.374{{A}_{{350}}} + 49.13{{A}_{{360}}} - 12.270{{A}_{{370}}} \\ - \,\,214.2{{A}_{{380}}} + 114.10{{A}_{{390}}} - 16.476{{A}_{{400}}} + 39.85{{A}_{{410}}}. \\ \end{gathered} $$
(2)

Substituting the values of Aj characterizing the next sample into the resulting equations led to results (\(c_{\Sigma }^{*}\)) close to the total content of phenols in this sample taking into account its dilution during the analysis. Naturally, when changing m and/or n, we obtained slightly different calibrations and slightly different values of \(c_{\Sigma }^{*}\) for the same mixtures.
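The least-squares construction of Eq. (1) and the use of Eq. (2) can be illustrated with a short sketch. It is not the authors' Excel workbook: the training data are synthetic (three wavelengths, five mixtures, noise-free), the function names are hypothetical, and the normal equations are solved by plain Gaussian elimination for self-containment. The coefficients in `EQ2` are those of Eq. (2); the result is assumed to be in μmol/L, as for the cΣ values quoted in the text.

```python
def solve_linear(M, y):
    """Solve the square system M x = y by Gaussian elimination with pivoting."""
    n = len(y)
    aug = [row[:] + [y[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system; wavelengths may be collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def fit_inverse_calibration(A, c):
    """Least-squares b for c_i = sum_j b_j * A_ij (no intercept, as in Eq. (1))."""
    m = len(A[0])
    AtA = [[sum(row[j] * row[k] for row in A) for k in range(m)] for j in range(m)]
    Atc = [sum(row[j] * ci for row, ci in zip(A, c)) for j in range(m)]
    return solve_linear(AtA, Atc)

def predict(b, spectrum):
    """Estimate the total content from absorbances at the m wavelengths."""
    return sum(bj * aj for bj, aj in zip(b, spectrum))

# Synthetic training set: n = 5 mixtures, m = 3 wavelengths (hypothetical values)
A_train = [
    [0.20, 0.35, 0.30],
    [0.45, 0.60, 0.40],
    [0.30, 0.25, 0.55],
    [0.70, 0.80, 0.65],
    [0.15, 0.50, 0.20],
]
b_true = [40.0, 25.0, 30.0]                 # coefficients used to generate the data
c_train = [predict(b_true, row) for row in A_train]
b_fit = fit_inverse_calibration(A_train, c_train)

# Coefficients of Eq. (2), keyed by wavelength in nm
EQ2 = {350: 115.374, 360: 49.13, 370: -12.270, 380: -214.2,
       390: 114.10, 400: -16.476, 410: 39.85}

def apply_eq2(spectrum):
    """Evaluate Eq. (2) for a dict {wavelength_nm: absorbance}."""
    return sum(EQ2[w] * spectrum[w] for w in EQ2)
```

With noise-free synthetic data the fit recovers the generating coefficients; with real spectra, neighboring wavelengths are strongly correlated, which is one reason the fitted coefficients in Eq. (2) alternate in sign.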

Estimation of errors. Statistical processing of the results of the analysis of each mixture was carried out using Student's t-distribution (n = 3; P = 0.95). The errors in the analysis of single mixtures were found according to the formula

$$\delta c(\% ) = 100\frac{{c_{\Sigma }^{*} - {{c}_{\Sigma }}}}{{{{c}_{\Sigma }}}}.$$
(3)

Repeated photometry of the same mixture yielded highly reproducible values of \(c_{\Sigma }^{*}\) (Sr < 2%). The generalized error in the analysis of different mixtures included in a certain test set and analyzed with the help of some calibration was characterized by the RMSEC and RMSEP parameters [11], expressed in μmol/L, as well as in % of the average content of phenols in this set. Both parameters were calculated as follows:

$${\text{RMSEP}} = \sqrt {\frac{1}{n}\sum\limits_{i = 1}^n {{{{\left( {c_{\Sigma }^{*} - {{c}_{\Sigma }}} \right)}}^{2}}} } .$$
(4)

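RMSEC characterizes the adequacy of the model when applied to mixtures from the training set, and RMSEP characterizes its adequacy when applied to mixtures from the test sample. RMSEC is calculated by the same formula as RMSEP, with the summation carried out over the mixtures of the training set.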
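Equations (3) and (4) translate directly into code. The sketch below uses hypothetical found and true contents (in μmol/L) and hypothetical function names; the same `rmse` function gives RMSEC or RMSEP depending on whether training or test mixtures are passed in.

```python
import math

def relative_error_percent(c_found, c_true):
    """Eq. (3): signed relative error of one mixture, in %."""
    return 100.0 * (c_found - c_true) / c_true

def rmse(c_found, c_true):
    """Eq. (4): root-mean-square error over a set of mixtures."""
    n = len(c_true)
    return math.sqrt(sum((f - t) ** 2 for f, t in zip(c_found, c_true)) / n)

def rmse_percent_of_mean(c_found, c_true):
    """RMSE expressed in % of the average true content of the set."""
    return 100.0 * rmse(c_found, c_true) / (sum(c_true) / len(c_true))

# Hypothetical test set of three mixtures
c_true = [20.0, 40.0, 60.0]
c_found = [22.0, 38.0, 63.0]

errors = [relative_error_percent(f, t) for f, t in zip(c_found, c_true)]
rmsep = rmse(c_found, c_true)
rmsep_pct = rmse_percent_of_mean(c_found, c_true)
```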

To optimize the parameters m and n, the RMSEC or RMSEP values obtained using different models for the first test sample were compared. Maximum errors in the analysis of single mixtures were also taken into account.

RESULTS AND DISCUSSION

As previously established and confirmed during this study, phenolic toxicants in neutral aqueous solutions form stable azo dyes under the action of DSA (reagent), which absorb light well in the near UV region of the spectrum. The optical densities of these azo dyes in the concentration range of 10⁻⁵–10⁻⁴ mol/L are directly proportional to the concentration of the initial phenols (Fig. 1). The molar absorption coefficients of different azo dyes (ε) at the same wavelength vary, but in the range of 350–410 nm, they are of the same order of magnitude. For the set of phenols used, the ratio T = εmax/εmin does not exceed 7.

Fig. 1. Univariate calibrations for the determination of individual phenols as azo dyes (pH 7.4, τ = 10 min, λ = 380 nm).

Deviations from additivity of analytical signals at the selected analytical wavelength in the vast majority of cases were found to be statistically insignificant, and in the remaining cases, they did not exceed 5 rel %. The relatively low level of intragroup selectivity and approximate additivity of signals provide the possibility of a correct estimation of cΣ when recalculated to a standard substance [8]. On the other hand, the mentioned characteristics of phenols and their corresponding azo dyes allow for the rapid construction of multidimensional linear models and their application for determining the total content of phenols according to the spectrum of the corresponding mixture. Examples of absorption spectra of mixtures of phenols after conversion of the components into azo dyes are shown in Fig. 2.

Fig. 2. Absorption spectra of phenol mixtures after their interaction with DSA (CDSA = 100 μmol/L, pH 7.4, τ = 10 min): (41) N1 + N2 + R + G, 59.8 μmol/L; (42) R + G, 15.3 μmol/L; (43) Ph + N1 + R, 34.9 μmol/L; (44) Ph + R + G, 45.4 μmol/L.

Selecting the optimal mathematical model. To select a mathematical model connecting the total concentrations of phenols and their generalized signals, two series of experiments were conducted. In the first series, absorption spectra of seven model mixtures from the training set were recorded in the range from 350 to 410 nm. Generalized signals were measured at different AWLs, gradually increasing their number from m = 3 to m = 10. The values of Aij for the mixtures from the first test sample were substituted into the obtained regression equations. For each mixture, the values of \(c_{\Sigma }^{*}\) were calculated, the relative errors of group analysis (δcj) were determined, and then the generalized error of analysis of these mixtures (RMSEP) was calculated. As expected, with a fixed size of the training set, the values of individual errors and RMSEP decreased as the number of AWLs increased. Reasonably accurate results (RMSEP ≈ 10% of the average phenol content) were obtained starting from m = 7. Further increasing the number of AWLs did not significantly improve the accuracy of the results but complicated the procedure; therefore, it was deemed impractical.

In the second series of experiments, the number and set of used AWLs remained unchanged (m = 7), but the parameter n was varied from 5 to 16. Increasing the number of model mixtures in the training set initially led to a reduction in errors of group analysis and then to their increase (presumably due to the accumulation of random errors). The minimum value of RMSEP, equal to 7.8% of the average phenol content in the first test sample, was observed at n = 10. For further application, a multivariate calibration obtained at m = 7 and n = 10 was selected (see Eq. (2)). The total concentrations of phenols in mixtures from test sample no. 1 were determined using this calibration with individual errors not exceeding 13 rel % (in absolute value).
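The search over n described above amounts to refitting the calibration on the first n training mixtures and comparing RMSEP on a fixed test sample. The sketch below shows only the shape of that loop. To stay short it uses a single-wavelength, through-origin model as a stand-in for the seven-wavelength calibration, and all numbers are hypothetical; the last training mixture is deliberately given a large error so that adding it degrades the model.

```python
import math

def fit_through_origin(pairs):
    """Least-squares slope b for c = b * A (single-wavelength stand-in model)."""
    numerator = sum(a * c for a, c in pairs)
    denominator = sum(a * a for a, _ in pairs)
    return numerator / denominator

def rmsep(b, test_pairs):
    """Root-mean-square error of prediction for slope b on the test pairs."""
    return math.sqrt(sum((b * a - c) ** 2 for a, c in test_pairs) / len(test_pairs))

def scan_training_size(train_pairs, test_pairs, sizes):
    """Return {n: RMSEP} for calibrations built on the first n training mixtures."""
    return {n: rmsep(fit_through_origin(train_pairs[:n]), test_pairs) for n in sizes}

# Hypothetical (absorbance, total content in umol/L) pairs
train_set = [(0.2, 11.0), (0.4, 20.0), (0.6, 30.0), (0.8, 46.0)]
test_set = [(0.3, 15.0), (0.5, 25.0)]

results = scan_training_size(train_set, test_set, sizes=[2, 3, 4])
best_n = min(results, key=results.get)
```

In this toy example the error first falls and then rises as n grows, which mirrors the qualitative behavior reported in the text, though for a different and much simpler model.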

For comparison, the same mixtures were analyzed using the method [8], expressing the results in terms of the best standard substance (1-naphthol), i.e., in the form of a total index (TI). In this case, individual errors reached 23 rel %, and the RMSEP parameter was 12.8% of the average phenol content (Table 4, top line). It can be seen that, with the same qualitative composition of the samples and of the mixtures from the training set, the transition from calculating TI to using inverted multivariate calibration reduces the errors in determining the sum of phenols by a factor of approximately 1.6–1.8 (RMSEP from 12.8 to 7.8%, maximum individual errors from 23 to 13 rel %). It should be noted that, when determining the sum of hydrocarbons or the sum of aromatics, the transition from TI to MC results in even greater gains in accuracy [14, 19].

Table 4.   Maximum (in absolute value) and generalized errors of group analysis for phenol mixtures of different compositions

Obviously, replacing the metrologically incorrect operation (calculating TI) with calculating the total phenol content using inverted MC should increase the accuracy of results in the analysis of natural waters and wastewaters. The development of corresponding methods has already begun. The main problem in this case is the choice of the optimal composition of the training set.

Influence of the qualitative composition of phenolic mixtures on the accuracy of analysis. It is known that the qualitative composition of calibration samples should match the composition of the samples being analyzed. However, it is not always possible to create a training set that includes all components of future samples. This is feasible, for example, in quantitative analysis of synthetic pharmaceuticals containing known sets of components, but impossible in the analysis of heavy petroleum products [11]. In the separate determination of components using multivariate calibration, the presence of foreign substances in the sample belonging to the same group of analytes but not accounted for in the calibration leads to systematic errors [20]. The influence of the same factor on the results of group analysis is difficult to predict and poorly studied [19, 21]. Selecting the composition of the training set for determining the total phenol content in natural waters and wastewaters is challenging because the qualitative composition of phenolic mixtures in such waters varies significantly depending on the type of water and the source of phenol contamination [22, 23].

In the course of this study, it was necessary to determine to what extent the results of group analysis would be distorted when the composition of the samples being analyzed does not match the mixtures used to construct the simplified multivariate calibration. A comparison of the results of analysis of different test sets (see Table 4) shows that the presence of “foreign” phenols significantly increases the absolute values of systematic errors in group analysis. This occurs both in the calculation of TI and when using simplified MC, and in such cases, the use of MC can lead to even greater (in absolute value) errors than the calculation of TI. A similar conclusion was previously drawn when studying the influence of foreign compounds on the results of determining the sum of arenes [19].

The influence of “foreign” phenols on the results of group analysis is explained by differences in sensitivity coefficients when determining different phenols in the form of azo dyes (see Fig. 1). For example, the phenol MC (see Table 1) is determined in the range of 360–410 nm with the same or slightly higher sensitivity than the phenols included in the training set. Therefore, the presence of MC in the mixtures from test sets 2 and 3 led to small positive errors. On the contrary, phenols that react slowly with diazotized sulfanilic acid (N2, PG, and P) were determined with lower sensitivity after a 10-min exposure compared to the components of the training set. Therefore, the presence of these phenols in the samples should lead to significantly underestimated results, as observed during the conducted experiment.
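The origin of these biases can be made explicit with a short derivation. It assumes strict additivity of signals and the Beer–Lambert law, which, as noted above, hold only approximately; the symbols εkj (molar absorption coefficient of the azo dye of the kth phenol at the jth AWL), ck (concentration of the kth phenol), and sk are introduced here for illustration and do not appear in the original method. Substituting the additive signal into Eq. (1) gives

$$A_{j} = l\sum\limits_{k} {\varepsilon_{kj} c_{k}}, \qquad c_{\Sigma}^{*} = \sum\limits_{j = 1}^{m} {b_{j} A_{j}} = \sum\limits_{k} {s_{k} c_{k}}, \qquad s_{k} = l\sum\limits_{j = 1}^{m} {b_{j} \varepsilon_{kj}},$$

so that, with \(c_{\Sigma} = \sum_{k} c_{k}\), the relative error of Eq. (3) becomes

$$\delta c(\% ) = 100\frac{\sum\limits_{k} {\left( s_{k} - 1 \right) c_{k}}}{c_{\Sigma}}.$$

The regression drives sk toward 1 only for phenols represented in the training set. For a foreign phenol whose spectrum is roughly proportional to those of the training phenols, higher sensitivity gives sk > 1 (overestimation) and lower sensitivity gives sk < 1 (underestimation), in line with the signs of the errors described above.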

CONCLUSIONS

On the basis of the results of the experiment, the following conclusions and practical recommendations can be formulated.

(1) The possibility of determining the total content of phenols in the form of azo dyes has been confirmed. It has been established that the analysis of multicomponent aqueous solutions containing phenols at a level of 10⁻⁵ mol/L can be carried out in a metrologically correct way based on the construction of inverted multivariate calibrations.

(2) Using inverted multivariate calibrations, it is possible to obtain significantly more accurate estimates of the total content of phenols than by calculating the corresponding total index. After optimizing the number of analytical wavelengths and the number of samples in the training set, the individual errors in the group analysis of mixtures from the test sample according to the corresponding multivariate calibration did not exceed 13 rel %, and the generalized error of analysis (RMSEP) was 7.8% of the average phenolic content. At the same time, the sensitivity, precision, and duration of analysis using the new method are close to the characteristics of traditional methods for assessing the total content of phenols.

(3) The main disadvantage of the new technique is the sensitivity of the result of analysis to the individual composition of the samples being studied. If there are phenols in the sample that were not taken into account when constructing a simplified multivariate calibration, the systematic errors of the group analysis increase sharply. In such cases, the use of TI may be preferable. Particularly dangerous is the presence in the sample of phenols that are determined with much greater or much lower sensitivity than the phenols used to construct the MC. Therefore, the training set should include the widest possible set of phenols, including compounds determined with particularly high and particularly low sensitivity. This recommendation was confirmed experimentally using a multivariate calibration constructed from ten model mixtures that included all eight phenols we used: the total contents of phenols in model mixtures from all three test samples were determined quite accurately (individual errors were less than 15 rel %, RMSEP = 8.5%).

(4) The use of multivariate calibrations to determine the total content of phenols in wastewater, as well as in heavily polluted natural waters, is possible and advisable. It is necessary only to form multivariate calibrations in accordance with the expected composition of phenolic mixtures in the corresponding waters, identifying their main components using HPLC, and then include these phenols in the training set. Thus, to analyze waters of different types, different multivariate calibrations will be required. An alternative field of methodological research is the elimination of intragroup selectivity of analytical signals, which can lead to the creation of a unified calibration.