Introduction

Spatial modeling of geologic phenomena is challenging and involves the evaluation of uncertainties innate to any estimation or simulation method (e.g., Bowen 2010; Koike et al. 2015; Battalgazy and Madani 2019). Uncertainty, which originates from sparsely sampled data and/or heterogeneity of geologic properties, is often represented by the spread of a probability distribution (Pyrcz and Deutsch 2014). Such spatial uncertainty can be modeled and quantified by geostatistical methods (Delfiner and Chilès 2012) through estimation and stochastic conditional simulation at unsampled locations. The uncertainty measures include the estimation error variance (Calder and Cressie 2009) and the variability of multiple equiprobable simulated realizations (Pyrcz and Deutsch 2014).

Ordinary kriging (OK) is a best linear unbiased estimator that has been widely used in earth science (e.g., Isaaks and Srivastava 1989; Shahbeik et al. 2014; Ilyas et al. 2016; Pugliese et al. 2016). However, OK shares a problem common to all estimators: a smoothing effect that makes OK estimates much smoother than the actual heterogeneity. Simulations are therefore required to reproduce the heterogeneity by accounting for its stochastic properties. The most popular simulation method is sequential Gaussian simulation (SGSIM), which is based on a Gaussian random field (Deutsch and Journel 1998; Chilès and Lantuéjoul 2005; Emery 2007). SGSIM is a straightforward method that obtains values sequentially at each new simulation point from a conditional distribution derived from the data and previously simulated values (Delfiner and Chilès 2012). Another typical simulation method is turning bands simulation (TBSIM), which also relies on a Gaussian random field (Matheron 1973; Ren 2005; Emery and Lantuéjoul 2006). TBSIM generates \( {\mathbb{R}}^{n} \) simulations from multiple independent, unconditional \( {\mathbb{R}}^{1} \) simulations along lines that can be rotated in \( {\mathbb{R}}^{3} \) space. This unconditional simulation is then conditioned by kriging so that the simulated values equal the sample values at the data locations. However, SGSIM is poorly suited to reproducing short-scale continuity (Lantuéjoul 1994), which causes numerical instability if the data covariance is constant over a short distance, and artifact discontinuities tend to appear in TBSIM results (Lantuéjoul 1994; Olea 1999) when only a few lines are used (Gneiting 1999; Emery and Lantuéjoul 2006; Eze et al. 2018).
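The conditioning step, in which an unconditional simulation is "corrected to be conditional by kriging," can be illustrated with a small one-dimensional sketch. It makes the following assumptions:

- The variable is a normal score with a known mean of zero, so simple kriging is used.
- The covariance is spherical, with a hypothetical range `a`.
- For brevity, the unconditional field is generated by Cholesky factorization rather than by turning bands.

Only the correction z_cs = z_k + (z_s − z_sk) is the point of the example. The function names are illustrative.

```python
import numpy as np

def spherical_cov(h, a, sill=1.0):
    """Spherical covariance: sill at h = 0, zero at and beyond the range a."""
    r = np.minimum(np.abs(h) / a, 1.0)
    return sill * (1.0 - 1.5 * r + 0.5 * r**3)

def condition_by_kriging(x_grid, x_data, z_data, a, rng):
    """Turn an unconditional Gaussian simulation into a conditional one (1D sketch)."""
    # Unconditional simulation at the unique grid and data locations. A Cholesky
    # factorization stands in for the turning bands method here.
    x_all, inverse = np.unique(np.concatenate([x_grid, x_data]), return_inverse=True)
    cov_all = spherical_cov(x_all[:, None] - x_all[None, :], a)
    chol = np.linalg.cholesky(cov_all + 1e-8 * np.eye(len(x_all)))
    z_u = (chol @ rng.standard_normal(len(x_all)))[inverse]
    zu_grid, zu_data = z_u[:len(x_grid)], z_u[len(x_grid):]

    # Simple-kriging weights (known mean 0 for normal scores), one column per grid node.
    cov_dd = spherical_cov(x_data[:, None] - x_data[None, :], a)
    cov_dg = spherical_cov(x_data[:, None] - x_grid[None, :], a)
    weights = np.linalg.solve(cov_dd, cov_dg)

    # Conditional value = kriged data + (unconditional value - kriged unconditional value).
    return weights.T @ z_data + (zu_grid - weights.T @ zu_data)
```

At a grid node that coincides with a datum, the kriging weights reduce to a unit vector, and the two unconditional terms cancel. The node therefore returns the sample value, which is the exactness property described above.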

Although many studies have compared geostatistical methods (e.g., Iskandar et al. 2012; Paravarzar et al. 2015; Lu et al. 2016), only a few specifically address situations in which access to data is severely limited. This situation is common in submarine resource exploration because of the technical difficulty and high cost of deep drilling into the seafloor. Nevertheless, submarine resources have attracted considerable attention with the increasing demand for mineral resources. Seafloor massive sulfide (SMS) deposits are seafloor mineral resources that are generally rich in base metals (Cu, Pb, Zn). However, their distribution, configuration, and formation mechanism remain poorly understood, especially for sub-seafloor mineralization, because far fewer drilling and geophysical survey data are available than for on-land areas.

Against this background, this study aims to clarify the metal content distribution and locate mineralized zones using a limited amount of metal content data from a few drill sites. We use principal component analysis (PCA) and three geostatistical methods (OK, SGSIM, and TBSIM) to identify the best method for the sparse-data scenario. For broader applicability, the selected method must be useful for reserve assessment and deposit modeling. An active seafloor vent area in the Okinawa Trough, southwest Japan, was selected as a case study. The deposits in this subduction-related back-arc setting (Pirajno 2009) are regarded as a modern analog of Kuroko-type volcanogenic massive sulfide (VMS) deposits on land (Halbach et al. 1989; Ishibashi et al. 2015).

Materials and Methods

Geologic Setting

The study area is a part of a caldera floor in the middle Okinawa Trough (Fig. 1) formed by resurgent rhyolite domes and covered by up to 30-m-thick unconsolidated sediments including hemipelagic, silty Holocene clays, sulfide-bearing layers, tuff breccias, and pumice (Glasby and Notsu 2003; Ishibashi et al. 2015; Nozaki et al. 2018). Several normal faults are present in this area along the rifting axis of the Okinawa Trough and trend E–W or ENE–WSW (Kato et al. 1989; Kato 1990; Halbach et al. 1993). The faults act as pathways for hydrothermal fluid flow (Pirajno 2009).

Figure 1

(a) Seafloor topography of the study area (Hakurei Site, Izena Hole in the middle Okinawa Trough) overlain with locations of six drill sites (I–VI). The target region for geostatistical modeling is shown in the red rectangle. (b) A seismic profile along the red line in (a) with interpreted lithotypes, mineralized zone, and fault distribution by Asakawa and Lee (2018). The black line in (a) is the location of the cross section of the geostatistical results presented in Figure 6. The three stars in (a) represent the active chimney locations

An area of 700 m (E–W) × 130 m (N–S) at a depth of 1570 m below sea level (mbsl) (Fig. 1a) was chosen for geostatistical modeling. The six drill sites (I–VI) within it are distributed nearly E–W. This area is termed the Hakurei Site, Izena Hole (Ishibashi et al. 2015; Totsuka et al. 2019). The westernmost drill hole (I) is located on a massive sulfide mound with a complex chimney structure. Based on drill core observations, the top layer everywhere except the westernmost sulfide mound consists of poorly sorted primary and reworked underwater debris flow sediments of variable grain size, mixed with volcaniclastic and hemipelagic sediments. The deeper portions consist of hydrothermally altered clay and altered volcaniclastic rocks (Nozaki et al. 2018). The basement presumably consists of intra-caldera ignimbrite of dacitic–rhyolitic composition with pervasive hydrothermal alteration (Nozaki et al. 2018; Yamasaki 2018).

The geologic structure beneath the seafloor was imaged by a seismic profile (Asakawa and Lee 2018) covering part of the study area from drill sites III to VI (Fig. 1b). The profile outlines two major structures: a stratabound layer with a high-velocity anomaly, and a fault between drill sites V and VI that extends from the basement toward the seafloor. The layered structure is interpreted as a concentration zone of polymetallic sulfides and hydrothermal alteration (Asakawa and Lee 2018; Nozaki et al. 2018). Such zones are characterized by high permeability and by sulfide precipitation during the diffuse flow and cooling of hydrothermal fluids (Rona et al. 1993; Nozaki et al. 2018). The inferred fault may have formed during caldera formation (Halbach et al. 1993; Yamasaki 2018).

Sample Data

Element contents (ppm or wt%) of the vertical drill cores were measured by inductively coupled plasma quadrupole mass spectrometry (ICP-QMS) after HF–HNO3–HClO4 acid digestion (Takaya et al. 2018). In total, 448 samples analyzed for six elements (Zn, Pb, Cu, Ba, Ag, and Cd) were selected for geostatistical modeling. These elements are the main constituents of the dominant sulfide and sulfate minerals of SMS deposits in the Okinawa Trough (Halbach et al. 1989, 1993; Ishibashi et al. 2015; Nozaki et al. 2016). The samples collected onboard typically had volumes of 10 to 20 cm³. Approximately 50 mg of powder from each sample was used for the ICP-QMS analyses.

Analytical Flow of Spatial Modeling

Mineralized zones in the study area were identified from 448 geochemical sample data points from the drill cores. The analysis proceeded in four steps, shown in the flowchart in Figure 2: pre-processing, PCA, normal score transformation, and spatial modeling by geostatistical estimation and simulation. The data at each sample point are multivariate and compositional. Because content magnitudes and variances differ substantially between the elements, pre-processing is performed to reduce the skewness of the content distributions and to avoid spurious correlations between elements. The centered log-ratio (clr) transformation of Aitchison (1986) was selected for this purpose because its suitability has been demonstrated in several previous case studies (e.g., Aitchison 2002; Pawlowsky-Glahn and Olea 2004; Pawlowsky-Glahn et al. 2011). In this method, each content value of a given element, \(x_i\), is divided by the geometric mean \(g_{\text{m}}\), and the ratio is log-transformed:

$$ {\text{clr}}\left( {x_{i} } \right) = { \ln }\frac{{x_{i} }}{{g_{\text{m}} }}\quad {\text{for}}\;g_{\text{m}} = \left( {\mathop \prod \limits_{j = 1}^{m} x_{j} } \right)^{{1}/{m}} = \sqrt[m]{{x_{1} \ldots x_{m} }} $$
(1)

where m is the number of data points for the target element. Clr-transformed contents are denoted by an asterisk, e.g., Zn* for Zn.
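Because ln g_m equals the mean of the ln x_j, Eq. (1) amounts to subtracting a mean logarithm. The sketch below is a minimal illustration under a few assumptions:

- Contents are held in a NumPy array with samples in rows and elements in columns. The values are hypothetical.
- `axis=0` applies Eq. (1) as worded above, with g_m taken over the m data of one element.
- `axis=1` gives Aitchison's per-sample form, in which g_m is taken over the parts of each composition.

For the per-sample form, all contents must first be expressed in a common unit (e.g., ppm rather than a mix of ppm and wt%).

```python
import numpy as np

def clr(x, axis):
    """ln(x_i / g_m) along `axis`, where g_m is the geometric mean along that axis."""
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0):
        raise ValueError("the log-ratio transform requires strictly positive contents")
    log_x = np.log(x)
    # ln(x_i / g_m) = ln(x_i) - mean(ln x_j): the log of a geometric mean is the mean log.
    return log_x - log_x.mean(axis=axis, keepdims=True)

# Hypothetical contents in ppm: rows = samples, columns = elements.
contents = np.array([[1200.0, 300.0, 50.0],
                     [800.0, 150.0, 20.0],
                     [5000.0, 900.0, 110.0]])
clr_by_element = clr(contents, axis=0)  # g_m over the data of each element
clr_by_sample = clr(contents, axis=1)   # g_m over the parts of each sample
```

In the per-sample form, the transformed values of each sample sum to zero. In the per-element form, each element's transformed values average to zero.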

Figure 2

Flowchart for clarification of mineralized zones using geochemical sample data from drill cores. The procedure consists of pre-processing, PCA, normal score transformation, and spatial modeling by geostatistical estimation and simulation

Because element contents in metal deposits are generally correlated, PCA was applied to reduce the dimensionality of the content data. PCA linearly combines the correlated elements into a smaller set of uncorrelated variables, the principal components (PCs), which were used in the subsequent geostatistical analyses. PCA simplifies the geostatistical calculations by reducing a multivariate problem to a univariate one, and it facilitates the delineation of mineralized zones. Each PC was then transformed into normal scores that follow a standard normal distribution with a mean of 0 and a variance of 1. Gaussian-based methods such as SGSIM and TBSIM assume normally distributed data, so this transformation is indispensable when the data distribution is skewed and far from normal.
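The two steps described above, PCA followed by a normal score transform, can be sketched in a few lines. The sketch rests on these assumptions:

- PCA uses an eigendecomposition of the covariance matrix of the clr data. The text does not state whether a covariance or a correlation matrix was used.
- The normal score transform is rank-based, with plotting positions (k + 0.5)/n.
- Tied values receive distinct ranks.

These are illustrative choices, not necessarily the authors' exact implementation.

```python
import numpy as np
from statistics import NormalDist

def pca(data):
    """PCA via eigendecomposition of the column covariance matrix."""
    centered = data - data.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(centered, rowvar=False))  # ascending order
    order = np.argsort(eigvals)[::-1]                 # largest variance first (PC1)
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = centered @ eigvecs                       # column k holds PC(k+1)
    explained = 100.0 * eigvals / eigvals.sum()       # % of total variance per PC
    return scores, eigvals, explained

def normal_score(values):
    """Map each value to the standard normal quantile of its plotting position."""
    values = np.asarray(values, dtype=float)
    n = len(values)
    ranks = np.argsort(np.argsort(values, kind="stable"), kind="stable")  # 0..n-1
    inv_cdf = NormalDist().inv_cdf
    return np.array([inv_cdf((k + 0.5) / n) for k in ranks])
```

Applied to the six clr-transformed elements, `explained` corresponds to the percentages in Figure 3. `normal_score(scores[:, 0])` yields transformed PC1 values of the kind shown in Figure 4b.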

For the geostatistical steps themselves, the variography and the principles, equations, and calculation procedures of OK, SGSIM, and TBSIM are described in detail in several references (e.g., Isaaks and Srivastava 1989; Armstrong 1998; Deutsch 1998; Delfiner and Chilès 2012; Pyrcz and Deutsch 2014). TBSIM used 1000 turning bands (lines) to avoid possible artifacts, as proposed by Emery (2008). E-type models of SGSIM and TBSIM were produced by averaging 100 realizations of each method. The estimated and simulated results of the three methods were linearly back-transformed to the original PC1 scale.
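The back-transform and e-type steps can be sketched as follows. The sketch rests on these assumptions:

- "Linearly back-transformed" is read as linear interpolation in a table pairing the sorted original PC1 values with their normal scores.
- Values beyond the table ends are clamped, which is one of several possible tail treatments.
- Each realization is back-transformed before the 100 realizations are averaged node by node.

The function names are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def transform_table(original):
    """Pair sorted original values with their normal scores ((k + 0.5)/n quantiles)."""
    values = np.sort(np.asarray(original, dtype=float))
    n = len(values)
    inv_cdf = NormalDist().inv_cdf
    scores = np.array([inv_cdf((k + 0.5) / n) for k in range(n)])
    return scores, values

def back_transform(ns_values, scores, values):
    """Linear interpolation between table entries; outside the table, values are clamped."""
    ns_values = np.asarray(ns_values, dtype=float)
    return np.interp(ns_values.ravel(), scores, values).reshape(ns_values.shape)

def e_type(realizations_ns, scores, values):
    """Back-transform each realization (one per row), then average node by node."""
    return back_transform(realizations_ns, scores, values).mean(axis=0)
```

For example, with 100 realizations stored as an array of shape (100, n_nodes), `e_type(realizations, *transform_table(pc1_data))` returns one averaged PC1 value per grid node.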

Grid Setting

For the geostatistical modeling, the study area was divided into voxels measuring 10 m along the X-axis (E–W), 10 m along the Y-axis (N–S), and 0.4 m along the Z-axis (depth). These sizes were chosen based on the average spacing between neighboring drill sites in the horizontal direction and between neighboring sample points along each drill hole in the vertical direction. The small vertical size was chosen to resolve small changes in content with depth. The bottom of the longest borehole (III) defined the lower boundary of the modeling domain. The estimation and simulation were point based, i.e., the calculations were performed at the grid points.
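The grid setting translates into node counts with simple arithmetic. The sketch places grid points at voxel centers, which is an assumption; the text states only that calculations were point based. The vertical extent of 180 m is also assumed, taken from the maximum drill depth mentioned in the Discussion, since the exact bottom of borehole III is not given here.

```python
import numpy as np

def grid_nodes(extent_m, spacing_m):
    """Voxel-center coordinates along one axis (centers are an assumed convention)."""
    n = int(round(extent_m / spacing_m))   # round() guards against float error, e.g. 180/0.4
    return (np.arange(n) + 0.5) * spacing_m

x = grid_nodes(700.0, 10.0)   # E-W
y = grid_nodes(130.0, 10.0)   # N-S
z = grid_nodes(180.0, 0.4)    # depth below seafloor; 180 m is an assumed extent
n_nodes = x.size * y.size * z.size
voxel_volume_m3 = 10.0 * 10.0 * 0.4
```

Under these assumptions the grid has 70 × 13 × 450 = 409,500 nodes, each representing a voxel of 40 m³.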

Results

Descriptive Statistics of the Content Data and PCs

The basic statistical features of the six selected elements are summarized by descriptive statistics (Table 1) and a pairwise correlation matrix (Table 2). Based on the median values, Zn, Pb, and Ba are the main enriched elements in the study area. The linear correlation coefficients (R) in the correlation matrix reveal that four metals, Zn*, Pb*, Cu*, and Cd*, are correlated with one another, with the strongest correlation between Zn* and Pb* (R = 0.94). Ba* is not strongly correlated with the other five elements, and Ag* is moderately correlated with the four metals above (R = 0.64 to 0.75). These moderate and high correlations among the target elements indicate that PCA is an effective approach, as noted by Swan and Sandilands (1995).

Table 1 Descriptive statistics of content data of six elements after the centered log-ratio transformation
Table 2 Correlation matrix of content data after the centered log-ratio transformation with correlation coefficients between two elements

The correlations produce a particularly large eigenvalue for the first PC (PC1), which accounts for 74.7% of the total variance (Fig. 3). The variance of PC2 is much smaller (12.6%), and PC3 to PC6 together account for 12.7%. This suggests that higher PC1 values correspond to higher overall contents of the six elements, so that high-PC1 zones can indicate sulfide mineralization. Consequently, only the PC1 values were used in the subsequent geostatistical analyses. The histogram of the PC1 values shows two peaks, at low and high values (Fig. 4a). The smaller peak at high values suggests the presence of mineralized zones. This bimodality is also observed in the main base metals, Zn*, Pb*, and Cu*. The histogram in Figure 4b confirms that PC1 was correctly transformed into a standard normal distribution.

Figure 3

Eigenvalues of the six principal components, PC1 to PC6 (red line), and the percentage of the total variance accounted for by each eigenvalue (blue bars)

Figure 4

Comparison of two histograms of (a) original and (b) normal score transformed PC1 values

Validation of Semi-variogram Model and Data Search Size

Because the six drill sites are distributed along an E–W line, anisotropy of the semi-variogram within the horizontal plane could not be detected. Accordingly, two experimental semi-variograms, one omnidirectional horizontal and one vertical, were computed, and a spherical model was fitted in both directions (Fig. 5a). The resulting semi-variogram models show geometric anisotropy, with ranges of 115 m and 79 m in the horizontal and vertical directions, respectively. Based on these ranges, the search neighborhood for the three geostatistical methods was set as an ellipse extending 150 m in the horizontal direction and 100 m in the vertical direction. This search area was set to include at least the four closest data, following Isaaks and Srivastava (1989).
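The fitted model and search neighborhood can be written as two small functions. The sketch takes the ranges (115 m and 79 m) and search extents (150 m and 100 m) from the text and makes three further assumptions:

- The sill is 1, as expected for normal scores.
- There is no nugget.
- The search extents are treated as ellipse semi-axes.

Geometric anisotropy is handled by scaling each lag component by its range, which gives a reduced distance.

```python
import numpy as np

RANGE_H, RANGE_V = 115.0, 79.0     # fitted ranges (m) from the text
SEARCH_H, SEARCH_V = 150.0, 100.0  # search extents (m), treated here as semi-axes

def spherical_gamma(dx, dy, dz, sill=1.0, nugget=0.0):
    """Geometrically anisotropic spherical semi-variogram; horizontal is omnidirectional."""
    h = np.sqrt((dx**2 + dy**2) / RANGE_H**2 + dz**2 / RANGE_V**2)  # reduced distance
    r = np.minimum(h, 1.0)
    gamma = nugget + (sill - nugget) * (1.5 * r - 0.5 * r**3)
    return np.where(h > 0.0, gamma, 0.0)                             # gamma(0) = 0

def in_search_neighborhood(dx, dy, dz):
    """True where a datum lies inside the anisotropic search ellipse."""
    return (dx**2 + dy**2) / SEARCH_H**2 + dz**2 / SEARCH_V**2 <= 1.0
```

Halfway to the range in either direction, the model gives 1.5 × 0.5 − 0.5 × 0.5³ = 0.6875 of the sill.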

Figure 5

(a) Experimental semi-variograms along omnidirectional horizontal (circles) and vertical (squares) directions and their fitting to the spherical model as shown by the curves. (b) Cross-plot between the true and predicted PC1 normal scores by ordinary kriging, showing the accuracy of the kriging calculation

To check the suitability of the semi-variogram models and the size of the search neighborhood, cross-validation was performed using OK. The cross-plot between the true and predicted PC1 values at the sample data points shows adequate prediction accuracy, with R = 0.87 (Fig. 5b), which supports these choices.
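Leave-one-out cross-validation with OK can be sketched as shown below. The sketch makes three simplifying assumptions:

- All remaining data are used for each prediction, instead of the 150 m × 100 m search neighborhood.
- The variogram has a unit sill and no nugget.
- Only Pearson's R is computed, because the text does not define the standard error reported in Table 3.

The OK system is written in variogram form, with a Lagrange multiplier enforcing weights that sum to 1.

```python
import numpy as np

def spherical_gamma(dx, dy, dz, range_h=115.0, range_v=79.0):
    """Anisotropic spherical semi-variogram with unit sill and no nugget (assumed)."""
    r = np.minimum(np.sqrt((dx**2 + dy**2) / range_h**2 + dz**2 / range_v**2), 1.0)
    return 1.5 * r - 0.5 * r**3

def ordinary_kriging(coords, values, target, gamma=spherical_gamma):
    """OK estimate at one target point from all supplied data (no search limit)."""
    n = len(values)
    d = coords[:, None, :] - coords[None, :, :]
    lhs = np.ones((n + 1, n + 1))
    lhs[:n, :n] = gamma(d[..., 0], d[..., 1], d[..., 2])
    lhs[n, n] = 0.0                                  # Lagrange-multiplier entry
    d0 = coords - target
    rhs = np.ones(n + 1)
    rhs[:n] = gamma(d0[:, 0], d0[:, 1], d0[:, 2])
    weights = np.linalg.solve(lhs, rhs)[:n]          # constrained to sum to 1
    return weights @ values, weights

def leave_one_out(coords, values, gamma=spherical_gamma):
    """Predict each datum from all the others; return predictions and Pearson R."""
    preds = np.empty(len(values))
    for i in range(len(values)):
        keep = np.arange(len(values)) != i
        preds[i], _ = ordinary_kriging(coords[keep], values[keep], coords[i], gamma)
    return preds, np.corrcoef(values, preds)[0, 1]
```

Here `coords` is an (n, 3) array of X, Y, Z positions, and `values` holds the PC1 normal scores.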

Comparison of Spatial Models

The three PC1 spatial models, from OK and the e-types of SGSIM and TBSIM, are compared using the same color scale in Figure 6. They are shown from three viewpoints: the distribution on the seafloor, a perspective view from the southeast, and a vertical E–W cross section along the black line in Figure 1a. The three models are similar and share two common features. High-content (i.e., high-PC1) zones extend around the sulfide mound, and a likely horizontal stratabound mineralization is present, consistent with the seismic profile interpretation in Figure 1b. The former appears as an upside-down ring and is likely a stockwork (Fig. 6d). However, the OK model shows a smoothing effect: its value range is much narrower than those of the SGSIM and TBSIM models, which makes the mineralized zones difficult to delineate. In the SGSIM model, the high-content zone is narrow and the stockwork feature is not entirely clear. A remarkable feature of the TBSIM model is that the stratabound mineralization is disconnected at the inferred fault (dashed line). This suggests that the fault acts as a pathway for the downward recharge of seawater from the seafloor or the upward discharge of hydrothermal fluids.

Figure 6

Comparison of spatial models of PC1 values using the same color scale from three views: distribution on the seafloor, perspective view from the southeast, and vertical cross section along the black line in Figure 1a. The models were produced by (a) the geostatistical estimation method OK and two simulation methods, (b) SGSIM and (c) TBSIM. The average of 100 realizations is used as the e-type model for each of the two simulations. A schematic model of a seafloor massive sulfide system with its sulfide mound, compiled from previous studies (Lydon 1988; Herzig and Hannington 1995; Ohmoto 1996; Tornos et al. 2015), is shown in (d). It illustrates the development of a stockwork beneath the sulfide mound and the fluid flows (arrows)

The quality of the simulation results is checked by comparing the semi-variogram of each realization with the semi-variogram model in both the horizontal and vertical directions (Fig. 7), as proposed by Jewbali and Dimitrakopoulos (2011). The normal score transformed PC1 data are used for this comparison. Ideally, the average of the 100 realization semi-variograms should coincide with the semi-variogram model in both directions, which indicates an unbiased simulation result (Eze et al. 2018). For both methods and both directions, the variability among the realization semi-variograms increases with separation distance. The sampled data are scarce, particularly in the horizontal direction. As a result, the horizontal semi-variograms vary widely between realizations, and their averages lie far from the semi-variogram models in both the SGSIM and TBSIM results (Fig. 7a, c). In contrast, the data are spaced much more closely in the vertical direction, and there the average semi-variogram of the TBSIM results approaches the semi-variogram model. The TBSIM average semi-variogram follows the model more closely than the SGSIM average does (Fig. 7b, d).
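The semi-variogram reproduction check compares an experimental semi-variogram computed from each realization with the model, and then averages them. The sketch below simplifies this to a single vertical column of regularly spaced nodes, with lags counted in grid steps (0.4 m each for the vertical grid). A full check would pool pairs from all columns. The function names are illustrative.

```python
import numpy as np

def experimental_variogram_1d(z, max_lag):
    """gamma(k) = 0.5 * mean((z[i + k] - z[i])**2) for lags k = 1..max_lag (grid steps)."""
    z = np.asarray(z, dtype=float)
    if max_lag >= len(z):
        raise ValueError("max_lag must be smaller than the number of nodes")
    return np.array([0.5 * np.mean((z[k:] - z[:-k]) ** 2) for k in range(1, max_lag + 1)])

def realization_variograms(realizations, max_lag):
    """Per-realization semi-variograms (rows) and their average, as plotted in Fig. 7."""
    per_real = np.array([experimental_variogram_1d(r, max_lag) for r in realizations])
    return per_real, per_real.mean(axis=0)
```

For example, `realization_variograms(sims, 50)` applied to 100 normal-score columns returns the black curves and the yellow average of Figure 7 for the first 20 m of vertical lag.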

Figure 7

Semi-variograms of realizations along the omnidirectional horizontal and vertical directions by SGSIM ((a) and (b)) and TBSIM ((c) and (d)) shown as black curves. Red and yellow curves represent the spherical models in Figure 5a and semi-variogram averages, respectively

To check prediction accuracy, R and the standard error between the true and predicted PC1 values are compared for the three models (Table 3). Although the differences among the three models are small, TBSIM has the highest R and the smallest standard error.

Table 3 Cross-validation results showing correlation coefficient (R) and standard error between true and predicted PC1 values by OK and e-types of SGSIM and TBSIM

Discussion

Seafloor hydrothermal systems consist of a heat source (an underlying magma chamber), a recharge zone, a circulation cell, and a discharge zone that emits white and black smoke (Robb 2004). The heat source and circulation cells are located 2–8 km beneath the seafloor (Pirajno 2009). These features therefore do not appear in geostatistical models constructed from drill data that reach a maximum depth of 180 m. Instead, the models highlight two mineralization features: a stockwork around the sulfide mound and a stratabound zone on the eastern side of the mound.

The most plausible mechanism of this mineralization is the mixing of hydrothermal fluids with cold ambient seawater in the pore spaces of permeable strata (Shanks and Thurston 2012). Ishizu et al. (2019) recently identified two separate mineralized zones with different configurations, one massive and one stratified, at several tens of meters below the seafloor. They obtained this result from the resistivity distribution measured by deep-tow marine electric sounding at Iheya North, middle Okinawa Trough. The geostatistical models are therefore geologically plausible, and the combination of PCA, normal score transformation, and geostatistics is effective for clarifying the mineralization features.

Among the three geostatistical models, the mineralization shapes beneath the sulfide mound and in the stratabound layer are much smoother in the OK model than in the SGSIM and TBSIM models (Fig. 6a–c). This smoothing effect can be quantified by comparing the range of the PC1 data, −4.26 (minimum) to 6.73 (maximum), with the ranges of the three PC1 models: −2.95 to 1.88 for OK, −2.54 to 4.95 for SGSIM, and −3.73 to 5.29 for TBSIM. TBSIM therefore best reproduces the variability of the sample values even under sparse data conditions in the horizontal direction, whereas OK induces a strong smoothing effect.
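The range comparison can be expressed as the fraction of the data range spanned by each model. The snippet below is plain arithmetic on the minimum and maximum values quoted above.

```python
# PC1 (minimum, maximum) values quoted in the text.
ranges = {
    "data": (-4.26, 6.73),
    "OK": (-2.95, 1.88),
    "SGSIM": (-2.54, 4.95),
    "TBSIM": (-3.73, 5.29),
}
data_span = ranges["data"][1] - ranges["data"][0]   # 10.99
ratios = {m: (hi - lo) / data_span for m, (lo, hi) in ranges.items() if m != "data"}
for method, ratio in ratios.items():
    print(f"{method}: {ratio:.0%} of the data range")
```

OK spans about 44% of the data range, SGSIM about 68%, and TBSIM about 82%, which quantifies the ordering described above.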

Figure 8 compares the three models by displaying only zones with PC1 > 1, focusing on the stockwork structure beneath the sulfide mound. Zones with PC1 > 3, colored orange and red, indicate the suggested mineralization zones. The stockwork structure does not appear in the OK model (Fig. 8a). In the SGSIM model (Fig. 8b), the high-PC1 zones are confined to the area around the mound and near the seafloor and are not vertically continuous. In contrast, the TBSIM model (Fig. 8c) agrees best with the stockwork conceptual model (Fig. 6d) and the low-resistivity distribution (Ishizu et al. 2019).
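The threshold views in Figure 8 can also be summarized as a volume. The sketch below is a hypothetical helper, not part of the reported workflow:

- It treats each point-based grid value as representative of its 10 m × 10 m × 0.4 m voxel, which is an assumption.
- It counts the voxels above a PC1 threshold.

The result is a first-order volume of the PC1 > 1 or PC1 > 3 zones. It is not a reserve estimate, since no grade or density is involved.

```python
import numpy as np

VOXEL_VOLUME_M3 = 10.0 * 10.0 * 0.4   # 10 m x 10 m x 0.4 m voxels

def zone_volume(pc1_model, threshold):
    """Volume (m^3) and boolean mask of voxels whose PC1 value exceeds `threshold`."""
    mask = np.asarray(pc1_model) > threshold
    return mask.sum() * VOXEL_VOLUME_M3, mask
```

For a model array such as an e-type TBSIM grid, `zone_volume(model, 3.0)[0]` gives the volume of the zones colored orange and red in Figure 8.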

Figure 8

Comparison of PC1 zones > 1.0 by (a) OK, (b) SGSIM, and (c) TBSIM using the same color scale to highlight the stockwork structure underneath the sulfide mound. These are parts of the 3D models in Figure 6

The superiority of TBSIM over SGSIM under these conditions can be explained by the difference in the neighboring data used for the kriging calculation at a given voxel. SGSIM is a sequential algorithm that adds previously simulated values to the conditioning dataset for subsequent calculations. This reduces the distance to the nearest conditioning data and consequently induces a smoothing effect owing to the decrease in kriging variance (Delfiner and Chilès 2012). In contrast, TBSIM can preserve the variability of the samples because it conditions on the sample data only and uses many turning lines (Gneiting 1999; Emery 2008).
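The sequential mechanism invoked above, in which each simulated value joins the conditioning set, can be seen in a minimal one-dimensional SGSIM sketch. The sketch makes these assumptions:

- The variable is a normal score with a zero mean, so simple kriging is used.
- The covariance is spherical, with a hypothetical range `a`.
- At most `max_neighbors` nearest known points are used for each node.

It illustrates the algorithm's structure only. It does not reproduce the authors' settings or demonstrate the smoothing effect quantitatively.

```python
import numpy as np

def spherical_cov(h, a):
    """Spherical covariance with unit sill and range a."""
    r = np.minimum(np.abs(h) / a, 1.0)
    return 1.0 - 1.5 * r + 0.5 * r**3

def sgsim_1d(x_grid, x_data, z_data, a, rng, max_neighbors=8):
    """1D sequential Gaussian simulation of normal scores with simple kriging (mean 0)."""
    known_x, known_z = list(x_data), list(z_data)
    sim = np.empty(len(x_grid))
    for idx in rng.permutation(len(x_grid)):            # random visiting path
        x0 = x_grid[idx]
        kx, kz = np.array(known_x), np.array(known_z)
        near = np.argsort(np.abs(kx - x0))[:max_neighbors]
        if abs(kx[near[0]] - x0) < 1e-9:                 # node coincides with a datum
            sim[idx] = kz[near[0]]
            continue                                     # avoid a duplicate conditioning point
        cov_nn = spherical_cov(kx[near][:, None] - kx[near][None, :], a)
        cov_n0 = spherical_cov(kx[near] - x0, a)
        w = np.linalg.solve(cov_nn, cov_n0)
        mean, var = w @ kz[near], max(1.0 - w @ cov_n0, 0.0)   # simple-kriging mean, variance
        sim[idx] = mean + np.sqrt(var) * rng.standard_normal()
        # The simulated value joins the conditioning set for all later nodes.
        known_x.append(x0)
        known_z.append(sim[idx])
    return sim
```

As the path advances, `known_x` grows, so later nodes are kriged from increasingly close, previously simulated neighbors rather than from the sparse samples alone.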

The upward shift of the semi-variograms of all TBSIM realizations along the horizontal direction (Fig. 7c) is caused by the particular sparsity of the data locations. In particular, the range increases from 115 m in the semi-variogram model to 300 m in the averaged semi-variogram, which at first sight seems inappropriate. However, this increase can be attributed to the lateral extent of the stratabound mineralization layer, whose plausibility is supported by the seismic profile in Figure 1b. The mismatch therefore does not necessarily indicate a defect of TBSIM, as suggested by Emery (2004).

Conclusions

The aim of this study was to clarify the mineralization structure in an active seafloor vent area at 1570 mbsl in the middle Okinawa Trough, using 448 content data points from only six drill sites. The data were particularly sparse in the horizontal direction. Under these conditions, three geostatistical methods were compared: one estimation method (OK) and two simulation methods (SGSIM and TBSIM). They were applied to the contents of six elements typical of seafloor massive sulfide (SMS) deposits: Zn, Pb, Cu, Ba, Ag, and Cd. Because most of these elements are strongly correlated, PCA was adopted to reduce the data dimensionality, and the content information was found to be condensed in the first principal component (PC1). Cross-validation between the true and predicted PC1 values was combined with comparisons of value ranges and semi-variogram reproduction. On this basis, TBSIM was identified as the best method under these data conditions. It reduced the smoothing effect and reproduced the semi-variogram of the sampled data along the vertical direction, in which the data are more densely distributed.

The most significant result obtained from the e-type of TBSIM was the delineation of two mineralization zones with different configurations: a massive shape near the seafloor and a stratified shape at several tens of meters below the seafloor. These shapes are consistent with the resistivity distribution obtained by deep-tow marine electric sounding. The massive shape, which resembles an upside-down ring, is likely a stockwork, and the stratabound mineralization also appears in the seismic profile. The most plausible mechanism of this mineralization is the mixing of hydrothermal fluids with cold ambient seawater in the pore spaces of permeable strata. PCA and geostatistical simulations can therefore contribute to interpreting the formation mechanism of SMS deposits and to reserve assessment.

Our next step is geologic modeling using core description data and geostatistical simulations in combination with the content model to specify mineralized zones and their geologic features in more detail to determine their formation mechanism.