Introduction

Supervised machine-learning algorithms have become a helpful tool, providing an efficient way to better understand complex processes in different areas of the earth sciences. Examples include studies in seismology (e.g., DeVries et al. 2018; Corbi et al. 2019; Hulbert et al. 2019; Park et al. 2020), volcanology (e.g., Anzieta et al. 2019; Ren et al. 2020; Watson 2020; Witsil and Johnson 2020a, b), and petrology (e.g., Petrelli and Perugini 2016; Petrelli et al. 2020; Ouzounis and Papakostas 2021; Pignatelli and Piochi 2021; Valetich et al. 2021).

Tephrochronology is an interdisciplinary field in which both proximal and distal tephra (i.e., ash in Greek) deposits are characterized and used as powerful markers for various geological and environmental processes (e.g., Sarna-Wojcicki 2000; Lowe and Hunt 2001; Turney and Lowe 2002; Lowe 2011). With recent analytical developments, geochemistry (glass, mineral, or whole-rock compositions) has become a widely used method in tephrochronology, especially when geochronological data are missing or problematic. The conventional way of using geochemical data for tephra correlation is to draw a variety of binary diagrams of major oxides, trace elements, and their ratios. However, this can become a challenging task for tephras from geographic locations that share an identical geochemical affinity (e.g., South Aegean Active Volcanic Arc “SAAVA”; Francalanci et al. 2005). In such cases, multivariate statistical methods (e.g., principal component analysis) together with machine-learning algorithms (e.g., support vector machine) offer a more efficient and discriminating approach (Lowe et al. 2017), both for handling the geochemical datasets and for correlating the tephras using their compositions. Of these, machine-learning applications in tephrochronology have increased significantly in recent years (e.g., Petrelli et al. 2017; Bolton et al. 2020).

In this study, we applied several machine-learning algorithms (e.g., random forest, gradient boosting decision tree) to a geochemical dataset representing 8 volcanic fields within the SAAVA to provide a case study of this approach in tephrochronology, considering both known (control group) and unknown (test group) tephras around the eastern Mediterranean (Satow et al. 2015; Gençalioğlu-Kuşcu and Uslular 2018; Vakhrameeva et al. 2018). Our primary aim is to evaluate the performance of machine learning on an easily handled geochemical dataset in comparison with the widely used conventional binary plots, which require various manually determined element combinations and thus considerable effort. We also discuss the pros and cons of the various machine-learning algorithms applied to an imbalanced compositional dataset to provide further insights into their use in tephrochronology and related fields.

South Aegean active volcanic arc (SAAVA)

The volcanism along the SAAVA (Fig. 1) started at the lower Pliocene around the Sousaki in line with slab rollback of the Hellenic arc and continued until the present day with the historic activities of Methana, Milos, Santorini, and Nisyros volcanoes (e.g., Fytikas et al. 1984; Francalanci et al. 2005; Pe-Piper and Piper 2005; Francalanci and Zellmer 2019; Vougioukalakis et al. 2019). The western parts of the SAAVA are mostly represented by small volume monogenetic volcanism, while the central and eastern parts consist predominantly of composite volcanoes (e.g., Santorini and Nisyros calderas; Francalanci et al. 2005). The volcanics within the SAAVA display typical arc-related geochemical compositions mostly characterized by the calc-alkaline to high-potassium calc-alkaline affinity (e.g., Francalanci et al. 2005; Pe-Piper and Piper 2005) (Table 1) together with the influences of Aegean slab tear through the eastern to central parts of the arc (Klaver et al. 2016). The volcanological history of each volcanic sector is rather complex and the details are beyond the scope of our study. Hence the readers can refer to the comprehensive reviews in the literature (Innocenti et al. 1982; Fytikas and Vougioukalakis 2005; Francalanci and Zellmer 2019; Vougioukalakis et al. 2019) for further detailed information.

Fig. 1

Digital elevation model (15 arc-second global relief, SRTM15 + V2.1) displaying the volcanic fields along the South Aegean Active Volcanic Arc (SAAVA) used in this study for the application of machine-learning algorithms. Trench locations are from Jongsma (1977). The map was created using the PyGMT tool (Uieda et al. 2021)

Table 1 General characteristics of volcanic fields around the SAAVA

The products of explosive volcanism (aka tephra) along the SAAVA have been an important research subject for understanding the volcanological evolution, assessing future risk, and reconstructing the paleoenvironment and paleoclimate of the eastern Mediterranean (e.g., Hamann et al. 2010; D'Antonio et al. 2016; Koutrouli et al. 2018; Wulf et al. 2018; Vakhrameeva et al. 2021). The distal tephras of Santorini (e.g., Minoan and Cape Riva), for example, are widely distributed in the lake cores (e.g., Pearce et al. 2002) and terrestrial settings of western Anatolia (Sulpizio et al. 2013) and in the marine cores of the Aegean, Marmara, and even the Black Sea (e.g., Guichard et al. 1993; Wulf et al. 2002; Aksu et al. 2008; Satow et al. 2015). Nisyros volcano is another example that has mid-distal tephra records within the surrounding regions (e.g., Datça peninsula, Gençalioğlu-Kuşcu and Uslular 2018; Tilos island, Keller et al. 1990; Sterba et al. 2011). However, the distal tephra record of the other volcanic fields along the SAAVA is almost absent. Hence, any evidence of non-correlated (or unknown) tephra layers within the eastern Mediterranean (documented and/or to be correlated) (e.g., Satow et al. 2015; Korkmaz et al. 2018; Vakhrameeva et al. 2018) deserves extra attention for the sake of better volcanological and paleoenvironmental reconstruction models.

Methodology

Source dataset

A substantial part of the geochemical dataset including the whole-rock (n = 1656) and glass (n = 1092) compositions of Plio-Quaternary volcanics along the SAAVA was compiled from the GEOROC (Geochemistry of Rocks of the Oceans and Continents) database (Fig. 1; Supplementary Data S1). Data from some unpublished studies were also included (e.g., Bohla 1986; Rehren 1988). Here, the main idea of compiling all geochemical data including both lava flows and pyroclastics was that they could represent the main geochemical characteristics of each volcanic field. Otherwise, only available data for the tephra around the studied volcanic fields would not be enough for the application of machine-learning algorithms. A similar approach was followed by the relevant studies in the literature (e.g., Petrelli et al. 2017).

We classified the compiled geochemical dataset into 8 groups based on the spatial distribution of the volcanic fields (Table 1) and filtered the dataset by selecting the major oxides (SiO2, TiO2, FeOT, MgO, CaO, Al2O3, K2O, Na2O, MnO, and P2O5 in weight percentage–wt.%) and selected trace elements (Zr, Ba, Sr, Rb, Nb, La, and Ce in parts per million–ppm). These trace elements were chosen because they are routinely reported in whole-rock analysis (i.e., X-ray fluorescence–XRF and inductively coupled plasma mass spectrometry–ICP-MS) and are the most common in glass (i.e., shard or inclusion) geochemical analysis (e.g., electron microprobe–EPMA and laser ablation inductively coupled plasma mass spectrometry–LA-ICP-MS). Although this selection yielded an optimum number of data (especially for trace element values), some values were still missing from the dataset (see the descriptive statistics in the Supplementary Data S1). Because the machine-learning algorithms require numerical inputs, we imputed the missing values with zero. We did not replace them with the column mean or median because the trace element compositions can be unique to each volcanic field in the SAAVA, and hence any generalization may bias further interpretations. Data with a sum of the major oxides below 95 wt.% and data with high loss on ignition values (LOI > 5 wt.%) were removed from the dataset. We did not apply any further filtering to the dataset (e.g., age constraint).
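The filtering and zero-imputation steps described above can be sketched as follows. This is a minimal illustration and not the code from the authors' repository: the column names (including `LOI`), the function name `filter_and_impute`, and the demo values are hypothetical, and the sketch assumes that rows without a reported LOI are kept and that failing either criterion removes a row.

```python
import pandas as pd

MAJOR_OXIDES = ["SiO2", "TiO2", "FeOT", "MgO", "CaO",
                "Al2O3", "K2O", "Na2O", "MnO", "P2O5"]   # wt.%
TRACE_ELEMENTS = ["Zr", "Ba", "Sr", "Rb", "Nb", "La", "Ce"]  # ppm


def filter_and_impute(df, min_total=95.0, max_loi=5.0):
    """Drop low-total and high-LOI analyses, then replace missing values with zero."""
    features = MAJOR_OXIDES + TRACE_ELEMENTS
    # Sum of major oxides; pandas skips missing values in the sum.
    total = df[MAJOR_OXIDES].sum(axis=1)
    # Assumption: analyses with no LOI value are not rejected on LOI grounds.
    loi_ok = df["LOI"].isna() | (df["LOI"] <= max_loi)
    kept = df.loc[(total >= min_total) & loi_ok].copy()
    kept[features] = kept[features].fillna(0.0)
    return kept


# Hypothetical analyses: one valid (Sr missing), one low total, one high LOI.
base = dict(SiO2=70.0, TiO2=0.3, FeOT=2.0, MgO=0.5, CaO=2.0, Al2O3=14.0,
            K2O=4.0, Na2O=4.0, MnO=0.1, P2O5=0.1, Zr=150.0, Ba=800.0,
            Sr=float("nan"), Rb=120.0, Nb=10.0, La=30.0, Ce=55.0, LOI=1.0)
demo = pd.DataFrame([base, {**base, "SiO2": 60.0}, {**base, "LOI": 6.5}])
cleaned = filter_and_impute(demo)
print(cleaned[["SiO2", "Sr", "LOI"]])
```

Zero-filling keeps the per-field trace element signatures untouched, at the cost of introducing values that no real sample has; tree-based classifiers can usually isolate such sentinel values with a single split.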

Modelling

Preprocessing and experiments

Before the training step, possible duplicates in the dataset were removed by comparing all the features with one another (Fig. 2). We then used the RobustScaler tool of the Scikit-learn Python library (Pedregosa et al. 2011; Kramer 2016) to scale the data based on the quantile range. Values outside the 0.95 quantile (i.e., outliers) were removed since they could degrade the performance of the model or distort the measurements (Fig. 2). The Box-Cox transformation (Box and Cox 1964) was then applied to the data (Fig. 2). Finally, because the dataset is imbalanced (Table 1; Supplementary Data S1), we performed feature engineering and selection and oversampled the minority classes using SMOTE (synthetic minority oversampling technique; Chawla et al. 2002).
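One possible reading of this preprocessing chain is sketched below with NumPy and Scikit-learn. Several details are assumptions rather than the authors' settings: the Box-Cox step is placed before scaling and applied to values shifted by 1, because Box-Cox requires strictly positive input while robustly scaled (and zero-imputed) data do not satisfy that; the outlier rule drops any row with a feature above its 0.95 quantile; and the data are synthetic. SMOTE, which is provided by the separate imbalanced-learn package, would follow on the training portion only and is not shown.

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer, RobustScaler


def preprocess(X, y, upper_q=0.95):
    """Deduplicate, trim upper-tail outliers, Box-Cox transform, robust-scale."""
    # 1. Keep the first occurrence of every distinct feature vector.
    _, first_idx = np.unique(X, axis=0, return_index=True)
    keep = np.sort(first_idx)
    X, y = X[keep], y[keep]
    # 2. Drop rows in which any feature exceeds its upper quantile (assumed rule).
    limits = np.quantile(X, upper_q, axis=0)
    inside = (X <= limits).all(axis=1)
    X, y = X[inside], y[inside]
    # 3. Box-Cox on strictly positive values (shift by 1 handles zero-imputed cells).
    X = PowerTransformer(method="box-cox", standardize=False).fit_transform(X + 1.0)
    # 4. Center on the median and scale by the interquartile range.
    X = RobustScaler(quantile_range=(25.0, 75.0)).fit_transform(X)
    return X, y


# Synthetic, positively skewed data with five duplicated rows.
rng = np.random.default_rng(0)
X_raw = rng.lognormal(size=(200, 4))
y_raw = rng.integers(0, 3, size=200)
X_raw = np.vstack([X_raw, X_raw[:5]])
y_raw = np.concatenate([y_raw, y_raw[:5]])

X_prep, y_prep = preprocess(X_raw, y_raw)
print(X_raw.shape, "->", X_prep.shape)
```

In a full workflow the transformers would be fitted on the training folds only and then applied to held-out data, so that no information leaks from the evaluation samples.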

Fig. 2

Flowchart of machine-learning processing performed in this study

Hyper-parameter tuning is a common process in machine learning used to maximize an algorithm’s performance (e.g., Bardenet et al. 2013). The hyper-parameters control how a learning algorithm constructs a model from a given training dataset (e.g., Claesen et al. 2014). To obtain the hyper-parameters that yield the optimal model, we employed RandomizedSearchCV (Bergstra and Bengio 2012), which is better suited than an exhaustive grid search to high-dimensional datasets with a large search extent (Paper 2020). The optimized tuning configurations for each algorithm can be found in our Python codes (https://github.com/guslular/ML_for_tephrochronology.git).
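As an illustration of how a randomized search is set up in Scikit-learn, the sketch below tunes a random forest on synthetic data. The parameter ranges, the number of iterations, the scoring metric, and the dataset are placeholders chosen for brevity; the configurations actually used in the study are those in the repository linked above.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold

# Synthetic stand-in for an imbalanced, 17-feature compositional dataset.
X, y = make_classification(n_samples=300, n_features=17, n_informative=8,
                           n_classes=3, weights=[0.6, 0.3, 0.1], random_state=0)

# Distributions (or lists) to sample from; these ranges are illustrative only.
param_distributions = {
    "n_estimators": randint(20, 100),
    "max_depth": [None, 5, 10, 20],
    "min_samples_leaf": randint(1, 5),
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,                      # number of sampled configurations
    scoring="f1_macro",            # macro F1 weights all classes equally
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=0),
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Unlike a grid search, the cost here is fixed by `n_iter` regardless of how many hyper-parameters are searched, which is the practical advantage noted by Bergstra and Bengio (2012).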

Classifiers

We implemented 10 different learning algorithms (Fig. 2 and Table 2) using the optimum tuning configurations in the Scikit-learn Python library (Pedregosa et al. 2011; Kramer 2016). These are Support Vector Machine (SVM) with both probabilistic and non-probabilistic (raw) models (Cortes and Vapnik 1995; Li et al. 2010), Random Forest (RF; Breiman 2001), k-Nearest Neighbors (KNN; Laaksonen and Oja 1996), Naïve Bayes (Complement NB; Rennie et al. 2003), Artificial Neural Network (ANN, multi-layer perceptron; e.g., Gardner and Dorling 1998), Linear Discriminant Analysis (LDA; e.g., Balakrishnama and Ganapathiraju 1998; Izenman 2013), XGBoost (eXtreme Gradient Boosting; Chen and Guestrin 2016), LightGBM (Light Gradient Boosting Machine; Ke et al. 2017), CatBoost (Category Gradient Boosting; Prokhorenkova et al. 2017), and Voting Classifier (VC; the ensemble of XGBoost, LightGBM, and CatBoost) (Table 2). Half of these algorithms (RF, XGBoost, LightGBM, CatBoost, and the VC) are tree-based ensemble classifiers that use either averaging (e.g., Voting Classifier) or boosting (e.g., CatBoost) methods, which are widely considered the most efficient classifiers for tabular data owing to their high performance even relative to more complex algorithms (e.g., Dietterich 2000). However, gradient boosting algorithms (e.g., XGBoost) have rarely been preferred in the literature on machine-learning applications in tephrochronology because of their rather long computational time (Bolton et al. 2020). Further details of the algorithms are beyond the scope of this study and can be found in the Python scikit-learn documentation (Pedregosa et al. 2011; Garreta and Moncecchi 2013) and the well-established literature.

Table 2 Summary table for the machine learning algorithms (classifiers) used in this study

Evaluation

We assessed the trained models with both accuracy metrics (the accuracy score and the area under the receiver operating characteristic curve, ROC-AUC) and precision-recall metrics (F1 macro and weighted scores) (Table 3). We also calculated Cohen’s kappa (Cohen 1960) values (Table 3), which express the degree of agreement between the predictions of the algorithms and the labels of the training dataset (Altman 1990). For all metrics, the models were cross-validated with 10 stratified folds (each containing 10% of the samples from each group) (Fig. 2). The goal of cross-validation was to evaluate how well the model generalizes to unseen data. The cross-validation divides the data into 10 equal parts, trains the model on all parts except one, and then evaluates the model on the held-out part. Because the results can vary between splits, a single partition might bias the estimates. To avoid such bias and to use all available data, the procedure was repeated n times, and the average was taken (https://github.com/guslular/ML_for_tephrochronology.git).
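The evaluation scheme described above can be sketched with Scikit-learn's repeated stratified k-fold splitter. The classifier, the synthetic data, and the number of repeats (2) are placeholders; the text does not state the value of n, and the one-vs-rest averaging of the multi-class ROC-AUC is an assumption.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import cohen_kappa_score, make_scorer
from sklearn.model_selection import RepeatedStratifiedKFold, cross_validate

X, y = make_classification(n_samples=400, n_features=17, n_informative=8,
                           n_classes=4, random_state=0)

# 10 stratified folds, repeated with different shuffles (2 repeats here).
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=2, random_state=0)

scoring = {
    "accuracy": "accuracy",
    "f1_macro": "f1_macro",
    "f1_weighted": "f1_weighted",
    "roc_auc": "roc_auc_ovr",                  # one-vs-rest multi-class AUC
    "kappa": make_scorer(cohen_kappa_score),   # agreement beyond chance
}

results = cross_validate(
    RandomForestClassifier(n_estimators=50, random_state=0),
    X, y, cv=cv, scoring=scoring)

# Mean of each metric over all 20 train/evaluate rounds.
summary = {name: float(np.mean(results[f"test_{name}"])) for name in scoring}
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```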

Table 3 The mean accuracy scores and kappa values for each algorithm used in this study

We also implemented a feature sensitivity analysis to understand the impact of the major oxides and trace elements used on our machine-learning model. We used the “shap” Python package (a separate library that works with Scikit-learn-compatible models), which implements the Tree SHapley Additive exPlanations (SHAP) method (Lundberg and Lee 2017). This method allows a fast and exact computation of SHAP values for tree-based models, especially the gradient boosting algorithms (e.g., XGBoost), because it requires neither sampling nor a background dataset (Lundberg and Lee 2017).
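To make the quantity behind such an analysis concrete, the sketch below computes exact Shapley values by brute force for a small gradient boosting model. It is not Tree SHAP and does not use the shap package: it enumerates every feature subset and averages the "absent" features over a background sample, which is feasible only for a handful of features. The function name `exact_shapley`, the four-feature synthetic data, and the background size are all illustrative assumptions.

```python
from itertools import combinations
from math import factorial

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier


def exact_shapley(predict, x, background):
    """Exact Shapley values of predict(x); absent features are drawn from background."""
    n = x.size

    def value(subset):
        # Features in `subset` are fixed to x; the others keep background values.
        data = background.copy()
        data[:, list(subset)] = x[list(subset)]
        return predict(data).mean()

    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight of a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                phi[i] += weight * (value(subset + (i,)) - value(subset))
    return phi


X, y = make_classification(n_samples=200, n_features=4, n_informative=3,
                           n_redundant=1, random_state=0)
model = GradientBoostingClassifier(n_estimators=30, random_state=0).fit(X, y)


def predict(data):
    """Predicted probability of class 1."""
    return model.predict_proba(data)[:, 1]


background = X[:50]
phi = exact_shapley(predict, X[0], background)
print("contributions:", np.round(phi, 3))
```

The contributions sum to the difference between the prediction for the sample and the mean prediction over the background, which is the additivity property that makes per-element summaries such as SHAP plots interpretable.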

Test dataset

We used the model to predict the sources of an independent test dataset (not used in training the model) consisting of the whole-rock and glass geochemistry data of known and unknown Quaternary tephra (n = 439) from the eastern Mediterranean (Supplementary Data S2). The known tephra selected as a control group is from the Datça peninsula (southwestern Anatolia, Turkey), where the distal deposits of the Nisyros Kyra unit (133.5 ± 3.4 ka; U-Th/He, Ar-Ar) were documented (Gençalioğlu-Kuşcu and Uslular 2018; Gençalioğlu-Kuşcu et al. 2020). The idea was to validate the machine-learning model before using it to predict the sources of unknown tephras (test group), which were taken from the studies of Satow et al. (2015) and Vakhrameeva et al. (2018). In addition to the Kyra tephra, we also used the known tephra samples from the aforementioned studies as part of the control group.
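The prediction step for a set of tephra samples can be sketched as follows: a fitted classifier returns one probability per candidate source, and the highest one is reported as a percentage score. Everything here is a placeholder (three field names out of eight, synthetic well-separated clusters, perturbed reference rows posing as unknown samples); it only shows the mechanics, assuming the reported scores are class probabilities.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier

# Illustrative subset of source labels; the study uses 8 volcanic fields.
FIELDS = np.array(["Santorini", "Nisyros", "Kos"])

# Synthetic reference compositions: one cluster per volcanic field.
X_ref, labels = make_blobs(n_samples=300, centers=3, n_features=5,
                           random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_ref, FIELDS[labels])

# Placeholder "unknown" samples (slightly perturbed reference rows).
unknown = X_ref[:3] + 0.1

proba = model.predict_proba(unknown)   # shape: (n_samples, n_sources)
best = proba.argmax(axis=1)            # index of the most probable source
sources = model.classes_[best]
scores = proba[np.arange(len(best)), best]
for source, score in zip(sources, scores):
    print(f"{source}: {100 * score:.0f}%")
```

A classifier can only choose among the sources it was trained on, so a high score for a sample from an unrepresented volcano still has to be checked against geological and geochronological constraints.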

Results and discussion

Classification performance

The mean training accuracy of each algorithm used in this study was above 0.89 (except for the NB and the LDA) based on the results of both accuracy and Precision-Recall metrics (Table 3; Figs. 3a-c). The SVM yielded the least accurate training results (0.89–0.97), whereas the other algorithms (e.g., RF, XGBoost) had relatively higher accuracy scores (0.93–0.99; Table 3). In addition, the accuracy values of KNN and ANN span wider ranges, as shown by their larger interquartile intervals in the box plots (Figs. 3a-c). The mean Cohen’s kappa values for each algorithm were above 0.80, again except for the NB and LDA (Fig. 3d). Similar to the results of the accuracy metrics, the kappa values were the lowest for the SVM (0.80) and the highest for the gradient boosting algorithms (0.89; Fig. 3d), which correspond to “very good” agreement (Altman 1990) with the training dataset. The processing times (in seconds) of each algorithm for the training and testing models are also listed in Table 3 and show that the computational cost of these algorithms is reasonable.

Fig. 3

Box and whisker plots of model performance (mean values) on cross-validations. a Accuracy score; b F1 score (average weighted); c Area under the receiver operating characteristic curve (ROC-AUC) score; d Cohen’s kappa values

The RF, which is also known as a reliable probabilistic algorithm, especially for compositional data (Bolton et al. 2020 and references therein), and the gradient boosting algorithms with their average ensemble (VC) provided the best accuracy results and hence are the ones considered in further interpretations throughout the manuscript. Other than NB and LDA, the SVM (both probabilistic and non-probabilistic) has the lowest accuracy and kappa values among the algorithms (Fig. 3 and Table 3). There are different claims in the literature on the accuracy and performance of the different versions of the SVM algorithm for labeled compositional data (e.g., Petrelli and Perugini 2016; Petrelli et al. 2017; Bolton et al. 2020). The non-probabilistic model of this algorithm was successfully used to predict the possible tectonic regimes of volcanic fields (Petrelli and Perugini 2016) and to discriminate the volcanic fields within the same province using major, trace, and isotope geochemistry data (Petrelli et al. 2017; Ouzounis and Papakostas 2021). However, its probabilistic model provided the poorest performance in the source correlation of some Alaskan tephras (Bolton et al. 2020). The latter study, which applied several machine learning algorithms using the R package “caret” (Kuhn et al. 2020), highlighted this discrepancy and pointed out the problematic aspects of the probabilistic model, especially for multi-class datasets such as ours. However, we could not detect any difference between the versions of the SVM implemented using the Python Scikit-learn library. This might be related to differences between the machine learning libraries of the two programming languages (i.e., R and Python) or to the optimization configurations. It is at least clear that the performance of the SVM algorithm is relatively lower than that of the RF and the gradient boosting algorithms (Table 3; Fig. 3). In addition, as stated by Bolton et al. (2020), we re-emphasize the imbalanced nature of geochemical datasets in terms of both the data distribution among the volcanic fields and the data themselves (i.e., the abundance of major element data compared to the trace elements, especially for the glass geochemistry).

Classification scores of the machine learning algorithms applied to the geochemical dataset representing 8 volcanic fields along the SAAVA are shown in Fig. 4. The ANN with the optimum final architecture (100 hidden layers) failed in the training model and predicted all groups as either Nisyros or Santorini, the two fields with the largest datasets (Fig. 4). The other algorithms, however, provided successful predictions in the confusion matrices (Fig. 4). The classification scores were above 0.80 for only half of the volcanic fields, which can be explained mostly by the imbalanced nature of the data (Table 1) together with the geochemical similarities between some groups (e.g., significant portions of the Yali volcanics were predicted as Nisyros). Santorini, Nisyros, Kos, and Antiparos were the well-trained volcanic fields predicted by most of the algorithms (Fig. 4). Of these, Santorini and Nisyros, which have the largest number of data in our dataset, also have the highest classification scores (> 0.94; Fig. 4). This highlights the importance of the total amount of data in a geochemical dataset for the successful application of machine learning algorithms. In addition to these volcanic fields, Milos and Methana also have relatively high classification scores in some algorithms (up to 0.76 and 0.84, respectively) (Fig. 4).
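Per-field scores of this kind can be read from a row-normalized confusion matrix, as in the sketch below. It assumes, without confirmation from the text, that the classification scores in Fig. 4 are the fractions of each true class assigned to each predicted class; the classifier and the imbalanced synthetic data are placeholders.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Imbalanced synthetic data: class 0 dominates, class 3 is rare.
X, y = make_classification(n_samples=500, n_features=17, n_informative=8,
                           n_classes=4, weights=[0.5, 0.3, 0.15, 0.05],
                           random_state=0)

# Out-of-fold predictions, so every sample is scored by a model
# that did not see it during training.
y_pred = cross_val_predict(
    RandomForestClassifier(n_estimators=50, random_state=0), X, y,
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0))

# normalize="true" divides each row by the number of true samples in that class.
cm = confusion_matrix(y, y_pred, normalize="true")
per_class_score = np.diag(cm)   # fraction of each class predicted correctly
print(np.round(cm, 2))
print("per-class scores:", np.round(per_class_score, 2))
```

Off-diagonal entries show where a minority class leaks to, which is how a pattern such as Yali samples being assigned to Nisyros becomes visible.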

Fig. 4

Classification scores of machine-learning algorithms applied for the SAAVA volcanics. Algorithms from upper left to lower right: ANN, KNN, SVM (probability), RF, CatBoost, LightGBM, XGBoost, and Voting Classifier

We also applied a feature sensitivity analysis to one of the gradient boosting algorithms that gave the best results (i.e., XGBoost; Fig. 5). The summary box plot indicates that Sr is the most prominent element (except for Antiparos) affecting the predictions in all volcanic fields (especially Methana and Santorini, Fig. 5). Furthermore, the major/minor elements have a dominant impact on the machine-learning models. Possible explanations are either the larger number of major-element data compared to trace-element data in our compiled dataset or petrological factors (e.g., magma affinity). For example, the K2O contents are especially important for the Antiparos and Kos volcanic fields (Fig. 5), which could not be definitively discriminated in our predictions (Fig. 4).

Fig. 5

Summary box plots of feature sensitivity analysis applied on XGBoost algorithm

The main problem in the training model was the effect of the larger datasets (i.e., Santorini and Nisyros) on the classification of the other volcanic fields (Fig. 4). This is especially obvious in the ANN algorithm, in which all the groups were assigned to either Nisyros or Santorini (Fig. 4). In addition, there are some strong similarities between the geochemical affinities of volcanic fields, such as between Nisyros and Yali (Fig. 4), which was also noted in a petrology-oriented study (Popa et al. 2019). This suggests that machine learning algorithms can help to identify such geochemical similarities among volcanic fields (if any) in a more efficient and less time-consuming way than manually created geochemical diagrams.

The source predictions of unknown tephras: conventional vs. machine learning approach

We report an example of binary discrimination plots used for the correlation of known and unknown tephras from the eastern Mediterranean (Satow et al. 2015; Gençalioğlu-Kuşcu and Uslular 2018; Vakhrameeva et al. 2018) with the volcanic fields in our dataset in Fig. 6. These diagrams can be varied using the different combinations of major oxides and trace elements, but the most common ones in our dataset were illustrated in Fig. 6. More detailed information can be found in the original studies.

Fig. 6

Conventional binary geochemical plots generated by the compiled geochemical datasets corresponding to 8 volcanic fields of the SAAVA (Supplementary Data S1). a Gençalioğlu-Kuşcu and Uslular 2018; b Vakhrameeva et al. 2018; c Satow et al. 2015

Tephra-fall deposits found in the Datça peninsula and correlated with the Nisyros Kyra unit (Gençalioğlu-Kuşcu and Uslular 2018; Gençalioğlu-Kuşcu et al. 2020) mostly plot with the Nisyros and Yali samples in Fig. 6. As the age (133.5 ± 3.4 ka) and other characteristics (e.g., depositional, glass and mineral chemistry) of the Kyra tephra are well documented, it is relatively easy to link these deposits with the Kyra eruptions of Nisyros volcano, which occurred before the Yali volcanism (e.g., upper pumice unit, 45 ± 10 ka, Guillong et al. 2014). In our machine learning model (Table 4 and Supplementary Data S2), the Datça distal tephras were predicted as Nisyros tephra with high scores in various algorithms (65 to 100%).

Table 4 The selected machine learning-based estimations of volcanic sources for the known and the unknown tephras in the Aegean region

On the other hand, it is notable that a significant number of the tephra samples found in the sediment cores around the SE Aegean Sea (Satow et al. 2015) are mainly correlated with Santorini (Fig. 6). However, it is challenging to ensure that they correlate only with Santorini, as there are some similarities with other volcanic fields in some diagrams (e.g., Kos, Milos, and Antiparos). The samples that were already correlated with Santorini (Satow et al. 2015) were also predicted as Santorini in our machine learning model, with high scores ranging from 68% to 100% (Table 4; Supplementary Data S2). Here, except for the RF algorithm, the other higher accuracy algorithms have scores greater than 90% (up to 100%; Supplementary Data S2). Our machine learning model can also suggest some supporting corrections for the distal tephras that were correlated using the conventional geochemical plots. For example, Satow et al. (2015) claimed that the tephra layer LC21–2005 is mostly correlated with Santorini, except for three samples that do not plot with the others in the geochemical plots. However, the results of our machine learning model revealed that all the samples belonging to this tephra are well correlated with Santorini (Table 4; Supplementary Data S2). A similar example can be given for the tephra layers LC21–3225 and LC21–3775. The former is mostly correlated with Santorini, as suggested by Satow et al. (2015), apart from one sample that did not plot together with the others. Here, we suggest that there might be more samples that are not correlated with Santorini, and Nisyros would be a candidate for these tephras based on our machine learning model (Supplementary Data S2). Satow et al. (2015) provided an age interval for this sample between 21.7 ± 0.6 ka and 21.8 ± 0.6 ka, which might prompt a reconsideration of the available ages of the youngest Nisyros tephra (upper pumice, < 70 ± 24 ka; Guillong et al. 2014). As for the sample of LC21–3775, which could not be correlated with any volcanic source by Satow et al. (2015), we suggest that Santorini is the best candidate for the volcanic source of this tephra layer based on our model (Supplementary Data S2).

In addition, Satow et al. (2015) could not definitively correlate some of the tephras (e.g., LC21–12625, LC21–13485 with the age of >128–121 ka) and marked them with a question mark (i.e., Kos-Nisyros-Yali?) in their study (Supplementary Data S2), even after using various geochemical combinations and other proxies (e.g., geochronological data). Despite the relatively lower performance scores compared to the predictions for Santorini, the source predictions of these uncorrelated tephras obtained by our model seem promising (Table 4; Supplementary Data S2). The five higher accuracy algorithms, with various scores, mostly point to either Kos or Antiparos as a possible volcanic source (Table 4; Supplementary Data S2). However, since the last known volcanic activity in Antiparos was around the Pliocene (Innocenti et al. 1982; Hannappel and Reischmann 2005), these tephras could be correlated with the Kos Plateau Tuff (or Kos ignimbrite), which is one of the most voluminous ignimbrite deposits in the region, with an age of 160–165 ka (Smith et al. 1996; Bachmann et al. 2010).

There are also some unknown tephras/cryptotephras, together with those that have mostly been correlated with Santorini, in the sediment core around Tenaghi Philippon (SE Balkan Peninsula; Vakhrameeva et al. 2018). The conventional binary plots also indicated that some of the tephras were correlated with the Santorini samples (Fig. 6). However, some tephras could not be correlated with any volcanic field around the Aegean arc (e.g., POP4 and POP5). For example, POP4 was geochemically correlated with either Kos or Milos (Vakhrameeva et al. 2018). However, our machine learning model predicted the volcanic source of this tephra as either Kos (up to 47%) or Yali (up to 100%). Here, two important geological facts contradict our prediction of Yali for this tephra dated at 358 ka: first, the Yali volcanism significantly postdates the formation of this tephra; and second, our machine learning model was not successfully trained on the Yali volcanics (Fig. 4) due to their similarities with some of the Nisyros volcanics, which have been noted in the literature as well (e.g., Popa et al. 2019). Therefore, we suggest that the older volcanics on Kos Island (i.e., Kefalos Tuff; Dalabakis and Vougioukalakis 1993) would be the best candidate for the volcanic source of the POP4 tephra. This claim certainly requires further analysis (e.g., mineral chemistry) and also the integration of different volcanic fields from the eastern Mediterranean (e.g., central Italy, central Anatolia) into the machine learning models to better constrain this prediction.

The tephra layers called POP5 could also not be correlated with the volcanic fields around the eastern Mediterranean, even after employing various geochemical plots with the datasets of different volcanic fields (e.g., central Anatolia). Vakhrameeva et al. (2018) stated that an unknown eruption of the Campanian volcanoes could be the possible candidate for these so-called unknown tephras, mostly based on their identical age data (> 128–121 ka). Considering that the Campanian and other volcanoes around the eastern Mediterranean were not included in our machine learning model, we correlated the POP5 tephras of Vakhrameeva et al. (2018) mostly with Kos (up to 77%), Milos (up to 98%), and Yali (up to 93%) (Table 4, Supplementary Data S2). As mentioned above, the temporal evolution of the Yali volcanism is not compatible with these tephras, which have ages ranging from 419 ka to 438 ka (Vakhrameeva et al. 2018). Therefore, we suggest that Kos (i.e., Kefalos Tuff; Dalabakis and Vougioukalakis 1993) and Milos would be the best candidates for the possible volcanic sources of these tephras (Supplementary Data S2).

Concluding remarks

Our recent findings in this study enabled us to conclude that:

  • The Random Forest together with the gradient boosting algorithms (e.g., XGBoost, LightGBM) provided the best performance for the imbalanced geochemical dataset, while the Naïve Bayes, Linear Discriminant Analysis, Support Vector Machine, and the Artificial Neural Network were the poorest performing algorithms.

  • According to the feature sensitivity analysis on XGBoost algorithm, the Sr and some major elements (e.g., K2O, MnO) have higher impacts on the prediction models that can be linked with the petrological characteristics of the volcanic fields. However, this needs some further investigations for a better clarification.

  • Despite the recent developments and various configurations of the machine learning algorithms, the imbalanced nature of a dataset, which is a common problem in earth sciences, is still one of the important obstacles to successful training and testing models.

  • The testing model successfully predicted the known or unknown tephras when they corresponded to a volcanic field with more data in our dataset (e.g., Nisyros, Santorini). However, there is no clear difference between the conventional plotting and the machine learning application for the source prediction of unknown tephras that possibly correlate with volcanic fields represented by relatively fewer geochemical data.

  • The idea of geochemical similarity between some volcanics of Yali and Nisyros was also supported by our machine learning model. Despite a need for further analyses and corrections, we here suggested possible volcanic sources for POP4 (as Kefalos tuff) and POP5 (as Kos or Milos) samples of Vakhrameeva et al. (2018) and also for some samples (e.g., LC21–12625, LC21–13485) of Satow et al. (2015), which are correlated mostly with the Kos ignimbrites.

  • Our freely accessible Python code established in this study together with the extensible geochemical dataset to implement the machine learning algorithms can be easily used for further tephrochronology studies around the eastern Mediterranean as a regular tool to predict or at least to decrease the number of possible volcanic sources before elaborating the common correlation methods.

  • For now, machine learning applications in tephra correlation can be considered a fast and helpful tool (not a decision-maker) that eliminates unlikely candidates and suggests a quantitatively best correlation. Expert opinion is still needed to decide the possible volcanic sources for the unknown tephras considering the available geological and geochronological data. The accuracy of such applications will possibly be improved in the future with the integration of region-based current tephra databases (e.g., Lowe et al. 2015; Tomlinson et al. 2015), the increasing amount of geochemical and interrelated datasets (e.g., mineral chemistry, geochronology), and the recent analytical and statistical developments on the tephra correlations (e.g., Lowe et al. 2017).