1 Introduction

Remotely sensed spectral data have become a powerful and popular tool for geological mapping, particularly in arid and semi-arid areas [1,2,3,4], because they allow lithological units to be discriminated automatically over vast regions at low cost, with good accuracy and in little time.

On the data side, the development of multispectral remote sensing technology has revolutionized the techniques used to extract information about the Earth’s surface [5]. Optical remote sensing imagery, from both spaceborne and airborne sensors, differs in spectral, spatial and temporal resolution. The selection of suitable earth observation (EO) datasets is considered the first essential step for a successful image classification [6,7,8]. In this respect, the Sentinel-2 multispectral imager (MSI) developed by the European Space Agency (ESA) has shown great potential for lithological mapping and mineral exploration over the last decade, owing to its higher spatial and spectral resolution compared with Landsat and SPOT sensor datasets, especially in the VNIR region [9, 10].

Previous studies have evaluated the impact of remotely sensed data, and of the fusion of SAR and optical datasets, on lithological mapping [11]. The selection of a suitable digital image classification method is an equally fundamental step in producing and updating geological maps, since it relates pixel values to the lithological units present on the Earth's surface. Most approaches used to produce the lithological map of a region rely on either pixel-based image analysis (PBIA) [12, 13] or geographic object-based image analysis (GEOBIA), also termed object-based image analysis (OBIA) [14].

The most common approach for this purpose is PBIA, which assigns each pixel to the single ground class whose spectral information matches it most closely [15], without examining the contextual, textural and spatial properties associated with the pixel of interest [16]. A wide range of classification methods has been applied for this purpose. They can be categorized by their underlying statistical assumptions (parametric vs. non-parametric), by the way in which elements are classified (per-pixel vs. subpixel), or by the need to collect representative endmember samples (supervised vs. unsupervised) [17]. The performance of machine learning algorithms (MLAs) is influenced by many factors, including the selection of the right training samples, the choice of the ideal features and the optimization of training parameters [18], which makes per-pixel classification algorithms more challenging [19, 20].

Numerous geoscientific studies have used PBIA MLAs especially in lithological mapping and mineral exploration [17, 21].

PBIA classification methods assign a class directly to individual pixels. The heterogeneity of the Earth's surface and the solar illumination angle therefore cause drawbacks such as “salt and pepper” noise and unfavourable topographic effects [22], especially with high-resolution images such as Sentinel-2 imagery at 10 m spatial resolution. The GEOBIA approach was developed to remedy these deficiencies by considering, in addition to spectral characteristics, properties such as texture, shape, colour, size and the association between neighbouring objects [23]. It does so through an additional critical stage in the classification process, multiresolution segmentation, which aggregates similar pixels into homogeneous, meaningful objects with similar spectral, textural and spatial information. A classifier then assigns a category to each object; in this study, the standard nearest neighbour (SNN) classifier was applied.

Several studies have applied the GEOBIA approach to a variety of applications, including land use and land cover (LULC) mapping [24, 25], lithological mapping [26], change detection [27], landform mapping [28], urban mapping [29], and crop and vegetation classification [30, 31]. Many of them demonstrated that GEOBIA produces higher thematic classification accuracies than traditional PBIA approaches [32, 33].

This study is structured in two parts. The first evaluates the performance of two supervised non-parametric PBIA machine learning algorithms, Random Forest (RF) and k-Nearest Neighbour (k-NN). The second provides a more complete evaluation of the GEOBIA and PBIA classification approaches for lithological mapping in the southern part of Skhour Rehamna, situated in the western Moroccan Meseta, using Sentinel-2 imagery.

2 Location and Geological Settings of the Study Area

Skhour Rehamna is an inlier of the Paleozoic and Paleoproterozoic basement that forms the Hercynian Rehamna massif (Central Morocco) to the north and the Jebilet to the south. In the division of the Hercynian chain of Morocco, this region belongs to the western Moroccan Meseta, where erosion dissects the sub-tabular Cretaceous-Eocene cover of the Gantour Plateau. It is located approximately 100 km from Marrakech and is crossed from north to south by the A7 highway and by principal road No. 9 linking Casablanca to Marrakech [34].

This research focuses on the southern part of the Paleozoic massif of Skhour Rehamna, which lies between the meridians 7°54′55″ and 7°43′50″ west and the parallels 32°22′30″ and 32°14′39″ north, as highlighted in Fig. 1. Restricting the study to this area allows the results obtained to be analysed more precisely.

Fig. 1 Location of the southern part of Skhour Rehamna (Google Earth, resolution 0.5 m) on the map of the geological domains of Morocco (modified by Michard et al. 2010)

The study area (Fig. 2) is made up of stacked mica schist formations. Some are attributed to the Devonian (the Ouled Hassine Unit) [35] and correspond to a pelitic series with six intercalations of quartzites and metabasite. Others are attributed to the Paleoproterozoic (Lalla Tittaf Formation) [36] and contain metapelites and semipelites with intercalations of metabasites, orthogneiss, calcschists and marbles. Between the two lies the Dalaat el Kahlat unit, whose age remains unknown [34]. The small granitic intrusions of Ras el Abiod have been arenized since the Plio-Villafranchian and are expressed at the surface as a large area of thermal metamorphism. The Maastrichtian is directly transgressive on the mica schists and the Permian in the southern part of the map region, creating a cuesta that clearly dominates the Paleozoic inlier. This is the plateau where the phosphates of Benguerir are mined [34].

Fig. 2 Geological map of the study area (produced by the BRGM-CID group, published in 2004)

3 Materials and Methods

3.1 EO Datasets Properties and Pre-processing

The satellite imagery used in this study is a Sentinel-2A product. The satellite carries a multispectral instrument (MSI) that records 13 spectral bands over a wide swath in the visible and near-infrared (VNIR) and short-wave infrared (SWIR) [37], at high to moderate spatial resolutions ranging from 10 to 60 m [38, 39]. The VNIR spectral bands have a spatial resolution of 10 m, which gives this product the potential for detailed exploration of the Earth's surface; the infrared bands have a resolution of 20 m, and the three atmospheric correction bands have a resolution of 60 m [40].

In this study we selected Sentinel-2A (Level 1C) imagery acquired on 29 October 2017. Sentinel-2 MSI products undergo multiple stages of processing to reach the level desired by the user. For this purpose, the ESA Sen2Cor plugin available in the Sentinel Application Platform (SNAP) [37] was used to convert the Level 1C Top of Atmosphere (TOA) reflectance product to Level 2A Bottom of Atmosphere (BOA) imagery by applying terrain and atmospheric corrections. Owing to their low spatial resolution (60 m) and their sensitivity to clouds and aerosols, spectral bands 1, 9 and 10 were omitted. The remaining bands with a spatial resolution of 20 m (5, 6, 7, 8a, 11, 12) were resampled by cubic interpolation to a 10 m × 10 m pixel size to match the resolution of the VNIR bands (2, 3, 4 and 8). Finally, all bands were re-projected to the WGS84 UTM (Universal Transverse Mercator) zone 29 N coordinate system.

3.2 Methodology

Many innovative classification approaches had already been developed by the time the Sentinel-2 satellite was launched. These approaches are based on pixels [41, 42] or on objects [14, 43, 44]. To find the optimal method for classifying the lithological units of the selected region using Sentinel-2 imagery, two typical machine learning algorithms, RF and k-NN, were applied and compared with the GEOBIA approach. To provide more diagnostic spectral features of the exposed rock units, several neo-bands derived from eigenspace-based algorithms, namely the minimum noise fraction (MNF) and principal component analysis (PCA), were layer-stacked with the Sentinel-2 spectral bands.

An outline of the methodology used in this study is shown in the flow diagram (Fig. 3). The following sections describe the data processing details, the classification techniques applied and the subsequent statistical evaluations.

Fig. 3 Workflow of the methodology applied in this study

Spectral Features Analysis. In general, the limited number of multispectral channels yields many mixed pixels that represent indistinguishable ground features [45,46,47]. This challenge is addressed through dimensionality reduction of the MSI bands using principal component analysis (PCA) and minimum noise fraction (MNF) [48].

Principal Component Analysis (PCA). This transformation is a multivariate statistical data reduction procedure commonly employed for geological mapping [49,50,51,52,53]. To highlight and enhance the spectral information related to specific rock units [54], PCA transforms the original high-dimensional set of MSI features into a lower-dimensional set of uncorrelated output bands through the calculation of the covariance matrix, its eigenvector and eigenvalue pairs, and an orthogonal projection of the data [55]. The dimensionality of the dataset is reduced by eliminating redundant data while retaining maximum information: the first and second PCs contain most of the scene variance, and succeeding components contain a decreasing percentage of the variance [22]. In this research, three PCs (PC1, PC2 and PC6) were selected to generate a colour composite map (Fig. 4) that helps to discriminate the lithological units and to trace training polygons for the classification approaches used in this study.
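The PCA steps described above (covariance matrix, eigen-decomposition, orthogonal projection) can be sketched in a few lines of NumPy. This is only an illustrative sketch on a synthetic four-band cube, not the processing chain used in this study; the function name pca_transform, the synthetic data and the choice of composite bands are assumptions made for the example.

```python
import numpy as np

def pca_transform(cube):
    """Rotate a (rows, cols, bands) cube onto its principal components."""
    rows, cols, bands = cube.shape
    pixels = cube.reshape(-1, bands).astype(float)
    centered = pixels - pixels.mean(axis=0)          # remove the band means
    cov = np.cov(centered, rowvar=False)             # band-by-band covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)           # eigh: the covariance is symmetric
    order = np.argsort(eigvals)[::-1]                # sort by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = centered @ eigvecs                      # orthogonal projection of the data
    explained = eigvals / eigvals.sum()              # variance share of each PC
    return scores.reshape(rows, cols, bands), explained

# Synthetic 4-band scene (assumption): one shared pattern plus weak noise.
rng = np.random.default_rng(0)
pattern = rng.normal(size=(50, 50, 1))
cube = np.concatenate(
    [w * pattern + 0.05 * rng.normal(size=(50, 50, 1)) for w in (1.0, 0.8, 0.6, 0.4)],
    axis=2,
)
pcs, explained = pca_transform(cube)
composite = pcs[:, :, [0, 1, 3]]   # three PCs stacked as an RGB-style composite
```

The array explained gives the share of the scene variance carried by each component, which is the quantity one would inspect before choosing the PCs for a colour composite.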

Fig. 4 Colour composite of PCs 1, 2 and 6 of the Sentinel-2 imagery

Minimum Noise Fraction (MNF). To reduce the residual noise of the reflectance images and to bring out homogeneous surfaces, the minimum noise fraction (MNF) technique [56, 57] was applied. It is a widely known eigenvector procedure for multispectral and hyperspectral images, based on the covariance structure of the image noise. The algorithm consists of two successive PCA rotations that transform data containing spectral distortion into new components sorted by image quality, with steadily increasing noise levels. The first rotation uses the noise covariance matrix, estimated from the data, to decorrelate and rescale the noise; the second is a standard PCA transform of the noise-whitened data. The result allows the components that carry useful information to be identified and kept, and the noisy ones to be discarded [58].
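The two rotations can be illustrated with the following NumPy sketch. It assumes that the noise covariance can be estimated from differences between horizontally adjacent pixels, which is one common convention and not necessarily the estimator used in this study; the function name mnf_transform and the synthetic cube are likewise hypothetical.

```python
import numpy as np

def mnf_transform(cube):
    """Two-rotation MNF of a (rows, cols, bands) cube; returns components and eigenvalues."""
    rows, cols, bands = cube.shape
    data = cube.astype(float)
    pixels = data.reshape(-1, bands)
    centered = pixels - pixels.mean(axis=0)

    # Rotation 1: estimate the noise covariance from horizontal neighbour
    # differences (assumed noise model), then decorrelate and rescale the noise.
    diff = (data[:, 1:, :] - data[:, :-1, :]).reshape(-1, bands)
    noise_cov = np.cov(diff, rowvar=False) / 2.0     # a difference doubles the noise variance
    noise_vals, noise_vecs = np.linalg.eigh(noise_cov)
    whitener = noise_vecs / np.sqrt(noise_vals)      # scale each eigenvector column
    whitened = centered @ whitener                   # noise now has unit variance

    # Rotation 2: standard PCA of the noise-whitened data.
    vals, vecs = np.linalg.eigh(np.cov(whitened, rowvar=False))
    order = np.argsort(vals)[::-1]                   # highest signal-to-noise first
    components = whitened @ vecs[:, order]
    return components.reshape(rows, cols, bands), vals[order]

# Synthetic scene (assumption): a smooth ramp seen by 4 bands with different noise levels.
rng = np.random.default_rng(1)
ramp = np.tile(np.linspace(0.0, 1.0, 64)[None, :, None], (64, 1, 1))
weights = (1.0, 0.8, 0.6, 0.4)
noise_std = (0.05, 0.10, 0.15, 0.20)
cube = np.concatenate(
    [w * ramp + s * rng.normal(size=(64, 64, 1)) for w, s in zip(weights, noise_std)],
    axis=2,
)
mnf, mnf_vals = mnf_transform(cube)
```

In this sketch, eigenvalues close to 1 indicate components dominated by whitened noise, while clearly larger eigenvalues indicate components that carry signal.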

A visual study of the first three components from the MNF (containing more than 99% of the total information) allows discrimination between different surfaces in the study area (Fig. 5).

Fig. 5 Colour composite of MNF bands 1, 2 and 3 of the Sentinel-2 image

PBIA Lithological Mapping. Pixel-based MLAs are among the most traditional classification methods used for Sentinel-2 imagery. They allocate each pixel to a specific category according to the spectral characteristics of the training samples, each of which groups a set of pixels representing the same class [59]. Selecting suitable training samples is therefore one of the most crucial steps of PBIA classification. The literature shows that RF and k-NN are the MLAs most commonly applied to classify lithological units from multispectral datasets [60, 61].

Random Forest (RF). The first MLA implemented in this study is the Random Forest (RF) classifier developed by Breiman [62] and applied to remote sensing image classification by Pal [63]. It is a supervised non-parametric classification algorithm that builds an ensemble of decision trees (DT) and labels each pixel with the class that receives the majority of the votes across the trees. Two parameters must be predefined: the number of trees in the forest (ntree) and the number of input variables randomly sampled as split candidates at each node of a DT (mtry). Randomness is introduced through this random selection of variables at each node and through bagging; the latter technique, also known as bootstrap aggregation, is used to select the training samples available to each tree [62]. Each tree in the forest thus votes for the final class produced by the forest.
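To make the roles of bagging, ntree and mtry concrete, the sketch below grows a small forest of shallow Gini-based decision trees in plain NumPy and classifies synthetic four-band "pixels" by majority vote. It is a deliberately simplified illustration, not the implementation used in this study: all function names, the depth limit and the synthetic class means are assumptions made for the example.

```python
import numpy as np

def gini_of_split(y_left, y_right, n_classes):
    """Size-weighted Gini impurity of the two sides of a split."""
    total = 0.0
    for part in (y_left, y_right):
        p = np.bincount(part, minlength=n_classes) / len(part)
        total += len(part) * (1.0 - np.sum(p ** 2))
    return total

def grow_tree(X, y, n_classes, mtry, rng, depth=0, max_depth=4):
    majority = int(np.bincount(y, minlength=n_classes).argmax())
    if depth == max_depth or len(np.unique(y)) == 1:
        return majority                                # leaf: a class label
    best = None
    # Only mtry randomly chosen variables are split candidates at this node.
    for f in rng.choice(X.shape[1], size=mtry, replace=False):
        values = np.unique(X[:, f])
        for t in (values[:-1] + values[1:]) / 2.0:     # midpoints between sorted values
            left = X[:, f] <= t
            score = gini_of_split(y[left], y[~left], n_classes)
            if best is None or score < best[0]:
                best = (score, f, t, left)
    if best is None:                                   # sampled variables were all constant
        return majority
    _, f, t, left = best
    return (f, t,
            grow_tree(X[left], y[left], n_classes, mtry, rng, depth + 1, max_depth),
            grow_tree(X[~left], y[~left], n_classes, mtry, rng, depth + 1, max_depth))

def predict_tree(node, x):
    while isinstance(node, tuple):                     # descend until a leaf label
        f, t, left_child, right_child = node
        node = left_child if x[f] <= t else right_child
    return node

def fit_forest(X, y, ntree=15, mtry=2, seed=0):
    rng = np.random.default_rng(seed)
    n_classes = int(y.max()) + 1
    forest = []
    for _ in range(ntree):
        idx = rng.integers(0, len(X), size=len(X))     # bootstrap sample (bagging)
        forest.append(grow_tree(X[idx], y[idx], n_classes, mtry, rng))
    return forest, n_classes

def predict_forest(model, X):
    forest, n_classes = model
    votes = np.array([[predict_tree(tree, x) for tree in forest] for x in X])
    return np.array([np.bincount(row, minlength=n_classes).argmax() for row in votes])

# Synthetic "pixels" (assumption): 3 classes, 4 bands, well-separated class means.
rng = np.random.default_rng(42)
class_means = np.array([[0.2, 0.3, 0.4, 0.5],
                        [0.5, 0.5, 0.2, 0.3],
                        [0.8, 0.2, 0.6, 0.1]])

def sample_pixels(n_per_class):
    X = np.vstack([m + 0.05 * rng.normal(size=(n_per_class, 4)) for m in class_means])
    return X, np.repeat(np.arange(3), n_per_class)

train_X, train_y = sample_pixels(30)
test_X, test_y = sample_pixels(20)
model = fit_forest(train_X, train_y, ntree=15, mtry=2)
rf_accuracy = float(np.mean(predict_forest(model, test_X) == test_y))
```

In practice an established library implementation would normally be preferred; the point here is only that each tree sees a different bootstrap sample and a different random subset of variables at every node.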

RF has been reported to perform better than other MLAs that rely on techniques such as bagging and boosting [64]. Because RF is sensitive to the training samples, their spatial dispersion should be increased to improve the classification results.

Furthermore, several studies have reported that the RF algorithm achieves higher accuracies and more reliable lithological maps than other MLAs such as Naive Bayes, k-Nearest Neighbours and Artificial Neural Networks [60], and support vector machines [65, 66].

k-Nearest Neighbour (k-NN). The second algorithm applied in this research is the k-NN classifier, one of the simplest and most popular instance-based non-parametric machine learning algorithms [67]. During classification, the k training samples nearest to each test instance in the feature space are identified using a Euclidean distance metric:

$$\mathrm{d}_{\mathrm{E}}(\mathrm{x},\mathrm{y}) = \sqrt{{\sum }_{\mathrm{i}=1}^{\mathrm{N}}{\left({\mathrm{x}}_{\mathrm{i}}-{\mathrm{y}}_{\mathrm{i}}\right)}^{2}}$$
(1)

where x and y are histograms (feature vectors) in X = R^N, and N is the dimensionality of the image. The process of k-NN classification is described in [68]. Each prediction is assigned by majority vote among the k nearest neighbour samples [69,70,71]. If k = 1, the object is simply assigned to the class of the object nearest to it. When there are two classes, k should generally be an odd integer to avoid tied votes. A very low value of k makes the model sensitive to noise and prone to overfitting, whereas a high value of k smooths the decision boundaries and can degrade the model; appropriate values must therefore be selected by trial and error [72].
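A minimal NumPy sketch of this decision rule is given below: Euclidean distances to all training samples, followed by a majority vote among the k nearest. The function name knn_predict and the toy two-band samples are assumptions made for illustration only.

```python
import numpy as np

def knn_predict(train_X, train_y, test_X, k=3):
    """Label each test vector by majority vote among its k nearest training vectors."""
    # Pairwise Euclidean distances, shape (n_test, n_train).
    diffs = test_X[:, None, :] - train_X[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    nearest = np.argsort(dists, axis=1)[:, :k]       # indices of the k closest samples
    neighbour_labels = train_y[nearest]
    n_classes = int(train_y.max()) + 1
    # Majority vote; a tie goes to the lowest class index.
    return np.array([np.bincount(row, minlength=n_classes).argmax()
                     for row in neighbour_labels])

# Toy training samples (assumption): two classes in a two-band feature space.
train_X = np.array([[0.10, 0.20], [0.15, 0.25], [0.20, 0.15],
                    [0.80, 0.90], [0.85, 0.80], [0.90, 0.85]])
train_y = np.array([0, 0, 0, 1, 1, 1])
test_X = np.array([[0.12, 0.22], [0.88, 0.86]])

knn_labels = knn_predict(train_X, train_y, test_X, k=3)
```

Rerunning the call with different values of k on held-out samples is one simple way to carry out the trial-and-error selection mentioned above.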

Fig. 6 MRS result overlaid on the colour composite image of the first three MNF bands

GEOBIA Lithological Mapping. In contrast to the PBIA approach, GEOBIA [73] relies on information extracted from groups of similar pixels, called image objects, which are formed according to their spatial, spectral and textural information. These objects play an important role in the classification because their spectral content, size and shape are all taken into consideration [74]. In this approach the image must first be segmented into homogeneous, meaningful objects (step 1) before the classification itself (step 2).

Multi-Resolution Segmentation (MRS). The multiresolution segmentation (MRS) algorithm is recognized as the first and most crucial step in the GEOBIA approach because its outcome directly influences all the subsequent processing [75]. MRS is a bottom-up region-merging technique that begins at random points with single-pixel objects and successively merges them into larger, real-world segments according to a homogeneity criterion [76]. The purpose of this stage is to create real-world objects that can then be classified according to their contextual, textural, spatial and spectral homogeneity. The outcome of the MRS depends on four parameters: layer weight, compactness, shape and the scale parameter (SP). The compactness parameter weights compactness against smoothness, the shape parameter balances shape against colour (the spectral information of an object), and the SP defines the maximum allowed heterogeneity of the image objects [77]. The result of the MRS is illustrated in Fig. 6.
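The bottom-up merging idea can be illustrated with a greatly simplified stand-in. The sketch below starts from single-pixel objects on a single-band image and merges 4-connected neighbours whenever the difference between their region means stays within a threshold that plays a role loosely analogous to the scale parameter. It ignores the shape, compactness and layer-weight criteria of the actual MRS algorithm and visits candidate merges in order of similarity, not from random seeds; the function name merge_regions and the synthetic image are assumptions.

```python
import numpy as np

def merge_regions(image, scale):
    """Greedy bottom-up merging of 4-connected pixels on a single-band image."""
    rows, cols = image.shape
    flat = image.astype(float).ravel()
    parent = np.arange(flat.size)      # union-find forest: every pixel starts as its own object
    total = flat.copy()                # sum of pixel values per region root
    count = np.ones(flat.size)         # number of pixels per region root

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    # Candidate merges: all horizontal and vertical neighbour pairs.
    idx = np.arange(flat.size).reshape(rows, cols)
    edges = np.concatenate([
        np.stack([idx[:, :-1].ravel(), idx[:, 1:].ravel()], axis=1),
        np.stack([idx[:-1, :].ravel(), idx[1:, :].ravel()], axis=1),
    ])
    weights = np.abs(flat[edges[:, 0]] - flat[edges[:, 1]])

    # Visit the most similar neighbours first; merge while heterogeneity stays small.
    for a, b in edges[np.argsort(weights, kind="stable")]:
        ra, rb = find(a), find(b)
        if ra == rb:
            continue
        if abs(total[ra] / count[ra] - total[rb] / count[rb]) <= scale:
            parent[rb] = ra
            total[ra] += total[rb]
            count[ra] += count[rb]

    roots = np.array([find(i) for i in range(flat.size)])
    _, labels = np.unique(roots, return_inverse=True)   # relabel objects as 0..n-1
    return labels.reshape(rows, cols)

# Synthetic image (assumption): two homogeneous halves with slight noise.
rng = np.random.default_rng(3)
image = np.full((20, 20), 0.2)
image[:, 10:] = 0.8
image += 0.01 * rng.normal(size=image.shape)
segments = merge_regions(image, scale=0.1)
```

Raising the threshold lets more heterogeneous neighbours merge and therefore produces fewer, larger objects, which mirrors the qualitative effect of a larger SP.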

Classification Algorithm. The second and last stage of the GEOBIA approach consists of selecting a set of feature vectors that differentiate the target classes and link the real-world classes to the image objects, so that a suitable classification rule can be applied. In this study, the image objects were classified with the standard nearest neighbour (SNN) classifier, which searches the feature space for the training sample closest to each object [76].

Accuracy assessment. To evaluate the classification accuracy of all the classification methods used in this research, the resultant lithological maps were compared with the digitized geological map of the study region using the confusion matrix [78]. Several measurements, including overall accuracy (OA), commission and omission errors, and the kappa coefficient (K), were calculated to identify the potential of each classification approach.
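These measurements all derive from the confusion matrix, as the following NumPy sketch shows. The ten reference and predicted labels are invented purely to make the arithmetic easy to follow, and the function name accuracy_report is hypothetical; the sketch also assumes that every class occurs at least once in both the reference and the map, otherwise the error ratios are undefined.

```python
import numpy as np

def accuracy_report(reference, predicted, n_classes):
    """Confusion matrix (rows: reference, columns: map) and derived statistics."""
    matrix = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(matrix, (reference, predicted), 1)       # count each (reference, map) pair
    total = matrix.sum()
    diag = np.diag(matrix)
    oa = diag.sum() / total                            # overall accuracy
    # Agreement expected by chance, from the row and column totals.
    expected = (matrix.sum(axis=0) * matrix.sum(axis=1)).sum() / total ** 2
    kappa = (oa - expected) / (1.0 - expected)
    omission = 1.0 - diag / matrix.sum(axis=1)         # per reference class
    commission = 1.0 - diag / matrix.sum(axis=0)       # per mapped class
    return matrix, oa, kappa, omission, commission

# Invented labels for three classes (illustration only).
reference = np.array([0, 0, 0, 0, 1, 1, 1, 1, 2, 2])
predicted = np.array([0, 0, 0, 1, 1, 1, 1, 2, 2, 2])
matrix, oa, kappa, omission, commission = accuracy_report(reference, predicted, 3)
```

For these ten samples, eight lie on the diagonal, so OA = 0.8; the chance agreement is 0.34, which gives a kappa of (0.8 - 0.34) / (1 - 0.34), or about 0.70.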

3.3 Results and Discussion

The lithological maps obtained from the PBIA MLAs (RF and k-NN) and from the GEOBIA approach are illustrated and assessed in the sub-sections below.

PBIA Results. The lithological maps produced from Sentinel-2 imagery with both pixel-based MLAs are shown in Fig. 7.

Fig. 7 The resultant lithological maps using PBIA MLAs: a RF; b k-NN

A general comparison between the two pixel-based MLAs (Fig. 7) reveals that the k-NN method (Fig. 7b) classified many facies poorly. The orange and magenta circles mark areas where misclassified classes appear in the k-NN map. The leucogranite, marked by a red arrow, was assigned to the limestones, marls, phosphates class by k-NN, and the continental terrigeneous series with conglomeratic dominance (CC) is indistinguishable in the k-NN map, as shown by the blue circle. Finally, as illustrated by the green circle, some of the intrusive bodies of amphibolitized gabbro (AG) are clearly expressed in the RF map (Fig. 7a).

In terms of classification accuracy, RF performs much better for lithological mapping (OA = 81.92%, Kappa coefficient = 0.72) than the k-NN algorithm.

GEOBIA Results. Unlike the PBIA approaches, GEOBIA yields homogeneous classes and reduces the problems related to misclassified pixels and salt and pepper artifacts, since it takes into account not only spectral properties but also the shape, texture and geometry of objects during classification. Furthermore, as shown in Fig. 8, the GEOBIA technique has greater potential for generating lithological maps: its classification accuracy (OA = 83.46%, Kappa coefficient = 0.76) exceeds that of the PBIA machine learning algorithms.

Fig. 8 The resultant lithological map using the GEOBIA approach

Overall Comparison. The confusion matrix was used in this study to evaluate the classification accuracy of the geological maps obtained with both PBIA MLAs (RF and k-NN) and with the GEOBIA technique. Known pixels from the digitized lithological map of the study area served as reference data. The digitized geological map (Fig. 9) is divided into eight general classes: Continental terrigeneous series with conglomeratic dominance (CC), Leucogranite (LG), Limestones, marls, phosphates (LMP), Limit of the phosphates mining area (LPMA), Low and medium terraces and colluvium (LC), Schists and micaschists (SM), Set of homogeneous light-gray schists containing one or more very tectonized sandstone-quartzite bars (SQ) and Intrusive bodies of amphibolitized gabbro (AG).

Fig. 9 Digitized geological map of the study area

Tables 1, 2 and 3 display the confusion matrices of the PBIA (RF, k-NN) and GEOBIA approaches, derived by comparing the mapped classes with the reference samples.

Table 1 Confusion matrix of the PBIA RF classification
Table 2 Confusion matrix of the PBIA k-NN classification
Table 3 Confusion matrix of the GEOBIA approach

A general comparison of all classes shows that many lithological units are misclassified by both PBIA MLAs (Tables 1 and 2), especially the intrusive bodies of amphibolitized gabbro (AG) and the low and medium terraces and colluvium (LC). This is reflected in omission and commission errors that are greater than those of the GEOBIA approach (Table 3). The overall accuracy and kappa coefficient of each method are shown in Fig. 10.

Fig. 10 Comparison of a overall accuracy (OA) and b Kappa coefficient for the RF and k-NN PBIA algorithms and the GEOBIA approach

The misclassified classes, together with the salt and pepper artifacts caused by the mixed-pixel problem in the PBIA algorithms, led to the lowest overall accuracy and kappa values, whereas the GEOBIA approach improved the results and achieved the highest accuracy statistics.

4 Conclusions

Finding the optimal classification method is a critical step in geological mapping. This study therefore evaluated different approaches, including pixel-based and object-based image analysis, in order to select the most accurate one for mapping lithological units in semi-arid areas, with Skhour Rehamna chosen as a case study.

Lithological mapping was successfully achieved by evaluating the performance of the GEOBIA and PBIA approaches using the spectral channels and neo-bands of Sentinel-2A imagery. The overall statistics of this research indicate that the GEOBIA approach has considerable potential and advantages for generating more realistic and detailed lithological maps, because it classifies the lithological units more reliably and reduces the problems encountered with the PBIA MLAs.