Introduction

Africa is rich with heritage that documents human history from early primates up to recent complex societies (Connah 2004; Haaland 1995; Lange 2007; MacDonald 2013; Mattingly et al. 2007; Phillipson 2005; Shaw et al. 1993; Stahl 1994). Heritage sites on the continent are faced with dangers posed by both anthropogenic and natural threats such as mining activities, urban development, looting, flooding, erosion, and fires (Chirikure 2013; Kankpeyeng and DeCorse 2004; Khandlhela and May 2006; Lasaponara et al. 2016; Musyoki et al. 2016; Nienaber et al. 2008; Parcak 2015; Schmidt and McIntosh 1996; Smith 2012). In addition to this, heritage management institutions in Africa are facing several challenges, including lack of funds, which often lead to inadequate surveying, documentation, and monitoring of heritage sites (Chirikure 2013; Mabulla 2001; McIntosh 1993). Site surveying, documentation, and monitoring in some regions are also hampered by inaccessibility due to factors such as the presence of dangerous wild animals, conflicts, and property rights (Biagetti et al. 2017; Mabulla 2001; Thabeng et al. 2019).

The identification and documentation of archaeological features in Africa has traditionally been done through fieldwalking surveys (Fleisher and LaViolette 1999; Hitchner 1995; Huffman 2009a, 2011; McIntosh and McIntosh 1993). Fieldwalking surveys offer the surveyor an opportunity to identify, appreciate, and record finer details of different types of archaeological sites on the ground and to provide contextual records of archaeological materials (Foard 1977; Reid and Segobye 2000). Their limitation is that they are time-consuming, costly, and difficult to carry out over large areas (Banning et al. 2006; Corrie 2011; Hitchings et al. 2013). As a relatively cheap, fast, and systematic alternative, heritage managers and researchers have devised analytical techniques to predict the locations of archaeological sites over large and/or inaccessible areas within a short period of time (i.e., “predictive models”). These are based either on a sample of a region or on fundamental assumptions about human behavior (Danese et al. 2014; Keay et al. 2014; Kohler and Parker 1986; Lasaponara et al. 2014; Verhagen and Whitley 2012). Traditional models predict the location of archaeological sites based on the spatial analysis of environmental variables and/or other sites (Danese et al. 2014; Sharafi et al. 2016), while remote sensing predictive models exploit the spectral contrast between features and their surroundings (Corrie 2011). Although landscapes carry the cumulative traces of human–environment interactions, anthropogenic activities can have localized, long-lasting impacts on the soil’s physical and chemical properties, thus making certain areas distinct from their surroundings (Oonk et al. 2009; Wilson et al. 2008). For example, negative vegetation marks have been identified as indicators of the presence of subsurface archaeological features such as walls (Hejcman and Smrž 2010). 
This is because the presence of walls in the soil makes it more compact and less moisture-retentive therefore resulting in stunted vegetation growth (Gojda and Hejcman 2012). High moisture-retentive features such as ditches have been linked with positive vegetation marks (Featherstone et al. 1999; Reeves 1936). On the other hand, surface archaeological features can be identified based on their physical characteristics such as form (De Laet et al. 2007; Mason 1968; Sadr 2016) and ecological indicators (Denbow 1979; Reid 2016). Lastly, soil chemical and physical characteristics strongly influence the spectral behavior of soils and can be discriminated through spectral imaging (Ben-Dor 2002). As a result, several studies (Agapiou et al. 2012b, 2014a; Altaweel 2005; Beck 2007; Crawford 1923; Klehm et al. 2019; Mason 1968; Opitz and Herrmann 2018; Parcak 2007) have successfully employed remote sensing techniques to identify a number of archaeological site indicators. However, due to limited funding and incomplete site databases, research on the applicability of predictive remote sensing in an African context remains sporadic (Denbow 1979; Klehm et al. 2019; Mason 1968; Sadr and Rodier 2012).

Remote sensing data can be captured using broadband (multispectral) and narrowband (hyperspectral) sensors housed on handheld, airborne, and spaceborne platforms (Bradbury et al. 2013; Cavalli et al. 2013; Doneus et al. 2014; Mutanga et al. 2015; Schmidt and Skidmore 2003). At present, there are several multispectral satellite sensors with different spatial and spectral characteristics providing large volumes of data with great potential for the identification of archaeological sites; the challenge is to identify the suitable sensors for studying different archaeological features (Agapiou et al. 2014a; Parcak 2007). This is because, in addition to optimum environmental conditions, the ability to detect archaeological materials using remote sensing depends on the spatial and spectral resolutions of the sensor (Beck 2007). Spatial resolution represents the area on the ground that each pixel in an image covers and is a measure of the smallest object that can be resolved by the sensor (Liang et al. 2012). Higher spatial resolution means each pixel represents a smaller square of ground, with higher chances of detecting small archaeological features. In multispectral imagery (which are datasets containing more than one spectral band), spectral resolution is the width of each band (wavelength range) of the electromagnetic spectrum in the dataset, and it measures the ability of the sensor to resolve features in the electromagnetic spectrum (Lillesand et al. 2008). Since different surface materials can be distinguished by comparing their spectral responses (reflected radiation) over distinct wave ranges, the finer the bandwidth, the higher the ability of a sensor to make this distinction. Often, a trade-off between the two is needed for the identification of desired surface features. As such, a number of studies have compared the accuracies of different satellites in detecting archaeological features (Fowler 2002; Parcak 2007). 
This approach can be time-consuming and expensive, especially when using commercial satellite images.

Hyperspectral data offer high spectral resolution by capturing narrow bands across visible, near-infrared, and shortwave infrared portions of the electromagnetic spectrum. This high spectral resolution permits the identification of distinctive attributes of different features (Agapiou et al. 2012b; Cavalli et al. 2007; Cerra et al. 2018). As a result, many studies have used field and laboratory hyperspectral data to pilot investigations on the potential application of remote sensing principles in various fields including the analysis of soil’s physical and chemical properties (Cozzolino and Moron 2003; Nocita et al. 2014; Sørensen and Dalsgaard 2005), vegetation health (Dhau et al. 2018b; Kokaly 2001), spectral identification of different vegetation species (Adam et al. 2009; Cochrane 2000), and spectral discrimination of archaeological sites (Agapiou et al. 2010, 2012b; Melillos et al. 2018). Hyperspectral data has also been used to investigate the ability of planned multispectral satellite sensors to detect vegetation indices associated with buried archaeological features (Agapiou et al. 2014b). Currently, there are very few studies aimed at identifying spectral bands suitable for discriminating surface archaeological features in current operational multispectral sensors (Thabeng et al. 2019).

The use of hyperspectral sensors for discriminating different features has some limitations such as high computational demands and the large data redundancy due to the strong correlation between the spectral features (Burger and Gowen 2011; Doneus et al. 2014; Feng et al. 2016; Metternicht et al. 2010; Sibanda et al. 2016). Additionally, there are no operational airborne and spaceborne sensors matching the very high spectral resolution of hand-held spectrometers. As a result, numerous studies have resampled field and laboratory hyperspectral data acquired over small areas to the spectral resolutions of existing multispectral and hyperspectral sensors, in order to investigate applications for soil analysis (Nawar et al. 2014), vegetation studies (Adam et al. 2012; Mansour et al. 2012), and archaeology (Agapiou et al. 2014a). The major limitation of resampling using field and lab spectroscopy data is that these data have a high signal-to-noise ratio (SNR), which is impossible to achieve with imagery from airborne and spaceborne sensors (Mutanga et al. 2015). SNR is a measure that compares the level of a desired signal to the level of background noise and indicates, in remote sensing, how much of the recorded signal that appears as a pixel is useable information vs. unwanted distortion or noise. However, several studies (Mansour et al. 2012; Mutanga et al. 2015) have found no significant difference between the results obtained from resampling fine resolution data and those from the actual satellite image.

This study seeks to identify the most suitable multispectral sensors for mapping archaeological sites previously occupied by farming communities. This is done by resampling in situ hyperspectral data to the spectral resolutions of the most common multispectral sensors (namely GeoEye, Landsat 8 OLI, RapidEye, Sentinel-2, SPOT 5, and WorldView-2). The study was carried out in the Mapungubwe Cultural Landscape, an area of southern Africa occupied by farming communities since the beginning of the first millennium AD (Huffman 2008; Huffman and Du Piesanie 2011). Unique surface features and a distinct settlement organization, known as the Central Cattle Pattern (CCP) (Hanisch 2002), further described below, make this landscape ideal for testing. The specific objectives of the paper are (i) to identify the optimum spectral resolution for predicting archaeological sites (middens, non-vitrified dung, and vitrified dung) using in situ hyperspectral data resampled to the spectral resolutions of different multispectral sensors; (ii) to compare the prediction accuracies for middens, non-vitrified dung, and vitrified dung achieved using the resampled data with Random Forest (RF) and Support Vector Machine (SVM) classifiers; and (iii) to identify the importance of the individual bands of the different multispectral sensors in predicting archaeological sites (middens, non-vitrified dung, and vitrified dung) using the RF algorithm.

Materials and Methods

Study Area and Archaeological Context

The Mapungubwe Cultural Landscape is a UNESCO-listed heritage area situated where the Shashi and Limpopo rivers meet in the province of Limpopo, South Africa (Fig. 1). The Shashi-Limpopo Confluence Area (SLCA) forms the boundaries of three countries: Botswana to the west, South Africa to the south, and Zimbabwe to the north. Geologically, the SLCA lies within the Limpopo mobile belt, which joins the Zimbabwe and Kaapvaal cratons (Chinoda et al. 2009). This area is characterized by igneous and sedimentary rocks of the Karoo supergroup (Bordy and Catuneanu 2002). Erosion is rampant, particularly in areas closer to the river channels, thus forming sandstone ridges and outcrops, which cover most parts of the SLCA, with a sparse distribution of volcanic intrusions (Götze et al. 2008; Hanisch 1981). Generally, soils in the Limpopo mobile belt include clays and sands originating from the Karoo system.

Fig. 1 Location of the study area in southern Africa

The Mapungubwe Cultural Landscape was occupied by different farming communities, which combined cultivation with herding and the smelting and forging of iron (Mitchell 2013) in two distinctive periods. The first occupation occurred during the early centuries of the first millennium AD (Huffman 2008; Huffman and Du Piesanie 2011) and the second from AD 900 onwards (Calabrese 2000; Eloff and Meyer 1981; Huffman 2000; Vogel and Calabrese 2000). A model of settlement organization known as the Central Cattle Pattern (CCP) (Hanisch 2002; Huffman 1982, 1986) has been used to describe the structure of the villages and the worldview of their inhabitants, both reflecting the centrality of cattle in the life of these communities. The main features of the CCP are (1) a central cattle byre (also called a kraal) with elite burials and storage pits for grains; (2) an area next to the kraal where men would gather; and (3) an outer residential zone characterized by huts arranged according to seniority (Fagan 1964; Huffman 2000, 2001, 2009b). Social and political changes in the Mapungubwe Cultural Landscape took place during the early centuries of the second millennium AD (AD 1000–1300), with the development of class distinction and sacred leadership (Huffman 2000; Meyer 2000). The chief/king was physically separated from the commoners at the beginning of the thirteenth century AD with the occupation of Mapungubwe Hill (Huffman 2009b). This led to changes in the organization of the main settlements, whereby the traditional centrality of the cattle byre was abandoned and stone walls were built to seclude rulers from the commoners in major settlements (Huffman 2000; Meyer 2000). However, the CCP continued in the satellite settlements occupied by commoners (Huffman 2000).

Mapungubwe societies traded with merchants along the Indian Ocean coast (Huffman 2000; Meyer 2000; Pwiti 2005). Materials such as glass beads and marine shells were exchanged for metals, salt, ivory, and animal skins from the interior polities such as Toutswe and Bosutswe (Denbow 1990; Huffman 2000; Klehm et al. 2019; Klehm 2017; Koleini et al. 2016). At the peak of its power, the leadership of Mapungubwe is believed to have dominated societies up to 200 km away (Huffman 1982). A shift of power came towards the end of the thirteenth century AD, as the Mapungubwe Kingdom collapsed and the Great Zimbabwe Kingdom became dominant in the region (Calabrese 2000; Denbow 1990; Huffman 2009a; Klehm et al. 2019). However, trade between the societies in the Mapungubwe Cultural Landscape and the east coast merchants continued into the historic period (Huffman 2012).

Archaeologically, the most distinct features that remain in the Mapungubwe Cultural Landscape are cattle byres, marked by deposits of vitrified and/or non-vitrified dung (Huffman 2009b; Meyer 2000). Non-vitrified dung deposits consist of unburned dung (Huffman et al. 2013). Vitrified dung is a glassy biomass slag with high deposits of nitrates and phosphates formed by burning thick dung deposits at very high temperatures, usually in the region of 1100 °C (Peter 2001; Thy et al. 1995). The causes of dung vitrification are debated. Thy et al. (1995) posit that, for vitrification to occur, dung may have been burned by veld fires or lightning at very high temperatures, in an environment conducive to internal combustion. Other researchers (Huffman et al. 2013; Peter 2001) argue that vitrification results from the intentional burning of byres, most likely for cleansing purposes. Generally, the sites appear as bare patches within the savanna woody vegetation, in some cases barren and grayish-white in color (in particular when the dung is vitrified) and in other cases covered by grass, predominantly Cenchrus ciliaris (Denbow 1979; Mothulatshipi 2008). The distinct spectral signature, large size, and centrality of cattle byres, already examined by remote sensing-based studies in the region (Denbow 1979), make them an ideal indicator for the prediction of a household or village, depending on the scale at which the study is carried out.

Pits, grain bins, and middens are the other major features characterizing many sites (Huffman 2007). Middens include the discarded remains of materials such as broken potsherds, animal bones, beads and other artifacts, and the ashes from fireplaces (Chirikure et al. 2014; Huffman 2012). While pits and grain bins are small features of sub-meter sizes (not easily detectable by any optical remote sensing images), middens, which can differ in size depending on the duration and density of site occupation (Eloff and Meyer 1981), are generally larger than a few meters and could easily serve as another excellent site indicator.

Given the distinct spatial and spectral characteristics of these archaeological features, their detection through the analysis of multispectral remote sensing imagery could have major implications beyond the construction of predictive models. These features not only help determine site location and settlement patterns but can also be associated with sociopolitical factors such as site hierarchy and/or use (Denbow 1986; Huffman 1986, 2000, 2001, 2009b; Manyanga 2007; Meyer 2000; Mothulatshipi 2008). Expanding the knowledge of location and size of sites over vast areas, from local scales to regional landscapes, is fundamental for gaining insight into political hierarchies of contemporaneous settlements (Huffman 1986) and diachronic population aggregation and environmental strategies. This is especially true for understanding the role of small sites in hinterland locales (Antonites and Ashley 2016; Klehm 2017; Klehm and Ernenwein 2016).

Field Data Collection

A total of 356 soil surface samples (at a depth of 0–20 cm) were collected in February 2017 and packed in zip-lock plastic bags for spectral measurements in the laboratory. This procedure followed the traditional method of acquiring reproducible, stable, and accurate spectral measurements for the analysis of soil spectral characteristics (Ben-Dor et al. 2017; Stevens et al. 2010). Between 60 and 117 samples were collected for each category: non-sites, archaeological soils characterized by middens, and vitrified and non-vitrified dung deposits. A purposive sampling method was used during the fieldwork data collection by visiting archaeological sites that were known to be characterized by dung deposits and middens (Huffman 2009a, 2011). Non-site soil samples served as a control; these were collected at some distance from the targeted archaeological features in order to avoid possible contamination that could come from wind and water erosion. Although this measure does not guarantee that the collected soils are non-archaeological, the procedure ensured that the control soils were distinct and distant from the targeted archaeological features—byres and middens.

Lab Spectral Measurements and Resampling

A portable field spectrometer (FieldSpec® 4) was used to measure the reflectance spectra of vitrified dung, non-vitrified dung, midden, and non-site soils in a controlled environment. This was done in order to minimize the atmospheric effects caused by weather conditions. The Analytical Spectral Device (ASD) captures visible-near infrared and shortwave infrared spectral data between 350 and 2500 nm, at a bandwidth of 1.4 nm in the visible-near infrared region (350–1000 nm) and 1.1 nm in the shortwave infrared region (1001–2500 nm) (Analytical Spectral Devices, Inc. 2018). These very narrow spectral channels have been successfully resampled to the resolution of broadband sensors (Castaldi et al. 2016; Mutanga et al. 2015). The spectrometer was calibrated using a white Spectralon reference panel before taking measurements of a new sample and thereafter every 10–15 measurements to offset any change in atmospheric condition (Analytical Spectral Devices, Inc. 2018). Soil samples were flattened on a black plastic plate to create a smooth surface. The spectral measurements were then taken directly from the soil surface of each sample at nadir position with a 10-mm field of view using a Hi-Brite contact probe fitted with a 100 W halogen reflector lamp (Ben-Dor et al. 2015; Ogen et al. 2017). Between 60 and 117 samples had been collected in the field from each of the non-site, midden, vitrified dung, and non-vitrified dung classes (see Table 2 below). Three spectral measurements were taken per sample by randomly moving the probe over the soil surface, in order to obtain a representative reflectance spectrum for the sample. The spectral measurements were then averaged to represent the absolute spectral reading of the soil class of interest (Fig. 2).

Table 1 Spectral characteristics showing band description, bandwidth, band center, and spatial resolution of different multispectral sensors
Fig. 2 Visualization of the average reflectance of different soil classes: midden (MD); non-site (NS); vitrified dung (VD); and non-vitrified dung (NVD)

Hyperspectral data measured in the lab were then converted to an ASCII file with a 10-nm band spacing covering wavelengths between 350 and 2500 nm. The resultant hyperspectral data contained in the ASCII file were averaged to mimic, through resampling, the spectral resolutions of common multispectral sensors using the spectral library resampling function within the Environment for Visualizing Images (ENVI) software (v. 5.4). The resampling tool in ENVI employs a Gaussian model with a full width at half maximum (FWHM) equal to the specified band spacing to resample the data (Dhau et al. 2018a; Oumar and Mutanga 2010; Verrelst et al. 2013). The hyperspectral data were resampled to the spectral resolutions of a selection of popular multispectral sensors (GeoEye, Landsat 8 OLI, RapidEye, Sentinel-2, SPOT 5, and WorldView-2) using the band centers in Table 1. Bands between 350 and 400 nm and between 2400 and 2500 nm were removed from the data before resampling, as these bands are affected by noise (Castaldi et al. 2016).
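The Gaussian resampling step can be illustrated with a minimal sketch. It assumes that each broadband value is the Gaussian-weighted mean of the narrowband reflectances, with the weights set by a band center and an FWHM. The function name, the synthetic spectrum, and the band definition are hypothetical and are not taken from ENVI or from Table 1; the exact ENVI implementation may differ.

```python
import math

def gaussian_resample(wavelengths, reflectance, center, fwhm):
    """Gaussian-weighted mean reflectance for one broadband channel."""
    # Convert FWHM to the standard deviation of the Gaussian response.
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    weights = [math.exp(-0.5 * ((w - center) / sigma) ** 2) for w in wavelengths]
    weighted_sum = sum(wt * r for wt, r in zip(weights, reflectance))
    return weighted_sum / sum(weights)

# Synthetic spectrum at 10-nm spacing between 400 and 2400 nm (illustrative only).
wavelengths = list(range(400, 2401, 10))
reflectance = [0.10 + 0.0001 * (w - 400) for w in wavelengths]

# Hypothetical band: center 490 nm, FWHM 60 nm (not a value from Table 1).
blue = gaussian_resample(wavelengths, reflectance, center=490.0, fwhm=60.0)
flat = gaussian_resample(wavelengths, [0.3] * len(wavelengths), 800.0, 100.0)
```

Because the weights are normalized, a spectrally flat target returns its own reflectance, and a locally linear spectrum returns approximately the reflectance at the band center.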

The resulting resampled satellite datasets were divided into training (70%) and test (30%) datasets (Table 2). Thereafter, the datasets were used as input variables in RF and SVM classifiers to test if their spectral resolutions are suitable for predicting archaeological sites.

Table 2 Training and validation dataset for all the soil classes created by splitting the field data into 70:30
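A 70:30 holdout split of this kind can be sketched as follows. The sketch assumes that the split is made at random within each soil class, which the text does not state explicitly; the class sizes used here are hypothetical and do not reproduce Table 2.

```python
import random

def split_by_class(labels, train_fraction=0.7, seed=0):
    """Randomly split sample indices into training and test sets, class by class."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        cut = round(train_fraction * len(indices))
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)

# Hypothetical data set: 20 samples in each of the four soil classes.
labels = ["MD"] * 20 + ["NS"] * 20 + ["VD"] * 20 + ["NVD"] * 20
train_idx, test_idx = split_by_class(labels)
```

Splitting within each class keeps the class proportions of the training and test sets close to those of the full data set, which matters when class sizes are unequal.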

Data Classification

Although the use of conventional parametric classifiers such as Maximum Likelihood remains the preferred method for many applied remote sensing studies, including archaeological ones (e.g., De Laet et al. 2009), this study used Random Forest (RF) and Support Vector Machines (SVM) to classify all soil classes (sites and non-sites). Despite the advantages offered by the availability of parametric classifiers in conventional image processing software packages (Yu et al. 2014) vis-à-vis uncertainties in how to use and implement machine-learning techniques effectively (Maxwell et al. 2018), RF and SVM classification algorithms have been shown to provide better classification performance (higher accuracy) than traditional, statistically based, parametric procedures (Ahmad et al. 2010; Belgiu and Drăguţ 2016; Chagas et al. 2016; Grimm et al. 2008; Maxwell et al. 2018; Mountrakis et al. 2011; Pal and Mather 2003). Moreover, these machine-learning classifiers can model complex class signatures characterized by many predictor variables (a high-dimensional feature space) and are non-parametric. That is, they do not make assumptions about the data distribution (Maxwell et al. 2018) and can accept limited training datasets (Rodriguez-Galiano et al. 2012; Shao and Lunetta 2012). Furthermore, RF and SVM have a high generalization capacity, which makes it possible to apply them to incomplete or noisy (error-prone) databases (Rodriguez-Galiano et al. 2012; Rodriguez-Galiano and Chica-Rivas 2014; Shao and Lunetta 2012). These characteristics are advantageous for archaeological site prediction, particularly in complex archaeological landscapes that may contain a high number of land covers, with low interclass separability, and/or limited access for the collection of training data.

Random Forest

RF is a non-parametric machine learning classification algorithm developed by Breiman (2001). The algorithm uses an ensemble of classification and regression trees for prediction. Each tree is grown without pruning until its nodes reach purity, using a random subset of predictor variables (Adam et al. 2017). Each tree in the forest then contributes a single vote for the predicted class, with the majority of votes deciding the class. RF requires the optimization of the number of trees (ntree) and the number of predictor variables considered at each node (mtry) in order to improve the classification accuracy (Genuer et al. 2010; Mureriwa et al. 2016). Each tree is built on a bootstrap sample drawn at random, with replacement, from the training data (Breiman 1996; Rodriguez-Galiano et al. 2012). This sampling technique uses about two-thirds of the samples to train each tree and the remaining third to assess the generalization error and the importance of each variable in classification (Belgiu and Drăguţ 2016). The held-out data are referred to as the Out-Of-Bag (OOB) sample.
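The bootstrap and voting mechanics can be illustrated with a short sketch. It is not the randomForest implementation; the function names and the sample size are hypothetical. Drawing n indices with replacement leaves each sample out of a given tree with probability (1 − 1/n)^n, which approaches e^(−1) ≈ 0.368, hence the "remaining third" that forms the OOB sample.

```python
import random
from collections import Counter

def bootstrap_indices(n_samples, rng):
    """Draw one bootstrap sample; samples never drawn form the OOB set."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    out_of_bag = sorted(set(range(n_samples)) - set(in_bag))
    return in_bag, out_of_bag

def majority_vote(votes):
    """Return the class predicted by the largest number of trees."""
    return Counter(votes).most_common(1)[0][0]

rng = random.Random(42)
in_bag, out_of_bag = bootstrap_indices(300, rng)  # 300 is an arbitrary sample size
oob_fraction = len(out_of_bag) / 300

# Hypothetical votes from five trees for a single sample.
predicted = majority_vote(["VD", "VD", "MD", "VD", "NS"])
```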

One major advantage of RF over other machine learning algorithms, such as artificial neural networks and SVM, is its inherent ability to measure the importance of each candidate predictor in the classification process. This advantage has been demonstrated in a number of studies where RF was used for reduction of dimensionality and variable selection in various domains like bioinformatics (Díaz-Uriarte and De Andres 2006; Farhat et al. 2016; Wu et al. 2008), ecology (Brieuc et al. 2015; Wei et al. 2010), remote sensing (Mutanga et al. 2012), and medical imaging (Lebedev et al. 2014). RF offers two measures of importance. The first, Gini importance, measures the contribution of each predictor in keeping the nodes pure in a forest. The second, mean decrease in accuracy (MDA), is calculated using the RF internal measure of accuracy. RF assesses the importance of each variable in the final model by measuring the decrease in accuracy, by means of the OOB error, when its values are randomly permuted in the OOB sample while the other variables remain unchanged (Breiman 2001). The error is expected to rise if the variable is important in the prediction of the forest. The importance of the predictor variable y_j can be defined as follows:

$$ \mathrm{MDA}\left(\ {y}_j\right)=\frac{1}{ntree}\sum \limits_{t=1}^{ntree}\left({ap}_{tj}-{a}_{tj}\right) $$

Whereby:

  • ntree is the number of trees of the RF,

  • ap_tj is the OOB error of tree t after randomly permuting the values of the predictor variable y_j, and

  • a_tj is the OOB error of tree t before randomly permuting the values of the predictor variable y_j

The end results for each predictor variable can then be used to assess its importance in relation to others in the prediction process. In this study, mean decrease in accuracy was used to measure the importance of hyperspectral data and resampled satellite bands in predicting non-sites, middens, non-vitrified dung, and vitrified dung. The mtry and ntree parameters were optimized using grid search and 10-fold cross-validation in the e1071 library of the R statistical software, version 3.4.1 (Meyer et al. 2017). The resampled hyperspectral data were then classified in R using the randomForest package, which is based on the original RF algorithm developed by Breiman and Cutler (2007).
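The MDA formula above can be sketched in a few lines. The sketch assumes that the per-tree OOB errors before and after permutation are already available; the helper names and the error values are hypothetical and serve only to show the arithmetic.

```python
import random

def permute_column(rows, j, rng):
    """Copy the samples, shuffling the values of predictor j across them."""
    column = [row[j] for row in rows]
    rng.shuffle(column)
    return [row[:j] + [value] + row[j + 1:] for row, value in zip(rows, column)]

def mean_decrease_accuracy(errors_before, errors_after):
    """Average, over all trees, of the rise in OOB error after permutation."""
    if len(errors_before) != len(errors_after):
        raise ValueError("one error pair is needed per tree")
    ntree = len(errors_before)
    return sum(ap - a for a, ap in zip(errors_before, errors_after)) / ntree

# Hypothetical OOB errors of three trees before and after permuting one band.
errors_before = [0.10, 0.12, 0.08]
errors_after = [0.20, 0.18, 0.14]
mda = mean_decrease_accuracy(errors_before, errors_after)

# Permutation breaks the link between band 1 and the class, leaving band 0 intact.
rows = [[1, 10], [2, 20], [3, 30]]
shuffled = permute_column(rows, 1, random.Random(0))
```

A band whose permutation barely changes the OOB error yields an MDA near zero, which is how weakly informative bands can be recognized.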

Support Vector Machines

The SVM classification algorithm has previously been used to classify land cover data from satellite sensors (Adam et al. 2014; Ustuner et al. 2015). This is because of its robust generalization ability and capacity to deal with noise effects and achieve high classification accuracies (Shao and Lunetta 2012). SVMs are non-parametric classifiers and therefore do not assume normality within training statistics. In this study, SVM was used to predict the soil classes using resampled satellite bands. SVM is a kernel-based algorithm that predicts classes by finding the hyperplane that optimally separates two classes in a high-dimensional feature space (Chen and Lin 2006; Zhu and Blumberg 2002). The most used SVM kernels are the polynomial, sigmoid, linear, and radial basis function (RBF) kernels (Ben-Hur and Weston 2010; Lin and Lin 2003; Pal and Mather 2005). A radial basis kernel function was used to classify the data in this study because of its ability to handle nonlinear relations between class labels and attributes (Hsu et al. 2003). The RBF is defined as follows:

$$ k\left(x,{x}_1\right)=\exp \left(-\gamma {\left\Vert x-{x}_1\right\Vert}^2\right) $$

Whereby x and x1 represent two points from the training data and γ is the kernel function parameter, whose default value is 1/(data dimension). The RBF kernel requires two user-defined parameters to run the SVM model: the regularization parameter (C) and the kernel function parameter (γ). The regularization parameter regulates the accepted level of misclassification errors by determining the margin between class boundaries (Li et al. 2015). The kernel function parameter defines the width of the Gaussian kernel. In general, these parameters influence the overall classification accuracy, hence the need to run the model with optimum parameters in order to obtain good classification accuracy (Hsu et al. 2003). In this study, pairs of C and γ parameters were optimized using 10-fold cross-validation and grid search. This method tests various combinations of C and γ parameters and chooses the one that attains the best cross-validation accuracy. The model follows the procedure described below:

  1. Consider a grid space of (C, γ) with log2C ∈ {−5, −3, …, 13} and log2γ ∈ {−13, −11, …, 3}.

  2. For each pair of C and γ parameters in the search space, carry out 10-fold cross-validation on the training set.

  3. Select the pair of C and γ that results in the best overall cross-validation classification rate.

  4. Train a model using the selected best combination of parameters (C, γ).

The optimization of parameters and classification of the resampled hyperspectral data were done using the e1071 library of R statistical packages version 3.4.1 (Meyer et al. 2017).
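The RBF kernel and the grid search procedure can be sketched as follows. This is not the e1071 implementation. The callback `cv_accuracy`, which would train an SVM on the training folds and return its accuracy on the held-out fold, is hypothetical and is replaced here by a toy scoring function, and the interleaved fold assignment is an assumption made for brevity.

```python
import math

def rbf_kernel(x, x1, gamma):
    """RBF kernel: exp(-gamma * squared Euclidean distance between x and x1)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, x1))
    return math.exp(-gamma * squared_distance)

def parameter_grid(c_exponents, gamma_exponents):
    """All (C, gamma) pairs with C = 2**c and gamma = 2**g (step 1)."""
    return [(2.0 ** c, 2.0 ** g) for c in c_exponents for g in gamma_exponents]

def k_fold_indices(n_samples, k=10):
    """Yield (train, held_out) index lists for k interleaved folds."""
    for fold in range(k):
        held_out = list(range(fold, n_samples, k))
        held = set(held_out)
        yield [i for i in range(n_samples) if i not in held], held_out

def grid_search(grid, n_samples, cv_accuracy, k=10):
    """Return the pair with the best mean cross-validation accuracy (steps 2-3)."""
    best_pair, best_score = None, -1.0
    for c, gamma in grid:
        scores = [cv_accuracy(c, gamma, train, held_out)
                  for train, held_out in k_fold_indices(n_samples, k)]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_pair, best_score = (c, gamma), mean_score
    return best_pair, best_score

def toy_accuracy(c, gamma, train, held_out):
    """Stand-in for SVM training: peaks at C = 2**3, gamma = 2**-1."""
    return 1.0 / (1.0 + abs(math.log2(c) - 3) + abs(math.log2(gamma) + 1))

grid = parameter_grid(range(-5, 14, 2), range(-13, 4, 2))
best_pair, best_score = grid_search(grid, n_samples=100, cv_accuracy=toy_accuracy)
```

Step 4 would then refit the classifier on the full training set with `best_pair`. Searching on a log2 scale keeps the number of candidate pairs small (90 here) while still spanning several orders of magnitude.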

Accuracy Assessment

Classification accuracy was assessed by means of the confusion matrix, which was constructed using a holdout dataset created by randomly dividing the resampled data into 70% (training data) and 30% (test data) (see Table 2 above). The confusion matrix enables the assessment of the classification of each class by giving the user’s accuracy and the producer’s accuracy (Congalton 1991). User’s accuracy shows the proportion of samples assigned to a class by the classifier that truly belong to that class. This measure is obtained by dividing the number of correctly predicted samples in a class by the row total. Producer’s accuracy, on the other hand, measures the proportion of reference samples of a class that were correctly predicted. It is obtained by dividing the number of correctly predicted samples by the column total. The confusion matrix also provides the overall accuracy, which is the percentage of correctly classified test samples across all classes. Cohen’s kappa coefficient was used to assess the agreement between the reference data and the classifier because of its ability to compensate for chance agreement (Rosenfield and Fitzpatrick-Lins 1986). Cohen’s kappa coefficient is defined as follows:

$$ K=\frac{\Pr \left(\mathrm{o}\right)-\Pr \left(\mathrm{c}\right)}{1-\Pr \left(\mathrm{c}\right)} $$

Where Pr(o) is the observed agreement and Pr(c) is the agreement expected by chance. Perfect agreement corresponds to a kappa value (K) of one, and values close to one indicate strong agreement (McHugh 2012; Rosenfield and Fitzpatrick-Lins 1986).
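These accuracy measures can be computed from a confusion matrix as in the sketch below. It assumes that rows hold the predicted classes and columns the reference classes, consistent with the row and column totals described above; the two-class matrix is hypothetical and is not a result of this study.

```python
def accuracy_metrics(matrix):
    """User's, producer's, and overall accuracy plus Cohen's kappa.

    matrix[i][j] counts samples predicted as class i whose reference class is j.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    diagonal = [matrix[i][i] for i in range(n)]
    users = [d / r if r else 0.0 for d, r in zip(diagonal, row_totals)]
    producers = [d / c if c else 0.0 for d, c in zip(diagonal, col_totals)]
    observed = sum(diagonal) / total  # Pr(o), also the overall accuracy
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2  # Pr(c)
    kappa = (observed - expected) / (1.0 - expected)
    return {"users": users, "producers": producers,
            "overall": observed, "kappa": kappa}

# Hypothetical two-class confusion matrix (rows: predicted, columns: reference).
confusion = [[40, 10],
             [5, 45]]
metrics = accuracy_metrics(confusion)
```

In this example the overall accuracy is 0.85, while the kappa value is 0.70 because half of the agreement would be expected by chance.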

Results

Optimization of RF and SVM

The optimization results of the RF parameters (mtry and ntree) for the different sensors are shown in Fig. 3. In general, the lowest error rates achieved by the optimum mtry and ntree combinations for spectral data resampled to the resolutions of the various sensors range between 0.120 and 0.168 (Fig. 3). The optimum mtry and ntree parameter combination for Sentinel-2 achieved the lowest OOB error rate, at a value of 0.12. The best mtry and ntree parameter combination for hyperspectral data resampled to the resolution of each satellite sensor was used to classify its related data in the RF algorithm.

Fig. 3 OOB errors of optimized RF parameters (mtry and ntree) using the grid search procedure. The OOB method was used to identify the error rates for different sets of mtry and ntree: (a) 30 sets for GeoEye; (b) 60 sets for Landsat 8 OLI; (c) 40 sets for RapidEye; (d) 120 sets for Sentinel-2; (e) 30 sets for SPOT 5; and (f) 70 sets for WorldView-2

Exponentially growing sequences of C and γ values were assessed using grid search to select the best parameter combinations for classifying the datasets resampled to the spectral resolutions of the different sensors. The optimization yielded different optimum combinations of C and γ for the SVM classifier with the RBF kernel when classifying data resampled to the resolutions of GeoEye (C = 1000 and γ = 1), Landsat 8 OLI (C = 100 and γ = 1), RapidEye (C = 100 and γ = 1), Sentinel-2 (C = 1000 and γ = 0.1), SPOT 5 (C = 1000 and γ = 1), and WorldView-2 (C = 100 and γ = 1).
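The grid search over exponentially growing C and γ values can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used in the study: the exponent ranges are assumed (the text reports only the optimum values), and `toy_score` is a hypothetical stand-in for the cross-validated accuracy of an RBF SVM.

```python
import math
from itertools import product


def exponential_grid(base, min_exp, max_exp):
    """Return base**min_exp ... base**max_exp as floats."""
    return [float(base) ** e for e in range(min_exp, max_exp + 1)]


def grid_search(c_values, gamma_values, score_fn):
    """Evaluate every (C, gamma) pair; return (best_score, C, gamma)."""
    best = None
    for c, gamma in product(c_values, gamma_values):
        score = score_fn(c, gamma)
        if best is None or score > best[0]:
            best = (score, c, gamma)
    return best


def toy_score(c, gamma):
    """Hypothetical score surface peaking at C = 1000, gamma = 0.1.

    In a real workflow this would train an RBF SVM with the given
    parameters and return its cross-validated accuracy.
    """
    return -((math.log10(c) - 3) ** 2 + (math.log10(gamma) + 1) ** 2)


# Assumed ranges: C in 10^-2..10^3, gamma in 10^-3..10^1.
c_grid = exponential_grid(10, -2, 3)
gamma_grid = exponential_grid(10, -3, 1)
best_score, best_c, best_gamma = grid_search(c_grid, gamma_grid, toy_score)
print(best_c, best_gamma)
```

Spacing the candidates exponentially lets a small grid span several orders of magnitude, which matters because SVM performance typically changes with the scale of C and γ rather than with small absolute increments.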

Band Importance

The RF algorithm was used to assess the relative importance of each resampled band in predicting the classes of midden, non-vitrified dung, non-sites, and vitrified dung. Although the most important bands are situated in different portions of the electromagnetic spectrum for the different sensors (Fig. 4), they are generally located within the visible spectrum. The green band (545 nm) was the most important band in discriminating midden, vitrified dung, non-vitrified dung, and non-sites in the SPOT 5 satellite sensor. This band combines wavelengths in the blue with those in the green part of the electromagnetic spectrum (Fig. 4). The SWIR band in the SPOT 5 sensor was the second most important band. The blue band was the most important band for satellites that capture data in the blue portion of the electromagnetic spectrum, such as GeoEye (480 nm), RapidEye (470 nm), Sentinel-2 (490 nm), and WorldView-2 (480 nm). Landsat 8 OLI was the only satellite that, despite having a blue band, had its most important band for discriminating midden, non-vitrified dung, non-sites, and vitrified dung located in the SWIR (1610 nm). The red band was the least important band for discriminating these soil classes for Landsat 8 OLI, SPOT 5, and GeoEye, while the red edge bands were the least important for WorldView-2, Sentinel-2, and RapidEye, which capture data in the red edge wavelengths. Overall, the SPOT 5 bands had the highest importance values for discriminating midden, vitrified dung, non-vitrified dung, and non-sites. Most bands of the Sentinel-2 sensor, which captures data in most regions of the electromagnetic spectrum, have low mean decrease in accuracy values compared with other sensors (Fig. 4).

Fig. 4 Relative importance of each band for the different sensors used in this study for predicting midden, non-vitrified dung, non-sites, and vitrified dung using RF. The spectral bands are distributed as follows: blue, green, red, and near-infrared for GeoEye; coastal, blue, green, red, near-infrared, SWIR 1, and SWIR 2 for Landsat 8 OLI; blue, green, red, red edge, and near-infrared for RapidEye; aerosols, blue, green, red, red edge, red edge, red edge, near-infrared, red edge, water vapor, SWIR-Cirrus, SWIR 1, and SWIR 2 for Sentinel-2; green, red, near-infrared, and SWIR for SPOT 5; coastal, blue, green, yellow, red, red edge, NIR1, and NIR2 for WorldView-2. The most important variables are those with the highest mean decrease in accuracy

The mean decrease in accuracy in RF was used to assess the relationship between the bands that are important for discriminating the midden, non-vitrified dung, non-sites, and vitrified dung soil classes in the hyperspectral data and the location of the bands of the different sensors. The important bands for classification using hyperspectral data are spread across the visible, near-infrared, and shortwave infrared portions of the electromagnetic spectrum (350–2500 nm), as shown in Fig. 5. However, there are notable peaks in the visible and the shortwave infrared portions of the electromagnetic spectrum at 350–576 nm, 1292–1380 nm, 1575–1748 nm, and 1801–1808 nm. All the satellite sensors have bands located in different areas of the visible spectrum. New satellite sensors with a spatial resolution of less than 5 m do not have bands covering the shortwave infrared region, which also contains some important bands for classifying midden, non-vitrified dung, non-sites, and vitrified dung (Fig. 5). Only the medium resolution sensors (Landsat 8 OLI, Sentinel-2, and SPOT 5) have bands that capture data in the SWIR portion of the electromagnetic spectrum. The SWIR bands of these sensors coincide with hyperspectral bands that are important for discriminating midden, non-vitrified dung, non-sites, and vitrified dung. This, therefore, corroborates the importance of these bands in archaeological classification as shown in Fig. 4.
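The idea behind the mean decrease in accuracy can be illustrated with a simplified, model-agnostic sketch: the values of one band are shuffled across samples, and the resulting drop in accuracy is averaged over several repeats. RF computes this measure on the out-of-bag samples of each tree, so the code below is an analogue of the technique and not the RF implementation itself. The toy reflectance values, the class labels, and the threshold rule `predict` are hypothetical.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows for which predict(row) matches the label."""
    hits = sum(1 for row, label in zip(rows, labels) if predict(row) == label)
    return hits / len(labels)


def permutation_importance(predict, rows, labels, n_repeats=20, seed=0):
    """Mean drop in accuracy when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in rows]
            rng.shuffle(column)
            # Rebuild each row with only feature j replaced.
            permuted = [
                row[:j] + [value] + row[j + 1:]
                for row, value in zip(rows, column)
            ]
            drops.append(baseline - accuracy(predict, permuted, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical reflectance in two bands: band 0 separates the classes,
# band 1 carries no class information.
rows = [
    [0.30, 0.50], [0.32, 0.41], [0.34, 0.55], [0.36, 0.47], [0.38, 0.52],
    [0.10, 0.49], [0.12, 0.53], [0.14, 0.44], [0.16, 0.51], [0.18, 0.46],
]
labels = ["site"] * 5 + ["non-site"] * 5


def predict(row):
    """Hypothetical rule that uses band 0 only."""
    return "site" if row[0] > 0.2 else "non-site"


importances = permutation_importance(predict, rows, labels)
print(importances)
```

A band that the classifier does not rely on shows no drop in accuracy when shuffled, which is why low-importance bands sit near zero in this kind of measure.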

Fig. 5 The location of the different satellite sensor bands across the visible, near-infrared (NIR), and shortwave infrared (SWIR) portions of the electromagnetic spectrum (350–2500 nm) in relation to the relative importance of the spectral bands collected using a field spectrometer for predicting midden, non-vitrified dung, non-sites, and vitrified dung using the RF algorithm

Classification Accuracy

The classification of the midden, non-vitrified dung, non-sites, and vitrified dung sites was performed using RF and SVM on the hyperspectral data resampled to the spectral resolutions of the GeoEye, Landsat 8 OLI, RapidEye, Sentinel-2, SPOT 5, and WorldView-2 sensors. The error matrices for the output of each classifier were built using a holdout sample created by randomly dividing the resampled laboratory data into 70% for training and 30% for validation. SVM achieved higher classification accuracies than RF for all datasets.
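A random 70/30 holdout split of the kind described above can be sketched as follows. The function name, the seed, and the sample identifiers are hypothetical. The text does not state whether the split was stratified by class, so this sketch uses a simple unstratified shuffle.

```python
import random


def holdout_split(samples, train_fraction=0.7, seed=42):
    """Randomly divide samples into training and validation subsets."""
    rng = random.Random(seed)
    shuffled = list(samples)  # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical sample identifiers standing in for resampled spectra.
sample_ids = list(range(100))
train_ids, validation_ids = holdout_split(sample_ids)
print(len(train_ids), len(validation_ids))
```

Fixing the seed makes the split reproducible, so both classifiers can be trained and validated on exactly the same subsets.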

Accuracy assessment of the RF classifier using the validation data yielded overall accuracies of 78.10, 80.00, 72.38, 81.90, 77.14, and 77.14% and kappa coefficients of 0.7030, 0.7276, 0.6262, 0.7529, 0.6877, and 0.6905 when classifying hyperspectral data resampled to the spectral resolutions of GeoEye, Landsat 8 OLI, RapidEye, Sentinel-2, SPOT 5, and WorldView-2, respectively (Fig. 6). The lowest classification accuracy (72.38%) and kappa coefficient (0.6262) were attained with the data resampled to the spectral resolution of the RapidEye sensor, while Sentinel-2 achieved the highest classification accuracy (81.90%) and kappa coefficient (0.7529) (Table 3; Fig. 6). Sentinel-2 attained high producer’s and user’s accuracies of 82.86 and 78.38%, respectively, for NVD. RapidEye attained a producer’s accuracy of 60.00% and a user’s accuracy of 75.00% for the same class (Tables 3 and 4). However, mixed results were attained when classifying MD: data resampled to the RapidEye sensor resolution achieved a higher producer’s accuracy (64.00%) than data resampled to Sentinel-2 (60.00%) (Table 4), whereas Sentinel-2 achieved a higher user’s accuracy (65.22%) than RapidEye (50.00%) (Tables 3 and 4).

Fig. 6 The OA (%) and kappa coefficients for RF classification of the midden, non-vitrified dung, non-sites, and vitrified dung achieved using a holdout sample from hyperspectral data resampled to resolutions of different multispectral sensors

Table 3 Error matrices of RF classification results for non-vitrified dung (NVD), midden (MD), non-sites (NS), and vitrified dung (VD) based on the holdout sample for hyperspectral data resampled to the spectral resolutions of RapidEye and Sentinel-2 sensors
Table 4 RF classification accuracies of non-vitrified dung (NVD), midden (MD), non-sites (NS), and vitrified dung (VD) achieved using a holdout sample from hyperspectral data resampled to resolutions of different multispectral sensors
Table 5 Error matrices of SVM classification results for non-vitrified dung (NVD), midden (MD), non-sites (NS), and vitrified dung (VD) based on the holdout sample for hyperspectral data resampled to the spectral resolutions of RapidEye and Sentinel-2 sensors

The SVM classifier achieved overall accuracies of 86.67, 92.38, 82.86, 92.38, 91.43, and 86.67% and kappa coefficients of 0.8188, 0.8967, 0.7663, 0.8963, 0.8836, and 0.8192 when using a holdout sample from the data resampled to the spectral resolutions of the GeoEye, Landsat 8 OLI, RapidEye, Sentinel-2, SPOT 5, and WorldView-2 sensors, respectively (Fig. 7; Table 6). Sentinel-2 achieved a high overall classification accuracy of 92.38% and a kappa coefficient of 0.8963, while RapidEye attained the lowest overall classification accuracy of 82.86% and a kappa coefficient of 0.7663 (Table 5). MD achieved a producer’s accuracy of 68.00% and a user’s accuracy of 68.00% for hyperspectral data resampled to the RapidEye sensor, while user’s and producer’s accuracies of 84.00 and 87.50% were achieved for the data resampled to the spectral resolution of the Sentinel-2 sensor. User’s accuracies of 100% were achieved for VD from the data resampled to the spectral resolutions of both RapidEye and Sentinel-2, while producer’s accuracies of 83.33 and 94.44% were attained for the same datasets, respectively (Tables 5 and 6). Landsat 8 OLI also achieved an overall classification accuracy of 92.38% when using the SVM classifier, equal to that of Sentinel-2 (Table 6; Fig. 7). However, their producer’s and user’s accuracies for NVD and MD were different (Table 6). Further results on producer’s and user’s accuracies for the SVM classifier are provided in Table 6.

Fig. 7 The OA (%) and kappa coefficients for SVM classification of the midden, non-vitrified dung, non-sites, and vitrified dung achieved using a holdout sample from hyperspectral data resampled to resolutions of different multispectral sensors

Table 6 SVM classification accuracy of non-vitrified dung (NVD), midden (MD), non-sites (NS), and vitrified dung (VD) achieved using a holdout sample from hyperspectral data resampled to resolutions of different multispectral sensors

Discussion

Following recent improvements in both the spatial and spectral resolutions of satellite sensors, numerous studies have reported successful mapping of subsurface and surface archaeological material using various multispectral remote sensing data (Agapiou et al. 2012a; Beck et al. 2007; Lasaponara and Masini 2006; Masini and Lasaponara 2007; Melillos et al. 2018; Parcak 2007; Schuetter et al. 2013; Thabeng et al. 2019). Archaeological materials produce localized signatures that alter soil chemical and physical properties differently. On the one hand, this is the very reason why spectral variations of soils (and vegetation growing on them) can be used to discriminate archaeological features from their surroundings. On the other hand, these variations are not universal and cannot be uncritically used to predict archaeological sites in different contexts. As such, creating spectral libraries specific to local archaeological features and testing the potential of different sensors before acquiring imagery for predictive classification and further analyses of data is necessary. This is particularly important when, in the context of limited access to funding, there is the necessity of acquiring commercial imagery at a high cost. This study is an example of the methods available for identifying the best spectral bands, and thus the most suitable multispectral sensors, for detecting archaeological sites characterized by known archaeological surface features. This specific case study tested the separability of middens, vitrified dung, and non-vitrified dung, all of which characterize Southern African Iron Age sites interpreted as farming community settlements. Nevertheless, the methods presented here can be applied in other time periods and regions around the world, anywhere distinctive archaeological features expressed by soil or vegetation variations can also be measured on the ground.

The importance of each wavelength (350–2500 nm) in the resampled hyperspectral data was assessed using the mean decrease in accuracy in RF. The results are in line with studies of the spectral characteristics of soils showing that the VIS (400–700 nm), NIR (700–1100 nm), and SWIR (1100–2500 nm) spectral regions all serve as powerful tools for recognizing soils qualitatively and quantitatively (Ben-Dor et al. 2009). These wavelengths are commonly associated with soil color, which is influenced by a number of soil chromophores, that is, parameters or substances (chemical or physical) that significantly affect the shape and nature of a soil spectrum, such as minerals, organic content, water, and particle size (Ben-Dor et al. 1997). This study found that wavelengths in the visible and shortwave infrared regions are the most important in discriminating natural soils and archaeological sites characterized by midden, non-vitrified dung, and vitrified dung deposits in all the resampled multispectral sensor data, followed by the NIR region, while the resampled bands in the red and red edge are the least important in the classification of the aforementioned soils. Furthermore, the additional bands in the new Very High Resolution (VHR) satellites, such as the yellow band in WorldView-2, are of limited importance (Fig. 4). The blue band is the most important variable for predicting the targeted archaeological classes using hyperspectral data resampled to GeoEye, RapidEye, Sentinel-2, and WorldView-2. However, the green band of the SPOT 5 sensor, which combines wavelengths from the blue and the green sections of the electromagnetic spectrum, is the most important across all bands (Fig. 5). The SWIR bands of the SPOT 5, Landsat 8 OLI, and Sentinel-2 sensors also show very high importance in the discrimination of the different deposits under study.

Despite their high spatial resolution, the results in this study show that new VHR multispectral sensors (GeoEye and WorldView-2) do not have the best bands for detecting these archaeological deposits. These sensors only capture data in the visible and near-infrared regions; they do not have bands in the SWIR that are sensitive to the important soil characteristics discussed above.

Overall, Sentinel-2, SPOT 5, and Landsat 8 OLI sensors detect the highest number of important spectral bands, making them the most suitable in discriminating midden, non-vitrified dung, and vitrified dung deposits from their surrounding natural soils. This is also supported by the high prediction accuracies they attained when using SVM (91.43–92.38%) and RF (77.14–81.90%) classifiers to predict the abovementioned features. Notably, Sentinel-2 achieved the highest classification accuracies in SVM (92.38%) and RF (81.90%) for the archaeological features discussed here because of its high spectral resolution, which captures data across wide portions of the electromagnetic spectrum. This is in line with findings by Cavalli et al. (2007) that bands in SWIR are important in detecting soil characteristics related to archaeological remains. The high classification accuracies achieved by Landsat 8 OLI, Sentinel-2, and SPOT 5 are an important development for archaeological heritage managers and researchers, particularly on the African continent where funding is often limited. In fact, imagery captured by Sentinel-2 and Landsat 8 OLI is free and readily available via World Wide Web portals. As a result, these sensors are potentially cost-effective for survey, documentation, and monitoring of archaeological sites over large areas.

Although the spectral resolutions of Landsat 8 OLI, Sentinel-2, and SPOT 5 are ideal for the detection of archaeological features in the study area, the lower spatial resolutions (2.5–15 m panchromatic; 10–60 m multispectral) might pose a challenge for scaling models up to satellite sensors. Two main issues can be distinguished: the size of single features and their proximity in space. On average, the diameter range of individual byres and middens in the study area is 3–18 m (Huffman pers. comm.). This is smaller than the minimum size of features that can be discriminated by the above satellites, which have a spatial resolution between 20 and 30 m in the SWIR region of the spectrum (Myint et al. 2011; Thabeng et al. 2019). The second issue is linked to the proximity of middens and byres (Calabrese 2000; Huffman 2009b), which have high chances of spectral confusion when captured in a single cell larger than 10 m.
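The scale problem described above can be made concrete with a short calculation of how much of a single pixel a feature can occupy. This is a best-case sketch under simplifying assumptions that are not taken from the study: the feature is treated as a circle lying entirely inside one square pixel, and the function name is hypothetical.

```python
import math


def pixel_fraction_covered(feature_diameter_m, pixel_size_m):
    """Best-case fraction of one square pixel covered by a circular feature.

    Assumes the feature lies entirely within a single pixel; the result
    is capped at 1.0 when the feature is larger than the pixel.
    """
    feature_area = math.pi * (feature_diameter_m / 2) ** 2
    pixel_area = pixel_size_m ** 2
    return min(1.0, feature_area / pixel_area)


# Diameter range reported for byres and middens (3-18 m) against a
# 20 m SWIR pixel and a 10 m pixel.
for diameter in (3, 18):
    for pixel in (20, 10):
        fraction = pixel_fraction_covered(diameter, pixel)
        print(f"{diameter} m feature, {pixel} m pixel: {fraction:.3f}")
```

Under these assumptions, even the largest reported feature fills well under the whole of a 20 m pixel, so its spectral signal would be mixed with that of the surrounding soil.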

As such, this study recommends two avenues for further research using actual satellite imagery. The first approach will be to compare the potential of very high spatial resolution multispectral satellites (WorldView-2 and GeoEye) and lower spatial resolution satellites (Landsat 8 OLI and Sentinel-2) in detecting archaeological sites characterized by surface features. This is because, in general, the spatial resolutions (0.46 m panchromatic; 1.84 m multispectral) of VHR satellite sensors have the potential to capture individual features with diameters as small as 4 m (Thabeng et al. 2019). On the other hand, low spatial resolution satellite sensors have the spectral ability to detect soil characteristics, which show reflectance differences within the visible and SWIR bands as discussed above. The second approach for detecting the middens and non-vitrified and vitrified dung deposits will be to assess the potential of data fusion combining the spectral abilities of low spatial resolution satellite sensors and the spatial abilities of very high spatial resolution satellite sensors.

In general, the results of this study show that RF and SVM classifiers can accurately predict midden, non-vitrified dung, and vitrified dung materials based on their spectral characteristics. These results agree with those of other research using RF and SVM together in a number of spectral mapping applications for vegetation species (Ghosh et al. 2014; Sesnie et al. 2010), vegetation health (Abdel-Rahman et al. 2014), agriculture (Duro et al. 2012), land cover (Adam et al. 2014; Noi and Kappas 2018), soil texture (Bousbih et al. 2019), and archaeology (Thabeng et al. 2019). Comparison of the accuracies achieved by the two classifiers revealed that SVM achieved higher overall classification accuracies than RF in all datasets. For example, the RF classifier achieved 78.10, 80.00, 72.38, 81.90, 77.14, and 77.14% when classifying hyperspectral data resampled to the spectral resolutions of GeoEye, Landsat 8 OLI, RapidEye, Sentinel-2, SPOT 5, and WorldView-2, whereas the SVM classifier achieved overall accuracies of 86.67, 92.38, 82.86, 92.38, 91.43, and 86.67% for the same datasets. This is in line with other studies (Adam et al. 2014; Sesnie et al. 2010) that attained variable classification accuracies between the two classifiers when dealing with similar data samples. However, other researchers (Pelletier et al. 2016; Thabeng et al. 2019) found that, even though there may be variations in their results, RF and SVM classifiers complement each other because of their different classification abilities.
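The per-sensor difference between the two classifiers follows directly from the overall accuracies reported above. The short sketch below only restates those figures and subtracts them; the variable names are illustrative.

```python
# Overall accuracies (%) as reported in the text.
rf_oa = {
    "GeoEye": 78.10,
    "Landsat 8 OLI": 80.00,
    "RapidEye": 72.38,
    "Sentinel-2": 81.90,
    "SPOT 5": 77.14,
    "WorldView-2": 77.14,
}
svm_oa = {
    "GeoEye": 86.67,
    "Landsat 8 OLI": 92.38,
    "RapidEye": 82.86,
    "Sentinel-2": 92.38,
    "SPOT 5": 91.43,
    "WorldView-2": 86.67,
}

# Gain of SVM over RF in percentage points, per sensor.
gains = {sensor: round(svm_oa[sensor] - rf_oa[sensor], 2) for sensor in rf_oa}
largest_gain = max(gains, key=gains.get)
smallest_gain = min(gains, key=gains.get)

for sensor, gain in gains.items():
    print(f"{sensor}: +{gain:.2f}")
print("largest:", largest_gain, "smallest:", smallest_gain)
```

The gain is positive for every sensor, ranging from 8.57 percentage points (GeoEye) to 14.29 percentage points (SPOT 5).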

RF and SVM also varied in their predictions of individual classes. In general, the RF classifier showed considerably more confusion between middens and non-vitrified dung than the SVM classifier (Tables 3 and 5). RF had its lowest producer’s accuracy (52.00%) and user’s accuracy (54.17%) for hyperspectral data resampled to the resolution of the SPOT 5 sensor. The lowest producer’s and user’s accuracies attained by the SVM classifier stood at 68.00% each and were from hyperspectral data resampled to the RapidEye sensor. The confusion between MD and NVD might be a result of chemical similarities between the two (Thabeng et al. 2019). Another reason might be signature confusion caused by post-depositional processes such as erosion, which is rampant in the study area and mixes the two deposits, which are located in close proximity to each other. Sesnie et al. (2010) posit that SVM is a superior method for solving complex classification problems, which would make it a better predictor of archaeological sites in the study area, particularly those characterized by the presence of MD and NVD in close proximity. Both classifiers achieved the highest user’s accuracy (100%) for VD across all the datasets.

In sum, the approach presented is a cost-effective manner of using information on the spectral characteristics of archaeological surface features (acquired through field spectroscopy in limited sample areas) to gain insight on the potential of different sensors and classifiers to predict archaeological sites. As remote sensing sensors all have different spectral and spatial resolutions, the selection of the appropriate data for archaeological applications is challenging. This assessment step helps to maximize the time and effort put into image treatment and classification. Doing so in advance of image acquisition will aid the exploration of vast and understudied areas in Africa and across the globe. This method can be applied anywhere as a cost-effective way of identifying sensors with spectral abilities to predict the archaeological material of interest. If the results are positive, the mapping of archaeological sites could be scaled up to data from operational sensors on satellite platforms. Properly identifying appropriate images and classifiers for the detection of archaeological features leads to accurate and reliable predictive maps. These maps have the potential to capture the archaeological landscapes in a synoptic manner that would be challenging (or even unachievable) by other means (Klehm et al. 2019).

Conclusions

This study investigates the possibility of discriminating archaeological features using hyperspectral data resampled to the spectral resolutions of the most widely used multispectral sensors (GeoEye, Landsat 8 OLI, RapidEye, Sentinel-2, SPOT 5, and WorldView-2). The following findings can be reported:

  • The bands within the visible and SWIR portions of the electromagnetic spectrum are the most important for predicting the natural soils and archaeological sites characterized by middens, vitrified dung, and non-vitrified dung—all of which are in line with the spectral resolutions of Sentinel-2, SPOT 5, and Landsat 8 OLI. This, therefore, makes them the most suitable sensors for detecting archaeological sites. Their lower spatial resolution is nevertheless a limitation and restricts the detection to larger byres and middens.

  • The high classification accuracies achieved in this study demonstrate that multispectral sensors have the ability to detect middens, non-vitrified dung, and vitrified dung. In general, higher classification accuracies were achieved by the SVM than by the RF classifier. The highest classification accuracies were achieved when classifying data resampled to the resolution of the Sentinel-2 sensor using both RF (81.90%) and SVM (92.38%). Landsat 8 OLI achieved the same classification accuracy as Sentinel-2 (92.38%) when using the SVM classifier.

  • The green band and SWIR bands in SPOT 5 satellite sensor were the most important bands in discriminating between midden, vitrified dung, non-vitrified dung, and non-sites. The blue band is the most important discrimination band in the GeoEye, RapidEye, Sentinel-2, and WorldView-2 sensors. Other important bands included the SWIR bands in Landsat 8 OLI and Sentinel-2 and the NIR bands in sensors without SWIR bands.

  • VHR satellite sensors, although characterized by lower spectral resolution, still possess important bands for discriminating archaeological and non-archaeological features; they also achieved high classification accuracies in this study. These sensors offer the potential of overcoming the limitation of higher spectral resolution sensors that can only detect larger archaeological features.

The results of this study have revealed the prospects for discriminating middens, natural soils, vitrified dung, and non-vitrified dung by means of hyperspectral data resampled to the resolutions of multispectral satellite sensors. This, in turn, offers an opportunity to scale this approach up to spaceborne sensors for mapping and monitoring archaeological features. This is an important development for archaeological researchers and heritage managers because it can facilitate satellite image identification and classification for predictive modelling across vast regions. This will reduce the challenges inherent to traditional fieldwalking surveys, including high costs, restricted land access, long surveying periods, wild animals, and active war zones. This is one of the earliest studies to assess the potential of several sensors to detect surface archaeological material. Although hyperspectral data resampled to the spectral resolutions of low spatial resolution sensors (Landsat 8 OLI, Sentinel-2, and SPOT 5) achieved relatively high classification accuracies, this study recommends further research using actual images to assess the potential of very high spatial resolution satellites (GeoEye and WorldView-2) in detecting archaeological sites characterized by surface features. As with other remote sensing applications, the trade-offs among multispectral sensors, in terms of their spectral and spatial resolutions, need careful consideration and assessment within the context of the features being studied.