Introduction

The study of Palaeolithic settlement patterns has garnered substantial scholarly attention in recent times owing to its capacity to yield copious insights into past societies (e.g. Turrero et al. 2013; Burke et al. 2014; 2017; 2021b; García Moreno and Fano Martínez 2014; Ludwig et al. 2018; Wren and Burke 2019). Indeed, there are tools available that enable us to reconstruct ancient landscapes, with the aim of quantifying environmental factors that may have influenced settlement patterns in Palaeolithic hunter-gatherer societies. Among these tools are Geographic Information Systems (GIS) and spatial statistics, invaluable for analysing and quantifying data, thereby enabling us to draw insightful conclusions regarding the priorities of these societies when they inhabited particular locations (Bevan et al. 2013). The use of GIS and various statistical tools has provided a robust framework for analysing the settlement patterns. These methodologies, combined with a comprehensive set of biotic and abiotic variables, offer significant insights into the prehistoric occupation of the region. The potential of these tools and the analytical results obtained in this study highlight the importance of applying such methods in other study areas and across different chronological periods. This approach can yield valuable comparative data, enhancing our understanding of settlement dynamics in various contexts. Nevertheless, investigations of this nature are conspicuously scarce when considering the Northwestern region of the Iberian Peninsula, save for a limited number of preliminary inquiries (de Lombera Hermida et al. 2015; Díaz Rodríguez 2017; Díaz Rodríguez and Carrero Pazos 2019; Díaz-Rodríguez et al. 2021, 2023; Díaz-Rodríguez and Fábregas-Valcarce 2022).

In recent decades, a series of investigations have revealed previously undetected archaeological sites in the Northwestern region of the Iberian Peninsula, particularly from the latter stages of the Palaeolithic period and within specific areas such as the Central Mountain ranges (Criado Boado et al. 1988, 1989a, 1991a, b) and the Northern Mountain ranges (Ramil Soneira and Vázquez Varela 1983; Llana Rodríguez et al. 1992; López Cordeiro 2003). A theoretical model of settlement patterns has been formulated from the data acquired at various sites within these regions (Cerqueiro Landín 1989; Criado Boado and Cerqueiro Landín 1991). However, this model has yet to be validated, as contemporary analytical tools have not been used to test it.

This article explores various environmental variables that could have shaped the settlement patterns of Upper Pleistocene sites within two distinct regions of the Northwestern Iberian Peninsula. A total of 50 locational variables have been analysed to ascertain whether distinct settlement patterns emerge in each region or whether a single overarching pattern prevails. It is plausible that specific variables, as proposed within the theoretical framework, hold significance in site selection. The identification of these variables offers valuable insights into the intricacies of hunter-gatherer interactions with their environment. The variables have been modelled using GIS, and the values obtained for the sites within each region have been statistically compared against values generated under randomized conditions, with the primary aim of elucidating whether the choice of site locations may be influenced by, or bear any relationship to, these variables. The complete methodological procedure employed in this study, from the creation of the variables to their statistical analysis, is fully replicable, in accordance with the principles of reproducible research (Marwick 2017; Marwick et al. 2017; Karoune and Plomp 2022). The code and data required for this reproducibility are available in the Data Availability section.

Regional setting

This study focuses on two distinct regions situated in the Northwest of the Iberian Peninsula (Fig. 1a): the Northern Mountain ranges and the Central Mountain ranges (Fig. 1b). The Northern Mountain ranges form a natural corridor that separates the northeastern coastal sector of Galicia from the inland Terra Chá basin, extending southwards into the current province of Lugo (Fig. 1c and Fig. 19 in the Supplementary Material). This area was studied intensively from the mid-1970s to the 1990s, resulting in the discovery of over 50 archaeological sites, primarily from the Upper Palaeolithic and Epipalaeolithic periods (Ramil Rego and Ramil Soneira 1996).

The second region encompasses the Galician Central Mountain Ranges (Fig. 1d and Fig. 20 in the Supplementary Material), with a notable concentration of archaeological sites in the O Bocelo mountain range and along the Furelos River. A research project conducted in the 1980s in this specific area aimed to explore its occupation history from the Palaeolithic era to medieval times, employing Landscape Archaeology methods. As a result, more than 80 archaeological sites have been identified in this region, primarily attributed to the Upper Palaeolithic and Epipalaeolithic periods (Cerqueiro Landín 1989; Criado Boado and Cerqueiro Landín 1991).

Fig. 1 a) General location of the Galicia region (red polygon). b) Northern Mountain ranges (area in green) and Central Mountain ranges (area in blue), each with Upper Palaeolithic and Epipalaeolithic sites (red dots). c) Detail of the Northern Mountain ranges area with sites (red dots). d) Detail of the Central Mountain ranges area with sites (red dots)

Northern Mountain ranges

The study of the Upper Palaeolithic and Epipalaeolithic periods in Galicia finds its roots in the exploration of the Northern Mountain ranges. These periods began to be systematically investigated during the 1970s and remained a focal point of research throughout the 1980s. These investigations introduced updated archaeological methodologies, systematic action plans and interdisciplinary collaboration aimed at addressing specific objectives and challenges. The efforts were primarily led by J. Ramil Soneira, a physician and passionate advocate for archaeological heritage, especially that of the Palaeolithic period. He collaborated with researchers associated with the University of Santiago de Compostela (USC) to initiate surveying activities in various municipalities of Lugo, an area that had hitherto been overlooked by scientific inquiry (Senín Fernández 1996, p. 34).

This early groundwork led to the formation of a research team in the mid-1980s. Their investigations extended across contemporary municipalities such as Muras, Vilalba, Xermade, the Serra do Xistral and the Arnela River Valley. The team devised comprehensive survey programs and archaeological excavation campaigns, supplemented by soil and palaeobotanical analyses (Llana Rodríguez et al. 1992; Martínez Cortizas and Moares Domínguez 1995; Ramil Rego and Fernández Rodríguez 1996).

These investigations led to the discovery of the first sites, such as Pena Grande and Prado do Inferno, in the 1970s (Alonso del Real and Vázquez Varela 1976). The subsequent surge in archaeological activities revealed more sites, including Férvedes, Xestido 3 and the A Veiga and Piñeiro Flint Workshops, all unearthed in the early 1980s (Ramil Soneira and Vázquez Varela 1976). The initial phase of work in the region culminated with the excavation of the Férvedes II shelter (Ramil Soneira and Vázquez Varela 1983), where a stone pendant from the early Magdalenian period was uncovered. The pendant's decorative elements find parallels in contemporaneous finds from the Cantabrian Region, such as those from La Paloma, Altamira, El Castillo and Balmori (Ramil Soneira and Vázquez Varela 1983; Villar Quinteiro 1997).

The mid-1980s marked the onset of a second phase of investigation, focused on reconstructing the paleoenvironment and characterizing human settlement in the Northern Mountain ranges. This phase involved an extensive survey and excavation initiative, resulting in the identification of over twenty archaeological sites (Ramil Rego 2014).

Research activities in the area were interrupted in 1994 (Ramil Rego and Ramil Soneira 1996), but investigations continued through large-scale public and private initiatives (López Cordeiro 2003). In the late 1990s, the implementation of the Galician Strategic Wind Plan posed a threat to the integrity of the O Xistral Mountain range. Several companies commissioned projects to evaluate and mitigate the archaeological impact of wind farms, entrusting these tasks to GIARPA (Research Group within the Laboratory of Archaeology and Cultural Forms at USC). Unlike previous endeavours, this project sought not only to understand the Galician Palaeolithic in the area but also to minimize the impact of the wind farms on its archaeological record.

The intervention strategy followed an action programme comprising two plans: an applied research plan and a basic research plan (López Cordeiro 2015).

In tandem with these archaeological investigations, typological studies of lithic industries recovered from specific sites in the region were conducted (Villar Quinteiro 1996, 1997, 2008). Valuable insights from these studies and findings unearthed during the 1970s to 1990s have recently come to light through the defence of two doctoral theses (Ramil Rego 2014; López Cordeiro 2015).

Central Mountain ranges

At the end of the 1980s, a research project was conducted in the O Bocelo mountain range and the Furelos River valley, an area chosen for its favourable conditions for this work. Directed by the USC under the leadership of F. Criado Boado, this project embraced the principles of Landscape Archaeology. It pursued several objectives: to illuminate poorly understood periods of Galician prehistory; to define settlement characteristics from the final Upper Palaeolithic to the Middle Ages; to ascertain the impact of the natural environment on human communities; to reconstruct ecological changes in the natural environment throughout the Holocene; to conduct a diachronic, historical and cultural analysis of the Galician landscape over the last 10,000 years; to identify key periods in the transformation of the Galician landscape; to test a theoretical-interpretative model for spatial analysis and human positioning within it; to perform an evaluative analysis of prehistoric periods in an area in order to assess the percentage and type of remains not located during extensive works; and to study systematically sites and cultural periods that were previously unknown, so as to define their conditions and facilitate their discovery in future works.

The project's work plan spanned five years. The first two years (1987 and 1988) focused on fieldwork: intensive surveying during the initial phase (October 1987 to March 1988) and open-area archaeological excavations during the second phase (July and August 1988). These activities targeted megalithic and Iron Age sites and included small test pits at sites from the Palaeolithic/Epipalaeolithic, Chalcolithic and medieval periods. The materials and data collected during these excavations were then processed (November 1988 to June 1989). The third phase (July and August 1989) extended the previous campaign, continuing with open-area excavation, test pits and intensive surveying. Physical–chemical surveying methods were also incorporated to address the issue of archaeological sites without visible structures (Criado Boado et al. 1991a, b).

Since the project adopted a Landscape Archaeology perspective, the survey aimed to identify archaeological sites and included an environmental survey to gather geographical and ecological data complementing the archaeological information. Information from environmental documentation and physical–chemical samples of archaeological documentation were separated into distinct files, each specific to one of the two groups. During the survey, the concept of primary sites (referring to locations where remains occupy their original position, are identifiable and correspond to a specific site) and secondary sites (materials displaced from their primary position) was abandoned in favour of archaeological points (PA). PA referred to all locations in the workspace where archaeological material appeared, irrespective of whether it was found in situ or not. PAs had their dedicated files, but there was also a file for documenting environmental conditions (CA). In addition to the PA concept, the term 'dispersion area' (DISP) was introduced to describe specific sectors within a PA where concentrations of archaeological material or specific structures were abundant. Locating a PA inherently implied the existence of at least one DISP, even if more material concentrations were subsequently identified within that PA and were numbered and incorporated accordingly (Criado Boado et al. 1991a, b).

Specifically concerning the Palaeolithic period, the project aimed to define settlement patterns from the Late Glacial to the early Holocene (Criado Boado et al. 1988, 1989a, b; Cerqueiro Landín 1989; Criado Boado 1991; Criado Boado and Cerqueiro Landín 1991). To achieve this, a set of factors and criteria that could be controlled to study the landscape of these hunter-gatherer societies were established. These factors included absolute and relative altitude, proximity to wetland areas, visibility and proximity between sites, among others (Cerqueiro Landín 1989, pp. 50–57). For the first time in the Northwest of Iberia, variables that could influence the location of Palaeolithic sites were defined.

Upon completion of the project, it was concluded that a complex system of territorial organization revolved around wet and depressed areas, with sites clustering around them (Cerqueiro Landín 1989, pp. 48, 85; Criado Boado et al. 1991a, b, p. 99). Three settlement patterns were proposed: 1) Sites situated in the lowest parts of the area, very close to wetland areas, with limited access. These sites had a visual domain over the wetlands that did not extend to more distant sites in the vicinity. 2) Sites located at higher absolute and relative altitudes, farther from wetland areas, which afforded them a broad visual domain over a vast territory and distant sites, although not over those in the immediate vicinity. 3) A smaller group of sites associated with streams that flowed into the valley. These sites had no direct relationship with wetland areas, given their distance and their topographic location on the opposite side of the dividing lines that marked the depressed areas.

Material and methods

Data acquisition and software

The archaeological sites used in this study were sourced from the General Directorate of Cultural Heritage of the Xunta de Galicia (DXPCXG). This public entity maintains a comprehensive catalogue encompassing all the archaeological sites within the Galician autonomous community. Additionally, some site information was gleaned from publications arising from research projects in the two study regions. It is important to note that the data employed for this study originate from diverse sources and lack uniformity. Consequently, the information found within the DXPCXG catalogue and the research projects may differ according to their initial objectives. The DXPCXG catalogue primarily aims to preserve and protect archaeological evidence, while research projects focus on studying, understanding and disseminating these data.

This diversity in data sources has led to the identification of location inaccuracies in some coordinates within the DXPCXG catalogue. These inaccuracies result from the passage of information through various hands and the absence of clear criteria during data collection. Something as fundamental as establishing a universal coordinate system for all researchers to report their findings was not initially considered. This omission has necessitated additional work to reconcile the various coordinate systems used. As a prudent approach, it was decided to work only with sites that had precise location data, since incorporating archaeological sites with coordinates varying by hundreds of metres could compromise the study's integrity. Therefore, only sites that are both present in the DXPCXG catalogue and referenced in at least one publication derived from research and emergency projects in the two study areas were selected.

While resolving the coordinate issue was relatively straightforward, the reduction in the number of usable sites posed a significant challenge. Additionally, various other challenges should be considered, although some cannot be controlled. Each of the projects mentioned had distinct origins and objectives, which influenced data collection methods. These projects were conducted in different decades, when information and technology varied significantly. The project in the Northern Mountain ranges, initiated as a personal endeavour and later joined by various entities, therefore differs substantially from the project in the Central Mountain ranges. The latter had specific objectives, a defined methodology and a predetermined duration. Moreover, preventive archaeology, driven by public works in Galicia, adapted well in the Central Mountain ranges, evolving into a multidisciplinary Landscape Archaeology project. Nevertheless, the primary focus in this region was not the Palaeolithic period, which was viewed as one element contributing to the understanding of landscape transformations by ancient societies. This diachronic perspective may have limited the information gathered, given the project's extensive chronological framework. It is logical to assume that in such a large-scale project, some information had to be sacrificed for the greater good. In this case, the methodology used for identification, while suitable for other periods, might not have been ideal for the Palaeolithic. Researchers later recognized this limitation when adapting the methodology to their subsequent work on Palaeolithic chronologies within professional archaeology (López Cordeiro 2015).

Despite these differences, there were some similarities in data collection methodologies, particularly regarding surveying, which became essential for locating remains of hunter-gatherer communities. Unlike vestiges from other eras, these sites cannot be identified through other means. However, archaeological surveying posed challenges due to inaccessible areas and the tendency of lithic tools, the primary remains from this period, to be buried. Their appearance is often linked to areas where land was manipulated for agricultural purposes, further complicating their discovery. In any case, archaeological surveys in both areas began with a preliminary identification of potential sites based on the locations of shelters, which may have led to the settlement of prehistoric communities. Consequently, the presence of biases in data collection is acknowledged and should be taken into account. Nevertheless, it is assumed that the same bias exists in both areas since the methodology for locating sites was similar, centred on archaeological surveys. Therefore, this bias should not significantly affect the comparison of results between the two areas.

The identification of Palaeolithic sites is associated with the recognition of scattered lithic industry remains across the landscape. This presents challenges in determining the functional category of these sites because establishing the full extent of the archaeological sites is often difficult. Material dispersion could result from sedimentological or edaphic effects, extensive occupation of space with unknown boundaries, or repeated occupation. While the position and orientation of the pieces provide insights into their origin, their extent remains uncertain unless extensive excavation is conducted. To address this issue, the project in the Central Mountain ranges established the categories of PA (archaeological point) and DISP (dispersion of materials). However, this criterion was not applied to the other study area. Consequently, the diverse origins of data from both projects could influence the analysis of settlement patterns. For instance, one PA may contain multiple DISPs. To mitigate this, an attempt has been made to unify the archaeological points in both areas, maintaining the site category. In cases where a PA contains several DISPs, each is treated as a distinct site, unless they lie so close together that differentiation becomes impractical.

Another challenge arises from the fact that some sites discovered during archaeological surveys have been categorized as archaeological sites or PAs with varying quantities of pieces, ranging from fewer than five to over 100. In the Central Mountain ranges, the average number of lithic pieces found at each site is approximately 10. Excavated sites, however, exhibited significantly larger quantities of lithic remains: PA 154.1 with 525 pieces, PA 149.1 with 78 remains, PA 74.1 with 350 lithic pieces and PA 69.1 with 200 remains. While exact data on piece counts for sites in the Northern Mountain ranges are unavailable, the overall pattern in both areas is similar. Non-excavated sites typically contain a handful to a few dozen pieces, while excavated sites feature more than a hundred pieces. It is important to note that an archaeological site with only three lithic tools does not hold the same significance as one with hundreds of artifacts. However, this determination can only be confirmed through archaeological intervention. Nevertheless, the mere presence of more lithic remains in one site compared to another may suggest different functionalities. While there was some uniformity in conducting archaeological surveys at sites with higher concentrations of pieces and suitable topography for archaeological level preservation, it should be noted that the same treatment was applied to the archaeological points during analysis. Without this approach, there would be very few archaeological sites to work with. In essence, establishing a threshold, such as more than one hundred pieces, would exclude most sites from the analysis, rendering it unfeasible. While this challenge lacks a feasible solution, it remains important to consider when analysing the data as a whole and within each study area.

In the Northern Mountain ranges, the study includes 34 archaeological sites, categorized into two types: sheltered sites (N = 23) and open-air sites (N = 11). These sites span various chronocultural phases, including Epipalaeolithic (N = 26), Lower-Middle Magdalenian (N = 5), Azilian (N = 2) and Magdalenian (N = 1) (see Table 1 in the Supplementary Material). In the Central Mountain ranges, 61 archaeological sites have been included, similarly divided into sheltered sites (N = 31) and open-air sites (N = 30). However, specific chronocultural information for these sites is lacking. According to the literature, these archaeological sites are broadly categorized under the Final Upper Palaeolithic/Epipalaeolithic period (refer to Table 2 in the Supplementary Material).

The chronological attributions of each site have been based on previous works and research projects conducted in both study areas. The concerns regarding the detailed characterization of the materials from these sites are acknowledged. However, comprehensive studies of the recovered materials do not exist, and it was not feasible to conduct a thorough review of these materials for this work. Furthermore, the primary aim of this study was not to reassess the material culture but to analyse settlement patterns. The inherent complexity and partial characterization of the archaeological record are acknowledged, and future research should aim to address these gaps by conducting detailed material analyses.

Spatial data analysis was conducted using several Geographic Information Systems (GIS) applications. The chosen coordinate system was EPSG:25829 (ETRS89 / UTM zone 29N). GRASS GIS was used in versions 6.4.3, 7.0.2 and 7.0.4 (Grass Development Team 2020), together with Quantum GIS (versions 2.8.1 and 2.10.1) (QGIS.org 2021) and SAGA GIS (version 2.2.1) (Conrad et al. 2015). The last GIS package employed was ArcGIS 10.3 (USC license) (Esri 2011), the only one not distributed under the GNU GPL license. Finally, the different analytical approaches were carried out in R version 4.0.5 (R Core Team 2021), through the RStudio graphical interface, with the requisite packages for the various analyses (Table 1).

Table 1 Synthesis of R packages used, authors and application details

The Digital Elevation Model (DEM) has been used as a base map to elaborate the different locational variables analysed in this paper. This DEM was obtained from the National Centre for Geographic Information (CNIG) and has a resolution of 25 m. It is derived from the photogrammetric and LiDAR flights of the National Plan for Aerial Orthophotography (PNOA) (http://centrodedescargas.cnig.es/CentroDescargas/index.jsp). Additionally, the Geologic Map was retrieved from the Spanish Government's online repository (López Olmedo et al. 2022).

Spatial distribution of sites

The initial step in the spatial analysis of archaeological sites in both areas was to assess whether Complete Spatial Randomness (CSR) was present. To this end, a random sample of points, equal in number to the archaeological dataset, was generated for comparison. The distributions of both datasets were then examined to determine whether they could be considered to originate from the same population, which would signify the absence of significant differences between them and thereby preclude the rejection of CSR. For this analysis, the UTM X variable, representing the x-coordinate of each point, was used: normality was assessed with the Shapiro–Wilk test, and the Kolmogorov–Smirnov (K-S) test was used to assess whether both datasets belonged to the same population. Furthermore, Ripley's K function and its L and G variants were employed (Bivand et al. 2013). Both homogeneous and inhomogeneous K, L and G functions were computed (Baddeley et al. 2015), with a confidence envelope derived from Monte Carlo simulations (n = 99).
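
The logic of comparing an observed pattern against simulated CSR patterns can be illustrated with a minimal sketch. It is written in Python with NumPy for illustration only (the study itself used R), it applies no edge correction, and the window size, radii, point coordinates and function names are all hypothetical assumptions rather than values or code from this work.

```python
import numpy as np

def ripley_k(points, radii, area):
    """Naive Ripley's K estimate (no edge correction) for an (n, 2) array."""
    n = len(points)
    # Pairwise Euclidean distances between all points.
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # Exclude each point's zero distance to itself.
    np.fill_diagonal(dist, np.inf)
    # Number of ordered pairs closer than each radius.
    counts = np.array([(dist <= r).sum() for r in radii])
    return area * counts / (n * (n - 1))

def csr_envelope(n, radii, width, height, n_sim=99, seed=0):
    """Pointwise min/max of K over n_sim uniform random (CSR) patterns."""
    rng = np.random.default_rng(seed)
    sims = np.empty((n_sim, len(radii)))
    for i in range(n_sim):
        pts = rng.uniform([0.0, 0.0], [width, height], size=(n, 2))
        sims[i] = ripley_k(pts, radii, width * height)
    return sims.min(axis=0), sims.max(axis=0)

# Hypothetical 10 x 10 km window and 34 synthetic, clustered "sites".
width = height = 10_000.0
radii = np.array([250.0, 500.0, 1000.0])
rng = np.random.default_rng(42)
centres = rng.uniform(2000, 8000, size=(4, 2))
sites = centres[rng.integers(0, 4, size=34)] + rng.normal(0, 150, size=(34, 2))

k_obs = ripley_k(sites, radii, width * height)
lo, hi = csr_envelope(len(sites), radii, width, height)
l_obs = np.sqrt(k_obs / np.pi) - radii  # centred L function
clustered = k_obs > hi                  # above the envelope: clustering
```

Observed values above the upper envelope suggest clustering at that distance, and values below the lower envelope suggest regularity. Dedicated point-pattern software typically adds edge corrections and inhomogeneous variants that this sketch omits.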

Definition of covariates

In preparation for the statistical analysis, a set of covariates was chosen, building upon insights from prior research conducted in the study areas and analogous regions. A total of 50 variables were selected. These variables can be categorized into three primary classes of influencing factors: abiotic, biotic and other determinant factors. These, in turn, have been further grouped into overarching variables such as altitude, slope, hydrology, or geology, among others. The covariates used in this study are described below. A more comprehensive explanation of the process employed to obtain each covariate is available in the Supplementary Material (SM) file.

Altitude

Altitude refers to the elevation calculated with reference to a specific datum, often sea level for absolute altitude or, alternatively, the base of a valley for relative altitude. In the context of defining the occupation patterns of Galician Palaeolithic sites, altitude has been one of the key factors considered. The prevailing notion suggests that archaeological sites tend to be situated at higher elevations in the landscape (Ramil Rego 1989/1990, p. 194; Fábregas Valcarce et al. 2010, p. 267; de Lombera Hermida et al. 2015, p. 285). Absolute altitude (ALTA) represents a general variable, from which other derived variables such as ALTm (Table 2) have been computed. ALTm corresponds to the average elevation of the four adjacent cells surrounding each site. This variable is particularly valuable for gaining an overview of the site's environmental context, especially in the case of open-air sites where materials may not necessarily be in situ.
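
As an illustration of the difference between ALTA and ALTm, the following minimal Python sketch reads the elevation of a site's cell and averages its four adjacent cells. It assumes that "adjacent" means the north, south, east and west neighbours; the small grid and the function name are hypothetical.

```python
import numpy as np

def four_neighbour_mean(grid, row, col):
    """Mean of the N, S, W and E neighbours; cells outside the grid are skipped."""
    values = []
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + dr, col + dc
        if 0 <= r < grid.shape[0] and 0 <= c < grid.shape[1]:
            values.append(grid[r, c])
    return float(np.mean(values))

# Hypothetical 3 x 3 excerpt of a DEM (metres); the site lies in the centre cell.
dem = np.array([[500.0, 510.0, 520.0],
                [505.0, 530.0, 525.0],
                [515.0, 540.0, 535.0]])

alta = float(dem[1, 1])                # absolute altitude of the site's cell
altm = four_neighbour_mean(dem, 1, 1)  # mean of 510, 540, 505 and 525
```

The same neighbourhood average could in principle be applied to any raster covariate, which is why a generic helper is used here rather than an altitude-specific one.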

In addition to absolute altitude, relative altitude is considered. First, topographic prominence is calculated, defined in a GIS environment by M. Llobera (2001) as “the function of the differential height between an individual and the environment as it is perceived from the point of view of the individual in question.” This study builds upon prior research that underscores the significance of topographic prominence as a locational variable, particularly in contexts such as Early Prehistory, Protohistory and the Iron Age (Carrero Pazos 2017; Cazorla Martín et al. 2008; Cerrillo Cuenca 2011; De Reu 2012; De Reu et al. 2013, 2011; Fábrega Álvarez 2004; Parcero Oubiña and Fábrega Álvarez 2006; among others).

In the specialized literature, it has been suggested that Palaeolithic sites acted as landmarks in the landscape: reference points that stood out from the surrounding terrain and were visible from a certain distance (Fábregas Valcarce and de Lombera Hermida 2010). It has also been suggested that there was intervisibility between these sites (López Cordeiro 2002, 2004a, 2015). However, as the contemporaneity of these archaeological sites cannot be confirmed, intervisibility has not been analysed in the present study.

Table 2 Covariates related to altitude: conditioning type, variables, acronym, description and ID number

To model topographic prominence, the Topographic Position Index (TPI) computed in SAGA GIS was employed. Following the recommendations of Nakoinz and Knitter (2016) and based on prior work in the Northern Mountain ranges (Díaz Rodríguez and Carrero Pazos 2019), TPI was calculated at three different radii, 100, 500 and 1000 m (TPI100, TPI500 and TPI1000). Additionally, TPI was computed as the average value of the four cells adjacent to each site at the same radii (TPI100m, TPI500m and TPI1000m).
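
The idea behind TPI at several radii can be sketched as the elevation of a cell minus the mean elevation of its circular neighbourhood. The Python example below is a simplified illustration under that assumption; the SAGA GIS implementation may differ (for example in weighting), and the synthetic hill and the function name are hypothetical.

```python
import numpy as np

def tpi(dem, row, col, radius_m, cell_size=25.0):
    """Cell elevation minus the mean elevation within radius_m (centre excluded)."""
    rows, cols = np.ogrid[:dem.shape[0], :dem.shape[1]]
    # Distance in metres from the target cell to every cell of the grid.
    dist = np.hypot(rows - row, cols - col) * cell_size
    neighbourhood = (dist <= radius_m) & (dist > 0)
    return float(dem[row, col] - dem[neighbourhood].mean())

# Hypothetical conical hill on an 81 x 81 grid of 25 m cells, summit at (40, 40).
size = 81
yy, xx = np.mgrid[:size, :size]
hill = 600.0 - 0.1 * np.hypot(yy - 40, xx - 40) * 25.0

# TPI of the summit at the three radii used in the text (metres).
summit = {r: tpi(hill, 40, 40, r) for r in (100, 500, 1000)}
flat = tpi(np.full((size, size), 300.0), 40, 40, 500)
```

Positive values indicate a cell standing above its surroundings, negative values a cell below them, and values near zero flat terrain or a constant slope. Using several radii separates local prominence from prominence at the scale of the wider landscape.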

Finally, the relative altitude has been considered. This variable is deemed crucial for analysing the strategic potential of Palaeolithic sites, as it quantifies their position and shows whether a site occupies the highest terrain within a designated radius. While some studies have calculated the relative altitude index using a fixed radius of 1000 m from the archaeological site (Marcos Sáiz 2006, pp. 49–50), this paper adapts the analysis by setting a limit determined by travel cost distance rather than Euclidean distance. For this purpose, a 20-minute isochrone was used, which corresponds to a radius of 1000 m. Two complementary indices were computed: the maximum relative altitude index (ALTrA) and the minimum relative altitude index (ALTrB). The first indicates whether the site is located in the highest part of the defined environment. Its values range from 0 to 1, with 0 being the lowest area and 1 the highest; the closer the value is to 1, the more the site stands out in that territory. Combining the two variables can indicate whether the site lies in a very homogeneous area. The same archaeological site may have the highest index in both variables; although this may seem contradictory, it would indicate that, within the defined area, the site lies on completely flat terrain, where its absolute height is very similar to the maximum and minimum relative height. This could reveal whether there are significant differences or not.
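
The exact formulas for ALTrA and ALTrB are given in the SM file and are not reproduced here. Purely as an illustration of how two such indices can both approach 1 on flat terrain, the Python sketch below assumes a ratio form (site altitude over the maximum altitude, and minimum altitude over site altitude, within the catchment). These formulas, the boolean catchment mask standing in for the isochrone, and the function name are assumptions, not the definitions used in this study.

```python
import numpy as np

def relative_altitude_indices(dem, catchment, row, col):
    """Hypothetical ratio indices within a boolean catchment mask."""
    site = dem[row, col]
    zone = dem[catchment]
    alt_ra = site / zone.max()  # 1.0 when the site is the highest cell
    alt_rb = zone.min() / site  # 1.0 when the site is the lowest cell
    return float(alt_ra), float(alt_rb)

# Hypothetical 3 x 3 DEM excerpt (metres); every cell is inside the catchment.
dem = np.array([[500.0, 510.0, 520.0],
                [505.0, 530.0, 525.0],
                [515.0, 540.0, 535.0]])
catchment = np.ones(dem.shape, dtype=bool)

alt_ra, alt_rb = relative_altitude_indices(dem, catchment, 1, 1)

# On perfectly flat terrain both indices equal 1.
flat = np.full((3, 3), 400.0)
flat_ra, flat_rb = relative_altitude_indices(flat, catchment, 1, 1)
```

Under this assumed form, a site with both indices close to 1 sits in terrain with almost no relief inside its catchment, which matches the apparently contradictory case described above.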

Slope

Slope represents the maximum degree of elevation change at a specific location and is derived from the DEM. This covariate has been a subject of consideration in prior studies of the Galician Palaeolithic (de Lombera Hermida et al. 2015, p. 280) as well as in other regions of the Iberian Peninsula, such as the Cantabrian (García Moreno 2010), Asturian (Fernández Fernández 2010) and the Sierra de Atapuerca (Marcos Sáiz 2006).

Similar to altitude, slope serves as a general variable that can be employed to calculate various derived variables (refer to Table 3). In this study, the absolute slope (SLO) and the average slope of the terrain within the four cells adjacent to each site (SLOm) have been included. Calculating slope in neighbouring cells can be closely associated with site accessibility.
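The distinction between a cell value (such as SLO) and its neighbourhood average (such as SLOm) recurs for most covariates in this study. The sketch below shows one way to obtain both from a raster held as a list of rows. It assumes that "the four adjacent cells" are the cells sharing an edge with the site cell (north, south, east and west), which the text does not state explicitly; the function name and the toy slope values are hypothetical.

```python
def mean_of_adjacent_cells(raster, row, col):
    """Mean of the edge-sharing neighbours (N, S, W, E), skipping cells off the grid."""
    offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    values = [
        raster[row + dr][col + dc]
        for dr, dc in offsets
        if 0 <= row + dr < len(raster) and 0 <= col + dc < len(raster[0])
    ]
    return sum(values) / len(values)

# Hypothetical slope raster (degrees).
slope = [
    [2.0, 4.0, 6.0],
    [8.0, 10.0, 12.0],
    [14.0, 16.0, 18.0],
]

slo = slope[1][1]                              # value in the site's own cell
slo_m = mean_of_adjacent_cells(slope, 1, 1)    # mean of the four adjacent cells
slo_m_corner = mean_of_adjacent_cells(slope, 0, 0)  # only two neighbours exist here
```

The same helper would apply unchanged to the other averaged covariates (for example TPI100m, ASPm or HYDROEm), provided the same neighbourhood definition is intended.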

Table 3 Covariates related to slope: conditioning type, variables, acronym, description and ID number

Moreover, a set of slope indices has been employed concerning the site's surroundings (refer to Table 3). These indices, originally defined by F. J. Marcos Sáiz (2006) and adapted to this study, include the Geomorphological slope area index (SLOga), the Theoretical slope index (SLOt), the Steepest true slope index (SLOst) and the Plateau index (SLOpi). Finally, another relevant variable linked to slope is accessibility. Accessibility, in this context, refers to the conditions provided by a surface for movement from a specific point, taking into account both distance and surface characteristics (Fábrega Álvarez 2004, p. 16). It has been considered to evaluate the proximity to resources and the defensive potential of archaeological sites (for further details on these variables, please refer to the SM file).

Aspect

Aspect has been recognized as a significant covariate for determining the location of archaeological sites (de Lombera Hermida et al. 2015, p. 289). In the Northern Mountain ranges, it has been observed that the majority of sites are oriented toward the second and third quadrants (Ramil Rego and Ramil Soneira 1996). Aspect data has been derived from the DEM and the aspect in the cell where each site is situated (ASP) has been considered, as well as the average aspect of the four cells surrounding each site (ASPm) (refer to Table 4).

Table 4 Covariates related to aspect: conditioning type, variables, acronym, description and ID number

Hydrology

Hydrology is another critical variable in this study. Previous research has suggested a strong association between Palaeolithic sites and watercourses (Ramil Rego 1989/1990, p. 193; Villar Quinteiro 1996; Fábregas Valcarce and de Lombera Hermida, 2010, p. 267). Additionally, a connection between archaeological sites and wetland areas, which could have served as attracting locations for hunting resources, has also been considered (Criado Boado et al. 1991a, b; de Lombera Hermida et al., 2015; López Cordeiro, 2002, p. 72).

However, the current hydrology map shows evidence of human-induced alterations. Therefore, this paper proposes to analyse a potential hydrographic network derived from theoretical points that would topographically be more likely to have served as water accumulation sites. This methodology has been used and explained in other works (García García 2015; Díaz-Rodríguez et al. 2023). The process of obtaining this potential hydrology map is detailed further in the SM file.

From the potential hydrology, various variables have been derived for analysis (refer to Table 5). Site proximity to potential hydrology has been measured from all points within each study area to the nearest watercourse, both in terms of distance (HYDROE) and displacement cost time (HYDROC). Although straight-line distance is an idealized measure, it has been utilized in the literature on settlement patterns. Therefore, it is included to determine its effectiveness as a variable. Proximity to hydrology has also been considered by examining the four cells surrounding each site in a straight line (HYDROEm) and displacement cost time (HYDROCm).

Table 5 Covariates related to hydrology: ID number, conditioning type, variables, acronym and description

As mentioned earlier, previous studies have considered proximity to wetland areas and visual control over these areas (Criado Boado et al. 1991a, b; López Cordeiro 2002, p. 72; de Lombera Hermida et al. 2015; López Cordeiro 2015). Wetland areas can be defined as regions with accumulated water points. The SAGA GIS software was used to model this variable, particularly the Topographic Wetness Index (TWI), which indicates the topographic humidity index in each map cell. After obtaining the map and identifying areas with higher humidity values (considered most interesting), quartiles were calculated. This retained map cells with values above the third quartile. It is important to note that TWI values are substantially elevated in cells where rivers coincide, but these cells are not of interest. Thus, the hydrological map, created earlier, was subtracted from the TWI polygon map to retain higher humidity values in areas not intersected by rivers. Subsequently, displacement cost time was calculated from every point in the study area to the nearby wetland areas identified as points (WET). The cost of displacement to nearby wetland areas was also calculated by considering the average of the four neighbouring cells for each site (WETm).
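The selection of wetland cells described above can be sketched as a simple mask: keep the cells whose TWI exceeds the third quartile and discard those that coincide with the river network. This is a minimal illustration rather than the workflow used in the study; the function name, the toy rasters and the choice of quantile method are assumptions.

```python
import statistics

def wetland_mask(twi, river):
    """True for cells with TWI above the third quartile that are not river cells."""
    flat = [value for row in twi for value in row]
    q3 = statistics.quantiles(flat, n=4, method="inclusive")[2]  # third quartile
    return [
        [twi[r][c] > q3 and not river[r][c] for c in range(len(twi[0]))]
        for r in range(len(twi))
    ]

# Hypothetical TWI raster; the wettest cell lies on a river.
twi = [
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
river = [
    [False, False, False, False],
    [False, False, False, False],
    [False, False, False, True],
]

mask = wetland_mask(twi, river)
wetland_cells = sum(cell for row in mask for cell in row)
```

The resulting cells would then serve as the destinations for the travel cost calculation (WET and WETm).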

Lastly, a variable based on the visibility of wetland areas from each site (WETv) was included. This variable was obtained by identifying cells coinciding with wetland areas visible from each archaeological site. The visible surface area for each site was calculated in hectares (ha).

Geology

Another variable considered in the present study is potential geology. Access to raw materials such as quartzite or quartz was important for Palaeolithic hunter-gatherers, since these are the raw materials used by these societies in this region of NW Iberia (Llana Rodríguez 1990; Villar Quinteiro 1997). These resources could be obtained from the fluvial courses or collected from the veins where the material was found. Geology has been identified as an important variable in previous studies (Ramil Rego 1989/1990; Villar Quinteiro 1996; López Cordeiro, 2002; 2015; Fábregas Valcarce and de Lombera Hermida, 2010; de Lombera Hermida et al., 2012). The geology map has been obtained from the Geological and Mining Institute of Spain (IGME). On this map, those areas that could contain raw materials of interest to Palaeolithic hunter-gatherers have been identified. The cells of interest have been selected and divided into points at established radii. Subsequently, the cost of moving, in time (GEOLC) and distance (GEOLE), from the rest of the cells in the study area to the closest potential geology points has been calculated (see SM file for further details).

Due to the acidic nature of Galician soils, faunistic remains that would provide taphonomic information are typically only preserved in the limestone cavities of northeastern Galicia. The study areas analysed in this work consist solely of rock shelters and open-air sites, which do not preserve faunistic remains. Consequently, faunistic data are not available for these areas.

Additionally, paleoenvironmental information is very limited because few sites have been excavated and studied in detail. While the IGME geological map lacks detailed archaeological information, it has been used in this study as a proxy to consider potential geological resources. This map is referred to as potential geology throughout the manuscript and no definitive assumptions are made regarding the use of quartz or quartzite as raw material based solely on this data.

Table 6 Covariates related to geology: ID number, conditioning type, variables, acronym and description

In addition to the maps of travel cost in time and distance, other variables related to geology have been obtained (Table 6). The average of the four cells adjacent to each archaeological site has been calculated for both variables (GEOLEm and GEOLCm). Finally, the visible potential geological surface, in ha, from each site (GEOLV) has been quantified.

CPFPC

Within the context of biotic conditioning, the variable related to travel costs to potential hunting areas was considered. The Central Place Foraging Prey Choice (CPFPC) model, originally proposed by M. Cannon (2003) and rooted in foraging theory, was applied. This model was previously utilized by Marín Arroyo to investigate mobility patterns and territorial control in the eastern Cantabrian region, focusing on deer and goats, the most prominent species in the Magdalenian diet (Marín Arroyo 2008, 2009).

In this study, potential goat and deer hunting areas are analysed and a covariate was created based on travel cost in time from any point within the study area to these prospective hunting zones. To maximize productivity, a maximum travel time threshold of 2.15 h was set for deer hunting, while a 1.2-h threshold was established for goat hunting. The 1.2-h and 2.15-h isochrones were calculated for each site and, within these isochrones, the slope map was used to identify cells with slopes greater than 30° for goats and less than 30° for deer (refer to the SM file for further details on the calculation process).

Table 7 Covariates related to CPFPC: conditioning type, variables, acronym, description and ID number

To assess the significance of this conditional factor, the total area of each site in ha that represents potential hunting areas for goats (CPFPCGs) and deer (CPFPCDs) was calculated. Additionally, the slope map was used to determine the cost of travel from each site to reach the potential catchment areas for goats (CPFPCGc) and deer (CPFPCDc) (Table 7).
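The area calculation behind CPFPCGs and CPFPCDs can be illustrated with a short sketch: within the isochrone of a site, count the cells whose slope suits the prey in question and convert the count to hectares. The isochrone is assumed to be available already as a boolean mask (the 1.2-h mask for goats, the 2.15-h mask for deer); the function name, the toy rasters and the 100 m cell size are hypothetical.

```python
def hunting_area_ha(slope_deg, inside_isochrone, cell_size_m, prey):
    """Area (ha) of isochrone cells whose slope suits the given prey."""
    count = 0
    for slope_row, mask_row in zip(slope_deg, inside_isochrone):
        for slope, inside in zip(slope_row, mask_row):
            if not inside:
                continue
            if prey == "goat" and slope > 30:      # steep terrain for goats
                count += 1
            elif prey == "deer" and slope < 30:    # gentler terrain for deer
                count += 1
    return count * cell_size_m ** 2 / 10_000       # 1 ha = 10,000 square metres

# Hypothetical slope raster (degrees) and isochrone mask for one site.
slope_deg = [
    [10, 35, 40],
    [25, 31, 5],
]
inside_isochrone = [
    [True, True, False],
    [True, True, True],
]

goat_ha = hunting_area_ha(slope_deg, inside_isochrone, cell_size_m=100, prey="goat")
deer_ha = hunting_area_ha(slope_deg, inside_isochrone, cell_size_m=100, prey="deer")
```

In this toy case the steep cell outside the isochrone is ignored, which is why only two cells count towards the goat area.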

Visibility

Various types of conditioning factors have been considered in this study, including visibility, which has been a critical aspect in defining the occupation of archaeological sites in prior research (López Cordeiro, 2002; 2004; 2015; Rodríguez Álvarez et al. 2008; Fábregas Valcarce and de Lombera Hermida, 2010; de Lombera-Hermida et al. 2011; de Lombera Hermida et al., 2015). Different approaches related to visibility have been used in the present study. Firstly, the number of cells visible from each archaeological site (VISC) and the number of cells in the study area from which each site is visible (VISZ) were calculated. Additionally, visual prominence was explored, which entails computing the points that are visible from specific locations. To achieve this, an observer height of 1.75 m was used. The outcome of this calculation is a raster file in which cells are assigned values indicating the number of cells visible from each location (VISPR) (refer to the SM file for further methodological details). Furthermore, the average value of visibility from the four adjacent cells for each site (VISPRm) was obtained (Table 8).

Table 8 Covariates related to visibility: conditioning type, variables, acronym, description and ID number

Potential least cost path

Another significant conditional factor considered in this study is the calculation of potential least cost paths (LCP). These paths are a representation of the relationships between archaeological sites and movement within the surrounding landscape, essentially indicating the routes of least energy or time cost between two points.

Table 9 Covariates related to least cost path: conditioning type, variables, acronym, description and ID number

In previous literature, natural transit routes, often referred to as royal roads, have been explored in relation to Palaeolithic sites. These routes have been identified as a key variable influencing the distribution of Palaeolithic sites in the Northwestern Iberia (Ramil Rego and Ramil Soneira 1996, p. 125; Fábregas Valcarce and de Lombera Hermida 2010, p. 267; de Lombera Hermida et al. 2015, p. 289; López Cordeiro 2015, p. 301; Díaz Rodríguez 2017).

In this study, transit routes were computed within a specific area without considering the archaeological sites themselves. To perform this analysis, it is necessary to have starting and ending points. This approach is inspired by a previous work that utilized the From Everywhere to Everywhere (FETE) methodology, which calculates optimal routes between all points in the landscape (White and Barber 2012). In FETE, every point in a grid serves as both a starting and ending point, enabling comprehensive route calculations across the entire territory. However, this method demands substantial computational power. As an alternative, a simplified model based on a methodology from another study was adopted (Rodríguez Rellán and Fábregas Valcarce 2015). This simplified approach has been described in greater detail elsewhere (Díaz Rodríguez 2017; Díaz-Rodríguez et al. 2021, 2023). In brief, it involves dividing the study area's boundary into points spaced at a predetermined interval and calculating LCPs between them. One point was taken as the starting point and the others as end points, and the analysis was repeated for every point. Subsequently, the travel cost in minutes from each site to the nearest route (LCPC) was determined. Additionally, the average value for the four neighbouring cells adjacent to each site (LCPCm) was calculated (Table 9).
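The simplified procedure can be sketched on a small cost raster: least cost paths are traced between every pair of boundary points, their cells are merged into a route network, and the accumulated cost from that network to a site gives an LCPC-like value. This is a minimal sketch under stated assumptions and not the GIS workflow used in the study: movement is restricted to the four edge-sharing neighbours, the cost of a step is taken as the mean of the two cell costs, and the function names, cost values and point locations are hypothetical.

```python
import heapq

def dijkstra(cost, sources):
    """Accumulated cost and predecessor of every cell, starting from the source cells."""
    rows, cols = len(cost), len(cost[0])
    dist = {s: 0.0 for s in sources}
    prev = {}
    queue = [(0.0, s) for s in sources]
    heapq.heapify(queue)
    while queue:
        d, (r, c) = heapq.heappop(queue)
        if d > dist[(r, c)]:
            continue  # stale queue entry
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                step = (cost[r][c] + cost[nr][nc]) / 2   # symmetric step cost
                if d + step < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = d + step
                    prev[(nr, nc)] = (r, c)
                    heapq.heappush(queue, (d + step, (nr, nc)))
    return dist, prev

def route_network(cost, boundary_points):
    """Cells crossed by the least cost paths between all pairs of boundary points."""
    routes = set()
    for start in boundary_points:
        _, prev = dijkstra(cost, [start])
        for goal in boundary_points:
            cell = goal
            while cell != start:        # walk back from the goal to the start
                routes.add(cell)
                cell = prev[cell]
            routes.add(start)
    return routes

# Hypothetical travel cost raster (minutes per cell) with a costly central ridge.
cost = [
    [1, 9, 1],
    [1, 9, 1],
    [1, 1, 1],
]
boundary_points = [(0, 0), (0, 2)]

routes = route_network(cost, boundary_points)
dist_to_route, _ = dijkstra(cost, routes)   # multi-source accumulated cost
lcpc = dist_to_route[(1, 1)]                # travel cost from a site at (1, 1) to the nearest route
```

In this toy raster the paths skirt the ridge rather than cross it, so a site located on the ridge lies off the route network and receives a non-zero cost.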

Potential insolation

Insolation, or the exposure to sunlight, has been a consideration in the context of Northwestern Palaeolithic sites. It has been suggested that these sites were often situated on west-facing slopes to maximize the heat from the sun's rays (Ramil Rego and Ramil Soneira 1996, p. 125). While insolation has not been extensively discussed in the literature regarding this region, it has been a factor in the selection of occupation sites in other parts of the Iberian Peninsula. For instance, some studies have examined its influence on site selection in the Asón River Valley area of Cantabria (García Moreno 2008, 2015).

In this study, insolation data was obtained using SAGA GIS and the Potential Incoming Solar Radiation module (Conrad et al. 2015). Several insolation-related variables were calculated for each site, including the received potential solar radiation (TOTINS), the direct solar radiation received in the cell where each site is located (DIRINS) and the diffuse solar radiation received in the cell of each site (DIFINS). Additionally, values for these variables were obtained for the four neighbouring cells adjacent to each site (TOTINSm, DIRINSm and DIFINSm) (Table 10).

Table 10 Covariates related to potential insolation: conditioning type, variables, acronym, description and ID number

Potential wind

The final conditioning factor considered in the analysis is related to potential wind exposure. The literature suggests that protection from prevailing winds may also be a factor in the selection of occupation sites by hunter-gatherer societies (Villar Quinteiro 1996; García Moreno 2010; de Lombera Hermida et al. 2015, p. 290). To quantify this variable, the wind exposure index was obtained using the Wind Effect Index from the SAGA GIS module (Böhner and Antonić 2009; Conrad et al. 2015) for each site (WIND). Additionally, the average value of this index for the four neighbouring cells adjacent to each site was calculated (WINDm) (Table 11).

Table 11 Covariates related to potential wind: ID number, conditioning type, variables, acronym and description

Results

Complete spatial randomness

With the aim of analysing Complete Spatial Randomness (CSR), it was first assessed whether the distribution of archaeological sites in both areas follows a normal distribution. This was done by choosing UTMX as the variable for this comparison between the actual site data and randomly generated points. The Shapiro–Wilk test was employed for this purpose and the results indicate a lack of normality in both areas. Specifically, the p-value was less than 0.05 in both the Northern Mountain ranges (W = 0.63203, p-value = 5.181e−06) and the Central Mountain ranges (W = 0.95885, p-value = 0.03876) (Table 12).

Furthermore, a Kolmogorov–Smirnov (K-S) test comparing the archaeological sample with a randomly generated sample revealed that the two samples belong to different populations in both the Northern Mountain ranges (D = 0.58824, p-value = 1.555e−05) and the Central Mountain ranges (D = 0.42623, p-value = 3.077e−05). In the graphs of the homogeneous K, L and G functions (see Fig. 2 and Fig. 3), the black line does not stay within the confidence envelope, indicating that H0 (the null hypothesis of CSR) can be rejected in favour of H1 (the alternative hypothesis). In the inhomogeneous K and L functions, it is evident that site clustering occurs up to 4 km in the case of sites in the Northern Mountain ranges and up to 6 km for sites in the Central Mountain ranges.
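The logic of the envelope test can be illustrated with a minimal sketch of the homogeneous K function. The observed K(r) is compared with the lowest and highest values obtained from 99 simulated random patterns; an observed value above the envelope points to clustering at that distance. The estimator below is deliberately naive: it applies no edge correction, assumes a rectangular window and covers neither the L and G functions nor the inhomogeneous case. All names and coordinates are hypothetical.

```python
import math
import random

def ripley_k(points, r, area):
    """Naive estimate of Ripley's K at distance r (no edge correction)."""
    n = len(points)
    close_pairs = sum(
        1
        for i in range(n)
        for j in range(n)
        if i != j and math.dist(points[i], points[j]) <= r
    )
    return area * close_pairs / (n * (n - 1))

def csr_envelope(n, r, width, height, n_sim=99, seed=1):
    """Lowest and highest K(r) over n_sim simulated random (CSR) patterns."""
    rng = random.Random(seed)
    sims = []
    for _ in range(n_sim):
        pts = [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]
        sims.append(ripley_k(pts, r, width * height))
    return min(sims), max(sims)

# Hypothetical pattern: two tight groups of sites in a 10 x 10 km window.
sites = [(2 + 0.05 * i, 2 + 0.05 * (i % 4)) for i in range(10)]
sites += [(7 + 0.05 * i, 7 + 0.05 * (i % 4)) for i in range(10)]

k_obs = ripley_k(sites, r=1.0, area=100.0)
k_lo, k_hi = csr_envelope(len(sites), r=1.0, width=10.0, height=10.0)
csr_rejected_at_r = not (k_lo <= k_obs <= k_hi)
```

Evaluating the same comparison over a range of distances r produces curves of the kind shown in Fig. 2 and Fig. 3.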

Table 12 Results for the statistical tests applied in both areas
Fig. 2

K, L and G Functions for the Northern Mountain ranges (a-f). The envelope in grey represents the confidence interval of the test, which is defined by the lowest (Klo(r)) and highest (Khi(r)) values. The dashed red line shows the theoretical mean of the results of the 99 random simulations (Ktheo(r)). The black line (Kobs(r)) marks the mean of the results for the sample of archaeological sites

Fig. 3

K, L and G Functions for the Central Mountain ranges (a-f). The envelope in grey represents the confidence interval of the test, which is defined by the lowest (Klo(r)) and highest (Khi(r)) values. The dashed red line shows the theoretical mean of the results of the 99 random simulations (Ktheo(r)). The black line (Kobs(r)) marks the mean of the results for the sample of archaeological sites

Evaluation of locational variables

This section evaluates the variables presented above. Different types of statistical analysis were conducted depending on the nature of each variable. Variables that were modelled as raster maps were analysed by comparing the values at archaeological sites with random values for each study area. For absolute variables, the values were compared with 999 random samples and resampling density plots were created, following previous approaches (Bocinsky 2017; Cascalheira et al. 2022).
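The resampling idea can be sketched in a simplified form. Instead of building density envelopes, the example below draws 999 random samples of the same size as the site sample from the background values, records the median of each, and checks whether the observed median falls within the central 95% of the resampled medians. The use of the median as summary statistic, the function names and all values are assumptions made for illustration.

```python
import random
import statistics

def resampled_medians(background, sample_size, n_resamples=999, seed=1):
    """Sorted medians of random samples drawn from the background values."""
    rng = random.Random(seed)
    return sorted(
        statistics.median(rng.sample(background, sample_size))
        for _ in range(n_resamples)
    )

def central_interval(sorted_values, level=0.95):
    """Approximate central interval of a sorted list of resampled statistics."""
    tail = (1 - level) / 2
    low = sorted_values[int(tail * len(sorted_values))]
    high = sorted_values[int((1 - tail) * len(sorted_values)) - 1]
    return low, high

# Hypothetical altitudes (m): all raster cells versus the cells holding sites.
background = list(range(200, 800))
site_values = [700 + 5 * i for i in range(15)]

medians = resampled_medians(background, len(site_values))
low, high = central_interval(medians)
observed = statistics.median(site_values)
outside_interval = not (low <= observed <= high)
```

An observed value outside the interval suggests that the sites are not a random draw from the landscape with respect to that variable.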

The results for average variables obtained from the raster layers (mean value of the 4 cells adjacent to the site) are presented for each zone and compared with the results of the random points using box plots and statistical analysis. In each study area, as many random points were generated as there are archaeological sites, and the results were compared for each variable to statistically evaluate the mean variables defined in the Material and Methods section. The Shapiro–Wilk test was applied to check normality. Where the samples were normally distributed, Fisher's F test was used to check homoscedasticity; Student's t-test was then used when the variances were equal and the Welch test when they differed. The Mann–Whitney–Wilcoxon test was used for samples that were not normally distributed.
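The decision sequence described above can be written down as a small helper that takes the p-values of the preliminary checks and names the comparison test they imply. It performs no statistical computation itself; the function name and the example p-values are hypothetical, and the 0.05 threshold follows the significance level used in this paper.

```python
def choose_test(p_normal_sites, p_normal_random, p_equal_var, alpha=0.05):
    """Name the two-sample test implied by the normality and variance checks."""
    both_normal = p_normal_sites >= alpha and p_normal_random >= alpha
    if not both_normal:
        return "Mann-Whitney-Wilcoxon"   # non-parametric route
    if p_equal_var >= alpha:
        return "Student's t"             # normal samples, equal variances
    return "Welch's t"                   # normal samples, unequal variances

# Hypothetical p-values from Shapiro-Wilk (sites, random points) and Fisher's F.
normal_equal = choose_test(0.20, 0.30, 0.50)
normal_unequal = choose_test(0.20, 0.30, 0.01)
not_normal = choose_test(0.01, 0.30, 0.50)
```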

Subsequently, both zones are compared and for this purpose, all the variables are analysed, including the results of specific variables that have not been obtained from a raster file but through calculation. The aim is to identify trends by presenting the results for each area individually and then comparing the results of the two areas together.

Northern area

There are 19 variables associated with raster data that have been assessed both graphically and statistically. For the variable ALTA (Fig. 4a), it is evident that the spatial distribution of archaeological sites deviates from what would be expected under random conditions. This deviation is supported by the Mann–Whitney U test, which returned a p-value < 0.05 (Table 13). Other variables like TPI100, SLO and HYDROE (Fig. 4b-d) also show deviations in the values of archaeological sites from random expectations, though statistical analysis with the Mann–Whitney U test did not yield significant differences (Table 13).

Fig. 4

Kernel density plots comparing archaeological sites (red line) and randomly resampled background points (black line) for different variables. The grey areas represent the 95% confidence intervals calculated from the resampling of 999 random samples

Among the studied variables analysed using this method, some display visually distinct spatial distributions for archaeological sites compared to random conditions, yet they do not exhibit statistically significant differences (Table 13). This is observed in the cases of WET, GEOLE, GEOLC and VISPR (Fig. 5), as well as LCPC, WIND and CPFPCGc (Fig. 6).

Fig. 5

Kernel density plots comparing archaeological sites (red line) and randomly resampled background points (black line) for different variables. The grey areas represent the 95% confidence intervals calculated from the resampling of 999 random samples

Table 13 Mann–Whitney U test results for the comparison between archaeological sites and randomly resampled background points in Northern area

Other variables studied showed neither visual nor statistical distinctions when comparing the archaeological sample to random simulations. This is the case for variables like TPI500, TPI1000 and HYDROC (Fig. 11 in the SM). Similarly, locational variables like TOTINS and DIRINS (Fig. 12a-b in the SM) showed no visual or statistical differences. The variable ASP, on the other hand, did not exhibit apparent visual differences (Fig. 11c in the SM) but did show statistical significance with a p-value of 0.046 (Table 13). Similar observations were made for the variable DIFINS, which displayed no clear visual differences (Fig. 12c in the SM) but did exhibit statistical significance with a p-value of 0.008 (Table 13). Finally, the variable CPFPCDc presented significant differences both visually and statistically. However, caution is needed when interpreting this variable, as all archaeological sites have a value of 0, rendering it unusable (Fig. 12d in the SM and Table 13).

Fig. 6

Kernel density plots comparing archaeological sites (red line) and randomly resampled background points (black line) for different variables. The grey areas represent the 95% confidence intervals calculated from the resampling of 999 random samples

For the variable ALTm, the archaeological sites exhibit a minimum altitude of 451 m, while the random sites register a minimum of 261 m. The median altitude for the archaeological sites is 688.5 m and for the random sites, it is 527 m. The highest recorded altitude for archaeological sites is 758 m, whereas for random sites, it is 734 m. Notably, the majority of archaeological site altitudes cluster between 600 and 750 m, as depicted in Fig. 7a. The Shapiro–Wilk test for normality indicates that the distributions of both datasets are non-normal (p-value = 0.04185). Consequently, a non-parametric Mann–Whitney-Wilcoxon test was employed to examine homoscedasticity, which confirmed differing variances (p-value = 0.00001762). Subsequently, Welch's t-test supported the assertion that both sets of data originate from distinct populations (p-value = 0.00001234). The results of the statistical analyses for this variable and for the following variables studied in this area are given in Table 3 and Table 4 in the SM file.

In the case of TPI100m, the minimum values for archaeological sites and random points are -4.32 and 9.5, respectively. The median TPI100m value for archaeological sites is 1.05, whereas for random points, it is 0.52. The maximum TPI100m value recorded for archaeological sites is 8.36 and for random points, it is 9.40. Notably, both datasets predominantly exhibit values clustered around 0, as represented in Fig. 7b. The Shapiro–Wilk normality test indicated non-normal distributions for both datasets (p-value = 0.008938). Regarding homoscedasticity, Fisher's F-test suggested that the variances of the two datasets do not differ significantly (p-value = 0.468). Consequently, a Student's t-test was conducted to evaluate differences in means, which concluded that both datasets are drawn from the same population and therefore show no statistically significant differences (p-value = 0.1943).

Fig. 7

Boxplots comparing the results of archaeological sites with random sites for different variables in the Northern area

For the TPI500m variable, archaeological sites have a minimum value of -3.88, whereas random points show a minimum value of -47.89. The median TPI500m value for archaeological sites is 0.49, while for random points, it is -1.21. The maximum TPI500m values recorded are 10.55 for archaeological sites and 39.96 for random points. Both datasets exhibit a concentration of values around 0, as illustrated in Fig. 7c. The Shapiro–Wilk normality test confirmed normal distributions for both datasets (p-value = 0.09752). Consequently, the Fisher's F-test was utilized to assess homoscedasticity, indicating similar variances (p-value = 0.463). Subsequently, a Student's t-test was employed to compare means, revealing that both datasets belong to the same population, with no statistically significant differences (p-value = 0.5105).

In the case of the TPI1000m variable, archaeological sites display a minimum value of -3.88, while random points exhibit a minimum value of -111.29. The median TPI1000m value for archaeological sites is 0.49 and for random points, it is -3.68. The maximum TPI1000m values are 10.55 for archaeological sites and 57.96 for random points. Similar to the previous variable, both datasets demonstrate a concentration of values near 0, as presented in Fig. 7d. The Shapiro–Wilk test for normality confirms non-normal distributions for both datasets (p-value = 0.02265). The Mann–Whitney-Wilcoxon test, assessing homoscedasticity, concludes that both datasets have similar variances (p-value = 0.7839), while the Student's t-test establishes that they belong to the same population, without statistically significant differences (p-value = 0.5793).

Regarding the SLOm variable, the minimum slope value for archaeological sites is 0.07, whereas for random points, it is 0.22. The median slope value for archaeological sites is 9.96 and for random sites, it is 7.07. The maximum slope value for archaeological points is 25.25 and for random points, it is 27.10. The dispersion of values for archaeological sites spans between 5 and 15, which is slightly broader than the range for random points, concentrated between 4 and 11, as visualized in Fig. 7e. Statistical analysis indicates that both datasets do not adhere to a normal distribution (p-value = 0.00005112). The Mann–Whitney-Wilcoxon test establishes that both datasets possess similar variances (p-value = 0.3567) and the Student's t-test affirms that they originate from the same population, without statistically significant differences (p-value = 0.3596).

The variable ASPm showcases archaeological sites with a minimum aspect value of 11.34, whereas random points exhibit a minimum value of 13.01. The median aspect value for archaeological sites is 229.45, while for random points, it is 159.44. The maximum aspect value recorded for archaeological sites is 330.94 and for random points, it is 351.37. The distribution of values for archaeological sites primarily ranges from 150 to 250, while random points cluster between 100 and 240, as depicted in Fig. 7f. The Shapiro–Wilk test confirms normality for the datasets (p-value = 0.1197). Regarding homoscedasticity, the Fisher's F-test indicates similar variances (p-value = 0.5768). Subsequently, the Student's t-test affirms that both datasets belong to the same population (p-value = 0.02261).

For the HYDROEm variable, the minimum value for archaeological sites is 34.42, whereas random points display a minimum value of 13.94. The median hydrological exposure value for archaeological sites is 687.89 and for random points, it is 627.03. The maximum value recorded for archaeological sites is 1783.49 and for random points, it is 2100.81. Both datasets illustrate a similar trend, oscillating between values of 250 and 1200, as visualized in Fig. 7g. Statistical analyses confirm that neither dataset conforms to a normal distribution (p-value = 0.003188). The Mann–Whitney-Wilcoxon test suggests that both datasets possess similar variances (p-value = 0.9079) and the Student's t-test establishes that they emanate from the same population, with no statistically significant distinctions (p-value = 0.9887).

Another variable analysed is HYDROCm, for which the minimum values for both samples are 3.36 for archaeological sites and 3.27 for random points. As for the median, archaeological sites have 42.86, while random points have 44.21. Finally, the maximum value for archaeological sites is 99.44 and for random points, it is 118. Similar to the previous variable, the trend in both datasets is comparable, with values ranging from 15 to 70, as represented in Fig. 7h. The Shapiro–Wilk test indicated that there is no normality among the data (p-value = 0.002521). The Mann–Whitney-Wilcoxon test showed that both samples have the same variance (p-value = 0.8407) and the Student's t-test confirmed that both belong to the same population (p-value = 0.8503).

The WETm variable showed a minimum value of 0.11 for archaeological sites and 0 for random points. The median for archaeological sites is 13.61 and for random points, it is 18.73. Finally, the maximum value for archaeological sites is 44.04 and for random points, it is 85.16. Both datasets showcase a similar dispersion of values, concentrating between 5 and 20, as visualized in Fig. 7i. Regarding the statistical analyses conducted, it was confirmed that there is no normality (p-value = 0.000000006833), that the data has the same variance (p-value = 0.538) and that there are no significant differences according to the Student's t-test (p-value = 0.1682).

Another variable studied is GEOLEm. In this case, the minimum value for the archaeological sample and the random sample is 0. The median for archaeological sites is 802.42 and for random points, it is 1526.5. Finally, the maximum value for archaeological sites is 2975.79 and for random points, it is 4861.4. The trend for archaeological points oscillates between 0 and 1500, while random points exhibit a wider concentration, ranging from 500 to 2300, as shown in Fig. 8a. The Shapiro–Wilk statistical test confirmed that there is no normality (p-value = 0.000006409) and the Student's t-test indicated that the variances of both samples are different (p-value = 0.01154). Finally, Welch's t-test confirmed that they belong to different populations and therefore exhibit statistically significant differences (p-value = 0).

In the case of the GEOLCm variable, the minimum values of both samples are found at 0. However, the median for archaeological sites is at 25.03, whereas for random points, it is at 61.01. The maximum value for archaeological sites is 152.54 and for random points, it is 240.52. The data trends for both samples are similar to what was observed in the previous variable, but in this case, the values for archaeological sites range between 0 and 80, while for random points, they are concentrated between 25 and 150, as depicted in Fig. 8b. Regarding the statistical analyses of both samples, it was verified that there is no normality (p-value = 0.000007255), both samples have different variances (p-value = 0.01469) and they exhibit significant differences (p-value = 0.01332).

Another variable studied is VISPRm, for which a minimum value of 0 was identified for archaeological sites and 3 for random points. The median for the former is located at 20.09 and for the latter, it is at 19.65. As for the maximum values, it is 75 for archaeological sites and 66 for random points. The value distribution shows that for archaeological points, it concentrates between 8 and 30, whereas for random points, it ranges from 9 to 22, as seen in Fig. 8c. The Shapiro–Wilk statistical test indicated that there is no normality (p-value = 0.000001251). The variances of the samples are similar, as demonstrated by the Mann–Whitney-Wilcoxon test (p-value = 0.8443) and it was confirmed that there are no significant differences between both samples (p-value = 0.9161).

The LCPCm variable yielded a minimum value of 0.26 for archaeological sites and 0.50 for random points. The median for archaeological sites is 20.63, while for random points, it is 2.42. As for the maximum value for each sample, it is 99.41 for archaeological sites and 23.68 for random points. The trend in the dispersion of values for both samples is practically the same, as values are grouped between 2 and 4, as displayed in Fig. 8d. The Shapiro–Wilk test indicated that there is no normality (p-value = 0.0000002778) and the variances of both samples are equal (p-value = 0.7839). Finally, the T Student test confirmed that there are no significant differences between both samples (p-value = 0.2453).

Fig. 8 Boxplots comparing the results of archaeological sites with random sites for different variables in the Northern area

Another variable that has been analysed is TOTINSm, for which a minimum value of 3.51 was identified for archaeological sites and 1.86 for random points. The median for archaeological sites is 4.57 and for random points, it is 4.42. As for the maximum value for the former, it is located at 5.26 and for the latter, it is at 5.40. The trend in the concentration of values for each sample is similar in both cases, oscillating between values of 4 and 5, as presented in Fig. 8e. The statistical analysis by Shapiro–Wilk indicated that there is no normality (p-value = 0.0000001963) and both samples have the same variance according to the T Student test (p-value = 0.2146). Finally, it was confirmed that both samples belong to the same population (p-value = 0.1367).

Regarding the DIRINSm variable, the minimum value of the archaeological sample is 0.84 and for the random sample, it is 1.01. Their respective medians are 3.70 for the first group and 3.53 for the second group. The maximum value for archaeological sites is 4.42 and for random sites, it is 4.53. The concentration of values for both samples is similar, as they are found between values of 3 and 4, as depicted in Fig. 8f. Neither sample exhibits normality (p-value = 0.0000003559) and their variances are similar (p-value = 0.2797). The T Student test confirmed that there are no significant differences between both samples (p-value = 0.1618).

Another variable related to insolation is DIFINSm. In this case, archaeological sites present a minimum value of 0.84 and random points have a minimum value of 0.88. The median for archaeological sites is 0.91 and for random points, it is 0.85. The maximum value for both samples is located at 0.93. The concentration trends of values are different in both samples, as the values for archaeological points range between 0.89 and 0.92, while for random points, they are concentrated between 0.87 and 0.89, as seen in Fig. 8g. Neither sample exhibits normality (p-value = 0.04847) and their variances are different (p-value = 0.005377). The T Welch test confirmed that they belong to different populations and therefore exhibit statistically significant differences (p-value = 0.003053).
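As an illustration of the final step reported above, the following minimal sketch computes the statistic behind a Welch test, which compares two means without assuming equal variances. It is not the code used in the study: the function name `welch_t` and the sample values are hypothetical, and the p-value lookup (which needs a t-distribution) is left out.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each mean, from the unbiased sample variances.
    se_a = statistics.variance(sample_a) / n_a
    se_b = statistics.variance(sample_b) / n_b
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(se_a + se_b)
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical diffuse-insolation values, NOT the study's data.
sites = [0.89, 0.90, 0.91, 0.92, 0.91, 0.90]
random_points = [0.87, 0.88, 0.89, 0.88, 0.86, 0.93]

t_stat, dof = welch_t(sites, random_points)
print(f"t = {t_stat:.3f}, df = {dof:.2f}")
```

The degrees of freedom are generally not an integer, which is the visible difference from the pooled-variance T Student test used when the variances are judged equal.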

For the WINDm variable, archaeological sites have a minimum value of 0.78 and random points have a minimum value of 0.77. The median for both samples is 0.97. The maximum value for archaeological sites is located at 1.23, while for random points, it is at 1.31. Their value concentration trends are similar, although for archaeological sites, it ranges between values of 0.9 and 1.1. For random points, it concentrates between 0.85 and 1.1, as displayed in Fig. 8h. The Shapiro–Wilk test confirmed that there is no normality (p-value = 0.006604) and the Mann–Whitney-Wilcoxon test showed that their variances are similar (p-value = 0.9854). Finally, the T Student test confirmed that both samples belong to the same population (p-value = 0.8687).

Central area

In the Central Mountain ranges area, the first variable under scrutiny was the locational variable ALTA. Visual examination did not reveal any significant differences in the distribution of archaeological points (refer to Fig. 9a); however, statistical analysis unveiled disparities between this variable and the random background points (see Table 14). Similar observations applied to the variable TPI1000, the sole one associated with Topographic Prominence analysis that displayed noteworthy deviations from the expected conditions under random sampling (p-value = 0.01, Table 14), although it did not manifest visually (see Fig. 9d). Conversely, the variables TPI100 and TPI500 showed no substantial evidence (refer to Fig. 9b-c and Table 14).

Table 14 Mann–Whitney U test results for the comparison between archaeological sites and randomly resampled background points in Central area

Fig. 9 Kernel density plots comparing archaeological sites (red line) and randomly resampled background points (black line) for different variables. The grey areas represent the 95% confidence intervals calculated from the resampling of 999 random samples

Variables pertaining to hydrology, such as HYDROE and HYDROC, exhibited marked distinctions from the expected patterns under random conditions. These distinctions were perceptible through visual analysis (see Fig. 10a-b) and statistically significant (p-value < 0.05 in Table 14). Another variable demonstrating this phenomenon was WIND, where the distribution of archaeological sites deviated from the distribution generated in 999 random simulations (see Fig. 10d). The Mann–Whitney U test confirmed statistically significant differences (refer to Table 14). In the case of the LCPC variable, visual observations suggested deviations in the distribution of archaeological points from what was expected under random conditions, with a notable concentration near potential transit routes (see Fig. 10c). However, statistical confirmation through the Mann–Whitney U test remained inconclusive (p-value = 0.54, Table 14).
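The Mann–Whitney U test cited in this paragraph ranks the pooled values and asks whether one sample tends to hold the higher ranks. The sketch below shows the calculation under simplifying assumptions: the p-value uses the large-sample normal approximation without tie or continuity corrections, and the function names and data are hypothetical rather than taken from the study.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def mann_whitney_u(sample_a, sample_b):
    """Return U for sample_a and an approximate two-sided p-value."""
    n_a, n_b = len(sample_a), len(sample_b)
    ranks = average_ranks(list(sample_a) + list(sample_b))
    u_a = sum(ranks[:n_a]) - n_a * (n_a + 1) / 2
    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)  # assumption: no tie correction
    z = (u_a - mean_u) / sd_u
    return u_a, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical distances to watercourses in metres, NOT the study's data.
sites = [1100, 1250, 1300, 1480, 1520, 990]
background = [210, 450, 600, 780, 300, 1200]
u_stat, p_value = mann_whitney_u(sites, background)
print(u_stat, round(p_value, 4))
```

For samples as small as this example an exact p-value would be preferable; the approximation is shown only because it needs nothing beyond the standard library.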

Fig. 10 Kernel density plots comparing archaeological sites (red line) and randomly resampled background points (black line) for different variables. The grey areas represent the 95% confidence intervals calculated from the resampling of 999 random samples

The distribution of archaeological points for the DIFINS variable displayed a distinct pattern compared to the median of the background derived from 999 random samples, though it consistently remained within the envelope generated by them (see Fig. 11a). Nevertheless, statistically significant differences emerged upon applying the Mann–Whitney U test (p-value < 0.05, Table 14). In the case of the CPFPCDc and CPFPCGc variables, the former exhibited non-normal values, as all random points possessed values of 0 and could not be deemed representative (see Fig. 11b). For the latter variable, no significant differences were observed either visually or statistically (see Fig. 11c and Table 14).
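The idea of an observed density curve staying inside, or leaving, the envelope of 999 random samples can be sketched as follows. This is only one plausible reading of the procedure: the fixed bandwidth, the pointwise 2.5% and 97.5% bounds, sampling without replacement and all values below are assumptions made for illustration.

```python
import math
import random


def gaussian_kde(sample, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate of sample at each grid value."""
    norm = len(sample) * bandwidth * math.sqrt(2 * math.pi)
    return [
        sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in sample) / norm
        for x in grid
    ]


def simulation_envelope(background, n_sites, grid, bandwidth, n_sims=999, seed=1):
    """Return approximate pointwise 2.5% and 97.5% density bounds from random resamples."""
    rng = random.Random(seed)
    curves = [
        gaussian_kde(rng.sample(background, n_sites), grid, bandwidth)
        for _ in range(n_sims)
    ]
    lower, upper = [], []
    for column in zip(*curves):  # one column per grid value
        ordered = sorted(column)
        lower.append(ordered[int(0.025 * (n_sims - 1))])
        upper.append(ordered[int(0.975 * (n_sims - 1))])
    return lower, upper


# Hypothetical values, NOT the study's data.
value_rng = random.Random(0)
background = [value_rng.uniform(0.85, 0.93) for _ in range(500)]
sites = [0.90, 0.91, 0.91, 0.92, 0.92, 0.92, 0.93, 0.93]
grid = [0.85 + 0.01 * i for i in range(9)]

lower, upper = simulation_envelope(background, len(sites), grid, bandwidth=0.01)
observed = gaussian_kde(sites, grid, bandwidth=0.01)
outside = [round(x, 2) for x, d, lo, hi in zip(grid, observed, lower, upper) if not lo <= d <= hi]
print("grid values where the observed curve leaves the envelope:", outside)
```

A curve that stays inside the envelope while a rank-based test is still significant, as reported for DIFINS, is possible because the envelope is judged value by value whereas the test pools the whole sample.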

Fig. 11 Kernel density plots comparing archaeological sites (red line) and randomly resampled background points (black line) for different variables. The grey areas represent the 95% confidence intervals calculated from the resampling of 999 random samples

With regard to the variables SLO, ASP and WET, no significant differences were detected visually (refer to Fig. 13a-c in the SM) and statistical analysis corroborated these findings (see Table 14). Conversely, variables linked to geological proximity displayed a different pattern. For GEOLE, statistical significance was confirmed after conducting the Mann–Whitney U test (p-value < 0.05, Table 14). However, this distinction remained indiscernible through visual analysis (see Fig. 13d in the SM). The same applied to the variable GEOLC, which exhibited a p-value of < 0.05 following the Mann–Whitney U test (refer to Fig. 14a in the SM and Table 14 in the text). In contrast, the variables VISPR, TOTINS and DIRINS did not exhibit statistically significant differences, nor were any differences visually evident in the probability density estimation graph for archaeological sites (see Fig. 14b-d in the SM and Table 14 in the text).

The analysis proceeds to present results for variables lacking raster data but obtained from cell values proximal to archaeological sites, offering insights into the immediate environment. This analysis involves a comparison of archaeological point values with an equal number of randomly generated points. The results of the statistical analyses for the following variables studied in this area are reported in Table 3 and Table 5 in the SM file.
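The comparison described here, site values against an equal number of random points summarised by minimum, median and maximum, can be sketched in a few lines. The helper names, the seed and every number below are hypothetical and serve only to show the shape of the calculation.

```python
import random
import statistics


def summarise(values):
    """Return the minimum, median and maximum of a sample."""
    return {
        "min": min(values),
        "median": statistics.median(values),
        "max": max(values),
    }


def draw_random_sample(background, n, seed=0):
    """Draw n background cell values without replacement (seeded for repeatability)."""
    rng = random.Random(seed)
    return rng.sample(background, n)


# Hypothetical altitude values in metres, NOT the study's data.
site_altitudes = [470, 690, 705, 725, 730, 741, 758]
background_altitudes = list(range(300, 760, 5))

# The random sample has exactly as many points as there are sites.
random_altitudes = draw_random_sample(background_altitudes, len(site_altitudes))
site_summary = summarise(site_altitudes)
random_summary = summarise(random_altitudes)
print("sites: ", site_summary)
print("random:", random_summary)
```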

The first one is ALTm, for which archaeological sites show a minimum value of 470 m, whereas the random points show a minimum value of 307 m. As for the medians, the archaeological sites have a value of 725 m and the random ones 492 m. Finally, the maximum value of the former is 758 m and that of the latter is 734 m. Observing the distribution of both samples, the archaeological sites are concentrated around 700 m, while the random points are grouped between 450 and 550 m (refer to Fig. 12a). The statistical analysis using the Shapiro–Wilk test has shown that the samples do not follow a normal distribution (p-value = 0.000000007778), which led to the use of a non-parametric test, the Mann–Whitney-Wilcoxon test, to check for homoscedasticity. This test confirmed that both samples have different variances (p-value < 2.2e-16). Finally, the T Welch test confirmed that both samples have statistically significant differences (p-value < 2.2e-16).

The next variable analysed is TPI100m, for which a minimum value of -5.63 is observed for archaeological sites and -5.40 for random points. The median for archaeological points is 0.22 and for random sites, it is 5.09. As for the maximum values, the archaeological sample has a value of 10.11 and the random sites have 4.10. Both samples are grouped between values of -1.5 and 1.5 (see Fig. 12b), so no significant differences can be seen visually. The statistical analysis using the Shapiro–Wilk test shows that there is normality (p-value = 0.1544). This result led to the use of the F Test to test homoscedasticity, which confirmed the presence of the same variance in both samples (p-value = 0.3478). Since they have the same variance, the T Student Test was used and it was found that there are no significant differences between both samples (p-value = 0.7598).
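The sequence of tests applied to ALTm and TPI100m follows a simple decision rule, sketched below. The function name `choose_tests` is hypothetical and the 0.05 significance level is an assumption (it matches the threshold quoted for Table 14); the p-values passed in the example are the ones reported in the two paragraphs above.

```python
def choose_tests(p_normality, p_variance, alpha=0.05):
    """Pick the variance check and the final comparison test from two p-values.

    p_normality: p-value of the Shapiro-Wilk test.
    p_variance:  p-value of the variance (homoscedasticity) check that follows it.
    """
    normal = p_normality >= alpha          # normality is not rejected
    equal_variance = p_variance >= alpha   # equality of variances is not rejected
    variance_test = "F test" if normal else "Mann-Whitney-Wilcoxon test"
    comparison_test = "T Student test" if equal_variance else "T Welch test"
    return variance_test, comparison_test


# ALTm: no normality, different variances.
print(choose_tests(0.000000007778, 2.2e-16))
# TPI100m: normality, equal variances.
print(choose_tests(0.1544, 0.3478))
```

Written this way, the rule makes explicit that the normality result only decides how the variances are checked, while the variance result decides which test compares the two samples.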

Another topographic prominence-related variable is TPI500m. Archaeological sites display a minimum value of -6.08, whereas random points plummet to -19. Regarding the medians, the value for archaeological points is -0.19 and for random ones, it is -2.14. As for the maximum value, for archaeological sites, it is located at 5.52 and for random sites, it is at 18.94. The grouping of values in both samples is similar (refer to Fig. 12c). The Shapiro–Wilk test indicated that there is normality (p-value = 0.2957). The F Test showed that the variances of both samples are different (p-value = 0.02653). The T Welch test confirmed that the samples do not have statistically significant differences (p-value = 0.945).

The last variable related to topographic prominence is TPI1000m. The minimum value for archaeological sites is -6.08 and for random points, it is -31.38. The median for the former is -0.19 and for the random ones, it is 2.79. The maximum value for the archaeological sample is 5.52 and for the random sample, it is 28.94. Their distributions, as depicted in Fig. 12d, appear strikingly alike. The Shapiro–Wilk test demonstrated that there is no normality (p-value = 0.0131). The Mann–Whitney-Wilcoxon test showed that both samples do not have the same variance (p-value = 0.03906). The T Welch test demonstrated that there are significant differences between the samples and therefore, they would not statistically belong to the same population (p-value = 0.03107).

Fig. 12 Boxplots comparing the results of archaeological sites with random sites for different variables in the Central area

The minimum value of the SLOm variable for archaeological sites is 0.85 and for random points, it is 0.48. Regarding the medians, for archaeological points, it is 5.14 and for random ones, it is 5.90. Finally, the maximum value for the archaeological sample is 15.19 and for random sites, it is 18.43. The distribution of both samples is similar (see Fig. 12e). The Shapiro–Wilk test shows that there is no normality (p-value = 0.000009964). Both samples have the same variance, as shown by the Mann–Whitney-Wilcoxon test (p-value = 0.3702). Finally, it has been verified that the samples do not differ significantly (p-value = 0.2006).

Regarding the variable ASPm, the minimum value for archaeological sites is 11.59 and for random points 22.82. The median is 165.57 for archaeological ones and 202.74 for random sites. While the maximum value for the archaeological sample is 346.40, it is 335.70 for random points. Their distribution patterns, as depicted in the boxplot (Fig. 12f), suggest comparable distributions. The statistical analysis of both samples indicates that there is no normality (p-value = 0.04993). Archaeological sites and random points have the same variance, as demonstrated by the Mann–Whitney-Wilcoxon test (p-value = 0.5155). The T Student test showed that both samples do not have significant differences (p-value = 0.6325).

For the HYDROEm variable, the minimum value for archaeological sites is 33.25 m and for random points, it is 14.98 m. The median for the archaeological sample is 1247.73 m and for random ones, it is 589.14 m. The maximum value for the former sample is 2018.44 m and for the latter, it is 1959.42 m. Visual examination reveals contrasting distributions: archaeological points tend to cluster between 1000 and 1500 m, while random points gravitate toward 200 to 800 m (refer to Fig. 12g). The statistical analysis has shown that there is no normality (p-value = 0.003927). The Mann–Whitney-Wilcoxon test indicates that both samples have different variances (p-value = 0.00000003019). The T Welch test shows that there are significant differences between the sample of archaeological sites and random points (p-value = 0.000000002273).

In the case of HYDROCm, the minimum value for archaeological sites is 5.27 and the minimum value for random points is 1.95 min. The median for the former sample is 68.53 and for the latter, it is 33.85 min. The maximum value for archaeological points is 108.14 and for random points, it is 115.53 min. Their distributions are strikingly different, with archaeological sites concentrating between 60 and 80 min and random points ranging from 15 to 45 min (Fig. 12h). Statistically, both samples do not follow a normal distribution (p-value = 0.01832). The Mann–Whitney-Wilcoxon test indicates that both samples have different variances (p-value = 0.00000002122). Finally, it is observed that both samples come from different populations and exhibit statistically significant differences (p-value = 0.000000003962).

For the WETm variable, based on the cost of travel to potential wetland areas, a minimum value of 1.03 min has been identified for archaeological sites and 0.98 min for random points. The median for the archaeological sample is 12.25 min and for the random sample, it is 12.14 min. As for the maximum values, archaeological sites have a value of 45.93 min, while random sites have a value of 82.75 min. The distribution of values for each of the samples is concentrated between 10 and 20 min (Fig. 12i). The Shapiro–Wilk test indicates that there is no normality (p-value = 2.332e-10). The Mann–Whitney-Wilcoxon test shows that both samples have the same variance (p-value = 0.8417). Ultimately, the T Student test demonstrates that there are no statistically significant differences between these two samples (p-value = 0.4089).

The distance in meters to potential geology has been calculated using the variable GEOLEm. The minimum values for the archaeological sample and the random sample are 5367 and 0 m, respectively. The median value for archaeological points is located at 8575 m and for random points, it is 5204 m. The maximum value for archaeological sites is 9522, while for random points, it is 16,127 m. Visual representation of the data (see Fig. 13a) indicates that the values in the archaeological sample cluster between 7500 and 8000 m, whereas random values are more widely dispersed. It has been confirmed that there is no normality after applying the Shapiro–Wilk test (p-value = 0.003927). Variances have been checked to be different (p-value = 0.00000003019). Finally, the T Welch test has confirmed that there are statistically significant differences between both samples (p-value = 0.000000002273).

The other variable related to potential geology, this time in terms of travel cost in time, is GEOLCm. The minimum value for archaeological sites for this variable is 259.3 min and 0 for random points. The median for the former sample is 431.6 min and for the latter, it is 253.9 min. The maximum value for archaeological sites is 474.5 min and for random points, it is 801.7 min. As indicated in Fig. 13b, travel times to potential geology in the archaeological sample cluster between 400 and 500 min, while those in the random sample exhibit greater variability. The Shapiro–Wilk test has confirmed that there is no normality (p-value = 0.01832). It has also been verified that both samples have different variances (p-value = 0.00000002122). As with the previous variable, the final test confirms the existence of significant differences between archaeological sites and random points (p-value = 0.000000003962).

For the VISPRm variable, there is a minimum value of 0 for archaeological sites and random sites. The median value for the former sample is 7 and for the latter, it is 19. The maximum value for the archaeological sample is 68 and for the random sample, it is 45. The data suggests that archaeological sites are primarily concentrated between values of 5 and 15, while random points cover a wider range of values from 10 to 25, as illustrated in Fig. 13c. It has been confirmed that there is no normality with the Shapiro–Wilk test (p-value = 0.0000000288). The variance of both samples is different (p-value = 0.000002453). The T Welch test has allowed confirming that there are significant differences between both samples (p-value = 0.0004423).

The LCPCm variable has a minimum value of 0.3 min and a minimum value of 0.11 min for archaeological sites and random sites, respectively. Looking at the median of both samples, the median for archaeological sites is 1.46 min and for random sites, it is 1.16. The maximum value for archaeological sites is 13.63 and for random points, it is 14.85 min. The distribution of values for both samples is similar, with values clustered between 0 and 2.5 min (see Fig. 13d). The Shapiro–Wilk test has confirmed that there is no normality (p-value = 3.324e-16). Furthermore, the variances of both samples are similar according to the Mann–Whitney-Wilcoxon test (p-value = 0.09815). Finally, it has been demonstrated that both samples do not possess statistically significant differences (p-value = 0.6297).

Fig. 13 Boxplots comparing the results of archaeological sites with random sites for different variables in the Central area

The next variable analysed is TOTINSm, related to insolation. The minimum value for archaeological sites is 3.43 and for random points, it is 3.24. The median for the archaeological sample is 4.55 and for the random sample, it is 4.46. The maximum value for the former sample is 5.43 and for the latter, it is 5.41. Data distribution is consistent between both samples, with values ranging from 4.2 to 4.7 (Fig. 13e). It has been confirmed that there is no normality (p-value = 0.0178). The Mann–Whitney-Wilcoxon test has confirmed that both samples have similar variances (p-value = 0.6486). Ultimately, the T Student test substantiates the absence of statistically significant differences between archaeological sites and random points (p-value = 0.6973).

Another variable related to insolation is DIRINSm. The minimum value for the archaeological sample for this variable is 2.52 and for the random sample, it is 2.4. The medians for both are 3.64 and 3.57, respectively. The maximum value for archaeological sites is 4.53 and for random points, it is 4.54. A visual examination of the data (refer to Fig. 13f) suggests that the distribution in both samples is alike. The Shapiro–Wilk test has confirmed that there is no normality (p-value = 0.02601). Both samples have the same variance according to the Mann–Whitney-Wilcoxon test (p-value = 0.9918). Finally, it has been verified that there are no statistically significant differences between archaeological sites and random sites (p-value = 0.9409).

The last of the variables related to insolation is DIFINSm. For this variable, the minimum value for archaeological sites is 0.88 and for random points, it is 0.85. The median for the archaeological sample is 0.92 and for the random sample, it is 0.88. Regarding the maximum values, for archaeological sites, it is 0.93 and for random points, it is 0.92. Visually, the archaeological sample is clustered at higher values, approximately around 0.92. In contrast, the values in the random sample are notably lower (see Fig. 13g). Statistically, it has been confirmed that there is no normality (p-value = 0.000000192). The variances of both samples are different (p-value < 2.2e-16). It has also been verified that both samples have statistically significant differences based on the T Welch test (p-value < 2.2e-16).

For the variable WINDm, there is a minimum value of 0.79 for both archaeological sites and random points. The median for the archaeological sample is 1.16 and for the random sample, it is 1.03. The maximum value for the former is 1.32 and for the latter, it is 1.28. Notably, the distribution of both samples varies, with archaeological sites having higher values than random points (see Fig. 13h). It has been confirmed that there is no normality (p-value = 0.00002514). The variances of the two samples are also different (p-value = 0.000007451). Lastly, the T Welch test has confirmed that there are statistically significant differences between archaeological sites and random points (p-value = 0.000001081).

Northern area vs Central area

In this section, the comparison of various variables for both study areas will be presented. The objective is to augment the preceding analysis, which compared the variables of each area with expectations under random conditions. This comparison aims to shed light on the settlement patterns in both areas.

For both the Central and Northern areas, the results for ALTA and ALTm are similar. There are no discernible differences between the values obtained at specific points and the average values in the surrounding cells. In the Central area, archaeological sites are predominantly clustered between 700 and 750 m, while in the Northern area, they span from 600 to 750 m (see Fig. 15a-b in the SM).

In terms of topographic prominence, the comparison reveals some differences. In TPI100 and TPI100m, sites in the Central area exhibit a tighter cluster between -2 and 2, while in the Northern area, they range from -2 to 3 (refer to Fig. 15c-d in the SM). In TPI500, the archaeological sites in the Northern area are situated in more prominent areas and a similar distribution is observed when considering TPI500m (Fig. 15e-f in the SM). The variables TPI1000 and TPI1000m follow a similar pattern, where sites in the Central area are found in less prominent areas compared to the Northern area. However, for TPI1000m, the values are more equalized and even the median of the Central area points is slightly higher (Fig. 15g-h in the SM).

The last two variables related to altitude are ALTrA and ALTrB, both complementary. When comparing ALTrA in the two areas, it is observed that there are no major differences between them and the sites are located at similar indices because the median of the areas is the same (0.95) and is very close to 1, which indicates that the sites are in prominent areas. In the Northern area, the ALTrB index ranges from 1.01 to 1.16 and the median is located at 1.08. Meanwhile, in the Central area, it ranges from 0.99 to 1.14 and the median is located at 1.05 (see Fig. 14a-b).

Fig. 14 Boxplots comparing the results of archaeological sites in Central area and Northern area for some variables

For the archaeological sites in the Northern area, the variables SLO and SLOm have slightly higher values than in the Central area (Fig. 15i in the SM). The sites are located on moderate slopes, whereas in the Central area, the slopes are lower and their distribution is concentrated between values of 4 and 8 (Fig. 16a in the SM). The slope of the geomorphological area (SLOga) has low values in the Central Mountain ranges, while in the Northern Mountain ranges, they are slightly higher. These can be considered average values that make occupation less favourable (Fig. 14c).

Regarding the Theoretical Slope (SLOt), the values in the Northern area are concentrated very close to zero, while in the Central area, they are grouped around 10 (Fig. 14d). Another variable for analysing the surroundings of the sites is the steepest real slope (SLOst), which is related to the theoretical slope. The coincidence of both indices, in some of the sites, would indicate that in both cases, these areas are completely uniform and do not show differences between the highest and lowest slope values (Fig. 14e). Based on this, in the Central area, 29.51% of the sample shows coincidences in both variables, while in the Northern area, it accounts for 32.35% of the total (Table 15).

Table 15 Total number and percentage of sites where the SLOt and SLOst indexes are coincident

The last two variables related to slope, which also approach the study of accessibility, are the plateau slope index (SLOpi) and the increase in the 15' to 45' isochrone (INCr15-45). For the SLOpi variable, values close to 0 indicate more difficult accessibility, while values close to 100 indicate complete accessibility. In both areas, it has been observed that the sites, being located in mountainous areas, have poor accessibility, as their indices are low and close to 0 (see Fig. 14f). As for the INCr15-45 variable, it involves comparing the increase in the 15' isochrone to the 45' isochrone. If the index obtained is less than 9, accessibility is good and if it is greater than 9, it is poor. This has resulted in the Northern area having a higher percentage of sites with good accessibility (67.64%), while in the Central area, sites with poor accessibility predominate (60.66%) (see Fig. 14g and Table 16).
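The threshold rule for INCr15-45 and the resulting percentages can be illustrated with a short sketch. How the index itself is derived from the two isochrones is not restated here, so the code simply takes index values as input; the function names and the example values are hypothetical, and the handling of an index of exactly 9 is an assumption because the text does not specify it.

```python
def classify_accessibility(index, threshold=9):
    """Label an INCr15-45 index as good (below threshold) or poor (above it)."""
    if index < threshold:
        return "good"
    if index > threshold:
        return "poor"
    return "undefined"  # assumption: the text does not say how exactly 9 is treated


def share(labels, label):
    """Percentage of sites carrying the given label."""
    return 100 * labels.count(label) / len(labels)


# Hypothetical index values, NOT the study's data.
indices = [4.2, 7.9, 8.5, 10.3, 12.0]
labels = [classify_accessibility(value) for value in indices]
print(f"good: {share(labels, 'good'):.2f}%, poor: {share(labels, 'poor'):.2f}%")
```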

Table 16 Number and percentage of sites with poor and good accessibility for both study areas

The two variables used for calculating accessibility are complementary and as can be observed, they yield different results. While the plateau slope analyses an environment close to the site (based on a maximum cost of 10'), the isochrone increase is based on data from the surrounding area (from about 15' to 45'). The results show that in the immediate vicinity, the terrain relief conditions accessibility. In mountainous areas, accessibility is more challenging than in flatter or lower areas. On the other hand, when this analysis is extended to a medium-range environment, this model no longer holds and the nearby geomorphology of the area does not matter. The Central area has poorer accessibility in a medium-range environment than the Northern area.

Regarding the aspect, understood as the slope orientation, two variables have been analysed. The first is ASP and the second is ASPm. The medians of the sites in the Northern area are located between values of 200 and 250 for both variables. Meanwhile, for the Central area, the median is situated between 150 and 200 (Fig. 16b-c in the SM). This translates to the archaeological sites in the Northern area tending to be oriented to the Southwest, while the points in the Central area tend to be oriented to the Southeast.
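The reading of aspect values as compass orientations can be sketched as a mapping from degrees to quadrants. This assumes the usual GIS convention of degrees measured clockwise from north and a four-quadrant division, neither of which is stated explicitly above; the function name is hypothetical. The first example value is the median reported earlier for ASPm at the Central area sites, and the second is simply the midpoint of the 200 to 250 range.

```python
def aspect_quadrant(aspect_degrees):
    """Map an aspect in degrees (assumed clockwise from north) to a quadrant."""
    angle = aspect_degrees % 360
    if angle < 90:
        return "Northeast"
    if angle < 180:
        return "Southeast"
    if angle < 270:
        return "Southwest"
    return "Northwest"


print(aspect_quadrant(165.57))  # Central area median for ASPm
print(aspect_quadrant(225))     # midpoint of the Northern area range
```

Under a finer eight-sector division, values close to 180 would instead read as south-facing, so the quadrant labels should be taken as a coarse summary.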

In the analysis of potential hydrology, various variables have been studied, categorized into two groups: those related to watercourses and those related to wetland areas. For the first group, the Euclidean distance between potential river courses and sites in each area (HYDROE) has been calculated. After this analysis, it can be observed that in the Northern Mountain ranges, archaeological sites are located at short distances, while in the Central Mountain ranges, they are situated at medium distances (see Fig. 16d in the SM).

It is in the Central area that the trend of the archaeological sample with respect to randomness varies, as shown in the previous section. This trend is confirmed by the variable that analyses the Euclidean distance to watercourses in the immediate cells of each site (HYDROEm), which is different in each area (refer to Fig. 16e in the SM). Therefore, the Euclidean distance to river courses appears to be a relevant factor in the Central area, not because the sites in this area are closer than usual but because they are located at medium distances.

The proximity to hydrological courses calculated in travel time cost (HYDROC) shows that the two study areas behave in the same way as for the previous variable (Fig. 16f in the SM). Similarly, the travel cost analysis in nearby cells (HYDROCm) shows a pattern close to that of HYDROEm (Fig. 16g in the SM). In the Central Mountain ranges, sites are at medium time distances and do not coincide with the expected randomness conditions, whereas in the Northern area, archaeological sites are located in the immediate vicinity, but their pattern matches that of the random samples, as seen previously.

Water is a fundamental resource for life and our ancestors must have had areas where they could regularly obtain water, which also attracted animals that came to drink and could be hunted. Based on this premise, the surface of potential visible watercourses (in ha) from each of the archaeological sites has been analysed using the variable HYDROV. When comparing the data from the two areas, it has been found that there is very limited or practically no visual control for both mountain areas (refer to Fig. 14h).

The sites in both areas are located just a few minutes' travelling time from potential wetland areas modelled with the variable WET. Sites in the Central Mountain ranges are slightly more dispersed and at greater distances (see Fig. 16h in the SM). Analysing the mean value of the cells adjacent to the sites, using WETm and comparing these values in both areas, the same trend as the previous variable is observed (Fig. 16i in the SM). Regarding the visual control of wetland areas obtained with WETv, the same trend as with the visual control of potential hydrology (HYDROV) has been identified. In both areas, there is limited visual control (refer to Fig. 14i), but this visual scarcity is more notable in the Central area (Table 17).

Table 17 Statistical summaries of the values in each study area for the WETv variable

Potential geology emerges as a variable to be considered since hunter-gatherer communities crafted their tools from stone and needed areas to procure raw materials. Some raw materials might have come from more distant areas, probably through trade with other groups. While local materials were used, the selection was also influenced by the need for high-quality materials suitable for laminar and microlaminar techniques, which were not always available locally (Llana Rodríguez 1990). The variables selected for potential geology and the way in which the areas susceptible to exploitation were chosen have already been explained in the methodological section. However, it should be noted that this is a very basic approach and should be further investigated in the future to obtain more robust results. In any case, it has been observed that in the Northern Mountain ranges, archaeological sites are close to areas with geological potential (GEOLE), at distances ranging from 0 to 1 km. On the other hand, in the Central Mountain ranges, sites are located between 7 and 10 km (Fig. 17a in the SM). Considering the mean of the cells adjacent to each site, using the variable GEOLEm, the same pattern is observed (Fig. 17b in the SM). Here too, sites in the Central area are farther away than those in the Northern area, concentrated between 5 and 10 km.

Proximity to these areas with geological potential, calculated as travel time cost in minutes using the GEOLC variable, shows a similar picture to that described in the previous paragraph. In the Northern area, sites are concentrated between 0 and 100 min (see Fig. 17c in the SM). For archaeological sites in the Central area, the bulk of the sample is between 400 and 500 min. When calculating this value for the cells surrounding the sites, based on GEOLCm, the same trend persists (Fig. 17d in the SM). Just as was done with watercourses and potential wetland areas, the visible surface with geological potential has also been calculated using the GEOLV variable. These surfaces are again larger for sites in the Northern area than for those in the Central area (refer to Fig. 15a and Table 18).

Table 18 Statistical summaries of the values in each study area for the GEOLV variable

After completing the abiotic factors, the results for the biotic factors will be analysed. This group comprises only two major factors: the areas of land suitable for hunting goats and for hunting deer. Within each of these, travel costs in minutes (CPFPCGc and CPFPCDc) and exploitable surface area in ha (CPFPCGs and CPFPCDs) have been calculated. The travel cost for the CPFPCGc variable shows that in the Central area, sites are mainly around 1 min away, while in the Northern area, they concentrate between 0 and 30 min. In the latter case, values are lower and more concentrated. In the case of CPFPCDc, it is challenging to make a comparison because the values are extremely low: almost all sites are at values of 0 (see Fig. 17e-f in the SM).

Regarding the exploitable surfaces for the CPFPCGs variable, it was found that in the Central area, there is a trend towards low surfaces, close to zero. For sites in the Northern area, however, values are higher and there are larger exploitable areas for hunting these herbivores. For the CPFPCDs variable, exploitable areas are larger overall, but the trend differs from the previous variable: sites in the Central area have a greater extent than those in the Northern area (see Fig. 15b-c).

Once the abiotic and biotic constraints have been studied, the group of other conditioning factors will be compared. Within this group are the variables related to visibility, insolation, proximity to least cost paths and potential wind. The first of these is the variable related to the visual catchment area from each of the sites in each zone (VISC, calculated in ha). This variable can provide an idea of visual control from the sites and can be complemented by visibility from the rest of the terrain to each of the sites through the VISZ variable. For both zones, it is observed that sites have low values for the VISC variable (refer to Fig. 15d).

The VISZ variable, based on the surface from which each of the sites can be seen (in ha), gives results very similar to those of the previous variable (Fig. 15e). Some archaeological sites see more surface area than the area from which they are seen, and vice versa. Site-specific data could indicate whether some of these sites have clear visual control over certain areas or, on the contrary, were located so as to be seen from different parts of the territory, acting as reference points in the landscape. This study has not addressed the individual analysis of each of the sites because the aim is to study the general trend of each area.

The next variable to be evaluated is visual prominence, measured through VISPR. The higher the value of this variable, the more visible the archaeological site is. In neither zone are the sites associated with high values; on the contrary, they tend to have low values. Nevertheless, the sites with higher values are in the Northern area rather than the Central area. The mean visual prominence of the cells adjacent to each of the sites, using the VISPRm variable, was also considered, confirming practically the same trend as the previous variable (Fig. 17g-h in the SM).

Fig. 15 Boxplots comparing the results of archaeological sites in Central area and Northern area for some variables

The sites in both study zones are located near potential transit routes, analysed with the LCPC variable (refer to Fig. 17i in the SM). When calculating the same variable for cells adjacent to the sites, through LCPCm, similar results were found (Fig. 18a in the SM). These results seem to indicate that there is a relationship of proximity between archaeological sites in both areas and potential transit routes.
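The adjacent-cell variables used throughout this section (WETm, GEOLEm, LCPCm and similar) summarise the raster values around each site rather than at the site cell itself. The following is a minimal sketch of that idea, not the procedure used in this study: it assumes a 3 × 3 neighbourhood that excludes the site cell, and the function name `adjacent_mean`, the grid values and the use of `None` for NoData cells are all hypothetical.

```python
from statistics import mean

def adjacent_mean(grid, row, col):
    """Mean of the (up to) eight cells surrounding grid[row][col].

    Assumption: the neighbourhood is the 3 x 3 window minus the site cell.
    """
    values = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue  # skip the site cell itself
            r, c = row + dr, col + dc
            # Ignore neighbours that fall outside the raster
            if 0 <= r < len(grid) and 0 <= c < len(grid[r]):
                if grid[r][c] is not None:  # None stands in for NoData
                    values.append(grid[r][c])
    return mean(values) if values else None

# Hypothetical travel-cost raster (minutes), with one NoData cell
cost_grid = [
    [12.0, 10.0, 9.0],
    [11.0, 4.0, 8.0],
    [None, 6.0, 7.0],
]
site_mean = adjacent_mean(cost_grid, 1, 1)    # site in the centre cell
corner_mean = adjacent_mean(cost_grid, 0, 0)  # site on the raster edge
```

Comparing a site value with its neighbourhood mean in this way indicates whether a result depends on the exact cell in which a site was recorded.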

The next variables to be studied are those related to potential insolation. Among these is TOTINS, which, in broad terms, is composed of direct and diffuse insolation, both analysed below. The results show values located in medium and medium–high indices, and the distribution for each area is very similar. In the calculation of the average of the cells adjacent to the sites, using TOTINSm, practically the same distribution pattern was confirmed (see Fig. 18b-c in the SM). Another variable related to potential insolation is direct insolation, recorded as DIRINS. It has been found that the sites in both zones tend to have medium or medium–high indices. The calculation of direct insolation in the cells near each site, through DIRINSm, yields similar results, although the archaeological sites in the Northern area have slightly higher values (Fig. 18d-e in the SM). The last of the variables related to insolation is DIFINS, where higher values are found in the sites of the Central area. This result is reinforced when calculating diffuse insolation in the cells near the archaeological sites using DIFINSm (Fig. 18f-g in the SM).

The final set of variables examined pertains to the influence of wind exposure. Higher values in these variables indicate greater susceptibility to wind exposure, whereas lower values suggest reduced exposure. For the WIND variable, a noteworthy pattern has emerged. Archaeological sites in the Northern area display moderate to low values in comparison to what would typically be expected in their geographic region. Conversely, in the Central area, the sites exhibit higher values (refer to Fig. 18h in the SM). This suggests that the sites in the Central area may be located in spaces that are more exposed to prevailing winds, possibly on slopes or in positions directly facing these winds. Furthermore, it is important to note that both mountainous regions have wind exposure indices that align closely with what would be anticipated in random conditions within the respective areas. This alignment is substantiated when comparing the archaeological samples in each zone with random samples, as seen previously in this paper. An additional variable, WINDm, which calculates the mean of the dominant wind patterns in the cells surrounding the sites, reaffirms the same trends observed with the preceding variable (see Fig. 18i in the SM). In summary, these results suggest that Palaeolithic hunter-gatherer groups may not have prioritized protection from prevailing winds when selecting the locations for their settlements in those areas.
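The comparison between archaeological samples and random samples mentioned above can be illustrated with a simple Monte Carlo test. This is only a sketch under stated assumptions and is not the statistical test applied in this study: it compares the median of the site values with the medians of random samples of the same size drawn from the background cells, and the function name `monte_carlo_p` and all values below are hypothetical.

```python
import random
from statistics import median

def monte_carlo_p(site_values, background_values, n_sim=999, seed=42):
    """Two-sided Monte Carlo p-value for the median of the site values.

    Counts how often a random sample of background cells deviates from the
    background median at least as much as the observed site sample does.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    bg_median = median(background_values)
    observed_dev = abs(median(site_values) - bg_median)
    extreme = 0
    for _ in range(n_sim):
        sample = rng.sample(background_values, len(site_values))
        if abs(median(sample) - bg_median) >= observed_dev:
            extreme += 1
    # Add one to numerator and denominator so the p-value is never zero
    return (extreme + 1) / (n_sim + 1)

# Hypothetical wind-exposure index for every cell of a study area
background = [float(v) for v in range(1000)]
# Hypothetical site samples: one close to the background median, one extreme
typical_sites = [480.0, 490.0, 495.0, 499.0, 500.0,
                 501.0, 503.0, 505.0, 510.0, 520.0]
exposed_sites = [float(v) for v in range(990, 1000)]

p_typical = monte_carlo_p(typical_sites, background)
p_exposed = monte_carlo_p(exposed_sites, background)
```

A large p-value, as for `typical_sites`, corresponds to the situation described for wind exposure, where site values are indistinguishable from random locations. A small one, as for `exposed_sites`, would indicate a variable that sets the sites apart from the background.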

Discussion

The investigation of Palaeolithic archaeological sites in Northwestern Iberia has provided valuable insights into the factors influencing their location. Absolute altitude emerged as a pivotal variable in determining site placement in both the Northern Mountain ranges and the Central Mountain ranges. This revelation underscores the importance of considering altitude in future research. However, it is crucial to approach this finding with caution. While altitude is evidently a fundamental factor, it does not imply a universal preference for high, medium, or low-altitude sites. Instead, the pattern of altitude-based site distribution is closely aligned with the specific characteristics of each region, a pattern consistent in both study areas. Furthermore, it is essential to mention that in colder periods, higher altitudes might have been less habitable (Viana-Soto and Pérez-Alberti 2019).

Another significant factor shaping site locations is the cost of travelling to potential wetland areas, primarily in the Northern area, and the cost of reaching potential watercourses, which holds importance in the Central area. These hydrology-related variables prove instrumental in predicting archaeological site locations. As with altitude, proximity does not consistently define the site distribution pattern, as sites can also be situated in intermediate zones. This study not only reaffirms the importance of factors such as altitude, wetland areas and watercourses, as mentioned in prior literature, but also highlights other variables with statistically significant differences worth considering. These variables include potential geology and diffuse insolation, particularly in the Northern area, and aspect, specifically in the Central area. The analysis reveals general trends within the samples of archaeological sites in each region, but it is essential to recognize that examining individual sites might uncover distinctive characteristics related to their chronology and functionality.

The primary focus of this study has been the examination of general trends in the location of Palaeolithic sites across the landscape, emphasizing what can be categorized as primary factors or first-order effects. Secondary factors, particularly those related to cultural and social considerations, have not been the central theme of the current analysis. These so-called second-order effects are often associated with the spatial relationships between sites (Nakoinz and Knitter 2016) and their possible cultural or social implications (Bevan et al. 2013). Given the importance of these secondary factors in shaping site distribution, future research will require a more comprehensive examination of the spatial relationships among archaeological sites, delving into the potential cultural and social influences that play a role in their placement.

In the Northern area, a recurring model of site occupation revolved around the notion that shelters, near which Palaeolithic sites were located, functioned as landmarks in the landscape. These sites might serve as prominent zones in their immediate vicinity, within approximately 0 to 100 m. Beyond this range, these characteristics tend to fade. In this sense, sites in the Cantabrian region have been identified as landmarks and interpreted as places that allowed them to take part in their social construction (García-Moreno 2013).

The study explored accessibility to sites within both the immediate and medium-range environments. Accessibility has been a major factor in explaining settlement patterns in other regions of Iberia (Mas et al. 2018; Fano et al. 2022). In the immediate environment, the findings of the present study suggest a model of relatively poor accessibility, potentially adapted to the mountainous terrain. However, the overall trend in both study regions, considering regional characteristics, indicates good accessibility. In the medium-range environment (15–45 min), two models emerged: one revealing better accessibility in the Northern area and the other indicating poorer accessibility in the Central area.

Notably, potential catchment zones, such as those for goats and deer, appeared statistically less significant in site location choices in both study areas. However, their relevance varied between the Northern area, where proximity to potential goat capture zones seemed significant, and the Central area, where proximity to deer exploitation areas took precedence due to larger exploitable surfaces.

The relationship between Palaeolithic archaeological sites and transit zones, a notable criterion in previous research (Ramil Rego and Ramil Soneira 1996; López Cordeiro 2015), was found to be less significant than expected when compared to random conditions. This suggests the necessity for a more focused investigation in subsequent research, in agreement with prior findings (Díaz Rodríguez 2017; Díaz-Rodríguez et al. 2023).

With regard to insolation, most sites tended to be located in areas with medium to high levels of insolation, with few significant deviations from randomness. One exception was diffuse insolation in the Northern Mountain ranges, where this variable effectively predicted site locations, implying that in regions with greater topographical variability, insolation plays a more crucial role. Compared with results from other regions, such as the Late Magdalenian and Azilian of the Cantabrian region, the analysis of potential insolation at archaeological sites there revealed that it does not appear to have been a determining factor in site selection. However, it is possible that during specific seasons, hunter-gatherers may have favoured sites with high insolation among those meeting other essential criteria (García-Moreno 2015).

An additional variable that emerged in the literature was shelter from dominant winds (Ramil Rego 1989/1990). The analysis indicated that sites in the Northern Mountain ranges were typically located in areas with medium to low wind exposure values, suggesting a preference for more sheltered settings. The opposite holds in the Central Mountain ranges, where sites show higher exposure values.

Regarding the selected variables, it has been found that some of them provided similar results, often because their creation was based on the same underlying principles. For example, variables derived from Euclidean distance and travel time costs demonstrated statistically similar results in their comparisons. This redundancy should be taken into account when including such variables in further analysis.
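One common way to flag this kind of redundancy before further analysis is to compute a rank correlation between paired variables, such as a Euclidean distance and its travel-time counterpart. The sketch below is illustrative only and was not part of this study: the Spearman implementation is written out in plain Python, and the site values assigned to the GEOLE-like and GEOLC-like lists are hypothetical.

```python
from statistics import mean

def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the two rank vectors.

    Raises ZeroDivisionError if either variable is constant.
    """
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical values for five sites: distance in km and travel time in min
geole_km = [0.2, 0.8, 7.5, 9.1, 5.0]
geolc_min = [15.0, 60.0, 410.0, 480.0, 300.0]
rho = spearman(geole_km, geolc_min)
```

A coefficient close to 1 or -1 would suggest that the two variables carry largely the same information, in which case retaining only one of them in a predictive model is usually sufficient.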

In the Northern and Central Mountain ranges of Northwestern Iberia, landscape dynamics such as river valley erosion and sediment deposition have significantly influenced the archaeological record. Understanding these processes is essential for interpreting the distribution and preservation of Palaeolithic sites, and landscape dynamics must therefore be considered when studying settlement patterns in this area. Previous studies, such as the work of Dimuccio et al. (2023) in Leiria, Portugal, have provided valuable insights into the preservation potential of Palaeolithic sites. That work offers a comprehensive perspective on the relationship between landscape evolution and site preservation, which could similarly be applied to the study areas in Northwestern Iberia.

When comparing the analyses of this study with other neighbouring areas, such as the Cantabrian region or Western Iberia, notable differences become apparent. In a region adjacent to this study area, Eastern Cantabria, it was documented that sites inhabited during the Upper-Final Magdalenian were located either on predominantly flat terrain or in rocky areas (García Moreno 2010). In other areas of Western Iberia studied within the context of the Middle Palaeolithic, archaeological sites have been found at lower elevations and closer to riverbanks. This pattern appears to influence the availability and exploitation of lithic raw materials (Cascalheira et al. 2022). Finally, the results reported for the Monforte de Lemos basin, in Northwestern Iberia, show that the main predictor variables are elevation, slope, the cost to potential hydrology, the cost to wetland areas and visual prominence (Díaz-Rodríguez et al. 2023). These findings underscore the importance of considering the specific characteristics of each region when interpreting prehistoric settlement patterns.

Such comparative insights not only elucidate regional variations in site selection but also contribute to a broader understanding of the adaptive strategies employed by Palaeolithic populations across different environmental contexts. The observed similarities and differences between regions may reflect a complex interplay of environmental factors, such as topography, hydrology and resource availability, which influenced the subsistence strategies and mobility patterns of prehistoric groups. Additionally, cultural factors, including technological innovations, social organization and symbolic behaviours, likely played a role in shaping the spatial distribution of settlements. For example, variations in site location could indicate specialized adaptations to local ecological niches or the transmission of cultural practices over time.

In summary, these findings provide valuable insights into the multifaceted factors that shape the location of Palaeolithic archaeological sites in Northwestern Iberia. These results highlight the need for region-specific approaches when examining occupation patterns and underscore the significance of a nuanced understanding of the intricate interplay of these factors.

Conclusion

In summary, the theoretical model proposed in the existing literature has been rigorously tested, demonstrating the utility of statistical analysis in evaluating a comprehensive set of 50 variables within two regions that exhibit some orographic similarities and are hypothesized to have been occupied during overlapping broad chronological periods, including the Upper Palaeolithic and Epipaleolithic. However, it is important to note that the lack of absolute dating makes precise chronological comparisons challenging, and this hypothesis is based on relative dating and stratigraphic correlations. The findings not only illuminate details of site placement but also underscore the regional distinctions that challenge the notion of a uniform occupation pattern.

In brief, the sites within the Central Mountain ranges are predominantly situated in high-altitude areas with gentle slopes. They are positioned near wetland zones, in intermediate settings concerning potential hydrology, with relatively low visual prominence. These sites typically lack wind protection, orient towards the Southeast, lie in proximity to potential transit routes and are easily accessible from nearby areas. However, access from medium-range environments is comparatively challenging. These sites benefit from moderate to high insolation levels and possess a moderate topographical prominence index. Additionally, they offer extensive nearby areas suitable for deer hunting. Furthermore, they are located at medium distances from geological potential zones (refer to Table 6 in the SM).

Conversely, the sites in the Northern Mountain ranges are primarily found in mid-altitude regions with gentle slopes. They are located close to wetland areas, in intermediate potential hydrology environments, with relatively low visual prominence. These sites exhibit limited visibility but are sheltered from dominant winds. They are oriented to the Southwest, situated near potential transit routes and readily accessible. Furthermore, they enjoy medium to high levels of insolation, with diffuse insolation becoming a significant variable in areas with higher values. They feature a moderate topographical prominence index and offer extensive nearby areas suitable for goat hunting. These sites are positioned in proximity to geological potential zones (refer to Table in the SM).

These findings make it possible to both validate and challenge hypotheses grounded in the theoretical model used to explain the location of Palaeolithic sites in Northwestern Iberia. While some hypotheses have found support, others warrant further examination and refinement. This methodology, integrating Geographic Information Systems (GIS) and spatial statistics, has proven effective in empirically testing these initial hypotheses, turning conjecture into empirically tested results. Although researchers can never definitively ascertain the thoughts of ancient hunter-gatherer communities, this work serves as a foundation for future research into Palaeolithic settlement patterns.