Introduction

Over recent decades, policymakers have steadily increased their use of urban-rural classifications for planning, assessment and monitoring of changes across territories. In Europe, these classifications serve in different programmes (e.g. the EU implementation of Sustainable Development Goals or Cohesion and Rural Development Policies) to identify candidate territories for specific initiatives. Urban and rural areas are subject to scientific studies and policy interventions related to industrialisation, digitalisation, green transition, resilience, demographic changes or access to services. Moreover, classifications allowing reproducibility over time, as well as concrete diagnostic features, are needed to measure and monitor urbanisation processes and rural development. Therefore, the distinction between urban and rural areas is a key issue associated with development policies and regional planning.

This paper builds on the two pillars of any territorial classification: the selection of indicators and methods and the identification of meaningful spatial units. The existing scholarly work has primarily focused on administrative geographies. In contrast, this paper adopts labour market areas (LMAs), which are functional areas derived from travel-to-work flows and characterised by commuting self-containment; this allows the entire population to be represented as a community-based network (OECD, 2020). In this respect, LMAs constitute a multi-level geography. At the micro level, individual travel-to-work patterns describe factual spatial practices. At the macro level, local and national administrations influence territorial organisation. For this reason, LMAs provide solutions for policies aimed at developing endogenous potential.

There is a strong need to characterise these spatial units but a proper urban-rural classification which enables systematic comparisons among LMAs is missing. Moreover, to the best of our knowledge, a territorial functional classification of LMAs is not currently available. The claim of this paper is that geography based on commuting self-containment, linked to a hierarchical classification comprising both urban-rural and functional levels, will improve policy actions. In this approach, the first essential layer is geography. LMAs locate communities that live and work in delineated territories through the self-containment of commuting flows. The second layer is the urban-rural classification component that characterises such communities. Finally, the third layer, the functional component, sheds light on daily community activities. The case study is the Italian territory; however, the classification is easily replicable in other contexts.

Background

In the scientific literature, attempts to define urban and rural spaces start from measurable definitions. In many cases, the focus is on rural areas because they are widely recognised as needing policy attention (Nelson et al., 2021). Conversely, some scholars concentrate on urban regions (Duranton, 2021) because, among other reasons, an increasing share of the population lives in such areas. However, scientists and stakeholders are aware that simple binary classifications are inadequate to represent intrinsic territorial variety (Gillen et al., 2022; Wandl et al., 2014). In this context, problems arise in selecting representative indicators, statistical methods and spatial units.

Choice of indicators in urban-rural classifications

Since the earliest studies, population demography has played a central role in delineating urban–rural areas (OECD, 1996). Independently of statistical and geographical methods, population-based indicators are always, directly or indirectly, considered. However, different indicators pertain to different scopes: if the classification highlights territorial transformations, then dynamic measures are used; in contrast, structural classifications involve population density and counts. For example, in social studies dealing with internal migrations or depopulation phenomena, both population dynamics and growth are used (Molestina et al., 2020). Additionally, population composition by age class highlights distinct urban-rural demographic trends (Bański & Mazur, 2016). In contrast, population counts and densities provide a structural portrait of the spatial unit in a given moment. Standardised and abstract geometric grids have brought a new paradigm, allowing comparability regardless of the spatial unit under study. This is now the standard way to compare areas (Dijkstra et al., 2021; Taubenböck et al., 2022; van Eupen et al., 2012; Vanhatalo & Partanen, 2022).

However, solely population-based classifications are unable to capture the complexity and heterogeneity of territories. The indicators which are recognised as being able to better discriminate between urban and rural areas pertain to environmental and economic dimensions.

As the natural environment influences human activities, urban and rural spaces show significantly different interactions with natural resources. Land cover indicators are, by and large, the most popular. These indicators refer to specific classes of land cover depending on their scope: measures of built-up areas characterise urban space (Aubrecht et al., 2016; Cieślak et al., 2020) and measures of agricultural activities are mainly used for rural areas (Gonçalves et al., 2017; Molestina et al., 2020; Öztaş, 2021). To highlight urban-rural contrasts, land cover approaches also use square grids (van Eupen et al., 2012; van Vliet et al., 2019). Moreover, to address specific social policy needs, properly designed indicators of land cover or landscape dynamics are used (Moreira et al., 2016; Serra et al., 2014).

The economic dimension in territorial classifications largely aims at capturing the functional type of the spatial unit. For this reason, analyses generally identify the sectors of economic activity that dominate the socio-economic structure of territories or look for their diversification (Gonçalves et al., 2017; Laurin et al., 2020). Next to these, income/wages and employment/unemployment are, by far, the most investigated phenomena, given their significantly different behaviour in urban and rural areas (Öğdül, 2010; van Eupen et al., 2012; Wineman et al., 2020).

Besides the natural environment and the economic dimension of territorial units, accessibility to services plays a central role in marking the contrast between urban and rural areas. The lower the access to services and resources, the greater the degree of remoteness. Remoteness mainly characterises fragile rural areas. Moreover, unlike the abstract notion of rurality, accessibility is a measurable concept and is therefore easier to quantify and implement. Structural location indicators measure the relationship between each spatial unit under study and the nearest unit offering several services, usually an urban centre. In the literature, the distances to urban centres (Caschili et al., 2015; Molestina et al., 2020) or travel times between spatial units (Bański & Mazur, 2016; van Eupen et al., 2012) are considered. Accessibility targets depend on the objectives pursued. In wellbeing studies and cohesion policies, access to education or health systems discriminates between central and remote areas (Zolin et al., 2017). Economic research, instead, focuses on labour-related topics like employment or commuting (Caschili et al., 2015).

Methods

Ease of implementation, use and interpretation drive method selection in policy contexts. Threshold-based methods are the pillars of urban-rural classifications. International organisations and official statistics in the EU rely on fixed thresholds on population grids (Dijkstra & Poelman, 2014; Eurostat, 2019). The choice of thresholds is detailed in Dijkstra et al. (2021).

Starting from this setting, the scientific literature has added more dimensions of territorial human presence and made use of more sophisticated reasoning. Öztaş (2021) simply ‘superimposes’ (p. 461) the share of population living in rural areas, population density, distance to urban centres and employment in agriculture. Using specific thresholds for each indicator, the author establishes an urban-rural classification of Turkey. Controlling for land cover classes, Wandl et al. (2014) use population density thresholds to delineate ‘territories-in-between’ across Europe. Using Global Human Settlement built-up grid data (Footnote 1), van Vliet et al. (2019) use thresholds to determine which grid cells are predominantly rural or predominantly urban.

In the presence of many indicators, a data reduction method is usually necessary. Zhao et al. (2019) weight different indicators related to population density, accessibility to services, occupation and natural space to derive a continuous composite indicator of rurality. This indicator is then used to define the urban-rural classification. Their index of rurality makes equal distributional assumptions over the four domains. Meanwhile, van Eupen et al. (2012) use principal components analysis (PCA) to combine more than 30 variables, including population density, land cover and accessibility indicators. The urban-rural classes in Europe are defined in terms of data-driven thresholds, i.e. the average and standard deviation of each principal component. Moreover, in order to account for the diversity of different environmental zones, these thresholds are area-dependent.

By means of a rather complex procedure, Bański and Mazur (2016) develop a territorial classification of Poland combining dynamic, structural and locational approaches. Fixed thresholds are applied to different indicators and the final classification is a cross-classification of the three approaches.

Taubenböck et al. (2022), using both administrative and grid-based figures, apply thresholds to population, building density and share of building types in Germany. They combine different thresholds used across the world, to define the degree of urbanisation in a probabilistic manner.

Besides thresholds, data-driven methods (multivariate statistical methods) are employed to derive or validate territorial classifications. Caschili et al. (2015) use PCA to derive a continuous composite index of rurality, further categorised to classify the territory of Sardinia, Italy. Moreira et al. (2016) use a clustering approach, namely k-medoid, to classify municipalities in the Lisbon Metropolitan Area in Portugal. Gonçalves et al. (2017) apply a combination of PCA and partitioning around medoids to identify classes of peri-urban areas. Fiaschetti et al. (2021) use affinity propagation in order to identify ‘regional typologies’ in Europe.

Although data-driven methods might appear to be more appealing, they depend heavily on the availability, quality and accessibility of data (Fiaschetti et al., 2021). Consequently, their temporal and spatial extension and harmonisation are generally questionable. In practical applications, threshold-based methods are often preferred.

Data and methods

The spatial unit

Most territorial classifications use administrative units as basic spatial elements (Footnote 2). Nonetheless, administrative units present some disadvantages: they are derived from historical or natural boundaries; they are not comparable in terms of structural, economic or socio-demographic features; they do not account for the presence of agglomeration economies or accessibility (Grillitsch et al., 2021; Viñuela et al., 2014). Moreover, given their variable size, they are also known to contribute to the modifiable areal unit problem (MAUP) in spatial analyses (Budde & Neumann, 2019; Openshaw & Taylor, 1979). These drawbacks could be avoided by using functional units.

LMAs are functional regions based on travel-to-work data and are aggregations of municipalities where the bulk of the population lives and works. They share, by construction, a common minimum level of commuting self-containment, leading to comparable spatial units that make up the entire national territory. The number of municipalities in LMAs approximates the home-to-work network size: the larger the network, the easier it is to commute larger distances, especially in the presence of transportation infrastructure. LMAs thus exhibit a built-in capacity to address accessibility issues. Moreover, being functional areas, LMAs reduce MAUP distortions. For example, by analysing French employment areas, Briant et al. (2010) show that LMAs’ size only slightly alters their economic geography analyses, while shape is almost irrelevant.

In Italy, LMAs have proved their usefulness for regional labour and industrial policies, as well as for planning purposes, especially in transportation systems (De Montis et al., 2011). From the viewpoint of policy development, especially in small or sparsely populated regions like rural areas, LMAs are able to capture socio-economic interactions at the local level and represent local economies. From a planning viewpoint, as the understanding of mobility and transport connectivity is crucial to locate essential services (Tran & Draeger, 2021), LMAs enable better service provision (Footnote 3).

LMAs, originally based on Coombes et al. (1986), have recently been developed further at the international level (Ichim et al., 2023; OECD, 2020) (Footnote 4). Various EU countries have adopted the European LMA method for national policy design (Footnote 5), paving the way for a harmonised and comparable geography at EU level (Eurostat, 2020).

In Italy, there are 610 LMAs stemming from travel-to-work flows from the 2011 population census. Their size varies from two to 174 municipalities (see Fig. 1a). There are vast LMAs with several million inhabitants, corresponding to the main cities (Rome, Milan and Turin) and large conurbations next to Milan (Como, Lecco and Bergamo). On the other hand, in isolated areas characterised by small settlements (e.g. along the Apennine chain or in remote areas on the two main Italian islands of Sicily and Sardinia), LMAs consist of a few municipalities.

Fig. 1. (a) Distribution of municipalities in Italian labour market areas and (b) their commuting intensity. Source: Authors’ elaboration on 2011 population census

Indeed, LMAs adapt their size according to the strength of socio-economic linkages. For each LMA, Fig. 1b shows commuting intensity—i.e. the percentage of commuting flows between different municipalities over the total number of flows. Northern Italy shows high commuting intensities, which correspond to strong connections between municipalities; conversely, in non-core and remote areas, where flows are mostly within municipalities, the indicator exhibits lower values. Although large, the Rome LMA does not show a high value of commuting intensity because its municipalities are connected to the capital city but not with each other.
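The commuting intensity indicator described above can be illustrated with a minimal Python sketch. The function name, the input layout (a dictionary keyed by origin and destination municipality) and the commuter counts are assumptions made for illustration only; they are not taken from the census data or from the authors' implementation.

```python
def commuting_intensity(flows):
    """Percentage of commuting flows that cross municipal borders.

    `flows` maps (origin, destination) municipality pairs to commuter
    counts; pairs with origin == destination are flows that stay within
    a single municipality.
    """
    total = sum(flows.values())
    if total == 0:
        return 0.0
    between = sum(n for (o, d), n in flows.items() if o != d)
    return 100.0 * between / total


# Hypothetical three-municipality LMA (made-up counts, not census data).
example_flows = {
    ("A", "A"): 600, ("B", "B"): 250, ("C", "C"): 50,
    ("A", "B"): 40, ("B", "A"): 35, ("C", "A"): 25,
}
intensity = commuting_intensity(example_flows)
print(f"{intensity:.1f}%")  # 100 of 1,000 commuters cross a border: 10.0%
```

In this toy LMA most flows stay within municipalities, so the indicator is low, as described for non-core and remote areas.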

Data

Geostat grid

The urban–rural component relies on population grid data, i.e. worldwide geographically referenced statistics based on a coordinate system of grid cells. A population grid is composed of square cells, each containing population counts. Being independent of any administrative boundaries, grid cells enable spatially comparable classifications. They also have the advantage of being stable over time and may easily be combined to reflect specific areas. This paper makes use of the 2011 Italian census grid data (Footnote 6). Additionally, the Eurostat (2019) grid cells classification is used: contiguous grid cells with a population density of at least 1,500 inhabitants/km² and, collectively, more than 50,000 inhabitants identify high-density clusters (urban centres). Contiguous grid cells with a density of at least 300 inhabitants/km² and more than 5,000 inhabitants define urban clusters. The remaining cells are labelled as rural.
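The grid cell rules quoted above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the Eurostat or the authors' implementation: cells are assumed to measure 1 km², so that the cell count equals the density; the contiguity rule is not given in the text, so it is exposed as a parameter (here without diagonals for high-density clusters and with diagonals for urban clusters); any gap-filling step is omitted; and the toy grid is made up.

```python
from collections import deque


def find_clusters(pop, min_density, min_total, diagonal):
    """Return the (row, col) cells that belong to qualifying clusters."""
    rows, cols = len(pop), len(pop[0])
    steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if diagonal:
        steps += [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    seen, selected = set(), set()
    for r in range(rows):
        for c in range(cols):
            if (r, c) in seen or pop[r][c] < min_density:
                continue
            # Breadth-first search over neighbouring cells above the density.
            group, queue = [], deque([(r, c)])
            seen.add((r, c))
            while queue:
                cr, cc = queue.popleft()
                group.append((cr, cc))
                for dr, dc in steps:
                    nr, nc = cr + dr, cc + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and (nr, nc) not in seen
                            and pop[nr][nc] >= min_density):
                        seen.add((nr, nc))
                        queue.append((nr, nc))
            # "More than" follows the wording of the text above.
            if sum(pop[gr][gc] for gr, gc in group) > min_total:
                selected.update(group)
    return selected


def classify_cells(pop):
    """Label each cell H (high-density), U (urban cluster) or R (rural)."""
    high = find_clusters(pop, 1500, 50_000, diagonal=False)
    urban = find_clusters(pop, 300, 5_000, diagonal=True)
    return [["H" if (r, c) in high else "U" if (r, c) in urban else "R"
             for c in range(len(pop[0]))] for r in range(len(pop))]


# Made-up 3 x 4 grid of residents per 1 km2 cell (assumption).
toy_grid = [
    [20000, 18000, 400, 0],
    [15000, 900, 350, 10],
    [200, 4000, 50, 320],
]
labels = classify_cells(toy_grid)
for row in labels:
    print(" ".join(row))
```

High-density cells are labelled first, so a cell that qualifies for both cluster types is reported as H.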

In Italy, there are more than 300,000 grid cells and 33 per cent and 43 per cent of Italians live in high-density and urban clusters, respectively.

CORINE Land Cover (CLC)

The functional component relies on cartography from CORINE (European Union, 2018). Using a 75 per cent threshold rule, CORINE maps landscape patterns to a three-level hierarchical nomenclature. The first level (CLC1) comprises five main classes of land cover: Artificial surfaces (5.5%) (Footnote 7), Agricultural areas (51.9%), Forest and semi-natural areas (41.3%), and Wetlands and Water bodies (for the residual part). The second level (CLC2) contains 15 classes, whereas the third level includes 44 categories. Italian LMAs cover about 130,000 CLC3 patches. The number of CLC3 categories covering each LMA ranges from six (two compact LMAs) to 34 (Cagliari, showing the most variable landscape). The CORINE minimum mapping unit for status layers is 25 hectares; since the median LMA surface equals 39,000 hectares, CLC precision is considered sufficient for the purposes of our classification.

Since CLC1 does not sufficiently discriminate between classes and CLC3 classes are far too detailed, CLC2 level data are more suitable for our research objectives. The aggregation of scarcely present classes helps to achieve a more balanced land distribution. For Italy, the aggregation of Wetlands and Water bodies proves adequate. Figure 2 shows the CLC2 classes, the analysed classes \(C=\left\{{C}_{1},\dots ,{C}_{10}\right\}\) and the proposed functional classification.

Fig. 2. CLC2 classes (left), analysed aggregations (centre) and final proposed functional classification (right). Source: Authors’ elaboration on CORINE Land Cover 2018 data

A spatial overlay of LMA geography and CLC3 layers allows LMA land cover to be derived. Figure 3 shows CLC3 patches mapped as Built-up areas, Agricultural resources and Environmental assets classes. Heterogeneity in the land cover characteristics of Italian LMAs is overwhelming.

Fig. 3. Italian land cover and some labour market areas according to functional classes. Source: Authors’ elaboration on CORINE Land Cover 2018 data

Methods

The proposed functional urban-rural classification hierarchically exploits two primitive features to account for LMA territorial diversities: population density and land cover classification. The former allows the characterisation of spatial unit settlement types. The latter considers land cover classification as a pathway to identify underlying human activities, in order to infer their socio-economic dimension.

The urban-rural component relies on the spatial concentration of population through grid cells, therefore capturing the logic of agglomeration economies. This component makes use of the methodology adopted in the EU to classify NUTS3 regions (Eurostat, 2019) and it inherits its thresholds and nomenclature. Indeed, stakeholders and policymakers are already acquainted with this classification and its interpretation as it is included in European regulations. Moreover, the European NUTS3 classification takes advantage of sensitivity studies in EU countries.

Following Eurostat (2019), the set \(G\) of grid cells covering Italian territory is first divided into high-density (\(H\)), urban clusters (\(U\)) and rural (\(R\)) cells. Denote by \({G}_{i}=\left\{{g}_{i, 1},{g}_{i, 2}, \dots , {g}_{i, {n}_{i}}\right\}\) the set of \({n}_{i}\) grid cells covering the \(i\)-th LMA, \(i=1, \dots , N\), where \(N\) is the total number of LMAs. Denote further by \({P}_{i, j}\) the number of residents in the \(j\)-th grid cell of the \(i\)-th LMA, \(j=1,\dots , {n}_{i}, i=1, \dots , N\).

The percentage of the resident population in \(H\) grid cells associated with each LMA is computed by:

$${P}_{i}^{H}=\frac{\sum_{j=1}^{{n}_{i}}{P}_{i,j}\,{I}_{H}\left({g}_{i,j}\right)}{\sum_{j=1}^{{n}_{i}}{P}_{i,j}}\times 100, \qquad i=1,\dots ,N \tag{1}$$

where \({I}_{H}\) represents the indicator function of high-density cells \(H\). Similar formulas hold for urban clusters whose population is denoted by \({P}_{i}^{U}, i=1, \dots , N\).

Finally, the LMA urban-rural classification may be operationalised as:

$$\begin{cases}\text{Urban}, & \text{if } {P}_{i}^{H}+{P}_{i}^{U}\ge 80\\ \text{Intermediate}, & \text{if } 50\le {P}_{i}^{H}+{P}_{i}^{U}<80\\ \text{Rural}, & \text{if } {P}_{i}^{H}+{P}_{i}^{U}<50\end{cases}$$
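As an illustration of Eq. (1) and of the thresholds above, the following Python sketch computes the two population shares for one LMA and assigns its urban-rural class. The function names, the input layout and the population figures are hypothetical; the fine-tuning step for very small LMAs and large cities is not covered.

```python
def urban_shares(cells):
    """Return (P_H, P_U) in per cent for one LMA.

    `cells` is a list of (population, label) pairs, one per grid cell,
    with label "H" (high-density), "U" (urban cluster) or "R" (rural).
    """
    total = sum(pop for pop, _ in cells)
    if total == 0:
        raise ValueError("LMA without residents")
    p_h = 100.0 * sum(pop for pop, lab in cells if lab == "H") / total
    p_u = 100.0 * sum(pop for pop, lab in cells if lab == "U") / total
    return p_h, p_u


def classify_lma(p_h, p_u):
    """Apply the 80 and 50 per cent thresholds to P_H + P_U."""
    share = p_h + p_u
    if share >= 80:
        return "Urban"
    if share >= 50:
        return "Intermediate"
    return "Rural"


# Hypothetical LMA: 6,000 residents in H cells, 1,500 in U, 2,500 in R.
example_cells = [(6000, "H"), (1500, "U"), (2500, "R")]
p_h, p_u = urban_shares(example_cells)
lma_class = classify_lma(p_h, p_u)
print(p_h, p_u, lma_class)  # 60.0 15.0 Intermediate
```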

A final fine-tuning avoids possible distortions due to extremely small surface LMAs and upgrades units in the presence of large cities.

Concerning the classification’s functional component, a method accounting for relative concentrations of land cover types with respect to the average national level seems more appropriate than comparisons with absolute values. To measure such concentrations, location quotient measures (\(LQ\)) are widely used in the literature (for a review of \(LQ\) usefulness, see Crawley et al., 2013). Moreover, for classification definitions, \(LQ\) is more suitable than spatial autocorrelation methods. Indeed, \(LQ\) compares each spatial unit to a common benchmark, thus focusing on individual features.

For each LMA, \(LQ\) compares the share of surface (measured in km²) covered by each class \({C}_{1},\dots ,{C}_{10}\) with the corresponding national share. Let \({x}_{ik}\) denote the surface covered by the k-th analysed class in the i-th LMA. Then \({LQ}_{ik}\) is defined as:

$${LQ}_{ik}=\frac{{x}_{ik}/{x}_{i.}}{{x}_{.k}/{x}_{..}}, \qquad i=1,\dots ,N, \quad k=1,\dots ,10 \tag{2}$$

where \({x}_{.k}\), \({x}_{i.}\) and \({x}_{..}\) represent the k-th analysed class surface in the country, the i-th LMA surface, and the total national surface, respectively. When \({LQ}_{ik}\) is greater than one, the k-th CLC class characterises the i-th LMA more than the national average does.
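A small worked example may help to read Eq. (2). The Python sketch below computes the location quotient matrix from a surface matrix; it assumes that the analysed classes cover each LMA completely, so that row sums give the LMA surfaces and column sums give the national class surfaces. The function name and the two-by-two figures are made up for illustration.

```python
def location_quotients(x):
    """Return the LQ matrix for surfaces x[i][k] (LMA i, class k)."""
    n, m = len(x), len(x[0])
    row_tot = [sum(row) for row in x]                             # x_{i.}
    col_tot = [sum(x[i][k] for i in range(n)) for k in range(m)]  # x_{.k}
    grand = sum(row_tot)                                          # x_{..}
    # A class absent from the whole country (col_tot[k] == 0) would
    # make its LQ undefined; this sketch assumes that does not occur.
    return [[(x[i][k] / row_tot[i]) / (col_tot[k] / grand)
             for k in range(m)] for i in range(n)]


# Two hypothetical LMAs of 100 km2 each and two land cover classes.
surfaces = [
    [30.0, 70.0],
    [10.0, 90.0],
]
lq = location_quotients(surfaces)
for row in lq:
    print([round(v, 3) for v in row])
```

In this toy country the first class covers 20 per cent of the total surface but 30 per cent of the first LMA, so its quotient there is 1.5; the second LMA is characterised by the second class, with a quotient of 1.125.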

The environmental footprint of each LMA is then defined by the \(C\) class that reaches the maximum \(LQ\) value, provided that \(LQ\ge 1\). Finally, to increase classification usability and unify land cover classes that potentially lead to similar end-use, land cover classes are labelled according to their functional classification, i.e. Built-up areas, Agricultural resources and Environmental assets. The first class is characterised by urban and industrial cover. The Agricultural resources class includes those LMAs where the land surface devoted to crops, pastures or arable land is significantly higher than the national mean level. Environmental assets LMAs share a higher concentration of forests, shrubs, open spaces, wetlands, marine or inland waters than the national average.
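The assignment rule just described can be sketched as follows. The class keys and their mapping to the three functional labels are hypothetical stand-ins: the actual correspondence between the ten analysed classes and the functional classification is the one given in Fig. 2, which is not reproduced here.

```python
# Hypothetical mapping from analysed land cover classes to functional
# labels; the real correspondence is the one shown in Fig. 2.
FUNCTIONAL_CLASS = {
    "urban_fabric": "Built-up areas",
    "industrial_units": "Built-up areas",
    "arable_land": "Agricultural resources",
    "permanent_crops": "Agricultural resources",
    "pastures": "Agricultural resources",
    "forests": "Environmental assets",
    "open_spaces": "Environmental assets",
    "water": "Environmental assets",
}


def functional_class(lq_by_class, benchmark=1.0):
    """Functional label of the class with the largest location quotient.

    Returns None when even the largest quotient is below the benchmark.
    """
    top = max(lq_by_class, key=lq_by_class.get)
    if lq_by_class[top] < benchmark:
        return None
    return FUNCTIONAL_CLASS[top]


# Made-up location quotients for one LMA.
example_lq = {
    "urban_fabric": 0.8, "industrial_units": 0.6,
    "arable_land": 1.4, "permanent_crops": 1.2, "pastures": 0.3,
    "forests": 1.1, "open_spaces": 0.9, "water": 0.0,
}
label = functional_class(example_lq)
print(label)  # Agricultural resources
```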

Results

A joint hierarchical use of urban-rural and functional primitives allows a flexible description of communities and their territories. In fact, high levels of population density do not necessarily correspond to completely anthropic spaces. Furthermore, areas characterised by low population density may not engage in agricultural activities. The first hierarchical level consists of three classes: predominantly urban (Urban), Intermediate and predominantly rural (Rural). Each class of this first level comprises three environmental classes: Built-up, Agricultural resources and Environmental assets, resulting in the nine-class classification, presented in Table 1.

Table 1 Percentage distribution of Italian labour market areas, their surface and population

Urban and Intermediate classes cover 21.6 per cent and 35.1 per cent of Italian LMAs, respectively. Conversely, the Rural class has a small population share (13.5%) but is well represented in terms of spatial units (43.3%). Population percentages clearly reflect the density requirements that define the classification; the Urban class shows the highest percentage (57.3%), followed by Intermediate and Rural LMAs. As for the environmental dimension, natural land cover extends over a larger area than artificial land cover, which is reflected in the small percentage of LMAs in Built-up (12.1%). The Agricultural resources class includes more than 50 per cent of LMAs, most of them in Intermediate or Rural classes. The Environmental assets class represents more than one third of Italian LMAs (36.7%) and is particularly present in the Rural class. In Italy, no LMA falls in the combination (Rural & Built-up).

Figure 4 shows our classification along with NUTS2 region borders. The leftmost panel presents Urban LMAs; these correspond to the largest Italian cities and those where high-density clusters dominate population shares. In Italy, only some of the major cities (e.g. Rome, Milan, Bologna and Naples), as well as towns with large harbours and infrastructure along the coasts, belong to the Built-up class. They are mainly concentrated in northern Italy (Lombardy being well represented) and in Campania. In the Lazio region, the Pomezia LMA is contiguous to the large LMA of Rome. The Pomezia LMA has petrochemical and pharmaceutical business clusters, whereas the Rome LMA shows administrative and governmental infrastructures. Urban LMAs in Apulia and the southern part of Sicily belong to the Agricultural resources class due to their very compactly shaped municipalities with high population density, surrounded by vast and completely uninhabited areas devoted to agriculture (olive trees or wheat crops). Because of its lagoon, Venice belongs to the Urban & Environmental assets class, like many LMAs on Italian coasts with natural resources (e.g. Cagliari).

Fig. 4. Spatial configuration of the functional urban-rural classification of Italian labour market areas (grey) and regional borders (black). Source: Authors’ elaboration

Intermediate LMAs (the central panel of Fig. 4) are the most complex and diversified; they cover medium-sized towns or networked small urban centres. Amongst the former, in Environmental assets, well-known examples are those located in the Alps (north of Italy) and hilly towns next to lakes or central national parks (e.g. the Umbria and Abruzzo regions). Networked small urban centres, belonging to the Built-up class, embody most of the LMAs in the Pianura Padana (Emilia-Romagna region). They mainly specialise in manufacturing activities: agri-food in Parma, leather and related products in Cesenatico, and machinery equipment in Correggio, Reggio nell’Emilia and Modena. Similarly, in the Friuli-Venezia Giulia region, the Intermediate & Built-up LMAs of Pordenone and Gorizia are devoted to wood and furniture sectors, whereas the economy in Udine is more diversified. However, the largest LMAs in these two regions present very different profiles. Parma, Modena and Reggio nell’Emilia have a smaller number of municipalities than Udine and Pordenone in Friuli-Venezia Giulia. Moreover, the latter are polycentric (11 core municipalities each; Footnote 8) and exhibit nearly double the commuting intensity of the LMAs in Emilia-Romagna.

Finally, Intermediate LMAs comprise all environmental features due to the simultaneous presence of large industrial clusters, intensive farming and natural reserves (e.g. the River Po delta and surrounding countryside). The rightmost panel shows LMAs in Rural classes. These comprise small villages in agricultural or mountainous areas (see Borgo Val di Taro in Fig. 3 for example). They are mostly present in Sardinia, along sandy shorelines, and on the eastern side of the Alps (Dolomite UNESCO sites) with a high concentration of pasture and rock. Additionally, these LMAs are located along the Apennine chain, a notably fragile and depopulated part of Italy with large forests and mountains (following a virtual arch shape from Liguria to Calabria).

Classification validation

Spatial validation

This section presents the results of analyses investigating classification coherence and robustness. Our urban-rural classification depends on shares of residents living in urban cells, \(H\) and \(U\). To analyse its responsiveness and gain further insight into the concentration of urban cells within each class, types and patterns of urban agglomerations are investigated (Tikoudis et al., 2022). Figure 5 illustrates the job density and the spatial structure of urban cells. Figure 5a shows job density distributions as the number of employed persons/km². There are three clearly distinct patterns. As expected, Urban LMAs display the highest average job density (135.8 jobs/km²). Intermediate labour markets halve this mean value and, on average, Rural LMAs yield fewer than 21 jobs/km². The Urban job density distribution suggests the existence of two subpopulations. The first, around the rightmost peak at 190 jobs/km², has a mean job density equal to 266 jobs/km² and a mean population of 443,000 inhabitants. Moreover, these LMAs show, on average, five core municipalities. This group includes the most populated Italian LMAs (Milan, Naples, Rome, Bergamo, etc.). The second group of Urban LMAs (leftmost peak at 70 jobs/km²) includes the surrounding medium-sized cities (Cagliari, Brindisi, Sanremo, etc.). Indeed, the average population in these LMAs is less than 118,000, their mean job density is 89 jobs/km² and, on average, they comprise only three core municipalities. Moreover, compared to the first group, their mean network size (number of municipalities) is much lower: 14, as opposed to 25.

Fig. 5. (a) Labour market areas’ job density (log-scale) distribution by urban-rural classification: red Urban, dashed grey Intermediate and dotted brown Rural; (b) box plot of spatial Shannon entropy by urban-rural classification; results for 511 labour market areas presenting at least one urban grid cell. Source: Authors’ elaboration on 2011 population census

To investigate LMAs’ spatial concentration, spatial entropy (Altieri et al., 2018) is used. Figure 5b shows the box plot of Shannon spatial entropy distributions for urban-rural classes; the first (Q1) and third (Q3) quartiles are the box limits, the median being the line between them. Urban LMAs show the highest median value, confirming that these LMAs present a homogeneous coverage in terms of urban grid cells. In contrast, Rural LMAs produce significantly lower values than the other two classes, indicating a more dispersed pattern of urban grid cells. Interestingly, both Urban and Intermediate LMAs show a wide range of patterns of urban grid concentration. Their spatial entropy indices vary from 0.05 to 1.04. Moreover, both classes include LMAs among the top 10 values of the entropy-based ranking: e.g. Milan, Naples and Varese in Urban or Cesenatico and Rapallo in Intermediate.

However, these large values portray different situations. Urban LMAs show extensive and compact patches of urban grid cells (see Fig. 6 for some examples). Such LMAs highlight functional relationships between coastal and inland territories (Naples), a network of physical nodes of development linked by transport routes (Milan and Varese), and urban spillover across neighbouring agricultural land (Rome). In contrast, in the Intermediate class, large spatial entropy values correspond to typical examples of littoralisation, i.e. the concentration of economic activities, population and settlements in coastal areas (Cesenatico), or of growth constrained by the physical environment (Rapallo, located between the sea and mountains). The highest values of both job density and Shannon entropy clearly characterise polycentric Urban LMAs. Such functional regions are able to create a network of central nodes, exchanging commuting flows that extend their socio-economic influence, even outside their boundaries (Milan, Como, Bergamo, etc.). Slightly lower values pertain to large monocentric LMAs (Rome) where the central node dominates other municipalities (Omnes Viae Romam Ducunt).

Fig. 6. High-density clusters, urban clusters and rural cells associated with labour market areas of Rome, Naples, Milan, Varese (Urban), Cesenatico and Rapallo (Intermediate). Source: Authors’ elaboration on Geostat 2011 grid data

The comparison of LMAs portrayed in Fig. 6 and their corresponding CLC classes (Fig. 3) shows a high correlation between the urban grid and Built-up class. Figure 6 illustrates the anthropic space compactness. However, it is only by means of the CLC lens that rural LMAs can be characterised. In fact, widely different landscapes are present outside Milan, Varese and Naples (Built-up), as well as in the Cesenatico and Rapallo littoralisation.

Heterogeneity is also present in the functional data. Although \(LQ\) is a powerful tool for coping with this matter, various issues need to be addressed. In particular, \(LQ\) is known to be sensitive to the choice of benchmark value, to distributional features (sparseness and extreme values) and to size effects.

Several studies have been carried out on the choice of benchmark values, which are commonly set equal to one. Depending on both region size and the level of aggregation of the observed characteristics, benchmarks ranging from one to three are in use. To deal with this arbitrariness, the literature suggests parametric and non-parametric statistical approaches, often based on normal distribution assumptions (O'Donoghue & Gleave, 2004). In our application, Kolmogorov–Smirnov tests reject the normality hypothesis for the \(LQ\) distributions. Thus, to evaluate classification robustness, a data-driven approach is implemented (Tian, 2013). For each \(C\) class, a bootstrap is used to estimate the third quartile of the \(LQ\) distribution; this estimated value is then used as a benchmark. Subsequently, each LMA is re-classified to the class where \(LQ\) achieves its maximum value and exceeds the bootstrap-estimated third-quartile benchmark of that class. Since the bootstrap estimates are close to one, this method leads to a similar classification.
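The data-driven benchmark described above can be sketched as follows. This is a minimal illustration and not the exact procedure of Tian (2013): the number of resamples, the seed, the use of the mean of resampled quartiles as the estimate, the quantile interpolation method and the treatment of LMAs whose largest \(LQ\) does not exceed its benchmark (returned as None here) are all assumptions, and the \(LQ\) values are made up.

```python
import random
import statistics

def third_quartile(values):
    """Third quartile (Q3) using the inclusive method of statistics.quantiles."""
    return statistics.quantiles(values, n=4, method="inclusive")[2]

def bootstrap_q3(values, n_resamples=1000, seed=0):
    """Bootstrap estimate of Q3: mean of Q3 over resamples drawn with replacement."""
    rng = random.Random(seed)
    size = len(values)
    estimates = [
        third_quartile([rng.choice(values) for _ in range(size)])
        for _ in range(n_resamples)
    ]
    return statistics.fmean(estimates)

def classify(lq_row, benchmarks):
    """Index of the class with the largest LQ if it exceeds that class's
    benchmark; otherwise None (this fallback is an assumption)."""
    best = max(range(len(lq_row)), key=lambda j: lq_row[j])
    return best if lq_row[best] > benchmarks[best] else None

# Toy LQ matrix: rows are LMAs, columns are land-cover classes (made-up values).
lq = [
    [1.8, 0.6, 0.9],
    [0.7, 1.5, 1.1],
    [0.9, 1.0, 1.6],
    [1.2, 0.8, 0.7],
    [0.5, 1.3, 1.0],
]
columns = [list(col) for col in zip(*lq)]
benchmarks = [bootstrap_q3(col) for col in columns]
labels = [classify(row, benchmarks) for row in lq]
print(benchmarks, labels)
```

Comparing `labels` with the labels obtained under a fixed benchmark of one gives a simple robustness check of the kind reported in the text.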

An additional weakness of \(LQ\) is its sensitivity to extreme values and sparsity. The former is relevant when location quotients appear as covariates in regression models, rather than when used for classification purposes. With regard to sparsity, a high percentage of structural zeros might indicate that the variables are not sufficiently aggregated. In our application, about 12 per cent of \(LQ\) matrix entries equal zero. Pastures (\({C}_{5}\), 3.4%), Permanent crops (\({C}_{4}\), 2.2%), Water (\({C}_{10}\), 2.2%) and Open spaces (\({C}_{9}\), 1.5%) show the largest percentages of zero values. Sparsity analysis shows that 86 per cent of LMAs have at least eight non-zero \(C\) classes represented. These figures suggest that sparsity is a negligible issue when analysing Italian LMAs.
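The sparsity figures above could be obtained with a summary along the following lines. This is only a sketch: the function name, the toy matrix and the choice of expressing per-class zeros as a share of all matrix entries are assumptions for illustration.

```python
def sparsity_summary(lq, min_nonzero):
    """Summarise zeros in an LQ matrix (rows: LMAs, columns: land-cover classes)."""
    n_rows, n_cols = len(lq), len(lq[0])
    n_entries = n_rows * n_cols
    # Zero entries of each class as a share of ALL matrix entries (assumed denominator).
    zero_share_by_class = [
        sum(1 for row in lq if row[j] == 0) / n_entries for j in range(n_cols)
    ]
    overall_zero_share = sum(zero_share_by_class)
    # Share of LMAs in which at least `min_nonzero` classes are present.
    well_covered = sum(
        1 for row in lq if sum(1 for v in row if v != 0) >= min_nonzero
    ) / n_rows
    return overall_zero_share, zero_share_by_class, well_covered

# Toy matrix with four classes (made-up values, not the paper's data).
toy_lq = [
    [1.2, 0.0, 0.8, 1.1],
    [0.9, 1.4, 0.0, 0.0],
    [1.0, 0.7, 1.3, 0.6],
]
overall, by_class, covered = sparsity_summary(toy_lq, min_nonzero=3)
print(overall, by_class, covered)
```

In the application described in the text, the threshold would be eight non-zero classes.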

A further concern about \(LQ\) is its sensitivity to size effects (Iglesias, 2021). Indeed, \(LQ\) shows increased uncertainty in small regions. Various statistics address this issue by assessing classification robustness. Pre-treatment of less represented surfaces is one way to address size effects. The adjusted location quotient (Pominova et al., 2022) allows the assessment of size effects; it measures the influence of adding a single surface unit on \(LQ\) values. Large differences between the original and adjusted counterparts suggest the existence of size effects. In Italy, only six LMAs (out of 610) showed absolute differences greater than one. They are mainly very small islands or seaside LMAs, with surfaces ranging from 10 km² to 51 km². Moreover, the small difference (2.3%) between the original and adjusted \(LQ\)-based classifications supports the robustness of the classification.
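The size-effect check can be illustrated with the sketch below. It assumes the standard surface-based location quotient (class share within the LMA divided by class share nationally) and one plausible reading of "adding a single surface unit", namely adding one unit to the class surface and to every total that contains it. The exact formula of Pominova et al. (2022) is not reproduced here, and all surfaces are made-up values in km².

```python
def location_quotient(s_class_lma, s_lma, s_class_nat, s_nat):
    """Share of a class within the LMA divided by its share nationally."""
    return (s_class_lma / s_lma) / (s_class_nat / s_nat)

def lq_shift_after_one_unit(s_class_lma, s_lma, s_class_nat, s_nat, unit=1.0):
    """Absolute change in LQ when one surface unit of the class is added
    (assumed reading of the adjusted location quotient)."""
    original = location_quotient(s_class_lma, s_lma, s_class_nat, s_nat)
    adjusted = location_quotient(
        s_class_lma + unit, s_lma + unit, s_class_nat + unit, s_nat + unit
    )
    return abs(adjusted - original)

# Hypothetical national surfaces: the class covers 3,000 of 300,000 km2.
S_CLASS_NAT, S_NAT = 3000.0, 300000.0

# Same class share (5%) in a very small LMA and in a large one.
small_shift = lq_shift_after_one_unit(0.5, 10.0, S_CLASS_NAT, S_NAT)
large_shift = lq_shift_after_one_unit(50.0, 1000.0, S_CLASS_NAT, S_NAT)
print(small_shift, large_shift)
```

Under these assumptions the very small LMA shows a much larger shift than the large one, which matches the pattern of small islands and seaside LMAs noted in the text.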

Economic validation

This section analyses statistical indicators reflecting the LMAs’ economic structure, to investigate classification efficacy. Such validation is based on LMAs’ share of employment. In this research, employment indicators explore three different dimensions: employment intensity, changing patterns of agriculture, and a distinctive Italian activity, tourism. The economic sectors which validate the classification might change over time or according to territorial contexts. The first dimension is addressed by LMA employees over population (Emp/Pop): the higher the employment intensity, the greater the regional economic development (both notably characterise urban centres). Moreover, modern agriculture covers a large percentage of land but has a low employment share. In Italy, the share of agricultural occupations (5.5%) is higher than the EU level (4.0%).Footnote 9 Furthermore, tourism has undeniable importance in the Italian labour market: at the national level, 12 per cent of employees undertake touristic activities. As reported by the United Nations World Tourism Organisation, Italy is constantly in the top 10 world tourist destinations, according to many rankings.Footnote 10 Indeed, its attractiveness is supported by the variety of its tourism, e.g. art and historical tourism (50 per cent of UNESCO monuments are in Italy), religious tourism (the Vatican City), mountain tourism (the Alps), and ‘sun and beach’ tourism (the Mediterranean Sea). The total share of employees in agriculture and tourism sectors (Sh_Agri_Emp and Sh_Touristic_Emp, respectively) measures these economic phenomena. Figure 7 shows these indicators by functional urban-rural classification.
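For concreteness, the three indicators can be written as simple ratios, as in the sketch below. The denominators are an assumption based on the indicator descriptions (population for Emp/Pop, total employees for the two sector shares), and the input figures are invented.

```python
def employment_indicators(employees_total, employees_agri, employees_tourism, population):
    """Return the three validation indicators for one LMA as fractions."""
    return {
        "Emp/Pop": employees_total / population,
        "Sh_Agri_Emp": employees_agri / employees_total,
        "Sh_Touristic_Emp": employees_tourism / employees_total,
    }

# Hypothetical LMA: 100,000 residents, 35,000 employees,
# of whom 1,750 work in agriculture and 4,200 in tourism.
example = employment_indicators(35000, 1750, 4200, 100000)
print(example)
```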

Fig. 7

Employment indicators by class: share of total economy employees over population (Emp/Pop), and total economy share of employees in tourism and agricultural sectors (Sh_Touristic_Emp and Sh_Agri_Emp, respectively). Source: Authors’ elaboration on 2017 official statistics business registers

Productive structures and infrastructures delineate artificial CLC areas. Built-up shows the highest shares of employees over population in both Urban (35.6%) and Intermediate (34.9%) LMAs, while for the Rural class, the same indicator is up to 14.5 percentage points lower. Indeed, many Rural LMAs are characterised by ageing index values and total age dependency ratiosFootnote 11 which are higher than the national average, highlighting depopulation processes. The discriminant power of the LMA classification is also supported by the ANOVA performed using Emp/Pop as the dependent variable: Agricultural resources LMAs show significant differences with respect to the Urban-Built-up class.
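The idea behind the ANOVA on Emp/Pop can be illustrated with a one-way F statistic computed from scratch. This is only a sketch: the class labels are taken from the text, but the values are invented, only two of the eight classes are shown, and the contrasts against the Urban-Built-up reference class reported in the paper are not reproduced.

```python
import statistics

def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA: between-group over within-group mean square."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.fmean(all_values)
    k = len(groups)       # number of classes
    n = len(all_values)   # number of LMAs overall
    ss_between = sum(
        len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups
    )
    ss_within = sum(
        (x - statistics.fmean(g)) ** 2 for g in groups for x in g
    )
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical Emp/Pop values (fractions) for LMAs in two classes.
emp_pop_by_class = {
    "Urban-Built-up": [0.36, 0.35, 0.37, 0.34],
    "Rural-Agricultural resources": [0.22, 0.25, 0.21, 0.24],
}
f_stat = one_way_anova_f(list(emp_pop_by_class.values()))
print(f_stat)
```

A large F value indicates that the class means differ by more than the spread of LMAs within each class would suggest; a p-value would additionally require the F distribution with (k - 1, n - k) degrees of freedom.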

In Italy, agricultural activities are not confined to a single urban-rural LMA class. Indeed, in Urban LMAs, the share of employees in agriculture ranges from 0.7 per cent (Built-up) to 7.1 per cent (Agricultural resources). New forms of urbanisation that extend the urban space towards the countryside modify the structure and role of cities. This extension contributes to urban and peri-urban agricultural development (Thebo et al., 2014). Urban and Intermediate LMAs in the Agricultural resources class show the same level of employees in this economic sector (7.1%). In fact, alongside Italian urban centres, territories of intensive or extensive agriculture devoted to market-oriented large-scale agricultural production are often present; this is the case with LMAs in Apulia and Sicily (highlighted in Fig. 4). However, when the eight LMA classes are ordered by their share of employees in agriculture, Rural-Agricultural resources LMAs reach the highest value (11.3%), whereas Rural-Environmental assets LMAs rank second (8.3%). The ANOVA on Sh_Agri_Emp confirms that agricultural activities are widespread across LMAs: only the Rural-Agricultural resources class shows Sh_Agri_Emp values significantly different from Urban-Built-up.

Tourism activities have large shares of employees in the main Italian centres (e.g. 23.7 per cent in the municipality of Venice, 16.2 per cent in Florence). However, when considering the corresponding LMAs, such shares drop, as Urban LMAs are quite large and nearby municipalities are less involved in tourism (17.2 per cent for Venice, 12.1 per cent for Florence). Moreover, the higher the degree of urbanisation, the greater the diversification of economic activities (Carlucci et al., 2020; Jacobs, 1969). As a result, both Intermediate-Built-up and Urban-Built-up LMAs show the lowest shares of employees in tourism sectors (7.9 per cent and 8.1 per cent, respectively). Environmental assets LMAs show a relatively high presence of employees in tourism and cultural sectors compared to other groupings: in Urban, Intermediate and Rural classes, the indicators equal 14.9 per cent, 15.1 per cent and 17.5 per cent, respectively. This evidence confirms that Environmental assets LMAs are closely linked to tourism; the ANOVA also indicates statistically significant differences between the Environmental assets classes and the others. The connection between tourism and rural areas (the former is considered to be an engine of development for the latter, through sustainable tourismFootnote 12) is also evident in Agricultural resources, where the share of employees in tourism activities ranges from 9.5 per cent in the Intermediate class to 11.6 per cent in the Rural class.

Concluding remarks

LMAs’ functional self-containment enables the identification of communities based on daily labour routines and relationships. Coupled with our classification, LMAs allow the formation of a coherent set of functional urban and rural areas stemming from a unique process and simultaneously accounting for environmental features. The combination of spatial units and classifications is a substantial contribution to the characterisation of communities and their territories. The classification represents a suitable framework for territorial policies and strategic planning.

Currently, in Italy, only NUTS3 regions have an urban-rural classification. However, this is known to be unable to detect many rural areas. In contrast, because LMAs are functional territorial units, they better delineate different areas. Moreover, the statistics which are particularly relevant at this geographic level—e.g. (un)employment rate—and information on LMAs’ industrial characteristics, are tools that further broaden and deepen knowledge of complex economic phenomena. Therefore, our classification enriches urban-rural indicators aimed at prioritising areas of local and central policy intervention.

In the urban context, this combination of geography and classification defines an integrated and networked urban system with well-developed agglomeration economies delivering standardised information across governing levels. More specifically, LMA geography could play an essential role in detecting barriers or inter-municipal cooperation, proving to be of further relevance for spatial planning agendas. As for the urban-rural component, it helps to identify systems with different development levels (urban or intermediate). The use of a well-known tool for this component supports its immediate application in urban planning contexts. In urban spaces, the environmental component helps to discover LMAs’ possible emerging vocations.

In rural areas, where asymmetries are sharper and linkages between municipalities are weaker, LMAs represent a way to aggregate small municipalities. Our classification supports the identification of rural communities and sheds light on possible economic activities, thus favouring evidence-based decisions on territorial management.

The use of standard, globally available harmonised data, methods and tools makes the classification easy to implement in other case studies and for other spatial units, although its components might be less interconnected. Indeed, the classification itself, via its cartographic representation, provides a first glimpse of socio-economic characteristics. Moreover, contrary to previously researched classifications solely concentrating on a single type of land cover, this proposal provides an exhaustive framework.

More granular land cover data are envisaged in further studies. In fact, many international projects are investigating the production of different, more detailed, land cover data.

A further extension could be the adoption of the same resolution level for both primitive features—i.e. a 1 km2 grid for the environmental dimension too. A finer resolution requires more detailed imagery, possibly interpreted aerial photographs, whose availability and quality may need assessment. Additionally, methods could address accessibility of services and resources more deeply. The use of specific information on extremely remote rural areas may help to locate labour markets where targeted policies are imperative. A dynamic framework could improve understanding of the determinants of territorial change.

Amidst growing concerns about increasing geographical inequalities in many aspects of life, this proposal seeks to contribute to a classification which would enable the characterisation of urban and rural populations and their spatial entities, providing tools to satisfy the heightened need for place-based policymaking and monitoring.