Introduction

Sustainable development—defined as “development that meets the needs of the present without compromising the ability of future generations to meet their own needs” (World Commission on Environment and Development 1987, p. 8)—appears to be the central challenge and the common goal of our time (Chen and Wu 2009; Wu 2013; Liu 2018; Opdam et al. 2018). Indeed, the United Nations 2030 Agenda for Sustainable Development has been adopted worldwide, guiding the actions of individuals, societies, and countries towards a total of 17 sustainable development goals (SDGs; da Silva et al. 2018; Sun et al. 2018; Wang et al. 2020). According to the Land Portal Foundation (2019), 30% of these SDGs are directly related to land use, such as “zero hunger” and “no poverty”. This suggests that optimizing land use is critical to supporting sustainable development (Wu et al. 2011; Gustafson 2013; Hu et al. 2015; Pohjanmies et al. 2017; Lin et al. 2019; Gao et al. 2020).

Sustainable land-use optimization is the process of optimizing the composition (e.g., the number of land-use types and the proportion of each type) and the configuration (e.g., the spatial distribution of land-use types) of land use across the units of a geographical area to meet the requirements of sustainable development. Such a process usually balances the tradeoffs among multiple land-use objectives (e.g., ecological protection and economic growth) in an attempt to maximize net utility across all outcomes. Often one objective can only be increased (or decreased) at the expense of decreasing (or increasing) another (Karimi and Hockings 2018), but more sophisticated approaches seek solutions that simultaneously improve multiple objectives while minimizing negative impacts on others, such that the net result is the optimal net benefit (e.g., Liu et al. 2018; Wang et al. 2018). These conflicts were traditionally resolved in a highly subjective manner in which the aim was to reach some thresholds of subjective values rather than to determine the best land-use plan (Caparros-Midwood et al. 2015). Such a subjective approach is vulnerable to perception error and unable to effectively or rigorously incorporate multiple costs and benefits. It is therefore unsuited to the urgent needs of sustainable development, especially given the large challenges the world faces in meeting the SDGs (Fu et al. 2019).

An alternative approach to land-use optimization that is less subjective and more robust is to employ optimization techniques to resolve those conflicts. Among these techniques, linear programming (LP) was the earliest to be applied to sustainable land-use optimization. Its applications can be traced back to Schlager’s (1965) formulation of a land-use plan design model using LP. This model was very influential and was the basis of a wide range of land-use optimization programs, such as FORPLAN (Iverson and Alston 1986), which was used for decades to guide timber harvest planning for the United States Forest Service (USFS). Despite the recognized limitations of the modeling approach, it was groundbreaking in attempting to optimize the sustained yield of natural resources objectively. The primary disadvantage of using LP for sustainable land-use optimization is the need to integrate multiple objective functions by subjectively setting the weights of different objectives using the weighted-sum method (Chen et al. 2019; Herrera-León et al. 2019). For example, in the case of FORPLAN as applied to USFS planning, timber production was the dominant criterion, and other resource conditions and goals (such as endangered species, recreation, and water) were not sufficiently incorporated. In addition, the model made incorrect assumptions about regeneration and harvest placement, leading to land-use decisions now widely regarded as unsustainable and, ultimately, to dramatic policy upheavals and reversals (Thomas et al. 2006).

To avoid subjective weighting, optimization methods have been developed that identify a Pareto front in parameter space, defined as the set of solutions representing the optimal tradeoffs among multiple objectives. These methods seek the set of parameter combinations for which no objective can be further improved without worsening another; such solutions are referred to as Pareto-optimal solutions. Early examples include Pareto optimal zoning (Ohls et al. 1974) and the surrogate worth trade-off method (Das and Haimes 1979). However, the practical problem is that these optimal tradeoffs are inefficient, or even impossible, to determine exhaustively because of the vast number of possible tradeoffs, especially for problems with a large number of decision variables (e.g., land-use optimization, where every land unit is formulated as a variable). Therefore, heuristic methods such as particle swarm optimization (Masoomi et al. 2013) and genetic algorithms (Porta et al. 2013) were introduced to make this determination feasible and efficient.

Among different heuristic methods, the nondominated sorting genetic algorithm II (NSGA-II, Deb et al. 2002) has been demonstrated to be the most effective (Song and Chen 2018a). Also, NSGA-II appears to be the most popular genetic algorithm for optimization, as it has been cited 31,805 times (according to Google Scholar on 21 May 2020).

This study focuses on two improved NSGA-II algorithms for sustainable land-use optimization. One has been widely applied (Cao et al. 2011), and the other is the latest improvement (Song and Chen 2018b). The former algorithm has only had limited validation confirming its general utility in sustainable land-use optimization. The latter algorithm has only been evaluated in comparison with the original NSGA-II algorithm.

We argue that a thorough evaluation is needed for both of these two algorithms. The evaluation should be performed using a comprehensive framework, which is precisely what has been missing in the development and improvement of Pareto front-based algorithms for sustainable land-use optimization. In this study, we aimed to present such a framework. Using this framework, we then carried out a thorough evaluation of, and a systematic comparison between, these two improved NSGA-II algorithms. We expected that the evaluation results would support the general superiority of the improved Song and Chen (2018b) method. Furthermore, we identified areas where both algorithms could be further improved, as well as areas of potential importance to the development of future Pareto front-based algorithms for sustainable land-use optimization. We believe that the framework illustrated here will be useful for comprehensively evaluating the performance of these future algorithms.

NSGA-II: concepts and procedure

As introduced in the preceding section, NSGA-II is essentially a genetic algorithm. Therefore, in this section, we will introduce genetic algorithms and then NSGA-II.

The genetic algorithm (Holland 1992) mimics Darwin’s grand idea of evolution by natural selection to find the optimal solution of a mathematical function. It is an iterative algorithm that begins from an initial population of \(N\) solutions (also called candidates or individuals). Each candidate is then evaluated in terms of its fitness to the environment, namely how optimal it is. After that, a new population is formed by selecting candidates based on their fitness. This new population (i.e., the parent candidates) is allowed to have offspring (i.e., children candidates) through crossover and mutation, which respectively combine characteristics of the parents and mimic the role of genetic mutation in natural selection. Typically, each pair of parent candidates generates two children candidates, making the total number of children candidates equal to \(N\). Finally, the population of children candidates replaces the original population. In summary, the main loop of a genetic algorithm involves three steps: selection, crossover, and mutation, as shown in Fig. 1a.

Fig. 1 The comparison between an ordinary genetic algorithm (a) and NSGA-II (b)
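For readers unfamiliar with this loop, the following minimal Python sketch illustrates selection, crossover, and mutation on a toy single-objective problem (maximizing the number of ones in a binary string). The function and parameter names are ours and are not part of any algorithm discussed in this paper.

```python
import random

def fitness(candidate):
    """Toy objective: count of ones in a binary string."""
    return sum(candidate)

def genetic_algorithm(n=50, length=30, generations=100, p_mut=0.01):
    population = [[random.randint(0, 1) for _ in range(length)] for _ in range(n)]
    for _ in range(generations):
        # Selection: binary tournament based on fitness
        parents = [max(random.sample(population, 2), key=fitness) for _ in range(n)]
        children = []
        for i in range(0, n, 2):
            a, b = parents[i], parents[(i + 1) % n]
            # Crossover: one-point crossover produces two children
            cut = random.randrange(1, length)
            c1, c2 = a[:cut] + b[cut:], b[:cut] + a[cut:]
            # Mutation: flip each bit with a small probability
            for child in (c1, c2):
                for j in range(length):
                    if random.random() < p_mut:
                        child[j] = 1 - child[j]
            children.extend([c1, c2])
        population = children[:n]  # the children replace the parent population
    return max(population, key=fitness)

print(fitness(genetic_algorithm()))
```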

NSGA-II modifies this main loop as shown in Fig. 1b. The key difference is that there are two selection steps. The first selection works with crossover and mutation, making one population (\({P}_{t}\)) evolve into another (\({Q}_{t}\)). The second selection identifies the elitist candidates from the combined population of \({P}_{t}\) and \({Q}_{t}\) according to two criteria, namely domination (measured by nondomination rank) and crowdedness (measured by crowding distance). These elitist candidates form the next generation of \({P}_{t}\), namely \({P}_{t+1}\) (Fig. 2).

Fig. 2 The procedure of the second selection in NSGA-II
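The second selection can be sketched as follows in Python, assuming the objectives of the combined population are stored as rows of a NumPy array and are all to be minimized. This is our own simplified illustration rather than the bookkeeping-optimized procedure of Deb et al. (2002); names are ours.

```python
import numpy as np

def nondominated_fronts(objs):
    """Sort solutions (rows of `objs`, all objectives minimized) into fronts."""
    objs = np.asarray(objs, dtype=float)
    n = len(objs)
    dominators = [set() for _ in range(n)]   # dominators[i]: indices dominating i
    for i in range(n):
        for j in range(n):
            if j != i and np.all(objs[j] <= objs[i]) and np.any(objs[j] < objs[i]):
                dominators[i].add(j)
    fronts, assigned = [], set()
    while len(assigned) < n:
        front = [i for i in range(n) if i not in assigned and dominators[i] <= assigned]
        fronts.append(front)
        assigned.update(front)
    return fronts

def crowding_distance(objs):
    """Crowding distance of each solution within one front."""
    objs = np.asarray(objs, dtype=float)
    n, m = objs.shape
    dist = np.zeros(n)
    for s in range(m):
        order = np.argsort(objs[:, s])
        dist[order[0]] = dist[order[-1]] = np.inf      # boundary solutions kept
        span = objs[order[-1], s] - objs[order[0], s] or 1.0
        dist[order[1:-1]] += (objs[order[2:], s] - objs[order[:-2], s]) / span
    return dist

def second_selection(objs, pop_size):
    """Indices of the `pop_size` elitist candidates chosen from P_t + Q_t."""
    objs = np.asarray(objs, dtype=float)
    chosen = []
    for front in nondominated_fronts(objs):
        if len(chosen) + len(front) <= pop_size:
            chosen.extend(front)              # the whole front fits
        else:                                 # truncate the last front by crowding distance
            d = crowding_distance(objs[front])
            chosen.extend([front[i] for i in np.argsort(-d)][:pop_size - len(chosen)])
            break
    return chosen

# Toy usage: 6 candidates, 2 objectives (both minimized), keep the best 3
print(second_selection([[1, 5], [2, 4], [3, 3], [2, 6], [4, 4], [5, 1]], 3))
```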

When an NSGA-II algorithm is applied in practice, the final results (i.e., the candidates of the last generation) are commonly assumed to be the Pareto-optimal solutions. However, it is essential to note that, depending on the number of iterations, these candidates are not necessarily the true Pareto-optimal solutions. To avoid ambiguity, the final results are referred to as the resultant solutions of an algorithm in this study.

Two improved NSGA-II algorithms: a theoretical comparison

Improved NSGA-II algorithm by Cao et al. (2011)

Cao et al. (2011) improved the initialization, crossover, and mutation of NSGA-II. Also, they modified the commonly understood definition of Pareto-optimal solutions.

Improved initialization

The initial population was created with 90% of the candidates being randomly generated land-use maps and the remaining 10% being copies of the current land-use map. This improves the algorithm’s ability to converge to plausible results relative to actual landscape conditions, which is important given that landscapes have inertia and “memory” (e.g., Wallin et al. 1994).
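A minimal sketch of this initialization, assuming integer-coded raster maps handled with NumPy; the uniformly random map generator is our stand-in, because the exact generator used by Cao et al. (2011) is not described here.

```python
import numpy as np

def initialize_population(current_map, n_types, pop_size=50, copy_fraction=0.1, seed=None):
    """Initial population: ~10% copies of the current map, ~90% random maps."""
    rng = np.random.default_rng(seed)
    n_copies = max(1, int(round(copy_fraction * pop_size)))
    population = [current_map.copy() for _ in range(n_copies)]
    for _ in range(pop_size - n_copies):
        # Stand-in random map: uniformly random land-use type codes
        population.append(rng.integers(0, n_types, size=current_map.shape))
    return population

# Example: a hypothetical 100 x 100 map with 6 land-use types
current = np.random.default_rng(0).integers(0, 6, size=(100, 100))
pop = initialize_population(current, n_types=6)
```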

Improved crossover

The crossover was improved by limiting it spatially to occur between two patches within a single candidate, rather than between two candidates. This is important given the strong spatial dependence and autocorrelation of geographical processes and spatial patterns (Gao et al. 2013). Thus, an operator called a “single parent crossover” was introduced, which involves three steps (Fig. 3). The first is to prepare a template made up of 7 cells that are randomly selected from a \(3\times 3\) window. Second, two locations are randomly chosen from a candidate, at each of which a patch is created using the template. Third, the two patches are swapped to generate a new candidate.

Fig. 3 The procedure of the single parent crossover operator
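One possible reading of this operator is sketched below in Python for an integer-coded raster candidate. Keeping the anchor locations away from the map edge is our simplifying assumption, not part of the original operator.

```python
import numpy as np

def single_parent_crossover(candidate, rng=None):
    """Hedged sketch of the three-step single parent crossover (Fig. 3)."""
    rng = rng or np.random.default_rng()
    child = candidate.copy()
    rows, cols = candidate.shape
    # Step 1: template of 7 cell offsets chosen from a 3 x 3 window
    offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
    template = [offsets[i] for i in rng.choice(9, size=7, replace=False)]
    # Step 2: two random anchor locations (kept away from the map edge)
    (r1, c1), (r2, c2) = [(rng.integers(1, rows - 1), rng.integers(1, cols - 1))
                          for _ in range(2)]
    # Step 3: swap the land-use types of the two template-shaped patches
    for dr, dc in template:
        a, b = (r1 + dr, c1 + dc), (r2 + dr, c2 + dc)
        child[a], child[b] = candidate[b], candidate[a]
    return child
```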

Improved mutation

Two new operators were developed: patch cells and constraint steering. Aimed at maintaining the diversity of candidates, the patch cells operator involves four steps (Fig. 4). The first is also to prepare a template of 7 cells. Second, a land-use type is randomly selected as the “mutate to” type. The third is to employ the template to create a patch at a randomly selected location of a candidate. Fourth, if any cell in the patch has the “mutate to” type, then all cells of the patch become that type; otherwise, the operator should be performed again.

Fig. 4 Procedure of the patch cells operator
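A hedged sketch of the patch cells operator, again assuming an integer-coded raster candidate; the retry limit (max_tries) is our own safeguard and not part of the original description.

```python
import numpy as np

def patch_cells_mutation(candidate, n_types, max_tries=100, rng=None):
    """Hedged sketch of the four-step patch cells operator (Fig. 4)."""
    rng = rng or np.random.default_rng()
    child = candidate.copy()
    rows, cols = candidate.shape
    for _ in range(max_tries):
        # Step 1: template of 7 offsets from a 3 x 3 window
        offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
        template = [offsets[i] for i in rng.choice(9, size=7, replace=False)]
        # Step 2: randomly choose a "mutate to" land-use type
        target = int(rng.integers(0, n_types))
        # Step 3: place the template-shaped patch at a random location
        r, c = rng.integers(1, rows - 1), rng.integers(1, cols - 1)
        patch = [(r + dr, c + dc) for dr, dc in template]
        # Step 4: mutate only if the patch already contains the target type
        if any(child[cell] == target for cell in patch):
            for cell in patch:
                child[cell] = target
            return child
    return child  # give up after max_tries unsuccessful attempts
```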

The constraint steering operator was designed to help candidates satisfy the constraints of an optimization. It is implemented in two steps. First, a candidate is evaluated to determine whether the constraints are satisfied. Second, if they are not, the land-use types of a small number of cells of the candidate are changed so that the constraints are satisfied.

Modified definition of Pareto-optimal solutions

Cao et al. (2011) introduced a new concept of “global Pareto front solutions” and used such solutions in place of the commonly understood Pareto-optimal solutions (i.e., the candidates of the final generation). Specifically, these “global Pareto front solutions” were obtained using the following procedure. First, the algorithm was run for 1000 iterations. Second, a “sampled global generation” was formed by archiving the candidates of the 1st, 100th, 200th, …, 900th, and 1000th generations. Third, all archived candidates were divided into two categories according to their dominance, namely dominated and nondominated. The second category constituted the so-called “global Pareto front solutions.” In this study, these solutions are referred to as additional resultant solutions to avoid any ambiguity.
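This archiving procedure can be sketched as follows, assuming each archived entry pairs a candidate with its objective vector and that all objectives are maximized in the toy example; names are ours.

```python
import numpy as np

def nondominated(objs):
    """Indices of rows of `objs` (objectives, all maximized) dominated by no other row."""
    objs = np.asarray(objs, dtype=float)
    keep = []
    for i in range(len(objs)):
        dominated = any(np.all(objs[j] >= objs[i]) and np.any(objs[j] > objs[i])
                        for j in range(len(objs)) if j != i)
        if not dominated:
            keep.append(i)
    return keep

def global_pareto_front(sampled_generations):
    """Pool the archived sampled generations and keep only nondominated candidates.

    `sampled_generations` is a list of (candidate, objective_vector) pairs
    collected at generations 1, 100, 200, ..., 1000.
    """
    objs = [obj for _, obj in sampled_generations]
    return [sampled_generations[i] for i in nondominated(objs)]

# Toy usage with three archived candidates and two maximized objectives
archive = [("plan_a", [3.0, 1.0]), ("plan_b", [2.0, 2.0]), ("plan_c", [1.0, 1.0])]
print([name for name, _ in global_pareto_front(archive)])  # plan_c is dominated
```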

Improved NSGA-II algorithm by Song and Chen (2018b)

Song and Chen (2018b) also improved the initialization, crossover, and mutation of NSGA-II, and produced additional resultant solutions using a new archiving strategy.

Improved initialization

The initial population comprises \(N\) candidates, each of which is at least 70% identical to the current land-use map on a cell-by-cell basis. To generate such a candidate, 30% of the cells of the current land-use map are selected randomly, and the land-use type of each selected cell is then altered randomly to a different type.
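A minimal sketch of this initialization for integer-coded raster maps; the 30% change fraction follows the description above, and the population size of 50 in the usage example is the value used later in our experiments.

```python
import numpy as np

def initialize_candidate(current_map, n_types, change_fraction=0.3, rng=None):
    """Create one candidate that is at least 70% identical to the current map."""
    rng = rng or np.random.default_rng()
    candidate = current_map.copy()
    flat = candidate.ravel()                      # view on the candidate's cells
    n_change = int(change_fraction * flat.size)
    cells = rng.choice(flat.size, size=n_change, replace=False)
    for idx in cells:
        # Draw a land-use type different from the current one
        options = [t for t in range(n_types) if t != flat[idx]]
        flat[idx] = rng.choice(options)
    return candidate

# Example: population of 50 candidates for a hypothetical 50 x 50 map with 6 types
current = np.random.default_rng(1).integers(0, 6, size=(50, 50))
population = [initialize_candidate(current, n_types=6) for _ in range(50)]
```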

Improved crossover

An edge cell-based operator was developed for the crossover (Fig. 5). First, all edge cells of a candidate are detected. Then, the land-use type of each edge cell is swapped with the corresponding cell of another candidate. In particular, such a swap will be skipped if the edge cell or its counterpart is conserved.

Fig. 5 Procedure of the edge-crossover operator
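A possible implementation of the edge-crossover is sketched below. Treating edge cells as cells differing from at least one 4-neighbour, and representing the conserved cells with a single boolean mask, are our assumptions about details not fully specified above.

```python
import numpy as np

def edge_cells(land_map):
    """Boolean mask of cells that differ from at least one 4-neighbour."""
    edge = np.zeros(land_map.shape, dtype=bool)
    edge[:-1, :] |= land_map[:-1, :] != land_map[1:, :]
    edge[1:, :]  |= land_map[1:, :]  != land_map[:-1, :]
    edge[:, :-1] |= land_map[:, :-1] != land_map[:, 1:]
    edge[:, 1:]  |= land_map[:, 1:]  != land_map[:, :-1]
    return edge

def edge_crossover(parent_a, parent_b, conserved=None):
    """Swap the types at the edge cells of parent_a with the same cells of parent_b.

    `conserved` is an optional boolean mask of protected cells that must not change
    (e.g., the waterbody cells preserved later in this study).
    """
    child_a, child_b = parent_a.copy(), parent_b.copy()
    swap = edge_cells(parent_a)
    if conserved is not None:
        swap &= ~conserved                     # skip conserved cells
    child_a[swap], child_b[swap] = parent_b[swap], parent_a[swap]
    return child_a, child_b
```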

Improved mutation

Two mutation operators were developed: patch-based and constraint-edge. The former makes a patch more homogeneous, as shown in Fig. 6. First, a window of \(3\times 3\) cells is randomly selected from a candidate. Second, seven cells are chosen randomly from the window to form a patch. Third, the land-use types of all these seven cells (except for the cells from a restrictive area, if any) are altered to the dominant land-use type of the patch.

Fig. 6 Procedure of the patch-based-mutation operator

The constraint-edge operator is available if there is a constraint on the number of cells of a land-use type. It is performed on a randomly selected edge cell of a candidate. If the cell is not from a restrictive area (if any), the total number of the cells with the same land-use type will be determined. If this total number exceeds the upper limit, the land-use type of the selected edge cell will be altered to another one that is randomly selected. If this total number falls within the upper and lower limits, the land-use type will be set as any of the existing types with equal probability. In other cases, the land-use type will remain unchanged.
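The decision logic of this operator can be sketched as follows; the 4-neighbour definition of edge cells and the dictionary-based per-type limits are our assumptions.

```python
import numpy as np

def constraint_edge_mutation(candidate, lower, upper, restrictive=None, rng=None):
    """Hedged sketch of the constraint-edge operator applied to one candidate.

    `lower` and `upper` map each land-use type code to its cell-count limits;
    `restrictive` is an optional boolean mask of protected cells (our assumption).
    """
    rng = rng or np.random.default_rng()
    child = candidate.copy()
    # Edge cells: cells that differ from at least one 4-neighbour
    edge = np.zeros(child.shape, dtype=bool)
    edge[:-1, :] |= child[:-1, :] != child[1:, :]
    edge[1:, :]  |= child[1:, :]  != child[:-1, :]
    edge[:, :-1] |= child[:, :-1] != child[:, 1:]
    edge[:, 1:]  |= child[:, 1:]  != child[:, :-1]
    if restrictive is not None:
        edge &= ~restrictive
    rows, cols = np.nonzero(edge)
    if len(rows) == 0:
        return child
    i = rng.integers(len(rows))                 # one randomly selected edge cell
    r, c = rows[i], cols[i]
    t = int(child[r, c])
    count = int(np.sum(child == t))
    existing = [int(u) for u in np.unique(child)]
    if count > upper[t]:
        # Too many cells of this type: switch to another randomly selected type
        child[r, c] = rng.choice([u for u in existing if u != t])
    elif lower[t] <= count <= upper[t]:
        # Within the limits: reassign to any existing type with equal probability
        child[r, c] = rng.choice(existing)
    # Otherwise (count below the lower limit) the cell is left unchanged
    return child
```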

Archiving strategy

The nondominated candidates are archived across all generations. If the number of archived candidates exceeds the population size, the candidates with smaller crowding distances are removed. The final archived candidates are regarded as Pareto-optimal.

Difference between the two improved algorithms

We conducted a theoretical comparison between the two improved algorithms (Table 1), in terms of initialization, operators and their effects, and results. We argue that the crossover operator by Cao et al. (2011) is actually a mutation operator by definition (Kozek et al. 1993) because only one parent candidate is involved.

Table 1 A comparison between the two improved algorithms

A comprehensive framework for evaluation and comparison

In this section, we introduce a framework for evaluating the performance of a Pareto front-based algorithm for sustainable land-use optimization. This framework consists of principles, criteria, and indicators. These elements form a hierarchy of increasingly sophisticated conceptions and have been demonstrated to be fundamental in other comprehensive evaluations (Bautista et al. 2016; Gao et al. 2019). Here, the criteria are further classified into macro and micro, leading to a framework of four hierarchical levels (Fig. 7).

Fig. 7 The proposed four-level framework for evaluation and comparison

Principle and three macro-criteria

A dictionary definition of principle is “a fundamental truth or proposition that serves as the foundation for a system of belief or behavior or for a chain of reasoning” (Oxford dictionary). Such a proposition in the context of algorithmically optimizing land use should be focused on the usability of the algorithm. Therefore, our principle was determined as follows: the algorithm is usable for Pareto front-based sustainable land-use optimization.

With this principle, we designed three macro-criteria according to usability. The International Organization for Standardization (ISO) defines usability as “the extent to which a product can be used by specified users to achieve specified goals” (ISO 2008, p. 2) and specifies that it should be evaluated against three criteria. The first is effectiveness, which means the “accuracy and completeness with which users achieve specified goals” (ISO 2008, p. 2). The second is efficiency, namely the “resources expended in relation to the accuracy and completeness with which users achieve goals” (ISO 2008, p. 2). The third is satisfaction, which concerns the “freedom from discomfort, and positive attitudes towards the use of the product” (ISO 2008, p. 2). Here, these three criteria were refined in the context of Pareto front-based sustainable land-use optimization, leading to three macro-criteria:

  • Effectiveness: the capability to determine the Pareto front.

  • Efficiency: the resources expended to determine the Pareto front.

  • Satisfaction: the solutions’ goodness from the perspective of users.

Micro-criteria of each macro-criterion

Effectiveness has been defined as the capability to accurately determine the Pareto front, but there are two practical issues with this determination. First, the results of an optimization algorithm are not necessarily Pareto-optimal because of the limited number of iterations (i.e., generations). Second, even if all the results are Pareto-optimal, the set of results is not necessarily the Pareto front because of the limited size of the population. Thus, we designed the following two criteria: whether the results come from the Pareto front, and whether the results form a representative sample of the Pareto front. To assess representativeness, we suggest employing the spread of the results over the Pareto front, or, in other words, the diversity of the results. These two criteria are defined as the micro-criteria of effectiveness and referred to as the quality and diversity of the results, respectively.

Efficiency is the resources expended to determine the Pareto front. For a computer algorithm, the resources expended can be specified as computational costs. Typically, two types of computational costs are identified, namely time and space (e.g., Zhou et al. 2017). Here, these two types are used as the micro-criteria of efficiency.

Satisfaction is difficult to specify because the perspectives of users are subjective and diverse, but this specification can be achieved here for two reasons. First, in performing an optimization, users are expected to have formulated the criteria of their subjective satisfaction as the objectives (and constraints). Second, in general optimization problems, such as finding the maximum value of a mathematical function, there is not necessarily an initial solution. Such a solution, however, exists in any sustainable land-use optimization problem, namely the status quo of land use. Accordingly, we propose a micro-criterion for evaluating users’ satisfaction, namely the degree of optimization achieved by the results when compared with the status quo of land use. In addition, because users may set some constraints, we introduce the degree of violation of constraints as the second micro-criterion.

Indicators of the micro-criteria

Indicators of effectiveness

Both micro-criteria of effectiveness depend on a Pareto front; however, the Pareto front is unknown when evaluating an algorithm for sustainable land-use optimization. Note that this problem does not arise in evaluating a general optimization algorithm, where test functions are typically deliberately designed and their Pareto fronts are explicitly determined (Yue et al. 2019). In those cases, the quality of results can be evaluated directly. For example, it was evaluated as follows in the original paper of NSGA-II (Deb et al. 2002). First, a total of 500 Pareto-optimal solutions were uniformly sampled from the Pareto front. Then, the minimum Euclidean distance in parameter space was calculated between each result and these sampled Pareto-optimal solutions. Last, the average of these distances was employed as the indicator of quality. Such a strategy does not apply to our case.

Therefore, we proposed a Pareto front-parameterless indicator of quality. The core idea is that since the end point of optimization (i.e., the Pareto front) is unknown, we employ the start point as an alternative. Instead of evaluating how close the results are to the Pareto front, we evaluate how much the results improve from the initial solutions, as follows:

$$Q = \sum_{s=1}^{m} \gamma_{s} \left( \frac{\sum_{\beta=1}^{n'} f_{s}(Y_{\beta})/n'}{\sum_{\alpha=1}^{n} f_{s}(X_{\alpha})/n} - 1 \right)$$
(1)

where \(Q\) is the indicator of quality, and \(m\) is the number of objectives. \({\gamma }_{s}\) is a coefficient that equals 1 when the \(s\)th objective is maximized and -1 when it is minimized. \(n\) and \(n'\) are the numbers of feasible solutions in the initial population and in the results, respectively. \({X}_{\alpha }\) denotes the \(\alpha\)th feasible solution in the initial population, \({Y}_{\beta }\) denotes the \(\beta\)th feasible solution in the results, and \({f}_{s}\left(\cdot \right)\) returns the \(s\)th objective function value (OFV) of \({X}_{\alpha }\) or \({Y}_{\beta }\). The value of \(Q\) can be interpreted as the sum of the average improvements towards all objectives. The greater \(Q\) is, the better the quality of the results.
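A minimal implementation of Eq. (1), assuming the OFVs of the feasible initial solutions and of the results are stored as rows of two arrays; names are ours.

```python
import numpy as np

def quality_indicator(initial_ofvs, result_ofvs, maximize):
    """Eq. (1): sum over objectives of the relative improvement of the mean OFV.

    `initial_ofvs` and `result_ofvs` hold one row per feasible solution and one
    column per objective; `maximize` is True/False per objective (gamma_s = +/-1).
    """
    initial_ofvs = np.asarray(initial_ofvs, dtype=float)
    result_ofvs = np.asarray(result_ofvs, dtype=float)
    gamma = np.where(np.asarray(maximize), 1.0, -1.0)
    ratio = result_ofvs.mean(axis=0) / initial_ofvs.mean(axis=0)
    return float(np.sum(gamma * (ratio - 1.0)))

# Toy example: two maximized objectives whose mean OFVs improve by 30% and 20%
Q = quality_indicator([[1.0, 2.0], [1.0, 2.0]], [[1.2, 2.2], [1.4, 2.6]], [True, True])
print(round(Q, 3))  # 0.3 + 0.2 = 0.5
```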

This indicator \(Q\) differs from a popular indicator called the average rank index (ARI; e.g., Duh and Brown 2007; Song and Chen 2018b), which is calculated in three steps. First, the results of an optimization algorithm are pooled with those of another optimization algorithm. Second, the nondomination rank of each solution in this pool is calculated using the fast nondominated sorting approach (Deb et al. 2002). Third, the average nondomination rank of the results of the optimization algorithm in question is calculated and used as the ARI. As a result, ARI has two problems. First, its value depends on the benchmark algorithm. Second, the effects of the initial solutions are not considered. By contrast, \(Q\) is an independent indicator, and its value is relative to the quality of the initial solutions. Therefore, \(Q\) reflects the true capability of an optimization algorithm.

To quantify diversity, we propose to employ a popular measure from information theory, namely Shannon entropy (Vranken et al. 2015). Its formula (\(H\)) is as follows:

$$H=-\sum_{i=1}^{h}{P}_{i}\cdot {\mathrm{log}}_{2}{P}_{i}$$
(2)

where \(h\) is the total number of cases, and \({P}_{i}\) is the probability of each case. Here, we first calculate \({P}_{i}\) for a result as the proportion of its OFV towards one objective to the sum of the OFVs of all results towards that objective. Accordingly, the value of \(H\) is inversely related to the diversity towards that objective. Then, our indicator of diversity (\(D\)) is calculated as the sum of the decreases in \(H\) achieved by the results towards each objective, as follows:

$$\left\{\begin{array}{l} D = \sum_{s=1}^{m} \left( 1 - \dfrac{\sum_{\beta=1}^{n'} P_{\beta} \log_{2} P_{\beta} / \log_{2} n'}{\sum_{\alpha=1}^{n} P_{\alpha} \log_{2} P_{\alpha} / \log_{2} n} \right) \\ P_{\alpha} = f_{s}(X_{\alpha}) \Big/ \sum_{\alpha=1}^{n} f_{s}(X_{\alpha}) \\ P_{\beta} = f_{s}(Y_{\beta}) \Big/ \sum_{\beta=1}^{n'} f_{s}(Y_{\beta}) \end{array}\right.$$
(3)

\(D\) is proportional to the degree of diversity: a positive value indicates that diversity is improved, whereas a negative value means that diversity is worsened.
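Equation (3) can be computed as follows, assuming strictly positive OFVs (so that the logarithms are defined) and more than one solution in each set; names are ours.

```python
import numpy as np

def diversity_indicator(initial_ofvs, result_ofvs):
    """Eq. (3): entropy-based diversity change of the results relative to the initial population."""
    initial_ofvs = np.asarray(initial_ofvs, dtype=float)
    result_ofvs = np.asarray(result_ofvs, dtype=float)

    def normalized_neg_entropy(values):
        # sum_i P_i * log2(P_i) / log2(n), with P_i the proportion of each OFV
        p = values / values.sum()
        return np.sum(p * np.log2(p)) / np.log2(len(values))

    d = 0.0
    for s in range(initial_ofvs.shape[1]):
        num = normalized_neg_entropy(result_ofvs[:, s])
        den = normalized_neg_entropy(initial_ofvs[:, s])
        d += 1.0 - num / den
    return d

# Toy example with one objective: the result OFVs are more spread out than the
# nearly identical initial OFVs, which yields a positive D here.
init = [[10.0], [10.1], [9.9]]
res = [[5.0], [10.0], [15.0]]
print(round(diversity_indicator(init, res), 4))
```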

The proposed \(D\) has advantages over existing indicators of diversity. In comparison to the diversity indicator used in the original paper of NSGA-II (Deb et al. 2002), \(D\) tends to be more universally applicable as it does not require any prior knowledge of the Pareto front. In addition, \(D\) is capable of quantifying diversity globally and thus can be referred to as a global indicator of diversity. By contrast, the indicator of diversity used by Song and Chen (2018b) quantifies only local diversity because it is calculated as the average of crowding distances, where a crowding distance is measured using two neighbors of a result. Furthermore, \(D\) is also capable of reflecting the true capability of an optimization algorithm.

Indicators of efficiency

The indicator of time cost (\(T\)) is proposed as the execution time of an optimization algorithm run on an ordinary personal computer. The operating environment is emphasized because most users work with personal computers rather than enterprise computing resources. The execution time is used, rather than time complexity, because it is more intuitive for users estimating the time cost in their own operating environment.

The indicator of space cost is proposed as the amount of working storage, which is a widely used indicator in evaluating a general algorithm. Here, we also propose a novel indicator (\(R\)) designed for improved NSGA-II algorithms employing an archiving strategy: the ratio of the total number of additionally archived solutions (from which the additional resultant solutions are selected) to the number of solutions in an iteration (i.e., to the population size). The smaller this ratio, the less working storage is required. The ideal value of this ratio is zero, which is achieved by the original NSGA-II algorithm.

Two advantages can be expected from the proposed ratio. First, it is easy to calculate. Second, it is more interpretable because it is directly related to the most memory-consuming part of an optimization algorithm, namely retrieving and handling every land-use map. Note that even a single map of a small region can be highly memory-consuming because it may contain millions of values. For example, a land-use map of Lhasa City, China with a spatial resolution of 30 m contains as many as 60,494,994 values and requires 220 MB of memory.

Indicators of satisfaction

We proposed two indicators to evaluate the two micro-criteria for satisfaction, respectively. The indicator of the degree of optimization (\(O\)) was proposed as follows:

$$O=\underset{\beta }{\mathrm{max}}\left(\sum_{s=1}^{m}{\gamma }_{s}\left(\frac{{f}_{s}\left({Y}_{\beta }\right)}{{f}_{s}\left(Z\right)}-1\right)\right)$$
(4)

where \(Z\) denotes the current land-use map. This indicator quantifies the maximum degree of optimization achieved by the results when compared with \(Z\). To the best of our knowledge, \(O\) is the first indicator of satisfaction concerning the degree of optimization. It can be used in optimization problems consisting of maximizing functions, minimizing functions, or both. The average degree of optimization is not used because (a) this property is already characterized by \(Q\) and (b) usually only one result is adopted as the future land-use plan.

In developing an indicator for the degree of the violation of constraints, we did not distinguish equality constraints from inequality ones. In essence, all equality constraints can be regarded as inequality ones using the following formula:

$$c^{\prime} \le C^{\prime}( \cdot ) \le c^{\prime},$$

where \(C^{\prime}(\cdot)\) denotes the constraint function and \(c^{\prime}\) is its required value. Accordingly, the proposed indicator of violation (\(V\)) is expressed as follows:

$$V=\frac{1}{n''}\sum_{\beta =1}^{n''}\sum_{k=1}^{K}\max\left\{\frac{C_{k}\left(Y_{\beta }^{\prime }\right)-c_{k}^{u}}{\left|c_{k}^{u}\right|},\frac{c_{k}^{l}-C_{k}\left(Y_{\beta }^{\prime }\right)}{\left|c_{k}^{l}\right|},0\right\}$$
(5)

where \(n''\) is the number of results, and \({Y}_{\beta }^{\prime }\) is the \(\beta\)th result. \(K\) is the total number of constraints, \({C}_{k}\left(\cdot \right)\) denotes the function of the \(k\)th constraint, and \({c}_{k}^{u}\) and \({c}_{k}^{l}\) are the upper and lower limits of the \(k\)th constraint, respectively.
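The two satisfaction indicators of Eqs. (4) and (5) can be computed as follows; representing constraints as (function, lower, upper) triples, and assuming nonzero limits, are our assumptions.

```python
import numpy as np

def degree_of_optimization(result_ofvs, status_quo_ofvs, maximize):
    """Eq. (4): best total relative improvement of any result over the current map Z."""
    result_ofvs = np.asarray(result_ofvs, dtype=float)
    z = np.asarray(status_quo_ofvs, dtype=float)
    gamma = np.where(np.asarray(maximize), 1.0, -1.0)
    return float(np.max(np.sum(gamma * (result_ofvs / z - 1.0), axis=1)))

def constraint_violation(results, constraints):
    """Eq. (5): mean relative violation of the constraints over all results.

    `constraints` is a list of (function, lower, upper) triples with nonzero limits;
    an equality constraint is expressed by setting lower == upper.
    """
    total = 0.0
    for y in results:
        for func, low, up in constraints:
            value = func(y)
            total += max((value - up) / abs(up), (low - value) / abs(low), 0.0)
    return total / len(results)

# Toy usage: one result map and one hypothetical constraint
# "between 100 and 400 cells of type 0", which the result violates completely.
result_map = np.full((20, 20), 1)                  # hypothetical type codes
count_type0 = lambda m: int(np.sum(m == 0))
print(constraint_violation([result_map], [(count_type0, 100, 400)]))  # 1.0
```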

Experimental evaluation and comparison

Datasets, objectives, and constraints

The study area is the city of Lhasa, which is the capital, largest, and most urbanized city of the Tibet Autonomous Region (a provincial-level administrative region) of China. It is located in an area known as the “Lhasa Valley” along the southern edge of the Tibetan plateau (or Qinghai–Tibet Plateau), which is known as “The Third Pole” (also “Roof of the World”) and plays a vital role in global climate change (Qiu 2008; Kuang 2020). Lhasa and Tibet have witnessed an expansion of industrialization, urbanization, and tourism since the implementation of the western development strategy by China in 2000 (Huang et al. 2013; Zhao et al. 2015) and the construction of the Qinghai-Tibet railway in 2005 (Duo et al. 2015). However, there are five critical problems with the land use of Tibet according to the Government of the Tibet Autonomous Region (2011), namely low utilization, very limited cultivated land, low resistance to natural hazards, the unreasonable structure of different land-use types, and a heavy trade-off between land use and economic development. One key to partly solving these problems lies in Lhasa, where the trade-off between nature and socio-economic development is most prominent.

Three datasets were prepared for experiments. The first dataset is the 2018 land-use map of Lhasa with a spatial resolution of 1 km (Kuang 2019), as shown in Fig. 8. According to this map, the dominant type of land use is grassland (55.1%), followed by barren land (i.e., unused; 22.4%) and forest (11.9%). The other two datasets were the ecological and economic benefits of different land-use types per square kilometer, as shown in Table 2. Both datasets were generated in the current study. The ecological benefits were assessed based on the evaluation of a selection of nine ecosystem services in Lhasa (Hou et al. 2011), namely air purification, climate regulation, water conservation, soil formation and protection, waste disposal, biodiversity conservation, food production, raw materials, and entertainment and culture. These nine ecosystem services were chosen based on their widely recognized importance at a regional scale (e.g., Su et al. 2012; Peng et al. 2019; Zhang et al. 2019). The economic benefits were assessed based on the gross domestic product (GDP) of Lhasa and the relative weights of different land-use types to GDP. The GDP data were collected from the statistical yearbook. The relative weights were determined using the analytic hierarchy process (Kulczyk et al. 2017) with seven criteria: elevation, slope, mean annual precipitation, normalized difference vegetation index, population density, per capita GDP, and distance to the nearest road.

Fig. 8 The land-use map of Lhasa City in 2018 (spatial resolution: 1 km)

Table 2 Ecological and economic benefits (US dollars per square kilometer) of different land-use types in Lhasa: cultivated land, forest, grassland, waterbody, construction, and unused

We established three objectives and three constraints. The three objectives are to maximize the ecological benefits, the economic benefits, and the spatial compactness. The first two objectives are generally conflicting, as shown in Table 2. The third objective was introduced from the perspective of urban planning, with spatial compactness evaluated following Shaygan et al. (2014, Eq. 8). The first constraint was that cells of type “waterbody” were preserved during optimization. The second constraint was established according to the early instructions of the Central Government of China (2014) to Lhasa to strictly protect cultivated land; that is, the area of cultivated land should not be reduced. The third constraint was that the proportion of altered cells should not exceed 30%. A schematic encoding of these objectives and constraints is sketched below.
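For illustration only: the benefit values and land-use type codes below are hypothetical placeholders (the real values come from Table 2), and the compactness function is a simple same-neighbour proxy rather than the Shaygan et al. (2014) formulation used in our experiments.

```python
import numpy as np

# Hypothetical per-km^2 benefit lookups indexed by land-use type code
# (placeholders; the actual values are those reported in Table 2).
ECO_BENEFIT = {0: 1.0, 1: 5.0, 2: 3.0, 3: 4.0, 4: 0.5, 5: 0.2}
ECON_BENEFIT = {0: 4.0, 1: 1.0, 2: 0.8, 3: 0.5, 4: 6.0, 5: 0.1}

def total_benefit(land_map, benefit_per_cell):
    """Objective 1 or 2: sum of per-cell benefits over the whole map."""
    return sum(benefit_per_cell[t] * int(np.sum(land_map == t)) for t in benefit_per_cell)

def compactness_proxy(land_map):
    """Objective 3 proxy: share of 4-neighbour pairs with identical land-use types
    (a simple stand-in, not the Shaygan et al. (2014) metric)."""
    same = np.sum(land_map[:-1, :] == land_map[1:, :]) + np.sum(land_map[:, :-1] == land_map[:, 1:])
    pairs = land_map[:-1, :].size + land_map[:, :-1].size
    return same / pairs

def satisfies_constraints(candidate, current_map, waterbody=3, cultivated=0):
    """The three constraints: waterbody cells preserved, cultivated area not
    reduced, and no more than 30% of cells altered (type codes hypothetical)."""
    water_kept = np.all(candidate[current_map == waterbody] == waterbody)
    cultivated_kept = np.sum(candidate == cultivated) >= np.sum(current_map == cultivated)
    altered_ok = np.mean(candidate != current_map) <= 0.3
    return bool(water_kept and cultivated_kept and altered_ok)
```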

Algorithms and evaluated solutions

We evaluated the two improved NSGA-II algorithms and our modified versions of them. The modified versions were obtained by removing the archiving strategies, in order to evaluate the real usability of these algorithms for sustainable land-use optimization.

As a result, we carried out a total of four experiments, leading to four sets of optimization results for evaluation and comparison, as shown in Table 3. In all experiments, the population size was set as 50 and the number of generations as 1000. The constraints were handled using the constrained-domination principle by Deb et al. (2002). In the first experiment, the candidates of the 1st, 100th, 200th, …, 900th, and 1,000th generations were archived.

Table 3 The algorithm and optimization results of each experiment

Results and analysis

The operating environment for all experiments was a personal computer equipped with an Intel Core i7-6500U CPU @ 2.50 GHz, 8.00 GB RAM, and 64-bit Windows 10. The indicators calculated for each experiment are listed in Table 4.

Table 4 Algorithm performance against each criterion, where “OC” and “MC” denote the original and modified algorithms of Cao et al. (2011), respectively, and “OS” and “MS” denote the original and modified algorithms of Song and Chen (2018b), respectively

A comparison of the indicators between OC and OS reveals the following findings. OC is better than OS in terms of the quality of results (\(Q\)) and time cost (\(T\)). However, OS outperforms OC in terms of the diversity of results (\(D\)), space cost (\(R\)), and the degree of optimization (\(O\)). Both algorithms exhibited excellent performance regarding the violation of constraints (\(V\)). Similar findings are obtained from a comparison between MC and MS.

A comparison between the original and modified versions (i.e., OC and MC) of the algorithm by Cao et al. (2011) suggests that the archiving strategy has a limited effect on the quality (\(Q\)), diversity (\(D\)), time (\(T\)), and violation (\(V\)). The changes in \(Q\), \(D\), \(T\), and \(V\) from MC to OC are 8%, 4%, − 6%, and 0, respectively. By contrast, the effect of the archiving strategy on \(O\) and \(R\) is noticeable. The improvement in \(O\) is as high as 24% from MC to OC, demonstrating the necessity of archiving. However, this improvement was achieved at the expense of more space resources, as shown by the increase in \(R\).

A comparison between OS and MS demonstrates that the archiving strategy has little effect on \(O\) (increased by 2%) and \(T\) (decreased by 1%). However, the quality of results (\(Q\)) was decreased by 23% after adopting the archiving strategy. This fact indicates little need to apply the archiving strategy, although the diversity of results (\(D\)) was increased by 27%.

Discussion

To illustrate some optimized land-use plans, we show the results with the highest degree of optimization (\(O\)) from each algorithm (Fig. 9). Intuitively, the optimized land-use plan produced by OC appears to be the most acceptable because it seems to be more spatially compact than the others. This can be confirmed by a comparison of the OFVs of these land-use plans towards Objective 3 (spatial compactness). As shown in Table 5, the spatial compactness (OFV 3) of OC is the highest among the four land-use plans. On the other hand, it is important to note that OC has clearly lower values for OFVs 1 and 2. Therefore, it is difficult to conclude that OC produced the best land-use plan. In fact, all results of these four algorithms, including the four shown in Fig. 9, are among the best land-use plans according to the given objectives and constraints, or, more technically, among the Pareto-optimal solutions, where one OFV cannot be further improved without worsening the other OFVs.

Fig. 9 Land-use plans with the highest degree of optimization (\(O\)) by each algorithm, where “OC” and “MC” denote the original and modified algorithms of Cao et al. (2011), respectively, and “OS” and “MS” denote the original and modified algorithms of Song and Chen (2018b), respectively

Table 5 Objective function values (OFV) of each land-use plan in Fig. 9 towards three objectives

As a result, in practically applying these algorithms, decision-makers may wonder how to select one solution as the future land-use plan from a large number of Pareto-optimal solutions. This question is independent of the capability of an NSGA-II algorithm to find the Pareto-optimal solutions and concerns the post-processing of results. To answer this question, one usually considers average and extreme solutions. For example, in their post-processing, Cao et al. (2011) selected one solution with the highest average OFV across all three objectives and three solutions with the highest OFV for each single objective. Similar strategies were adopted by Song and Chen (2018b) in their post-processing.

Our suggestion for practical applications of an NSGA-II algorithm is that the objectives and constraints of optimization should be carefully designed. This is somewhat tricky because, as previously noted in the "Micro-criteria of each macro-criterion" section, it amounts to modeling the expectations of decision-makers. To this end, such a design should be constantly tested and improved. We also recommend two approaches for the post-processing of results. One is to model the changes of future land use based on the Pareto-optimal solutions and then select among them through an analysis of their long-term impacts on future ecosystem services; similar work was performed by Bai et al. (2018). The other, proposed by Karakostas (2017), is to analyze the frequency of land-use types in each available map cell across the Pareto-optimal solutions.

Conclusions

NSGA-II is the most popular method for algorithmically optimizing land use towards sustainable development. Its latest applications in landscape ecology include exploring the sustainable trade-offs of refuge expansion in constrained landscapes (Liberati et al. 2019). In this study, we compared the performance of two improved NSGA-II algorithms for sustainable land-use optimization: the most widely applied version and the latest version. This comparison is the first of its kind, and it was carried out both theoretically and experimentally. On the theoretical side, the differences between the two versions were systematically analyzed in terms of initialization, crossover, mutation, and archiving strategy. On the experimental side, we developed an evaluation framework with four hierarchical levels. To the best of our knowledge, it is the first comprehensive framework for such evaluations. From the results, we can draw two major conclusions:

  • The proposed framework facilitates both a comprehensive evaluation of an NSGA-II algorithm for sustainable land-use optimization and a systematic comparison between two algorithms. Indeed, the framework applies not only to NSGA-II algorithms but also to all Pareto front-based algorithms for sustainable land-use optimization.

  • Each of the two improved NSGA-II algorithms has its advantages. The algorithm by Cao et al. (2011) outperformed that by Song and Chen (2018b) in terms of the quality of results and the time cost, whereas the latter algorithm outperformed the former one in terms of the diversity of results, the space cost, and the degree of optimization. Also, there is little need to apply the archiving strategy to the latter algorithm. Both algorithms can be further improved according to the experimental results.

Future research is recommended in two areas. The first is to further improve the usability of NSGA-II algorithms for sustainable land-use planning. Ideas from landscape ecology studies can be borrowed, such as scale optimization, movement, and gene flow (e.g., Cushman and Lewis 2010; McGarigal et al. 2016; Wan et al. 2019). Specifically, we suggest using a patch template of different sizes for crossover/mutation and controlling the distance between two patches in the single parent crossover. The effects can be evaluated using the proposed evaluation framework. The second is to apply evaluated optimization algorithms in landscape ecology. For example, to develop a sustainable landscape architecture (Chen and Wu 2009; Liu et al. 2020), one needs to spatially arrange land resources and ground objects. The optimal arrangement can be achieved by using these algorithms.