Introduction

Land use and land cover (LULC) change began at the dawn of civilization and was primarily driven by human needs for survival. Humans converted landscapes to meet their basic needs, including food, water, shelter and medicinal products. Regions with favourable terrain, slope conditions, soil types and climate were prone to LULC changes (Tewabe & Fentahun, 2020). Concern about the spontaneous expansion of agriculture and pastoralism at the expense of natural land cover was voiced as early as the 1860s (Marsh, 1864). A gain of 4.8 billion ha of agricultural land (cropland and grazing land) has cost the world 2 billion ha of forest and nearly 2.9 billion ha of wild grassland and shrubs (Ritchie & Roser, 2021). The Intergovernmental Panel on Climate Change (IPCC) Special Report on land use, land-use change and forestry estimated that such loss, particularly since 1750, is responsible for one-third of global warming (Watson et al., 2000). Furthermore, LULC change is recognized as a primary driver of land degradation, hydrological system alterations, decreased groundwater recharge, soil degradation and irreversible species extinction (Bogale, 2020; Mengistu et al., 2022; Olorunfemi et al., 2022). Africa is particularly vulnerable to these impacts. It was estimated that nearly 65% of ecosystem service value (ESV) in Sub-Saharan Africa is lost due to the degradation of forests and croplands (Fenta et al., 2020), leaving over 1.3 billion people stranded on deteriorating agricultural land (UNCCD, 2017). This poses a barrier to the continent’s efforts to achieve food security and improved lives.

Efforts to conserve the environment have been made through legally binding multinational agreements since as early as 1910 (Mitchell, 2003). The Kyoto Protocol of 1997 and the Paris Agreement of 2015 are two of the most well-known international agreements. These agreements, along with several environmental conservation-related campaigns and projects, have reduced deforestation and increased global growing stock levels (Nabuurs et al., 2022; Ritchie & Roser, 2021). However, forest losses still outstrip the gains (Hansen, 2013; Ritchie & Roser, 2021). With the current exponential population growth, closing the gap between losses and gains and rejuvenating landscapes overall could be challenging, especially in regions where considerable population growth is anticipated, such as growing cities. It is even more challenging in regions with unfavourable climatic conditions, such as arid areas with limited rainfall and high erosion (Yildiz et al., 2018). Therefore, understanding the future pace of LULC change in these areas is of paramount importance.

A wide range of approaches for LULC prediction is being developed and applied across many regions (Mishra et al., 2020; Wang et al., 2021). However, numerous LULC prediction studies have used the hybrid cellular automata (CA)-Markov chain model to predict future LULC patterns (Sang et al., 2011; Harris et al., 2014; Gidey et al., 2017; Hamad et al., 2018; Chanapathi & Thatikonda, 2020; Aburas et al., 2021; Gemitzi, 2021; Getachew et al., 2021; Wang et al., 2021; Ghalehteimouri et al., 2022). The CA-Markov chain model has recently gained popularity because of its simplicity and its ability to predict a complex LULC system (Gidey et al., 2017; Wang et al., 2021; Girma et al., 2022). CA works on the principle that change in any particular grid cell is explained by its current state and the state of its neighbouring cells (Gidey et al., 2017), whereas the Markov model is a stochastic model that operates on the basis that the future state of a cell depends on its current state (Ghalehteimouri et al., 2022). The hybrid CA-Markov chain model therefore operates under the influence of proximity: the state of an LULC class is influenced by the dynamic changes of the adjacent classes (Ghalehteimouri et al., 2022; Gidey et al., 2017). Nevertheless, dynamic changes in LULC often exhibit nonlinearity (Ghalehteimouri et al., 2022; Lamchin et al., 2022). It is therefore imperative to account for the influence of drivers such as socioeconomic, geospatial and climatic variables (Alqadhi et al., 2021; Girma et al., 2022; Siddik et al., 2022). Methods like the multilayer perceptron neural network (MLP-NN) offer an effective approach to enhancing our understanding of the LULC process (Eastman, 2020a; Girma et al., 2022). The MLP-NN utilizes a backpropagation algorithm that generalizes transition probabilities while considering drivers that impact LULC transitions (Alqadhi et al., 2021; Girma et al., 2022; Lamchin et al., 2022). Applying the MLP-NN thus improves the predictive capabilities of the CA-Markov model. The accuracy and reliability of the integrated MLP-NN and CA-Markov chain model have been endorsed by numerous scholars (Alqadhi et al., 2021; Ghalehteimouri et al., 2022; Girma et al., 2022; Rahaman et al., 2023). This performance has led studies of future LULC change to employ the integrated MLP-NN and CA-Markov model.
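The two ingredients described above can be illustrated with a minimal sketch. It assumes two co-registered class maps stored as plain Python lists of integer class codes; the function names (`transition_matrix`, `markov_step`, `neighbour_count`) are hypothetical and are not taken from any LULC modelling package or from the software used in this study.

```python
from collections import Counter


def transition_matrix(earlier, later, n_classes):
    """Row-normalised cross-tabulation of two co-registered class maps.

    `earlier` and `later` are flat sequences of integer class codes,
    one entry per grid cell (an assumption made for this illustration).
    """
    counts = Counter(zip(earlier, later))
    matrix = []
    for i in range(n_classes):
        row_total = sum(counts[(i, j)] for j in range(n_classes))
        if row_total == 0:
            # Class absent from the earlier map: assume it simply persists.
            matrix.append([1.0 if i == j else 0.0 for j in range(n_classes)])
        else:
            matrix.append([counts[(i, j)] / row_total for j in range(n_classes)])
    return matrix


def markov_step(shares, matrix):
    """Markov ingredient: future class shares depend only on current shares."""
    n = len(shares)
    return [sum(shares[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def neighbour_count(grid, row, col, target):
    """CA ingredient: how many of the 8 neighbours of a cell are in `target`."""
    total = 0
    for r in range(row - 1, row + 2):
        for c in range(col - 1, col + 2):
            inside = 0 <= r < len(grid) and 0 <= c < len(grid[0])
            if inside and (r, c) != (row, col) and grid[r][c] == target:
                total += 1
    return total
```

In a CA-Markov setting, the Markov step fixes how much area each class is expected to gain or lose, while a neighbourhood measure such as `neighbour_count` helps decide where those changes are allocated.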

Therefore, this study aims to predict future LULC in the Bahi (Manyoni) Catchment (BMC) using the CA-Markov chain model for the immediate-future (2035), mid-future (2055) and far-future (2085). The BMC is significant because it covers 14% of Tanzania’s new capital city (i.e. Dodoma) in the southeast of the BMC and other rapidly expanding areas, such as 89% of Kondoa Town Council in the northeast, 53% of Singida District in the northwest and 16% of Manyoni District in the centre of the BMC. Unprecedented immigration (URT, 2012, 2022) from other regions of the country, including the former capital city (i.e. Dar es Salaam), is pressing the BMC into a hasty expansion of agricultural land and settlements (Kisamba & Li, 2022; Mubako et al., 2022; Mussa et al., 2021). Therefore, keeping close tabs on LULC is vital before irreversible degradation occurs. Since no study has attempted to predict future LULC for the entire BMC, the findings from this study will aid the basin board and other decision-making bodies in planning and decision-making about land conservation, sustainable LULC development and natural resource management.

Description of study area

The BMC is one of the nine catchments in the Internal Drainage Basin (IDB), covering 24,770.2 \({\mathrm{km}}^{2}\). The BMC lies between latitudes \(4^\circ 07' 12''\) and \(6^\circ 44' 10''\) South and longitudes \(34^\circ 39' 43''\) and \(36^\circ 05' 50''\) East. The catchment is located in the northeast to central part of Tanzania and covers 11 districts, namely Dodoma capital city, Babati, Bahi, Chemba, Kondoa rural, Kondoa urban, Singida rural, Ikungi, Manyoni, Hanang and Itigi (Fig. 1). The BMC has a mean elevation of 1235 m above sea level (a.s.l.) and is characterized by mostly gentle slopes, with some steep slopes in the northern part of the catchment. The catchment has a unimodal rainfall pattern, with an average rainfall of 620 mm/year, usually falling from December to May, and an average temperature ranging from 16 to 27 \(^\circ\mathrm{C}\). The major socio-economic activities in the catchment are rain-fed farming, with major crops including maize, sunflower, sorghum, millet and paddy, and livestock keeping, mostly cattle and small stock such as goats, sheep and pigs (Ghotbi et al., 2015).

Fig. 1 Location of the BMC

Materials and methods

The methodological framework employed to guide this study is presented in Fig. 2. Detailed information on image acquisition and processing, image classification, selection of driver variables, model training, validation and future prediction of LULC change is elaborated in the following subsections.

Fig. 2 Methodological framework for LULC change prediction

Image acquisition and processing

Several satellite image sources are available online for LULC studies; for this study, images from the Landsat program were used since it is the longest-running Earth observation program. The Landsat images considered in this study had a spatial resolution of 30 m. The selected images were for the years 1985, 2005 and 2021 and were obtained from the United States Geological Survey (USGS) website https://earthexplorer.usgs.gov/. Image selection was based on availability from the oldest to the latest year (i.e. when analysis of the downloaded images was performed). The images were acquired by the Landsat 5 Thematic Mapper and Landsat 8 OLI/TIRS. Specifically, Landsat Collection 2 Level 2 images were used because of their improved use of ground control points (GCPs) and radiometric calibration. A detailed description of the Landsat collections and their products is summarized at https://www.usgs.gov/media/files/landsat-collection-1-vs-collection-2-summary. Bands 1 to 5 of Landsat 5 and bands 2 to 6 of Landsat 8 were used to distinguish closely related vegetation based on vegetation health and density, and various land use categories, including roads, built-up areas, water bodies and bare soils. Dry-season images, i.e. from July to October, were prioritized in order to differentiate closely related land use types such as agriculture and grassland. During this period, the predominant crops in the region, including maize, sunflower, paddy and sorghum, have been harvested, leaving the fields clear. This provides a noticeable contrast, making it easier to distinguish agricultural land from grassland. The acquired images were processed by removing cloud cover using the quality assessment (QA) and surface reflectance sub-datasets. To minimize the effects of gap filling on the multi-temporal characteristics, images from an adjacent month or year in the same season (July to October) were used to patch gaps created by the cloud removal process. Finally, the cloud-free image tiles were merged and the study area was extracted.
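The cloud removal and gap patching step can be sketched as follows. This is a minimal illustration rather than the exact procedure used in this study: it assumes the QA information is the Collection 2 QA_PIXEL band, in which bits 1, 3 and 4 flag dilated cloud, cloud and cloud shadow, and it uses flat Python lists in place of raster arrays. The names `is_cloudy` and `patch_gaps` are hypothetical.

```python
# Bit positions in the Landsat Collection 2 QA_PIXEL band.
DILATED_CLOUD_BIT = 1
CLOUD_BIT = 3
CLOUD_SHADOW_BIT = 4
CLOUDY = (1 << DILATED_CLOUD_BIT) | (1 << CLOUD_BIT) | (1 << CLOUD_SHADOW_BIT)


def is_cloudy(qa_value):
    """True if any of the cloud-related bits is set for this pixel."""
    return (qa_value & CLOUDY) != 0


def patch_gaps(primary, qa_primary, filler, qa_filler, nodata=None):
    """Keep clear primary pixels; fill cloudy ones from a same-season image.

    All four inputs are flat sequences of equal length, one entry per pixel.
    Pixels that are cloudy in both images are set to `nodata`.
    """
    patched = []
    for value, qa, fill_value, fill_qa in zip(primary, qa_primary, filler, qa_filler):
        if not is_cloudy(qa):
            patched.append(value)
        elif not is_cloudy(fill_qa):
            patched.append(fill_value)
        else:
            patched.append(nodata)
    return patched
```

Taking the filler image from the same dry season, as described above, limits the phenological mismatch between patched and original pixels.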

Image classification

Pixel-based supervised classification was performed using the support vector machine (SVM) algorithm. The SVM algorithm has been widely used in multiple fields of application (Cervantes et al., 2020) and is less susceptible to noise and to unbalanced training samples and sizes (Boateng et al., 2020; Noi & Kappas, 2017). The LULC schema was based on the LULC types described for Mainland Tanzania (Mauya et al., 2019). Ten LULC types were used: water, forest, open woodland, shrubland, grassland, agricultural land, wetland, built land, bare land and cloud cover. These LULC types are described in Table 1. Since the study was conducted in a semi-arid region, the type of grassland referred to here is savanna grassland.

Table 1 LULC types classification in BMC

After several training iterations, accuracy assessment was conducted with more than 500 stratified random points. Accuracy assessment for the 1985 classified map was performed using toposheet maps, the 2005 classified map was assessed using the Google Earth Engine platform alone, and the 2021 classified map was assessed using a combination of the Google Earth Engine platform and ground-truthing points. The existing toposheet maps for the study area were prepared from 1958 to 1984 and contain fewer details. To enable the use of the toposheets, classes such as open woodland, shrubland, grassland and bare land were merged and classified as shrubland for the purpose of accuracy testing only. Additionally, most of the toposheet maps used did not have a built land class; therefore, the accuracy of this class was not evaluated. The sample sizes for the stratified random points per class were determined by the proportion of each class on the map, with the most dominant class receiving the largest sample size. The classified LULC maps were then validated using the kappa coefficient in Eqs. 1 and 2 to determine whether there is a statistically significant similarity between the classified and observed LULC.

$$\mathrm{Kappa}\;\mathrm{coefficient}=\frac{\mathrm{Overall}\;\mathrm{Agreement}-\mathrm{Probability}\;\mathrm{of}\;\mathrm{Chance}\;\mathrm{Agreement}}{1-\mathrm{Probability}\;\mathrm{of}\;\mathrm{Chance}\;\mathrm{Agreement}}$$
(1)
$$\mathrm{Kappa}\;\mathrm{coefficient}=\frac{n\sum_{i=1}^kn_i-\sum_{i=1}^k(C_uC_p)}{n^2-\sum_{i=1}^k(C_uC_p)}$$
(2)

where \(i\) is the class number, \(n\) is the total number of classified pixels that are being compared to the actual class, \({n}_{\mathrm{i}}\) is the number of pixels belonging to the actual class \(i\) that were classified as class \(i\), \({C}_{\mathrm{u}}\) is the total number of classified (user) pixels belonging to class \(i\) and \({C}_{\mathrm{p}}\) is the total number of reference (producer) pixels belonging to class \(i\). The kappa coefficient indicates the strength of agreement: < 0.01 is considered no agreement, 0.01–0.20 slight agreement, 0.21–0.40 fair agreement, 0.41–0.60 moderate agreement, 0.61–0.80 substantial agreement and 0.81–1.00 almost perfect agreement (Cohen, 1960). Therefore, in this study, LULC maps with a kappa coefficient above 0.61 were accepted; otherwise, classification was repeated. After classification, the salt-and-pepper effect was reduced by applying a 3 × 3 majority filter to produce a clean classified image.
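As a minimal sketch of how the kappa coefficient in Eq. 1 can be computed from a confusion matrix, the function below takes overall agreement as the diagonal sum divided by \(n\) and chance agreement as the sum of row total times column total over \(n^2\). The layout (rows as classified classes, columns as reference classes) and the name `kappa_coefficient` are assumptions made for illustration; the example matrix is invented and does not come from Tables 3 to 5.

```python
def kappa_coefficient(confusion):
    """Kappa from a square confusion matrix given as a list of rows.

    Rows are assumed to hold classified (user) classes and columns the
    reference (producer) classes.
    """
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    agree = sum(confusion[i][i] for i in range(k))          # correctly classified
    row_totals = [sum(row) for row in confusion]            # user totals
    col_totals = [sum(col) for col in zip(*confusion)]      # producer totals
    chance = sum(row_totals[i] * col_totals[i] for i in range(k))
    # Equivalent to (overall - chance) / (1 - chance) after multiplying by n^2.
    return (n * agree - chance) / (n * n - chance)


# Invented two-class example: 85 of 100 points agree.
example = [[45, 5],
           [10, 40]]
accepted = kappa_coefficient(example) > 0.61  # acceptance rule used in this study
```

For this invented matrix the overall agreement is 0.85 and the chance agreement is 0.50, so the sketch returns a kappa of 0.70, which falls in the substantial agreement band.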

Selection of driver variables

Geospatial, socio-economic and climatic driver variables were identified and tested for their predictive ability. The geospatial variables were altitude (Elevation) and slope (Slope); the socio-economic variables were distance from previous agricultural activities (Dist_Pre_Farming), distance from previous built area (Dist_Pre_Builtland), distance from roads (Dist_Roads), distance from streams (Dist_Stream) and distance from urban centres (Dist_Urban); and the climatic variables were annual rainfall (Rainfall), average annual minimum temperature (Tmin) and maximum temperature (Tmax). The last variable included was the evidence likelihood of land cover (EC_Likelihood_LC), which is the relative frequency of different land use categories in the area over a given transition period (Eastman, 2020a). Protected areas in the BMC, such as forests and game reserves, were identified, and constraints were imposed to help the model maintain the current reserved LULC. A detailed description of the driver variables and constraints is given in Table 2.

Table 2 Description of driver variables data used in prediction of LULC

The explanatory power of the driver variables was tested using Cramer’s V, and their worth was assessed using the p value. A high Cramer’s V indicates high explanatory power, although it does not guarantee strong performance (Eastman, 2020a). A cut-off of Cramer’s V ≥ 0.15 was adopted for this study; variables below the cut-off were discarded. Higher p values indicate low variable worth, suggesting rejection. Therefore, only driver variables with p values < 0.05 were selected. The qualified variables were used to train the multilayer perceptron neural network (MLP-NN) model.
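For reference, Cramer’s V can be computed from a contingency table as \(\sqrt{\chi^2/(n(\min(r,c)-1))}\). The sketch below is a minimal illustration that assumes the driver variable has already been binned into categories and cross-tabulated against LULC classes; the function name `cramers_v` and the example tables are hypothetical and do not reproduce the values in Table 7.

```python
import math


def cramers_v(table):
    """Cramer's V for an r x c contingency table given as a list of rows.

    Rows might hold binned driver categories and columns LULC classes
    (an assumption made for this illustration).
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected
    dof = min(len(table), len(table[0])) - 1
    return math.sqrt(chi2 / (n * dof))


def passes_cutoff(table, cutoff=0.15):
    """Apply the Cramer's V >= 0.15 screening rule described above."""
    return cramers_v(table) >= cutoff
```

A table in which the driver categories perfectly separate the classes gives V = 1, whereas a table with identical rows gives V = 0 and would be discarded under the cut-off.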

Model training and transition potential

Transition sub-models were considered for training based on the LULC classes that experienced significant changes from 1985 to 2005. Transitioned and persisted pixels from the sub-models were trained with the selected driver variables using the MLP-NN. Training parameters, including the start and end learning rates, momentum factor, sigmoid constant and number of hidden layer nodes, were altered until an accuracy rate of ≥ 75% and a root mean square (RMS) error of < 0.5 were attained (Eastman, 2020b; Moriasi et al., 2007). However, during training, the stopping criteria were not confined to these thresholds, in order to give the model a chance to perform better. The stopping criteria were therefore set to an accuracy rate of 100% and an RMS error of 0.01. Once a desirable performance was achieved, transition potential maps for each sub-model were generated.

Validation and future prediction

CA-Markov chain model validation was performed by comparing the simulated and classified maps for the year 2021. During simulation of the 2021 LULC map, Markov chain analysis uses the transition potential maps to simulate the probability of a pixel transitioning from one land use class to another over different time periods (i.e. the transition probability matrix). The transition probability matrix was manually edited until the desired simulated map was obtained. The resulting validation map shows pixels where the model captured a transition as hits (green), pixels where the model predicted a transition but the pixel persisted as misses (red) and pixels where the model predicted persistence but the pixel transitioned as false alarms (yellow). Once green dominated the validation map, accuracy assessment was performed using the kappa coefficient (Eqs. 1 and 2) to determine whether there is a statistically significant similarity between the simulated and classified LULC. The verified model, with a kappa coefficient ≥ 0.61, was then applied to predict the LULC of the BMC for the immediate-future (2035), mid-future (2055) and far-future (2085).
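The validation categories can be tallied with a short sketch. It follows the labelling used in this section (misses are predicted transitions that persisted; false alarms are predicted persistence that transitioned) and assumes three co-registered maps stored as flat lists of class codes: the starting map, the classified 2021 map and the simulated 2021 map. As a simplification, a hit is counted whenever both maps show a change, without checking the destination class. The name `validation_categories` is hypothetical.

```python
def validation_categories(start, observed, simulated):
    """Count hits, misses, false alarms and persistence, cell by cell."""
    tallies = {"hit": 0, "miss": 0, "false alarm": 0, "persistence": 0}
    for s, o, p in zip(start, observed, simulated):
        observed_change = o != s
        predicted_change = p != s
        if observed_change and predicted_change:
            tallies["hit"] += 1            # transition captured by the model
        elif predicted_change:
            tallies["miss"] += 1           # predicted transition, but persisted
        elif observed_change:
            tallies["false alarm"] += 1    # predicted persistence, but transitioned
        else:
            tallies["persistence"] += 1    # unchanged in both maps
    return tallies


def change_shares(tallies):
    """Shares of hits, misses and false alarms among non-persistent cells."""
    changed = tallies["hit"] + tallies["miss"] + tallies["false alarm"]
    return {key: tallies[key] / changed for key in ("hit", "miss", "false alarm")}
```

Expressing the three change categories as shares of the non-persistent cells makes the dominance of hits easy to check before the kappa coefficient is computed.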

Results

Accuracy of classified LULC maps

The classified LULC maps for the years 1985, 2005 and 2021 are presented in Fig. 3, and their corresponding accuracies are presented as confusion matrices in Tables 3, 4 and 5. The accuracy obtained for the 1985 classified map indicated substantial agreement with the toposheet maps. The assessment points were distributed by stratified random sampling, in which classes with larger coverage received more points and classes with smaller coverage received fewer points. As shown in Table 3, C_2, for instance, accounted for only 9 assessment points (i.e. < 1.8% of the total), while C_3 accounted for 414 points (i.e. > 81.6% of the total). With four assessment points misclassified in C_2, its user accuracy fell to 56%, whereas with nine assessment points misclassified in C_3, its user accuracy remained as high as 98%. This implies that, for maps with substantial differences in class coverage, the accuracy of classes with fewer assessment points is far more sensitive to misclassification than that of classes with more points. Conversely, the producer accuracies of the classes with lower user accuracy, within the same map, contributed to an overall kappa coefficient of 74% (Table 3). According to Cohen (1960), this coefficient signifies substantial agreement between the produced map and the observed LULC.
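As a worked illustration of this sensitivity, and assuming that the 9 and 414 assessment points are the totals against which user accuracy was computed for C_2 and C_3 (the notation \(\mathrm{UA}\) is introduced here only for illustration), the reported values follow from:

$$\mathrm{UA}_{C\_2}=\frac{9-4}{9}\approx 0.56,\qquad \mathrm{UA}_{C\_3}=\frac{414-9}{414}\approx 0.98$$

A single misclassified point therefore shifts the user accuracy of C_2 by about 11 percentage points (1/9) but that of C_3 by only about 0.24 percentage points (1/414).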

Fig. 3 Classified LULC maps for years 1985, 2005 and 2021

Table 3 Confusion matrix for the year 1985
Table 4 Confusion matrix for the year 2005
Table 5 Confusion matrix for the year 2021

Similarly, for the classified maps of 2005 and 2021, both user and producer accuracies were high, yielding kappa coefficients of 83% and 84%, respectively (Tables 4 and 5). This indicates almost perfect agreement with the actual LULC of the area (Cohen, 1960). All three classified maps therefore met the expectations of the map producers.

Analysis of 1985, 2005 and 2021 LULC maps

The findings of the LULC classification from 1985 to 2021 are presented in Table 6. In 1985, open woodland was the most dominant LULC, extending from the northern part to the west and southern parts of the catchment, whereas shrubland was the second most dominant, occurring nearly throughout the catchment with a high concentration in the eastern part (Fig. 3a). Since the BMC is a semi-arid region, the weather favours open woodland and shrubland, making them dominant. Together, the two classes covered > 73% of the catchment in 1985. Grassland, on the other hand, with a coverage of only around 7%, was predominantly located at relatively lower elevations where runoff and sediments from the steeper, higher-altitude areas are deposited (Fig. 1). Agricultural land was the third most dominant land cover, with a coverage of about 9.37%, found mostly in the northern and southeastern parts of the catchment.

Table 6 Coverage of land covers in 1985, 2005 and 2021 for BMC

Figure 4 presents all LULC transitions caused by anthropogenic activities from 1985 to 2005 and from 1985 to 2021. The anthropogenic disturbances transformed natural vegetative covers, including open woodland, shrubland and grassland, into agricultural land and built land. By 2021, agricultural land had gained a total of 22.52%, becoming the most dominant LULC, followed by shrubland and open woodland, whereas built land had gained only about 1.33%. The expansion of agricultural land was detected throughout the catchment except in water bodies and some parts of the north, east and west of the catchment (Fig. 3c).

Fig. 4 LULC transitions caused by anthropogenic activities (a) from 1985 to 2005 and (b) from 1985 to 2021

During the period from 1985 to 2021, significant transitions were observed from open woodland, shrubland and grassland to agricultural land, in contrast with the gradual transitions to built land shown in Fig. 5. Figure 6 presents the losses, gains and persistence of these highly transitioned LULC classes. After the classes that experienced significant changes as a result of anthropogenic disturbances had been identified, their transitions were grouped under a single sub-model. The sub-model, along with the suitable drivers, was then used to train the MLP-NN model.

Fig. 5 Overall gains and losses from 1985 to 2021 in \({\mathrm{km}}^{2}\) (a) in all LULC classes, (b) in agricultural land, (c) in built land, (d) in open woodland, (e) in shrubland and (f) in grassland

Fig. 6 Losses, persistence and gains from 1985 to 2021 in (a) open woodland, (b) shrubland, (c) grassland, (d) agricultural land and (e) built land

Suitable drivers and model training for future prediction

The MLP-NN model was trained using driver variables that had adequate explanatory power. Table 7 presents the explanatory power of all tested drivers. Based on the criteria set in the “Selection of driver variables” section, only six driver variables had adequate explanatory power: Elevation, Rainfall, Dist_Pre_Builtland, Dist_Pre_Farming, Dist_Roads and EC_Likelihood_LC.

Table 7 Results on the explanatory power of the tested drivers

Figure 7a, b and c presents the influence of the driver variables on the performance of the MLP-NN model. Figure 7a shows model performance when one variable at a time was held constant. Variable 6 (EC_Likelihood_LC) had the most influence on model performance, followed by variables 4 and 5 (distance from previous anthropogenic disturbances, i.e. farming and built land), whereas variable 2 (Rainfall) had the least influence. This implies that variables 4, 5 and 6 are far better able than the remaining variables to anticipate the transition and/or persistence of most modelled classes. Figure 7b shows the variation in model performance when all variables except one are held constant. It indicates that, independently, the performance of each variable is quite low, even for variable 6 despite the influence it displays in Fig. 7a. Figure 7c shows backward stepwise forcing, in which variables are held constant one by one until only one variable is left. The figure demonstrates that the impact on model performance becomes significant when variable 5 and then variable 4 are held constant along with the other, less influential variables. Overall, the analysis revealed that these variables depend on one another to achieve optimum model performance.

Fig. 7 Sensitivity of driver variables on MLP-NN Model

The parameters and performance of the trained model are presented in Table 8. The performance indicates successful model training, since the root mean square (RMS) error is < 0.50 and the accuracy rate is > 75% (Eastman, 2020b; Moriasi et al., 2007).

Table 8 Parameters and performance

After successful model training, the results for the transition and persistence of the selected LULC classes are presented in Table 9. Table 9 shows that transitions performed better than persistence. This is because driver variables 4 to 6 were based on the land use categories of the area in 1985, including previous anthropogenic activities, namely farming and built land. The persistence of grassland was very high compared with the other LULC types, likely because its smaller coverage made its training more efficient. The persistence of shrubland and open woodland, on the other hand, was the lowest. This suggests that the variables indicated a higher likelihood of some open woodland and shrubland pixels transitioning to either agricultural land or built land, while these pixels persisted in the 2005 LULC map. The model was aided by adjusting the transition probability matrix for correction. Otherwise, the model is considered to be generally successful.

Table 9 Model skill breakdown by transition and persistence

Prediction of future LULC maps

Prior to prediction, the accuracy assessment between the generated and classified 2021 LULC maps produced a kappa coefficient of > 0.89 (Table 10). Moreover, their error map showed nearly 74% hits (captured transitions), 23% misses (predicted transition but persisted) and 3% false alarms (predicted persistence but transitioned), alongside persistence (areas that persisted from 1985 to 2021). Based on Table 10 and Fig. 8, these results demonstrate almost perfect agreement between the generated and classified LULC maps of 2021 (Cohen, 1960), indicating that the CA-Markov chain model was ready to generate future LULC maps for the years 2035, 2055 and 2085.

Table 10 Confusion matrix between classified and generated LULC map of year 2021
Fig. 8 Error map between classified and generated LULC map of year 2021

The coverage of the future LULC maps predicted by the CA-Markov chain model for the years 2035, 2055 and 2085 is presented in Table 11. The main change was predicted to be from open woodland, shrubland and grassland to agricultural land and built land. The results show that agricultural land and built land are expected to increase from 2021 to 2085, with agricultural land becoming the most dominant LULC and built land becoming the fourth most dominant LULC after shrubland and open woodland (Tables 6 and 11).

Table 11 Coverage of the predicted future land covers in 2035, 2055 and 2085 in BMC

Discussion

This study classified and analysed spatio-temporal changes in LULC maps for the years 1985, 2005 and 2021 and subsequently employed a cellular automata-Markov chain model to project LULC maps for the years 2035 (immediate-future), 2055 (mid-future) and 2085 (far-future) within the Bahi (Manyoni) Catchment (BMC), Tanzania. The accuracy results for the classified LULC maps of 2005 and 2021 (Tables 4 and 5) indicated almost perfect agreement with the observed LULC in the study area (Cohen, 1960). However, the accuracy of the classified LULC map of 1985 ranked slightly lower than that of the other LULC maps (Table 3). This could be attributed to the lack of available reference ground-truth data, the paper quality of the toposheet maps, which caused difficulties during georeferencing, and the use of outdated toposheet maps (Daba & You, 2022; Nath et al., 2023; Piškinaitė & Veteikis, 2023). It is worth noting that some of these toposheets date back to 1958, which is more than 20 years before the classification year of 1985. Based on the agreement criteria of Cohen (1960), it is concluded that all three LULC maps represent the spatio-temporal situation of the BMC and that the classified maps could be used for further analysis and predictions.

The findings of the classified LULC maps from the years 1985 to 2021 revealed that in 1985, BMC was predominantly covered by natural vegetation, which subsequently transitioned to agricultural lands across most of the catchment (Figs. 3 and 4).

The transition was primarily driven by population growth and by low crop yields due to water scarcity and/or poor soil quality, which create a strong incentive to expand agriculture in order to increase production and meet food demand (Dolan et al., 2021; Rosa et al., 2020). Furthermore, the clearing of natural vegetation could also be attributed to logging for timber and rotational grazing for livestock keeping. It is common practice in Tanzania for livestock keepers to burn shrubland to stimulate fresh vegetation regrowth for grazing their cattle herds (Tanzania Forest Services, 2013). Studies by Valone et al. (2002) and Mora et al. (2022) on the effects of fire and grazing in arid regions found that fire has a temporary effect on vegetation species that can be reversed. However, continuous large and high-severity burning could cause a permanent transition from shrubland to grassland (Orr et al., 2022). Since analysing the tolerance of vegetation species subjected to fire was beyond the scope of this study, further studies are called for. The segments of the catchment that remained unaltered were mostly protected areas. For instance, the northern part of the catchment has Hanang forest reserve and Swagaswaga game reserve, the east has Mbuga ya Goima reserve and the southeast has Kigongkwe/Chigongkwe reserve. There are also other forest reserves at the edges of the catchment that extend beyond its northern part, including Nou, Salanga and Isabe, and Chenene west forest reserve on the east. These protected areas are under either state management (Tanzania Forest Services Agency) or Joint Forest Management (JFM) and consist of miombo woodland, Itigi-thickets, savanna-grassland and shrubland, except for Mbuga ya Goima and Kigongkwe/Chigongkwe, which consist mostly of savanna-grassland and shrubland (USAID, 2019).

Interestingly, the western part of the catchment (Makuru and Saranda wards in Manyoni district and Sanzawa ward in Chemba district) and the northwest (Mughunga ward in Singida rural district) were not subjected to anthropogenic activities despite not being under protection. These areas are distant from the town centres of Manyoni and Singida, respectively. This could be further explained by von Thünen’s model of agricultural land use, established in 1826, which relates the use of suitable agricultural land close to commercial centres to lower transportation costs (Peet et al., 1967). Hence, the more distant agricultural areas are from the centres, the less likely they are to be utilized. Built land, for its part, is sparsely located throughout the catchment (Figs. 4 and 6e). Table 6 indicates continuous growth of built land from 1985 to 2021, with relatively higher growth from 2005 to 2021 than from 1985 to 2005. The higher growth from 2005 to 2021 was associated with population growth and/or the active relocation of government offices to the capital city from 2015 (Mnyampanda & Chindengwike, 2021), which attracted people from various parts of the country seeking opportunities. However, the overall expansion of built land from 1985 to 2021 is still small, as indicated in Table 6. This could be attributed to some of the town centres, such as Dodoma city centre and Singida town centre, being located a few kilometres outside the catchment area.

The CA-Markov chain model successfully generated an LULC map for the year 2021 with no significant differences from the classified LULC map of 2021 (Table 10). However, despite this success, the model failed to mimic the state of Bahi swamp in 2021 (Fig. 8). The dried-out part of the swamp, which exposed the bottom soils, was detected and classified as bare land on the classified LULC map of 2021. Because the CA-Markov chain model was trained using the previously classified LULC maps from 1985 and 2005, which had full coverage of the Bahi swamp, it was unable to detect the dried part of the swamp in 2021, resulting in errors. In other words, the CA-Markov model predicted the persistence of water in the swamp instead of bare land. This is one of the limitations of the CA-Markov chain model, and it could not be resolved in this study even with the inclusion of climatic drivers such as rainfall. Aguejdad (2021) and Gemitzi (2021) also raised concerns about the accuracy of the CA-Markov chain model when subjected to abrupt changes resulting from natural disasters, climate variability and/or geopolitical uncertainties. This is a cross-cutting drawback of most LULC change prediction models. A major limitation of LULC prediction models is their inability to predict non-linear transformations, which are usually the case in areas undergoing rapid changes (Hao et al., 2018; Aburas et al., 2019; Aguejdad, 2021; Antwi-Agyakwa et al., 2023).

Normally, the drying out of water bodies is associated with reduced rainfall and higher temperatures in the surrounding areas (Mir et al., 2021). However, this was not the case for Bahi swamp. According to the yearly statement on the status of Tanzania climate for 2021, increased rainfall amounts of 276.6 mm and a temperature increase of between 1 and 2 °C compared to the long-term average were recorded (TMA, 2022). According to the Clausius-Clapeyron relation, a 1 °C increase in temperature can increase the rate of evaporation by approximately 7% (Kim et al., 2022). However, actual evaporation depends on multiple factors, including initial temperature, relative humidity, wind speed, the availability of water, the surface area of water bodies and plant cover, all of which can affect the total amount of water available for evaporation (Penman, 1956). Since this study did not establish the rate of evaporation, the influence of future changes in rainfall and temperature on water bodies within the catchment remains unknown.
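
The figure of roughly 7% per degree can be checked with a short calculation. The sketch below evaluates the fractional change in saturation vapour pressure per kelvin implied by the Clausius-Clapeyron relation, which is the quantity the relation strictly describes and an upper guide to evaporative demand rather than a measured evaporation rate. The constants are standard approximate values and the reference temperature of 15 °C is an assumption chosen for illustration, not a value from this study.

```python
# Approximate physical constants (assumed standard values).
L_V = 2.5e6   # latent heat of vaporisation of water, J/kg
R_V = 461.5   # specific gas constant of water vapour, J/(kg K)

def fractional_increase_per_kelvin(temp_k):
    """d(ln e_s)/dT = L_v / (R_v * T^2) from the Clausius-Clapeyron relation."""
    return L_V / (R_V * temp_k ** 2)

# Assumed reference temperature of 15 degrees Celsius (288.15 K).
rate = fractional_increase_per_kelvin(288.15)
print(f"Increase in saturation vapour pressure per 1 K: {rate * 100:.1f}%")
```

At the assumed temperature the result is between 6% and 7% per kelvin, in line with the value quoted from Kim et al. (2022). The rate decreases slowly as temperature rises, which is one reason the actual change in evaporation also depends on the other factors listed by Penman (1956).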

The findings highlight a significant shift in the LULC composition towards increased agricultural and built land at the expense of natural vegetation (Tables 6 and 11). It seems that the historical pattern of growth of agricultural and built land at the expense of natural vegetative land covers since 1985 will prevail. The persistence of this pattern from the historical to the predicted future LULC reflects model training that relied on a sub-model covering only the major changes caused by anthropogenic disturbances (see the “Analysis of 1985, 2005 and 2021 LULC maps” section). It is acknowledged that other LULC changes were caused by human activities between 1985 and 2005. However, because the sub-model used in the study accounted for over 80.5% of all anthropogenic changes during that time, it is reasonable to assume that the remaining changes would have minimal impact on future predictions.

It was also noted that despite a predicted substantial increase of built land from 2021 to 2035, the growth of agricultural land gradually slowed from 2035 to 2085. Several studies have similarly predicted that the growth of agricultural land will either slow down or decrease in the future (Belay & Mengistu, 2021; Lu et al., 2019; Mahdian et al., 2023; Zhu et al., 2022). This slowing growth is linked to the limited availability of land suitable for agricultural expansion. The majority of the remaining unconverted lands with potential for agriculture are protected areas, especially in the north and northeast (Fig. 9). If forest management becomes ineffective, there is a high possibility of encroachment, as some farms were already positioned quite close to the borders of these protected areas in 2021. With the fast-growing population, promoting a shift from extensive to intensive farming practices is necessary to minimize the conversion of native lands to agricultural lands. This would increase productivity on small areas and may lower the urge to encroach on the protected areas should management practices become ineffective. Predictions indicate that by 2085, some parts of Makuru and Saranda wards in Manyoni district, as well as Sanzawa ward in Chemba district, will be the only areas of the catchment remaining as non-agricultural land. This could be because farming extends from the north and southeast towards the west of the catchment. Since the remaining area is not under protection, it is at risk of being converted to agriculture in the future. The findings obtained here are consistent with prior research conducted in this and other water basins in Tanzania, which identified farming as the fastest-growing LULC in the country (Berakhi et al., 2015; Msofe et al., 2019; Mussa et al., 2021; Näschen et al., 2019; Twisa & Buchroithner, 2019).
Prolonged farming can reduce infiltration and increase runoff because agricultural activities such as ploughing, tilling and harvesting disturb the soil structure and increase soil compaction (Nunes et al., 2020; Voorhees et al., 1978). Consequently, the soil’s ability to absorb water decreases, and more water runs off the surface, causing erosion. This raises concerns about the alteration of hydrological processes, which may affect groundwater recharge. Moreover, prolonged farming accompanied by excessive use of agrochemicals such as fertilizers and insecticides can decrease soil fertility and eventually contribute to food insecurity (Popp et al., 2013). Therefore, with the expected growing demand for agricultural land, this study provides the basin board with the information needed on the areas that are more susceptible to change, for proper LULC planning.

Fig. 9 Maps generated by the CA-Markov chain model for the years 2021, 2035, 2055 and 2085

Conclusion

This study aimed to predict future land use and land cover change in the Bahi (Manyoni) catchment, Tanzania, for the immediate future (2035), mid-future (2055) and far future (2085) using an integrated MLP-NN and CA-Markov chain model. The classified LULC maps of the years 1985, 2005 and 2021 were used to train the model. The accuracy assessment revealed remarkable agreement with observed conditions, reaffirming the reliability of the classified LULC maps for further analysis and predictive applications. It should be noted that the land use for 2021 included abrupt changes associated with the drying of Bahi swamp. Despite the limitations of the integrated MLP-NN-CA-Markov chain model in predicting such abrupt changes, the trained model simulated the LULC map of 2021 almost perfectly, indicating its high predictive capability. The LULC findings for the BMC indicate significant shifts in the landscape dynamics of the catchment. The transition from natural vegetation to agricultural land emerged as a dominant pattern from 1985 to 2021, driven by factors such as water scarcity, low crop yields, population growth and anthropogenic activities such as logging and grazing. The study demonstrated that most of these anthropogenic changes are expected to spread from the north and southeast to the west, covering nearly the entire catchment except for the protected areas and water bodies. These patterns are likely to continue into the future, though the expansion of agricultural land might gradually slow because of constraints on land availability. To address the potential challenges arising from these changes, the study suggests promoting a transition from extensive to intensive farming practices. This approach could reduce the rate of conversion of native lands to agriculture.
Furthermore, as agriculture continues to expand, concerns arise about altered hydrological processes, including groundwater recharge, underlining the need for strategic LULC planning and management. This research contributes to understanding the complex interplay of anthropogenic activities and environmental changes within the Bahi (Manyoni) catchment. The findings offer insights for sustainable land management strategies, aiding policymakers and stakeholders in making informed decisions to balance development with conservation efforts in this vital catchment. As Tanzania and other regions globally face similar challenges of land use transformation, this study’s insights are relevant beyond the Bahi (Manyoni) catchment and contribute to the broader discourse on effective land use planning and management.

Recommendations

This study offers valuable insights and directions for future research in the field of land use and land cover (LULC) dynamics. The following are some areas that may benefit from further investigation:

  • Since the study revealed challenges in predicting non-linear transformations, particularly abrupt changes such as the drying out of the Bahi swamp, future research could investigate this limitation in more depth to improve the accuracy of LULC prediction models in abruptly changing environments.

  • This study did not incorporate the impact of future climate variables on the projected land use. Since climate is known to affect LULC, future studies may consider incorporating future climate variables into the LULC prediction model.

  • The study projected LULC changes up to 2085, yet uncertainties increase with longer projection periods. Hence, future research could investigate the reliability and robustness of LULC predictions over extended timeframes, addressing potential uncertainties that arise in long-term modelling.