1 Introduction

Floods are among the most frequent and destructive natural hazards, causing significant damage to agricultural crops and threatening food security (Hirabayashi et al. 2008; Bhattacharya et al. 2019). In recent years, floods have become more severe, with substantial effects on the agricultural sector, disruption of infrastructure, and economic and social setbacks (Lee and Mohamad 2014; Su et al. 2021). Consequently, increasing emphasis has been placed on mapping flood-susceptible areas for early warning systems and impact evaluation (Do et al. 2022a, b, c). Traditional approaches to flood mapping rely on ground surveys and aerial observations, but because floods are irregular and extensive, these methods are time-consuming and expensive, and they impede the prompt assessment of flood-related effects on the economy and livelihoods (Peng and Peng 2018; Chen et al. 2019; Do and Tran 2023a, b, c).

Hanoi City has recently experienced frequent flooding during prolonged rainfall (Anh 2021). Due to rapid urbanization, numerous main roads have been expanded without coordinated drainage planning, leading to localized flooding in various areas during heavy rainfall. Plains account for three-quarters of the total area, and agriculture still occupies a significant portion of the economic structure. Consequently, floods have resulted in localized inundation (Anh 2023). One- and two-dimensional models are increasingly employed to improve the precision of simulation results (Lin et al. 2006; Liu et al. 2015). However, the primary drawback of this approach is the substantial amount of input data required, which demands extensive time and effort for field surveys, data collection, and model calibration (Klemas 2015; Lin et al. 2016). Furthermore, for areas susceptible to torrential rain, current hydraulic models have not yet provided a comprehensive solution (Brakenridge et al. 1994). The development of remote sensing technology and GIS has supplied powerful tools for data acquisition, spatial analysis, and graphical representation in the monitoring and identification of flooded areas (Zaharia et al. 2017; Al-Abadi 2018). Remote sensing can gather information over vast areas and extended periods with a high revisit frequency (Do et al. 2022b). Integrating remote sensing and GIS with machine learning models facilitates rapid calculation and assessment of areas susceptible to flooding and inundation (Do et al. 2022a, b, c).

In recent years, machine learning and data mining techniques have proven valuable for flood prediction (Mosavi et al. 2018; Do and Tran 2023a). One particularly popular method in this context is the Support Vector Machine (SVM), a model that can capture non-linear relationships (Khan et al. 2019). The distinguishing feature of SVM is its use of kernel functions to transform the original feature space, which facilitates the handling of non-linear features (Do and Tran 2023c). Consequently, SVM can classify flood data based on non-linear features such as the interplay between environmental factors, topography, and weather. Additionally, the SVM model is reported to manage large datasets effectively, reducing the time and resources required for computation (Costache 2019). However, SVM is susceptible to data noise and exhibits relatively high computational complexity, especially when complex kernel functions are employed or when the amount of data is substantial (Do and Tran 2023c). Therefore, in the current study, Principal Component Analysis (PCA) was used to identify the principal components of the data, that is, the components with the highest variance, with the aim of enhancing the performance of the SVM prediction model (Xu and Wang 2005).

To assess the effects of floods on agriculture, a spatial distribution map of agricultural land is required (Do et al. 2022a, b, c). Over the past few years, machine learning algorithms for mapping land use/land cover (LULC), including agricultural land, have developed rapidly (Pham et al. 2023a, b, 2024; Do et al. 2023). Their generalization and noise resistance capabilities make these methods effective even with limited sample data. As a result, machine learning algorithms have become invaluable tools for processing remote sensing data and offering solutions in agriculture (Anh 2023). Currently, algorithms based on the Convolutional Recurrent Neural Network (CRNN), which uses convolutional layers and pooling techniques, have emerged as a prominent subject of interest in this field (Rajendran et al. 2020; Moharram and Sundaram 2023). By combining two types of network, the Convolutional Neural Network (CNN) and the Recurrent Neural Network (RNN), the CRNN model is able to extract features and information automatically from the original images, thereby enhancing the accuracy of classification results (Cao et al. 2019).

In light of this reality, the primary objectives of the current investigation are as follows: (i) to examine the practicality of the CRNN model in extracting agricultural land using SPOT 7 satellite imagery; (ii) to map flood-susceptible areas using the PCA-SVM model; and (iii) to evaluate the impacts of floods on agricultural land use in Hanoi City.

2 Materials and methods

2.1 Study area

Hanoi City offers numerous advantages for the development of high-quality agriculture. It is the largest city in Vietnam, covering roughly 3,360 km2, and it ranks second in population and population density among Vietnam’s 63 provinces and cities. Located in the northwest of the central Red River Delta, between latitudes 20°34’ and 21°18’ north and longitudes 105°17’ and 106°02’ east, Hanoi lies within the triangle of the Red River Delta, an area known for its fertile and abundant land (Fig. 1). Hanoi has an extensive hydrological system of small and large rivers, including the Red River, Duong River, Da River, Nhue River, Cau River, Day River, and Ca Lo River. The city possesses the prerequisites for developing a contemporary agricultural sector and serves as a market for high-quality rice with substantial and steady demand.

Fig. 1 A map showing Hanoi city

2.2 Data collection and SPOT image preprocessing

This investigation collected SPOT 7 satellite imagery acquired in January 2023 with a resolution of 1.5 m, ensuring that cloud cover over the study area was below 10%. The SPOT 7 images underwent atmospheric and spectral correction to derive radiometric values. Several models, such as COST, DOS, MODTRAN, ATCOR, or FLAASH, can be employed for atmospheric correction (Pham et al. 2023a, b, 2024; Do 2024). To enhance accuracy, the ATCOR (Atmospheric and Topographic Correction) model integrated in the PCI Geomatica 2018 software was used. The quality of satellite imagery depends heavily on image processing. Typically, the pixel values of the acquired image channels are distributed within a narrow range compared with the display range, so each individual channel tends to appear relatively dark or bright when displayed. Therefore, to enhance image contrast, an image stretching operation is performed, which transforms the actual gray-level range of the original image into the gray-level range that the display device can show. After processing, the satellite image has good quality, appropriate contrast, accurate color representation, and even color distribution, and is suitable for LULC classification. The image preprocessing procedure entailed four steps, as depicted in Fig. 2, encompassing geometric correction and enhancement in ENVI 5.3 software, using the UTM projection grid, the VN-2000 coordinate system, and zone 48, with a resolution of 2.5 m.
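The image stretching operation described above can be sketched as a linear percentile stretch. This is only an illustration: the text does not state which stretch was applied in ENVI, so the 2% and 98% cut-offs, the function name `linear_stretch`, and the synthetic band are all assumptions.

```python
import numpy as np

def linear_stretch(band, low_pct=2.0, high_pct=98.0, out_max=255):
    """Linearly rescale one image band to the display range [0, out_max].

    Values below the low percentile map to 0 and values above the high
    percentile map to out_max (assumed cut-offs, not taken from the study).
    """
    band = band.astype(np.float64)
    lo, hi = np.percentile(band, [low_pct, high_pct])
    if hi <= lo:  # flat band: there is no range to stretch
        return np.zeros(band.shape, dtype=np.uint8)
    scaled = np.clip((band - lo) / (hi - lo), 0.0, 1.0)
    return np.round(scaled * out_max).astype(np.uint8)

# Hypothetical dark band whose gray levels occupy only the range 40..79.
rng = np.random.default_rng(0)
band = rng.integers(40, 80, size=(100, 100))
stretched = linear_stretch(band)
print(band.min(), band.max(), "->", stretched.min(), stretched.max())
```

The narrow input range is spread over the full 8-bit display range while the ordering of gray levels is preserved, which is the effect the preprocessing step aims for.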

Fig. 2 Simple flow of SPOT image preprocessing

Sample data for image classification were acquired within the research area. A land use map of Hanoi city for 2022 was collected for reference during classification and prediction. A total of 363 samples were collected, covering all land cover classes and surveyed flood-sensitive areas throughout the research region. The samples were drawn from on-site data collection (95 samples), high-resolution imagery from Google Earth, historical flood locations, and direct sampling on the SPOT 7 imagery. Five land cover types were classified: other land, construction, forest and urban green space, water surface, and agriculture. The dataset was divided into two subsets, with 70% of the samples used to train the classification model (training data) and the remaining 30% used to validate the classification results (testing data).
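A 70/30 division of this kind might be carried out as in the sketch below. Splitting each class separately (a stratified split) is an assumption, since the text does not say how samples were assigned, and the per-class counts are hypothetical values chosen only to sum to 363.

```python
import random
from collections import defaultdict

def stratified_split(labels, train_fraction=0.7, seed=42):
    """Return (train_idx, test_idx), splitting every class separately."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        cut = round(train_fraction * len(members))
        train_idx.extend(members[:cut])
        test_idx.extend(members[cut:])
    return sorted(train_idx), sorted(test_idx)

# Hypothetical class counts (the per-class breakdown is NOT given in the text).
counts = {"agriculture": 120, "construction": 90, "forest_green": 70,
          "water": 60, "other": 23}
labels = [name for name, n in counts.items() for _ in range(n)]

train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Stratifying keeps rare classes such as other land represented in both subsets, which a purely random split of a small sample set does not guarantee.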

2.3 Selection of LULC classification method

Currently, there exist multiple machine learning algorithms designed for land use land cover (LULC) classification using satellite imagery (Nahuelhual et al. 2012; Hua 2017). Among these algorithms is the Convolutional Recurrent Neural Network (CRNN), which combines the Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), and Connectionist Temporal Classification (CTC) (Fig. 3). CRNN is commonly employed for image classification tasks (Cao et al. 2019). This neural network architecture seamlessly integrates feature extraction, sequence modeling, and decoding within a unified framework. CNN is utilized to extract structural features from the images (Kattenborn et al. 2021), RNN is employed to model the sequential information (Mou et al. 2017), and CTC is utilized to enhance the performance of the CRNN model (Hsu and Li 2021). The training process of the CRNN model for image segmentation generally involves the utilization of training data consisting of image-label pairs. The model is adjusted through the optimization of a loss function, such as cross-entropy loss, to attain the most accurate segmentation results (Kattenborn et al. 2021). Numerous studies have demonstrated the commendable performance of CRNN in LULC classification (Wu and Prasad 2017; Zhao and Zettsu 2018). To evaluate the classification performance, the current study employed the overall accuracy (OA) and cross-validated accuracy (CV) metrics (Anh 2021; Do et al. 2022b).

Fig. 3 Land use land cover classification steps using CRNN model

$$ \text{OA}=\frac{\text{TP}+\text{TN}}{\text{TN}+\text{TP}+\text{FN}+\text{FP}} $$
(1)
$$ \text{CV}=\left(\frac{2\,{\text{TP}}^{2}}{\left(\text{TP}+\text{FN}\right)\left(\text{TP}+\text{FP}\right)}\right)\Big/\left(\frac{2\,{\text{TN}}^{2}}{\left(\text{TP}+\text{FN}\right)\left(\text{TP}+\text{FP}\right)}\right) $$
(2)

where TP is the number of agricultural objects correctly classified as agricultural, TN is the number of non-agricultural objects correctly classified as non-agricultural, FP is the number of non-agricultural objects incorrectly classified as agricultural, and FN is the number of agricultural objects incorrectly classified as non-agricultural.
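As an illustration of Eq. 1, the sketch below collapses a multi-class confusion matrix into the four counts for the agriculture class and computes OA from them. The matrix values and the function names are hypothetical, and a three-class matrix is used only to keep the example short.

```python
def one_vs_rest_counts(confusion, target):
    """Collapse a square confusion matrix (rows = reference class,
    columns = predicted class) into TP, TN, FP, FN for one target class."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    tp = confusion[target][target]
    fn = sum(confusion[target]) - tp                         # target missed
    fp = sum(confusion[r][target] for r in range(n)) - tp    # others labelled as target
    tn = total - tp - fn - fp
    return tp, tn, fp, fn

def overall_accuracy(tp, tn, fp, fn):
    """Eq. 1: share of objects that were classified correctly."""
    return (tp + tn) / (tn + tp + fn + fp)

# Hypothetical counts; row/column 0 is agriculture.
confusion = [
    [40, 3, 2],
    [4, 30, 1],
    [1, 2, 17],
]
tp, tn, fp, fn = one_vs_rest_counts(confusion, target=0)
oa = overall_accuracy(tp, tn, fp, fn)
print(tp, tn, fp, fn, oa)
```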

2.4 PCA-SVM model for flood susceptibility prediction

Principal Component Analysis (PCA) is a technique that reduces the dimensionality of data within the feature space by identifying the principal components of the data (Do et al. 2022b). In this investigation, the PCA algorithm was employed to eliminate less significant components from a total of 16 input variables (Table 1) in order to decrease the dimensionality of the data and improve model performance. After dimensionality reduction, the principal components were used as input features for the Support Vector Machine (SVM) model, a supervised machine learning model, applicable to classification and regression problems, that seeks an optimal hyperplane for classifying data into flood-susceptible and non-susceptible classes (Do et al. 2022b; Do and Tran 2023c). When employing a linear kernel, the SVM decision function takes the form (Gao et al. 2003):

Table 1 Database for flood susceptibility mapping
$$ \text{f}\left(\text{x}\right)=\text{sign}\left({\text{w}}^{\text{T}}\text{x}+\text{b}\right)$$
(3)

where w is the weight vector, x is the feature vector of the data sample, b is the bias term, and sign() is the sign function. In the present study, the parameter C was used to regulate the model’s regularization. A larger C value directs the SVM model to prioritize error minimization and tolerate fewer violations, while a smaller value of C prioritizes minimizing the magnitude of w and allows more violations of the margin. In the SVM model, the optimal value of the parameter C is determined through techniques such as grid search or error optimization. This process involves training and evaluating the model using both the training and testing datasets.
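The trade-off controlled by C can be illustrated numerically. The sketch below evaluates the decision function of Eq. 3 and the standard linear soft-margin objective, 0.5·||w||² + C·Σ max(0, 1 − y(wᵀx + b)). That objective is not written out in the text and is assumed here; the four samples and both weight vectors are hypothetical.

```python
import numpy as np

def decide(X, w, b):
    """Eq. 3 applied row-wise: +1 (susceptible) or -1 (non-susceptible)."""
    return np.where(X @ w + b >= 0, 1, -1)

def soft_margin_objective(X, y, w, b, C):
    """0.5 * ||w||^2 plus C times the summed hinge losses (margin violations)."""
    hinge = np.maximum(0.0, 1.0 - y * (X @ w + b))
    return 0.5 * float(w @ w) + C * float(hinge.sum())

# Hypothetical samples in a 2-D principal-component space.
X = np.array([[2.0, 0.0], [0.5, 0.0], [-2.0, 0.0], [-0.5, 0.0]])
y = np.array([1, 1, -1, -1])

w_small = np.array([1.0, 0.0])   # wide margin, two samples violate it
w_large = np.array([2.0, 0.0])   # narrow margin, no violations

for C in (0.1, 10.0):
    print(C,
          soft_margin_objective(X, y, w_small, 0.0, C),
          soft_margin_objective(X, y, w_large, 0.0, C))
```

Both weight vectors classify all four samples correctly, but with C = 0.1 the smaller w has the lower objective (violations are cheap), whereas with C = 10 the larger w wins because every violation is penalized heavily.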

To evaluate the accuracy of the PCA-SVM model in predicting flood susceptibility in the study area, the study used the coefficient of determination (R2) and the root mean square error (RMSE) (A. N. T. Do et al. 2022a, b, c; T. A. T. Do et al. 2022; Do et al. 2024; Do 2024).

$$ {\text{R}}^{2}=\frac{{\sum }_{\text{i}=1}^{\text{k}}\left[\left({\text{Y}}_{\text{i}}-\bar{\text{Y}}\right)\left({\text{X}}_{\text{i}}-\bar{\text{X}}\right)\right]}{\sqrt{{\sum }_{\text{i}=1}^{\text{k}}{\left({\text{Y}}_{\text{i}}-\bar{\text{Y}}\right)}^{2}}\,\sqrt{{\sum }_{\text{i}=1}^{\text{k}}{\left({\text{X}}_{\text{i}}-\bar{\text{X}}\right)}^{2}}}$$
(4)
$$ \text{RMSE}=\sqrt{\frac{1}{\text{k}}{\sum }_{\text{i}=1}^{\text{k}}{\left({\text{Y}}_{\text{i}}-{\text{X}}_{\text{i}}\right)}^{2}}$$
(5)

where \( {\text{Y}}_{\text{i}}\) and \( \bar{\text{Y}}\) represent the predicted variable and its mean value, respectively; \( {\text{X}}_{\text{i}}\) and \( \bar{\text{X}}\) represent the observed variable and its mean value, respectively; and k is the sample size.
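A minimal sketch of Eqs. 4 and 5 is given below. Eq. 4 is implemented exactly as printed, that is, as the covariance of predicted and observed values normalized by their spreads; the function names and the four data points are hypothetical.

```python
import math

def eq4_statistic(pred, obs):
    """Eq. 4 as printed: normalized co-variation of predictions and observations."""
    k = len(pred)
    mean_p = sum(pred) / k
    mean_o = sum(obs) / k
    num = sum((p - mean_p) * (o - mean_o) for p, o in zip(pred, obs))
    den = (math.sqrt(sum((p - mean_p) ** 2 for p in pred))
           * math.sqrt(sum((o - mean_o) ** 2 for o in obs)))
    return num / den

def rmse(pred, obs):
    """Eq. 5: root of the mean squared difference."""
    k = len(pred)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / k)

# Hypothetical values: every prediction is 0.5 too high.
obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.5, 2.5, 3.5, 4.5]
print(eq4_statistic(pred, obs), rmse(pred, obs))
```

In this example the Eq. 4 statistic is at its maximum even though every prediction is biased, while the RMSE of 0.5 exposes the offset, which is one reason for reporting the two metrics together.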

3 Results

3.1 CRNN model in agricultural land classification

In the current study, the CRNN model was employed to classify LULC in Hanoi city. The results are presented in Table 2, which lists the area and percentage of each LULC type. Agricultural land covers the largest area, 141,982.871 ha (42.258%), followed by construction (32.259%), forest and urban green space (13.304%), water surface (11.744%), and other land, which accounts for only 0.425% of the total area. To gauge the precision of the SPOT 7 image classification, the study used the overall accuracy (OA) and the cross-validated accuracy (CV) as evaluation metrics. The overall accuracy reached 88.005% and the CV reached 0.855 (Table 3). The water surface category showed the highest classification accuracy (OA = 95.027%, CV = 0.931), followed by forest and urban green space, construction, other land, and finally agriculture (OA = 80.282%, CV = 0.782). Overall, the CRNN model demonstrated a high level of accuracy and suitability for classifying land cover within the Hanoi city area.

Table 2 Area and percentage of area of each type of LULC
Table 3 Performance of the CRNN classification model

Figure 4 displays the spatial distribution map of agricultural land, which was classified using the CRNN model and SPOT satellite imagery. It can be observed that agricultural land encompasses a significant area and is primarily distributed in suburban areas. Areas devoid of agricultural land are predominantly situated in inner-city areas such as Ba Dinh and Dong Da districts, as well as in highland areas such as Ba Vi district (Fig. 4). In general, agriculture plays a prominent role in the economy and food production within the research area. However, it exhibits an uneven distribution, primarily concentrated in suburban areas, delta regions, and areas in close proximity to rivers and streams. With water surfaces covering 11.744% of the total area, flood events can cause significant losses to crops in these areas.

Fig. 4 Spatial distribution of agricultural land in Hanoi city in 2023

3.2 The importance of variables

Sixteen variables were selected as inputs for predicting flood susceptibility in the research area, as depicted in Fig. 5. However, incorporating an excessive number of input variables raises overfitting concerns for the SVM model. Therefore, principal component analysis (PCA) was employed to reduce the dimensionality of the data and improve the predictive performance of the model. The PCA results indicate a distinct separation of data points into separate clusters. The principal component PC1, which represents rainfall, accounts for a significant portion of the data’s variance, with R2 = 0.342 (Fig. 6), indicating that rainfall is the most important of the flood variables. Rainfall plays a critical role in flood modeling in Hanoi, an area of 3,360 km2 characterized by a humid tropical climate. Large and abrupt increases in rainfall can result in flooding and give rise to various flood-related issues. Consequently, constructing a flood model suited to Hanoi requires comprehensive data on the intensity and distribution of rainfall within the region. This result underscores the significant role of rainfall in instigating floods and reaffirms its importance in the flood susceptibility prediction model. The other variables with R2 values greater than 0.25 are water density (R2 = 0.316), distance to water surface (R2 = 0.290), forest density (R2 = 0.274), and LULC (R2 = 0.252). These results indicate that LULC plays a pivotal role in identifying susceptible areas. The level of susceptibility relies heavily on the interplay between land cover and rainfall. Forests and land use can influence watershed runoff, soil permeability, and water absorption capacity. Regions with substantial forest cover can mitigate flood susceptibility in the vicinity.
Altitude, aspect, slope, curvature, NDVI, temperature, agriculture density, construction density, distance to agriculture, distance to forest, and distance to construction have R2 values ranging from 0.107 to 0.216 (Fig. 6). Although these variables show some correlation with flood susceptibility, they are less important than the variables listed above.
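The dimensionality reduction step can be sketched as follows. This is a generic PCA on standardized predictors, not a reproduction of the study's analysis: standardizing the variables, retaining three components, and the synthetic 254 × 16 matrix are all assumptions made for illustration.

```python
import numpy as np

def pca_standardised(X, n_components):
    """PCA on z-scored columns via singular value decomposition.

    Returns (scores, ratio): the samples projected onto the leading
    components and the share of total variance each component explains.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    _, singular_values, Vt = np.linalg.svd(Z, full_matrices=False)
    variances = singular_values ** 2 / (X.shape[0] - 1)
    ratio = variances / variances.sum()
    scores = Z @ Vt[:n_components].T
    return scores, ratio[:n_components]

# Hypothetical data: 254 samples x 16 predictors driven by 3 hidden
# factors plus noise (NOT the study's data).
rng = np.random.default_rng(1)
hidden = rng.normal(size=(254, 3))
mixing = rng.normal(size=(3, 16))
X = hidden @ mixing + 0.3 * rng.normal(size=(254, 16))

scores, ratio = pca_standardised(X, n_components=3)
print(ratio.round(3), round(float(ratio.sum()), 3))
```

The retained scores are mutually uncorrelated, so the SVM receives a smaller set of inputs without the redundancy present among the original sixteen predictors.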

Fig. 5 Predictors of flood susceptibility in the study area. a) altitude; b) aspect; c) slope; d) curvature; e) LULC; f) NDVI; g) temperature; h) rainfall; i) agriculture density; k) forest density; l) construction density; m) water density; n) distance to agriculture; o) distance to forest; p) distance to construction; and q) distance to water surface

Fig. 6 The coefficient of determination between flood susceptibility and sixteen metrics applied from SPOT satellite

3.3 Mapping flood-susceptible areas

To evaluate the performance of the flood susceptibility prediction model, this study used the R2, RMSE, and ROC curve metrics, as shown in Fig. 7 and Table 4. The R2train value of 0.938 indicates that the model accounts for 93.8% of the variability in the training data, suggesting a strong fit (Table 4). The PCA-SVM model also performed well on the test data, achieving an R2test value of 0.904. The area under the ROC curve (AUC) was 0.921 (Fig. 7), meaning that the model ranks a randomly chosen flood-susceptible location above a randomly chosen non-susceptible one about 92.1% of the time. Therefore, the PCA-SVM model proves valuable for flood susceptibility prediction in the study area.
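The AUC can be computed directly from its pairwise definition: the fraction of (flooded, non-flooded) sample pairs in which the flooded sample receives the higher predicted score. The sketch below uses six hypothetical scores and is not based on the study's data.

```python
def auc(scores, labels):
    """Fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical model scores; label 1 = flooded location, 0 = not flooded.
scores = [0.9, 0.8, 0.35, 0.6, 0.4, 0.1]
labels = [1, 1, 1, 0, 0, 0]
value = auc(scores, labels)
print(value)
```

Seven of the nine pairs are ordered correctly here, so the AUC is 7/9. The measure depends only on the ranking of the scores, not on any particular classification threshold.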

Fig. 7 The ROC plot shows the performance of the flood prediction model using the PCA-SVM model

Table 4 Performance of PCA-SVM on flood susceptibility in Hanoi city

Based on the predictions of the PCA-SVM model, this study mapped the spatial distribution of flood-susceptible areas in Hanoi city (Fig. 8). In general, susceptible areas are predominantly found along major rivers, especially the Red River. Flood-susceptible areas are also typically located in low-lying regions with inadequate drainage, high river density, and substantial rainfall. Table 5 presents the area of each flood susceptibility zone. The very low and low levels account for 6.278% and 10.357% of the area, respectively, and are concentrated primarily in hilly areas with low river density (Fig. 5). The average susceptibility level covers approximately 27% of the total area (Table 5). In contrast, the high and very high susceptibility levels together account for the largest proportion (over 55%) and are concentrated in the delta region, where river density is high and rainfall is heavy. By overlaying the flood susceptibility layer with the agricultural land use layer (Fig. 9), this study identified agricultural land located in areas of high to very high flood susceptibility. Overall, more than 70% of agricultural land is situated in areas of high or very high susceptibility, indicating that flooding can result in inundation, crop loss, and a decline in agricultural quality and productivity.
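The overlay step amounts to counting agricultural pixels that fall in the upper susceptibility classes. The sketch below assumes two co-registered rasters of equal shape, with susceptibility coded 1 (very low) to 5 (very high); the coding, the function name, and the 3 × 4 grids are hypothetical.

```python
import numpy as np

def share_of_agriculture_at_risk(susceptibility, agriculture_mask, high_classes=(4, 5)):
    """Fraction of agricultural pixels whose class is in `high_classes`."""
    at_risk = np.isin(susceptibility, high_classes) & agriculture_mask
    return at_risk.sum() / agriculture_mask.sum()

# Hypothetical rasters: classes 1..5 and a boolean agricultural land mask.
susceptibility = np.array([[1, 2, 3, 4],
                           [2, 3, 4, 5],
                           [3, 4, 5, 5]])
agriculture = np.array([[0, 0, 1, 1],
                        [0, 1, 1, 1],
                        [0, 1, 1, 0]], dtype=bool)

share = share_of_agriculture_at_risk(susceptibility, agriculture)
print(round(float(share), 3))
```

Multiplying the pixel counts by the pixel area converts the same tally into hectares per susceptibility level.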

Fig. 8 Flood susceptibility map in the study area

Table 5 Area and percentage of each flood risk level in Hanoi city
Fig. 9 Distribution of high and very high flood risk levels compared to agricultural land

4 Discussion

Floods are severe natural phenomena that cause substantial damage in Hanoi city every year (Do et al. 2022a, b, c; Anh 2021). The analysis of flood susceptibility therefore helps to identify areas with an unacceptably high susceptibility to flooding and to select locations for risk mitigation actions (Liu and Wu 2011; Costache 2019). Historically in Vietnam, the delimitation of flood zones has relied heavily on hydro-meteorological monitoring data obtained from stations (Do et al. 2022a, b, c). However, these stations are widely spaced, with an average coverage of approximately 1,650 km2 per station (Anh 2023). Data from Earth observation satellites, which provide detailed information at short revisit intervals, are therefore regarded as a valuable complement to the traditional monitoring carried out at existing stations (Lee and Mohamad 2014; Lin et al. 2016).

To evaluate the impact and extent of floods on agriculture, LULC classification is essential (Ahmadlou et al. 2019). In multi-level tasks, deep learning algorithms have progressively outperformed traditional algorithms in both processing speed and classification accuracy (Amitrano et al. 2018; Su et al. 2021; Do and Tran 2023b). Among them, the CRNN model has demonstrated remarkable effectiveness in LULC classification (Wu and Prasad 2017). The results of this study indicate that the CRNN model achieves commendable classification accuracy, with an OA of 88.005% and a CV of 0.855 (Table 3). Based on the computed results, agriculture covers more than 40% of the total area (Table 2) and is mainly concentrated in low-lying and riverine areas. Nevertheless, the agricultural land area in Hanoi city is diminishing, as documented by Anh (2021, 2023). Despite being the largest city in Vietnam, Hanoi still retains a significant portion of agricultural land, which plays a pivotal role not only in food provision but also in conservation and sustainable development. Therefore, when flooding occurs, it can result in inundation, reduced crop quality and productivity, and substantial economic losses.

To minimize the negative impacts of floods on the agricultural sector, numerous investigations have indicated the necessity for effective measures in flood prevention and response. This process begins with the assessment and classification of flood susceptibility (Liu et al. 2015; Loc et al. 2022; Do and Tran 2023a). Various studies have successfully assessed flood susceptibility. Hailin et al. (2009) employed multi-year average rainfall, storm rainfall days, terrain factors, and flood frequency to map flood hazards. Similarly, Hagos et al. (2022) used GIS to identify susceptible areas by considering factors such as annual rainfall, slope, drainage systems, and soil type. However, the challenge lies in the multitude of factors that influence floods, encompassing both natural and socio-economic factors. Therefore, before all candidate predictor variables are incorporated into the prediction model, the factors that affect flood zoning must be considered carefully. The current study used the PCA method to examine the relationships between factors and reduce the dimensionality of the data (Fig. 6).

Figure 6 shows that rainfall is the most important factor; it increases the quantity of surface water, which in turn affects flow transmission. Furthermore, most of Hanoi city consists of flat terrain with low slopes, and agricultural land cover predominates in the downstream areas and along the main river branches within the city. The study by Do et al. (2022a, b, c) also demonstrated that the aspect variable has minimal influence on flood occurrence. In the current study, less significant variables were eliminated from the model inputs to ensure optimal model performance. Based on the predictions of the PCA-SVM model, the flood susceptibility zoning map with its different levels is presented in Fig. 8. Areas of high to very high flood susceptibility are located primarily in agricultural cultivation areas (Fig. 9). Conversely, areas of low susceptibility are typically found in hilly regions with abundant forest cover and low river density. Similar conclusions were drawn by Zaharia et al. (2017) and Do and Tran (2023a).

Anh (2021) reported that Hanoi city is renowned for its rice and vegetable production in the Red River Delta. The region yields large quantities of agricultural products, including 952.7 thousand tons of rice, 72.5 thousand tons of corn, and 723.2 thousand tons of vegetables, alongside other perennial crops. Consequently, any disruption caused by flooding could have a significant impact on the local food supply. To address this concern, a spatial distribution map has been developed to identify areas with different flood susceptibility levels. This map can support early warning of potential hazards and helps in assessing the likelihood of flood-related inundation affecting agricultural activities in Hanoi city. The results of this study provide valuable reference material and effective support for decision-makers planning agricultural land use.

5 Conclusions

The current study introduces a methodology for mapping flood susceptibility that is highly relevant for identifying flood-affected agricultural land. Using satellite imagery and the PCA-SVM model, the study shows that approximately 70% of agricultural land lies in areas of high to very high flood susceptibility. Furthermore, the most susceptible areas are situated primarily in riverine zones. The results demonstrate the efficacy of satellite imagery in detecting and mapping flood susceptibility. These findings can assist decision-makers in pinpointing susceptible locations and formulating prevention and mitigation measures to minimize flood damage in the agriculture sector and its related industries.