1 Introduction

With rapid population growth and the development of urban and rural areas, land use has intensified and expanded far faster than in the past. Acquiring and using accurate land information has therefore become a priority for national and local governments, because it supports more effective judgments and decisions in China’s economic development planning. Internationally, many countries already use remote sensing technology for land use management and land science, and have accumulated successful experience.

Remote sensing technology has developed rapidly and is now widely used in many fields. In recent years, remote sensing images have been widely used for land use and land cover classification, which plays a crucial role in land resource management, urban planning, environmental protection, and other applications [1, 2]. Imaging satellites now provide data covering most of the Earth’s surface. High-resolution remote sensing imagery is characterized by the “three highs” and supports all-day, all-weather, real-time Earth observation. This offers a new opportunity for land use and land cover classification and raises the question of how best to exploit high-resolution remote sensing satellites for this task.

At present, the most commonly used image classification techniques are pixel-based and object-oriented methods. High-resolution images, however, carry limited spectral information and exhibit the phenomenon of “same object, different spectra; different objects, same spectrum”. This degrades classification, so pixel-based methods cannot be applied well. Object-oriented classification has clear advantages because it considers spectral, spatial, texture, and context information together, which improves classification accuracy and makes the extracted information more intuitive to analyze. The method has therefore attracted wide attention. For example, Meng et al. [3] proposed an object-oriented method to extract urban ecological land cover from multi-channel images acquired by China’s GF-1 satellite; it identified urban land cover with a verification accuracy of 90%. Yu et al. [4] used object-oriented information extraction with multi-temporal HJ-1A images and other auxiliary data to extract the main land use/land cover types in their study area, and obtained higher accuracy than with a pixel-based method. Jesus et al. [5] combined object-oriented and pixel-based methods to extract land cover information in the mountains of Mexico. Wang et al. [6] used an object-oriented classification method to test the feasibility and applicability of classifying ecologically sound land, with an overall classification accuracy of 87.43%. Cahairet et al. [7] used an SVM classifier for supervised land cover classification in the Gabes region of southeastern Tunisia, with an accuracy of 92.12%. In addition, Shi et al. [8] reported that analyzing high-resolution satellite remote sensing images helps in the effective supervision of urban land use.

This paper studies part of Yangshuo county and uses the Example-Based Feature Extraction tool of the object-oriented spatial feature extraction module (FX) in the ENVI 5.3 platform to extract land cover classification information for the Guilin area. FX is fast, repeatable, accurate, convenient, and accessible, which makes it suitable for experimental research on object-oriented classification.

The rest of the paper is organized as follows. Section 2 describes the data used, the study area, and the image preprocessing procedure. Section 3 introduces the object-oriented classification method. Section 4 reports and discusses the results obtained when applying the method to the selected data. Finally, Sect. 5 summarizes the main conclusions and suggests possibilities for further research.

2 Research Area Overview and Data Preprocessing

2.1 GF-2 Data

The GF-2 satellite carries two panchromatic and multispectral CCD camera sensors (PMS) with nominal resolutions of 1 m (panchromatic) and 4 m (multispectral). Its spatial resolution can reach 0.8 m, which marks the entry of China’s remote sensing satellites into the sub-meter “high-resolution era” [9]. With the two cameras combined, GF-2 provides a swath width of 45 km, which corresponds to 6908 × 7300 pixels in the multispectral image. The revisit time of GF-2 is 5 days, so it can capture detailed information over a wide area at short intervals. With high resolution, wide coverage, frequent revisits, and high image quality, GF-2 imagery is an ideal data source for land cover information extraction. This study uses GF-2 images at the 2 m and 8 m scales acquired over the study area by the PMS1 sensor on March 12, 2019.

2.2 Study Area Overview

GF-2 performs well in information extraction across urban and rural areas. This paper therefore selected part of the urban area in a GF-2 image of Yangshuo county, Guilin city, Guangxi Zhuang autonomous region, as the experimental study area. Yangshuo county lies in the northeast of Guangxi Zhuang autonomous region, south of Guilin city, at 110°13′–110°40′ east longitude and 24°28′–25°4′ north latitude. It borders Gongcheng county and Pingle in the east, Lipu county in the south, Yongfu county in the west, and Lingchuan county and Yanshan district in the north. The county covers 1428.38 sq. km. The study area contains a rich variety of land cover, including urban land, water, woodland, farmland, roads, and other land, and its terrain is reasonably representative, which favors experimental research on object-oriented information extraction. The study area lies at 110°29′–110°30′ east longitude and 24°46′–24°47′ north latitude. Figure 1 shows the true-color image of the study area, composed of the red, green, and blue bands.

Fig. 1. True-color image of the study area

2.3 Preprocessing

Remote sensing image preprocessing uses computers to process remote sensing images and related data, providing a basis for remote sensing information extraction and quantitative analysis [10]. In this paper, the image data are preprocessed by orthorectification, image registration, image fusion, and clipping. The preprocessing workflow is shown in Fig. 2.

Fig. 2. Flow chart of preprocessing

3 Object-Oriented Land Cover Classification

Because high-resolution images carry limited spectral information and exhibit the phenomenon of “same object, different spectra; different objects, same spectrum”, image classification results suffer and pixel-based methods cannot be applied well. Object-oriented image classification makes full use of high-resolution panchromatic and multispectral data. It has clear advantages in its use of spectral, spatial, texture, and context information, which improves classification accuracy and makes the extracted information more intuitive to analyze. Sample-based image classification, that is, supervised classification, uses training samples to identify unknown objects. It comprises texture feature extraction, sample definition, classifier selection, and output of the results. This paper uses the Example-Based Feature Extraction tool of the object-oriented spatial feature extraction module in ENVI 5.3 to classify land cover in the Guilin area. The sample-based object-oriented classification process is shown in Fig. 3.

Fig. 3. Flow chart of sample-based object-oriented classification

3.1 Image Segmentation and Merging

FX uses an object-oriented approach to extract the required information from the remote sensing image, such as urban land, water, woodland, farmland, roads, and other land. It segments the image according to the brightness, texture, and color of adjacent pixels. Following fixed rules, the segmentation algorithm divides the whole study-area image into many small patches, each of which is homogeneous: its pixels share the same or similar spectral, texture, and spatial characteristics. The core of image segmentation is setting the segmentation threshold, that is, the segmentation scale [11]. If the threshold is too low, some features may be misclassified and a single feature may be split into many parts. Merging addresses these problems by combining over-segmented patches. After a large number of experiments, merging in this paper is performed on top of a segmentation scale of 45. With a merging scale of 85 and a texture kernel of 3, the contours of the objects in the study area are well delineated (Fig. 4(a)), and the required objects can be classified accurately. Larger areas such as woodland and cultivated land reach the ideal polygon extent.

Fig. 4. Best merging result. (a) Segmented image; (b) original image

3.2 Extract Texture Features

In order to reduce the possibility of classification errors, we used different strategies to extract land cover. First, water bodies are determined using the normalized difference water index:

$$ NDWI = \frac{{\rho_{green} - \rho_{nir} }}{{\rho_{green} + \rho_{nir} }} > 0.2 $$
(1)

Vegetation is identified using the normalized difference vegetation index:

$$ NDVI = \frac{{\rho_{nir} - \rho_{red} }}{{\rho_{nir} + \rho_{red} }} > 0.12 $$
(2)

where \( \rho_{green} \), \( \rho_{red} \) and \( \rho_{nir} \) are the reflectances of the green, red, and near-infrared bands, respectively. Furthermore, urban land is identified using spectral information (Eq. (3)). Because the spectrum of roads is similar to that of other land, the two are difficult to distinguish clearly. To reduce this effect, the texture features entropy (Eq. (4)) and contrast (Eq. (5)) of other land are added to the classification process. Finally, the two classifications, with and without texture features, are compared for accuracy.
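
The two index thresholds in Eqs. (1) and (2) can be sketched per pixel as follows. This is only an illustration in plain Python: the function names, the order in which the tests are applied (water first, then vegetation), and the reflectance values are assumptions made for the example, not part of the ENVI workflow.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or 0.0 when the denominator is zero."""
    total = a + b
    return (a - b) / total if total != 0 else 0.0


def classify_pixel(green, red, nir, ndwi_min=0.2, ndvi_min=0.12):
    """Label one pixel from its band reflectances using Eqs. (1) and (2)."""
    ndwi = normalized_difference(green, nir)  # Eq. (1)
    ndvi = normalized_difference(nir, red)    # Eq. (2)
    if ndwi > ndwi_min:       # assumed order: water is tested first
        return "water"
    if ndvi > ndvi_min:
        return "vegetation"
    return "other"


# Hypothetical reflectance triples (green, red, nir), for illustration only.
samples = {
    "lake": (0.10, 0.06, 0.03),
    "forest": (0.08, 0.05, 0.40),
    "roof": (0.20, 0.22, 0.24),
}
labels = {name: classify_pixel(*bands) for name, bands in samples.items()}
print(labels)
```

Pixels that pass neither threshold are left as "other" here; in the paper they are handled by the spectral and texture rules that follow.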

$$ Max.diff = \frac{{\mathop {\max }\limits_{i,j \in K_B } \left| {c_i \left( v \right) - c_j \left( v \right)} \right|}}{c\left( v \right)} $$
(3)

where \( c_i \left( v \right) \) and \( c_j \left( v \right) \) are the mean brightness of object \( v \) in layers \( i \) and \( j \), respectively; \( K_B \) is the set over which \( i \) and \( j \) range; and \( c\left( v \right) \) is the mean brightness of \( v \) over all layers.
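
As a small worked illustration of Eq. (3), the sketch below reads Max.diff as the largest pairwise difference between the per-layer means of one object, divided by the object's overall brightness. Treating \( c\left( v \right) \) as the unweighted mean of the layer means is an assumption of this sketch, and the band values are invented.

```python
from itertools import combinations


def max_diff(layer_means):
    """Eq. (3): largest pairwise gap between layer means over their average.

    Expects at least two layer means.
    """
    brightness = sum(layer_means) / len(layer_means)  # c(v), assumed unweighted
    if brightness == 0:
        return 0.0
    largest_gap = max(abs(a - b) for a, b in combinations(layer_means, 2))
    return largest_gap / brightness


# Hypothetical per-band means of one segmented object (blue, green, red, nir).
object_means = [100.0, 120.0, 140.0, 200.0]
print(max_diff(object_means))
```

For these values the largest gap is 100 and the mean brightness is 140, so the ratio is 100/140. Objects whose bands are nearly equal give values close to 0.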

$$ Ent = - \sum\nolimits_{i = 1}^n {\sum\nolimits_{j = 1}^n {p\left( {i,j} \right) \times \ln \left[ {p\left( {i,j} \right)} \right]} } $$
(4)

where \( p\left( {i,j} \right) \) is the probability that the pixel values \( i \) and \( j \) occur together in adjacent pixels, and \( n \) is the number of gray levels.

$$ Q = \sum_\delta {\delta \left( {i,j} \right)^2 p_\delta \left( {i,j} \right)} $$
(5)

where \( \delta \left( {i,j} \right) = \left| {i - j} \right| \) is the gray-level difference between adjacent pixels and \( p_\delta \left( {i,j} \right) \) is the probability that adjacent pixels differ by \( \delta \).
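
The entropy and contrast features of Eqs. (4) and (5) can be illustrated with a small gray-level co-occurrence computation. The sketch below is plain Python and makes several assumptions that the paper does not state: co-occurrence is counted for a single horizontal offset, entropy carries the conventional leading minus sign so that it is non-negative, and the 3 × 3 windows are invented test patterns.

```python
import math
from collections import Counter


def cooccurrence(image, dx=1, dy=0):
    """Normalized co-occurrence probabilities p(i, j) for one pixel offset."""
    counts = Counter()
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[(image[r][c], image[r2][c2])] += 1
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def entropy(p):
    """Eq. (4), written with a leading minus sign (assumed convention)."""
    return -sum(v * math.log(v) for v in p.values() if v > 0)


def contrast(p):
    """Eq. (5): squared gray-level difference weighted by its probability."""
    return sum((i - j) ** 2 * v for (i, j), v in p.items())


# Hypothetical 3 x 3 windows, matching a texture kernel size of 3.
flat = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]
striped = [[0, 3, 0], [0, 3, 0], [0, 3, 0]]
for name, window in (("flat", flat), ("striped", striped)):
    p = cooccurrence(window)
    print(name, entropy(p), contrast(p))
```

A uniform window has zero entropy and zero contrast, while the striped window has higher values for both. This is the kind of difference the texture features are meant to expose between spectrally similar classes such as road and other land.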

3.3 Define the Classification Samples

Taking Yangshuo county in Guilin city as an example, this paper explores the application of object-oriented technology to land cover classification and sets up a classification scheme and samples according to the actual conditions of the study area. Regions of interest for urban land, water, woodland, arable land, roads, and other land are created with the ROI Tool through visual interpretation. The land cover classification standard shown in Table 1 is adopted [12].

Table 1. Sample Standard for Land Cover Classifications

According to the spectral characteristics of the image, the objects are divided into the six types shown in Table 1. Samples are then selected by drawing polygons around regions of interest for each type, and the types are distinguished by color. The regions of interest are shown in Fig. 5. Red denotes urban land, lavender water, cyan woodland, yellow wasteland, brown roads, and light orange other land.

Fig. 5. Regions of interest

3.4 Performing Supervised Classification

The classifier is selected according to the complexity of the study area and the accuracy required. Different classifiers assign pixel values differently and generate rule images according to their classification parameters.

Support vector machine (SVM) classifier: the SVM is a supervised method based on statistical learning theory that is used for classification and regression. An SVM classifier separates testing from training. In the training set, each instance has multiple attributes and a target value. The SVM adopts the concept of a decision surface that maximizes the margin between classes; this decision surface is also called the optimal hyperplane. Support vectors are the data points closest to the decision surface, and they are the key elements of the training sample set [13]. The solution of an SVM depends on the choice of kernel function. Common kernels are linear, polynomial, radial basis function, and sigmoid. This paper uses the linear SVM, illustrated in Fig. 6, applied to the preprocessed GF-2 data.

Fig. 6. Linear support vector machine method

The goal of the SVM is to find an optimal classification hyperplane that not only classifies every sample correctly but also maximizes the distance between the hyperplane and the closest sample of each class.

Suppose the training set contains \( n \) samples, each with a feature vector \( {\varvec{x}}_i \) and a class label \( y_i \) of +1 or −1, corresponding to positive and negative samples, respectively. The SVM finds an optimal classification hyperplane for these samples:

$$ {\varvec{w}}^{\rm T} {\varvec{x}} + {\varvec{b}} = 0 $$
(6)

where \( {\varvec{x}} \) is the input vector (sample feature vector), \( {\varvec{w}} \) is the weight vector, and \( {\varvec{b}} \) is the bias term (a scalar). These two parameters are obtained by training.

First, ensure that each sample is correctly classified. For positive samples:

$$ {\varvec{w}}^{\rm T} {\varvec{x}} + {\varvec{b}} \ge 0 $$
(7)

For negative samples:

$$ {\varvec{w}}^{\rm T} {\varvec{x}} + {\varvec{b}} < 0 $$
(8)

Since the label of a positive sample is +1 and that of a negative sample is −1, the two conditions can be written as a single inequality constraint for \( i = 1,2, \cdots ,n \), where \( n \) is the number of training samples:

$$ y_i \left( {{\varvec{w}}^{\rm T} {\varvec{x}}_i + {\varvec{b}}} \right) \ge 0 $$
(9)

The second requirement is that the distance between the hyperplane and the two classes of samples be as large as possible. By the point-to-plane distance formula of analytic geometry, the distance of each sample from the classification hyperplane is:

$$ d = \frac{{\left| {{\varvec{w}}^{\rm T} {\varvec{x}}_i + {\varvec{b}}} \right|}}{{\left\| {\varvec{w}} \right\|}} $$
(10)

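where \( \left\| {\varvec{w}} \right\| \) is the L2 norm of \( {\varvec{w}} \). Because scaling \( {\varvec{w}} \) and \( {\varvec{b}} \) by the same positive factor does not change the hyperplane, the following normalization can be imposed on \( {\varvec{w}} \) and \( {\varvec{b}} \):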

$$ \min_{{\varvec{x}}_i } \left| {{\varvec{w}}^{\rm T} {\varvec{x}}_i + {\varvec{b}}} \right| = 1 $$
(11)

The constraint on the classification hyperplane becomes:

$$ y_i \left( {{\varvec{w}}^{\rm T} {\varvec{x}}_i + {\varvec{b}}} \right) \ge 1 $$
(12)

As a result, the margin between the classification hyperplane and the two classes of samples is:

$$ \begin{aligned} d\left( {{\varvec{w}},{\varvec{b}}} \right) = & \min_{x_i ,y_i = - 1} d\left( {{\varvec{w}},{\varvec{b}};{\varvec{x}}_i } \right) + \min_{x_i ,y_i = 1} d\left( {{\varvec{w}},{\varvec{b}};{\varvec{x}}_i } \right) \\ = & \,\frac{2}{{\left\| {\varvec{w}} \right\|}} \\ \end{aligned} $$
(13)
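
The relations in Eqs. (10) to (13) can be checked numerically on a toy example. The sketch below uses an invented two-dimensional training set and a hand-picked hyperplane that already satisfies the normalization of Eq. (11); it does not train an SVM, and the helper names are hypothetical.

```python
import math


def decision_value(w, b, x):
    """Left-hand side of Eq. (6): w^T x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def norm(w):
    """L2 norm of the weight vector."""
    return math.sqrt(sum(wi * wi for wi in w))


# Hypothetical 2-D samples (x, y) and a hand-picked canonical hyperplane.
samples = [((3.0, 3.0), +1), ((4.0, 2.0), +1), ((1.0, 1.0), -1), ((0.0, 1.0), -1)]
w, b = (0.5, 0.5), -2.0

# Eq. (12): every sample satisfies y_i (w^T x_i + b) >= 1.
constraints_hold = all(y * decision_value(w, b, x) >= 1 for x, y in samples)

# Eq. (10): distance of each sample to the hyperplane.
distances = [abs(decision_value(w, b, x)) / norm(w) for x, _ in samples]

# Eq. (13): closest negative sample plus closest positive sample.
closest_neg = min(d for d, (_, y) in zip(distances, samples) if y == -1)
closest_pos = min(d for d, (_, y) in zip(distances, samples) if y == +1)
margin = closest_neg + closest_pos
print(constraints_hold, margin, 2 / norm(w))
```

Because the closest sample on each side has \( \left| {{\varvec{w}}^{\rm T} {\varvec{x}}_i + {\varvec{b}}} \right| = 1 \), the two distances add up to \( 2/\left\| {\varvec{w}} \right\| \), which is why maximizing the margin amounts to minimizing \( \left\| {\varvec{w}} \right\| \).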

An SVM is then used to classify the image, both with and without texture features, through sample-based supervised classification. Because classification is complex and accuracy-sensitive, many experiments are needed to tune it. After supervised classification, post-classification processing such as Majority/Minority Analysis and Clump Analysis is applied to remove small spurious patches and obtain the final classification results.
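
The idea behind majority-based post-classification can be sketched with a plain majority filter. This is a simplified stand-in, not ENVI's Majority/Minority Analysis tool: the window is truncated at the image border, ties are broken by first occurrence, and the class map is invented.

```python
from collections import Counter


def majority_filter(labels, size=3):
    """Replace each pixel by the most common class in its size x size window."""
    rows, cols = len(labels), len(labels[0])
    half = size // 2
    out = [row[:] for row in labels]
    for r in range(rows):
        for c in range(cols):
            window = [
                labels[rr][cc]
                for rr in range(max(0, r - half), min(rows, r + half + 1))
                for cc in range(max(0, c - half), min(cols, c + half + 1))
            ]
            out[r][c] = Counter(window).most_common(1)[0][0]
    return out


# Hypothetical class map: one isolated "road" pixel inside a "wood" patch.
class_map = [
    ["wood", "wood", "wood"],
    ["wood", "road", "wood"],
    ["wood", "wood", "wood"],
]
smoothed = majority_filter(class_map)
print(smoothed[1][1])
```

The isolated pixel is absorbed into the surrounding class, which is the effect described above for small spurious patches.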

3.5 Classification Accuracy Evaluation

In this paper, the confusion matrix tool in ENVI is used to evaluate classification accuracy. Regions of interest obtained by visual interpretation of the preprocessed image serve as the reference for the accuracy evaluation.

The confusion matrix is a standard format for evaluating the classification accuracy of remote sensing images. It is a matrix of \( k \) rows by \( k \) columns, where \( k \) is the number of categories. The basic accuracy statistics derived from the confusion matrix include:

(1) Overall classification accuracy: the number of correctly classified pixels divided by the total number of reference pixels;

$$ OA = \sum_{i = 1}^k {\frac{{x_{ii} }}{N}} $$
(14)

(2) Kappa coefficient: a statistic of classification accuracy, typically ranging from 0 to 1, that shows how much better the classification is than randomly assigning each pixel to a class;

$$ Kappa = \frac{{N\sum\limits_{i = 1}^k {x_{ii} } - \sum\limits_{i = 1}^k {\left( {x_{hi} x_{li} } \right)} }}{{N^2 - \sum\limits_{i = 1}^k {\left( {x_{hi} x_{li} } \right)} }} $$
(15)

where \( {\rm k} \) is the number of columns of the confusion matrix, that is, the number of categories; \( x_{ii} \) is the number of correctly classified pixels of category \( i \); \( x_{hi} \) is the total number of samples in the column of category \( i \); \( x_{li} \) is the total number of samples in the row of category \( i \); and \( {\rm N} \) is the total number of pixels participating in the statistics. The larger the Kappa value, the higher the classification accuracy.

(3) User accuracy: for a given category, the ratio of the number of correctly classified pixels \( x_{ii} \) to the total number of pixels \( x_{i + } \) assigned to this category; it represents the probability that a classified pixel truly belongs to this category;

$$ UA\left( c \right) = \frac{{x_{ii} }}{{x_{i + } }} $$
(16)

(4) Producer accuracy: the ratio of the number of correctly classified pixels \( x_{ii} \) of a category to the total number of reference pixels \( x_{ + i} \) of that category; it reflects the percentage of reference data that is correctly classified;

$$ PA\left( c \right) = \frac{{x_{ii} }}{{x_{ + i} }} $$
(17)

(5) Commission error: the proportion of pixels assigned to a category that actually belong to other categories;

(6) Omission error: the proportion of pixels that truly belong to a category on the ground but are not assigned to it by the classifier.
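
The six measures listed above can be computed from a confusion matrix as in the following sketch. It assumes that rows hold the classified categories and columns the reference categories, and that every row and column total is non-zero; the matrix values are invented and are not those of Table 2.

```python
def accuracy_report(matrix):
    """Accuracy measures from a k x k confusion matrix.

    Assumed convention: matrix[i][j] counts pixels classified as category i
    whose reference category is j (rows = classified, columns = reference).
    """
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    diag = [matrix[i][i] for i in range(k)]
    row_tot = [sum(matrix[i]) for i in range(k)]
    col_tot = [sum(matrix[i][j] for i in range(k)) for j in range(k)]

    overall = sum(diag) / n                              # Eq. (14)
    chance = sum(r * c for r, c in zip(row_tot, col_tot))
    kappa = (n * sum(diag) - chance) / (n * n - chance)  # Eq. (15)
    user = [d / r for d, r in zip(diag, row_tot)]        # Eq. (16)
    producer = [d / c for d, c in zip(diag, col_tot)]    # Eq. (17)
    return {
        "overall": overall,
        "kappa": kappa,
        "user": user,
        "producer": producer,
        "commission": [1 - u for u in user],
        "omission": [1 - p for p in producer],
    }


# Hypothetical 3-class matrix, not taken from Table 2.
example = [
    [50, 2, 3],
    [4, 40, 1],
    [1, 3, 46],
]
report = accuracy_report(example)
print(report["overall"], report["kappa"])
```

In this formulation, commission error is the complement of user accuracy and omission error is the complement of producer accuracy, so the last two measures need no extra counting.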

The accuracy of the object-oriented classification method is evaluated using sample points. Table 2 shows the accuracy evaluation results of the object-oriented SVM classification.

Table 2. Accuracy evaluation results of the SVM classification with and without texture information

4 Experimental Results

The image was preprocessed with ENVI V5.3.1 (orthorectification, image registration, image fusion, and cropping). Across the different urban land cover types, the optimal segmentation scale is 45, the merge scale is 85, and the texture kernel is 3. Six kinds of urban land cover are then classified with ENVI: urban land, water, woodland, cultivated land, road, and other land. Because other land and roads are difficult to distinguish, texture information is added to the classification process.

The classification results are shown in Fig. 7, where (a) and (b) are the results without and with texture information, respectively. With texture information, some objects previously labeled as road are classified as urban land, which makes the urban land more complete and brings the classification result closer to reality. Sample points are used to evaluate the accuracy of the object-oriented classification method, and Table 2 lists the accuracy evaluation results of the SVM classification with and without texture information.

Fig. 7. Classification result maps. (a) Without texture; (b) with texture

The accuracy evaluation results in Table 2 show that the SVM classifies relatively well: the overall accuracy is above 95% and the Kappa coefficient is greater than 0.95 in both cases. The SVM with texture features performs best, with an overall accuracy of 97.41% and a Kappa coefficient of 0.9605; the SVM without texture features follows, with an overall accuracy of 96.89% and a Kappa coefficient of 0.9539. With texture information, the producer accuracy of road and other land is 99.44% and 88.33%, respectively, compared with only 84.99% and 79.42% without it, a gain of 14.45 and 8.91 percentage points.

Figures 8 and 9 clearly show that user accuracy and producer accuracy are good for all object types when texture information is used. Using texture information in the classification rules greatly improves the accuracy for roads and other land. From this point of view, the object-oriented classification method with texture information can improve classification accuracy in urban areas; it is a promising method and should be extended to other urban land cover.

Fig. 8. User accuracy comparison with and without texture

Fig. 9. Producer accuracy comparison with and without texture

5 Conclusion

This paper uses an object-oriented method based on spatial feature extraction to extract urban land cover information from a GF-2 image. On the basis of image segmentation and the feature information of the resulting objects, a classification system is established to classify six land cover types (urban land, water, woodland, arable land, road, and other land), and its accuracy is evaluated. The results show that classification with texture feature information performs better, with an overall accuracy of 97.41%. The object-oriented method still produces some errors, which may be caused by the setting of the segmentation threshold, the availability of supervised classification, and the complexity of the land. Future work will therefore further refine the classification system to improve accuracy and extend segmentation and recognition to other types of data.