
1 Introduction

Remotely sensed images are rich in geographical information, capturing a wide variety of geographical objects. This information can be useful for different sectors such as government, business, science, engineering and research institutes, and can be used for planning, for the extraction and analysis of natural resources, and for helping improve the vegetation of an area. These are only a few examples; the potential applications are countless.

Remotely sensed images, which we can acquire through satellites, sensors and radars, are very large both in number and in size. With the advancement of technologies such as image digitization and storage, the quantity of images has also grown [1]. Each image holds a great deal of information in the form of objects, and it is difficult for humans to go through every image and extract patterns from it. With the help of state-of-the-art image storage, data analysis and classification techniques, it is possible to automate the whole process to uncover hidden patterns and improve prediction in various domains.

A number of techniques have been proposed for object recognition. Adnan A. Y. Mustafa [2] proposed an object recognition scheme based on matching boundary signatures. For this purpose, he introduced four boundary signatures: the Curvature Boundary Signature (SCB), the Direction Boundary Signature (SAB), the Distance Boundary Signature (SDB) and the Parameter Boundary Signature (SPB). These signatures are constructed from the boundary's local and global geometric shape attributes. The approach is based on the shape of the object, which requires computing the object's boundary; however, some objects, such as soil and ecological zones, do not have distinct sharp boundaries.

Ender Ozcan et al. [3] proposed a methodology for partial shape matching using genetic algorithms (GAs). They described the features of model shapes as line segments and angles, and recognized an input shape by matching it against the model shapes. During search, GAs use a fixed-size population of individuals, and new solutions are produced by applying genetic operators.

Farzin Mokhtarian et al. [4] proposed a method based on the maxima of curvature zero-crossing contours of the Curvature Scale Space (CSS) image, used as a feature vector to describe the shapes of object boundary contours. In their proposed matching algorithm, they compared two sets of maxima and assigned a matching value as a measure of similarity.

Kyoung Sig Roh et al. [5] recognized objects using an invariant contour descriptor and a projective refinement methodology. Their contour descriptor consists of a series of geometric invariants of five equally spaced coplanar points on the contour, and is used to index a hash table for recognizing 2D curved objects. For verification, they used projective refinement, repeatedly computing a projective transformation between model and scene contours; the resulting pixel error after refinement determines whether a generated hypothesis is true or false.

For recognizing objects, Mariana Tsaneva et al. [6] used transform methods, e.g. the Gabor filter and the wavelet transform, for texture analysis. The drawback of this technique is that wavelet performance in image processing is unsatisfactory when bandwidth is limited and images are in compressed form.

Jean-Baptiste Bordes et al. [7] described an algorithm that integrates an angular local descriptor with a radial one, together with the coefficients produced by Fourier analysis. Their algorithm works for roundabout detection but not for general object detection, and the proposed methodology is not fast enough.

Talibi-Alaoui and Sbihi [8] used an unsupervised neural network classification approach to classify texture images. Jiebo Luo et al. [9] proposed a methodology for characterizing satellite images based on color- and structure-based vocabularies. The drawback of this methodology is its use of a large vocabulary, which provides only a small gain in accuracy at a much greater computational cost.

Most object recognition algorithms do not produce satisfactory results when the boundaries of the objects are not constant, clear and sharp enough; desert, sea and ecological zones are well-known examples of objects without distinct boundaries [1]. We propose a methodology that is independent of the shape and boundary of the object. Our proposed object recognition methodology focuses on two major domains: pixel intensity and organization of color pixels [10]. We use these two techniques to calculate attributes from the image. A decision tree then classifies the objects using the attributes extracted from the image. The attributes are stored in a database that serves as a knowledge base for the classification of satellite images based on similarities [11]. The process starts with broad object domains such as greenery or water; later, such objects are further subdivided into various types, e.g. greenery into grass, bushes, trees, etc. and water into sea, pond, river, freshwater, etc.

2 System Mode Details and its Description

Our system has two modes: the operational (testing) mode and the training mode, which work on the testing and training data sets respectively.

The whole data set, collected from Google Earth, is divided into two portions: training data and testing data. In training mode, we used half of the sample images of each object. After acquiring the images, we applied the pixel intensity and organization of color pixels domains to each image, extracted the resulting attributes, and stored them in the database. Based on these results, we built a decision tree to help recognize objects in new incoming images.

In testing mode, the system is given a satellite image containing some objects. We apply the pixel intensity and organization of color pixels domains to the incoming satellite image and extract its attributes. Afterward, the extracted attributes are given to the decision tree classifier, which decides on the presence of the object.

Fig. 1

Block diagram of object recognition system

Fig. 2

Block diagram of pixel intensity

Figure 1 shows a block diagram of the object recognition system. The input to the system is the satellite image. To recognize objects in a satellite image, we first apply pixel intensity to extract attributes from the input image, then apply organization of color pixels: the image is divided into grids and, for each grid, the organization of color pixels attributes are computed. On the basis of the attributes extracted under both domains, an ID3 decision tree, built with the help of the database established in training mode, makes a hypothesis about the presence of objects in the image.

2.1 Object Recognition Scheme

An object can be recognized by its form, texture and color. If we deal with the form of the object, then its boundary is very important; in satellite images, one of the major issues is that some objects, like soil, do not have a distinct boundary [1]. To deal with this problem, we developed a technique that is independent of the object boundary. Color and texture can play a prominent role, because color is a more important factor in object recognition than intensity [12]. The proposed object recognition algorithm takes its cues from the color and texture of the object when making a decision about its presence. The algorithm is also invariant to rotation, scale and translation.

  i. Pixel intensity

In the pixel intensity domain, we deal with the intensity of the image. We focus on the color histogram and other statistical attributes calculated from the RGB color space and the grayscale version of the image, which contribute to recognizing objects in the image. The attributes computed under this domain are the color histogram of the image; the mean, standard deviation and variance of the RGB color space; and the entropy of the grayscale image.

Figure 2 shows all the attributes of the pixel intensity domain that are computed from the satellite images acquired through Google Earth.

A color histogram represents the distribution of color pixels inside the image. In simple words, the color histogram is a count of the pixels of each color in an image. It refers to the joint probabilities of the intensities of the three color channels, and can be defined as [13]:

$$\begin{aligned} \mathrm{{h}}_{{\mathrm{{R,G,B}}}} [\mathrm{{r,g,b}}] = \mathrm{{N}}\cdot \mathrm{{Prob}}\left\{ {\mathrm{{R}} = \mathrm{{r}},\mathrm{{G}} = \mathrm{{g}},\mathrm{{B}} = \mathrm{{b}}} \right\} \end{aligned}$$

where R, G and B refer to the three color channels and N refers to the number of pixels in the image. The color histogram can be computed by counting the number of pixels of each color in the image. It is more convenient to convert the three-channel histogram into a single-variable histogram. This transformation can be achieved with the following equation [13]:

$$\begin{aligned} m = r + N_{r} g + N_{r} N_{g} b \end{aligned}$$

where \(N_{r}\) and \(N_{g}\) are the number of bins for the red and green colors respectively and \(m\) is the resulting single-variable color.

The histogram can then be computed with the equation [14]:

$$\begin{aligned} h(m_{k} ) = n_{k} \end{aligned}$$

where \(m_{k}\) is the \(k\)th single-value color, \(n_{k}\) is the number of pixels in the image having single-value color \(m_{k}\), and \(h(m_{k})\) is the single-value color histogram.

For an image in RGB color space, the number of possible colors is 256 \(\times \) 256 \(\times \) 256 = 16,777,216. To reduce the computational complexity of the color space, we prefer to quantize it by discretizing the colors of the image into a finite number of bins. In our case we divide each channel of the RGB color space into 3 bins. As a result, the 16,777,216 colors of the image are quantized to \(3\times 3\times 3=27\) colors, which degrades the quality of the results. The results can be improved, at a slight cost in efficiency, by increasing the number of bins from 3 to 4 and so on until the full intensity resolution of the image is reached.

The range of bin 1 lies between intensities 0 and 85, bin 2 between 86 and 172, and bin 3 between 173 and 255. After quantization of the RGB levels, we convert the three color channels into a single variable and then compute the color histogram of this single color variable.
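As a concrete illustration, the quantization and single-variable histogram described above can be sketched as follows (a minimal NumPy version; the function name and parameters are our own, not part of the original system):

```python
import numpy as np

def quantized_color_histogram(image, bin_edges=(86, 173)):
    """Quantize each RGB channel into 3 bins (0-85, 86-172, 173-255)
    and count the 3*3*3 = 27 resulting single-variable colors."""
    nbins = len(bin_edges) + 1
    q = np.digitize(image, bin_edges)      # per-channel bin index 0..nbins-1
    r, g, b = q[..., 0], q[..., 1], q[..., 2]
    # Single-variable color: m = r + N_r*g + N_r*N_g*b, with N_r = N_g = nbins
    m = r + nbins * g + nbins * nbins * b
    return np.bincount(m.ravel(), minlength=nbins ** 3)
```

For a pure black image every pixel falls into bin 0 of each channel, so all counts land in single-variable color 0; a pure white image lands entirely in color 26.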

The texture of an object is also useful in the object recognition process. To extract the texture information of the image, we computed its entropy, another attribute of our pixel intensity domain. Entropy is a statistical measure of randomness that can be used to characterize the texture of the input image and to measure the information carried by a signal; it also reflects the uniformity of the intensity distribution [15, 16].

To compute the entropy of the image, we first transform it to grayscale and then compute the histogram of the grayscale image. From this histogram, the entropy can be computed with the following equation [17]:

$$\begin{aligned} Entropy = - \sum {p_{i} \log _{2} p_{i} } \end{aligned}$$

where \(p_{i}\) is the probability of the \(i\)th gray level, obtained from the normalized histogram.
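This computation can be sketched in a few lines (an illustrative helper; the name and the 256-bin choice are our own assumptions):

```python
import numpy as np

def image_entropy(gray):
    """Shannon entropy -sum(p_i * log2(p_i)) of a grayscale image,
    computed from its normalized 256-bin histogram."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    p = p[p > 0]                 # skip empty bins: 0*log2(0) is taken as 0
    return -np.sum(p * np.log2(p))
```

A single-gray-level image has entropy 0; an image split evenly between two gray levels has entropy 1 bit.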

Another attribute of the pixel intensity domain is the mean of the RGB image. The mean is a statistical measure used to determine the central value of a data set, computed by dividing the sum of all elements of the data set by the number of elements. We can compute the mean of the RGB image with the following equation [18]:

$$\begin{aligned} \mu =\frac{1}{N}\sum _{i=1}^N {x_i } \end{aligned}$$

Another attribute of the pixel intensity domain is the standard deviation of the RGB image. The standard deviation is a statistical measure of the dispersion within a data set; its square is the variance. We can compute the standard deviation of the RGB image with the following equation [18]:

$$\begin{aligned} S_N =\sqrt{\frac{1}{N}\sum _{i=1}^N {(x_i -\mu )^2} } \end{aligned}$$

where \(\mu \) is the mean of all elements of the data set.

The last attribute of the pixel intensity domain is the variance of the RGB image. The variance is a statistical measure of how far values lie from the mean, i.e. of dispersion. It can be computed with the following equation [18]:

$$\begin{aligned} S_N ^{2}=\frac{1}{N}\sum _{i=1}^N {(x_i -\mu )^2} \end{aligned}$$
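The three statistics above can be computed per channel in a few lines (a sketch using the population formulas, dividing by N as in the equations; the helper name is ours):

```python
import numpy as np

def intensity_statistics(channel):
    """Mean, standard deviation and variance of one color channel
    (population versions: divide by N, not N-1)."""
    x = np.asarray(channel, dtype=float).ravel()
    mu = x.mean()                    # (1/N) * sum(x_i)
    var = np.mean((x - mu) ** 2)     # (1/N) * sum((x_i - mu)^2)
    std = np.sqrt(var)               # square root of the variance
    return mu, std, var
```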

After extracting all the attributes of the pixel intensity domain, the attributes are handed over to the decision tree induction module, which makes a hypothesis about the presence of the object.

  ii. Organization of color pixels

To improve our object recognition method, we introduced another domain, Organization of Color Pixels. Here, we divide the image into a number of equal-sized grids and extract useful attributes from each grid.

From each grid, we computed a number of useful attributes that might help in recognizing an object from the image: the color histogram; the mean, standard deviation and variance of the RGB color space; and the entropy of the grayscale image.

Fig. 3

Block diagram of organization of color pixels

Figure 3 shows that when a satellite image is provided to the system for computing the attributes of organization of color pixels, it is first divided into a number of grids; afterward, all the attributes of the domain are computed from each grid: the quantized color histogram, mean, standard deviation, variance and entropy.

After extracting all the attributes of organization of color pixels from each grid of an image, the attributes are passed to the decision tree induction module, which makes a hypothesis about the presence of the object.
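The grid decomposition can be sketched as follows (illustrative only: the grid counts and the choice of per-grid statistics shown are assumptions, and the remaining attributes follow the same pattern):

```python
import numpy as np

def grid_attributes(image, rows=4, cols=4):
    """Split an image into rows*cols equal-sized grids and compute a
    per-grid mean and standard deviation; the quantized histogram,
    variance and entropy would be computed per grid the same way."""
    h, w = image.shape[0] // rows, image.shape[1] // cols
    attrs = []
    for i in range(rows):
        for j in range(cols):
            cell = image[i * h:(i + 1) * h, j * w:(j + 1) * w]
            attrs.append((cell.mean(), cell.std()))
    return attrs
```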

  iii. Decision tree induction

This module is responsible for making the decision about the objects inside an image. We stored the evaluated attributes in a database.

In the training phase, a number of objects of each class are taken. For each object belonging to a certain class, we extracted the attributes under the pixel intensity and organization of color pixels domains. All these attributes help in recognizing objects from satellite images. After acquiring the attributes in the training phase, we build a decision tree that helps in making decisions about objects newly presented to the system.

2.2 Classification of Images

The images are extracted from Google Earth and rectified for classification purposes. The objects in an image are recognized using the algorithm proposed in Sect. 2. Afterward, the recognized objects are passed to a decision tree that decides in which class the satellite image lies. We generalized our image classification scheme to determine whether hydrocarbons are present in a particular area.

Fig. 4

Block diagram for classification of images

Figure 4 shows the block diagram of the image classification scheme. First we recognize objects in a satellite image, then pass the identified objects to a decision tree that decides whether hydrocarbons are present in that area or not.

2.3 Summary of Algorithm

We can summarize our proposed algorithm as:

  1. Acquire the satellite image from Google Earth.

  2. Apply the Pixel Intensity domain to the satellite image to calculate the attributes under that domain.

  3. Apply the Organization of Color Pixels domain to the satellite image: divide the image into a number of grids and calculate the attributes of each grid.

  4. Pass the attributes calculated in Steps 2 and 3 to the decision tree.

  5. The decision tree makes a decision about the presence of the objects inside the image.

  6. After identification of the objects, the information is passed over to the classification module.

  7. The classification module has a decision tree that decides on the presence of hydrocarbons inside an area.
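The seven steps can be summarized in a minimal pipeline sketch; the four callables are hypothetical placeholders for the modules described above, not the actual implementation:

```python
def recognize_and_classify(image, pixel_intensity, grid_attributes,
                           object_tree, hydrocarbon_tree):
    """End-to-end sketch of the algorithm: feature extraction (steps 2-3),
    object recognition (steps 4-5), hydrocarbon classification (steps 6-7)."""
    features = pixel_intensity(image)      # step 2: pixel intensity attributes
    features += grid_attributes(image)     # step 3: per-grid attributes
    objects = object_tree(features)        # steps 4-5: object hypothesis
    return hydrocarbon_tree(objects)       # steps 6-7: hydrocarbon decision
```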

3 Verification of the Proposed Scheme

To verify our proposed scheme, we focused on five main classes of objects: tree, greenery, water, soil and rocks. Later, these classes are further divided into subclasses to refine the results.

Fig. 5

Attributes of Pixel intensity domain (a) Mean of RGB of the five classes (Tree, Water, Greenery, Rock and Soil) (b) Standard deviation of RGB of the five classes (c) Variance of RGB of the five classes and (d) Entropy of RGB of the five classes

Figure 5 shows the mean, standard deviation, variance and entropy of RGB of the five classes. As the figure shows, the value of each attribute differs from class to class, which helps differentiate one class from the others.

For the experiment, we took a number of satellite images of each class from Google Earth. For object recognition from the satellite images, we used our proposed scheme as described in Sect. 2. To each image of each class, we applied the Pixel Intensity and Organization of Color Pixels domains and thereby acquired the attributes of both domains.

Figure 6 shows the mean, standard deviation, entropy and variance of the five classes. These four attributes play a vital role in making a decision about object recognition.

Figure 7 shows the quantized color histograms of tree, water, greenery, rock and soil. As the figure shows, the peaks of the histogram vary from class to class, which helps differentiate each class from the others.

Fig. 6

Mean, Standard deviation, Entropy and Variance of the five classes

Fig. 7

Quantized color histogram of the tree, water, greenery, rock and soil

Fig. 8

Attribute of pixel intensity domain: Color histogram: a Color Histogram of Tree b Color histogram of Water c Color histogram of Greenery d Color histogram of Rock and e Color histogram of Soil

Fig. 9

Decision tree for classification of hydrocarbon

Figure 8a shows the quantized color histogram of tree. The peaks are remarkably high for colors 1, 4, 13 and 14; these colors play an important role in differentiating tree from other objects. Figure 8b shows the quantized color histogram of water, with remarkably high peaks for colors 1, 2, 5 and 14. Figure 8c shows the quantized color histogram of greenery, with peaks at colors 4, 13 and 14. Figure 8d shows the quantized color histogram of rock, with peaks at colors 1, 13, 14, 23, 26 and 27. Figure 8e shows the quantized color histogram of soil, with peaks at colors 13, 14, 22, 23 and 26.

Fig. 10

Color histogram of water subclasses: Pond, River and Sea water

To refine object identification further, the objects can be subdivided into subclasses for all five main classes used in this experimentation. Rock can be further subdivided into andesite, boninite, pegmatite, etc. Similarly, water can be of various types such as pond water, river water, sea water, etc. These subclasses can remarkably improve the classification results after refined object identification.

In our experiment we divided water into three subclasses: pond, river and sea water. Figure 8b shows the quantized color histogram of water; the subclasses' quantized histograms are shown in Fig. 10.

Figure 10 shows the quantized color histograms of pond, river and sea water. The peaks of the pond histogram are remarkably high for colors 1 and 14; these colors play an important role in differentiating pond from the other subclasses. For river water, the peaks are remarkably high for colors 2 and 5, which differentiate river from the other subclasses. Similarly, the peaks for sea water are remarkably high for colors 5 and 6, which differentiate sea from the other subclasses.

To verify the working of the algorithm for recognizing the various water types, the results are tabulated in the form of a confusion matrix, shown in Table 1.

The river water result is not 100 percent accurate: samples of river water are sometimes identified as both river and sea water. This problem could be mitigated by incorporating a higher number of RGB color bins.

After extraction, the attributes of the pixel intensity and organization of color pixels domains are handed over to the decision tree induction module, which makes a hypothesis about the presence of the object. Once objects have been identified in an image, this information is passed over to the image classification module, which decides whether hydrocarbons are present in the area covered by the given image.

In our experiments, we consider only five classes: tree, water, greenery, rock and soil. These classes can be useful for detecting the presence of hydrocarbons in an area. We took a total of 42 samples of different areas, some of which contain hydrocarbons and others do not. We built a database holding the information of the five classes and the presence of hydrocarbons, and from this data we built a decision tree based on the presence of hydrocarbons and our five classes. RapidMiner was used to build the decision tree; the result is shown in Fig. 9.
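The tree itself was built in RapidMiner, but the ID3-style choice of a split attribute by information gain can be illustrated with a small standard-library sketch (the presence/absence rows below are made up for illustration, not the actual 42-sample dataset):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy reduction from splitting on column `attr` (the ID3 criterion)."""
    gain = entropy(labels)
    for value in set(row[attr] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        gain -= len(subset) / len(rows) * entropy(subset)
    return gain

# Illustrative presence (1) / absence (0) of [tree, water, greenery, rock, soil]
X = [[1, 1, 1, 0, 0],
     [0, 1, 0, 1, 0],
     [0, 0, 1, 0, 1],
     [1, 0, 1, 1, 1],
     [0, 0, 0, 0, 0]]
y = [1, 1, 1, 0, 0]            # hydrocarbon present?

# ID3 picks the attribute with the highest information gain as the root;
# on this toy data that is attribute 1 (water), echoing the role water
# plays in Fig. 9.
root = max(range(5), key=lambda a: information_gain(X, y, a))
```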

Table 1 Confusion matrix of subclasses of water

The figure shows the decision tree for the classification of hydrocarbons. In Fig. 9, 1 indicates presence and 0 indicates absence. From the decision tree, we can conclude that water and greenery play an important role in deciding about the presence of hydrocarbons in an area: if water is present, there is a chance of hydrocarbons in that area; if water is absent but greenery is present, there is also a chance of hydrocarbons in that region.

4 Conclusion

In this research paper, we have proposed a method for the classification of satellite images using data mining techniques. A major concern in the classification of images is object recognition from satellite images. Our object recognition technique is based on two domains: pixel intensity and organization of color pixels. The attributes extracted under these two domains help in making a hypothesis about an object's identity. Our proposed object recognition scheme is also invariant to rotation, scale and translation, and it is computationally efficient. After recognition of the objects in an image, the information is handed over to a decision tree that decides about the presence of hydrocarbons in that area.

Our proposed methodology has been tested for 42 satellite images. The experimental results are satisfactory and show that the system’s accuracy is around 80%. The proposed model shows a higher degree of robustness and accuracy for the object recognition process.

We focused on only five classes: tree, water, greenery, rock and soil. Our image classification scheme can be improved by adding more classes. Adding subclasses of these classes can also boost the performance of our proposed methodology; for example, subclasses of water can be lake water, sea water, river water, water covered with greenery, etc.

For hydrocarbon classification we considered only classes related to natural phenomena: tree, water, greenery, rock and soil are all part of nature. A similar classification scheme could be employed for enemy attack detection, but the classes used for this activity should relate to such an attack; unnecessary human movement, the presence of tanks, unwanted bushes, sand blocks, etc. can be indications of an enemy attack.