1 Introduction

Image classification has long attracted the attention of the remote-sensing community because classification results are the basis for many environmental and socioeconomic applications, such as agriculture, defence, weather forecasting, and disaster management.

Image classification is a fundamental task in the processing of remotely sensed images. In a broad sense, image classification is the process of categorizing all pixels in an image, or in raw remotely sensed satellite data, to obtain a given set of labels or land cover themes. The major steps of image classification may include the determination of an appropriate classification system, selection of training samples, image pre-processing, feature extraction, selection of a suitable classification approach, post-classification processing, and accuracy assessment. Generally, an image is classified by either pixel-based or object-based classification.

Land-use/land-cover classifications have mainly been created using pixel-based classification of remotely sensed imagery, in which a supervised classification, an unsupervised classification, or a combination of the two is used; see Enderle and Weih [1]. These pixel-based procedures analyze the spectral properties of each pixel within the area of interest without taking into account the spatial or contextual information related to the pixel of interest.

Scientists have made immense efforts to develop advanced classification approaches and techniques for improving classification accuracy; see Kontoes et al. [2], Foody [3], San Miguel Ayanz and Biging [4], Aplin et al. [5], Stuckens et al. [6], Franklin et al. [7], Pal and Mather [8], and Gallego [9]. However, classifying remotely sensed data into a thematic map remains a challenge because many factors, such as the complexity of the landscape in the study area, reduce accuracy. Although much previous research is specifically concerned with image classification, e.g. Tso and Mather [10] and Landgrebe [11], a comprehensive, up-to-date review of classification techniques is not available. The continuous appearance of new classification techniques in recent years calls for such a review, which would be extremely valuable for guiding the selection of a suitable classification procedure for a particular study.

In per-pixel classification, the uncertainty of the pixel class derives mainly from mixed pixels, known as “mixels”, in medium- and low-resolution remotely sensed images. In high-resolution remotely sensed images, it is mainly caused by the clearly visible surface detail: such images clearly show vegetation, water bodies, built-up areas, and roads. In high-resolution remote sensing images, the uncertainty of the classification decision is further increased by the complexity and diversity of the ground objects and the lack of actual surface information.

The mixed pixel problem occurs when, at the scale of observation, a number of classes contribute to the observed spectral response of the pixel. In many studies, the use of fuzzy classification techniques has been motivated by the presence of the mixed pixel problem. Some previous works on interpreting the fuzzy set memberships of pixels in remotely sensed images as mixed pixel class proportions have produced good results. Bezdek et al. [12], Fisher and Pathirana [13], Pathirana [14], Wang [15, 16], Foody [17,18,19], Maselli et al. [20], Atkinson et al. [21], and Choodarathnakara [22] have all reported, with varying degrees of experimental support, a strong relationship between fuzzy memberships (usually derived from statistical approaches) and proportions of the ground cover.

In digital image analysis, data science tools combined with artificial intelligence can not only assist users in face recognition but also help detect the objects in a camera's view. The tools scan all the objects and attempt to identify and name them. Computers can use machine vision technologies in combination with a camera and artificial intelligence software to achieve image recognition and image classification. Such tools can also recognize special patterns in an image, be they facial expressions or textures, and match them against a database. They can also detect the colors and shapes present in the image and give users appropriate insight into its contents. Pattern recognition relies entirely on data and derives any outcome or model from the data itself. It is the ability to detect arrangements of characteristics or data that yield information about a given image or dataset.

Data science and digital image processing are becoming an increasingly integral part of image analysis. Shi [23] elaborates on the understanding of big data and its analytics, evolution and challenges. Mandal [24] used a weighted Tchebycheff optimization technique under uncertainty. Olson and Shi [25] give an introduction to business data mining. In the domain of image processing, image mining is an advancement in the field of data mining; see, for example, the work on optimization-based data mining theory and applications by Shi et al. [26]. Real-time decision making is central to the Internet of Things; it concerns decision informatics and embraces advanced sensing technologies, as discussed by Tien [27] in the context of the Internet of Things, real-time decision making, and artificial intelligence.

In this paper, a fuzzy supervised classification method, in which geographical information is represented as fuzzy sets, is used to overcome the problem of mixed pixels. Partial membership of the mixed pixels allows component cover classes to be identified and more accurate statistical parameters to be generated. As a result, error rates are reduced compared with conventional classification methods based on the linear discriminant function (LDF) and the quadratic discriminant function (QDF).

The remaining sections of this study are organised as follows. Sect. 2 explains the problem of interest. Statistical image classification methods and their types are briefly described in Sect. 3. The framework for the study is given in Sect. 4. Finally, the results of the study and the conclusions are presented in Sect. 5.

2 Problem of Interest

Large pixels in land cover satellite imagery increase the likelihood of mixed pixels. For instance, a pixel in Landsat imagery has a size of 10 × 10 m, whereas the pixel size of QuickBird imagery is 0.60 × 0.60 m. This difference in resolution increases the chances of encountering mixed pixels in Landsat imagery as compared to QuickBird imagery. Thus, mixed pixels can be one of the sources of error in per-pixel classification and should be treated accordingly (Fig. 1).

Fig. 1 Mixed pixels due to the occurrence of small, sub-pixel targets within the area they represent

Fuzzy set theory provides useful tools to deal with the problem of mixed pixels in land cover classification. The concept of partial membership allows information about more difficult situations, such as cover mixtures or intermediate conditions, to be better represented and utilized. A per-pixel classifier that assumes pixels are pure (i.e. the entire pixel contains only one surface cover or one class) is termed a conventional classifier. Classifiers that allow a pixel fuzzy membership in multiple classes are called fuzzy classifiers.

3 Statistical Image Classification Methods

The aim of statistical image classification is to categorize every spatial unit in a digital image into one of several classes (or populations) of concern on the basis of a multivariate vector of observations available for every such unit. The spatial unit to be classified may be a pixel or an object in the imaged scene. Each pixel of an image is represented by a multivariate vector consisting of a set of measurements (e.g. textural features, spectral bands). Decision boundaries between the classes are defined by the particular classifier used. A decision rule, referred to as a discriminant function, is framed that locates every pixel with respect to the decision boundaries and hence allocates a specific label to each pixel.

3.1 Supervised Classifiers

Supervised classification is more accurate for mapping classes but depends heavily on the cognition and skills of the image analyst. The strategy is simple: the analyst must recognize conventional classes (actual and well-known) or significant (but somewhat artificial) classes in a scene from prior information, such as personal experience with the region, experience with thematic maps, or on-site visits. Each pixel is classified into one of several classes based on some function of the multivariate observations that maximizes the separation between the groups; this function is estimated from training datasets, which are samples of known identity from the data. The two statistical supervised classifiers that analysts most readily use are those based on the linear discriminant function (LDF) and the quadratic discriminant function (QDF).

3.2 Linear Discriminant Function

Linear discriminant analysis (LDA) is a generalization of Fisher's linear discriminant function. It is a method used in statistics, pattern recognition and machine learning to find a linear combination of features that characterizes or separates two or more classes of objects or events. The resulting combination may be used as a linear classifier or, more commonly, for dimensionality reduction before classification.

Consider two independent random samples of sizes \(n_{1}\) and \(n_{2}\), one from each of two p-variate populations \(\pi_{1}\) and \(\pi_{2}\), and suppose that a method that best distinguishes between these samples is required. The only assumption is that the dispersion matrices of the two populations are equal; otherwise, the populations are completely unspecified. With this assumption, the data can be summarized by the sample mean vectors \(\overline{X}_{1}\) and \(\overline{X}_{2}\) and the pooled within-sample covariance matrix \(S\). Fisher [28] looked for the linear combination \(w = a^{\prime}\underline{X}\) of responses that gives maximum separation of the group means when measured relative to the within-group variance of the data. By maximizing \(\{ a^{\prime}(\overline{X}_{1} - \overline{X}_{2} )\}^{2} /a^{\prime}\,S\,a\), he found this linear combination to be \(w = (\overline{X}_{1} - \overline{X}_{2} )^{\prime}\,S^{ - 1} \underline{X}\). Since this function maximizes the group separation, a sensible allocation rule is to allocate \(\underline{X}\) to \(\pi_{1}\) if \((\overline{X}_{1} - \overline{X}_{2} )^{\prime}\,S^{ - 1} \underline{X}\) is greater than some constant \(k\), and otherwise to \(\pi_{2}\).

Note that this rule has exactly the same form as the rule: allocate \(\underline{X}\) to \(\pi_{1}\) if \(L\,(\underline{X} ) > \log_{e} k\), otherwise to \(\pi_{2}\), where \(L(\underline{X} ) = (\overline{X}_{1} - \overline{X}_{2} )^{\prime}S^{ - 1} \{ \underline{X} - \frac{1}{2}(\overline{X}_{1} + \overline{X}_{2} )\}\). The two rules agree because the portion \(\frac{1}{2}(\overline{X}_{1} - \overline{X}_{2} )^{\prime}\,S^{ - 1} (\overline{X}_{1} + \overline{X}_{2} )\) of the latter is merely a sample-based constant that can be absorbed into the right-hand side of the inequality. The function \(L\) is known as the sample linear discriminant function, and the function \((\overline{X}_{1} - \overline{X}_{2} )^{\prime}\,S^{ - 1} \underline{X}\) is generally known as Fisher’s linear discriminant function (LDF).
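The allocation rule based on the sample linear discriminant function can be sketched in a few lines of Python. This is only a minimal illustration for p = 2 features, not the implementation used in the study; the function names (`fit_ldf`, `ldf_score`, `allocate`), the toy samples, and the default threshold \(\log_{e} k = 0\) are all assumptions made for the example.

```python
# Illustrative two-class Fisher LDF for p = 2 features (toy data, not the study's).

def mean_vector(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def scatter_matrix(rows, mean):
    """Sum of outer products (x - mean)(x - mean)' over all rows."""
    p = len(mean)
    s = [[0.0] * p for _ in range(p)]
    for r in rows:
        d = [r[j] - mean[j] for j in range(p)]
        for i in range(p):
            for j in range(p):
                s[i][j] += d[i] * d[j]
    return s

def inverse_2x2(m):
    """Inverse of a 2 x 2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def fit_ldf(sample1, sample2):
    """Return (a, midpoint) with a = S^{-1}(xbar1 - xbar2), S the pooled covariance."""
    xbar1, xbar2 = mean_vector(sample1), mean_vector(sample2)
    n1, n2 = len(sample1), len(sample2)
    s1, s2 = scatter_matrix(sample1, xbar1), scatter_matrix(sample2, xbar2)
    pooled = [[(s1[i][j] + s2[i][j]) / (n1 + n2 - 2) for j in range(2)]
              for i in range(2)]
    s_inv = inverse_2x2(pooled)
    diff = [xbar1[j] - xbar2[j] for j in range(2)]
    a = [sum(s_inv[i][j] * diff[j] for j in range(2)) for i in range(2)]
    midpoint = [(xbar1[j] + xbar2[j]) / 2 for j in range(2)]
    return a, midpoint

def ldf_score(x, a, midpoint):
    """L(x) = a'(x - midpoint), the sample linear discriminant function."""
    return sum(a[j] * (x[j] - midpoint[j]) for j in range(2))

def allocate(x, a, midpoint, log_k=0.0):
    """Allocate x to population 1 if L(x) > log_e k, otherwise to population 2."""
    return 1 if ldf_score(x, a, midpoint) > log_k else 2

# Hypothetical training samples from two populations.
sample1 = [[1.0, 2.0], [2.0, 1.0], [2.0, 3.0], [3.0, 2.0]]
sample2 = [[6.0, 7.0], [7.0, 6.0], [7.0, 8.0], [8.0, 7.0]]
a, midpoint = fit_ldf(sample1, sample2)
labels = [allocate(x, a, midpoint) for x in ([2.0, 2.5], [7.5, 6.5])]
```

With \(\log_{e} k = 0\) the rule simply assigns a pixel to the population whose mean it is closer to in the metric defined by \(S^{-1}\); a non-zero threshold would shift the boundary towards one of the populations.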

3.3 Quadratic Discriminant Function (QDF)

Quadratic discriminant functions are developed using the maximum likelihood criterion (MLC), the basic underlying assumption for which is that the probability distributions of the populations (or classes) under study should be multivariate normal.

To discriminate between two p-variate normal populations \(N_{p} (\mu_{1} ,\Sigma_{1} )\) and \(N_{p} (\mu_{2} ,\Sigma_{2} )\), optimal classification of a new observation \(\underline{X} \in R^{p}\) into one of the two populations using MLC is based on the quadratic function

$$Q(\underline{X}) = \underline{X}^{\prime}\,\overline{A}\,\underline{X} + \overline{b}^{\prime}\underline{X}$$
(1)

and \(\underline{X}\) is allocated to population 1 or 2 according to \(Q(\underline{X}) \ge \overline{c}\) or \(Q(\underline{X}) < \overline{c}\), respectively. Here,

$$\begin{aligned} \overline{A} &= - \frac{1}{2}\left( \Sigma_{1}^{ - 1} - \Sigma_{2}^{ - 1} \right) \\ \overline{b} &= \Sigma_{1}^{ - 1} \mu_{1} - \Sigma_{2}^{ - 1} \mu_{2} \end{aligned}$$
(2)

and \(\overline{c}\) is a constant that depends on \(\mu_{i}\) and \(\Sigma_{i}\) (\(i = 1, 2\)) and, in the general setup, on the prior probabilities and the costs of misclassification. We refer to this classification procedure as quadratic discrimination. When \(\Sigma_{1} = \Sigma_{2}\), the quadratic term vanishes and we obtain the well-known linear classification rule.
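As a hedged illustration of Eqs. (1) and (2), the sketch below evaluates the quadratic rule for p = 2. The text does not give the constant \(\overline{c}\) explicitly; the expression used here is an assumption obtained by taking the log-ratio of the two normal densities, with `log_k` standing for the contribution of prior probabilities and misclassification costs (zero when both are equal). The function names and the toy parameters are hypothetical.

```python
import math

def inverse_2x2(m):
    """Inverse of a 2 x 2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def det_2x2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def mat_vec(m, v):
    return [sum(m[i][j] * v[j] for j in range(2)) for i in range(2)]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def fit_qdf(mu1, sigma1, mu2, sigma2, log_k=0.0):
    """Return (A, b, c) of the quadratic rule Q(x) = x'Ax + b'x >= c."""
    inv1, inv2 = inverse_2x2(sigma1), inverse_2x2(sigma2)
    A = [[-0.5 * (inv1[i][j] - inv2[i][j]) for j in range(2)] for i in range(2)]
    b = [u - v for u, v in zip(mat_vec(inv1, mu1), mat_vec(inv2, mu2))]
    # Assumed form of the constant: log-density ratio of two normal populations.
    c = (log_k
         + 0.5 * math.log(det_2x2(sigma1) / det_2x2(sigma2))
         + 0.5 * (dot(mu1, mat_vec(inv1, mu1)) - dot(mu2, mat_vec(inv2, mu2))))
    return A, b, c

def q_value(x, A, b):
    """Q(x) = x'Ax + b'x as in Eq. (1)."""
    return dot(x, mat_vec(A, x)) + dot(b, x)

def allocate(x, A, b, c):
    """Population 1 if Q(x) >= c, otherwise population 2."""
    return 1 if q_value(x, A, b) >= c else 2

# Hypothetical population parameters with unequal covariance matrices.
mu1, sigma1 = [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]
mu2, sigma2 = [3.0, 3.0], [[4.0, 0.0], [0.0, 4.0]]
A, b, c = fit_qdf(mu1, sigma1, mu2, sigma2)
labels = [allocate(x, A, b, c) for x in (mu1, mu2)]

# With equal covariance matrices the quadratic term vanishes.
A_equal, _, _ = fit_qdf(mu1, sigma1, mu2, sigma1)
```

In the equal-covariance call every entry of `A_equal` is zero, so only the linear term remains, which matches the remark that the rule then reduces to the linear one.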

3.4 Fuzzy Classification

Zadeh [29] proposed a new theory called “Fuzzy Sets”. Fuzzy set theory suggests that multiple classes or sets can be present at one place or at one time, and expresses the degree to which each set or class is present as a membership value, or value of belongingness (Hedge, 2003). A fuzzy set is defined by a membership function, which defines how each point in the input set is mapped to a membership value between 0 and 1. If the membership value of an element is 0, the element does not belong to that class, and if it is 1, the element belongs completely to that class. In crisp sets, the membership value is either 1 or 0. Fuzzy memberships allow a pixel to belong to more than one class, with a degree of membership in each class. Fuzzy classification is used to handle the problem of mixed pixels, in which a given pixel may have partial membership in more than one class. In this classification, it is assumed that each pixel may comprise more than one class, and classification can be made using either supervised or unsupervised techniques.

The fuzzy approach is being used to process and analyze images in different ways. Bibiloni et al. [30] used fuzzy mathematical morphology to process digital images. Dwivedi et al. [31] used a fuzzy approach for learning unknown patterns, to be used with a neural network for analyzing satellite images to estimate crop area. Thapa and Murayama [32] compared four mapping approaches: fuzzy supervised classification, GIS post-processing, and the conventional unsupervised and supervised approaches. That study highlighted the advantages of each approach but did not examine the impact of fuzzy membership on LDF and QDF.

3.5 Membership Functions

A membership function for a fuzzy set A on the universe of discourse \(\underline{X}\) is defined as \(F_{A}\): \(\underline{X}\) → [0,1], where each element of \(\underline{X}\) is mapped to a value between 0 and 1. This value, called the membership value or degree of membership, quantifies the grade of membership of the element of \(\underline{X}\) in the fuzzy set A.

A fuzzy set does not have clearly defined boundaries, and an element of that set may have partial membership. In a fuzzy representation of a remotely sensed image, land-use/land-cover classes can be defined as fuzzy sets and pixel values as elements of those sets. Each mixed pixel is assigned a group of membership grades that indicate the degree to which the pixel belongs to each class.

3.5.1 Triangular Membership Function

Let a, b and c represent the \(x\) coordinates of the three vertices of \(F_{A} \,(x)\) in a fuzzy set A, where a is the lower boundary and c is the upper boundary, at which the degree of membership is zero, and b is the centre, at which the degree of membership is 1.

$$F_{A} (x) = \begin{cases} 0, & x \le a \\ \dfrac{x - a}{b - a}, & a < x \le b \\ \dfrac{c - x}{c - b}, & b < x < c \\ 0, & x \ge c \end{cases}$$
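The triangular membership function can be written directly from its piecewise definition. The following is a small sketch; the function name `triangular` and the example vertices (0, 5, 10) are illustrative assumptions.

```python
def triangular(x, a, b, c):
    """Triangular membership grade with feet at a and c and peak at b (a < b < c)."""
    if x <= a or x >= c:
        return 0.0                    # outside the support
    if x <= b:
        return (x - a) / (b - a)      # rising edge
    return (c - x) / (c - b)          # falling edge

# Grades of a few values for a fuzzy set with vertices 0, 5 and 10.
grades = [triangular(x, 0.0, 5.0, 10.0) for x in (-1.0, 2.5, 5.0, 7.5, 10.0)]
```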

3.5.2 Trapezoidal Membership Function

Like the triangular membership function, the trapezoidal membership function has linear boundaries, but it is characterized by a plateau (a range of elements) over which the membership grade is maximal. Let a, b, c and d, with a < b ≤ c < d, represent the x coordinates of the four vertices of the membership function. Then

$$F_{A} (x) = \begin{cases} 0, & x \le a \\ \dfrac{x - a}{b - a}, & a < x \le b \\ 1, & b < x \le c \\ \dfrac{d - x}{d - c}, & c < x < d \\ 0, & x \ge d \end{cases}$$

or, equivalently,

$$F_{A} (x) = \max \left( \min \left( \frac{x - a}{b - a},\; 1,\; \frac{d - x}{d - c} \right),\; 0 \right)$$
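To illustrate that the piecewise and the max-min forms of the trapezoidal membership function describe the same curve, the sketch below implements both and compares them on a grid. The function names and the vertices (0, 2, 6, 8) are assumptions chosen for the example.

```python
def trapezoidal(x, a, b, c, d):
    """Trapezoidal membership grade in max-min form (requires a < b <= c < d)."""
    return max(min((x - a) / (b - a), 1.0, (d - x) / (d - c)), 0.0)

def trapezoidal_piecewise(x, a, b, c, d):
    """The same function written case by case."""
    if x <= a or x >= d:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)      # rising edge
    if x <= c:
        return 1.0                    # plateau of full membership
    return (d - x) / (d - c)          # falling edge

# Compare the two forms on a grid from -2 to 10 in steps of 0.5.
grid = [i * 0.5 for i in range(-4, 21)]
same = all(
    trapezoidal(x, 0.0, 2.0, 6.0, 8.0) == trapezoidal_piecewise(x, 0.0, 2.0, 6.0, 8.0)
    for x in grid
)
```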

3.5.3 Gaussian Membership Function

The Gaussian membership function is defined as Gaussian(x: µ, \(\sigma\), m), where µ and \(\sigma\) represent the mean and standard deviation, respectively, and m represents the fuzzification factor:

$$F_{A} (x: \mu, \sigma, m) = \exp \left[ - \frac{1}{2}\left| \frac{x - \mu }{\sigma } \right|^{m} \right], \quad - \infty < \mu < \infty ,\; \sigma > 0$$

For m = 2 this is the familiar bell-shaped Gaussian curve.
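A short sketch of the Gaussian membership function follows, mainly to show the effect of the fuzzification factor m. The function name `gaussian`, the default m = 2 and the sample values are assumptions for illustration only.

```python
import math

def gaussian(x, mu, sigma, m=2.0):
    """Gaussian-type membership grade exp(-0.5 * |(x - mu) / sigma| ** m)."""
    return math.exp(-0.5 * abs((x - mu) / sigma) ** m)

peak = gaussian(0.0, 0.0, 1.0)               # grade at the mean
one_sigma = gaussian(1.0, 0.0, 1.0)          # grade one standard deviation away
near_m2 = gaussian(0.5, 0.0, 1.0, m=2.0)     # inside one sigma, m = 2
near_m4 = gaussian(0.5, 0.0, 1.0, m=4.0)     # inside one sigma, m = 4
```

At a distance of exactly one standard deviation the grade equals exp(-1/2) whatever the value of m; within that distance a larger m gives higher grades (a flatter top), and beyond it a larger m gives a faster decay.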

3.6 Proposed Classifier for Mixed Pixel

Per-pixel classification assumes that pixels are pure (i.e. the entire pixel contains only one surface cover or one class); this is termed the conventional classifier. In this paper, fuzzy membership is used to handle the problem of mixed pixels in the classification of satellite images; this is termed the fuzzy approach.
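One way to see how fuzzy membership changes the class statistics is to weight each training pixel by its membership grade when estimating the class mean. The text does not spell out the estimator used, so the membership-weighted mean below, a common choice in fuzzy supervised classification, is an assumption; the function names and the toy pixels are hypothetical.

```python
def conventional_mean(pixels):
    """Ordinary mean vector: every training pixel counts fully for the class."""
    n = len(pixels)
    return [sum(x[j] for x in pixels) / n for j in range(len(pixels[0]))]

def fuzzy_mean(pixels, grades):
    """Membership-weighted mean: sum_i f(x_i) x_i / sum_i f(x_i)."""
    total = sum(grades)
    p = len(pixels[0])
    return [sum(g * x[j] for x, g in zip(pixels, grades)) / total for j in range(p)]

# Three hypothetical two-band training pixels; the third is a mixed pixel that
# belongs to the class with grade 0.5 only.
pixels = [[10.0, 20.0], [12.0, 22.0], [30.0, 40.0]]
grades = [1.0, 1.0, 0.5]

crisp = conventional_mean(pixels)
fuzzy = fuzzy_mean(pixels, grades)
```

In this toy case the mixed pixel pulls the conventional mean towards its own values, whereas the fuzzy mean discounts it in proportion to its membership grade. When all grades equal 1 the two estimates coincide.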

4 Framework for the Study

To verify the effectiveness and feasibility of the fuzzy classification technique, we conducted classification on high-resolution remote sensing images using the fuzzy method and the traditional per-pixel classification method.

The image under study is the LISS-III Sentinel image of Jawan, Tehsil Koil, District Aligarh, U.P., India, a terrain at Latitude 27°54′1.37′′ and Longitude 78°42′20.21′′. The ortho-rectified image has four spectral bands (2, 3, 4 and 5) with a resolution of 10 × 10 m. The image was captured on September 22, 2016, and was procured from the United States Geological Survey (USGS) [https://www.usgs.gov]. The dimensions of the image are 249 × 48 pixels (Fig. 2).

Fig. 2 LISS-III Sentinel image of Jawan, Tehsil Koil, District Aligarh, U.P., India, terrain with Latitude: 27°54′1.37′′ and Longitude: 78°42′20.21′′

In this paper, a fuzzy classifier is suggested as the classifier of choice when dealing with mixed pixels in imagery. Its performance, in terms of the confusion matrix and misclassification error, is compared with that of the traditional per-pixel classifier.

4.1 Ground Truthing (Training data Collection)

For the LISS-III digital image, a total of 3497 ground truth points were identified. The latitude and longitude of these points were recorded using a Garmin GPS–MAP 64S. The shapefiles of the surface covers were created using the software package ERDAS IMAGINE, and the computational work of this study was carried out using MATLAB. The ground truth data consist of a total of 3497 pixels from four classes, namely water (587 pixels), vegetation (1153 pixels), bare soil (1280 pixels) and concrete (built-up area) (477 pixels).

4.2 Test and Training samples

Out of the total 3497 ground truth pixels, 30% from each class were kept as training data and the remaining 70% as test data, as shown in Table 1. Thus, there were a total of 1049 pixels in the training sample and 2448 pixels in the test sample.
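The split totals can be checked with a short calculation. The class sizes are those reported for the ground truth data; the per-class training counts are derived here under the assumption that 30% of each class is rounded to the nearest whole pixel, so they are a plausibility check rather than a reproduction of Table 1.

```python
# Ground truth pixels per class, as reported in the text.
class_counts = {"water": 587, "vegetation": 1153, "bare soil": 1280, "concrete": 477}
TRAIN_PERCENT = 30

# Assumed rounding rule: nearest whole pixel per class.
train = {name: round(n * TRAIN_PERCENT / 100) for name, n in class_counts.items()}
test = {name: n - train[name] for name, n in class_counts.items()}

total_train = sum(train.values())
total_test = sum(test.values())
```

Under this rounding assumption the per-class training counts add up to the 1049 training pixels and the remainder to the 2448 test pixels stated above.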

Table 1 Surface cover wise Training and Test pixels distribution

4.3 Identification of Mixed Pixels

Mixed pixels can be identified by analyzing the combination of membership grades; Tables 2 and 3 show the membership grades of all the training pixels of each class. The proportion of component cover classes in a pixel is found from the membership grades; this conforms to the real situation. The tables clearly indicate the presence of mixels in the training data. In class 1 (water) no mixels were found, whereas the number of mixels is 287 in class 2 (vegetation), 1085 in class 3 (bare soil), and 402 in class 4 (concrete) (Tables 4, 5).
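The idea of reading mixed pixels off the membership grades can be sketched as follows. The criterion used here, namely that a pixel is mixed when more than one class has a non-negligible grade, is an assumption made for illustration, as are the function names, the tolerance and the toy grades.

```python
def is_mixed(grades, tol=1e-9):
    """Treat a pixel as mixed if more than one class has a grade above tol."""
    return sum(1 for g in grades if g > tol) > 1

def count_mixels(grade_rows):
    """Number of mixed pixels among rows of per-class membership grades."""
    return sum(1 for row in grade_rows if is_mixed(row))

# Hypothetical grades per pixel: (water, vegetation, bare soil, concrete).
grade_rows = [
    [1.0, 0.0, 0.0, 0.0],   # pure water
    [0.0, 0.7, 0.3, 0.0],   # vegetation mixed with bare soil
    [0.0, 0.0, 0.6, 0.4],   # bare soil mixed with concrete
    [0.0, 0.0, 1.0, 0.0],   # pure bare soil
]
n_mixels = count_mixels(grade_rows)
```

For a mixed pixel the grades themselves can be read as approximate cover proportions, for example 70% vegetation and 30% bare soil in the second row.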

Table 2 Membership grade of class 1 (Water) and class 2 (Vegetation)
Table 3 Membership grade of class 3 (Bare Soil) and class 4 (Concrete)
Table 4 Fuzzy and conventional means
Table 5 Probability of misclassification

4.4 Error Rates

Any classification method incurs some probability of misclassification, and these misclassification probabilities play a vital role in assessing how the classification method will perform on future samples; see Johnson and Wichern [33].

$$\text{Error rate} = \frac{\text{Total number of wrongly classified pixels}}{\text{Total number of pixels}}$$
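The error rate can be computed directly from a confusion matrix, since the correctly classified pixels lie on its diagonal. The sketch below assumes that rows are true classes and columns are assigned classes; the matrix is made up for illustration and is not taken from Tables 6 or 7.

```python
def error_rate(confusion):
    """Wrongly classified pixels divided by all pixels in the confusion matrix."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return (total - correct) / total

# Hypothetical 4-class confusion matrix (rows: true class, columns: assigned class).
confusion = [
    [48, 2, 0, 0],
    [3, 42, 5, 0],
    [0, 4, 44, 2],
    [0, 0, 4, 46],
]
rate = error_rate(confusion)
```

Here 20 of the 200 pixels lie off the diagonal, giving an error rate of 0.1.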

Tables 6 and 7 show the confusion matrices obtained by the conventional method and by the fuzzy approach, respectively. In terms of area on the ground, the improvement is 17 pixels; that is, an additional 1700 sq. m is correctly classified by the fuzzy approach.

Table 6 Confusion matrix by conventional method
Table 7 Confusion matrix by fuzzy approach

5 Results and Conclusion

The remote sensing literature makes extensive use of cluster analysis and discriminant analysis under the names unsupervised and supervised classification, respectively. Applying these methods to real data poses challenges to the end-user in the form of higher classification error rates for the imagery.

This study identified some of the problems of collecting ground truth data for supervised classification and of the presence of mixed pixels, and explored a fuzzy supervised classification method to overcome the problem of mixed pixels. The fuzzy approach improves on the conventional LDF by 2% and on the conventional QDF by 1.4%, as shown in Table 5. In terms of area on the ground, the improvement is 17 pixels, or 1700 sq. m. Tables 6 and 7 show the confusion matrices obtained by the conventional method and by the fuzzy approach, respectively.

Therefore, this study concludes that parametric classification based on QDF provides an improvement over LDF, and that QDF-based classification can be further improved by using a fuzzy approach for the mixed pixels present in an image.

Further research is needed to develop statistical techniques that analytically identify the mixed pixels in an image and their membership grades, so that a statistical model can be developed for detecting and resolving the problem of mixed pixels in satellite imagery.