Abstract
Clustering is among the most important unsupervised learning techniques for finding homogeneous groups in a collection of a moderately large number of data points. It may be defined as the process of dividing items into an unknown number of groups whose members are alike in some way. A cluster is therefore a collection of items that are similar among themselves and dissimilar to the items belonging to other clusters. It can be shown that there is no absolute "best" criterion that is independent of the final aim of the clustering. Hence, the structure of the clusters should be finalized by the user depending on the physical requirements. Depending on the nature of the analysis, clustering is called an unsupervised learning method and classification a supervised learning method.
Sections of this chapter draw from one of the authors’ published work, ‘Statistical Methods for Astronomical Data Analysis,’ authored by Asis Kumar Chattopadhyay and Tanuka Chattopadhyay, and published in 2014 by Springer Science\(+\)Business Media, New York.
Keywords
- Clustering
- Distance
- Proximity
- Similarity
- Dissimilarity
- Linkage
- Classification
- Multivariate Normal
- Discriminator
- Linear discrimination function
8.1 Introduction
Two very important techniques in multivariate analysis are clustering and classification. In the clustering problem, we try to find the unknown number of homogeneous inherent groups in a data set, as well as the structure of the groups. Under classification, by contrast, the basic problem is the discrimination of objects into some known groups. One of the most basic abilities of living creatures involves the grouping of similar objects to produce a classification. Classification is fundamental to most branches of science.
Cluster analysis has a variety of objectives. It focuses on segmenting a collection of items (also called observations, individuals, cases, or data rows) into subsets such that items within each cluster are more closely related to one another than to items assigned to different clusters. The main focus in cluster analysis is the notion of degree of similarity (or dissimilarity) among the individual objects being clustered. The two major methods of clustering are hierarchical clustering and k-means clustering. Most clustering methods are exploratory in nature and do not need any model assumption.
Different statistical techniques are available for clustering and classification (Fraix-Burnet et al. 2015; De et al. 2013 and references therein). But depending on the nature of the data, several problems often arise, and in some cases a proper solution is still not available.
Sometimes the data set under consideration has a known distributional form (usually normal), and sometimes it is of non-normal nature. One therefore needs to justify which clustering or classification technique should be used so that it reflects the true nature of the data set at hand. This problem is more relevant for classification, as most classification methods are model based. For clustering, most methods are nonparametric in nature, so the above problem is not very serious. But here also the basic assumption is that the variables under study are continuous, whereas in practical situations they may be categorical, like binary, nominal, ordinal, and even directional (particularly for environmental and astronomical data). Under such situations, standard similarity/dissimilarity measures will not work.
The clustering techniques that require an inherent model assumption are known as model-based methods, whereas clustering techniques where no modeling assumption or distributional form is needed may be termed non-model-based methods. Hence, based on the nature of the data set, one has to decide on the proper application of the two types of techniques.
At present, big data issues related to data size are quite common. In statistical terms, size may grow in both the number of observations and the number of variables considered. Many standard clustering techniques fail to deal with such big data sets. Thus, some dimension reduction method may be applied first, and clustering may then be performed on the reduced data set. Some data mining techniques are very helpful in such situations.
Finally and most importantly, after all these considerations, the similarity of grouping of objects obtained from different methods should be checked in terms of some physical properties.
8.2 Hierarchical Clustering Technique
There are two major methods of clustering, viz. hierarchical clustering and k-means clustering. In hierarchical clustering, the items are not partitioned into clusters in a single step. Instead, a series of partitions takes place, running from a single cluster containing all objects to n clusters, each containing a single object. Hierarchical clustering is subdivided into agglomerative methods, which proceed by a series of combinations of the n objects into groups, and divisive methods, which separate the n objects successively into smaller groups. Agglomerative techniques are more commonly used. A hierarchical clustering may be represented by a two-dimensional diagram known as a dendrogram, which illustrates the mergers or divisions made at each successive stage of the analysis.
8.2.1 Agglomerative Methods
An agglomerative hierarchical clustering procedure produces a series of partitions of the data, \(G_n, G_{n-1}, \ldots , G_1\). The first, \(G_n\), consists of n single-object 'clusters,' and the last, \(G_1\), consists of a single group containing all n cases. The structure of the groups is not unique and depends on several factors, like the choice of the dissimilarity/similarity measure and the choice of the linkage measure.
At each stage, the method merges the two clusters that are most similar. At the first stage, we join the two objects that are closest together, since initially each cluster contains only one object. Differences between methods arise from the different ways of defining dissimilarity or similarity between clusters.
Hierarchical clustering is largely dependent on the selection of such a measure. A simple measure is Manhattan distance, equal to the sum of the absolute differences for each variable. The name comes from the fact that in the two-variable case, the variables can be plotted on a grid comparable to city streets, and the distance between two points is the number of blocks a person would walk.
The most popular measure is Euclidean distance, computed by squaring the difference between the two observations on each variable, summing these squares, and taking the square root of the sum. In the two-variable case, the distance is analogous to finding the length of the hypotenuse of a right triangle. Besides Manhattan and Euclidean distances, there are other dissimilarity measures based on the correlation coefficients between two observations across the variables.
Alternatively, one may use a similarity measure, which is complementary in nature; under a normalized setup, it may be obtained by subtracting the dissimilarity measure from one.
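The two distances above, and the complementary similarity, can be sketched in a few lines of Python. This is a minimal illustration; the function names and the sample points are ours, not from the text.

```python
# Manhattan and Euclidean distances between two observations,
# each observation given as a sequence of variable values.

def manhattan(x, y):
    """Sum of absolute coordinate differences (city-block distance)."""
    return sum(abs(a - b) for a, b in zip(x, y))

def euclidean(x, y):
    """Square root of the sum of squared coordinate differences."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5

x, y = [1.0, 2.0, 3.0], [4.0, 6.0, 3.0]
print(manhattan(x, y))   # 3 + 4 + 0 = 7.0
print(euclidean(x, y))   # sqrt(9 + 16 + 0) = 5.0
```

If dissimilarities are first normalized to lie in [0, 1], a similarity is obtained simply as `1 - dissimilarity`, as described above.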
8.2.2 Similarity for Any Type of Data
The above-mentioned dissimilarity/similarity measures are applicable to continuous-type data only. But generally we work with mixed-type data sets that include different types of variables: continuous, discrete, binary, nominal, ordinal. Gower (1971) has proposed a general measure as follows.
Gower's Coefficient of Similarity:
Two individuals i and j may be compared on a character k and assigned a score \(s_{ijk}\). There are many ways of calculating \(s_{ijk}\), some of which are described below.
Corresponding to n individuals and p variables, Gower's similarity index \(S_{ij}\) is defined as
\[ S_{ij} = \frac{\sum _{k=1}^{p} s_{ijk}\, \delta _{ijk}}{\sum _{k=1}^{p} \delta _{ijk}}, \]
where \(\delta _{ijk} = 1\) if individuals i and j can be compared on character k, and \(\delta _{ijk} = 0\) otherwise.
For continuous (quantitative) variables with values \(x_{1k}, x_{2k}, \ldots , x_{nk}\) for the kth variable, the score is
\[ s_{ijk} = 1 - \frac{|x_{ik} - x_{jk}|}{R_k}, \]
where \(R_k\) is the range of variable k and may be the total range in the population or the range in the sample.
For a categorical (qualitative) character with m categories (\(m = 2\) for a binary variable), \(s_{ijk} = 1\) if the two individuals agree on character k, and \(s_{ijk} = 0\) otherwise.
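The two scoring rules combine into Gower's index as sketched below. This is an illustrative implementation under the simplifying assumption that every comparison is valid (all \(\delta _{ijk} = 1\)); the variable names and the toy individuals are ours.

```python
# A sketch of Gower's similarity for mixed-type data. Each variable is
# declared "num" (continuous, scored 1 - |x_ik - x_jk| / R_k) or "cat"
# (categorical, scored 1 on agreement, 0 otherwise).

def gower(xi, xj, kinds, ranges):
    scores = []
    for a, b, kind, rng in zip(xi, xj, kinds, ranges):
        if kind == "num":
            scores.append(1.0 - abs(a - b) / rng)
        else:                          # categorical: exact match or not
            scores.append(1.0 if a == b else 0.0)
    return sum(scores) / len(scores)   # all comparisons assumed valid

# Two individuals: (height in cm, smoker?, eye colour); range of height is 40.
s = gower((170.0, "yes", "blue"), (150.0, "yes", "brown"),
          kinds=("num", "cat", "cat"), ranges=(40.0, None, None))
print(round(s, 3))  # (0.5 + 1 + 0) / 3 = 0.5
```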
8.2.3 Linkage Measures
To calculate the distance between two clusters, it is required to define two representative points from the two clusters (Chattopadhyay and Chattopadhyay 2014). Different methods have been proposed for this purpose. Some of them are listed below (see Note 1).
Single linkage: One of the simplest methods is single linkage, also known as the nearest neighbor technique. The defining feature of the method is that distance between clusters is defined as the distance between the closest pair of objects, where only pairs consisting of one object from each cluster are considered.
In the single linkage method, \(d_{rs}\) is computed as \(d_{rs} = \min d_{ij}\), where object i is in cluster r, object j is in cluster s, and \(d_{ij}\) is the distance between objects i and j. The distance between every possible object pair (i, j), with object i in cluster r and object j in cluster s, is computed, and the minimum of these distances is taken as the distance between clusters r and s. In other words, the distance between two clusters is given by the value of the shortest link between them. At each stage of hierarchical clustering, the clusters r and s for which \(d_{rs}\) is minimum are merged.
Complete linkage: Complete linkage, also called farthest neighbor, is the opposite of single linkage. The distance between clusters is now defined as the distance between the most distant pair of objects, one from each cluster. In the complete linkage method, \(d_{rs}\) is computed as \(d_{rs} = \max d_{ij}\), where object i is in cluster r and object j is in cluster s. The distance between every possible object pair (i, j), with object i in cluster r and object j in cluster s, is computed, and the maximum of these distances is taken as the distance between clusters r and s. In other words, the distance between two clusters is given by the value of the largest distance between them. At each stage of hierarchical clustering, the clusters r and s for which \(d_{rs}\) is minimum are merged.
Average linkage: Here the distance between two clusters is defined as the average of the distances between all pairs of observations, one object from each group. In the average linkage method, \(d_{rs}\) is computed as \(d_{rs} = T_{rs}/(N_r N_s)\), where \(T_{rs}\) is the sum of all pairwise distances between cluster r and cluster s, and \(N_r\) and \(N_s\) are the sizes of clusters r and s, respectively. At each stage of hierarchical clustering, the clusters r and s for which \(d_{rs}\) is minimum are merged.
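The three linkage measures can be computed from the cross-cluster pairwise distances as sketched below. This assumes Euclidean distance between points; the function names and toy clusters are ours.

```python
# Single, complete, and average linkage between two clusters r and s,
# computed from all cross-cluster pairwise distances.

def dist(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5

def linkage(r, s, kind):
    pair = [dist(i, j) for i in r for j in s]   # every cross-cluster pair
    if kind == "single":
        return min(pair)                        # nearest neighbor
    if kind == "complete":
        return max(pair)                        # farthest neighbor
    return sum(pair) / (len(r) * len(s))        # average: T_rs / (N_r * N_s)

r = [(0.0, 0.0), (0.0, 1.0)]
s = [(3.0, 0.0), (4.0, 0.0)]
print(linkage(r, s, "single"))    # 3.0
print(linkage(r, s, "complete"))
print(linkage(r, s, "average"))
```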
Minimax Linkage: This was introduced by Bien and Tibshirani (2011). For any point x and cluster G, define
\[ d_{\max }(x, G) = \max _{y \in G} d(x, y) \]
as the distance to the farthest point in G from x. Define the minimax radius of the cluster G as
\[ r(G) = \min _{x \in G} d_{\max }(x, G), \]
that is, find the point \(x \in G\) from which all points in G are as close as possible. This minimizing point is called the prototype for G. It may be noted that a closed ball of radius r(G) centered at the prototype covers all of G. Finally, we define the minimax linkage between two clusters G and H as
\[ d(G, H) = r(G \cup H), \]
that is, we measure the distance between clusters G and H by the minimax radius of the resulting merged cluster.
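The definition above translates directly into code: merge the two clusters, then find the member from which the farthest merged point is as close as possible. A minimal sketch; helper names and toy clusters are ours.

```python
# Minimax linkage (Bien and Tibshirani 2011): the distance between clusters
# G and H is the minimax radius of the merged cluster G ∪ H, i.e., the
# smallest radius of a ball centered at some member (the prototype) that
# covers all merged points.

def dist(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5

def minimax_radius(points):
    # for each candidate prototype, distance to its farthest point; take the min
    return min(max(dist(x, y) for y in points) for x in points)

def minimax_linkage(g, h):
    return minimax_radius(list(g) + list(h))

g = [(0.0, 0.0), (1.0, 0.0)]
h = [(2.0, 0.0)]
print(minimax_linkage(g, h))  # prototype (1, 0) covers all points at radius 1.0
```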
8.2.4 Optimum Number of Clusters
Usually, the number of clusters is determined from the dendrogram and validated by the physical properties. We specify a horizontal line at a particular similarity/dissimilarity value, and the clusters below this line are selected as optimum. But some mathematical rules (rules of thumb) are also available, based on the between-cluster and within-cluster sums of squares. If we denote by k the number of clusters and by W(k) the sum of the within-cluster sums of squares over the k clusters, then W(k) gradually decreases as k increases, and that k may be taken as optimum where W(k) stabilizes. For a detailed discussion, one may follow the link http://www.cc.gatech.edu/~hpark/papers/cluster_JOGO.pdf.
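The quantity W(k) is easy to compute once a partition is given: for each cluster, sum the squared distances of its points to the cluster centroid, then add across clusters. A sketch with made-up data; the function names are ours.

```python
# W(k): total within-cluster sum of squares for a given partition,
# the quantity whose leveling-off suggests the optimum number of clusters.

def centroid(pts):
    n = len(pts)
    return [sum(p[d] for p in pts) / n for d in range(len(pts[0]))]

def wss(clusters):
    total = 0.0
    for pts in clusters:
        c = centroid(pts)
        total += sum(sum((p[d] - c[d]) ** 2 for d in range(len(c)))
                     for p in pts)
    return total

clusters = [[(0.0, 0.0), (0.0, 2.0)], [(5.0, 5.0), (7.0, 5.0)]]
print(wss(clusters))  # (1 + 1) + (1 + 1) = 4.0
```

Computing `wss` for partitions with k = 1, 2, 3, … and plotting the values against k gives the curve whose stabilization point is taken as the optimum k.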
8.2.5 Clustering of Variables
The hierarchical clustering method can also be used for clustering variables on the basis of the observations. Here, instead of the distance matrix, one may start with the correlation matrix (higher correlation indicating greater similarity of variables). The linkage measures listed in the previous section are not applicable to variable clustering. To measure similarity/dissimilarity between two clusters of variables, one may use either the correlation between the first principal components of the two clusters or the canonical correlations.
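The starting point for variable clustering can be sketched as follows: convert the correlation matrix into a dissimilarity matrix, for example via \(1 - |r|\), so that highly correlated variables are "close." The conversion rule and the 3 × 3 correlation matrix here are illustrative assumptions, not from the text.

```python
# Clustering variables rather than objects: start from the correlation
# matrix and convert it to a dissimilarity, e.g. 1 - |r|.

corr = [[1.0, 0.9, 0.1],
        [0.9, 1.0, 0.2],
        [0.1, 0.2, 1.0]]

dissim = [[1.0 - abs(r) for r in row] for row in corr]
print(round(dissim[0][1], 3))  # variables 1 and 2 are close: 1 - 0.9 = 0.1
```

The resulting dissimilarity matrix can then be fed to a hierarchical procedure, with the cluster-to-cluster measures described above (first principal components or canonical correlations) in place of the usual linkages.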
8.3 Partitioning Clustering: k-Means Method
The k-means algorithm (MacQueen 1967) assigns each point to the cluster whose center (also called the centroid) is nearest. The center is the average of all the points in the cluster; that is, its coordinates are, in each dimension separately, the arithmetic mean over all the points in the cluster. This method can be used for clustering objects, not variables.
This method starts with a value of k; the method of selecting k is discussed later. Then we randomly generate k clusters and determine the cluster centers, or directly generate k seed points as cluster centers. Each point is assigned to the nearest cluster center in terms of Euclidean distance, the new cluster centers are recomputed, and these two steps are repeated until some convergence criterion is met, i.e., there is no reassignment. The main advantages of this algorithm are its simplicity and speed, which allow it to run on large data sets. Its disadvantage is that it is highly dependent on the initial choice of centers: it does not yield the same result with each run, since the resulting clusters depend on the initial random assignments. The method maximizes inter-cluster variance and minimizes intra-cluster variance.
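The assign-then-recompute loop just described can be sketched as follows. The seeds are fixed here for reproducibility, and all names are ours; this is a minimal illustration, not a production implementation.

```python
# A minimal k-means loop: assign each point to its nearest center,
# recompute centers as means of their assigned points, repeat until
# assignments (and hence centers) stop changing.

def kmeans(points, centers, max_iter=100):
    for _ in range(max_iter):
        # assignment step: nearest center by squared Euclidean distance
        labels = [min(range(len(centers)),
                      key=lambda j: sum((p[d] - centers[j][d]) ** 2
                                        for d in range(len(p))))
                  for p in points]
        # update step: each center becomes the mean of its assigned points
        new = []
        for j in range(len(centers)):
            mine = [p for p, lab in zip(points, labels) if lab == j]
            new.append(tuple(sum(p[d] for p in mine) / len(mine)
                             for d in range(len(mine[0]))) if mine else centers[j])
        if new == centers:          # convergence: no reassignment
            return centers, labels
        centers = new
    return centers, labels

pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centers, labels = kmeans(pts, [(0.0, 0.0), (10.0, 10.0)])
print(centers)  # [(0.0, 0.5), (10.0, 10.5)]
print(labels)   # [0, 0, 1, 1]
```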
The advantages of the partitioning method are as follows (Chattopadhyay and Chattopadhyay 2014):

- (a) A partitioning method tries to select the best clustering with k groups, which is not the goal of a hierarchical method.

- (b) A hierarchical method can never repair what was done in previous steps.

- (c) Partitioning methods are designed to group items, rather than variables, into a collection of k clusters.

- (d) Since a matrix of distances (similarities) does not have to be determined, and the basic data do not have to be stored during the computer run, partitioning methods can be applied to much larger data sets.
For k-means algorithms, the optimum value of k can be obtained in different ways.
On the basis of the method proposed by Sugar and James (2003), first determine the cluster structures for a varying number of clusters, taking \(k = 2, 3, 4\), etc., using the k-means algorithm. For each such clustering, compute the value of a distortion measure
\[ d_k = \frac{1}{p}\, \min _{c_1, \ldots , c_k} E\left[ (X - c_X)^{T} (X - c_X) \right], \]
which is the (per-dimension) expected distance of the observation vector X (the values of the parameters) from its nearest cluster center \(c_X\) (estimated as a mean value); p is the order of the observation vector. The algorithm for determining the optimum number of clusters is then as follows. Let us denote by \(d_k^\prime \) the estimate of \(d_k\) for k clusters, which is in effect the sum of the within-cluster sums of squares over all k clusters. Then \(d_k^\prime \) is the minimum achievable distortion associated with fitting k centers to the data. A natural way of choosing the number of clusters is to plot \(d_k^\prime \) versus k and examine the resulting distortion curve. This curve is always monotonically decreasing. One would expect large initial drops, followed by much smaller drops, i.e., a leveling off, for k greater than the true number of clusters, because past this point adding more centers simply partitions within groups rather than between groups.
According to Sugar and James (2003), for a large number of items the distortion curve, when transformed to an appropriate negative power (e.g., \(-p/2\)), will exhibit a sharp "jump" (if we plot k versus the transformed \(d_k^\prime \)). Then calculate the jumps in the transformed distortion as
\[ J_k = (d_k^\prime )^{-p/2} - (d_{k-1}^\prime )^{-p/2}. \]
Another way of choosing the number of clusters is thus to plot \(J_k\) versus k and examine the resulting jump curve. The optimum number of clusters is the value of k at which the distortion curve levels off, which is also the value associated with the largest jump.
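The jump calculation can be sketched as follows. The distortion values here are made up for illustration (a data set whose distortion drops sharply up to three clusters and then levels off); the transformation power \(-p/2\) follows Sugar and James, and all names are ours.

```python
# The jump method: transform the estimated distortions d'_k with a negative
# power and pick the k with the largest jump J_k = d'_k^(-p/2) - d'_{k-1}^(-p/2).

p = 4                                                     # dimension of the data
distortion = {1: 8.0, 2: 4.0, 3: 0.5, 4: 0.45, 5: 0.42}   # hypothetical d'_k

def jumps(d, power):
    t = {k: v ** power for k, v in d.items()}    # transformed distortions
    return {k: t[k] - t[k - 1] for k in sorted(d) if k - 1 in d}

j = jumps(distortion, -p / 2)
best_k = max(j, key=j.get)
print(best_k)  # the sharpest jump sits at k = 3
```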
The k-means clustering technique depends on the choice of initial cluster centers (Chattopadhyay et al. 2012). But this effect can be minimized if one chooses the cluster centers through the group average method (Milligan 1980). As a result, the formation of the final groups will not depend heavily on the initial choice and hence will remain almost the same, in terms of physical properties, irrespective of the initial centers. In the MINITAB package, the k-means method is almost free from the effect of the initial choice of centers, as the group average method is used.
8.4 Classification and Discrimination
Discriminant analysis and classification (see Note 2) are multivariate techniques concerned with separating distinct sets of objects and with allocating new objects to previously defined groups. Once the optimum clustering is obtained by applying the methods discussed in the previous sections, one can verify the acceptability of the classification by computing classification/misclassification probabilities for the different objects. Although the k-means clustering method is purely a data analytic method, for classification it may be necessary to assume that the underlying distribution is multivariate normal. The method can be illustrated as follows for two populations (clusters) and easily generalized for more than two underlying populations.
Let \(f_1(x)\) and \(f_2(x)\) be the probability density functions associated with the \(p \times 1\) random vector X for the populations \(\pi _1\) and \(\pi _2\), respectively. Let \(\Omega \) be the sample space, i.e., the collection of all objects. Let us denote by x the observed value of X. Let \(R_1\) be the set of x values for which we classify objects as \(\pi _1\), and \(R_2 = \Omega - R_1\) be the remaining x values, for which we classify objects as \(\pi _2\). Since every object must be assigned to one and only one of the two groups, the sets \(R_1\) and \(R_2\) are disjoint and exhaustive. The conditional probability of classifying an object as \(\pi _2\) when in fact it is from \(\pi _1\) (an error probability) is
\[ P(2 \mid 1) = P[X \in R_2 \mid \pi _1] = \int _{R_2} f_1(x)\, dx. \]
Similarly, the other error probability \(P(1 \mid 2) = P[X \in R_1 \mid \pi _2]\) can be defined. Let \(p_1\) and \(p_2\) be the prior probabilities of \(\pi _1\) and \(\pi _2\), respectively \((p_1 + p_2 = 1)\). Then the overall probabilities of correctly and incorrectly classifying objects can be derived as
P(correctly classified as \(\pi _1\)) = P(observation actually comes from \(\pi _1\) and is correctly classified as \(\pi _1\)) \(= P[X \in R_1 \mid \pi _1] p_1\),
P(misclassified as \(\pi _1\)) \(= P[X \in R_1 \mid \pi _2] p_2\).
The associated cost of misclassification can be defined by a cost matrix

|  | Classified as \(\pi _1\) | Classified as \(\pi _2\) |
| --- | --- | --- |
| True population \(\pi _1\) | 0 | \(C(2 \mid 1)\) |
| True population \(\pi _2\) | \(C(1 \mid 2)\) | 0 |
For any rule, the average or Expected Cost of Misclassification (ECM) is given by
\[ \text {ECM} = C(2 \mid 1)\, P(2 \mid 1)\, p_1 + C(1 \mid 2)\, P(1 \mid 2)\, p_2. \]
A reasonable classification rule should have ECM as small as possible.
Rule: The regions \(R_1\) and \(R_2\) that minimize the ECM are defined by the values of x for which the following inequalities hold:
\[ R_1: \; \frac{f_1(x)}{f_2(x)} \ge \frac{C(1 \mid 2)}{C(2 \mid 1)} \cdot \frac{p_2}{p_1}, \qquad R_2: \; \frac{f_1(x)}{f_2(x)} < \frac{C(1 \mid 2)}{C(2 \mid 1)} \cdot \frac{p_2}{p_1}. \]
If we assume \(f_1(x)\) and \(f_2(x)\) are multivariate normal with mean vectors \(\mu _1\) and \(\mu _2\) and covariance matrices \(\Sigma _1\) and \(\Sigma _2\), respectively, then a particular object with observation vector \(x_0\) may be classified according to the following rule (under the assumption \(\Sigma _1 = \Sigma _2\))
Allocate \(x_0\) to \(\pi _1\) if
\[ (\mu _1 - \mu _2)^{T} \Sigma ^{-1} x_0 - \frac{1}{2} (\mu _1 - \mu _2)^{T} \Sigma ^{-1} (\mu _1 + \mu _2) \ge \ln \left[ \frac{C(1 \mid 2)}{C(2 \mid 1)} \cdot \frac{p_2}{p_1} \right], \]
where \(\Sigma \) denotes the common covariance matrix;
allocate \(x_0\) to \(\pi _2\) otherwise.
If we choose \(C(1 \mid 2) = C(2 \mid 1)\) and \(p_1 = p_2\), then the estimated minimum ECM rule for two Normal populations will be as follows:
Allocate \(x_0\) to \(\pi _1\) if
\[ (m_1 - m_2)^{T} S_{\text {pooled}}^{-1}\, x_0 - \frac{1}{2} (m_1 - m_2)^{T} S_{\text {pooled}}^{-1} (m_1 + m_2) \ge 0, \]
where \(m_1\) and \(m_2\) are the sample mean vectors of the two populations and \(S_{\text {pooled}}\) is the pooled (combined) sample covariance matrix. Allocate \(x_0\) to \(\pi _2\) otherwise. The left-hand side is known as the linear discriminant function. One can easily generalize the method to more than two groups.
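The two-group rule with equal costs and priors can be sketched as follows: estimate the mean vectors and the pooled covariance matrix, then compare the linear discriminant score of \(x_0\) with its cutoff. The two tiny samples are made up for illustration; all names are ours.

```python
import numpy as np

# Two-group minimum-ECM rule with equal costs and equal priors:
# allocate x0 to group 1 when
#   (m1 - m2)' S^-1 x0  >=  (1/2) (m1 - m2)' S^-1 (m1 + m2).

g1 = np.array([[2.0, 2.0], [3.0, 3.0], [2.0, 3.0]])   # sample from pi_1
g2 = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])   # sample from pi_2

m1, m2 = g1.mean(axis=0), g2.mean(axis=0)
n1, n2 = len(g1), len(g2)
# pooled (combined) sample covariance matrix
s_pooled = ((n1 - 1) * np.cov(g1.T) + (n2 - 1) * np.cov(g2.T)) / (n1 + n2 - 2)

w = np.linalg.solve(s_pooled, m1 - m2)    # discriminant coefficients
cutoff = 0.5 * w @ (m1 + m2)

def classify(x0):
    return 1 if w @ x0 >= cutoff else 2

print(classify(np.array([2.5, 2.5])))  # near group 1 -> 1
print(classify(np.array([0.2, 0.2])))  # near group 2 -> 2
```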
8.5 Data
Example 8.5.1
The Fisher’s Iris data set is a multivariate data set introduced by Fisher (1936). It is also known as Anderson’s Iris data set because Edgar Anderson collected the data to quantify the morphologic variation of Iris flowers of three related species. The data set consists of 50 samples from each of three species of Iris (Iris setosa (type-3), Iris versicolor (type-2), and Iris virginica (type-1)). Four features were measured from each sample: the length and the width of the sepals and petals, in centimeters (Table 8.1).
We have performed k-means clustering of the data on the basis of the four variables, viz. sepal length, sepal width, petal length, and petal width. Choosing \(k = 3\), we have divided the 150 observations into three groups in order to verify whether we can identify three groups corresponding to the three species. From columns 6 and 7, it is clear that the k-means method has correctly identified the Iris setosa (type-3) species in all 50 cases, whereas there are some errors for types 1 and 2: three cases of type 2 and fourteen cases of type 1 were wrongly identified. The summary result for k-means clustering is given below:
Number of clusters: 3
Number of observations | Within cluster sum of squares | Average distance from centroid | Maximum distance from centroid | |
Cluster1 | 39 | 25.414 | 0.732 | 1.552 |
Cluster2 | 61 | 38.291 | 0.731 | 1.647 |
Cluster3 | 50 | 15.151 | 0.482 | 1.248 |
We have also performed linear discriminant analysis by considering types as the true groups.
Linear Method for Response: Type
Predictors: Sepal le Sepal wi Petal le Petal wi
Summary of Classification
Put into | ....True Group.... | ||
Group | 1 | 2 | 3 |
1 | 49 | 2 | 0 |
2 | 1 | 48 | 0 |
3 | 0 | 0 | 50 |
Total N | 50 | 50 | 50 |
Summary of Classification with Cross-validation
Put into | ....True Group.... | ||
Group | 1 | 2 | 3 |
1 | 49 | 2 | 0 |
2 | 1 | 48 | 0 |
3 | 0 | 0 | 50 |
Total N | 50 | 50 | 50 |
N Correct | 49 | 48 | 50 |
Proportion | 0.980 | 0.960 | 1.000 |
N \(=\) 150 N Correct \(=\) 147 Proportion Correct \(=\) 0.980
Squared Distance Between Groups
1 | 2 | 3 | |
1 | 0.000 | 17.201 | 179.385 |
2 | 17.201 | 0.000 | 89.864 |
3 | 179.385 | 89.864 | 0.000 |
Linear Discriminant Function for Group
1 | 2 | 3 | |
Constant | −103.27 | −71.75 | −85.21 |
Sepal le | 12.45 | 15.70 | 23.54 |
Sepal wi | 3.69 | 7.07 | 23.59 |
Petal le | 12.77 | 5.21 | −16.43 |
Petal wi | 21.08 | 6.43 | −17.40 |
Variable Pooled Means for Group
Mean | 1 | 2 | 3 | |
Sepal le | 5.8433 | 6.5880 | 5.9360 | 5.0060 |
Sepal wi | 3.0573 | 2.9740 | 2.7700 | 3.4280 |
Petal le | 3.7580 | 5.5520 | 4.2600 | 1.4620 |
Petal wi | 1.1993 | 2.0260 | 1.3260 | 0.2460 |
Variable Pooled StDev for Group
StDev | 1 | 2 | 3 | |
Sepal le | 0.5148 | 0.6359 | 0.5162 | 0.3525 |
Sepal wi | 0.3397 | 0.3225 | 0.3138 | 0.3791 |
Petal le | 0.4303 | 0.5519 | 0.4699 | 0.1737 |
Petal wi | 0.2047 | 0.2747 | 0.1978 | 0.1054 |
Pooled Covariance Matrix
Sepal le Sepal wi Petal le Petal wi
Sepal le 0.26501
Sepal wi 0.09272 0.11539
Petal le 0.16751 0.05524 0.18519
Petal wi 0.03840 0.03271 0.04267 0.04188
Here we see that only three observations are wrongly classified. The corresponding probabilities are given by
Observation | True Group | Pred Group | Group | Probability Predicted |
71 ** | 2 | 1 | 1 | 0.75 |
2 | 0.25 | |||
3 | 0.00 | |||
84 ** | 2 | 1 | 1 | 0.86 |
2 | 0.14 | |||
3 | 0.00 | |||
134 ** | 1 | 2 | 1 | 0.27 |
2 | 0.73 | |||
3 | 0.00 |
Example 8.5.2
The following data are related to a survey on environmental pollution level. The variables were observed, in suitable units, at 111 selected places. The four variables under study were ozone content, radiation, temperature, and wind speed. We have performed hierarchical clustering with Euclidean distance and single linkage. The data set, together with the cluster memberships, is shown in the following table.
The summary of results and the dendrogram are given below the table. Considering a similarity level of 93, six clusters were found, of which three (4, 5, and 6), containing 2, 1, and 1 observations, may be omitted as outliers. Hence clusters 1, 2, and 3 are the main clusters. The table gives the radiation, temperature, wind speed, ozone content, and hierarchical cluster number for the 111 places.
Number of main clusters: 3
Number of observations | Within cluster sum of squares | Average distance from centroid | Maximum distance from centroid | |
Cluster1 | 71 | 202337.219 | 48.851 | 101.003 |
Cluster2 | 12 | 5151.429 | 18.929 | 35.732 |
Cluster3 | 24 | 26269.208 | 30.505 | 58.654 |
Cluster Centroids
Variable | Cluster1 | Cluster2 | Cluster3 | Grand centroid |
Radiatio | 240.7606 | 123.9167 | 46.6250 | 184.8018 |
Temperat | 80.1831 | 73.5833 | 71.9167 | 77.7928 |
Wind spe | 9.6577 | 10.2583 | 11.5292 | 9.9387 |
Ozone Co | 49.2535 | 22.1667 | 17.7500 | 42.0991 |
The dendrogram of the pollution data is shown below. The centroids of the first three clusters are widely separated corresponding to all the variables; the 24 places falling in cluster 3 may be considered to be least polluted, whereas the 71 places falling in cluster 1 are most polluted (Fig. 8.1).
Notes
- 1.
A significant part of 'Chattopadhyay and Chattopadhyay (2014). Statistical Methods for Astronomical Data Analysis, Springer Series in Astrostatistics, Springer' is reproduced in this part.
- 2.
A significant part of ‘Chattopadhyay and Chattopadhyay (2014). Statistical Methods for Astronomical Data Analysis, Springer Series in Astrostatistics, Springer’ is reproduced in this part.
References and Suggested Readings
Bien, J., & Tibshirani, R. (2011). Hierarchical clustering with prototypes via minimax linkage. Journal of the American Statistical Association, 106(495), 1075–1084.
Chattopadhyay, A. K., & Chattopadhyay, T. (2014). Statistical methods for astronomical data analysis. Springer Series in Astrostatistics. New York: Springer.
Chattopadhyay, T., et al. (2012). Uncovering the formation of ultracompact dwarf galaxies by multivariate statistical analysis. Astrophysical Journal, 750, 91.
De, T., Chattopadhyay, T., & Chattopadhyay, A. K. (2013). Comparison among clustering and classification techniques on the basis of galaxy data. Calcutta Statistical Association Bulletin, 65, 257–260.
Fisher, R. A. (1936). The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7(2), 179–188.
Fraix-Burnet, D., Thuillard, M., & Chattopadhyay, A. K. (2015). Multivariate approaches to classification in extragalactic astronomy. Frontiers in Astronomy and Space Science, 2, 1–17.
Gower, J. C. (1971). A general coefficient of similarity and some of its properties. Biometrics, 27(4), 857–871.
MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. In Fifth Berkeley Symposium on Mathematical Statistics and Probability (Vol. 1, p. 281).
Milligan, G. W. (1980). An examination of the effect of six types of error perturbation on fifteen clustering algorithms. Psychometrika, 45(3), 325–342.
Sugar, C. A., & James, G. M. (2003). Finding the number of clusters in a data set: An information-theoretic approach. Journal of the American Statistical Association, 98, 750–763.
© 2018 Springer Nature Singapore Pte Ltd.
About this chapter

Mukherjee, S. P., Sinha, B. K., & Chattopadhyay, A. (2018). Cluster and Discriminant Analysis. In Statistical Methods in Social Science Research. Singapore: Springer. https://doi.org/10.1007/978-981-13-2146-7_8. Print ISBN 978-981-13-2145-0; online ISBN 978-981-13-2146-7.