Abstract
In some classification problems it may be important to impose constraints on the set of allowable solutions. In particular, in regional taxonomy, urban and regional studies often try to segment a set of territorial data in homogenous groups with respect to a set of socio-economic variables taking into account, at the same time, contiguous neighbourhoods. The objects in a class are thus required not only to be similar to one another but also to be part of a spatially contiguous set. The rationale behind this is that if a spatially varying phenomenon influences the objects, as could occur in the case of geographical units, and this spatial information were ignored in constructing the classes then it would be less likely to be detected. In this paper a constrained version of thek-means clustering method (MacQueen, 1967; Ball and Hall, 1967) and a new algorithm for devising such a procedure are proposed; the latter is based on the efficient algorithm proposed by Hartigan and Wong (1979). This algorithm has proved its usefulness in zoning two large regions in Italy (Calabria and Puglia).
Article PDF
Similar content being viewed by others
Avoid common mistakes on your manuscript.
References
Anania G, Cersosimo D, Costanzo GD (2001) Le Calabrie contemporanee. Un'analisi delle caratteristiche degli ambiti economico produttivi sub-regionali. In: Scelte pubbliche, strategie private e sviluppo economico in Calabria. Conoscere per Decidere, Rubbettino, Soveria Mannelli, 333–380
Ball GH, Hall DJ (1967) A clustering technique for summarizing multivariate data. Behavioural, Science12, 153–155
Batagelj V (1984) Agglomerative methods in clustering with constrains. Preprint Series Dept. Math. Univ. Ljublijana22 (102), 5–19
Caliñski T, Harabasz J (1974) A dendrite method for cluster analysis. Communications in Statistics3, 1–27
Christofides N (1975) Graph Theory. Academic Press, London.
Cressie NAC (1993) Statistics for spatial data. Wiley, New York
De Soete G, DeSarbo WS, Furnas GW, Carrol JD (1984) The estimation of ultrametric and path trees from rectangular proximity data. Psychometrika49, 289–310
De Soete G, Carrol JD (1994)K-means clustering in a low-dimensional Euclidean space. In: Diday E et al. (eds.) New approaches in classification and data analysis, pp. 212–219. Springer, Berlin Heidelberg New York
DeSarbo WS, Mahajan V (1984) Constrained classification: the use of a priori information in cluster analysis. Psychometrika49, 187–215
Ferligoj A, Batagelj V (1982) Clustering with relational constraint. Psychometrika47, 413–426
Ferligoj A, Batagelj V (1983) Some types of clustering with relational constraint. Psychometrika48, 541–522.
Ferligoj A, Batagelj V (1992) Direct multicriteria clustering algorithms. Journal of Classification9 (1), 43–61
Ferligoj A, Batagelj, V (1998) Constrained clustering problems. In: Proceedings of IFCS '98, Rome, 541–522
Ferligoj A, Batagelj V (2000). Clustering relational data. In: Gaul W, Opitz O., Schader M (eds.) Data analysis, Springer, Berlin heidelberg New York, 3–15
Gordon AD (1973) Classifications in the presence of constraints. Biometrics29, 821–827
Gordon AD (1980) Methods of constrained classification. In: Tomassone R (ed.) Analyse de données et informatique. (INRIA, Le Chesnay), 149–160.
Gordon AD (1999) Classification. Chapmann & Hall, London
Gordon AD (1987) Parsimonious trees. Journal of Classification4, 85–101
Gordon AD (1996) A survey of constrained classification. Computational Statistics & Data Analysis21, 17–29
Gordon AD (1996) (a). How many clusters? An Investigation of five procedures for detecting nested cluster structure. In: Hayashi C et al. (eds.) Data science, classification, and related methods. Berlin Heidelberg New York, Springer, 109–116
Gordon AD, Vichi M (2001) Fuzzy partition models for fitting a set of partitions.Psychometrika 66, 229–248
Harary F (1969) Graph theory Addison-Wesley, Reading, MA
Hartigan JA (1975) Clustering algorithms Wiley, New York
Hartigan JA, Wong MA (1979) Algorithm AS 136: Ak-means clustering algorithm. Applied Statistics28 (1), 100–108
Hubert LJ (1974) Some applications of graph theory to clustering. Psychometrika39 (3), 283–308
Lebart L (1978) Programme d'agrégation avec contraintes. Le Cahiers de l'Analyse des Données3, 275–287
Lechevallier Y (1980) Classification sous contraintes. In: Diday E et al. (eds.) Optimisation en classification automatique INRIA, Paris, 677–696
Lefkovitch LP (1980) Conditional clustering. Biometrics36, 43–58
Legendre P (1987) Constrained clustering. In: Legendre P et al. (eds.) Developments in numerical ecology. Springer, Berlin Heidelberg New York
MacQueen J (1967) Some methods for classification and analysis of multivariate observations. In: LeCam LM et al. (eds.) Proceedings of the Fifth Berkeley Symposium on Mathematic, Statistics and Probability, vol. 1, Statistics, University of California Press, Berkeley, CA, 281–298
Maravalle M, Simeone B, Naldini, R (1997). Clustering on trees. Computational Statistics & Data Analysis24, 217–234
Milligan GW, Cooper MC (1985) An examination of procedures for determining the number of clusters in a data set. Psychometrika50, 159–179
Mills G (1967) The determination of local government boundaries. Operational Research Quarterly18, 243–255
Monestiez P (1977) Méthode de classification automatique sous contraintes spatiales. Statistique et Analyse des Données3, 75–84
Murtagh F (1985) A survey of algorithms for contiguity-constrained clustering and related problems. Computer Journal28, 82–88
Openshaw S (1977) A geographical solution to scale and aggregation problems in region-building, partitioning and spatial modelling. Transaction of the Institute of British Geographers52, 247–258
Seber GAF (1984) Multivariate observations, Wiley, New York
Späth H (1980) Cluster analysis algorithms. Ellis Horwood, Chichester
Taylor PJ (1973) Some implications of the spatial organizations of elections. Transaction of the Institute of British Geographers60, 121–136
Upton G, Fingleton B (1985) Spatial data analysis by example, vol. 1, Wiley, New York
Vicari D (1990) Indici per la scelta del numero dei gruppi. Metron49, 473–492
Webster R (1977) Quantitative and numerical methods in soil classification and survey. Clarendon Press, Oxford New York
Wilson RJ (1996) Introduction to graph theory. Addison Wesley Longman, England
Zani S (1993) Classificazione di unità territoriali e spaziali. In: Zani S (ed.) Metodi statistici per le analisi territoriali. Franco Angeli, Milano, 93–121
Author information
Authors and Affiliations
Rights and permissions
About this article
Cite this article
Damiana Costanzo, G. A constrainedk-means clustering algorithm for classifying spatial units. Statistical Methods & Applications 10, 237–256 (2001). https://doi.org/10.1007/BF02511650
Issue Date:
DOI: https://doi.org/10.1007/BF02511650