Abstract
The Neighborhood Expectation-Maximization (NEM) algorithm is an iterative, EM-style method for clustering spatial data. Unlike the traditional EM algorithm, NEM incorporates a spatial penalty term into its objective function. The clustering performance of NEM depends mainly on two factors: the choice of the spatial coefficient, which weights the penalty term, and the initial state of cluster separation, to which the resulting clustering is sensitive. Existing NEM algorithms usually assign an equal spatial coefficient to every site, regardless of whether the site lies in the class interior or on the class border. When estimating posterior probabilities, however, sites in the class interior should receive stronger influence from their neighbors than sites on the border. In addition, the initialization methods deployed for EM-based clustering algorithms generally do not account for the unique properties of spatial data, such as spatial autocorrelation; as a result, they often fail to provide NEM with an initial state from which a good solution can be found in practice. To this end, this paper presents a variant of NEM, called ANEMI, which exploits an adaptive spatial coefficient determined by the correlation of explanatory attributes inside the neighborhood. In addition, ANEMI starts from the initial state returned by a spatially augmented initialization method. Experimental results on both synthetic and real-world datasets validate the effectiveness of ANEMI.
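Concretely, NEM's penalized objective leads to a fixed-point E-step in which each site's posterior membership is reweighted by the memberships of its spatial neighbors. The sketch below illustrates such an E-step with a per-site spatial coefficient, in the spirit of ANEMI's adaptive scheme; the function name, the uniform membership initialization, and the fixed iteration count are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def nem_e_step(log_lik, W, beta, n_iter=10):
    """Fixed-point E-step of a NEM-style spatial clustering algorithm.

    log_lik : (n, K) array of log(pi_k * f_k(x_i)) for each site i, cluster k.
    W       : (n, n) binary spatial neighborhood (contiguity) matrix.
    beta    : per-site spatial coefficients, shape (n,); a single scalar
              shared by all sites recovers standard NEM, while a site-
              dependent beta mimics ANEMI's adaptive coefficient.
    Returns the (n, K) matrix of posterior membership probabilities.
    """
    n, K = log_lik.shape
    c = np.full((n, K), 1.0 / K)              # start from uniform memberships
    for _ in range(n_iter):                   # iterate toward a fixed point
        penalty = beta[:, None] * (W @ c)     # neighborhood agreement term
        logits = log_lik + penalty
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        c = np.exp(logits)
        c /= c.sum(axis=1, keepdims=True)     # normalize rows to probabilities
    return c
```

With all coefficients set to zero the penalty vanishes and the update reduces to the ordinary EM posterior; larger coefficients pull a site's membership toward those of its neighbors, which is why interior sites arguably warrant larger values than border sites.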
© 2008 Springer-Verlag Berlin Heidelberg
Cite this paper
Hu, T., Xiong, H., Gong, X., Sung, S.Y. (2008). ANEMI: An Adaptive Neighborhood Expectation-Maximization Algorithm with Spatial Augmented Initialization. In: Washio, T., Suzuki, E., Ting, K.M., Inokuchi, A. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2008. Lecture Notes in Computer Science(), vol 5012. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68125-0_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68124-3
Online ISBN: 978-3-540-68125-0