A New Vision Inspired Clustering Approach

Jin, Dequan; Huang, Zhili

doi:10.1007/978-3-642-38466-0_15

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 256)

Abstract

In this paper, a new clustering approach that simulates the human vision process is presented. Humans are good at detecting and segmenting objects from the background, even when these objects have not been seen before; these are in fact clustering activities. Since human vision shows good potential in clustering, reproducing its mechanism may be a good way to cluster data. Following this idea, we present a new clustering approach that reproduces the three functional levels of human vision. Numerical examples show that our approach is feasible, computationally stable, suitable for discovering arbitrarily shaped clusters, and insensitive to noise.

15.1 Introduction

Clustering is a primitive activity in human learning [1]. All unsupervised learning activities can be considered clustering activities [2]. The most fundamental human clustering activity may be found in human vision, for instance in object detection and segmentation, which are elementary visual activities. Human vision is highly complex, and even a simple visual activity may involve many neuronal structures. Investigating only one or a few of these structures may not be enough to explain the visual activity, whereas considering the whole visual system, with most of these coupled structures included, may be too complicated to analyze. Some investigations on visual perception indicate a feasible way to avoid these difficulties: instead of considering individual neuronal structures, they reproduce the functional levels of human vision, each of which might be structurally distributed [3, 4].

In this paper, a new clustering approach is presented that treats the data as an image and clusters them with a three-level neural field system for visual perception. A method for determining the range of the excited regions in the activation distribution of the neural field is also introduced.

15.2 Levels for Human Vision

Generally speaking, the whole procedure of human vision contains at least three functional levels: the transfer level, the planning level, and the motor control level.

15.2.1 Transfer Level

This level involves the eyes and some low-level neuronal structures. The eyes act as sensors whose main task is to receive the light intensity distribution and transform it into a distribution of neural signals. In this level, the visual information may be subjected to spatial and temporal transformations induced by the retina and other neuronal structures [4].

Suppose the objects are static in the visual field. Then the light intensity distribution is usually represented as a static image that consists of N light points $ \left\{ {x_{i} } \right\}_{i = 1}^{N} $, which can be described as [5]

$$ I\left( z \right) = \frac{1}{N}\sum\limits_{i = 1}^{N} {\delta \left( {z - x_{i} } \right)} $$

(15.1)

where

$$ \delta \left( {z - x_{i} } \right) = \mathop{\lim }\limits_{\sigma \to 0} g\left( {z - x_{i} ,\sigma } \right) $$

(15.2)

and $ g\left( {z,\sigma } \right) $ is the Gaussian function

$$ g\left( {z,\sigma } \right) = \exp \left( { - \frac{{\left\| z \right\|^{2} }}{{\sigma^{2} }}} \right) $$

(15.3)

The visual information I(z) may be subjected to several filtering effects in the transfer level. The most significant one for object detection and segmentation is the blurring effect induced by the retina, which is usually described as a Gaussian filtering process. The output neural signal distribution $ S\left( z \right) $ is then given by

$$ S\left( z \right) = S\left( {z,\sigma_{in} } \right) = \int_{\Upomega } {I\left( {z - z^{\prime}} \right)} g\left( {z^{\prime},\sigma_{in} } \right)dz^{\prime} $$

(15.4)

Here $ \sigma_{in} $ is a scale parameter that can be understood as the distance between the object and the eye, or as the curvature of the crystalline lens [6]. If light points are so close together that their separation is below the resolution of the eye, they cannot be identified individually in $ S\left( z \right) $. $ S\left( z \right) $ is the output of the transfer level as well as the input of the planning level.
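
As a short worked step, reading (15.4) as the convolution of $ I $ with the Gaussian $ g $, treating $ \delta $ as a Dirac delta with the usual sifting property, and assuming that $ \Upomega $ contains all light points, substituting (15.1) gives

$$ S\left( z \right) = \frac{1}{N}\sum\limits_{i = 1}^{N} {\int_{\Upomega } {\delta \left( {z - z^{\prime} - x_{i} } \right)g\left( {z^{\prime},\sigma_{in} } \right)dz^{\prime}} } = \frac{1}{N}\sum\limits_{i = 1}^{N} {g\left( {z - x_{i} ,\sigma_{in} } \right)} $$

Under these assumptions, $ S\left( z \right) $ is simply the average of Gaussians centered at the light points, so it can be evaluated directly from the points without numerical integration.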

15.2.2 Planning Level

In this level, the input neural signal $ S\left( z \right) $ is processed by neurons. A visual perception is represented by the activation distribution of these neurons.

Many neural models describe the activity of neurons. A popular one is Amari’s dynamical neural field [7]:

$$ \tau \dot{u}\left( {z,t} \right) = - u\left( {z,t} \right) + \int_{\Upomega } {w\left( {z,z^{\prime}} \right)} \theta \left( {u\left( {z^{\prime},t} \right)} \right)dz^{\prime} + S\left( z \right) - h $$

(15.5)

The vector space $ \Upomega $ is called the perceive space, $ \tau $ is a positive time constant, and h is the resting level parameter. The region $ \left\{ {z \in \Upomega :u\left( {z,t} \right) > 0} \right\} $ is called the excited region and denotes the activated neurons; it usually corresponds to a perceived pattern. $ \theta \left( u \right) $ is a monotonically increasing nonlinear threshold function satisfying $ \mathop{\lim }\limits_{u \to - \infty } \theta \left( u \right) = 0 $ and $ \mathop{\lim }\limits_{u \to + \infty } \theta \left( u \right) = 1 $, for instance the step function. The integral term describes the neural field feedback of each excited point to its neighboring positions in $ \Upomega $, with an interaction strength determined by the interaction function $ w\left( {z,z^{\prime}} \right) $.

In most cases, the interaction function $ w\left( {z,z^{\prime}} \right) $ is isotropic and is written as $ w\left( {z - z^{\prime}} \right) $. In this case, $ w\left( z \right) $ is also called the interaction kernel of the neural field. To approximate the neurophysiological lateral interaction among neurons, the lateral interaction of the neural field is usually assumed to be locally excitatory and globally inhibitory. A typical interaction kernel is the difference of Gaussians (DoG) with constant inhibition, given by

$$ w\left( z \right) = Ag\left( {z,\sigma } \right) - Bg\left( {z,\gamma \sigma } \right) - h_{\ker } $$

(15.6)

where $ \gamma > 1 $.
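
The shape of the kernel (15.6) can be checked numerically. The following is a minimal sketch, not the authors' code: the function names are illustrative, the argument `r` stands for the distance $ \left\| z \right\| $, and the default parameter values are the ones reported for the examples in Section 15.4.

```python
import numpy as np

def gaussian(r, sigma):
    """Unnormalised Gaussian g of (15.3), evaluated at distance r = ||z||."""
    return np.exp(-np.square(r) / sigma**2)

def dog_kernel(r, A=1.2, B=0.1, sigma=0.07, gamma=1.1, h_ker=0.002):
    """DoG kernel with constant inhibition, (15.6)."""
    return A * gaussian(r, sigma) - B * gaussian(r, gamma * sigma) - h_ker

# Sample the kernel at a few distances: positive nearby, about -h_ker far away.
r = np.array([0.0, 0.05, 0.1, 0.5])
w = dog_kernel(r)
```

With these values the kernel is positive at short range (local excitation) and approaches the constant $ - h_{\ker } $ at long range (global inhibition).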

There are three important types of stable solutions to system (15.5), which are $ \phi $-solution, “bubble”-solution and $ \infty $-solution:

1. An equilibrium solution $ u^{ * } \left( z \right) $ is called a $ \phi $-solution if $ u^{ * } \left( z \right) \le 0 $ for all $ z \in \Upomega $;
2. An equilibrium solution $ u^{ * } \left( z \right) $ is called a “bubble”-solution if $ u^{ * } \left( z \right) > 0 $ only for z in a nonempty proper subset $ D \subset \Upomega $;
3. An equilibrium solution $ u^{ * } \left( z \right) $ is called an $ \infty $-solution if $ u^{ * } \left( z \right) > 0 $ for all $ z \in \Upomega $.
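
For a solution sampled on a grid, the three types can be told apart by the sign pattern alone. The sketch below is illustrative only: the function name and the string labels are assumptions, and the code does not check that the array is actually an equilibrium or that it is stable.

```python
import numpy as np

def classify_solution(u_star):
    """Classify a sampled equilibrium by where it is positive."""
    excited = np.asarray(u_star) > 0
    if not excited.any():
        return "phi"        # u* <= 0 everywhere
    if excited.all():
        return "infinity"   # u* > 0 everywhere
    return "bubble"         # u* > 0 only on part of the domain

kinds = [classify_solution(u) for u in ([-1.0, -0.5, 0.0],
                                        [-1.0, 0.2, -1.0],
                                        [0.1, 0.2])]
```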

15.2.3 Motor Control Level

The motor control level sends control signals to specific organs, for instance the eyes, according to the neuron activity $ u\left( {z,t} \right) $, which is the output of the planning level. This level has different descriptions for different purposes. For instance, in the investigation of saccadic motor planning, where the eyes are controlled to fixate an object in the visual field, let $ z^{ * } = \frac{P}{M} $, where $ P = \int\nolimits_{\mathbb{R}} {z\theta \left( {u^{ * } \left( z \right)} \right)} dz $ and $ M = \int\nolimits_{\mathbb{R}} {\theta \left( {u^{ * } \left( z \right)} \right)} dz $. Then $ z^{ * } $ is the density center of the activation distribution, which corresponds to the center of the object [4].
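
The density center can be approximated on a uniform grid by replacing the two integrals with sums. This is a minimal sketch under stated assumptions: $ \theta $ is taken to be the step function, the grid is one-dimensional and uniform, and the activation `u_star` is a hypothetical tent-shaped profile rather than a solution of (15.5).

```python
import numpy as np

def density_center(z, u_star):
    """Approximate z* = P / M with theta taken as the step function."""
    theta = (u_star > 0).astype(float)
    dz = z[1] - z[0]                 # uniform grid spacing
    M = theta.sum() * dz             # M = integral of theta(u*)
    if M == 0:
        raise ValueError("no excited region, z* is undefined")
    P = (z * theta).sum() * dz       # P = integral of z * theta(u*)
    return P / M

z = np.linspace(0.0, 1.0, 101)
u_star = 0.105 - np.abs(z - 0.3)     # hypothetical activation, excited around 0.3
z_center = density_center(z, u_star)
```

Here the excited nodes are symmetric about 0.3, so `z_center` is 0.3 up to rounding error.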

15.3 Clustering Approach Based on Vision

Since human vision shows good potential in clustering, it is possible to find a feasible clustering approach by simulating visual mechanisms. In this section, we present a new clustering approach by reproducing the three levels of vision. Some numeric examples are given to show the feasibility and advantages of our approach.

15.3.1 Transfer Level for Clustering

In this level, the first step is to transform the data set $ {\text{X}} = \left\{ {x_{i} \in {\mathbb{R}}^{n} :i = 1,2, \ldots ,N} \right\} $ into an image that can be accepted by the visual system:

$$ I\left( z \right) = \frac{1}{N}\sum\limits_{i = 1}^{N} {\delta (z - x_{i} )} $$

(15.7)

In this way, we obtain a data distribution $ I\left( z \right) $.

The data distribution $ I\left( z \right) $ is transformed into neural input distribution $ S\left( z \right) $ by a Gaussian filtering process:

$$ S\left( z \right) = S\left( {z,\sigma_{in} } \right) = \int_{\Upomega } {I\left( {z - z^{\prime}} \right)} g\left( {z^{\prime},\sigma_{in} } \right)dz^{\prime} $$

(15.8)
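
If (15.8) is read as a Gaussian convolution of the point masses in (15.7), with $ \delta $ treated as a Dirac delta, $ S\left( z \right) $ reduces to the mean of $ g\left( {z - x_{i} ,\sigma_{in} } \right) $ over the data points. The sketch below evaluates it on a set of grid points under that assumption; the function name, the sample data, and the value of `sigma_in` are illustrative.

```python
import numpy as np

def neural_input(grid, X, sigma_in):
    """S(z) at grid points of shape (G, n) for data X of shape (N, n)."""
    diff = grid[:, None, :] - X[None, :, :]          # (G, N, n) displacements
    sq_dist = np.sum(diff**2, axis=-1)               # squared distances ||z - x_i||^2
    return np.exp(-sq_dist / sigma_in**2).mean(axis=1)

X = np.array([[0.0, 0.0], [1.0, 0.0]])               # two data points in the plane
grid = np.array([[0.0, 0.0], [0.5, 0.0], [1.0, 0.0]])
S = neural_input(grid, X, sigma_in=0.5)
```

The memory cost of the intermediate array grows as G × N × n, which is one reason a direct evaluation like this is only practical for small grids.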

15.3.2 Planning Level for Clustering

The aim of this level is to discover clusters in data, i.e., produce a perception of clusters, according to the neural signal distribution $ S\left( z \right) $.

Since Amari’s model has been successful in explaining phenomena in visual perception, we also employ it in the planning level for clustering. It is given by (15.5), restated here:

$$ \tau \dot{u}\left( {z,t} \right) = - u\left( {z,t} \right) + \int_{\Upomega } {w\left( {z,z^{\prime}} \right)} \theta \left( {u\left( {z^{\prime},t} \right)} \right)dz^{\prime} + S\left( z \right) - h $$

(15.9)

where $ \Upomega \subset {\mathbb{R}}^{n} $.

As soon as $ S\left( z \right) $ is transferred to the planning level, the neural field begins to evolve until it reaches its steady state $ u^{ * } \left( z \right) $. Several bubbles, i.e., excited regions, may be sustained in $ u^{ * } \left( z \right) $; their number and range generally depend on the input $ S\left( z \right) $, the kernel $ w\left( z \right) $, and the resting level h of the neural field. By grouping the data located in the same connected excited region into one cluster, the clusters of the data set X are perceived.
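
The evolution of the field can be sketched with a forward Euler scheme on a one-dimensional grid. This is an illustration under several assumptions, not the authors' implementation: $ \theta $ is the step function, the integral is replaced by a Riemann sum, the data and `sigma_in = 0.05` are made up, and the resting level `h = 0.1` is chosen for this one-dimensional toy case rather than taken from the paper. The kernel defaults are the values reported in Section 15.4, and the initial state $ u\left( {z,0} \right) = - h $ follows the algorithm in Section 15.4.

```python
import numpy as np

def gaussian(r, sigma):
    return np.exp(-np.square(r) / sigma**2)

def dog_kernel(r, A=1.2, B=0.1, sigma=0.07, gamma=1.1, h_ker=0.002):
    return A * gaussian(r, sigma) - B * gaussian(r, gamma * sigma) - h_ker

def steady_state(z, S, h, tau=0.1, dt=0.01, steps=500):
    """Euler integration of the field equation; returns an approximate u*."""
    dz = z[1] - z[0]
    W = dog_kernel(z[:, None] - z[None, :]) * dz    # discretised integral operator
    u = np.full_like(S, -h)                          # initial state u(z, 0) = -h
    for _ in range(steps):
        theta = (u > 0).astype(float)                # step threshold function
        u += (dt / tau) * (-u + W @ theta + S - h)
    return u

z = np.linspace(0.0, 1.0, 201)
X = np.concatenate([np.linspace(0.22, 0.28, 10),     # hypothetical cluster near 0.25
                    np.linspace(0.72, 0.78, 10)])    # hypothetical cluster near 0.75
S = gaussian(z[:, None] - X[None, :], 0.05).mean(axis=1)
u_star = steady_state(z, S, h=0.1)
```

In this toy setting the field ends up positive around each group of points and negative between them and at the domain boundary, which is the bubble pattern the grouping rule relies on. The dense matrix `W` makes the cost per step quadratic in the number of grid nodes.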

15.3.3 Motor Control Level for Clustering

In visual perception, the motor control level sends out control signals based on the activation distribution $ u^{ * } \left( z \right) $ given by the planning level, so that people can react to outside visual stimuli according to their perception. For clustering, the motor control level provides methods for determining the range of the connected excited regions in $ u^{ * } \left( z \right) $.

When a connected excited region in $ u^{ * } \left( z \right) $ is convex, its range is equal to the attraction domain of a corresponding equilibrium point of the gradient dynamical system

$$ \frac{dz}{dt} = \nabla u^{ * } \left( z \right) $$

(15.10)

By estimating this attraction domain, we can estimate the range of a connected excited region. A feasible way of estimating the domain of attraction of such a system, based on an iterative expansion approach, is presented in [8].
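
The iterative expansion method of [8] is not reproduced here. As a simpler stand-in, the gradient system (15.10) can be integrated forward from each data point, and points that reach the same equilibrium can be placed in the same attraction domain. In the sketch below, the activation is a hypothetical sum of two Gaussian bumps with a known gradient; `CENTERS`, `WIDTH`, the step size, and the rounding used to compare limits are all illustrative assumptions.

```python
import numpy as np

CENTERS = np.array([0.25, 0.75])   # hypothetical bump centres of a toy u*
WIDTH = 0.1                        # hypothetical bump width

def grad_u_star(z):
    """Gradient of the toy activation u*(z) = sum_c exp(-((z - c) / WIDTH)**2)."""
    d = z[:, None] - CENTERS[None, :]
    return np.sum(-2.0 * d / WIDTH**2 * np.exp(-(d / WIDTH) ** 2), axis=1)

def flow_to_equilibrium(z0, dt=1e-3, steps=2000):
    """Forward Euler integration of dz/dt = grad u*(z) from each start point."""
    z = np.array(z0, dtype=float)
    for _ in range(steps):
        z += dt * grad_u_star(z)
    return z

points = np.array([0.10, 0.30, 0.35, 0.70, 0.90])
limits = flow_to_equilibrium(points)
# Points whose trajectories end at the same equilibrium share a label.
labels = np.unique(np.round(limits, 2), return_inverse=True)[1]
```

This only classifies the given points; it does not describe the boundary of the attraction domain, which is what the method in [8] estimates.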

15.4 Algorithm and Examples

On the basis of the above strategies, for a data set $ {\text{X}} = \left\{ {x_{i} \in {\mathbb{R}}^{n} :i = 1,2, \ldots ,N} \right\} $, we present the following clustering approach:

1. Select a scale $ \sigma_{in} > 0 $ and the interaction kernel $ w\left( z \right) $. Compute the signal distribution $ S\left( z \right) $ by (15.8);
2. Let $ u\left( {z,0} \right) = - h $ for $ z \in \Upomega $. Compute the steady state $ u^{ * } \left( z \right) $ of system (15.9). If there are m excited regions, group all the data points in the same connected excited region into a cluster, denoted by $ C_{j} $, $ j = 1,2, \ldots ,m $. If there are unlabeled data points, go to step 3; otherwise, let M = m and go to step 4.
3. If unlabeled data points lie in excited regions, assign them to the clusters corresponding to these excited regions; otherwise, if they lie in peaks with negative activation, group the data lying in the same peak into new clusters $ C_{m + i} ,\;i = 1,2, \ldots ,\tilde{m} $. Let $ M = m + \tilde{m} $.
4. Let $ Sc = \left\{ {C_{j} } \right\}_{j = 1}^{M} $; then Sc is the clustering result.
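
The grouping part of step 2 can be sketched in one dimension as follows. This is an illustration under assumptions, not the authors' procedure: the steady state is given on a uniform grid, a connected excited region is a run of consecutive nodes with positive activation, each data point is looked up at its nearest grid node, and `u_star` is a hypothetical activation rather than a computed steady state. Points outside every excited region receive the label -1 and are left for step 3, which is not implemented here.

```python
import numpy as np

def label_excited_regions(u_star):
    """Label each grid node with the index of its excited run, or -1."""
    excited = u_star > 0
    prev = np.concatenate(([False], excited[:-1]))
    starts = excited & ~prev                     # first node of each excited run
    return np.where(excited, np.cumsum(starts) - 1, -1)

def assign_clusters(z, u_star, X):
    """Give each data point the region label of its nearest grid node."""
    region = label_excited_regions(u_star)
    nearest = np.abs(z[None, :] - X[:, None]).argmin(axis=1)
    return region[nearest]

z = np.linspace(0.0, 1.0, 201)
u_star = (-0.1 + 0.5 * np.exp(-((z - 0.25) / 0.05) ** 2)
               + 0.5 * np.exp(-((z - 0.75) / 0.05) ** 2))   # hypothetical steady state
X = np.array([0.22, 0.27, 0.50, 0.74, 0.79])
labels = assign_clusters(z, u_star, X)
```

In higher dimensions the run detection would have to be replaced by a connected-component labeling of the excited grid cells.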

To show the feasibility of our approach, we give some numeric examples as shown in Figs. 15.1 and 15.2. The kernel $ w\left( z \right) $ is given by (15.6). Parameters are given as $ \sigma = 0.07,\;\gamma = 1.1,\;\tau = 0.1 $, A = 1.2 and B = 0.1, $ h_{\ker } = 0.002 $, h = 0.02.

These examples show some advantages of our approach: for instance, it does not require the number of clusters or a specific learning step, and it is suitable for discovering clusters of arbitrary shape. Noise and isolated points are easy to identify in the above results.

Since Amari’s neural field involves a convolution, which is computationally expensive, the computational cost of this approach is high. As a result, limited by current computer technology, this approach cannot deal with high-dimensional clustering problems efficiently. However, the numerical examples show that this approach has high clustering accuracy and good robustness to noise. Moreover, the clustering results are very close to human cognition. This approach therefore shows potential, especially if breakthroughs in computer technology, such as quantum computers, are made in the future.

15.5 Conclusion

In this paper, we present a new clustering approach inspired by human vision, which reproduces the mechanism of its three functional levels. This approach is biologically plausible, robust to noise, and suitable for discovering arbitrarily shaped clusters. Numerical examples are given to show its feasibility.

Nevertheless, our approach is only a first attempt. In theory it is suitable for all kinds of data sets, but it relies on a neural field whose right-hand side contains a convolution, which consumes much computation time.

References

1. Xu R, Wunsch D (2005) Survey of clustering algorithms. IEEE Trans Neural Netw 16(3):645–678
2. Bishop CM (2006) Pattern recognition and machine learning. Springer, New York
3. Giese MA (1999) Dynamic neural field theory for motion perception. Kluwer Academic Publishers, Norwell
4. Kopecz K, Schöner G (1995) Saccadic motor planning by integrating visual information and pre-information on neural dynamic fields. Biol Cybern 73(1):49–60
5. Leung Y, Zhang J, Xu Z (2000) Clustering by scale-space filtering. IEEE Trans Pattern Anal Mach Intell 22(12):1396–1410
6. Xu Z, Meng D, Jing W (2005) A new approach for classification: visual simulation point of view. Lecture notes in computer science, Advances in neural networks, proceedings of international symposium on neural networks, ISNN 2005, Part II, Chongqing, pp 1–7
7. Amari S (1977) Dynamics of pattern formation in lateral-inhibition type neural fields. Biol Cybern 27(2):77–87
8. Jin D, Peng J (2009) A new approach for estimating the attraction domain for Hopfield-type neural networks. Neural Comput 21(1):101–120

Acknowledgments

This paper is sponsored by the Scientific Research Foundation of Guangxi University (Grant No. XBZ120366) and supported by NSFC, Tian Yuan Special Foundation, Project No. 11226141.

Author information

Authors and Affiliations

School of Mathematics and Information Science, Guangxi University, No. 100 Daxue Road, Nanning, Guangxi, China
Dequan Jin
School of Mathematics and Statistics, Xi’an Jiaotong University, No. 28 West Xianning Road, Xi’an, Shaanxi, China
Dequan Jin
School of Mechanical Engineering, Guangxi University, No. 100 Daxue Road, Nanning, Guangxi, China
Zhili Huang

Corresponding author

Correspondence to Dequan Jin.

Editor information

Editors and Affiliations

Department of Computer Science, Tsinghua University, Qinghua, Beijing, 100084, China, People's Republic
Zengqi Sun
Tsinghua University, Qinghua, Beijing, 100084, China, People's Republic
Zhidong Deng

Cite this paper

Jin, D., Huang, Z. (2013). A New Vision Inspired Clustering Approach. In: Sun, Z., Deng, Z. (eds) Proceedings of 2013 Chinese Intelligent Automation Conference. Lecture Notes in Electrical Engineering, vol 256. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38466-0_15

DOI: https://doi.org/10.1007/978-3-642-38466-0_15
Published: 28 June 2013
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38465-3
Online ISBN: 978-3-642-38466-0
eBook Packages: Engineering (R0)
