
1 Introduction

Clustering [1, 2] is the process of partitioning a dataset into meaningful groups of similar objects. Several families of methods are available, namely partitioning methods, fuzzy clustering, model-based clustering, hierarchical clustering and density-based clustering, among others. Among these, K-means [3] performs very well, but it requires a priori knowledge of the number of clusters present in the dataset. Automatic clustering, in contrast, aims to discover the optimal number of clusters without any such prior knowledge, which is useful for segmentation and classification. This requirement has drawn researchers toward automatic clustering techniques.

Nature-inspired metaheuristic algorithms are extensively used for solving both simple and complex optimization problems. They can provide an optimal or near-optimal solution within a short time frame. Well-known examples include the genetic algorithm (GA) [4], differential evolution (DE) [5], particle swarm optimization (PSO) [6], ant colony optimization (ACO) [7] and the bat algorithm [8], to name a few. Some of these algorithms have been successfully applied to automatic clustering [10]. Despite their ability to find a solution from a population, they may sometimes suffer from premature convergence. To overcome this problem, researchers have incorporated features of quantum computing into nature-inspired metaheuristic algorithms, and quantum-inspired techniques have become a flourishing research area. Quantum computation has emerged as a new computing paradigm that relies on quantum mechanical effects to solve computational problems [11]. Several quantum-inspired metaheuristic algorithms have been developed so far, and their improvement over the classical versions of the same algorithms has been reported in terms of convergence, fitness value and other parameters [11–13].

In this paper, a quantum-inspired metaheuristic algorithm is introduced for the automatic clustering of image datasets. The classical bat algorithm has been chosen as the underlying metaheuristic. The superiority of the quantum-inspired bat algorithm over its classical counterpart has been judged on the basis of the mean, standard deviation and standard error of the computed fitness values, together with the computational time. Finally, a statistical test, namely the unpaired t-test, has been performed, and the corresponding p value is reported to show that the outcome favors the quantum-inspired method. Throughout the computation, the fitness has been evaluated using a cluster validity index, namely the DB index [14].

The rest of this paper is organized as follows. Overviews of the bat algorithm and of quantum computing are given in Sects. 2 and 3, respectively. Section 4 describes the cluster validity index (DB index) [14]. The basic steps of the proposed quantum-inspired bat algorithm are presented in Sect. 5. The experimental results and analysis are reported in Sect. 6. Finally, the conclusion is drawn in Sect. 7.

2 Overview of Bat Algorithm

The bat algorithm is a nature-inspired metaheuristic algorithm introduced by Yang in 2010 [8]. It is a population-based algorithm and can be considered an efficient way to discover the optimal solution of a complex optimization problem. Bats use echolocation, by which they are able to sense prey, avoid obstacles and recognize their roosting crevices in the dark. The hunting strategy of the bats can be summarized as follows.

Initially, it is considered that n bats are flying randomly in the search space from position \( P_{i} \) with velocity \( V_{i} \), frequency \( F_{\text{min}} \), pulse rate \( r_{i} \) and loudness \( L_{0} \) at time stamp t. At time t + 1, their frequency, velocity and position are updated as follows.

$$ F_{i} = F_{\text{min}} + (F_{\text{max}} - F_{\text{min}} )\beta $$
(1)
$$ V_{i}^{t + 1} = V_{i}^{t} + (P_{i}^{t} - P_{\text{best}} )F_{i} $$
(2)
$$ P_{i}^{t + 1} = P_{i}^{t} + V_{i}^{t + 1} $$
(3)

where \( \beta \in [0, 1] \) is a uniformly distributed random number. The minimum and the maximum frequency can be taken as \( F_{\text{min}} = 0 \) and \( F_{\text{max}} = 2 \). Here, \( P_{\text{best}} \) represents the global best position among the population of n bats, chosen after evaluating the fitness of all solutions in the population. A further uniformly distributed random number in [0, 1] is denoted rand1. A new location can be obtained by using the following equation.

$$ P_{\text{new}} = P_{\text{old}} + \varepsilon L^{t} $$
(4)

where \( \varepsilon \in [ - 1,1] \) and \( L^{t} = \frac{1}{n}\sum\nolimits_{i = 1}^{n} {L_{i}^{t} } \) is the average loudness of the n bats at time stamp t. The pulse rate \( (r_{i} ) \) and the loudness \( (L_{i} ) \) of the bats are increased and decreased, respectively, by the following equations.

$$ r_{i}^{t + 1} = r_{i}^{0} (1 - e^{ - \gamma t} ) $$
(5)
$$ L_{i}^{t + 1} = \alpha L_{i}^{t} $$
(6)

where \( \gamma (0 < \gamma < 1) \) and \( \alpha (0 < \alpha < 1) \) are constants and the initial pulse rate can be represented as \( r_{i}^{0} \in [0,1] \). Then, the fitness of all the bats will be evaluated and the current \( P_{\text{best}} \) will be selected.
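The update rules of this section can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the list-based representation of a position and the default values of `gamma` and `alpha` are assumptions made for illustration. The frequency is written in the usual form `f_min + (f_max - f_min) * beta`, so that it stays within [F_min, F_max], and the velocity update uses the current position of the bat.

```python
import math
import random

def bat_step(pos, vel, best, f_min=0.0, f_max=2.0, rng=random):
    """Move one bat: frequency, velocity and position updates (Eqs. 1-3)."""
    beta = rng.random()                       # random number in [0, 1)
    freq = f_min + (f_max - f_min) * beta     # frequency, Eq. (1)
    # Velocity is driven by the offset from the global best, Eq. (2).
    new_vel = [v + (p - b) * freq for v, p, b in zip(vel, pos, best)]
    # Position follows the new velocity, Eq. (3).
    new_pos = [p + v for p, v in zip(pos, new_vel)]
    return new_pos, new_vel, freq

def local_walk(pos, mean_loudness, rng=random):
    """Random walk around a solution scaled by the average loudness (Eq. 4)."""
    return [p + rng.uniform(-1.0, 1.0) * mean_loudness for p in pos]

def update_pulse_and_loudness(r0, loudness, t, gamma=0.9, alpha=0.9):
    """Pulse rate rises toward r0 and loudness decays (Eqs. 5-6)."""
    new_rate = r0 * (1.0 - math.exp(-gamma * t))
    new_loudness = alpha * loudness
    return new_rate, new_loudness
```

A bat that already sits on the global best keeps its position in `bat_step`, because the offset term in the velocity update vanishes.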

3 Overview of Quantum Computing

In quantum computing (QC), the quantum bit (qubit) is the unit of information, analogous to the classical bit. A classical bit holds either the value 0 or the value 1, whereas a single qubit can hold the values 0 and 1 at the same time, which is referred to as a quantum superposition state [11–13]. The benefit of using qubits is faster execution, since their coherent nature allows multiple computations to be carried out simultaneously. The superposition of the basis states in QC can be described as

$$ \begin{aligned} \left| \varPsi \right\rangle & = \sum\limits_{i = 1}^{n} {{\mathcal{C}}_{i} \left| {{\mathcal{V}}_{i} } \right\rangle } \\ \left| \varPsi \right\rangle & = {\mathcal{C}}_{1} \left| {{\mathcal{V}}_{1} } \right\rangle + {\mathcal{C}}_{2} \left| {{\mathcal{V}}_{2} } \right\rangle + \cdots + {\mathcal{C}}_{n} \left| {{\mathcal{V}}_{n} } \right\rangle \\ \end{aligned} $$
(7)

where \( {\mathcal{V}}_{i} \) refers to the ith state and \( {\mathcal{C}}_{i} \in {\mathbb{C}} \). For a two-state quantum bit, Eq. (7) can be rewritten as \( \left| \varPsi \right\rangle = {\mathcal{C}}_{1} \left| 0 \right\rangle + {\mathcal{C}}_{2} \left| 1 \right\rangle . \) The state \( \left| 0 \right\rangle \) is known as the “ground state,” and the state \( \left| 1 \right\rangle \) is known as the “excited state”; \( {\mathcal{C}}_{i} \) is a complex number, and the normalization condition, referred to in this paper as quantum orthogonality, is represented as follows.

$$ \sum\limits_{i = 1}^{n} {\left| {{\mathcal{C}}_{i} } \right|^{2} } = 1 $$
(8)

Coherence and decoherence are two striking features of quantum computing. In Eq. (7), \( \left| \varPsi \right\rangle \) defines a linear superposition of the basis states, and maintaining this superposition is what is meant by coherence. A forceful destruction of this linear superposition is referred to as decoherence. A further quantum mechanical phenomenon, namely quantum entanglement, establishes a unique connection between quantum systems, and the use of entangled qubit states may increase the computational capability to a very large extent.

Some well-known quantum gates are the NOT gate, C-NOT gate, rotation gate, Hadamard gate, Pauli-X gate, Pauli-Y gate, Pauli-Z gate, Toffoli gate, controlled phase shift gate and Fredkin gate, to name a few.

A rotation gate is responsible for updating the amplitudes \( (\alpha_{i} ,\beta_{i} ) \) of the ith qubit and can be represented as follows.

$$ \left( {\begin{array}{*{20}c} {\alpha_{i}^{{\prime }} } \\ {\beta_{i}^{{\prime }} } \\ \end{array} } \right) = \left( {\begin{array}{*{20}l} {\cos \theta_{i} } \hfill & { - \sin \theta_{i} } \hfill \\ {\sin \theta_{i} } \hfill & {{ \cos }\theta_{i} } \hfill \\ \end{array} } \right)\left( {\begin{array}{*{20}c} {\alpha_{i} } \\ {\beta_{i} } \\ \end{array} } \right) $$
(9)

Here, for each qubit, \( \theta_{i} \) represents the rotation angle. For a specific problem, the rotation angle is chosen accordingly.
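The effect of the rotation gate in Eq. (9) can be checked numerically. The sketch below is illustrative only: the function name `rotate` and the chosen angles are assumptions, and the amplitudes are taken to be real for simplicity. It shows that a rotation by \( \theta \) shifts the qubit angle by \( \theta \) and keeps \( \alpha^{2} + \beta^{2} = 1 \), so the condition of Eq. (8) is preserved.

```python
import math

def rotate(alpha, beta, theta):
    """Apply the 2x2 rotation matrix of Eq. (9) to one qubit (alpha, beta)."""
    new_alpha = math.cos(theta) * alpha - math.sin(theta) * beta
    new_beta = math.sin(theta) * alpha + math.cos(theta) * beta
    return new_alpha, new_beta

# Real amplitudes on the unit circle: alpha^2 + beta^2 = 1.
alpha, beta = math.cos(0.3), math.sin(0.3)

# A small rotation moves the state from angle 0.3 to angle 0.35.
a2, b2 = rotate(alpha, beta, 0.05)

# The squared amplitudes still sum to one after the rotation.
norm = a2 * a2 + b2 * b2
```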

4 Cluster Validity Index

In this paper, the Davies–Bouldin (DB) index [14] has been used as the cluster validity index. The minimum value of the DB index is the criterion for discovering the optimal number of clusters in a dataset. Consider two different clusters \( C_{i} \) and \( C_{j} \). The cluster similarity measure \( {\text{CSM}}_{ij} \) between them is defined as follows.

$$ {\text{CSM}}_{ij} = \frac{{{\text{DM}}_{i} + {\text{DM}}_{j} }}{{d_{ij} }} $$
(10)

Here, \( {\text{DM}}_{i} \) is the dispersion measure of the ith cluster and is defined as

\( {\text{DM}}_{i} = \left[ {\frac{1}{{{\text{no}}_{i} }}\sum\nolimits_{k = 1}^{{{\text{no}}_{i} }} {\left\| {dp_{k}^{(i)} - {\text{cc}}_{i} } \right\|^{2} } } \right]^{{\frac{1}{2}}} \), where \( {\text{no}}_{i} \) is the total number of objects in cluster \( C_{i} \), \( {\text{cc}}_{i} \) is the center of cluster \( C_{i} \) and \( dp_{k}^{(i)} \in C_{i} \) for every k.

The distance between the clusters \( C_{i} \) and \( C_{j} \) can be defined as

$$ d_{ij} = \left\| {{\text{cc}}_{i} - {\text{cc}}_{j} } \right\|. $$

Now, the Davies–Bouldin (DB) index can be defined as

$$ {\text{DBI}} = \frac{1}{{n_{c} }}\sum\limits_{i = 1}^{{n_{c} }} {I_{i} } $$
(11)

where \( I_{i} = \max_{{j = 1, \ldots ,n_{c} , i \ne j}} ({\text{CSM}}_{ij} ),\;i = 1, \ldots ,n_{c} \).
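As an illustration of Eqs. (10) and (11), the following sketch computes the DB index for clusters given as lists of points. The helper names are hypothetical. The sketch assumes at least two clusters with distinct centers, and it takes each cluster center as the mean of its points, which the text does not state explicitly.

```python
import math

def centroid(points):
    """Mean of a list of equally sized coordinate lists (assumed cluster center)."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]

def dispersion(points, center):
    """DM_i: root of the mean squared distance between the points and the center."""
    return math.sqrt(sum(math.dist(p, center) ** 2 for p in points) / len(points))

def db_index(clusters):
    """Davies-Bouldin index of a list of clusters, following Eqs. (10)-(11)."""
    centers = [centroid(c) for c in clusters]
    dm = [dispersion(c, cc) for c, cc in zip(clusters, centers)]
    n_c = len(clusters)
    total = 0.0
    for i in range(n_c):
        # I_i: the worst (largest) similarity of cluster i to any other cluster.
        total += max(
            (dm[i] + dm[j]) / math.dist(centers[i], centers[j])
            for j in range(n_c) if j != i
        )
    return total / n_c
```

For two one-dimensional clusters {0, 2} and {10, 12}, each dispersion is 1 and the centers are 10 apart, so the index evaluates to 0.2. Compact, well-separated clusters give small values, which is why the index is minimized.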

5 Basic Steps in Quantum-Inspired Bat Algorithm

5.1 Principle of Operation of the QIBAT Algorithm

In this section, a quantum-inspired bat algorithm (QIBAT) is presented. The proposed method has been applied to image datasets to determine the optimal number of clusters in a given dataset.

First, a population of Ps bats is initialized (line 1). To obtain the quantum states, the population is then encoded to produce \( P^{\alpha } \). The encoding process is carried out in the following way.

First, \( \theta_{i,j} (0 \le \theta_{i,j} \le 2\pi ) \) is generated randomly for each solution in the population. Then \( P_{i,j}^{\alpha } \) is generated as \( P_{i,j}^{\alpha } = \cos \theta_{i,j} \), where \( i = (1,2, \ldots ,Ps) \) and \( j = (1,2, \ldots ,Ln) \). Next, \( P^{\beta } \) is generated from \( P^{\alpha } \) so as to satisfy quantum orthogonality, by using Eq. (8) (lines 2–3). The search process is guided by the state generated from \( P_{i,j}^{\alpha } \) and \( P_{i,j}^{\beta } \) by using Eq. (12). The active cluster centroids are identified from the information of the excited states (line 7). The fitness of each individual solution is then computed to find the fittest solution in the population.

The rotation gate [Eq. (9)] is used to generate another pair of populations, \( P_{\text{new}}^{\alpha } \) and \( P_{\text{new}}^{\beta } \). The values of \( P^{\alpha } \) and \( P^{\beta } \) are either retained or replaced by the newly generated values \( P_{\text{new}}^{\alpha } \) and \( P_{\text{new}}^{\beta } \), depending on the fitness: if the newly generated values yield a lower fitness value, the new set is chosen; otherwise, the old set is kept. Using this mechanism, a novel quantum population is created (lines 5–13).

Then, the velocity \( \left( {{\mathcal{V}}_{i} } \right) \), frequency \( \left( {{\mathcal{F}}_{i} } \right) \), pulse rate \( \left( {R_{i} } \right) \) and loudness \( \left( {A_{i} } \right) \) of each quantum solution in the quantum population are initialized (lines 14–15). In each iteration of the main loop, the quantum positions, velocity and fitness are updated (lines 18–19). The best quantum solutions are then improved (lines 20–22), and the new quantum solutions are evaluated (line 23). The best quantum solutions are determined and saved (lines 24–27), and the overall best quantum solution is selected (line 28). Finally, the optimal number of clusters is obtained from the best quantum solution.
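The encoding step described above can be sketched as follows. This is a hedged illustration, not the authors' code: the function name `encode_population` is hypothetical, and taking \( P^{\beta } \) as the non-negative root \( \sqrt {1 - (P^{\alpha } )^{2} } \) is an assumption about how Eq. (8) is applied, since the text does not specify the sign.

```python
import math
import random

def encode_population(ps, ln, rng=random):
    """Build P_alpha and P_beta for ps solutions of length ln."""
    p_alpha, p_beta = [], []
    for _ in range(ps):
        # One random angle in [0, 2*pi] per element of the solution.
        thetas = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(ln)]
        alphas = [math.cos(t) for t in thetas]
        # Assumption: beta is the non-negative root of 1 - alpha^2 (Eq. 8).
        # max(0.0, ...) guards against tiny negative rounding errors.
        betas = [math.sqrt(max(0.0, 1.0 - a * a)) for a in alphas]
        p_alpha.append(alphas)
        p_beta.append(betas)
    return p_alpha, p_beta
```

Every pair produced this way satisfies \( (P_{i,j}^{\alpha } )^{2} + (P_{i,j}^{\beta } )^{2} = 1 \) up to floating-point rounding.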

Algorithm: Quantum-Inspired Bat

Input:

Maximum iteration number: MxIt

Population size: Ps

Output:

Optimal number of clusters: \( {\mathcal{N}}_{{\mathcal{C}}} \)

Optimal fitness value: \( {\mathcal{F}}_{{\mathcal{T}}} \)

  1. Create a population P of Ps particles from the normalized intensity values of the image, where each particle is considered a solution to the problem. The length of each particle is Ln, chosen as the square root of the maximum intensity value of the input image.
  2. Encode each element of P to produce \( P^{\alpha } \) using the encoding scheme based on Eq. (9), described in Sect. 3.
  3. To establish quantum orthogonality, use each particle of \( P^{\alpha } \) to produce \( P^{\beta } \) by using Eq. (8).
  4. Normalize the input image so that the pixel intensity values lie between 0 and 1, producing \( I_{\text{normal}} \).
  5. for \( i = 1 \) to \( Ps \) do
  6. for \( j = 1 \) to \( Ln \) do
  7. Create \( {\mathcal{N}}_{{{\mathcal{C}}_{i} }} \) unique cluster points from \( P_{i} \) by satisfying the condition \( P_{ij}^{\beta } > P_{ij}^{\alpha } \), and store the cluster points in \( {\text{CP}}_{i} \).
  8. Compute the fitness \( {\mathcal{F}}_{{{\mathcal{T}}_{i} }} \) from \( {\text{CP}}_{i} \) by using the cluster validity index, namely the DB index [Eq. (11)].
  9. end for
  10. end for
  11. Find the best fitness value \( {\mathcal{F}}_{{\mathcal{T}}} \) among all \( {\mathcal{F}}_{{{\mathcal{T}}_{i} }} \), along with its corresponding number of cluster points \( {\mathcal{N}}_{{\mathcal{C}}} \).
  12. Apply a small rotation \( \Delta \theta \) to each element of \( P^{\beta } \), and re-establish quantum orthogonality by generating the corresponding values of \( P^{\alpha } \).
  13. Repeat steps 5–12 until a predefined condition is satisfied to achieve the best fitness value \( {\mathcal{F}}_{{\mathcal{T}}} \), and preserve its corresponding cluster points in CP, the total number of cluster points in \( n_{c} \) and the new values of \( P^{\alpha } \) and \( P^{\beta } \) for further use.
  14. Initialize the frequency \( \left( {{\mathcal{F}}_{i} } \right) \) and velocity \( \left( {{\mathcal{V}}_{i} } \right) \) for each solution in the population.
  15. Initialize the loudness \( \left( {A_{i} } \right) \) and the pulse rates \( \left( {R_{i} } \right) \).
  16. for \( I = 1 \) to \( MxIt \) do
  17. for \( J = 1 \) to \( Ps \) do
  18. Update the frequency, velocity and location of each particle in \( P_{J}^{\beta } \) using Eqs. (1), (2) and (3).
  19. Update each value of \( P_{J}^{\alpha } \) using Eq. (8) to ensure quantum orthogonality.
  20. if \( Rand(0,1) > R_{ij} \) then
  21. Select the best solution by computing the fitness from the updated positions, and generate a local solution by using Eq. (4).
  22. end if
  23. Evaluate the new solution.
  24. if \( Rand(0,1) < A_{i} \) and \( {\mathcal{F}}_{{{\mathcal{T}}_{\text{new}} }} < {\mathcal{F}}_{{{\mathcal{T}}_{\text{Best}} }} \) then
  25. Accept the new solution.
  26. Increase \( R_{i} \) and decrease \( A_{i} \) by using Eqs. (5) and (6).
  27. end if
  28. Choose the best solution.
  29. end for
  30. end for

Finally, the optimal number of clusters \( {\mathcal{N}}_{{\mathcal{C}}} \) is reported together with its corresponding fitness value \( {\mathcal{F}}_{{\mathcal{T}}} \).

6 Experiment and Analysis of Result

This paper proposes a quantum-inspired algorithm, built on the working methodology of the bat algorithm, to find the optimal number of clusters in a given image dataset. During the experiments, the DB index [14] has been used as the objective function, whose minimum value indicates the optimal result. The following subsections present the experimental results obtained.

6.1 Cluster Centroid Representation Scheme

The optimal number of clusters in an image dataset is obtained by identifying the cluster centroids from the given dataset in a single run. During the experiments, the cluster centroids have been chosen from a solution by satisfying the following condition.

$$ \delta_{i} = \left\{ {\begin{array}{*{20}l} 1 \hfill & {{\text{if}}\,\beta_{i} > \alpha_{i} } \hfill \\ 0 \hfill & {\text{Otherwise}} \hfill \\ \end{array} } \right. $$
(12)

where the activation threshold \( \delta_{i} (1 \le i \le Ln) \) is used to identify the cluster centroids within a single solution. A value of 1 for \( \delta_{i} \) represents an activated cluster center, and the number of activated centers gives the number of clusters.
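A minimal sketch of the activation rule in Eq. (12) is given below. The function name and the sample values are hypothetical. Candidate centroid values are kept only where \( \beta_{i} > \alpha_{i} \), and duplicates are dropped so that only unique cluster points remain.

```python
def active_centroids(candidates, alphas, betas):
    """Return the unique candidates whose qubit satisfies beta > alpha (Eq. 12)."""
    chosen = []
    for value, a, b in zip(candidates, alphas, betas):
        # delta_i = 1 when beta_i > alpha_i; skip values already selected.
        if b > a and value not in chosen:
            chosen.append(value)
    return chosen

# Hypothetical normalized intensities and qubit amplitudes for one solution.
centers = active_centroids(
    [0.1, 0.5, 0.5, 0.9],
    [0.6, 0.2, 0.1, 0.3],
    [0.8, 0.98, 0.99, 0.95],
)
number_of_clusters = len(centers)
```

In this example all four elements are activated, but the repeated value 0.5 is stored once, so three cluster centers remain.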

6.2 Experimental Result

See Tables 1, 2 and 3.

Table 1 Optimal number of clusters \( \left( {{\mathcal{N}}_{{\mathcal{C}}} } \right) \), optimal fitness value \( \left( {{\mathcal{F}}_{{\mathcal{T}}} } \right) \) and optimal execution time \( \left( {{\mathcal{T}}_{{\mathcal{E}}} } \right) \) in seconds for QIBAT and CBAT
Table 2 Mean \( \left( {{\mathcal{F}}_{\mu } } \right) \), standard deviation \( \left( {{\mathcal{F}}_{\sigma } } \right) \) and standard error \( \left( {{\mathcal{S}}_{{\mathcal{E}}} } \right) \) of fitness for QIBAT and CBAT
Table 3 Result of the unpaired t-test \( \left( {{\mathcal{P}}\text{-}{\text{value}}} \right) \) between QIBAT and CBAT

6.3 Simulation of Work

The experiments have been implemented in a Python environment. The proposed method has been applied to two Berkeley images of size 80 × 120 and two other Berkeley images of size 120 × 80. A Dell machine running Windows 7 with an Intel(R) Core(TM) i3 processor at 2.00 GHz and 4.00 GB RAM has been used as the development environment.

The superiority of the proposed method over its classical counterpart has been assessed on the basis of the mean fitness value, standard error, standard deviation and minimum computational time. Moreover, a statistical test, namely the unpaired t-test, has been performed to compare the quantum-inspired bat algorithm with the classical bat algorithm. The optimal number of clusters \( \left( {{\mathcal{N}}_{{\mathcal{C}}} } \right) \), optimal fitness value \( \left( {{\mathcal{F}}_{{\mathcal{T}}} } \right) \) and optimal execution time \( \left( {{\mathcal{T}}_{{\mathcal{E}}} } \right) \) for both QIBAT and CBAT are reported in Table 1. The mean value of the fitness \( \left( {{\mathcal{F}}_{\mu } } \right) \), standard deviation \( \left( {{\mathcal{F}}_{\sigma } } \right) \) and standard error \( \left( {{\mathcal{S}}_{{\mathcal{E}}} } \right) \) for both algorithms are given in Table 2. Finally, the result of the unpaired t-test, carried out at the 95% confidence level, is shown in Table 3. This test checks whether the \( {\mathcal{P}}\text{-}{\text{value}} \) is less than 0.05. If it is, the null hypothesis is rejected in favor of the alternative hypothesis.
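The summary statistics and the test statistic can be computed as in the sketch below. It is illustrative only: the function names are hypothetical, and the pooled-variance (Student) form of the unpaired t-test is an assumption, since the text does not say which variant was used. Converting the statistic into a \( {\mathcal{P}}\text{-}{\text{value}} \) requires the t-distribution, which the Python standard library does not provide; a library routine such as `scipy.stats.ttest_ind` can be used for that step.

```python
import math
import statistics

def summarize(values):
    """Mean, sample standard deviation and standard error of the mean."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)
    return mean, std, std / math.sqrt(len(values))

def unpaired_t(a, b):
    """Pooled-variance t statistic for two independent samples, with its
    degrees of freedom (assumed variant of the unpaired t-test)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * statistics.variance(a)
              + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2
```

Here `a` and `b` would hold the fitness values collected over repeated runs of the two algorithms. A negative statistic indicates that the first sample has the lower mean fitness.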

6.4 Dataset Used

The experiments have been performed on the following images, whose intensity values have been normalized to the range 0 to 1 during the execution of the program (Fig. 1).

Fig. 1 Images used for testing: a #86000 (80 × 120), b #92059 (80 × 120), c #94079 (120 × 80), d #97017 (120 × 80)
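The normalization of the image intensities can be sketched as follows. The paper does not state which scaling was used, so min-max scaling is an assumption here, and the function name is hypothetical. Dividing by the maximum gray level would be an equally plausible reading.

```python
def normalize(pixels):
    """Min-max scale a flat list of gray levels into the range [0, 1]."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        # A constant image carries no contrast; map every pixel to 0.
        return [0.0 for _ in pixels]
    return [(p - lo) / (hi - lo) for p in pixels]
```

The darkest pixel maps to 0 and the brightest to 1, with all other gray levels placed proportionally in between.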

7 Conclusion

This paper presents an automatic image clustering technique based on the bat algorithm combined with the principles of quantum computing. The proposed procedure has been found superior to its classical counterpart, as it is able to determine the optimal number of clusters automatically from an image dataset in less computation time. So far, the proposed procedure has been applied only to gray-level images, optimizing only one objective at a time.

There remains scope for research on finding the optimal number of clusters in true color images, and in the future quantum-inspired algorithms may be extended to solve multi-objective optimization problems efficiently within a short time frame.