1 Introduction

With the rapid growth of data on different storage media, there is a natural need for effective methods of accessing data and extracting useful knowledge. Data mining provides some of the most effective methods in this field. Data mining is an iterative knowledge discovery process that is carried out both manually and automatically, and it searches for valuable and new information in very large volumes of data (Gnanapriya et al. 2010).

Description and prediction are the main aims of data mining. The first category describes the attributes of a dataset and focuses on finding patterns in the data that can be interpreted by humans. The second category is based on inference from the data, looking for unknown variables and values (Mining 2006). Each of these categories includes different pattern types, such as frequent pattern mining, classification and regression, clustering and outlier detection, each of which has its own applications and features. The aim of this study is to investigate clustering analysis, which belongs to the descriptive patterns, with regard to the type of data used for data mining (Pujari 2001).

Clustering groups the data so as to maximize the similarity between samples within a cluster while minimizing the similarity between samples of different clusters (Berkhin 2006). Clustering greatly helps the discovery of unknown patterns in data and has a wide range of applications in fields including Web searching, security and business intelligence (Vercellis 2011; Han et al. 2011). Data mining, as a powerful and advanced technology in business intelligence, enables companies to focus on the important data in their data warehouses and helps them make effective knowledge-based decisions that increase business profit using data mining tools (Hema and Malik 2010).

Multidimensional data analysis is one of the most important factors in improving the efficiency and speed of data mining in business intelligence. In this study, datasets organized as data cubes are used for clustering; this type of data allows analysis along several dimensions. In the following, some related work on the data cube is reviewed.

Data warehouses, which contain data collected from multiple sources and organized around a specific subject, enable wide-ranging analysis; managers perform this complex analysis using OLAP tools (Hema and Malik 2010). Data warehouses and OLAP tools are based on a multidimensional data model; therefore, the data cube is the natural concept for modeling data in several dimensions, in which data are represented by dimensions and facts. In addition, OLAP operations can be used on the data cube to create views, run interactive queries and perform data analysis (Chaudhuri and Dayal 1997). Liço (2017) introduces OLAP as the main component of business intelligence and the data cube as OLAP’s core component, and regards the data cube as the most powerful tool for large databases. That study introduces an intelligent cube to reduce system response time and also applies compression techniques to reduce storage space.

A clustering algorithm for modeling the data cube and collecting information from cuboids was introduced by Woo et al. (2015); in that study, the values of specific attributes form a large data stream, cuboids are used to store different parts of the stream data, and clustering is carried out on this type of data. Ceci et al. (2015) studied a hierarchy-based clustering algorithm over continuous data, aiming at applications such as wireless sensor networks.

In the current research, data cube clustering is considered in order to provide efficient unsupervised analysis of the data. The challenge of data cube clustering is the presence of irregular and specific data, especially when the data size is large, which makes clustering difficult. There are several approaches to clustering (Joshi and Kaur 2013), namely the partitioning, hierarchical, density-based and grid-based methods, and among these four approaches only the density-based approach is able to identify nonconvex clusters.

Therefore, in order to achieve higher efficiency, we use a density-based clustering method. Among density-based methods, DBSCAN is the most widely used. DBSCAN owes its popularity to its simplicity of implementation and its ability to recognize clusters of different sizes and nonconvex shapes (Berkhin 2006; Cheng 2017). Hence, the DBSCAN algorithm is selected for density-based clustering in the current research. DBSCAN is a very good candidate for finding nonconvex clusters in the data space (Kumar and Reddy 2016). The challenge of DBSCAN clustering is the dependence of the resulting clusters on its parameters, namely the neighborhood radius and the minimum number of points, which are chosen empirically according to the type of data. Fine-tuning these parameters therefore plays a significant role in identifying proper clusters.

Several studies have tried to improve DBSCAN. In Smiti and Eloudi (2013), fuzzy set theory was applied to design a fuzzy clustering that improves DBSCAN, which the authors called Soft DBSCAN; it assigns initial membership degrees to the data to express the proximity of data entities to the cluster centers. A graph-based index structure (Kumar and Reddy 2016) was proposed to improve the performance of DBSCAN on high-dimensional datasets by accelerating the neighbor search operations. A new measurement criterion (Cheng 2017) was used to obtain a distance calculated from the threshold analysis of the nearest neighbor against all neighbors. Darong and Peng (2012) combined a partitioning technique with DBSCAN to obtain proper input parameters for DBSCAN; however, the effectiveness of this method was not evaluated on datasets with different densities. A combination of the Gaussian-means method with DBSCAN (Smiti and Eloudi 2012) was proposed to improve the determination of the DBSCAN parameters; however, Gaussian-means creates circular, non-density-based clusters and does not perform well on dense data. DBSCAN was also combined with binary differential evolution (Karami and Johansson 2014) to determine its parameters. Recently, many meta-heuristic algorithms have been proposed to improve various clustering algorithms by reducing their sensitivity to the algorithms’ important parameters (Chen 2012; Pei et al. 2008; Zhao 2007; Aydilek and Arslan 2013).

Among these works, however, DBSCAN has not yet been improved as a density-based clustering with a meta-heuristic such as the earthworm optimization algorithm (EWOA) (Wang et al. 2018). Therefore, in this paper the EWOA is used to identify the best parameters for DBSCAN. The proposed clustering algorithm, called the improved DBSCAN (EWOA–DBSCAN), hybridizes DBSCAN with the EWOA on the 2D data obtained from the data cube through the new dimension move and similarity metric.

The EWOA’s own challenge is the tuning of its Reproduction 1 and Reproduction 2 parameters, which are also determined empirically and have a significant impact on the efficiency and convergence of the algorithm. There have been several attempts to tune the parameters of other meta-heuristics dynamically, such as proposing a diversity measure between the GA’s chromosomes in the population, designing a variation that depends on the fitness values of the GA’s chromosomes and designing fuzzy logic controllers (FLCs) (Angelova and Pencheva 2011; Karafotias et al. 2015), but this kind of parameter tuning has not been considered for the EWOA. An FLC is constructed from a knowledge base, which consists of the information given by linguistic control rules, a fuzzification interface, an inference system and a defuzzification interface (Liu and Lampinen 2005; Herrera and Lozano 2003). Therefore, in order to modify EWOA–DBSCAN, another data cube clustering algorithm is proposed that attempts to improve the selection of the effective EWOA parameters by using a new, simple FLC based on two groups of linguistic control rules, namely Mamdani’s rules and Takagi–Sugeno’s rules. We name the second algorithm the “soft improved DBSCAN” (EWOA–DBSCAN-Mamdani and EWOA–DBSCAN-Sugeno). The proposed ideas and their achievements will be used to design 3D clustering in our future work.

The rest of the paper is organized as follows. In Sect. 2, we describe the preprocessing of the data cube, covering the data cube structure, the assignment of an address to each cube cell, the design of a dimension move that creates related 2D data from the data cube and the proposal of a new similarity metric. The proposed clustering algorithms are explained in Sect. 3, which covers the three investigated evaluation indices, the DBSCAN algorithm and the improved and soft improved DBSCAN algorithms. The experimental results are presented in Sect. 4. Finally, the conclusion and future work are given in Sect. 5.

2 Preprocessing of data cube

In order to smoothly explain the proposed clustering algorithms and demonstrate their scalability, this section first describes, in four steps, the data cube structure, the normalization of the data cube, the dimension move and the new similarity metric. To clarify these descriptions, a sample data cube of size \( 5 \times 4 \times 3 \) is designed, as shown in Fig. 1. Its values are the sales of three well-known hypermarket chains in Iran, namely Refah (Ref), Etkah (Etk) and Shahrvand (Sha). The sales are recorded in five Iranian cities, namely Tehran (Teh), Shiraz (Shi), Mashhad (Mash), Tabriz (Tab) and Esfahan (Esf), and each city’s hypermarket sales are measured in four seasons, namely winter, spring, summer and fall. It is assumed that the cities are represented by X (\( \left| X \right| = 5 \)), the seasons by Y (\( \left| Y \right| = 4 \)) and the hypermarkets by Z (\( \left| Z \right| = 3 \)). The steps of the preprocessing are described as follows.

Fig. 1 A data cube’s structure, dimension separation and dimension move

2.1 Data cube structure

A conventional dataset is a two-dimensional array of values in which each row is called an object and each column an attribute. A data cube, in contrast, is a multidimensional array of values with more than two dimensions, where each dimension corresponds to an attribute. The data cube investigated in the current research is a 3D array of values, denoted \( {\text{cube(}}x,y,z )= {\text{value}} \). In Fig. 1, there are three attributes, namely city (x), season (y) and hypermarket (z); for example, \( {\text{cube}}(3,2,1) = 163 \) represents 163 sales units of the Shahrvand hypermarket in Mashhad in spring. In this research, an address is assigned to each cube cell based on the following function:

$$ V_{r} : r = \left| {Z \times Y} \right| \times (i - 1) + \left| Y \right| \times (k - 1) + j $$
(1)

where \( i \), \( j \) and \( k \) represent the cell indices for \( X \), \( Y \) and \( Z \), respectively. For example, in Fig. 1, \( v_{42} \) is the address of \( {\text{cube}}(4,2,2) = 600 \). Also, the above function has the following inverse function:

$$ i = \left\{ {\begin{array}{*{20}l} {{\text{div}}\left( {r, \left| {Z \times Y} \right|} \right),} \hfill & {{\text{if}}\;\bmod \left( {r,\left| {Z \times Y} \right|} \right) = 0} \hfill \\ {{\text{div}}\left( {r, \left| {Z \times Y} \right|} \right) + 1,} \hfill & {\text{otherwise}} \hfill \\ \end{array} } \right. $$
(2)
$$ k = \left\{ {\begin{array}{*{20}l} {{\text{div}}\left( {\left( {r - \left| {Z \times Y} \right| \times (i - 1)} \right),\left| Y \right|} \right),} \hfill & {{\text{if}}\; {\text{mod}}\left( {\left( {r - \left| {Z \times Y} \right| \times (i - 1)} \right),\left| Y \right|} \right) = 0} \hfill \\ {{\text{div}}\left( {\left( {r - \left| {Z \times Y} \right| \times (i - 1)} \right), \left| Y \right|} \right) + 1,} \hfill & {{\text{otherwise}} } \hfill \\ \end{array} } \right. $$
(3)
$$ j = r - \left| {Z \times Y} \right| \times (i - 1) - \left| Y \right| \times (k - 1) $$
(4)

The address function and its inverse create a two-way relationship between the 3D and 2D search spaces, which is needed because the proposed data cube clustering converts the 3D space of the data cube into a two-dimensional space by moving one dimension along the other dimensions using the dimension move algorithm (Algorithm 1).
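To make the mapping concrete, the sketch below implements the address function (1) and its inverse (2)–(4). The paper’s experiments are implemented in MATLAB; the Python function names used here are illustrative assumptions, not part of the paper.

```python
# Illustrative sketch of the cell-address mapping (1) and its inverse (2)-(4);
# indices are 1-based as in the paper, and the function names are assumptions.

def cell_to_address(i, j, k, size_y, size_z):
    """Map cube cell (i, j, k) to its address r, as in Eq. (1)."""
    return size_z * size_y * (i - 1) + size_y * (k - 1) + j

def address_to_cell(r, size_y, size_z):
    """Recover (i, j, k) from address r, as in Eqs. (2)-(4)."""
    zy = size_z * size_y
    i = r // zy if r % zy == 0 else r // zy + 1
    rem = r - zy * (i - 1)
    k = rem // size_y if rem % size_y == 0 else rem // size_y + 1
    j = rem - size_y * (k - 1)
    return i, j, k

# Example from Fig. 1: cube(4, 2, 2) has address v_42 when |Y| = 4 and |Z| = 3.
assert cell_to_address(4, 2, 2, size_y=4, size_z=3) == 42
assert address_to_cell(42, size_y=4, size_z=3) == (4, 2, 2)
```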

2.2 Normalizing data cube

The 3D attributes of the data have different scales, and normalizing the attributes removes the effect of larger-scale attributes on smaller-scale ones. The min–max normalization (Han et al. 2011) is a linear converter of \( v_{i} \in A \) into \( v^{\prime }_{i} \in A^{\prime } \) with the new bounds \( \left[ {{\text{new}}\_{ \hbox{min} }_{{A^{\prime } }} ,{\text{new}}\_{ \hbox{max} }_{{A^{\prime } }} } \right] \). The min–max normalization was proposed for 2D data; a 3D version of the normalization (see Fig. 1), which linearly converts the values into \( [0,1] \), is designed in the current research and is applied layer by layer, from the first layer to the \( \left| Z \right| \)th layer, as follows:

$$ \left\{ {\begin{array}{*{20}c} {{\text{cube}}^{\prime } (i,j,1) = \frac{{{\text{cube}}(i,j,1) - \mathop {\hbox{min} }\limits_{{\begin{array}{*{20}c} {1 \le i \le \left| X \right|} \\ {1 \le j \le \left| Y \right|} \\ \end{array} }} {\text{cube(}}i,j,1 )}}{{\mathop {\hbox{max} }\limits_{{\begin{array}{*{20}c} {1 \le i \le \left| X \right|} \\ {1 \le j \le \left| Y \right|} \\ \end{array} }} {\text{cube(}}i,j,1 )- \mathop {\hbox{min} }\limits_{{\begin{array}{*{20}c} {1 \le i \le \left| X \right|} \\ {1 \le j \le \left| Y \right|} \\ \end{array} }} {\text{cube(}}i,j,1 )}}} \\ \vdots \\ {{\text{cube}}^{\prime } \left( {i,j,\left| Z \right|} \right) = \frac{{{\text{cube}}\left( {i,j,\left| Z \right|} \right) - \mathop {\hbox{min} }\limits_{{\begin{array}{*{20}c} {1 \le i \le \left| X \right|} \\ {1 \le j \le \left| Y \right|} \\ \end{array} }} {\text{cube}}\left( {i,j,\left| Z \right|} \right)}}{{\mathop {\hbox{max} }\limits_{{\begin{array}{*{20}c} {1 \le i \le \left| X \right|} \\ {1 \le j \le \left| Y \right|} \\ \end{array} }} {\text{cube}}\left( {i,j,\left| Z \right|} \right) - \mathop {\hbox{min} }\limits_{{\begin{array}{*{20}c} {1 \le i \le \left| X \right|} \\ {1 \le j \le \left| Y \right|} \\ \end{array} }} {\text{cube}}\left( {i,j,\left| Z \right|} \right)}}} \\ \end{array} } \right. $$
(5)

where \( i = 1, \ldots ,\left| X \right| \) and \( j = 1, \ldots ,\left| Y \right| \).
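As a concrete illustration of (5), the sketch below normalizes each Z-layer of a cube independently into [0, 1]; it assumes a NumPy array indexed as cube[i, j, k] and is not the paper’s MATLAB code.

```python
# A minimal sketch of the layer-wise min-max normalization (5), assuming each
# Z-layer contains at least two distinct values so the denominator is nonzero.
import numpy as np

def normalize_cube(cube):
    """Scale every X-Y layer of the cube independently into [0, 1], as in Eq. (5)."""
    cube = cube.astype(float)
    out = np.empty_like(cube)
    for k in range(cube.shape[2]):              # one 2D layer per value of Z
        layer = cube[:, :, k]
        lo, hi = layer.min(), layer.max()
        out[:, :, k] = (layer - lo) / (hi - lo)
    return out
```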

2.3 Dimension move

There are techniques for reshaping data cubes and 3D matrices, such as those of Scovanner et al. (2007), Zhao and Yang (2015) and Johnson et al. (2013), which have different levels of complexity and are applied in image processing. In the current research, a dimension move is designed to convert a 3D matrix into a 2D matrix (see Fig. 1). To this end, first, sub-2D matrices of the 3D matrix are obtained by dimension separation along a given dimension; for example, three sub-2D matrices (\( X \times Y \)) are separated along Z in Fig. 1. Second, the 2D matrices are joined together along one of their dimensions. In Fig. 1, the separated 2D matrices are joined along Y, which creates a 2D matrix with \( \left| X \right| \) rows and \( \left| Y \right| \times \left| Z \right| \) columns. The procedure of the dimension move is presented in Algorithm 1.

Algorithm 1 Dimension move
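Since Algorithm 1 appears only as a figure in the original, the following is a minimal sketch of the dimension move under the assumption that the cube is a NumPy array indexed as cube[i, j, k] and that the slices are taken along Z in index order, as suggested by Fig. 1.

```python
# A sketch of the dimension move: split the cube into |Z| slices of size |X| x |Y|
# and join them along Y, giving a 2D matrix with |X| rows and |Y|*|Z| columns.
# The slicing order along Z is an assumption based on Fig. 1.
import numpy as np

def dimension_move(cube):
    slices = [cube[:, :, k] for k in range(cube.shape[2])]
    return np.hstack(slices)

# For the 5 x 4 x 3 sample cube of Fig. 1 this yields a 5 x 12 matrix, in which
# cell (i, j, k) of the cube occupies row i and column (k - 1) * |Y| + j (1-based),
# consistent with the address function (1).
```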

2.4 New similarity metric

As mentioned, each cell of a data cube is specified by four values, as in \( {\text{Cube(}}i,j,k )= {\text{value}} \), where \( i,j,k \) are the dimension values of X (\( X{\text{Cube}} \)), Y (\( Y{\text{Cube}} \)) and Z (\( Z{\text{Cube}} \)), respectively. Because the 2D data obtained from a given data cube consist of quantitative values for both their rows and their column labels, a new similarity metric is proposed to calculate the distance between rows A and B of the 2D data. The metric’s procedure is presented in Algorithm 2. The similarity metric is used to calculate the dissimilarity of two rows in the DBSCAN clustering procedure (Algorithm 3) and in the investigation evaluation indices between the samples of each cluster [(7) and (8)].

Algorithm 2 New similarity metric

Here, \( \delta \) is an effectiveness parameter for comparing the quantitative values of \( X{\text{Cube}} \), \( Y{\text{Cube}} \) and \( Z{\text{Cube}} \). The parameter is tuned in the range \( \left[ {0.2,0.4} \right] \), since the 3D matrix has been normalized.

3 The proposed clustering algorithms

With regard to the proposed preprocessing, the data cube with its 3D matrix structure was converted into related 2D data by the dimension move (Algorithm 1), and the distance between rows A and B of the 2D data was calculated by the new similarity metric (Algorithm 2). Now, two improved variants of density-based clustering are proposed to solve data cube clustering. In the next subsections, first, the three investigation evaluation indices are explained; second, the DBSCAN algorithm is presented, its main challenge is discussed and novel strategies are proposed to improve it.

3.1 The investigation evaluation indices

Three investigation evaluation indices, namely the Davies–Bouldin index (DBI) (Davies and Bouldin 1979), the Dunn index (DI) (Bezdek and Pal 1995) and the Silhouette index (SI) (Rousseeuw 1987), are considered to evaluate the clusters obtained with DBSCAN. The DBI combines the within-cluster and between-cluster distances. The best clustering corresponds to the minimum DBI, and the index is formulated as follows:

$$ {\text{DBI}} = \frac{1}{N}\mathop \sum \limits_{i = 1}^{N} \left( {\mathop {\hbox{max} }\limits_{{\begin{array}{*{20}c} {j = 1, \ldots ,N} \\ {j \ne i} \\ \end{array} }} \left( {\frac{{S_{i} + S_{j} }}{{d_{ij} }}} \right)} \right) $$
(6)

where \( N \) is the number of clusters and \( d_{ij} \) is the average linkage, used as the between-cluster distance of clusters \( C_{i} \) and \( C_{j} \). Also, \( S_{i} \) and \( S_{j} \) are the average within-cluster distances of \( C_{i} \) and \( C_{j} \), respectively.

$$ d_{ij} = \frac{{\mathop \sum \nolimits_{{p_{r} \in C_{i} ,p_{s} \in C_{j} }} \left\| {p_{r} - p_{s} } \right\|}}{{\left| {C_{i} } \right|\left| {C_{j} } \right|}} $$
(7)
$$ S_{i} = \frac{{\mathop \sum \nolimits_{{p_{r} ,p_{s} \in C_{i} }} \left\| {p_{r} - p_{s} } \right\|}}{{\left| {C_{i} } \right|\left( {\left| {C_{i} } \right| - 1} \right)}} $$
(8)

where \( \left\| \cdot \right\| \) is Algorithm 2 and \( p_{r} \in C_{i} \) means point \( r \) belongs to the cluster \( i \).

The DI focuses on the compactness of clusters that are well separated from the others, measured by the cluster diameter and the inter-cluster distance, respectively. It is the minimum inter-cluster distance divided by the maximum cluster diameter. The best clustering corresponds to the maximum DI, and the index is formulated as follows:

$$ {\text{DI}} = \frac{{\mathop {\hbox{min} }\nolimits_{{\begin{array}{*{20}c} {i,j = 1, \ldots ,N} \\ {j \ne i} \\ \end{array} }} \left( {D_{ij} } \right)}}{{\mathop {\hbox{max} }\nolimits_{k = 1, \ldots ,N} \left( {{\text{diam}}_{k} } \right)}} $$
(9)
$$ D_{ij} = \mathop {\hbox{min} }\limits_{{p_{r} \in C_{i} ,p_{s} \in C_{j} }} \left\| {p_{r} - p_{s} } \right\| $$
(10)
$$ {\text{diam}}_{k} = \mathop {\hbox{max} }\limits_{{p,p' \in C_{k} }} \left\| {p - p^{\prime } } \right\| $$
(11)

where \( \left\| \cdot \right\| \) is Algorithm 2, \( D_{ij} \) is the inter-cluster distance between clusters \( C_{i} \) and \( C_{j} \) and \( {\text{diam}}_{k} \) is the diameter of cluster \( C_{k} \).

The SI measures the consistency within clusters by comparing the similarity of an object to its own cluster with its similarity to other clusters. The index is calculated for a data point \( p_{i} \) in cluster \( C_{i} \) and ranges from − 1 to + 1, where values close to 1 show that the object is well matched to its own cluster and poorly matched to neighboring clusters. So, the best clustering corresponds to an SI close to 1, and the average SI over all data points is formulated as follows:

$$ {\text{SI}} = \frac{1}{{\left| {\text{Data}} \right|}}\mathop \sum \limits_{i = 1}^{{\left| {\text{Data}} \right|}} \left( {\frac{{b\left( {p_{i} } \right) - a\left( {p_{i} } \right)}}{{\hbox{max} \left\{ {a\left( {p_{i} } \right),b\left( {p_{i} } \right)} \right\}}}} \right) $$
(12)
$$ b\left( {p_{i} } \right) = \mathop {\hbox{min} }\limits_{{\begin{array}{*{20}c} {k \ne i} \\ {k = 1, \ldots ,N} \\ \end{array} }} \frac{1}{{\left| {C_{k} } \right|}}\mathop \sum \limits_{{p_{j} \in C_{k} }} \left\| {p_{i} - p_{j} } \right\| $$
(13)
$$ a\left( {p_{i} } \right) = \frac{1}{{\left| {C_{i} } \right| - 1}}\mathop \sum \limits_{{p_{j} \in C_{i} ,\;p_{j} \ne p_{i} }} \left\| {p_{i} - p_{j} } \right\| $$
(14)

where \( \left\| \cdot \right\| \) is Algorithm 2 and |Data| and \( \left| {C_{k} } \right| \) are the number of all data points and the number of data points in cluster k, respectively. \( a\left( {p_{i} } \right) \) is the mean distance between \( p_{i} \) and all other data points in the same cluster, and \( b\left( {p_{i} } \right) \) is the mean dissimilarity of \( p_{i} \) to the other clusters.
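For reference, the three indices can be computed as in the sketch below; the Euclidean distance is used only as a placeholder for Algorithm 2, the helper names are assumptions, and each cluster is assumed to contain at least two points.

```python
# Illustrative implementations of the DBI (6)-(8), DI (9)-(11) and SI (12)-(14);
# `X` holds one row per data point, `labels` holds integer cluster labels and
# `dist` stands in for the paper's similarity metric (Algorithm 2).
import numpy as np

def _pairwise(X, dist):
    n = len(X)
    D = np.zeros((n, n))
    for r in range(n):
        for s in range(r + 1, n):
            D[r, s] = D[s, r] = dist(X[r], X[s])
    return D

def davies_bouldin(X, labels, dist=lambda a, b: float(np.linalg.norm(a - b))):
    """DBI of Eq. (6): lower is better."""
    D = _pairwise(X, dist)
    clusters = [np.where(labels == c)[0] for c in np.unique(labels)]
    # within-cluster averages S_i of Eq. (8); assumes every cluster has >= 2 points
    S = [D[np.ix_(c, c)].sum() / (len(c) * (len(c) - 1)) for c in clusters]
    total = 0.0
    for i, ci in enumerate(clusters):
        ratios = [(S[i] + S[j]) / D[np.ix_(ci, cj)].mean()   # d_ij of Eq. (7)
                  for j, cj in enumerate(clusters) if j != i]
        total += max(ratios)
    return total / len(clusters)

def dunn(X, labels, dist=lambda a, b: float(np.linalg.norm(a - b))):
    """DI of Eq. (9): higher is better."""
    D = _pairwise(X, dist)
    clusters = [np.where(labels == c)[0] for c in np.unique(labels)]
    diam = max(D[np.ix_(c, c)].max() for c in clusters)       # Eq. (11)
    sep = min(D[np.ix_(ci, cj)].min()                         # Eq. (10)
              for i, ci in enumerate(clusters)
              for j, cj in enumerate(clusters) if i < j)
    return sep / diam

def silhouette(X, labels, dist=lambda a, b: float(np.linalg.norm(a - b))):
    """SI of Eq. (12): values close to 1 indicate well-matched clusters."""
    D = _pairwise(X, dist)
    clusters = {c: np.where(labels == c)[0] for c in np.unique(labels)}
    scores = []
    for p in range(len(X)):
        own = clusters[labels[p]]
        a = D[p, own[own != p]].mean()                        # Eq. (14)
        b = min(D[p, idx].mean()                              # Eq. (13)
                for c, idx in clusters.items() if c != labels[p])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))
```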

3.2 DBSCAN algorithm

DBSCAN (Han et al. 2011) is a clustering method based on data density whose brief procedure is presented in Algorithm 3. Two parameters, namely the neighborhood radius (ε) and the minimum number of points (MinPts, µ) needed to form a cluster, are used to evaluate the density of the distributed points. The algorithm starts from an arbitrary point and counts the points located within its neighborhood radius, i.e., at a distance of less than ε. If the number of such points exceeds µ, they form a cluster; otherwise, the point is temporarily labeled as an outlier. In a later step, this point may still be recognized as part of a cluster. The advantage of this method is its ability to distinguish and separate the outlier data from the other data.

Algorithm 3 DBSCAN clustering
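Algorithm 3 is given as a figure in the paper; the sketch below is a standard DBSCAN implementation with a pluggable distance, so that either the Euclidean metric (DBSCAN1) or the proposed similarity metric (DBSCAN2) can be supplied. The scan order and helper names are assumptions.

```python
# A compact DBSCAN sketch: labels of -1 mark outliers, 0..N-1 mark clusters.
import numpy as np

def dbscan(X, eps, min_pts, dist=lambda a, b: float(np.linalg.norm(a - b))):
    n = len(X)
    labels = np.full(n, -1)                  # every point starts as an outlier
    visited = np.zeros(n, dtype=bool)
    cluster = 0

    def region(p):                           # indices within radius eps of point p
        return [q for q in range(n) if dist(X[p], X[q]) <= eps]

    for p in range(n):
        if visited[p]:
            continue
        visited[p] = True
        neighbours = region(p)
        if len(neighbours) < min_pts:
            continue                         # p remains an outlier for now
        labels[p] = cluster
        seeds = list(neighbours)
        while seeds:                         # grow the cluster from its core points
            q = seeds.pop()
            if not visited[q]:
                visited[q] = True
                q_neigh = region(q)
                if len(q_neigh) >= min_pts:
                    seeds.extend(q_neigh)
            if labels[q] == -1:
                labels[q] = cluster
        cluster += 1
    return labels
```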

In this algorithm, the most important task is to find proper values of \( \varepsilon \) and \( \mu \). These values are commonly found by statistical and classical methods that combine different data mining techniques, which in many cases consume considerable time without achieving high precision. Therefore, in this research, we use the earthworm optimization algorithm (EWOA) (Wang et al. 2018), a meta-heuristic algorithm, to estimate suitable values of these parameters and achieve significant improvements.

3.3 The improved DBSCAN

To design the improved DBSCAN (EWOA–DBSCAN), the EWOA is adapted to find the optimum values of P, µ and ε in Algorithm 3. In the adapted EWOA, for an obtained 2D dataset (Algorithm 1) with |Data| objects/points and M attributes, each earthworm is an (M + 2)-dimensional array as in (15). The first M elements represent the initial point P at which DBSCAN starts, the (M + 1)th element represents the neighborhood radius (ε) and the last element represents the value of MinPts (µ).

$$ {\text{EW}}_{i} = \left\{ {\begin{array}{*{20}l} {{ \hbox{min} }_{ij} \le {\text{ew}}_{ij} \le { \hbox{max} }_{ij} ,} \hfill & {{\text{if}}\;1 \le j \le M} \hfill \\ {{\text{dis}}_{ \hbox{min} } \le {\text{ew}}_{ij} \le {\text{dis}}_{ \hbox{max} } ,} \hfill & {{\text{if}}\; j = M + 1} \hfill \\ {2 \le {\text{ew}}_{ij} , } \hfill & {{\text{if}}\;j = M + 2} \hfill \\ \end{array} } \right. $$
(15)
$$ { \hbox{min} }_{ij} = \mathop {\hbox{min} }\limits_{{1 \le r \le \left| {\text{Data}} \right|}} 2D\_{\text{Data}}\left( {r,j} \right) $$
(16)
$$ { \hbox{max} }_{ij} = \mathop {\hbox{max} }\limits_{{1 \le r \le \left| {\text{Data}} \right|}} 2D\_{\text{Data}}\left( {r,j} \right) $$
(17)
$$ {\text{dis}}_{ \hbox{min} } = \mathop {\hbox{min} }\limits_{{\begin{array}{*{20}c} {1 \le i,r \le \left| {\text{Data}} \right|} \\ {i \ne r} \\ \end{array} }} \left\| {p_{i} - p_{r} } \right\| $$
(18)
$$ {\text{dis}}_{ \hbox{max} } = \mathop {\hbox{max} }\limits_{{\begin{array}{*{20}c} {1 \le i,r \le \left| {\text{Data}} \right|} \\ {i \ne r} \\ \end{array} }} \left\| {p_{i} - p_{r} } \right\| $$
(19)

where \( i = 1, \ldots ,{\text{pop}}\_{\text{size}} \), \( { \hbox{min} }_{ij} \) and \( { \hbox{max} }_{ij} \) are the minimum and the maximum values of the \( j{\text{th}} \) column of \( 2D\_{\text{Data}} \), and \( {\text{dis}}_{ \hbox{min} } \) and \( {\text{dis}}_{ \hbox{max} } \) are the minimum and the maximum distances between objects/points. Note that \( { \hbox{min} }_{ij} = 0 \) and \( { \hbox{max} }_{ij} = 1 \) for normalized 2D data.

The EWOA’s inputs are the population size (\( {\text{pop}}\_{\text{size}} \)), the similarity factor (\( \alpha \)), the number of kept earthworms (\( n{\text{KEW}} \)), the maximum number of generations (\( {\text{MaxGen}} \)) and/or other termination criteria (Carvalho and Freitas 2004; Nagar and Srivastava 2008; Huang 1997; Freitas 2003). In Algorithm 4, a brief procedure of the EWOA is presented based on Initialization, Reproduction 1, Reproduction 2 and Selection.

Initialization: With regard to the earthworm structure (15), \( {\text{pop}}\_{\text{size}} \) earthworms \( {\text{EW}}_{i}^{0} = \left( {{\text{ew}}_{i1} , \ldots ,{\text{ew}}_{i,M + 2} } \right) \) are generated randomly as the initial population of generation zero, \( {\text{EW(}}0 )= \left\{ {{\text{EW}}_{1}^{0} , \ldots ,{\text{EW}}_{{{\text{pop}}\_{\text{size}}}}^{0} } \right\} \). To evaluate each earthworm, Algorithm 3 is run and the DBI (6) is calculated as its fitness. The worms of \( {\text{EW}}\left( 0 \right) \) are sorted in ascending order, and the best one is saved as \( {\text{EW}}_{\text{best}} \).
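A sketch of the encoding (15) and the initialization step is given below; it assumes a normalized 2D dataset (so min_ij = 0 and max_ij = 1), reuses the dbscan and davies_bouldin sketches above as the fitness evaluation, and its function names are assumptions.

```python
# Earthworm encoding and initialization: each worm is [p_1..p_M, eps, MinPts].
import numpy as np

def init_population(data_2d, pop_size, dis_min, dis_max, rng=np.random.default_rng(0)):
    n, m = data_2d.shape
    pop = np.empty((pop_size, m + 2))
    pop[:, :m] = rng.uniform(0.0, 1.0, size=(pop_size, m))    # initial point P
    pop[:, m] = rng.uniform(dis_min, dis_max, size=pop_size)  # neighborhood radius eps
    pop[:, m + 1] = rng.integers(2, n, size=pop_size)         # MinPts >= 2
    return pop

def fitness(worm, data_2d, dist):
    """Run DBSCAN with the worm's (eps, MinPts) and score the result by the DBI (6).
    The first M genes encode the start point P; this simplified sketch does not use them."""
    m = data_2d.shape[1]
    labels = dbscan(data_2d, eps=worm[m], min_pts=int(worm[m + 1]), dist=dist)
    members = labels >= 0
    if len(np.unique(labels[members])) < 2:
        return np.inf                                          # degenerate clustering
    return davies_bouldin(data_2d[members], labels[members], dist)
```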

Reproduction 1: The EWOA includes two strategies to generate new offspring, Reproduction 1 and Reproduction 2. The behavior of Reproduction 1 is similar to mutation in a GA, and this operation is run first. Reproduction 1 affects only the \( \varepsilon \) and \( \mu \) values; a worm (\( {\text{EW}}_{i} \)) generates a new offspring (\( {\text{EW}}_{i}^{\text{new}} \)) by the following formula and saves it in \( R1\left( G \right) \):

$$ {\text{EW}}_{i}^{\text{new}} = \left\{ {\begin{array}{*{20}l} {{\text{ew}}_{i,j} ,} \hfill & {{\text{if}}\; 1 \le j \le M} \hfill \\ {{\text{dis}}_{ \hbox{max} } + {\text{dis}}_{ \hbox{min} } - \alpha \times {\text{ew}}_{i,j} , } \hfill & {{\text{if}}\; j = M + 1} \hfill \\ {2 \le {\text{ew}}_{i,j} + \left( {U\left( { - \alpha ,\alpha } \right) \times \left| {\text{Data}} \right| } \right),} \hfill & {{\text{if}}\; j = M + 2} \hfill \\ \end{array} } \right. $$
(20)

where \( \alpha \in \left[ {0,1} \right] \) is a similarity factor that adjusts the distance between the parent earthworm (\( {\text{EW}}_{i} \)) and its offspring (\( {\text{EW}}_{i}^{\text{new}} \)); decreasing \( \alpha \) reduces this distance. \( U\left( { - \alpha ,\alpha } \right) \) generates a random real value in \( \left[ { - \alpha ,\alpha } \right] \).
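The sketch below applies (20) to a single worm; clipping MinPts at 2, per the constraint in (15), is an assumption about how the bound is enforced.

```python
# Reproduction 1 (Eq. (20)): keep the point part, reflect the radius inside
# [dis_min, dis_max] and randomly perturb MinPts.
import numpy as np

def reproduction1(worm, m, dis_min, dis_max, alpha, n_data, rng=np.random.default_rng()):
    child = worm.copy()
    child[m] = dis_max + dis_min - alpha * worm[m]
    child[m + 1] = max(2.0, worm[m + 1] + rng.uniform(-alpha, alpha) * n_data)
    return child
```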

Reproduction 2: In Reproduction 2, different strategies are used to generate offspring depending on whether the earthworm’s index is larger than \( n{\text{KEW}} \) or not. In the adapted version of the EWOA for the current research, if the index is larger than \( n{\text{KEW}} \) (\( i > n{\text{KEW}} \)), an earthworm \( {\text{EW}}_{S} \) (\( i \ne S \)) is selected from \( {\text{EW(}}G ) \) by the Roulette wheel. Then, the scattered crossover (Fig. 2) is applied to \( {\text{EW}}_{S} \) and \( {\text{EW}}_{i} \) to generate two new offspring (\( {\text{EW}}_{1}^{\text{new}} \) and \( {\text{EW}}_{2}^{\text{new}} \)), which are saved in \( R2\left( G \right) \). If the index is not larger than \( n{\text{KEW}} \) (\( i \le n{\text{KEW}} \)), the earthworm (\( {\text{EW}}_{i} \)) is treated as an elite worm and is stored in \( R2\left( G \right) \) after changing its \( \varepsilon \) and \( \mu \) values as follows:

Fig. 2 Scattered crossover procedure

$$ {\text{EW}}_{i}^{\text{new}} = \left\{ {\begin{array}{*{20}l} {{\text{ew}}_{i,j} , } \hfill & {{\text{if}}\;1 \le j \le M} \hfill \\ {{\text{dis}}_{ \hbox{max} } + {\text{dis}}_{ \hbox{min} } - \frac{\alpha }{2} \times {\text{ew}}_{i,j} , } \hfill & { {\text{if}}\; j = M + 1} \hfill \\ {2 \le {\text{ew}}_{i,j} + \left( {U\left( {\frac{ - \alpha }{2},\frac{\alpha }{2}} \right) \times \left| {\text{Data}} \right| } \right),} \hfill & {{\text{if}}\;j = M + 2} \hfill \\ \end{array} } \right. $$
(21)

Selection: Generation G ends with the earthworms \( {\text{EW}}\left( G \right) \cup R1\left( G \right) \cup R2\left( G \right) \), and the offspring worms in \( R1\left( G \right) \cup R2\left( G \right) \) are evaluated by running Algorithm 3 and (6). To reduce the number of worms to \( {\text{pop}}\_{\text{size}} \) for the next generation (\( {\text{EW}}\left( {G + 1} \right) \)), the worms of \( {\text{EW}}\left( G \right) \cup R1\left( G \right) \cup R2\left( G \right) \) are sorted in ascending order and the top \( {\text{pop}}\_{\text{size}} \) worms are kept. Also, the best one is saved as \( {\text{EW}}_{\text{best}} \).
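The following sketch collects Reproduction 2 and the selection step. Worms ranked within the first nKEW positions are treated as elites and perturbed as in (21); the others are recombined with a roulette-wheel partner by the scattered crossover of Fig. 2, here realized as a random binary mask choosing each gene from one of the two parents. The details of the roulette wheel and the ranking convention are assumptions.

```python
# Reproduction 2 and survivor selection for the adapted EWOA.
import numpy as np

def scattered_crossover(parent_a, parent_b, rng=np.random.default_rng()):
    """Scattered crossover (Fig. 2): a random mask swaps genes between two parents."""
    mask = rng.random(len(parent_a)) < 0.5
    return np.where(mask, parent_a, parent_b), np.where(mask, parent_b, parent_a)

def elite_update(worm, m, dis_min, dis_max, alpha, n_data, rng=np.random.default_rng()):
    """Elite perturbation of Eq. (21), applied when the worm's rank is within nKEW."""
    child = worm.copy()
    child[m] = dis_max + dis_min - (alpha / 2) * worm[m]
    child[m + 1] = max(2.0, worm[m + 1] + rng.uniform(-alpha / 2, alpha / 2) * n_data)
    return child

def select_next_generation(pool, pool_fitness, pop_size):
    """Keep the pop_size best worms of EW(G) U R1(G) U R2(G), ranked by ascending DBI."""
    pool_fitness = np.asarray(pool_fitness)
    order = np.argsort(pool_fitness)
    return pool[order[:pop_size]], pool_fitness[order[0]]      # survivors and best DBI
```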

The challenge for the EWOA is determining the Reproduction 1 (\( \alpha \)) and Reproduction 2 (\( n{\text{KEW}} \)) parameters. These parameters are set empirically and have a significant impact on the efficiency, accuracy and speed of the algorithm.

3.4 The soft improved DBSCAN algorithm

To address this challenge and design the soft improved DBSCAN (EWOA–DBSCAN–FLC), a new self-adaptive EWOA based on a new fuzzy logic controller (FLC) is designed in this subsection.

Algorithm 5 Soft improved DBSCAN (EWOA–DBSCAN–FLC)

Algorithm 5 follows the same procedure as Algorithm 4, except that \( \alpha \) and \( n{\text{KEW}} \) are calculated by the proposed FLC (see Fig. 3) from the inputs \( {\text{UG}} \) and \( {\text{fitBest}} \). The FLC’s inputs are normalized values derived from the current generation and the evaluation function (6), computed as follows:

Fig. 3 Proposed fuzzy logic controller based on two inputs \( \left( {{\text{UG}}, {\text{fitBest}}} \right) \) and two outputs \( \left( {\alpha ,\;n{\text{KEW}}} \right) \)

$$ {\text{UG}} = \frac{G}{{{\text{Max}}G}} $$
(22)
$$ {\text{fitBest}} = \frac{{{\text{DBI}}\left( { {\text{EW}}_{\text{best}} } \right)}}{{{ \hbox{min} }\left( {{\text{DBI}}\left( {{\text{EW}}_{S} } \right),{\text{DBI}}\left( {{\text{EW}}_{i} } \right)} \right)}} $$
(23)

The FLC consists of fuzzification of the inputs, a linguistic logic strategy (LLS) and defuzzification of the outputs. The inputs are fuzzified with the membership functions presented in Fig. 4. The LLS includes two main parts: a rule base and an inference engine. There are two groups of rules, called Mamdani’s rules and Takagi–Sugeno’s rules, because the FLC is designed to generate dynamic outputs based on Mamdani’s inference (Mamdani and Assilian 1975) and Takagi–Sugeno’s inference (Takagi and Sugeno 1993). The LLS creates the output surfaces of Figs. 5 and 6 using Mamdani’s rules and Takagi–Sugeno’s rules, respectively. The maximum and minimum operations are used for the “OR” and “AND” operators in the LLS’s inference engine for aggregation and reasoning. The center of gravity is used as the defuzzification method.
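Because the paper reports the rule bases only as the surfaces of Figs. 5 and 6, the sketch below is a speculative zero-order Takagi–Sugeno controller: UG and fitBest are fuzzified with triangular low/medium/high memberships, and the outputs are weighted averages of assumed rule consequents. The breakpoints and consequent values are illustrative assumptions, not the paper’s rules.

```python
# A speculative Takagi-Sugeno-style controller mapping (UG, fitBest) to (alpha, nKEW).
import numpy as np

def tri(x, a, b, c):
    """Triangular membership function with feet at a and c and peak at b."""
    return max(0.0, min((x - a) / (b - a) if x <= b else (c - x) / (c - b), 1.0))

def fuzzify(x):
    return {"low": tri(x, -0.5, 0.0, 0.5),
            "medium": tri(x, 0.0, 0.5, 1.0),
            "high": tri(x, 0.5, 1.0, 1.5)}

def flc(ug, fit_best, pop_size):
    """Early generations favour exploration (large alpha, few kept elites); late,
    well-converged generations favour exploitation (small alpha, more kept elites)."""
    mu_g, mu_f = fuzzify(ug), fuzzify(fit_best)
    # assumed consequents (alpha, fraction of kept worms) per (UG, fitBest) rule
    rules = {("low", "low"): (0.9, 0.10),      ("low", "medium"): (0.85, 0.15),
             ("low", "high"): (0.8, 0.20),     ("medium", "low"): (0.7, 0.20),
             ("medium", "medium"): (0.6, 0.25), ("medium", "high"): (0.5, 0.30),
             ("high", "low"): (0.4, 0.30),     ("high", "medium"): (0.3, 0.35),
             ("high", "high"): (0.2, 0.40)}
    num_a = num_k = den = 0.0
    for (g, f), (a_out, k_out) in rules.items():
        w = min(mu_g[g], mu_f[f])            # "AND" realized as the minimum operator
        num_a += w * a_out
        num_k += w * k_out
        den += w
    if den == 0.0:
        return 0.8, max(1, int(0.2 * pop_size))     # fall back to the static tuning
    return num_a / den, max(1, round(num_k / den * pop_size))
```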

Fig. 4 Membership functions of two inputs \( \left( {{\text{UG}}, {\text{fitBest}}} \right) \) based on linguistic values of low, medium and high

Fig. 5 Outputs’ surfaces of the LLS based on Mamdani’s rules

Fig. 6 Outputs’ surfaces of the LLS based on Takagi–Sugeno’s rules

4 Evaluation of the improved DBSCAN algorithm and the soft improved DBSCAN

To evaluate the effectiveness of the proposed algorithms, experiments were performed on an Intel Core i5 3.2 GHz CPU with 4.00 GB of memory. The algorithms were implemented in MATLAB 2017a. Six benchmark data cube datasets, which are available from the UCI repository and considered for experimentation, are shown in Table 1.

Table 1 Investigated data cube

To establish the scalability of the proposed data cube clustering algorithms, each cell \( \left( {i,j,k} \right) \) of the investigated 3D benchmarks is assigned an address r based on (1), and the addresses are mapped back to \( \left( {i,j,k} \right) \) by the inverse functions (2), (3) and (4). The data are normalized based on (5), and the related 2D data are obtained by the dimension move (Algorithm 1). To analyze DBSCAN2, which uses the proposed similarity metric (Algorithm 2) in the DBSCAN clustering procedure (Algorithm 3) and in the investigation evaluation indices, DBSCAN1 is considered as a baseline that uses the Euclidean metric in Algorithm 3 and in the indices.

Eight variants of the proposed clustering algorithms are performed, namely DBSCAN1, DBSCAN2, EWOA–DBSCAN1, EWOA–DBSCAN2, EWOA–DBSCAN1-Mamdani, EWOA–DBSCAN2-Mamdani, EWOA–DBSCAN1-Sugeno and EWOA–DBSCAN2-Sugeno. There are eight parameters in the experimentation, namely µ and ε in Algorithm 3, \( {\text{pop}}\_{\text{size}} \), \( \alpha \), \( n{\text{KEW}} \) and \( {\text{Max}}G \) in Algorithm 4 and \( {\text{pop}}\_{\text{size}} \) and \( {\text{Max}}G \) in Algorithm 5. The parameters of Algorithm 3 are set to ε = 0.5 and µ equal to 10% of the investigated data, because other values increased the DBI and the number of clusters simultaneously. Also, Algorithm 4 was tested on the “Daily Demand Forecasting Orders” data cube with different values of \( \alpha \) and \( n{\text{KEW}} \); then, \( \alpha = 0.8 \) and \( n{\text{KEW}} = \left[ {0.2 \times {\text{pop}}\_{\text{size}}} \right] \), which gave the best results, were used for the experiments on all data cubes. Finally, \( {\text{pop}}\_{\text{size}} \) and \( {\text{Max}}G \) were set to 100 earthworms and 100 generations.

In order to compare the behavior of the EWOA with other meta-heuristics in improving the designed DBSCAN, a genetic algorithm (GA) with the scattered crossover (Fig. 2), single-point mutation, \( P_{\text{c}} = 0.8 \) (crossover rate), \( P_{\text{m}} = 0.02 \) (mutation rate), \( {\text{pop}}_{\text{size}} = 100 \) and \( {\text{MaxItr}} = 100 \) is performed and called GA-DBSCAN1. Also, \( P_{\text{c}} \) and \( P_{\text{m}} \) are dynamically tuned by the FLC (Fig. 3) in GA-DBSCAN1-Mamdani and GA-DBSCAN1-Sugeno.

The algorithms are run 20 times on each data cube, and the best obtained DBI (6), DI (9) and SI (12) are then reported as the best clustering quality of the data. The details of the obtained results are shown in “Appendix” and Tables 6, 7, 8, 9, 10, 11, 12 and 13. The best obtained DBI (6), DI (9) and SI (12) are summarized in Tables 2, 3 and 4, respectively. With regard to these tables, the performance of EWOA–DBSCAN2-Sugeno (Algorithm 5 based on Takagi–Sugeno’s rules) is significantly superior to that of the other performed algorithms.

Table 2 Best results (DBI) after 20 runs
Table 3 Best results (Dunn index) after 20 runs
Table 4 Best results (Silhouette index) after 20 runs

Also, comparing the results of DBSCAN1 and DBSCAN2 shows that, except for the Dow Jones Index dataset for the DBI and the ADL Recognition dataset for the Dunn and Silhouette indices, DBSCAN2 obtains results on the other datasets that are better by 3.6–25.5% in Table 2, 1.6–33.6% in Table 3 and 1.5–188.3% in Table 4. So, the newly proposed similarity metric is more efficient than the Euclidean metric.

To compare the functionality of the proposed data cube clustering algorithms, the convergence curves of the best DBI, DI and SI on the datasets are shown in Figs. 7, 8, 9, 10, 11 and 12. The horizontal axis of the figures is the number of generations, from 1 to 100, and the vertical axis is the best DBI, DI and SI found so far. Based on the figures, the EWOA has more control over the search space than the GA. Although the EWOA and the GA have almost the same evolutionary structure, the EWOA generates more offspring than the GA in each generation, which enhances the search process and increases the diversity of search points.

Fig. 7 Convergence curves for the Daily Demand Forecasting Orders based on three investigation evaluation indices

Fig. 8 Convergence curves for the Istanbul Stock Exchange based on three investigation evaluation indices

Fig. 9 Convergence curves for the Dow Jones Index based on three investigation evaluation indices

Fig. 10 Convergence curves for the ADL Recognition based on three investigation evaluation indices

Fig. 11 Convergence curves for the Software Engineering Teamwork based on three investigation evaluation indices

Fig. 12 Convergence curves for the User Identification from Walking Activity based on three investigation evaluation indices

To evaluate the significance level of the comparisons of the proposed data cube clustering algorithms, a hypothesis test is performed on the difference in the resulting quality between the algorithms. Because the results of each algorithm are not normally distributed, a nonparametric hypothesis test, the Wilcoxon signed-rank test, is applied in SPSS to each pair of samples at a significance level of α = 0.05. The results are presented in the form [Z, P] in Table 5. If the P value < 0.05, the null hypothesis (that there is no significant difference between the two paired samples) can be rejected at the 95% level, but if the P value > 0.05, the null hypothesis cannot be rejected. Therefore, the bold P values indicate that the comparison of the two mentioned clustering algorithms on the related datasets is significant at the 95% level.
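The paper runs the test in SPSS; for readers working in Python, the equivalent paired test is available in SciPy, as sketched below on two hypothetical vectors of per-run index values (placeholders, not the paper’s results).

```python
# Wilcoxon signed-rank test between the 20 paired runs of two algorithms on one dataset.
from scipy.stats import wilcoxon

algo_a = [0.41, 0.39, 0.44, 0.40, 0.38, 0.42, 0.43, 0.39, 0.40, 0.41,
          0.42, 0.40, 0.39, 0.44, 0.41, 0.38, 0.43, 0.40, 0.39, 0.42]  # e.g. DBI per run
algo_b = [0.36, 0.35, 0.38, 0.34, 0.33, 0.37, 0.36, 0.35, 0.34, 0.36,
          0.37, 0.35, 0.34, 0.38, 0.36, 0.33, 0.37, 0.35, 0.34, 0.36]

stat, p_value = wilcoxon(algo_a, algo_b)
# Reject the null hypothesis of no difference at the 95% level when p_value < 0.05.
print(f"statistic = {stat:.3f}, P value = {p_value:.4f}")
```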

Table 5 Results of the Wilcoxon signed-rank test in the form of [Z, P]

5 Conclusion and discussion

This paper focuses on data cube clustering, and the DBSCAN algorithm is considered as the basic clustering technique because it successfully recognizes nonconvex clusters and does not require a given number of clusters as an input. Since the efficiency of the DBSCAN algorithm is highly dependent on the neighborhood radius and the minimum number of neighbors, this study tries to improve the algorithm.

For this purpose, new techniques were first proposed in the preprocessing section: assigning a unique address (1) to each cube cell to clarify and support the scalability of the proposed data cube clustering algorithms, obtaining the related 2D data from the 3D data by the dimension move (Algorithm 1) and designing a similarity metric (Algorithm 2). DBSCAN1 and DBSCAN2 were then performed by embedding the Euclidean and the new similarity metrics in Algorithm 3, respectively.

The EWOA was adapted to improve DBSCAN (Algorithm 4) and find the optimum values of P, µ and ε in Algorithm 3 (EWOA–DBSCAN1 and EWOA–DBSCAN2). To address the clustering algorithms’ remaining challenge and enhance their exploration and exploitation capabilities, a new FLC (Fig. 3) was designed to dynamically tune the EWOA’s parameters (Algorithm 5), and EWOA–DBSCAN1-Mamdani, EWOA–DBSCAN2-Mamdani, EWOA–DBSCAN1-Sugeno and EWOA–DBSCAN2-Sugeno were performed to analyze the efficiency of the proposed ideas. The designed FLC was implemented with both Mamdani’s rules and Takagi–Sugeno’s rules.

For comparison of the obtained results, the GA was embedded in Algorithms 4 and 5, and GA-DBSCAN1, GA-DBSCAN1-Mamdani and GA-DBSCAN1-Sugeno were included in the experiments. Three investigation evaluation indices were considered to measure the similarity among members of a cluster and, at the same time, the separation of clusters, which corresponds to low similarity with the other clusters.

To evaluate and compare the proposed clustering algorithms, six data cube datasets were considered, and the details of the obtained results are reported in “Appendix”. All experiments indicated the efficiency and improvement of DBSCAN2 compared to DBSCAN1 and of EWOA–DBSCAN2-Sugeno compared to the others. In addition, the P values of the Wilcoxon signed-rank test were calculated to assess the significance level of the comparisons of the proposed algorithms. With regard to Table 5, the comparisons of most of the algorithms were significant at the 95% level for all datasets.