Threshold Based Clustering Algorithm Analyzes Diabetic Mellitus

Mulay, Preeti; Joshi, Rahul Raghvendra; Anguria, Aditya Kumar; Gonsalves, Alisha; Deepankar, Dakshayaa; Ghosh, Dipankar

doi:10.1007/978-981-10-3156-4_3

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 516)

Abstract

Diabetes mellitus is caused by disorders of metabolism and is one of the most common diseases in the world today, and its prevalence is growing. This paper presents the Threshold Based Clustering Algorithm (TBCA) and applies it to medical data received from practitioners. Medical data consist of various attributes. TBCA is formulated to effectively compute the impactful attributes related to diabetes mellitus, to support further decisions. TBCA's primary focus is the computation of threshold values, to enhance the accuracy of clustering results.

1 Introduction

Diabetes has emerged as a major healthcare problem in India, and every year it affects a large number of people. Data science based Knowledge Management Systems (KMS) in the healthcare industry are gaining attention as a way to draw effective recommendations for treating patients at an early stage [1, 2]. The knowledge augmented through a KMS is an asset for society, and incremental learning triggers knowledge augmentation [3, 4]. Online interactive data mining tools are available for incremental learning [5]. The threshold acts as a key in incremental learning for investigating the closeness factors that are formed [6]. This approach may change the pattern of diabetes diagnosis [6–10]. In this study, the proposed TBCA is applied to the values of attributes collected from patients' medical reports. The TBCA implementation uncovers hidden relationships among attributes in order to separate impactful from non-impactful attributes for diabetes mellitus.

Section 2 presents TBCA. Section 3 describes the methodology used for its implementation, Sect. 4 analyzes the obtained results, and Sect. 5 gives concluding remarks. The references used to carry out this study are listed in the last section.

2 TBCA

This section presents high-level pseudocode for TBCA in two parts, to show that TBCA is an extended version of the Closeness Factor Based Algorithm (CFBA).

3 Methodology Used to Implement TBCA

The TBCA data set consists of medical reports of working adult diabetic patients in the 35–45 years age group, for the year 2015–2016. TBCA works in three phases, as described below:

(1) Pre-processing: the input is taken as a CSV file, and the closeness factor value is calculated for each data series set by taking into account different possibilities such as sum-wise, series-wise, total weight and error factor. The computed values are exported as a CSV file.
(2) Clustering: clusters are formed based on the closeness values generated during pre-processing for a particular data series, and the formed clusters are stored in a new CSV file in an incremental fashion.
(3) Post-clustering: this phase extracts the values of attributes from the formed clusters for further analysis. The attributes related to diabetes mellitus are extracted on the basis of a threshold whose lower limit is the mean of a cluster and whose upper limit is its highest value. These eight attributes are listed in Table 1, where the first four are impactful and the remaining four are non-impactful. Figures 1 and 2 show the processing done on 5 K data sets during the phases of TBCA in a single iteration and in multiple iterations.

Table 1 Impactful and non-impactful attributes for diabetes mellitus
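The threshold used in the post-clustering phase can be illustrated with a minimal sketch. It assumes only what the text states, namely that the lower limit is the mean of a cluster and the upper limit is its highest value. The function names and the sample readings are hypothetical and are not taken from the authors' implementation or data.

```python
from statistics import mean


def threshold_band(values):
    """Return (lower, upper) for one attribute within one cluster.

    Lower limit: mean of the cluster's values.
    Upper limit: highest value in the cluster.
    """
    if not values:
        raise ValueError("cluster has no values for this attribute")
    return mean(values), max(values)


def values_in_band(values):
    """Keep the values that fall inside the [mean, max] band."""
    lower, upper = threshold_band(values)
    return [v for v in values if lower <= v <= upper]


# Hypothetical fasting blood glucose readings for one cluster (made-up numbers).
cluster_fasting_glucose = [90, 110, 130, 150, 170]
band = threshold_band(cluster_fasting_glucose)
selected = values_in_band(cluster_fasting_glucose)
print(band, selected)
```

How the values inside the band are then used to label an attribute as impactful is not specified in this section, so the sketch stops at computing the band.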

4 TBCA’s Analysis

TBCA aims to identify impactful and non-impactful attributes. To that end, the following types of analysis are carried out.

(1) Related attributes analysis: The mean value of each attribute of every cluster is used to analyze related attributes in single and multiple iterations on the data sets, as shown in Figs. 1 and 2. Graphs for some of the related attribute analyses depict their behaviour patterns (Fig. 3).

Fig. 1 Processing of 5 K data series in single iteration of TBCA

Fig. 2 Processing of 5 K data series in multiple iterations of TBCA

Fig. 3 HDL versus Non HDL Cholesterol, VLDL versus Non HDL Cholesterol analysis

(2) Outlier analysis to extract impactful attributes: An outlier deviation analysis of the data sets with the eight extracted attributes is carried out, which depicts the deviation of the outlier values from the cluster deviation values. The generated pattern shows that outlier detection plays a vital role in clustering. Figure 4 depicts these patterns as a statistical graph of Cluster 2 deviation versus outlier deviation for the diabetes data sets. After analysis of the deviation of each cluster against the outlier deviation, it is observed that the attributes BLOOD GLUCOSE FASTING, BLOOD GLUCOSE PP, CHOLESTEROL and TRIGLYCERIDES are the main factors responsible for the generation of the outliers, as the deviations of the other cluster attributes overlap with the outlier deviation. This pattern is cross-verified through the Cluster 2 averages versus outlier average graph shown in another part of Fig. 4.

Fig. 4 Clusters, outlier average and clusters deviation, outlier deviation analysis
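The two analyses above can be sketched roughly as follows. The sketch assumes that "deviation" means the population standard deviation of an attribute, which the text does not state. The attribute names, records and helper functions are hypothetical and only illustrate the comparison of a cluster against the outlier set.

```python
from statistics import mean, pstdev


def summarize(rows):
    """Per-attribute mean and standard deviation for a list of records."""
    summary = {}
    for name in rows[0]:
        column = [row[name] for row in rows]
        summary[name] = {"mean": mean(column), "deviation": pstdev(column)}
    return summary


def deviation_gap(cluster_rows, outlier_rows):
    """Outlier deviation minus cluster deviation, per attribute.

    A large positive gap marks an attribute whose outlier deviation does
    not overlap with the cluster deviation.
    """
    cluster = summarize(cluster_rows)
    outliers = summarize(outlier_rows)
    return {
        name: outliers[name]["deviation"] - cluster[name]["deviation"]
        for name in cluster
    }


# Made-up records; not taken from the paper's data set.
cluster_2 = [
    {"glucose_fasting": 100, "hdl": 40},
    {"glucose_fasting": 110, "hdl": 44},
]
outlier_set = [
    {"glucose_fasting": 150, "hdl": 41},
    {"glucose_fasting": 250, "hdl": 43},
]

cluster_summary = summarize(cluster_2)
gap = deviation_gap(cluster_2, outlier_set)
print(cluster_summary, gap)
```

In this toy input the fasting glucose gap is large while the HDL gap is not, which mirrors the kind of pattern the outlier analysis looks for; the actual criterion used for Fig. 4 may differ.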

4.1 Accuracy/Purity of TBCA

The following formula is used for calculation of accuracy or purity of TBCA.

$$ {\text{Accuracy}} = \left( {100 - \frac{{({\text{Clustering}}\,{\text{value}}\,{\text{of}}\,{\text{multiple}}\,{\text{iteration}} - {\text{Clustering}}\,{\text{value}}\,{\text{of}}\,{\text{single}}\,{\text{iteration}})}}{{{\text{Clustering}}\,{\text{value}}\,{\text{of}}\,{\text{multiple}}\,{\text{iteration}}}}{ \times }100} \right) $$

where clustering value = cluster count of the cluster that contains the maximum clustered data for a particular iteration.

The accuracy/purity of TBCA is based on the clustering value for a single iteration and for multiple iterations on the same data set. As shown in Figs. 1 and 2, the first cluster has the maximum weightage (42 and 46 % of the total data reside there) and hence contains the maximum clustered data sets. Therefore, the cluster count or clustering value of this cluster is used to calculate the accuracy or purity of TBCA. This accuracy signifies the processing of raw data sets and the creation of precise clusters in single as well as multiple iterations, as shown in Figs. 1 and 2, over the same data sets. The multiple iterations on the same data set work in an incremental fashion and confirm cluster members independent of their order and of the CFBA parameters.
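As a worked illustration of the formula in this section, the following sketch computes the accuracy from two clustering values. The function name and the example counts are hypothetical; they are chosen for easy arithmetic and are not the counts behind the results reported in this paper.

```python
def tbca_accuracy(multi_iteration_count, single_iteration_count):
    """Accuracy/purity as defined in Sect. 4.1.

    Both arguments are clustering values, i.e. the cluster count of the
    cluster holding the most data in the respective run.
    """
    if multi_iteration_count <= 0:
        raise ValueError("clustering value of multiple iterations must be positive")
    difference = multi_iteration_count - single_iteration_count
    return 100 - difference / multi_iteration_count * 100


# Made-up counts: the largest cluster holds 50 series after multiple
# iterations and 45 series after a single iteration.
example_accuracy = tbca_accuracy(50, 45)
print(example_accuracy)
```

With these made-up counts the difference is 5 out of 50, i.e. 10 %, so the accuracy evaluates to 90.0.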

5 Concluding Remarks and Outlook

TBCA proved very useful in obtaining knowledge of inter-attribute relationships and outlier values accurately over various iterations, which eventually led to the identification of key attributes related to diabetes mellitus. TBCA showed 91.9 % accuracy over single or several iterations on the data set under consideration. It can be used effectively in the healthcare domain for prediction of a particular disease such as diabetes mellitus. It involves a novel mechanism: clusters are formed based on the closeness factor, and a threshold is then used to extract the required attributes, leading to a crisp prediction of the impactful set of attributes for diabetes mellitus. If a person suffering from diabetes mellitus keeps proper track of the impactful attributes, then he/she can manage to treat it at an early stage. These extracted impactful attributes can act as a catalyst for IT industries that work on patients' medical reports in order to suggest lifestyle management recommendations for certain diseases. These impactful attributes can also bring a revolution in the treatment of diabetes mellitus patients in terms of the tests performed on a patient for diagnosis. TBCA in turn plays a vital role in the augmentation of generated knowledge for diabetes mellitus and may also change current pathology practices for the diagnosis of diabetes mellitus. TBCA may therefore also prove well suited to the prediction of other diseases, since its application is not restricted to one domain.

References

1. K.R. Lakshmi, S.P. Kumar, Utilization of data mining techniques for prediction of diabetes disease survivability. Int. J. Sci. Eng. Res. 4(6), 933–940 (2013)
2. D.S. Vijayarani, M.P. Vijayarani, Detecting outliers in data streams using clustering algorithms. Int. J. Innov. Res. Comput. Commun. Eng. 1(8), 1749–1759 (2013)
3. P. Mulay, P.A. Kulkarni, Knowledge augmentation via incremental clustering: new technology for effective knowledge management. Int. J. Bus. Inf. Syst. 12(1), 68–87 (2013)
4. P.A. Kulkarni, P. Mulay, Evolve systems using incremental clustering approach. Evol. Syst. 4(2), 71–85 (2013)
5. M. Borhade, P. Mulay, Online interactive data mining tool. Proc. Comput. Sci. 50, 335–340 (2015)
6. P. Mulay, Threshold computation to discover cluster structure: a new approach. Int. J. Electr. Comput. Eng. (IJECE) 6(1) (2016)
7. R.J. Singh, W. Singh, Data mining in healthcare for diabetes mellitus. Int. J. Sci. Res. (IJSR) 3(7), 1993–1998 (2014)
8. S.M. Gaikwad, P. Mulay, R.R. Joshi, Attribute visualization and cluster mapping with the help of new proposed algorithm and modified cluster formation algorithm to recommend an ice cream to the diabetic patient based on sugar contain in it. Int. J. Appl. Eng. Res. 10 (2015)
9. M.W. Berry, J.J. Lee, G. Montana, S. Van Aelst, R.H. Zamar, Special issue on advances in data mining and robust statistics. Comput. Stat. Data Anal. 93(C), 388–389 (2016)
10. M.S. Tejashri, N. Giri, Prof S.R. Todamal, Data mining approach for diagnosing type 2 diabetes. Int. J. Sci. Eng. Technol. 2(8), 191–194 (2014)


Author information

Authors and Affiliations

Department of CS and IT, Symbiosis Institute of Technology (SIT), Symbiosis International University (SIU), Pune, India
Preeti Mulay, Rahul Raghvendra Joshi, Aditya Kumar Anguria, Alisha Gonsalves, Dakshayaa Deepankar & Dipankar Ghosh

Corresponding author

Correspondence to Preeti Mulay.

Editor information

Editors and Affiliations

Dept. of Computer Sci. & Engg., Anil Neerukonda Inst. of Tech. & Sci., Vishakapatnam, Andhra Pradesh, India
Suresh Chandra Satapathy
Shri Ramswaroop Memorial Group of Professional Colleges (SRMGPC), Lucknow, Uttar Pradesh, India
Vikrant Bhateja
SCIS, University of Hyderabad, Hyderabad, India
Siba K. Udgata
School of Computer Engineering, KIIT University, Bhubaneswar, Odisha, India
Prasant Kumar Pattnaik

About this paper

Cite this paper

Mulay, P., Joshi, R.R., Anguria, A.K., Gonsalves, A., Deepankar, D., Ghosh, D. (2017). Threshold Based Clustering Algorithm Analyzes Diabetic Mellitus. In: Satapathy, S., Bhateja, V., Udgata, S., Pattnaik, P. (eds) Proceedings of the 5th International Conference on Frontiers in Intelligent Computing: Theory and Applications. Advances in Intelligent Systems and Computing, vol 516. Springer, Singapore. https://doi.org/10.1007/978-981-10-3156-4_3

DOI: https://doi.org/10.1007/978-981-10-3156-4_3
Published: 03 March 2017
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-3155-7
Online ISBN: 978-981-10-3156-4
eBook Packages: Engineering, Engineering (R0)
