Rough set–BPSO model for predicting vitamin D deficiency in apparently healthy Kuwaiti women based on hair mineral analysis

Own, Hala S.; Alyahya, Khulood O.; Almayyan, Waheeda I.; Abraham, Ajith

doi:10.1007/s00521-016-2454-x

Review
Published: 23 July 2016

Volume 29, pages 329–344, (2018)
Neural Computing and Applications

Abstract

Vitamin D deficiency is prevalent in the Arabian Gulf region, especially among women. Recent research shows that vitamin D deficiency is associated with the mineral status of the patient. It is therefore important to assess mineral status to reveal the hidden mineral imbalance associated with vitamin D deficiency. A well-known test such as the red blood cell test is fairly expensive, invasive, and less informative. Hair mineral analysis, on the other hand, can be considered an accurate and highly informative tool for measuring the mineral imbalance associated with vitamin D deficiency. In this study, 120 apparently healthy Kuwaiti women were assessed for their mineral levels and vitamin D status by hair and serum samples, respectively. This information was used to build a computerized model that predicts vitamin D deficiency based on its association with the levels and ratios of minerals. The model introduces a two-stage reduction technique based on BPSO and rough set theory for attribute reduction and rule extraction to predict vitamin D deficiency. The results show that the proposed model (RS + BPSO) not only detects vitamin D deficiency effectively but also provides valuable information about the mineral imbalance as a cause of deficiency, which should be addressed in any treatment management. To the best of our knowledge, this is the first work that predicts vitamin D deficiency based on hair mineral analysis.

1 Introduction

Middle Eastern populations have some of the lowest vitamin D levels, and vitamin D deficiency has been reported repeatedly from many parts of the region, including the Arabian Gulf countries [1]. Females seem to have a higher risk of deficiency than males [2]. Levels below 25 nmol/L have been found in 81 % of Saudi females [4], 78.4 % of Kuwaiti adolescent females [5], 67.6 % of Bahraini females [6], and 51.4 % of Qatari females [7]. Limited sun exposure due to veiling, high temperatures, and economic status are among the risk factors for vitamin D deficiency [5].

Numerous studies show that low serum levels of 25-hydroxyvitamin D (25OHD), the generally accepted biomarker of vitamin D status, are linked to an array of ailments including but not limited to osteoporosis, cardiovascular disease, diabetes, obesity, cancer, and autoimmune disorders [6]. In addition, low levels of vitamin D in pregnant mothers are associated with low birth weight and higher risk of developing immune-related disorders in their newborns [7].

The evidence suggests that maintaining an adequate vitamin D status plays a critical role in the prevention of disease. For this reason, vitamin D supplementation, both oral and intravenous, has been widely and heavily administered to vitamin D-deficient patients, only to find little or no effect on the outcome of the disease. In fact, studies have not been able to confirm the benefit of vitamin D supplementation in cases of osteoporosis, cardiovascular disease, hypertension, obesity, and autoimmune diseases [8]. In musculoskeletal studies, however, supplementation with vitamin D has shown a significant benefit in reducing falls in the elderly [9]. Taken together, the recent scientific data suggest that vitamin D deficiency, once believed to be a cause of disease, is now regarded as a sign of illness and health imbalance.

Nutrient interrelations are complex because nutrients depend on each other to perform a single function. An imbalance in one nutrient can affect the absorption, metabolism, and accumulation of other nutrients [9]. For example, vitamin D deficiency can cause rickets and osteomalacia due to its role in calcium metabolism in the bone. Excessive vitamin D intake, on the other hand, can cause magnesium deficiency. Because magnesium is required for the activation and transportation of vitamin D throughout the body, magnesium deficiency, in turn, can cause serum vitamin D deficiency [10]. Thus, a typical deficiency in one nutrient may be secondary to the imbalance of another nutrient. This is important to consider when managing the treatment of a disease, as addressing only a single nutrient may not be effective enough. Studies have shown inconsistent results on bone mineral density when supplementing with vitamin D alone [9], but combining vitamin D with other nutrients has been shown to be significantly effective in raising bone mineral density. The literature shows that vitamin D deficiency is associated with calcium, phosphorus, and magnesium, and an earlier article shows that vitamin D has a close and synergistic relation with calcium, magnesium, sodium, copper, and selenium [10]. Therefore, it is important to assess the mineral status of patients to reveal the hidden mineral imbalance associated with vitamin D deficiency.

The widespread use of medical information systems and the explosive growth of medical databases require efficient methods to be coupled with clinical analysis. Data mining has been applied to the medical domain as a significant tool for knowledge discovery. Medical data mining has played a role in discovering hidden patterns, noticing relations, and generating decision rules. Such rules have been applied extensively, especially in the area of medical diagnosis and treatment [11].

Several computational intelligence (CI) methods such as neural networks, fuzzy sets, decision trees, and expert systems have been applied effectively in the medical field [12]. The main role of CI tools is to tackle potential problems in medical data such as missing data, inconsistent information, and the extraction of meaningful information from a huge set of irrelevant data. Feature selection and reduction have played a great role in uncovering the hidden trends within medical data, with the purpose of identifying the most significant attributes in the dataset [13]. Feature reduction therefore has a great impact on the effectiveness and complexity of the classification process [14].

Rough set theory (RS) is one of the most motivating areas of CI research, and it has become increasingly popular in medical applications. One of the major advantages of RS is that it does not require any additional information about the data, such as a probability distribution or grade of membership [15]. RS theory has also proved competent in handling inconsistency, which is a common problem in medical data. Such inconsistency is normally caused by the existence of an indiscernibility relation in the data. Unlike traditional computational intelligence techniques, RS does not require any preprocessing to eliminate the inconsistency [16]. Moreover, RS has an efficient algorithm for dimensionality reduction.

2 Research motivation and contribution

2.1 Motivation

Vitamin D status plays a critical role in the prevention of disease; therefore, improving the accuracy of predicting vitamin D deficiency is highly desirable. Previously reported work on predicting vitamin D deficiency has several limitations and may not accurately reflect vitamin D status.

1. Some studies used statistical methods such as multiple linear regression (MLR). However, a model built with statistical methods usually depends on a predetermined model form, for example predicting whether or not an event occurs by fitting data to a logistic curve. In addition, real-time classification problems are computationally complex and highly nonlinear in nature. In research published in 2013, Guo et al. [17] suggested using a support vector regression (SVR) model to predict serum 25(OH)D concentrations, which indicate vitamin D status. However, the accuracy of the technique depends heavily on choosing appropriate model parameters and a kernel function, which is typically done by trial and error.
2. Most published works depend on data gathered by questionnaire surveys, including self-reported sun exposure, sun protection behavior, and physical activity. However, data collected through questionnaires may not be reliable, as the validity of a person's responses cannot be verified. Moreover, with questionnaires it is very difficult to detect errors, and there is a great possibility of misinterpreting a person's responses. Consequently, questionnaire data may lead to misclassification.
3. Several researchers rely on blood sampling to assess vitamin D status (25OHD); however, tests on extracellular fluids such as blood, urine, and sweat may not provide a good indication of mineral activity in the body. This is because the blood tends to balance minerals continuously, either by depositing excess minerals into the tissues or by forcing them out of the cells and excreting them into the urine and sweat. Measuring minerals in intracellular fluids such as the red blood cells is a well-known test, but it is fairly expensive, invasive, and less informative compared to hair tissue [18].

A hair mineral analysis is a tissue biopsy that is safe, noninvasive, and relatively inexpensive. Provided that the laboratory is accurate and experienced, the test can be considered an accurate and important tool in diagnostics and monitoring. Hair mineral analysis is a standard test applied in routine forensic assessment and in clinical and research evaluation of minerals and toxic metals [19]. It is also highly informative because it measures a large number of minerals and their ratios, including both essential and toxic minerals, from just a small amount of hair cut from the scalp. Hair tissue, like any other tissue in the body, contains minerals that are deposited as the hair grows [18]. Thus, hair mineral analysis is an excellent tool for assessing nutritional balance [20].

Recently, RS and binary particle swarm optimization (BPSO) have been useful tools in medical research. In this article, we build a computerized model that would predict vitamin D deficiency based on its association with the levels and ratios of minerals measured by hair tissue samples. We propose a hybrid model based on the integration of RS and BPSO to build a novel classification model.

2.2 Contribution

Hair mineral analysis is not requested as commonly as a blood test for 25OHD; nevertheless, the information obtained from hair mineral analysis can assist in the prognosis of disease, including the likelihood of vitamin D deficiency. The main focus of this study is to develop a computerized model that predicts vitamin D deficiency using mineral levels and ratios as predictive features. We believe our study is the first attempt to predict vitamin D deficiency based on hair mineral samples.

Healthcare-related data mining is one of the most promising fields for discovering new facts and trends from large quantities of data. Excluding irrelevant and redundant features usually makes the resulting computational models easier to understand, whereas a high-dimensional dataset usually worsens classification accuracy. Consequently, feature selection, with its ability to handle high-dimensional input attributes, has become a vital step in many biomedical data mining problems.

In this study, the gathered features were first reduced with the BPSO algorithm to construct an intermediate decision table. RS was then applied to the intermediate decision table as a feature reduction and rule extraction technique to perform classification. The proposed system yields a minimal set of minerals that has a direct effect in predicting vitamin D deficiency. The results show that the proposed model not only detects vitamin D deficiency effectively but also provides valuable information about the mineral imbalance as a cause of deficiency, which should be addressed in any treatment management. To the best of our knowledge, this is the first work that predicts vitamin D deficiency based on hair mineral analysis.

Since the serum concentration of 25OHD that defines deficiency is under considerable discussion, two main experiments, with serum concentrations of 25 and 50 nmol/L, were conducted to confirm the validity of the proposed model. The results confirm that the proposed model greatly reduces the feature dimension while improving classification performance, and that its outcomes agree with the recent clinical literature. The experimental results showed how the integrated approach strengthened the predictive relationship between vitamin D deficiency and the mineral levels extracted from the hair samples. The proposed model can be employed by specialists when deciding on the clinical regimen for their patients.

The remainder of the article is organized as follows. Section 3 introduces the preliminary of RS and BPSO. Section 4 presents the suggested RS + BPSO hybrid model methodology. Section 5 describes the data gathering procedure. Sections 6 and 7 illustrate the experimental analysis and discussion. Conclusion and future work are given in Sect. 8.

3 Preliminary

3.1 Rough set theory

RS theory is a mathematical approach for handling vagueness and uncertainty in data analysis. A rough set is characterized by a pair of precise concepts, called the lower and upper approximations, generated from the indiscernibility of objects. The main advantage of rough set theory is that it does not need any preliminary or additional information about the data. Detailed information about RS can be found in [21].
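As an illustration of the two approximations, the following minimal Python sketch groups objects into indiscernibility classes and derives the lower and upper approximations of a target concept. The function name `approximations`, the table layout, and the toy values are assumptions made for this example; they are not taken from the study data or from any rough set library.

```python
from collections import defaultdict

def approximations(table, features, target):
    """Return (lower, upper) approximations of `target`, a set of object ids.

    `table` maps an object id to a dict of feature values. Objects with
    identical values on `features` are indiscernible and share a class.
    """
    classes = defaultdict(set)
    for obj, row in table.items():
        classes[tuple(row[f] for f in features)].add(obj)

    lower, upper = set(), set()
    for block in classes.values():
        if block <= target:   # class lies entirely inside the concept
            lower |= block
        if block & target:    # class overlaps the concept
            upper |= block
    return lower, upper

# Toy, made-up discretized values (hypothetical, not study measurements).
toy = {
    "w1": {"Ca": "high", "Mg": "low"},
    "w2": {"Ca": "high", "Mg": "low"},
    "w3": {"Ca": "low", "Mg": "low"},
    "w4": {"Ca": "low", "Mg": "high"},
}
deficient = {"w1", "w3"}
lower, upper = approximations(toy, ["Ca", "Mg"], deficient)
print(sorted(lower), sorted(upper))  # ['w3'] ['w1', 'w2', 'w3']
```

Here w1 and w2 are indiscernible but only w1 belongs to the concept, so both fall in the upper approximation and neither in the lower one. This is the kind of inconsistency that RS represents directly instead of removing it in preprocessing.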

3.2 PSO for feature selection

The particle swarm optimization (PSO) approach has recently gained attention for solving optimization and feature subset selection problems. The main strength of PSO is its fast convergence with few parameters, which compares favorably with many global optimization algorithms: it has strong search capability and a flexible, well-balanced, and efficient mechanism for finding the best feature set in the knowledge discovery domain. Moreover, its mathematical basis is simple and its computational cost is reasonable [22].

The particle swarm optimization (PSO) technique is a population-based stochastic optimization technique first introduced in 1995 by Kennedy and Eberhart. A detailed description with extensive background information can be found in their textbook Swarm Intelligence [23]. In PSO, a candidate solution is encoded as a finite-length string called a particle p_i in the search space. To discover the optimal solution, each particle adjusts its search direction according to two quantities: its own best previous experience (p_best) and the best experience of its companions (g_best). Particles evolve simultaneously based on knowledge shared with neighboring particles; each makes use of its own memory and of the knowledge gained by the swarm as a whole to find the best solution. The best previous experience of all neighbors of particle i is called g_best. Each particle additionally keeps a fraction of its old velocity. In continuous PSO, the particle updates its velocity and position with the following equations [23]:

$$v_{pd}^{\text{new}} = \omega \times v_{pd}^{\text{old}} + C_{1} \times \text{rand}_{1}() \times \left( p\text{best}_{pd} - x_{pd}^{\text{old}} \right) + C_{2} \times \text{rand}_{2}() \times \left( g\text{best}_{d} - x_{pd}^{\text{old}} \right) \tag{1}$$

$$x_{pd}^{\text{new}} = x_{pd}^{\text{old}} + v_{pd}^{\text{new}} \tag{2}$$

The acceleration coefficients C₁ and C₂ are constants that represent the weighting of the stochastic acceleration terms pulling each particle toward the p_best and g_best positions, and ω is the fraction of its old velocity that the particle keeps. Adjusting the acceleration coefficients therefore changes the amount of “tension” in the system. In the original algorithm, the value of (C₁ + C₂) is usually limited to 4 [24]. Particles’ velocities are restricted to a maximum velocity, V_max. According to Eq. 1, the particle’s new velocity is calculated from its previous velocity and the distances of its current position from its own best experience and the group’s best experience. Afterward, the particle flies toward a new position according to Eq. 2.
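A single update of one particle following Eqs. 1 and 2 can be sketched as below. This is only an illustrative sketch: the function name `pso_step` and the default values for `w`, `c1`, `c2`, and `v_max` are assumptions chosen for the example (with C₁ + C₂ = 4 as mentioned above), not the settings used in this study.

```python
import random

def pso_step(x, v, pbest, gbest, w=0.7, c1=2.0, c2=2.0, v_max=4.0, rng=random):
    """One continuous-PSO update of a single particle (Eqs. 1 and 2).

    x, v, pbest and gbest are equal-length lists of floats.
    The default parameter values are illustrative assumptions.
    """
    new_x, new_v = [], []
    for d in range(len(x)):
        vel = (w * v[d]
               + c1 * rng.random() * (pbest[d] - x[d])   # pull toward p_best
               + c2 * rng.random() * (gbest[d] - x[d]))  # pull toward g_best
        vel = max(-v_max, min(v_max, vel))               # restrict to V_max
        new_v.append(vel)
        new_x.append(x[d] + vel)                         # Eq. 2
    return new_x, new_v

# Hypothetical two-dimensional particle.
position, velocity = pso_step(
    x=[0.2, 0.8], v=[0.1, -0.3], pbest=[0.4, 0.6], gbest=[0.9, 0.1]
)
```

When a particle already sits on both p_best and g_best, the two stochastic terms vanish and only the inertia term ω × v remains, which is why the particle keeps drifting for a few iterations before settling.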

3.3 Binary PSO

Kennedy and Eberhart [25] presented a discrete binary version of the PSO algorithm specifically to handle discrete problems, where the variable domain is finite, since the original PSO algorithm was developed for a space of continuous values. In the binary version of PSO, particle positions are represented by binary values (0 or 1). Each particle's position is updated from its velocity according to the following equations:

$$S\left( V_{\text{id}}^{\text{new}} \right) = \frac{1}{1 + e^{-V_{\text{id}}^{\text{new}}}} \tag{3}$$

$$x_{\text{id}} = \begin{cases} 1 & \text{if } \text{rand} < S\left( V_{\text{id}}^{\text{new}} \right) \\ 0 & \text{otherwise} \end{cases} \tag{4}$$

where $V_{\text{id}}^{\text{new}}$ denotes the particle velocity obtained from Eq. 1, the function $S\left( V_{\text{id}}^{\text{new}} \right)$ is a sigmoid transformation, and rand is a random number drawn from the uniform distribution U(0, 1).
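The binary position update of Eqs. 3 and 4 can be sketched in a few lines of Python. The names `sigmoid` and `bpso_position` are hypothetical, and the sketch assumes the velocities have already been restricted to V_max, so that the exponential does not overflow. In a feature selection setting, each bit would indicate whether the corresponding feature is kept.

```python
import math
import random

def sigmoid(v):
    """Sigmoid transformation of a velocity component (Eq. 3)."""
    return 1.0 / (1.0 + math.exp(-v))

def bpso_position(velocity, rng=random):
    """Sample a binary position from a list of velocities (Eq. 4).

    Each bit is set to 1 with probability sigmoid(v), so a large positive
    velocity almost always selects the feature and a large negative one
    almost always drops it.
    """
    return [1 if rng.random() < sigmoid(v) else 0 for v in velocity]

# Hypothetical velocities for a five-feature particle.
mask = bpso_position([2.5, -1.0, 0.0, 3.8, -3.8])
```

Because the position is resampled at every iteration, the velocity in BPSO acts as a selection probability rather than as a displacement.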

4 Rough set–BPSO hybrid model for predicting vitamin D deficiency

The proposed model has two main stages. The first is the feature selection stage, which integrates BPSO and RS to extract the most prominent features. The second stage employs the rule extraction and classification capabilities of RS to build a model for predicting vitamin D deficiency based on its association with the levels and ratios of minerals. The formal steps of the BPSO + RS method are stated in the following algorithm:

The entire system flow of the proposed approach is shown in Fig. 1. The individual stages are described in detail in the following subsections.

4.1 Feature selection stage

In medical datasets, most of the collected features are insignificant or redundant. Such features may be deleterious with relatively small training sets, where irrelevancy and redundancy are harder to evaluate. A very large number of features also increases the memory needed to represent the dataset. Feature selection addresses both problems by finding a subset of prominent features that improves predictive accuracy and removes the redundant features [14].

Recently, RS and PSO have been useful tools in medical research. Wang et al. [26] applied an RS-based method to predict the degree of malignancy in brain glioma. For feature selection, the researchers applied genetic algorithm PSO and RS-based feature selection (PSORSFS). Inbarani et al. [27] applied two RS techniques, unsupervised quick reduct (USQR) and unsupervised relative reduct (USRR), together with the unsupervised PSO-based relative reduct (USPSO-RR), to gene expression datasets. Inbarani et al. [14] introduced supervised feature selection methods based on the hybridization of RS and PSO for disease diagnosis. The proposed feature selection hybridized PSO with the quick reduct and relative reduct methods. These were tested for their ability to diagnose four different medical datasets, including erythemato-squamous diseases, breast tissue, prognostic, and cardiac disorders.

In this article, we introduce an efficient two-stage reduction technique for improving the classification accuracy and for detecting an optimal feature subset selection. There are three basic reasons for applying two-level reduct steps:

1. The efficiency of RS is affected by the dimensionality of the data. The quick reduct algorithm therefore fails to scan most of the reducts when the number of features is high [28].
2. The relative dependency method performs backward elimination of attributes using the relative dependency parameter, which evaluates the proportion of the numbers of indiscernible subsets. One drawback of this algorithm arises when the numbers of subsets in the numerator and denominator are equal. The relative dependency measured in the lower nodes usually halts the algorithm if its value is below 1 [28].
3. PSO is a well-known tool for obtaining improved results in high-dimensional search spaces by finding the optimum through iterative local and global search [29]. It needs no complex operators, only simple mathematical ones, and it is not highly time- or memory-consuming.

In the proposed feature selection technique, we combine the advantages of RS and BPSO to strengthen feature selection. Initially, BPSO was applied to the vitamin D dataset to obtain the reduced information table. Because RS theory cannot directly process continuous feature values, the reduced information table was discretized first. In this article, we adopted the RS with Boolean reasoning (RSBR) algorithm proposed by Zhong et al. [30]. The main advantage of RSBR is that it combines discretization of real-valued features and classification. The algorithm tries to discover fewer breakpoints with a larger dependency degree between features. It uses a bottom-up approach that adds cuts for a given attribute one by one in subsequent iterations [31].

The resulting discretized information table is fed to the RS reduction technique. An algorithm called “dynamic reduct,” implemented in RESE [31], attempts to calculate a minimal reduct. The process requires several subtables to be examined in order to find the frequently repeating minimal subsets of features (reducts). Such dynamic reducts may be calculated for a general or a decision-related indiscernibility relation [31]. The output of this stage is a minimal set of features that preserves the discriminating power of the full feature set.
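To make the notion of a feature subset that "preserves the power" of all features concrete, the sketch below computes the standard rough set dependency degree of the decision on a feature subset, that is, the fraction of objects whose decision is determined unambiguously by their values on that subset. It does not reproduce the dynamic reduct procedure or the RSBR discretization; the function names, the binary mask convention, and the toy table are assumptions made for illustration only.

```python
from collections import defaultdict

def dependency(rows, features, decision):
    """Fraction of rows whose decision value is fixed by `features`."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[f] for f in features)].append(row[decision])
    # A group is consistent when all of its objects share one decision.
    consistent = sum(len(g) for g in groups.values() if len(set(g)) == 1)
    return consistent / len(rows)

def mask_to_features(mask, names):
    """Translate a BPSO-style 0/1 mask into the selected feature names."""
    return [name for bit, name in zip(mask, names) if bit]

# Toy, made-up discretized table (hypothetical, not study data).
rows = [
    {"Ca": "high", "Mg": "low", "Zn": "ok", "deficient": "yes"},
    {"Ca": "high", "Mg": "low", "Zn": "low", "deficient": "yes"},
    {"Ca": "low", "Mg": "low", "Zn": "ok", "deficient": "no"},
    {"Ca": "low", "Mg": "high", "Zn": "ok", "deficient": "no"},
]
names = ["Ca", "Mg", "Zn"]
print(dependency(rows, names, "deficient"))                               # 1.0
print(dependency(rows, mask_to_features([1, 0, 0], names), "deficient"))  # 1.0
print(dependency(rows, mask_to_features([0, 0, 1], names), "deficient"))  # 0.25
```

In this toy table the single feature Ca reaches the same dependency degree as the full feature set, so it would qualify as a reduct, whereas Zn alone leaves three of the four objects ambiguous.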

4.2 Rule extraction and classification stage

Extracting comprehensible classification rules is one of the most emphasized concepts in data mining. To transform a reduct into a rule, we bind the condition feature values of the object class from which the reduct originated to the corresponding features of the reduct. Then, to complete the rule, a decision part comprising the resulting part of the rule is added, in the same way as for the condition features. Rules generated from a training set are used to classify objects that have never been seen before. These rules represent the actual classifier, which predicts the class to which each new object belongs. The nearest matching rule is the one whose condition part differs from the feature vector of the new object by the minimum number of features [33]. Once the rules are generated, classification is run on the test set to predict the class of each object in it. The output of the classification is the confusion matrix (shown later in Table 4), which records the correctly and incorrectly recognized objects for each class in the test set.
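The nearest-matching-rule idea and the resulting confusion matrix can be sketched as follows. The rule representation (a dict of conditions plus a decision label), the tie-breaking by rule order, and the toy rules and objects are assumptions made for this example and are not taken from the rules extracted in the study.

```python
from collections import Counter

def mismatches(conditions, obj):
    """Number of rule conditions that the object does not satisfy."""
    return sum(1 for feature, value in conditions.items()
               if obj.get(feature) != value)

def classify(rules, obj):
    """Return the decision of the rule with the fewest mismatches.

    Ties are broken by rule order (an assumption of this sketch).
    """
    conditions, decision = min(rules, key=lambda r: mismatches(r[0], obj))
    return decision

def confusion_matrix(rules, test_set, decision):
    """Count (actual, predicted) pairs over the test set."""
    counts = Counter()
    for obj in test_set:
        counts[(obj[decision], classify(rules, obj))] += 1
    return counts

# Hypothetical rules and test objects, for illustration only.
rules = [
    ({"Ca": "high", "Mg": "low"}, "yes"),
    ({"Ca": "low"}, "no"),
]
test_set = [
    {"Ca": "high", "Mg": "low", "deficient": "yes"},
    {"Ca": "low", "Mg": "low", "deficient": "no"},
    {"Ca": "high", "Mg": "high", "deficient": "no"},
]
matrix = confusion_matrix(rules, test_set, "deficient")
print(matrix[("yes", "yes")], matrix[("no", "no")], matrix[("no", "yes")])  # 1 1 1
```

The third object matches neither rule exactly and differs from each by one condition; with the tie broken by rule order it is assigned "yes" and is counted as a misclassification in the matrix.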

5 Data gathering

The study was granted ethical approval by the Joint Committee for the Protection of Human Subjects in Research of the Kuwait Institute for Medical Specialization (KIMS), Ministry of Health, and the AbdulMihsin Al-Abdulrezzag Health Sciences Centre (HSC), Kuwait University.

The data were collected from healthy Kuwaiti women aged 19–49 years who were not suffering from any chronic illness, were not pregnant or nursing, and had not taken supplements in the past 6 months. Women from a previous study and from the College of Basic Education, PAAET, were approached and invited to participate in the study. Those who agreed were recalled for a full examination, during which a total blood sample of 10 mL and a hair sample ≤ 3 cm long (~0.15–0.2 mg), cut from the scalp side of the hair, were obtained. The hair was collected from 2 to 3 areas on the back of the head (mid-parietal to occipital region).

The collected serum samples were shipped on dry ice to The Doctors Laboratory (TDL) in London, UK (http://www.tdlpathology.com/), where they were measured for 25OHD by a radio-immunoassay, whereas the hair samples were shipped to the Trace Elements Laboratory (TEI), USA (http://www.traceelements.com/), to determine the hair mineral levels and ratios. Each hair sample was collected in a separate and sealed envelope.

The women in this dataset had a high prevalence of vitamin D deficiency; 81.4 % had levels <50 nmol/L, and 74.6 % had levels <25 nmol/L. These women are not restricted to a particular residence; in fact, the dataset includes women from all of the six governorates in Kuwait with no significant difference in serum 25OHD.

The hair mineral analysis shows a wide variation in mineral status among the women, with more than 60 % of them having calcium levels beyond the laboratory reference range and 39 % of them having potassium levels below the reference range. The report of the hair mineral analysis includes nutritional minerals such as calcium, magnesium, sodium, and potassium; trace nutritional minerals such as lithium, germanium, and vanadium; and heavy toxic metals such as arsenic, lead, mercury, and cadmium (Table 1). Further information regarding the method of hair collection, the minerals tested, and the resulting report can be found on the laboratory's website.

Table 1 List of minerals
