Abstract
Microarray gene expression data usually consist of a large number of genes, of which only a small fraction is informative for cancer diagnostic tests. This paper focuses on the effective identification of these informative genes. It uses a recently developed gene selection criterion based on the concept of the Bayesian discriminant. Because the criterion measures the classification ability of a feature set, it makes excellent gene selection results possible. Apart from the cost function, the paper addresses the drawback of the conventional sequential forward search (SFS) method by designing new genetic algorithms built on the Bayesian discriminant criterion. The proposed strategies have been thoroughly evaluated on three kinds of cancer diagnosis, using the classification results of three typical classifiers: a multilayer perceptron (MLP), a support vector machine (SVM), and a 3-nearest-neighbor rule classifier (3-NN). The results show that the proposed strategies substantially improve the performance of gene selection, and that the proposed methods are very robust in all the investigated cases.
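As a rough illustration of the two search strategies contrasted above, the sketch below runs a greedy sequential forward search and a simple genetic search over gene subsets of a fixed size. It is not the paper's method: the scoring function `centroid_score` is a stand-in (nearest-centroid resubstitution accuracy) used only because the Bayesian discriminant criterion is not defined in this abstract, and every name, the toy data, and the parameter values (`pop_size`, `generations`, the 0.2 mutation rate) are assumptions made for the example.

```python
import random

def centroid_score(X, y, subset):
    """Stand-in criterion (NOT the Bayesian discriminant): fraction of samples
    whose nearest class centroid, using only `subset` features, is their label."""
    classes = sorted(set(y))
    centroids = {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        centroids[c] = [sum(r[j] for r in rows) / len(rows) for j in subset]
    correct = 0
    for x, label in zip(X, y):
        def dist(c):
            return sum((x[j] - m) ** 2 for j, m in zip(subset, centroids[c]))
        if min(classes, key=dist) == label:
            correct += 1
    return correct / len(X)

def sequential_forward_search(X, y, k, score):
    """Greedy SFS: repeatedly add the single feature that maximizes the score."""
    selected, remaining = [], list(range(len(X[0])))
    while len(selected) < k:
        best = max(remaining, key=lambda j: score(X, y, selected + [j]))
        selected.append(best)
        remaining.remove(best)
    return selected

def genetic_search(X, y, k, score, pop_size=20, generations=30, seed=0):
    """Hypothetical GA over size-k subsets: keep the better half (elitism),
    recombine parents by sampling from their union, occasionally mutate."""
    rng = random.Random(seed)
    n = len(X[0])
    pop = [rng.sample(range(n), k) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda s: score(X, y, s), reverse=True)
        parents = pop[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = rng.sample(sorted(set(a) | set(b)), k)  # crossover
            if rng.random() < 0.2:  # mutation: swap in one unused feature
                unused = [j for j in range(n) if j not in child]
                if unused:
                    child[rng.randrange(k)] = rng.choice(unused)
            children.append(child)
        pop = parents + children
    return sorted(max(pop, key=lambda s: score(X, y, s)))

# Toy data (invented): 8 samples, 4 "genes"; only gene 2 separates the classes.
y = [0, 0, 0, 0, 1, 1, 1, 1]
X = [[1, 0, 0, 5], [2, 1, 1, 5], [1, 0, 0, 5], [2, 1, 1, 5],
     [1, 1, 10, 5], [2, 0, 11, 5], [1, 1, 10, 5], [2, 0, 11, 5]]

sfs_genes = sequential_forward_search(X, y, 2, centroid_score)
ga_genes = genetic_search(X, y, 2, centroid_score)
print("SFS:", sfs_genes, "GA:", ga_genes)
```

On this toy data SFS selects gene 2 first, since it is the only gene that separates the two classes on its own. The genetic search scores whole subsets rather than adding one gene at a time, which is the general property that makes population-based search a candidate remedy for the greedy behaviour of SFS.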
Cite this article
Gan, Z., Chow, T.W.S. & Huang, D. Effective Gene Selection Method Using Bayesian Discriminant Based Criterion and Genetic Algorithms. J Sign Process Syst Sign Image 50, 293–304 (2008). https://doi.org/10.1007/s11265-007-0120-3
DOI: https://doi.org/10.1007/s11265-007-0120-3