Chi-Square Top-K Based Incremental Feature Selection Model for BigData Analytics

Kamble, Subhash; Arunalatha, J. S.; Venkataravana Nayak, K.; Venugopal, K. R.

doi:10.1007/978-981-19-4182-5_11

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1414)

Abstract

The rapid growth of software computing, internet technologies, and data volume has given rise to a paradigm called BigData, which requires a computing environment that can handle the 4Vs: variety, volume, velocity, and veracity. Most classical computing models fail under these demands, mainly because of the very large number of unstructured features. Feature selection is a viable remedy, provided it yields a minimal feature set without sacrificing accuracy. This paper proposes what the authors describe as a first-of-its-kind solution that targets a minimal feature set while maintaining the expected accuracy under 4V demands. To this end, a Chi-Squared Select-K-Best Incremental Feature Selection (CS-SKB-IFS) model is developed that selects a minimal set of features yielding the expected accuracy. The selected features are then classified with the Extra Tree classifier. The combined CS-SKB-IFS model achieved an accuracy of 91.02%, an F-Measure of 91.20%, and an AUC of 83.06%, outperforming other state-of-the-art methods. CS-SKB-IFS also required significantly less computation time (1.01 s) than the state-of-the-art method (6.74 s).
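
The abstract does not spell out how the chi-square ranking and the incremental step fit together, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes non-negative feature values, scores each feature with the standard chi-square statistic against the class labels, and assumes that "incremental" means growing k until a target score is reached. The names `chi2_scores`, `top_k`, `incremental_select`, and the `evaluate` callback are hypothetical.

```python
from collections import defaultdict


def chi2_scores(X, y):
    """Chi-square score of each non-negative feature against the class labels.

    Observed values are per-class sums of a feature; expected values assume the
    feature total is split in proportion to the class frequencies.
    """
    n_samples, n_features = len(X), len(X[0])
    class_counts = defaultdict(int)
    observed = defaultdict(lambda: [0.0] * n_features)
    for row, label in zip(X, y):
        class_counts[label] += 1
        sums = observed[label]
        for j, value in enumerate(row):
            sums[j] += value
    totals = [sum(observed[c][j] for c in observed) for j in range(n_features)]
    scores = []
    for j in range(n_features):
        score = 0.0
        for c, count in class_counts.items():
            expected = totals[j] * count / n_samples
            if expected > 0:
                score += (observed[c][j] - expected) ** 2 / expected
        scores.append(score)
    return scores


def top_k(scores, k):
    """Indices of the k highest-scoring features, best first."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def incremental_select(X, y, evaluate, target, step=1):
    """Grow k until evaluate(selected) reaches target (an assumed stopping rule).

    `evaluate` is a hypothetical callback that takes a list of feature indices
    and returns a quality score, for example validation accuracy.
    """
    scores = chi2_scores(X, y)
    k = step
    while True:
        selected = top_k(scores, k)
        if evaluate(selected) >= target or k >= len(scores):
            return selected
        k += step


# Toy data: feature 0 separates the classes, feature 1 is constant.
X = [[3, 1, 0], [4, 1, 1], [0, 1, 1], [0, 1, 1]]
y = [1, 1, 0, 0]
ranked = top_k(chi2_scores(X, y), 2)
```

In practice the `evaluate` callback would train a classifier on the selected columns and return its held-out accuracy. With scikit-learn, `SelectKBest(chi2, k=...)` and `ExtraTreesClassifier` could plausibly play the roles of the scoring and classification steps named in the abstract, though whether the authors used that library is not stated here.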

Author information

Authors and Affiliations

Department of Computer Science and Engineering, University Visvesvaraya College of Engineering, Bengaluru, India
Subhash Kamble & J. S. Arunalatha
Department of Computer Science and Engineering, Jain University, Bengaluru, India
K. Venkataravana Nayak
Bangalore University, Bengaluru, India
K. R. Venugopal

Corresponding author

Correspondence to Subhash Kamble.

Editor information

Editors and Affiliations

Education and Training Division, CDAC, Noida, Uttar Pradesh, India
Arti Noor
Education and Training Division, CDAC, Noida, Uttar Pradesh, India
Kriti Saroha
Department of Automatic Control, Computers and Electronics, Petroleum-Gas University of Ploiesti, Ploiesti, Romania
Emil Pricop
Computer Science and Information Technology, Kwantlen Polytechnic University, Surrey, BC, Canada
Abhijit Sen
Department of Electronics and Electrical Engineering, IIT Guwahati, Guwahati, Assam, India
Gaurav Trivedi

About this paper

Cite this paper

Kamble, S., Arunalatha, J.S., Venkataravana Nayak, K., Venugopal, K.R. (2023). Chi-Square Top-K Based Incremental Feature Selection Model for BigData Analytics. In: Noor, A., Saroha, K., Pricop, E., Sen, A., Trivedi, G. (eds) Proceedings of Emerging Trends and Technologies on Intelligent Systems. Advances in Intelligent Systems and Computing, vol 1414. Springer, Singapore. https://doi.org/10.1007/978-981-19-4182-5_11

DOI: https://doi.org/10.1007/978-981-19-4182-5_11
Published: 16 November 2022
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-4181-8
Online ISBN: 978-981-19-4182-5
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)
