Abstract
This chapter provides an overview of data preprocessing techniques and tools for web and social media analytics. As the volume and complexity of data from diverse sources grow, effective preprocessing becomes crucial for extracting insights and knowledge. The chapter covers the key steps in data preprocessing: characterizing data, reducing dimensionality, transforming data, and enriching and validating data. By following these steps with appropriate techniques and tools, you can improve the quality of your data, make your analytics more effective, and make better-informed decisions. The chapter also aims to equip you to handle complex and noisy data, so that your organization can realize the full potential of its data mining and analytics.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Jansen, B.J., Aldous, K.K., Salminen, J., Almerekhi, H., Jung, Sg. (2024). Data Preprocessing. In: Understanding Audiences, Customers, and Users via Analytics. Synthesis Lectures on Information Concepts, Retrieval, and Services. Springer, Cham. https://doi.org/10.1007/978-3-031-41933-1_6
DOI: https://doi.org/10.1007/978-3-031-41933-1_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-41932-4
Online ISBN: 978-3-031-41933-1
eBook Packages: Synthesis Collection of Technology (R0)