Abstract
The evolving character of Big Data calls for new approaches to data management. The field has attracted growing interest in recent years and has therefore been investigated for handling data that are generated at massive scale. In this context, Data Lakes have emerged as a promising concept. A data lake accepts any kind of data (structured, semi-structured or unstructured) as input; processing is performed lazily, at the time of use, according to the actual needs of the user and following a schema-on-read approach. One pertinent application of data lakes concerns data collected during Arctic expeditions. These data vary widely in nature and volume and are therefore well suited to data lakes. In this paper, we detail the challenges that arise when Big Data Lakes are used together with machine learning to manage, on demand, the collected Big Arctic Data samples.
A. Cuzzocrea: This research was carried out in the context of the Excellence Chair in Computer Engineering at LORIA, University of Lorraine, Nancy, France.
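The schema-on-read idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the framework described in the paper: the sample payloads, the `RAW_LAKE` list, the `read_with_schema` function and the `temperature_schema` mapping are all hypothetical names and data introduced only for illustration. Raw items are stored as they arrive, and a schema is applied lazily, only when a reader asks for records.

```python
import csv
import io
import json

# Hypothetical raw samples, stored exactly as they arrived (no upfront schema).
RAW_LAKE = [
    ("json", '{"station": "A1", "temp_c": "-12.5", "note": "clear sky"}'),
    ("csv", "station,temp_c\nB2,-20.1\nB3,n/a"),
    ("text", "Field log: ice thickness measured near station A1."),
]


def read_with_schema(lake, schema):
    """Lazily yield records that fit `schema`, parsing only at read time."""
    for fmt, payload in lake:
        if fmt == "json":
            rows = [json.loads(payload)]
        elif fmt == "csv":
            rows = list(csv.DictReader(io.StringIO(payload)))
        else:
            continue  # unstructured text stays in the lake, untouched
        for row in rows:
            try:
                # Project onto the requested fields and cast each value.
                yield {field: cast(row[field]) for field, cast in schema.items()}
            except (KeyError, ValueError):
                continue  # this record does not fit this reader's schema


# The schema belongs to the reader, not to the stored data (an assumption
# of this sketch): another reader could apply a different one.
temperature_schema = {"station": str, "temp_c": float}
records = list(read_with_schema(RAW_LAKE, temperature_schema))
```

Because `read_with_schema` is a generator, nothing is parsed until a consumer iterates over it, which mirrors the "at the time of use" processing described above. Records that cannot be cast (here, the `n/a` temperature) are skipped for this reader but remain available in raw form.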
Acknowledgements
This research has been partially supported by the Arctic Research Foundation (ARF), Mitacs Inc., NSERC (Canada) and the University of Manitoba, and by the French PIA project “Lorraine Université d’Excellence”, reference ANR-15-IDEX-04-LUE.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Cuzzocrea, A., Leung, C.K., Soufargi, S., Olawoyin, A.M. (2022). The Emerging Challenges of Big Data Lakes, and a Real-Life Framework for Representing, Managing and Supporting Machine Learning on Big Arctic Data. In: Barolli, L., Miwa, H. (eds) Advances in Intelligent Networking and Collaborative Systems. INCoS 2022. Lecture Notes in Networks and Systems, vol 527. Springer, Cham. https://doi.org/10.1007/978-3-031-14627-5_16
DOI: https://doi.org/10.1007/978-3-031-14627-5_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-14626-8
Online ISBN: 978-3-031-14627-5
eBook Packages: Intelligent Technologies and Robotics (R0)