1 Introduction

Water resources are under growing stress from population growth, expanding agricultural and industrial production, rising living standards, and climate change (Kumar et al. 2019a; Roshan et al. 2020). Water scarcity refers to the condition in which the available water resources are insufficient to meet a particular region’s water requirement (Kumar et al. 2019b, 2020). It is estimated that approximately four billion people worldwide lack an adequate quantity of potable water for at least one month a year. With the population projected to reach nine billion by 2050, the demand for potable water is set to increase dramatically (du Plessis 2019; Patel et al. 2019; Singh et al. 2020). Hence, to meet the growing population’s demand for clean water, monitoring-based management of water resources is essential.

Advances in engineered sensors, data monitoring, and communication devices enable continuous monitoring of a given water system. As a result, high-frequency, near-real-time series data can be recorded. Such continuous measurement produces large volumes of data, referred to as big data (Gandomi and Haider 2015; Mayer-Schönberger and Cukier 2013). The term “big data,” which covers major processes such as data acquisition, storage, extraction, and cleaning as well as analysis and interpretation, was first proposed by Michael Cox and David Ellsworth in 1997. The targeted use of advanced big data analytics is emerging in the scientific community as a means of managing water resources effectively and sustainably. Computer models are increasingly applied in water science and engineering because of the urgent need for deeper insight into water systems and the demand for effective, sustainable solutions for stressed water resources (Bibri and Krogstie 2017; Singh et al. 2020; Mukherjee et al. 2020). However, the heterogeneous nature of big water data makes it difficult to store, handle, and process. Thus, to manage water resources effectively and sustainably, the proper application of advanced water analytics is one of the prime necessities of the current decade.

A key contribution of this chapter is a basic overview of big water data, covering its characteristics, applications, and challenges, together with the open platforms and proposed models that support it. Future directions related to the integration of various platforms are also provided.

2 Big Water Data and Associated Characteristics

Technological advancement facilitates the constant acquisition and processing of data at an unprecedented rate, which can be further managed with readily available software and hardware. This capability can be inclusively termed “big data” (Adamala 2017). Big data is commonly characterized by the following parameters. (i) Volume: As the name suggests, volume is the quantity of data generated, processed, and stored. In the twenty-first century, data generation is constantly increasing, so that big data sizes are reported in multiples of terabytes and petabytes. Distributed systems, rather than traditional database technology, are used for the storage, handling, and processing of this bulk data (Schroeck et al. 2012). (ii) Velocity: The speed with which generated data can be transferred and processed is known as velocity. At present, streaming data (collected in real time) is one of the leading edges of big data. Modern applications and software enable generated data to be sorted, transmitted, and processed at a faster rate (David et al. 2014). (iii) Variety: The availability of different types of data represents variety. Water-related data is highly unstructured. Modern big data technology enables the simultaneous collection and use of structured and unstructured data. In water resources management, more effort is required to integrate all types of water data from different sections and sectors into one continuous data stream (Zikopoulos and Eaton 2011). (iv) Veracity: The quality or trustworthiness of water-related data is known as veracity. It is directly associated with health, since water is one of the primary necessities for the survival of living beings. In general, veracity is a measure of the accuracy of the data, and quality control is one of the important parameters to be considered for big data. (v) Value: This refers to the actionable insight gained from the generated data. Having access to big water data accomplishes little until the data have been converted into some value. In a water consumption survey, for example, the availability of data is not sufficient for decision making until the data are converted into a deliverable value. With the help of state-of-the-art models, software, and algorithms, large amounts of data can be converted into deductive information for final decision making (David et al. 2014; Madden 2012).
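As an illustration of the veracity characteristic, the following minimal sketch applies a simple range check to sensor readings. It is only one possible form of quality control. The `Reading` class, the `split_by_veracity` function, the parameter names, and the plausibility limits are all hypothetical and chosen for illustration; real limits depend on the sensor and the site.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    sensor_id: str
    parameter: str
    value: float


# Hypothetical plausibility limits, assumed for this example only.
PLAUSIBLE_RANGE = {
    "pH": (0.0, 14.0),
    "turbidity_NTU": (0.0, 1000.0),
}


def split_by_veracity(readings):
    """Return (accepted, rejected) lists using a simple range check."""
    accepted, rejected = [], []
    for r in readings:
        limits = PLAUSIBLE_RANGE.get(r.parameter)
        if limits is not None and limits[0] <= r.value <= limits[1]:
            accepted.append(r)
        else:
            # Unknown parameter, or value outside the plausible range.
            rejected.append(r)
    return accepted, rejected


sample = [
    Reading("S1", "pH", 7.2),
    Reading("S1", "pH", 19.5),  # outside 0 to 14: likely a sensor fault
    Reading("S2", "turbidity_NTU", 3.1),
]
accepted, rejected = split_by_veracity(sample)
```

A range check catches only gross errors; drift or subtle bias would need other methods, which are not sketched here.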

3 Big Data Analytical Methods

Big data analysis can inform the decision-making process in numerous areas, such as environmental, natural disaster, and resources management. A number of big data analytical methods are used to infer a value or decision from the acquired big data (Chen et al. 2012; Manyika 2011). The most important of them are listed with their characteristics in Table 3.1.

Table 3.1 Big data analysis methods and their characteristics

4 Big Data and Water Resources Management

Over the last century, water use has grown at more than twice the rate of population growth, which makes water one of the most precious resources of the present decade. This has also increased the importance of managing water resources effectively through big data analytics. The five major “V” characteristics of big data (shown in Fig. 3.1) can help in properly understanding and managing these scarce water resources.

Fig. 3.1 Characteristics of big data

4.1 Types of Water Data and Data-Sharing Methodologies

A diverse set of information that addresses the environmental, physical, ecological, social, economic, cultural, and political parameters of water usage, availability, and accessibility is known as water data. Water data can be divided into five categories. (i) Water quality: The physical, biological, and chemical characteristics of water are often referred to as water quality, an important parameter in determining the potability of water. It is a measure of the condition of water, usually in reference to the requirements of some ecological process or anthropogenic purpose. A single measurement is not enough to identify water quality; a number of water characteristics must be measured. (ii) Water quantity: This is often regarded as the rate at which a volume of water moves downstream (Wanielista et al. 1997). (iii) Water use: This includes human consumptive use (e.g., per capita use), application by various sectors (e.g., agriculture, industry), environmental processes (e.g., evapotranspiration rates), and ecosystem services. (iv) Water extremes: These are hazard and natural disaster-related data, including drought and flood monitoring and weather data. (v) Water indicators: Such indicators are generally linked to a few common aspects of human or environmental health. Water indicators integrate other water-related data to provide a metric for water sustainability and utilization for human well-being (Sternlieb and Laituri 2010). Water data can be generated as primary and secondary data. Raw data collected on water quality and quantity are defined as primary data. Different methods are used to measure primary data, depending on the characteristics of the water and the availability of resources. Data that are derived from sensor or hydraulic measurements are known as secondary data. Primary data can be shared more easily than secondary data.

Water resources data is highly fragmented, as it is generated by a number of entities and warehoused in many locations. Because of this fragmentation, water data sharing is considered a barrier to realizing big data capabilities. The data fragmentation problem can be overcome by using three common methodologies: (i) one-to-one: As the name suggests, the data are generated by one entity and used for a single purpose. The most common examples are an academic research study and a contracted consulting project; (ii) one-to-many: In this case, data are generated by one entity and provided to many users for many purposes; and (iii) many-to-many: In this case, data are generated by many entities and provided to many users for many purposes.

4.2 Appositeness of Big Data to Water Resources

Science has always been driven by data, but with the advancement of technology, the focus has shifted from data to big data. Water resources, one of the significant fields of environmental science, increasingly present a big data problem. Big data helps in identifying suitable data to resolve problems that are difficult to address with traditional data. Some of the major applications of big data are highlighted here:

Irrigation, which requires an appropriate amount of water, depends mainly on a number of climatic factors as well as on crop and soil types. These data can be readily provided by automated sensors and continuous monitoring systems. Big data farming can improve efficiency by reducing water requirements. A variety of automated sensors, continuous monitoring systems, robotics, and computational technologies provide useful information on water quality, which helps in understanding the movement of chemicals. In addition, big data can help in monitoring floods, tsunamis, and droughts, as well as the melting of ice and related climatic problems.

In addition to the water resources applications discussed above, big data techniques have also been used in many other areas, such as oceanic applications (e.g., oil spill pollution detection), agriculture (e.g., food monitoring and security), urban planning, management, and sustainability, climate change (e.g., global warming, acid rain), energy assessment, disease problems, ecosystem assessment, and land development and use.

4.3 Limitations of the Big Water Data Analytics

Big data analytics helps in identifying, analyzing, and interpreting the available data for the proper management of water resources. At present, however, water resources systems in many developing countries are organized with the help of hydrological data. These data describe the accessibility and availability of potable water, from which the demand of current and future generations can be derived. Such conventional datasets often lead to ineffective planning, design, and functioning of water management schemes. The limitations listed below need to be overcome to obtain the full benefit of big data analytics.

Because of the large volume involved, the quality of stored and transmitted data is one of the major concerns in big data. Errors can be introduced at any stage, from initial data collection to final deposition. Most automated instruments are either battery operated or need some other kind of power supply, and a sudden power failure leaves a gap in the time series data. For example, data gaps commonly occur when water consumption is measured with smart water meters. Water resources quality data are complex to handle, store, and process because of their heterogeneous nature. Hence, modeling is still done using traditional simulation models supported by GIS data.
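The data-gap problem described above can be made concrete with a short sketch. Assuming a meter that is expected to report at a fixed interval, the hypothetical function `find_gaps` below scans a sorted list of timestamps and reports where readings are missing. The hourly interval and the sample timestamps are invented for illustration.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, expected_interval):
    """Return (gap_start, gap_end, missing_count) for each gap.

    Assumes `timestamps` is sorted in ascending order.
    """
    gaps = []
    for prev, curr in zip(timestamps, timestamps[1:]):
        delta = curr - prev
        if delta > expected_interval:
            # Number of expected readings that never arrived.
            missing = delta // expected_interval - 1
            gaps.append((prev, curr, missing))
    return gaps


# Hypothetical hourly smart-meter timestamps with readings lost at 03:00 and 04:00.
start = datetime(2024, 1, 1, 0, 0)
timestamps = [start + timedelta(hours=h) for h in (0, 1, 2, 5, 6)]
gaps = find_gaps(timestamps, timedelta(hours=1))
```

Detecting a gap is only the first step; whether to interpolate, flag, or discard the affected period is a separate decision that depends on how the data will be used.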

5 Big Water Data Platform Components and Structure

The conceptual framework of the big water data open platform for water resources management is shown in Fig. 3.2. It consists of nine blocks, as discussed below. (i) The first block, decision support tools, contains decision support techniques for resolving real-world difficulties. Because many techniques are available, the first difficulty lies in selecting the best decision method. (ii) The knowledge-based system deals with the collection and storage of water data and ultimately transfers that information to stakeholders, including professionals and experts in the field of water research. (iii) The third block, the geographic information system (GIS), is generally used to capture, store, analyze, and integrate complex hydrological data. ArcGIS, for example, collects maps, applications, and data and allows users to examine the data in order to reach sound conclusions quickly. (iv) The most important block of the big data platform is the big data analysis system, which consists of a number of tools to arrange, investigate, visualize, and extract useful water resources information from large quantities and varieties of datasets. It requires suitable technologies (such as big data computing, analytics, mining, and security) to process large quantities of data competently.

Fig. 3.2 Big data open platform for water resources management

(v) In the fifth block, simulation models, the data acquired from the GIS are linked and used to simulate water-associated problems by means of simulators and an interface. (vi) The sixth block of the big data platform, computation and processing, provides a collection of tools such as hydraulic and hydrological models and high-performance or grid computing. These tools help to advance water resources prediction. (vii) After the water data have been acquired and processed, the next important block is the communication system, which makes pertinent data and information available to achieve efficiency and effectiveness. (viii) The search engine, as the eighth block, enables users to find suitable information in the big water data warehouses. (ix) User interfaces, as the ninth and final block, help operators to formulate water resources problems by entering related data and displaying the obtained results and graphics.

6 Modern Big Data Cycle in the Context of Water Resources

Meaningful outputs must be drawn from the collected data in order to reach a final conclusion. In general, two main processes, data management and analytics, are used to extract meaningful results from big water data. Data management can be defined as the acquisition of data, its temporary storage, and its final preparation for analysis. Analytics refers to the methods used to investigate big data and obtain conclusive findings from it. Together, these processes are normally divided into five stages, as shown in Fig. 3.3. Data management is the first process, which needs to be performed after the big data have been acquired. In this process, structured data can be stored and retrieved using traditional methods such as data marts and data warehouses. Extract–transform–load (ETL) tools are used for the extraction, transformation, and loading of data into the final database.

Fig. 3.3 General classification of the big data processes
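The extraction, transformation, and loading of data mentioned under data management can be sketched in a few lines. The example below is a minimal illustration using only the Python standard library, not a description of any particular tool. The CSV content, column names, station labels, unit conversion, and table name are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a flow sensor network (one value is missing).
RAW_CSV = """station,timestamp,flow_l_per_s
A,2024-01-01T00:00,12.5
A,2024-01-01T01:00,
B,2024-01-01T00:00,8.0
"""


def extract(text):
    """Extract: parse the raw CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop incomplete records and convert L/s to m3/s."""
    cleaned = []
    for row in rows:
        if not row["flow_l_per_s"]:
            continue  # skip records with a missing measurement
        flow_m3_per_s = float(row["flow_l_per_s"]) / 1000.0
        cleaned.append((row["station"], row["timestamp"], flow_m3_per_s))
    return cleaned


def load(records, conn):
    """Load: write the cleaned records into the target database table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS flow "
        "(station TEXT, timestamp TEXT, flow_m3_per_s REAL)"
    )
    conn.executemany("INSERT INTO flow VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database for the example
load(transform(extract(RAW_CSV)), conn)
row_count = conn.execute("SELECT COUNT(*) FROM flow").fetchone()[0]
```

Keeping the three steps as separate functions makes it possible to test or replace each stage independently, for example by swapping the in-memory database for a data warehouse connection.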

Water engineers and scientists have used one or more of the analytical methods discussed in the section above for the modeling and management of water systems. Shafiee et al. (2018) proposed a framework for the flow of water data, as represented in Fig. 3.4. A number of sensors are installed in the environment to collect data. After proper data management, the stored data are fed into models for analysis and interpretation. In this system (Fig. 3.4), the water data lake collects data at every stage. Analytics handles and further processes the raw data and finally returns cleaned or forecasted data. Middleware pulls, aggregates, and formats data for a model. A wrapper provides communication capabilities to a model.

Fig. 3.4 Typical water data lake
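The roles of the data lake, analytics, middleware, and wrapper in Fig. 3.4 can be pictured with a toy sketch. It is not the implementation of Shafiee et al. (2018); every class, function, stage name, and number below is a hypothetical stand-in chosen to show how the pieces might hand data to one another.

```python
from statistics import mean


class DataLake:
    """In-memory stand-in for a water data lake that stores data from every stage."""

    def __init__(self):
        self.records = []

    def put(self, stage, payload):
        self.records.append({"stage": stage, "payload": payload})

    def get(self, stage):
        return [r["payload"] for r in self.records if r["stage"] == stage]


def analytics(raw_values):
    """Process raw readings and return cleaned data (here: drop missing values)."""
    return [v for v in raw_values if v is not None]


def middleware(lake):
    """Pull cleaned data from the lake, aggregate it, and format it for a model."""
    cleaned = [v for batch in lake.get("cleaned") for v in batch]
    return {"mean_demand": mean(cleaned), "n": len(cleaned)}


class ModelWrapper:
    """Gives any model function a uniform run() interface for communication."""

    def __init__(self, model_fn):
        self.model_fn = model_fn

    def run(self, inputs):
        return self.model_fn(inputs)


def toy_demand_model(inputs):
    # Hypothetical model: forecast is the mean demand plus 10 percent.
    return inputs["mean_demand"] * 1.1


lake = DataLake()
raw = [10.0, None, 12.0, 14.0]  # invented sensor readings, one missing
lake.put("raw", raw)
lake.put("cleaned", analytics(raw))
model_inputs = middleware(lake)
forecast = ModelWrapper(toy_demand_model).run(model_inputs)
lake.put("forecast", forecast)  # results also flow back into the lake
```

In this sketch the lake is written to at the raw, cleaned, and forecast stages, which mirrors the idea that the data lake collects data during every stage.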

7 Future Perspectives of Big Data for Water Resources Management

At present, a number of big data platforms related to water resources are available. Table 3.2 lists some of the common platforms with their major objectives, significance, and limitations. Based on these limitations, the prospective applications of big data in water resources management are highlighted below.

Table 3.2 Common big data platforms pertaining to the water resources with their major objectives, significance, and limitations

As mentioned above, big data techniques have demonstrated wide application in the decision-making process by predicting outcomes. However, despite having access to a broad range of data sources and technical resources, the water utility sector appears to make only partial use of them for enhancing water quality and source distribution. With high-density big data surveys, the risk of error also increases, mainly because instant processing techniques are not available. Hence, future big data research should not aim simply to obtain more and more data but should focus mainly on developing a new generation of smaller, cheaper, and more accurate sensors to produce real-time data. Integration techniques can help to improve decision making and the management of water resources. For example, machine learning, one of the analysis techniques, is able to extract accurate patterns and relationships from the data. At present, a number of models, methodologies, and techniques are accessible for the planning and management of water resources. However, none of them provides a convenient solution on its own.

8 Conclusion

With the advancement of computer science and Web technology, data generation is increasing in day-to-day life. These large datasets ultimately pose challenges in storage, handling, analysis, and interpretation. Water is one of the prime requirements for the survival of life and is progressively becoming a precious resource because of its inflated usage. Population growth and economic and industrial growth place stress on the available water resources. Similarly, climate change significantly affects water resources through its direct effects on important hydrological processes, namely precipitation and evaporation. With the help of big data, components of the environment such as water resources can be better managed.

The aim of this chapter is to present an overview of big water data and its associated characteristics, applications, and limitations. It also summarizes the open big data platforms and proposed models that support water resources. Readers can get a specific idea of the available models by referring to this chapter. It also highlights the future perspectives required for the proper use of big data techniques in water resources management. Despite the increasing importance of modeling in water resources management and planning, no single methodology or tool provides an acceptable solution. Hence, more research is required to develop a single but comprehensive methodology or tool. The basic available models are generally restricted to local or regional strategies, while the challenges are transdisciplinary and encompass knowledge from various science and engineering backgrounds.