1 Introduction

Rainfall is a key element of the hydrologic cycle and, hence, has great significance for our water resources, environment, and ecosystems. Therefore, it is vital to adequately understand the dynamics of rainfall (Sivakumar and Woldemeskel 2015; Ouallouche et al. 2018). However, rainfall shows significant variability in both time and space, which makes it extremely challenging to model and predict (De Michele and Bernardara 2005).

Over the past century, numerous approaches and mathematical models have been proposed and applied to model and predict rainfall (e.g., Johnson and Bras 1980; Folland et al. 1991; French et al. 1992; Toth et al. 2000; Sivakumar et al. 2001; Wong et al. 2003; Chau and Wu 2010; Wu et al. 2015; Ali et al. 2018, 2020; Danandeh Mehr et al. 2019; Diop et al. 2020). Despite the encouraging outcomes from such studies, our knowledge of the rainfall dynamics remains largely inadequate.

Recent developments in the field of complex systems science, especially complex networks (e.g., Watts and Strogatz 1998; Barabási and Albert 1999), seem to offer useful avenues to improve our understanding of the complex dynamics of rainfall. A network is a set of points (called nodes or vertices) connected by a set of lines (called links or edges). Applications of the concepts of complex networks for studying the dynamics of rainfall are gaining increasing attention (e.g., Scarsoglio et al. 2013; Jha et al. 2015; Sivakumar and Woldemeskel 2015; Jha and Sivakumar 2017; Naufan et al. 2018; Sun et al. 2018; Tiwari et al. 2019). A brief account of such studies is as follows.

Scarsoglio et al. (2013) analyzed the spatial dynamics of gridded global precipitation over a period of seventy years (1941–2010) using complex network theory. The annual precipitation network was built based on the linear correlation function to evaluate the possible links between nodes, and the network was investigated through topological properties of nodes, including degree centrality, betweenness centrality, clustering coefficient, and weighted average topological distance. The results revealed a wide range of spatial variability, with highly connected and barely connected regions. Sivakumar and Woldemeskel (2015) employed the clustering coefficient and degree distribution methods to analyze the spatial connections in monthly rainfall dynamics over a period of 68 years (1940–2007) in a rainfall network of 230 stations across Australia. The results indicated that the network was not a purely random graph but might be an exponentially truncated power-law network. Jha et al. (2015) applied network theory to examine the spatial connections in rainfall in two different areas in Australia (Western Australia and Sydney catchment) using the clustering coefficient. The clustering coefficient values were interpreted in terms of topographic factors and rainfall properties.

Jha and Sivakumar (2017) applied the concepts of complex networks to explore the properties of rainfall, in terms of the spatial links, temporal scale, and network size in the daily rainfall data. They employed the clustering coefficient method to find rainfall properties at six temporal scales (1, 2, 4, 8, and 16 days, as well as monthly) from a large number of stations in the Murray-Darling basin in Australia. The outcomes demonstrated that the nature of spatial connections changes with temporal scale for different thresholds. They also suggested that identification of a suitable threshold is essential to understand the connections in rainfall properties. Naufan et al. (2018) examined the spatial connections in rainfall at different temporal scales in the specific context of climate change. They analyzed gridded rainfall data outputs (during 1961–1990) from a regional climate model over Southeast Asia in different temporal scales. They suggested that the scale-free network is more fitting for very fine temporal scales and small-world network is more fitting for very coarse temporal scales, while their combination is more fitting for intermediate temporal scales.

Sun et al. (2018) examined the structures of rainfall and soil moisture extremes in Texas using complex network analysis, with focus on spatial correlation patterns in rainfall (P) and soil moisture (SM). Their analyses provided useful information on the dispersion, junctions, and concurrency of extremes in daily rainfall and SM–P coupling in this flood-prone region and could be used as a base for rainfall-runoff events. Tiwari et al. (2019) used complex networks to select neighbors in the inverse distance weighting (IDW) approach and reconstruct the rainfall data at a desired location. They proposed three types of inverse distance weighting methods (the nearest neighbor model, the linked neighbor model, and the clustered neighbor model) to study the spatial connections in daily rainfall data from 430 rain gauges located in the Murray-Darling Basin. The performance of the proposed models was evaluated using cross-validation. They concluded that, in a natural system, a traditional IDW may be more accurate than the network-based models but may not be completely efficient in accounting for the spatial rainfall variability.

While the outcomes of the above complex networks-based studies on rainfall are certainly encouraging, it is important to recognize one particular limitation of these studies and the opportunities that exist for further improvements. This is related to the way the construction of the nodes and links is done for the rainfall network for analysis. As Yasmin and Sivakumar (2018) pointed out, in their analysis of streamflow data using complex networks, a fundamental task in complex networks analysis is the construction of the network (of nodes and links). In this regard, Yasmin and Sivakumar (2018) explored the utility of the phase-space reconstruction concept (Packard et al. 1980), a basic concept in chaos theory, for construction of the (streamflow) network; see, for instance, Sivakumar (2017) for an extensive account of the phase-space reconstruction concept and chaos theory applications to rainfall (and other hydrologic time series). The phase-space reconstruction concept offers a “multi-dimensional” view, as opposed to the single-dimensional view offered by traditional network construction method based on time series. Instead of treating each streamflow time series as a node in the network, Yasmin and Sivakumar (2018) considered each point or vector within the multi-dimensional reconstructed phase space as a node. Yasmin and Sivakumar (2018) applied this new network construction approach for complex networks-based analysis of temporal streamflow dynamics of the monthly streamflow time series observed over a period of 53 years from 639 stations in the United States. They used the distribution of the strength of nodes for any given streamflow network or station (i.e. streamflow series) to identify the type of network associated with that station. They also used the node strengths for the different stations to classify the 639 stations.

Encouraged by the results reported by Yasmin and Sivakumar (2018), the present study applies the coupled phase space reconstruction–network construction approach to investigate the temporal dynamic behavior of rainfall in central Vietnam. Monthly rainfall data observed over a period of 38 years (1979–2016) from 14 rainfall stations in the Vu Gia Thu Bon River basin in central Vietnam are analyzed. Each station is considered as a network, giving a total of 14 networks. For each of these rainfall networks, the strength of the nodes is calculated to examine the network properties. Entropy values of the original rainfall time series are also calculated to obtain additional information on the properties of the rainfall dynamics.

The rest of this paper is organized as follows. First, Sect. 2 describes the concepts and methodology, including network construction using phase space reconstruction and procedure for calculation of strength and entropy. Section 3 provides details of the study area and the rainfall data considered for analysis. The analysis and results are presented in Sect. 4. Finally, Sect. 5 provides some conclusions.

2 Methodology

2.1 Network construction using phase space reconstruction and strength calculation

A network or a graph is a set of points that are connected by a set of lines. The points are called vertices or nodes and the lines are called edges or links. The existence/non-existence of links in a network is identified based on a measure that represents the strength of the links (e.g. distance or correlation between the nodes). For instance, node pairs that have strengths exceeding/below a certain threshold value may be assigned links.

With these basics, construction of the rainfall network to represent the temporal dynamics and the strength properties of the network are described below. The procedure involves three steps. First, the single-variable rainfall time series is represented in a multi-dimensional phase space using delay embedding (i.e. phase-space reconstruction). Next, this reconstructed phase space is treated as a ‘network,’ with the reconstructed vectors serving as the ‘nodes’ and the connections between them serving as the ‘links.’ Finally, the strength of each node in the network is determined using a distance metric (i.e. distance of a given node with every other node).

Let us assume a rainfall time series \( X_{i} \), where \( i = 1, 2, \ldots , N \), and the objective is to identify the temporal connections using network analysis. Here, for network construction, we adopt the method proposed by Yasmin and Sivakumar (2018), which offers a “multi-dimensional” view. We use the concept of phase space reconstruction (e.g. Packard et al. 1980), where a multi-dimensional phase space is reconstructed using only the single-variable time series \( X_{i} \) through delay embedding (e.g. Takens 1981), as follows:

$$ \varvec{Y}_{j} = \left( {X_{j} , X_{j + \tau } , X_{j + 2\tau } , \ldots ,X_{{j + \left( {m - 1} \right)\tau /\Delta t}} } \right) $$
(1)

where \( j = 1, 2, \ldots , N - \left( {m - 1} \right)\tau /\Delta t \); m is the dimension of the vector \( Y_{j} \) (embedding dimension); and τ is an appropriate delay time taken to be some suitable integer multiple of the sampling time Δt.
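As an illustration of Eq. (1), the following minimal sketch builds the delay vectors from a short series. It assumes that the delay is already expressed in sampling steps (i.e. `lag` stands for τ/Δt); the function name `delay_embed` and the toy values are hypothetical and are not taken from the study.

```python
def delay_embed(series, m, lag):
    """Return delay vectors Y_j = (X_j, X_{j+lag}, ..., X_{j+(m-1)*lag}).

    `lag` is the delay expressed in sampling steps (tau / delta_t).
    """
    if m < 1 or lag < 1:
        raise ValueError("m and lag must be positive integers")
    n_vectors = len(series) - (m - 1) * lag
    if n_vectors <= 0:
        raise ValueError("series is too short for this m and lag")
    return [tuple(series[j + k * lag] for k in range(m)) for j in range(n_vectors)]


# Toy series (illustrative values only, not data from the study).
x = [0.1, 0.4, 0.9, 0.3, 0.0, 0.7, 0.2, 0.5]
vectors = delay_embed(x, m=3, lag=2)
print(len(vectors))  # number of reconstructed vectors, N - (m - 1) * lag
print(vectors[0])    # first vector: elements 0, 2 and 4 of the series
```

Each tuple returned here corresponds to one node of the network in the construction described above.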

There are several methods and guidelines to choose an appropriate delay time \( \tau \) for phase space reconstruction, including autocorrelation function method, mutual information method, and correlation integral method. In this study, we use the average mutual information method (Fraser and Swinney 1986). This method defines how the measurements \( X_{i} \) at time i are connected, in an information-theoretic fashion, to measurements \( X_{i + \tau } \) at time \( i + \tau \) (Abarbanel 1996). The average mutual information (AMI) is defined as:

$$ I\left( \tau \right) = \mathop \sum \limits_{{X_{i} ,X_{i + \tau } }} P\left( {X_{i} ,X_{i + \tau } } \right)\log_{2} \left[ {\frac{{P\left( {X_{i} ,X_{i + \tau } } \right)}}{{P\left( {X_{i} } \right)P\left( {X_{i + \tau } } \right)}}} \right] $$
(2)

Here, \( P\left( {X_{i} } \right) \) and \( P\left( {X_{i + \tau } } \right) \) are the individual probabilities of the measurements \( X_{i} \) and \( X_{i + \tau } \), and \( P\left( {X_{i} ,X_{i + \tau } } \right) \) is the joint probability density for the measurements \( X_{i} \) and \( X_{i + \tau } \). The appropriate delay time \( \tau \) is defined as the first minimum of the average mutual information \( I\left( \tau \right) \). There are many methods to estimate the optimal embedding dimension (\( m_{opt} \)), such as the correlation dimension method and the false nearest neighbor algorithm. In this study, the false nearest neighbor (FNN) algorithm, proposed by Kennel et al. (1992), is used to determine the optimal embedding dimension (\( m_{opt} \)) for each of the 14 rainfall time series.
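A minimal sketch of how Eq. (2) and the first-minimum criterion might be evaluated is given below. It assumes that the probabilities are estimated from equal-width histogram bins; the number of bins (`n_bins`), the function names, and the rule that a lag of 1 counts as the first minimum when \( I(1) \le I(2) \) are illustrative assumptions, since the estimator used in the study is not specified here.

```python
import math
from collections import Counter


def average_mutual_information(series, lag, n_bins=10):
    """Histogram estimate of I(lag), in bits, between X_i and X_{i+lag}."""
    lo, hi = min(series), max(series)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant series

    def bin_of(value):
        return min(int((value - lo) / width), n_bins - 1)

    pairs = [(bin_of(a), bin_of(b)) for a, b in zip(series, series[lag:])]
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    return sum(
        (c / n) * math.log2((c / n) / ((left[a] / n) * (right[b] / n)))
        for (a, b), c in joint.items()
    )


def first_minimum_lag(series, max_lag, n_bins=10):
    """Return (lag, ami_values), where lag is the first local minimum of I(lag).

    Assumption: lag 1 is accepted as the first minimum when I(1) <= I(2).
    Returns (None, ami_values) if no minimum is found up to max_lag.
    """
    ami = [average_mutual_information(series, lag, n_bins)
           for lag in range(1, max_lag + 1)]
    for k in range(len(ami) - 1):
        decreasing_into_k = k == 0 or ami[k] < ami[k - 1]
        if decreasing_into_k and ami[k] <= ami[k + 1]:
            return k + 1, ami  # lags are 1-based
    return None, ami


# Toy alternating series (illustrative only).
toy = [0.0, 1.0] * 50
tau, ami_values = first_minimum_lag(toy, max_lag=5)
```

The histogram estimate is sensitive to the number of bins, so in practice the selected delay should be checked for a few bin choices.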

With this phase space reconstruction, for \( m_{opt} \), the distances (e.g. the Euclidean distances) between any two nodes i and j, denoted as \( d_{ij} \), can be calculated. Once the distances \( d_{ij} \) are obtained, the strength of the node j can be calculated as follows:

$$ S_{j} = \frac{1}{{\frac{1}{{N_{d} }}\mathop \sum \nolimits_{{\begin{array}{*{20}c} {i = 1} \\ {i \ne j} \\ \end{array} }}^{{N_{d} }} d_{ij} }} $$
(3)

where \( N_{d} \) is the total number of node distances.
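The calculation in Eq. (3) can be sketched as follows. The sketch assumes Euclidean distances and takes \( N_{d} \) for a given node to be the number of other nodes (i.e. the number of distances entering the sum); this reading of \( N_{d} \), the function name `node_strengths`, and the toy nodes are assumptions made for illustration.

```python
import math


def node_strengths(vectors):
    """Strength of each node: reciprocal of its mean distance to all other nodes."""
    n = len(vectors)
    if n < 2:
        raise ValueError("need at least two nodes")
    strengths = []
    for j, vj in enumerate(vectors):
        total = sum(math.dist(vj, vi) for i, vi in enumerate(vectors) if i != j)
        mean_distance = total / (n - 1)  # assumption: N_d = number of other nodes
        # Identical nodes give a zero mean distance; report infinite strength.
        strengths.append(1.0 / mean_distance if mean_distance > 0 else math.inf)
    return strengths


# Three toy nodes on a line (illustrative only).
nodes = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
s = node_strengths(nodes)
print(s)  # the middle node is closest to the others and has the largest strength
```

With this definition, a node lying close to many other nodes in the reconstructed phase space has a high strength, while an isolated node has a low strength.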

2.2 Entropy measure

The average uncertainty and randomness associated with the probability distribution of a series can be quantitatively measured using entropy theory (Kawachi et al. 2001; Mishra et al. 2009). Several entropy measures, such as Shannon, approximate, Renyi, and Tsallis, exist to quantify the randomness in a system. According to Remya and Unnikrishnan (2010), the best entropy for detecting dynamical complexity is the Tsallis entropy. The Tsallis entropy is given by:

$$ S_{q} \left( {P_{i} } \right) = \frac{1}{q - 1} \left( {1 - \mathop \sum \limits_{i} P_{i}^{q} } \right) $$
(4)

where \( q \) is a real number and \( P_{i} \) represents the probability of an event. Taking the limit as \( q \to 1 \), we obtain the Shannon entropy as:

$$ S = S_{1} \left( P \right) = - \mathop \sum \limits_{i} P_{i} \ln P_{i} $$
(5)

The highest value of entropy occurs for uniform probability distribution. The entropy value decreases with the increasing number of constraints and increases with their decreasing number (Kawachi et al. 2001).
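The two entropy measures in Eqs. (4) and (5) can be sketched as below. The sketch assumes that the probabilities \( P_{i} \) are estimated as relative frequencies in equal-width bins; the binning, the number of bins, and the function names are illustrative assumptions rather than the procedure used in the study.

```python
import math
from collections import Counter


def bin_probabilities(series, n_bins=10):
    """Relative frequencies of the series values in equal-width bins (assumed estimator)."""
    lo, hi = min(series), max(series)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant series
    counts = Counter(min(int((v - lo) / width), n_bins - 1) for v in series)
    return [c / len(series) for c in counts.values()]


def shannon_entropy(probs):
    """Shannon entropy with natural logarithm, Eq. (5)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def tsallis_entropy(probs, q):
    """Tsallis entropy, Eq. (4); falls back to the Shannon limit at q = 1."""
    if q == 1:
        return shannon_entropy(probs)
    return (1.0 - sum(p ** q for p in probs)) / (q - 1.0)


uniform = [0.25, 0.25, 0.25, 0.25]
peaked = [0.85, 0.05, 0.05, 0.05]
print(shannon_entropy(uniform), shannon_entropy(peaked))
print(tsallis_entropy(uniform, q=2), tsallis_entropy(uniform, q=1.0001))
```

For a fixed number of outcomes, the uniform distribution gives the largest value, and the Tsallis value approaches the Shannon value as q approaches 1, in line with the statements above.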

3 Study area and data

The Vu Gia Thu Bon River basin lies between latitudes 14° 55′ N and 16° 04′ N and longitudes 107° 15′ E and 108° 20′ E in central Vietnam. This river basin is located in the Da Nang and Quang Nam provinces and has an area of 10,350 km². The topography of this region varies from narrow mountainous terrain in the upstream region to flat coastal areas in the downstream region. According to DEM-SRTM-90 m data (Digital Elevation Model derived from Shuttle Radar Topography Mission with 90 m spatial resolution) (Jarvis et al. 2006), the altitude of the study area ranges from −14 m to 2583 m MSL (Fig. 1).

Fig. 1 Distribution of hydro-meteorological stations in the Vu Gia Thu Bon catchment, central Vietnam

The total annual rainfall in this basin is relatively high, varying from 2000 to 4000 mm, and increases from north to south. The rainy season spans 4 months from September to December and provides around 70–75% of the total rainfall. The dry season lasts 8 months from January to August and accounts for around 25–30% of the total rainfall. Approximately two to four typhoons strike this region every year, which bring abundant rainfall. Flooding is also a significant challenge for water resources management of this river basin (Souvignet et al. 2014). Drought is very frequent during the dry season (Vu et al. 2017). There has also been significant saline intrusion in coastal areas, which leads to water shortage problems (Ribbe et al. 2017).

Rain-gauge stations in the Vu Gia Thu Bon River basin are sparse. There are only 14 rain gauges in this area (Fig. 1). Table 1 presents a general description of the 14 stations in the basin. For this study, measured monthly rainfall data from these stations over a period of 38 years (1979–2016) are considered. These data have been provided by the Vietnam National Centre for Hydro-Meteorological Forecasting (NCHMF). Figure 2 presents the monthly rainfall time series from these stations for the period 1979–2016. The rainfall time series show remarkable variations over the study period. Some of the key statistics (mean, maximum, and coefficient of variation) of the rainfall data from these stations are presented in Table 1.

Table 1 General description of the 14 stations in the Vu Gia Thu Bon catchment considered in this study

Fig. 2 Rainfall time series of the 14 stations across central Vietnam considered in this study

4 Results and discussion

To examine the strength of the nodes in the rainfall networks and the entropy of the rainfall time series, the 14 monthly rainfall time series from central Vietnam are analyzed using the methods described in Sect. 2. Each monthly rainfall time series is normalized between 0 and 1, and all analyses are performed on the normalized data. For the strength calculation, the phase space is first reconstructed from the normalized rainfall series and the resulting network is then analyzed. The entropy is likewise calculated from the normalized rainfall data.
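One common way to normalize a series between 0 and 1 is min-max scaling, sketched below. The text does not state which normalization was used, so min-max scaling and the function name `min_max_normalize` are assumptions.

```python
def min_max_normalize(series):
    """Rescale a series linearly so that its minimum maps to 0 and its maximum to 1."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0 for _ in series]  # constant series: no spread to rescale
    return [(v - lo) / (hi - lo) for v in series]


# Toy monthly totals in mm (illustrative only).
normalized = min_max_normalize([10.0, 20.0, 30.0])
print(normalized)
```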

4.1 Delay times and embedding dimensions

As mentioned earlier, the phase space reconstruction of the rainfall time series depends largely on the selection of delay time (\( \tau \)) and embedding dimension (m). Here, the average mutual information (AMI) method, which determines both linear and nonlinear relationships between the series, is used to find the optimal \( \tau \). With the selection of the optimal \( \tau \), the optimal embedding dimension for phase space reconstruction of each rainfall time series is determined using the false nearest neighbor (FNN) algorithm. Table 2 presents the optimal delay time values and embedding dimensions obtained for the 14 rainfall time series.
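The idea behind the FNN step can be sketched as follows. This is a simplified version that uses only the distance-ratio criterion of the false nearest neighbor test with a brute-force neighbor search; the tolerance `r_tol`, the `cutoff` on the fraction of false neighbors, and the function names are assumed values and names for illustration, not settings reported in this study.

```python
import math


def false_nearest_fraction(series, m, lag, r_tol=10.0):
    """Fraction of nearest neighbours in dimension m that separate in dimension m + 1."""
    n = len(series) - m * lag  # every vector needs a valid (m + 1)-th coordinate
    if n < 2:
        raise ValueError("series is too short for this m and lag")
    vectors = [tuple(series[j + k * lag] for k in range(m)) for j in range(n)]
    false_count = 0
    for j in range(n):
        # Brute-force nearest neighbour of node j in dimension m.
        nn, dist = min(
            ((i, math.dist(vectors[j], vectors[i])) for i in range(n) if i != j),
            key=lambda pair: pair[1],
        )
        # Separation contributed by the additional coordinate in dimension m + 1.
        extra = abs(series[j + m * lag] - series[nn + m * lag])
        if (dist == 0 and extra > 0) or (dist > 0 and extra / dist > r_tol):
            false_count += 1
    return false_count / n


def optimal_embedding_dimension(series, lag, max_m=10, cutoff=0.01, r_tol=10.0):
    """Smallest m whose false-neighbour fraction is at or below `cutoff` (None if not reached)."""
    for m in range(1, max_m + 1):
        if false_nearest_fraction(series, m, lag, r_tol) <= cutoff:
            return m
    return None


# Toy periodic signal (illustrative only): one dimension is not enough to unfold it.
signal = [math.sin(0.3 * t) for t in range(300)]
f1 = false_nearest_fraction(signal, m=1, lag=5)
f2 = false_nearest_fraction(signal, m=2, lag=5)
m_opt = optimal_embedding_dimension(signal, lag=5, max_m=4, cutoff=0.05)
```

For the toy sine signal, many neighbors in one dimension are false because rising and falling limbs share the same values, whereas two dimensions separate them.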

Table 2 Optimal delay times and embedding dimensions of the rainfall time series

4.2 Network strength

Figure 3 shows the strengths of the nodes for each of the 14 rainfall networks across the study area. As seen, for each network, the strengths of the nodes exhibit noticeable variations, somewhat similar to those exhibited by the rainfall time series. One particular observation, which may perhaps be expected, is that the stations with smaller delay time (\( \tau \)) values for the rainfall series, such as the Cam Le (S2), Cau Lau (S3), Giao Thuy (S4), Ai Nghia (S5), and Tien Phuoc (S12) stations (all having a delay time value of 1), which are located in the north-east and eastern parts of the study area, have less variability in the strengths of the nodes.

Fig. 3 Strengths of nodes for the 14 rainfall networks

To obtain the frequency distribution of the strengths of nodes for the 14 rainfall networks, we apply a smooth kernel distribution, using the SmoothKernelDistribution function of the Mathematica software (by default, it uses the Gaussian kernel and applies Silverman’s rule to determine the bandwidth or kernel radius). Figure 4 shows the frequency distribution of the strengths of nodes for the 14 rainfall networks. As may be seen, rainfall time series with low optimal embedding dimensions, such as those from Stations 1, 2, 3, 6, 8, 9, and 10, seem to have left-skewed density plots.
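For readers without access to Mathematica, a Gaussian kernel density estimate with a rule-of-thumb bandwidth can be sketched in a few lines. The bandwidth formula below, \( h = 0.9\,\min (\sigma ,\mathrm{IQR}/1.34)\,n^{-1/5} \), is one commonly quoted form of Silverman's rule and is an assumption here; it may differ in detail from what SmoothKernelDistribution computes. The function names and the sample values are hypothetical.

```python
import math
import statistics


def silverman_bandwidth(sample):
    """Rule-of-thumb bandwidth: 0.9 * min(sd, IQR / 1.34) * n^(-1/5) (assumed form)."""
    n = len(sample)
    sd = statistics.stdev(sample)
    q1, _, q3 = statistics.quantiles(sample, n=4)
    spread = min(sd, (q3 - q1) / 1.34) if q3 > q1 else sd
    return 0.9 * spread * n ** (-0.2)


def gaussian_kernel_density(sample, bandwidth=None):
    """Return a function x -> estimated density, using Gaussian kernels."""
    h = bandwidth if bandwidth is not None else silverman_bandwidth(sample)
    norm = 1.0 / (len(sample) * h * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in sample)

    return density


# Hypothetical node strengths (not values from the study).
strengths = [5.6, 6.1, 6.4, 7.0, 7.2, 8.5, 9.9, 10.2]
pdf = gaussian_kernel_density(strengths)
grid = [-5.0 + 0.01 * k for k in range(2501)]
area = sum(pdf(x) for x in grid) * 0.01  # Riemann sum; should be close to 1
```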

Fig. 4 Frequency distribution of strengths of nodes for the 14 rainfall networks

4.3 Strength and entropy distribution across the study area

Figure 5a, b presents the average strength of the nodes of each of the 14 rainfall networks and the entropy value for each of the 14 rainfall time series, respectively. Figure 5c, d presents the Voronoi maps for the same. On the one hand, the results seem to indicate that some adjacent stations, especially in the north and northeast of the study area, have the same range of strength values (e.g., the Hoi Khanh and Thanh My stations in the north, and the Ai Nghia and Giao Thuy stations in the northeast). On the other hand, several stations that are geographically close show significantly different network strengths (e.g., the Que Son and Nong Son stations in the central part of the study area); this may be attributed to the topography and elevation of the rain gauges. In the case of entropy, similar values for adjacent stations are found more frequently than similar average strengths, especially in the northeastern part of the study area. There is a high entropy trend in the central part of the study area from the north to the west that may be attributed to the rainfall patterns. Therefore, the node strengths and entropy values can be helpful not only for analyzing the temporal connections in rainfall but also for the classification of rainfall stations, even in small areas.

Fig. 5 a Average strength of nodes for the rainfall networks, b entropy of the rainfall time series, c Voronoi map for the strength of the nodes for the rainfall networks and d Voronoi map for the entropies of the rainfall time series

4.4 Mean strength versus mean rainfall, entropy and elevation

To obtain further information on the dynamic properties of rainfall in the study area, the following relationships are examined: (a) average node strength of the temporal network versus mean rainfall, (b) average node strength versus entropy of the rainfall time series, and (c) average node strength versus elevation of the rain gauge station. Figure 6a–c presents these relationships. The following major observations are made from these results.

  1. The station with the highest mean rainfall (Que Son, with a mean monthly rainfall of 345.5 mm) has the maximum mean strength (10.2) and also has the maximum entropy (3.05). However, significant variations in the mean strength of nodes (from 5.62 to 9.89) are observed when the mean rainfall is low (less than 200 mm).

  2. Generally, there seems to be a positive and proportional relationship between the average strengths of nodes of the network and entropies of the rainfall time series, since stations with high average node strength values have high entropy.

  3. Stations with high elevations (about 400 m or more) generally have low average node strengths (about 6 or less), but stations with low elevations (less than 100 m) have significantly varying ranges of node strengths (from as low as 5.62 to as high as over 10).

Fig. 6 Relationship between strength of node in the temporal rainfall network and other rainfall/station properties: a average node strength versus average rainfall, b average node strength versus entropy of rainfall series, and c average node strength versus elevation of station

4.5 Mean strength versus month

Figure 7 presents the average node strength for different months of the year for the 14 stations. The results indicate that there are some distinct temporal patterns in the node strengths (Fig. 8), which are highlighted as follows.

  1. There is a cyclic pattern of strength, with an increasing strength from the first month (January) to the third month (March) and then a decreasing strength for the fourth month (April), for Stations 7 (Hien), 8 (Thanh My), 9 (Nong Son), 11 (Hiep Duc), 13 (Kham Duc), and also, to some extent, for Station 6 (Hoi Khanh). This observation seems to be consistent with the seasonal patterns, with a delay time of 3 (\( \tau \) = 3), obtained for the rainfall time series. These stations are also located in the central and western parts of the study area.

  2. There is a 4-month changing pattern in the node strengths for Stations 1 (Da Nang) and 14 (Tra My), which is due to the delay time of these stations (\( \tau \) = 4) in the phase space.

  3. Station 10 (Que Son) (\( \tau \) = 6), with the highest mean rainfall and entropy, has a 6-month changing pattern.

  4. For Stations 2 (Cam Le), 3 (Cau Lau), 4 (Giao Thuy), 5 (Ai Nghia), and 12 (Tien Phuoc), located in the east and northeast parts of the study area, there is no clear temporal pattern; these stations have the lowest delay time (\( \tau \) = 1).

The results seem to indicate the importance of optimal delay time detection in recognizing the temporal patterns of rainfall stations regarding the strengths of nodes.
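The month-wise averaging of node strengths can be sketched as below. The sketch assumes that each node is assigned to the calendar month of the first component of its delay vector and that the series starts in January; how nodes were assigned to months in the study is not stated, so this mapping and the function name are assumptions.

```python
from collections import defaultdict


def mean_strength_by_month(strengths, start_month=1):
    """Average node strength per calendar month (1 = January).

    Assumption: node j belongs to the month of the first element of its
    delay vector, i.e. month (start_month + j) counted cyclically.
    """
    groups = defaultdict(list)
    for j, s in enumerate(strengths):
        month = (start_month - 1 + j) % 12 + 1
        groups[month].append(s)
    return {month: sum(v) / len(v) for month, v in sorted(groups.items())}


# Two years of hypothetical node strengths (illustrative only).
toy_strengths = [float(k) for k in range(24)]
monthly = mean_strength_by_month(toy_strengths)
print(monthly[1], monthly[12])  # January and December averages
```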

Fig. 7 Average node strengths for different months in 14 stations across central Vietnam

Fig. 8 Temporal patterns of strengths considering different months for 14 stations

5 Conclusions

This study applied the concepts of complex networks to examine the temporal dynamics of rainfall observed at 14 stations in central Vietnam. For each of the 14 rainfall time series, phase space was reconstructed and the reconstructed vectors were used for network construction, i.e. nodes and links. For phase space reconstruction, the optimal delay time was obtained using the average mutual information method and the optimal embedding dimension was obtained using the false nearest neighbor algorithm. With such network construction, the strengths of the nodes were determined. The entropy values of the rainfall time series were also calculated.

The results showed that stations with smaller delay time values (\( \tau \)), which also happen to be in the north-east and eastern parts of the study area, have less variability in the strengths of nodes. The results also indicated that some adjacent stations have somewhat similar strength values; however, several stations that are geographically close show significantly different network strengths. Stations with low embedding dimensions seem to have left-skewed distributions of node strengths. In the case of entropy, similar entropies for adjacent stations were found more frequently than similar average node strengths, especially in the northeastern part of the study area. There is a high entropy trend in the central part of the study area from the north to the west that may be attributed to the rainfall patterns of the study area.

Analysis of the relationships between average node strength and mean rainfall, entropy of the rainfall time series, and station elevation indicated that (1) the station with the highest mean rainfall has the maximum mean strength and the maximum entropy as well; (2) in general, there is a positive and proportional relationship between average strengths of nodes and entropies; and (3) the average strengths are low for stations with high elevations, but they have a wide range for stations with low elevations. Analysis of the average node strengths of different months for the 14 stations indicated some distinct temporal patterns (3-month, 4-month, and 6-month patterns) in rainfall dynamics, depending upon the region of the study area. These results indicate the importance of optimal delay time selection for phase space reconstruction and network construction, and for recognizing the temporal patterns of rainfall stations.