Abstract
Large distributed databases are split into fragments stored on far distant nodes that communicate through a communication network. Query execution requires data transfers between the processing sites of the system. In this paper we propose a solution for minimizing raw data transfers by re-arranging and replicating existing data within the constraints of the original database architecture. The proposed method gathers incremental knowledge about data access patterns and database statistics to solve the following problem: online re-allocation of the fragments in order to constantly optimize the query response time. We model our solution as a transport network and show in the final section the experimental numerical results we obtain by comparing the improvements obtained between various database configurations, before and after optimization.
Access provided by CONRICYT-eBooks. Download conference paper PDF
Similar content being viewed by others
1 Introduction
Let us consider a distributed database with a set of fragments/shards F stored on a set of sites S in a communication network. A set Q of applications/queries is executed against the database. We start from the assumptions that in order to minimize the query execution time, the data transferred needs to be minimized. Thus the data allocation to the sites of the system needs to be implemented such that data transfer during query execution is minimized. This paper assumes that the fragments have been already determined and (eventually) allocated, and focuses on the problem of allocating (re-allocating) them in order to minimize the total cost of data transmission. In practice, the optimal initial allocation of fragments is not possible without apriori knowledge of the applications running on the database. Our solution relaxes this requirement by allowing an initial allocation unaware of the applications querying data. It then optimizes the allocation by observing, at the database level, the access patterns incurred by the queries and performing redundant re-allocation.
2 Related Work
Many aspects of the data allocation problem have been studied in the literature. Reid and Orlowska [6] studied the communication cost minimization problem while replica allocation modeled as an integer linear programming with minimizing the execution cost has been approached in [1, 3, 6].
Menon in [7] considered non-redundant allocation. This paper focuses on redundant allocation. Wiese in [13] use clustering and clustered attributes. Huang and Chen in [5] propose a simple and comprehensive model that reflects transaction behavior in distributed databases.
The fragment allocation problem is NP-complete [9]. A genetic algorithm is proposed in [12], a genetic search-based clustering in [2] and an evolutionary approach in [4].
A reinforcement learning solution for allocating replicated fragments is presented in [8].
3 Query Evaluation and Data Transfer
Let A(q) be the query evaluation tree for query q. We add a root node to this tree and we obtain a sub-tree rooted in the new node that corresponds to the entire query q (the new root represents the overall q query). Leaf nodes in A(q) correspond to fragments, while internal nodes represent relational operators (unary or binary). When evaluating an operator op from q we get a transfer cost for data from the nodes where each operand of op is evaluated/stored to the node where op is evaluated.
In the next paragraphs we will use the following notations: \(F=\{f_i | i=\overline{1,n}\}\) - fragment set of the database, \(df_i = dim(f_i)\) - size of fragment \(f_i, i=1,n\), \(S = \{s_i | i=\overline{1,m}\}\) - the sites of the system where fragments of F are stored, S(f) - sites of the system where a fragment \(f, f \in F\) is stored, F(s) - fragments stored on site \(s\in S\).
Starting from a predefined (current) state of the database (fragments, sites, current fragment allocation), in [10] we attach two values to each node of the query tree A(q): d and c as follows: d - the size of data associated to the site (fragment size if the current node is a leaf node, or an estimation of the relational operator result size for internal nodes), and the costs vector \(c = (c_1, \ldots , c_m)\) (m is the number of sites) of evaluating the query on all sites. For leaf nodes this equates to the size of the fragment or zero. For an internal node corresponding to an operator op, \(c_i\) is the minimal cost of the required data transfers when the operator op is evaluated on site \(s_i\). See our previous work [10].
The following paragraphs describe the computation method for the vector c in the case of the two possible cases: an unary and a binary operator.
Let op be an unary/binary operator and its current operand(s) A(B) with its associated values: \(d_A\) and \(c_A=[c_1^A, \ldots ,c_m^A]\). The incurred data transfer in the evaluation of op on site \(s_i\) depending on the location of operand(s) is given by the bellow expression (binop=1 for binary operators and 0 otherwise):
We show in Fig. 1 the A(q) tree built for some real values of the fragments size and results of the relational operators.
Using (1), we can compute the values for the vector c associated to query q - that is the root of the A(q) tree in Fig. 1. Query q is executed on a specific site of the system. The vector c that labels the root of the A(q) tree provides the minimal cost of the data transfers during the execution of query q. In the following we will analyze the required data transfer for the two possible cases: when f is a sub-tree of q, or f is used in a combination of unary operators. These cases are depicted in Figs. 2 and 3.
The two cases described above are valid for a sub-tree of the binary operator op. The same applies for the second sub-tree, corresponding to the second operand. If the fragment used by the second sub-tree is also stored on site \(s_i\), then \(c_i^{op}\) will be null. If all fragments used by a query q are stored on the site \(s_i\) where the query is evaluated, then \(c_i^{op}=0\) for all operator nodes from A(q).
Using the above analysis we can infer that by storing the fragment f on a site where the query accessing f is executed (let this site be s), we can reduce the data transfer cost by an amount r - equal to the size of fragment f or with the size of the result of the last unary operator applied to f, but before a binary operator applied to f - i.e. the first unary operator applied to f appearing strictly before a binary operator on f, if such exists. If fragment f appears multiple times in the evaluation tree A(q) (as for example is the case for \(B_1\) in Fig. 1), then the data transfer is reduced on all accesses to fragment f.
Given a distributed database and a time interval, we denote by Q the set of observed queries that are executed against the database and their access patterns in the given time period. Information about the access patterns is stored by the database’s statistical module in views and can be retrieved for analysis. We should note that apriori knowledge about the database queries is not needed (as for the case of fragmentation). Instead we retrieve statistical observed information about operators, their evaluation and operator to query membership relations from the database statistics.
In order to speed up the evaluation of a query \(q \in Q\) on a site \(s \in S\), we can infer a set of replication hints for the fragments accessed by q, denoted as a triple (f, s, c) and signifying that by storing fragment f on site s we can reduce the data transfer cost by c. Since query q can be observed running a number \(r, r \ge 1\), of times on site s, then by using the replicas according to the replication hints the data transfer costs are reduced by an amount of \(r *c\). Considering the replication hints (f, s) proposed by all queries \(q \in Q\) and the amount of reduction in data transfer cost for the resulting fragment storage policy we obtain a set of replication hints for Q denoted as:
In the following we assume that the database dictionary after an observed running interval contains information about: fragments, fragment allocation, queries and fragments accessed by a query. Suppose that this information is made available throughout computed views like an usual database would.
4 Induced Fragment Replication
Let \(s_i\in S\) be a site containing some fragments of the database. Let \(ds_i\) be the available memory space on site \(s_i\). We can only store new fragments within the limits of the available memory space. In the trivial case, if the available memory is infinite, the solution to the replication problems is total replication where the data transfer cost is null for any query. When the available memory space is limited our proposed replication model needs to find the optimal set of replicas within the memory space constraint such that data transfer cost is minimal for the overall set of queries in Q. v If the size of a fragment \(f_i \in F\) is \(df_i\) and the available memory/storage space on site \(s_j \in S\) is \(ds_j\), then we propose a solution modeled as a transport network compatible flow problem.
A replication hint \((f_i, s_j, c_{ij})\) as mentioned in (2) has two possible implementation choices: to be retained/applied or dismissed. We introduce a new variable \(r_{ij}, r_{ij} \in \left\{ 0,1\right\} \) that denotes the above possibilities.
4.1 Network Flow Solution
When modeling the induced replication as a network flow problem we have two possible options: a global variant for the whole set S of sites, or individually for each site \(s \in S\). We will describe the solution model for the former variant.
Given all the above described elements we propose a transport network denoted as:
where: V - is the vertex set, \(V=F \cup S \cup \left\{ start, fin \right\} \). We add two new vertices: start and fin; A - the set of edges of the graph; lo and up correspond to the lower and upper bound, while co is a set of functions that associates a real non-negative value to each edge;
The set of edges A and the functions lo, up, co are defined as following, where |S| denotes the number of sites:
A flow in the above transport network N is a real function: \(fl: A \longrightarrow \mathfrak {R}\) having the following properties:
-
(1)
Capacity Restrictions: \(fl(a)=0\; \text {or}\; lo(a) \le fl(a) \le up(a), \forall a \in A\).
-
(2)
Flow Conservation \(\forall v \in V - \left\{ start, fin \right\} :\)
$$\begin{aligned} \sum \limits _{\begin{array}{c} u \in V,\\ (u,v) \in A \end{array}} fl(u,v) = \sum \limits _{\begin{array}{c} u \in V,\\ (v,u) \in A \end{array}} fl(v,u) \;\text {or}\; \sum \limits _{\begin{array}{c} a=(u,v) \in A,\\ u \in V \end{array}} fl(a) = \sum \limits _{\begin{array}{c} a = (v,u),\\ u \in V \end{array}} fl(a) \end{aligned}$$
The value of the flow can be computed by: \(\sum \limits _{s \in S} fl(s, fin)\). Given a flow in the network N, we can determine a cost given according to the following formula:
where co(a) is the value associated to each edge, introduced above and computed according to (4, 5, 6). A flow has a maximum cost if there is no other flow with a higher cost. Conditions from (6) state that storing a fragment f on site s is done on the entire fragment or not at all. Equation (5) state that storing fragments in a site cannot exceed the available storage space on that site. Equation (4) state that the number of fragment replicas is unbounded. The flow cost is only influenced by the values of the co function in (6) and its value represents the amount of cost reduction for data transfers.
Finding the allocation schema (with replication) that maximizes the data transfer cost reduction can be solved in the above conditions by finding the maximum compatible cost flow in the transport network.
This transport problem is a special one due to its capacity restrictions and the maximum cost requirement, but not the maximum flow. We elaborate a backtrack type algorithm which determines for every site \(s_i\) a set of fragments that can be replicated on that site. From all the constructed sets we only keep the one with the maximum cost. The main issue of the backtracking algorithm is that it performs an exhaustive search of the solution space. The explored space grows proportionally with the product of the number of fragments and sites and the required time to solution grows and becomes unrealistic for an online system. As a solution to this issue we elaborate an approximate algorithm based on a greedy approach to find the maximum cost flow. The previous algorithm is simplified by considering only one set R in step 2. Candidate replication fragments will be allocated to a site in cost descending order - as long as there is available space. This approach reduces algorithm complexity while still allowing a close approximation of the solution. We thought our proposal as a module in a database system, that runs quasi-continuously and provides replication hints whenever these exist and are possible. As a consequence the algorithm should be as fast as possible and with minimal impact on the database.
5 Experimental Results
In order to evaluate the efficiency of the proposed solutions, we run a battery of simulations and tests. For assessing the generality of our model we randomly generate a set of database configurations. To test the proposed Induced Fragment Replication (IFR) we generate different sets of large distributed databases. The synthetic experiments were preferred due to the lack of large and statistically complete and consistent real databases. We choose to generate statistically database configurations as we only need to process the meta-information from the database and not the actual data. We first generated an initial database state by averaging the evaluation costs over a number of uniformly sampled generated distribution configurations. Then we generate fifteen small to large sample database configurations drawn from the same distribution (see [11] about the configuration generator, test data and results) (Fig. 4).
The fifteen distributed database configurations are presented in Table 1.
In the following we present the analysis of the tests’ results. In order to asses the improvements we measure the network transfer before and after applying the induced fragmentation on a series of test databases. We consider the percent of data transfer cost needed in query processing after applying the IFR solution compared to the transfer cost in the initial database as the measure of query optimization. We test the replication problem for the next cases: (a) the available space is equal with the space occupied by the fragments; and (b) the available space is 2 * space occupied by the fragments. Table 2 presents the percents and execution times in seconds for IFR problem when (Maximum Fragment Replication Number - the maximal number of generated fragment replicas) MFRN=1 and MFRN=5 and the available free space corresponds to above space constraints (a) and (b).
The proposed network flow problem is a bit uncommon as it needs a flow of maximum cost (regardless of the value of the flow). There is no solver, to our knowledge, for this problem formulation and thus we implemented a backtracking solution to solve the flow problem and then we proposed a faster Greedy approximation algorithm for the same problem.
In Table 2 we show the transport cost expressed as a percentage of the original database (before applying induced fragmentation). We also present the execution times in seconds for each algorithm variant. The backtracking variant has an exact solution but the search space explodes exponentially with the product of the number of fragments, sites and queries and thus its execution times explode exponentially with this product (search space). In Table 2 we left empty the cells where the backtracking solution failed to give a solution within the time required for a Mathematical solver to solve the equivalent linear programming problem.
The proposed Greedy solution has an almost constant execution time with a ratio of 6:1 between the fastest and slowest solution. As execution time this is more than appropriate for a system where our module is run quasi-continuously and produces fragmentation hints.
The cost penalty obtained by applying the proposed approximation Greedy solution is around 1.75% higher compared to the exact backtracking solution. However the execution time for the Greedy approach compared to the exact solutions is in the order of 550 times less. The Greedy solution average running time is around 0.5 s with the largest execution time being 1.24 s.
6 Conclusions and Future Work
In this paper we provide a solution for query response time improvement modeled as a maximal compatible flow cost in transport networks. We perform online perpetual data replication within the original space constraints of the database. The major contribution is the Greedy algorithm that solves the transport flow problem in a fraction of the time needed to a classical solver or algorithm with an approximation penalty cost under 2% making this algorithm suitable to online execution within the database and as a replacement to the classical solvers.
We only considered so far the transport cost as the argument driving data replication and allocation in order to improve query response time. While it solves a complex problem this model is in many cases too simplistic. As future work we would like to extend our model to cases where data allocation is driven by more parameters (CPU, storage, network capacities, etc.) or to non-relational cloud databases where principles are different.
References
Apers, P.M.G.: Data allocation in distributed database systems. ACM T Datab. Syst. 13(3), 263–304 (1988). Applied Mathematical Programming. Addison-Wesley (1977)
Cheng, C.H., Lee, W.K., Wong, K.F.: A genetic algorithm-based clustering approach for database partitioning. IEEE Trans. Syst. Man Cybern. Part C Appl. Rev. 32(3), 215–230 (2002)
Dokeroglu, T., Bayır, M.A., Cosar, A.: Integer linear programming solution for the multiple query optimization problem. In: Czachórski, T., Gelenbe, E., Lent, R. (eds.) Information Sciences and Systems 2014, pp. 51–60. Springer, Cham (2014). doi:10.1007/978-3-319-09465-6_6
Graham, J., Foss, J.A.: Efficient allocation in distributed object oriented databases. In: Proceedings of 16th International Conference on Parallel and Distributed Computing Systems (ISCA), pp. 471–412 (2003)
Huang, Y., Chen, J.: Fragment allocation in distributed database design. J. Inf. Sci. Eng. 17, 491–506 (2001)
Lin, X., Orlowska, M.: An integer linear programming approach to data allocation with the minimum total communication cost in distributed database systems. Inf. Sci. 85, 1–10 (1995)
Menon, S.: Allocating fragments in distributed databases. IEEE Trans. Parallel Distrib. 16(7), 577–585 (2005)
Morffi, A.R., et al.: A reinforcement learning solution for allocating replicated fragments in a distributed database. Comput. Sist. 11(2), 117–128 (2007)
Ozsu, M.T., Valduriez, P.: Principles of Distributed Database Systems. Springer, Heidelberg (2011)
Tambulea, L., Darabant, A.S., Varga, V.: Data transfer optimization in distributed database query processing. Studia Univ Babes Bolyai, Informatica LIX(1), 71–82 (2014)
Tambulea, L., Darabant, A. S., Varga, V.: Query Evaluation Optimization in a Distributed Database using Data Reorganization (2015). http://www.cs.ubbcluj.ro/~ivarga/ddbpaper
Virk, R.S., Singh, D.G.: Optimizing access strategies for a distributed database using genetic fragmentation. Int. J. Comput. Sci. Netw. Secur. 11(6), 180–183 (2011)
Wiese, L.: Clustering-based fragmentation and data replication for flexible query answering in distributed databases. Int. J. Cloud Comput. 3(1), 3–18 (2014)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Darabant, A.S., Tambulea, L., Varga, V. (2017). Access Patterns Optimization in Distributed Databases Using Data Reallocation. In: Benslimane, D., Damiani, E., Grosky, W., Hameurlain, A., Sheth, A., Wagner, R. (eds) Database and Expert Systems Applications. DEXA 2017. Lecture Notes in Computer Science(), vol 10438. Springer, Cham. https://doi.org/10.1007/978-3-319-64468-4_14
Download citation
DOI: https://doi.org/10.1007/978-3-319-64468-4_14
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-64467-7
Online ISBN: 978-3-319-64468-4
eBook Packages: Computer ScienceComputer Science (R0)