
1 Introduction

In a traditional database management system (DBMS), the key to query performance is finding, among the query execution plans (QEPs) generated by the query optimizer, the QEP with the best execution time. In a cloud database system, however, minimizing query response time is not the only goal of query optimization. Because hardware usage is charged on demand and scalability is available to users, the monetary cost of query execution also needs to be considered as one of the objectives for optimizing QEPs. Meanwhile, cloud providers need to minimize the SLA violation rate in addition to fulfilling the users' requirements on query execution time and monetary cost. Traditionally, the query optimizer evaluates the time and monetary costs of different QEPs to derive the best QEP for a query before execution. These costs are estimated from the data statistics available to the query optimizer at the moment query optimization is performed. These statistics are often approximate, which may lead to inaccurate estimates of the time and monetary costs needed to execute the query. Thus, the QEP generated before query execution may not be the best one.

To deal with this issue, several methods have been proposed to re-optimize queries during their execution [1,2,3]. Ortiz et al. [1] apply deep reinforcement learning (RL) to learn a representation of queries, which can then be used in downstream query optimization tasks. Marcus et al. present a deep RL-based join optimizer, ReJOIN [2], which offers a preliminary view of the potential of deep RL in this context. In these techniques, QEPs are re-optimized multiple times by a deep RL model. Kipf [3] uses a Deep Neural Network (DNN) to learn cardinality estimates. Wu et al. [6] propose Sample, a query re-optimization algorithm that updates data statistics estimated from a sample of tuples collected at runtime. However, none of these techniques addresses monetary costs and SLA requirements for cloud databases at the same time. In this paper, we present two algorithms, ReOptRL and SLAReOptRL, that use reinforcement learning to perform multi-objective query re-optimization in an end-to-end cloud database system. The algorithms employ a reward function designed specifically for query re-optimization that takes query execution time, monetary cost, and SLA requirements into account.

2 The Reinforcement Learning-Based Multi-objective Query Re-optimization Algorithm (ReOptRL)

We choose RL over supervised learning methods because RL does not require training data (a labeled dataset of past actions) to be available in advance to train the learning model before the model can be used to predict future actions. Various kinds of RL algorithms have been proposed. Q-Learning is one of the popular value-based RL algorithms and uses the Bellman equation [4]:

$$Q\left(S_{t},a_{t}\right)\leftarrow Q\left(S_{t},a_{t}\right)+\alpha \left(R_{t}+\gamma\, Q\left(S_{t+1},a_{t+1}\right)-Q\left(S_{t},a_{t}\right)\right)$$
(1)

In Q-Learning, a table (called the Q-table) stores all potential state-action pairs (Sn, an) together with an evaluated Q-value for each pair. In Eq. (1), Q(St, at) is the evaluated value (called the Q-value) of executing action at in state St. This value is used to select the best action to perform in the current state. In our scenario, there are many available containers on which a single query operator can be executed, so the Q-table potentially contains many state-action pairs. Iterating over a large Q-table incurs extra time overhead that delays query execution. To address this issue, we apply a Deep Q Network (DQN) [4] as our reinforcement learning model for query re-optimization. DQN works similarly to Q-Learning; the major difference is that, given a state, it uses a neural network instead of the Q-table to estimate the Q-values of all potential actions. The input of the neural network is the current state; to represent the current state, i.e., the current QEP, we use a one-hot vector encoding adapted from recent work [2]. The ReOptRL algorithm is given in Fig. 1. First, a query is submitted to a query optimizer, which generates the QEP (logical plan) for the query (Line 4). Then the QEP is converted into its one-hot vector representation (Line 7). This vector is sent to the RL model, which is a neural network. The RL model evaluates the Q-values of all the potential actions for executing the next available query operator (Line 8). Each of these actions consists of two parts: a physical operator and a container (machine) on which to execute it. The action with the best Q-value is then selected and performed by the DBMS (Line 9). After that, the executed query operator is removed from the QEP (Line 10). The reward is updated with the time and monetary cost of executing the operator, and the expected Q-value is updated by the Bellman equation (Eq. (1)) with the updated reward (Lines 11–13). The weights of the neural network are updated accordingly by back-propagation (Line 14). This process repeats for each operator in the QEP and terminates when all the operators in the QEP have been executed. The query results are then sent to the user (Line 17).
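
To make the control flow of Fig. 1 concrete, the sketch below shows one way the per-operator DQN step could look in code. It is a minimal illustration in PyTorch; the names, network shape, and dimensions (QNetwork, NUM_ACTIONS, STATE_DIM, select_action, q_learning_update) are our assumptions for exposition, not the implementation used in the paper.

```python
# Minimal sketch (assumed names and shapes) of the DQN step in ReOptRL:
# score candidate (physical operator, container) actions for the next
# operator of a one-hot-encoded QEP, pick the best one, and apply a
# Bellman update after observing the reward.
import torch
import torch.nn as nn

NUM_ACTIONS = 4 * 10   # e.g., 4 physical operators x 10 containers (assumed)
STATE_DIM = 128        # length of the one-hot QEP encoding (assumed)

class QNetwork(nn.Module):
    """Maps a QEP state vector to one Q-value per (operator, container) action."""
    def __init__(self, state_dim: int, num_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, num_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

def select_action(qnet: QNetwork, state_vec: torch.Tensor) -> int:
    """Greedy selection: pick the (operator, container) pair with the best Q-value (Line 9)."""
    with torch.no_grad():
        q_values = qnet(state_vec)           # shape: (NUM_ACTIONS,)
    return int(torch.argmax(q_values).item())

def q_learning_update(qnet, optimizer, state, action, reward, next_state, gamma=0.99):
    """One update step: move Q(s, a) toward R + gamma * max_a' Q(s', a'),
    the DQN form of Eq. (1) (Lines 11-14)."""
    q_sa = qnet(state)[action]
    with torch.no_grad():
        target = reward + gamma * qnet(next_state).max()
    loss = (q_sa - target) ** 2              # squared temporal-difference error
    optimizer.zero_grad()
    loss.backward()                          # back-propagation (Line 14 of Fig. 1)
    optimizer.step()
```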

Fig. 1. The ReOptRL algorithm

In ReOptRL, after an action is performed, the reward function is used to evaluate that action. This gives the learning model feedback on how well the selected action performed. An action that received a high reward is more likely to be selected again in the same state. The reward function therefore plays a key role in the entire algorithm. According to the Bellman equation, if the reward of performing a previous action \(A_{t-1}\) in state \(S_{t-1}\) is high, the Q-value will also be high. This means that, given the same state, an action with good previous performance has a higher chance of being selected. In our algorithm, we want actions with low query execution time and monetary cost to be the ones that are more likely to be chosen. To reflect this, we define the reward function as follows:

$$ Reward\,R = \frac{1}{1 + \left(W_{t} \cdot T_{op}^{q}\right) + \left(W_{m} \cdot M_{op}^{q}\right)} $$
(2)

where \(W_{t}\) and \(W_{m}\) are the time and monetary weights provided by the user, and \(T_{op}^{q}\) and \(M_{op}^{q}\) are the time and monetary costs for executing the current operator op in query q.

According to this reward function, the query is executed based on the user's preference, which is either that the user is willing to spend more money for a better query execution time or vice versa. We call these two preferences Weights. The two weights defined by the user form a Weight Profile (wp), a two-dimensional vector in which each dimension is a number between 0.0 and 1.0. Note that the user only needs to specify one dimension of the weight profile; the other dimension is automatically computed as 1 - Weight. Details can be found in our previous work [5].
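
As a concrete reading of Eq. (2), the following small function is a sketch of how the reward could be computed from a user's weight profile and the measured operator costs. The function name and the example cost values are illustrative assumptions; only the formula and the 1 - Weight rule come from the text above.

```python
def reopt_rl_reward(w_t: float, t_op: float, m_op: float) -> float:
    """Reward of Eq. (2).

    w_t  : time weight in [0.0, 1.0]; the monetary weight is W_m = 1 - w_t
    t_op : measured execution time of the current operator of query q
    m_op : measured monetary cost of the current operator of query q
    """
    w_m = 1.0 - w_t
    return 1.0 / (1.0 + w_t * t_op + w_m * m_op)

# Example: a user who cares mostly about latency (W_t = 0.8).
# A faster but slightly more expensive execution earns a higher reward.
r_fast = reopt_rl_reward(0.8, t_op=2.0, m_op=0.05)   # ~0.38
r_slow = reopt_rl_reward(0.8, t_op=5.0, m_op=0.01)   # ~0.20
```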

3 The SLA-Aware Reinforcement Learning-Based Multi-objective Query Re-optimization Algorithm (SLAReOptRL)

An SLA is a contract between a cloud service provider and its consumers that mandates specific numerical target values the service needs to achieve. Considering the SLA during query processing is important for cloud databases: if an SLA violation happens, the cloud service provider must pay a penalty to its users in a form such as money or CPU credits. From a profit-oriented perspective, cloud service providers therefore want to keep the number of SLA violations as low as possible. Different cloud service providers implement different SLAs with their users. Using the time and monetary costs to execute a query as the SLA requirements has been studied in [1]. We find these requirements practical and specific to users, and thus adopt the same SLA requirements in our work.

In particular, the reward function shown in Eq. (3) extends Eq. (2) so that the best action is selected according to the SLA requirements:

$$ Reward\,R = \frac{1}{1 + \left(W_{t} \cdot \left(T_{op}^{q} + P_{t}\right)\right) + \left(W_{m} \cdot \left(M_{op}^{q} + P_{m}\right)\right)} $$
(3)

where \(T_{op}^{q}\) and \(M_{op}^{q}\) are the time and monetary costs for executing the current operator op in query q, and

$$ P_{t} = \alpha_{op} \cdot delay\_time,\quad P_{m} = \alpha_{op} \cdot exceeded\_money $$

where \(\alpha_{op}\) is the operator impact rate of the operator type op, and \(delay\_time\) and \(exceeded\_money\) are defined as:

$$ delay\_time = \begin{cases} 0 & \text{if } T_{op}^{q} \le SLA.T_{op}^{q} \\ T_{op}^{q} - SLA.T_{op}^{q} & \text{if } T_{op}^{q} > SLA.T_{op}^{q} \end{cases} $$
(4)
$$ exceeded\_money = \begin{cases} 0 & \text{if } M_{op}^{q} \le SLA.M_{op}^{q} \\ M_{op}^{q} - SLA.M_{op}^{q} & \text{if } M_{op}^{q} > SLA.M_{op}^{q} \end{cases} $$
(5)

In this reward function (Eq. (3)), \(P_{t}\) and \(P_{m}\) reflect the extra costs of executing a query operator when the SLA is violated. If the SLA is not violated for any operator, the equation reduces to the reward function used in ReOptRL (Eq. (2)). In Eqs. (4) and (5), \(delay\_time\) is the difference between the actual time to execute a query operator and the maximum time allowed for this operator as specified in the SLA; \(exceeded\_money\) is defined analogously for monetary cost. We use these two values to quantify SLA violations on query execution time and monetary cost; they determine the penalties \(P_{t}\) and \(P_{m}\) that enter Eq. (3). As a result, the larger the SLA violation, the smaller the reward becomes, so the reward is directly tied to SLA violations. We also use the operator impact rate \(\alpha_{op}\) to scale the impact of SLA violations for different types of operators.
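
The sketch below combines Eqs. (3)-(5) into a single reward computation. The per-operator SLA bounds and the impact rate are passed in as plain numbers for illustration; the function name and parameter names are assumptions, not part of the paper's implementation.

```python
def sla_reopt_rl_reward(w_t: float, t_op: float, m_op: float,
                        sla_t: float, sla_m: float, alpha_op: float) -> float:
    """Reward of Eq. (3): Eq. (2) plus the SLA penalties P_t and P_m.

    sla_t, sla_m : per-operator time / monetary bounds from the SLA
    alpha_op     : impact rate of the operator type op
    """
    w_m = 1.0 - w_t
    delay_time = max(0.0, t_op - sla_t)         # Eq. (4)
    exceeded_money = max(0.0, m_op - sla_m)     # Eq. (5)
    p_t = alpha_op * delay_time
    p_m = alpha_op * exceeded_money
    return 1.0 / (1.0 + w_t * (t_op + p_t) + w_m * (m_op + p_m))

# If no bound is violated, both penalties vanish and the reward equals Eq. (2).
```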

4 Performance Evaluation

Two sets of machines are used in our experiments. A single local machine is used to train the machine learning model and to perform query optimization. This local machine has an Intel i5 2500K dual-core processor running at 3 GHz with 16 GB DRAM. The second set consists of 10 dedicated Virtual Private Servers (VPSs) on which the query execution engine is deployed. The query optimizer and the query engine used in the experiments are modified from the open-source database management system PostgreSQL 8.4. The data are distributed among these VPSs. The queries and database tables are generated using the TPC-H database benchmark. The database tables are populated with 1,000 GB of data using the default data generator. We run 50,000 queries generated from query templates randomly selected among the 22 query templates of the benchmark.

Fig. 2. Time (a) and monetary cost (b) performance for executing queries using different algorithms

We compare the performance results obtained when the following query re-optimization algorithms are incorporated into query processing: 1) our two proposed algorithms, ReOptRL and SLAReOptRL; 2) the algorithm in which a query re-optimization is conducted automatically after the execution of each operator in the query is completed (denoted as ReOpt), which we developed based on the work presented in [5]; 3) the algorithm in which a query re-optimization is triggered by the decision of a supervised machine learning model (denoted as ReOptML); 4) the algorithm proposed in [6], where query optimization uses sampling-based query estimation (denoted as Sample); and 5) the algorithm that uses no re-optimization (denoted as NoReOpt).

Fig. 3. Average SLA violation rates when executing queries using different algorithms

From Fig. 2 (a) and (b), we can see that, for both query execution time and monetary cost, SLAReOptRL performs the best and ReOptRL performs the second best among all the algorithms on average. Specifically, compared with the baseline NoReOpt, where no re-optimization is conducted, the query execution time improvement using SLAReOptRL is 45%, ReOptRL 39%, ReOptML 27%, ReOpt 13%, and Sample 10%, while the monetary cost improvement using SLAReOptRL is 62%, ReOptRL 52%, ReOptML 27%, ReOpt 17%, and Sample 5%. In particular, the monetary cost shows a significant improvement (SLAReOptRL and ReOptRL are 62% and 52% better than NoReOpt, respectively). Moreover, Fig. 3 shows that SLAReOptRL yields the lowest SLA violation rate among all the algorithms, which demonstrates the positive effect of considering SLA requirements in re-optimization.

5 Conclusion

This paper presents two query re-optimization algorithms called ReOptRL and SLAReOptRL. Both use a reinforcement learning-based model to decide the physical operator and the machine to use for executing each operator of a query execution plan (QEP) in a cloud database system. The experiments conducted using the TPC-H database benchmark show that both SLAReOptRL and ReOptRL improve query response time (from 12% to 45%) and monetary cost (from 17% to 62%) over the existing algorithms. In addition, we find that when there are SLA requirements, SLAReOptRL performs 20% better than ReOptRL in terms of SLA violation rate.