
1 Introduction

In a recent SIGMOD blog entry [23], Guy Lohman asked “Is query optimization a ‘solved’ problem?”. He argued that current query optimizers and their cost models can be critically wrong. Instead of relying on wrong cost models, Stillger et al. have proposed LEO-DB2, a learning optimizer [40]; its enhanced performance with respect to classical query optimizers strengthens the claim of discrepancies introduced by the predetermined cost models.

This is our perspective in this article: we propose a learning approach to performance tuning of database applications. By performance tuning, we mean selection of an optimal physical database configuration in view of the workload. In general, configurations differ in the indexes, materialized views, partitions, replicas, and other parameters. While most existing tuning systems and literature [10, 38, 39] rely on a predefined cost model, the objective of this work is to validate the opportunity for a tuning strategy to do without one.

To achieve this, we propose a formulation of database tuning as a reinforcement learning problem (see Sect. 3). The execution of queries and updates is modelled as a Markov decision process whose states are database configurations, whose actions are configuration changes, and whose rewards are functions of the cost of configuration change and query/update evaluation. This formulation does not rely on a preexisting cost model, rather it learns it.

We present a solution to the reinforcement learning formulation that tackles the curse of dimensionality (Sect. 4). To do this, we reduce the search space by exploiting the quasi-metric properties of the configuration change cost, and we approximate the cumulative cost with a linear model. We formally prove that, assuming such a linear approximation is sound, our approach converges to an optimal policy.

We then tackle in Sect. 5 the problem of overfitting: to avoid instability while learning the cost model, we add a regularization term to the estimation. We formally derive a bound on the total regret of the regularized estimation that is logarithmic in the time step (i.e., in the size of the workload).

We instantiate our approach to the use case of index tuning (Sect. 6), developing in particular optimizations specific to this use case to reduce the search space. The approaches of Sects. 4 and 5 provide us with two algorithms, COREIL and rCOREIL, to solve the index tuning problem.

We use this case to demonstrate the validity of cost-model oblivious database tuning with reinforcement learning, through experimental evaluation on a TPC-C workload [30] (see Sect. 7). We compare the performance with the Work Function Index Tuning (WFIT) algorithm [39]. Results show that our approach is competitive yet does not need knowledge of a cost model. While the comparison of WFIT with COREIL establishes reinforcement learning as an effective approach to automate the index tuning problem, the performance of rCOREIL with respect to COREIL demonstrates that the learning performance is significantly enhanced by a crisp estimation of the cost model.

Related work is discussed in Sect. 2.

This article extends a conference publication [6]. In addition to minor edits and precisions added throughout the paper, the following material is novel: the discussion of related work on reinforcement learning (Sect. 2.2); the result of convergence to an optimal policy (Proposition 2) and its proof; the introduction of regularization (Sect. 5), including bounds on regrets (Theorem 1 and Sect. 6.4), the experimental comparison between regularized and non-regularized versions of our approach (Sect. 7.4) and the study of the quality of their cost estimators (Sect. 7.5).

2 Related Work

We now review the related work in two areas of relevance: self-tuning databases, and the use of reinforcement learning for data management applications.

2.1 Automated Database Configuration

Table 1 provides a brief classification of related research in automated database configuration, in terms of various dimensions: the offline or online nature of the algorithm (see below), and the physical design aspects being considered by these works (index selection, vertical partitioning, or mixed design together with horizontal partitioning and replication).

Table 1. Automated physical database design

Offline Algorithms. Traditionally, automated database configuration is conducted in an offline manner. In that approach, database administrators (DBAs) identify representative workloads from the trace of historical database queries and updates. That task can be done either manually or with the help of sophisticated tools provided by database vendors. Based on these representative workloads, new database configurations are realized: for example, new beneficial indexes to be created [1, 20, 45], smart vertical partitioning for reducing I/O costs [17, 22, 33], or possibly a combination of index selection, partitioning and replication for both stand-alone databases [11, 13, 27] and parallel databases [2, 32].

Online Algorithms. Given the increasing complexity and agility of database applications, coupled with the introduction of modern database environments such as database-as-a-service, the aforementioned manual task of DBAs, though it can be done at regular times in an offline fashion, becomes even more tedious and problematic. Therefore, it is desirable to design more automated solutions to the database design problem that are able to continuously monitor changes in the workload and react in a timely manner by adapting the database configuration to the new workload. In fact, the problem of online index selection has been well studied in the past [10, 12, 24, 25, 39]. Generally, these techniques adopt the same working model in which the system continuously tracks the incoming queries for identifying candidate indexes, profiles the benefit of the indexes, and realizes the ones that are most useful for query execution. Specifically, an online approach to physical design tuning (index selection) was proposed in [10]. The essence of the algorithm is to progressively choose the optimal plan at each step by using a case-by-case analysis of the potential benefits that we may lose by not implementing relevant candidate indexes. That is, each new database configuration is selected only when a physical change, i.e., creating or deleting an index, would be helpful in improving system performance. Similarly, a framework for continuous online physical tuning was proposed in [38] where effective indexes are created and deleted in response to the shifting workload. Furthermore, the framework is able to self-regulate its performance by providing an explicit mechanism for controlling the overhead of profiling the benefit of indexes.

One of the key components of an index selection algorithm is profiling the benefit of indexes, i.e., evaluating the cost of executing a query workload with the new indexes, as well as the cost of configuration transition, i.e., creating and deleting indexes. To realize this function, most of the aforementioned online algorithms exploit a what-if optimizer [14] which returns such estimated costs. For example, the what-if optimizer of DB2 was used in [39], the what-if optimizer of SQL Server was employed in [10], and the classical optimizer of PostgreSQL was extended to support what-if analysis in [38]. However, it is well known that invoking the optimizer for estimating the cost of each query under different configurations is expensive [27]. In this work, we propose an algorithm that does not require the use of a what-if optimizer while being able to adaptively provide a better database configuration in the end, as reported in our experimental results.

More recently, as column-oriented databases have attracted a great deal of attention in both academia and industry, online algorithms for automated vertical partitioning become critical for such an emerging breed of database systems [3, 21, 36]. Specifically, a storage advisor for SAP's HANA in-memory database system was proposed in [36] in order to take advantage of both columnar and row-oriented storage layouts. At the core of that storage advisor is a cost model which is used to estimate and compare query execution times for the different stores. Similarly, a continuous layout adaptation has recently been introduced in [3] with the aim of supporting multiple storage layouts in a single engine which is able to adapt to changing data access patterns. The adaptive store monitors the access patterns of incoming queries through a dynamic window of N queries and devises cost models for evaluating the workload and the layout transformation cost. Furthermore, in order to efficiently find a near-optimal data layout for a given workload, the hybrid store exploits proper heuristic techniques to prune the immense search space of alternative data layouts. In contrast, the algorithm introduced in [21] uses data mining techniques for vertical partitioning in database systems. The technique is based on closed item sets mining from a query set and system statistics at run-time, and hence is able to automatically detect changing workloads and perform a re-partitioning action without the need for interaction from DBAs.

Recently, [5, 9] have aimed at solving the problem of online index selection for large join queries and data warehouses. Since both approaches use a predefined cost model, similarly to [39], they are susceptible to erroneous estimations of the cost model. In our work, we remove the effects of errors made by the cost model by learning it. This gives our approach more robustness and flexibility than cost-model-dependent ones. Moreover, [9] uses genetic algorithms to optimize the selection of multi-table indexes incrementally. But genetic algorithms generally perform worse than reinforcement learning [34] in this kind of dynamic optimization task, due to their more exploratory nature. In addition, reinforcement learning agents show more real-time behaviour than genetic algorithms. In [5], the authors use a heuristic approach where they incrementally look for frequent itemsets in a query workload. With the knowledge base acquired from these updates, indexes are generated for the frequent itemsets while eliminating the ones generated for infrequent itemsets. Due to such greedy behaviour, higher variance and instability are expected than for reinforcement learning approaches, where a trade-off between exploration and exploitation is reached through learning.

Discussion. Compared to the state-of-the-art in online automated database design, our proposed overall approach is more general and has the potential to be applied for various problems such as index selection, horizontal/vertical partitioning design, and in combination with replication as well; we note however that we experiment solely with index tuning in this article. More importantly, as our proposed online algorithm is able to learn the estimated cost of queries gradually through subsequent iterations, it does not need the what-if optimizer for estimating query cost. Therefore, our proposed algorithm is applicable to a wider range of database management systems which may not implement a what-if optimizer or expose its interface to users.

2.2 Reinforcement Learning in Data Management

Reinforcement learning [41] is about determining the best thing to do next under an evolving knowledge of the world, in order to reach a goal. This goal is commonly represented as maximization of the cumulative reward obtained while performing some actions. Here each action leads to an individual reward and to a new state, usually in a stochastic manner. Markov decision processes (MDPs) [29] are a common model for reinforcement learning scenarios. In this model each action leads to a new state and to a given reward according to a probability distribution that must be learned. This implies an inherent trade-off between exploration (trying out new actions leading to new states and to potentially high rewards) and exploitation (performing actions already known to yield high rewards), a compromise explored in depth in, e.g., the stateless model of multi-armed bandits [4]. Despite being well adapted to the modelling of uncertain environments, MDPs have seen only limited use in data management applications so far. The use of MDPs for modelling data cleaning tasks was raised in [7]; the authors discussed the absence of a straightforward technique to do so because of the huge state space. More generally, the following reasons may explain the difficulties in using reinforcement learning in data management applications:

(i) As in [7], the state space is typically huge, representing all possible partial knowledge of the world. This can be phrased as the curse of dimensionality.

(ii) States have complex structures, namely that of the data or, in our case, of the database configuration.

(iii) Rewards may be delayed, obtained only after a long sequence of state transitions. This is for example the case in focused Web crawling, a data-centric application domain; still, multi-armed bandits have been successfully applied to this problem [16].

(iv) Because of data uncertainty, there may be only partial observability of the current state. That changes the problem to a partially observable Markov decision process [43].

The last two issues are fortunately not prevalent in the online tuning problem discussed here. This lets us formulate the online tuning problem as an MDP and focus on a solution to the first two problems.

3 Problem Definition

Let R be a logical database schema; we can consider R to be the set of its possible database instances. Let S be the set of corresponding physical database configurations of instances of R. For a given database instance, two configurations s and \(s'\) may differ in the indexes, materialized views, partitions, replicas, and other parameters. For a given instance, two configurations are logically equivalent if they yield the same results for all queries and updates.

The cost of changing configuration from \(s \in S\) to \(s' \in S\) is denoted by the function \(\delta (s,s')\). The function \(\delta \) is not necessarily symmetric, i.e., we may have \(\delta (s, s') \ne \delta (s', s)\), since the cost of changing configuration from s to \(s'\) and the cost of the reverse change need not be the same. On the other hand, it is a non-negative function that also satisfies the identity of indiscernibles: formally, \(\delta (s, s') \ge 0\), and the equality holds if and only if \(s = s'\). Physically, this means that there is no free configuration change. As it is always cheaper to perform a direct configuration change, we get

$$\begin{aligned} \forall s, s', s'' \in S \quad \delta (s, s'') \le \delta (s, s') + \delta (s', s''). \end{aligned}$$

This is simply the triangle inequality. As \(\delta \) exhibits the aforementioned properties, it is a quasi-metric on S.

Let Q be a workload, defined as a schedule of queries and updates. For brevity, we refer to both of them as queries. To simplify, we consider the schedule to be sequential and the issue of concurrency control orthogonal to the current presentation. Thus, query \(q_t\) represents the \(t^{th}\) query in the schedule, which is executed at time t.

We model a query \(q_t\) as a random variable whose generating distribution may not be known a priori; \(q_t\) is only observable at time t. The cost of executing query \(q\in Q\) on configuration \(s\in S\) is denoted by the function \(cost(s, q)\). For a given query, the cost function is always positive, as the system has to pay some cost to execute a query.

Let \(s_0\) be the initial configuration of the database. At any time t the configuration is changed from \(s_{t-1}\) to \(s_t\) with the following events in order:

1. Arrival of query \(q_t\). We call \(\hat{q}_t\) the observation of \(q_t\) at time t.

2. Choice of the configuration \(s_t \in S\) based on \(\hat{q}_1,\hat{q}_2,\ldots ,\hat{q}_t\) and \(s_{t-1}\).

3. Change of configuration from \(s_{t-1}\) to \(s_t\). If no configuration change occurs at time t, then \(s_t = s_{t-1}\).

4. Execution of query \(\hat{q}_t\) under the configuration \(s_t\).

Thus, the system has to pay the sum of the cost of configuration change and that of query execution during each transition. Now, we define per-stage cost as

$$\begin{aligned} C(s_{t-1}, s_{t}, \hat{q_{t}}) :=\delta (s_{t-1}, s_{t}) + cost(s_{t}, \hat{q_{t}}). \end{aligned}$$

In other words, we can phrase the stochastic decision process of choosing the configuration changes as a Markov decision process (MDP) [29] whose states are database configurations, whose actions are configuration changes, and whose penalties (negative rewards) are the per-stage costs of the actions. Note that, in contrast to the general framework of MDPs, transitions from one state to another on an action are here deterministic: there is no uncertainty associated with the new configuration when a configuration change is decided. On the other hand, penalties are stochastic, as they depend on the query, which is a random variable. In the absence of a reliable cost model, the cost of a query in a configuration is not known in advance, which makes penalties uncertain.

Ideally, the problem would be to find the sequence of configurations that minimizes the sum of future per-stage costs. We assume an infinite horizon [41], which means that an action affects all the future states and actions of the system. But this makes the cumulative sum of future per-stage costs infinite. One practical way to circumvent this problem is to introduce a discount factor \(\gamma \in [0,1)\). Mathematically, it makes the cumulative sum of per-stage costs convergent. Physically, it gives more importance to immediate costs than to costs distant in the future, which matches practical intuition. The problem then translates into finding the sequence of configurations that minimizes a discounted cumulative cost defined with \(\gamma \). Under the Markov assumption, a sequence of configuration changes is represented by a function, called a policy \(\pi : S \times Q \rightarrow S\). Given the current configuration \(s_{t-1}\) and a query \(\hat{q}_t\), a policy \(\pi \) determines the next configuration \(s_t :=\pi (s_{t-1}, \hat{q}_t)\).

We define the cost-to-go function \(V^\pi \) for a policy \(\pi \) as:

$$\begin{aligned} V^\pi (s) :=\mathbb E \left[ \sum _{t=1}^{\infty } \gamma ^{t-1} C(s_{t-1},s_{t},\hat{q}_{t}) \right] \text {such that}{\left\{ \begin{array}{ll}s_0 = s\\ s_t = \pi (s_{t-1},\hat{q}_t), t\ge 1\end{array}\right. } \end{aligned}$$
(1)

where \(0<\gamma <1\) is the discount factor. The value of \(V^\pi (s)\) represents the expected cumulative cost of following the policy \(\pi \) from the current configuration s.

Let \(\mathscr {U}\) be the set of all policies for a given database schema. Our problem can now be formally phrased as to minimize the expected cumulative cost, i.e., to find an optimal policy \(\pi ^*\) such that

$$ \pi ^* :=\mathop {\text {arg min}}\limits _{\pi \in \mathscr {U}} V^\pi (s_0) $$

where the initial state \(s_0\) is given.

4 Adaptive Database Tuning

4.1 Algorithm Framework

In order to find the optimal policy \(\pi ^*\), we start from an arbitrary policy \(\pi \), compute an estimation of its cost-to-go function, and incrementally attempt to improve it using the current estimate of the cost-to-go function \(\overline{V}\) for each \(s \in S\). This strategy is known as policy iteration [41] in the reinforcement learning literature.

Traditionally, policy iteration functions as follows. Assuming the probability distribution of \(q_t\) is known in advance, we improve the cost-to-go function \(\overline{V}^{\pi _t}\) of the policy \(\pi _t\) at iteration t using

$$\begin{aligned} \overline{V}^{\pi _t}(s) = \min _{s'\in S} \left( \delta (s,s')+\mathbb E \left[ cost(s',q)\right] + \gamma \overline{V}^{\pi _{t-1}}(s') \right) \end{aligned}$$
(2)

We obtain the updated policy as \(\mathop {\text {arg min}}_{{\pi _t}\in \mathscr {U}} \overline{V}^{\pi _t}(s)\). The algorithm terminates when there is no change in the policy. The proof of optimality and convergence of policy iteration can be found in [28].

Unfortunately, policy iteration suffers from several problems. First, there may not be any proper model available beforehand for the cost function \(cost(s, q)\). Second, the curse of dimensionality [28] makes the direct computation of \(\overline{V}\) hard. Third, the probability distribution of queries is not assumed to be known a priori, making it impossible to compute the expected cost of query execution \(\mathbb E \left[ cost(s',q)\right] \).

[Algorithm 1: algorithm framework]

Instead, we apply the basic framework shown in Algorithm 1. The initial policy \(\pi _0\) and cost model \(C_0\) can be initialized arbitrarily or using some intelligent heuristics. In line 5 of Algorithm 1, we overcome the issues at the root of the curse of dimensionality by replacing the original problem with approximations of the per-stage cost and of the cost-to-go function. Firstly, we map a configuration to a vector of associated features \(\phi (s)\). Then, we approximate the cost-to-go function by a linear model \(\theta ^T\phi (s)\) with parameter \(\theta \), and restrict the search to a reduced subspace \(S'\) of the configuration space S, which makes the search for an optimal policy computationally cheaper. Finally, we learn the per-stage cost \(C(s,s',\hat{q})\) by a linear model \(\zeta ^T\eta (s, \hat{q})\) with parameter \(\zeta \). This method does not need any prior knowledge of the cost model; rather, it learns the model iteratively. Thus, we resolve the shortcomings of policy iteration and the need for a predefined cost model in our performance tuning algorithm. These methods are depicted and analysed in the following sections.
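To make the control flow concrete, the following is a minimal sketch of the decision step in line 5, under assumed types: Config, Query, and interfaces for the two learned linear models are illustrative names, not the authors' exact pseudocode (which is given as a figure above); Sects. 4.3 and 6 make the models concrete.

```java
// Sketch of the decision step (line 5 of Algorithm 1) under assumed types.
import java.util.List;

interface Config {}
interface Query {}
interface CostModel {                      // learned per-stage cost model
    double delta(Config s, Config t);      // estimate of delta(s, t)
    double cost(Config s, Query q);        // estimate of cost(s, q)
}
interface ValueModel {                     // learned cost-to-go theta^T phi(s)
    double estimate(Config s);
}

final class TuningStep {
    // Chooses s_t minimizing delta + cost + gamma * V over the reduced
    // space S_{s,q}, passed in as the candidate list.
    static Config choose(Config s, Query q, List<Config> candidates,
                         CostModel c, ValueModel v, double gamma) {
        Config best = s;
        double bestScore = Double.POSITIVE_INFINITY;
        for (Config cand : candidates) {
            double score = c.delta(s, cand) + c.cost(cand, q)
                         + gamma * v.estimate(cand);
            if (score < bestScore) { bestScore = score; best = cand; }
        }
        return best;
    }
}
```

After each such step, the per-stage cost model and the cost-to-go estimate are updated from the observed execution cost, as described in the following sections.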

4.2 Reducing the Search Space

In order to reduce the size of search space in line 5 of Algorithm 1, we filter the configurations that satisfy certain necessary conditions deduced from an optimal policy.

Proposition 1

Let s be any configuration and \(\hat{q}\) be any observed query. Let \(\pi ^*\) be an optimal policy. If \(\pi ^*(s,\hat{q})=s'\), then \( cost(s, \hat{q}) - cost(s',\hat{q}) \ge 0. \) Furthermore, if \(\delta (s,s')>0\), i.e., if the configurations certainly change after a query, then \( cost(s, \hat{q}) - cost(s',\hat{q}) > 0. \)

Proof

Since \(\pi ^*(s,\hat{q})=s'\), we have

$$\begin{aligned}&\delta (s,s') + cost(s',\hat{q}) + \gamma V(s') \\&\quad \le cost(s, \hat{q}) + \gamma V(s)\\&\quad = cost(s, \hat{q}) + \gamma \mathbb E \left[ \min _{s''}\left( \delta (s,s'') + cost(s'',\hat{q}) + \gamma V(s'') \right) \right] \\&\quad \le cost(s, \hat{q}) + \gamma \delta (s,s') + \gamma V(s'), \end{aligned}$$

where the second inequality is obtained by exploiting the triangle inequality \(\delta (s,s'') \le \delta (s,s') + \delta (s',s'')\), as \(\delta \) is a quasi-metric on S.

This implies that

$$\begin{aligned} cost(s, \hat{q}) - cost(s',\hat{q}) \ge (1-\gamma )\delta (s,s') \ge 0. \end{aligned}$$

The assertion follows.   \(\square \)

By Proposition 1, if \(\pi ^*\) is an optimal policy and \(s'=\pi ^*(s,\hat{q})\ne s\), then \(cost(s, \hat{q}) > cost(s',\hat{q})\). Thus, we can define a reduced subspace as

$$ S_{s,\hat{q}} = \{ s'\in S \mid cost(s, \hat{q}) > cost(s',\hat{q}) \}. $$

Hence, at each time t, we can solve

$$\begin{aligned} \pi _t = \mathop {\text {arg min}}\limits _{s\in S_{s_{t-1},\hat{q}_t}} \left( \delta (s_{t-1},s) + cost(s,\hat{q}_t) + \gamma \overline{V}^{\pi _{t-1}} (s) \right) \!. \end{aligned}$$
(3)

Next, we design an algorithm that converges to an optimal policy through searching in the reduced set \(S_{s,\hat{q}}\).

4.3 Modified Policy Iteration with Cost Model Learning

We calculate the optimal policy using least-squares policy iteration (LSPI) [18]. If for any policy \(\pi \), there exists a vector \(\varvec{\theta }\) such that we can approximate \(V^\pi (s)=\varvec{\theta }^T \mathbf \phi (s)\) for any configuration s, then LSPI converges to the optimal policy. This mathematical guarantee makes LSPI a useful tool to solve the MDP defined in Sect. 3. But the LSPI algorithm needs a predefined cost model to update the policy and evaluate the cost-to-go function. It is not obvious that any form of cost model would be available and, as mentioned in Sect. 1, predefined cost models may be critically wrong. This motivates us to develop another form of the algorithm, where the cost model is obtained through learning.

Assume that there exists a feature mapping \(\varvec{\eta }\) such that \(cost(s,q) \approx \varvec{\zeta }^T \varvec{\eta }(s,q)\) for some vector \(\varvec{\zeta }\). Changing the configuration from s to \(s'\) can be considered as executing a special query \(q(s, s')\). Therefore we approximate

$$\delta (s,s')=cost(s,q(s,s'))\approx \varvec{\zeta }^T \varvec{\eta }(s,q(s,s')).$$

The vector \(\varvec{\zeta }\) can be updated iteratively using the well-known recursive least squares estimation (RLSE) [44] as shown in Algorithm 2, where \(\varvec{\eta }^t = \eta (s_{t-1}, \hat{q}_t)\) and \(\hat{\epsilon }^t = (\varvec{\zeta }^{t-1})^T \varvec{\eta ^t} - cost(s_{t-1},\hat{q}_t)\) is the prediction error. Combining RLSE with LSPI, we get our cost-model oblivious algorithm as shown in Algorithm 3.

[Algorithm 2: recursive least squares estimation (RLSE)]
[Algorithm 3: policy iteration with cost-model learning]
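For concreteness, here is a minimal sketch of the standard RLSE update used as Algorithm 2 (given above as a figure); the class name and the encoding of feature vectors as plain double arrays are assumptions made for illustration only.

```java
// Sketch of a standard recursive least-squares estimator (cf. Algorithm 2).
final class Rlse {
    private final int n;
    private final double[] zeta;    // current parameter estimate zeta^t
    private final double[][] r;     // gain matrix R^t

    Rlse(int n, double epsilon) {
        this.n = n;
        zeta = new double[n];
        r = new double[n][n];
        for (int i = 0; i < n; i++) r[i][i] = epsilon;   // R^0 = epsilon * I
    }

    double predict(double[] eta) {                        // zeta^T eta
        double p = 0;
        for (int i = 0; i < n; i++) p += zeta[i] * eta[i];
        return p;
    }

    // One update with feature vector eta^t and the observed execution cost.
    void update(double[] eta, double observedCost) {
        double err = predict(eta) - observedCost;         // prediction error
        double[] rEta = new double[n];                    // R^{t-1} eta^t
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) rEta[i] += r[i][j] * eta[j];
        double denom = 1.0;                               // 1 + eta^T R eta
        for (int i = 0; i < n; i++) denom += eta[i] * rEta[i];
        for (int i = 0; i < n; i++)                       // rank-one update of R
            for (int j = 0; j < n; j++) r[i][j] -= rEta[i] * rEta[j] / denom;
        for (int i = 0; i < n; i++)                       // error-proportional fix
            zeta[i] -= rEta[i] * err / denom;
    }
}
```

Each call to update performs a rank-one correction of the gain matrix and a correction of \(\varvec{\zeta }\) proportional to the prediction error \(\hat{\epsilon }^t\).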

In Algorithm 3, the vector \(\varvec{\theta }\) determines the current policy. We can make a decision by solving the equation in line 6. The values of \(\delta (s_{t-1},s)\) and \(cost (s, \hat{q}_t)\) are obtained from the approximation of the cost model. The vector \(\varvec{\theta }^t\) is used to approximate the cost-to-go function following the current policy. If \(\varvec{\theta }^t\) converges, we update the current policy (lines 14–16).

Instead of using any heuristics, we initialize the policy \(\pi _0\) with the initial configuration \(s_0\) and the cost model \(C_0\) as 0, as shown in lines 1–3 of Algorithm 3.

Proposition 2

If for any policy \(\pi \), there exists a vector \({\varvec{\theta }}\) such that \({V^\pi (s)=\varvec{\theta }^T \varvec{\phi }(s)}\) for any configuration s, Algorithm 3 will converge to an optimal policy.

Proof

Let \(\mathscr {V}\) be the set of bounded, real-valued functions on S. Then \(\mathscr {V}\) is a Banach space with the norm \(\Vert v \Vert = \Vert v \Vert _{\infty } = \sup _{s\in S} | v(s) |\) for any \(v\in \mathscr {V}\).

If we redefine our problem in the reduced search space, we get:

$$\begin{aligned} \mathop {\text {arg min}}\limits _{\pi \in \mathscr {U}}&\,\mathbb E \left[ \sum _{t=1}^{\infty } \gamma ^{t-1}\left( \delta (s_{t-1},s_{t}) + cost(s_{t},q) \right) \right] \\ \nonumber \text {such that}:&\,s_t = \pi (s_{t-1}, q), \quad s_t\in S_{s_{t-1}, q},\quad \text {for } t\ge 1\\ \nonumber \end{aligned}$$
(4)

Then Algorithm 3 is analogous to LSPI over the reduced search space. For this new problem given by Eq. (4), Algorithm 3 converges to a unique cost-to-go function \(\tilde{V} \in \mathscr {V}\). We need to show that \(V^*=\tilde{V}\), i.e., that the cost-to-go function estimated by Algorithm 3 is the optimal one.

Let us define the process of updating the policy as a mapping \(\mathscr {M}: \mathscr {V}\rightarrow \mathscr {V}\). Based on Eq. (2), it can be expressed as

$$ \mathscr {M}v(s) = \mathbb {E} \big [ \min _{s'\in S_{s,q}} ( \delta (s,s') + cost(s',q) + \gamma v(s') ) \big ]. $$

For a particular configuration s and query q, let

$$\begin{aligned} a^*_{s,q} (v) = \arg \min _{s'\in S_{s, q}} \left( \delta (s, s') + cost(s', q) + \gamma v(s') \right) . \end{aligned}$$

Assume that \(\mathscr {M}v(s)\ge \mathscr {M}u(s)\). Then

$$\begin{aligned} 0&\le \mathscr {M}v(s) - \mathscr {M}u(s)\\&=\mathbb {E} \left[ \delta (s,a^*_{s,q}(v)) + cost(a^*_{s,q}(v),q) + \gamma v(a^*_{s,q}(v))\right] \\&\quad -\mathbb {E} \left[ \delta (s,a^*_{s,q}(u)) + cost(a^*_{s,q}(u),q) + \gamma u(a^*_{s,q}(u))\right] \\&\le \mathbb {E} \left[ \delta (s,a^*_{s,q}(u)) + cost(a^*_{s,q}(u),q) + \gamma v(a^*_{s,q}(u))\right] \\&\quad -\mathbb {E} \left[ \delta (s,a^*_{s,q}(u)) + cost(a^*_{s,q}(u),q) + \gamma u(a^*_{s,q}(u))\right] \\&=\gamma \mathbb {E} \left[ v(a^*_{s,q}(u)) - u(a^*_{s,q}(u)) \right] \\&\le \gamma \mathbb {E} \left[ \Vert v -u \Vert \right] = \gamma \Vert v -u \Vert . \end{aligned}$$

Thus we can conclude that \(|\mathscr {M}v(s) - \mathscr {M}u(s)|\le \gamma \Vert v-u \Vert \) for every configuration \(s\in S\). From the definition of our norm, we can write

$$\begin{aligned} \sup _{s\in S}|\mathscr {M}v(s) - \mathscr {M}u(s)| = \Vert \mathscr {M}v - \mathscr {M}u \Vert \le \gamma \Vert v-u \Vert . \end{aligned}$$

This means that if \(0\le \gamma < 1\), \(\mathscr {M}\) is a contraction mapping. By [28, Proposition 3.10.2], there exists a unique \(v^*\) such that \(\mathscr {M}v^* = v^*\), and for an arbitrary \(v^0\), the sequence \(v^n\) generated by \(v^{n+1} = \mathscr {M}v^n\) converges to \(v^*\). By the convergence property of LSPI [18], \(v^*=\tilde{V}\). From Proposition 1, the optimal cost-to-go function \(V^*\) also satisfies \(\mathscr {M}V^* = V^*\). Hence \(V^*=\tilde{V}\) and the convergence property of LSPI is preserved in Algorithm 3.   \(\square \)

5 Adaptive Database Tuning with Regularized Cost-Model Learning

In the results that we will present in Sect. 7.3, we observe a higher variance of Algorithm 3 for index tuning than that of the state-of-the-art WFIT algorithm [39]. This high variance is caused mainly by the absence of a cost model. As Algorithm 3 decides the policy depending on the estimated cost model, any error in this estimation causes instability in its outcome.

The process of cost-model estimation by accumulating information of incoming queries is analogous to approximating a function online from its incoming samples. Here, the function is the per-stage cost model \(C : S \times S \times \widetilde{Q} \rightarrow \mathbb {R}\). Here, \(\widetilde{Q}\) is the extended set of queries given by \(Q \cup \left\{ q(s, s') \mid s, s' \in S \right\} \). We obtain this \(\widetilde{Q}\) by considering configuration updates as special queries, as explained in Sect. 4.3. Now, the per-stage cost function can be defined as

$$\begin{aligned} C(s_{t-1}, s_{t}, \hat{q_{t}})&= cost(s_{t-1}, q(s_{t-1}, s_t)) + cost(s_{t}, \hat{q_{t}}) \end{aligned}$$

This equation shows that if we consider changing the configuration from s to \(s'\) as executing a special query \(q(s, s')\), approximating the function \(cost : S \times \widetilde{Q} \rightarrow \mathbb {R}\) in turn approximates the per-stage cost.

As explained in the previous section, we approximate cost online using linear projection onto the feature space of the observed state and query. At each step we obtain some vector \({\varvec{\zeta }}\) such that \({cost(s,q) \approx \varvec{\zeta }^T \varvec{\eta }(s,q)}\), where \(\eta (s, q)\) is the feature vector corresponding to state s and query q. In order to obtain the optimal approximation, we initialize with an arbitrary \(\varvec{\zeta }\) and then recursively improve our estimate of \({\varvec{\zeta }}\) using the recursive least squares estimation (RLSE) algorithm [44]. But the issues with RLSE are:

(i) It tries to minimize the squared error per step,

$${{\hat{\epsilon }_t}^2 = \left( (\varvec{\zeta }^{t-1})^T \varvec{\eta ^t} - cost(s_{t-1},\hat{q}_t)\right) ^2} $$

which is highly sensitive to outliers. If RLSE faces some query or configuration update which is very different from the previously observed queries, the estimate of \({\varvec{\zeta }}\) can change drastically.

(ii) The algorithm becomes unstable if the components of \({\varvec{\eta }(s,q)}\) are highly correlated. This may happen when the algorithm passes through a series of related queries.

As the reinforcement learning algorithm uses the estimated cost model to decide policies and to evaluate them, error or instability in the estimated cost model at any step affects its performance. Specifically, large deviations arise in the estimated cost model due to queries which are far from the previously learned distribution; this costs the learning algorithm some time to adapt, and it also affects the policy and the evaluation of the present state and action. We thus propose to use a regularized cost-model estimator instead of RLSE, which is less sensitive to outliers and relatively stable, so as to improve the performance of Algorithm 3 and decrease its variance.

5.1 Regularized Cost-Model Estimator

In order to avoid the effect of outliers, we penalize high variance of \({\varvec{\zeta }}\) by adding a regularization term to the squared prediction error of RLSE. Thus, at time step t, the new estimator tries to find

$$\begin{aligned} \varvec{\zeta }^{t} = \mathop {\text {arg min}}\limits _{\zeta } P^t \qquad \text {given} \qquad \hat{\epsilon }_t, \overline{B}^{t-1}, \varvec{\zeta }^{t-1}, \varvec{\eta }^t \end{aligned}$$
(5)

such that:

$$\begin{aligned} P^t&:={\hat{\epsilon }_t}^2 + \lambda \Vert \varvec{\zeta }^{t-1}\Vert _2^2\\&= \left( \langle \varvec{\zeta }^{t-1}, \varvec{\eta ^t} \rangle - cost(s_{t-1},\hat{q}_t)\right) ^2 + \lambda \langle \varvec{\zeta }^{t-1}, \varvec{\zeta }^{t-1} \rangle . \end{aligned}$$

Here, \(\lambda > 0\) is the regularization parameter. Square of \(L_2\)-norm, \(\Vert \varvec{\zeta }\Vert _2^2\), is the regularization function. \(\varvec{\eta ^t} :=\varvec{\eta }(s_{t-1},\hat{q}_t)\) is the feature vector of state \(s_{t-1}\) and query \(\hat{q}_t\). We call this squared error the loss function \(L_t\) defined at time t for a given choice of \(\varvec{\zeta }^t\). Thus,

$$\begin{aligned} L_t(\varvec{\zeta }^t) :=\left( \langle \varvec{\zeta }^{t}, \varvec{\eta ^t} \rangle - cost(s_{t-1},\hat{q}_t)\right) ^2. \end{aligned}$$

The dual of this problem can be considered as picking a \(\varvec{\zeta }^{t}\) inside an n-dimensional Euclidean ball \(\mathbb {B}_{\lambda }^n\) of radius \(s(\lambda )\) that minimizes the error \({\hat{\epsilon }_t}^2\). From an optimization point of view, we choose \(\varvec{\zeta }^t\) inside \(\mathbb {B}_{\lambda }^n :=\left\{ \varvec{\zeta }\mid \Vert \varvec{\zeta } \Vert _2^2 \le s(\lambda ) \text { and } \varvec{\zeta }\in \mathbb {R}^n \right\} \) rather than searching for it in the whole space \(\mathbb {R}^n\). This estimator penalizes any drastic change in the cost model due to some outlier query: if some query tries to pull \(\varvec{\zeta }^t\) out of \(\mathbb {B}_{\lambda }^n\), the estimator regularizes \(\varvec{\zeta }^t\) at the boundary. It also shrinks the components of the estimating vector \(\varvec{\zeta }\), which mitigates the instability due to highly correlated queries.

[Algorithm 4: regularized cost-model estimation]

The online penalized cost-model estimation algorithm obtained from this formulation is shown in Algorithm 4. Generally, the optimal values of \(\varepsilon \) and \(\lambda \) are decided using a cross-validation procedure. In Sect. 6.4, we derive the optimal value of \(\varepsilon \) and a probable estimate of \(\lambda \) for the index tuning problem. This decides optimal values of the hyper-parameters for a given workload, with theoretical performance bounds.
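Since Algorithm 4 itself appears as a figure, the sketch below shows one standard way to realize the objective of Eq. (5): online ridge regression, maintaining \(\varvec{\zeta }^t = (\lambda I + \sum _t \varvec{\eta }^t(\varvec{\eta }^t)^T)^{-1}\sum _t cost\cdot \varvec{\eta }^t\) incrementally via the Sherman–Morrison formula. This is an assumed instantiation for illustration, not necessarily the authors' exact update rule.

```java
// Assumed realization of objective (5): online ridge regression with the
// inverse kept up to date by the Sherman-Morrison formula.
final class RegularizedEstimator {
    private final int n;
    private final double[][] aInv;  // (lambda*I + sum eta eta^T)^{-1}
    private final double[] b;       // running sum of observedCost * eta
    private final double[] zeta;    // current regularized estimate

    RegularizedEstimator(int n, double lambda) {
        this.n = n;
        aInv = new double[n][n];
        for (int i = 0; i < n; i++) aInv[i][i] = 1.0 / lambda;
        b = new double[n];
        zeta = new double[n];
    }

    void update(double[] eta, double observedCost) {
        double[] v = new double[n];                      // A^{-1} eta
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) v[i] += aInv[i][j] * eta[j];
        double denom = 1.0;                              // 1 + eta^T A^{-1} eta
        for (int i = 0; i < n; i++) denom += eta[i] * v[i];
        for (int i = 0; i < n; i++)                      // Sherman-Morrison step
            for (int j = 0; j < n; j++) aInv[i][j] -= v[i] * v[j] / denom;
        for (int i = 0; i < n; i++) b[i] += observedCost * eta[i];
        for (int i = 0; i < n; i++) {                    // zeta = A^{-1} b
            zeta[i] = 0;
            for (int j = 0; j < n; j++) zeta[i] += aInv[i][j] * b[j];
        }
    }

    double predict(double[] eta) {
        double p = 0;
        for (int i = 0; i < n; i++) p += zeta[i] * eta[i];
        return p;
    }
}
```

The \(\lambda I\) term keeps the normal matrix well conditioned, which is precisely what dampens the outlier and correlation effects listed above.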

5.2 Performance Bound

We can depict this online cost-model estimation task as a simple game between a decision maker and an adversary [35]. In database tuning, the decision maker is our cost-model estimating algorithm and the adversary is the workload providing an uncertain sequence of queries. Then, we can formulate the game as Algorithm 5.

[Algorithm 5: online cost-model estimation as a game between the estimator and the workload]

We can define the regret of this game after time step T as,

$$\begin{aligned} {Reg}_T :=\sum _{t=1}^T L_t(\varvec{\zeta }^t) - \sum _{t=1}^T L_t(\varvec{\zeta }^{\mathrm {OPT}}) \end{aligned}$$
(6)

where \(\varvec{\zeta ^{\mathrm {OPT}}}\) is the solution picked by an offline expert that minimizes the cumulative loss after time step T. \({Reg}_T\) is the difference between the cumulative sum of errors up to time T obtained using Algorithm 4 and that obtained by the optimal offline algorithm. This regret term captures the deviation of the cost model estimated by Algorithm 4 from the computable optimal cost model.

As the loss function \(L_t(\varvec{\zeta })\) is the square of the error between the estimated and actual values of the cost at time t, it is a convex function of \(\varvec{\zeta }\). Following the analysis given in [35], we can canonically describe our estimation model as a scheme to develop a Legendre potential function \(\varvec{\varPhi }(\varvec{\zeta }^{\mathrm {OPT}})\) with time t for the given workload, where the initial value of the potential is given by:

$$\begin{aligned} \varvec{\varPhi }_0(\varvec{\zeta }^{\mathrm {OPT}}) :=\Vert \varvec{\zeta }^{\mathrm {OPT}} \Vert ^2 \end{aligned}$$

and its value at time t is updated as

$$\begin{aligned} \varvec{\varPhi }_t(\varvec{\zeta }^{\mathrm {OPT}}) :=\varvec{\varPhi }_{t-1}(\varvec{\zeta }^{\mathrm {OPT}}) + \frac{1}{\lambda }L_t(\varvec{\zeta }^t). \end{aligned}$$

Now, we can re-write Eq. (5) as:

$$\begin{aligned} \varvec{\zeta }^{t} = \mathop {\text {arg min}}\limits _{\varvec{\zeta }\in \mathbb {B}_{\lambda }^n} \left[ D_{\varvec{\varPhi }_0}(\varvec{\zeta }^{\mathrm {OPT}}, \varvec{\zeta }^{t-1}) + \frac{1}{\lambda } \left( \varvec{\nabla }L_{t-1}(\varvec{\zeta }_{t-1})\right) ^T \varvec{\zeta }^{t-1} \right] \end{aligned}$$
(7)

Here, \(D_{\varvec{\varPhi }_0}(\varvec{\zeta }^{\mathrm {OPT}}, \varvec{\zeta }^{t-1})\) is the Bregman divergence [26] between \(\varvec{\zeta }^{\mathrm {OPT}}\) and \(\varvec{\zeta }^{t-1}\) along the potential field \(\varvec{\varPhi }_0\). This term in Eq. (7) inclines Algorithm 4 to choose a \(\varvec{\zeta }\) that is nearest to the optimal \(\varvec{\zeta }^{\mathrm {OPT}}\) on the \(\Vert \varvec{\zeta }\Vert ^2\) manifold. Also, \(\varvec{\nabla }L_{t-1}(\varvec{\zeta }_{t-1})^T \varvec{\zeta }^{t-1}\) is the change of the loss function in the direction of \(\varvec{\zeta }^{t-1}\); minimizing this term is equivalent to selecting a \(\varvec{\zeta }^t\) that minimizes the corresponding loss. Thus, the \(\varvec{\zeta }^t\) picked by the algorithm is the one that minimizes a linear combination of these two terms, weighted by \(\lambda \). From this formulation we obtain the following lemma for the regret bound.

Lemma 1

After time step T, the upper bound of the regret of Algorithm 4 can be given by

$$\begin{aligned} {Reg}_T \le \lambda \Vert \varvec{\zeta }^{\mathrm {OPT}} \Vert ^2 + \frac{1}{\lambda } \sum _{t=1}^T \hat{\epsilon _t}^2 (\varvec{\eta }^t)^T R^t \varvec{\eta }^t. \end{aligned}$$
(8)

Proof

Applying Theorem 1 of [42] on Eq. (7) we get the inequalities,

$$\begin{aligned} {Reg}_T&\le \lambda \left[ D_{\varvec{\varPhi }_0}(\varvec{\zeta }^{\mathrm {OPT}}, \varvec{\zeta }^0) - D_{\varvec{\varPhi }_T}(\varvec{\zeta }^{\mathrm {OPT}}, \varvec{\zeta }^{T+1}) + \sum _{t=1}^T D_{\varvec{\varPhi }_t}(\varvec{\zeta }^{t}, \varvec{\zeta }^{t+1}) \right] \\&\le \lambda \left[ D_{\varvec{\varPhi }_0}(\varvec{\zeta }^{\mathrm {OPT}}, \varvec{\zeta }^0) + \sum _{t=1}^T D_{\varvec{\varPhi }_t}(\varvec{\zeta }^{t}, \varvec{\zeta }^{t+1}) \right] . \end{aligned}$$

From the definition of the Legendre potential we get:

$$\begin{aligned} \varvec{\varPhi }_t(\varvec{\zeta })&= \varvec{\varPhi }_{t-1}(\varvec{\zeta }) + \frac{1}{\lambda }L_t(\varvec{\zeta })\\&= \Vert \varvec{\zeta }\Vert ^2 + \frac{1}{\lambda }\sum _{t=1}^T \left( \langle \varvec{\zeta }, \varvec{\eta ^t} \rangle - cost(s_{t-1},\hat{q}_t)\right) ^2 \\&= \varvec{\zeta }^T \left( I + \frac{1}{\lambda } \sum _{t=1}^T\varvec{\eta }^t (\varvec{\eta }^t)^T \right) \varvec{\zeta }- \varvec{\zeta }^T \left( \frac{2}{\lambda } \sum _{t=1}^T cost(s_{t-1},\hat{q}_t) \varvec{\eta }^t \right) + \frac{1}{\lambda }\sum _{t=1}^T cost(s_{t-1},\hat{q}_t)^2 \\&= \varvec{\zeta }^T (R^T)^{-1} \varvec{\zeta }- \varvec{\zeta }^T b_T + C_T \end{aligned}$$

where \(c^t :=cost(s_{t-1},\hat{q}_t)\), \(b_T = \frac{2}{\lambda }\sum _{t=1}^T c^t \varvec{\eta }^t \) and \(C_T = \frac{1}{\lambda }\sum _{t=1}^T (c^t)^2 \). Thus, the dual of the potential can be given by

$$\begin{aligned} \varvec{\varPhi }_t^*(\varvec{\zeta })&= \varvec{\zeta }^T R^T \varvec{\zeta }- 2 \varvec{\zeta }^T R^T b_T + (b_T)^T R^T b_T \end{aligned}$$

Now, from the definition of \(\varvec{\varPhi }_0\) and properties of Bregman divergence,

$$\begin{aligned} D_{\varvec{\varPhi }_0}(\varvec{\zeta }^{\mathrm {OPT}}, \varvec{\zeta }^0)&= D_{\Vert \varvec{\zeta }^{\mathrm {OPT}} \Vert ^2}(\varvec{\zeta }^{\mathrm {OPT}}, \varvec{\zeta }^0)\\&= \Vert \varvec{\zeta }^{\mathrm {OPT}} \Vert ^2 \end{aligned}$$

and

$$\begin{aligned} D_{\varvec{\varPhi }_t}(\varvec{\zeta }^t, \varvec{\zeta }^{t+1})&= D_{\varvec{\varPhi }_t^*}\left( \varvec{\nabla }\varvec{\varPhi }_t(\varvec{\zeta }^{t+1}), \varvec{\nabla }\varvec{\varPhi }_t(\varvec{\zeta }^t)\right) \\&= D_{\varvec{\varPhi }_t^*}\left( \varvec{0}, \varvec{\nabla }\varvec{\varPhi }_t(\varvec{\zeta }^t)\right) \\&= D_{\varvec{\varPhi }_t^*}\left( \varvec{0}, \frac{1}{\lambda } \varvec{\nabla }L_t(\varvec{\zeta }^t)\right) \\&= \frac{1}{\lambda ^2} \left( \varvec{\nabla }L_t(\varvec{\zeta }^t)\right) ^T R^t \left( \varvec{\nabla }L_t(\varvec{\zeta }^t) \right) \\&= \frac{1}{\lambda ^2} \left( \langle \varvec{\zeta }^t, \varvec{\eta ^t} \rangle - cost(s_{t-1},\hat{q}_t)\right) ^2 (\varvec{\eta }^t)^T R^t \varvec{\eta }^t \\&= \frac{1}{\lambda ^2} \hat{\epsilon _t}^2 (\varvec{\eta }^t)^T R^t \varvec{\eta }^t. \end{aligned}$$

By replacing these results in the aforementioned inequality we get:

$$\begin{aligned} {Reg}_T \le \lambda \Vert \varvec{\zeta }^{\mathrm {OPT}} \Vert ^2 + \frac{1}{\lambda } \sum _{t=1}^T \hat{\epsilon _t}^2 (\varvec{\eta }^t)^T R^t \varvec{\eta }^t. \end{aligned}$$

   \(\square \)

Lemma 2

If \(R^0 \in \mathbb {R}^{n\times n}\) is invertible, then

$$\begin{aligned} (\varvec{\eta }^t)^T R^t \varvec{\eta }^t = 1 - \frac{\mathrm {det}(R^{t})}{\mathrm {det}(R^{t-1})} \quad \forall t = 1, 2, \ldots , T \end{aligned}$$
(9)

Proof

From [19], we get that if there exists an invertible matrix \(B \in \mathbb {R}^{n\times n}\) such that \(A = B + \varvec{x} \varvec{x}^T\), where \(\varvec{x} \in \mathbb {R}^n\), then

$$\begin{aligned} \varvec{x}^T A^{-1} \varvec{x} = 1 - \frac{\mathrm {det}(B)}{\mathrm {det}(A)} \end{aligned}$$
(10)

As, per Algorithm 4, \(R^0 = \varepsilon I \), it is invertible. Since \((R^t)^{-1} = (R^{t-1})^{-1} + \varvec{\eta }^t (\varvec{\eta }^t)^T\), by the Sherman–Morrison formula, all \(R^t\)’s are invertible for \(t \ge 0\). Thus, simply replacing A by \((R^t)^{-1}\) and B by \((R^{t-1})^{-1}\) in Eq. (10), we obtain

$$\begin{aligned} (\varvec{\eta }^t)^T R^t \varvec{\eta }^t = 1 - \frac{\mathrm {det}((R^{t-1})^{-1})}{\mathrm {det}((R^t)^{-1})} = 1 - \frac{\mathrm {det}(R^{t})}{\mathrm {det}(R^{t-1})} \end{aligned}$$

since, \(\mathrm {det}((R^t)^{-1}) = \frac{1}{\mathrm {det}(R^{t})}\).   \(\square \)

Using Lemmas 1 and 2, we finally derive the regret bound for the regularized cost-model estimator in the following theorem.

Theorem 1

If we consider the error as a bounded function such that \( 0 \le \hat{\epsilon _t}^2 \le E_{\max }\) and \(\Vert \varvec{\eta }^t \Vert _\infty \le \delta \),

$$\begin{aligned} {Reg}_T \le \lambda \Vert \varvec{\zeta }^{\mathrm {OPT}} \Vert ^2 + \frac{E_{\max }}{\lambda } \left[ n {\ln }\left( 1+\frac{\varepsilon \delta ^2 T}{n}\right) - (n-1) {\ln }(\varepsilon ) \right] \end{aligned}$$
(11)

where \(R^0 = \varepsilon I\).

Proof

Let us assume the squared error has an upper bound \(E_{\max }\) for a given workload. Under this assumption, we get from Eqs. (8) and (9),

$$\begin{aligned} {Reg}_T&\le \lambda \Vert \varvec{\zeta }^{\mathrm {OPT}} \Vert ^2 + \frac{E_{\max }}{\lambda } \sum _{t=1}^T \left( 1 - \frac{\mathrm {det}(R^{t})}{\mathrm {det}(R^{t-1})} \right) \\&\le \lambda \Vert \varvec{\zeta }^{\mathrm {OPT}} \Vert ^2 - \frac{E_{\max }}{\lambda }\sum _{t=1}^T \ln \left( \frac{\mathrm {det}(R^{t})}{\mathrm {det}(R^{t-1})} \right) \\&= \lambda \Vert \varvec{\zeta }^{\mathrm {OPT}} \Vert ^2 + \frac{E_{\max }}{\lambda } \ln \left( \frac{\mathrm {det}(R^{0})}{\mathrm {det}(R^{T})} \right) \\&= \lambda \Vert \varvec{\zeta }^{\mathrm {OPT}} \Vert ^2 + \frac{E_{\max }}{\lambda } \left[ \ln (\varepsilon ) - \ln (\mathrm {det}(R^{T})) \right] \\&= \lambda \Vert \varvec{\zeta }^{\mathrm {OPT}} \Vert ^2 + \frac{E_{\max }}{\lambda } \left[ \ln (\varepsilon ) + \ln \left( \mathrm {det}\left( \frac{1}{\varepsilon } I + \sum _{t=1}^T \varvec{\eta }^t (\varvec{\eta }^t)^T \right) \right) \right] \\&= \lambda \Vert \varvec{\zeta }^{\mathrm {OPT}} \Vert ^2 + \frac{E_{\max }}{\lambda } \left[ \sum _{k=1}^n \ln \left( 1 + \varepsilon \lambda _k \right) - (n-1) \ln (\varepsilon ) \right] . \end{aligned}$$

Because

$$\begin{aligned} \mathrm {det}\left( \frac{1}{\varepsilon } I + \sum _{t=1}^T \varvec{\eta }^t (\varvec{\eta }^t)^T \right)&= \varepsilon ^{-n} \mathrm {det}\left( I + \varepsilon \sum _{t=1}^T \varvec{\eta }^t (\varvec{\eta }^t)^T \right)&= \varepsilon ^{-n} \prod _{k=1}^n (1+\varepsilon \lambda _k) \end{aligned}$$

where \(\lambda _1, \ldots , \lambda _n\) are eigenvalues of the matrix \(\sum _{t=1}^T \varvec{\eta }^t (\varvec{\eta }^t)^T\). As the eigenvalues of \(\sum _{t=1}^T \varvec{\eta }^t (\varvec{\eta }^t)^T\) are equal to the eigenvalues of its Gram matrix \(G_{ij} = (\varvec{\eta }^i)^T \varvec{\eta }^j\), we can write

$$\begin{aligned} \sum _{k=1}^n \lambda _k = Trace(G) = \sum _{t=1}^T (\varvec{\eta }^t)^T \varvec{\eta }^t \le \delta ^2 T \end{aligned}$$

where \(\Vert \varvec{\eta }^t \Vert _\infty \le \delta \), that is, the maximum value of any component of \(\varvec{\eta }\) is bounded by \(\delta \). Subject to this trace bound, the term \(\sum _{k=1}^n \ln \left( 1 + \varepsilon \lambda _k \right) \) is maximized when \(\lambda _1 = \lambda _2 = \ldots = \lambda _n = \frac{\delta ^2 T}{n}\). By applying this worst case, we get the regret bound as

$$\begin{aligned} {Reg}_T \le \lambda \Vert \varvec{\zeta }^{\mathrm {OPT}} \Vert ^2 + \frac{E_{\max }}{\lambda } \left[ n {\ln }\left( 1+\frac{\varepsilon \delta ^2 T}{n}\right) - (n-1) {\ln }(\varepsilon ) \right] . \end{aligned}$$

   \(\square \)

This theorem shows that the regret of the cost-model estimation of Algorithm 4 is always upper bounded by a constant depending on the optimal solution, plus a term that increases only logarithmically with time. That is, the cumulative deviation of the cost model computed by Algorithm 4 from the optimal one grows very slowly with time, so the estimation error at each individual time step is considerably small.

6 Case Study: Index Tuning

In this section we present COREIL (for Cost-model Oblivious REInforcement Learning) and its regularized version, rCOREIL. COREIL and rCOREIL instantiate Algorithm 3, taking as cost-model estimators Algorithms 2 and 4 respectively. Both tune configurations differing in their secondary indexes and handle the configuration changes corresponding to the creation and deletion of indexes. COREIL uses a reinforcement learning approach to solve the index tuning problem on the fly: it casts index tuning as an MDP and applies Algorithm 3 to solve it. rCOREIL instead uses the regularized cost-model estimator described in Sect. 5.1; this regularized estimator lets rCOREIL leverage the fact that serving the learning algorithm with a better cost model, with which to evaluate its policy, improves performance. In this section, we also define the feature mappings \(\varvec{\phi }\) and \(\varvec{\eta }\) used by both COREIL and rCOREIL to approximate the cost-to-go function V and the cost function respectively. At the end of this section we prove tighter performance bounds for Algorithm 4 in the case of index tuning, and derive optimal values of the parameters \(\lambda \) and \(\varepsilon \) for a given workload.

6.1 Reducing the Search Space

Let I be the set of indexes that can be created. Each configuration \(s\in S\) is an element of the power set \(2^I\). For example, 7 attributes in a schema of R yield a total of 13699 possible indexes, counting every non-empty ordered sequence of distinct attributes, and a total of \(2^{13699}\) possible configurations. Such a large search space invalidates a naive brute-force search for the optimal policy.
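As a sanity check, 13699 is \(\sum _{k=1}^{7} 7!/(7-k)!\); the hypothetical snippet below (not part of COREIL) recomputes it.

```java
// Recomputes the index count for a 7-attribute schema: an index is a
// non-empty ordered sequence of distinct attributes, so the total is
// the sum over k of 7!/(7-k)!. Prints 13699.
public final class IndexCount {
    public static void main(String[] args) {
        int attrs = 7;
        long total = 0;
        for (int k = 1; k <= attrs; k++) {
            long perms = 1;                      // 7 * 6 * ... * (7-k+1)
            for (int i = 0; i < k; i++) perms *= (attrs - i);
            total += perms;
        }
        System.out.println(total);               // 13699
    }
}
```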

For any query \(\hat{q}\), let \(r(\hat{q})\) be a function that returns a set of recommended indexes. This function may be already provided by the database system (e.g., as with IBM DB2), or it can be implemented externally [1]. Let \(d(\hat{q})\subseteq I\) be the set of indexes being modified (update, insertion or deletion) by \(\hat{q}\). We can define the reduced search space as

$$\begin{aligned} S_{s,\hat{q}} = \{ s'\in S \mid (s-d(\hat{q})) \subseteq s' \subseteq (s\cup r(\hat{q})) \}. \end{aligned}$$
(12)

Deleting indexes in \(d(\hat{q})\) will reduce the index maintenance overhead, and creating indexes in \(r(\hat{q})\) will reduce the query execution cost. Note that the definition of \(S_{s,\hat{q}}\) here yields a subset of the one defined in Sect. 4.2, which deals with general configurations.

Note that for tree-structured indexes (e.g., B\(^+\)-tree), we could further consider the prefix closure of indexes for optimization. For any configuration \(s\in 2^I\), define the prefix closure of s as

$$\begin{aligned} \langle s\rangle = \{ i\in I \mid i\;\text {is a prefix of an index}\;j\;\text {for some}\;j\in s \}. \end{aligned}$$
(13)

Thus in Eq. (12), we use \({\langle r(\hat{q})\rangle }\) instead of \(r(\hat{q})\) for better approximation. The intuition is that when \(i \notin s\) but \({i \in \langle s\rangle }\), we can leverage the prefix index to answer the query.
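A direct way to enumerate \(S_{s,\hat{q}}\) from Eq. (12) is to fix the mandatory part \(s \setminus d(\hat{q})\) and range over subsets of the remaining candidate indexes. The sketch below assumes configurations encoded as bitmasks over a fixed candidate-index universe; this encoding is chosen purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Enumerates the reduced space of Eq. (12); configurations are bitmasks
// over a fixed candidate index universe (assumed encoding).
final class ReducedSpace {
    // s: current config; recommended: r(q); modified: d(q).
    static List<Long> candidates(long s, long recommended, long modified) {
        long mandatory = s & ~modified;          // s minus d(q)
        long optional = (s | recommended) & ~mandatory;
        List<Long> result = new ArrayList<>();
        // Iterate over all subsets of the optional bits
        for (long sub = optional; ; sub = (sub - 1) & optional) {
            result.add(mandatory | sub);
            if (sub == 0) break;
        }
        return result;
    }
}
```

Since only the few indexes in \(r(\hat{q})\) and \(d(\hat{q})\) are optional, the enumeration stays small even though S itself is huge.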

6.2 Defining the Feature Mapping \(\varvec{\phi }\)

Let V be the cost-to-go function following a policy. As mentioned earlier, Algorithm 3 relies on a proper feature mapping \(\varvec{\phi }\) that approximates the cost-to-go function as \(V(s) \approx \varvec{\theta }^T \varvec{\phi }(s)\) for some vector \(\varvec{\theta }\). The challenge lies in how to define \(\varvec{\phi }\) under the scenario of index tuning. Both in COREIL and rCOREIL, we define it as

$$\begin{aligned} \phi _{s'}(s) :={\left\{ \begin{array}{ll} 1, &{}\text{ if } s'\subseteq s \\ -1, &{} \text{ otherwise } \end{array}\right. } \end{aligned}$$

for each \(s,s'\in S\). Let \(\varvec{\phi } = (\phi _{s'})_{s'\in S}\). Note that \(\phi _\emptyset \) is an intercept term since \(\phi _\emptyset (s)=1\) for all \(s\in S\). The following proposition shows the effectiveness of \(\varvec{\phi }\) for capturing the values of the cost-to-go function V.

Proposition 3

There exists a unique \(\varvec{\theta }= ( \theta _{s'} )_{s'\in S}\) which approximates the value function as

$$\begin{aligned} V(s) = \sum _{s'\in S} \theta _{s'} \phi _{s'} (s) = \varvec{\theta }^T \varvec{\phi }(s). \end{aligned}$$
(14)

Proof

Suppose \(S = \{ s^1, s^2,\ldots ,s^{|S|} \}\). Note that we use superscripts to denote the ordering of elements in S.

Let \(\varvec{V} = (V(s))^T_{s\in S}\) and M be a \(|S|\times |S|\) matrix such that

$$\begin{aligned} M_{i,j} = \phi _{s^j}(s^i). \end{aligned}$$

Let \(\varvec{\theta }\) be a |S|-dimension column vector such that \(M \varvec{\theta }= \varvec{V}\). If M is invertible then \(\varvec{\theta }= M^{-1} V\) and thus Eq. (14) holds.

We now show that M is invertible. Let \(\varvec{\psi }\) be a \(|S|\times |S|\) matrix such that

$$\begin{aligned} \varvec{\psi }_{i,j} = M_{i,j} + 1. \end{aligned}$$

We claim that \(\varvec{\psi }\) is invertible and its inverse is the matrix \(\varvec{\tau }\) such that

$$\begin{aligned} \varvec{\tau }_{i,j} = (-1)^{|s^i|-|s^j|} \varvec{\psi }_{i,j}. \end{aligned}$$

To see this, consider

$$\begin{aligned} (\varvec{\tau \psi })_{i,j}&= \sum _{1\le k \le |S|} (-1)^{|s^i|-|s^k|} \varvec{\psi }_{i,k}\varvec{\psi }_{k,j}\\&= \sum _{s^j\subseteq s^k \subseteq s^i} (-1)^{|s^i|-|s^k|}. \end{aligned}$$

Therefore \((\varvec{\tau \psi })_{i,j}=1\) if and only if \(i=j\). Since M is the rank-one update \(\varvec{\psi } - \varvec{1}\varvec{1}^T\) of the invertible matrix \(\varvec{\psi }\), by the Sherman–Morrison formula M is also invertible.   \(\square \)

However, for any configuration s, \(\varvec{\phi }(s)\) is a \(2^{|I|}\)-dimensional vector. In order to reduce the dimensionality, the cost-to-go function can be approximated by \( V(s) \approx \sum _{s'\in S, |s'|\le N} \theta _{s'} \phi _{s'}(s) \) for some integer N. Here we assume that the collaborative benefit among indexes is negligible if the number of indexes exceeds N. In particular, when \(N=1\), we have

$$\begin{aligned} V(s) \approx \theta _0 + \sum _{i\in I} \theta _i\phi _i(s). \end{aligned}$$
(15)

where we ignore all collaborative benefits among indexes in a configuration. This is reasonable since any index in a database management system often contributes individually to answering queries [31]. Therefore, we derive \(\varvec{\phi }\) from Eq. (15) as \(\varvec{\phi }(s) = ( 1, (\phi _{i}(s))_{i\in I}^T)^T.\) By using this feature mapping \(\varvec{\phi }\), both COREIL and rCOREIL approximate the cost-to-go function \(V(s) \approx \varvec{\theta }^T \varvec{\phi }(s)\) for some vector \(\varvec{\theta }\).
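Under the bitmask encoding assumed in the earlier sketches (at most 64 candidate indexes, for illustration only), the reduced mapping \(\varvec{\phi }\) of Eq. (15) can be computed as follows.

```java
final class Features {
    // Feature map phi of Eq. (15): an intercept followed by one +/-1 entry
    // per candidate index; configurations are bitmasks (assumed encoding).
    static double[] phi(long s, int numIndexes) {
        double[] f = new double[numIndexes + 1];
        f[0] = 1.0;                              // intercept term phi_empty
        for (int i = 0; i < numIndexes; i++) {
            f[i + 1] = ((s >> i) & 1L) == 1L ? 1.0 : -1.0;
        }
        return f;
    }
}
```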

6.3 Defining the Feature Mapping \(\varvec{\eta }\)

A good feature mapping for approximating functions \(\delta \) and cost must take into account both the benefit from the current configuration and the maintenance overhead of the configuration.

To capture the difference between the index set recommended by the database system and that of the current configuration, we define a function \(\varvec{\beta }(s,\hat{q}) = (1, (\varvec{\beta }_i(s,\hat{q}))_{i\in I}^T)^T\), where

$$\begin{aligned} \varvec{\beta }_i(s,\hat{q}) :={\left\{ \begin{array}{ll} 0, &{} i\notin r(\hat{q}) \\ 1, &{}i\in r(\hat{q})\text { and }i\in s \\ -1, &{}i\in r(\hat{q})\text { and }i\notin s. \end{array}\right. } \end{aligned}$$

If the execution of query \(\hat{q}\) cannot benefit from index i then \(\varvec{\beta }_i(s,\hat{q})\) always equals zero; otherwise, \(\varvec{\beta }_i(s,\hat{q})\) equals 1 or -1 depending on whether s contains i or not. For tree-structured indexes, we could further consider the prefix closure of indexes as defined in Eq. (13) for optimization.

On the other hand, to capture whether a query (update, insertion or deletion) modifies any index in the current configuration, we define a function \(\varvec{\alpha }(s,\hat{q}) = (\alpha _i(s,\hat{q}))_{i\in I}\) where

$$\begin{aligned} \alpha _i(s,\hat{q}) = {\left\{ \begin{array}{ll} 1, &{}\text {if } i\in s \text { and } \hat{q} \text { modify } i \\ 0, &{}\text {otherwise}. \end{array}\right. } \end{aligned}$$

Note that if \(\hat{q}\) is a selection query, \(\varvec{\alpha }\) trivially returns \(\varvec{0}\).

By combining \(\varvec{\beta }\) and \(\varvec{\alpha }\), we get the feature mapping \(\varvec{\eta }= ( \varvec{\beta }^T, \varvec{\alpha }^T )^T\) used in both of the algorithms. It can be used to approximate the functions \(\delta \) and cost as described in Sect. 4.3.
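With the same assumed bitmask encoding, here is a sketch of \(\varvec{\eta }\), concatenating \(\varvec{\beta }\) (with its intercept) and \(\varvec{\alpha }\).

```java
final class CostFeatures {
    // Feature map eta = (beta^T, alpha^T)^T of Sect. 6.3; recommended
    // encodes r(q), modified the indexes that q would modify (assumed
    // bitmask encoding, as in the earlier sketches).
    static double[] eta(long s, long recommended, long modified, int n) {
        double[] f = new double[2 * n + 1];
        f[0] = 1.0;                               // intercept of beta
        for (int i = 0; i < n; i++) {
            boolean inS = ((s >> i) & 1L) == 1L;
            boolean rec = ((recommended >> i) & 1L) == 1L;
            boolean mod = ((modified >> i) & 1L) == 1L;
            f[1 + i] = !rec ? 0.0 : (inS ? 1.0 : -1.0);   // beta_i(s, q)
            f[1 + n + i] = (inS && mod) ? 1.0 : 0.0;      // alpha_i(s, q)
        }
        return f;
    }
}
```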

6.4 Performance Bounds for Regularized COREIL

rCOREIL applies Algorithm 4 for cost-model estimation, while COREIL uses RLSE for this. If we follow Algorithm 3, on line 13 rCOREIL calls the regularized cost-model estimator with arguments \(\hat{\epsilon }^t, R^{t-1}, \varvec{\zeta }^{t-1}, \varvec{\eta }^t\) instead of RLSE. Following Theorem 1 and the construction of the feature map in Sect. 6.3, Proposition 4 gives a tighter regret bound for the cost-model estimation of rCOREIL.

Proposition 4

If we consider the error as a bounded function such that \( 0 \le \hat{\epsilon _t}^2 \le E_{\max }\):

$$\begin{aligned} {Reg}_T^{rCOREIL} \le \lambda \Vert \varvec{\zeta }^{OPT} \Vert ^2 + \frac{E_{\max }}{\lambda } \left[ 2n {\ln }T - n {\ln }n \right] \end{aligned}$$
(16)

and the optimal value for \(\varepsilon \) is given by:

$$\begin{aligned} \varepsilon ^* = \frac{n^2 - n}{T}. \end{aligned}$$

Proof

From Sect. 6.3, \(\Vert \varvec{\eta }^t \Vert _\infty \le 1\). Equation (11) transforms into

$$\begin{aligned} {Reg}_T^{rCOREIL} \le \lambda \Vert \varvec{\zeta }^{OPT} \Vert ^2 + \frac{E_{\max }}{\lambda } \left[ n {\ln }\left( 1+\frac{\varepsilon T}{n}\right) - (n-1) {\ln }(\varepsilon ) \right] . \end{aligned}$$

Now we determine the optimal value of \(\varepsilon \) by minimizing the RHS of the above inequality, as this yields the tightest bound. Thus,

$$\left[ \frac{\partial (\text {RHS})}{\partial \varepsilon }\right] _{\varepsilon = \varepsilon ^*} = 0.$$
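Explicitly, since only the bracketed term of the RHS depends on \(\varepsilon \), this condition reads:

$$\begin{aligned} \frac{\partial (\text {RHS})}{\partial \varepsilon } = \frac{E_{\max }}{\lambda }\left[ \frac{T}{1+\varepsilon T/n} - \frac{n-1}{\varepsilon }\right] = 0 \quad \Longrightarrow \quad \varepsilon T = (n-1)\left( 1+\frac{\varepsilon T}{n}\right) \quad \Longrightarrow \quad \varepsilon ^* = \frac{n(n-1)}{T}. \end{aligned}$$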

This gives \(\varepsilon ^* = \frac{n^2 - n}{T}\). Substituting this value into the previous inequality turns the bracketed term into \(n\ln n + (n-1)\ln T - (n-1)\ln (n^2-n)\), which is at most \(2n\ln T - n\ln n\) whenever \(T \ge n\). We thus obtain the regret bound for the regularized COREIL algorithm:

$$\begin{aligned} {Reg}_T^{rCOREIL} \le \lambda \Vert \varvec{\zeta }^{OPT} \Vert ^2 + \frac{E_{\max }}{\lambda } \left[ 2n {\ln }(T) - n {\ln }(n) \right] . \end{aligned}$$

   \(\square \)

Similarly, we can find the value of \(\lambda \) that makes the upper bound tightest.

Corollary 1

If the norm of the optimal solution \(\varvec{\zeta }^{OPT}\) can be predicted beforehand, the optimal value of \(\lambda \) is given by

$$\begin{aligned} \lambda ^* = \frac{1}{\Vert \varvec{\zeta }^{OPT} \Vert } \sqrt{E_{\max } \left[ 2n {\ln }(T) - n {\ln }(n) \right] } \end{aligned}$$

where the stopping time T is given.

Proof

As the optimal \(\lambda \) minimizes the RHS of Eq. (16), we obtain it by setting the partial derivative of the RHS with respect to \(\lambda \) to zero. The RHS has the form \(\lambda \Vert \varvec{\zeta }^{OPT} \Vert ^2 + C/\lambda \) with \(C = E_{\max }\left[ 2n {\ln }(T) - n {\ln }(n) \right] \), so its derivative vanishes at \((\lambda ^*)^2 = C / \Vert \varvec{\zeta }^{OPT} \Vert ^2\), i.e., \(\lambda ^* = \sqrt{C}\,/\,\Vert \varvec{\zeta }^{OPT} \Vert \).

Substituting the optimal value of \(\lambda \) into Eq. (16) for a given T and \(\varvec{\zeta }^{OPT}\), we get

$${Reg}_T^{rCOREIL} \le 2\, \Vert \varvec{\zeta }^{OPT} \Vert \sqrt{E_{\max } \left[ 2n {\ln }(T) - n {\ln }(n) \right] }.$$

With this choice, the total regret grows only as \(\sqrt{n \ln T}\), so the average per-query error of the cost-model estimation vanishes as the workload grows and remains small compared to the magnitude of the optimal \(\varvec{\zeta }\) vector. This shows the guarantee on the quality of the cost model estimated by rCOREIL once the parameters are properly set.

7 Performance Evaluation

In this section, we present an empirical evaluation of COREIL and rCOREIL through two sets of experiments. In the first set, we implement a prototype of COREIL in Java and compare its performance with that of the state-of-the-art WFIT algorithm [39] (briefly described in Sect. 7.2). The results show that COREIL is competitive with WFIT, albeit with higher variance. This validates the efficiency of the reinforcement learning approach for solving the index tuning problem on the fly: even without assuming a predetermined cost model, it is possible to perform at the level of the state of the art.

In the second set of experiments, we evaluate the performance of rCOREIL with respect to COREIL. The results show the performance enhancements of rCOREIL anticipated in Sect. 5, validating the claim that the higher variance of COREIL stems from its suboptimal use of the RLSE algorithm. They also establish that serving the learning algorithm with an enhanced estimation of the cost model improves performance substantially. In these experiments, we also check the sensitivity of rCOREIL to the parameter \(\lambda \) and cross-validate its optimal value for the given workload.

7.1 Dataset and Workload

The dataset and workload conform to the TPC-C specification [30] and are generated by the OLTP-Bench tool [15]. The five types of transactions in TPC-C are distributed as NewOrder 45\(\%\), Payment 43\(\%\), OrderStatus 4\(\%\), Delivery 4\(\%\), and StockLevel 4\(\%\). Each of these transactions is associated with 3–5 SQL statements (queries/updates). The scale factor used throughout the experiments is 2. We do not leverage any repetition or periodicity of the workload in our approach; still, for robustness, up to \(10\,\%\) of the queries may be repeated. Note that [39] additionally uses the NREF dataset in its experiments; however, that dataset and workload are not publicly available.

7.2 WFIT: Brief Description

WFIT was proposed in [39] as a method for semi-automatic index tuning. The algorithm keeps the database administrator “in the loop” by generating recommendations through a feedback loop driven by the administrator's preferences, a process based on the Work Function Algorithm [8]. To determine a configuration change, WFIT considers all queries observed in the past and solves a deterministic problem of minimizing the total processing cost. While doing so, however, it assumes the existence of a predetermined cost model served by the database system or the administrator. Because it relies on the same predefined cost model across all datasets and workloads, it faces the problems discussed in the Introduction. The results documented in the following sections show the importance of a reinforcement learning approach in making the process generic and cost-model oblivious. A schematic sketch of the underlying work-function recurrence follows.
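The sketch below shows the generic work-function recurrence [8] that WFIT builds on; WFIT itself prunes the configuration space and wraps this recurrence in its recommendation loop, and all names here are illustrative.

```java
// Generic work-function recurrence underlying WFIT (heavily simplified):
//   w_t(s) = min over s' of [ w_{t-1}(s') + serviceCost_t(s') + d(s', s) ]
// where d is the configuration-transition cost.
import java.util.List;
import java.util.Set;
import java.util.function.BiFunction;

final class WorkFunctionStep {
    static double[] step(double[] wPrev,                      // w_{t-1} per configuration
                         List<Set<String>> configs,           // enumerated configurations
                         double[] serviceCost,                // cost of query t in each s'
                         BiFunction<Integer, Integer, Double> d) { // transition cost d(s', s)
        double[] w = new double[configs.size()];
        for (int s = 0; s < configs.size(); s++) {
            double best = Double.POSITIVE_INFINITY;
            for (int sp = 0; sp < configs.size(); sp++) {
                double candidate = wPrev[sp] + serviceCost[sp] + d.apply(sp, s);
                if (candidate < best) best = candidate;
            }
            w[s] = best;
        }
        return w;
    }
}
```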

Fig. 1. Evolution of the efficiency (total time per query) of the two systems from the beginning of the workload (smoothed by averaging over a moving window of size 20)

Fig. 2. Box chart of the efficiency (total time per query) of the two systems. We show in both cases the 9th and 91st percentiles (whiskers), first and third quartiles (box), and median (horizontal rule).

Fig. 3. Evolution of the overhead (time of the optimization itself) of the two systems from the beginning of the workload (smoothed by averaging over a moving window of size 20)

7.3 COREIL: Experiments and Results

Experimental Set-Up. We conduct all experiments on a server running IBM DB2 10.5, equipped with an Intel i7-2600 quad-core CPU @ 3.40 GHz and 4 GB of RAM. We measure wall-clock times for the execution of all components. Specifically, for the execution of workload queries and for index creation/dropping, we measure the response time of processing the corresponding SQL statement in DB2. Additionally, WFIT uses the what-if optimizer of DB2 to evaluate costs. In this setup, each query is executed only once and all queries are generated from one execution history.

Efficiency. Figure 1 shows the total cost of processing TPC-C queries under online index tuning by COREIL and WFIT. The total cost consists of the overhead of the corresponding tuning algorithm, the cost of configuration changes, and the cost of query execution. The results show that, after convergence, COREIL has a lower processing cost most of the time. However, COREIL converges more slowly than WFIT, which is expected since it does not rely on the what-if optimizer to guide index creation (see footnote 1). Over the whole execution, the average processing cost of COREIL (451 ms) is competitive with that of WFIT (452 ms). If we compute the average processing cost from the 500\(^{th}\) query onwards, however, COREIL (357 ms) outperforms WFIT (423 ms). To obtain further insight, we study the distribution of the processing time per query, shown in Fig. 2. Although COREIL exhibits a larger variance in processing cost, its median is significantly lower than that of WFIT. All these results confirm that COREIL is more efficient than WFIT over a long execution.

Fig. 4. Evolution of the time taken by configuration change (index creation and destruction) of the two systems from the beginning of the workload; no configuration change happens past query #700. All values except the vertical lines shown in the figure are zero.

Figures 3 and 4 analyse the overhead of the tuning algorithms and the cost of configuration changes, respectively. Comparing Fig. 1 with Fig. 3, we see that the overhead of the tuning algorithm dominates the total cost, and that the overhead of COREIL is significantly lower than that of WFIT. In addition, WFIT tends to make costlier configuration changes than COREIL, which is reflected in a higher configuration-change time; we discuss this further in the micro-analysis. Note that both methods converge rather quickly and no configuration change happens beyond the 700th query.

A possible reason for the comparatively smaller overhead of COREIL with respect to WFIT, besides not relying on a possibly costly what-if optimizer, is the MDP structure: in an MDP, the entire history of the system is assumed to be summarized in the present state and the cost function. Thus, COREIL has to do less book-keeping than WFIT.

Fig. 5. Evolution of the effectiveness (query execution time in the DBMS alone) of the two systems from the beginning of the workload (smoothed by averaging over a moving window of size 20); logarithmic y-axis

Effectiveness. To verify the effectiveness of the indexes created by the tuning algorithms, we extract the cost of query execution from the total cost. Figure 5 (note the logarithmic y-axis) indicates that the set of indexes created by COREIL is competitive in effectiveness with that created by WFIT, though WFIT is more effective in general and exhibits less variance after convergence. Again, this is to be expected since COREIL has no access to a cost model for the queries. As previously noted, the total running time is nevertheless lower for COREIL than for WFIT, since overhead rather than query execution dominates the running time of both systems.

We have also performed a micro-analysis to check whether the indexes created by the algorithms are reasonable. We observe that WFIT creates more indexes, with longer compound keys, whereas COREIL is more parsimonious in creating indexes. For instance, WFIT creates the 14-attribute index shown below.

[Listing omitted: the 14-attribute index created by WFIT.]

WFIT probably creates such a complex index because of multiple queries with the following pattern.

[Listing omitted: the recurring query pattern motivating this index.]

In contrast, COREIL tends to create shorter compound-attribute indexes. For example, COREIL created the index [S_I_ID, S_W_ID], which clearly benefits the query above and is competitive in performance with the index created by WFIT.

7.4 rCOREIL: Experiments and Results

Experimental Set-Up. We run COREIL and rCOREIL with the \(\lambda \) values 300, 350, 400, 450, and 500. The previous set of experiments has already established the competitive performance of COREIL with respect to WFIT; in this set, we evaluate the basic idea of rCOREIL, namely that a regularized estimation of the cost model both enhances and stabilizes the performance of COREIL. We conduct all experiments on a server running IBM DB2 10.5, with the scale factor and time measures of the previous set of experiments. Here, however, the server is installed on a 64-bit Windows virtual machine with a dual-core CPU and a 2 GB hard disk, hosted on an Ubuntu machine with an Intel i7-2600 quad-core CPU @ 3.40 GHz and 4 GB of RAM. This makes both versions of the algorithm slower than in the previous physical-machine installation.

Fig. 6. Box chart of the efficiency (total time per query) of COREIL and its improved version with different values of \(\lambda \). We show in each case the 9th and 91st percentiles (whiskers), first and third quartiles (box), and median (horizontal rule).

Fig. 7. Evolution of the efficiency (total time per query) of COREIL and rCOREIL with \(\lambda = 400\) from the beginning of the workload (smoothed by averaging over a moving window of size 20)

Efficiency. As the offline optimal outcome for this workload is unavailable beforehand, we set an expected range for \(\lambda \) of [300, 600], depending on parameters such as the number of queries and the size of the state space. Figure 6 shows the efficiency of COREIL and rCOREIL for different values of \(\lambda \). As promised by Algorithm 4, all rCOREIL variants show a lower median and a lower variance of total cost. We also observe from the boxplot that efficiency is best and variance smallest for \(\lambda = 400\). As efficiency is the final measure of the runtime performance of the algorithm, we take this as the optimal value of \(\lambda \) for further analysis. This process is analogous to cross-validation of the parameter \(\lambda \), where the proved bounds help us set a range of values to search instead of scanning an arbitrarily large range. Though here we validate on the results of the whole run of 3,000 queries in the workload, in a realistic scenario the optimal \(\lambda \) would typically be set by running the first 500 queries of the workload with different parameter values and choosing the best one, as sketched below. Figure 7 shows that rCOREIL with \(\lambda = 400\) outperforms COREIL. Over the whole execution, the average processing cost of rCOREIL is 1758 ms, significantly less than that of COREIL (1975 ms); the standard deviation of rCOREIL, 90 ms, is half that of COREIL, 180 ms. This enhanced performance and low variance support the claim that serving the learning algorithm with a better estimation of the cost model improves it.
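The following Java fragment sketches this validation loop; avgCostOnPrefix stands for a hypothetical harness that replays the workload prefix under a given \(\lambda \) and returns the average total cost per query.

```java
// Hypothetical sketch of the lambda validation described above: run a prefix
// of the workload (e.g. the first 500 queries) for each candidate lambda and
// keep the value with the lowest average total cost per query.
import java.util.function.DoubleUnaryOperator;

final class LambdaSelection {
    static double selectLambda(double[] candidates,
                               DoubleUnaryOperator avgCostOnPrefix) {
        double bestLambda = candidates[0];
        double bestCost = Double.POSITIVE_INFINITY;
        for (double lambda : candidates) {
            double cost = avgCostOnPrefix.applyAsDouble(lambda);
            if (cost < bestCost) { bestCost = cost; bestLambda = lambda; }
        }
        return bestLambda;
    }
    // e.g. selectLambda(new double[]{300, 350, 400, 450, 500}, prefixHarness)
}
```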

Fig. 8. Evolution of the overhead (time of the optimization itself) of COREIL and rCOREIL with \(\lambda = 400\) from the beginning of the workload (smoothed by averaging over a moving window of size 20)

Fig. 9. Evolution of the time taken by configuration change (index creation and destruction) of COREIL and rCOREIL with \(\lambda = 400\) from the beginning of the workload; no configuration change happens past query #2000. All values except the vertical lines shown in the figure are zero.

Figures 8 and 9 analyse the overhead of the tuning algorithms and the cost of configuration changes, respectively. In this set of experiments too, the overhead of the tuning algorithms dominates their total cost. Here, the overhead of rCOREIL is on average 207 ms per query lower than that of COREIL, an improvement of more than \(10\,\%\) over the average overhead of COREIL. In addition, rCOREIL (mean: 644 ms) makes cheaper configuration changes than COREIL (mean: 858 ms). rCOREIL also converges faster than COREIL: the last configuration update made by rCOREIL occurs at the \(335^{th}\) query, whereas the last two updates of COREIL occur at the \(358^{th}\) and \(1940^{th}\) queries respectively. Looking closely, the \(358^{th}\) and \(1940^{th}\) queries in this particular experiment are:

[Listing omitted: the 358th query.]

and

[Listing omitted: the 1940th query.]

In reaction to these, COREIL creates the indexes [ORDER_LINE.OL_D_ID, ORDER_LINE.OL_W_ID] and [STOCK.S_W_ID, STOCK.S_QUANTITY] respectively. It turns out that such indexes are not of much use for most other queries: only 6 out of 3000 queries benefit from one of them. COREIL makes configuration updates to tune the indexes for such queries, while rCOREIL does not, because its regularized cost model damps any big change caused by such rare and complex outlier events. Instead, rCOREIL incurs a slightly higher overhead while searching for the optimal indexes: for example, in the window of 10 queries following the \(359^{th}\) query, the average overhead of rCOREIL increases from 1724 ms to 1748 ms.

Effectiveness. As in Sect. 7.3, we extract the cost of query execution to verify the effectiveness of the indexes created by the tuning algorithms. Figure 10 indicates that the set of indexes created by rCOREIL is significantly more effective than that created by COREIL: the average query execution time of rCOREIL is lower than that of COREIL by almost a factor of 10.

Fig. 10. Evolution of the effectiveness (query execution time in the DBMS alone) of COREIL and rCOREIL with \(\lambda = 400\) from the beginning of the workload (smoothed by averaging over a moving window of size 20); logarithmic y-axis

At the micro-analysis level, we observe that rCOREIL creates only one index with two attributes, all its other indexes being single-attribute, whereas COREIL creates only one single-attribute index, all its other indexes having two attributes. This shows that, though COREIL creates parsimonious and efficient indexes, rCOREIL displays even better specificity and effectiveness in doing so.

Fig. 11. Scatter plot of the cost estimated by COREIL and by the what-if optimizer vs. execution time. Left: correlation between the cost estimated by COREIL and the actual execution time (in ms). Right (log y-axis): correlation between the cost estimated by the what-if optimizer and the actual execution time (in ms) in the same run.

Fig. 12. Scatter plot of the cost estimated by rCOREIL and by the what-if optimizer vs. execution time. Left: correlation between the cost estimated by rCOREIL and the actual execution time (in ms). Right (log y-axis): correlation between the cost estimated by the what-if optimizer and the actual execution time (in ms) in the same run.

7.5 Analysis of Cost Estimator

In order to examine the quality of the three cost estimators used by WFIT, COREIL, and rCOREIL in predicting the actual cost of query executions and configuration updates, we record the actual execution time, the cost estimated by our algorithms, and the cost returned by the what-if optimizer during every experimental run of COREIL and rCOREIL. The scatter plot of Fig. 11 shows that the what-if cost has significantly less correlation (0.013) with the actual execution time than COREIL (0.1539). Likewise, the scatter plot of Fig. 12 shows that the regularized cost estimated by rCOREIL has a significantly higher positive correlation (0.1558) than that predicted by the what-if optimizer. This shows that the execution times estimated by COREIL and rCOREIL are significantly more reliable than those estimated by the what-if optimizer. It can also be observed that rCOREIL provides better estimations: visually, there are many more points in the middle of Fig. 12 (left) with a positive inclination.
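The reported coefficients are plain Pearson correlations between per-query estimated cost and measured execution time, computable as in the following generic sketch (not our exact analysis script):

```java
// Pearson correlation between estimated costs x and measured execution times y.
final class Correlation {
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) { meanX += x[i]; meanY += y[i]; }
        meanX /= n; meanY /= n;
        double cov = 0, varX = 0, varY = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX, dy = y[i] - meanY;
            cov += dx * dy; varX += dx * dx; varY += dy * dy;
        }
        return cov / Math.sqrt(varX * varY); // in [-1, 1]
    }
}
```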

Fig. 13. Evolution of the estimated costs of COREIL and rCOREIL with \(\lambda = 400\) from the beginning of the workload (smoothed by averaging over a moving window of size 20); logarithmic y-axis

Finally, Fig. 13 shows that the regularized cost-model estimator of rCOREIL gives a more stable estimation of the cost model than that of COREIL: the cost model estimated by COREIL (averaged over 20 queries) shows higher variance and greater sensitivity to changes in query types.

8 Conclusion

We have presented a cost-model-oblivious solution to the problem of performance tuning. We first formalized the problem as a Markov decision process, then devised a solution that addresses both the curse of dimensionality and over-fitting, and instantiated the approach to the case of index tuning. For this case, we implemented and evaluated the COREIL and rCOREIL algorithms, without and with regularization respectively. Experiments show performance competitive with the state-of-the-art WFIT algorithm, despite our approach being cost-model oblivious. We also show that, as our cost-model estimation becomes crisp and stable, the performance of the learner improves significantly. Beyond the material presented in this paper, we continue studying the universality and robustness of the COREIL and rCOREIL approaches.

Especially for rCOREIL, determining the optimal regularization parameter on the fly, or adapting it to the dynamics of the workload, is an interesting open problem. Though this process currently incurs only a one-time up-front cost, in keeping with the spirit of our approach we would like to perform it online. One possible method is to run the tuner on the first 500 queries while calculating the cost estimates for a set of regularization parameter values simultaneously, and then to choose the parameter value that minimizes the average estimated cost.

We are now running further empirical performance evaluations with other datasets such as TPC-E and TPC-H, and with dedicated benchmarks for online index tuning [37]. For completeness from an engineering perspective, we are considering concurrent access, which the algorithms and experiments presented in this paper ignored for the sake of simplicity. We will also look at the favourable case of predictable workloads, such as periodic transactions. Furthermore, we are extending the solution to other aspects of database configuration, including partitioning and replication; for each of these aspects, we need to devise specific, non-trivial heuristics that curb the combinatorial explosion of the configuration space, as well as intelligent initialization techniques.

Finally, note that a critical assumption of our approach is that queries arrive sequentially and that nothing is known ahead of time about the workload. Neither assumption holds in a number of realistic settings: queries can be submitted concurrently to the database, and a workload may often be predictable (such as when it consists of similar transactions repeated on different data items). We leave the adaptation of rCOREIL to such settings for future work.