Keywords

1 Introduction

In the context of software development, model-driven engineering (MDE) aspires to develop software systems by using models as the driving-force. Models are artifacts defining the different aspects and views of the intended software system. Ideally, the gap between the models and the real software systems would be covered by appropriate code-generators.

The Unified Modeling Language [12] is the facto standard modeling language for MDE. Originally, it was conceived as a graphical language: models were defined using diagrammatic notation. However, it promptly became clear that UML diagrams were not expressive enough to define certain aspects of the intended software systems, and the Object Constraint Language (OCL) [11] was added to the UML standard. OCL is a textual language, with a formal semantics. It can be used to specify in a precise, unambiguous way complex constraints and queries over models. For example, to define integrity constraints and authorization constraints in the context of secure database-centric application model-driven development [5].

In the past a number of mappings from OCL to other languages have been proposed, each with its own goals and limitations. In particular, [7, 8, 10, 13] introduce different mappings from OCL to SQL. The limitations of these mappings, when used as OCL-to-SQL code-generators in a software development process driven by UML/OCL models, can be organized into two groups. The first group contains the limitations related to language coverage, i.e., how much part of the OCL language a mapping can cover. The second group contains the limitations related to execution-time efficiency, i.e., how much time it takes to execute a query generated by a mapping.

In this paper we provide a novel solution to the question of whether OCL queries can be transformed into executable software. Our solution covers a significant subset of the OCL language, including (possibly) nested iterators as well as undefined values. It also reduces significantly the limitation of previous mappings regarding execution-time efficiency.

Organization. The rest of the paper is organized as follows. In Sect. 2 we introduce background material about SQL and OCL. Next, in Sect. 3 we define the sub-language of OCL that our OCL-to-SQL mapping currently covers. Then, in Sect. 4 we introduce our mapping, which is defined recursively over the sub-language presented in Sect. 3. Next, in Sect. 5 we discuss some preliminary benchmarks. Finally, in Sect. 6 we review previously proposed OCL-to-SQL, and we conclude, in Sect. 7, with some closing remarks and future work.

2 Background

SQL [14] is a special-purpose programming language designed for managing data in relational database management systems (RDBMS). Originally based upon relational algebra and tuple relational calculus, its scope includes data insert, query, update and delete, schema creation and modification, and data access control. Although SQL is to a great extent a declarative language, it also includes procedural elements. In particular, the procedural extensions to SQL support stored procedures which are routines (like a subprogram in a regular computing language, possibly with loops) that are stored in the database.

Notation. Let \({ tb}\) be a table. Let \({ r}\) and \({ col}\) be, respectively, a row and a column of \({ tb}\). In what follows, we denote by \({ col}(r)\) the value stored in the column \({ col}\) in the row r. Let \({ qry}\) be an SQL-query. Let \({ db}\) be an SQL-database. We denote by \(\mathrm{Exec}({ qry}, { db})\) the result of executing \({ qry}\) on \({ db}\).

OCL [11] is a language for specifying constraints and queries using a textual notation. Every OCL expression is written in the context of a model (called the contextual model). OCL is strongly typed. Expressions either have a primitive type, a class type, a tuple type, or a collection type. OCL provides standard operators on primitive data, tuples, and collections. For example, the operator includes checks whether an element is inside a collection. OCL also provides a dot-operator to access the values of class instances’ attributes and association-ends in the given data model instance. For example, suppose that the contextual model includes a class c with an attribute \({ at}\) and an association-end \({ as}\). Then, if o is an object of class c in the given data model instance, the expression o.\({ at}\) refers to the value of the attribute \({ at}\) for the object o in this data model instance, and \({ o}\).\({ as}\) refers to the objects linked to the object o through the association-end \({ as}\). OCL provides operators to iterate over collections, such as forAll, exists, select, reject, and collect. Collections can be sets, bags, ordered sets and sequences, and can be parametrized by any type, including other collection types. Finally, to represent undefinedness, OCL provides two constants, namely, and , of a special type. Intuitively, represents an unknown or undefined value, whereas represents an error or exception. OCL is a pure specification language: when an expression is evaluated, it simply returns a value without changing anything in the model.

Notation. Let e be an OCL expression. In what follows, we denote by \(\mathrm{FVars}(e)\) the set of variables that occur free in e, i.e., that are not bound by any iterator. Let e be an OCL expression, and let v be a variable introduced in e by an iterator expression . In what follows, \(\mathrm{src}_{e}(v)\) denotes the source s of v in e. Let e be an OCL expression and let \(e'\) be a subexpression of e. Then, we denote by \(\mathrm{SVars}_{e}(e')\) the set of variables which (the value of) \(e'\) depends on, and is defined as follows:

$$ \mathrm{SVars}_{e}(e') = \bigcup _{v\in \mathrm{FVars}(e')}\{v\}\cup \mathrm{SVars}_{e}(\mathrm{src}_{e}(v)). $$

Let e be an OCL expression, such that \(\mathrm{FVars}(e) = \emptyset \). Let \(\mathcal{O}\) be an OCL Scenario. In what follows, we denote by \(\mathrm{Eval}(e, \mathcal{O})\) the result of evaluating e in \(\mathcal{O}\), according to the semantics of the language.

In what follows, without loss of generality, we assume that the names of the iterator variables within an OCL expression are unique.

3 The OCL2PSQL Language

The OCL2PSQL-language is currently defined by the following inductive rules:Footnote 1

  • and are OCL2PSQL-expression of type .

  • i is an OCL2PSQL-expression, for i an integer number. The type of i is .

  • w is an OCL2PSQL-expression, for w a string. The type of w is .

  • is an OCL2PSQL-expression, for c a class-type. The type of is \(\mathrm {Set({ c})}\).

  • is an OCL2PSQL-expression, for v a variable of a class-type t, and \({ at}\) an attribute of the class-type t. The type of \({ at}\) is the type of \({ at}\).

  • \({ ase}\) is an OCL2PSQL-expression, for v a variable of a class-type t, and \({ ase}\) an association-end of the class-type t. The type of \({ ase}\) is \(\mathrm {Set({ c'})}\), where \(c'\) is the class at the other end of \({ ase}\).

  • \(\,{ r}\) is an OCL2PSQL-expression, for \({ l}\) and \({ r}\) OCL2PSQL-expressions of the same predefined type or class-type. The type of .

  • is an OCL2PSQL-expression, for \({ s}\) an OCL2PSQL-expression of type \(\mathrm {Col({ t})}\), where \(\mathrm{Col}\) is either \(\mathrm{Set}\) or \(\mathrm{Bag}\) and t is a predefined type or a class-type. The type of is .

  • is an OCL2PSQL-expression, for \({ s}\) an OCL2PSQL-expression of type \(\mathrm {Col({ t})}\), where \(\mathrm{Col}\) is either \(\mathrm{Set}\) or \(\mathrm{Bag}\), with \({ t}\) a class type or a predefined type, and where \({ v}\) is a variable of the type \({ t}\), and \({ b}\) is an OCL2PSQL-expression of type . The type of is .

  • is an OCL2PSQL-expression, for \({ s}\) an OCL2PSQL-expression of type \(\mathrm {Col({ t})}\), where \(\mathrm{Col}\) is either \(\mathrm{Set}\) or \(\mathrm{Bag}\), with \({ t}\) a class type or a predefined type, and where \({ v}\) is a variable of the type \({ t}\), and \({ b}\) is an OCL2PSQL-expression of type . The type of .

  • is an OCL2PSQL-expression, for \({ s}\) an OCL2PSQL-expression of type \(\mathrm {Col({ t})}\), where \(\mathrm{Col}\) is either \(\mathrm{Set}\) or \(\mathrm{Bag}\), with \({ t}\) a class type or a predefined type, and where \({ v}\) is a variable of the type \({ t}\), and \({ b}\) is an OCL2PSQL-expression of type . The type of is \(\mathrm {Set({ t})}\).

  • is an OCL2PSQL-expression, for \({ s}\) an OCL2PSQL-expression of type \(\mathrm {Col({ t})}\), where \(\mathrm{Col}\) is either \(\mathrm{Set}\) or \(\mathrm{Bag}\), with \({ t}\) a class type or a predefined type, and where \({ v}\) is a variable of the type \({ t}\), and \({ b}\) is an OCL2PSQL-expression of type \({ u}\), where \({ u}\) either a class-type or a predefined type or a type \(\mathrm{Col({ u'})}\), where \({ u'}\) either a class-type or a predefined type. The type of is \(\mathrm {Bag({ u})}\).

  • is an OCL2PSQL-expression, for \({ s}\) an OCL2PSQL-expression of type \(\mathrm {Bag(Col({ t}))}\), where \(\mathrm{Col}\) is either \(\mathrm{Set}\) or \(\mathrm{Bag}\), with t a predefined type or a class-type. The type of is \(\mathrm {Bag({ t})}\).

  • is an OCL2PSQL-expression, for \({ l}\) an OCL2PSQL-expression of type \({ t}\), with \({ t}\) a predefined type or a class-type. The type of is .

4 The OCL2PSQL Mapping

To illustrate the definition of our mapping, we will use the following example throughout this section. Consider the diagram CarOwnership shown in Fig. 1. It models a simple domain, where there are only cars and persons. The persons can own cars (they are their owners), and, logically, the cars can be owned by persons (they are their ownedCars). No restriction is imposed regarding ownership: a person can own many different cars (or none), and a car can be owned by many different persons (or by none). Finally, each car can have a color, and each person can have a name.

Fig. 1.
figure 1

The CarOwnership model

4.1 Data Models

Firstly, we formally define the valid context for OCL2PSQL expressions. An OCL2PSQL-data model is a tuple \(\langle C, AT , AS \rangle \) where:

  • C is a set of classes c.

  • \( AT \) is a set of attribute declarations. An attribute declaration \(\langle at \), c, \(t\rangle \) denotes that \({ at}\) is an attribute, of type t, of the class c.

  • \( AS \) is a set of association declarations. An association declaration \(\langle as \), \( ase _l\), \(c_l\), \( ase _r\), \(c_r\rangle \) denotes that \({ as}\) is an association between two classes, \(c_l\) and \(c_r\), where \({ ase}_l\) is the association-end whose source is \(c_l\) and target is \(c_r\), and, vice versa, \({ ase}_r\) is the association-end whose source is \(c_r\) and target is \(c_l\).

Next, we define a mapping \(\mathrm{map}()\) from OCL2PSQL-data models to SQL-schemata.

Let \(\mathcal{D} = \langle C, AT , AS \rangle \), be a data model. Then \(\mathrm{map}(\mathcal{D})\) is defined as follows:

  • For every \(c\in C\),

    figure ao
  • For every attribute \(\langle at , c, t\rangle \in AT \)

    figure ap

    where:

    • if \(t = \texttt {Integer}\), then \(\mathrm {SqlType}(t) = \texttt {int}\);

    • if \(t = \texttt {String}\), then \(\mathrm {SqlType}(t) = \texttt {varchar}\);

    • if \(t \in C\), then \(\mathrm {SqlType}(t) = \texttt {int}\).

    Moreover, if \(t \in C\), then

    figure aq
  • For every association \(\langle as , ase _l, c_l, ase _r, c_r\rangle \in AS \),

    figure ar

Example 1

In Fig. 2 we show the SQL-schema generated by our mapping for the OCL2PSQL-data model CarOwnership. In what follows, we denote by CarDB the database create with the aforementioned schema.

Fig. 2.
figure 2

Example: map(CarOwnership)

4.2 Data Model Instances

Firstly, we formally define the instances of OCL2PSQL-data models, i.e., the valid scenarios for evaluating OCL2PSQL-expressions. Let \(\mathcal{D} = \langle C , AT , AS \rangle \) be a data model. A \(\mathcal{D}\)-instance is a tuple \(\langle OC , OAT , OAS \rangle \) where:

  • \( OC \) is a set of objects declarations. An object declaration \(( oc , c)\) denotes that \({ oc}\) is an object of the class c.

  • \( OAT \) is a set of attribute value declarations. An attribute value declaration \(\langle \langle at \), \(c'\), \(t\rangle \), (\( oc \), c), \( vl \rangle \) denotes that the value of the attribute \({ at}\) in the object \({ oc}\) is \({ vl}\).

  • \( OAS \) is a set of association link declarations. Each association link declaration \(\langle \langle as \), \( ase _l\), \(c_l\), \( ase _r\), \(c_r\rangle \), (\( oc _l\), \(c_l\)), (\( oc _r\), \(c_r\))\(\rangle \) denotes that the objects \({ oc}_l\) and \({ oc}_r\) are linked through the association \({ as}\), in such a way that \({ oc}_r\) is among the objects linked to \({ oc}_l\) through the association-end \({ ase}_l\), and, vice versa, \({ oc}_l\) is among the objects linked to \({ oc}_r\) through the association-end \({ ase}_r\).

Next, we define a mapping \(\mathrm{map}()\) from instances of OCL2PSQL-data models to SQL-databases.

Let \(\mathcal{D} = \langle C , AT , AS \rangle \) be an OCL2PSQL-data model. Let \(\mathcal{OD} =\langle OC , OAT , OAS \rangle \) be a \(\mathcal{D}\)-instance. Then \(\mathrm{map}(\mathcal{O})\) is defined as follows:

  • For every \({ oc}\in { OC}\),

    figure as

    For our reference, let \(\mathrm{id}({ oc})\) denote the integer-value automatically generated for the column \(c\_\)id.

  • For every \(\langle \langle at , c', t\rangle , ( oc , c), vl \rangle \in OAT \)

    figure at
  • For every \(\langle as , ase _l, c_l, ase _r, c_r\rangle \in OAS \),

    figure au

4.3 Expressions

Finally, in this section we recursively define our mapping \(\mathrm{map}()\) from OCL2PSQL-expressions to SQL-queries. The correctness of our mapping could be formalized as follows: Let \({ e}\) be an OCL2PSQL-expression, such that \(\mathrm{FVars}({ e}) = \emptyset \), and let \(\mathcal{O}\) be an OCL2PSQL-scenario. Then,

$$\begin{aligned} \mathrm{Exec}(\mathrm{map}_{e}({ e}), \mathrm{map}(\mathcal{O})) \equiv _{\mathrm{OCL2PSQL}} \mathrm{Eval}({ e}, \mathcal{O}). \end{aligned}$$

The different cases in our recursive definition below follow the same key idea underlying our mapping: namely, let e be an OCL2PSQL-expression, let \(e'\) be a subexpression of e, and let \(\mathcal{O}\) be an OCL2PSQL-scenario. Then, \(\mathrm{Exec}(\mathrm{map}_{e}(e'),\) \(\mathrm{map}(\mathcal{O}))\) returns a table, with a column \(\mathtt{res}\), a column \(\mathtt{val}\), and, for each \(v\in \mathrm{SVars}_{e}(e')\), a column \(\mathtt{ref\_}v\). Informally, for each row in this table: (i) the columns \(\mathtt{ref\_}v\) contain a valid “instantiation” for the iterator variables of which the evaluation of \(e'\) depends on (if any); (ii) the column \(\mathtt{val}\) contains 0 when evaluating the expression \(e'\), with the “instantiation” represented by the columns \(\mathtt{ref\_}v\), evaluates to the empty set; otherwise, the column \(\mathtt{val}\) contains 1; (iii) when the column \(\mathtt{val}\) contains 1, the column \(\mathtt{res}\) contains the result of evaluating the expression \(e'\) with the “instantiation” represented by the columns \(\mathtt{ref\_}v\); when the column \(\mathtt{val}\) contains 0, the value contained in the column \(\mathtt{res}\) is not meaningful. More concretely,

Remark 1

Let e be an OCL2PSQL-expression. Let \(e'\) be a subexpression of e, such that \(\mathrm{FVars}(e') = \emptyset \). Let \(\mathcal{O}\) be an OCL2PSQL-scenario. Then, \(\mathrm{Exec}(\mathrm{map}_{e}(e'),\) \(\mathrm{map}(\mathcal{O}))\) returns a table, with a column \(\mathtt{res}\) and column \(\mathtt{val}\), such that, for each row r, \(\mathtt{val}(r) = 1\). Intuitively, each row in the table returned by \(\mathrm{Exec}(\mathrm{map}_{e}(e'), \mathcal{O})\) represents an element in \(\mathrm{Eval}(e', \mathcal{O})\), and vice versa.

Remark 2

Let e be an OCL2PSQL-expression. Let \(e'\) be a subexpression of e, such that \(\mathrm{FVars}(e')\) is not empty. Let \(\mathcal{O}\) be an OCL2PSQL-scenario. Then, \(\mathrm{Exec}(\mathrm{map}_{e}(e'), \mathcal{O})\) returns a table, with a column \(\mathtt{res}\), a column \(\mathtt{val}\), and, for each \(v\in \mathrm{SVars}_{e}(e')\), a column \(\mathtt{ref\_}v\), and such that, for each row r, \(\mathtt{val}(r)\) is either 1 or 0. Intuitively, for each row r in the table returned by \(\mathrm{Exec}(\mathrm{map}_{e}(e'), \mathcal{O})\)

  • if \(\mathtt{val}(r)\) is 1, then \(\mathtt{res}(r)\) is an element in \(\mathrm{Eval}(\theta (e'), \mathcal{O})\), and vice versa, where \(\theta \) is the substitution \(\{ v\mapsto \mathtt{ref\_}v(r)\mid v\in \mathrm{SVars}(e')\}\).

  • if \(\mathtt{val}(r)\) is 0, then \(\mathrm{Eval}(\theta (e'), \mathcal{O}) = \emptyset \), where, as before, \(\theta \) is the substitution \(\{ v\mapsto \mathtt{ref\_}v(r)\mid v\in \mathrm{SVars}(e')\}\).

Variables

Let e be an OCL2PSQL-expression. Let \(e'\) be a subexpression of e. Let \(e' = v\), where v is a variable. Then,

figure av

Attributes

Let e be an OCL2PSQL-expression. Let \(e'\) be a subexpression of e. Let , where v is a variable of class-type c and \({ att}\) is an attribute of the class c. Then,

figure ax

Association-ends

Let e be an OCL2PSQL-expression. Let \(e'\) be a subexpression of e. Let , where v is a variable of class-type c, and \({ ase}\) is an association-end of the class c. Let \(\mathrm{Assoc}({ ase})\) be the association to which \({ ase}\) belongs, and let \(\mathrm{Oppos}({ ase})\) be the association-end at the opposite end of \({ ase}\) in \(\mathrm{Assoc}({ ase})\). Then,

figure az

AllInstances

Let e be an OCL2PSQL-expression. Let \(e'\) be a subexpression of e. Let , where c is a class type. Then,

figure bb

Equality

Let e be an OCL2PSQL-expression. Let \(e'\) be a subexpression of e. Let . We need to consider the following cases:

  • \(\mathrm{FVars}(l) = \mathrm{FVars}(r) = \emptyset \). Then,

    figure bd
  • \(\mathrm{FVars}(l) \not = \emptyset \), \(\mathrm{SVars}(r) \subseteq \mathrm{SVars}(l)\). Then,

    figure be
  • \(\mathrm{FVars}(r) \not = \emptyset \), \(\mathrm{SVars}(l) \subseteq \mathrm{SVars}(r)\). As before, but swapping the order of the elements in the left-join.

  • \(\mathrm{FVars}(l) \not = \emptyset \), \(\mathrm{FVars}(r) \not = \emptyset \), \(\mathrm{SVars}(l) \not \subseteq \mathrm{SVars}(r)\), and \(\mathrm{SVars}(r) \not \subseteq \mathrm{SVars}(l)\). Then,

    figure bf

Size

Let e be an OCL2PSQL-expression. Let \(e'\) be a subexpression of e. Let . We need to consider the following cases:

  • \(\mathrm{FVars}(s) = \emptyset \). Then,

    figure bh
  • \(\mathrm{FVars}(l) \not = \emptyset \), Then,

    figure bi

Collect

Let e be an OCL2PSQL-expression. Let \(e'\) be a subexpression of e. Let . We need to consider the following cases:

  • \(v\in \mathrm{FVars}(b)\) and \(\mathrm{FVars}(e') = \emptyset \).

    figure bk
  • \(v\in \mathrm{FVars}(b)\) and \(\mathrm{FVars}(e') \not = \emptyset \).

    figure bl
  • \(v\not \in \mathrm{FVars}(b)\). Similarly, but the source and the body would need to be joined using a JOIN-clause.

Exists

Let e be an OCL2PSQL-expression. Let \(e'\) be a subexpression of e. Let . We need to consider the following cases:

  • \(v\in \mathrm{FVars}(b)\) and \(\mathrm{FVars}(e') = \emptyset \). Then

    figure bn
  • \(v\in \mathrm{FVars}(b)\) and \(\mathrm{FVars}(e')\not = \emptyset \). Then

    figure bo
  • \(v\not \in \mathrm{FVars}(b)\). Similarly, but the source and the body would need to be joined using a JOIN-clause.

ForAll

Let e be an OCL2PSQL-expression. Let \(e'\) be a subexpression of e. Let . We need to consider the following cases:

  • \(v\in \mathrm{FVars}(b)\) and \(\mathrm{FVars}(e') = \emptyset \). Then

    figure bq
  • \(v\in \mathrm{FVars}(b)\) and \(\mathrm{FVars}(e') \not = \emptyset \).

    figure br
  • \(v\not \in \mathrm{FVars}(b)\). Similarly, but the source and the body would need to be joined using a JOIN-clause.

Select

Let e be an OCL2PSQL-expression. Let \(e'\) be a subexpression of e. Let . We need to consider the following cases:

  • \(v\in \mathrm{FVars}(b)\) and \(\mathrm{FVars}(e') = \emptyset \).

    figure bt
  • \(v\in \mathrm{FVars}(b)\) and \(\mathrm{FVars}(e') \not = \emptyset \).

    figure bu
  • \(v\not \in \mathrm{FVars}(b)\). Similarly, but the source and the body would need to be joined using a JOIN-clause.

OclIsUndefined

Let e be an OCL2PSQL-expression. Let \(e'\) be a subexpression of e. Let . Then,

figure bw

Flatten

Let e be an OCL2PSQL-expression. Let \(e'\) be a subexpression of e. Let . Currently, \(e''\) must be a collect-expression, , where b is of type \(\mathrm {Col({ t})}\), where \(\mathrm{Col}\) is either \(\mathrm{Set}\) or \(\mathrm{Bag}\), and where t is a predefined type or a class-type.

We need to consider the following cases:

  • \(\mathrm{FVars}(e') = \emptyset \).

    figure bz
  • \(\mathrm{FVars}(e') \not = \emptyset \). Let assume that \(v\in \mathrm{FVars}(b)\).

    figure ca

5 Preliminary Benchmarks

As part of the work presented here, we have implemented in Java the OCL2PSQL mapping. The interested reader can experiment with the latest version of our tool at:Footnote 2

http://cs.vgu.edu.vn/se/tools/ocl2psql/.

In this section we provide some experiments to evaluate our mapping with respect to execution-time efficiency of the generated queries.Footnote 3 To this end, we will consider different scenarios of the database CarDB, introduced in Example 1. In particular, CarDB(n) will denote an instance of the database CarDB containing \(10^{n}\) cars and \(10^{(n-1)}\) persons, where each car is owned by one person and each person owns 10 different cars, and each car has a color different from ‘no-color’, and each person has a name different from ‘no-name’. We use a server machine, with Intel(R) Xeon(R) CPU E5-2620 v3 at 2.40 GHz with 16 GB RAM, using MySQL 5.7.25.Footnote 4 The execution-times reported here correspond to the arithmetic mean of 50 executions. The figures reported are given in seconds, unless otherwise stated.

First, suppose that we want to know the number of cars whose color is ‘no-color’. In SQL we can use the following query:

figure cb

In OCL we can specify the same query using the following expression:

figure cc

If we execute the above queries on CarDB(6) and CarDB(7), (without indexing the column color) in the table Car, we obtain the following results. The figures shown in the row “OCL2PSQL” correspond to the execution-time of the SQL-query generated, for the above OCL expression, by the OCL2PSQL mapping.

 

CarDB(6)

CarDB(7)

SQL

0.22

2.28

OCL2PSQL

0.26

3.30

Next, suppose that we want to know if there is at least one car whose color is different from ‘no-color’. In SQL we can use the following query,

figure cd

We can specify in OCL the original query using the following expression:

figure ce

If we execute the above queries on CarDB(6) and CarDB(7), without indexing the column color in the table Car, we obtain the following results.

 

CarDB(6)

CarDB(7)

SQL

0.22

2.72

OCL2PSQL

0.28

3.28

Finally, suppose that we want to know the number of cars that has at least one owner whose name is ‘no-name’. In SQL we can also use the following query, which uses joins (instead of correlated subqueries):

figure cf

We can specify in OCL the original query using the following expression:

figure cg

If we execute the above queries on CarDB(6) and CarDB(7), without indexing the column name in the table Person, we obtain the following results.

 

CarDB(6)

CarDB(7)

SQL

0.04

0.28

OCL2PSQL

0.36

4.24

6 Related Work

To the best of our knowledge, OCL2SQL [6, 10] was the first attempt of mapping OCL into SQL. Based on a set of transformation templates, OCL2SQL automatically generates SQL queries from OCL expressions. OCL2SQL only covers (a subset of) OCL boolean expressions. Moreover, the high execution-time for the queries generated by OCL2SQL makes it impractical, as an OCL-to-SQL code-generator, for large scenarios. For example, [4] reported that the query generated by OCL2SQL for the expression:

figure ch

takes more than 45 min to execute on a scenario consisting of \(10^{2}\) writers and \(10^{5}\) books, each writer being the author of \(10^{3}\) books and each book having exactly 150 pages.Footnote 5

MySQL4OCL [8] is defined recursively over the structure of OCL expressions. For each OCL expression, MySQL4OCL generates a stored procedure that, when called, creates a temporary table containing the values corresponding to the evaluation of the given expression. More concretely, for the case of iterator expressions, the stored procedure generated by MySQL4OCL repeats, using a loop, the following process: (i) it fetches from the iterator’s source collection a new element, using a cursor; (ii) it calls the stored procedure corresponding to the iterator’s body with the newly fetched element as a parameter; (iii) it processes the resulting temporary table according to the semantics of the iterator’s operator. Although cursors and loops (inside stored procedures) allow MySQL4OCL to cover a large subclass of the OCL language (including nested iterators), they also bring about a fundamental limitation to the use of MySQL4OCL as an OCL-to-SQL code-generator: they often impede the highly-optimized execution strategies implemented by SQL engines.

An interesting method for efficiently checking OCL constraints by means of SQL queries is proposed in [13]. According to this method, an OCL constraint is satisfied if its corresponding SQL query returns the empty set. As in the case of OCL2SQL, this method is limited to (a subset of) OCL boolean expressions. With regards to execution-time efficiency, the figures provided in [13] are not easily comparable with normal execution times, since the generated SQL queries are computed in an incremental way. More specifically, “whenever a change in the data occurs, only the constraints that may be violated because of such change are checked and only the relevant values given by the change are taken into account.”

SQL-PL4OCL [7] closely follows the design of MySQL4OCL and, consequently, bears the same fundamental limitation regarding execution-time efficiency, as we illustrate with an example below. Still, with respect to its predecessor, SQL-PL4OCL simplifies the definition of the mapping, improves the execution-time of the generated queries (by reducing the number of temporary tables), and implements some of the features that were left in [8] as future work: namely, handling the null value and supporting (unparameterized) sequences.

To illustrate the costly consequences, in terms of execution-time efficiency, of using cursors and loops to implement OCL iterator expressions, consider the following OCL query:

figure ci

The stored procedure generated by SQL-PL4OCL for this query is given in [7] (Example 11). Now, if we call this stored procedure on the scenarios CarDB(3), CarDB(4), CarDB(5), CarDB(6), and CarDB(7), we obtain the following execution-times:

 

CarDB(3)

CarDB(4)

CarDB(5)

CarDB(6)

CarDB(7)

SQL-PL4OCL

0.76

6.17

1 min 3.02

10 min 24.00

\(>90\)min

Note that the above query, when implemented in SQL in the expected way (without cursors and loops), takes less than 1 second to execute on the scenario CarDB(7), while the stored procedure generated by SQL-PL4OCL (using cursors and loops) did not finish its execution after 90 min. As reported in Sect. 5, the SQL-query generated by OCL2PSQL takes less than 10 seconds to execute on the scenario CarDB(7). OCL2PSQL diverts completely from MySQL4OCL/SQL-PL4OCL in that it does not rely on the use of cursors and loops for implementing iterator expressions, neither does it create temporary tables for storing intermediate results. Instead, (i) for intermediate results, it uses standard subqueries and (ii) for iterator expressions, it adds to the subquery corresponding to the iterator’s body an extra column corresponding to the iterator’s variable. Intuitively, this column stores the element in the iterator’s source that is “responsible” for the result that is stored in the corresponding row.

Finally, there have been also different proposals [1, 3, 4, 9] in the past for what we may call OCL evaluators. These are tools that load first the scenario on which an OCL expression is to be evaluated, and then evaluate this expression using an OCL interpreter. As reported in [4], the (insurmountable) problem with OCL evaluators is the time required for loading a large scenario: none of the existing tools were able to finish loading a scenario with \(10^6\) objects after 20 min.

7 Concluding Remarks and Future Work

The Object Constraint Language (OCL) plays a key role in adding precision to UML models, and therefore it is called to be a main actor in model-driven engineering (MDE). However, to fulfill this role, smart/advanced code-generators must bridge the gap between UML/OCL models and executable code. This is certainly the case for secure database-centric applications [2].

In this paper, we have defined (and implemented) a novel mapping, called OCL2PSQL, from a significant subset of OCL to SQL. Our mapping generates queries that can perform on par with queries manually implemented in SQL for non-trivial examples, overcoming the main limitations of previously proposed mappings.

Looking ahead, we recognize that manually implementing in SQL complex queries is not an easy task; in fact, we can argue that it is a more difficult task than specifying them in OCL. Suppose, for example, that we are interested in querying our database CarDB (without assuming that every car has at least one owner) about: (i) if it exists a car whose owners all have the name ‘no-name’, and (ii) how many cars have at least one owner with no name declared yet. We can specify (i) in OCL as follows:

figure cj

Similarly, we can specify ii) in OCL as follows:

figure ck

We invite the reader to implement (i) and (ii) in SQL, and draw his/her own conclusions. In our opinion, this state of affairs offers exciting opportunities for smart/advanced OCL-to-SQL code-generators.

Our challenge now is two-fold. On the one hand, we want to extend our mapping to cover the part of the OCL language that is not covered yet. In particular, (i) collection of collections (e.g., OCL expressions that represent set of sets, or sequences, or ordered sets); (ii) operations on collections of collections (e.g., union, intersection); (iii) type operations (e.g., or , which are particularly relevant when dealing with UML models that include generalizations); and (iv) invalid values. On the other hand, we want to formally prove the correctness of our mapping, based on the semantics of SQL and OCL, using an interactive theorem prover (proof assistant) like Isabelle/HOL or Coq.

Finally, we plan to use our mapping to generate appropriate executable code from UML/OCL models containing OCL invariants, and pre and post conditions. Notice that these are ultimately OCL queries of type Boolean, and therefore are covered by our mapping.