
1 Introduction

Similar to the source code of large systems, modeling languages (i.e., metamodels) are subject to evolution due to changing requirements and technological constraints, which requires existing models to be adapted [1, 2]. Thus, a set of changes must be applied to the initial model versions to fix the inconsistencies with the new metamodel version. This process is called metamodel/model co-evolution [1, 3].

Several co-evolution studies have been proposed, most of which provide either manual or semi-automated support based on pre-defined templates of evolution scenarios [4,5,6,7,8]. In addition to being pre-defined, these templates are specific to the metamodel and to the artifacts/models to co-evolve. A few fully automated co-evolution studies tried to find an entire sequence of edit operations that revises models in accordance with the new metamodel version [3, 9, 10]. However, several transformations require interactions with the user, especially when new elements are added to the metamodel, and are therefore hard to fully automate. Recently, an approach has been proposed to interactively evaluate the co-evolved models using search-based software engineering [11]. The designers can provide feedback about the co-evolved models and introduce manual changes to some of the edit operations that revise the model. However, this interactive process can be expensive and tedious since designers must evaluate every recommended set of edit operations and adapt them to the targeted design, especially in large models where the number of possible co-evolution strategies can grow exponentially. Besides, it is challenging to define weights upfront for the edit operations since the designer needs to look at the generated solutions to express his/her preferences.

In this paper, we propose an approach that takes advantage of existing metamodel/model co-evolution works. We propose a way to convert a multi-objective search into a mono-objective one after a few interactions with the developer. The first step consists of using a multi-objective search, based on the evolutionary algorithm NSGA-II [12], to generate a diverse set of model migration strategies by finding a trade-off between three objectives: reducing the number of edit operations, the dissimilarity with the initial models (to limit information loss), and the number of inconsistencies between the models and the new metamodel. Then, the designer may give feedback on the generated solutions to express his/her preferences by selecting relevant ones. The preferences extracted from the designer, via an analysis of the location of these solutions in the objective space, are used to transform the multi-objective search into a mono-objective one by generating an evaluation function based on weights that are automatically calculated from the selected solutions. Therefore, in the next iterations, the designer interacts with only one co-evolution solution generated by the mono-objective search.

Our approach combines the advantages of mono-objective search, multi-objective search, and interactive computational intelligence. Multi-objective algorithms are powerful in diversifying solutions and finding trade-offs between many objectives, but they generate many solutions as output. The interactive algorithm is useful for extracting designers’ knowledge and preferences. Mono-objective algorithms are the best in terms of optimization power once the evaluation function is well-defined, and they generate only one solution as output. We selected 16 active developers to manually evaluate the effectiveness of our tool on four well-known metamodel/model co-evolution case studies. The results show that the participants found their desired revised models faster and more accurately than with the current state of the art. Supplementary appendix materials can be found at the following link [13].

Fig. 1. A simplified metamodel evolution example

2 Background and Motivations

Figure 1 shows an example of a simplified metamodel evolution, based on the simple state machine language, and an initial model conforming to it. The metamodel evolution comprises three steps: extracting sub-classes of the State class, resulting in InitialState, SimpleState, and FinalState; making the State class abstract; and refining the cardinalities of the predecessor/successor references for the subclasses. As a result, besides other constraint violations, the constraint shown in Listing 1.1 is violated when checking the initial model of Fig. 1c for conformance against the new metamodel of Fig. 1b.

Listing 1.1.

To re-establish conformance for the given example, assume for now that only two kinds of operations on models are used in this context: non-conforming objects may either be retyped (reclassified as instances of the concrete classes) or deleted. Thus, the potential solution space for retyping or deleting non-conforming elements contains \((c + 1)^{o}\) solutions (with c = number of candidate classes, plus 1 for deletion, and o = number of non-conforming objects). This means that, in our given example, we would end up with 64 possible co-evolutions.
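For instance, assuming the initial model of Fig. 1c contains three non-conforming State objects and the three candidate concrete subclasses above, each object has four options (retype to InitialState, SimpleState, or FinalState, or delete), giving \((3 + 1)^{3} = 64\) possible co-evolutions.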

Several co-evolution studies have proposed to revise models after metamodel evolution, ranging from manual to fully automated approaches [2]. Recently, a few automated/interactive tools [9,10,11] used search-based software engineering to generate revised models. The proposed tools refine an initial model instantiated from the previous metamodel version to make it as conformant as possible to the new metamodel version by finding the best compromise between three objectives, namely minimizing (i) the non-conformities with the new metamodel version, (ii) the changes to existing models, and (iii) the dissimilarities between the initial and revised models. During the process, the designer may provide feedback on the proposed solutions in order to improve them in the next iterations. The output is several equally good solutions (sequences of edit operations that revise the model) presented to the designer, who selects the appropriate one based on his/her preferences.

Fig. 2. Tentative revised models

Figure 2 shows modified models after applying a set of edit operations extracted from the output of an existing tool [11]. This figure shows that there may be several possible solutions in the search space, and the user has to decide which one to select based on his/her preferences.

3 Approach Overview

Our approach includes three main phases. The first phase is the multi-objective algorithm, NSGA-II, executed for several iterations to generate a set of non-dominated co-evolution solutions, called Pareto-optimal solutions [12], each defined as a sequence of edit operations applied to the initial model and balancing the three objectives of minimizing the number of suggested edit operations, the deviation from the initial model, and the number of conformance errors with the evolved metamodel.

The output of the first phase can be a large number of possible solutions. Thus, it is essential to provide the designers with additional support for interacting with this set of solutions. In the second phase, the user can interact with the tool at the solution level by accepting, rejecting, or modifying the suggested edit operation(s); s/he can also give a selected solution a score between −1 and 1 (the higher, the better). Finally, we extract the preferences automatically and use them to transform the multi-objective problem into a mono-objective one by generating weights for each of the three objectives based on the selected solutions’ locations in the objective space. The output of the mono-objective search is a single solution fitting the user’s expectations and preferences; the designer can then interact with that solution if needed and continue the execution of the mono-objective algorithm until a final co-evolution solution is selected. In the following, we explain the phases of our approach in detail.

Fig. 3. Solution representation

3.1 Phase 1: Multi-objective Metamodel/Model Co-evolution

Solution Representation. A co-evolution solution consists of a sequence of n edit operations to revise the initial model. A vector-based representation is used to encode the edit operations sequence. Each dimension of the vector holds an operation, and its index in the vector indicates the order in which it will be applied. Consequently, vectors representing different solutions may have different sizes, i.e., numbers of edit operations.

Table 1 shows the possible edit operations that can be applied to model elements. Instances of classes are called objects, instances of features are called slots, and instances of references are called links. These operations are inspired by the catalog of operators for metamodel/model co-evolution presented in [14]. Since the catalog includes both metamodel and model changes, we selected from it all the edit operations that can be applied at the model level, as we do not change the metamodels in this paper. Figure 3 represents a solution that can be applied to the initial model of our motivating example described in Sect. 2.
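To make this representation concrete, the following minimal Python sketch (our illustration, not the authors’ implementation; the operation names and fields are assumptions) encodes a co-evolution solution as an ordered vector of model edit operations in the spirit of Table 1 and Fig. 3.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class EditOperation:
    """One model-level edit operation (cf. Table 1), e.g. retype or delete an object."""
    kind: str                    # e.g. "RetypeObject", "DeleteObject", "AddLink"
    target: str                  # identifier of the object/slot/link it applies to
    params: Dict[str, str] = field(default_factory=dict)  # e.g. {"newType": "FinalState"}

@dataclass
class CoEvolutionSolution:
    """Vector-based representation: position i holds the i-th operation applied to the model."""
    operations: List[EditOperation] = field(default_factory=list)

    def size(self) -> int:       # used later by the fitness function nbOp(s)
        return len(self.operations)

# A solution resembling Fig. 3 for the motivating example (object names are illustrative).
solution = CoEvolutionSolution([
    EditOperation("RetypeObject", "s1", {"newType": "InitialState"}),
    EditOperation("RetypeObject", "s2", {"newType": "SimpleState"}),
    EditOperation("RetypeObject", "s3", {"newType": "FinalState"}),
])
```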

Table 1. Model edit operations.

Fitness Functions. The investigated co-evolution problem involves searching for the best sequence of edit operations to apply among the set of possible ones. A good solution s is a sequence of edit operations applied to an initial model that minimizes the number of non-conformities \(f_{1}(s)=nvc(s)\) with the new metamodel version, the number of changes \(f_{2}(s)=nbOp(s)\) applied to the initial model, and the dissimilarity \(f_{3}(s)=dis(s)\) between the initial and the evolved models, such as the loss of information.

The first fitness function nvc(s) counts the number of violated constraints w.r.t. the evolved metamodel after applying a sequence s of edit operations. We first apply the sequence of edit operations (solution) on the initial model, then we load the evolved model against the target metamodel to measure the number of conformance errors based on the number of violated constraints. We consider three types of constraints, as described in [15]: constraints related to model objects, i.e., model elements (denoted by O.*), to objects’ values (V.*), and to objects’ links (L.*). Our implementation of these constraints is inspired by Schoenboeck et al. [3] and Richters et al. [15] with slight adaptations. The constraints are hard-coded in the implementation of the algorithm, and most of them come from the conformance verification constraints that already exist in EMF. The full list of constraints can be found at this link [13].

For the second fitness function, which aims to minimize the changes to the initial models, we simply count the number of edit operations nbOp(s) of a solution s (the size of s). The third fitness function dis(s) measures the difference between the model elements in the initial and revised models. As the type of a model element may change because of a change in the metamodel, we cannot rely on elements’ types. Instead, we use the identifiers to assess whether information was added or deleted when editing a model. In this case, renamed or extracted model elements will be considered different from the initial model elements. Thus, we make the assumption that two model elements can be syntactically similar if they use a similar vocabulary, and we compute their textual similarity using the cosine similarity [16]. In the first step, we tokenize the names of the initial and revised model elements. The initial and revised models are then represented as vectors of terms in an n-dimensional space, where n is the number of terms in all considered models. For each model, a weight is assigned to each dimension (representing a specific term) of the representative vector, corresponding to the term frequency (TF) score in the model. The similarity between initial and revised model elements is measured by the cosine of the angle between their representative vectors, i.e., a normalized projection of one vector over the other. This function compares each of the initial model elements with all the elements of the revised model to find the best matching.
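The following sketch illustrates how these fitness functions could be computed. It is our illustration of the described measures, not the authors’ implementation: the camel-case tokenization and the averaging of best cosine matches are assumptions, and nvc(s) is left as a stub since it delegates to the EMF conformance checks described above.

```python
import math
import re
from collections import Counter

def tokenize(name: str):
    """Split an element name like 'InitialState' or 'state_1' into lowercase terms."""
    return [t.lower() for t in re.findall(r"[A-Za-z][a-z]*|\d+", name)]

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def dis(initial_elements, revised_elements) -> float:
    """f3: dissimilarity = 1 - average best cosine match of each initial element (assumed aggregation)."""
    total = 0.0
    for e in initial_elements:
        tf_e = Counter(tokenize(e))
        total += max((cosine(tf_e, Counter(tokenize(r))) for r in revised_elements), default=0.0)
    return 1.0 - (total / len(initial_elements) if initial_elements else 1.0)

def nb_op(solution) -> int:
    """f2: number of edit operations (size of the solution vector from Sect. 3.1)."""
    return len(solution.operations)

def nvc(revised_model, metamodel) -> int:
    """f1: number of violated constraints; delegates to EMF conformance validation in the real tool."""
    raise NotImplementedError
```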

3.2 Phase 2: Interaction and Preference Extraction

Fig. 4. User interactions with the solutions of the multi-objective search

Fig. 5. The output of phase 2 (interactions with the user). (Color figure online)

The main goal of this step is to enable the designer’s interactions with the solutions generated by the first phase of the multi-objective search. The interaction can be performed at the solution and edit-operation levels, depending on the user’s desire. The feedback is quantified to a continuous score in the range [−1, 1]. The user can evaluate a solution by modifying its edit operations (edit, add, delete, re-order) or by simply rating the whole solution, as shown in Fig. 4. After the designer’s interaction, solution scores (\(Score_{s_i}\)) are computed as the average score of the edit operations in a solution. The solutions with the highest scores are considered the region of interest, which indicates the preferred objectives and edit operations. The line chart of Pareto-front solutions after the interactions is shown in Fig. 5. The color of each line indicates the user’s preferences (green for preferred solutions versus red for non-preferred solutions).
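A minimal sketch of how such feedback could be aggregated follows (our illustration; the example scores are invented, and the 0.5 threshold is taken from Sect. 3.3):

```python
from statistics import mean

def solution_score(op_scores):
    """Score of a solution = average of its edit-operation scores, each in [-1, 1]."""
    return mean(op_scores) if op_scores else 0.0

# Feedback on three Pareto-front solutions (illustrative values).
feedback = {
    "s1": [1.0, 0.5, 1.0],    # mostly accepted operations
    "s2": [-1.0, 0.0, -0.5],  # mostly rejected operations
    "s3": [1.0, 1.0, 0.5],
}
scores = {sid: solution_score(ops) for sid, ops in feedback.items()}
region_of_interest = [sid for sid, sc in scores.items() if sc > 0.5]  # -> ["s1", "s3"]
```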

3.3 Phase 3: Preference-Based Mono-objective Co-evolution

One of the main contributions of this paper is the ability to convert a multi-objective algorithm into a mono-objective one after interacting with the designer to extract his/her preferences. Mono-objective algorithms are known to be the best in terms of optimization but require the fitness function to be well-defined based on the decision maker’s preferences. The multi-objective evolutionary algorithm used in Phase 1 might not provide high-quality solutions in the developer’s region of interest because of the high-dimensional nature of the problem and the need to find trade-offs. Therefore, it is important to consider the user preferences extracted in Phase 2.

The goal of this phase is to use the preferences extracted from the designer after the multi-objective optimization to transform the problem into a single-objective optimization problem by aggregating the objectives according to the user’s preferences. This transformation gives the decision maker a single solution. Consequently, our proposed approach is a combination of all three categories of preference-based search: the preferences are expressed after the first evolutionary process and are then incorporated to guide the single-objective optimization.

One way to convert a multi-objective optimization problem into a mono-objective one and obtain a single solution is the Weighted Sum Method (WSM). In this method, the single preference fitness function is computed as a linear weighted sum of the multiple objectives. The main drawback of WSM is that the weight parameters need to be given. Fortunately, in our case, these parameters are computed automatically from the decision maker’s preferences in the objective space, i.e., the solutions preferred during the interactive optimization process based on the interaction scores. Thus, the weight of one or more objectives can get a value of 0 (or close to it) if the solution(s) selected by the developer penalized them while favoring other objectives. Also, WSM is not computationally expensive, unlike other scalarization methods.

In order to solve the converted mono-objective problem, we adopted a standard Genetic Algorithm (GA). To adapt the GA to our co-evolution problem, we use the same solution representation and fitness functions as reported in Phase 1. The importance (weights) of the objectives is based on the solutions preferred by the user, i.e., those with an interaction score higher than 0.5. The obtained single fitness function is employed to evaluate the solutions during the execution of the adapted GA. Thus, the weight of each objective is calculated as the average of the corresponding objective values of the solution(s) preferred by the user. We note that all the objectives of the multi-objective search are normalized using the min-max function.
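A minimal sketch of this weight derivation and WSM aggregation is given below (our illustration; it assumes min-max-normalized objective vectors and follows the averaging rule described above, with invented example values):

```python
def derive_weights(preferred, num_objectives=3):
    """Weight of objective i = average normalized value of f_i over the preferred solutions."""
    return [
        sum(sol[i] for sol in preferred) / len(preferred)
        for i in range(num_objectives)
    ]

def weighted_sum_fitness(objectives, weights):
    """Single preference fitness (WSM): linear weighted sum of the normalized objectives."""
    return sum(w * f for w, f in zip(weights, objectives))

# Normalized objective vectors (nvc, nbOp, dis) of solutions scored above 0.5 (illustrative).
preferred = [
    [0.1, 0.6, 0.2],
    [0.0, 0.8, 0.3],
]
weights = derive_weights(preferred)                      # -> [0.05, 0.7, 0.25]
fitness = weighted_sum_fitness([0.2, 0.5, 0.4], weights)  # evaluated by the mono-objective GA
```

With this aggregation, an objective that all preferred solutions keep close to zero contributes a near-zero weight, which matches the behavior described above.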

Instead of generating the initial population randomly, we use the user’s preferred solutions as an elite set from which the search process is initiated. Thus, we do not generate the initial population of the mono-objective GA randomly; we take the preferred solutions as the initial population so that we do not lose the knowledge extracted from the developer. If the number of preferred solutions is low, we apply the mutation operator to generate more solutions. The solutions are evaluated via the preference function aggregated from the multiple objectives. When the stopping condition is satisfied, the single optimal solution is recommended to the user. Similar to Phase 1, the user can interact with this solution by editing/adding/removing edit operations.
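The seeding step could be sketched as follows (our illustration; the population size and the mutation operator passed in are assumptions):

```python
import random

def seed_population(preferred_solutions, pop_size, mutate):
    """Initial GA population = the user's preferred solutions; if there are too few,
    fill the gap with mutated copies of them (rule from Sect. 3.3)."""
    population = list(preferred_solutions)
    while len(population) < pop_size:
        parent = random.choice(preferred_solutions)
        population.append(mutate(parent))  # mutate = add/remove/modify a random edit operation
    return population
```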

4 Evaluation

4.1 Research Questions and Experimental Setup

  • RQ1: Search validation. How does our approach perform compared to random search (RS)?

  • RQ2: Benefits. To what extent can our approach make relevant recommendations for designers compared to existing metamodel/model co-evolution techniques including multi-objective search and an existing deterministic method?

  • RQ3: The relevance of designers’ preferences extraction. To what extent can our approach reduce the interaction effort compared to existing metamodel/model co-evolution techniques?

Studied Metamodels and Models. To answer the research questions, we considered the evolution of GMF and the UML Class Diagram metamodel evolution from [17, 18]. These case studies are interesting scenarios since they represent real metamodel evolutions, were used in an empirical study [19], and were studied in other contributions [20,21,22]. For GMF, we chose to analyze the extensive evolution from GMF’s release 1.0 over 2.0 to release 2.1, covering a period of two years. To achieve a broad data basis, we analyzed the revisions of three Ecore metamodels, namely the Graphical Definition Metamodel (GMF Graph for short), the Generator Metamodel (GMF Gen for short), and the Mappings Metamodel (GMF Map for short). The respective metamodel versions had to be extracted from GMF’s version control system and, subsequently, manually analyzed. We created different scenarios based on the number of changes introduced at the metamodel level between the different metamodel releases of GMF and UML. We merged the releases that did not include extensive changes, and we generated two evolution scenarios per metamodel type.

The different models and metamodels range from small- through medium- to large-sized. In our experiments, we have a total of 7 different co-evolution scenarios, where each GMF scenario includes eight different models to evolve. The percentage of changes between the different releases is estimated as the number of modified metamodel elements divided by the size of the metamodel. The models created for our experiments ensure metamodel coverage. Furthermore, we used an existing set of 10 generated models for the UML class diagram metamodel evolution from the deterministic work of [17, 18]; thus, we were not involved in the selection of models and metamodel changes. To ensure a fair comparison with Wimmer et al. [17], we compared both approaches only on the existing UML dataset. Table 2 describes the statistics related to the collected data.

Table 2. Statistics related to the collected data of the investigated cases.

Evaluation Metrics. The quality of our results was measured by two methods: automatic correctness (AC) and manual correctness (MC). Automatic correctness consists of comparing the proposed edit operations to the reference ones, operation by operation, using precision (AC-PR) and recall (AC-RE). For an operation sequence corresponding to a given solution, precision indicates the proportion of correct edit operations (w.r.t. the baseline sequence) in the solution. Recall is the proportion of correctly identified edit operations among the set of all expected operations. Both values range from 0 to 1, with higher values indicating better solutions. The AC method has the advantage of being automatic and objective. However, since different edit operation combinations can describe the same evolution (different edit operations but the same target model), AC could reject a good solution because it yields edit operations that differ from the reference ones. To account for those situations, we used MC, measured by the designers: the number of relevant edit operations identified by the designer over the total number of edit operations in the selected solutions. In addition, we report the number of interactions (NI) required on the Pareto front compared to the number required once the mono-objective search is executed. This evaluation helps to understand whether we efficiently extracted the developer’s preferences after the Pareto-front interactions. We decided to limit this comparison to the interactive multi-objective work of Kessentini et al. [11] since it is the only approach offering interaction with the user, and it helps us understand the real impact of the knowledge extraction and mono-objective features (not supported by existing studies) on the recommendation and interaction effort. We also report the computation time (T) for the different evolution scenarios to estimate the effort required to obtain the best co-evolution solutions.
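A minimal sketch of how AC-PR and AC-RE could be computed from a proposed and an expected operation sequence is shown below (our illustration; it matches operations as multiset elements and ignores ordering, which is an assumption about the matching scheme):

```python
from collections import Counter

def automatic_correctness(proposed, expected):
    """AC precision/recall of a proposed edit-operation sequence w.r.t. the reference one."""
    matched = sum((Counter(proposed) & Counter(expected)).values())
    precision = matched / len(proposed) if proposed else 0.0
    recall = matched / len(expected) if expected else 0.0
    return precision, recall

# Illustrative example with operations encoded as strings.
proposed = ["retype(s1,InitialState)", "retype(s2,SimpleState)", "delete(s3)"]
expected = ["retype(s1,InitialState)", "retype(s2,SimpleState)", "retype(s3,FinalState)"]
print(automatic_correctness(proposed, expected))  # -> (0.667, 0.667)
```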

Study Participants. Our study involved 16 master’s students in Software Engineering. All the participants are volunteers and are familiar with model-driven engineering and co-evolution/refactoring since they are part of a graduate course on Software Testing & Quality Assurance, and most of them participated in similar experiments in the past, either as part of a research project or during graduate courses. Furthermore, 12 out of the 16 students work as full-time or part-time developers in the software industry. Participants were first asked to fill out a pre-study questionnaire containing five questions, which helped to collect background information such as their role within the company, their modeling experience, and their familiarity with model-driven engineering and co-evolution/refactoring. In addition, all the participants attended two lectures about model transformations and evolution and passed six tests evaluating their ability to assess and suggest model evolution solutions. We formed 4 groups, each composed of 4 participants. The groups were formed based on the pre-study questionnaire and the test results to ensure that all the groups had almost the same average skill level, and participants were assigned according to the studied metamodels, the techniques to be tested, and the developers’ experience. The participants were asked to co-evolve the different models manually and to evaluate the results of the different approaches based on a counter-balanced design [23].

Statistical Tests. Our experimental study is based on 30 independent simulation runs, and the results obtained by the alternative approaches are compared using the Wilcoxon rank-sum test [24] with a 95% confidence level. Roughly speaking, this test verifies the null hypothesis H0 that the observed differences between the alternative results were obtained by chance, against the alternative hypothesis H1 that they are statistically significant. The p-value of the Wilcoxon test corresponds to the probability of rejecting the null hypothesis H0 while it is true (type I error). A p-value less than or equal to \(\alpha\) (\(\le 0.05\)) means that we reject H0 and accept H1, whereas a p-value strictly greater than \(\alpha\) means the opposite. In this way, we could decide whether the superior performance of NSGA-II over each of the other approaches (or the opposite) is statistically significant or just a random result.
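For instance, with SciPy the test could be run as follows (a sketch, assuming the per-run metric values of two approaches are collected in two lists; the numbers are invented):

```python
from scipy.stats import ranksums

# AC-PR values over the independent runs for two approaches (30 values each in the real setup).
acpr_ours = [0.96, 0.95, 0.97, 0.94]
acpr_baseline = [0.88, 0.90, 0.87, 0.89]

stat, p_value = ranksums(acpr_ours, acpr_baseline)
alpha = 0.05
significant = p_value <= alpha  # reject H0 if True
```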

Parameter Settings. The stopping criterion was set to 100,000 evaluations for all search algorithms to ensure fairness of comparison (without counting the number of interactions, since reaching the best solution based on his/her preferences is part of the user’s decision). The mono-objective search was limited to 10,000 evaluations after the interactions with the user. The other parameter values are as follows for both the multi-objective and mono-objective algorithms: crossover probability = 0.4; mutation probability = 0.7, where the probability of gene modification is 0.5. Each parameter was uniformly discretized into intervals, values from each interval were tested for our application, and we finally picked the best values for all parameters. Hence, a reasonable set of parameter values has been experimented with.

4.2 Results

Results for RQ1: Search Validation. Figure 6 confirms that NSGA-II produces results that are by far better (and statistically significantly so) than just randomly exploring a comparable number of solutions, based on the three metrics of precision, recall, and manual correctness, on all the evolution case studies. NSGA-II has precision (AC-PR and MC) and recall (AC-RE) values more than twice as high as those of random search, as shown in Fig. 6 (\({\sim }96\%\) vs. \({\sim }42\%\)). The difference in execution time in favor of random search (Table 3), due to the crossover and mutation operators, is largely compensated by the quality of the obtained results.

RS is not efficient in generating good co-evolution solutions according to all the above metrics in all the experiments. Thus, an intelligent algorithm is required to find good trade-offs and propose efficient solutions. We conclude that there is empirical evidence that our multi-objective formulation surpasses the performance of RS; thus, our formulation is adequate, and the use of metaheuristic search is justified (this answers RQ1).

Fig. 6. The median evaluation scores on the four metamodel evolution scenarios with 95% confidence level (\(\alpha = 5\%\))

Results for RQ2: Benefits. We report the results of the empirical evaluation in Fig. 6. The majority of the co-evolution solutions recommended by our approach were correct and validated by the participants on the different case studies. On average, over all four studied metamodels/models, our approach was able to recommend 96% of the generated edit operations correctly. The remaining approaches achieve averages of 89% and 81%, respectively, for the interactive multi-objective approach [11] and the fully automated multi-objective approach [10]. Both interactive tools outperformed the fully automated ones, which shows the importance of integrating the human in the loop when co-evolving models. The deterministic approach defines generic rules for a set of possible metamodel changes that are applied to the co-evolved models. Figure 6 shows that our approach clearly outperforms, on average, the deterministic technique on all measures: precision, recall, and manual correctness.

Table 3. Median execution time, in minutes, on the different metamodel/model co-evolution scenarios and the number of interactions required by both interactive approaches

Results for RQ3: The Relevance of Designers’ Preferences Extraction. Table 3 summarizes the time, in minutes, and the number of interactions (for the interactive approaches) needed by the participants to find the most relevant solutions using our tool (IMMO), the interactive approach (I-NSGA-II) [11], the automated approach [10], random search, and the deterministic approach [17]. All the participants spent less time finding the most relevant model edit operations on the different metamodels than with I-NSGA-II. For instance, the average time is reduced from 71 min to just 44 min for the case of GMF Gen. The reported time includes the execution of the multi-objective and mono-objective search (if any) and the different phases of interaction until the designer is satisfied with a specific solution. It is also clear that the time reduction is not correlated with the number of interactions. For instance, the deviation between IMMO and I-NSGA-II for GMF Graph in terms of the number of interactions is limited to 9 (24 vs. 35), but the time reduction is 58 min. The drop in execution time is mainly explained by the fast execution of the mono-objective search and the reduced search space after the interactions with the designer.

It is clear that our approach also reduced the number of interactions compared to I-NSGA-II; IMMO required much fewer designer interactions. For instance, only 16 interactions to modify, reject, and select solutions were observed on GMF Map using our approach, while 25 interactions were needed with I-NSGA-II. The reduction in the number of interactions is mainly due to the move from multi-objective to mono-objective search after one round of interactions, since the designers no longer deal with a set of solutions in the front but only with one.

5 Related Work

In one of the early works [25], the co-evolution of models is tackled by designing co-evolution transformations based on metamodel change types. In [7, 8], the authors compute differences between two metamodel versions, which are then input to adapt models automatically. This is achieved by transforming the differences into a migration transformation with a so-called higher-order transformation, i.e., a transformation that takes/produces another transformation as input/output. In [26], the authors proposed an approach that comprises multiple steps for model co-evolution: change detection, either by comparing the two metamodel versions or by tracing and recording the changes applied to the old version of the metamodel; classification of the metamodel changes and of their impact on its instances; and, finally, determination of an appropriate algorithm for model migration. For initial model elements for which no transformation rule is found, a default copy transformation rule is applied. This algorithm has been realized in the model migration framework Epsilon Flock [27] and in the framework described in [28].

A comprehensive survey of interactive SBSE approaches can be found in [29]. The problem of contextualizing the recommendation process to developers’ regions of interest has been treated in recent SBSE papers on the code refactoring problem [30,31,32,33,34]. Han et al. proposed in [32] an approach that enables interactions with the user; a Delta Table can then quickly select the next refactoring to improve a specific objective without calculating a fitness function.

6 Conclusion

In this paper, we proposed a novel approach that extracts designers’ preferences to find good recommendations for co-evolving models. Our approach combines multi-objective search, mono-objective search, and user interaction. To evaluate the effectiveness of our tool, we conducted an evaluation with 16 participants who used the tool and compared it with state-of-the-art techniques. As part of our future work, we plan to evaluate our approach on further metamodel evolution cases and with a more extensive set of participants. We will also adapt our approach to address other problems requiring designers’ interactions, such as the co-evolution of model transformation rules.