1 Introduction

Conceptual models are the main artifacts for handling the high complexity involved in current information system (IS) development processes. The cognitive nature of models helps address the issues that derive from the presence of several stakeholders/viewpoints, abstraction levels, and organizational challenges in an IS project. Model-driven engineering (MDE) is a software engineering paradigm that promotes the use of conceptual models as the primary artifacts of a complete engineering process. MDE focuses on business and organizational concerns so that technological aspects are the result of operations over models via transformations or mappings.

An underlying foundation for working with models was proposed in the first version of the model-driven architecture (MDA) specification of the Object Management Group (OMG 2003), where the basic principles for working with and managing models were defined. These can be summarized in two main features: the specification of three abstraction levels (Footnote 1) (computation-independent model, CIM; platform-independent model, PIM; and platform-specific model, PSM) and the definition of model transformation operations. However, the increase in the number of communities of model-driven practitioners and the lack of a common consensus regarding model management (due to conceptual divergences among practitioners) have produced challenges in the usage and management of models. The MDA 1.0.1 specification has become insufficient to address these challenges (see Section 2.1). Paradoxically, some of these challenges had already been formulated in IS frameworks prior to the official release of the MDA specification.

One of the most critical concerns for the model-driven paradigm is the difficulty of its adoption in real contexts. Several reports have pointed out issues in model-driven adoption that are related to the misalignment between model-driven principles and the real context (Burden et al. 2014; Whittle et al. 2013, 2014). Some of these include the overload imposed by model-driven tools, the lack of traceability mechanisms, and the lack of support for the adoption of model-driven strategies in organizational/development processes. Evidence from model-driven works and real applications suggests unresolved issues in the quality assessment of models. In Giraldo et al. (2014), the authors demonstrated the wide divergence in the conception of quality for MDE.

This work presents a 3-year process to review the literature about the conceptualization of quality in MDE. Unlike other reviews on the same topic (most of which are summarized in Goulão et al. 2016), we focus on the identification of explicit definitions of quality for MDE, as well as the perception of quality in model-driven projects by real practitioners and its associated support in the academic/research field. This focus is important considering that, in the engineering field, high quality is determined through an assessment that takes an artifact under evaluation and checks whether or not it is in accordance with its specification (Krogstie 2012c). Due to the specific features of the MDE paradigm, it is necessary to establish the impact of the MDE specification on the current quality initiatives for this paradigm.

This paper presents the current state of the conception of quality in model-driven contexts and the several factors that influence it. These include the subjectivity of practitioners, the misalignment between real applications of model-driven scenarios and the corresponding research effort, and the implication that quality in model-driven scenarios must be considered as part of an integral quality evaluation process. This paper builds upon previous works by the authors (Giraldo et al. 2014, 2015) and makes the following contributions:

  (i) An analysis of the quality issues detected in both academic/research contexts and industrial contexts, performed in order to determine whether current research works on quality in MDE meet the requirements of real scenarios of model-driven usage. This analysis was performed through a structured literature review using backward snowballing (Wohlin 2014) on scientific publications and gray literature (non-scientific publications).

  (ii) A demonstration of quality-in-MDE issues in a real scenario. This demonstration shows that current proposals for quality in MDE do not cover quality issues that are implicit in IS projects, such as the suitability of multiple-view support, the organizational adoption of modeling efforts, and the derivation of software code as a consequence of a systematic process, among others.

  (iii) A set of challenges regarding quality and the identified categories and industrial/research alignments that must be considered and addressed in model-driven works. This set is derived from the literature reviews and should be integrally considered by any quality evaluation proposal in order to guide model-driven practitioners in detecting and managing quality issues in MDE projects.

The remainder of this article is structured as follows: Section 2 describes quality in MDE contexts and includes an extension of a previous systematic literature review (Giraldo et al. 2014) to identify the main categories of quality conceptualization in MDE to date. Section 3 shows the results of a literature review to determine the mismatch between the quality conceptions in research and those of industrial practitioners and communities of model-driven practitioners. Section 4 presents a real example where multiple modeling languages are used to conceive and manage an information system. This scenario highlights quality issues in modeling languages as well as the insufficiency of a quality evaluation proposal in MDE for revealing quality issues in the analyzed scenario. Section 5 describes some of the challenges that quality-in-MDE evaluation must address based on the reported findings and evidence. Finally, Section 6 presents our conclusions.

2 Quality issues in MDE

2.1 Evolution and limitations of the MDA standard

The model-driven paradigm does not have a common conception; instead, there is a plethora of interpretations based on the goals of each model-driven community. The most neutral and widely accepted reference for model-driven initiatives is the MDA specification, which reflects the OMG vision of model-driven scenarios. It serves as a common reference for the roles of and operations on models.

Even though the MDA guide 1.0.1 (OMG 2003) has been a key specification for model-driven contexts, its lack of updates over a decade has contributed to the emergence of new challenges for model-driven practitioners. Each of these challenges has been addressed by individual efforts and initiatives. Moreover, despite defining key concepts (Table 1) for using models as the main artifacts in a software/system construction process, the guide did not provide an explicit definition of quality in models and modeling languages.

Table 1 Contrast of model-driven key terms between published MDA guides

The MDA guide 2.0 (OMG 2014), released in June 2014, takes into account some of the current model challenges, including issues such as communication, automation, analytics, simulation, and execution. The MDA guide 2.0 defines the implicit semantic data in models (which is associated with the models' diagrams) to support model management operations. Although the MDA 2.0 guide essentially preserves the basic principles of model usage and transformation, it also complements the specification of some key terms and adds new features for the management of models. Table 1 shows the differences in some of the key modeling terms between MDA 1.0 and MDA 2.0. One of the most important refinements of MDA 2.0 is the explicit definition of a model as information.

The MDA guide 2.0 attempts to address current model challenges, including quality assessment of models through analytics of semantic data extracted from models (model analytics). However, this specification does not prescribe how to perform analytics of this kind or quality assessment of models.

Clearly, the refinement of key concepts presented in Table 1 (depicted in bold) demonstrates that the MDA guide 2.0 attempts to tackle new challenges that are implicit in modeling tasks. However, this effort is not sufficient considering that the MDA guide does not specify how to identify and manage semantic data derived from models; the guide remains only a preliminary (or complementary) descriptive application of model-driven standards.

In addition, many of the current challenges for the model-driven paradigm had already been anticipated by earlier information system frameworks. In fact, IS frameworks such as FRISCO (Falkenberg et al. 1996) (from IFIP, Footnote 2) define key aspects of the model-driven approach. These include the use of models themselves (conceptual modeling), the definition of information systems, the denotation of information systems by representations (models), the definition of computerized information systems, and abstraction level zero through the presence of processors.

FRISCO gives MDA an opportunity to consider the communicative factor, which is commonly reported as a key consequence of model use (Hutchinson et al. 2011b). In 1996, FRISCO suggested the need for harmonizing modeling languages and presented suitability and communicational aspects of modeling languages. Communication between stakeholders is critical for harmonization purposes: it allows important quality issues to be discussed from different views (Shekhovtsov et al. 2014). FRISCO also suggested relevant features for modeling languages (expressiveness, arbitrariness, and suitability).

These FRISCO challenges produce new concerns for model-driven practitioners. For example, suitability requires the usage of a variety of modeling languages, while communication requires those languages to be compatible and harmonized. Since suitability implies that a diversity of modeling languages is needed, unjustified differences between modeling languages (a by-product of this diversity) become a problem.

MDA was the first attempt to standardize the model-driven paradigm by defining three essential abstraction levels (Footnote 1) for any model-driven project and by specifying model transformations between higher/lower levels. Even though MDA has been widely accepted by software development and model-driven communities, the question of whether MDA can meet the actual MDE challenges and trends remains a pending issue.

In general, despite the specification of the most relevant features of models and modeling languages, the lack of a specification of when something is in MDE is evident. Such a specification is needed in order to establish whether or not model-based proposals are aligned with the MDE paradigm beyond the presence of notational elements. There is no evidence of a quality proposal that is aligned with MDE itself.

2.2 A literature review about models and modeling language quality categories

At the RCIS 2014 conference, we first presented the preliminary results of a systematic literature review (SR) that was performed over 21 months with the goal of identifying the main categories in the definition of quality in MDE (Giraldo et al. 2014). This review is ongoing, since we are attempting to demonstrate the diversity in the resulting definitions, including the most recent ones.

Figure 1 summarizes the SR protocol, which follows the Kitchenham guidelines (Kitchenham and Charters 2007) to ensure a rigorous and formal search on this topic. As depicted in Fig. 1, the protocol was enriched with an adaptive sampling approach (Thompson and Seber 1996) in order to find the primary authors on quality in MDE (see Section 2.5).

Fig. 1 Summary of the systematic review protocol performed

This SR addressed the following research questions:

  • RQ1: What does quality mean in the context of MDE literature?

  • RQ2: What does it mean to say that an artifact conforms to the principles of MDE?

While the main research question is RQ1, RQ2 focuses on the fulfillment of the term model-compliance, i.e., whether or not the identified works have artifacts that belong to the model-driven paradigm. For this analysis, we considered modeling artifacts such as models and modeling languages. From RQ1, we derived the search string depicted below:

$$\begin{array}{@{}rcl@{}} \textit{Quality} \land (\textit{Language} \lor \textit{Model}* \lor \textit{Modeling} \ \textit{language} \lor \textit{Modeling} \lor \textit{Notation}) \\ \land (\textit{Model-driven}* \lor \textit{MDD} \lor \textit{MDA} \lor \textit{Model-driven} \ \textit{Architecture} \\ \lor \ \textit{Model-driven} \ \textit{development} \lor \textit{Model-based} \lor \textit{MDE} \ ) \end{array} $$

The population of this work is made up of the primary studies published in journals, book sections, or conference proceedings in which an explicit definition of quality in model-driven contexts can be identified. The date range includes contributions from 1990 until now. In order to identify these primary studies, we defined the search string presented above; all logical combinations were valid for identifying related works about quality in model-driven contexts. This search string was operationalized according to the configuration options (advanced mode) of each search engine. The bibliographical information of the selected studies was extracted directly from each search engine.
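To make this operationalization concrete, the following sketch (our own illustrative Python, not part of the original protocol; the quoting and OR/AND syntax are assumptions that must be adapted to each engine's advanced mode) assembles the boolean string from its three dimensions:

```python
# Illustrative sketch: building the boolean search string from its three
# dimensions (what / in which / where) before adapting it to the advanced
# search syntax of a concrete engine. Term lists mirror the formula above.

WHAT = ["Quality"]
IN_WHICH = ["Language", "Model*", "Modeling language", "Modeling", "Notation"]
WHERE = ["Model-driven*", "MDD", "MDA", "Model-driven Architecture",
         "Model-driven development", "Model-based", "MDE"]

def or_group(terms):
    """Join the alternative terms of one dimension with OR, quoting phrases."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"

def build_query():
    """AND the three dimensions together, as in the formula above."""
    return " AND ".join(or_group(dim) for dim in (WHAT, IN_WHICH, WHERE))

if __name__ == "__main__":
    print(build_query())
```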

The main sources of the studies were:

  • Scientific databases and search engines such as ACM Digital Library, IEEE Xplore, Springer, Science Direct, Scopus, and Wiley. These include conference proceedings and associated journals.

  • Indexing services such as Google Scholar and DBLP.

  • Conference proceedings: CAiSE, ER (Conceptual Modeling), RCIS, ECMFA, MODELS, RE, HICSS, ECSA, and MODELSWARD.

  • Industrial repositories such as OMG and IFIP.

For this review process, a minimal set of criteria was defined in order to include/exclude studies. These are as follows:

Inclusion criteria

  • Studies from fields such as computer science, software engineering, business, and engineering.

  • Studies whose title, abstract, and/or keywords have at least one word belonging to each dimension of the search string (what, in which, and where).

Exclusion criteria

  • Studies belonging to fields that differ from computer science, software engineering, model-driven engineering, and conceptual modeling (e.g., biology, chemistry, etc.).

  • Studies whose title/abstract/keywords do not cover at least two dimensions of the search string's configuration.

  • Studies related to models in areas/fields that differ from software construction and enterprise/organizational views (e.g., water models, biological models, VHDL models, etc.).

  • Studies related to artificial grammars and/or language processing.

  • Studies not related to MDA/MDE technical spaces (Bézivin and Kurtev 2005) (e.g., studies on data schemas, XML processing, or ontologies).

Due to the variety of studies, a classification schema was defined in order to differentiate and analyze them. Here, RQ2 plays a key role in this literature review because the evaluation of the model-driven-compliance feature allows us to focus on the main artifacts of modeling processes: models and modeling languages. Quality definitions differ for the two artifacts. In fact, the SEQUAL framework (perhaps the most complete work about quality in MDE) defines the quality of models (Krogstie 2012c) and the quality of modeling languages (Krogstie 2012b) separately. The first definition is based on seven quality levels (physical, empirical, syntactic, semantic and perceived semantic, pragmatic, social, and deontic). The second is based on six quality categories (domain appropriateness, comprehensibility appropriateness, participant appropriateness, modeller appropriateness, tool appropriateness, and organizational appropriateness).

All of the detected studies were analyzed using the questions in Table 2, which were defined in accordance with RQ2. We answered all of the questions in this table for each detected quality study. These questions identify whether or not quality studies address the scope of the MDE-compliance feature. For studies that do not offer a quality definition, we identified the type of study based on categories previously detected in our research.
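As a concrete illustration of this classification, the record below (a hypothetical data structure of our own; the field names approximate the distinctions discussed in the text rather than reproducing the exact questions of Table 2) shows how each study's answers could be captured for later counting:

```python
# Hypothetical record of the per-study evaluation driven by RQ2.
# Field names are assumptions that mirror the model vs. modeling-language
# distinction and the syntax/semantics levels discussed in the text.

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class StudyEvaluation:
    reference: str                            # bibliographic key of the study
    defines_quality: bool                     # explicit definition found?
    covers_models: bool = False               # quality defined for models
    covers_languages: bool = False            # quality defined for languages
    language_levels: List[str] = field(default_factory=list)
    # e.g., "concrete syntax", "abstract syntax", "semantics"
    fallback_category: Optional[str] = None   # study type if no definition

# Example usage: tally how many studies offer an explicit definition.
studies = [
    StudyEvaluation("Krogstie2012", defines_quality=True,
                    covers_models=True, covers_languages=True,
                    language_levels=["abstract syntax", "semantics"]),
    StudyEvaluation("SomeTool2013", defines_quality=False,
                    fallback_category="tool"),
]
print(sum(s.defines_quality for s in studies), "of", len(studies))
```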

Table 2 Evaluation scheme applied for model quality studies in accordance with RQ2 (Giraldo et al. 2014)

2.3 Results

Table 3 presents the results of applying the search string to the databases. A second screening was necessary to discard studies that appear in the search results but do not contribute to this research. This screening was based on the abstracts of the studies. These studies were considered not pertinent despite their presence in the results from the academic databases: they contain words defined in the search string according to the inclusion criteria above, but they do not explicitly provide any method/definition of quality in MDE or support for multiple modeling languages. In fact, works of this kind appear as results of the search string, but they cover other topics that are aligned with model-driven approaches. We also discarded studies that appear repeatedly in the results of searches on multiple databases. Our analysis was made on 176 relevant studies. A summary of the analysis is presented in Fig. 2.

Table 3 Summary of the query results in the scientific databases (October 2016)

This screening is particularly important because it reflects the broad implications of the terms model and quality. Although the discarded works are model-driven compliant, they reflect the ambiguity of model-driven compliance (even without full MDA compliance): the mere presence of models may be considered criterion enough to determine compliance with the model-driven paradigm. The screening also demonstrates the generality of the terms model and quality in the software engineering context and related areas, which produces a diversity of works that use those terms to support their initiatives.

During the analysis of the 176 primary studies, we checked whether each paper offered an explicit definition of quality or at least provided a conceptual framework from which a definition of quality could be derived by applying some theory. From the 176 studies, we detected 29 (16.48% of the target population) that provide a definition of quality in model-driven contexts. The number of papers that provide a definition of quality is relatively low with respect to the number of identified and screened studies. This indicates that the quality concept mostly leads to works where quality is the result of the application of a specific approach. In those cases, quality is reduced to specific dimensions (e.g., metrics, detection of defects, increased productivity, cognitive effectiveness, etc.).

Fig. 2 Summary of identified studies that offer a definition of quality in modeling languages, in response to RQ2

Of the 29 studies that provide definitions of quality, 21 (11.93% of all studies) offer a definition in terms of the quality of models. Eighteen of these (10.23%) present the quality of models in terms of diagrams (mostly UML), and only one study (0.57%) defines the quality of textual models. In addition, 15 of the 29 quality studies (8.52%) offer a definition of quality at the modeling language level, of which 11 (6.25%) address quality at the concrete syntax level, 14 (7.95%) at the abstract syntax level, and 10 (5.68%) at the language semantics level. In 8 of the 29 quality studies (4.55%), the quality definition is shared between models and modeling languages. Similarly, we detected 4 other studies (2.27%) whose definitions of quality consider neither model nor language artifacts. These studies are associated with category 1 presented in Section 2.4, which proposes a quality model for a specific model-driven approach.
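The shares reported above follow directly from the raw counts over the 176-study population; the short script below (our own check, not part of the review protocol) reproduces the percentages:

```python
# Sanity check of the reported shares over the 176 relevant studies.
TOTAL = 176

counts = {
    "explicit quality definition": 29,   # 16.48%
    "quality of models": 21,             # 11.93%
    "models as diagrams": 18,            # 10.23%
    "textual models": 1,                 # 0.57%
    "modeling-language level": 15,       # 8.52%
    "concrete syntax": 11,               # 6.25%
    "abstract syntax": 14,               # 7.95%
    "language semantics": 10,            # 5.68%
    "shared models/languages": 8,        # 4.55%
    "neither artifact": 4,               # 2.27%
}

for label, n in counts.items():
    print(f"{label}: {n}/{TOTAL} = {100 * n / TOTAL:.2f}%")
```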

On the other hand, 147 studies (83.52% of the total) do not provide an explicit definition of quality in model-driven contexts. These studies are model-driven proposals formulated to promote work on particular aspects of quality, such as methodological frameworks, experiments, processes, etc. Of these works:

  • Five studies (2.84%) present specific adoptions of standards such as ISO 9126 and ISO 25010, descriptive models such as CMMI, and approaches such as goal-question-metric (GQM) to support the operationalization of techniques applied in model-driven contexts (including model transformations).

  • Seventy-eight of the 176 identified studies (44.32%) propose methodologies to perform tasks in model-driven contexts that are commonly framed in quality assurance processes (e.g., behavioral verification of models, performance models, guidelines for quality improvement in model transformations, OCL verifications, checklists, model metrics and measurement, etc.).

  • Fourteen studies (7.95%) report tools that are built to evaluate and/or support the applicability of specific quality initiatives in model-driven contexts.

  • Twenty-nine studies (16.48%) are about designed experiments or empirical procedures to evaluate quality features of models that are mostly oriented toward their understandability.

  • Twelve studies (6.82%) report specific discussions of quality procedures in model-driven contexts, such as data quality, complexity, application of agile methodology principles, evaluation of languages, etc.

  • Six studies (3.41%) are works that extend predefined model-driven proposals such as metamodels, insertion of constraints into the complex system design processes, definition of contracts for model substitutability, model-driven architecture extension, etc.

  • Seven studies (3.98%) propose domain-specific languages (DSL) for specific tasks that are related to model management or model transformations.

  • Four studies (2.27%) report model-driven experiences in industrial automation contexts where models become useful mechanisms for generating software with a higher level of quality, where quality is defined as the presence of specific considerations at the modeling level prior to software production.

  • Fourteen studies (7.95%) define frameworks for multiple purposes such as measuring processes, quality of services, enrichment of languages, validation of software implementations according to their design, etc.

The existence of these studies indicates that the terms quality and model are often used as pivots to highlight specific initiatives that cover only certain dimensions of quality and MDE.

2.4 Identified categories of the definition of quality in MDE

In this research, a category is a set of established practices, activities, or procedures for evaluating the quality of models, regardless of the formality level and the modeling languages involved. According to RQ1, a summary of the defined categories is presented in Table 4 (Footnote 3). The categories reflect the grouping of the identified quality works. In contrast to the previous report of Giraldo et al. (2014), in this extension we found six new categories for quality in MDE, which are highlighted in Table 4.

  • Category 1—quality model for MDWE: This quality model defines and describes a set of quality criteria (usability, functionality, maintainability, and reliability) for the model-driven web approach (MDWE). The model also defines the weights for each element of the quality criteria set, and the relation of the elements with the user information needs (MDE, web modeling, tool support, and maturity).

  • Category 2—SEQUAL framework: This is a semiotic framework derived from the initial framework proposed by Lindland et al. Quality is discussed on seven levels: physical, empirical, syntactic, semantic, pragmatic, social, and deontic. The way the different quality types build upon each other is also explained.

  • Category 3—6C framework: These works propose the 6C quality framework, which defines six classes of model quality goals: correctness, completeness, consistency, comprehensibility, confinement, and changeability. This framework emerges as a grouping element that contains model quality definitions and modeling concepts from previous works such as those of Lindland, Krogstie, Sølvberg, Nelson, and Monarchi.

  • Category 4—UML guidelines: In this work, the quality of a model is defined in terms of style guide rules. The quality of a model is not subject to conformance to individual rules, but rather to statistical knowledge that is embodied as threshold values for attributes and characteristics. These thresholds come from quality objectives that are set according to the specific needs of applications. From the quality point of view, only deviations from these values lead to corrections; otherwise, the model is considered to have the expected quality. While the style guide notifies the user of all rule violations, non-quality is detected only when the combination of a set of metrics reaches critical thresholds.

  • Category 5—model size metrics: Quality is defined in terms of model size metrics (MoSMe). The quality evaluation considers defect density through model size measurement. The size is generally captured by the height, width, and depth dimensions. This already indicates that one single size measure is not sufficient to describe an entity.

  • Category 6—quality in model transformations: The work presented in Amstel (2010) defines the quality of model transformation through internal and external qualities. The internal quality of a model transformation is the quality of the transformation artifact itself. The quality attributes that describe the internal quality of a model transformation are understandability, modifiability, reusability, modularity, completeness, consistency, and correctness. The external quality of a model transformation is the quality change induced on a model by the model transformation. The work proposes a direct quality assessment for internal quality and an indirect quality assessment approach for external quality, but only if it is possible to make a comparison between the source and the target models.

    Other work associated with this category is presented in Merilinna (2005). This work proposes a specific tool that automates the quality-driven model transformation approach proposed in Matinlassi (2005). To do this, the authors propose a procedure that consists of developing a rule description language, selecting the most suitable CASE tool for making the transformations, and designing and implementing an extension for that CASE tool.

    In addition, in the work presented in Grobshtein and Dori (2011), quality is a consequence of an OPM2SysML view generation process that uses an algorithm with its respective software application. Thus, quality is defined as the effectiveness and fulfillment of a faithful translation from OPM to SysML.

  • Category 7—empirical evidence about the effectiveness of modeling with UML: The identified works do not provide a definition of quality in models; they contain a synthesis of empirical evidence about the effectiveness of modeling with UML, defining it as a combination of positive (benefits) and negative (costs) effects on overall project productivity and quality. The work contributes to quality in models by showing the need for quality assurance methods based on the level of quality required in different parts of the system, and by including consistency and completeness dimensions as part of quality assurance practices as a consequence of the communicational purposes of (UML) models.

  • Category 8—understandability of UML: This is an empirical study that evaluates the effect that structural complexity has on the understandability of the UML statechart diagram. The report presents three dimensions of structural complexity that affect understandability. The authors also define a set of nine metrics for measuring the UML statechart diagram structural complexity. This work is part of broad empirical research about quality in modeling with UML diagrams where works like Piattini et al. (2011) can be identified.

  • Category 9—application of model quality frameworks: This is an empirical study that evaluates and compares feature diagram languages and their semantics. The method relies on formally defined criteria and terminology based on the highest standards in engineering formal languages defined by Harel and Rumpe, and on a global language quality framework: Krogstie's SEQUAL framework.

  • Category 10—quality from structural design properties: Quality assurance is the measurement of structural design properties, such as coupling or complexity, based on a UML-oriented representation of components. UML design modeling is a key technology in MDA, and UML design models naturally lend themselves to design measurement. The internal quality attributes of relevance in model-driven development are structural properties of UML artifacts; the specific structural properties of interest are coupling, complexity, and size. An example is reported in Mijatov et al. (2013), where the authors propose an approach to validate the functional correctness of UML activities through their executability using the subset of UML provided by the fUML standard.

  • Category 11—quality of metamodels: Works of this kind propose specific languages and tools to check desired properties of metamodels and to visualize the problematic elements (i.e., the non-conforming parts of metamodels). The validation is performed over real metamodel repositories. When the evaluation is done, feedback is delivered to both MDE practitioners and metamodel tool builders.

  • Category 12—formal quality methods: This category is related to the ARENA formal method reported in Morais and da Silva (2015), which allows the quality and effectiveness of modeling languages to be evaluated. The reported selection process was performed over a set of user-interface modeling languages. The framework is a mathematical formula whose parameters are predefined properties specified by the authors.

  • Category 13—quality factors of business process models: In Heidari and Loucopoulos (2014), the authors proposed the quality evaluation framework (QEF) method to assess the quality of business processes through their models. This method could be applicable to any business process notation; however, its first application was reported on BPMN models. The framework relates and measures business process quality factors (such as resource efficiency, performance, and reliability) that are inherent properties of business process concepts and can be measured by quality metrics.

    In this category, the SIQ framework (Reijers et al. 2015) is also identified for the evaluation of business process models. Here, three categories for evaluating models are distinguished: syntactic, semantic, and pragmatic. This creates an inevitable association of SIQ with previous quality frameworks such as SEQUAL (category 2) and some works of Moody; however, the authors clarify that the SIQ categories are not the same as those previously defined in the other quality frameworks. The authors show how SIQ is a practical framework for performing quality evaluation that has links with previous quality frameworks. SIQ attempts to integrate concepts and guidelines that belong to research in the BPM domain.

    A complete list of works on quality for business process modeling is presented in De Oca et al. (2015). This work reports a systematic review for identifying relevant works that address quality aspects of business process models. The classification of these works was performed using the CMQF framework (Nelson et al. 2012), which is a combination of SEQUAL and the Bunge-Wand-Weber ontology.

  • Category 14—quality procedures derived from an IS success evaluation framework: The authors in Maes and Poels (2007) proposed a method to measure the quality of modeling artifacts through the application of a previous framework by Seddon (1997) for evaluating the success of information systems. The method proposes a selection of four related evaluation model variables: perceived semantic quality (PSQ), perceived ease of understanding (PEOU), perceived usefulness (PU), and user satisfaction (US). This method is directly associated with a manifestation of the perceived semantic quality (category 2) described in Krogstie et al. (1995).

  • Category 15—a quality patterns catalog for modeling languages and models: The authors in Sayeb et al. (2012) propose a collaborative pattern system that capitalizes on the knowledge about the quality of modeling languages and models. To support this, the authors introduce a web management tool for describing and sharing the collaborative quality pattern catalog.

  • Category 16—an evaluation framework for DSMLs used in a specific context: The authors in Challenger et al. (2015) formulate a specific quality evaluation framework for languages employed in the context of multi-agent systems (MAS). Their systematic evaluation procedure is a comparison of a modeling proposal with a hierarchical structure of dimension/sub-dimension/criteria items. The lower level (criteria) defines specific MAS characteristics. For this category, quality is a dimension with two sub-dimensions: the general DSML assessment sub-dimension (with criteria such as domain scope, suitability, domain expertise, domain expressiveness, effective underlying generation, abstraction-viewpoint orientation, understandability, maintainability, modularity, reusability, well-written, and readability) and the user perspective sub-dimension (with criteria such as developer ease and advantages/disadvantages). Both sub-dimensions are addressed by qualitative analysis; it is assumed that this type of analysis is performed with case studies designed with experimental protocols.

Table 4 Summary of identified studies about quality in MDE (updated to October 2016)

2.5 Adaptive sampling

Using the principles of the adaptive sampling approach defined in Thompson and Seber (1996), we analyzed the identified papers in order to explore clustered populations of studies about quality in models. We reviewed the bibliographical references of each study to detect reference authors or works (i.e., studies published before the analyzed study that are cited in the identified quality studies). We established as reference authors or works those that are cited by at least two detected quality studies from different authors.

To do this, we defined Tables 5 and 6, where the rows refer to the authors of the identified quality studies and the columns contain the referenced authors or works. A mark in cell (i,j) of Tables 5 and 6 (a black cell fill) indicates that the author of column j has influenced the authors of row i; that is, the i-author(s) cite the j-author(s) in the analyzed quality study(ies).
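Conceptually, Tables 5 and 6 are boolean citation matrices. The sketch below (our own reconstruction; the author keys and citation sets are invented for the example) shows how the per-column influence shares discussed next can be derived from such a matrix:

```python
# Hypothetical citation matrix behind Tables 5 and 6: cell (i, j) is marked
# when the author(s) of quality study i cite reference author/work j.

citations = {  # study author -> set of cited reference authors (invented)
    "Lange":   {"Krogstie", "Lindland", "OMG"},
    "Hindawi": {"Krogstie", "Moody"},
    "Amstel":  {"OMG", "ISO 9126"},
}

ref_authors = sorted({r for refs in citations.values() for r in refs})
matrix = {i: {j: (j in refs) for j in ref_authors}
          for i, refs in citations.items()}

# Influence share of each reference author: the fraction of quality studies
# whose authors cite him/her (the percentages discussed in the text).
n = len(citations)
influence = {j: sum(matrix[i][j] for i in citations) / n for j in ref_authors}

for j, share in sorted(influence.items(), key=lambda kv: -kv[1]):
    print(f"{j}: cited by {share:.0%} of the studies")
```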

Table 5 Sampling of categories’ authors in quality frameworks for MDE (Part I)
Table 6 Sampling of categories’ authors and reference authors or works in quality frameworks for MDE (Part II)

In Table 5, the columns (the j-authors) correspond to the same authors of the quality studies; this was done intentionally in order to show the influence of authors on the analyzed quality studies. Table 5 shows that Krogstie (category 2) is the author with the most influence on the quality works analyzed: his work influences 50% of the identified quality studies, followed by Lange (category 5) with 31.3%. Two special cases occur in the columns of Krogstie and Mohagheghi (category 3): they appear as authors of identified quality papers, but they were cited through other works of theirs that were not detected in the searches of the academic databases. We wanted to highlight these other works of the authors that influence the analyzed studies.

Table 5 also shows studies that are referenced, created, or influenced by works of the same author; these studies do not affect other authors or proposals for quality in models. However, Table 5 also shows communities of quality researchers on topics such as model metrics and guidelines, mainly applied to UML. Works led by Lange, Chaudron, and Hindawi contribute to the consolidation of these research communities. This community phenomenon was originally reported in Budgen et al. (2011) and is described in works like Lange and Chaudron (2006), Lange et al. (2003), Lange and Chaudron (2005), and Lange et al. (2006). In fact, the works of Lange presented in Budgen et al. (2011) suggest that most model quality problems are related to the design process, which shows that a conflict arises with all viewpoint-based modeling forms, not just UML.

In Table 6, the columns represent other authors or works that were identified in the review of the bibliographical references of each quality study. As Table 6 shows, the OMG specifications and the ISO 9126 standard are the most important industrial references influencing the formulation of quality studies.

The OMG specifications were cited by 68.8% of the authors of the identified categories. The OMG specification most cited by authors was the MDA specification (Footnote 4), followed by the UML, MOF, OCL, and SysML specifications. Evidence of the adoption of OMG standards suggests that the works are MDA compliant, but this does not necessarily mean an explicit adoption of or alignment with the MDA initiative itself. The ISO standards (cited by 50% of the works) are used to support quality model proposals through the taxonomy composed of features, sub-features, and quality attributes; they are even useful for evaluation purposes. This kind of adoption leaves out the quality dimensions defined in the ISO standards themselves (process quality, internal quality, external quality, and quality in use).

Lindland's quality framework (Lindland et al. 1994) is one of the reference frameworks most frequently used and cited by the authors of the primary studies (43.75%). This framework was one of the first quality proposals formulated, and it takes into account the syntactic, semantic, and pragmatic qualities regarding goals, means, activities, and modeling properties. The Krogstie quality framework (an evolution of Lindland's framework) is recognized as the work with the most influence on contemporary works about the quality of models. In the case of Krogstie and Moody (cited by 31.25% of the works), the authors of the analyzed studies cited early papers where the first versions and applications of their approaches were presented. Finally, it is important to highlight the references to Kitchenham's works to support the application of systematic review guidelines and analysis procedures in empirical software engineering.

2.6 Other findings

As a consequence of the searches performed, we identified sets of studies belonging to the same authors or topics. These were families of related works with specific approaches for evaluating quality over models, such as model metrics, defect detection, cognitive evaluation procedures, checklists, and other works about quality frameworks. For our research, this distinction is particularly important because of their presence in the search results; however, most of them do not contribute a formal definition of quality in models. Instead, they focus on specific topics that are considered in quality strategies.

The identified families are the following:

  • Understandability of UML diagrams (Piattini et al.).

  • SMF approach (Piattini et al.).

  • NDT (University of Sevilla, Spain).

  • SEQUAL framework (Krogstie).

  • Constraint—Model verification (Cabot et al., and others) (Chenouard et al. 2008; González et al. 2012; Tairas and Cabot 2013; Planas et al. 2016).

  • fUML (Laurent et al. 2013; Mayerhofer 2012).

  • OOmCFP (Pastor et al.) (Marín et al. 2010; Marín et al. 2013; Panach et al. 2015a).

  • 6C Framework (Mohagheghi et al.).

These families show how the interpretation of quality is reduced to specific procedures or approaches, in a way similar to the mismatches and limitations surrounding the term software quality. Because of this, some authors like Piattini et al. (2011) suggest the need for more empirical research to develop (at least) a theoretical understanding of the concept of quality in models.

2.7 Discussion

Section 2.4 answered RQ1 (the meaning of quality in the MDE literature). The obtained quality categories were classified in accordance with the schema defined in Table 2 (derived from RQ2). Despite the many model-driven works, tools, modeling languages, etc., the concept of quality has only been ambiguously defined by the MDE community. Most quality proposals focus primarily on the evaluation of UML for many varied interests and goals.

Works about quality in MDE are limited to specific initiatives of the researchers, without applicability beyond the research or the specific works considered. This contrasts with the relative maturity of quality definitions such as the one presented in Section 2.2 (the SEQUAL framework).

The low number of works on quality and the diversity of quality categories reflect specific quality frameworks and the respective communities that support each quality concept. The high number of results in the searches performed indicates misconceptions about quality due to the wide spectrum of model engineering in terms of its ease of application (any model can conform to MDE) and the lack of mechanisms to indicate when something is in accordance with MDE.

There are many definitions of quality in models in the literature, but there is also dispersion and a general disagreement about quality in MDE contexts; this is demonstrated by the multiple categories of quality in MDE presented in Section 2.4.

MDE requires a definition of quality that is aligned with the principles and main motivations of this approach. Extrapolating software quality approaches alone is insufficient because we move from a concrete level (code production, software quality assurance activities) to a higher abstraction level to support specific modeling domains.

Traditional evaluations of UML are not enough for a full understanding of quality in models: UML is oriented toward functional software features and is an object-oriented modeling approach. UML is the de facto software modeling approach, but evaluating the quality of models in terms of UML excludes the overall spectrum of MDE initiatives. Likewise, quality evaluation based on cognitive effectiveness could restrict the overall quality of models to the diagram and notational levels.

The quality proposals analyzed do not consider how to reduce the complexity added by the model quality activities themselves (experiments, changes in syntax and semantics, evaluation of quality features at a high level of abstraction, etc.).

The reported quality evaluation categories do not take into account the implications at the tool level. Tools are a particularly important issue because a language can be explained by its associated tool. New challenges related to the tools that support MDE initiatives have emerged; an example can be seen in Köhnlein (2013). In the proposals, tools are limited to validation cases without further applicability beyond the proposal itself. Also, the lack of reports on the validation and use of these quality proposals suggests that they were formulated at a preliminary stage of research.

2.8 The relationship between quality in MDE and V&V

Verification and validation procedures (commonly referred to as V&V) are key strategies in the software quality area for avoiding, detecting, and fixing defects and quality issues in software products. These procedures are applied throughout the life cycle of the software product before its release.

MDE also takes advantage of V&V procedures by applying them to modeling artifacts (i.e., languages, models, and transformations) in order to find issues before the generation of artifacts such as source code or other models. One of the most representative examples of V&V procedures in the MDE literature is the MoDEVVa (Footnote 5) (model-driven engineering, verification, and validation) workshop of the ACM/IEEE MODELS conference.

Thirteen of the 16 categories of quality in MDE are associated with specific V&V procedures in MDE reported by the authors; notable examples are the studies reported in Mijatov et al. (2013) (category 10) and López-Fernández et al. (2014) (category 11), which appear in the proceedings of the MoDEVVa workshop (MoDEVVa 2013 and MoDEVVa 2014, respectively). Three categories (2, 3, and 15) provide guidance for evaluating quality in modeling artifacts; works of these categories must be interpreted in order to be applied in specific evaluation scenarios.

3 A mismatch analysis between the industrial and academic fields regarding MDE quality evaluation

Quality in models and modeling languages was considered in several ontological IS frameworks even before the formulation of the model-driven architecture (MDA) specification by the Object Management Group (OMG), as mentioned above. The ISO 42010 standard (ISO/IEC/IEEE 2011) specifies that architecture descriptions are supported by models (Footnote 6), but it recognizes that the evaluation of the quality of an architecture (and its descriptions) is the subject of further standardization efforts.

The survey artifact proposed in the CMA workshop of the MODELS conference (Footnote 7) presents a set of key features for all modeling approaches, considering issues related to the modeling paradigm involved, the notation, views, etc. This is a valuable effort to harmonize the study of modern modeling approaches, and it suggests higher-level features to analyze in modeling languages. However, some key issues such as usability, expressiveness, completeness, and abstraction management (which are key in ontological frameworks) are poorly described. The support for transformations between models, the role of tools in a model-driven context, and diagrams as the main interaction mechanism between models and users also require better descriptions.

The above evidence demonstrates that quality in MDE is not an unknown factor in the adoption of model-driven initiatives in real contexts, e.g., software, IS, or complex engineering development processes. Therefore, the consideration and/or use of the MDE paradigm in industrial scenarios is an important source for detecting quality issues, taking into account that these issues impact the adoption of model-driven initiatives. It is also important to identify how the current MDE quality proposals support the model-driven industrial communities and practitioners.

For this reason, we performed a complementary literature review in order to find evidence of the mismatch between the research field of modeling language quality evaluation and actual MDE practice in industry. In Giraldo et al. (2015), we presented the preliminary results of this literature review. This search is currently ongoing.

3.1 Literature review process design

We performed a structured literature review using the backward snowballing approach. It has been demonstrated that this approach yields results similar to search-string-based searches in terms of conclusions and patterns found (Jalali and Wohlin 2012), and we did not want to miss valuable gray literature in the results. Gray literature is not published commercially and is seldom peer-reviewed (e.g., reports, theses, technical and commercial documentation, scientific or practitioner blog posts, and official documents), but it may contain facts that complement those of conventional scientific publications.

Figure 3 summarizes the literature review protocol that was performed. This literature review is an extension of the systematic review reported in Section 2.2. The snowballing sampling approach helps to identify additional works from an initial reference list, which was obtained from an initial keyword search. We used the snowballing procedure reported in Wohlin (2014) to address the following research questions:

  • RQ1: What are the main issues reported in MDE adoption for industrial practice that affect modeling quality evaluation?

  • RQ2: What is the focus of works on modeling quality evaluation in the corresponding research field?

  • RQ3: Does the term model quality evaluation have a similar meaning at both the industrial level and the academic/research level?

  • RQ4: Is there a clear correspondence between industrial issues of modeling quality and trends in the identified research?

Our snowballing search method was performed as follows:

  1. The initial searches were done on scientific databases and search engines such as Scopus, ACM Digital Library, IEEE Xplore, Springer, Science Direct, and Wiley (Footnote 8). These include conference proceedings and associated journals. We used search strings such as the following:

    $$(\textit{MDE} \ \lor \textit{Model-driven}*) \ \land \ (\textit{real}\ \textit{adoption} \ \lor \ \textit{adoption} \ \textit{issues} \ \lor \ \textit{problem} \ \textit{report} \ ) $$
  2. For the resulting works, we chose articles that show explicit reports about the applicability of the MDE paradigm in real contexts.

  3. For those relevant works, quality issues were identified, and their reference lists were reviewed to find related works reporting quality issues. This iteration was repeated until no new works were identified (a code sketch of this loop is given after the list).

  4. To complement the quality issues detected, we analyzed web portals of software development communities (blogs, technical web sites, forums, social networks, and portals accessed from Google web search), using strings similar to those of the scientific database searches. Our goal was to identify model quality manifestations from software practitioners who work under specific technical and business constraints.
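As a rough illustration of the iteration in step 3, the sketch below (our own simplification; `references_of` and `is_relevant` are placeholders for the manual reviewing performed in practice) captures the stopping condition of backward snowballing (Wohlin 2014):

```python
# Minimal sketch of backward snowballing: start from the keyword-search
# results and follow reference lists until no new relevant works appear.

def backward_snowball(start_set, references_of, is_relevant):
    included = set(start_set)
    frontier = set(start_set)
    while frontier:                       # stop when an iteration adds nothing
        candidates = set()
        for work in frontier:
            candidates |= set(references_of(work))
        new = {w for w in candidates - included if is_relevant(w)}
        included |= new
        frontier = new                    # next pass examines only new works
    return included
```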

Fig. 3 Summary of the literature review protocol performed

Several inclusion/exclusion criteria were applied to the search results to identify relevant works for our analysis. These criteria are as follows:

Inclusion criteria

  • Works that include and present an explicit manifestation of quality issues in a model-driven setting. Examples of such manifestations are model transformation tool problems, misalignment of model-driven principles with specific business concerns, and skepticism about the real applicability and sufficiency of model-driven approaches, among others.

  • Reports that include an approach to identify model-driven issues in real applications (e.g., interviews with people that perform roles within an IS project, questionnaires, or description about real experiences).

  • Works that relate to (and/or perform) a literature review on the applicability of model-driven approaches in real scenarios.

  • For non-academic works (web portals), we checked the impact and quality of the posted information. This was done by reviewing the forum messages, the academic references used, and the level of the community that supports those portals in terms of technological reports, conference-related mentions, and participants’ profiles.

  • For non-academic works (web portals), we checked the link between authors and participants and well-known companies that report the application of model-driven approaches (e.g., MetaCase, Mendix, Integranova, etc.) or academic/industrial conferences related to model-driven and IS topics (e.g., the CodeGeneration conference, RCIS, CAiSE, MODELS, etc.).

Exclusion criteria

  • Works that report application cases of model-driven-compliant approaches or initiatives (notations, applications in a specific domain, guidelines, etc.) but whose main focus is the promotion of those specific approaches, without considering the collateral effects of their application.

Each included work was analyzed in order to find quality evidence (i.e., explicit sentences) about the adoption of the reported model-driven approach. Because of the kinds of works detected and the level of formality of their sources, it was necessary to access the full content of each work in order to determine the relevance of each contribution with regard to the expectations formulated in our research questions. Despite the common terms used in the search strings, we only accepted works with explicit reports of MDE applicability.

More information about reported quality issues can be found in the technical report available in Giraldo et al. (2016). This report presents all the works with their associated statements that support the detected quality issues. During the review of these issues, we found that quality evidence could be categorized as follows:

Industrial issues (RQ1)

  • Industrial issue 01: Implicit questions derived from the MDE adoption itself.

  • Industrial issue 02: Organizational support for the MDE adoption.

  • Industrial issue 03: MDA not enough.

  • Industrial issue 04: Tools as a way to increase complexity.

Academic/research issues (RQ2)

  • A/R issue 01: UML as the main language for applying metrics over models and defect prevention strategies.

  • A/R issue 02: Hard operationalization of model-quality frameworks.

  • A/R issue 03: Software quality principles extrapolated to modeling levels.

  • A/R issue 04: Specificity in the scenarios for quality in models.

Sections 3.2 and 3.3 describe in depth the above categories related to RQ1 and RQ2, respectively. Section 3.4 presents the results of the mismatch analysis related to RQ3 and RQ4.

3.2 Detected categories for industrial quality issues

In response to RQ1, we present below the four categories that we defined for grouping the sentences on industrial quality issues. In Giraldo et al. (2016), 240 quality sentences from industrial sources are reported. These affect the perception of model-driven initiatives and, therefore, their quality. Each category groups sentences from several sources that share a common quality issue. These categories were used to facilitate the analysis of the industry-academy mismatch.

The MDA is not enough category groups the sentences that report the inability of the MDA specification to resolve questions in the use and application of models and modeling languages (see Section 2.1). The Implicit questions derived from the MDE adoption itself category groups sentences in which open questions remain unresolved when a model-driven initiative (with its associated set of languages, models, transformations, and tools) is applied in a specific context.

The Tools as a way to increase complexity category groups the sentences that report explicit problems in the use and application of model-driven tools (e.g., tools based on the Eclipse EMF-GMF frameworks and associated projects). Tools are the main mechanism for creating and managing models by the application of modeling languages. Finally, the Organizational support for the MDE adoption category groups the sentences that report issues in the organizational adoption of model-driven initiatives.

In the following, we describe each category in more detail:

3.2.1 MDA is not enough

As a reference architecture, MDA provides the foundation for the usage and transformation of models in order to generate software using three predefined abstraction levels. A definition of quality in models that is supported only by alignment with MDA would not be enough: compliance with the guidelines of this architecture is the minimum criterion expected for the management of models, and it must be implicitly supported by current tools and model-driven standards.

A real consequence of this MDA insufficiency is presented in Hutchinson et al. (2014). The authors show the lack of consensus about the best language and tool as a pending issue that is not covered by the MDA specification. This issue affects real scenarios where a combination of languages is used to support specific industrial tasks. The model-driven community has recognized the lack of structural updates of the MDA specification in the last decade, which produces imprecise semantic definitions over models and transformations (Cabot). The MDA revision guide 2.0 (OMG 2014), released in June 2014, preserves these issues.

3.2.2 Implicit questions derived from the MDE adoption itself

This covers concerns about the suitability of languages and tools (Hutchinson et al. 2014; Staron 2006), new development processes derived from MDE adoption (Hutchinson et al. 2014), MDE deployment (Hutchinson et al. 2011a), the scope of the MDE application (Aranda et al. 2012; Whittle et al. 2014), and implicit questions about how and when an MDE approach is applied, e.g., when and where to apply MDE? (Burden et al. 2014), and which MDE features mesh most easily with features of organizational change? Which create most problems? (Hutchinson et al. 2011a). The correct usage of the modeling foundation in current modeling approaches is also questioned (Whittle et al. 2014).

3.2.3 Tools as a way to increase complexity

The absence of support for MDE tools and the lack of trained people mean that great effort must be made to adapt tools to the context of the organization, with probably less than optimal results (Burden et al. 2014). This issue leads to problems with the following: customization, tailoring, and interoperability among modeling tools (Burden et al. 2014; Mohagheghi et al. 2013b); management of traceability with several tools (Mohagheghi et al. 2013b); the high level of expertise and effort required to develop an MDE tool (Burden et al. 2014; Mohagheghi et al. 2013b); tool integration (Baker et al. 2005; Burden et al. 2014; Mohagheghi and Dehlen 2008b; Mohagheghi et al. 2013a); the dissatisfaction of MDE practitioners with the available tools (Tomassetti et al. 2012); the lack of technological maturity of the tools (Mohagheghi et al. 2013a); the scaling of the tools to large system development (Mohagheghi and Dehlen 2008b); poor user experience (Mohagheghi et al. 2009b); too many dependencies for adopting MDE tools (Whittle et al. 2013); and poor performance (Baker et al. 2005).

3.2.4 Organizational support for the adoption of MDE

This category represents issues that are related to commitments, costs (especially training costs) (Hutchinson et al. 2014), resistance to change (Aranda et al. 2012), the alignment and adaptation of MDE with how people and organizations work (Burden et al. 2014; Whittle et al. 2014), and organizational decisions based on diverging expert opinions (Hutchinson et al. 2011b).

The main concern of these works is the misalignment between model-driven principles and organizational elements. Most of the works on model-driven compliance are related to technical adoption, such as modeling tools, model-transformation consistency, and the incorporation of models into software development scenarios. However, due to the lack of an explicit model-driven process, organizational issues may not be completely manageable by final model users within a model-driven approach.

3.3 Detected categories for academic/research quality issues

In response to RQ2, we propose another four categories in order to group the focus of the works on quality evaluation in the academic/research field. Seventy-one issues from this field were reported in Giraldo et al. (2016). The categories reflect the intention of researchers in the model-driven field when managing quality issues. They are as follows:

3.3.1 Hard operationalization of model-quality frameworks

High abstraction and specific model issues hamper the operationalization of model quality frameworks (i.e., the instrumentation of a framework by a software tool). Therefore, quality rules or procedures may not be fully implemented by operational mechanisms such as XSD schemas, EMF Query support, etc. Störrle and Fish (2013) present an attempt to operationalize the Physics of Notations evaluation framework (Moody 2009); however, this operationalization (and any similar proposal) could be ambiguous as a consequence of the lack of precision and detail in the framework itself.
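To illustrate what operationalizing a quality rule means in practice, the following minimal sketch implements one hypothetical rule ("every model element must have a non-empty, unique name") as an executable check over a toy model; the rule and the model structure are our own illustrations, not part of the cited frameworks, and a real setting would run such checks over, e.g., an EMF resource.

# Hypothetical quality rule made operational: every element of a model must
# have a non-empty name, and names must be unique.
def check_naming_rule(elements: list) -> list:
    """Return human-readable violations of the (illustrative) naming rule."""
    violations, seen = [], set()
    for i, el in enumerate(elements):
        name = el.get("name", "").strip()
        if not name:
            violations.append(f"element #{i} has no name")
        elif name in seen:
            violations.append(f"duplicate element name: {name!r}")
        seen.add(name)
    return violations

model = [{"name": "Customer"}, {"name": ""}, {"name": "Customer"}]
print(check_naming_rule(model))
# ["element #1 has no name", "duplicate element name: 'Customer'"]

The point of the sketch is the gap it exposes: rules of this syntactic kind are easy to mechanize, whereas most prescriptions of frameworks such as PoN are not stated precisely enough to admit such a direct translation.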

An example of a model quality assurance tool is reported in Arendt and Taentzer (2013), where an operational process for assessing quality through static model analysis is presented. Instead of operationalizing a model quality framework directly, a framework like 6C (Mohagheghi et al. 2009a) is used as a conceptual basis for deriving a quality assurance tool.

The lack of full operationalizations of model quality evaluation frameworks shows that model evaluation is still more an art than a science (Nelson et al. 2005), and that current specifications for evaluating the quality of models and modeling languages remain complex procedures for language designers and final model users.

3.3.2 Defects and metrics mainly in UML

Most of the quality proposals for models focus their effort on the applicability of metrics to UML models and on the definition of guidelines to detect and avoid defects in UML diagrams. This trend is a direct consequence of framing the model-driven paradigm in UML terms.

These limitations stem from the specific model-driven vision of OMG, which promotes the model-driven approach through UML, a set of modeling notations that covers multiple aspects of business and systems modeling. MDA also promotes extending UML via profiles, tailoring the core UML capabilities in a unified tooling environment (OMG 2003, 2014).

However, this vision contrasts with the low incidence of UML as the main artifact in software and IS development processes. Clear and recent evidence is reported in Petre (2013), where the main trend regarding the use of UML among a group of software experts was no usage at all; the second most representative trend was that UML models were useful artifacts for specific and personal tasks but were discarded once explanatory tasks were completed. Very few participant experts mentioned UML in code-generation tasks. This vision also contrasts with recent evidence of UML support being removed from recognized development environments due to its lack of use, as reported in Krill (2016).

Ambiguity in UML persists due to the specific meanings and interpretations that model practitioners apply to it. This ambiguity directly affects the full adoption of UML as a standard by the software and information systems development communities. Moreover, there is no link between the quality issues reported for UML and the standardization effort on UML by OMG. The complexity of the UML formal specifications adds to the confusion of model-driven practitioners.

3.3.3 Specificity in the scenarios for quality in models

The most relevant works on this issue define the quality of models from a specific focus. The quality frameworks formulated in Krogstie (2012a) and Lindland et al. (1994) have a semiotic foundation due to the use of signs in the process of domain representation. Other works, such as Mohagheghi and Dehlen (2008a) and Mohagheghi et al. (2009a), propose desirable features (goals) for models. Some proposals are specific to the scope of the research performed (e.g., Domínguez-Mayo et al. 2010).

Some of the classical procedures for verifying the quality of conceptual models are related to the cognitive effectiveness of notations (generally UML models). In this way, quality motivations are limited to an evaluation (and possibly intervention) process on a notation.

3.3.4 Software quality principles extrapolated at modeling levels

Within the MDE literature, there are proposals that extrapolate specific approaches for evaluating software quality to model levels, supported by the fact that MDE originates in software engineering. Some of the reported software quality approaches include the usage of metrics, defect detection in models, the application of software quality hierarchies (in terms of characteristics, sub-characteristics, and quality attributes), and best practices for implementing high-quality models and model transformations. There is even a research area oriented to the evaluation of the usability of modeling languages (Schalles 2013), where the usability of diagrams is prioritized as the main quality attribute of models.

The main motivation for this extrapolation is the relative maturity of software quality initiatives. In Moody (2005), the author suggests formulating quality frameworks for conceptual models based on the explicit adoption of the ISO 9126 standard, because of its wide usage in real scenarios and the fact that this standard makes the properties of a product or service recognizable. In Kahraman and Bilgen (2013), the authors present a set of artifacts formulated to support the evaluation of domain-specific languages (DSLs). These instruments are derived from an integration of the CMMI model, the ISO 25010 standard, and the DESMET approach. The success of a DSL is defined as a combination of related characteristics that must be collectively possessed (by combining practices from CMMI and the ISO 25010 hierarchy). Proposals of this kind assume that there is a relation among organizational process improvement efforts, their maturity levels, and the quality of DSLs.

Software quality involves a strategy for the production of software that ensures user satisfaction, absence of defects, compliance with budget and time constraints, and the application of standards and best practices for software development. Software quality is a ubiquitous concern in software engineering (Abran et al. 2013); in the MDE context, however, additional effort is required to adapt these principles to the model-driven approach.

3.4 Findings in the literature review of mismatch

For this literature review, journal papers were the main source of quality issues for both contexts (industrial and research), as shown in Fig. 4. However, for the industrial context, specialized websites (gray literature Footnote 9) contribute significantly to the picture of quality from the practitioners' perspective. We found 49 industrial works and 24 academic/research works; the analysis was made on a total of 73 works.

Fig. 4 Percentage distribution of identified works by type

To answer RQ3, Table 7 presents the identified works classified into the categories described in Section 3.1. The mismatches found show that model-driven practitioners perceive the quality of models and modeling languages in different ways; this perception greatly depends on the application context in which modeling approaches are used.

Table 7 The works found in the review of the mismatch between the research field of modelling language quality evaluation and the actual MDE practice in industry

Figure 5 shows the percentage of quality issues detected in the industrial works analyzed. From a software engineering perspective, one would initially assume a high degree of impact from model-driven tools and their consequences on development and organizational environments. However, for the industrial works analyzed, we detected the implicit questions derived from the MDE adoption itself issue as the primary quality concern regarding the applicability of models and modeling languages. This issue derives from the great ambiguity about when something is in MDE (or when something is MDE compliant) and from the open questions generated in the application of models.

Fig. 5 Percentage of industrial quality issues detected

Clearly, industrial publications show a marked trend of discussing the deficiencies, consequences, and support of the modeling act itself before the use of specific modeling tools. In addition, quality issues related to the tools are evident in the detected works. Beyond the consequences of applying model-driven initiatives, tools become a key artifact for perceiving, measuring, and managing quality issues in modeling languages, taking into account concerns at the organizational, interactional, and technical levels.

The results in Fig. 6 highlight the presence of academic and research works that address industrial issues such as implicit questions derived from the MDE adoption itself and tools as a way to increase complexity. Some statements from academic and research sources show an alignment with industrial issues. However, in Fig. 6, the percentage of works that address industrial issues is lower than the sum of the percentages of works that promote specific interests of researchers in this field. This suggests that model-driven researchers tend to focus on theoretical work, treating these industrial issues as neither interesting nor relevant. This lack of research support widens the conceptual and methodological gaps in the real application of model-driven initiatives and promotes confusion in the model-driven paradigm.

Fig. 6 Percentage of academic/research quality issues detected

An example of this theoretical emphasis of researchers is the relative proximity of the industrial issue implicit questions derived from the MDE adoption itself and the academic/research issue defects and metrics mainly in UML. Many efforts target the quality management of models by intervening in modeling practices in UML as the de facto language for software analysis/design. There is clearly a gap between these quality trends and the reports about the real usage and applicability of UML, such as the study reported in Petre (2013).

Academic/research works also acknowledge the inherent complexity of deriving concrete tools from theoretical quality frameworks for models and languages, due to the high level of abstraction involved in them. In contrast, industrial works do not report specific quality issues related to the academic/research categories. Therefore, answering RQ4, the above evidence demonstrates a very significant difference between the perceptions of and efforts regarding quality in modeling languages and models in industrial versus academic/research scenarios. This gap between the industrial and academic communities calls for a method that resolves the problems in industry that are not covered by current methods.

In both the academic/research and industrial contexts, the subjectivity and the particularities of the application scenarios play an important role in the derivation of quality issues in model-driven initiatives. Figure 7 shows the main intention of the analyzed works, depending on whether the work was written for academic/research purposes or for industrial purposes. These intentions refer to personal opinions, studies, or approaches. The main sources for the industrial context are opinions and interactions on websites reported in the gray literature. This is valuable considering that these resources show real experiences of attempts to use model-driven initiatives in real software projects.

Fig. 7 Summary of the intentions found in the analyzed works

In the academic and research field, there is a strong trend (41.67% of reported works) toward specific model-driven initiatives promoted by practitioners. Among these initiatives are DSLs, model-driven approaches, operations on models (e.g., searching over models, establishing the level of detail of models), and specific considerations for model transformations (e.g., BPMN models to Petri nets). Although several modeling language quality issues were extracted from formal studies performed by researchers, it is important to note how quality issues also serve as excuses (or pivots) for promoting specific model-driven initiatives.

In summary, the current academic/research methods have not solved the quality issues for MDE reported in industry (Section 3.2). It seems that researchers have not yet addressed these problems satisfactorily.Footnote 10 Therefore, we consider it necessary to list the open challenges and to define (in greater depth) the research roadmap proposed in Giraldo et al. (2015) in order to cover these issues comprehensively. Thus, in Section 4, we show a real scenario in which quality issues associated with Sections 3.2.2 and 3.2.4 are depicted. Afterwards, in Section 5, we present a set of challenges inferred from the evidence gathered in both of the above literature reviews.

4 The sufficiency of current quality evaluation proposals

In this section, we present a scenario involving multiple modeling languages in combination. The case presented in this section is a finished project previously developed by the authors: the implementation of an information system for institutional academic quality management. In this IS project, quality issues were empirically demonstrated. Quality evaluation methods were not used during the execution of this model-driven project. The full specification of the case is presented in Appendix A.

The objective of this scenario is to demonstrate that the application of an existing quality method does not reveal all of the modeling quality issues of the project, despite the execution of the analysis as a post-mortem task. For this empirical study, we chose the Physics of Notations (PoN) (Moody 2009), the most widely cited modeling language quality evaluation framework available in the literature. We show that, despite having many useful features, this framework is insufficient to cover all the needs that arise when evaluating the quality of (sets of) modeling languages in MDE projects. The identification of these uncovered needs serves as additional input for the definition of a research roadmap in Section 5.

A post-mortem analysis was performed to evaluate the quality of the set of modeling languages employed in the project (flowchart, UML, E/R, and architecture languages). Appendix A.1 presents the models that were obtained in the project. Each of the PoN principles was applied to these models to determine whether or not the models meet the principles. Appendix A.2 presents the results of the quality assessment with the PoN framework.

Table 8 summarizes the quality issues detected in the proposed scenario. Although the application of the PoN framework allows quality issues in the modeling scenario to be detected, other critical quality issues were not detected by this method. PoN meets its goal of analyzing the concrete syntax of the modeling languages under evaluation. However, other quality issues arise from factors such as multiple modeling languages, different abstraction levels, several stakeholders, and viewpoints.

Table 8 Quality issues detected for the multiple modelling language scenario

One single quality framework may be insufficient to integrally address all quality issues in MDE projects. Even though there are guidelines supporting the application of individual quality methods that avoid subjective criteria influencing the final results of a PoN analysis (e.g., da Silva Teixeira et al. 2016), there are no systematic guidelines for using quality methods for MDE in combination.

5 Open challenges in the evaluation of the quality of modeling languages in MDE contexts

Sections 2, 3, and 4 presented the problems and questions that remain regarding the evaluation of quality issues in the MDE field. Current phenomena in model-driven applicability and use, and the associated quality issues, create several challenges that impact the adoption of the model-driven paradigm. It is not enough to evaluate quality from a prescriptive perspective, as proposed by most of the quality categories identified in Section 2.4. Any method for evaluating the quality of models and modeling languages must incorporate the realities of MDE itself.

These realities are not unfamiliar to the model-driven community. The following definitions were taken from recognized sources that provide definitions about models. A quick overview of these classical model definitions reveals the presence of the subject as a fundamental element of the model itself. This holds for the unifying axiom of the model as a concept for understanding a subject or phenomenon, in the form of a description, specification, or theory:

  • OMG MDA guide 1.0 (OMG 2003): A model of a system is a description or specification of that system and its environment for a certain purpose. A model is often presented as a combination of drawings and text. The text may be in a modeling language or in a natural language. Model is also a formal specification of the function, structure, and/or behavior of an application or system.

  • OMG MDA guide 2.0 (OMG 2014): A model is information that selectively represents an aspect of a system based on a specific set of concerns. A model should include the set of information about a system that is within.

  • ISO 42010:2011 (612, 2011): A model can be anything: a model can be a concept (a mental model), or a model can be a work product. Every model has a subject, so the model must answer questions about this subject.

  • Stanford Encyclopedia of Philosophy (SEP) (Hodges 2013): A model is a construction of a formal theory that describes and explains a phenomenon. You model a system or structure that you plan to build by writing a description of it.

A conceptual foundation for the model-driven approach was established in the information systems community before the formulation of MDA itself, taking into account the main challenges (see Section 2.1). ISO 42010 established the importance of the viewpoint, view, model kind, and architectural description concepts, and its term correspondence applies to the specification of model transformations. FRISCO, specifically, addresses the suitability and communicational aspects of modeling languages and the need for their harmonization. The communicative factor is commonly reported as a key consequence of model usage (Hutchinson et al. 2011b).

In addition, the subject of modeling carries quality issues, as presented in Section 3. The subjective usage of model representations, the freedom to formulate model-driven compliance initiatives, and the wide applicability of models to any IS-supported domain require an underlying support for analyzing models and all the artifacts that modeling languages provide to model IS phenomena. This rationale must consider the key premises on which the model-driven context was promoted. These premises become the main input for any model analytics process, in a way that complements previous model quality evaluation frameworks.

The research roadmap of France and Rumpe (2007) was (and continues to be) widely accepted by model-driven practitioners due to its explicit skepticism about the MDE vision and its related problems (including quality evaluation issues). Other roadmaps, such as those presented in Kolovos et al. (2013), Mohagheghi et al. (2009b), Rios et al. (2006), and Vallecillo (2010), address specific concerns about MDE applicability with informal considerations about its adoption in real scenarios, and lack any relation to IS foundations. The quality issues that we described in Section 3 show a gap between the real application of MDE and its foundational principles.

Because of the divergence of quality definitions in MDE, the lack of support from the academic/research field for practitioners of model-driven initiatives, and the diverse interpretations by the different research communities, we have deduced a set of challenges that any quality evaluation method for MDE should consider in order to assess quality from an MDE viewpoint (i.e., taking into account the main realities that govern this paradigm). We consider that a sound rationale for quality evaluation in model-driven initiatives must address the following critical challenges in the MDE paradigm itself:

5.1 Using multiple modeling languages in combination

This reality is inherent to IS development, where multiple views must be used to manage the concerns derived from stakeholders. Each view could have its associated language and, likewise, one language could support several views of the information system. In this case, if \( L = \{ l_{1}, l_{2}, \dots, l_{n} \} \) is the set of all the modeling languages used to support the views (and viewpoints) in an IS project and Q is the assessment of quality for MDE, then \( Q(L) \ne Q(l_{1}) \cup Q(l_{2}) \cup \dots \cup Q(l_{n}) \).
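One way to make the inequality concrete (an illustrative decomposition of our own, not taken from the cited works) is to write the quality of the language set as the union of the per-language assessments plus a non-empty cross-language term:

\[ Q(L) \;=\; \bigcup_{i=1}^{n} Q(l_{i}) \;\cup\; Q_{\times}(L), \qquad Q_{\times}(L) \ne \emptyset, \]

where \( Q_{\times}(L) \) stands for concerns that only exist between languages: overlap of constructs, consistency between the views they support, and traceability across their models. Evaluating each language in isolation leaves this term unexamined.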

Several questions are derived from this IS feature: the suitability of the languages used to model and manage a specific view, the coverage level of the modeling proposals, the relevance and pertinence with regard to the specific intention of modeling, and the degree of utility of a modeling language in light of the stakeholder concerns under consideration.

Even though the evaluation of these features depends heavily on subjective criteria, considering them is mandatory in order to support modeling and integration approaches on views within a model-driven project (with their respective implications). Subjectivity is intrinsic to the model-driven paradigm; although an absolute truth in model-driven contexts is not attainable, considering subjectivity facilitates the consolidation of model management strategies in model-driven environments. These quality questions are essential for information systems.

The treatment of multi-language modeling is not a new topic in the MDE community. It has been considered in previous MDE challenges, as reported in Van Der Straeten et al. (2009). However, the percentage of works that propose a method to manage the multi-modeling phenomenon is very low (Giraldo et al. 2014), and these works do not provide a computerized (operational) tool for model-driven practitioners.

The multiplicity inherent in models and information systems (and its derived quality implications) naturally leads to analyzing the capabilities provided by modeling languages to represent an IS phenomenon adequately and to integrate with other proposals that cover other IS concerns. Current information systems foundations provide the inference tools required to contrast the capabilities of modeling languages to support this multiplicity.

As shown in Section 2.3, the percentage of identified works that consider quality evaluation methods for multiple languages is low (4.02%). This shows the minor impact of model quality proposals on the management of complex information system developments, which contain multiple views and viewpoints supported by conceptual models.

The works that consider evaluation over a set of modeling languages (Krogstie 2012c, d; Mohagheghi et al. 2009a) present two theoretical evaluation frameworks whose operationalizations are not clear (i.e., any evaluation procedure could be too abstract for the MDE community, especially for people from software development contexts). Nevertheless, these works are a very important advance in the foundation of a body of knowledge for quality in MDE. The evaluation of multiple modeling languages remains an open issue. Evidence can be found in various reports on the application of quality works: generally, these reports present the evaluation of a single modeling language, and the evaluation of multiple languages is only inferred empirically.

5.2 Assessing the compliance of modeling languages with MDE principles

There is a general consensus on the MDE concept as the promotion of models as primary artifacts for software engineering activities (Di Ruscio et al. 2013; González and Cabot 2014) and the presence of model transformations that refine between abstract and concrete modeling levels. However, due to the generality of this consensus, an initiative may claim to be model-driven without strictly fulfilling the minimum aspects necessary for real applicability with technological support (e.g., notations without an associated abstract syntax, stereotyped elements of common modeling languages, or modeling proposals with specific intentions and poor adoption by model-driven practitioners).

Despite the specification of the most relevant features of models and modeling languages, there is a lack of specification about when something is in MDE (Section 3.2.2); it must be established whether model-based proposals are aligned with the MDE paradigm beyond the mere presence of notational or textual elements. There is no quality proposal aligned with MDE itself (i.e., a quality approach that defines a validation procedure to determine whether or not a model-driven initiative meets the MDE core features). Although one could intuitively consider that it all boils down to the extent to which a specific model-driven method meets the core MDE features, the literature on quality has not explicitly covered in detail what it means to be aligned with MDE and whether the quality of this alignment can be measured.

It is arguable that the existence of methods claiming to be model-driven that do not actually fulfill the MDE paradigm influences the stakeholders' perception of the paradigm itself. For instance, an allegedly model-driven method might not fulfill expectations, and these negative experiences might end up being generalized to the paradigm as a whole. This can be a factor that hinders the adoption of MDE approaches and contributes to open issues such as the ones covered in Section 3.2.

The definition of when something is in MDE or when something is MDE compliant must take into account critical concerns beyond the simple usage of models or textual and graphical representations. These include the alignment of the model with a modeling purpose (in a way similar to the multidimensional views in IS development), the explicit association with an abstraction level (a principle introduced by MDA), the conceptual support of modeling languages through metamodels, and the capabilities provided by the modeling artifact to integrate with other modeling initiatives and to support model transformations, mappings, and software generation.

In this way, quality (Q) can be defined as the pair Q = (L, E), where \( L = \{ l_{1}, l_{2}, \dots, l_{n} \} \) is a set of one or more modeling languages in an MDE project and E represents an MDE environment (i.e., the set of concerns in an MDE project, such as those described above). Therefore, asserting Q implies that \( \forall l \in L \), l satisfactorily meets (or addresses) E.
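A minimal sketch of how such a compliance-oriented check could be operationalized follows. The environment concerns E are reduced here to four hypothetical flags drawn from the list above (purpose alignment, abstraction level, metamodel support, transformation support); the names and structure are our own illustration, not a definitive implementation.

from dataclasses import dataclass
from typing import Optional

@dataclass
class ModelingLanguage:
    # Illustrative record of the MDE-relevant features of a language.
    name: str
    has_metamodel: bool                 # conceptual support through a metamodel
    abstraction_level: Optional[str]    # e.g., "CIM", "PIM", or "PSM"
    modeling_purpose: Optional[str]     # explicit intention of modeling
    supports_transformations: bool      # can act as source/target of transformations

def meets_environment(lang: ModelingLanguage, required_levels: set) -> bool:
    """Check whether a single language addresses the environment concerns E."""
    return (lang.has_metamodel
            and lang.modeling_purpose is not None
            and lang.abstraction_level in required_levels
            and lang.supports_transformations)

def quality_q(languages: list, required_levels: set) -> bool:
    """Q = (L, E): Q holds only if every l in L satisfies E."""
    return all(meets_environment(l, required_levels) for l in languages)

# Usage: a profiled language with a metamodel passes; an ad hoc notation fails E.
L = [
    ModelingLanguage("UML-profile", True, "PIM", "system design", True),
    ModelingLanguage("ad-hoc-notation", False, None, None, False),
]
print(quality_q(L, {"CIM", "PIM", "PSM"}))  # False: the ad hoc notation fails E

The universal quantification over L is the essential point: a single non-compliant language in the set is enough to negate Q for the whole project.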

5.3 Explicitly using abstraction levels as quality filters of modeling languages

This challenge is a consequence of the MDA specification, where three abstraction levels (computation-independent model (CIM), platform-independent model (PIM), and platform-specific model (PSM)) were explicitly proposed in order to clarify and define the usage and scope of models with regard to their intention and closeness to business, system, or technical levels.

Abstraction levels act as the reference element for evaluating the convenience of modeling proposals. The harmonization of modeling initiatives within model-driven projects should be supported by the information provided by the abstraction levels. Other quality features, such as suitability, coverage, communication, integration capacities, and mapping support, can be analyzed (and possibly predicted) given the explicit presence of abstraction levels. Abstraction levels should not be defined ambiguously. Theoretical frameworks such as FRISCO provide definitions of computerized information systems and of abstraction level zero through the presence of processors. In this way, the lowest abstraction level is framed around the technological boundaries where information is processed.

Abstraction levels are a critical approach for understanding information systems and for defining the alignment of model-driven initiatives with business, system, or technical scenarios within an IS architecture (in accordance with the MDA specification). Abstraction levels make the use of modeling techniques explicit, so that a posterior inference process can determine the suitability of the modeling proposal.
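As a rough illustration of abstraction levels acting as quality filters, the following sketch admits a language into a project view only when its declared MDA level matches the level the view targets; the tagging scheme and the candidate set are hypothetical examples of our own.

# Hypothetical filter: languages tagged with a declared MDA abstraction level.
MDA_LEVELS = ("CIM", "PIM", "PSM")

def filter_by_level(candidates: dict, target_level: str) -> list:
    """Return the candidate languages declared at the target MDA level."""
    if target_level not in MDA_LEVELS:
        raise ValueError(f"unknown abstraction level: {target_level}")
    return [name for name, level in candidates.items() if level == target_level]

# e.g., questioning the use of UML at the business (CIM) level, as discussed below:
candidates = {"BPMN": "CIM", "UML": "PIM", "ER": "PIM", "SQL-DDL": "PSM"}
print(filter_by_level(candidates, "CIM"))  # ['BPMN']

Such a filter is only the mechanical part; the substantive work lies in justifying each language's level assignment against the rules of that level.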

The abstraction level challenge includes a discussion about the convenience of the model-driven architecture and its support through the instanceOf relation. This relation occurs between layers, not inside them; no other relations are permissible. This is a constraint that is artificially imposed without any philosophical or ontological arguments.

The lack of a common consensus about when something is model-driven compliant (challenge 5.2) favors the emergence of self-denominated model-driven initiatives whose only formal grounding is a notational proposal supported by a problem context that justifies their formulation. The explicit presence of abstraction levels within a model quality evaluation procedure allows the convenience of any model-driven compliance initiative to be assessed against the rules and prescriptions of each level. For example, decisions about the practical implications of using UML at business levels could be addressed and contrasted against the implications of the model semantics and the scope of the business level.

5.4 Agreeing on a set of generic quality metrics for modelling languages

The applicability of metrics and measurement processes to models has been used to rate specific elements associated with model-driven projects. This includes the presence of defects (Marín et al. 2013), the size of diagrams (commonly UML diagrams) (Lange 2007a), model transformations (van Amstel et al. 2009), metamodels (Monperrus et al. 2008; Pallec and Dupuy-Chessa 2013), metamodels with controlled experiments (Yue et al. 2010), etc. Since these applications are very specific, works of this kind can only be starting points for defining the operationalization of specific quality efforts.

Reports about metrics on models show the intention of applying metric approaches derived from software quality works. However, the quality features presented above (Section 2.4) do not come with associated metrics. The usage and applicability of metrics is highly subjective. Consequently, it is not important to discern which specific field of the model-driven paradigm is the most appropriate one in which to identify and implement metrics (e.g., metrics on notations, metrics for the use of models, metrics on metamodels, metrics for a specific modeling language).

Most of the identified works define metrics for models; we advocate metrics for modeling languages. Some works also define metrics for subsets of languages, e.g., metrics specified by metamodeling (López-Fernández et al. 2014). However, the scope of these metrics is limited. Therefore, we consider metrics that can be applied to any modeling language or set of modeling languages.

The most important contribution of such metrics would be to consolidate the essential aspects of model management in order to establish a usable set of core modeling features. This is a challenge given the large size of the model-driven paradigm compared to traditional software development projects. From a pure MDE viewpoint, a set of metrics is required to measure the features and issues derived from the information systems modeling process itself.

Thus, approaches such as goal-question-metric (GQM) or other metric-related techniques can be useful for deriving metrics from the goals associated with the modeling act itself (independently of the degree of subjectivity involved). Modeling goals should be aligned with information systems architectural principles rather than with specific individual considerations derived from the application of particular model-driven approaches.
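A minimal GQM-style sketch of how a language-level metric could be derived from a modeling goal follows; the goal, question, and metric below are hypothetical examples of our own, not metrics proposed by the cited works.

# Hypothetical GQM derivation for a modeling-language-level metric.
# Goal:     evaluate how well a language set covers the stakeholder viewpoints.
# Question: what fraction of the project's viewpoints has at least one
#           supporting modeling language?
# Metric:   viewpoint coverage ratio in [0, 1].

def viewpoint_coverage(viewpoints: set, support: dict) -> float:
    """support maps each language to the viewpoints it claims to support."""
    covered = set().union(*support.values()) if support else set()
    return len(viewpoints & covered) / len(viewpoints)

viewpoints = {"business process", "data", "architecture", "user interface"}
support = {"BPMN": {"business process"}, "ER": {"data"}, "UML": {"architecture"}}
print(viewpoint_coverage(viewpoints, support))  # 0.75: the UI viewpoint is uncovered

Note that the metric is defined over the language set and the project's viewpoints, not over any individual model, which is what distinguishes it from the model-level metrics that dominate the literature.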

5.5 Including model transformations in the modeling language quality equation

Model transformations are critical in model-driven contexts. Modeling languages are often the source or target of model transformations, so it is critical to ensure that the languages are appropriate for this purpose.

Transformations constitute the full manifestation of the power of conceptual models in terms of managing the complexity associated with multiple views and deriving artifacts from the same subject under study. Works such as Van Amstel (2010) present new quality features for transformations, derived from the transformation process itself (i.e., the rationale of the transformation and its consequences beyond the mere usage of model transformation languages).

Some current works propose methods for evaluating the quality of transformations between languages. We think it is important to consider the opposite direction: given the goal of defining a transformation from one modeling perspective to another, either horizontally (within an abstraction level) or vertically (across abstraction levels), we need methods to evaluate whether or not the choice of source/target modeling languages is appropriate. This idea is also considered in Da Silva (2015), where the author claims that models must be defined in a consistent and rigorous way; a certain level of quality is therefore required so that models can be properly used in transformation scenarios.

Existing works assume a pre-selection of the languages, so that only the appropriateness of the transformation is evaluated; there are no mechanisms for reasoning about whether or not the languages themselves are appropriate. Figure 8 presents an appropriate order for transformations: reasoning about the languages as the first step, then the design of the transformation, and, finally, the quality evaluation.

Fig. 8 Proposed order for model transformations
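A schematic sketch of the ordering proposed in Fig. 8 follows; the level assignments and the checks are hypothetical placeholders of our own for the reasoning the figure calls for.

# Toy pipeline mirroring the order of Fig. 8: reason about the languages
# first, then design the transformation, then evaluate its quality.
LEVELS = {"BPMN": "CIM", "UML": "PIM", "Java": "PSM"}  # illustrative tags

def languages_appropriate(source: str, target: str) -> bool:
    """Step 1: e.g., require both languages to have a declared MDA level."""
    return source in LEVELS and target in LEVELS

def design_transformation(source: str, target: str) -> dict:
    """Step 2: record the design, tagging it as horizontal (same level)
    or vertical (across abstraction levels)."""
    kind = "horizontal" if LEVELS[source] == LEVELS[target] else "vertical"
    return {"source": source, "target": target, "kind": kind}

def evaluate_quality(t: dict) -> bool:
    """Step 3: placeholder quality gate for the designed transformation."""
    return t["kind"] in ("horizontal", "vertical")

def transformation_process(source: str, target: str) -> dict:
    if not languages_appropriate(source, target):   # language reasoning comes first
        raise ValueError("inappropriate source/target language choice")
    t = design_transformation(source, target)
    if not evaluate_quality(t):
        raise ValueError("transformation fails the quality evaluation")
    return t

print(transformation_process("UML", "Java"))  # a vertical (PIM -> PSM) case

The design choice being illustrated is simply the gate order: the language-appropriateness check runs before any transformation is designed, which is what current proposals skip.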

The inherent complexity of transformations must be tamed by a process in which the main features of the transformation can be identified and managed. Model transformation languages alone cannot provide full support for phenomena derived from issues such as the following: transformations between modeling languages at the same abstraction level; the influence of traceability on the transformation; the implications of information carried in traceability models (Galvão and Goknil 2007); the addition of information in mapping models; and the differences between mapping and transformation models.

Orientations about model transformations, as presented in Mens and Gorp (2006), consider mappings and transformations to be part of a managed process in which activities such as analysis, design, implementation, verification, and deployment can be performed. Both alternatives (mapping and transformation) must be considered in accordance with the MDA principles (the basis for the general consensus around the model-driven initiative).

Decisions about transformations should not be delegated exclusively to the model transformation language employed; the language is only one artifact of the model transformation process itself. In addition, semantic rules in models (expressed, for example, in the Object Constraint Language, OCL) require supporting information about how they are to be translated. Determining whether the conversion under analysis is a mapping model or a transformation model must be the initial activity and orientation of the process itself.

5.6 Acknowledging the increasing dynamics of models

Taking advantage of the context of use tied to the semiotic dimension of pragmatics, the authors in Barišić et al. (2011) propose evaluating the productivity of domain experts through experimental validation of the introduction of DSLs. This is a key issue because it considers the quality-in-use characteristic for DSLs, so quality in model-driven contexts transcends the internal quality presented in Moody (2005). Using usability evaluation, the authors provide some traces of the cognitive activities in the context of languages based on user and task scenarios.

Unfortunately, experiments of this kind only consider language elements other than the representation, even though the representation is the natural consequence of, and interface between, the syntax, the semantics, and the users. A representation must reflect the semantics of the language; i.e., the semantics should be implicitly derivable from the representations. By representation, we mean both diagrams and textual instances of the modeling languages from the perspective of their users.

The MDA revision guide 2.0 supports this challenge by presenting analytic procedures that are performed once the data semantics behind the diagrams are captured.Footnote 11 MDA 2.0 prescribes the capture of models in the form of the data required for operations such as querying, analyzing, reporting, simulating, and transforming (OMG 2014).

There is further evidence that models are no longer static representations of realities; the dynamics of models is increasing in MDE environments. By dynamics, we refer to interaction with the elements of the model, navigation through structures of related models, simulation of behavior (e.g., GUI models), queries on models, etc.
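A small sketch of what such dynamics look like in practice follows: a query and a navigation step over a toy in-memory model. The model structure is hypothetical, standing in for what a modeling tool would expose (e.g., an EMF resource).

# Toy in-memory model: classes with attributes and outgoing references.
model = {
    "Student":    {"attrs": ["name", "id"], "refs": ["Enrollment"]},
    "Enrollment": {"attrs": ["date"],       "refs": ["Course"]},
    "Course":     {"attrs": ["title"],      "refs": []},
}

# Query: which classes have no outgoing references (leaf concepts)?
leaves = [c for c, d in model.items() if not d["refs"]]
print(leaves)  # ['Course']

# Navigation: follow references from a starting element through the model.
def reachable(start: str) -> set:
    seen, stack = set(), [start]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(model[c]["refs"])
    return seen

print(reachable("Student"))  # {'Student', 'Enrollment', 'Course'}

Neither operation concerns the notation at all, which is precisely why notation-centered quality frameworks have nothing to say about them.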

These dynamics are not usually considered by frameworks for the evaluation of quality in modeling languages. However, they are important for the essential management and use of models in MDE projects, and ignoring them can lead to problems in the final system. Therefore, we believe this challenge must be explicitly considered as part of the quality of (sets of) modeling languages in MDE environments.

Most of the modern proposals for semantics management in model-driven contexts are too formal for much of the community. The lack of an appropriate treatment of representations leads to modeling tools that do not provide appropriate support for modeling purposes (only representations, without any association with the semantics). A modeling language can be considered good if its associated tool implicitly explains and supports its semantics.

5.7 Streamlining ontological analyses of modeling languages

The reported methods for evaluating quality in models and modeling languages include artifacts such as guidelines, ontological analysis, experimentation, and usability evaluation. Ontological analysis is one of the most frequently reported approaches for evaluating modeling languages against concrete conceptualizations. Works such as Becker et al. (2010), Opdahl and Henderson-Sellers (2002), and Siau (2010) give examples of evaluation processes with the BWW ontological model applied to UML and DSLs, respectively. In Costal et al. (2011), an enhancement of the expressiveness of UML was proposed based on an analysis using the UFO ontology. The authors in Ruiz et al. (2014) use an ontological semiotic framework for information systems (FRISCO) as a pivot to integrate two modeling languages; ontological elements are used to relate and support the integration between concepts of both languages.

While it is true that ontological guidance provides a powerful tool for aiding the understandability of models (Saghafi and Wand 2014), ontological analysis includes procedures at philosophical levels that may not be accessible (or interesting) to all of the model-driven community. These analyses are performed by method engineers who have a general vision of the implications of modeling languages in model-driven projects. However, most of the model-driven community are final users of modeling languages, so their interests focus on the applicability of languages in a domain. An agile ontological approach is needed to facilitate analysis and reasoning about the applicability of modeling languages according to the particular characterizations of the domain being modeled.

By agile we mean grounded in practical knowledge about the modeling act, in accordance with information systems principles. Agile approaches involve constant improvement, short iterations, and the exchange of knowledge and experience among team members (Silva et al. 2015). Current proposals for ontological analysis of models and modeling languages limit their application to specific model-driven communities that are interested in the evaluation of modeling approaches or the promotion of specific modeling proposals. In addition, there are several information systems frameworks (not just ontological frameworks), each contributing its own conception of information systems. In order to promote ontological reasoning about modeling implications, we propose an intermediate stage in which a neutral, IS-native description can be used to classify modeling artifacts before starting the inference process with an information system ontology.

Another important advantage that an agile ontological analysis could offer the model-driven community is its potential use for developing supporting material (orientations, guidelines, etc.) for the correct application of modeling-related practices in real contexts. Examples of such practices are the choice of language, the adequate usage of tools, and the management of traceability information in transformation processes.

5.8 Incorporating modeling language quality as a source of technical debt in MDE

Most of the proposed frameworks for quality in models act upon specific model artifacts, abstract syntax, or concrete syntax. These frameworks do not consider the implications of the activities performed on models in terms of the consequences of good practices that were not followed. This is a critical issue because model-driven projects have the same project constraints as software projects; the only differences are the high abstraction level of the project artifacts and the new roles for domain experts and language users.

The term technical debt mainly concerns the consequences of poor software development (Tom et al. 2013). This is a critical issue that is not covered by model-driven processes, whose focus is specific operations on models such as model management and model transformations. A landscape for technical debt in software is proposed in Kruchten et al. (2012) in terms of evolvability and external/internal quality issues. We think that model-driven initiatives cover all the elements of this landscape, since authors such as Moody (2005) regard models as elements of internal software quality due to their intermediate nature in a software development process. Researchers of the Software Engineering Institute (SEI) in Schmidt (2012) propose further work on the analysis and management of architectural decisions (expressed as software modeling decisions) because such decisions imply costs, values, and debts for a software development process. The integration of model-driven engineering and technical debt has not been considered by practitioners of either area, despite its enormous potential and benefits for software development processes.

Some of the quality issues reported in Section 3 show concerns about the consequences of applied model-driven practices (especially their formal manifestation as model-driven processes). However, unlike traditional software technical debt, the consequences of MDE activities can cover all of the abstraction levels involved, including business and organizational concerns.

The benefit of considering this challenge is twofold. First, it implies that model-driven processes must be formulated and formalized; second, a prior vision of the consequences of model-driven activities helps avoid misalignments with the real application context. Most MDE applicability problems are generated by technical incidents in the MDE tools. The consequence of any model-driven activity should be measurable and quantifiable without waiting until quality is impacted in a specific scenario.

Technical debt in model-driven contexts has begun to be considered by model-driven practitioners. An example is presented in Izurieta et al. (2015), where the authors explore improving software architecture quality through the explicit management of technical debt during modeling tasks. In the opinion of those authors, taking technical debt into account at modeling levels enhances the value added by MDE and also promotes the progressive adoption of modeling by regular software practitioners.

6 Conclusions

The virtue of quality does not exist per se; it depends on the subject under consideration. In MDE contexts, there is a plethora of meanings of quality as a consequence of the multiple interpretations of the real scope of the MDE paradigm, which ranges from the mere usage of conceptual models to specialized semantic forms. Because the discipline is relatively young, these multiple conceptualizations of quality have not yet been acknowledged by model-driven communities and practitioners. The most critical consequence of this is the reported misalignment between the expectations of real industrial scenarios and the proposals that emerge from academia.

Most quality concepts in model-driven projects originate from high-abstraction-level sources, with concerns related to the act of modeling itself. Just as there is a widespread belief that good-quality models should generate good-quality software artifacts, there should be a standard conceptualization of what good-quality models imply. However, this conceptualization fails because the paradigm does not establish when something is MDE (or is in compliance with MDE). In the model-driven case, the significant impact of subjectivity generates multiple efforts and works around the term quality, most of which do not address the expectations, constraints, and requirements of real contexts.

Through two formal literature reviews, we have shown several categories in the definition of quality for the MDE field. We have also analyzed the mismatch of quality evidence between industrial practitioners (and communities of model-driven practitioners) and academic researchers. Table 9 and Fig. 9 summarize the main findings of both reviews. Table 9 presents the categories identified in the definition of quality in MDE contexts, classifying them according to their main contribution to evaluation procedures; sixteen definitions of quality for MDE contexts were detected. Figure 9 summarizes the industrial versus academic/research mismatch of model-driven quality issues; one hundred and twenty-one issues were detected in the works found by grouping explicit statements that show quality problems.

Table 9 Summary of the category definitions about quality in MDE contexts (August 2016)
Fig. 9 Global distribution of quality issues in industrial and academic/research contexts

The detected categories about quality in models are a strong basis from which to start the discussion on this topic. However, most of the MDE core features and challenges are left out. These include the suitability of languages and their joint usage, conformance to MDE, the management of abstraction levels, the granularity of models, etc. In Section 4, we showed how these quality issues emerge in real model-driven projects from the modeling act itself, because MDE projects are constrained by business, system, and technical concerns. For this reason, we claim that the model-driven community must pay attention to the challenges formulated in Section 5 in order to derive quality initiatives with an effective impact on the practitioners of the model-driven paradigm, who mostly come from traditional software development contexts.

The diversity of MDE-compliant works and the lack of a general consensus about MDE (comparable, perhaps, to the OMG MDA initiative) produce particular definitions of quality. As Krogstie states, model quality is still an open issue (Krogstie 2012c), and it will continue to be an open issue as long as the diversity of ideas about MDE persists. None of the identified categories establishes when an artifact can be explicitly considered MDE compliant. The multiple categories confirm that the term quality in models does not have a consistent definition: it is defined, conceptualized, and operationalized in different ways depending on the discourse of previous research proposals (Fettke et al. 2012). Figure 9 shows the implicit questions from the adoption of MDE itself as the main open issue in the perception of quality in MDE, one that still has no satisfactory response due to the lack of consensus about the scope of the definition of model-driven compliance.

The software engineering field has specific standards, efforts, and initiatives that allow practitioners to reach agreement and consensus on the conceptualization of quality in software projects. However, the MDE paradigm (which was born from software engineering methods) lacks such consensual initiatives due to the multiple new challenges and categories that emerge in quality evaluation in MDE. We believe there must be a comprehensive consensus on quality evaluation in MDE, built on the essential principles of information systems architectures that drive modeling actions and decisions.