1 Introduction

The potential of organizations to pursue their mission and to find new ways to innovate in an increasingly competitive market is mainly grounded on data. Consequently, organizations are increasingly aware that the better their data, the greater the benefits they can obtain; better economic performance is one example. It stands to reason that sufficient resources should be invested in deploying solutions aimed at achieving data quality levels proportional to both the intended and the future uses of the data.

Hence, ensuring data quality is a task that must: (1) be planned well in advance; (2) pursue clear objectives aligned with the organizational strategy; and (3) be assigned adequately qualified human resources and sufficient material and economic resources. Only then can results commensurate with the organization’s potential be guaranteed. This assurance of data quality levels must be achieved by implementing integrated data management, data quality management and data governance programs.

To help organizations improve their software processes, there are alternatives based on de jure and de facto standards such as COBIT [1], CMMI [2], ISO/IEC 15504 [3] and ISO/IEC 33000 [4]. Unfortunately, they do not specifically address concerns about low levels of data quality, and they are not easy to apply directly to the disciplines of data management, data quality management and data governance. In recent times, however, new process-oriented initiatives (DMM [5] and ISO 8000-60 [6]) have emerged to cope with these disciplines. After a detailed study, we concluded that DMM has two important problems: it is not easy to apply and it is focused primarily on the financial domain. On the other hand, we consider that, because of its general purpose, ISO 8000-6X is easier to apply and use, although it does not explicitly cover data governance processes, nor does it fully address data management processes.

To fill this gap, and as the main result of our research, we have developed the Alarcos’ Model for Data Improvement (MAMD, which stands for “Modelo Alarcos de Mejora de Datos” in Spanish). Our objective was to create a framework that allows organizations to run continuous improvement projects, based on the PDCA cycle, in which improvements are implemented progressively in order to obtain the best performance from their data. MAMD consists of two main components:

  • A process reference model that extends ISO 8000-61 [7] with data governance processes and some data management processes.

  • An assessment and improvement model based on ISO/IEC 33000 [4]. We decided to ground our proposal on ISO/IEC 33000 due to the lack of specific and standardized works in the area.

The main contribution of this paper is the presentation and description of the MAMD models. The paper is structured as follows: Sect. 2 presents related work, Sect. 3 presents the MAMD framework, and Sect. 4 draws some conclusions and introduces future lines of work that we consider necessary to improve MAMD. Lastly, we include acknowledgements and references.

2 Related Works

This section presents the work related to the main content of our proposal. Its aims are:

  1. To provide an overview of the assessment and improvement process models.

  2. To compare the various existing process reference models in order to identify the processes that will be part of the process reference model of MAMD.

A maturity model can be understood as a tool used to organize a set of elements ordered according to a given criterion [8]. In the domain of this work, the criterion is organizational maturity with respect to guaranteeing the success of business processes as regards data quality management, data management and data governance.

The first researcher to apply the concept of a maturity model in the field of computer science was probably Humphrey in 1987 [9]. He used it to explain to organizations how to make their processes more capable in order to produce high-quality software. In the data quality domain specifically, English was the first to apply the maturity concept to data management, at the same time as he included the notion of “data quality” in [8]. Since then, there have been many works related to data management that try to address this issue. The following subsections examine such data quality management maturity models in more depth.

2.1 Scope of the Existing Data Maturity Models

Regarding the scope of “data management practices” [10], it is easy to see how the evolution of the field has led to data quality management and data governance. By the end of the twentieth century, organizations began to be aware of the need for data quality. It is difficult to provide a definition of data quality because the concept admits multiple interpretations. In [11], Wang defines data quality as “fitness for use”, and this definition has been widely used over the last years as a reference for research on data quality management. Nonetheless, organizations soon realized that data quality management needs integrated support from senior management. The concept of data governance was presented for the first time in the middle of the previous decade. Its objective is to align the data strategy with the organizational business strategy, which implies investing the necessary effort to carry out data management and data quality management [12, 13]. Figure 1 shows the evolution of data management from 1950 to the present.

Fig. 1. Evolution of data management over time (adapted from Aiken et al. [10] by using Trends.google.com).

Not all the existing frameworks focus on the three disciplines mentioned above: currently only DMM [5] and MAMD, which is presented in this paper, address all three, as shown below. However, it is possible to find: (i) maturity models whose purpose is to address only one of the three disciplines, such as English [8], Caballero et al. [14, 15], Ryu et al. [16] or Baskarada [17]; and (ii) frameworks that are not presented as a maturity model but include the three disciplines, such as DAMA [13].

Throughout this work, the term “data maturity model” is used to refer to all maturity models that integrate data management, data quality management and data governance.

2.2 Frameworks Considered as Basis

Considering that the idea of maturity models was first applied to software processes, and that several software process maturity models have been developed to date, it makes sense that research on data maturity models has used these models as a basis.

A framework used as a reference provides not only a structure for the process reference model, but also other necessary components such as an assessment methodology and an improvement model. CMMI [18] provides a process reference model that can be used with SCAMPI [19] or CBA-IPI [20], while ISO/IEC 15504 [21] provides an assessment model, including criteria that represent a maturity model, and an assessment model that can be used with ISO 12207 [22].

In this sense, the process reference model that has inspired most of the data maturity models is CMMI. The two representations of CMMI, staged and continuous, have been used in various proposals. To mention a few: IQM3 [15] is presented as a staged model, while IQMM [17] and, more recently, DMM [5] are described as continuous models.

The ISO 8000-6x project [23] includes a process reference model (ISO 8000-61) and a maturity model (ISO 8000-62) structured according to the principles established in ISO/IEC 33000.

Furthermore, it is worth mentioning the model proposed by Pierce et al. in [24], which is based on COBIT 4.1. Additionally, it should be noted that many authors in the field of data quality use “data” and “information” as synonyms.

2.3 Existing Models Classification

To present the works in this area, they have been grouped according to two criteria: reference framework and scope. The scope takes one of three possible values, {“data management”, “data quality management”, “data governance”}, while the reference framework is classified as CMMI, ISO/IEC 15504, COBIT or others. Table 1 gathers the classification according to scope.

Table 1. Data maturity models classification according to their scope.

Table 2 presents the classification of data maturity models according to the reference framework used.

Table 2. Data maturity models classification according to the reference framework used.

3 MAMD, the Alarcos’ Data Improvement Model

The MAMD framework is based on the three aforementioned disciplines: data management [25], data quality management [11, 26] and data governance [27]. They are strongly dependent on one another. This dependence is observed in [28], which reveals that current research on data quality involves an evident need to add certain governance, management and technical aspects. The three disciplines are described below:

  • Data governance aims to design and implement data management and data quality strategies that align data strategies with the organization’s business strategies. Such strategies are implemented as organizational policies. This supports business needs by providing the necessary resources to both areas and by monitoring the use of those resources with regard to the strategic objectives of the organization.

  • From our perspective and for the sake of simplicity, we consider that data management implements and maintains a technological data infrastructure that must support business requirements. The requirements will be expressed through the data management policies. Likewise, the specific data quality requirements and their management shall be supported by the technological infrastructure.

  • Data quality management implements and maintains an organizational data quality culture that shall produce, maintain, perform and communicate good data quality management practices to be applied by data management. These actions shall satisfy the specific data quality requirements that ensure the success of the organization’s processes.

In order to realize not only the main outcome of the three disciplines but also the dependencies between them, the process reference model is introduced as a way to depict what an organization could do rather than to specify what an organization has to do.

3.1 Process Reference Model

According to clause 5.3.1 of ISO/IEC 33004 [29], a process reference model (PRM) is defined as a set of processes that can collectively provide support to the organizational processes. The process reference model of MAMD is composed of 18 processes grouped around the three disciplines: data management, data quality management and data governance. These processes have been identified by mapping ISO 8000-61, DMM, COBIT and DAMA (see Table 3 for a mapping between ISO 8000-61, MAMD and DMM).

Table 3. DMM and ISO 8000-61 processes mapped to MAMD processes.

The process reference model is shown below:

Data Management Processes (DM)

  • DM.1. Data requirement management. This process aims to collect and validate the requirements concerning the data needed to manage the organization successfully.

  • DM.2. Technological infrastructure management. The goal of this process is to specify and maintain the technological infrastructure needed to support the data meaning shared between applications.

  • DM.3. Historical data management. This process addresses how to maintain and apply the policies needed to manage the organization’s historical data.

  • DM.4. Data security management. This process aims to define and enable mechanisms that provide confidentiality, integrity, accessibility or availability, authenticity, non-repudiation, consistency, isolation and data audit.

  • DM.5. Configuration management. This process addresses how to define the processes by which an organization requests, determines, approves and implements feasible plans and evaluates changes to the data lifecycle.

  • DM.6. Master data management. This process aims to identify the concepts relevant to the organization’s business domain and to align the organizational data strategy around these master data.

  • DM.7. Data design. The goal of this process is to develop a consistent, complete, comprehensive and extensible data model that covers the data requirements of all organizational units. In addition, the data model shall be aligned with the organizational data strategy.

  • DM.8. Data sources and data targets establishment. This process addresses how to identify and characterize each of the data sources and destinations used in the original business processes, as well as the agreements and interactions with providers and customers.

  • DM.9. Data integration. The goal of this process is to ensure data integrity by controlling the flows and relationships of the data transferred to application systems or databases.

Processes related to Data Quality Management (DQM)

  • DQM.1. Data quality measurement. This process aims to establish the resources needed to satisfy the requirements and to measure quality levels according to measurement criteria.

  • DQM.2. Data quality improvement. The goal of this process is to implement a continuous improvement cycle, based on the PDCA model, to improve the data in organizational repositories.

Processes related to Data Governance (DG)

  • DG.1. Data strategy establishment. This process addresses how to identify and prioritize data management objectives, and how to work according to this prioritization to support the corporate strategic objectives.

  • DG.2. Data lifecycle management and data value. The goal of this process is to identify the degree of importance that data have for the different business processes at the corresponding stages.

  • DG.3. Standards, policies and procedures definition. This process aims to establish the standards, policies, good practices and procedures for data management, data quality management and data governance that best support the data quality strategy.

  • DG.4. Human resources management. This process addresses how to adequately manage the specific training needs of the human resources dedicated to data management, data quality management and data governance.

  • DG.5. Financial resources management. The goal of this process is to develop plans for provisioning and maintaining the financial resources that support the organizational data strategy.

  • DG.6. Data organization strategies monitoring. This process aims to develop and measure key indicators for monitoring the achievement of the data management strategy and to check that it is actually aligned with the organizational data strategy.

  • DG.7. Change management in data strategy. The goal of this process is to keep the organizational data strategy coherent with the evolution of the corporate strategic objectives.
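As an illustration only, the process reference model listed above can be captured as a small lookup structure, which is convenient when assessment evidence has to be recorded per process. The sketch below is not part of MAMD itself: the names `MAMD_PRM`, `all_processes` and `discipline_of` are hypothetical helpers introduced here, and only the process identifiers and titles come from the list above.

```python
from typing import Dict, List

# Process identifiers and titles as listed in Sect. 3.1, grouped by discipline.
# The variable and function names are illustrative assumptions, not MAMD terms.
MAMD_PRM: Dict[str, Dict[str, str]] = {
    "DM": {
        "DM.1": "Data requirement management",
        "DM.2": "Technological infrastructure management",
        "DM.3": "Historical data management",
        "DM.4": "Data security management",
        "DM.5": "Configuration management",
        "DM.6": "Master data management",
        "DM.7": "Data design",
        "DM.8": "Data sources and data targets establishment",
        "DM.9": "Data integration",
    },
    "DQM": {
        "DQM.1": "Data quality measurement",
        "DQM.2": "Data quality improvement",
    },
    "DG": {
        "DG.1": "Data strategy establishment",
        "DG.2": "Data lifecycle management and data value",
        "DG.3": "Standards, policies and procedures definition",
        "DG.4": "Human resources management",
        "DG.5": "Financial resources management",
        "DG.6": "Data organization strategies monitoring",
        "DG.7": "Change management in data strategy",
    },
}


def all_processes() -> List[str]:
    """Return every process identifier in the reference model."""
    return [pid for group in MAMD_PRM.values() for pid in group]


def discipline_of(process_id: str) -> str:
    """Return the discipline prefix (DM, DQM or DG) of a process identifier."""
    prefix = process_id.split(".")[0]
    if prefix not in MAMD_PRM or process_id not in MAMD_PRM[prefix]:
        raise KeyError(f"unknown MAMD process: {process_id}")
    return prefix


if __name__ == "__main__":
    print(len(all_processes()), "processes")
    print(discipline_of("DG.6"))
```

Keeping the identifiers as keys makes it straightforward to attach capability ratings or evidence to each process later in an assessment.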

3.2 Process Assessment Model

The purpose of a data quality management maturity assessment is to understand and assess how well the organizational processes address the requirements identified by the data quality management process reference model specified by ISO 8000-61.

ISO 8000-61 identifies needs that are covered by the data quality management process reference model. To evaluate data quality management maturity in organizations, it is necessary to understand and assess how effectively the processes cover those needs.

Process Capability Levels and Process Attributes.

As stated in ISO/IEC 33020 [30], process capability is defined on a six-point ordinal scale that enables capability to be assessed from the bottom of the scale, level 0 (“incomplete”), to the top of the scale, level 5 (“innovating”). The scale represents increasing capability of the implemented process, from failing to achieve the process purpose through to continual improvement.

To compute the process capability level, it is necessary to observe and assess the evidence of the achievement of the process attributes. A detailed description of the process capability levels and the process attributes can be found in clause 5.2 of ISO/IEC 33020. Table 4 summarises the process attributes and capability levels that have to be achieved. Note that achieving a given level requires achieving the process attributes of that level and of all the levels below it.

Table 4. Capability levels and process attributes.
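The following sketch illustrates how a capability level could be derived from process attribute ratings. It assumes the rule commonly associated with ISO/IEC 33020: a level is reached when its own process attributes are rated largely (L) or fully (F) achieved and the attributes of all lower levels are rated fully achieved. The attribute identifiers follow the PA numbering of the standard; the names `PROCESS_ATTRIBUTES` and `capability_level` are hypothetical, and the standard itself remains the authoritative source for the rule.

```python
from typing import Dict

# Process attributes per capability level, using the PA numbering of
# ISO/IEC 33020. Level 0 (incomplete) has no process attributes.
PROCESS_ATTRIBUTES = {
    1: ["PA 1.1"],
    2: ["PA 2.1", "PA 2.2"],
    3: ["PA 3.1", "PA 3.2"],
    4: ["PA 4.1", "PA 4.2"],
    5: ["PA 5.1", "PA 5.2"],
}


def capability_level(ratings: Dict[str, str]) -> int:
    """Derive a capability level (0-5) from ratings 'N', 'P', 'L' or 'F'.

    Assumed rule: the attributes of the candidate level must be rated
    L or F, and those of every lower level must be rated F.
    Attributes without a rating are treated as 'N'.
    """
    level = 0
    for candidate in range(1, 6):
        own = [ratings.get(pa, "N") for pa in PROCESS_ATTRIBUTES[candidate]]
        lower = [
            ratings.get(pa, "N")
            for lvl in range(1, candidate)
            for pa in PROCESS_ATTRIBUTES[lvl]
        ]
        if all(r in ("L", "F") for r in own) and all(r == "F" for r in lower):
            level = candidate
        else:
            break  # a higher level cannot be reached once one level fails
    return level


if __name__ == "__main__":
    example = {"PA 1.1": "F", "PA 2.1": "L", "PA 2.2": "F"}
    print(capability_level(example))
```

In this sketch a process whose PA 1.1 is only largely achieved stays at level 1 even if its level 2 attributes are fully achieved, because lower-level attributes must be fully achieved before a higher level is granted.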

Rating Process Attributes and Process Capability.

Rating a process attribute consists of judging the extent to which that process attribute has been achieved for the assessed process. A process attribute (PA) is a measurable property within this measurement framework of process capability. The capability levels and process attributes are described in clause 5.2 of ISO/IEC 33020, and the ordinal scale for rating capability levels is described in clause 5.3. Table 4 shows the capability levels and process attributes, and Table 5 shows the corresponding values and the ordinal scale. Because of paper length restrictions, we have not included the way to compute the assessment indicators that ISO/IEC 33004 requires (Table 6).

Table 5. Ordinal scale for rating capability levels.
Table 6. Ordinal scale for rating capability levels.

Hence, when an organizational business process is to be assessed with regard to data quality management, assessors shall investigate, on the basis of evidence, the extent to which the data quality management processes from the data quality management process reference model are achieved. As a result, it can be stated that a specific organizational process is capable of addressing the data quality management process at the level indicated by the ordinal scale.
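To make the rating step concrete, the sketch below maps a percentage of achievement of a process attribute to an ordinal rating. The thresholds are an assumption based on the percentage bands usually quoted for ISO/IEC 33020 (N up to 15 %, P above 15 % up to 50 %, L above 50 % up to 85 %, F above 85 %); the scale given in the standard should be consulted for the authoritative values. The function name `rate_attribute` is hypothetical.

```python
def rate_attribute(achievement_percent: float) -> str:
    """Map a percentage of achievement (0-100) to an ordinal rating.

    Assumed bands (to be checked against ISO/IEC 33020):
      N: 0 to 15, P: >15 to 50, L: >50 to 85, F: >85 to 100.
    """
    if not 0 <= achievement_percent <= 100:
        raise ValueError("achievement must be between 0 and 100")
    if achievement_percent <= 15:
        return "N"  # not achieved
    if achievement_percent <= 50:
        return "P"  # partially achieved
    if achievement_percent <= 85:
        return "L"  # largely achieved
    return "F"      # fully achieved


if __name__ == "__main__":
    for value in (10, 40, 70, 95):
        print(value, rate_attribute(value))
```

The percentage itself has to come from the assessors' judgement of the collected evidence; the function only performs the final conversion to the ordinal scale.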

3.3 Maturity Model

In the context of data quality management provided in this paper, a maturity level indicates how well an organizational unit’s business process achieves the goals required for data quality management processes by using the resources provided by the organization.

The processes identified for each maturity level have been selected according to different criteria: the priority of the processes for the business, the relevance of the processes in other models, their complexity and the resources they require. The maturity levels proposed in MAMD, together with their meaning and the processes they include, are detailed below:

  • Maturity level 0 or Immature: the organization cannot provide evidence of the effective implementation of the good practices addressed by the process reference model. Therefore, there are no guarantees that its data are being used adequately.

  • Maturity level 1 or Basic: the organization can evidence that it uses a set of good practices oriented to providing the minimum data management support required for its business processes to succeed. Nevertheless, no special attention is given to data governance and data quality.

  • Maturity level 2 or Managed: the organization can evidence that it uses a set of good practices oriented to guaranteeing that the data used in business processes are aligned with the organizational strategy. Consequently, there are guarantees that the organization has implemented the minimum data governance processes necessary to ensure the success of its business processes.

  • Maturity level 3 or Established: the organization can evidence that it uses a set of good practices oriented to data quality management to guarantee that the data used in its business processes have adequate quality levels.

  • Maturity level 4 or Predictable: the organization can evidence that it uses a set of good practices oriented to monitoring that organizational data strategies are really effective.

  • Maturity level 5 or Innovating: the organization can evidence that it uses a set of good practices oriented to guaranteeing that organizational data strategies are evolving. An organization is said to be at maturity level 5 when it monitors its data strategies and executes the corresponding processes of the process reference model. These processes are oriented to updating data strategies in order to correct known defects, and can also be used to improve global performance.

The maturity level is calculated from the capability level of the processes of the process reference model that are included in the evaluation. The capability level is calculated considering the degree of institutionalization of good practices and the process attributes described in ISO/IEC 33020.

To calculate the capability level of these processes, the different kinds of evidence shall be inspected; the evidence is collected for each of the business process instances chosen for the evaluation. The result is a rating for each process attribute which, according to ISO/IEC 33020, is one of: “Not achieved (N)”, “Partially achieved (P)”, “Largely achieved (L)” or “Fully achieved (F)”.
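A minimal sketch of how a maturity level could be derived from per-process capability levels is given below. It rests on two assumptions that this section does not specify: the assignment of processes to maturity levels and the capability level required at each maturity level. Both are therefore passed in as parameters, and the values used in the example are purely hypothetical and do not reproduce the MAMD definition.

```python
from typing import Dict, List


def maturity_level(
    capability: Dict[str, int],
    level_processes: Dict[int, List[str]],
    required_capability: Dict[int, int],
) -> int:
    """Return the highest maturity level whose requirements are met.

    Assumed rule: maturity level m is reached when every process assigned
    to levels 1..m has at least the capability level required at m.
    Processes without an assessed capability are treated as level 0.
    """
    achieved = 0
    for level in sorted(level_processes):
        in_scope = [
            process
            for lvl in sorted(level_processes)
            if lvl <= level
            for process in level_processes[lvl]
        ]
        needed = required_capability[level]
        if all(capability.get(p, 0) >= needed for p in in_scope):
            achieved = level
        else:
            break
    return achieved


if __name__ == "__main__":
    # HYPOTHETICAL assignment and thresholds, for illustration only.
    example_assignment = {1: ["DM.1", "DM.2"], 2: ["DG.1"], 3: ["DQM.1"]}
    example_required = {1: 1, 2: 2, 3: 3}
    example_capability = {"DM.1": 2, "DM.2": 2, "DG.1": 2, "DQM.1": 1}
    print(maturity_level(example_capability, example_assignment, example_required))  # 2
```

Passing the assignment and thresholds as arguments keeps the sketch independent of any particular table, so the actual MAMD definition can be plugged in without changing the function.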

As for improvement, the objective of the organization will be to achieve the best and most suitable level of organizational maturity. This implies progressively implementing and improving the capability level requirements for the processes in the process reference model of MAMD. The objective is to guarantee better quality levels for organizational processes.

4 Conclusions and Future Work

It is important to realize that the introduced components of MAMD and their relationship meet the requirements of ISO/IEC 33004 and ISO/IEC 33020 for a maturity model.

On the other hand, we have found that implementing MAMD can bring real benefits to organizations, which result from working with data that have adequate levels of quality. We are currently applying MAMD in several case studies in order to refine the model with the lessons we are learning.

In the future, we want to establish quantitatively to what extent improving the level of maturity in data management, data governance and data quality management poses a clear advantage for organizations.