1 Introduction

In the process of software system development, software architecture is a key artefact that affects all later activities, such as design and implementation, and plays a crucial role in achieving the desired software qualities (Losavio et al. 2003). Software architecture focuses on a high-level view of a software system and is defined as: “the structure or structures of the system, which comprise software components, the externally visible properties of those components, and the relationships among them” (Bass et al. 1998).

According to the software architecture community, an architectural description can comprise multiple views, each concentrating on one of many system concerns, such as the logical, implementation, deployment, process, or architectural knowledge view, and on the viewpoint of different stakeholders, such as end-users, developers, project managers, and business analysts (Kruchten 1995; Clements et al. 2003). Architectural component and connector models (or component models for short), which are part of the implementation view, are frequently used as a central view in the architectural descriptions of software systems (Clements et al. 2003). Component models represent high-level abstractions of the system implementation and are often considered to contain the most significant architectural information (Clements et al. 2003). In this view, components can refer to different system entities such as processes, objects, clients, servers, data stores, modules, and subsystems, while connectors represent the interaction mechanisms between components (Clements et al. 2003). In this article, we consider a component more in the sense of a software module, adopting the definition of Clements et al., i.e. a component represents an implementation unit of software that provides a coherent unit of functionality at the first level of decomposition of the system (Clements et al. 2003). We adopt this definition because our work focuses on the understandability of component models, which mainly relates to understanding the functional decomposition of the system and the effect of modifying the system functionalities, i.e. impact analysis. Note that the component decomposition can be made independently of the type of functionality implemented in a component. For example, a decomposition can consider both technical functionalities (e.g. components for file access or network connection) and business functionalities (e.g. components for savings or accounts). Since a component in a component model represents a high-level abstraction of the entities in the source code of the system, it can be broken down into (i.e., is refined by) more fine-grained, technical components or classes that realize the component in the technical design or implementation of the system. In the context of the object-oriented software systems that we focus on, a component usually groups a set of source code classes and/or packages with similar functionalities, while a connector can represent any kind of dependency between classes, such as method calls, field accesses, etc.

Understandability is one of the most important characteristics of software quality (Pacione et al. 2004). Difficulty in understanding a software system limits its reuse and maintenance. Boehm defined software understandability as a software quality characteristic denoting the ease with which software systems can be understood (Boehm 1978). In the context of component models, understandability refers to understanding the functionalities of individual components together with the functional relatedness among them (Dugerdil and Niculescu 2014). Understandability is a critical aspect of component models, as their main purpose is to “... enable designers to abstract away fine-grained details that obscure understanding and focus on the “big picture:” system structure, the interactions between components, ...” (Oreizy et al. 1999). This, however, is not possible if the models themselves and/or the links to other design and code artefacts are hard to understand.

In our previous work (Stevanetic and Zdun 2016), we examined the relationships between the effort required to understand a component, measured through the time that participants spent on studying a component, and the hierarchical quality metrics originally designed to assess the understandability of the modular design of an object-oriented software system (Hwa et al. 2009). Those metrics refer to 6 design properties found to have an impact on the understandability of the modular design of a system: size, complexity, encapsulation (i.e. information hiding), coupling, cohesion, and modular abstraction. In the same study, we further examined the impact of personal factors (i.e. the participants’ experience and expertise), and compared the efficiency of both personal and system-related factors (metrics) with the prediction models obtained in our previous studies (Stevanetic and Zdun 2014a, 2014b). In another study, reported in a position paper (Stevanetic et al. 2014), we presented a tool for supporting software evolution by integrating a DSL-based architecture evolution approach with our empirically evaluated understandability metrics. In this article, we provide: 1) an extended description of the results obtained in our previous work (Stevanetic and Zdun 2016), consisting of a more detailed description of the studied metrics and applied statistical techniques as well as more detailed explanations and discussions of the obtained results, 2) a new metric for measuring the analyzability of component models, based on the integration of our empirical evaluations with the existing work on analyzability-related metrics proposed by Bouwers et al. (2011), and 3) significant tool extensions compared to our previous work reported in a position paper (Stevanetic et al. 2014), including the realization of the new analyzability metric by showing how much each architectural rule used in a DSL-based architectural abstraction specification contributes to the understandability of components, and by enabling change impact analysis, i.e. the identification of changes in the system that affect different analyzability levels of the component models.

The results of our empirical analyses show that the hierarchical understandability metrics can predict understandability with high practical significance. On the one hand, the obtained prediction models are significantly better than the models obtained using the graph-based metrics (examined in Stevanetic and Zdun 2014a), the package-based metrics (examined in Stevanetic and Zdun 2014b), or the models that use the participants’ experiences as predictors. On the other hand, those models are not significantly different from, or worse in prediction than, the models that combine both the system-related metrics (the graph-based, package-based, and hierarchical understandability metrics) and the participants’ experiences. This means that, of all studied predictors, the system-related metrics (i.e. the hierarchical understandability metrics) are sufficient for the prediction. We also find that the participants’ experiences are important and can predict a significant amount of variance in the data, but the obtained models are not as accurate as the models that use the metrics related to the software system itself (concretely, the hierarchical understandability metrics). Regarding the tool support, we demonstrate in a case study how it can be used to create component models with an appropriate analyzability level by incrementally improving an initial component model of the system. In addition, we show how the tool can be used for change impact analysis, i.e. for detecting the changes between different component models that affect their different analyzability levels.

This article is organized as follows: In Section 2, we discuss the related work. In Section 3 we describe the study design. Section 4 describes the statistical methods we applied and the analysis of our data. In Section 5 we discuss the threats to validity. Section 6 describes the tool we developed together with a case study on how the tool can be utilized in a practical context. In Section 7 we conclude and discuss future directions of our research.

2 Related work

So far, very few studies have investigated empirical evidence on architectural understandability. One of them examines the influence of package coupling on the understandability of software systems (Gupta and Chhabra 2009), while another examines the relationships between some package-level metrics and package understandability (Elish 2010). Neither of these studies examines the understandability of architectural components. In this section, we discuss existing work in several fields closely related to ours.

2.1 Measuring the understandability

Patig (2008) extracts the variables and tasks that have been proposed in cognitive psychology or applied in computer science to test understandability. Those variables and tasks are summarized in Fig. 1 and represent a theoretical framework for investigations on understandability. The variables have been theoretically justified by the authors who used them. In our case, the independent variables are the metrics that we collected (in the work by Patig they relate to abstract/concrete syntax, and this part of the figure is therefore adapted from the original). The dependent variable in our case is the understandability of components. As Fig. 1 shows, different measures can be used to quantify the dependent variable(s), such as frequency (the number of correct answers), selection (which of several answers participants choose), response latency (how quickly participants react), response duration (how long participants deal with a task), and amplitude (the strength of the response, e.g. brain activity while performing a task). In our case, we measured the correctness of the answers and the time that participants spent on resolving the questions. Regarding comprehension tasks, the participants of an experiment need to answer an appropriate set of questions. If the questions relate to the syntax of the model (the constructs of the model), the task is called syntactic. If the questions relate to understanding the described context, the task is called semantic. Both types of tasks address surface-level understanding. In problem-solving tasks, which address deeper understanding, participants have to determine whether and how certain information can be extracted from a model. In our case, problem-solving tasks are more suitable because the participants have to understand not just the component models themselves, i.e. how the components interact in the model, but also the relations between the components and the concrete system implementation. Modelling tasks are used more for measuring the general ease of use of a notation and are therefore not suitable for our case.

Fig. 1 Theoretical framework for investigations on understandability (adapted from Patig 2008)

In the work by Patig, all proposed dependent variables are measured externally, e.g. via the time that participants spend on answering the questions or the percentage of correct answers to those questions. Besides such external means, it is also possible to use the participants’ subjective ratings in the measurement process. In the context of model understandability, Moody proposes three ways to assess understandability: the model user’s rating of model understandability, the ability of users to interpret the model correctly, and the model developer’s rating of model understandability (Moody 1998). The first and the third are based on the subjective ratings of users/developers. However, Lindland et al. explain that the ability of model users to interpret the model correctly is the best operational test of whether the model is actually understood, rather than merely whether it is understandable (Lindland et al. 1994; Moody 1998).

2.2 Architecture and design metrics and their empirical evaluations

There is a large number of software metrics for measuring a system’s architecture, architectural components, and other high-level software artefacts and structures (packages, modules, graph-based structures). For example, metrics related to components and component models measure different attributes like size, coupling, cohesion, and dependencies of components, as well as the complexity of whole component models (Sharma et al. 2009; Sartipi 2001; Sengupta et al. 2011). For software packages, different metrics that measure size, coupling, stability, and cohesion have been proposed (Elish 2010; Gupta and Chhabra 2009; Martin 2003; Gupta and Chhabra 2012). Graph-based metrics measure the complexity of interactions between graph nodes (Bhattacharya et al. 2012; Ma et al. 2006; Allen et al. 2007). Certain graph-based metrics have been shown to be useful for measuring large-scale software systems, which have been observed to share properties that are common to complex networks across many fields of science (Ma et al. 2006). Most of the metrics mentioned above lack links to quality attributes. Stevanetic and Zdun (2015) present a systematic mapping study on software metrics related to the understandability concepts of software architectures with regard to their relations to the system implementation. In this article and the previous ones that empirically investigate the understandability of components, the examined metrics are chosen from that mapping study and tested in the given context.

Several studies empirically evaluate metrics. In contrast to our work, they usually evaluate the usefulness of a metric for its proposed purpose, but do not test relationships between specific metrics, as in our case the prediction of understandability using predictor metrics. Also, none of these studies focuses on architectural component models. Among many others, Basili et al. evaluate object-oriented design metrics as quality indicators (Basili et al. 1996). Albrecht and Gaffney provide one of many examples of a study on development effort metrics (Albrecht and Gaffney 1983). Similarly to our work, Moody presents an empirical evaluation of the use of data model quality metrics (Moody 2003). In that approach, a broad set of quality metrics is investigated. The result is that only a few of these quality metrics have an influence on the quality as perceived by the model users: the system complexity, the number of data items duplicated in existing systems, the development cost estimation, the reuse percentage, and the number of defects by quality factor.

2.3 Understandability of UML models and process models

A variety of studies in the literature examine the understandability of different UML models. Some of them examine the layout or visualization aspects of UML models. Purchase et al. (2001) show that certain visualizations are better than others depending on the kind of comprehension task used. Criteria and guidelines for creating effective layouts for UML class and sequence diagrams, based on perceptual theories, are established in the work by Sun and Wong (2005).

Other studies related to UML model understandability compare the effect of using different UML diagram types (e.g., sequence and collaboration diagrams). For example, Otero and Dolado take different UML diagram types (sequence, collaboration, and state diagrams) and evaluate the semantic comprehension of the diagrams when used for different application domains (Otero and Dolado 2004).

Some authors investigate styles and rigor in UML models and how they affect the understandability of the models. For example, Briand et al. (2005) investigate the impact of using OCL (Object Constraint Language) in UML models on defect detection, understandability, and impact analysis of changes. They find that the benefits for the individual activities are modest, but the overall benefits of using OCL on the aforementioned activities are significant. None of the aforementioned studies examines the understandability of architectural components, the central high-level organizational units of the architectural descriptions of software systems.

Work in the field of process model metrics emphasizes the importance of model characteristics for assessing model understandability. Such metrics measure structural properties of a process model, motivated by prior work in software engineering on lines of code, the cyclomatic number, and object-oriented metrics (McCabe 1976; Chidamber and Kemerer 1994; Fenton and Pfleeger 1998). Soo and Jung-Mo (1992), Nissen (1998), and Morasca (1999) focus on defining metrics. Different metrics have also been validated empirically. Cardoso adapts the cyclomatic number metric to business processes (calling it control-flow complexity (CFC)) and demonstrates the correlation of the metric with the perceived complexity of process models (Cardoso 2006). Canfora, Rolon, and Garcia analyse understandability as an aspect of maintainability using different metrics of size, complexity, and coupling in their experiments, and identify several significant correlations (Canfora et al. 2005; Aguilar et al. 2007). Other metrics are related to cognitive research, e.g. Vanderfeesten et al. (2008), or based on concepts of modularity, e.g. Vanhatalo et al. (2007) and van der Aalst and Bisgaard Lassen (2008).

Different empirical validations in the field of process models clearly show that size is an important model factor for understandability, but it does not fully determine the phenomena of understanding. This means that additional metrics like structuredness can significantly improve the explanatory power (Mendling 2008). In our case, we examine the effect of different metrics, which measure more or less the same concepts as those mentioned for process model understandability (size, coupling, complexity), on the understandability of components’ functionalities implemented by the corresponding sets of source code classes. We also show that size alone is not enough to fully determine understandability and that additional properties need to be taken into account. Similar to our work, Reijers and Mendling (2011) investigate the impact of personal and model-related factors on the understandability of process models. They show that expert modelers perform significantly better and that the complexity of the model affects understanding. A combined regression model is calculated that permits preliminary conclusions on the relative importance of both groups of factors. They find that personal factors (theoretical knowledge, practical experience, educational background) have a stronger explanatory power in terms of adjusted R² than model-related factors; however, they kept the size of the models constant by intentionally selecting models of equivalent size. We also find that the participants’ experiences are important, as are the system-related metrics, but in contrast to the work by Reijers and Mendling, we find that the system-related metrics have a significantly stronger explanatory power and can even be used alone for the prediction, i.e. combining them with the experiences does not produce a stronger explanatory power. Furthermore, we take size into account. Also, all our participants are students; we do not consider experts from industry, as is the case in the previous study.

2.4 Software quality models

To assess design quality, different object-oriented software quality models have been proposed and validated in the literature (Chidamber and Kemerer 1994; Bansiya and Davis 2002; Genero Bocco et al. 2005; Harrison et al. 1998; Basili et al. 1996). In those models, software quality is assessed using several software metrics that quantitatively assess design properties such as coupling and cohesion. However, those models are insufficient for managing understandability in high-level system representations such as the module, package, or component view, because they capture a software system as a set of classes and their relationships, not as a set of modules, packages, or components and their relationships.

In contrast to these quality models, Bansiya and Davis (2002) propose a hierarchical quality model for object-oriented design quality assessment (QMOOD) that is able to assess the understandability of a system. Their model extends Dromey’s quality framework for building product-based quality models (Dromey 1995; Dromey and McGettrick 1992). However, QMOOD only considers the dependencies between classes within a module, ignoring the dependencies between classes of different modules as well as the module hierarchy, and therefore cannot properly assess the quality of a modular design. Sarkar et al. (2008) examine different metrics that can be used to assess the modularization quality of a large-scale object-oriented software system, but the authors do not relate their metrics to high-level quality attributes; more investigations are therefore necessary to establish such links. Hwa et al. (2009) propose a hierarchical model to assess the understandability of modularization in large-scale object-oriented software. They define several design properties that capture the characteristics influencing understandability, and, based on these properties, design metrics that are used to quantitatively assess understandability. In this article, we use the concepts and metrics defined in the work by Hwa et al. to improve the explanatory power of our previously obtained models of the understandability of architectural components.

2.5 Other aspects related to architectural component models

Even though there is a lack of empirical studies on the understandability of architectural component models, other aspects, such as fault density and reuse of components, have been studied before. Fenton and Ohlsson examine the relations between fault density and component size (Fenton and Ohlsson 2000). Mohagheghi et al. use historical data on defects, modification rate, and software size to compare software reuse with defect density and stability (Mohagheghi et al. 2004). Malaiya and Denton study the factors that can be used to determine the “optimal” component size with regard to fault density (Malaiya and Denton 2000); they identify component partitioning and implementation as influencing factors. Graves et al. examine the software change history of components in order to create a fault prediction model (Graves et al. 2000). Metrics such as change times, the time elapsed since the last changes, and the number of changes are used in the model, while size and complexity metrics are not deemed useful. These and similar studies have in common with ours that a link is made between software quality or desired properties, such as fault density or reuse rate, and component properties, such as size, complexity, or change rate. They differ from our study in that they examine aspects that can be studied without human participants: they analyse only the software systems themselves and their historical data.

A number of authors propose ways to improve the understandability of architectural models through additional models or documentation artefacts. One major research direction deals with documenting architectural decisions and architectural knowledge in addition to component models (Babar and Lago 2009; Jansen and Bosch 2005; Zimmermann et al. 2007). Another deals with architectural views (Clements et al. 2002; Hofmeister et al. 2000; Kruchten 1995), which enable different stakeholders to view an architecture from different perspectives. Both research directions only complement component models with additional knowledge; neither studies the understandability of component models with regard to their relations to the system implementation.

2.6 Architecture abstraction and evolution

Several approaches support the abstraction of the architecture from other system artefacts as well as architecture evolution. Here, we discuss those approaches that are most closely related to the approach used in our tool.

Konersmann et al. (2013) describe the ADVERT approach, which provides support for software evolution at the architectural level. Their approach is based on two ideas: (1) maintaining trace links between requirements, design decisions, and architecture elements, and (2) explicitly integrating software architecture information into the code. Contrary to our approach, the ADVERT approach assumes that the architecture already exists (it is built from the design solutions) and does not provide architecture-level quality checks. Another approach that focuses on architecture evolution is proposed by Barnes et al. (2014). They support the modelling of different evolution paths and allow reasoning about architecture evolution based on these paths. Cuesta et al. (2013) extend the approach by Barnes et al. by proposing the documentation of architecture evolution using architectural knowledge. These approaches focus more on reasoning about architecture evolution, while our approach aims at supporting architecture evolution so that source code and architecture documentation evolve in a synchronized fashion, while also allowing architecture quality evaluation.

Several approaches focus on the automatic creation of source code abstractions using automatic clustering. A comparison and review of those approaches and the corresponding clustering measures can be found in the work by Maqbool and Babri (2007). They define a number of groups of clustering algorithms and compare their performance using different open source projects. The results show which approach works well for which application, but no conclusions are drawn regarding the overall effort necessary to correct the automatic clustering. Contrary to all these approaches, our DSL-based approach is semi-automatic, enables the checking of design constraints during the abstraction process, provides traceability between source code and models, and focuses on the evolution of the architecture (having an “up-to-date” architecture that reflects the source code) rather than the recovery of the architecture. Also, our approach provides quality checking of the generated architectural abstractions based on the corresponding empirical evaluations.

Egyed (2004) proposes an approach for model abstraction based on traceability information and abstraction rules. The author identified 120 abstraction rules for the example of UML class models, which need to be extended with a probability value because the rules may not always be valid. Our approach is based on architectural abstraction specifications that enable creating architectural models on different levels of abstraction, starting from the system implementation.

3 Empirical study description

For the planning of our study, data collection, and the analysis and interpretation of the results, we followed the experimental process guidelines proposed by Kitchenham et al. (2002). In particular, for the planning phase, the following guidelines were followed: experimental context setting guidelines (examining the related work, defining hypotheses, and considering the circumstances in which an empirical study takes place) and study design guidelines (defining the population of the study, administering the treatments, considering methods for reducing bias). For data collection and the analysis and interpretation of the results, the following guidelines were followed: data collection guidelines (defining the measures used in the study, ensuring their accurate calculation, considering which data should be excluded), analysis guidelines (choosing the appropriate statistical techniques, performing a data sensitivity analysis), and interpretation guidelines (defining the population and the circumstances for which the results apply, specifying study limitations and threats to validity).

3.1 Goals

As mentioned above, this article aims at further elaborating on the concepts and metrics related to the empirical evaluations of the understandability of components that we studied in our previous work. Namely, we examine the usefulness of the hierarchical understandability metrics proposed in the work by Hwa et al. (2009) as well as the participants’ experience and try to improve the prediction efficiency of our previous prediction models.

In the following paragraphs, we provide the notation and definitions of the metrics used in our previous work, as well as of the metrics from the discussed hierarchical model.

The metrics that we studied in our previous work include metrics adapted from the corresponding package-level metrics defined by Martin (2003) (studied in Stevanetic and Zdun 2014b) and metrics on graphs previously defined by Allen (2002) and Allen et al. (2007) (studied in Stevanetic and Zdun 2014a).

The metrics adapted from the package-level metrics defined by Martin are shown in Table 1. The first three metrics are adapted from the corresponding package-level metrics (number of classes in a package, package afferent coupling, and package efferent coupling) defined by Martin (2003). We consider the dependencies between components in terms of the dependencies between classes, while in the work by Martin the dependencies between packages are considered through the number of packages related to the given package (see Footnote 1). The first three metrics characterize the coupling and the size of a component, and the fourth metric is introduced to model the internal complexity of a component in terms of the number of dependencies between classes within the component.

Table 1 Metrics adapted from the package level metrics defined by Martin (2003)
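To make this adaptation concrete, the following minimal R sketch (a hypothetical illustration, not the tool used in the study) computes four such component-level values from a class-level dependency edge list and a class-to-component mapping; the column and metric names are assumptions and do not reproduce the exact names of Table 1.

```r
# Hypothetical sketch: component-level metrics in the spirit of Table 1,
# computed from class-level dependencies. Column and metric names are assumed.
component_metrics <- function(edges, mapping) {
  # edges:   data frame with columns 'from', 'to' (class-level dependencies)
  # mapping: data frame with columns 'class', 'component'
  edges$from_comp <- mapping$component[match(edges$from, mapping$class)]
  edges$to_comp   <- mapping$component[match(edges$to,   mapping$class)]
  do.call(rbind, lapply(unique(mapping$component), function(comp) {
    data.frame(
      component = comp,
      n_classes = sum(mapping$component == comp),                      # size
      afferent  = length(unique(edges$from[edges$to_comp == comp &     # external classes
                                           edges$from_comp != comp])), # depending on comp
      efferent  = length(unique(edges$to[edges$from_comp == comp &     # external classes
                                         edges$to_comp != comp])),     # comp depends on
      internal  = sum(edges$from_comp == comp & edges$to_comp == comp) # intra-component deps
    )
  }))
}
```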

Regarding the metrics defined by Allen (2002) and Allen et al. (2007), a graph composed of nodes and edges is considered as an abstraction of a software system, and a sub-graph represents a software module. In our case, nodes correspond to source code classes, while edges correspond to the relationships between those classes. Components (which group source code classes) in our case correspond to the modules in the work by Allen (2002).

In this paragraph, we provide the metrics’ definitions together with some explanations. The definitions of the graph-based metrics are shown in Table 2. The notation used for the metric definitions is the following (adapted from the work by Allen 2002): S – the whole system graph (all nodes and edges); S# – the edges-only graph (edges in S and their end points); Si – the node sub-graph (nodes in S# and edges incident to node i, where i = 0 for the environment node and i = 1,...,n for system nodes); MS – S partitioned into modules; mk – module k (nodes in a module and their incident edges); MS∗ – nodes in MS and the intermodule edges; MS0 – nodes in MS and the intramodule edges; Pr(i,j) – the path between nodes i and j (nodes and edges on the path between nodes i and j, including i and j); pL(i) – the proportion of the i-th row pattern in the nodes × edges table; nk – the number of nodes in a module; \(n_{e\_k} \) – the number of edges incident to nodes in a module; and \(m_{k}^{(n_{k})} \) – the module as a complete graph consisting of the nodes in the module and all possible edges between those nodes. The definitions of the length metrics are based on the notion of size, applied to paths (each path is considered to be a module in that case) (Allen 2002). The definitions of the coupling and cohesion metrics are based on the definition of complexity, whereby different graph abstractions are considered: for the complexity metrics the whole system graph is considered, while for the coupling and cohesion metrics the intermodule-edges graph and the intramodule-edges graph are considered, respectively. For instance, the counting coupling metric for a module is equal to the number of edges incident to the nodes in the module where only intermodule edges are taken into account, unlike the counting complexity metric, where the edges in the whole system graph are taken into account.
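As an illustration of the counting variants only (the information-theoretic variants replace these counts with entropy terms over the node–edge patterns, which we do not reproduce here), a hedged R sketch under an assumed data layout:

```r
# Hedged sketch of the counting variants described above; data layout is assumed.
# Counting complexity counts all edges incident to a module's nodes; counting
# coupling counts only the intermodule ones. Allen's cohesion additionally
# normalizes against the complete graph on the module's nodes, which is omitted here.
counting_metrics <- function(edges, mapping) {
  edges$from_mod <- mapping$module[match(edges$from, mapping$class)]
  edges$to_mod   <- mapping$module[match(edges$to,   mapping$class)]
  do.call(rbind, lapply(unique(mapping$module), function(m) {
    incident <- edges$from_mod == m | edges$to_mod == m      # edges touching module m
    intra    <- edges$from_mod == m & edges$to_mod == m      # intramodule edges
    data.frame(module      = m,
               size        = sum(mapping$module == m),       # number of nodes n_k
               complexity  = sum(incident),                  # all incident edges
               coupling    = sum(incident & !intra),         # intermodule incident edges
               intra_edges = sum(intra))                     # basis of the cohesion metric
  }))
}
```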

The metrics from the hierarchical understandability assessment model consider six design properties that affect the understandability of the modular design of a system. Hwa et al. (2009) systematically examined which properties can affect understandability; the six they found are: design size, complexity, encapsulation (i.e. information hiding), coupling, cohesion, and modular abstraction. Complexity, encapsulation, coupling, and cohesion come from general properties that should be managed for software quality (Ghezzi et al. 2002; Booch 1994; Bansiya and Davis 2002), while modular abstraction is a new design concept introduced by the module/package hierarchy (Lungu et al. 2006). Table 3 presents the metric definitions together with the corresponding notation. Note that modules in the work by Hwa et al. correspond to components in our case. Note also that the DMH metric (Depth in Module Hierarchy) might not always be directly applicable to components, since, e.g., one (big) component might contain classes located in several modules/packages with similar functionalities. In that case, similarly to Hwa et al., we can compute the average depth in the hierarchy over all classes in a component, with respect to the location of each class in the module/package hierarchy.
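For example, under the assumption that the depth of a class can be approximated by the nesting depth of its package name, the averaging just described could be sketched in R as follows (names are illustrative, not part of the original metric suite):

```r
# Illustrative sketch of the averaged DMH adaptation discussed above.
# 'classes' is assumed to be a data frame with columns 'package'
# (e.g. "com.soomla.store.data") and 'component'.
avg_dmh <- function(classes, comp) {
  pkgs  <- classes$package[classes$component == comp]
  depth <- lengths(strsplit(pkgs, ".", fixed = TRUE))  # nesting depth per class
  mean(depth)                                          # average depth for the component
}
```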

3.2 Variables

The variables used in our study can be divided into two sets: variables collected from the participants and variables collected from the studied system. All variables can also be divided into dependent and independent variables. The first set includes 7 variables, of which 5 are independent variables related to the participants’ demographic information: programming experience, Java programming experience, commercial programming experience, experience in programming computer games, and Android programming experience. The remaining two are the time required to study a component and the percentage of correct answers to the given questions. The time variable is used to measure the effort required to understand a component and is a dependent variable. The percentage of correct answers variable is introduced to help estimate the time variable in case the participants did not spend enough time to fully examine the given components and achieve a high percentage of correctness (see below for more explanation).

The second set of variables relates to the metrics that we aim to explore (see Tables 1, 2 and 3) and is calculated from the studied system. All these metrics are treated as independent variables.

Table 2 Graph based metrics definitions (adapted from Allen 2002 and Allen et al. 2007) (please note that Size(Si) is in principle calculated in the same way as Size(mk|S), just a different graph (in this case the one defined as Si) is observed)
Table 3 Notation for the hierarchical understandability metrics and their definitions (adapted from Hwa et al. 2009)

The dependent variables and their scale types, units, and ranges are shown in Table 4 while the independent variables together with their scale types, units, and ranges are shown in Table 5.

Table 4 Dependent variables and their scale types, units and ranges (reused from Stevanetic and Zdun 2014a)
Table 5 Independent variables and their scale types, units and ranges

3.3 Hypotheses

We expect that the given hierarchical understandability metrics can be used as good predictors of understandability. In addition, we expect that the participants’ experience is also significant in predicting the understandability effort; in other words, we expect that prediction models that use the participants’ experience provide better predictions than using the median as an estimate. For the experience variables, we do not expect that they can capture the variability of the measured understandability as well as the metrics related to the system itself. For example, if we have two components to be studied, one with 3 and the other with 15 classes, it is hard to believe that participants with the same experience would need the same effort to understand them; the bigger component would require much more effort than the smaller one, simply due to the variation in their sizes. Therefore, we do not expect the corresponding prediction models for the experience variables to be highly accurate. Finally, by combining the system-related metrics (the graph-based, package-level, and hierarchical understandability metrics) with the participants’ experience, we expect that more efficient prediction models can be obtained compared to those that separately consider the graph-based metrics, the package-level metrics, the hierarchical understandability metrics, or the participants’ experience.

Based on previous considerations we formulate the following set of hypotheses:

Hypothesis (H1): The hierarchical quality model metrics can be successfully utilized to predict the effort required to understand a component with high practical significance.

Hypothesis (H2): Prediction models created using just the participants’ experiences as predictors have at least one predictor with a non-zero coefficient, i.e. they can predict the understandability effort significantly well.

Hypothesis (H3): Combining both the system-related metrics and the participants’ experiences leads to a significantly increased efficiency of the obtained prediction models compared to the prediction models that use just the graph-based metrics.

Hypothesis (H4): Combining both the system-related metrics and the participants’ experiences leads to a significantly increased efficiency of the obtained prediction models compared to the prediction models that use just the package-level metrics.

Hypothesis (H5): Combining both the system-related metrics and the participants’ experiences leads to a significantly increased efficiency of the obtained prediction models compared to the prediction models that use just the participants’ experiences.

Hypothesis (H6): Combining both the system-related metrics and the participants’ experiences leads to a significantly increased efficiency of the obtained prediction models compared to the prediction models that use just the hierarchical understandability metrics.

3.4 Study design

3.4.1 Subjects

The participants of the study are 49 master students. The study took place within the Advanced Software Engineering (ASE) lecture at the University of Vienna in the Winter Semester 2013.

3.4.2 Objects

The object of our study was the Soomla Android store system (see Footnote 2), version 2.0. It is an open source, cross-platform framework that supports a virtual economy in mobile games and encourages better game design and faster development. We chose this system because of the following factors:

  • The system is open source which enables us to carry out the study and communicate its results.

  • The system is written in Java, with which the participants are sufficiently familiar.

  • The application domain of the system is probably known to the participants from similar game applications.

  • The system has industrial relevance since it is used in many real-world games.

  • The source code of the system consists of 54 classes within 8 packages. The system has in total 3623 LOC (excluding blank and comment lines), and is therefore likely understandable within a study session, but also not too simple.

3.4.3 Instrumentation

Architectural documentation about the Soomla Android store system

A UML component diagram representing the architecture of the system, its conceptual description, and the traceability links that relate the architecture to the system implementation (class design) were handed out to the participants.

The architecture of the system is shown in Fig. 2. There are in total seven architectural components: Security (C1), CryptDecrypt (C2), PriceModel (C3), GooglePlayBilling (C4), StoreController (C5), DatabaseServices (C6), and StoreAssets (C7). In addition, there are two external components: GooglePlayServer, the REST Web Services running at Google, and SQLLiteDatabase, the database accessed using JDBC. The architectural representation of the system was constructed by two experienced software architects, who fully studied the given system and its documentation and extracted its architecture together with the traceability links to the system implementation. Table 6 shows a short description of the roles that the components play in the system.

Fig. 2 Architectural description of the Soomla Android store system in the form of a UML component diagram (reused from Stevanetic and Zdun 2014b)

Table 6 Soomla Android store architectural components and their roles in the system (reused from Stevanetic and Zdun 2014b)

Source code access

Access to the source code of the system was browser-based, on prepared computers. By grouping the classes into the corresponding components, we enabled the participants to easily navigate through the components and open the source code of the classes realizing them.

A questionnaire to be filled-in by the participants

The first part of the questionnaire covers the participants’ self-rated experiences (programming, Java, commercial, game, and Android programming experience). The second part contains the understandability questions related to the 7 architectural components. Four true/false questions were provided for each component, and the participants had to check the right answers among them. In order to correctly answer the questions, the participants had to fully understand the functionalities of each component by examining the relationships (as well as the roles of those relationships) among the classes inside a component, and between the classes inside a component and the classes outside of that component. For bigger components, answering the questions requires analysing more classes and their relationships than for smaller components. Table 7 shows an example of two questions, one for Component GooglePlayBilling (Q1) and the other for Component Security (Q2). Component GooglePlayBilling (with 11 classes) is bigger than Component Security (with 2 classes), and therefore the corresponding question(s) require examining more classes and their relationships than the question(s) for Component Security. The order in which the seven components were studied varied across participants: 7 random orderings of the components were generated and assigned to the participants (the order of questions within a component remained the same). For example, one participant studied the components in one order, e.g. C2, C6, C1, C3, C5, C7, and C4, while another studied them in some other randomly generated order, e.g. C1, C5, C7, C3, C4, C6, and C2. The randomization allows us to obtain more or less balanced data for all components by equalizing fatigue effects and the potential lack of time to complete all required tasks.

Table 7 An example of two questions (one for Component GooglePlayBilling and one for Component Security)

In order to measure the time that the participants spent on analysing each of the components, we provided a table with time slots. Each slot contains a start and a stop time: the start time indicates when the participants started analysing a component, while the stop time indicates when they finished. Several slots were provided for each component in case the participants wanted to analyse a component several times. The format used for writing the time is hour:minute. The time limit for the whole study was 90 minutes. None of the participants had studied the system before, so the potential bias of some participants having spent additional time (besides the time written in the time slots) on examining the system is negligible. To ensure that there would be enough time to analyse all the components within the study session of 1.5 hours, we piloted the same study with several of our colleagues before running it within the course. All of them agreed that the given tasks are appropriate for the given time limit. All the instruments explained above are available at the Web address given in Footnote 3. The file containing our results, so that they can be assessed by others, is available on the same page.

3.5 Execution

3.5.1 Data collection

Figure 3 shows the data related to the participants’ demographic information.

Fig. 3 Participants’ demographic information

Based on the information in the figure, we can say that the programming experience of the participants is medium to high. Most of them have more than 3 years of programming experience. Many of the participants also have industrial programming experience, but only a few of them have game programming experience or experience with Android.

The descriptive statistics (mean, median, and standard deviation) for the time and the percentage of correct answers variables are shown in Fig. 4. From our results, we excluded the participants who have less than one year of programming experience (9 of them). Some participants did not specify both the start and stop time for all studied components; we excluded those results from the analysis as well (only for the components where the start and stop times were not specified). The total number of collected data samples for all components is 7 (components) × 49 (students) = 343, and the number of excluded data samples is 103.

Fig. 4 Descriptive statistics for the time and the percentage of the correct answers variables (reused from Stevanetic and Zdun 2014a)

The data related to the metrics we aim to explore are shown in Tables 8, 9 and 10. The graph-based metrics are automatically calculated from the corresponding graph abstractions of the system. The graph abstraction of the whole system is also utilized for the calculation of the package-based and hierarchical understandability metrics. The metrics were independently calculated by the two architects who studied the system, in order to avoid misinterpretations in their calculation. The accuracy of the graph-based metric calculations was additionally tested on the examples provided by Allen (2002).

Table 8 Package based component level metrics (reused from Stevanetic and Zdun 2014b)
Table 9 Graph based component level metrics (reused from Stevanetic and Zdun 2014a)
Table 10 Hierarchical understandability component level metrics (reused from Stevanetic and Zdun 2016)

Looking at Fig. 4, we can see that the obtained time for the first three components (C1, C2 and C3) is significantly lower than the time for the remaining four components. This observation is expected, since the first three components contain a smaller number of classes than the other four. Another observation concerns component C4: the average time needed to analyse this component is significantly higher than the time needed to analyse components C5, C6 and C7. Consequently, the percentage of correct answers for components C5, C6 and C7 is lower than for component C4, which has values more or less similar to those of the smaller components (C1, C2 and C3). It is to be expected that the percentage of correct answers decreases for components with many classes, simply because of the larger amount of information that needs to be handled, which increases the probability of missing some relevant information. Still, it also seems that the participants spent somewhat less time analysing components C5, C6 and C7 than necessary (or at least for component C7, which has the same number of classes as component C4) to score better and achieve a higher percentage of correct answers. In line with this and the discussion in Section 3.2, the percentage of correct answers variable is used to help estimate the time required to fully analyse a component and achieve the maximal correctness of 100%.

3.5.2 Validation

To prevent the participants from using forbidden materials or talking to each other, at least one observer was present in the lab during the study execution. This also enabled the participants to pose clarification questions. The materials given to the participants were collected before any of them left the lab. There were no cases in which the participants behaved unexpectedly.

4 Analysis

The following statistical tests are used for analysing the data.

  • Variance Inflation Factor (VIF) (O’brien 2007) and Condition Number (CN) (Belsley 1991) - Collinearity Analysis

  • Multiple Regression Analysis (MRA) (Rubinfeld 2000)

VIF and CN are commonly used to detect multicollinearity problems (see below). MRA is commonly used to examine the relationship between one dependent variable and more than one independent variable (predictors). The relationship is assumed to be linear, which makes a model easy to interpret. Furthermore, the “true” relationship is often at least approximately linear over the range of values that are of interest to us; even if it is not, the variables can be transformed in such a way as to linearise the relationship. The analyses are performed using the programming language R (R Development Core Team 2008).

4.1 Collinearity analysis

Collinearity analysis aims at identifying the variables that are highly correlated with other variables. Such variables should be excluded from the set of all possible predictors considered for the prediction. To test for possible correlations within the studied metric sets, we calculate the Condition Number (CN) and the Variance Inflation Factor (VIF). VIF values greater than 10 suggest high correlation, i.e. multicollinearity problems among the tested variables; CN values greater than 30 suggest the same (Belsley et al. 1980).
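A minimal R sketch of these two checks, assuming a data frame metrics whose column time holds the measured times and whose remaining columns are the candidate predictors (the car package is one common way to obtain VIF values; the article does not state which routines were used beyond R itself):

```r
# Sketch of the collinearity checks; the data frame 'metrics' and its column
# 'time' are assumptions for illustration.
library(car)                              # provides vif()

fit <- lm(time ~ ., data = metrics)       # regress time on all candidate predictors
vif(fit)                                  # VIF > 10 flags a multicollinearity problem

X <- scale(model.matrix(fit)[, -1])       # standardized predictor matrix (no intercept)
kappa(X, exact = TRUE)                    # condition number; > 30 flags multicollinearity
```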

Regarding the information theory based and the counting based graph metrics, we consider them as two separate sets of predictors because we have already seen that they are highly correlated in our case. Therefore, all potential predictors considered for generating the prediction models include either the information theory based metrics or the counting based metrics, plus the percentage of correct answers (see the discussion in Section 3.5.1). The VIF and CN values for the information theory graph based metrics and the package based metrics are shown in Tables 11 and 12, respectively.

Table 11 Condition number and variance inflation factor – information theory graph based metrics
Table 12 Condition Number and Variance Inflation Factor – package based metrics (reused from Stevanetic and Zdun 2014b)

Regarding the information theory graph based metrics, as we can see from Table 11, the greatest VIF value when all metrics (predictors) are included (column “VIF”) is that of the Length metric (54.54 > 10). The VIF value of the Size metric is very close to it (52.61). Therefore, in the first step we can exclude either the Length or the Size metric from the set of predictors. The VIF values and the CN value after excluding these metrics are shown in the third and fourth columns of the table. After excluding the Length metric, there are two predictors that can be further excluded, the Size or the Complexity metric (their VIF values are both greater than 10 and similar; see Footnote 4). After excluding the Size metric, only the Length metric has a VIF value greater than 10. We thus obtain two final sets of possible predictors that are used for creating the prediction models for the information theory based metrics: the excluded predictors are either the Size and Length metrics or the Complexity and Length metrics. The final sets of predictors have acceptable VIF and CN values (see for example those in Table 11). Using the same procedure for the counting based metrics, we obtain three final sets of possible predictors, i.e. the sets exclude either the Size and Length metrics, the Complexity and Length metrics, or the Size and Cohesion metrics.
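One automated way to approximate the manual exclusion procedure just described is to repeatedly drop the predictor with the highest VIF until all remaining values fall below the threshold; this greedy sketch is a simplification, since the article instead explores the alternative exclusion branches explicitly.

```r
# Greedy VIF pruning; a simplification of the branching procedure in the text.
prune_by_vif <- function(data, response, threshold = 10) {
  preds <- setdiff(names(data), response)
  repeat {
    v <- car::vif(lm(reformulate(preds, response), data = data))
    if (max(v) <= threshold || length(preds) <= 2) break
    preds <- setdiff(preds, names(which.max(v)))  # drop the worst offender
  }
  preds                                           # remaining predictor set
}
```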

For the package based metrics, as we can see from Table 12, the VIF coefficients when all predictors are included are all less than 10, with the greatest VIF value being 7.96 (for NIntD). Therefore, there is only a slight tendency towards multicollinearity between the variables. We nevertheless decided to exclude NIntD from the set of all predictors, after which we get acceptable results for both the VIF and CN values (see Table 12).

Regarding the hierarchical understandability metrics, there are no multicollinearity problems in that set of metrics: the highest VIF value is that of the MSC metric (3.87), and the CN value for this set of predictors is 13.11. The participants’ experiences also do not exhibit multicollinearity problems: the highest VIF value is that of the programming experience variable (1.62), and the CN for the whole set of variables is 8.29. As mentioned above, we would also like to examine the model in which all the studied variables are taken into account, i.e. the hierarchical understandability metrics, the participants’ experiences, the package based metrics, and the counting or information theory graph based metrics. Combining all 4 sets introduces multicollinearity problems, since there are metrics in multiple sets that measure the same concepts (size, coupling, and cohesion), even if different metrics for those concepts are used. After examining the VIF and CN values, all graph based and package based metrics can be excluded from the set. After excluding these metrics, the highest VIF value is that of the MSC metric (3.98), and the CN value for the remaining set of variables is 16.76.

4.2 Multiple regression analysis

In this part of the analysis, we create multiple regression models that can be used for predicting the time variable. They are also used to test our hypotheses described in Section 3.3. To prevent over-fitting of the data, i.e. to enable more efficient generalization of the results, we perform the Mallows’ Cp calculation when creating the prediction models (Kobayashi and Sakata 1990). If p is the number of predictors, including the constant predictor if it exists, all models that satisfy the inequality Cp ≤ p must be considered reasonably good fits with respect to preventing data over-fitting.
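A hedged sketch of this selection step using the leaps package (one common choice; the article does not name the package used) on the same assumed data frame metrics:

```r
# Enumerate predictor subsets and keep those with Mallows' Cp <= p,
# where p counts the coefficients including the intercept.
library(leaps)

search <- regsubsets(time ~ ., data = metrics, nbest = 3,
                     nvmax = ncol(metrics) - 1)
s <- summary(search)
p <- rowSums(s$which)                     # coefficients per candidate model
s$which[s$cp <= p, , drop = FALSE]        # the "reasonably good" subsets
```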

Before we move to the regression analysis, we briefly explain the role of the percentage of correct answers variable. In Section 3.2, we mentioned that this variable is used as an independent variable to help estimate the time as a dependent variable. There might exist a dependency between the time and the percentage of correct answers, because if the participants spend less than some minimum time required to analyse a component, the percentage of correct answers will probably decrease because of an incomplete insight into all relevant component parts. Therefore, with the help of the percentage of correct answers variable, we can estimate the time required to fully understand a given component, i.e., to achieve 100% correct answers (see Footnote 5). If we replace the value of the percentage of correct answers in the obtained prediction models (see below) with the constant value of 100%, we obtain the effort required to fully understand a component, which then depends only on the other factors included in the model. Note also that predicting the time for 100% correctness is not the most realistic requirement because of the limited data available for it. For example, we could also estimate the time for 75% correctness, which would be more accurate because more data exist for it. However, in our case we use 100% because the difference in the prediction is negligible.
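Concretely, with a fitted model this substitution amounts to predicting on data in which the correctness predictor is fixed at 100%; the sketch below reuses the lm object fit from the earlier collinearity sketch, and the column name correctness is an assumption.

```r
# Estimate the time to reach full correctness: fix the correctness predictor
# at 100% and keep the remaining predictors at their observed values.
full_understanding <- metrics
full_understanding$correctness <- 100
predict(fit, newdata = full_understanding)   # predicted time per observation
```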

To check the accuracy of the obtained prediction models, we calculated a goodness-of-fit measure using the following equation based on the absolute deviation from the median (Kampenes et al. 2007), assuming Xi is the prediction and Yi the actual value:

$$A \text{ (accuracy)} = \frac{\sum_{i}|Y_{i}-X_{i}|}{\sum_{i}|Y_{i}-\operatorname{median}(Y_{i})|} $$

The smaller the value of A, the better the prediction. If the value is greater than 1, the estimation is not working, i.e. there is no evidence that the prediction is better than using the median as an estimate. The value (1-A) represents the proportion of the variation in the Y variable explained by the predictions. (1-A) is a robust analogue of R², so the following guidelines, based on those proposed by Kampenes et al. (2007), can be used for the effect size calculation: (1-A) values in the range of 0 to 0.0372 represent a small effect size, values in the range of 0.0372 to 0.208 a medium effect size, and values in the range of 0.208 to 0.753 a large effect size. Furthermore, for good prediction models, the residuals have to be normally distributed, which is the case with our data. Influential points are points whose removal causes a large change in the fit; they can be detected using Cook’s distance contour lines (Cook 1977). When some points have a distance larger than 1, this suggests that the model might be poor or might have outliers. Our models do not have influential points. We further report the significance of the coefficient of determination (R²) for the obtained models, measured by the F-statistic (Dalgaard 2004).
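The accuracy measure and the effect-size bands can be computed directly from the actual and predicted times, for instance as in the following sketch (function names are illustrative):

```r
# Goodness-of-fit measure A and its effect-size interpretation (1 - A),
# following the bands of Kampenes et al. (2007) cited above.
accuracy_A <- function(actual, predicted) {
  sum(abs(actual - predicted)) / sum(abs(actual - median(actual)))
}

effect_size <- function(A) {
  es <- 1 - A
  if (es < 0.0372) "small" else if (es < 0.208) "medium" else "large"
}

# Influential observations can be screened via Cook's distance:
# any(cooks.distance(fit) > 1)
```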

In order to test our hypotheses described in Section 3.3, we first generate the prediction models that consider: 1) the package based understandability metrics, 2) the graph based understandability metrics, 3) the hierarchical understandability metrics, 4) the participants’ experiences, and 5) both the system related metrics (the package based, graph based and hierarchical understandability metrics) and the participants’ experiences, using the analysis explained above. The obtained models are then compared to check whether there is a significant difference in their prediction capabilities. The best 3 models in terms of the given accuracy measure (A) for each of the above cases that satisfy the explained criterion (Cp ≤ p) are shown in Tables 13 (the package based metrics), 14 (the counting graph based metrics), 15 (the hierarchical metrics), 16 (the participants’ experiences), and 17 (the participants’ experiences together with the system related metrics). For all shown models except the ones that consider the participants’ experiences as predictors, the effect size is in the range of 32% to 40%, which represents a large effect size. These results suggest that the obtained prediction models have high practical significance. Please note that the percentage of the correct answers variable is taken into account as an independent variable in the construction of the prediction models, based on the discussion provided in Section 3.1. With regard to that, we have to check whether this variable alone captures most of the variance in the measured understandability effort, in which case the studied metrics and participants’ experiences would not play an important role. This is not the case, since the prediction model that considers only the percentage of the correct answers variable has an accuracy measure (A) greater than 1 and does not provide a better prediction than using just the median as an estimate.

Table 13 Models’ parameters – package based metrics
Table 14 Models’ parameters – counting graph based metrics
Table 15 Models’ parameters – hierarchical understandability metrics
Table 16 Models’ parameters – participants’ experiences
Table 17 Models’ parameters – participants’ experiences together with the system related metrics

Another useful technique for overcoming the over-fitting problem is cross-validation analysis (Field et al. 2012). Besides the Mallows’ Cp analysis, we also applied the 10-fold cross-validation technique on our data (Footnote 6). The results of the cross-validation analysis corroborate the results of the Mallows’ Cp analysis and confirm their validity.
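For readers who wish to replicate this step, the following is a generic base-R sketch of 10-fold cross-validation; the exact setup we used is described in Footnote 6, and the data frame and predictors below are hypothetical.

# Minimal 10-fold cross-validation sketch, assuming a data frame `d` with the
# dependent variable `time` and hypothetical predictors m1, m2.
set.seed(3)
d <- data.frame(m1 = rnorm(70), m2 = rnorm(70))
d$time <- 8 + 2 * d$m1 - d$m2 + rnorm(70)

k <- 10
folds <- sample(rep(1:k, length.out = nrow(d)))   # random fold assignment
cv_err <- sapply(1:k, function(i) {
  fit <- lm(time ~ m1 + m2, data = d[folds != i, ])   # train on 9 folds
  pred <- predict(fit, newdata = d[folds == i, ])     # predict the held-out fold
  mean((d$time[folds == i] - pred)^2)                 # fold mean squared error
})
mean(cv_err)   # cross-validated prediction error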

Regarding the hypothesis H1, which considers the prediction models for the hierarchical understandability metrics, with respect to the analysis undertaken we can say that the hypothesis H1 is supported, i.e. the hierarchical quality model metrics can be successfully utilized to predict the effort required to understand a component with high practical significance.

Regarding the hypothesis H2, which considers the participants’ experiences as predictors, we see from Table 16 that the effect size of the obtained models (around 4%) is on the border between small and medium. Compared to the other obtained models that consider the system related metrics, these models are much less accurate and efficient. These results comply with the discussion provided in Section 3.3 that the participants’ experiences cannot capture the variability as well as the metrics related to the software model itself. Nevertheless, the hypothesis H2 is supported, i.e. the prediction models for the effort required to understand a component created using just the participants’ experiences as predictors have at least one predictor with a non-zero coefficient, i.e. they can predict the understandability effort significantly well. Please note that this does not mean that the obtained models are well-fitted (accurate); it just means that they can predict a significant amount of the variance compared to the remaining unexplained variance (Field et al. 2012). Based on this result, we can say that the participants’ experiences are important and can significantly improve the understandability, but they are not able to appropriately capture the variance in the data caused by the variation of the system’s structural properties (like size, coupling, cohesion, etc.). As mentioned above, this result is expected.

Finally, to test the hypotheses H3, H4, H5, and H6, we compare the efficiency of the obtained prediction models that use both the system related metrics and the participants’ experiences, on the one side, with the models that separately use the package based, graph based, and hierarchical understandability metrics, as well as the participants’ experiences, on the other. For that purpose, we calculate two parameters: the difference between the AICc (second-order corrected Akaike Information Criterion) values (ΔAICc) of the models to be compared and the corresponding evidence ratios (w). These parameters are commonly used for model comparisons in the case of non-nested models (Footnote 7) (Burnham and Anderson 2002). If the obtained difference (ΔAICc) is lower than 4, there is no significant difference in the prediction capabilities (power) of the two models (Burnham and Anderson 2002). If the difference is in the range [4,7], there is a significant difference in the prediction capabilities, and if the difference is greater than 10, a very strong difference exists (Burnham and Anderson 2002). The evidence ratio expresses how much more likely one model is than the other (for example, a model with AICc = 120 is nearly 150 times more likely than a model with AICc = 130). We compare the best models from each group in terms of the AICc measure. The results of the analysis are shown in Table 18.
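The AICc values and evidence ratios can be computed for any two fitted models. The sketch below uses base R and the standard small-sample correction; packages such as MuMIn provide AICc directly, but the formula is shown explicitly here. The models and data are hypothetical placeholders.

# Sketch of AICc-based comparison of two (non-nested) lm models on the same data.
set.seed(4)
d <- data.frame(a = rnorm(60), b = rnorm(60))
d$time <- 4 + 3 * d$a + rnorm(60)
m_a <- lm(time ~ a, data = d)
m_b <- lm(time ~ b, data = d)

aicc <- function(m) {
  k <- length(coef(m)) + 1                 # estimated parameters incl. residual variance
  n <- nobs(m)
  AIC(m) + 2 * k * (k + 1) / (n - k - 1)   # second-order (small-sample) correction
}

delta <- aicc(m_b) - aicc(m_a)   # Delta AICc relative to the better model
exp(delta / 2)                   # evidence ratio, e.g. exp(10/2) is roughly 148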

Table 18 Model comparisons

Column ΔAICc 1 shows the difference between the AICc values of each model and the model that includes both the hierarchical understandability metrics and the participants’ experience variables (Model 1). The corresponding evidence ratios are shown in column w1. From the obtained ΔAICc 1 values we see that there is a large significant difference (values greater than 10) in prediction capabilities between Model 1 and the last 3 listed models, Models 3, 4, and 5. Regarding the difference in prediction between Model 1 and Model 2 (which includes just the hierarchical understandability metrics), no significant difference exists (ΔAICc 1 = − 1.88). Based on the obtained results we can say that the Hypotheses H3, H4, and H5 are supported while the Hypothesis H6 is not supported, i.e. combining both the system related metrics and the participants’ experience variables leads to a significantly increased efficiency of the obtained prediction models compared to: 1) the prediction models that use just the graph based metrics, 2) the prediction models that use just the package based metrics, and 3) the prediction models that use just the participants’ experiences. The model with both the hierarchical understandability metrics and the participants’ experience is not significantly better in prediction than the model that includes just the hierarchical understandability metrics. It is even slightly worse (the AICc value is increased by 1.88) (Footnote 8). As a consequence of this last fact, we can also conclude that the model that includes just the hierarchical understandability metrics is significantly better than the last 3 listed models, i.e. Models 3, 4, and 5 (the differences of the AICc values are increased by 1.88 in comparison with ΔAICc 1). Columns ΔAICc 2 and w2 show the differences of the AICc values and the corresponding evidence ratios between each model and Model 2.

To summarize the obtained results, we can say the following. The introduced hierarchical understandability metrics can be used to predict the understandability effort of a component with high practical significance. On the one hand, those prediction models are significantly better in predicting the understandability effort than the models obtained using the graph based metrics, the package based metrics, or the participants’ experiences. On the other hand, those models are not significantly different in prediction from the models that combine both the system related metrics (the graph based, package based and hierarchical understandability metrics) and the participants’ experiences (and are even marginally better in terms of AICc). The participants’ experience can predict a significant amount of variance in the data, but the obtained models are not as accurate as the models that use the metrics related to the system itself (concretely, the hierarchical understandability metrics).

With respect to the discussion in Section 3.2, we can now calculate the effort required to fully understand a component by replacing the percentage of the correct answers variable in the obtained prediction models with the constant value of 100%. Figure 5 shows the predicted time variable obtained using the model with the highest effect size value (Model 3 from Table 15) together with the time variable obtained from the participants. The predicted time differs considerably from the time obtained from the participants only for the component StoreAssets (C7). This can be interpreted as the participants needing a bit more time for analysing the component StoreAssets (C7) in order to be able to answer all the questions correctly. This makes sense because the component StoreAssets (C7) has 11 classes, the same as the component GooglePlayBilling (C4), and we therefore expect that they require similar times to be fully studied.
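Mechanically, this substitution amounts to predicting from the fitted model with the correctness variable fixed at 100%. The sketch below illustrates the step with a hypothetical model and hypothetical coefficients; it is not Model 3 from Table 15, whose coefficients are reported in the table rather than here.

# Illustration of estimating the time to fully understand a component by
# fixing the correctness variable at 100%. Data and coefficients are hypothetical.
set.seed(5)
d <- data.frame(correctness = runif(50, 50, 100), msc = rpois(50, 10))
d$time <- 2 + 0.05 * d$correctness + 0.8 * d$msc + rnorm(50)
fit <- lm(time ~ correctness + msc, data = d)

# predicted effort for full understanding of each component
full_understanding <- transform(d, correctness = 100)
predict(fit, newdata = full_understanding)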

Fig. 5 The time from the participants and the time from the predicted model where the correctness of the answers is set to 100%

Before we move to the next section, let us examine one more interesting aspect. Namely, in the context of process model understandability (Canfora et al. 2005; Aguilar et al. 2007) (see Section 2 for more details), different empirical validations showed that size is not enough to fully determine the phenomena of understanding: additional metrics like structuredness help to improve the explanatory power significantly (Mendling 2008). We confirm this in the context of architectural components by comparing the prediction power of the model that considers just the size metric (the MSC metric) and the best obtained model in terms of the accuracy measure that considers the hierarchical understandability metrics. The obtained ΔAICc value is 59.707 and the corresponding evidence ratio is w = 9.2e+12. These results confirm that there is a strong significant difference in the prediction power between the mentioned models.

5 Validity evaluation

In this section we discuss how we tried to minimize the threats to validity. The following threats are taken into account:

Conclusion validity

The conclusion validity indicates to what extent the conclusions are statistically valid. The sample size is one of the possible threats to the statistical validity. In our case, 49 students answered the questions for the 7 components.

While the number of participants is fairly high, the dataset of 7 components is relatively small due to the limited time of the study session. However, after performing a power analysis in R (Kabacoff 2011), we found that the statistical power obtained for our sample with the medium effect size of 0.15, which corresponds to an expected R2 of around 0.4 (we assumed the effect size suggested by Cohen (1988)), is 0.99. This means that the likelihood of finding a prediction model, when there is one with the given effect size, is 99%. Therefore the total sample size is not considered a threat to the conclusion validity. Nevertheless, we plan to increase the number of studied components in our future work.
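The power analysis can be reproduced with the pwr package used in Kabacoff (2011), assuming it is installed. The number of predictors and the sample size below are illustrative placeholders rather than the exact values of our analysis.

# Sketch of a power analysis for multiple regression (Cohen's f2 effect size).
library(pwr)
u <- 5                          # number of predictors (hypothetical)
n <- 343                        # e.g. 49 participants x 7 components (illustrative)
pwr.f2.test(u = u, v = n - u - 1, f2 = 0.15, sig.level = 0.05)$power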

Construct validity

The construct validity describes the degree to which the used variables are accurately measured by appropriate instruments.

A possible threat to the construct validity might be related to the instruments for measuring the time variable. Namely, the participants might have forgotten to fill in the time slots appropriately, i.e. right before they start analysing a given component and right after they finish it. To minimize that threat we placed a note before the text related to each component reminding the participants to record the time appropriately.

For future replication studies in a browser-based environment it might be useful to set up a script that monitors the website being viewed and automatically collects the information per student. Another option would be to use IDE tracking tools, e.g. an Eclipse plugin.

The true/false questions might not seem to be a good choice for measuring the understandability, since the participants could get the right answer 50% of the time by guessing. However, the maximal likelihood that any number of participants from the given range (1–49) answer 2 or more questions (2, 3, or 4) correctly by guessing is only around 14%. Therefore the likelihood of obtaining a substantially higher score by guessing alone is very small (Ebel and Frisbie 1991).

The component level metrics are calculated automatically with the help of the tool ObjectAid UML Explorer (Footnote 9). The dependencies between the source code classes are visualized in the tool, and based on those visualizations the corresponding graph abstractions used for the graph based metrics calculations are manually generated. The hierarchical understandability metrics are directly calculated from the provided visualizations. The accuracy of the graph based metrics calculations is additionally tested on the examples provided in the work by Allen (2002). In addition, all metrics are independently calculated by two architects who also created the architecture of the studied system. Therefore, the threat that the metrics calculations are not valid is highly reduced.

Internal validity

The internal validity relates to the degree to which conclusions can be drawn about the cause-effect relationships between the independent variables and the dependent variables. The following threats are considered:

  • Participants’ competences and experiences. The participants’ competence might influence the study results. In our case all participants have knowledge about software development and software architecture, as well as about software traceability. Most of them have at least medium experience in programming. Regarding the participants’ experiences, we considered the years of experience (see Table 5). Some other potential variables related to the participants’ demographic information may affect the obtained results to a certain extent. For example, in addition to the considered variables we examined possible differences in our results when adding the participants’ final grades in the course and whether the participants successfully passed two other courses that might be relevant for the studied problem: Software Engineering, and Information Systems and Technologies. After considering these variables, the accuracy measure for the best prediction model that considers the experience variables changed only slightly. This result does not affect any of our hypotheses and considerations, and therefore we decided not to report it in detail. In our future work we plan to include experts who have many years of professional experience and to test whether different prediction models can be obtained.

  • Fatigue effects. The total time limit for the whole study was 1.5 hours, so fatigue was not very relevant. Also, the randomization of the tasks helped to cancel out these effects.

  • Question design. The fact that we used more complex questions in the case of larger components might have caused additional difficulties in answering them, because in practice people have a limit to the number of things they can keep in mind at a time. However, please note that each smaller question within the bigger one can be studied separately.

External validity

The external validity is related to the degree to which the results of the study can be generalized to the broader population. The greater the external validity, the more the results of an empirical study can be generalized to actual software engineering practice. We dealt with the following facts:

  • Components and their metrics.

    With respect to the time limitation of our study, we intentionally selected components that vary in size and in the other studied metrics to the extent possible, in order to cover different metrics values and to make our results more generalizable. There is a very low threat to the statistical validity of our results (see Section 5). The obtained prediction models are validated to prevent over-fitting of the data (see Section 4.2), i.e. to enable a reasonably well-fitting prediction in the case of new data. However, to examine more fine-grained distributions of the components’ metrics, and especially components whose metrics’ values significantly differ from the studied ones (i.e. are significantly bigger), more components need to be examined. In the case of bigger components it would be interesting to see to what extent the obtained prediction models would be affected. In that case the participants would require much more time to analyse the components. Furthermore, an architectural representation would probably require a hierarchical organization of the components, i.e. components having sub-components at different abstraction levels (starting from a set of high-level components that model high-level functionalities and ending in a set of low-level components that combine to perform the high-level functionalities). This representation complies, for instance, with the guides for software architecture definition in the series of guides for software engineering produced by the Board for Software Standardisation and Control (BSSC) of the European Space Agency (Mazza et al. 1996). Having in mind the above discussion, we are aware that our results (the obtained prediction models) might vary to a certain extent for new data. Accordingly, our tool support (see Section 6 for more details) is designed to treat the predicted components’ understandability values as relative rather than absolute, i.e. a component’s value is interpreted in comparison to the understandability of the other components in the system and is used to identify critical components that require more effort to be understood.

  • Studied system and its representations.

    Regarding the studied system, we chose a system that is written in Java (which the participants are familiar with), that has industrial relevance, and whose application domain is relatively well known to the participants (see more details in Section 3.4.2). The architecture of the system is represented in the form of a UML component diagram that the participants are also familiar with (see Section 5). Having these facts in mind, our results might be more or less different for other systems depending on the extent to which the assumptions related to the chosen system are violated. For example, the results might differ for a system written in some other language that the participants are not familiar with, or for domain-specific systems that the participants are totally unfamiliar with, etc. Also, architectural descriptions of software systems using component models could be created in different ways, ranging from simple descriptions of the system like box-and-line diagrams (Rozanski and Woods 2005), over semi-formal models (e.g. UML models) (Björkander and Kobryn 2003; Medvidovic et al. 2002; Robbins et al. 1998), to formal models in architecture description languages (ADLs) (Medvidovic and Taylor 2000) or domain-specific languages for architecture description (Völter 2010). More studies are necessary to examine how different architectural representations affect the understandability of components with respect to their concrete implementation.

  • Varying class sizes within components.

    As already mentioned above, in order to generalize our results we plan to increase the number of studied components. Besides that, we consider one more threat in this context: the size of the classes in a component. In the general case there might be some classes that are much bigger than other classes in the system. In that case the number of classes in a component will not appropriately capture the component size (in our case, as mentioned in Section 3.4.2, no big deviations in the sizes of the classes exist). However, that case might also be considered inappropriate design, i.e. big classes can be divided into smaller classes that consist of one or a set of closely related functionalities. Nevertheless, this observation can be further examined in order to see how deviations in the size of classes affect the obtained results.

  • Subjects.

    It has been shown in previous research that software engineering students may provide an adequate model for the professional population (Weber et al. 2014). Even though our participants have substantial experience, including industrial backgrounds, certain changes in the obtained results might be expected with experts. Studies with experts would enable us to conduct a more robust analysis.

To summarize the cases in which our findings, i.e. predictions, would work appropriately, taking into account the given threats to validity, we can say the following:

  • The studied system needs to be object-oriented and its application domain relatively known to the participants.

  • The architectural components need to have up to 15-20 classes that do not show big deviations in their size (e.g. one very big class and several very small ones).

  • The participants need to have at least a couple of years of appropriate programming experience, as well as basic knowledge in the software architecture and software engineering fields, so that they can easily understand the code of the system together with its architecture.

In other cases, the obtained results can vary from ours to a lesser or greater extent.

6 Tool support

6.1 Background

In our previous work (the position paper Stevanetic et al. 2014), we presented an integration of a semi-automated DSL-based abstraction of architectural component models and understandability related software metrics. An overview of our approach is shown in Fig. 6. The black part refers to the semi-automated DSL-based architectural abstraction, while the red part marked with dashed lines refers to the understandability metrics.

Fig. 6 Integration of the understandability related metrics in the DSL-based architecture abstraction approach (reused from Stevanetic et al. 2014)

Regarding the black part, we defined a DSL that enables architectural abstractions from class models, which can be automatically extracted from the source code, into architectural component models. First, a class model is extracted from the system’s source code. Starting from the class model, a UML component model is generated using the architectural abstraction specification defined in the DSL code. In this way the traceability information that links the class models and the component models can be preserved. Furthermore, the approach supports consistency checks that are based on the automatically generated traceability information linking the DSL, the class model, and the component model of the system. For instance, the checks detect source code classes that are not covered by the architecture abstraction specification, as well as connectors that are defined in the architecture specification but have no corresponding relation between the source code classes. This enables having an “up-to-date” component model that reflects the source code (i.e. all source code classes are mapped to their respective components). The given approach also supports the software architect throughout the evolution of a software system by allowing him/her to compare different component models (see the bottom of the figure) and to maintain them in correspondence with the source code over time.

The red part marked with dashed lines describes the integration of the understandability metrics for the generated component models. For example, they provide an indicator of whether a component model is growing too large, or other similar guidelines. First, the metrics are calculated from both the class model and the component model. The obtained metrics values are then evaluated with regard to different metrics constraints. Metrics constraints represent a set of rules defined on metrics values that need to be satisfied. In our case they are defined based on our empirical evaluations and also take into account some additional considerations (see Stevanetic et al. 2014). In case some metrics values do not satisfy the corresponding constraints, the architectural abstraction DSL or the source code can be improved in order to resolve the inconsistencies that occurred.
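To make the idea of metrics constraints concrete, the following R sketch checks a small table of per-component metric values against threshold rules. The component names, metric names, and thresholds are hypothetical; the constraints actually used by the tool are the ones derived from our empirical evaluations (Stevanetic et al. 2014).

# Conceptual sketch of evaluating metrics constraints on a component model.
metrics <- data.frame(
  component = c("Core", "Interpreter", "Parser"),
  msc       = c(12, 27, 9),          # number of classes (hypothetical values)
  coupling  = c(5, 14, 3)            # hypothetical coupling metric
)
constraints <- list(
  msc      = function(v) v <= 20,    # components should not grow too large
  coupling = function(v) v <= 10
)
violations <- sapply(names(constraints),
                     function(m) !constraints[[m]](metrics[[m]]))
rownames(violations) <- metrics$component
violations            # TRUE marks a metric value violating its constraint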

In this paper, we investigate how our empirical findings can be combined with existing empirical evaluations and how we can provide corresponding tool support. In that context, we draw on the work by Bouwers et al. (2011), who studied the analyzability of component models. Taking into account the finding of Bouwers et al. that the components should be balanced in size in order to facilitate the system’s analyzability, we have argued that balanced values for the components’ understandability effort can facilitate the analyzability of the whole system. In contrast to our previous position paper (Stevanetic et al. 2014), where the idea of balanced understandability of components is only mentioned, in this article we further elaborate on concrete calculations of the components’ analyzability based on the integration of our new understandability effort prediction models (i.e. the ones that use the hierarchical understandability metrics, see Section 4.2) and the metrics provided by Bouwers et al.

6.2 Architecture analyzability metric

In this section, we briefly explain our new analyzability metric for component models that is based on the integration of our understandability related prediction models into the analyzability metric defined by Bouwers et al. (2011). Furthermore, we elaborate on calculations of how much each of the rules used to specify architectural abstractions contributes to the overall understandability of components.

Namely, Bouwers et al. (2011) defined a metric for quantifying the analyzability of software architectures. The metric is called Component Balance (CB) and is defined as the product of two metrics: System Breakdown (SB), which measures whether a system is decomposed into a reasonable number of components, and Component Size Uniformity (CSU), which measures whether the components are all reasonably sized. The SB metric is based on the number of components in the system and is driven by the logic that both a high and a low number of components hinder analyzability. For example, having only one component is bad since the structure of the code does not provide any hints as to where functionality is implemented. On the other hand, many small components do not provide a software engineer with sufficient clues as to which component should be chosen for inspection.

The CSU metric captures how uniformly the volume of the system is distributed over its components and it is based on the Gini coefficient (see Bouwers et al. 2011). To provide maximal discriminative power to a software engineer, a system should be decomposed into a limited number of components of roughly the same size.

Now, we can explain our idea of combining the metric given above with our empirical findings. Namely, one of the main drawbacks of the given metric is that it captures the system’s structural decomposition using only the size of a component. Dependencies between components are not taken into account, which matters because size is not enough to fully determine the phenomena of understanding (see Section 4.2). To improve the situation, we propose to use our “understandability metric” based on the obtained prediction models instead of a simple size metric. In that case both the internal structure of a component and its dependencies to other components are captured (see Section 3.4.3).

Therefore, instead of using the size metric for the CSU metric, we can use the metric obtained from our prediction models as follows:

$$\begin{array}{@{}rcl@{}} Understandability(c)&=&1.1517 \times MSC(c)-0.7889 \times NAC(c)+ 0.5555 \times DMC(c) \\ &&+\,2.5473 \times CRW(c)-0.1371 \times DMH(c) \\ CSU(C)&=&1-Gini(\{Understandability(c) : c \in C\}) \end{array} $$

We picked the second prediction model from Table 15, but any of the models can be used since they have almost the same explanatory power (accuracy). Using the adapted CSU metric, we can now calculate the adapted CB metric as a new analyzability metric.
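The adapted CSU can be computed directly from the per-component metric values. The sketch below uses the coefficients of the equation above together with made-up metric values for four components; it computes the Gini coefficient explicitly rather than relying on an external package.

# Sketch of the adapted CSU calculation for a set of components with the five
# hierarchical metrics per component (the metric values below are illustrative).
comps <- data.frame(
  MSC = c(12, 27, 9, 15), NAC = c(3, 6, 2, 4), DMC = c(2, 5, 1, 3),
  CRW = c(0.4, 0.9, 0.3, 0.5), DMH = c(4, 9, 3, 5)
)
understandability <- with(comps,
  1.1517 * MSC - 0.7889 * NAC + 0.5555 * DMC + 2.5473 * CRW - 0.1371 * DMH)

# Gini coefficient of a set of non-negative values
gini <- function(x) {
  x <- sort(x); n <- length(x)
  sum((2 * seq_len(n) - n - 1) * x) / (n * sum(x))
}
csu <- 1 - gini(understandability)   # adapted CSU; CB = SB * CSU as in Bouwers et al.
csu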

6.3 Integration of the metrics in the tool

In this section, we explain how our concepts are embodied in the tool using a concrete example. In particular, we demonstrate how to create component models with a reasonable analyzability level by incrementally improving an initial component model of the system. In addition, we show how the tool can be used to detect the changes between different component models that lead to their different analyzability levels. For the calculations given below we used prediction model 1 from Table 15 (any other model provided in Table 15 can be used).

The calculations of all required metrics are added in the Metrics Calculation part of our tool (see Fig. 6). The Metrics Calculation part is developed using the custom validation features for Xtext based projects (for more information please refer to http://eclipse.org/Xtext/documentation/). In particular, the DSL specification for the component model abstraction is written in a file. The file can then be explicitly validated via a menu option, in which case the validator class that performs the metrics calculations is called. The calculated metrics are written to a file which is then processed using an R script. The final output is a barplot of the CSU metric together with the hierarchical metrics used in the prediction models for all components in the system. Additionally, the CB and SB metrics are calculated for the whole system. Furthermore, our tool supports calculations of how much each of the architectural rules, used to specify a DSL-based architectural abstraction specification, contributes to the understandability of a given component.

To demonstrate our tool we use the example of the Frag system. Frag (Footnote 10) is a dynamic programming language implemented in Java, specifically designed for being a tailorable language, building Domain-Specific Languages (DSLs), supporting Model-driven Software Development, and being easily embeddable in Java. To generate the initial component model of the system, we studied the source code of the system and the corresponding class model that is automatically generated from the system’s source code. To ease this task, we imported the source code into an Eclipse IDE. The understanding of the system was facilitated by the fact that one of the authors of the paper is also the author of Frag. After the initial examination, we created the initial architecture abstraction specification of the system, consisting of 6 components (Core, Interpreter, Command Objects, Parser, Exceptions, and MDSD). The given 6 components were found to represent the major functionalities and/or concerns in the system. The DSL specification of the initial component model is shown in Fig. 7. Figure 8 shows the distribution of the calculated metrics values for the initial component model. The CB metric value for the model is 0.32.

Fig. 7 Initial component model – DSL specification

Fig. 8 Initial component model – metrics

From the DSL specification (developed using Xtext2), we can see that different architectural rules are used to write the architecture abstraction. For example, rules operating on source code artefacts relate different source code artefacts (packages, classes, and interfaces) to an architectural component (e.g. the Package rule shown in Fig. 7, which selects everything inside a specific package), while rules utilizing relationships between source code artefacts relate an architectural component to the source code artefacts that have specific relationships to a given source code artefact, like sub- and super-type relations for classes and interfaces, interface realizations, and other dependencies (e.g. the Uses rule shown in Fig. 7). Complex rule definitions are supported through the implementation of the three set operations union (or), intersect (and), and difference (and not), which are all used in Fig. 7. Detailed information about all rules and how they are derived can be found in Haitzer and Zdun (2014).

Here, we briefly explain how the metrics for each architectural rule in the DSL specification are calculated. In particular, each rule specifies a set of source code elements to be added to a given component. Hence, for each rule we can calculate the above mentioned set of metrics as we do for the system’s main components. Let us explain how exactly these calculations are done for Component Interpreter in the example from Fig. 7. The rules are evaluated in the following way: first, the highest level rule is evaluated. In our example, the highest level rule is the or composition rule, which actually specifies all classes in the component. Next, the left part of the previously analysed composition rule is evaluated, which is here the Class rule. Then the right part of the or rule is analysed. Since it consists of further composition rules, the next highest level rule is picked. In the given example it is the and not composition rule, which specifies all classes contained in both the root.frag.core.Interp and root.frag packages without the class root.frag.core.Dual. Next, the left part of the and not rule is evaluated, which corresponds to the and composition rule. Then the left and right parts of the and rule are examined, which correspond to the Uses rule and the Package rule, respectively. Finally, the right part of the and not rule is evaluated, which corresponds to the Class rule.
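Conceptually, this evaluation is a recursive walk over the rule tree in which leaf rules yield class sets and composition rules combine the sets of their children. The tool performs this inside the Xtext validator; the following is only a language-agnostic sketch of the idea in R, with hypothetical class names that do not correspond to the actual Frag classes.

# Conceptual sketch of recursive rule evaluation over sets of classes.
evaluate_rule <- function(rule) {
  switch(rule$op,
    "leaf"    = rule$classes,                                        # Class/Package/Uses rule
    "or"      = union(evaluate_rule(rule$left), evaluate_rule(rule$right)),
    "and"     = intersect(evaluate_rule(rule$left), evaluate_rule(rule$right)),
    "and not" = setdiff(evaluate_rule(rule$left), evaluate_rule(rule$right)))
}

# hypothetical rule tree: Class(...) or ((Uses and Package) and not Class(Dual))
interp <- list(op = "or",
  left  = list(op = "leaf", classes = c("frag.core.Interp")),
  right = list(op = "and not",
    left  = list(op = "and",
      left  = list(op = "leaf", classes = c("frag.core.Interp", "frag.core.Dual", "frag.A")),
      right = list(op = "leaf", classes = c("frag.core.Dual", "frag.A", "frag.B"))),
    right = list(op = "leaf", classes = c("frag.core.Dual"))))
evaluate_rule(interp)   # classes assigned to the component; metrics can be computed per rule node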

From Fig. 8 we can see that Component Interpreter has a significantly higher CSU metric than the other components, mainly because of the high number of classes that it contains (see the MSC metric in the figure). Namely, we want Component Interpreter to include all classes whose name ends with “Interp” or that are tightly coupled to the class Interp from the core package (see the DSL specification of Component Interpreter in Fig. 7). However, by examining how much each architectural rule in Component Interpreter affects the CSU metric value for the whole component, we find that the Uses and Package rules, and therefore the and rule that connects them, have high metric values because of the high number of classes that those rules produce (see Fig. 10). Consequently, a lot of classes are assigned to Component Interpreter that should not belong there. To improve the situation, we change the DSL specification for Component Interpreter so that the classes considered tightly coupled to the class Interp now include only those that both use the given class and are used by it. Furthermore, the Package rule now limits the search for tightly coupled classes to the core package. The metrics for the new component model are shown in Fig. 9. The CB metric value for the new component model is 0.37 (Fig. 10).

Fig. 9 Changed component model 1 – metrics

Fig. 10 The impact of each architectural rule on understandability – Component Interpreter from Fig. 7

By looking at Fig. 9 we can see that now Component Parser has a significantly higher CSU metric value than the other components. The high CSU metric value for Component Parser is mainly affected by the relatively high number of classes that this component contains (see the MSC metric values in Fig. 9) compared to the other components. Therefore, dividing it into several smaller components would probably improve the situation, i.e. increase the overall CB metric value for the system. From the point of view of the SB metric value, which decreases if the number of components is greater or smaller than 8 (see Section 6.2), dividing the Parser component into 2 or 3 smaller components would increase its value, since the current number of components in the system is 6.

By examining the Parser component we have found that it would make sense to divide the component into two components, ParserRules and ParsedObjects. Namely, the parser used in Frag uses a lexical parsing approach based on the composition of rule definitions that are similar to EBNF. A rule is a description of the situation in which the rule matches (a matcher) plus an action that is taken when the rule applies. The result is a tokenized list of parsed elements. Since the concept of rules is important, it makes sense to create a separate component for handling the parsing rules (Component ParserRules). The second new component, ParsedObjects, relates to the list of the parsed elements that roughly corresponds to the Abstract Syntax Tree (AST) in other parsing approaches. Having a separate component for the AST of the parsed code makes sense because the AST structure can contain additional information that needs to be managed, i.e. information related to subsequent processing, e.g. contextual analysis. The calculated metrics for the new component model are shown in Fig. 11. The CB metric for the new component model is 0.55, which is an increase of 0.18 compared to the previous component model.

The distribution of the metrics values for the new components can be further examined in order to make additional possible improvements. For example, we can examine whether it would make sense to divide Component CommandObjects, which has the highest CSU metric value in the newly generated component model. By examining the corresponding classes of the CommandObjects component we have found that it can be further divided into two components, FileCommands and NonFileCommands, which correspond to the commands for handling files and the other, non-file related commands. The calculated metrics for the generated component model that encompasses all mentioned changes are shown in Fig. 12. The CB metric for the component model that consists of all mentioned changes is 0.64, which is an increase of 0.09 compared to the component model it is adapted from.

Fig. 11 Changed component model 2 – metrics

Fig. 12 Changed component model 3 – metrics

The example given above shows how we can gradually improve the analyzability of the initial component model by making changes in the DSL and judge the analyzability of the component model created with the DSL using the given metrics calculations. It would also be possible to make source code changes and observe their effect on the analyzability of the generated component model.

Let us now explain how our tool can support finding the changes that lead to different analyzability levels of two different component models. To appropriately capture the different changes in the system that can affect the given metrics values, our tool supports finding changes at three different levels in the system: source code changes, component model changes, and DSL changes. Regarding the source code changes, by comparing different source code versions of the system an architect or developer can find which source code elements are added to the system, deleted from it, or changed. By comparing the component models, a user can find the differences in the realized classes contained in each of the components, i.e. whether some classes are added to a given component or removed from it (which cannot be seen using the source code comparisons). Furthermore, by comparing the component models it is possible to find new components in the system or those that are deleted from it. The DSL comparisons show the differences in the architectural rules used to specify which system parts are assigned to a component. By combining the given three kinds of comparisons, a user can precisely determine the differences between the compared system versions as well as in each of the compared components. To realize the mentioned kinds of comparisons we used and extended the Eclipse IDE’s features for comparing different resources. Furthermore, to enable finding the component model changes we created corresponding serialized representations of the examined component models that contain the fully qualified names of all source code artefacts related to each component.
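The component model comparison essentially reduces to set differences over the serialized component-to-class mappings. The sketch below mimics that logic in R; the mapping format and all component and class names are hypothetical, not the tool’s actual serialization.

# Sketch of a component-model comparison based on serialized class lists,
# assuming each model is a named list mapping component names to class names.
model_v07 <- list(Core = c("frag.core.Interp", "frag.core.Dual"),
                  MDSD = c("frag.mdsd.Gen"))
model_v08 <- list(Core = c("frag.fmf.Interp", "frag.fmf.Dual"),
                  DSL  = c("frag.dsl.Def"))

added_components   <- setdiff(names(model_v08), names(model_v07))
deleted_components <- setdiff(names(model_v07), names(model_v08))
shared <- intersect(names(model_v07), names(model_v08))
changes <- lapply(setNames(shared, shared), function(c) list(
  added   = setdiff(model_v08[[c]], model_v07[[c]]),   # classes added to the component
  deleted = setdiff(model_v07[[c]], model_v08[[c]])))  # classes removed from it
changes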

We compare two component models of the Frag system, one that corresponds to version 0.7 of the system and another that corresponds to version 0.8. The metrics values for the first component model are shown in Fig. 12, while those for the second component model are shown in Fig. 13.

Fig. 13 Component model of the Frag version 0.8 – metrics

To find which changes in the system caused the different metrics levels of the given component models we compared their DSLs, their source code, and the classes they contain. From the DSL comparison we can see whether there is a difference in the architectural rules used to specify the given views, whether new components are added, or whether some of them are deleted. By comparing the two DSLs we have found that Component MDSD from the first view (version 0.7) is replaced with 3 new components, DSL, FCL, and FMF, in the second view (version 0.8). Otherwise, no changes in the DSL of the other components have been found. By comparing the components’ contained classes we have found that the package core in the first view is renamed to the package fmf in the second view (Footnote 11). Using the same comparison we have found which classes are newly added to the components and which classes are deleted from them. Finally, by comparing the source code folders we can find which classes have changed source code. In our example we have found that all classes have been changed to a certain extent. Figure 14 shows the visualization of the given comparison views (DSL, classes, and source code comparisons) based on the Eclipse features for resource comparisons. Table 19 provides an overview of all found changes.

Fig. 14 Change impact analysis comparisons

Table 19 Overview of all found changes

By comparing the corresponding metrics values for the given two component models, we can say that, except for the newly introduced components, they show very small differences. This is not in accordance with the number of changes that we found (see Table 19). However, after examining the given changes we have found that the only real changes in terms of adding, deleting or changing some functionality in the system or its parts are related to the added or deleted classes in the components (columns “Added Classes” and “Deleted Classes” in Table 19). The changes in the other classes are mostly syntactic changes or code refactoring related changes that do not affect the classes’ external behaviour.

The integrated metrics benefit from the architecture abstraction tool in the sense that the latter provides an “up-to-date” architectural component model that reflects the source code (i.e. all source code classes are mapped to their respective components), which is necessary for the metrics calculations. This way, as we demonstrated, architects or developers can gradually improve the architecture by making changes in the source code or in the architecture abstraction DSL and judge the analyzability of the architecture created with the DSL. To perform such improvements the architect or developer can partially benefit from the given metrics calculations provided for each component. For example, as we demonstrated, components that have a large number of classes (i.e. a high MSC metric value) can be broken into several new components. Similarly, components with high coupling can be modified by rearranging their classes with other classes to which they have a strong coupling, or by refactoring the classes’ source code to reduce their coupling to the classes in other components. Such modification steps, of course, require manual effort and human expertise.

The metrics calculations for each component as a whole provide useful information on what its understandability level is and what it is affected by. The metrics calculations for each architectural rule related to a given component help an architect or developer to grasp how much the different source code artefacts that constitute a given component contribute to its understandability, which rules contribute the most to limited understandability, etc. (as demonstrated for Component Interpreter above). This can help when performing changes in the DSL of a component, since an architect can assess in which direction and approximately by how much the understandability of a component will change if some rules are changed.

7 Conclusions and future work

In this article, we provide an extended description of the analysis and results obtained in our previous work (Stevanetic and Zdun 2016) consisting of a more detailed description of the studied metrics, applied statistical techniques, and obtained findings. In addition, we present a new metric for measuring the analyzability of component models based on the integration of our empirical findings and the existing observations related to them, i.e. concretely the existing work on the analyzability related software metrics proposed by Bouwers et al. (2011). Furthermore, we present significant tool extensions compared to our previous work (Stevanetic et al. 2014) including the realization of the new analyzability metric by integrating our previous tools for supporting software evolution using a DSL-based architecture abstraction with the obtained empirical findings. Our tool extensions enable the calculations of how much each of the architectural rules used to specify a DSL-based architectural abstraction specification contributes to the understandability of components and also enable change impact analysis, i.e. the identification of changes in the system that affect different analyzability levels of the component models.

Regarding our empirical findings, we studied the understandability of architectural components using a number of component level metrics including the package based metrics defined by Martin (2003), information theory graph based metrics, and the corresponding counting-based graph based metrics defined by Allen (2002) and Allen et al. (2007), and hierarchical understandability metrics introduced in the work by Hwa et al. (2009), as well as the personal factors of participants like experience and expertise, and the combinations of both personal factors and component level metrics. The understandability of component models is measured through the time that the participants spent on understanding the components, and then predicted using the above given component level metrics and participants’ experiences. On the one hand, the prediction models that consider the hierarchical understandability metrics are significantly better in predicting the understandability effort than the models obtained using other component level metrics or the models that include the participants’ experiences. On the other hand, those models are not significantly different in the prediction from the models that combine both the component level metrics and the participants’ experiences. This means that from all studied models it is enough to consider the hierarchical understandability metrics for the prediction. This result is from our point of view intuitive, as those metrics are originally designed to assess the understandability of the modular design of a system. The participants’ experience is also important and can predict a significant amount of variance in the data but the obtained models are not as accurate as the models that use the component level metrics, i.e., the metrics related to the system itself.

The obtained empirical findings are integrated into the tool that supports the synchronized evolution of the architecture and the source code of the system. While the DSL-based architecture abstraction approach enables users to keep source code and architecture consistent, the given metrics extensions enable them, while working with the DSL or source code, to continuously judge and improve the analyzability of the architectural component model they create with the DSL. To further support users in performing adequate changes in the DSL or source code and understanding their impacts on the understandability of a given component, we calculate the given metrics for each architectural rule used to define the DSL-based specification of that component. In that way users can grasp how much the different source code artefacts that constitute a given component affect its understandability. Besides improving the analyzability of component models, our approach also supports change impact analysis, i.e. finding which changes in the system’s source code or the DSL-based architectural specification correspond to the changes in the observed metrics values. The applicability of our approach is shown using a case study of an existing open source system.

From the academic point of view, we believe that our study can serve as a good starting point for future studies on the understandability of architectural components and component models, but also of other kinds of software models. The used instruments and applied statistical techniques provide insight into how understandability can be appropriately measured and predicted, which can help in devising new empirical studies and experiments. From the practitioner’s point of view, the results of our study show which factors and metrics are important for assessing the understandability of architectural components in relation to the system implementation and to what extent those metrics can predict the understandability. The understandability effort (time) for new architectural components can be assessed based on the complexity of their implementation. Absolute values for the measured understandability effort of new components are considered accurate only for systems similar to the studied one. In other cases the assessment can vary to a lesser or greater extent. Our tool support facilitates the application of the obtained empirical findings in practice. It is designed to treat the predicted components’ understandability values as relative rather than absolute, i.e. a component’s value is interpreted in comparison to the understandability of the other components in the system and is used to identify critical components that require more effort to be understood. In that way, it can be more or less successfully applied to systems and components whose size and complexity differ from the studied ones.

In our future work, we plan to include experts with many years of experience and compare the results with the ones obtained here. We also plan to examine more components, including bigger ones, that would enable us to construct more robust prediction models. However, tackling these challenges is not an easy task since it requires a lot of resources in terms of time and money.