Keywords

1 Introduction

Maintainability is one of the most important features of software systems [10]. Though, defining maintainability has been a topic of interest for many researchers over decades. A software application might fulfils all the traditional requirements and yet be of little use if the cost of maintaining it is too high. For such reason, maintainability has been considered one of the requirements that must be imposed to software products [3]. Several authors have investigated the models and metrics used for the best estimation of maintainability also known as prediction or software maintainability prediction models. Traditional models based on their accuracy of prediction have been represented by Shafiabady et al. [22]. Moreover, an analysis of the evolution of software maintenance models over the four past decades has been introduced by Lenarduzzi et al. [14]. Defining the right tools on code analysis to perform research studies on software maintenance prediction is also a crucial aspect and has received much attention in the last few years [18]. Recent papers [2, 5, 11, 15, 19] have identified tools that are available and can be used to apply maintenance models published in the studies in this area.

The paper is organized as follows: Sect. 2 briefly analyzes the state of the art; Sect. 3 introduces our investigation; Sects. 4 and 5 present and discuss the results; finally, Sect. 6 draws the conclusions and introduces future work.

2 Related Research

During the years, several models have been proposed for measuring the maintainability of software. A considerable amount of studies have employed different models and techniques for predicting software maintenance using basic metrics and models: McCall’s model in 1976 [20], Barry Boehm’s quality model presented in 1978 [1], Sneed-Mercy Model in 1985 [23], Li-Henry Model in 1993 [16], Marcela Genero Model in 2004 [8]. Later, slightly different techniques were used from simple statistical models such as regression [25] to machine learning [7, 24], and deep learning [9, 13]. To predict maintainability of software, different metrics have been proposed in literature. Among the large variety of metrics, the most used ones are the OO metrics depth of the inheritance tree (DIT), response for a class (RFC), number of children (NOC), coupling between objects (CBO), lack of cohesion of methods (LCOM), and weighted method per class (WMC) [4]. Other metrics are popular, such as: number of methods (NOM), lines of code (LOC), number of semicolons in a class (SIZE1), number of properties (SIZE2), and CHANGE metrics as dependent variable to predict software maintainability by calculating the changed number of lines in the class during maintenance process [16]. Investigating different primary studies and secondary studies from the literature, we have found out that even if there are a number of different methods to predict maintainability, almost all the studies built their models from scratch and do not extend existing ones.

3 Our Investigation

We decided to analyse software maintainability change following the approach proposed in [17] for the jEdit open source project using the JHawk tool to get useful insights. jEdit is an open source project developed in Java available in SourceForgeFootnote 1 with all existing versions with the related source code. We analyzed the application starting from version 3.2 up to version 5.5, a total of 12 versions. There are in total 41 versions but they have almost no differences compared to the selected ones. A significant difference compared to [17] is that we rely on general information outside the source code to examine and find potential relation between version numbers and changes to maintainability during the evolution of the project.

JHawk is Java-based open source framework which can analyse source code while performing static analysis generating graphical form results. It takes Java code as input and generates code metrics. We are interested in gathering features such as average cyclomatic complexity (CC), number of lines of code (LOC), and maintainability index (MI).

3.1 Collected Metrics

In this study we have focused on the following metrics:

  • Cyclomatic Complexity (CC): it is used to measure the complexity of a piece of code from the point of view of the possible execution paths. In particular, it measures the independent path through a source code as a proxy of its complexity. The higher the value, the more complex is the system and this fact results in difficulty to maintain the code. We have measured the average of CC which is stated by McCabe to be cyclomatic complexity per function in a file and is calculated as the sum of the CC of all function definitions, divided by the number of function definitions in the file.

  • Lines of code (LOC): it is used to measure the size of a software by counting the number of lines included in a program. There are multiple ways to count the lines of code but in this work we focus on the simplest definition that consider the physical lines of code of the source files.

  • Maintainability index (MI): it is based on Halstead Volume (HV) metric, the cyclomatic complexity (CC) metric, and the average number of lines per code per module (LOC) metric. The higher this value, the more maintainable the software results [21]. If MI of software increases it means that the software can be maintained easily, otherwise if it decreases the degree of difficulty to maintain the software is high [12]. Values of MI index as follows:

    • MI < 65: difficult to maintain

    • MI 65–85: maintainable

    • MI > 85: good maintainability

We have to highlight the fact that in JHawk MI is reported in two forms, one that takes into account comments of the program and ones which does not. In our database we have reported both of them, but the final result of the graphs below do contain only the ones with comments. It is stated that the reason that MI formula with comments does exist is because the formula to find comments is not standardized and comments are considered to be subjective from developer to developer and do vary from project to project. In such cases it is recommended to use the MI without comments as reported from official source development of the JHawk tool. However, we think that comments are an important part of the source code, therefore we considered them in our study.

4 Results

In Fig. 1 we present the results achieved from data collection of MI for each version of jEdit, as we have mentioned in the above sections.

Fig. 1.
figure 1

MI evolution.

We can notice that from version 3.2 to 4.2 there is a constant value of MI of about 130. Upon release of version 4.0, there is a tremendous decrease of MI to about 58. Then, with next releases up to version 5.3.0, there seem to be a slight increase about 68. The last versions have a value of MI up to 144.

We also have to consider the other two features collected (Fig. 2, 3 and 4). It is of high relevance to mention that the value of the average CC has been higher for versions of low MI index in other words for early releases. A slightly different trend follows the LOC metric. For higher values of LOC there is also a higher value for MI and vice versa. We can notice the trend of size metric for each version. For an increase of size from versions 4.3 to 4.4.1, there is a decrease in MI in these versions. For a decrease of size in versions 4.5 to 5.3, there is a slight increase of MI. Surprisingly, for an increase of size in 5.4.0 and 5.5.0, there is also an increase in MI.

Fig. 2.
figure 2

LOC and MI evolutions.

Fig. 3.
figure 3

CC evolution.

Fig. 4.
figure 4

CC and MI evolutions.

From Fig. 3 and 4, we can state that for an increase in CC from version 4.2 to 4.3, there is a decrease in MI. Moreover, for a decrease in CC from versions 4.3.3 to 4.4.1, there is an increase in MI. The same result follows 5.3.0 to 5.4.0 with a decrease in CC and an increase in MI. These outputs clearly highlight the fact that a high value of CC means more complexity added to the system which increases the degree of difficulty for the system to be maintained.

5 Discussion

In this section we are going to investigate further the results obtained from previous section. We are going to test hypothesis of possible correlation of LOC, CC with MI and LOC with CC. Since the MI is defined as a function that include LOC and CC as independent variables, we expect that MI is highly correlated to them but we want to investigate how strong is the effect of such variables on the overall values of the MI.

We use the same data provided in the initial steps of our work collected with the JHawk tool. To test correlation, we are postulating hypothesis in Table 1. In overall, correlation claims the strength of a relationship. We use the Cohen definition of correlation [6] where he defined a value less than 0.3 as a weak correlation, 0.3 to 0.5 as medium correlation, and greater than 0.5 as strong correlation. We are interested to understand the strength of such correlation. To measure the strength of relationship in our data variables we use Spearman rank correlation for the reason that it does not assume any linear relation between variables and it does not assume any specific distribution of data. We analyze the data in Python, using a Jupyter Notebook with the stats.py statistical package.

Table 1. Hypotheses.

For this project, Spearman correlation coefficients are:

  1. 1.

    0.129 for LOC and MI with p-value 0.522 which aims result is not significant, so we can not reject hypothesis null.

  2. 2.

    −0.098 with a p-value 0.628, again not significant so we can not reject hypothesis null.

  3. 3.

    0.513 with p-value 0.006 is significant therefore we can reject hypothesis null and conclude that a relationship exists.

We have performed the same kind of analysis over other 4 open source projects:

  • SoundHelix (0.6-0.9 versions): a Java framework for algorithmic random music composition. It can generate random songs, play them, and it is highly customizable using XML configuration as described in SourceForge.net.

  • jWebUnit (1.0-3.2): a Java framework for testing web applications.

  • jXLS (0.7-2.7.1): a Java library for writing Excel files using XLS templates and reading data from Excel into Java objects using XML configuration.

  • jTDS (0.1-1.3.1): an open source driver for Microsoft SQL Server

We went again through same process described for the analysis of jEdit to collect LOC, MI, and CC metrics, and perform the correlation analysis as reported in Table 2.

Table 2. Correlations in all the considered projects.

As we can notice in Table 2, the values in red identify significant values. All projects include some significant values except SoundHelix. There is a strong correlation of the LOC metric and MI in JXLS and JTDS and a strong correlation of CC and LOC in jEdit project. Also to be highlighted is the fact that exist strong negative correlations. The not significant results obtained from the SoundHelix project might derive from the fact that it is the project with the least data collected, this could have affected the overall analysis since only 4 versions were analyzed.

6 Conclusions and Future Work

The paper has introduced a preliminary analysis of the maintainability of 5 popular open source projects and how strongly it is correlated with some basic code metrics (lines of code and cyclomatic complexity) that are used in the definition of the Maintainability Index.

A systematic analysis of the maintainability of popular open source projects to understand how the code is managed and how the development team address the evolution of the code is important to increase the level of awareness in the usage of such products and to understand the possible effects in the long term.