1 Introduction

To keep pace with ever-changing user, business, and technological requirements, the source code of a software system often needs to be changed. It has been observed that short project delivery deadlines, budget constraints, and unfamiliarity with the existing source code generally force developers to focus on functionality rather than design structure (Fowler et al. 1999; Mancoridis et al. 1998). Such maintenance practices increase the complexity of the design structure and degrade software quality. A software system with poor design quality is difficult to understand and evolve. The problem becomes harder for highly convoluted software designs when up-to-date documentation and the original developers are unavailable (Mkaouer et al. 2015).

There are many source code anomalies that contribute to the degradation of the design quality of a software system. In an object-oriented software system, the suboptimal placement of source code classes into packages is one of the crucial anomalies that cause this degradation (Bavota et al. 2014). To improve the design quality of software systems, various software remodularization approaches based on deterministic and search-based techniques have been proposed (e.g., Praditwong et al. 2011; Barros 2012; Prajapati and Chhabra 2017, 2017a; Parashar et al. 2016; Corazza et al. 2016; Bavota et al. 2010, 2013; Prajapati and Chhabra 2017b; Mkaouer et al. 2015a; Kumari et al. 2013; Doval et al. 1999; Prajapati and Chhabra 2014). Deterministic remodularization approaches perform well for small software systems; however, for large and complex software they become impractical and sometimes infeasible (Mancoridis et al. 1998; Harman et al. 2012).

For large and complex software, the search-based software engineering (SBSE) approach (Harman et al. 2012) has been found to be a good alternative for solving the software remodularization problem (Prajapati and Chhabra 2017; Bavota et al. 2013; Prajapati and Chhabra 2017b; Mkaouer et al. 2015a). The major advantage of the SBSE approach is that it can generate a near-optimal solution within a reasonable amount of time. The effectiveness of SBSE-based remodularization approaches depends on many factors, such as the search algorithm and the formulation of the fitness and objective functions. Of these, the fitness and objective function formulation is the most important factor in driving the remodularization process towards good solutions (Bavota et al. 2014; Anquetil and Lethbridge 1999). Hence, to achieve a good quality remodularization solution, appropriate formulations of fitness and objective functions need to be incorporated into search algorithms.

In the last two decades, unprecedented effort has been put into designing fitness and objective functions, along with search algorithms, to solve different aspects of single- and multi-objective software remodularization problems (e.g., Kumari et al. 2013; Prajapati and Chhabra 2014, 2018, 2018a). The majority of fitness and objective functions for software remodularization are designed in terms of direct-link coupling of software artefacts, such as method calls or inheritance (Mancoridis et al. 1998; Praditwong et al. 2011; Barros 2012; Prajapati and Chhabra 2017, 2017a). However, some researchers (Parashar and Chhabra 2016; Corazza et al. 2016) have attempted to design objective functions by analyzing sibling-link similarity based on lexical and change information. Others have tried to combine structural direct-link similarity with lexical sibling-link similarity in their remodularization techniques (Bavota et al. 2010, 2013, 2014; Prajapati and Chhabra 2017a). Still others have designed objective functions by combining change-history information with structural and lexical information (Mkaouer et al. 2015a).

Although the existing fitness and objective functions designed for search-based software remodularization have been reported to be quite effective, their major limitation is that they are not able to generate remodularization solutions that are meaningful from the developers' perspective (Mkaouer et al. 2015a). The main reason is that such fitness and objective functions do not comply with the developers' perspective. Therefore, to generate a remodularization solution that is meaningful to developers, the fitness and objective functions must consider factors that convey the developers' perception along with the optimization of design principles.

Further, existing software remodularization approaches commonly treat all sources of information about the entities to be modularized equally (i.e., as the presence or absence of a feature) when determining the fitness and objective functions. However, software developers usually give different importance to different types of features when designing fitness and objective functions (Bavota et al. 2013a). Therefore, the fitness and objective functions used for remodularization evaluation should consider different dimensions of information with their relative importance. Moreover, most search-based software remodularization approaches, except (Mkaouer et al. 2015a), ignore change-history dependency information. The change history may reveal many dependencies among software components that cannot be observed from structural or lexical information. The study (Bavota et al. 2013a) showed that change-history information is one of the factors that, to some extent, reflects the developers' perception of coupling. Therefore, remodularization fitness and objective functions should also consider change-history information along with structural and lexical information.

To address the above issues, this paper introduces a multi-objective formulation of the object-oriented remodularization problem in which an entropy-based similarity measure, together with inter-module class change coupling, intra-module class change coupling, module size index (MSI), and module count index (MCI), is used as the set of objective functions. In this contribution, the remodularization objective functions use different types of structural and lexical information that capture the developers' view of coupling in the computation of the similarity measure. Moreover, the approach weights the different types of structural and lexical information by their relative importance. Since the relative weights of different dimensions of information are subjective and depend on many factors (e.g., the quality measurement goal), this paper uses term frequency-inverse document frequency (TF-IDF) (Yates and Neto 1999; Corazza et al. 2016) to compute the weights. Using the different dimensions of structural and lexical information, an information-theoretic similarity measure (i.e., an entropy-based similarity measure) is constructed. Information-theoretic concepts have been successfully applied to other unsupervised machine learning tasks (e.g., data clustering) (Gokcay and Principe 2002; Sugiyama et al. 2014; Andritsos and Tzerpos 2005). Entropy measures the uncertainty about a random event, which can be used to design a remodularization criterion for restructuring the packages of software systems. In fact, when we assign a class to one of several modules we incur an entropy cost; minimizing this incremental entropy cost can be an effective evaluation criterion for software remodularization. The model also exploits change-history information stored in the version repository. In particular, the approach extracts the change dependencies between classes and uses them to make the remodularization solution consistent with the change history. The main idea of using such dependencies is to force the remodularization process towards a solution in which classes that change together are grouped together.

The rest of this paper is organized as follows: Section 2 presents related work. Section 3 provides background on information theory and structural/lexical coupling computation. Section 4 discusses the proposed approach. Section 5 presents the experimental setup. Section 6 presents results and discussion. Section 7 concludes with future directions.

2 Related Works

Automatic remodularization of software systems has become an interesting application for SBSE, in which different aspects of the software remodularization problem are formulated as search-based optimization problems (e.g., single-, multi-, or many-objective optimization) and solved using search-based meta-heuristics (Harman et al. 2012). The main attraction of SBSE for software remodularization is that the combinatorial and NP-hard nature of the problem makes SBSE approaches the best alternative (Praditwong et al. 2011). Recently, many researchers have applied various SBSE approaches, adopting different metaheuristics and single/multi-objective formulations, to address different aspects of software remodularization problems (Praditwong et al. 2011; Ouni et al. 2013, 2014, 2015; Prajapati and Chhabra 2017a; Kumari et al. 2013; Ouni et al. 2016, 2016a, 2016b, 2017; Mancoridis et al. 1998).

In the literature, the software remodularization problem has been defined in different ways according to different aspects of software restructuring, for example: 1) number of objectives: single-objective remodularization (Mancoridis et al. 1998, 1999; Doval et al. 1999), multi-objective remodularization (Praditwong et al. 2011; Barros 2012; Kumari et al. 2013; Prajapati and Chhabra 2014), and many-objective remodularization (Mkaouer et al. 2015, 2015a; Prajapati and Chhabra 2018, 2018a); 2) type of information: structural-based remodularization (Praditwong et al. 2011; Mancoridis et al. 1998, 1999; Mahdavi et al. 2003), lexical-based remodularization (Corazza et al. 2016), and combined structural + lexical remodularization (Mancoridis et al. 1998; Prajapati and Chhabra 2017a; Bavota et al. 2010); and 3) level of modification: moderate remodularization (Bavota et al. 2010; Prajapati and Chhabra 2017; Abdeen et al. 2009) and software clustering (Praditwong et al. 2011; Kumari et al. 2013; Ouni et al. 2016).

The application of SBSE techniques to software remodularization is approximately two decades old. Credit for the first search-based remodularization formulation goes to Mancoridis et al. (1998), who first applied SBSE concepts to cluster software entities into a more cohesive form. In that contribution, they also introduced the modularization quality (MQ) measure to evaluate clustering quality, defined in terms of two software quality attributes (i.e., inter-connectivity and intra-connectivity). MQ, a structural-information-based modularity criterion, has been widely used as the fitness function guiding the remodularization process (Doval et al. 1999; Harman et al. 2002; Mitchell and Mancoridis 2002; Mamaghani and Meybodi 2009). Mancoridis et al. (1999) customized different meta-heuristic search techniques, such as the Genetic Algorithm (GA), Simulated Annealing (SA), and Hill-Climbing (HC), to address the software module clustering problem.

Praditwong et al. (2011) used the MQ measure along with other software clustering criteria to formulate software clustering as a multi-objective optimization problem. They introduced two new multi-objective clustering formulations, namely the maximizing cluster approach (MCA) and the equal-size cluster approach (ECA). Each of the MCA and ECA formulations contains five partially conflicting objectives and is based on structural information. Other researchers (Barros 2012; Kumari et al. 2013; Prajapati and Chhabra 2014) have also used the MCA and ECA formulations to evaluate different meta-heuristic algorithms.

The MQ measure discussed above is based on direct-link coupling. Jinhuang and Jing (2016) defined a new MQ measure based on similarity coupling; their experimental results demonstrated that the similarity-based MQ outperformed the direct-link-based MQ. Prajapati and Chhabra (2017a) also used a similarity-based coupling measure to remodularize software systems, and their results demonstrated that it is able to generate good quality software modularization.

Even though structural-based modularity measures are able to drive search algorithms towards remodularization solutions that are good from the structural point of view, such solutions are not necessarily good from a semantic perspective or the developers' view. To remodularize software systems well from the semantic point of view, Corazza et al. (2016) used lexical information to compute the similarity between software entities. Their results showed that lexical-based remodularization is able to generate solutions that are better from the semantic perspective. Some researchers (e.g., Prajapati and Chhabra 2017a; Bavota et al. 2010, 2013, 2014) have used combined structural and lexical information to remodularize software systems.

3 Basic Concepts

This section presents a brief description of the information theory concepts used in our proposed many-objective remodularization approach. Information theory is a broad field and cannot be covered here in detail; the interested reader may find details in any information theory textbook (e.g., Cover and Thomas 1991). In addition to the basic concepts of information theory, this section briefly describes the various types of structural (e.g., calls, inheritance, contains) and lexical (e.g., method name, class name, parameter name) information used as features.

3.1 Minimum Entropy Concept

In this section, we explain the concept of software entropy for object-oriented software remodularization. Here the term feature refers to the different types of coupling information (e.g., structural and lexical) of a class, and the different values that each feature takes are referred to as feature values. We assume that the object-oriented software to be remodularized contains a set of N source code classes, C = {c1, c2, …, cN}, and that each class has a set of M features, F = {f1, f2, …, fM}, with feature value wi for the i-th feature. Our approach starts by representing the software system as a matrix M, as given in Table 1.

Table 1 Matrix M representing software system

The rows of the matrix represent the source code classes to be remodularized, while the columns show the values of the features that describe these classes. Each entry Mij of the matrix holds the coupling value of the jth feature in the ith class. Let X represent a discrete random variable taking its values from the set of classes C.

If p(xi) is the probability distribution function (pdf) of the values xi that X takes (xi ∈ C), the entropy H(X) of the variable X is defined as follows:

$$ H(X)=-\sum \limits_{x_i\in C}p\left({x}_i\right)\log \left(p\left({x}_i\right)\right) $$
(1)

Intuitively, entropy quantifies the disorder of a system: the higher the uncertainty, the higher the entropy. Since entropy measures the amount of “disorder” of a system, many approaches utilize some form of such a measure as a quality criterion (i.e., fitness function) for clustering different types of data (Cover and Thomas 1991; Gokcay and Principe 2002; Hino and Murata 2014). In clustering, a cluster containing elements with high similarity has low entropy.
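For illustration, Eq. 1 can be sketched over an empirical distribution. The sketch below assumes base-2 logarithms (the base only scales the value) and uses hypothetical feature values:

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy of the empirical distribution of `values`,
    H(X) = -sum p(x) log p(x) as in Eq. 1 (base-2 log assumed here)."""
    counts = Counter(values)
    total = sum(counts.values())
    # log2(total/c) == -log2(c/total), so every term is non-negative
    return sum((c / total) * math.log2(total / c) for c in counts.values())

# A homogeneous group has zero disorder; a uniformly mixed one is maximal:
print(entropy(["a", "a", "a", "a"]))  # 0.0
print(entropy(["a", "b", "c", "d"]))  # 2.0
```

As the comments note, a module whose classes share the same feature values contributes no entropy, which is exactly the property the remodularization criterion exploits.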

In the context of the remodularization of object-oriented package structure, we assume that a module/package is a group of classes with minimum entropy. That is, a module is a partition of the set of classes with a minimum degree of “disorder”. The entropy of a module is directly related to its classes; in probabilistic terms, it depends on the probability distribution function of its classes. For an object-oriented software system, let N source code classes be distributed among M modules and let ci be the ith class. The pdf of ci is defined as follows:

$$ p\left({c}_i\right)=\sum \limits_{t=1}^Mp\left({c}_i|t\right)p(t) $$
(2)

where p(t) is the prior probability of the tth module and p(ci|t) is the probability of ci given the tth module. However, we would like to know the dependence of the pdf of the tth module with respect to ci. This dependency can be computed using Bayes' theorem (Joyce 2008):

$$ p\left(t|{c}_i\right)=\frac{p\left({c}_i|t\right)p(t)}{p\left({c}_i\right)} $$
(3)

When p(t|ci) is uniformly distributed over all t, the class ci could belong to any module and the uncertainty is maximum. On the other hand, if all p(t|ci) but one are zero (the remaining one having the value unity), then we are certain about the module to which the class ci belongs. Now, let C be a random variable whose possible values 1, 2, …, K represent the modules, and let X be a random variable whose possible values are the classes ci. Then the entropy of C given X is:

$$ H\left(C|X\right)=-\sum \limits_{t=1}^Kp\left(t|{c}_i\right)\log \left(p\left(t|{c}_i\right)\right) $$
(4)

where p(t|ci) is the a posteriori probability mass function. Our goal is to find this function such that H(C|X) is minimum. The entropy given by Eq. 4 is called the conditional entropy.

The information-theoretic similarity measure (i.e., H(C|X)) discussed above is a collective similarity measure rather than a traditional direct-link similarity measure. Direct-link similarity measures (e.g., inter-module class coupling and intra-module class coupling) may lead the optimization process towards remodularization solutions that are better from the coupling and cohesion perspective but not necessarily meaningful from the developers' perspective. The information-theoretic similarity measure, by contrast, encompasses many dimensions of similarity; hence, incorporating it into the remodularization process is expected to lead to more meaningful solutions. To compute the conditional entropy accurately, we need to compute the feature values of classes accurately. In this paper, we consider both structural and lexical features of the source code classes.
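The minimization target of Eq. 4 can be sketched as follows. The text does not prescribe how the posterior p(t|ci) is obtained, so this illustrative sketch assumes it comes from normalizing cosine similarities between a class's feature vector and hypothetical module centroids; the base-2 logarithm is also an assumption:

```python
import math

def module_membership(class_vec, centroids):
    """Hypothetical posterior p(t|c): cosine similarity of the class's
    feature vector to each module centroid, normalised to sum to 1."""
    sims = []
    for cen in centroids:
        dot = sum(a * b for a, b in zip(class_vec, cen))
        na = math.sqrt(sum(a * a for a in class_vec)) or 1.0
        nb = math.sqrt(sum(b * b for b in cen)) or 1.0
        sims.append(dot / (na * nb))
    total = sum(sims) or 1.0
    return [s / total for s in sims]

def conditional_entropy(class_vectors, centroids):
    """H(C|X) in the spirit of Eq. 4: sum over classes of the entropy
    of the posterior module-membership distribution p(t|c)."""
    h = 0.0
    for vec in class_vectors:
        for p in module_membership(vec, centroids):
            if p > 0:
                h -= p * math.log2(p)
    return h
```

A class whose features match one centroid exactly contributes zero entropy, while a class equally similar to two modules contributes one bit; the optimizer would favour assignments that shrink this total.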

3.2 Structural and Lexical Features

In this paper, we utilize various types of class information to determine the class features. In particular, the proposed approach considers 8 different types of structural information (e.g., calls, inheritance, contains) and 6 different types of lexical information (e.g., method name, class name, parameter name) as class features.

3.2.1 Structural-Based Features

The classes in an object-oriented software system may be connected with other classes by zero or more structural coupling relationships (e.g., method calls, inheritance, references). Hence, the structural features of a source code class can be defined in terms of the individual classes connected with that class. In this paper, the structural features of a class are the different structural relationships by which it is connected with another class. The combination of all structural relations between a pair of classes determines the connection strength between these two classes and is considered the feature value for this pair. To determine the coupling strength between classes, we use eight different types of structural relationships (Prajapati and Chhabra 2017a). Brief descriptions of these relationships are given in Table 2.

Table 2 Structural relationships existing between two classes

The connection strength between class ci and class cj is computed by aggregating the number of instances of each relationship with their relative weights. The connection strength (CS) from class ci to class cj is defined as follows:

$$ CS\left({c}_i,{c}_j\right)=\sum \limits_{r\in R}{w}_r\left({c}_i,{c}_j\right)\times {n}_r\left({c}_i,{c}_j\right) $$
(5)

where nr and wr represent the number of instances and the weight of an r-type relationship, respectively. To compute the relative weights wr, this paper uses the term frequency-inverse document frequency (TF-IDF) weighting scheme, the most widely used technique in data mining for assigning weights to document terms (Yates and Neto 1999). In this study, the documents are the source code classes and the terms are their relationships. The weight wr of an r-type relationship from class ci to class cj is given as follows:

$$ {w}_r\left({c}_i,{c}_j\right)= tf(r)\cdot \log \left( idf(r)\right) $$
(6)

where tf(r) (term frequency) is the frequency of the r-type relationship from class ci to class cj and idf(r) (inverse document frequency) is the fraction n/nr(cj), where n is the number of classes and nr(cj) is the number of classes connected with class cj by r-type relationships.
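A minimal sketch of Eqs. 5-6, using hypothetical relationship counts. Following the text, tf(r) is taken as the raw instance count nr(ci, cj), so the count enters both the weight and the aggregation:

```python
import math

def connection_strength(rel_counts, n_classes, n_connected):
    """CS(ci, cj) per Eqs. 5-6: sum over relationship types r of
    wr(ci, cj) * nr(ci, cj), with wr = tf(r) * log(idf(r)).
    rel_counts:  {r: nr(ci, cj)}  -- instances of r between ci and cj
    n_connected: {r: nr(cj)}      -- classes linked to cj via r."""
    cs = 0.0
    for r, nr in rel_counts.items():
        tf = nr                           # frequency of r from ci to cj
        idf = n_classes / n_connected[r]  # n / nr(cj)
        cs += (tf * math.log(idf)) * nr   # wr * nr
    return cs

# Hypothetical pair: 3 method calls and 1 inheritance link in a
# 100-class system, where cj is called by 10 classes and extended by 2.
cs = connection_strength({"calls": 3, "inherits": 1},
                         n_classes=100,
                         n_connected={"calls": 10, "inherits": 2})
```

The rarer inheritance relationship receives a higher idf weight than the common call relationship, which is the intended TF-IDF effect.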

3.2.2 Lexical-Based Features

In addition to the structural-based features of source code classes, the proposed remodularization approach also uses lexical-based features. The different types of unique terms present in a class are considered features of that class. Our approach uses six major categories of lexical features, similar to the approach reported in (Corazza et al. 2016). A brief description of the different categories is given in Table 3.

Table 3 Types of lexical class relationships

Similar to the structural-based features, each lexical-based feature (i.e., each term with its occurrences) is first computed, then the relative importance (weight) of each term is determined, and finally the corresponding weight is multiplied by the actual term occurrences. Given a class c, the weight of a term t in a particular zone z is computed as follows:

$$ w\left(t,c,z\right)= tf(t)\cdot \log \left( idf(t)\right) $$
(7)

where tf(t) (term frequency) is the frequency of term t in zone z of class c and idf(t) (inverse document frequency) is the fraction n/df(t, z), where n is the number of classes and df(t, z) is the number of classes in which the term t occurs within the zone z.
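Eq. 7 can be sketched analogously for a lexical zone; the corpus layout, zone name, and terms below are hypothetical:

```python
import math
from collections import Counter

def zone_term_weights(corpus, class_idx, zone):
    """w(t, c, z) per Eq. 7 for every term of one zone of one class.
    corpus: list of classes, each a dict mapping zone name -> term list.
    Weight = tf(t) * log(n / df(t, z))."""
    n = len(corpus)
    tf = Counter(corpus[class_idx].get(zone, []))
    weights = {}
    for term, freq in tf.items():
        # df(t, z): classes whose zone z contains the term
        df = sum(1 for c in corpus if term in c.get(zone, []))
        weights[term] = freq * math.log(n / df)
    return weights

# Hypothetical three-class corpus with a "method_name" zone:
corpus = [
    {"method_name": ["save", "load", "save"]},
    {"method_name": ["save", "close"]},
    {"method_name": ["render"]},
]
w = zone_term_weights(corpus, 0, "method_name")
```

Here the repeated but common term "save" and the rare term "load" end up with comparable weights, showing how idf dampens terms shared across classes.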

4 Proposed Approach

This section presents a detailed description of the major steps of the proposed information-theoretic software remodularization. The main objective of the proposed approach is to reorganize the source code classes of an object-oriented software system among packages/modules such that the regrouping is good from the quality-metrics point of view as well as meaningful from the developers' perspective. To achieve this goal, the proposed work defines five package quality evaluation criteria (i.e., package entropy (to minimize), inter-module class change coupling (to minimize), intra-module class change coupling (to maximize), Module Count Index (to minimize), and Module Size Index (to minimize)) that help guide the remodularization process towards more promising search regions.

The complete framework of the proposed software remodularization approach is presented in Fig. 1. Its activities are divided into three main phases. In the first phase, the required structural, lexical, and change-history information of the software is extracted. In the second phase, structural and lexical coupling are computed independently and then combined. In the third phase, the package entropy, inter-module class change coupling, intra-module class change coupling, Module Count Index, and Module Size Index are computed. Finally, a search-based algorithm is applied to generate the remodularization solution.

Fig. 1

Framework of proposed remodularization approach

4.1 Extraction of Software Information

In this phase, the software information is collected: structural information (e.g., classes, packages, and class relationships), lexical information (e.g., class names, attribute names, method names, parameter names, comments, and source code statements), and change-history information. Since the proposed remodularization approach is specifically designed for object-oriented software implemented in the Java programming language, the terminology and information used in this work are based on Java.

4.2 Remodularization Objectives

Automated software remodularization driven by a search-based meta-heuristic algorithm requires fitness functions that can force the optimization process towards the expected solution. To achieve a remodularization solution meaningful from the developers' perspective, it is necessary to incorporate into the remodularization process an evaluation model and metrics that reflect that perspective. The developers' perspective can be inferred from the source code, in which the developers' knowledge is embedded. Based on the various dimensions of structural and lexical information, change-history information, and module dispersion information, this paper designs the following objective functions:

  • Software Entropy: Entropy is a measure of disorder; the higher the entropy, the lower the certainty. A good software module should contain highly correlated classes. The entropy of the software system is defined as follows (Aldana-Bobadilla and Kuri-Morales 2011):

$$ H\left(C|X\right)=\sum \limits_{t=1}^KH\left(t|X\right) $$
(8)

where H(t|X) is the entropy of module t. The objective is to minimize the entropy for each module.

  • Inter- and Intra-module Class Change Coupling: That classes changed together should be grouped together is one of the core design principles for software systems. Here, we use the change-history information from the version repository and compute the change coupling between classes at the class level by mining their co-change patterns. The intra-module class change coupling refers to the total class change-strength within modules, and the inter-module class change coupling refers to the total class change-strength between modules (Parashar and Chhabra 2016). The change-strength between two classes is defined as follows:

$$ \mathit{Change}\text{-}\mathit{strength}\left({C}_i,{C}_j\right)=\frac{\mid {C}_i\cap {C}_j\mid }{\mid {C}_i\mid }+\frac{\mid {C}_i\cap {C}_j\mid }{\mid {C}_j\mid } $$
(9)

where |Ci ∩ Cj| is the count of change-commits in which both Ci and Cj changed together, and |Ci| is the count of change-commits containing Ci.
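Eq. 9 reduces to set operations over per-class commit histories; the commit ids below are hypothetical:

```python
def change_strength(commits_i, commits_j):
    """Change-strength(Ci, Cj) per Eq. 9, given the sets of
    change-commit ids in which each class was modified."""
    shared = len(commits_i & commits_j)  # |Ci ∩ Cj|
    return shared / len(commits_i) + shared / len(commits_j)

# Hypothetical histories: Ci changed in 4 commits, Cj in 3, sharing 2.
s = change_strength({1, 2, 3, 4}, {3, 4, 5})  # 2/4 + 2/3
```

Note that the measure is symmetric and reaches its maximum of 2 when the two classes change in exactly the same commits.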

  • Module Count Index (MCI): In search-based software remodularization, an optimization process driven only by similarity criteria may generate singleton modules (i.e., modules containing a single entity). To avoid this situation, this work uses two other conflicting objectives, namely the module count index (MCI) and the module size index (MSI), introduced by Prajapati and Chhabra (2017a). The MCI determines the deviation between the number of modules produced during optimization and the number of modules set by the developers. The MCI is defined as follows:

$$ MCI=\exp \left[-\frac{1}{2}{\left(\frac{\ln (m)-\ln \left({m}^{\ast}\right)}{w\,\ln (n)}\right)}^2\right] $$
(10)

where m, m*, and n represent the number of modules in the produced solution, the number of modules defined by the developers, and the number of classes, respectively. The parameter w is a penalty factor that penalizes remodularization solutions far from m*.

  • Module Size Index (MSI): The main goal of the MSI objective function is to prevent the generation of very large modules (i.e., modules containing a large number of entities). In other words, the MSI evaluates the deviation between the sizes of the generated modules and the sizes of the developers' modules. To determine the ideal module size, the Component Packaging Density (CPD) method is used (Abdeen et al. 2009). The MSI is defined as follows:

$$ MSI=\exp \left[-\frac{1}{2}{\left(\frac{\ln \left({s}_{avg}\right)-\ln \left({s}^{\ast}\right)}{w\,\ln (n)}\right)}^2\right]\quad \text{where}\quad {s}_{avg}=\sum \limits_{i=1}^n\frac{s_i}{n} $$
(11)

where si is the size of the module in which class i is located. The ideal module size is defined as s* = n/m*. As in the MCI, the parameter w is a penalty factor.
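Eqs. 10-11 can be sketched directly; the default penalty factor w = 0.5 below is an assumption, since its value is not fixed here:

```python
import math

def mci(m, m_star, n, w=0.5):
    """Module Count Index per Eq. 10: equals 1 when the produced module
    count m matches the developer-defined count m*; w is the penalty
    factor (default assumed)."""
    return math.exp(-0.5 * ((math.log(m) - math.log(m_star))
                            / (w * math.log(n))) ** 2)

def msi(per_class_module_sizes, m_star, n, w=0.5):
    """Module Size Index per Eq. 11: s_avg averages, over the n classes,
    the size of the module each class lives in; ideal size s* = n/m*."""
    s_avg = sum(per_class_module_sizes) / n
    s_star = n / m_star
    return math.exp(-0.5 * ((math.log(s_avg) - math.log(s_star))
                            / (w * math.log(n))) ** 2)
```

Both indices are Gaussian-shaped in log scale: they equal 1 when the solution matches the developer-defined target and decay as the module count or average module size drifts away from it.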

4.3 Remodularization Problem Encoding

To apply a search-based meta-heuristic, the problem needs to be encoded in a form on which the various manipulation operators of the meta-heuristic can act effectively. To encode the package organization of the existing software system, an n-sized integer array (n equal to the number of classes) is used, where the value v (0 < v ≤ p) of the ith element indicates the package number to which the ith source code class is assigned; p is the number of packages/modules. For example, in Fig. 2, array index 5 represents class 5, and the value 2 at index 5 shows that class 5 is assigned to module 2.

Fig. 2

An example of remodularization encoding

To initialize the population, the solutions are generated randomly within the lower and upper bounds of each decision variable.
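The encoding and random initialization described above can be sketched as follows; the population size and seed are illustrative:

```python
import random

def random_solution(rng, n_classes, n_packages):
    """One remodularization chromosome (Fig. 2 encoding): position i
    holds the package number (1..p) that class i is assigned to."""
    return [rng.randint(1, n_packages) for _ in range(n_classes)]

def initialize_population(pop_size, n_classes, n_packages, seed=None):
    """Random initial population within the decision-variable bounds."""
    rng = random.Random(seed)
    return [random_solution(rng, n_classes, n_packages)
            for _ in range(pop_size)]

pop = initialize_population(pop_size=50, n_classes=8, n_packages=3, seed=42)
```

With this representation, crossover and mutation operate directly on the integer genes, and every array is a valid (if not necessarily good) remodularization.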

4.4 Many-Objective Evolutionary Algorithm

There has been important progress in formulating real-world optimization problems as search-based multi-objective optimization problems and solving them using multi-objective evolutionary algorithms (MOEAs), e.g., NSGA-II (Deb et al. 2002), SPEA2 (Zitzler et al. 2002), and PESA-II (Corne et al. 2001). These algorithms use the Pareto-dominance concept to rank the solutions in the population for selection. However, studies (Bingdong et al. 2015; Jaimes et al. 2009; Wang et al. 2015) have demonstrated that Pareto-dominance-based MOEAs do not perform well when the number of objective functions gets large (specifically, more than three).

In our software remodularization, five objective functions have to be optimized simultaneously to improve the existing package structure; hence, a Pareto-dominance-based MOEA may not work effectively. To optimize these objective functions simultaneously and effectively, we adapt NSGA-III (Deb and Jain 2014), MOEA/D (Zhang and Li 2007), IBEA (Zitzler and Kunzli 2004), and TAA (Praditwong and Yao 2006), popular many-objective evolutionary algorithms designed to work effectively with a large number of objective functions. These algorithms have been applied successfully to different many-objective optimization problems (Jain and Deb 2014).

5 Experimental Setup

To evaluate the ability of our entropy-based software remodularization to generate good modularization solutions, we conducted a set of experiments on seven open-source software systems. The experimental setup includes 1) a description of the software systems on which the proposed approach is evaluated, 2) the result collection method, 3) the result evaluation criteria, 4) the existing remodularization algorithms, and 5) the statistical tests used.

5.1 Studied Software Projects

We chose seven object-oriented software systems, each characterizing a real-world software system with diverse complexity in terms of the number of connections, number of classes, number of modules, and lines of code (LOC). These software systems are open source and written in the Java programming language. The main reasons for considering them are that they are of different sizes and complexities and that they have also been used by previous researchers (Prajapati and Chhabra 2018; Erdemir and Buzluca 2014; Prajapati and Chhabra 2018a; Bavota et al. 2013) to evaluate similar approaches. Table 4 provides a complete description of their characteristics.

Table 4 Characteristics of the used software projects

5.2 Collecting Results

Since search-based software remodularization approaches are stochastic optimizers, they can generate different results for the same software instance from one run to another. For this reason, we collect results from our proposed software remodularization approach by applying it to each test software system in 31 independent runs. In each execution, the many-objective optimization techniques generate a set of non-dominated solutions. To select a single solution that provides the best trade-off among all the considered objective functions, we use the trade-off worthiness metric defined by Rachmawati and Srinivasan (2009).
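The paper uses the trade-off worthiness metric of Rachmawati and Srinivasan (2009) for this selection. As a simplified stand-in (not that metric), the sketch below picks the non-dominated solution closest to the ideal point after min-max normalization of each minimized objective; the front values are illustrative:

```python
def pick_balanced(front):
    """front: list of objective vectors (all objectives minimized).
    Returns the solution nearest the ideal point after normalization."""
    m = len(front[0])
    lo = [min(s[k] for s in front) for k in range(m)]
    hi = [max(s[k] for s in front) for k in range(m)]
    def norm(s):
        # scale each objective to [0, 1]; constant objectives map to 0
        return [(s[k] - lo[k]) / (hi[k] - lo[k]) if hi[k] > lo[k] else 0.0
                for k in range(m)]
    return min(front, key=lambda s: sum(v * v for v in norm(s)))

# The extreme solutions excel on one objective each; the middle one balances both.
front = [[0.0, 1.0], [0.4, 0.4], [1.0, 0.0]]
print(pick_balanced(front))  # → [0.4, 0.4]
```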

5.3 Result Evaluation Criteria

Evaluation of search-based remodularization results is generally performed by two approaches: internal and external assessment. Internal assessment evaluates the quality of the internal characteristics of the modules in the produced remodularization. A number of quality metrics exist to assess a remodularization internally, for example, coupling and cohesion (Cui and Chae 2011), modularization quality (Praditwong et al. 2011), size of clusters (extremity) (Glorie et al. 2009; Erdemir and Buzluca 2014), and the number of clusters (Wang et al. 2010). In this study, we use the size of clusters (extremity).

The aim of external assessment is to find the association between the obtained remodularization and the authoritative remodularization suggested by a human expert (e.g., an original developer); this approach is also known as authoritativeness. The produced remodularization solution should resemble the authoritative remodularization as closely as possible (Wu et al. 2005). Different measures may be used to compute authoritativeness, such as MoJo and MoJoFM (Wu et al. 2005; Tzerpos and Holt 1999; Andritsos and Tzerpos 2005; Bittencourt and Guerrero 2009) and precision and recall (Sartipi and Kontogiannis 2003). In this study, we used MoJoFM, the most widely used measure of authoritativeness. Brief descriptions of these metrics are given in the following sub-sections.

5.3.1 Non-Extreme Distribution (NED)

For a well-modularized software system, the size of any individual module should be neither extremely small nor extremely large (Erdemir and Buzluca 2014). Hence, an automatic software remodularization approach should generate a modularization solution with a balanced distribution of classes across modules. To evaluate the extremity of module sizes, Wu et al. (2005) defined the non-extreme distribution (NED) as follows:

$$ NED=\frac{\sum_{i=1,\; M_i\ \text{is not extreme}}^{k}\mid M_i\mid }{n},\quad M_i\ \text{is not extreme if}\ 5<\mid M_i\mid <1.5\times \mid MA_{\max}\mid $$
(12)

where k is the number of modules, n is the total number of classes, |Mi| is the size of module i, and |MAmax| is the size of the largest module.
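Eq. (12) can be sketched directly from module sizes. The following minimal implementation uses the extremity thresholds stated above (5 and 1.5 × the size of the largest module); the example sizes are illustrative:

```python
def ned(module_sizes):
    """Fraction of classes that reside in non-extreme modules (Eq. 12)."""
    n = sum(module_sizes)            # total number of classes
    largest = max(module_sizes)      # |MA_max|
    non_extreme = sum(s for s in module_sizes
                      if 5 < s < 1.5 * largest)
    return non_extreme / n

# One tiny module (size 3) counts as extreme, so its classes are excluded:
print(ned([3, 10, 12, 15]))  # → 0.925  (37 of 40 classes)
```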

5.3.2 Authoritativeness

Authoritativeness is a measure of the similarity between the remodularization solution generated by an automatic remodularization method and the remodularization solution suggested by experts (Erdemir and Buzluca 2014). Different measures may be used to compute it; in this study, we use the MoJoFM measure. Let M and Ma be two remodularization solutions, and let mno(M, Ma) be the minimum number of move and join operations needed to transform remodularization M into remodularization Ma, where a join operation combines two modules into a single module and a move operation moves a component from one module to another. MoJoFM(M, Ma) is defined as follows:

$$ MoJoFM\left(M, Ma\right)=100-\left(\frac{mno\left(M, Ma\right)}{\max \left( mno\left(\forall M, Ma\right)\right)}\times 100\right) $$
(13)

where MoJoFM(M, Ma) represents authoritativeness, M and Ma represent the modular structure generated by the approach and the authoritative modular structure suggested by the experts, respectively, and max(mno(∀M, Ma)) is the maximum possible distance of any remodularization M from the remodularization Ma.
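Computing mno(M, Ma) itself requires the MoJo distance algorithm (Tzerpos and Holt 1999); the sketch below takes the distances as precomputed inputs and only illustrates how Eq. (13) normalizes them onto a 0–100 scale, where 100 means the solutions are identical:

```python
def mojofm(mno_m_ma, max_mno):
    """Eq. (13): mno_m_ma is mno(M, Ma); max_mno is the worst-case
    distance of any remodularization from Ma."""
    return 100 - (mno_m_ma / max_mno) * 100

print(mojofm(0, 20))   # identical remodularizations → 100.0
print(mojofm(20, 20))  # worst possible remodularization → 0.0
```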

In practice, it is very difficult for academic researchers to reach the original developers who were engaged in developing the software systems being evaluated. To overcome this problem, we use the same method to obtain the authoritative remodularization as used by previous researchers (Erdemir and Buzluca 2014; Wu et al. 2005; Corazza et al. 2016). The process of obtaining the authoritative remodularization for a test software system is summarized as follows: 1) find the packages and the classes associated with each package, 2) validate the existing package organization against the comments written in the source code classes of these software systems, 3) merge a package with its closest package if it contains five or fewer classes, and 4) finally, develop the authoritative remodularization from the preliminary authoritative remodularization by involving external expert developers.

5.3.3 Stability

An automatic software remodularization approach should not generate a dramatically different modular structure for a similar version of the software with minor changes. Stability is formulated as Stability(Mn) = MoJoFM(Mn, Mn−1), where Mn and Mn−1 are the software remodularization results generated for two consecutive versions of a software system.

5.4 Rival Remodularization Approaches

Most of the existing software remodularization approaches use the modularization quality (MQ) metric (Mancoridis et al. 1998) as the remodularization objective function to drive the optimization process. In multi-objective formulations of the software remodularization problem, MQ is used as the core objective along with other supportive objective functions. Praditwong et al. (2011) redesigned MQ and used it as the core objective function along with four other supportive objective functions (e.g., inter-cluster coupling, intra-cluster coupling, number of clusters, etc.) to remodularize software systems. Similarly, other authors (Barros 2012; Prajapati and Chhabra 2017, 2017a; Kumari et al. 2013; Prajapati and Chhabra 2018, 2018a) used MQ as the core objective function in their multi-objective formulations of the software remodularization problem. MQ is defined as follows:

$$ MQ=\sum \limits_{k=1}^nM{F}_k $$
(14)

where n is the number of packages/modules and MFk is the modularization factor. The modularization factor MFk for module k is defined as follows:

$$ M{F}_k=\begin{cases}0,& if\kern0.24em i=0\\ \frac{i}{i+\frac{1}{2}j},& if\kern0.24em i>0\end{cases} $$
(15)

where i is the coupling among the classes within package k and j is the coupling between the classes of package k and the classes in the rest of the packages of the system. The coupling between classes can be determined from different types of information (e.g., structural information, lexical information, or combined structural and lexical information). To make a fair comparison between our entropy-based software remodularization (entropy as the core objective) and the existing MQ-based software remodularization (MQ as the core objective), we use the same supportive objective functions (i.e., minimize inter-module class change coupling, maximize intra-module class change coupling, minimize module count index, minimize module size index). Table 5 summarizes the objective functions used in the proposed approach and the existing search-based multi-objective remodularization approaches.
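Eqs. (14)–(15) can be sketched in a few lines. The (i, j) coupling pairs below are illustrative only; a cohesive, loosely coupled system (high i, low j per module) scores higher:

```python
def modularization_factor(i, j):
    """Eq. (15): MF_k from intra-coupling i and inter-coupling j of module k."""
    return 0 if i == 0 else i / (i + 0.5 * j)

def mq(modules):
    """Eq. (14): MQ is the sum of MF_k over all modules.
    modules: list of (i, j) coupling pairs, one per module."""
    return sum(modularization_factor(i, j) for i, j in modules)

print(mq([(10, 2), (8, 1)]))  # cohesive, loosely coupled
print(mq([(2, 10), (1, 8)]))  # scattered, tightly coupled (lower MQ)
```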

Table 5 Many-objective remodularization approaches

5.5 Statistical Tests

Meta-heuristic optimization algorithms are stochastic optimizers (i.e., they can generate different results for the same test problem from one run to another; Mkaouer et al. 2015). A result obtained from a single run cannot be used to draw any conclusion about the algorithms. Hence, it is necessary to obtain a set of results for the same problem instance over many runs. In this study, we collected the results by executing each algorithm 31 times on the same problem instance. The samples of 31 solutions were statistically analyzed using the Wilcoxon rank sum test (Arcuri and Briand 2011) with a 95% confidence level (α = 5%).
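For illustration, the rank sum test can be sketched with the standard library alone. This is a minimal version using the normal approximation without tie correction (in practice a statistics package such as `scipy.stats.ranksums` would be used); the two samples are made-up data:

```python
import math

def rank_sum_test(sample_a, sample_b):
    """Wilcoxon rank-sum test, normal approximation, no tie correction.
    Returns (z statistic, two-sided p-value)."""
    combined = sorted((v, i) for i, v in enumerate(sample_a + sample_b))
    ranks = {i: rank for rank, (v, i) in enumerate(combined, start=1)}
    n1, n2 = len(sample_a), len(sample_b)
    w = sum(ranks[i] for i in range(n1))       # rank sum of sample_a
    mu = n1 * (n1 + n2 + 1) / 2                # expected rank sum under H0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mu) / sigma
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p

# Two clearly separated samples yield p < 0.05 (reject H0 at α = 5%):
z, p = rank_sum_test([60, 62, 65, 66, 70], [50, 51, 52, 53, 54])
print(round(z, 3), round(p, 4))
```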

6 Results and Analysis

In this section, we present the authoritativeness, NED, and stability results achieved by the proposed and existing search-based software remodularization approaches.

6.1 Authoritativeness

Tables 6, 7, 8, and 9 present the authoritativeness results achieved by the proposed entropy-based approach and the existing remodularization approaches (i.e., structural, lexical, and structural + lexical based) on seven software systems for each considered many-objective meta-heuristic algorithm (i.e., NSGA-III, MOEA/D, IBEA, and TAA). The columns of each table labelled Wilcoxon test show the statistical results of the comparison between the proposed approach and the existing remodularization approaches. The symbol '+' denotes a statistically significant difference in favour of the proposed approach, '–' denotes a statistically significant difference in favour of the existing approach, and '≈' denotes no significant difference between the two. In the three-symbol sequence of the Wilcoxon test columns, the first symbol is the result of the statistical test between the structural-based approach and the proposed approach, the second between the lexical-based approach and the proposed approach, and the third between the structural + lexical based approach and the proposed approach. For example, in the sequence "+ + –" for the Weka software system in Table 7, the first two symbols indicate significant differences in favour of the proposed approach over the structural-based and lexical-based approaches, respectively, while the third symbol indicates a significant difference in favour of the structural + lexical based approach.

Table 6 Authoritativeness using NSGA-III
Table 7 Authoritativeness using MOEA/D
Table 8 Authoritativeness using IBEA
Table 9 Authoritativeness using TAA

The authoritativeness results presented in Tables 6, 7, 8, and 9, achieved by the proposed and existing approaches on seven software systems with the many-objective meta-heuristic algorithms (i.e., NSGA-III, MOEA/D, IBEA, and TAA), clearly indicate that the proposed approach outperforms the existing approaches by producing significantly better authoritativeness values in most cases. In particular, the proposed approach significantly outperforms the structural and lexical models on all test software systems. However, in some cases, the authoritativeness of the proposed approach is only competitive with the structural + lexical approach.

Apart from the comparison between the proposed approach and the existing approaches, we have also compared the authoritativeness of each many-objective meta-heuristic algorithm (i.e., NSGA-III, MOEA/D, IBEA, and TAA) for each remodularization approach (structural, lexical, structural + lexical, and entropy-based). These comparative results are shown in Fig. 3, which clearly indicates that the NSGA-III algorithm performs better than MOEA/D, IBEA, and TAA, whose results are competitive with one another.

Fig. 3

The impact of different many-objective meta-heuristic algorithms on remodularization

6.2 Non-Extreme Distribution (NED)

Tables 10, 11, 12, and 13 present the NED results achieved by the proposed entropy-based approach and the existing remodularization approaches (i.e., structural, lexical, and structural + lexical based) on seven software systems for each considered many-objective meta-heuristic algorithm (i.e., NSGA-III, MOEA/D, IBEA, and TAA). The meanings of the symbols used in the Wilcoxon test columns of each table are the same as described in Section 6.1. A NED value of 100% denotes a remodularization solution with no extreme-sized modules. The results presented in Tables 10, 11, 12, and 13 clearly show that the proposed entropy-based approach achieves 100% NED for every many-objective meta-heuristic algorithm on each of the test software systems. The existing remodularization approaches are competitive, producing only slightly lower NED values in some cases. The Wilcoxon test results also show that there is no significant difference between the proposed entropy-based approach and the existing approaches in any case.

Table 10 Non-extreme distribution (NED) using NSGA-III
Table 11 Non-extreme distribution (NED) using MOEA/D
Table 12 Non-extreme distribution (NED) using IBEA
Table 13 Non-extreme distribution (NED) using TAA

6.3 Stability

To assess the stability of the proposed approach, we analyzed 21 successive versions of JFreeChart. The stability results of the proposed entropy-based approach and the existing structural, lexical, and structural + lexical based approaches with the NSGA-III, MOEA/D, IBEA, and TAA algorithms are given in Tables 14, 15, 16, and 17. For each remodularization approach, the bold value indicates the best stability result. The meanings of the symbols used in the Wilcoxon test columns of each table are the same as described in Section 6.1.

Table 14 Stability results of NSGA-III over JFreeChart project
Table 15 Stability results of MOEA/D over JFreeChart project
Table 16 Stability results of IBEA over JFreeChart project
Table 17 Stability results of TAA over JFreeChart project

The stability results presented in Table 14 clearly show that, with the NSGA-III algorithm, the stability values of the structural, lexical, structural + lexical, and entropy-based approaches range between 51.35–76.53%, 68.51–92.75%, 74.65–98.65%, and 81.25–99.35%, respectively. These values show that the entropy-based evaluation model achieves higher stability than the existing approaches. Similarly, the results of Tables 15, 16, and 17 show that the entropy-based approach achieves higher stability than the existing approaches with the MOEA/D, IBEA, and TAA algorithms.

In summary, the results show that the information theoretic similarity measure has a significant impact on the generation of software modularization with better authoritativeness, NED, and stability values compared to other approaches. Therefore, we think our information theoretic many-objective approach can be useful for remodularizing object-oriented software systems.

6.4 Discussion

Even though the information theoretic similarity measure is not a widespread concept in SBSE, we believe it is very useful for remodularizing software systems. Remodularization of a software system using an information theoretic similarity measure has not been explored by many SBSE practitioners: in previous years, the software remodularization process was based on structural and lexical similarity measures, and the proper use of an information theoretic similarity measure for software remodularization is a novel concept, used for the first time in this paper. The usefulness of this concept is supported by the experimental results of our approach, which clearly show the advantages of the information theoretic similarity measure for software remodularization. Further, as there are five objective functions (i.e., more than three) to be optimized simultaneously in our remodularization approach, a many-objective meta-heuristic algorithm is a good choice.

7 Threats to Validity

In this section, we explore the factors that can influence the validity of the results reported in this paper. For software engineering experimentation, the work reported in literature (Wohlin et al. 2000) has divided threats to validity into four categories: conclusion, internal, construct, and external threats.

  • Conclusion threats to validity: These threats concern the relationship between treatment and outcome. The meta-heuristic algorithms use many random operators (e.g., random initial population generation) and may produce different results for the same problem instance on different runs. To mitigate this threat, we executed each algorithm 31 times on each problem instance and statistically analyzed the resulting samples using the Wilcoxon rank sum test with a 95% confidence level (α = 5%).

  • Internal threats to validity: This category of threats considers the effects of experimental design choices, the algorithms' parameter settings, and data collection. The parameter settings of the algorithms are based on similar previous remodularization studies (Mkaouer et al. 2015, 2015a), while for the other many-objective algorithms we used a trial-and-error calibration method.

  • Construct threats to validity: These threats concern the relations between theory and observation. The design of the fitness functions is based on previous and widely used software remodularization works (Prajapati and Chhabra 2017a; Corazza et al. 2016; Parashar and Chhabra 2016). For a proper comparison between two algorithms, we assigned each an equal number of fitness evaluations.

  • External threats to validity: These threats concern the generalization of the results achieved by the proposed approach. The approach has been evaluated on medium to large real-world object-oriented software systems with different complexities in terms of number of connections, number of classes, number of modules, and lines of code. The correctness of the authoritative remodularization might also affect the results. To obtain the authoritative remodularization, we followed the same approach as reported in the literature (Prajapati and Chhabra 2017a; Corazza et al. 2016; Erdemir and Buzluca 2014; Wu et al. 2005).

8 Conclusion and Future Directions

A new many-objective software remodularization approach for object-oriented software systems has been proposed in this paper. The approach uses an information theoretic proximity measure as a new objective function along with four other objective measures (i.e., inter-module class change coupling, intra-module class change coupling, module count index, and module size index). In addition, our approach utilizes different aspects of structural and lexical information with their relative weights. Information present in the change history of the software has also been integrated into the approach for identifying consistent modularization. The proposed approach has been compared with other variants of remodularization evaluation models on seven test software systems using different search-based meta-heuristics (NSGA-III, MOEA/D, IBEA, and TAA). The obtained results have been assessed in terms of authoritativeness, non-extreme distribution, and stability. The results of the evaluation clearly suggest that the proposed approach can be a good alternative for improving the quality of software systems whose quality is not up to the mark. As part of future work, we plan to perform an empirical study on more problem instances with different configuration settings.