Keywords

1 Introduction

Traceability software refers to a system where artifacts at each phase of the software lifecycle process can be traced by other artifacts, and is now considered as a representation of high-quality software [21, 42, 50]. For example, source code in the implementation phase needs to be traced to artifacts in the maintenance and testing phases, while it can also trace artifacts from previous phases, such as requirement and design phases.

During the development of complex software, traceability plays an important role in reducing maintenance cost and analyzing change impact, but it is difficult to achieve in practice. There are several major challenges as follows: (1) In the software lifecycle process, it is hard to represent traceability between large number of artifacts [50] in a non-formal language (see Fig. 1); (2) Since various types of artifacts and relations are involved in different software systems [12, 42], the abstraction level of a formal model is not easy to determine; (3) The scalability of traceability models may be impacted as the software becomes progressively larger [12, 36].

When dealing with the above challenges, the main approaches of most studies are as followed: (1) Abstraction of artifacts and relations at different phases [19, 20, 26, 30]; (2) Establishment of relations between artifacts in non-adjacent phases, such as the link between source code and requirements [43]; (3) Recording of traceability using different storage structures, such as matrix [25] and hierarchical tree [37]. However, these existing approaches face some problems, such as the potential loss of information about the relations between artifacts in non-adjacent phases, and the large computational burden imposed by the operations of the matrix. To solve the above problems, we propose a formal model called a structure model [48] to describe traceability, which follows the actual software development process to directly establish different types of relations between artifacts.

Fig. 1.
figure 1

Traceability links among artifacts in the software lifecycle process

In some early studies on traceability modeling (e.g. [32, 42]), the types of traceability links are fixed, and then as the research progresses, most of the studies (e.g. [19, 20, 26]) consider abstracting similar types of links into more general relations between artifacts. However, these approaches are somewhat constrained when the types of traceability links in actual development are extended or modified. Furthermore, although some studies [4, 18] achieve the extensibility and customization of traceability links, they do not support well the many-to-many relationship between two artifacts. Another interesting direction is to retrieve traceability links between source and target artifacts based on the probability [6, 24]. In spite of the results showing that the majority of relevant artifacts can be retrieved, some incorrect links are also generated. Moreover, the visualization of traceability is one of the most significant aspects of modeling. Matrices, as two-dimensional structures, are commonly used in commercial tools because they can intuitively portray traceability and can be easily understood by non experts [12, 33]. In addition, linear [44], hierarchical [37], graph-based [3, 41] and cross-referenced [13, 28] representations are also general methods. Nevertheless, the issues related to scalability have not been well addressed in previous work.

Instead of using dynamic behavior to study traceability systems, the structure model is based on the static system structure (a view similar to the structural models in UML [22] and SysML [39]), thus it avoids complex reachability algorithms when analyzing a software system. For convenience, we adopt a similar concept as in SysML [39], considering all the artifacts of different phases as model elements named with various labels in a structure model. Since the structure model does not limit the number of types of relations between model elements and supports the many-to-many relation, developers can automatically or semi-automatically assign relations between model elements according to the real development process using a prototyping tool called Formalized Software Management System (FSMS) where traceability can be simply visualized. Compared to the previous definition of the structure model [48], we introduce version number controls and delete the set \(\lambda \) used for modeling constraints.

For a given structural model, composition and restriction are a kind of operations in horizontal direction, and refinement is a kind of operations in vertical direction [31]. However, to the best of our knowledge, only few studies [11, 34, 38] focus on whether traceability is preserved in different scenarios. This paper discusses in detail whether traceability is preserved under these three operations.

Contribution. This paper makes the following contributions.

  • We propose a novel formal model named structure model to describe different types of artifacts and relations in the software development process. In contrast to classical formal models (such as Petri nets and transition systems) that describe systems from a behavioral perspective, our structure model can better represent traceability from the static structure of the system and do not need to use a complex reachability algorithm.

  • We study three basic operations composition, restriction, and refinement between structure models, respectively. The results show that combination and refinement do not affect traceability, however restriction only preserves traceability in the vertical direction.

  • To support our work, we develop a prototype tool which can visualize traceability.

The remainder of this paper is structured as follows. Section 2 introduces the structure model and presents the definition of traceability. Section 3 discusses the preservation of traceability based on three basic operations. Section 4 introduces our prototype tool. Section 5 is the related work. Section 6 concludes the paper and discusses the future work.

2 Traceability

In this section, we will introduce a formal model called a structure model [48] which is used to model and analyze traceability in the software development process. The structure model consists of model elements, variable traceability relationships between model elements, classifications and version numbers of model elements. Model elements are similar to those in SysML [39]. Based on the above concepts, developers can model and analyze traceability without diving deep into the specific implementation details of a software artifact.

In order to cope with the variable number of relation types between model elements, we choose to represent the structure model with a tuple of variable length. And the concept of version number is introduced in this model for the sake of subsequent analysis of traceability. Some assistant definitions are given below. Let VN be the set of version numbers such that \(\forall R\subseteq \textbf{VN}\times \textbf{VN}\), R is a strict total order. Note that \(\infty \) is a special version number which means a undetermined version number, and \(\infty \not \in \textbf{VN}\).

Definition 1

A structure model (\(\mathcal{S}\mathcal{M}\)) is a tuple \( \langle \textrm{ME},\prec , {\mathop {\hookrightarrow }\limits ^{1}},\cdots , {\mathop {\hookrightarrow }\limits ^{n}},{\mathop {\tau }\limits ^{1}},\cdots ,{\mathop {\tau }\limits ^{m}}, \upsilon s, \upsilon e\rangle \) with

  • \(\textrm{ME}\), a finite set of the model elements,

  • \(\prec \subseteq \textrm{ME}\times \textrm{ME}\), the containment relation such that it is a (irreflexive) partial order,

  • \(\forall i\in \{1,\cdots ,n\},{\mathop {\hookrightarrow }\limits ^{i}}\subseteq \textrm{ME}\times \textrm{ME}\), the dependency relation,

  • \(\forall j\in \{1,\cdots ,m\},{\mathop {\tau }\limits ^{j}}\subseteq \textrm{ME}\), the type set of model elements such that \(\forall e\in \textrm{ME},\exists \tau \in \{{\mathop {\tau }\limits ^{1}},\cdots ,\cdots ,{\mathop {\tau }\limits ^{m}}\}: e\in \tau \).

  • \(\upsilon s:\textrm{ME}\longrightarrow \textbf{VN}\cup \{\infty \}\), the initial version function, and

  • \(\upsilon e:\textrm{ME}\longrightarrow \textbf{VN}\cup \{\infty \}\), the final version function.

A model element is assigned the initial version number when it is created, whereas it is assigned the final version number when it is inactivated. If the version number of a model element is not determined, it is assigned \(\infty \).

In order to better understand the definition, we use the structure model to represent a simple student information management system in the following example.

Fig. 2.
figure 2

The partial relations among software artifacts in a system

Example 1

We here present the example shown in Fig. 2. In this example, the initial requirements of a student information management system are as follows.

  • R: The system shall be able to manage students’ information.

  • R1: The system shall allow students to choose course.

  • R2: The system shall allow students to check their course scores.

Obviously, the requirements R1 and R2 are both contained in R. In addition to requirements, the other model elements in Fig. 2 correspond to artifacts at different phases of software development. U1, U2, U3, U4 represent use cases, C1, C2, C3 denote classes, S1 denotes the sequence diagram, and D1 is the corresponding implementation code of S1. For convenience, the specific detail for each artifact is not given in Fig. 2.

There exist four types of relations between model elements: containment, trace, extend, composition. Note that the relations between these model elements are assigned by the software engineer. Thus the student information management system can be represented by a structure model \(\mathcal{S}\mathcal{M}=\langle \textrm{ME},\prec , {\mathop {\hookrightarrow }\limits ^{trace}},{\mathop {\hookrightarrow }\limits ^{extend}}, {\mathop {\hookrightarrow }\limits ^{composition}}, {\mathop {\tau }\limits ^{requirements}},{\mathop {\tau }\limits ^{design}}, {\mathop {\tau }\limits ^{implementation}}, \upsilon s, \upsilon e\rangle \) with \(\textrm{ME}=\{R,R1,R2,U1,U2,U3,U4,C1,C2,C3,S1,D1\},\prec =\{(R1,R),(R2,R)\}, {\mathop {\hookrightarrow }\limits ^{trace}}=\{(U1,R1),(U2,R1),(U3,R1),(U4,R2),(C1,U1),(C1,U2),(C1,U3),(C2,U4),(C3,U2),(C3,U3),(C3,U4),(S1,C1),(S1,C3),(D1,S1)\},{\mathop {\hookrightarrow }\limits ^{extend}}=\{(U3,U2)\},{\mathop {\hookrightarrow }\limits ^{composition}}=\{(C2,C1)\},{\mathop {\tau }\limits ^{requirements}}=\{R,R1,R2,U1,U2,U3,U4\},{\mathop {\tau }\limits ^{design}}=\{C1,C2,C3,S1\},{\mathop {\tau }\limits ^{implementation}}=\{D1\}\). Here, we may suppose that the current version number is 1.0, thus \(\forall e\in \textrm{ME}, \upsilon s(e)=1.0\) and \(\upsilon e(e)=\infty \).

This example shows that developers can visually represent the model elements and their relations using a structure model. Note that the version numbers are attributes of model elements.

Moreover, in order to make the paper more readable, we summarize some notations which will be used later (see Table 1).

Table 1. Notations.

Definition 2

Let \(\mathcal{S}\mathcal{M}=\langle \textrm{ME},\prec , {\mathop {\hookrightarrow }\limits ^{1}},\cdots , {\mathop {\hookrightarrow }\limits ^{n}},{\mathop {\tau }\limits ^{1}},\cdots ,{\mathop {\tau }\limits ^{m}}, \upsilon s, \upsilon e\rangle \) be a structure model.

  1. (1)

    A sequence \(rc=x_1\cdots x_p (p>1)\) is called a relation chain in \(\mathcal{S}\mathcal{M}\) iff \(\forall i\in \{1,\cdots , p-1\},x_i,x_{i+1}\in \textrm{ME}\), \((x_i,x_{i+1})\in (\prec \cup {\mathop {\hookrightarrow }\limits ^{1}}\cup \cdots \cup {\mathop {\hookrightarrow }\limits ^{n}})\vee (x_{i+1},x_{i})\in (\prec \cup {\mathop {\hookrightarrow }\limits ^{1}}\cup \cdots \cup {\mathop {\hookrightarrow }\limits ^{n}})\). \(RC(\mathcal{S}\mathcal{M})\) denotes all possible relation chains in \(\mathcal{S}\mathcal{M}\).

  2. (2)

    A sequence \(dc=x_1\cdots x_p(p>1)\) is called a dependency chain in \(\mathcal{S}\mathcal{M}\) iff \(\forall i\in \{1,\cdots , p-1\},x_i,x_{i+1}\in \textrm{ME}\), \((x_i,x_{i+1})\in (\prec \cup {\mathop {\hookrightarrow }\limits ^{1}}\cup \cdots \cup {\mathop {\hookrightarrow }\limits ^{n}})\). \(\hat{{dc}}\) denotes the model elements in the dependency chain \(dc=x_1\cdots x_p\), that is, \(\hat{{dc}}=\{x_1,\cdots ,x_p\}\). \(DC(\mathcal{S}\mathcal{M})\) denotes all possible dependency chains in \(\mathcal{S}\mathcal{M}\). [dc] denotes the number of model elements in the dependency chain dc, that is, \([{{dc}}]=p\).

Clearly, a relation chain does not distinguish the direction of relations, while a dependency chain is a directed sequence. For instance, there exists a dependency chain D1S1C3U3U2R1R in Example 1, and this dependency chain is also a relation chain.

Proposition 1

If \(\mathcal{S}\mathcal{M}\) is a structure model, then \(DC(\mathcal{S}\mathcal{M})\subseteq RC(\mathcal{S}\mathcal{M})\).

Proof

This proof is straightforward.

Proposition 1 states that the dependency chain is a special type of the relation chain.

Traceability is one of the significant criteria for assessing software quality [42]. Since there exist several different model elements (artifacts) for each phase of the software lifecycle process, and these model elements are largely isolated, it is necessary to correlate between various model elements through traceability [2]. Next, we will present some related concepts and give a formal definition of traceability.

The traceability of software systems can be classified as horizontal or vertical traceability [35, 36, 49]. The definition of traceability is as follows.

Definition 3

Let \(\mathcal{S}\mathcal{M}=\langle \textrm{ME},\prec , {\mathop {\hookrightarrow }\limits ^{1}},\cdots , {\mathop {\hookrightarrow }\limits ^{n}},{\mathop {\tau }\limits ^{1}},\cdots ,{\mathop {\tau }\limits ^{m}}, \upsilon s, \upsilon e\rangle \) be a structure model and specify a system. Let \({\mathop {\tau }\limits ^{requirements}}\in \{{\mathop {\tau }\limits ^{1}},\cdots ,{\mathop {\tau }\limits ^{m}}\}\) and \({\mathop {\tau }\limits ^{requirements}}\) be the set of the model elements for representing requirements.

  1. (1)

    \(\mathcal{S}\mathcal{M}\) is said to be horizontally traceable iff \(\forall e\in \textrm{ME},\exists dc=x_1\cdots x_p\in DC(\mathcal{S}\mathcal{M}): e\in \hat{dc}\) and \( x_p\in {\mathop {\tau }\limits ^{requirements}}\).

  2. (2)

    \(\mathcal{S}\mathcal{M}\) is said to be vertically traceable iff \(\forall e\in \textrm{ME}, \exists dc=x_1\cdots x_p\in DC(\mathcal{S}\mathcal{M}): e\in \hat{dc}\) and \(\forall i\in \{1,\cdots , p-1\}, \upsilon s(x_{i+1})\le \upsilon s(x_{i})\).

  3. (3)

    \(\mathcal{S}\mathcal{M}\) is said to be traceable iff \(\forall e\in \textrm{ME},\exists dc=x_1\cdots x_p\in DC(\mathcal{S}\mathcal{M}): e\in \hat{dc}, x_p\in {\mathop {\tau }\limits ^{requirements}}\) and \( \forall i\in \{1,\cdots , p-1\}, \upsilon s(x_{i+1})\le \upsilon s(x_{i})\).

Here, horizontal traceability considers that all software artifacts directly or indirectly depend on requirements artifacts, whereas vertical traceability focuses on the version changes of model elements [43], that is, the version number of a model element is greater than or equal to that of its dependent model elements. Obviously, though dependency chains are directed, they can be reversely traversed based on directed graphs. Thus, forward traceability and backward traceability [5, 51] are both contained in this definition. The definition considers not only the inner traceability of a model, but also the traceability among multiple models. Note that as long as there is an isolated artifact in a software system, the software system is not traceable.

Example 2

For each model element of the structure model \(\mathcal{S}\mathcal{M}\) in Fig. 2, there is a dependency chain containing this model element such that the elements of this chain are directly or indirectly related to requirements. Therefore according to Definition 3(1), \(\mathcal{S}\mathcal{M}\) is horizontally traceable. In addition, since all model elements in the structure model \(\mathcal{S}\mathcal{M}\) have the same version number, by Definition 3(2), \(\mathcal{S}\mathcal{M}\) is vertically traceable. Thus, by Definition 3(3), \(\mathcal{S}\mathcal{M}\) is traceable.

3 The Preservation of Traceability

In this section, we will introduce three common operations, and explore the preservation of traceability in the software development process.

A large-scale software project is often divided into several smaller projects which are developed concurrently in practice. A structure model is used to model and analyze the traceability of each project whichever is large or small. Thus, from the perspective of traceability, every project corresponds to a structure model. The decomposition of a complex software system into multiple subsystems can correspond to that of a structure model into multiple substructure models.

Definition 4

Let \(\mathcal{S}\mathcal{M}'=\langle \textrm{ME}',\prec ',{\mathop {\hookrightarrow '}\limits ^{1}},\cdots , {\mathop {\hookrightarrow '}\limits ^{n}}, {\mathop {\tau '}\limits ^{1}},\cdots ,{\mathop {\tau '}\limits ^{m}}, \upsilon s', \upsilon e'\rangle \) and \(\mathcal{S}\mathcal{M}''=\langle \textrm{ME}'',\prec '', {\mathop {\hookrightarrow ''}\limits ^{1}},\cdots , {\mathop {\hookrightarrow ''}\limits ^{n}},{\mathop {\tau ''}\limits ^{1}},\cdots ,{\mathop {\tau ''}\limits ^{m}}, \upsilon s'', \upsilon e''\rangle \) be two structure models.

A structure model \(\mathcal{S}\mathcal{M}'\) is called a substructure of \(\mathcal{S}\mathcal{M}''\), denoted as \(\mathcal{S}\mathcal{M}'\sqsubseteq \mathcal{S}\mathcal{M}''\), iff \(\textrm{ME}'\subseteq \textrm{ME}'', \prec '\subseteq \prec ''\), \(\forall i\in \{1,\cdots ,n\}, {\mathop {\hookrightarrow '}\limits ^{i}}\subseteq {\mathop {\hookrightarrow ''}\limits ^{i}}, \forall j\in \{1,\cdots ,m\}:{\mathop {\tau '}\limits ^{j}}\subseteq {\mathop {\tau ''}\limits ^{j}}\) and \(\forall e\in \textrm{ME}':\upsilon s'(e)=\upsilon s''(e)\wedge \upsilon e'(e)=\upsilon e''(e)\). A structure model \(\mathcal{S}\mathcal{M}'\) is called a proper substructure of \(\mathcal{S}\mathcal{M}''\), denoted as \(\mathcal{S}\mathcal{M}'\sqsubset \mathcal{S}\mathcal{M}''\), iff \(\mathcal{S}\mathcal{M}'\sqsubseteq \mathcal{S}\mathcal{M}''\) and \(\mathcal{S}\mathcal{M}'\ne \mathcal{S}\mathcal{M}''\).

Proposition 2

Let \(\mathcal{S}\mathcal{M}'\) and \(\mathcal{S}\mathcal{M}''\) be two structure models. If \(\mathcal{S}\mathcal{M}'\sqsubseteq \mathcal{S}\mathcal{M}''\), then \(DC(\mathcal{S}\mathcal{M}')\subseteq DC(\mathcal{S}\mathcal{M}'')\).

Proof

According to Definition 4, all model elements and relations of \(\mathcal{S}\mathcal{M}'\) are in \(\mathcal{S}\mathcal{M}''\). Therefore, all dependency chains of \(\mathcal{S}\mathcal{M}'\) are in \(\mathcal{S}\mathcal{M}''\). By Definition 2(2), \(DC(\mathcal{S}\mathcal{M}')\subseteq DC(\mathcal{S}\mathcal{M}'')\).

3.1 Composition

Once all subsystems of a system are completed, all software artifacts in the subsystems should be composed. The model elements and relations in every subsystem should be preserved.

Definition 5

Let \(\mathcal{S}\mathcal{M}'=\langle \textrm{ME}',\prec ',{\mathop {\hookrightarrow '}\limits ^{1}},\cdots , {\mathop {\hookrightarrow '}\limits ^{n}}, {\mathop {\tau '}\limits ^{1}},\cdots ,{\mathop {\tau '}\limits ^{m}}, \upsilon s', \upsilon e'\rangle \) and \(\mathcal{S}\mathcal{M}''=\langle \textrm{ME}'',\prec '', {\mathop {\hookrightarrow ''}\limits ^{1}},\cdots , {\mathop {\hookrightarrow ''}\limits ^{n}},{\mathop {\tau ''}\limits ^{1}},\cdots ,{\mathop {\tau ''}\limits ^{m}}, \upsilon s'', \upsilon e''\rangle \) be two structure models.

If \(\forall e\in \textrm{ME}'\cap \textrm{ME}'',\upsilon s'(e)=\upsilon s''(e)\wedge \upsilon e'(e)=\upsilon e''(e)\), and \(\prec '\cup \prec ''\) is a (irreflexive) partial order, the composition of \(\mathcal{S}\mathcal{M}'\) and \(\mathcal{S}\mathcal{M}''\) is defined as \(\mathcal{S}\mathcal{M}'\uplus \mathcal{S}\mathcal{M}''=\) \(\langle \textrm{ME},\prec , {\mathop {\hookrightarrow }\limits ^{1}},\cdots , {\mathop {\hookrightarrow }\limits ^{n}},{\mathop {\tau }\limits ^{1}},\cdots ,{\mathop {\tau }\limits ^{m}}, \upsilon s, \upsilon e\rangle \) where \(\textrm{ME}=\textrm{ME}'\cup \textrm{ME}'', \prec =\prec '\cup \prec ''\), \(\forall i\in \{1,\cdots ,n\}\): \({\mathop {\hookrightarrow }\limits ^{i}}={\mathop {\hookrightarrow '}\limits ^{i}}\cup {\mathop {\hookrightarrow ''}\limits ^{i}}\), \(\forall j\in \{1,\cdots ,m\}: {\mathop {\tau }\limits ^{j}}={\mathop {\tau '}\limits ^{j}}\cup {\mathop {\tau ''}\limits ^{j}}\), \(\forall e\in \textrm{ME}': \upsilon s(e)=\upsilon s'(e)\wedge \upsilon e(e)=\upsilon e'(e)\), and \(\forall e\in \textrm{ME}'': \upsilon s(e)=\upsilon s''(e)\wedge \upsilon e(e)=\upsilon e''(e) \). \(\mathcal{S}\mathcal{M}'\), \(\mathcal{S}\mathcal{M}''\) are said to be composable.

Fig. 3.
figure 3

The composition of two structure models

Here, when there are different types of relations between model elements in two composable structure models, we need to equivalently translate them into the two structure models which have the same relations before composition. For instance, in Fig. 3, there exist two types of relations in \(\mathcal{S}\mathcal{M}_1\): containment and trace, while there exist two types of relations in \(\mathcal{S}\mathcal{M}_2\): trace and include. \(\mathcal{S}\mathcal{M}_1\) and \(\mathcal{S}\mathcal{M}_2\) can be translated into the following two structure models \(\mathcal{S}\mathcal{M}'\) and \(\mathcal{S}\mathcal{M}''\), respectively.

$$\mathcal{S}\mathcal{M}'=\langle \textrm{ME}',\prec ',{\mathop {\hookrightarrow '}\limits ^{trace}},{\mathop {\hookrightarrow '}\limits ^{include}}, {\mathop {\tau '}\limits ^{requirements}},{\mathop {\tau '}\limits ^{design}}, \upsilon s', \upsilon e'\rangle \text { with }{\mathop {\hookrightarrow '}\limits ^{include}}=\emptyset ,$$
$$\mathcal{S}\mathcal{M}''=\langle \textrm{ME}'',\prec '',{\mathop {\hookrightarrow ''}\limits ^{trace}},{\mathop {\hookrightarrow ''}\limits ^{include}}, {\mathop {\tau ''}\limits ^{requirements}},{\mathop {\tau ''}\limits ^{design}}, \upsilon s'', \upsilon e''\rangle \text { with }\prec ''=\emptyset .$$

Obviously, \(\mathcal{S}\mathcal{M}_1=\mathcal{S}\mathcal{M}'\) and \(\mathcal{S}\mathcal{M}_2=\mathcal{S}\mathcal{M}''\). Thus \(\mathcal{S}\mathcal{M}_1\) and \(\mathcal{S}\mathcal{M}_2\) have the same type of relations after translation. Note that we need to perform a similar translation when composing two structure models with different types of classifications. By Definition 5, \(\mathcal{S}\mathcal{M}_1\) and \(\mathcal{S}\mathcal{M}_2\) are composable. The structure model \(\mathcal{S}\mathcal{M}_3\) is the composition of \(\mathcal{S}\mathcal{M}_1\) and \(\mathcal{S}\mathcal{M}_2\) in Fig. 3.

Proposition 3

Let \(\mathcal{S}\mathcal{M}\), \(\mathcal{S}\mathcal{M}'\) and \(\mathcal{S}\mathcal{M}''\) be three structure models. And let every two of the three structure models be composable. Then

  1. (1)

    \(\mathcal{S}\mathcal{M}'\uplus \mathcal{S}\mathcal{M}''\) is a structure model.

  2. (2)

    \(\mathcal{S}\mathcal{M}'\uplus \mathcal{S}\mathcal{M}''=\mathcal{S}\mathcal{M}''\uplus \mathcal{S}\mathcal{M}'\).

  3. (3)

    \((\mathcal{S}\mathcal{M}\uplus \mathcal{S}\mathcal{M}')\uplus \mathcal{S}\mathcal{M}''=\mathcal{S}\mathcal{M}\uplus (\mathcal{S}\mathcal{M}'\uplus \mathcal{S}\mathcal{M}'')\).

Proof

This proof is straightforward.

Proposition 3 states that the composition of structure models has closure, commutativity and associativity.

Theorem 1

Let \(\mathcal{S}\mathcal{M},\mathcal{S}\mathcal{M}'\) be two composable structure models. If \(\mathcal{S}\mathcal{M}\) and \(\mathcal{S}\mathcal{M}'\) are traceable, then \(\mathcal{S}\mathcal{M}\uplus \mathcal{S}\mathcal{M}'\) is traceable.

Proof

Assume that \(\exists \mathcal{S}\mathcal{M}''=\mathcal{S}\mathcal{M}\uplus \mathcal{S}\mathcal{M}'=\langle \textrm{ME}'',\prec '',{\mathop {\hookrightarrow ''}\limits ^{1}},\cdots , {\mathop {\hookrightarrow ''}\limits ^{n}}, {\mathop {\tau ''}\limits ^{1}},\cdots ,{\mathop {\tau ''}\limits ^{m}}, \upsilon s'', \upsilon e''\rangle \) and \({\mathop {\tau ''}\limits ^{requirements}}\in \{{\mathop {\tau ''}\limits ^{1}},\cdots ,{\mathop {\tau ''}\limits ^{m}}\}\). By Definition 5, \(\forall e\in \textrm{ME}'':e\in \textrm{ME}\vee e\in \textrm{ME}'\). When \(e\in \textrm{ME}\), since \(\mathcal{S}\mathcal{M}\) is traceable, according to Definition 3 (3), \(\exists dc_1=x_1\cdots x_p\in DC(\mathcal{S}\mathcal{M}): e\in \hat{dc_1}, x_p\in {\mathop {\tau }\limits ^{requirements}}\) and \( \forall i\in \{1,\cdots , p-1\}, \upsilon s(x_{i+1})\le \upsilon s(x_{i})\). When \(e\in \textrm{ME}'\), since \(\mathcal{S}\mathcal{M}'\) is traceable, according to Definition 3 (3), \(\exists dc_2=y_1\cdots y_q\in DC(\mathcal{S}\mathcal{M}'): e\in \hat{dc_2}, y_q\in {\mathop {\tau }\limits ^{requirements}}\) and \( \forall i\in \{1,\cdots , q-1\}, \upsilon s'(y_{i+1})\le \upsilon s'(y_{i})\). By Proposition 2, \(\mathcal{S}\mathcal{M}\sqsubseteq \mathcal{S}\mathcal{M}''\wedge \mathcal{S}\mathcal{M}'\sqsubseteq \mathcal{S}\mathcal{M}'':DC(\mathcal{S}\mathcal{M})\subseteq DC(\mathcal{S}\mathcal{M}'')\wedge DC(\mathcal{S}\mathcal{M}')\subseteq DC(\mathcal{S}\mathcal{M}'')\). Moreover, \(\upsilon s''(e)=\upsilon s(e)\vee \upsilon s''(e)=\upsilon s'(e)\), thus \(\exists dc_3=z_1\cdots z_k\in DC(\mathcal{S}\mathcal{M}''): e\in \hat{dc_3}, z_k\in {\mathop {\tau }\limits ^{requirements}}\) and \( \forall i\in \{1,\cdots , k-1\}, \upsilon s''(z_{i+1})\le \upsilon s''(z_{i})\). By Definition 3 (3), \(\mathcal{S}\mathcal{M}\uplus \mathcal{S}\mathcal{M}'\) is traceable.

This theorem shows that the composition of two traceable structure models is traceable.

3.2 Restriction

In Sect. 3.1, we have introduced the composition operation which describes a structure model becoming increasingly large and complex. By contrary, there exists some research on model decomposition [1, 9, 45], which can automatically or semi-automatically obtain sub-models. Thus, we next discuss how to decompose a structure model by restriction.

Definition 6

Let \(\mathcal{S}\mathcal{M}=\langle \textrm{ME},\prec ,{\mathop {\hookrightarrow }\limits ^{1}},\cdots , {\mathop {\hookrightarrow }\limits ^{n}},{\mathop {\tau }\limits ^{1}},\cdots ,{\mathop {\tau }\limits ^{m}}, \upsilon s, \upsilon e\rangle \) be a structure model. For \(X\subseteq \textrm{ME}\), the restriction of \(\mathcal{S}\mathcal{M}\) to X is defined as a structure model \(\mathcal{S}\mathcal{M}|_X=\langle \textrm{ME}',\prec ',{\mathop {\hookrightarrow '}\limits ^{1}},\cdots , {\mathop {\hookrightarrow '}\limits ^{n}}, {\mathop {\tau '}\limits ^{1}},\cdots ,{\mathop {\tau '}\limits ^{m}}, \upsilon s', \upsilon e'\rangle \) with

  • \(\textrm{ME}'=X\),

  • \(\prec '=\{(x,y)|\forall x,y\in X, (x,y)\in \prec \}\),

  • \(\forall i\in \{1,\cdots ,n\}:{\mathop {\hookrightarrow '}\limits ^{i}} =\{(x,y)| \forall x,y\in X, (x,y)\in {\mathop {\hookrightarrow }\limits ^{i}} \}\) ,

  • \(\forall j\in \{1,\cdots ,m\}:{\mathop {\tau '}\limits ^{j}}= X \cap {\mathop {\tau }\limits ^{j}}\) ,

  • \(\forall x\in X, \upsilon s'(x)=\upsilon s(x)\), and

  • \(\forall x\in X, \upsilon e'(x)=\upsilon e(x)\).

Here, we can obtain various substructures of a structure model by restriction.

Proposition 4

Let \(\mathcal{S}\mathcal{M}=\langle \textrm{ME},\prec ,{\mathop {\hookrightarrow }\limits ^{1}},\cdots , {\mathop {\hookrightarrow }\limits ^{n}}, {\mathop {\tau }\limits ^{1}},\cdots ,{\mathop {\tau }\limits ^{m}}, \upsilon s, \upsilon e\rangle \) be a structure model. Then \(\forall X\subseteq \textrm{ME}, \mathcal{S}\mathcal{M}|_X\sqsubseteq \mathcal{S}\mathcal{M}\).

Proof

According to Definition 4 and Definition 6, the result obviously holds.

The restriction of a structure model must be the substructure of the original structure model, but the substructure of a structure model may not be the restriction of the original model. For example, both \(\mathcal{S}\mathcal{M}_2\) and \(\mathcal{S}\mathcal{M}_3\) are obviously proper substructures of \(\mathcal{S}\mathcal{M}_1\) (see Fig. 4) and thus \(\mathcal{S}\mathcal{M}_2\sqsubset \mathcal{S}\mathcal{M}_1\), \(\mathcal{S}\mathcal{M}_3\sqsubset \mathcal{S}\mathcal{M}_1\). Then \(\mathcal{S}\mathcal{M}_2\) is obviously a restriction of \(\mathcal{S}\mathcal{M}_1\) to \(X=\{R,U1,U2\}\), but \(\mathcal{S}\mathcal{M}_3\) is not \(\mathcal{S}\mathcal{M}_1|_X\).

Fig. 4.
figure 4

An example for explaining the restriction of a structure model

Proposition 5

Let \(\mathcal{S}\mathcal{M}=\langle \textrm{ME},\prec ,{\mathop {\hookrightarrow }\limits ^{1}},\cdots , {\mathop {\hookrightarrow }\limits ^{n}}, \upsilon s, \upsilon e\rangle \) be a structure model. If \(\forall X,Y\subseteq \textrm{ME}: \mathcal{S}\mathcal{M}|_X\uplus \mathcal{S}\mathcal{M}|_Y =\mathcal{S}\mathcal{M} \), then \(X\cup Y=\textrm{ME}\).

Proof

According to Definition 5 and Definition 6, the result clearly holds.

This proposition shows that when two substructures are composed to obtain the original structure model, the two substructures necessarily contain all the model elements of the original structure model.

Theorem 2

Let \(\mathcal{S}\mathcal{M}=\langle \textrm{ME},\prec ,{\mathop {\hookrightarrow }\limits ^{1}},\cdots , {\mathop {\hookrightarrow }\limits ^{n}},{\mathop {\tau }\limits ^{1}},\cdots ,{\mathop {\tau }\limits ^{m}}, \upsilon s, \upsilon e\rangle \) be a structure model and \(\mathcal{S}\mathcal{M}\) be traceable. Then \(\forall X\subseteq \textrm{ME}\), \(\mathcal{S}\mathcal{M}|_X\) is vertically traceable.

Proof

This proof is straightforward.

Theorem 2 states that vertical traceability can be preserved after the restriction. However, when the restricted structure model contains isolated elements or does not contain requirements, horizontal traceability is obviously influenced.

3.3 Refinement

Refinement is another important operation during the software development process. It means that a model at higher abstraction level is transformed into the corresponding concrete one at lower abstraction level. Many researchers have studied the refinement operation based on different models [1, 14, 46]. We here investigate the refinement operation between structure models in this subsection.

We assume a fixed set \({\textbf {ME}}\) of model elements. Let \({\textbf {SM}}\) denotes the set of all structure models. The empty structure model \(\langle \emptyset ,\emptyset ,\emptyset ,\cdots , \emptyset , \emptyset , \emptyset \rangle \) is denoted by \(\varnothing \).

Definition 7

(1) A refinement function for structure models is a total function \(ref: {\textbf {ME}} \rightarrow {\textbf {SM}}\backslash \{\varnothing \}\) such that \(\forall e\in {\textbf {ME}}, ref(e)=\langle \textrm{ME},\prec ,{\mathop {\hookrightarrow }\limits ^{1}},\cdots , {\mathop {\hookrightarrow }\limits ^{n}}, {\mathop {\tau }\limits ^{1}},\cdots ,{\mathop {\tau }\limits ^{m}},\upsilon s, \upsilon e\rangle \) where

  • \(\exists e'\in \textrm{ME}\), \(e'\) is a copy of e, \(\prec = \{(x,e')\mid x\in \textrm{ME}\setminus \{e'\}\}\), and

  • \(\forall x,y\in \textrm{ME},\upsilon s(x)=\upsilon s(y)\wedge \upsilon e(x)=\upsilon e(y)\).

(2) Let \(\mathcal{S}\mathcal{M}=\langle \textrm{ME},\prec ,{\mathop {\hookrightarrow }\limits ^{1}},\cdots , {\mathop {\hookrightarrow }\limits ^{n}}, {\mathop {\tau }\limits ^{1}},\cdots ,{\mathop {\tau }\limits ^{m}},\upsilon s, \upsilon e\rangle \) be a structure model. Let \(e\in \textrm{ME}\), \(ref(e)=\langle \textrm{ME}',\prec ',{\mathop {\hookrightarrow '}\limits ^{1}},\cdots , {\mathop {\hookrightarrow '}\limits ^{n}}, {\mathop {\tau '}\limits ^{1}},\cdots ,{\mathop {\tau '}\limits ^{m}},\upsilon s', \upsilon e'\rangle \) such that \(\forall x\in \textrm{ME}',\upsilon s'(x)\ge \upsilon s(e)\wedge \upsilon e'(x)\ge \upsilon e(e)\). Moreover, let \(\textrm{ME}\cap \textrm{ME}'=\emptyset \). The refinement of \(\mathcal{S}\mathcal{M}\) under ref(e) is the structure model \(\mathcal{S}\mathcal{M}[e.ref(e)]=\langle \textrm{ME}'',\prec '',{\mathop {\hookrightarrow ''}\limits ^{1}},\cdots , {\mathop {\hookrightarrow ''}\limits ^{n}}, {\mathop {\tau ''}\limits ^{1}},\cdots ,{\mathop {\tau ''}\limits ^{m}},\upsilon s'', \upsilon e''\rangle \) with

  • \(\textrm{ME}''=\textrm{ME}\cup \textrm{ME}'\),

  • \(\prec ''= \prec \cup \prec '\),

  • \(\exists i\in \{1,\cdots ,n\}:{\mathop {\hookrightarrow ''}\limits ^{i}}={\mathop {\hookrightarrow ''}\limits ^{refine}}={\mathop {\hookrightarrow }\limits ^{refine}} \cup {\mathop {\hookrightarrow '}\limits ^{refine}}\cup \{(e',e)\}\),

  • \(\forall j\in \{1,\cdots ,n\}\backslash \{i\}: {\mathop {\hookrightarrow ''}\limits ^{j}}={\mathop {\hookrightarrow }\limits ^{j}}\cup {\mathop {\hookrightarrow '}\limits ^{j}}\),

  • \(\forall k\in \{1,\cdots ,m\}: {\mathop {\tau ''}\limits ^{k}}={\mathop {\tau }\limits ^{k}}\cup {\mathop {\tau '}\limits ^{k}}\),

  • \(\forall p\in \mathrm {ME'}: \upsilon s''(p)=\upsilon s'(p)\wedge \upsilon e''(p)=\upsilon e'(p)\), and

  • \(\forall q\in \textrm{ME}: \upsilon s''(q)=\upsilon s(q) \wedge \upsilon e''(q)=\upsilon e(q)\).

Definition 7 shows that if the model element of a structure model is refined, then the newly obtained structure model is the refinement of the original one. In order not to pollute the relations between model elements in the original structure model [7, 40], we choose to first copy the model element to be refined and then create new relations based on the replica. Thus the refinement operation includes the following steps:

  1. (1)

    copy e as the new model element \(e'\).

  2. (2)

    assign \(e'\) a new version number, then create the containment relation between \(e'\) and all model elements of ref(e) respectively (although the model elements of ref(e) are a more concrete representation of e, they may be regarded as descendants of e).

  3. (3)

    establish the refine relation between e and \(e'\).

The refinement operation between structure models can be clearly represented graphically. For distinguishing between different version numbers, once a model element is assigned to a new version number, the old version number will become grey in our tool.

For example, we assume that the model element U1 of \(\mathcal{S}\mathcal{M}_1\) needs to be refined in Fig. 5 and there exists a structure model \(\mathcal{S}\mathcal{M}_2=ref(U1)\), then the refinement of \(\mathcal{S}\mathcal{M}_1\) under ref(U1) can be represented by a new structure model \(\mathcal{S}\mathcal{M}_3=\mathcal{S}\mathcal{M}_1[U1.ref(U1)]\).

Fig. 5.
figure 5

An example for interpreting the refinement of a structure model

Proposition 6

Let \(\mathcal{S}\mathcal{M}_1=\langle \textrm{ME}_1,\prec _1,{\mathop {\hookrightarrow _1}\limits ^{1}},\cdots , {\mathop {\hookrightarrow _1}\limits ^{n}}, {\mathop {\tau _1}\limits ^{1}},\cdots ,{\mathop {\tau _1}\limits ^{m}},\upsilon s_1, \upsilon e_1\rangle \) be a structure model. Moreover, let ref be a refinement function for structure models. If \(\forall e\in \textrm{ME}_1:\mathcal{S}\mathcal{M}_2=\mathcal{S}\mathcal{M}_1[e.ref(e)]\), then \(\mathcal{S}\mathcal{M}_1=\mathcal{S}\mathcal{M}_2|_{\textrm{ME}_1}\).

Proof

Since \(\mathcal{S}\mathcal{M}_2\) is a refinement of \(\mathcal{S}\mathcal{M}_1\) under ref(e), by Definition 7 (2), all model elements and relations in \(\mathcal{S}\mathcal{M}_1\) are not change. Assume \(X=\textrm{ME}_1\), since \(X\subseteq \textrm{ME}_2\), by Definition 6, \(\mathcal{S}\mathcal{M}_1\) is a restriction of \(\mathcal{S}\mathcal{M}_2\) to X. Thus \(\mathcal{S}\mathcal{M}_1=\mathcal{S}\mathcal{M}_2|_{\textrm{ME}_1}\).

This proposition states that if a structure model is refined, then the newly obtained structure model preserves all the model elements of the original structure model.

Theorem 3

Let \(\mathcal{S}\mathcal{M}_1=\langle \textrm{ME}_1,\prec _1,{\mathop {\hookrightarrow _1}\limits ^{1}},\cdots , {\mathop {\hookrightarrow _1}\limits ^{n}}, {\mathop {\tau _1}\limits ^{1}},\cdots ,{\mathop {\tau _1}\limits ^{m}}, \upsilon s_1, \upsilon e_1\rangle \) be a structure model and \(\mathcal{S}\mathcal{M}_1\) be traceable. Moreover, let ref be a refinement function for structure models. If \(\forall e\in \textrm{ME}_1:\mathcal{S}\mathcal{M}_2=\mathcal{S}\mathcal{M}_1[e.ref(e)]\), then \(\mathcal{S}\mathcal{M}_2\) is traceable.

Proof

Assume that \(\mathcal{S}\mathcal{M}_2=\langle \textrm{ME}_2,\prec _2,{\mathop {\hookrightarrow _2}\limits ^{1}},\cdots , {\mathop {\hookrightarrow _2}\limits ^{n}}, {\mathop {\tau _2}\limits ^{1}},\cdots ,{\mathop {\tau _2}\limits ^{m}}, \upsilon s_2, \upsilon e_2\rangle \). Since \(\mathcal{S}\mathcal{M}_2\) is a refinement of \(\mathcal{S}\mathcal{M}_1\) under ref(e), by Definition 7, \(\forall x\in \textrm{ME}_2: x\in \textrm{ME}_1\vee x\in \textrm{ME}_2\setminus \textrm{ME}_1\). When \(x\in \textrm{ME}_1\), since \(\mathcal{S}\mathcal{M}_1\) is traceable, by Definition 3 (3), \(\exists dc_1=x_1\cdots x_p\in DC(\mathcal{S}\mathcal{M}_1): x\in \hat{dc_1}, x_p\in {\mathop {\tau }\limits ^{requirements}}\) and \( \forall i\in \{1,\cdots , p-1\}, \upsilon s_1(x_{i+1})\le \upsilon s_1(x_{i})\). Moreover, by Proposition 4 and Proposition 6, \(\mathcal{S}\mathcal{M}_1\sqsubseteq \mathcal{S}\mathcal{M}_2\). According to Proposition 2, \(DC(\mathcal{S}\mathcal{M}_1)\subseteq DC(\mathcal{S}\mathcal{M}_2)\), thus \(dc_1\in DC(\mathcal{S}\mathcal{M}_2)\). When \(x\in \textrm{ME}_2\setminus \textrm{ME}_1\), by Definition 7, \(\exists dc_2=y_1\cdots y_q\in DC(\mathcal{S}\mathcal{M}_2): y_q=e\wedge y_1=x\) and \(\upsilon s_2(e)\le \upsilon s_2(x)\). Thus \(\exists dc_3=z_1\cdots z_k\in DC(\mathcal{S}\mathcal{M}_2): x\in \hat{dc_3}, z_k\in {\mathop {\tau }\limits ^{requirements}}\) and \( \forall i\in \{1,\cdots , k-1\}, \upsilon s_2(z_{i+1})\le \upsilon s_2(z_{i})\). By Definition 3 (3), \(\mathcal{S}\mathcal{M}_2\) is traceable.

This theorem states that the refinement operation does not change traceability in the software development process.

4 Tool: Formalized Software Management System

In order to facilitate the investigation of traceability, a prototype tool named Formalized Software Management System (FSMS) was developed with JavaScript. This tool mainly consists of two independently developed modules (See Fig. 6 for the main interface). One is a draggable drawing modules with traceability management, the other is the consistency management module. This paper is dedicated to the former.

Developers can use different components in the toolbox to visualize the artifacts and the relations between them in the software system, and then the graphic representation of this system can be automatically converted to JSON format for storage. Moreover, this tool implements several analysis functions based on the structure model, such as change impact analysis. The tool has been deployed on the website http://124.220.63.75/ now. More functions will be added to this tool step by step.

5 Related Work

Software traceability has been studied for many years (see References [2, 36, 51] for surveys). Most of the studies (see References [4, 8, 15] for surveys) involving traceability do not apply formal methods, but our work is based on formal methods to rigorously model traceability and to analysis traceability. Therefore, we only provide some comparisons between our work and the most related work on traceability modeling and change impact analysis.

Fig. 6.
figure 6

The main interface of the prototype tool

Goknil et al. [19, 20] mainly focus on the well-defined traceability between the requirement (R) and architecture (A), and they have investigated the reasoning, consistency checking and automatic generation of trace relations based on the system’s specifications. Rempel and Mader [43] focus only on the relations between the requirement and implementation (code) phases. However, we hold that the artifacts in the following phases depend on those in the preceding phases, so traceability between non-adjacent or abstract phases may be less practical to lose some implicit details in software lifecycle process.

In [26], six types of traceability relations, which are used to analyze the evolution of the requirement and architecture, are introduced: goal dependency, temporal dependency, service dependency, conditional dependency, task dependency and infrastructure dependency. Lago et al. [30] give a scoped approach to identify core traceability paths and distinguish four relations in R &A: drive, depend-on, modify and influence. Based on the above two studies, Goknil et al. [20] generalize these trace types into within-model traces (refines, requires) and between-model traces (satisfies, allocatedTo). Spanoudakis et al. [47] propose a rule-based method for establishing traceability. This method contains two rules: requirements-to-object-model (RTOM) and inter-requirement traceability (IREQ). Then based on RTOM and IREQ, four different types of traceability relations can be generated between requirement statements and use cases. Cleland-Huang et al. [10] present a novel concept. They consider that the relations between various software artifacts as a direct or indirect link which can be dynamically modified by event notifications. Obviously, neither the different number of relations nor event-based links can be applied to a large-scale development with a very large variety of traceability relations between artifacts. However, our structure model does not limit the number of relation types.

Antoniol et al. [6] propose a seminal approach to retrieve traceability relations between documentation and code based on probability. Their work uses the information retrieval algorithm to calculate the similarity between two artifacts to establish links between them. However, the precision is not high enough, resulting in the possibility of generating incorrect links. In order to establish high quality links, De Lucia et al. [16] introduce a method to understand the semantic context between artifacts using Latent Semantic Indexing, and in [23, 47], how to connect heuristic rules with link retrieval is discussed. Obviously, these methods can reduce the number of incorrect links, but cannot avoid the error. In [17], a platform called Tarskil is used to build the semantics of interactive traceability, which is based on first-order relational logic, thus simplifying the reasoning of relations and consistency checking. Santos et al. [44] use a tool named TIRT to show data with list, which is a kind of one-dimensional approach. However, these two researches mentioned above do not well support the many-to-many relation between artifacts. In Holten’s work [25], he propose to store adjacency relations by a two-dimensional approach, such as traceability matrix. Although this approach is easily understood by the stakeholder and saves storage space, the change of traceability matrix becomes complicated when the structure of the system is changed.

Laghouaouta et al. [29] use the softgoal tree to manage the traceability of the composition of multi-view models in the Model Driven Engineering (MDE). In [7, 27], some approaches to merging trackable links on-demanded are discussed. Obviously, these studies are completely different from our work.

6 Conclusion and Future Work

In this paper, we have investigated how to ensure traceability under the operations such as composition, restriction and refinement in software development process. To demonstrate the availability of these results, we have developed a prototype tool called FSMS that enables traceability visualization.

In future work, we will explore how to make the visualization of our tool better. We also wish to integrate our tool with other software development platforms. On the other hand, automated code generation is another important field of our research based on formal methods. Since automated code generation techniques heavily rely on the traceability, we will focus on how to automatically generate high quality code in the software development process.