1 Introduction

Assurance cases (ACs) can be very complex; e.g., an assurance case for an air traffic control system may comprise over 500 pages and 400 referenced documents [34]. Tools to support safety engineers in creating, maintaining and analysing ACs have been developed. For example, Resolute [23] can automatically generate ACs based on a system’s architectural models, while AGSN [35] supports the assessment of an AC’s validity. The development of these tools has been enabled by the introduction of formal syntaxes for ACs, such as the Goal Structuring Notation (GSN) [28]. In this paper, we aim to perform a systematic review of the progress made in the development of tools for ACs. To the best of our knowledge, this is the first such study. More specifically, the main contributions of this work are (1) a comprehensive list of AC tools developed over the past 20 years; and (2) an analysis of these tools according to their functionality.

The remainder of this paper is organised as follows. Sect. 2 presents our methodology for finding and comparing AC tools. In Sect. 3, we present and summarise our findings and potential threats to validity. We conclude by discussing the implications of our work in Sect. 4.

2 Methodology

We carried out a Systematic Literature Review (SLR) in order to establish a complete list of AC tools and provide a comprehensive assessment of their features. Our SLR followed a simplified version of the guidelines proposed by Kitchenham et al. [8], as well as the search strategy proposed by Zhang et al. [46]. The search consisted of three stages: (1) establishing a quasi-gold standard (QGS) through a manual search of different publication venues, (2) an automated literature search of digital libraries, e.g., Springer Link and IEEE Xplore, and (3) a web-based search for commercial tools and tools that may not have been mentioned in publications. We describe these steps below.

Manual Search and Establishing the QGS. A QGS is a set of high-quality studies on a research topic from the related publication venues, e.g., domain-specific conferences and journals recognised by the community, for a given time span [46]. To create a QGS, relevant publication venues are identified and manually searched in order to retrieve studies that serve as a benchmark for the subsequent automated search. Through consultation with domain experts, we identified six major conferences and journals that published research on ACs: (1) SAFECOMP (International Conference on Computer Safety, Reliability, & Security), (2) HASE (International Symposium on High Assurance Systems Engineering), (3) IMBSA (International Symposium on Model-Based Safety and Assessment), (4) ISSRE (International Symposium on Software Reliability Engineering), (5) Reliability Engineering & System Safety (journal), and (6) COMPSAC (International Conference on Computers, Software & Applications). We performed a manual search through the proceedings of these venues, including all associated workshops, for 2015–17 inclusive, yielding 10 relevant AC tool papers, which established our QGS.

Defining the Search String and Performing the Automated Search. Through a careful examination of the papers in our QGS, we constructed the search string “(“Safety Assurance” OR GSN OR SACM OR “Safety Case” OR “Safety Cases” OR “Assurance Case” OR “Assurance Cases” OR “Safety Compliance”) AND (Editor OR Tool OR Editors OR Tools OR Toolset OR Toolsets)”. We used it to conduct an automated literature search on IEEE Xplore, Engineering Village, ACM Digital Library and Springer Link, restricted to papers written in English and published after 1998.

IEEE Xplore, Engineering Village, ACM Digital Library, and Springer Link returned 112, 739, 21, and 80 papers respectively, for a total of 952 papers. Checking the results against our QGS, the automated search captured 8 of its 10 papers, achieving the recommended 80% sensitivity [46]. After filtering out duplicate papers, papers not accessible in full text, and irrelevant papers (based on a manual review of their abstracts or full text), we identified 82 papers.
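The sensitivity criterion can be checked with a one-line calculation (a minimal sketch; the function name is ours, and the 80% threshold is the one recommended by Zhang et al. [46]):

```python
# Sensitivity of an automated search with respect to a quasi-gold standard (QGS):
# the fraction of QGS papers that the automated search retrieves [46].

def sensitivity(qgs_size: int, qgs_captured: int) -> float:
    """Return the fraction of the QGS captured by the automated search."""
    return qgs_captured / qgs_size

# Our automated search captured 8 of the 10 QGS papers.
s = sensitivity(qgs_size=10, qgs_captured=8)
print(f"sensitivity = {s:.0%}")  # prints "sensitivity = 80%"
```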

Performing the Web-Based Search. To obtain knowledge about commercial AC tools, tools that were published but not found by our literature search, and tools that were simply not mentioned in publications, we conducted a web-based search using Google as the search engine. We used the same search string as for the literature search and reviewed the first 100 results. This step yielded eight additional tools.

Table 1. Tool functionality categories and the corresponding degrees of support.

Evaluating the Tools. Having read all of the publications and resources gathered by our searches, we established six distinct recurring tool functionalities and used them as the basis for our evaluation. These functionalities are categorised as AC creation, maintenance, assessment, collaboration, reporting and integration (see Table 1). We then defined four levels of tool support for each category, ranging from D (no support) to A (strong support), thus creating our grading criteria, and graded each tool's degree of support for each category using information from the publications and web resources. Since information in some publications can be out of date, we made an effort to use the newest publications so as to arrive at a more accurate evaluation. Note that our evaluation is based purely on the information found in these resources rather than on hands-on testing of the tools.
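The resulting grading scheme can be captured as a small data structure (an illustrative sketch: the categories and levels follow Table 1, but the example grades below are hypothetical and not taken from our evaluation):

```python
# Six functionality categories and four support levels (A = strong ... D = none),
# as used in our grading criteria. The example grades below are hypothetical.
CATEGORIES = ["creation", "maintenance", "assessment",
              "collaboration", "reporting", "integration"]
LEVELS = ["A", "B", "C", "D"]  # strong, moderate, minimal, no support

def validate(grades: dict) -> bool:
    """Check that a tool's grade record covers every category with a valid level."""
    return set(grades) == set(CATEGORIES) and all(g in LEVELS for g in grades.values())

example_tool = {"creation": "B", "maintenance": "C", "assessment": "B",
                "collaboration": "D", "reporting": "D", "integration": "C"}
assert validate(example_tool)
```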

3 Results

Our systematic literature review discovered a total of 46 AC tools. Eight of these tools (AssureNote [1], PREEVision [3], SMS Pro [4], Artisan GSN modeler [2], Assure-It [45], SEAS [12], TurboAC [5] and eDependabilityCase [33]) were discovered by our web search; two (MMINT-A [22] and Resolute [23]) were identified with the help of domain experts, and the remainder were found by our literature search. Nine tools (AssureNote [1], DECOS Test Bench [11], e-Safety Case [32], GSN CaseMaker ERA [32], ISIS High Integrity Solutions [32], PREEVision [3], SCAPT [10], SEAS [12] and SMS Pro [4]) did not provide sufficient information for an informed evaluation and are thus excluded from further discussion.

Of the 37 remaining AC tools (see Table 2), 32 offer support for GSN [6]. Exceptions include Modus [44] (a plug-in for Enterprise Architect), ACBuilder [27] and NOR-STA [24], which have their own notations. Multiple tools (e.g., CertWare [13] and ASCE [40]) also support a variety of other notations, such as the Structured Assurance Case Metamodel (SACM) [7] and Claims-Arguments-Evidence (CAE) [15]. Our findings also show that most of the tools are not domain-specific, meaning that they can be used to construct ACs for military, automotive, medical, and nuclear systems, among others. Exceptions include ACBuilder [27] (hardware security analysis) and TurboAC [5] (medical devices). Non-domain-specific tools (e.g., D-Case Editor [37]) are marked with a hyphen in the domain column of Table 2.

Table 2. General tool information.
Table 3. Evaluation of capabilities of individual tools.

3.1 Evaluation of the Tools and Discussion

Each tool was manually evaluated for its support in the previously established categories, with the results shown in Table 3. Figure 1 shows the overall grade distribution for each category. To simplify visualisation, all split grades have been rounded up to the higher grade.
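The rounding rule for split grades can be stated compactly (a minimal sketch; the grade letters follow our scale, where A is strong support and D is none):

```python
# Round a split grade such as "B/C" up to its higher level, as done for Fig. 1.
ORDER = "ABCD"  # A = strong support ... D = no support

def round_up(grade: str) -> str:
    """Collapse a split grade (e.g. 'B/C') to the higher of its two levels."""
    parts = grade.split("/")
    return min(parts, key=ORDER.index)  # 'A' sorts highest

assert round_up("B/C") == "B"  # split grade collapses upward
assert round_up("A") == "A"    # plain grades pass through unchanged
```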

Creation. Support for the creation of ACs primarily ranges between minimal (43%) and moderate (49%) (see Fig. 1(a)). The notable exceptions, ENTRUST [16] and Resolute [23], offer strong support by automatically generating ACs from underlying system and/or behavioural models. As previously mentioned, however, these tools are domain-specific: unless modified, their use is confined to the specific underlying architectural languages, models, etc., that they support. To our knowledge, a tool that can automatically generate complete ACs for a broad range of domains is yet to be developed. Based on these observations, it would seem that the benefits obtained by creating a strong dependency between ACs and system models come at the cost of flexibility and generalised usability.

Maintenance. Again, the vast majority of tools provide either minimal (51%) or moderate (41%) support for maintenance (see Fig. 1(b)). Tools with moderate support often allow the linking of evidence, models and other artefacts to the corresponding AC elements, making it easy to notify the user of the impacts of a change. In turn, ENTRUST [16] and ETB [19] offer strong support by automatically reflecting artefact changes in the AC. ETB [19] allows the incorporation of third-party tools for generating evidence and logs the timestamps of their invocations in order to determine which analyses are out of date with respect to the current development artefacts, re-running those that are not synchronised. ENTRUST [16] is tightly coupled with the design-time and runtime models of a system; it can dynamically verify self-adaptive systems at runtime and update their ACs as necessary.

Fig. 1. Overall AC tool support for: (a) creation, (b) maintenance, (c) assessment, (d) collaboration, (e) reporting and (f) integration.

Assessment. Figure 1(c) shows that the results for AC assessment are more evenly distributed among the levels of support than in the other functional categories, with the majority of tools offering moderate support (38%). This category also has the highest percentage of strong support (19%). Unlike creation and maintenance, however, 19% of tools offer no support for assessment. Furthermore, no correlation is seen between support for assessment and any other category, implying that assessment is a fairly standalone tool functionality whose support does not largely depend on the other categories.

Collaboration and Reporting. Most of the tools we surveyed offer no support for collaboration (68%) or reporting (57%). A pronounced trend (see Table 3) is that tools with support in these categories are usually industrial, such as ASCE [40], ISCaDE [32], NOR-STA [24], OpenCERT [31] and SCT: Safety Case Toolkit [9]. Perhaps such capabilities are not receiving adequate interest among researchers, and thus are being developed only after tools reach significant maturity, if at all.

Integration. Support for integration is split between moderate (40%) and none (38%). Not a single tool among those we evaluated offered strong support, indicating that some manual integration between other assurance lifecycle activities and the ACs is always required. Table 3 shows a strong correlation between high support for integration and high support for AC creation and maintenance. It would appear that a more integrated environment allows tighter coupling between various artefacts, such as system models and evidence, subsequently enabling automation through dependencies. As previously discussed, however, creating these dependencies may introduce limitations in other respects.

3.2 Threats to Validity

The main threat to the validity of our work is the completeness of our list of tools and tool information. Even though our search methodology is thorough, it may not have captured all existing AC tools. As discussed in Sect. 2, our evaluation was based only on information found in the corresponding tool's documentation, publications, website and other publicly available resources. It is possible that some functionality received a lower grade because it was not adequately described or the relevant resource was unavailable.

4 Summary and Conclusion

In this paper, we reported on a comprehensive identification and a preliminary evaluation of AC tools, comparing them w.r.t. several categories using the available documentation. In the future, we intend to refine our results using deeper analysis, through a systematic evaluation of the tools themselves.

Our experience shows that there is significant room for improvement of the tools in all of the discussed categories. Furthermore, it appears that several categories are interdependent, i.e., high support in one is strongly correlated with high support in another. For example, we expect that improvements in the integration category will significantly benefit other categories such as creation and maintenance. Yet, to the best of our knowledge, there is currently no tool that supports the seamless linking of the various assurance lifecycle processes.