1 Introduction

This paper reports an exercise undertaken by staff and students in the Empirical Software Engineering (ESE) group at NICTA (National ICT Australia) to evaluate the reporting guidelines for controlled experiments proposed by Jedlitschka and Pfahl (2005). In spite of the existence of a specialist book to help software engineers conduct experiments (Wohlin et al. 2000), software engineering experiments are still subject to criticism. The guidelines were developed in response to general criticisms of current standards of performing and reporting empirical studies (Kitchenham et al. 2002), and more specific criticisms that the lack of reporting standards is causing problems when researchers attempt to aggregate empirical evidence because important information is not reported or is reported in an inconsistent fashion (e.g. Pickard et al. 1998; Wohlin et al. 2003).

In fact, controlled experiments are performed infrequently in software engineering. In a recent survey of 5,453 software engineering articles from 12 leading conferences and journals, Sjøberg et al. (2005) found only 103 articles that could be categorized as experiments. However, there is evidence that current reporting practice is inadequate. Dybå et al. (2006) had to exclude 21 experiments from their analysis of power because the authors did not report enough information for a power analysis. Authors did not report any statistical analysis for 14 experiments and in seven cases the experiments were so badly documented that Dybå et al. “did not manage to track which tests answered which hypothesis or research question”. This result confirms the need for reporting guidelines for software engineering experiments.

Jedlitschka and Pfahl recognized that their guidelines needed to be evaluated, saying:

“Our proposal has not yet been evaluated e.g. through peer review by stakeholders, or by applying it to a significant number of controlled experiments to check its usability. We are aware that this proposal can only be the first step towards a standardized reporting guideline.” (Jedlitschka and Pfahl 2005)

We agree with the need for guidelines to be evaluated. If the guidelines are themselves flawed, they could make the problem of poor quality reporting worse than it is currently.

Our evaluation exercise took place between 5th October and 14th December 2005. It was organized as a series of eight working meetings, each taking between 1 and 2.5 h. In this paper, we report the evaluation method we used and the results of our evaluation. We have already reported our results to Jedlitschka and Pfahl, so the main purpose of this paper is to report our evaluation method, since it might prove useful to other groups wanting to evaluate the next version of the reporting guidelines or future reporting guidelines for other forms of empirical study such as case studies, surveys, or systematic reviews.

In Section 2 we give a brief overview of the proposed guidelines. In Section 3 we discuss the various options available for evaluating experimental guidelines and provide a rationale for our choice of perspective-based reviews. In Section 4 we report our evaluation process. In Section 5 we report our evaluation results. In Section 6 we discuss our results.

An earlier version of this paper was presented at ISESE06 (Kitchenham et al. 2006). In this paper, we have extended the report of our evaluation exercise to include:

  • A more detailed discussion of our evaluation, making it clear that we adopted a method based on perspective-based checklists.

  • Consideration of the advantages of the guidelines. We identify the questions in each perspective that were addressed by the guidelines.

  • The full list of amendments classified according to amendment type.

  • A list of questions that are applicable to all (or most) perspectives. This will enable other users of this evaluation method to separate general questions from perspective specific questions.

2 Proposed Reporting Guidelines

Jedlitschka and Pfahl (2005) propose the reporting structure for experiments shown in Table 1. Table 1 identifies the recommended section and subsection headings for a report of an experiment, together with a brief description of the information required in each section and a cross-reference to the subsection of the guidelines that discusses the information authors should supply.

Table 1 Proposed reporting structure

3 Evaluation Options

At our first working meeting, we discussed various theoretical and empirical evaluation methods and considered the viability of each type. Theoretical evaluation can be based on several different approaches:

  • T1. An assessment of each element in the guidelines from the viewpoint of why the element is included in the guidelines; what it is intended to accomplish in terms of supporting readers to find the information they are looking for; and what evidence there is to support the view that the element is important.

  • T2. A review of the process by which the guidelines were constructed identifying the validity of the source material, the aggregation of the source material, and the evaluation process.

  • T3. Reading the guidelines in order to detect defects and areas for improvement taking the viewpoint of different roles that might want to read a report of a software experiment (i.e. a form of perspective-based reading).

  • T4. Mapping any established experimental methodology guidelines to the reporting guidelines.

Empirical evaluation can be based on a variety of possible approaches, for example:

  • E1. Take a sample of published articles reporting experiments that were written without the support of the guidelines and identify whether the articles omit important information that would have been included if the guidelines had been followed. This is similar to the approach taken by Moher et al. (2001), who compared papers in journals that used the CONSORT guidelines with those that did not. The objection to this approach is that the guidelines being evaluated are the basis for their own evaluation.

  • E2. Take a sample of published articles reporting experiments and re-structure them to conform to the guidelines. Then use the duplicate versions as the experimental material in an experiment aimed at evaluating whether the guidelines make it easier a) to understand the papers and/or b) to extract standard information from the papers.

When deciding which evaluation process to undertake, we considered:

  1. Whether the evaluation approach itself was valid, i.e. likely to lead to a trustworthy assessment of the strengths and weaknesses of the guidelines.

  2. Whether the evaluation approach was feasible given our resources (effort, time and people).

  3. Whether the approach was cost effective given the value of the proposed guidelines. We noted that formal experiments are currently not often used in software engineering research. It is possible that industry case studies and surveys might be more relevant.

  4. Whether the approach provided a good learning opportunity for our research group. This was an important issue because the group included PhD students who were learning about empirical software engineering.

After evaluating each approach, as summarized in Table 2, we concluded that an evaluation based on reading the guidelines in order to detect defects and areas for improvement (T3) would be the most appropriate evaluation method for us to undertake. We felt that empirical evaluation was extremely problematic. Experiments based on re-writing existing papers would be too difficult for a group including novice researchers. It would also be biased if information required by the guidelines was not available in the original papers. Of the theoretical evaluation methods, we felt the perspective-based reading approach would provide the best learning opportunity for the PhD students and junior researchers, giving them an opportunity to consider the needs of different readers and discuss, with more experienced researchers, how to meet those needs. We chose this evaluation approach to suit our own pedagogical purposes. It is not our intention to claim that it is inherently better than the other theoretical approaches nor to suggest that the other evaluation methods should not be used. All the theoretical evaluation methods are valuable and could be used together as part of a comprehensive evaluation program.

Table 2 Assessment of evaluation methods

4 Applying Perspective-based Reading to Evaluating the Experimental Guidelines

In this section we discuss the process we used to evaluate the experimental guidelines. Our evaluation process was organized as a series of eight meetings, each lasting between 1 and 2.5 h, which took place between 5th October and 14th December 2005 with at most one meeting a week. The results of each meeting were documented afterwards to provide feedback to participants. The meeting schedule is shown in Table 17 in the Appendix.

4.1 Evaluation Process

The first issue we considered was how to apply perspective-based reading to the goal of evaluating the guidelines. Conventional perspective-based reading is intended to assist the review of software artifacts from the viewpoint of stakeholders, such as the customer, the designer, or the tester, who will use the artifact (see, for example, Shull et al. 2000). Reviewers taking a particular perspective consider a scenario describing how they will use the artifact and ask questions derived from that scenario. For example, Shull et al. described a tester reviewing a requirements document. The tester is required to generate a test case or set of test cases that allow him/her to ensure that the system implementation satisfies the requirements. The tester then answers a number of questions related to the test case generation task.

For our evaluation, it was clear that there were different perspectives related to reading a report of a software experiment and that different perspectives would require different information from the report. However, it was not clear that we could develop appropriate operational scenarios to match the perspectives, because we were not reviewing a specific experimental report; rather, we were reviewing guidelines intended to assist the writing of such a report. For this reason, we decided to base our review of the guidelines on a checklist of questions related to the information required by each perspective. Thus we ended up applying a hybrid reading method using perspective-based checklists.

We also departed significantly from the standard review process. Instead of having a single review meeting with each reviewer taking a different perspective, we decided to undertake a series of reviews where each review addressed a single perspective. We chose this approach because of the learning opportunities implicit in this process. Assigning individual perspectives to each reviewer would have been more efficient, but it may not have ensured that the same level of scrutiny was given to each perspective.

4.2 Identification of the Relevant Perspectives

Our first step was to identify which perspectives we would incorporate into our evaluation process. We identified the following perspectives of interest:

  • Researcher who reads a paper to discover whether it offers important new information on a topic area that concerns him or her.

  • Practitioner/consultant who provides summary information for use in industry and wants to know whether the results in the paper are likely to be of value to his/her company or clients.

  • Meta-analyst who reads a paper in order to extract quantitative information that can be integrated with results of other equivalent experiments.

  • Replicator who reads a paper with the aim of repeating the experiment.

  • Reviewer who reads a paper on behalf of a journal or conference to ensure that it is suitable for publication.

  • Author who would be expected to use the guidelines directly to report his/her experiment.

We also identified the perspective of the editorial board of a journal (or the program committee of a conference) that might choose to adopt reporting guidelines. The adoption or not of a set of international guidelines could have both good and bad impacts:

  • It might suggest to authors that there is a fast track to publication or acceptance by using the guidelines irrespective of the quality of the paper.

  • It might discourage authors of non-experimental studies from submitting to the journal.

  • It might improve the quality of papers.

  • It might improve the quality of reviews.

However, although we believe the perspective of an editorial board is important, we did not think it was one that we could realistically adopt.

For each perspective, we used brainstorming to assess what an individual with that perspective would require from a paper and converted these issues into a number of questions that summarize the issues of importance to the perspective. The checklists we developed for the Researcher, Practitioner/Consultant, Meta-analyst, Replicator and Reviewer perspectives are shown in Tables 3, 4, 5, 6 and 7 respectively. Since the tables are rather long, the main keywords for the questions are shown in italics to assist readability. For the Researcher and Practitioner/Consultant perspectives, we did not attempt to remove duplicate questions because we thought it important to represent each perspective fully. After applying both of these perspectives, we developed the Meta-analyst, Replicator and Reviewer perspectives. For these perspectives, we concentrated on the main differences between each perspective and the Researcher and Practitioner/Consultant perspectives, because our experience with the first two perspectives had shown that there would be too much redundancy in the questions if we produced a complete checklist for each perspective.

Table 3 Researcher checklist
Table 4 Practitioner/consultant checklist
Table 5 Meta-analyst perspective
Table 6 Replicator perspective
Table 7 Reviewer’s perspective

We also decided not to attempt to construct a checklist for the Author perspective, since it would be too close to the Researcher perspective. Instead, we decided to undertake a separate review of the guidelines in which we considered each element in turn, discussing whether:

  • Including the information would be difficult for authors.

  • The guideline element was necessary.

  • Including the information would improve the paper.

  • Including the information would make the paper more difficult to read or write.

Using a different approach for reviewing from the author perspective gave us the chance to address issues not raised explicitly by the perspective-based questions.

4.3 Validity of Checklist Approach

The validity of the checklist approach depends on the validity of the checklists and that, in turn, depends on the experience of the participants. Table 8 confirms that we included participants with extensive experience in industry or academia (or both). All participants had experience of performing and reporting empirical studies of various types. Furthermore, all of the participants except the research associate had some experience of acting as reviewers, and some of the participants had extensive experience. Only one researcher had experience of acting as a replicator, and only the senior researcher had experience of acting as a meta-analyst, although all the participants had some exposure to the principles of systematic literature reviews (Kitchenham 2004), which are a necessary prerequisite to performing a quantitative meta-analysis. Thus, we have some confidence in the validity of the Researcher, Practitioner and Reviewer checklists, but less confidence in the validity of the Meta-analyst and Replicator checklists. We also have a fair degree of confidence that we appreciated the issues associated with reporting empirical studies.

Table 8 Review perspective, paper selection and experience of reviewers

Another practical problem associated with our approach is that, because the checklist questions were developed without direct reference to the guidelines, it is difficult to cross-reference the checklist questions to specific guideline items. However, we think it is more important to have some degree of independence between the evaluation criteria (i.e. the checklist questions) and the item being evaluated (i.e. the guidelines) than to have simple traceability between one and the other, so this problem is inherent in the basic approach.

4.4 Performing the Reviews

For the first two reviews, in order to assist us to understand each perspective, we agreed to read a paper reporting an experiment from the International Symposium on Empirical Software Engineering (ISESE 04) at the same time as we read the guidelines. (Note. This initial reading activity took place before the group review meeting.) Four of the 26 papers in the ISESE 04 conference proceedings reported experiments (Abdelnabi et al. 2004; Abrahao et al. 2004; Schroeder et al. 2004; Verelst 2004) and each member of the group chose one of the papers to help with the review process. The choice of paper was not mandated and most people chose to read Verelst’s paper, while no one opted for Abdelnabi et al.’s paper (see Table 8). This preliminary reading was intended simply to set the scene for reviewing the reporting guidelines. For this reason, we thought it was preferable to read an article that interested us rather than mandate the same article for everyone. We note that the relatively small number of experiments reported in a conference specializing in empirical methods confirms that experiments are currently not a major part of empirical software engineering.

While reading their chosen paper, each person in the group took one of the perspectives (self-chosen, while ensuring both perspectives were covered). The allocation to paper and perspective is shown in Table 8. Everyone who took the practitioner viewpoint had worked for some time in industry (see Table 8); however, some participants with extensive industry experience were studying for PhDs. In addition, one of the review team only took part in the later review meetings. He was a PhD student with 8 years' industrial experience and two years' research experience. Participation in the workshops was not mandatory, and some NICTA staff attended only one or two meetings. These staff contributed to the discussion of the meetings they attended but are not included in Table 8 and did not coauthor this paper. The senior researcher attended all the meetings and kept a record of the discussions. Minutes were circulated after each meeting.

Although each person reviewed his/her chosen ISESE paper from a particular perspective, in the review meetings (first the Researcher perspective and then the Practitioner perspective) they were encouraged to contribute to the discussion of the other perspective. We had originally planned for each person to provide a written list of issues/defects from their allocated perspective. This was done for the first two reviews but not for the last three. In practice, we worked through each of the questions, discussed any issues arising and agreed whether the question raised any problems or identified defects in the guidelines. After the first two reviews, we did not attempt to allocate individuals to specific perspectives.

The final review taking the author perspective proceeded differently. Again we used the ISESE papers to assist our understanding of the author perspective by re-reading our chosen article before taking part in the group review meeting. However, instead of using perspective-based questions at the meeting, we discussed each section of the guidelines sequentially.

5 Results

We found that the guidelines addressed many of the questions in each perspective (see Table 9). Overall they addressed 11 of the 17 Researcher perspective questions (65%), 12 of the 22 Practitioner perspective questions (55%), 10 of the 14 Meta-analyst questions (71%), 7 of the 9 Replicator perspective questions (78%) and all 7 Reviewer perspective questions (100%). The percentages for the Researcher and Practitioner perspectives are not directly comparable to those for the Meta-analyst, Replicator and Reviewer perspectives, because the general questions specified in the Researcher and Practitioner checklists were omitted from the latter three perspectives. However, these results imply that specialist viewpoints are quite well addressed by the guidelines but more general perspectives are less well addressed. In particular, the Practitioner perspective is not very well addressed.
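
Each coverage percentage quoted above is simply the proportion of a perspective's checklist questions that the guidelines address, rounded to the nearest integer percentage; for the Researcher perspective, for example:

$$\text{coverage} = \frac{\text{questions addressed}}{\text{total questions}} = \frac{11}{17} \approx 65\%$$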

Table 9 Questions addressed and not addressed by the guidelines

Although the guidelines addressed many questions, in some cases they were not specific enough about what needed to be reported and in other cases they were too prescriptive. Overall, the perspective-based reviews using the Researcher, Practitioner, Meta-analyst, Replicator and Reviewer perspectives found 44 unique issues that we believed warranted amendment or clarification of the guidelines (see Tables 10, 11, 12, 13 and 14 respectively). The Researcher perspective identified 13 possible amendments, the Practitioner/Consultant perspective 21, the Meta-analyst perspective six, the Replicator perspective three and the Reviewer perspective one. Of these amendments, most (i.e. 32) requested more detailed clarification of the information required in a guideline section; four requested that the guidelines be less prescriptive; three requested more background information; and two identified possible additional sections. The remaining three proposed amendments suggested (a) standardizing the contents of each section in the guideline document, (b) moving information from one section to another and (c) avoiding possible repetition.

Table 10 Proposed amendments arising from researcher perspective questions
Table 11 Proposed amendments arising from practitioner perspective questions
Table 12 Proposed amendments arising from meta-analyst perspective questions
Table 13 Proposed amendments arising from replicator perspective questions
Table 14 Proposed amendments arising from reviewer perspective questions

We also identified eight items that we classified as defects (see Table 15). The most significant defects are D2, D3, D4 and D8. D2 arises because the guidelines are inconsistent with reporting standards used by other experimental disciplines. It is a very significant step to disassociate our discipline from the standards used by all other scientific disciplines, and we need to be sure that this step is necessary. At the very least, we need to articulate the reasons for this divergence, so that software engineering researchers and practitioners understand why it is necessary. D3 is an important issue because it is an area that, if not addressed, may result in guidelines that make the reporting of experiments worse than it currently is. D4 is a general problem but a significant one: if we cannot write so that practitioners can understand and use our results, empirical software engineering is not very useful. D8 concerns the general principles of guidelines and standards: it should be clear what is mandated and what is optional.

Table 15 Defects identified by perspective-based reviews

Whether D1 is a defect or a design decision depends on whether the guidelines aim to address every section or only the most important sections of a research paper. If the guidelines are aiming for completeness, we suggest that the need for appropriate keywords be mentioned, since well-chosen keywords will help readers find the paper. Defects D5, D6 and D7 could easily have been classified as possible amendments. D5 and D6 are both related to the reporting of the technology or technologies being evaluated; if such technologies are not properly described, it is difficult for practitioners to use them. D7 was a specific example of an issue that arose for several of the suggested report section headings, where the guidelines were too specific and should have used more general terms. Another example is the use of the term “subjects” rather than “experimental units”. This raises another general issue: the guidelines may be too people/team centric. They do not address well the large number of technical tool “experiments” that are performed in the software engineering discipline (of which Schroeder et al. 2004 is an example). Are these considered different types of studies? If so, it would be useful to clarify this in the scope of the guidelines; if not, the guidelines should be amended to make them more relevant to technology-intensive experiments.

The final review, based on the Author’s perspective, reiterated many issues noted previously. In particular, we were concerned about suggestions to impose reporting structures that were incompatible with those used in other disciplines, such as the template structure for reporting research objectives and the section headings (see Harris 2002 and Moher et al. 2001 for more conventional section headings). The problem of possible duplication was also reiterated. The main issues not raised previously were that:

  • The relationship between the “Experimental Design” and the “Execution” section needed to be clarified. If the first section was really the “Experimental Plan” and was fully reported, then the “Execution” section should be restricted to reporting deviations from the plan.

  • The ordering of sections was not always appropriate, for example sometimes it is necessary to introduce the measurement concepts before specifying the hypotheses.

6 Discussion and Conclusions

The guidelines addressed many of the questions raised by each perspective, but we found many instances where the guidelines might benefit from amendment and eight instances where we thought the guidelines were defective.

Issues arising from the Author’s perspective identified problems with potential duplication of information. Guidelines need to be very clear about what information goes into which section. This is a problem for the “Experimental Design” and “Execution” sections as well as the numerous validity sections.

Our results suggest that the main problems with the current version of the guidelines are:

  1. Relationships among the individual elements are not clear in the case of reporting validity issues and the reporting of planned tasks versus actual conduct. Thus, it is difficult to be sure what information to put in which section. There is also a risk that the guidelines will result in unnecessary duplication that would make experimental reports less readable.

  2. In places, the guidelines require us to adopt reporting standards that are inconsistent with those of other disciplines. For example, the suggested headings are inconsistent with the IMRAD standard (see Harris 2002 and Moher et al. 2001). We need to be absolutely certain that this is a good idea.

Our results suggest that the guidelines need to be revised. Any revised guidelines will need to be subjected to further theoretical and empirical validation if they are to be generally accepted. We also need to review research results in other disciplines that might provide additional justification for the guidelines structure and contents. For example, as noted by Jedlitschka and Pfahl (2005), Hartley (2004) provides a summary of the numerous studies that have assessed the value of structured abstracts.

A limitation of our evaluation methodology (review using perspective-based checklists) is that we started our evaluation with perspectives that included general questions and ended it with perspectives that included mainly perspective-specific questions. Furthermore we did not check whether some questions were in essence the same but were asked in different ways. We believe that it is preferable to have a separate list of general questions and another list of specific questions for each perspective. Table 16 identifies a set of 17 general questions cross-referenced to the perspectives from which they were obtained; the questions that are the same or similar in other perspectives; and the perspectives to which they apply. Analyzing Table 16 with respect to questions addressed by the guidelines identified in Table 9 shows that the guidelines provide very good coverage of general questions, with 15 of the 17 general questions (88%) addressed by the guidelines.

Table 16 General questions

Our choice of evaluation method seemed to work well for an initial theoretical validation. Our approach of multiple reviews fitted well with the training element of our evaluation exercise, but it is not an essential element of a review-based evaluation; it would be much quicker to perform a single review with individuals each taking a different perspective. We suggest that a similar review-based evaluation be performed on the revised guidelines. This type of evaluation would be appropriate for any research group that includes staff with research and industrial experience, and it would be useful for any group intending to adopt the guidelines to undertake such an evaluation. With respect to the other evaluation options listed in Table 2, we believe that most of them are useful and viable for specific stakeholders:

  • Evaluation of each guideline element (i.e. T1 in Table 2) and determining the mapping between the new guidelines and existing experimental guidelines (i.e. T4) should be performed by the guideline developers.

  • Evaluation of the guideline development process (i.e. T2) is the responsibility of the research community, so could be undertaken by research networks such as the International Software Engineering Research Network (ISERN, http://isern.iese.de/network/ISERN/pub/).

  • An empirical evaluation method based on comparing the completeness of papers prepared using the guidelines with papers prepared without them (i.e. E1) cannot be undertaken until the guidelines are more widely adopted.

  • An empirical evaluation method based on re-writing existing papers to conform to the guidelines and comparing them with the original versions (i.e. E2) requires a substantial research effort and would be best addressed by a research network. However, experimental validation involving re-writing existing experimental reports poses a number of practical problems. A significant problem is that it is difficult to assess how well written any experimental report is, so it may be difficult to assess the before and after versions of a report objectively. In addition, re-writing an existing report will depend on the expertise of the researchers doing the re-writing and the quality of the original report, not just the quality of the guidelines.

An important issue raised by the evaluation exercise is that of the Practitioner/Consultant viewpoint. The guidelines did not fit this perspective well. Attempts to address this perspective would make papers much longer and probably more complex. Would it be better to have different standards for practitioner-oriented papers? On the one hand, it can be argued that experiments in software engineering are not relevant to practitioners because they usually involve students, and/or simplified tasks and materials, and/or unrealistic settings. This would suggest practitioners only want to read case studies or industrial surveys. On the other hand, even if controlled experiments are not representative of industry practice, they provide proof of concept information without which industry is unlikely to undertake any realistic case studies. One course of action may be to re-write research results for practitioner-oriented magazines (as long as copyright issues are addressed). However, it may also be beneficial to identify the issues that are most important to practitioners and ensure they are covered by the current guidelines.

This paper has evaluated guidelines for controlled experiments. However, we believe that software engineering needs reporting guidelines for other types of empirical studies, in particular, case studies performed in industrial settings and industry surveys, not least because these types of study are of most relevance to practitioners. We believe that many of the perspective-based questions related to Researchers, Practitioners, and Reviewers are quite general (with the exception of questions that relate specifically to the methodology used for formal experiments) and can be used to help evaluate reporting guidelines developed for other forms of empirical study. Even the Meta-analyst perspective and the Replicator perspective are relevant to other forms of study although the questions would need to be revised. In particular, any attempt to construct and evaluate guidelines for industrial case studies and surveys should ensure that the Practitioner perspective is fully considered.