1 Introduction

The adoption of the evidence-based paradigm for use in software engineering has made considerable progress in the period following the seminal paper published in 2004 (Kitchenham et al. 2004). Since then, over 20 systematic reviews have been published on software engineering issues (Kitchenham et al. 2007).

A key requirement for the continuing development of evidence-based software engineering (EBSE) is the ability to find, evaluate and aggregate all of the appropriate sources of evidence. In particular, the evidence-based paradigm is one that relies heavily upon the use of systematic literature reviews as the means of aggregating the (empirical) evidence that is needed to address a given research question (Kitchenham 2004; Webster and Watson 2002; Petticrew and Roberts 2006). A secondary study such as a systematic literature review requires exhaustive searches of the literature in order to identify potentially relevant primary studies. Such searches involve two stages: firstly researchers need to perform a wide search to identify as many candidate primary studies as possible; secondly they must undertake a more detailed review of these candidates against specific inclusion and exclusion criteria. Indeed, the first step of the search process is very likely to identify many studies that will actually be irrelevant.

Current procedures, based on experience from other domains such as clinical medicine, education, psychology and the social sciences, suggest that a review of the title and abstract of a primary study should be sufficient to enable the researcher to determine whether or not it is relevant to the study being undertaken (Kitchenham 2004). However, when conducting systematic literature reviews in the domain of software engineering, researchers have reported difficulties with identifying whether or not primary studies are relevant to a topic of interest (Brereton et al. 2007; Jedlitschka and Pfahl 2005). This is because the information provided in abstracts is often incomplete, with the effect that it may be necessary to read other parts of the paper to determine whether or not it is relevant to the particular study.

Consulting the full paper involves additional time and effort and, since not all papers are available on-line, there may also be costs and delays involved in obtaining them. A lack of information in abstracts may therefore significantly increase both the cost and the time required to perform a systematic literature review.

While our motivation for this study is that of systematic reviewing, as described above, there are other occasions where decisions may need to be based upon information that has been retrieved from abstracts. These include practitioners who want to know about the effectiveness of particular tools or techniques; researchers planning to perform additional studies on a topic; and researchers who may be intending to replicate a primary study.

One approach to improving the standard of abstracts is to adopt the use of structured abstracts (Booth 2003; Hartley 2004). A structured abstract employs a standard set of headings through which the authors précis the key aspects of a study, such as its context, aim, method, results and conclusions. The results of empirical studies conducted in Educational Psychology suggest that structured abstracts are a potentially valuable approach to improving the readability and value of abstracts (Hartley 2003), and Bayley and Eldredge (2003) identify further benefits of adopting this form: it aids searching of the literature and helps to improve the design of a study. A recent study of six dental journals, three of which had adopted structured abstracts in the late 1980s, also provides convincing empirical evidence that structured abstracts improve the relevant information content of abstracts (Sharma and Harrison 2006) (a broader discussion of previous research related to structured abstracts can be found in Kitchenham et al. (2008)).

To investigate whether the same improvements in information content and readability could be obtained if structured abstracts were adopted for software engineering papers and reports, we have undertaken two studies. The first was an observational study in which we took the conventional abstracts presented at the 2004 and 2006 EASE (Evaluation and Assessment in Software Engineering) conferences, prepared structured versions of them, and analysed both forms using established readability measures. This confirmed that structured abstracts are longer than unstructured abstracts but that they score more highly when analysed for readability (Kitchenham et al. 2008). In this paper we use the guidelines for reporting empirical studies provided in Kitchenham et al. (2002) to describe the second study. This took the form of a randomised controlled laboratory experiment in which participants were asked to act as ‘judges’, assessing one abstract of each form for clarity and completeness.

The following sections first identify the detailed research question that the study set out to answer; describe our experimental method, including the preparation of the structured abstracts; summarise the results; discuss possible threats to validity and observe how our results compare with those from other domains. We finally draw our conclusions about the benefits likely to accrue from the adoption of this form for software engineering papers.

2 Research Question

Hartley’s work has suggested that not only is the distinction between structured and unstructured abstracts a significant issue, but also that the typographic layout employed is an important factor needing consideration (Hartley and Sydes 1996; Hartley 2000). The second issue leads to two further factors that need to be considered in the specific context of performing systematic reviews in software engineering.

  1. The typographical layout used for many conferences (typically two columns, with the abstract being in the same format) differs quite markedly from the forms used in many journals, which print the abstract in a single-column format, even when using a two-column format for the main body of the paper.

  2. The difference between reading the abstract on the screen or on paper. Hartley’s work was essentially paper-based, but in recent years the availability of electronic databases and of search engines has altered practice, with the result that (particularly in the context of systematic reviews) the abstract is highly likely to be read from a screen, at least when determining whether to include a particular primary study.

There are therefore several research questions that could have been addressed in this study.

  1. Are structured abstracts easier to understand—which in our context relates to being able to extract the required information about the study?

  2. Are structured abstracts easier to understand when presented in single-column than in double-column format?

  3. Are structured abstracts easier to read when using a screen than when using paper?

  4. How good are the unstructured abstracts currently provided with software engineering papers? In other words, regardless of the question of which form is better—how adequate are existing abstracts?

A comprehensive study should address all of these. However, given finite time and resources we decided to restrict ourselves to addressing the first of these questions within a screen-based context, where the abstracts are displayed using a web browser, since this is the format that is most likely to be employed when accessing on-line bibliographic information. This study therefore addressed the question:

When displayed on a screen using a web browser and basic HTML formatting, do structured abstracts contain more relevant information in a more readable format than unstructured ones?

Within this context, we interpret ‘more readable format’ as meaning that the reader can more readily obtain the necessary information about the study described in the paper.

So, for this paper we have set out to investigate the following two hypotheses:

  • Null Hypothesis 1: Structured abstracts and unstructured abstracts are not significantly different with respect to completeness.

  • Alternative Hypothesis 1: Structured abstracts are significantly more complete than unstructured abstracts. (Note that we are also interested in the possibility that structured abstracts are less complete than unstructured abstracts.)

  • Null Hypothesis 2: Structured abstracts are not significantly different from unstructured abstracts with regard to clarity.

  • Alternative Hypothesis 2: Structured abstracts are significantly clearer than unstructured abstracts.

Although we have presented our experiment in terms of testing two hypotheses, we must point out that, having added information from the body of each paper when constructing the structured version of its abstract, it is unlikely that a structured abstract could fail to score better than an unstructured abstract with respect to completeness. Furthermore, if we find that clarity is significantly better for structured abstracts we will not be able to tell whether this is due to the restructuring or to the additional information. Thus, if we formally reject both null hypotheses we will interpret this to mean that conventional abstracts have problems with respect to completeness and clarity and that structured abstracts are a potential means of addressing these problems. We cannot claim that restructuring abstracts alone will address clarity and completeness problems.

3 Experimental Method and Materials

The study was conducted as a controlled laboratory experiment, where participants were asked to read the abstracts of two different papers, one of which was structured, while the other was not. The participants were asked to assess the information content of each abstract. The order in which the abstracts were presented (i.e. whether they read the structured one first or second) was randomised across participants, as was the allocation of abstracts. Before starting the study, we first developed a comprehensive research protocol detailing our plans (Budgen et al. 2007b). We also updated this to record any ‘divergences’ that occurred during the conduct of the study.

3.1 Population

Participants were drawn from undergraduate students in their final year of study, postgraduate students, researchers and practitioners. Our rationale was that, while the category of practitioner is less significant for the immediate purpose of conducting systematic literature reviews (on the assumption that such reviews will mainly be conducted by students and researchers), practitioners constitute an important grouping in the wider picture of software engineering, since we also need to consider the role of structured abstracts in conveying information to them. Overall, therefore, we tried to include all of these categories in our study. For the purpose of analysis, we included a question in the demographic element of the questionnaire that enabled us to separate students from more experienced workers.

Our aim was to enlist between 50 and 100 participants to act as judges and we were finally able to recruit 64 participants, divided into 20 students and 44 researchers and practitioners.

3.2 Selection of Participants

For the categories of researcher and practitioner we constructed a list of possible participants that tried to avoid including people likely to be predisposed to favour structured abstracts (possibly those already involved in evidence-based research). As a primary source we used the EASE conference mail-list together with contacts suggested by the EASE programme committee. This process was not completely unbiased, but since structured abstracts are only used very rarely in software engineering, we considered the likely effects to be small. Our aim was to have half of our participants drawn from these categories (taken together).

For students we used a more generalised recruitment process. Three groups considered to be suitable recruits for our study were:

  • final year students—limited experience of reading papers, but should have the background to be able to read the abstracts;

  • taught postgraduate students—again limited experience of reading papers, but with background broadly similar to the final year students—with one possible limitation that current cohorts on taught postgraduate degrees are heavily weighted to non-native English speakers;

  • research postgraduate students—probably the most ‘representative’ group of those likely to need to read abstracts and to conduct systematic forms of literature review—but only available in limited numbers and possibly also containing a large proportion of non-native English speakers.

Again, our aim was to have half of our participants from these categories. Initial recruitment was by asking for volunteers from the appropriate cohorts at Durham and Keele Universities, but this was later extended to volunteers from a number of other universities.

In practice, the students proved to be the more difficult group to recruit—possibly because, unlike (say) psychology students, our students are not accustomed to volunteering to take part in such studies. We were also very careful not to put pressure on students to take part, so that the invitation was usually issued by the researchers rather than the academics—and as these may have been less well-known to the students this may also have been a negative factor.

3.3 The Abstracts

Here we address the questions of the choice of abstracts to use, how they were to be rewritten in a structured form, and what consultation was undertaken with the original authors.

3.3.1 Selecting the Abstracts

A key question was the number of abstracts to use. Based upon the size of our research team and the likely number of participants, we decided to use a total of 25 papers, since this spread the re-writing task evenly among members of the team.

The 25 papers were taken from the set of 103 empirical papers previously identified and analysed by Sjøberg et al. (2005). This set of papers was drawn from nine journals and three conference proceedings published over the period 1993–2002. The selection was performed on a random basis by Dag Sjøberg, and maintained the same proportion of journal papers and conference articles as were present in the complete set. This resulted in a subset of 6 conference articles and 19 journal papers. None of the abstracts for these papers were in a structured form.

The basic structure we propose is intended to be suitable for most empirical studies, whether experiments, case studies, surveys or observational studies. However, all these papers describe controlled experiments (including both randomised experiments and quasi-random experiments). This indicates some limitation to the generalisability of our results and also influenced our associated guidelines for constructing structured abstracts, which currently are oriented towards experiments rather than other types of empirical study.

3.3.2 Rewriting Abstracts into a Structured Form

Hartley’s studies used two approaches:

  • all abstracts being re-written by the principal investigator (James Hartley) as described in Hartley (2003);

  • all abstracts being re-written by the original authors (Hartley and Benjamin 1998).

When using the first approach there is an additional option of sending the re-written abstract to the original authors, to ask their permission to use it and to invite their comments and suggestions about the revised form. This was the approach we adopted in our study.

To create the structured abstracts, each abstract was re-written by one member of the team and then checked and reviewed by another member of the team. Five of us acted as authors, and all six as reviewers, organised so that each of the five abstracts rewritten by an author was reviewed by a different, randomly allocated reviewer. We used the following headings and content guidelines to construct the structured abstracts:

Background: Previous research or rationale for a study

Aim: Hypotheses to be tested or goal of the study

Method: Description of the type of study, treatments (including control), number and nature of experimental units (people, teams, algorithms, programs, tasks etc.), the experimental design, outcome being measured

Results: Treatment outcome values, standard deviation and/or level of significance

Conclusions: Future work, limitations of study

(These are largely those proposed in Jedlitschka et al. (2008), except that we included ‘limitations’ under ‘Conclusions’ rather than treating it as a separate heading.)

Guidelines on the procedures to follow for the task of re-writing were prepared in advance, and a summary of these is attached as Appendix 1. To ensure consistency of style we also adopted the following conventions:

  • Each heading should be set in boldface type and should end with a colon

  • The sentence beginning after the heading should start with a capital letter

  • References should be removed and acronyms expanded

  • The word count should be kept below 300 if possible

Since not all of the information required for the structured form was available from the existing abstract, any missing information was obtained from the body of the paper.

One of our concerns was that the process of rewriting should not change the writing style. With this in mind, we instructed team members to rewrite the abstract as far as possible by using the original sentences from the abstract, and, in the event of missing information, to take this from the body of the paper. However, adherence to this instruction was not formally monitored by the research team. As a final check, as described in the next sub-section, we also asked the original authors to check our revised abstract for both correctness and style.

3.3.3 Involvement of the Original Authors

After re-writing each abstract into a structured form, we then tried to contact the original authors, asking them to check our re-written abstract and providing them with some information about structured abstracts. The authors of sixteen papers responded; for those papers where we were unable to contact any of the authors, a further member of the team acted as an additional checker. (One further author responded after the review process and, fortunately, agreed with the changes.) Where the authors made any suggestions about the wording of the structured abstract we endeavoured to take these into account, on the basis that these would be closer to the abstract that they would have written. After all required changes had been incorporated, the original reviewer assessed the structured abstract for completeness and clarity using the same set of questions as the participants.

3.4 Scoring the Abstracts for Clarity and Completeness

The initial version of the questionnaire to be used in judging the abstracts was based upon the one used in Hartley and Benjamin (1998). We used a set of 18 questions to assess completeness (counting the number of ‘yes’ responses to measure completeness for subsets of information and overall). To measure clarity, we used a subjective value on a scale of 1 to 10 in order to be consistent with previous research (Hartley 2003). Participants were asked to provide this in response to the following question:

Please give an assessment of the clarity of this abstract by entering a number on a scale of 1–10, where a value of 1 represents Very Obscure and 10 represents Extremely Clearly Written.

To ensure that the set of questions was complete and consistent for our purpose, a perspective-based analysis of the questionnaire was conducted by one member of the team (DB), with the outcomes being checked by three others (BAK, SC, MT).

The arguments for adopting a perspective-based approach for validation are essentially the same as those used in Kitchenham et al. (2006) when using this approach for evaluating guidelines for reporting empirical studies in software engineering. These are that the approach is valid, feasible for us to apply and cost effective, and that it provides learning potential (although this last point was less immediately relevant for this study).

The set of perspectives adopted were a subset of those used in Kitchenham et al. (2006). These were chosen on the basis that not all of the original categories of user would be likely to seek information from the abstract. We therefore considered the following four perspectives:

  1. Researcher who reads the abstract to determine whether the paper offers important new information on a topic of interest;

  2. Practitioner/Consultant who wants to know if the paper will provide information that will be of use in industry or commerce, and whether the results are likely to be of direct use to their company/clients;

  3. Systematic Reviewer seeking quantitative or qualitative information that can be integrated with the results of other studies;

  4. Replicator considering whether it may be worthwhile or practical to repeat the study.

The category of systematic reviewer was essentially an extended form of the meta-analyst perspective used by Kitchenham et al. (2006), since we needed to look beyond the aggregation of numerical results. The perspectives of reviewer and author were not considered relevant to this analysis, nor were any other perspectives identified.

Since the original set of perspective questions was intended for use in assessing full papers, our first step was to determine which ones were also appropriate to an abstract, and we then used these as a check upon the questionnaire. In a few cases we used variations upon the original questions. (Again, a fuller analysis is presented in the research protocol (Budgen et al. 2007b).)

To complete the exercise, each of the perspective questions was matched against the questions used in the questionnaire. There was no attempt at exact matches, and it was accepted that a perspective question might be mapped on to more than one question in the questionnaire. The outcome of this exercise is shown in Table 1. (The numbers in the columns relate to the original perspective questions; where letters have been used, these indicate questions that were adapted to meet the needs of an abstract rather than a paper.)

Table 1 Checking the original questionnaire against the perspectives for completeness

Inevitably some interpretation had to be made in mapping the perspectives, and so some questions from a particular perspective have been mapped on to more than one question from the questionnaire. This process then left two sets of elements to consider:

  • Unmapped elements in the questionnaire: There were no questions without a link to a perspective, so on that basis the questionnaire required no changes.

  • Unmapped perspective elements: There were two of these, discussed below.

    1. Discussion of implications for future research (researcher question 17; practitioner question 22).

    2. Description of the form of analysis employed (reviewer question d).

The first of these corresponded to question 17 of the original questionnaire, which had been removed when adapting it for this study. Replacing this with new wording was therefore considered an appropriate step. The new question 17 was worded as: “Is there any discussion of required future research?”. The second arose from one of the elements added by our own perspective-based analysis, extending the original set used in Kitchenham et al. (2006). The proposed action in response was to insert a new question 12 (moving all other questions down one place), worded as: “Is there any description of the form of analysis performed?”. The final version of the questionnaire as used in this study is listed as Appendix 2.

3.5 Allocation to Participants

Randomisation was applied through the allocation of papers to participants, so we simply constructed lists of participants as they were recruited.

To allocate the papers, we used the same mechanism as was employed by Hartley. Given a set of abstracts (in both structured and unstructured forms) indexed 1..n, where the index value is obtained by allocating a random number to each paper and then ordering on that number:

  • Each participant was allocated unstructured paper i and structured paper i + 1;

  • The participant allocated unstructured paper n was then allocated structured paper 1 using a cyclic allocation strategy.

The value of i for each participant was allocated on a random basis by using the rand() function of PHP, and a manual check was also performed to ensure that no participant was allocated an abstract taken from a paper for which they were a co-author.
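
For illustration, the cyclic allocation scheme described above can be sketched as follows. This is a minimal sketch in Python rather than the PHP actually used; the function name and participant labels are hypothetical.

```python
# Minimal sketch of the cyclic allocation strategy described above.
# Illustrative only: the study used PHP's rand(), not this code.
import random

def allocate(participants, n_abstracts, seed=None):
    """Map each participant to (unstructured paper i, structured paper i+1)."""
    rng = random.Random(seed)
    allocation = {}
    for participant in participants:
        i = rng.randint(1, n_abstracts)           # random index for the unstructured paper
        j = 1 if i == n_abstracts else i + 1      # cyclic wrap: paper n pairs with paper 1
        allocation[participant] = (i, j)
    return allocation

# Example: 64 participants judging pairs drawn from 25 papers
pairs = allocate([f"P{k}" for k in range(1, 65)], 25, seed=1)
```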

4 Data Collection and Analysis

4.1 On-Line Data Collection

Once the final versions of the structured abstracts were completed, these, together with the original unstructured versions, were converted to HTML and stored in a database. We also performed some initial analysis at this stage, in that values of the Flesch Reading Ease measure and Gunning Fog Index were recorded for each document (Flesch 1948; Flesch-Kincaid Readability Test 2006; Automated Readability Index 2006). While Hartley and Sydes (1997) note that these metrics ignore many factors, they also observe that when applied to two versions of the same abstract the results should provide some indication of whether or not one version might be easier to read than the other. The length of each abstract (in words) was also recorded.
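
To make the two readability measures concrete, the standard formulas are sketched below; the sentence, word and syllable heuristics are naive stand-ins for whatever tool was actually used to record the scores.

```python
# Illustrative implementation of the two readability measures named above,
# using crude sentence, word and syllable heuristics (not the study's tooling).
import re

def count_syllables(word):
    # rough heuristic: count contiguous vowel groups, with a minimum of one
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(text):
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z]+", text)
    n_words = max(1, len(words))
    syllables = sum(count_syllables(w) for w in words)
    complex_words = sum(1 for w in words if count_syllables(w) >= 3)
    flesch_reading_ease = 206.835 - 1.015 * (n_words / sentences) - 84.6 * (syllables / n_words)
    gunning_fog = 0.4 * ((n_words / sentences) + 100 * (complex_words / n_words))
    return {"words": n_words, "flesch": flesch_reading_ease, "fog": gunning_fog}

print(readability("Aim: To compare structured and conventional abstracts for clarity."))
```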

The order of judging was deemed to be important and therefore the order of allocation was tightly controlled, with participants only being able to view the second abstract after the task of judging the first one had been completed. To achieve this, and to coordinate data collection, we used web-based form-driven data collection. Participants first registered their interest in taking part and were allocated a unique code. They then accessed the project server using a web browser and logged on to this by using their identifying code, after which they were presented with a series of pages containing (in the following order):

  • the first abstract together with the list of questions, which could be completed using ‘radio button’ style selection

  • the second abstract together with the same list of questions

  • an invitation to provide up to three ‘likes’ and ‘dislikes’ about structured abstracts, followed by some questions of a demographic nature (the details of these questions are provided in Appendix 3).

Participants were provided with a ‘next’ button for proceeding through the pages and completed each task by pressing a ‘submit’ button to store their responses. They could only progress to a following task when all questions for the current one had been answered. The only data recorded was the set of responses and a log of the time taken (this provided some assessment of the effectiveness of the forms as well as how much time participants typically spent on the exercise). Where possible, participants were asked to use the Mozilla browser to ensure consistency of appearance on the screen.
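
The ordering constraint can be illustrated with a small sketch. The original system was implemented as PHP web forms; the class below and the per-abstract question count are assumptions made purely for illustration.

```python
# Sketch of the page-ordering rule: the second abstract only becomes
# available once every question for the first has been submitted.
# Not the study's PHP implementation; names and counts are illustrative.
QUESTIONS_PER_ABSTRACT = 19  # assumed: 18 completeness questions plus the clarity rating

class Session:
    def __init__(self, first_abstract, second_abstract):
        self.order = [first_abstract, second_abstract]
        self.answers = {first_abstract: {}, second_abstract: {}}

    def current_abstract(self):
        first, second = self.order
        return first if len(self.answers[first]) < QUESTIONS_PER_ABSTRACT else second

    def submit(self, abstract_id, responses):
        if abstract_id != self.current_abstract():
            raise ValueError("complete the current abstract before moving on")
        self.answers[abstract_id].update(responses)
```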

4.2 Data Preparation

4.2.1 Quantitative Data

The data collected for the assessment of each abstract included:

  • Structured which indicates whether the assessment referred to the structured (1) or unstructured (0) version of the abstract

  • Seen1st which indicates whether a specific participant saw a specific abstract first (1) or second (0)

  • Order which indicates whether the order in which the participant saw abstracts was structured followed by unstructured (1) or vice versa (0)

  • The participant identifier

  • The abstract identifier

  • The answer to each of the 18 completeness questions (yes/no/uncertain)

  • The answer to the clarity question (a value between 1 and 10). This was used as our dependent variable in our analysis of clarity.

The data collected for each participant included the answer to each of the demographic questions.

In order to analyse the data, a number of new variables were constructed from the data related to the assessment of each abstract (a minimal sketch of this derivation is given after the list):

  • Total Yes = Total number of yes answers to the 18 completeness questions. This was used as a dependent variable in our tests of completeness.

  • Total U = Total number of uncertain answers to the 18 completeness questions

  • Total Aim = Total number of yes answers to the two questions about aims

  • Total Method Context = Total number of yes answers to the four questions concerning the context of the study

  • Total Method Conduct = Total number of yes answers to the four questions concerning the conduct of the study

  • Total Results = Total number of yes answers to the three questions concerning results

  • Total Conclusions = Total number of yes answers to the three questions concerning conclusions

  • A dummy variable identifying which participant provided the assessment (i.e. a set of 64 variables which take the value 1 for the participant who provided the specific assessment and 0 otherwise). Note participants were labeled in the order in which they enrolled for the study, so the dummy variables cover values P1 to P75 with 11 variables having no data attached to them.

  • A dummy variable identifying which abstract was being assessed (i.e. a set of 25 variables A1 to A25 which takes the value 1 for the abstract being assessed and 0 otherwise).

  • Two dummy variables were constructed to summarise whether the participant had prior knowledge of structured abstracts: Known had a value 1 if the participant knew anything about structured abstracts and 0 if the subject did not know anything or did not answer the question. NotKnown had a value 1 if the participant did not know anything about structured abstracts and 0 if the participant knew about structured abstracts or did not answer the question.

  • Three dummy variables were constructed to summarise whether the participant preferred structured or unstructured abstracts or had no preference: Prefs took the value 1 if the participants preferred structured abstracts and 0 if they preferred unstructured abstracts, had no preference or did not answer the question; Prefu took the value 1 if the participants preferred unstructured abstracts and 0 if they preferred structured abstracts, had no preference or did not answer the question; Nopref took the value 1 if the participant had no preference and 0 if the participant preferred structured abstracts, preferred unstructured abstracts or did not answer the question.

  • Two dummy variables were used to summarise whether the participant was a student. Student took the value 1 if the participant was a student (i.e. confirmed that they were a post graduate research student, a graduate research student or an undergraduate), or 0 if the participant was not a student or did not answer the question; NotStudent took the value 1 if the participant was not a student and 0 if the participant was a student or did not answer the question.
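
A minimal sketch of this derivation follows, assuming the raw responses are held in a pandas DataFrame with one row per assessment and columns q1..q18 for the completeness answers; the column names and the grouping of questions into topics are illustrative assumptions, not taken from the study's analysis scripts.

```python
# Hedged sketch of the variable construction described in the list above.
# Assumes columns: participant, abstract, q1..q18 ('yes'/'no'/'uncertain').
import pandas as pd

def derive_variables(df):
    questions = [f"q{i}" for i in range(1, 19)]
    yes = df[questions].eq("yes")
    df["TotalYes"] = yes.sum(axis=1)                              # completeness score
    df["TotalU"] = df[questions].eq("uncertain").sum(axis=1)
    groups = {                                                    # assumed question groupings
        "TotalAim": ["q1", "q2"],
        "TotalMethodContext": ["q3", "q4", "q5", "q6"],
        "TotalMethodConduct": ["q7", "q8", "q9", "q10"],
        "TotalResults": ["q11", "q12", "q13"],
        "TotalConclusions": ["q14", "q15", "q16"],
    }
    for name, cols in groups.items():
        df[name] = yes[cols].sum(axis=1)
    # indicator (dummy) variables for participants and abstracts
    dummies = pd.get_dummies(df[["participant", "abstract"]].astype(str),
                             prefix=["P", "A"], prefix_sep="")
    return pd.concat([df, dummies], axis=1)
```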

The quantitative data was analysed to test the hypotheses that structured abstracts were more complete and clearer than unstructured abstracts, using two regression-based analyses. A regression approach was used rather than an analysis of variance approach because the dataset was unbalanced (e.g. different numbers of participants viewed each abstract, and other factors of interest such as preference and role were not balanced across the participants).

First we performed a stepwise regression analysis using all the data points for each dependent variable including the following independent variables:

  • Structured

  • Order

  • Seen1st

  • Known

  • NotKnown

  • Student

  • NotStudent

  • Prefs

  • Prefu

  • NoPref

  • Twenty-four dummy variables identifying the abstract

  • Sixty-three dummy variables identifying the participant.

This analysis allowed us to cater for differences among participants and among abstracts, and provided an unbalanced sequential cross-over analysis. The analysis was developed specifically to deal with the data collected in this experiment. It was based on concepts found in Senn (2002), who discusses how to analyse cross-over experiments, and Milliken and Johnson (1992), who discuss how to handle messy (i.e. unbalanced) data and explain the use of linear regression to analyse statistical designs.
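
As a rough analogue of this analysis, the sketch below fits an ordinary least squares model with a simple forward-selection loop. This is a simplification of the SPSS stepwise procedure actually used (and of the STATA checks described later), and the data layout follows the earlier illustrative sketch rather than the study's own scripts.

```python
# Simplified forward-selection analogue of the stepwise regression described
# above; not the SPSS/STATA analysis itself. X is a DataFrame of the dummy
# variables listed earlier, y is TotalYes (completeness) or the clarity score.
import statsmodels.api as sm

def forward_stepwise(y, X, alpha=0.05):
    selected, remaining = [], list(X.columns)
    while remaining:
        pvals = {}
        for candidate in remaining:
            fit = sm.OLS(y, sm.add_constant(X[selected + [candidate]])).fit()
            pvals[candidate] = fit.pvalues[candidate]
        best = min(pvals, key=pvals.get)
        if pvals[best] >= alpha:   # stop when no remaining predictor is significant
            break
        selected.append(best)
        remaining.remove(best)
    return sm.OLS(y, sm.add_constant(X[selected])).fit()

# e.g. completeness_model = forward_stepwise(df["TotalYes"], df[predictor_columns])
#      print(completeness_model.summary())
```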

Since use of a specialised analysis is potentially risky, we also analysed only the data points corresponding to the first abstract seen by each subject, including the following independent variables:

  • Structured

  • Known

  • NotKnown

  • Student

  • NotStudent

  • Prefs

  • Prefu

  • NoPref

This analysis corresponds to a simple parallel experiment and allowed us to check whether the results were consistent for both analyses.

4.2.2 Qualitative Data

In addition to their preferences with respect to the form of abstract and their software engineering roles, the participants were asked to identify three things that they liked about structured abstracts and three things that they disliked. This data was analysed using a bottom-up form of content analysis. In a method similar to the first step in grounded theory (Glaser and Strauss 1967), we reviewed all the responses and identified a set of categories to describe the positive and negative answers. We ensured that the categories covered all of the comments. Finally, we counted the number of responses in each category and tabulated them against role and preference.
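
For illustration, the final counting and tabulation step might look like the following sketch; the example rows and category labels are invented, not the study's data.

```python
# Sketch of tabulating coded comments against preference (cf. Tables 14-17).
# The example rows and category names are hypothetical.
import pandas as pd

coded_comments = pd.DataFrame([
    {"role": "Researcher",   "preference": "structured",    "polarity": "like",    "category": "Structure"},
    {"role": "Student",      "preference": "no preference", "polarity": "like",    "category": "Readability"},
    {"role": "Practitioner", "preference": "structured",    "polarity": "dislike", "category": "Length"},
    # ... one row per coded comment
])

likes = coded_comments[coded_comments["polarity"] == "like"]
print(pd.crosstab(likes["category"], likes["preference"]))
```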

The categories and the assignment of answers to categories were produced by one researcher and checked by another. After the checking process, the checker had no disagreements concerning the categories (which were simply summary terms derived from the comments themselves) and two disagreements with respect to the categorisation of the 168 comments. The two proposed changes were accepted by the coder.

In a separate question, the participants were also asked if they had any other comments. Twenty-two participants provided general comments, which were reviewed and categorised in a similar manner to the comments concerning structure.

5 Results

5.1 Preliminary Data Validation

The data were checked for consistency. In particular, values for each abstract in each format were checked to see whether participants provided broadly similar assessments. One participant provided a very low value (i.e. 3) for the clarity of the structured abstract compared with values of 11 and 10 for other participants who reviewed the structured version of the same abstract. The participant also commented that “I would give a higher assessment of clarity of the structured abstract if I saw the conventional abstract first”. Thus, it appeared that the participant was unhappy with his/her evaluation of the structured abstract. For this reason, the data related to the structured abstract for that participant was removed from the data set.

5.2 Analysis Results

5.2.1 Summary Statistics

The average and standard deviation for responses for each abstract in its structured and original format are shown in Table 2 for completeness (TotalYes) and Table 3 for clarity.

Table 2 Completeness (TotalYes) per abstract for structured and original abstracts
Table 3 Clarity per abstract for structured and original abstracts

The average completeness values for the structured abstracts in Table 2 are all greater than the averages for the unstructured abstracts. For the clarity values in Table 3, in one case the average clarity for structured and unstructured abstracts was the same; in all other cases the average clarity was greater for the structured abstracts than the conventional abstracts. The results in Table 2 and Table 3 make it fairly clear that the structured abstracts score better on completeness and clarity than the unstructured abstracts. Nonetheless, we performed a more detailed analysis of the results to assess whether any factors such as the order in which the abstracts were seen, knowledge of structured abstracts, or whether the participant was a student, might have biased our results.

Tables 2 and 3 show that the trend for the mean completeness and clarity to be greater for the structured than unstructured version of each abstract is stable, whether or not the unusual data point was removed (values in parentheses are those that include that data point). However, the combination of the large standard deviations and the comment of the participant suggested that the data point was not trustworthy. We therefore removed it from the more detailed analyses. (See next section.)

5.2.2 Demographic Information

Fifty-seven participants answered questions about their knowledge of structured abstracts. Overall 41 (72%) said they had known about structured abstracts before taking part in the experiment. The type of knowledge that participants had is shown in Table 4.

Table 4 Participants’ knowledge of structured abstracts

Fifty-seven participants answered questions about their software engineering role (Table 5).

Table 5 Participants’ software engineering roles

The people who specified “Other” included:

  • Part-time researcher and part-time practitioner (2)

  • Scientist and project leader

  • Professor

  • Academic (research and teaching)

  • Technology transfer consultant

Fifty-six participants reported their experience in years. Responses ranged from 0 to 45 years with a median of 9 years. A box plot of the experience distribution of the participants is shown in Fig. 1, which shows the distributions for students and non-students separately. Twenty participants identified themselves as students; their median experience was 2 years. The median experience of the other 36 participants was 15 years.

Fig. 1 Distribution of experience in years for study participants

Overall it is clear that our participants included mainly academics, researchers and students, with few practitioners.

5.3 Hypothesis Testing

The impact of structure on completeness (TotalYes) and clarity was assessed using two separate regression models.

The first regression model used all the available observations (126 in total) but included dummy variables for the major experimental design components, i.e. whether the abstract version was structured or not, whether the abstract version was the first seen by the participant, whether the sequence for the participant was structured followed by unstructured (or not), and dummies for each abstract and for each participant. In addition, dummy variables were included for other factors that might have biased our results, i.e. for role (student or not), knowledge of structured abstracts, and preference.

The observations were analysed by a stepwise regression algorithm using the SPSS statistical tool. The analysis produced the model shown in Table 6 for the completeness variable. This confirms that structure significantly increases the completeness of the abstracts. Furthermore, none of the uncontrolled factors (preference, role, knowledge of structured abstracts) had a significant effect on the dependent variable. In addition, most of the design factors, including the cross-over variables (Order and Seen1st), had no significant impact. Five of the abstracts had a significant impact on the dependent variable: in all cases they increased the dependent variable. Five participants had a significant impact on the dependent variable: three scored abstracts lower than other participants, two scored higher.

Table 6 Stepwise regression model for completeness (TotalYes)

The final model was checked using the STATA statistical program. The model was identical but analysis of the goodness of fit statistics identified one point with a very high leverage. The full stepwise regression was re-run with the abnormal point removed and the same model was obtained with no changes to the selected variables or their coefficients.

The stepwise regression model obtained for the clarity dependent variable is shown in Table 7. The analysis confirms that structured abstracts score significantly higher on the clarity scale than the unstructured abstracts. Again some abstracts and participants had a significant impact on the dependent variable. The final model was checked using the STATA statistical program. The model was identical but again one data point had a high leverage (the same participant and the same abstract as before). Again re-running the stepwise analysis with the high leverage data point removed did not change the variables selected or their coefficients.

Table 7 Stepwise regression model for clarity

As a further check on the stability of the results, we fitted another regression model to the data points corresponding to the first abstract viewed by each participant (62 observations in total). This was equivalent to a parallel experiment. In this analysis only the dummy variables related to role, knowledge of structured abstracts and preference were included in addition to the main independent variable identifying whether the abstract was structured or not.

For completeness, the model produced by the stepwise regression is shown in Table 8. This confirms that the structured abstracts are more complete than the unstructured abstracts. However, they also indicate that student participants scored completeness higher than non-students for all abstracts.

Table 8 Model for completeness (parallel experiment)

For clarity, the model produced by the stepwise regression is shown in Table 9. This confirms that the structured abstracts score significantly higher on the clarity scale than the unstructured abstracts.

Table 9 Model for clarity (parallel experiment)

5.4 Missing Information

Table 10 shows the number of abstracts that were judged by one or more participants to have missed out a specific topic area. The number in parentheses identifies the number of participants who judged an abstract to be missing that topic. If the value in parentheses is greater than the other value, it means that several participants judged a particular abstract to be missing information on the specific topic. Overall it is clear that the structured abstracts are less likely to miss information completely. The topics that are most likely to be missing from unstructured abstracts are Method and Conclusions.

Table 10 The number of abstracts missing information on a specific topic

None of the unstructured abstracts provided relevant information in every section and five of the unstructured abstracts appeared to provide no relevant information in any section. It must be noted that the structured versions of the abstracts did not always improve the situation. Several abstracts were assessed as having information related to a specific section in the unstructured format but not in the structured format, particularly for the Background section.

5.5 The Relationship between Completeness and Clarity

To investigate the relationship between completeness and clarity, for each abstract we subtracted:

  • The mean completeness score of the unstructured abstract from the mean completeness score of the structured abstract, to give us a measure of the increase in completeness.

  • The mean clarity score of the unstructured abstract from the mean clarity score of the structured abstract, to give us a measure of the increase in clarity.

The relationship is shown graphically in Fig. 2, which shows a positive relationship between the improvement in clarity and the improvement in completeness. A linear regression showed a significant gradient of 0.397, with a 95% confidence interval of (0.212, 0.582), and a non-significant intercept of 0.405, with a 95% confidence interval of (−0.967, 1.777). These results show that the more information is added to an abstract, the greater the increase in clarity, implying that the increase in clarity is due to the increase in information rather than the change in structure. However, the results also suggest that adding structure to an already reasonably complete abstract will not degrade clarity.
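
Under the same assumed data layout as the earlier sketches, this comparison could be computed as follows; 'Clarity' is an assumed column name for the 1–10 rating, and the code is illustrative rather than the analysis actually run.

```python
# Sketch of the completeness/clarity comparison above: per-abstract mean
# differences (structured minus unstructured) and a simple linear regression
# of the clarity increase on the completeness increase. Column names assumed.
import statsmodels.api as sm

def clarity_vs_completeness(df):
    means = (df.groupby(["abstract", "Structured"])[["TotalYes", "Clarity"]]
               .mean().unstack("Structured"))
    d_completeness = means[("TotalYes", 1)] - means[("TotalYes", 0)]   # increase in completeness
    d_clarity = means[("Clarity", 1)] - means[("Clarity", 0)]          # increase in clarity
    return sm.OLS(d_clarity, sm.add_constant(d_completeness)).fit()

# fit = clarity_vs_completeness(df)
# print(fit.params, fit.conf_int())   # cf. gradient 0.397, intercept 0.405
```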

Fig. 2 The relationship between the average improvement in clarity and completeness for the structured abstracts

5.6 Qualitative Data Analysis Results

Fifty-seven participants expressed their preference for abstract format:

  • 4 (7%) preferred conventional abstracts

  • 13 (23%) had no preference

  • 40 (70%) preferred structured abstracts.

The relationship between role and preference is shown in Table 11. It appears that the students are more enthusiastic about structured abstracts than those in other roles, but the difference is not statistically significant.

Table 11 The relationship between role and preference

Participants from all preference groups provided comments describing both things they liked and things they disliked about structured abstracts. The researchers classified the responses into a set of positive categories and negative categories shown in Tables 12 and 13 respectively.

Table 12 Positive categories for structured abstracts and their meaning
Table 13 Negative categories for structured abstracts and what they mean

The number of positive comments by preference is shown in Table 14 and the number of positive comments per category is shown in Table 15. (Three comments had multiple classifications and in each case the comment was treated as being two separate comments.) Most participants were able to identify one or more positive aspects of structured abstracts, no matter what their preference. Most positive comments concerned the associated methodology, followed by readability, structure and search. Generally, participants seemed to appreciate the formality of structured abstracts and thought it would help both authors and readers.

Table 14 Number of positive comments by preference
Table 15 Number of positive comments in each category by preference

The number of negative comments by preference is shown in Table 16 and the number of negative comments per category is shown in Table 17. (One participant said that structured abstracts were both boring and too long in one comment. Again, this comment was treated as two comments for classification and counting purposes.) Participants were able to identify negative comments irrespective of their personal preference. The most frequent negative comment was that the structure was inappropriate for other types of paper. This is a fair criticism: the structure we propose is only appropriate for empirical studies, may not be the best structure for some specialised types of empirical study (e.g. systematic literature reviews), and is certainly inappropriate for theoretical studies.

Table 16 Number of negative comments by preference
Table 17 Number of negative comments in each category by preference

The next most common negative comment concerned the issue of whether the format was too restrictive and risked leaving out important information. Again this is a fair criticism. As they gain experience of using structured abstracts, individual journals amend the basic structure to suit different types of paper. For example, the Annals of Internal Medicine added a limitations section to their structured abstract format (Editorial 2004) and Hartley and Benjamin report three different formats for experimental articles and three different formats for review articles used by four psychology journals (Hartley and Benjamin 1998). For non-empirical papers, educational research proposes a structure with sections: background, purpose, sources of evidence, main argument, conclusions.

The general comments we received could be classified as follows:

  • Six comments were related to criticisms of the questionnaire itself.

  • Six comments were related to how the individual reacted to the experiment itself (and as a result, one set of responses were removed from the analysis).

  • Six comments expressed interest in structured abstracts, were in favour of them, or queried how to use them.

  • Three comments noted that conventional abstracts could be as good as structured abstracts.

  • One comment expressed the view that the headings were unnecessary.

6 Discussion

Our study suggests that conventional abstracts are likely to omit substantial amounts of relevant information. The results also indicate that structured abstracts are significantly more complete and clearer than unstructured abstracts.

Participants felt that structured abstracts provided a useful standard methodology that would help authors both in the task of writing abstracts and also in the task of writing the research papers. They also pointed out that the structure would help them to find individual pieces of information and would support automated searches. In addition, six participants added comments expressing interest in structured abstracts and the results of the study.

However, participants also noted some disadvantages of structured abstracts, in particular that they are not suitable for all types of study, may leave out important information, are long, and can be rather dull and boring. In addition, three participants added comments to the effect that there was no reason why conventional abstracts could not be as good as structured abstracts, and one suggested that the headings were unnecessary.

6.1 Comparison with Previous Studies

After generating the structured forms for the 25 abstracts, we performed an initial assessment of these by scoring them ourselves as well as gathering some other basic measures (Budgen et al. 2007a). The mean length of the abstracts (in words) increased from 161 (SD=59.15) to 267.4 (SD=47.09). Our own assessment of clarity showed a mean increase of 1.44 (from 5.64 to 7.08), while for completeness there was an increase of 6.16 in the mean value (from 5.64 to 11.8).

Hartley and Benjamin (1998) reported a study in which the authors themselves re-wrote their own abstracts into a structured format. The 30 abstracts were then judged for completeness based on 22 questions about the content of the abstract. The abstracts were judged by one senior academic (Hartley) and two other participants drawn from a pool of 15 first and second year psychology students. In this study, the average score for the unstructured abstracts was 6.4 (29% of the total possible score) with a standard deviation of 2.8. The average score for the structured abstracts was 9.1 (42%) with a standard deviation of 2.6.

In a later study, Hartley (2003) re-wrote the abstracts himself. In this study he used a completeness checklist comprising 14 questions and a subjective clarity evaluation on a scale of 1 to 10. For traditional format abstracts he obtained:

  • Checklist score mean = 5.5 (39% of the total) and sd = 1.0

  • Clarity rating mean = 6.2 and sd = 2.0

For structured format abstracts he obtained:

  • Checklist score mean = 9.7 (69%), sd = 1.4

  • Clarity rating mean = 7.4, sd = 2.0

A recent field study (Sharma and Harrison 2006), although confirming that structured abstracts included more relevant information, indicated that the actual improvements were somewhat less than those found in laboratory studies. Using a questionnaire with 29 items, Sharma and Harrison found an increase of 12.2% for completeness in abstracts from three journals that adopted structured abstracts and no improvement (i.e. a change of −0.01%) in abstracts from three journals that did not adopt structured abstracts.

These results are summarized in Table 18. These demonstrate that all studies showed an improvement in completeness but the improvement was less when the authors themselves wrote the structured abstracts.

Table 18 Comparisons with other results

6.2 Limitations

6.2.1 Construct Validity

This study measured completeness in terms of the number of “yes” responses to a questionnaire comprising 18 items and clarity as a numerical value in the range 1–10. Both measures are based on subjective assessments by the participants. In both cases the measures were chosen because they used the same measurement approach as other related studies. Furthermore, the completeness questionnaire was subject to an extensive validation exercise as described in Section 3.4. Nonetheless, subjective measures must be treated with some caution. Indeed, three participants commented that the questions were rather vague and difficult to answer with a simple Yes or No. However, problems with the questionnaire should not have caused any major bias in the results since they affect both conventional and structured abstracts equally.

A more serious criticism was made about the qualitative data. Two participants felt the results were biased because the default answers were favorable to structured abstracts. This limits the validity of the data collected about the personal preference for abstract form. However, it should not affect the individual comments about what was liked and what was disliked.

6.2.2 Internal Validity

The experimental design ensured that participants were randomly assigned to abstracts (i.e. the materials) and to whether they saw a structured and conventional abstract first (i.e. the treatment order). This design should protect against selection bias and learning effects. We were also able to confirm that a variety of uncontrolled factors, i.e. knowledge of structured abstracts, software engineering role, preference for structured abstracts, did not bias our results.

However, our design did require that participants make an assessment of each abstract as an artifact in its own right, whereas comments from two participants suggested that they had made relative assessments: one participant commented that he/she felt the abstracts were not really comparable due to length differences; another said he/she would have scored differently if he/she had seen the structured version first. However, we also analysed the data using only the data collected about the first abstract viewed by each participant. This corresponds to a standard parallel experiment and the results confirmed that structured abstracts were better than unstructured abstracts with respect to both the clarity and completeness metrics—which was also the view expressed in many of the comments made by the participants.

6.2.3 Generalisability

The abstracts used in this study were a stratified random sample from 103 experiments and quasi-experiments identified by Dag Sjøberg and his colleagues and found in a survey of nine journals and three conference proceedings (Sjøberg et al. 2005) during the period 1993–2002. The results should therefore generalize to the sample frame. Furthermore, the selected journals and conferences are among the leading software engineering journals and conferences, so if anything we would expect the conventional abstracts used in this study to be of a better quality than abstracts in general.

Our study participants constitute a convenience sample not a random sample, so their opinions about structured and conventional abstracts do not automatically generalize to other software engineers and researchers. However, our demographic information indicates that we have a reasonable number of very experienced academics and researchers as well as a number of relatively inexperienced researchers. Thus we have a selection of informed opinion from the groups that would be expected to make most use of journal abstracts.

7 Conclusion

Our results demonstrate that conventional abstracts, even those in leading journals and conferences, often omit large amounts of relevant information. We believe this provides a strong argument that the quality of abstracts needs to be improved.

Our results also confirm that structured abstracts can provide an effective method for addressing the limitations of conventional abstracts. They include more relevant information and are easier to read than conventional abstracts. These results are consistent with the results of studies in other disciplines. However, we do not claim that improvements in completeness and clarity are due simply to restructuring the abstracts; our results clearly show that improvements are also due to including more information. Furthermore, results in other disciplines show that improvements are likely to be less when authors themselves construct abstracts. This raises the question as to whether authors should be advised to use structured abstracts or simply increase the amount of relevant information in their abstracts.

We agree with those participants who pointed out that conventional abstracts can be made as good as structured abstracts; however, we believe this is unlikely to occur without providing novice researchers with a better idea about what they need to include in abstracts. Therefore, since structured abstracts provide that basic discipline to help novice researchers and also provide a means to assist searching for information, we would argue that their use provides a viable approach to improving the quality of abstracts. Furthermore, it is evident that many of the experiment’s participants also take this view. In addition, our previous study (Kitchenham et al. 2008) used abstracts rewritten by novices who were given only limited training. This provides evidence that non-expert researchers can successfully construct structured abstracts.

The price of adoption is also a small one, with the principal ‘cost’ associated with the use of structured abstracts being that they are longer than conventional ones. However, since this mainly arises because they contain more information, it is not an unreasonable consequence. In addition, since the structure we have employed is intended for use with empirical papers, in particular software experiments, there is a need to devise an appropriate set of variations to the headings in order to meet the needs of other forms of paper, such as those concerned with methodology or software tools.

Nevertheless, since our results concur with the experiences from other disciplines, where the use of structured abstracts is now considered to be good practice, we would strongly encourage their adoption by software engineering journals and conferences. By doing so, they will be able to provide better quality summaries of articles for their readers, as well as enhancing the quality of literature surveys performed by researchers and students, by making it easier to identify those articles that contain relevant information.