
1 Introduction

I am regularly contacted by various organizations for help regarding their software processes. They range from small, private companies to large, public-sector agencies, some of which have had projects fail at costs on the order of hundreds of millions of euros. These organizations are not interested in general, overall principles regarding process; they are interested in what would work for them. They expect us, as researchers in the field, to know “what works for whom, where, when, and why” [1].

In a few cases, I have immediate suggestions for improvement, such as introducing automated testing if the number of defects is out of control. But in most cases, I cannot propose anything without working with the organization for some length of time. Identifying and measuring the factors that should be taken into account when proposing process changes are far from trivial; the software engineering literature does not give much help in concrete settings. The current body of knowledge is mostly too general or too specific; see Fig. 1.

Fig. 1. Useful theories and patterns

Software engineering folklore guides processes to some extent. For example, collections of “laws” and principles that have emerged over the years can be found in [2–4]. One of the laws stated by Endres and Rombach [4], attributed to the work of Boehm [5], is: “Errors are most frequent during the requirements and design activities and are more expensive the later they are removed.” This “law” encourages processes that emphasize the analysis and design phases. However, what does this mean in practice? Should one spend, say, 10 to 30 percent of the total effort of a development project on these phases? Because of varying contexts, software engineering folklore is often contradictory. For example, much effort has been devoted to developing process models that include a large number of activities, practices and roles together with formal, detailed project documents. In contrast to such heavyweight processes stand the lightweight processes recommended in agile development.
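To see why this “law” favours early defect removal, consider a purely illustrative calculation. The escalation factors below (1, 2, 5, 10, 20 across phases) are a commonly cited textbook illustration, not data from this study; with a fixed number of defects, shifting their removal to earlier phases cuts the total rework cost substantially.

    // Illustrative only: hypothetical relative costs of removing a defect,
    // indexed by the phase in which it is removed (requirements, design,
    // coding, testing, operation).
    public class ReworkCost {
        static final double[] COST_FACTOR = {1, 2, 5, 10, 20};

        static double totalCost(int[] defectsRemovedPerPhase) {
            double total = 0;
            for (int i = 0; i < COST_FACTOR.length; i++) {
                total += defectsRemovedPerPhase[i] * COST_FACTOR[i];
            }
            return total;
        }

        public static void main(String[] args) {
            // The same 100 defects, removed mostly early vs. mostly late.
            System.out.println(totalCost(new int[] {40, 30, 20, 10, 0}));  // 300.0
            System.out.println(totalCost(new int[] {0, 10, 20, 30, 40}));  // 1220.0
        }
    }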

Few empirical studies in software engineering discuss the contexts to which the results may be generalizable. Experiments in software engineering generally have few subjects and almost all of them use convenience sampling [6]. Most case studies are of single cases, and few attempt to generalize the results through theories with a well-defined scope of validity [7]. Surveys collect people’s subjective opinions, which are based on knowledge and experience gained in specific contexts. The results of surveys also need to be related to theories to become generally useful.

Nevertheless, a premise in software engineering is that there is a relationship between software processes and the success of a project or task. Success is typically described in terms of the quality of the delivered software, how long it takes to develop and how much it costs. It is also commonly agreed that this relationship is moderated by the context of the processes, as illustrated in Fig. 2. It is reasonable to assume that the optimal process varies with context; for example, a small team may not benefit from activities designed to help large teams.

Fig. 2. Relationship between process, context and outcome

The ideal model would be a deterministic one, in which a given set of context and outcome parameters would determine the optimal process, and a given process and context would determine the outcome. However, it is unlikely that we will manage to develop such models, given that software development is mostly a human activity and human behaviour cannot, in general, be described deterministically, even though certain theories, such as prospect theory [8], describe human behaviour in specific situations.
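As a purely illustrative formalization of the relationship in Fig. 2, one may think of the outcome as a draw from a distribution determined by process and context rather than as a deterministic function of them. The types and the toy model below are hypothetical, invented for this sketch, and do not come from the literature.

    import java.util.Random;

    // Hypothetical types for the Fig. 2 relationship: process and context
    // determine the outcome only up to noise, because software development
    // is mostly a human activity.
    record SoftwareProcess(double analysisShare) {}   // fraction of effort on analysis/design
    record ProjectContext(int teamSize, double bidPriceEur) {}
    record ProjectOutcome(double effortHours, int defects) {}

    public class OutcomeModel {
        // A toy, non-deterministic model: same inputs, varying outputs.
        static ProjectOutcome sample(SoftwareProcess p, ProjectContext c, Random rng) {
            double effort = 400 + 50 * c.teamSize() + rng.nextGaussian() * 100;
            int defects = (int) Math.max(0,
                    50 * (1 - p.analysisShare()) + rng.nextGaussian() * 10);
            return new ProjectOutcome(effort, defects);
        }

        public static void main(String[] args) {
            Random rng = new Random(42);
            SoftwareProcess p = new SoftwareProcess(0.25);
            ProjectContext c = new ProjectContext(3, 56_000);
            System.out.println(sample(p, c, rng));
            System.out.println(sample(p, c, rng));  // different outcome, same inputs
        }
    }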

Although we are far from a scenario where we can fully determine which process gives which result in a specific context, our community can improve in identifying patterns and proposing theories for the relationships among process, context and outcome. This paper reports on a study that is an ongoing attempt in that direction.

2 Design of Study

The empirical software engineering community conducts both controlled experiments, which focus on cause and effect, and case studies, which focus on realism. How to identify cause–effect relationships in realistic settings is a challenge. What if we hired several companies to develop the same system and saw what happened? Some years ago, our research group had such an opportunity. We needed a web application to store information about all the empirical studies of the group. We developed a requirement specification, sent a call for tender to 81 consultancy companies and received bids from 35 of them. A study of this bidding process was reported in [9].

The striking differences among the bids, given that we had provided a well-defined, 11-page requirement specification, led us to use price as the selection criterion. We wanted to study the effect of price on process and outcome. Thus, in each of four price segments, we selected the company that appeared most likely to develop a good system, based on the quality of the bid documents. The companies are named A to D in this paper, in order of bid price; see Fig. 3.

Fig. 3. Four out of 35 companies selected for development
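The following sketch shows one way such a selection can be implemented. It assumes, hypothetically, that the 35 bids were split into four equal-sized price segments and that a numeric quality score of the bid documents decided within each segment; the actual segmentation and rating procedure in the study may have differed.

    import java.util.*;

    // Hypothetical bid record: company name, bid price and a rating of the
    // quality of the bid documents.
    record Bid(String company, double priceEur, double docQualityScore) {}

    public class SelectBidders {
        // Partition the bids into four price segments and pick the
        // highest-rated bid in each segment.
        static List<Bid> selectOnePerSegment(List<Bid> bids) {
            List<Bid> sorted = new ArrayList<>(bids);
            sorted.sort(Comparator.comparingDouble(Bid::priceEur));
            List<Bid> selected = new ArrayList<>();
            int n = sorted.size();
            for (int seg = 0; seg < 4; seg++) {
                List<Bid> segment = sorted.subList(seg * n / 4, (seg + 1) * n / 4);
                selected.add(Collections.max(segment,
                        Comparator.comparingDouble(Bid::docQualityScore)));
            }
            return selected;
        }

        public static void main(String[] args) {
            // Toy data; the real study had 35 bids.
            List<Bid> bids = List.of(
                    new Bid("B1", 10_000, 3.5), new Bid("B2", 12_000, 4.0),
                    new Bid("B3", 20_000, 4.5), new Bid("B4", 26_000, 2.0),
                    new Bid("B5", 35_000, 3.0), new Bid("B6", 40_000, 4.2),
                    new Bid("B7", 55_000, 3.8), new Bid("B8", 69_000, 4.1));
            System.out.println(selectOnePerSegment(bids));
        }
    }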

The data sources in this study are comprehensive. They include daily time sheets on the tasks and subtasks of each developer, weekly snapshots of all documents (including source code) produced during the projects, the full history provided by the configuration management and issue-tracker tools, and other information collected from defect logs, e-mail communication and team interviews.

From this study, we have published an investigation of reproducibility and variability in software engineering [10] and a study of effort estimation based on use case points [11]. The code developed by the four companies has also been used in follow-up studies on maintenance metrics [12] and the effects of code smells [13]. (In this paper, the companies are named according to bid price, while in the papers already published the order was alphabetic: the previous Company C is now Company A, the previous Company A is now Company B, and the previous Company B is now Company C. Company D remains Company D.)

A detailed investigation of the effect of process and context has not yet been published. Initial results are reported here.

3 Context

We controlled parts of the context to make them the same for all the companies; other context factors were specific to each company. The controlled ones included:

  • Requirement specification

  • Application domain (web document management)

  • Functional size of the system (57 unadjusted use case points [11]; see the worked example below)

  • Low complexity of system

  • Customer (our research department)

  • Programming languages (Java, JavaScript and SQL)

  • Tools (IDE: NetBeans or Eclipse; build and deploy: Ant; configuration management: CVS. The companies selected these tools themselves; it was a coincidence that they all chose the same ones)

  • Team composition (1 project manager and 2 developers, except in Company B, which had 1 developer and 1 interaction designer)

  • Uniform interaction between development team and customer (e.g., use of same issue tracker, acceptance tests by the same customer team)

  • Intermediate skill level of the developers

Regarding skill level, we selected the developers on the basis of their CVs. All of them had at least three years of formal education in programming and three years of industrial experience with the technology to be used. Ideally, we should have tested the developers using a validated skill evaluation instrument [14], but in the absence of such an instrument at that time, the developers were tested for their Java skills by taking part in a one-day exercise in which they performed the same Java programming tasks used in a former experiment [15]. Their performance was thus compared with that of 77 other Java programmers. Similarly, the developers were tested for their design skills by taking part in a half-day UML exercise where they performed the same tasks used by 28 persons in a former experiment on use cases and class diagrams [16]. We did not observe any clear relationship between the skills of the team and project outcome.
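For reference, the figure of 57 unadjusted use case points (UUCP) in the list above follows Karner's standard counting scheme: actors are weighted 1, 2 or 3 (simple, average, complex) and use cases 5, 10 or 15 according to their number of transactions, and UUCP is the sum of the two weighted counts. The breakdown below is hypothetical, chosen only to show one combination that yields 57; the actual classification is given in [11].

    // Karner's unadjusted use case points: UUCP = UAW + UUCW.
    // The actor/use-case breakdown below is hypothetical; see [11].
    public class UseCasePoints {
        public static void main(String[] args) {
            int uaw  = 2 * 2 + 1 * 3;   // e.g. 2 average actors + 1 complex actor = 7
            int uucw = 2 * 5 + 4 * 10;  // e.g. 2 simple + 4 average use cases = 50
            int uucp = uaw + uucw;      // 7 + 50 = 57 unadjusted use case points
            System.out.println("UUCP = " + uucp);
            // A full Karner estimate would first adjust by technical and
            // environmental factors (UCP = UUCP * TCF * EF) and then apply a
            // productivity rate, e.g. 20 person-hours per UCP.
        }
    }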

Table 1 shows the context factors that varied among the companies. Some factors were specific to the development organization; others were specific to this development project. Note that the bid by Company D of 69,000 euros, shown in Fig. 3, was negotiated down to the 56,000 euros shown in Table 1.

Table 1. Varying context

The table shows several internal relationships among the factors. Company A is small and can therefore only run fairly small projects with small teams; its organization-level process models are accordingly light. The low price offered to build the system is accompanied by an expectation of a short lead time, a low number of effort hours and the need for the developers to work on several projects in parallel. At the other end, Company D is large and has a heavy organization-level process model. The high bid allows for higher estimated effort and lets the developers work full-time on this project.

4 Processes

As an example of process data, Fig. 4 shows the number of hours the companies spent on various development activities. Note the close correspondence between the effort spent on the activities “Implementation” and “Analysis and Design”: there is no indication here that much effort spent on analysis and design reduces the effort needed for implementation, or vice versa. Remember that the amount of functionality is fixed. Figure 5 shows the hours spent on the activities over the course of the projects.

Fig. 4. Effort spent on various activities

Fig. 5. Effort on activities along the way
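To illustrate how totals such as those in Fig. 4 can be derived from the daily time sheets, the sketch below aggregates time-sheet entries by activity. The record layout is hypothetical; the actual sheets in the study covered tasks and subtasks per developer per day.

    import java.util.*;

    // Hypothetical time-sheet entry: one developer, one week, one activity.
    record TimeSheetEntry(String company, String activity, int week, double hours) {}

    public class EffortByActivity {
        // Total hours per activity for one company (the basis of Fig. 4);
        // grouping by week instead would give the curves of Fig. 5.
        static Map<String, Double> perActivity(List<TimeSheetEntry> entries, String company) {
            Map<String, Double> totals = new TreeMap<>();
            for (TimeSheetEntry e : entries) {
                if (e.company().equals(company)) {
                    totals.merge(e.activity(), e.hours(), Double::sum);
                }
            }
            return totals;
        }

        public static void main(String[] args) {
            List<TimeSheetEntry> entries = List.of(
                    new TimeSheetEntry("A", "Implementation", 1, 30),
                    new TimeSheetEntry("A", "Analysis and Design", 1, 10),
                    new TimeSheetEntry("A", "Implementation", 2, 25));
            System.out.println(perActivity(entries, "A"));
            // {Analysis and Design=10.0, Implementation=55.0}
        }
    }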

5 Outcome

The outcome of a process or project may be measured along many dimensions. For the four systems, we assessed reliability, usability and maintainability. Reliability was measured by investigating the defects found over a period of two years while the systems were operational [10]. Usability was measured in a dedicated experiment [17], and maintainability in a follow-up experiment [13]. Table 2 shows the results, together with measurements of effort and lead time.

Table 2. Outcome variables
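Reliability figures such as those behind Table 2 are often normalized by system size so that they can be compared across systems. The sketch below shows one simple, hypothetical normalization, defects per thousand lines of code; it is not necessarily the measure used in [10], and the numbers are illustrative only.

    // A simple, hypothetical reliability indicator: operational defects per
    // thousand lines of code (KLOC). Not necessarily the measure in [10].
    public class DefectDensity {
        static double perKloc(int defects, int linesOfCode) {
            return defects / (linesOfCode / 1000.0);
        }

        public static void main(String[] args) {
            System.out.println(perKloc(12, 8_000));  // 1.5 defects/KLOC (toy numbers)
        }
    }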

6 Relationships

Did the rather extreme price differences, spanning a factor of six, lead to corresponding differences in outcome? Generally not. Company A gave the lowest bid and accordingly spent the least effort on the whole development, particularly on analysis and design and on testing. In a sense, this company developed a “quick and dirty” solution, but the small size of its Java code led to the most maintainable system; the low number of lines of Java code trumped the other maintainability metrics [12]. On the other hand, the low focus on testing resulted in the least reliable system and required us, as the customer, to spend much more effort on testing than we did for the other companies. In total, we spent almost twice as many effort hours on Company A as on the other companies, which to some extent reduces the cost savings of hiring Company A. Furthermore, we were a competent customer; with a less competent customer, the project might have failed.

Company B scored best on the system quality dimensions on average. Given the second-lowest price, one may consider their project the best value for money.

Company C over-designed their system, which resulted in excess code size and, in turn, poor maintainability. However, they scored highest on reliability.

Company D had relatively heavy processes and a highly competent project manager. The developers worked full-time and were co-located. Their project seemed to be under full control throughout and resulted in the lowest lead time and very little overrun.

We have observed many other relationships among context, process and outcome, but much analysis remains. We hope to reveal interesting patterns that may shed new light on existing theories or be the basis for new theories.