
1 Introduction

Most of today’s software development organizations aim to save time and reduce costs. As a result, globally distributed development has spread throughout the software development industry. Distributed software development offers many benefits that support the effective development of software products, but it still faces many challenges that may hinder the success of globally distributed projects. In this context, a significant number of projects have failed to deliver on time and within budget in globally distributed environments [1]. Managing the globally distributed environment is therefore a key concern. In particular, successfully planning the activities of software development projects requires a high level of accuracy from cost and time estimation methods.

Developing software products in a cost-effective way is the overriding objective of many organizations, and the ultimate goal is an accurate estimate of the effort required to complete each project. Many research studies indicate that projects without realistic planning and accurate estimation often exceed their allocated budget and planned completion time [2–4].

The drivers involved in the distributed environment are investigated with respect to four aspects: (1) software product, (2) personnel attributes, (3) computer attributes, and (4) project attributes [5]. We also suggest that the success of distributed software development projects never depends on a single driver.

Although many methods and techniques are available to help create effort estimates for distributed software projects, they are still far from the required accuracy. Several authors have proposed explanations for these inaccuracies and ways to overcome some of them [6, 7]. In contrast, this chapter focuses on ways in which existing effort estimation methods can be tailored to account for global software development. It investigates the influence of the different factors that affect the accuracy of effort estimation methods in the context of globally distributed software development projects. Furthermore, it presents effort estimation methods in light of these factors.

The chapter is structured as follows: Sect. 2 presents globally distributed environments. Section 3 reports the software effort estimation process. Section 4 outlines software cost/time estimation techniques for global software development (GSD). Section 5 discusses the main cost and time drivers. Section 6 presents the risk analysis; finally, the conclusions and future work are presented in Sect. 7.

2 Globally Distributed Environment (GSD)

GSD refers to software development that is done by multiple teams in different geographic locations. The teams are separated physically, and they are located in different countries within one region or around the world. The teams can either be from one organization or from multiple different organizations (outsourcing) [8].

Global software development is usually considered much more difficult than collocated development because of the many challenges specific to a globally distributed setting. These challenges include the negative impact of physical distance, cultural differences, and many other complexity factors, which are elaborated in the following subsections [9, 10].

Past studies have shown that tasks take about 2.5 times longer in a distributed setting than in a collocated setting [11, 12]. Other studies reported that about 40% of GSD projects fail to deliver the expected benefits, owing to the lack of a theoretical basis and the complications inherent in GSD projects [13, 14]. On the other hand, Teasley et al. [15] reported that teams collocated in a war room achieved much higher productivity and job satisfaction than projects that did not locate the entire team together.

The additional activities and difficulties in global software development require substantial additional effort for planning, coordination, and control in its day-to-day governance. This additional effort should be considered in time and cost estimation, which makes time and cost estimation in GSD more complex than in local development.

2.1 Challenges

Although GSD offers several benefits, distributed work also poses many challenges (Table 2.1). If globally distributed software projects are not managed carefully, they are likely to turn a company into a loss-making business [16]. Physical separation among project members has diverse effects on many levels. The following factors have been identified in the research literature [17] as affecting the amount of effort and cost required for global software development:

  • Geographic distance: Software development, particularly in the early stages, requires much communication, coordination, and control [18]. Geographical distance is a measure of the effort required for one actor to visit another and can be seen as reducing the intensity of communication [19], especially when people experience problems with media and have difficulty finding a sufficiently good substitute for face-to-face interaction [20]. Kraut and Streeter [21] found that formal communication is useful for routine coordination, while informal communication is needed to handle uncertainty and unanticipated problems, which are typical of software development. They observed that the need for informal communication increases dramatically as the size and complexity of the software increase. In a large software organization, developers can spend up to 75 min per day, on average, on informal, unplanned communication [22]. In general, low geographical distance offers greater opportunity for periods of collocated teamwork.

  • Temporal distance: Time zones differ among project members when the development team is distributed around the world. Temporal distance is a measure of the dislocation in time experienced by two actors wishing to interact [19]. It can be caused by time zone differences or time-shifted work patterns and reduces opportunities for real-time collaboration, because response time increases when working hours at remote locations do not overlap [23]. Temporal dispersion reduces the possibilities for synchronous interaction, a critical communicational attribute for real-time problem solving and design activities. In practice, teams in different time zones have few hours in the work day when multiple sites can participate in joint synchronous meetings and discussions. Temporal dispersion can also make misunderstandings and errors more likely [24].

    Temporal distance also delays responses to asynchronous communication. For example, an e-mail sent from one site may arrive after working hours at the destination; the response cannot be sent until the next working day begins there, and the sender will see it only on arriving at the office the following day.

  • Linguistic distance: The lack of a common native language creates further barriers to communication [25, 26]. Linguistic distance limits the ability for coherent communication to take place [27]. English has become the lingua franca of GSD [28]. This affects not only the quality of communication but also the choice of communication media. Language skills can also impede communication in more subtle ways. When participants in a conversation have different levels of proficiency, the group with better language skills occupies a position of strength, can appear more powerful, and may thus suppress important communication through unintended intimidation [28]. Further, lack of proficiency in the chosen language can lead to a preference for asynchronous communication, which can be an impediment if video and teleconferencing are important communication media [29].

  • Cultural distance: GSD requires close cooperation among individuals with different cultural backgrounds, which often creates another barrier to efficient work. Cultures differ on many critical dimensions, such as the need for structure, attitudes toward hierarchy, sense of time, and communication styles. These differences have been recognized as major barriers to communication. Culture also affects the interpretation of requirements; the domain knowledge used to fill in gaps or place requirements in context varies considerably across national cultures [30]. Culture also interferes with collaboration when cultural norms lead to conflicting approaches to problem solving.

  • Social challenges: Another fundamental challenge in global software development is social issues such as fear and trust. Fear and distrust can negatively affect motivation, the desire to work, cooperation, and communication and knowledge sharing with remote colleagues. Hence, they have a direct bearing on the success of implementing global software development [31]. It is very difficult for individuals and groups to trust and build relationships with people they feel threaten their jobs. On-site teams in expensive countries fear for their job security when off-site teams are added in less expensive locations; this creates mistrust toward their off-site colleagues as well as toward their own management’s motives. This can result in a clear unwillingness to cooperate and share knowledge with remote colleagues [26, 31].
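As a rough illustration of the temporal-distance argument, the Python sketch below computes how many hours per day two sites share inside their working windows. It assumes that every site works a fixed local 9:00 to 17:00 day and is described only by its UTC offset in hours; daylight-saving time, holidays, and flexible hours are ignored, and the function names and offsets are hypothetical.

```python
MINUTES_PER_DAY = 24 * 60


def utc_intervals(utc_offset_h, start_h=9.0, end_h=17.0):
    """Return a site's local working window as (start, end) UTC minutes of the day.

    A window that crosses UTC midnight is split into two intervals.
    """
    start = round((start_h - utc_offset_h) * 60) % MINUTES_PER_DAY
    end = start + round((end_h - start_h) * 60)
    if end <= MINUTES_PER_DAY:
        return [(start, end)]
    return [(start, MINUTES_PER_DAY), (0, end - MINUTES_PER_DAY)]


def overlap_hours(offset_a, offset_b, start_h=9.0, end_h=17.0):
    """Hours per day during which both sites are inside their working windows."""
    shared = 0
    for a_start, a_end in utc_intervals(offset_a, start_h, end_h):
        for b_start, b_end in utc_intervals(offset_b, start_h, end_h):
            shared += max(0, min(a_end, b_end) - max(a_start, b_start))
    return shared / 60


print(overlap_hours(1, 5.5))   # Central Europe (UTC+1) and India (UTC+5:30): 3.5
print(overlap_hours(-5, 5.5))  # US East Coast (UTC-5) and India: 0.0
```

Under these assumptions, sites at UTC+1 and UTC+5:30 share 3.5 hours a day, whereas sites at UTC-5 and UTC+5:30 share none, which is exactly the situation in which an e-mail question cannot be answered before the next working day.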

Table 2.1 Challenges in global software development

In some cases, even where people had successfully worked together for up to a year in a collocated situation, these problems soon came to the fore once a virtual team strategy was fully implemented.

2.2 Benefits

This section identifies the main benefits that have been associated with global software development.

2.2.1 Cost Savings

One of the most obvious reasons for organizations to embark on a challenging and risky endeavor such as GSD is, not surprisingly, the potential to reduce development costs. By moving parts of the development work to low-wage countries, organizations can have the same work done for a fraction of the cost [32]. The basis for this benefit is that companies globalize their software development activities to leverage cheaper employees located in lower-cost economies. This has been made possible by cross-continental high-speed communication links that enable the instantaneous transfer of the basic product at hand: software.

The difference in wages across regions can be significant, with a US software engineer’s salary being several times that of a person with (at least partly) equivalent skills in Asia or South America. However, these wages appear to be rising, and there has been hyper-growth in local IT employment markets such as Bangalore. In our experience, companies are now looking at alternative locations that offer more acceptable attrition rates along with the continued promise of cheaper labor.

2.2.2 Reduced Time

Having developers located in different time zones can allow organizations to increase the number of daily working hours in a “follow-the-sun” development model which can decrease cycle time. Time zone effectiveness is the degree to which an organization manages resources in multiple time zones, maximizing productivity by increasing the number of hours during a 24-h day that software is being developed by its teams. When time zone effectiveness is maximized to span 24 h of the day, this is referred to as the “follow-the-sun” development model. This is achieved by handing off work from one team at the end of their day to another team located in another time zone. The approach can aid organizations which are under severe pressure to improve time to market [11].
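Time zone effectiveness can be approximated by counting how much of the 24-h day is covered by at least one site. The Python sketch below does this under simplifying assumptions: each site works a fixed local 9:00 to 17:00 day, is described only by its UTC offset in hours, and hand-off overhead and daylight-saving time are ignored. The function names and the three-site configuration (UTC-7, UTC+1, UTC+9) are hypothetical.

```python
MINUTES_PER_DAY = 24 * 60


def covered_minutes(utc_offset_h, start_h=9.0, end_h=17.0):
    """Set of UTC minutes of the day that fall inside a site's local working window."""
    start = round((start_h - utc_offset_h) * 60)
    length = round((end_h - start_h) * 60)
    return {(start + m) % MINUTES_PER_DAY for m in range(length)}


def daily_coverage_hours(offsets, start_h=9.0, end_h=17.0):
    """Hours of the 24-h day during which at least one site is working."""
    covered = set()
    for offset in offsets:
        covered |= covered_minutes(offset, start_h, end_h)
    return len(covered) / 60


# Hypothetical follow-the-sun setup with three sites eight hours apart.
hours = daily_coverage_hours([-7, 1, 9])
print(hours, hours / 24)  # covered hours and the share of the 24-h day
```

Here the three sites cover all 24 hours. Shifting the first site to UTC-8 leaves a one-hour gap (23 covered hours), which shows how sensitive a follow-the-sun plan is to the exact placement of sites.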

3 Software Effort Estimation Process

In software project management, effort estimation is the process of developing an approximation of the monetary and temporal resources needed to complete project activities [33]. Because software is usually developed in projects, a software cost and time estimate can be considered an approximation of the monetary and temporal resources needed to complete the software.

3.1 Estimation Process

Establishing an accurate effort estimate for software requires a structured approach and a significant amount of work. Software effort estimation can be seen as a small project in its own right that needs to be carefully planned, managed, and followed up. Organizations use different processes for software effort estimation. These processes vary in many aspects, and there does not seem to be one common process used across all organizations and in research. Data on the software cost and time estimation process gathered from NASA’s Handbook for Software Cost Estimation [34] enabled us to develop Table 2.2. The process consists of preparing a description of the cost analysis requirements and revising the process, its procedural requirements document, and the cost/time estimation handbook accordingly.

Table 2.2 Software cost estimation process from NASA

Most software effort estimation models view the estimation process as a function computed from a set of cost drivers, and in most estimation techniques the primary, or most important, driver is believed to be software size. As the view of the software estimation process in Fig. 2.1 illustrates, the software requirements are the primary input to the process and also form the basis for the estimate.

Fig. 2.1 View of software estimation process

3.2 Estimation Accuracy

Effort estimation accuracy indicates how close the estimates produced by a particular model or technique are to the actual effort. In addition to the degree of project definition, estimate accuracy is driven by:

  • Level of non-familiar technology in the project

  • Complexity of the project

  • Quality of reference cost estimating data

  • Quality of assumptions used in preparing the estimate

  • Experience and skill level of the estimator

  • Estimating techniques employed

  • Time and level of effort budgeted to prepare the estimate

  • The accuracy of the composition of the input and output process streams

We can assess the performance of a software estimation technique with the following two measures:

3.2.1 Mean Absolute Error (MAE)

The mean absolute error (MAE) (Eq. 2.1) [35] is computed by averaging the absolute errors (AE) (Eq. 2.2) of the individual projects.

$$ MAE=\frac{1}{n}\sum_{i=1}^n{AE}_i $$
(2.1)
$$ {AE}_i=\left|{e}_i-{\widehat{e}}_i\right| $$
(2.2)
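A direct Python transcription of Eqs. 2.1 and 2.2 is sketched below; the function names are illustrative, and the effort values (in person-months) are hypothetical and serve only to show the calculation.

```python
def absolute_errors(actual, predicted):
    """AE_i = |e_i - ê_i| for every project i (Eq. 2.2)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return [abs(e - e_hat) for e, e_hat in zip(actual, predicted)]


def mean_absolute_error(actual, predicted):
    """MAE = (1/n) * sum of AE_i over the n projects (Eq. 2.1)."""
    errors = absolute_errors(actual, predicted)
    return sum(errors) / len(errors)


actual = [100.0, 250.0, 40.0]     # hypothetical actual effort e_i (PM)
predicted = [120.0, 200.0, 50.0]  # hypothetical predicted effort ê_i (PM)
print(absolute_errors(actual, predicted))      # [20.0, 50.0, 10.0]
print(mean_absolute_error(actual, predicted))  # about 26.67 PM
```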

3.2.2 Mean Magnitude of Relative Error (MMRE)

MMRE is defined in Eq. 2.3. It is derived from the magnitude of relative error (MRE), as shown in Eq. 2.4. Some researchers have criticized the MRE criterion for being biased toward underestimates, which calls into question its validity as an accuracy measure [36, 37].

$$ MMRE=\frac{1}{n}\sum_{i=1}^n{MRE}_i $$
(2.3)
$$ {MRE}_i=\frac{AE_i}{e_i} $$
(2.4)

where \( e_i \) and \( {\widehat{e}}_i \) are the actual and predicted effort for the i-th project, and n is the number of projects.
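Eqs. 2.3 and 2.4 translate into Python as sketched below. The sketch assumes strictly positive actual effort values, since MRE divides by \( e_i \); the function names are illustrative and the person-month figures are hypothetical.

```python
def magnitude_of_relative_error(actual, predicted):
    """MRE_i = AE_i / e_i (Eq. 2.4); actual effort must be positive."""
    if actual <= 0:
        raise ValueError("actual effort must be positive")
    return abs(actual - predicted) / actual


def mmre(actual, predicted):
    """MMRE = (1/n) * sum of MRE_i over the n projects (Eq. 2.3)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(magnitude_of_relative_error(e, e_hat) for e, e_hat in zip(actual, predicted))
    return total / len(actual)


actual = [100.0, 250.0, 40.0]     # hypothetical actual effort (PM)
predicted = [120.0, 200.0, 50.0]  # hypothetical predicted effort (PM)
print(mmre(actual, predicted))    # about 0.217, i.e. a 21.7 % mean relative error
```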

Each of these error measures has advantages and disadvantages. For example, the absolute error does not account for project size, which is a problem especially in the GSD context, and the mean magnitude of relative error masks any systematic bias (it does not reveal whether the estimates are over or under the actual effort).
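The two weaknesses can be seen with a few hypothetical numbers. In the Python sketch below, the same 10 % miss yields very different absolute errors on a small and a large project, and a 20 % overestimate and a 20 % underestimate receive the same MRE. The signed relative error is an additional measure introduced here only for illustration (it is not one of Eqs. 2.1 to 2.4); it reveals the direction that MRE hides.

```python
def absolute_error(actual, predicted):
    return abs(actual - predicted)


def mre(actual, predicted):
    return abs(actual - predicted) / actual


def signed_relative_error(actual, predicted):
    """(ê - e) / e: positive for overestimates, negative for underestimates."""
    return (predicted - actual) / actual


# 1) AE scales with project size: both estimates are 10 % too high.
print(absolute_error(10.0, 11.0))      # 1.0 PM on a small project
print(absolute_error(1000.0, 1100.0))  # 100.0 PM on a large project

# 2) MRE ignores direction: over- and underestimating by 200 PM look identical.
print(mre(1000.0, 1200.0), mre(1000.0, 800.0))  # 0.2 0.2
print(signed_relative_error(1000.0, 1200.0))    # 0.2 (overestimate)
print(signed_relative_error(1000.0, 800.0))     # -0.2 (underestimate)
```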

4 Software Cost/Time Estimation Techniques for GSD

Cost/time estimation has been a focus of software engineering research for many decades, and a large number of estimation techniques have been developed [38–40]. Unfortunately, most software cost estimation techniques were developed before the recent trend toward global software development. Many techniques assume that the software is developed locally and therefore do not take into account the additional challenges of developing distributed software [41, 42].

Estimation for distributed software development differs from estimation for local software development in at least two ways. First, there is a large overhead effort caused by factors such as language differences, cultural barriers, and time shifts between sites. Second, many factors (such as the skills and experience of the workforce) are site-specific and cannot be assessed once for the project as a whole. In many projects, the development sites have very different characteristics, so productivity and cost rates differ between sites.
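One way to reflect both differences is to estimate effort site by site and add a distribution overhead to each site's local effort. The Python sketch below illustrates this under explicit assumptions: the overhead is modeled as a simple multiplicative percentage, productivity as size units per person-month, and the `Site` class, site names, and all numbers are hypothetical rather than taken from the chapter's sources.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    size: float          # work allocated to the site (e.g., KSLOC)
    productivity: float  # size units delivered per person-month at this site
    overhead: float      # extra effort fraction for distribution (0.25 = +25 %)
    rate: float          # cost per person-month at this site

    def effort_pm(self) -> float:
        """Local effort inflated by the site's distribution overhead."""
        return self.size / self.productivity * (1.0 + self.overhead)

    def cost(self) -> float:
        return self.effort_pm() * self.rate


def project_totals(sites):
    """Total effort (PM) and cost summed over all sites."""
    return sum(s.effort_pm() for s in sites), sum(s.cost() for s in sites)


# Hypothetical two-site project.
sites = [
    Site("onshore", size=20.0, productivity=0.5, overhead=0.25, rate=10_000.0),
    Site("offshore", size=30.0, productivity=0.25, overhead=0.5, rate=3_000.0),
]
effort, cost = project_totals(sites)
print(effort, cost)  # 230.0 1040000.0
```

In this example the offshore site accounts for most of the effort but only about half of the cost; a single project-wide productivity or cost rate would hide exactly this difference.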

In recent research, the techniques used to estimate project effort and task duration in a distributed context [43] include expert judgment, estimation by analogy, and algorithmic models (e.g., COCOMO II, SLIM, and, more recently, function point analysis-based models) [41].

4.1 Expert Judgment

In expert judgment, assessors estimate the effort required to develop a software product from their own expertise and logical reasoning. The accuracy of this method depends mainly on the assessors’ skills, knowledge, and experience. Expert judgment can be very accurate, but it fails to provide an objective, quantitative analysis of the factors that affect effort and duration in the GSD context, and it is hard to separate real experience from the expert’s subjective view [44]. The accuracy of the estimates depends on how closely the project correlates with past experience and on the expert’s ability to recall all the facets of historical projects.

4.2 Estimation by Analogy

Estimating by analogy means comparing the proposed project with similar, previously completed projects whose development information is known. Actual data from the completed projects are extrapolated to estimate the proposed project. The technique is relatively straightforward; in some respects, it is a systematic form of expert judgment, since experts often search for analogous situations to inform their opinion. Estimation by analogy involves characterizing the proposed project, selecting the most similar completed projects from those whose characteristics are stored in the historical database, and deriving the estimate for the proposed project from these most similar projects [41, 45].
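The three steps can be sketched in Python as a nearest-neighbor search, a common way of automating analogy-based estimation. The choices below are assumptions for illustration only: projects are characterized by two hypothetical features (size in KSLOC and number of sites), features are min-max normalized, similarity is Euclidean distance, and the estimate is the mean effort of the k most similar projects.

```python
import math


def make_scaler(projects, features):
    """Return a function mapping a project to min-max normalized feature values."""
    lo = {f: min(p[f] for p in projects) for f in features}
    hi = {f: max(p[f] for p in projects) for f in features}

    def scale(project):
        return [
            (project[f] - lo[f]) / (hi[f] - lo[f]) if hi[f] > lo[f] else 0.0
            for f in features
        ]

    return scale


def estimate_by_analogy(target, history, features, k=2):
    """Mean effort of the k completed projects closest to the target."""
    scale = make_scaler(history + [target], features)  # step 1: characterize
    t = scale(target)
    nearest = sorted(history, key=lambda p: math.dist(t, scale(p)))[:k]  # step 2: select
    return sum(p["effort"] for p in nearest) / len(nearest)  # step 3: derive


history = [  # hypothetical completed projects (effort in PM)
    {"size_kloc": 10, "sites": 2, "effort": 30},
    {"size_kloc": 12, "sites": 2, "effort": 36},
    {"size_kloc": 50, "sites": 4, "effort": 200},
]
print(estimate_by_analogy({"size_kloc": 11, "sites": 2}, history, ["size_kloc", "sites"]))  # 33.0
```

The choice of features and of k matters more than the code itself; in a GSD setting, distribution-related characteristics such as the number of sites belong in the feature set.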

4.3 Algorithmic Models

Algorithmic methods provide mathematical equations for software estimation. These equations are based on research and historical data and use inputs such as source lines of code, number of functions to perform, and other cost/time drivers such as project effort, design methodology, task allocation, team size, etc. Algorithmic methods have been widely studied and offer several advantages, such as producing repeatable estimates, allowing formulas to be refined and customized, supporting a family of estimates or a sensitivity analysis, and calibrating against previous experience. COCOMO II (Constructive Cost Model) and SLIM are the algorithmic methods most frequently used in a GSD context [43]. We present both models in the following subsections.

4.3.1 Constructive Cost Model

One of the most popular and extensively used algorithmic models for estimating the cost and schedule of software under development is the Constructive Cost Model (COCOMO) [46–48], originally proposed by Barry Boehm. The parameters and equations used in this model are derived from previous software projects. Code size is usually given in KLOC (thousands of lines of code), and the resulting effort is expressed in person-months (PM). A person-month is the amount of work one person performs in one calendar month (COCOMO II uses a nominal 152 working hours per person-month). COCOMO II deals with a variety of factors that influence the effort estimation of distributed software projects. It has three submodels: the Application Composition Model, the Post-architecture Model, and the Early Design Model. COCOMO II uses factors related to organizational and team characteristics to help the estimation team make better approximations. Each factor is rated on a scale from very low to extra high, and the weights of the scale factors may vary between organizations and projects. COCOMO II estimates the required effort and duration with the following equations:

$$ PM= A\times {\mathrm{Size}}^E\times {\prod}_{i=1}^n{EM}_i $$
(2.5)

where:

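  • n represents the number of effort multipliers (cost drivers) considered in a GSD context.

  • A = 2.94 for COCOMO II, and Size is measured in thousands of source lines of code (KSLOC).

  • \( E= B+0.01\times \sum_{j=1}^5{SF}_j \), where \( {SF}_j \) are the ratings of the five COCOMO II scale factors (PREC, FLEX, RESL, TEAM, and PMAT).

  • \( {EM}_i \) is the i-th effort multiplier, and B = 0.91 for COCOMO II.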

$$ \mathrm{Duration}= C\times {PM}^{D+0.2\times \left( E- B\right)} $$
(2.6)

where C = 3.67, D = 0.28, B = 0.91, and Duration is expressed in calendar months.
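Eqs. 2.5 and 2.6 can be evaluated with a few lines of Python. This is a minimal sketch using the constants given above; the function names are illustrative, the size, the five scale-factor ratings, and the effort-multiplier values in the example are hypothetical inputs rather than calibrated COCOMO II ratings, and the schedule is simply computed from the resulting PM.

```python
import math

A, B = 2.94, 0.91  # effort constants (Eq. 2.5)
C, D = 3.67, 0.28  # schedule constants (Eq. 2.6)


def scale_exponent(scale_factors):
    """E = B + 0.01 * sum of the scale-factor ratings."""
    return B + 0.01 * sum(scale_factors)


def effort_pm(size_ksloc, scale_factors, effort_multipliers):
    """Eq. 2.5: PM = A * Size^E * product of the effort multipliers."""
    return A * size_ksloc ** scale_exponent(scale_factors) * math.prod(effort_multipliers)


def duration_months(pm, scale_factors):
    """Eq. 2.6: Duration = C * PM^(D + 0.2 * (E - B))."""
    return C * pm ** (D + 0.2 * (scale_exponent(scale_factors) - B))


# Hypothetical inputs: 50 KSLOC, five scale-factor ratings, and two effort
# multipliers above 1.0 standing in for unfavorable GSD-related drivers.
size = 50.0
scale_factors = [3.0, 3.0, 4.0, 3.0, 4.0]
multipliers = [1.10, 1.05]

pm = effort_pm(size, scale_factors, multipliers)
print(round(pm, 1), "PM over", round(duration_months(pm, scale_factors), 1), "months")
```

Because the effort multipliers enter Eq. 2.5 as a product, every unfavorable GSD-related driver scales the whole estimate, which is why calibrating these ratings matters so much for distributed projects.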

4.3.2 SLIM

SLIM [49] is an algorithmic method used to estimate effort and schedule for software projects. It is based on Putnam’s software equation, which relates the overall size of a project, measured in estimated SLOC, to its effort and development time. It is represented by two equations: Eq. 2.7 computes the productivity parameter (PP) from historical project data, and Eq. 2.8 uses PP to calculate the effort, expressed in man-years.

$$ PP=\frac{{\mathrm{Size}}_{SLOC}}{{\left({E}_{\mathrm{Man},\mathrm{Year}}/B\right)}^{1/3}\times {\left({\mathrm{Duration}}_{\mathrm{Years}}\right)}^{4/3}} $$
(2.7)
$$ {E}_{\mathrm{Man},\mathrm{Year}}=B\times {\left[\frac{{\mathrm{Size}}_{SLOC}}{PP\times {\left({\mathrm{Duration}}_{\mathrm{Years}}\right)}^{4/3}}\right]}^3 $$
(2.8)

where:

  • \( {E}_{\mathrm{Man},\mathrm{Year}} \) represents the amount of effort required to accomplish a given task, in man-years.

  • \( {\mathrm{Duration}}_{\mathrm{Years}} \) (Y) is the development time in years.

  • B is a special skills factor whose value depends on the size of the system.
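A minimal Python sketch of the two SLIM steps is given below. It assumes Putnam’s software equation in the form \( \mathrm{Size}=PP\times {\left(E/B\right)}^{1/3}\times {\mathrm{Duration}}^{4/3} \), with effort in man-years and duration in years; PP is first calibrated from a completed project and then reused for a new one. The function names, the project figures, and the value B = 0.39 are hypothetical placeholders, not calibrated values.

```python
def productivity_parameter(size_sloc, effort_my, duration_years, b):
    """Eq. 2.7: PP = Size / ((E/B)^(1/3) * Duration^(4/3))."""
    return size_sloc / ((effort_my / b) ** (1 / 3) * duration_years ** (4 / 3))


def slim_effort(size_sloc, pp, duration_years, b):
    """Eq. 2.8: E = B * (Size / (PP * Duration^(4/3)))^3, in man-years."""
    return b * (size_sloc / (pp * duration_years ** (4 / 3))) ** 3


B_SKILLS = 0.39  # hypothetical special skills factor

# Step 1: calibrate PP on a (hypothetical) completed project.
pp = productivity_parameter(size_sloc=60_000, effort_my=25.0, duration_years=1.5, b=B_SKILLS)

# Step 2: estimate a new 80 000-SLOC project planned to last 2 years.
print(round(slim_effort(80_000, pp, 2.0, B_SKILLS), 1), "man-years")
```

Because effort varies with the inverse fourth power of duration in this equation, even a slight schedule compression raises the estimated effort sharply.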

Muhairat et al. [43] investigated the effects of different factors on the accuracy of effort estimation methods in GSD environments, specifically the COCOMO II and SLIM methods. They found that these methods were less accurate in predicting the actual completion time of some software development projects, mainly because of the project environment. They concluded that developing software in a GSD environment always requires more effort and time to complete.

5 Cost and Time Drivers

As discussed above, distributed software development introduces new challenges in software engineering. For better planning of multisite projects, it is important to identify the main drivers that can increase project effort. This section describes these main effort drivers and their impact on a distributed project.

Analyzing the main research in the literature, together with feedback from project managers on the impact on project duration and effort, enables us to suggest effort drivers for distributed software development projects. We present the effort drivers extracted from theoretical research and interview analyses. The effort drivers are split into four categories, depicted in Table 2.3 [41]: product, platform, personnel, and project factors. Accordingly, the effort drivers tend to be measures of system size and complexity, personnel capabilities and experience, hardware constraints, and availability of software development tools.

Table 2.3 Software drivers

5.1 Product Factors

The product factors are determined by the novelty of the software to be developed. The factors in this category indicate the degree of innovation, which is directly proportional to the level of spontaneous communication, the need for specific domain knowledge, and the frequency of unforeseen changes. Another important factor is the work assignments, which have to be carefully crafted, taking into account the organizational structure and the functional coupling among software units [50]. Therefore, the architecture has a major influence on the effort needed to coordinate the development phase. Indicators of architectural adequacy include modularity, interface match and dependencies, and communicability of the architecture. Examples of COCOMO II product cost drivers are:

  • Required Software Reliability (RELY): This is the measure of the extent to which the software must perform its intended function over a period of time.

  • Data Base Size (DATA): This measure captures the effect that large data requirements have on product development.

  • Required Reusability (RUSE): This cost driver accounts for the additional effort needed to construct components intended for reuse on the current or future projects.

  • Documentation Match to Life-Cycle Needs (DOCU): This is a measure of the suitability of the project’s documentation to its life-cycle needs.

5.2 Platform Factors

The platform factor refers to the target-machine complex of hardware and infrastructure software. Platform products have more demanding task characteristics than derivative products: platform projects involve more new technology and have higher project complexity. Examples of COCOMO II platform cost drivers are:

  • Execution Time Constraint (TIME): This is a measure of the execution time constraint imposed upon a software system.

  • Main Storage Constraint (STOR): This is a rating that represents the degree of main storage constraint imposed on a software system or subsystem.

  • Platform Volatility (PVOL): This is a measure of the volatility of the platform, that is, of the complex of hardware and software the product calls on to perform its tasks.

5.3 Personnel Factors

The personnel factors include cultural fit, mainly related to the closeness of team members’ mental models [51], which is influenced by the combination of countries involved, the international experience of the teams, etc. They also include skill level, measured by educational level and language skills, which indicates the formal abilities of the remote team(s) [50]; shared understanding, embodied by the tacit knowledge required, which indicates the completeness of documentation and specifications and the common knowledge about goals; and, finally, information sharing constraints, which represent competitive restrictions on information distribution, e.g., when working with external subcontractors or in security-sensitive environments. Examples of COCOMO II personnel cost drivers are:

  • Programmer Capability (PCAP): This rating evaluates the capability of the programmers as a team rather than as individuals; current trends continue to emphasize the importance of highly capable analysts, but programmer capability is also becoming increasingly important.

  • Applications Experience (AEXP): This rating depends on the level of application experience of the project team developing the software system.

  • Language and Tool Experience (LTEX): This is a measure of the level of programming language and software tool experience of the project team developing the software system.

5.4 Project Factors

Regarding project factors, we might consider the novelty of the collaboration model, by analyzing the initial cost of searching for offshore partners and negotiating contracts; the tools and infrastructure, which represent the homogeneity of the tool chains used at all sites and the potential ramp-up costs of setting up the infrastructure at remote sites; and, finally, the physical distance, which determines the potential overlap of working time and, accordingly, the intensity of use of asynchronous communication media and collaboration tools [52]. Examples of COCOMO II project cost drivers are:

  • Multisite Development (SITE): This rating is determined by assessing and averaging two factors: site collocation and communication support.

  • Required Development Schedule (SCED): This rating measures the schedule constraint imposed on the project team developing the software.
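The text does not specify how the two SITE factors are combined numerically. Purely as an illustration, the Python sketch below treats the six rating levels (very low to extra high) as an ordinal scale, averages the positions of the two factor ratings, and rounds halves up; the rounding rule, the level abbreviations, and the function name `site_rating` are assumptions, not part of the COCOMO II definition.

```python
import math

LEVELS = ["VL", "L", "N", "H", "VH", "XH"]  # very low ... extra high


def site_rating(collocation: str, communication: str) -> str:
    """Average the ordinal positions of the two factor ratings (halves round up)."""
    position = (LEVELS.index(collocation) + LEVELS.index(communication)) / 2
    return LEVELS[math.floor(position + 0.5)]


# Hypothetical project: teams spread internationally (very low collocation)
# but supported by good communication tools (high communication support).
print(site_rating("VL", "H"))  # N
```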

6 Risk Analysis

To analyze the impact of the risks involved in software development, the project manager has to identify the risk drivers. Software risk components can be classified as “cost” and “time” risks. Cost risk is the degree of uncertainty that the project budget will be maintained; time risk is the degree of uncertainty that the project schedule will be maintained and the product delivered on time.

Software risks are managerial issues that should be handled through proper project management, especially when estimating costs and times. Only an experienced manager associated with the software project office can handle these issues, whereas a less experienced manager may lose control of the risks, ultimately causing the project to fail. Software risks should be monitored and controlled from the early phases of the project management life cycle [53].

GSD is becoming very difficult, complex, and challenging from a software project management perspective, as user problems become increasingly challenging [19, 54]. Accordingly, risk management in distributed software development is also much more complex than in local software development. In particular, it involves specific concerns that may not be obvious until their impact has been realized. Many projects failed because they did not realize soon enough the importance of certain common factors in GSD projects [41]. Table 2.4 presents the potential risks in a GSD project and their cost and time impact.

Table 2.4 Risk items and their impacts

To systematically identify risks and evaluate appropriate risk mitigation for estimating cost and time in the GSD context, we analyze the features of GSD and then describe how risks affect them [55].

Efficiency

Software and IT companies need to deliver promptly and reliably while the competition is literally a mouse click away. Hardly any other business has entry barriers as low as IT, which stimulates an endless fight for efficiency along the dimensions of cost, quality, and time to profit. GSD clearly helps improve efficiency through labor cost differences across the world, better quality from the many well-trained and process-minded engineers, especially in Asia, and shorter time to profit by following the sun and developing and maintaining software in two to three shifts in different time zones. Risks directly related to the efficiency target are project delivery failures, requirements quality, and design quality (Table 2.4).

Flexibility

Software organizations are driven by fast-changing demands on skills and sheer numbers of engineers. Developing a new, innovative product requires many people with broad experience. Once the product enters maintenance, however, the skill needs look different, and the manpower distribution also changes. Such flexible demand can no longer be handled inside the enterprise. GSD is the answer for providing skilled engineers just in time and thus allows building flexible ecosystems that combine suppliers and customers with engineering and service providers. Risks directly related to the flexibility goal are poor management visibility and distance and culture clashes (see Table 2.4).

7 Conclusion

There is a strong trend toward moving software development to countries with lower labor costs. This chapter analyzes project drivers to provide insight into how the development cost and time of distributed software development projects compare with those of collocated projects.

Even though most of the evaluated software effort estimation techniques do not include any GSD-related cost/time factors by default, they are still suitable and applicable for estimating GSD projects with some setup and calibration work. Estimation methods such as estimation by analogy and algorithmic models can be applied to distributed software development if the person setting up the estimation model is experienced in outsourcing; that person will then be able to include all the necessary cost/time factors in the estimation model.

Also, all expertise-based techniques can be applied directly to GSD projects, but they require experts with experience and knowledge of GSD. Estimation techniques developed specifically for distributed software development can, naturally, also be applied directly to GSD projects.

Future work includes, on the one hand, verifying and improving the factors of a distributed development project and, on the other, applying the methods to projects while collecting effort data to calibrate the relevance of each project driver.