1 Introduction

Agile development methods have become highly popular in software organizations since the early 2000s. The principles and practices of agile development were originally designed for small and co-located teams [5]. To leverage the potential benefits in larger enterprises as well, the agile practices have to be scaled [8, 33]. Large-scale agile transformations have been a burning topic [8, 26], with increased concerns for additional coordination mechanisms and integration of non-development units, such as finance and human resource management [8]. To support scaling, new frameworks such as the Scaled Agile Framework (SAFe) [25], Large Scale Scrum (LeSS) [24], and Disciplined Agile Delivery (DAD) [3] have been proposed by agile consultants. International workshops on large-scale agile organized at XP2016 [27] and XP2017 [26] have highlighted the importance and need for research into the adoption of scaling frameworks [26].

According to the \(12^\mathrm{th}\) State of Agile Survey, which VersionOne conducts yearly, SAFe continues to be the most popular scaling framework in large enterprises [29]. Moreover, a recent survey on software development approaches indicated the predominance of SAFe over LeSS and DAD [23]. As the number of organizations adopting scaling frameworks is increasing [29], researchers and software practitioners have the opportunity to accumulate knowledge on the usage of these frameworks through case studies, technical reports and experience reports. To our knowledge, no secondary studies on the benefits and challenges of the scaling frameworks or their adoption process have been published. This is striking, given their importance in the industry. In this paper, we start filling this gap by summarizing the benefits and challenges of adopting the SAFe framework in the form of a multivocal literature review.

2 Related Work

2.1 Scaled Agile Framework (SAFe)

The Scaled Agile Framework was designed by Dean Leffingwell to scale agile to large enterprises [1, 36]. It incorporates practices from Scrum, Extreme Programming, Kanban and Lean. It offers four levels: the Team, Program, Portfolio and Value Stream levels. The team level comprises the agile teams. Agile Release Trains (ARTs) are introduced to scale a large number of teams and individuals at the program level. ARTs follow HIP (Hardening, Innovation, Planning) iterations to develop Potentially Shippable Increments (PSIs) or Program Increments (PIs). PIs are planned during the release planning days. SAFe introduces additional roles such as the Agile Release Train Engineer, system teams, release management team and portfolio management team. The core values of SAFe are: build in quality, transparency, alignment and program execution [16]. Figure 1 gives an overview of the SAFe framework.

Fig. 1. Scaled Agile Framework, version 4.5 [19]

2.2 Secondary Studies on Large-Scale Agile

Secondary studies on large-scale agile have explored topics such as challenges and success factors of large-scale agile transformations [8], organizational, managerial and cultural aspects [32], scalability and adoptability [22], inter-team coordination [14], architectural roles [35] and quality requirements practices [2]. However, we found no systematic literature reviews on scaling frameworks. The only review we found on scaling frameworks compares a few of them based on team size, practices and organization type [1]. Neither that review nor other previous reviews on large-scale agile have included the grey literature, e.g., case studies or experience reports published on the homepages of the frameworks. While there are inherent problems with case studies published by the proponents of a particular framework, completely eliminating such studies from literature reviews unnecessarily excludes the voice of the practitioners on the usage of scaling frameworks and implementation of agile at scale. Thus, particularly given the lack of scientific literature, we consider it important to study such cases while fully acknowledging the related problems, which are discussed further below.

3 Research Method

We conducted a multivocal literature review on the adoption of the SAFe framework to answer the following research questions:

  • RQ1: What are the reported benefits of adopting SAFe?

  • RQ2: What are the reported challenges of adopting SAFe?

3.1 Multivocal Literature Review

Systematic literature reviews and systematic mapping studies have been popular in the field of software engineering. They help to summarize the existing studies reported in a specific research domain [11]. According to the widely adopted systematic literature review guidelines [21], a “fully systematic literature review” should include both the grey and the peer reviewed literature. Grey literature is defined as, “(the literature), produced on all levels of government, academics, business and industry in print and electronic formats, but which is not controlled by commercial publishers, i.e., where publishing is not the primary activity of the producing body” [11].

However, most SLRs published in the software engineering literature have not included the grey literature [12]. Including the grey literature is challenging, and the search strategy for grey literature has not been systematically addressed in the SLR guidelines [21]. This is unfortunate, as excluding this literature eliminates the voice and opinions of the practitioners who do not publish in academic forums [4, 12]. It is evident that most software practitioners do not publish in academic forums [13].

The situation seems to be slowly changing for the better. Recent literature reviews including both peer reviewed literature, as well as grey literature from blogs, websites, and white papers, have popularized the term “multi-vocal literature reviews”, or MLRs [12]. Several such reviews have been conducted in software engineering to bridge the gap between the voice of practitioners and academics [4, 12, 28, 34].

The inclusion of grey literature can also be considered a threat, as the information reported is based on the opinions and experiences of the practitioners rather than on systematic data collection procedures and analysis [10]. Thus, there are severe issues with, e.g., author and publication bias that need to be accounted for when analyzing such literature. However, several SLRs published in the SE literature have already included peer-reviewed experience reports that are written by consultants and practitioners and that rely on experiences rather than systematic data collection and analysis (e.g., [8, 9, 31]).

Table 1. Search strings

3.2 Study Selection

Databases: For identifying the peer-reviewed literature we searched scientific databases by formulating keywords. The number of matches and the search strings are given in Table 1.

The main source for the grey literature was the official SAFe website [19]. We included the case studies, including backlinks, published there. We used this source, as it currently is the most notable source of SAFe studies available. The case studies are based on a defined review and data collection procedure. Organizations initially answer a questionnaire [17] and thereafter, the “scaled agile team” reviews the answers and supplements provided by the organizations. The team contacts organizations for interviews with key members responsible during the SAFe implementation to gather background information [18]. Drafts are written with the help of case study specialists. These are reviewed and approved by the organizations before being published on the website. The aim is to publish reports of mature SAFe organizations, i.e., the reports should reflect the situation no earlier than after 18–24 months into the SAFe implementation [18].

The main benefits of this material are the standard format and questionnaire used, which give some opportunity for cross-case analysis. However, the review process is likely built not only to guarantee the quality of the published case studies, but also to ensure that the SAFe framework is put in a good light. Therefore, the publication bias is extremely strong, and it can be questioned whether case studies providing negative results would make it through.

Inclusion Criteria: We used the following inclusion criteria:

  1. Only articles related to the Scaled Agile Framework.

  2. Only primary evidence: experience reports, case studies, action research.

  3. Publication type: Conference papers, journal papers, workshop papers, white papers from the Scaled Agile Framework’s homepage.

Search Procedure: The keyword search of the scientific databases yielded 63 matches from four databases. After removing the duplicates, we had a total of 41 papers. Two authors filtered these based on the titles and abstracts, resulting in eight includes and 33 excludes. After full-text filtering, six scientific papers were selected for the analysis. Finally, we selected five papers, eliminating one because the same case was published both as a conference and a journal paper: we included only the journal paper. We searched backward by snowballing through the references of the five selected papers and forward by snowballing through their citations. In the forward search we found one primary study meeting the inclusion criteria. Thus, in total, we included six primary studies from peer-reviewed sources.
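The selection funnel described above can be checked with a few lines of arithmetic. The following is only an illustrative sketch: the counts are taken from the Search Procedure paragraph, while the variable names are our own and do not come from the study.

```python
# Counts as reported in the Search Procedure paragraph.
# Variable names are illustrative assumptions, not part of the study.
matches = 63                 # keyword matches from four databases
after_dedup = 41             # papers left after removing duplicates
includes, excludes = 8, 33   # title/abstract filtering by two authors
after_full_text = 6          # papers left after full-text filtering
same_case_removed = 1        # conference paper dropped in favor of the journal version
from_snowballing = 1         # study found in the forward search

duplicates_removed = matches - after_dedup
selected_from_search = after_full_text - same_case_removed
peer_reviewed_total = selected_from_search + from_snowballing

# The title/abstract decisions must account for every deduplicated paper.
assert includes + excludes == after_dedup

print(f"duplicates removed: {duplicates_removed}")
print(f"selected from database search: {selected_from_search}")
print(f"peer-reviewed primary studies: {peer_reviewed_total}")
```

Writing the funnel down this way makes it easy to see that each stage's output is the next stage's input, and that the final count combines the database search with snowballing.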

Grey Literature: For identifying the grey literature, we manually searched the SAFe homepage [19] and identified 48 white paper reports. In addition, we used backlinks and gathered additional supplements supporting the case studies, such as downloadable presentations and external links published within each white paper report (e.g., John Deere [G11, G12, G13, G14, G80]). We gathered 46 case study reports published on the SAFe homepage, seven downloadable reports, sixteen presentations and thirteen links (8 internal and 5 external). In total, we included 82 reports from the grey literature.

Search Results: In total, we selected 88 documents: 82 gathered from the grey literature and six from the scientific databases. When the same organization was described in multiple documents, we treated it as one case only if the documents described an adoption in the same organizational unit. If the adoptions were separate, e.g., coming from different organizational units of the same company (e.g., AVL GmbH: D6 and D7), they were treated as different cases coming from the same organization. When the same paper described multiple cases (e.g., adoption in different units at different time frames), they were separated as different cases (e.g., Comptel [P3]: D3 and D4). Altogether, we got 54 unique SAFe adoption cases from 52 organizations (see Table 2).
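The document tally and the case-grouping rule above can be expressed as a short sketch. The counts come from the Grey Literature and Search Results paragraphs; the `Document` record, the `group_into_cases` helper and the sample organizations are hypothetical and serve only to illustrate the rule that documents about the same organizational unit form one case, while separate units (even within one paper) form separate cases.

```python
from collections import defaultdict
from typing import NamedTuple

# Reported counts: case study reports, downloadable reports, presentations, links.
grey_documents = 46 + 7 + 16 + 13
peer_reviewed_documents = 6
total_documents = grey_documents + peer_reviewed_documents


class Document(NamedTuple):
    """Hypothetical record: one entry per adoption that a document describes."""
    doc_id: str
    organization: str
    unit: str  # organizational unit in which SAFe was adopted


def group_into_cases(documents):
    """Group documents into cases keyed by (organization, unit)."""
    cases = defaultdict(list)
    for doc in documents:
        cases[(doc.organization, doc.unit)].append(doc.doc_id)
    return dict(cases)


# Made-up sample data, not taken from the reviewed material.
sample = [
    Document("G1", "OrgA", "unit-1"),
    Document("G2", "OrgA", "unit-1"),  # same unit: same case as G1
    Document("G3", "OrgA", "unit-2"),  # different unit: separate case
    Document("P1", "OrgB", "unit-1"),
    Document("P1", "OrgB", "unit-2"),  # one paper describing two cases
]

cases = group_into_cases(sample)
organizations = {organization for organization, _ in cases}

print(f"documents: {total_documents} (grey: {grey_documents})")
print(f"sample cases: {len(cases)} from {len(organizations)} organizations")
```

Keying on the pair of organization and unit is what allows the number of cases to exceed the number of organizations, as in the 54 cases from 52 organizations reported above.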

3.3 Analysis

The qualitative data from both peer-reviewed and non-peer reviewed sources was imported into the coding tool NVivo 11 [20]. We followed the coding guidelines presented in [6]. The analysis started with open coding. The open codes were constantly compared to each other based on the similarities and differences observed between them. They were grouped into higher code categories, called axial codes. The first author performed the coding, and the three authors thoroughly discussed both the axial and the open codes throughout the coding process. We identified 23 codes for the benefits and 15 codes for the challenges of SAFe adoption. We clustered the benefits according to the core values of the SAFe framework: alignment, build in quality, transparency, as they are the elementary beliefs that are claimed to be of primary importance for the effective SAFe implementation [16]. We clustered the challenges into organizational and cultural, roles, practices, as well as scaled and distributed. For each benefit and challenge, we mention the number of cases to express its predominance across organizations. However, we did not perform any other quantitative analysis, such as ranking the benefits and challenges from most to least important; although such a ranking would be very interesting, it was not possible with this qualitative data.
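The per-code case counts mentioned above amount to counting distinct cases per code. The sketch below is a minimal illustration under our own assumptions: the coding itself was done in NVivo, the function names and sample data are hypothetical, and the reporting threshold mirrors the one stated in the Limitations section (benefit codes with three cases or fewer are not reported).

```python
from collections import defaultdict


def cases_per_code(coded_segments):
    """Count the distinct cases in which each code occurs.

    coded_segments: iterable of (case_id, code) pairs, one per coded quote.
    """
    cases = defaultdict(set)
    for case_id, code in coded_segments:
        cases[code].add(case_id)  # a set ignores repeated quotes from one case
    return {code: len(case_ids) for code, case_ids in cases.items()}


def reportable(counts, min_cases):
    """Keep only the codes mentioned by at least min_cases cases."""
    return {code: n for code, n in counts.items() if n >= min_cases}


# Made-up coded quotes, not taken from the reviewed material.
segments = [
    ("D1", "transparency"), ("D1", "transparency"), ("D2", "transparency"),
    ("D3", "transparency"), ("D4", "transparency"),
    ("D1", "alignment"), ("D2", "alignment"),
]

counts = cases_per_code(segments)
reported_benefits = reportable(counts, min_cases=4)  # "3 cases or less" are dropped

print(counts)
print(reported_benefits)
```

Counting cases rather than quotes keeps a single verbose report from inflating a code's apparent predominance.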

4 Results and Discussion

We identified only six peer-reviewed primary studies on SAFe. The focus areas of these studies were: assessing the maturity of SAFe adoption [P6], SAFe self-assessment [P5], the SAFe framework in testing [P2], a real-world example on key elements of SAFe [P1], the adoption of SAFe in a globally distributed organization [P3] and one partially focused on the challenges [P4]. Only three studies focused on the adoption and usage of SAFe [P4, P3, P5]. We identified 47 unique cases (82 documents) from the SAFe homepage. These reports focused on the adoption reasons, transformation steps, and benefits of SAFe. Neither the peer-reviewed nor the grey literature had an explicit focus on the adoption challenges. The grey literature provided deeper insights on the SAFe adoption and usage compared to the peer-reviewed literature.

A total of 54 unique cases from 52 organizations (Table 2) were identified. Out of 54 cases, seven were identified from the peer-reviewed literature and 47 cases were identified from the grey literature. Organizations from various domains have adopted SAFe, such as financial (12 cases), software (9 cases), manufacturing (6 cases), and telecommunications (6 cases). The most prominent domain was financial services. Moreover, SAFe has been popular in globally distributed organizations.

Table 2. Domain of the case organizations

4.1 Benefits of Adopting SAFe

The reported benefits achieved by adopting SAFe are summarized in Table 3. The most common benefits identified are: transparency (22 cases), alignment (19 cases), quality (17 cases), time to market (17 cases), predictability (16 cases) and productivity (15 cases). The benefits marked by a star (*) in Table 3, are common to both the peer-reviewed and the non-peer reviewed studies.

The core values of SAFe are: build in quality, alignment, program execution and transparency [16]. A large proportion of the cases mentioned they had gained these benefits by adopting SAFe. We compared our findings to the \(12^\mathrm{th}\) State of Agile Survey, in which 29% of respondents had adopted SAFe [29]. This survey reported benefits of agile in general that are similar to those our study found regarding SAFe: visibility (66%), productivity (61%), alignment (65%), morale (61%), predictability (49%), quality (47%), and time to market (62%).

According to our results, practitioners seem to think that SAFe can help to bring several business benefits, such as improved time to market, and faster and more frequent deliveries. Surprisingly, none of the business benefits were reported in the peer-reviewed studies, but the majority of non-peer reviewed studies have attributed their business success to the SAFe framework. This difference could be due to the Scaled Agile team insisting on business benefits: “most importantly, we look for specific business results, which may include time-to-market, productivity, quality, and employee engagement” [18]. Moreover, non-peer reviewed studies are more inclined towards presenting the benefits of the SAFe framework than peer-reviewed studies, e.g., only 8 (marked by *) out of 24 benefits were reported by peer-reviewed studies.

According to our results, practitioners clearly think that SAFe has brought benefits. However, it is also important to look into how the organizations measured these benefits. Unfortunately, not much information is given on this. Only one study in the peer-reviewed literature focused on SAFe metrics [P5]. Most grey literature cases attributed all the mentioned benefits to the SAFe adoption. However, it would be interesting to learn, e.g., which practices of SAFe brought the benefits. Only a few cases had done that. Moreover, most of the benefits mentioned were similar to the general benefits of implementing agile. In the future, it would be interesting to study which benefits are unique to SAFe practices, such as Agile Release Trains, PI planning meetings and value streams.

4.2 Challenges of Adopting SAFe

The reported challenges of adopting SAFe are summarized in Table 4. The most commonly mentioned challenges are: resistance to change (10 cases), moving away from agile (7 cases), first PI planning (7 cases), controversies with the framework (6 cases), Agile Release Train challenges (6 cases), staffing roles (5 cases), and GSD challenges (4 cases). Change resistance, GSD challenges, integration of the non-development units and test automation challenges, all found in our review, were also mentioned in the systematic literature review on challenges and success factors for large-scale agile transformations [8]. Further, change resistance is supported by the results of the \(12^\mathrm{th}\) State of Agile Survey, where general resistance was reported [29].

11 out of 15 challenges were common both for the peer-reviewed and non-peer reviewed studies (marked by * in Table 4). It is notable that the majority of the peer-reviewed studies reported challenges during the SAFe adoption, while very few non-peer reviewed studies mentioned challenges.

Even though SAFe is a framework for scaling agile to large enterprises, several organizations felt they were moving away from agile. This challenge is supported by the arguments of several “agilists”. For example, Ken Schwaber (co-creator of Scrum) says that “SAFe is based on RUP, rather than Scrum” [15], Ron Jeffries (co-founder of XP) sees issues in the centralized approaches and planning in the framework [15], and Stephen Denning (board of directors of Scrum Alliance) finds that SAFe forces the horizontal ideology of agile into vertical structures, saying that “they run the risk that the firm will emerge back in the unproductive vertical world of hierarchical bureaucracy” [7]. Pancholi and Grover [30] argue that SAFe “murders the spirit of agile development” and claim that SAFe is sold to large organizations that fear change, but would like to increase their productivity and reduce defects. According to them, the framework portrays an “agile fairy illusion” [30].

Both organizations previously using traditional methods, as well as those having agile already in use, have shown resistance towards accepting SAFe. There is also a need to draw attention towards the specific challenges of SAFe, such as the challenges related to PI planning, value streams and agile release trains. Some faced controversies within the framework itself, like overhead, and story point normalization. Unlike the benefits, challenges have been mentioned only by 40% of the cases. Consequently, there is a need for more research into the challenges of the SAFe framework adoption and usage, as well as ways to overcome those challenges.

Table 3. Benefits of adopting SAFe
Table 4. Challenges of adopting SAFe

5 Limitations

This section presents the threats to validity [37] and the steps that have been taken to mitigate those threats.

Selection Bias. This occurs during the selection of primary studies based on the interpretation of inclusion and exclusion criteria. We mitigated this by involving all authors in designing the criteria and by having two researchers independently filter the abstracts and titles of the peer-reviewed articles. Regarding the non-peer reviewed literature, we included all the case studies published on the homepage of the Scaled Agile Framework, which mitigated the threat of selection bias.

Subjective Bias. This threat occurs during the coding of qualitative data. Coding was performed by the first author very meticulously. Coding was iterative, and all authors had several discussions during the coding process regarding the naming of the axial codes and categorization of the open codes into axial codes. The process is traceable.

Restricted Time Span. In the database search, we included primary studies published in the selected databases until November 2017. For the non-peer reviewed literature, we included all case studies published by May 4th, 2018.

Publication Bias. Including grey, non-peer reviewed literature can be seen as a limitation. Non-peer reviewed articles usually present positive results [37]. This was also evident from our study, as the majority of these cases gave attention to the benefits of the framework. In addition, the Scaled Agile team reviewed all case studies reported by the organizations. They may have influenced the organizations to present only the positive elements of the SAFe adoption process. However, out of 82 documents from the grey literature, 26 documents came directly from the organizations and other online websites as additional supplements. These supplements (e.g., presentations) reported the same information as was published in the case studies on the website. This threat of bias was partially mitigated by comparing the benefits with those from the peer-reviewed primary studies identified in this MLR (6 studies) and from the State of Agile survey [29]. The challenges of adopting SAFe were compared to the findings of the SLR on challenges of large-scale agile transformations [8] as well as to the State of Agile survey [29]. However, to establish scientific evidence, there is a strong need for more empirical research on the benefits and challenges of SAFe.

Information Loss. The codes with only a few quotes and cases (3 cases or less for benefits, 1 case for challenges) were not reported. The keyword search could have missed some studies. We mitigated this by going through the references and citations of all 5 selected studies. We found one additional case study [P5] from the citations of already selected papers [P3].

6 Conclusions and Future Work

The number of organizations adopting scaling frameworks has increased tremendously in recent years. A few studies have given insights on agile usage in large organizations. However, the literature on the adoption and usage of scaling frameworks has not been systematically reviewed. Moreover, systematic literature reviews on large-scale agile have not included the grey literature. This means that most published information about the scaling frameworks has been excluded, giving an incomplete picture, as current research literature on them is very limited. Therefore, we also included grey literature in this multivocal literature review.

We analyzed 54 peer and non-peer reviewed cases on the adoption of the Scaled Agile Framework. The most salient benefit categories were: transparency, alignment, productivity, predictability and time to market. The most frequently mentioned challenge categories were: change resistance, challenges with the first program increment planning and moving away from agile. The most important difference between the peer-reviewed and grey literature was the bias in reporting benefits, especially with respect to business benefits received from the usage of the framework. These benefits were mentioned only in the grey literature. This emphasizes the need for validation of the claimed benefits reported in the grey literature. The majority of the challenges were common for both the peer-reviewed and grey literature.

Apart from the challenges related to scaling agile, SAFe has brought in new challenges with respect to practices such as PI planning, value streams, and Agile Release Trains. Empirical research on how the SAFe framework addresses the existing challenges reported in the agile-in-the-large literature could be interesting for practitioners. Moreover, finding solutions for the challenges reported in this MLR would help organizations to address them.

We identified only six peer-reviewed primary studies on SAFe since the introduction of the framework (year 2011). The literature lacks in-depth primary studies on the usage and adoption of SAFe. Some of the non-peer reviewed cases published on the SAFe homepage had deep insights on the rationale behind the SAFe adoption, transformation steps, implementation of practices, as well as the benefits of the adoption. Unfortunately, both peer-reviewed and grey literature lack extensive information on challenges and the negative traits of SAFe in large enterprises, as there likely is an inherent positive bias in the cases published on the SAFe homepage. Hence, it is crucial to conduct more in-depth primary studies on SAFe adoptions to establish scientific evidence on the usage of the SAFe framework at large scale.