
1 Introduction

Agile is extensively used by organizations today [26] as it serves as a powerful and adaptive alternative to the rigid and wasteful software development approaches that were previously used, e.g., waterfall. However, there are some issues with Agile that indicate that it might not be enough by itself—such as lack of user involvement [62] and clear identification of added value [35]. Recent industry cases [22, 66] show that a combined use of Agile, Lean Startup, and User-Centered Design (UCD) can be a way to overcome the aforementioned issues: Lean Startup [57] focuses on adding value to business stakeholders through strategic experimentation, while UCD [43] puts the user at the center of the discussion to foster empathy.

Adopting such a combined approach can lead to organizational challenges of different natures, such as cultural (e.g., trust), structural (e.g., roles), and technical (e.g., techniques) ones. These are aggravated in large enterprises, where new large-scale issues arise (e.g., inter-team coordination) [48], making instruments to guide and assess the transformation essential. Maturity models are one such instrument: they can gauge the transformation in a manner that is not overly expensive or time-consuming [39] and provide guidance towards improving software engineering processes [18]. Maturity models are widely used in several domains as a means to improve something (e.g., processes or products) and typically suggest “levels” of maturity to be achieved, as is the case for the well-known Capability Maturity Model Integration (CMMI) [12].

We aim to show the current state of the art in maturity models for a software development approach composed of Agile, Lean Startup, and UCD pillars through a systematic literature review that follows existing best practices for systematic reviews in the software engineering domain. We report on several aspects of maturity models found in both academic and gray literature. As an extended version of a previous paper [78], our study provides new analysis criteria and novel insight into currently available maturity models.

The remainder of this paper is organized as follows: Sect. 2 discusses the use of a combined approach of Agile, Lean Startup, and UCD in software development; Sect. 3 discusses related work; Sect. 4 explains how the systematic literature review was conducted and outlines the research questions; Sect. 5 presents our findings; Sect. 6 discusses our analysis; and Sect. 7 concludes this study and considers future work.

2 Agile, Lean Startup, and User-Centered Design

The Agile movement dates back to 2001 with the introduction of the Agile Manifesto [5], a result of the then-current wasteful and rigid software development culture and work processes. The extensive use of agile in the past two decades has brought to light some of its weaknesses, such as difficulty in increasing user involvement [62]. A development method composed of Agile, Lean Startup, and UCD is a novel approach that has been argued as a way to overcome such weaknesses [75] and that is drawing the attention of academics [13, 75] and industry practitioners [22, 66].

This combined approach tackles business-level issues with Lean Startup, an entrepreneurship method that focuses on developing a business plan iteratively through the use of a “build-measure-learn” loop, where business hypotheses are evaluated through carefully planned and efficient experiments that gather useful customer feedback, enabling organizations to pivot away from ideas that data suggests to be unfruitful and persevere on the ones most likely to succeed [57]. This idea draws heavily on traditional Lean values by reducing development waste on new products or features that do not earn back enough to be considered successful. Lean Startup is not specifically a software development method, but studies have reported it to be a great driving force for software development [15, 76]. Embracing its continuous experimentation practices, however, requires proper technological capabilities (e.g., continuous deployment) and organizational support (e.g., culture) [37].

To ensure that the software not only meets business demands but also the users’, the combined approach enlists the use of UCD to enable developers to understand the user’s real needs and create improved software with better usability and user satisfaction [59]. UCD consists of a set of procedures, processes, and techniques that focus on setting the user as the center of the design space or development process [43] at varying degrees of intensity, from consultation of their needs to having them actively participate in the design process [1]. Integrating UCD and software development can help developers with the difficult practice of involving customers, and the wider concern of how to integrate human-computer-interaction concerns with software engineering [9, 61]. As it stands, UCD has evolved into an umbrella term for similar approaches, thus encompassing terms like Design Thinking and Human-Centered Design.

One successful example of the combined approach is fashion retailer Nordstrom’s Discovery by Design. Grossman-Kahn and Rosensweig [22] report on the evolution of the Nordstrom Innovation Lab, a Nordstrom initiative to rapidly and cheaply test novel concepts internally. Each iteration of the lab improved upon the shortcomings of the former, turning what started as an isolated agile development team into an acclaimed innovation team with its own development methodology which encapsulates Design Thinking, Lean Startup, and Agile. The team with its “iterative mindset, relentless focus on the needs of the customer, and bias towards rapid experimentation, prototyping and testing” [22] emerged as a powerful and dynamic asset for Nordstrom.

In academia, Moralles et al. [41] conducted an empirical study to compare Extreme Programming (XP), Lean, and UCD concepts identified through literature reviews with what was being used in practice by two software development teams that use a development methodology that encompasses the three methods. Their findings suggest that both teams use a complementary subset of concepts from each pillar, in addition to techniques and roles not found in the literature. Their study motivated us to seek maturity models that propose the combination of the three aforementioned pillars. Maturity models, which can be prescriptive or descriptive, aim to offer guidance on practices that are relevant to master. The Agile Compass [16], backed by an agile maturing framework [17], is an example of a checklist-based agile maturity model which introduces the category of outcomes an agile team should seek as it matures with regards to the use of practices. Such models can be of help to bring awareness to newcomers to the combined use of Agile, Lean Startup, and UCD.

3 Related Work

There have been several studies contemplating the integration of Agile Software Development with UCD [61]. Adding Lean Startup to this “method combo” is rather a novelty given the time frame of the three approaches. Current literature encompasses studies contemplating several aspects of the approach itself (e.g., benefits, challenges, or use of experimentation) [66, 67, 74] and also studies that propose models for using the combined approach with varying degrees of abstraction [13, 22, 75]. The combined approach can be very different from typical agile development, as it requires a degree of developer empowerment that larger organizations might not be used to, making adaptation efforts difficult and the use of maturity models an enticing choice.

Maturity models for agile development have been an interesting research subject ever since the rise of Agile: studies on such maturity models can be traced back to the early 2000s. Leppänen [36] reports that maturity models for Agile have varying levels of maturity, and Nurdiani et al. [44] compare the practice adoption order proposed in existing maturity models with that of industry experts. Ozcan-Top and Demirörs [46] evaluated the strengths and weaknesses of Agile maturity models and frameworks from a process assessment and improvement perspective. Fontana et al. [19] conducted a systematic literature review on Agile maturity models and delineated classification criteria for how maturity can be defined based on the analysis of the identified models. Henriques and Tanner [25] bring to light that maturity models lack research validating them. Pereira and Serrano [52] analyze the main development and evaluation methods for IT maturity models.

4 Systematic Review Protocol

This study was conducted as a systematic literature review based on guidelines for conducting systematic literature reviews in software engineering [33]. Our first effort at mapping maturity models for a combined approach of the three aforementioned pillars found zero results, so we expanded our effort into seven systematic literature reviews (SRs) about maturity models for Agile, Lean Startup, UCD, and their intersections: Agile combined with Lean Startup; Agile combined with UCD; Lean Startup combined with UCD; and Agile, Lean Startup, and UCD combined (each is hereinafter referred to as a search context). The goal of these SRs is to identify and assess primary and secondary studies regarding the use, structure, and evaluation of maturity models for the three pillars. The protocol for the systematic literature review is documented next.

4.1 Research Questions

All SRs address the same research questions, each related to their respective search context.

RQ1. What maturity models are available?

RQ2. How are these maturity models characterized?

RQ3. How do these maturity models envision maturity?

RQ4. How are these maturity models applied?

RQ5. How are these maturity models evaluated?

4.2 Search

As suggested by Kitchenham [33], we used the PICO criteria to guide the formulation of our search string.

Population: Primary and secondary studies related to their respective search context.

Intervention: Maturity models related to their respective SR context.

Comparison: This criterion does not apply to our RQs because the goal of this study is not to compare the identified maturity models.

Outcomes: Understanding of use, structure, and evaluation of identified maturity models.

All SRs followed the same search process. We retrieved studies from electronic databases that met the following source selection criteria:

  • Databases that include journal articles, conference, and workshop papers related to their respective SR context;

  • Databases with an advanced search mechanism that allows filtering of the results by keywords that address the research questions; and

  • Databases that provide access to full papers written in English.

Based on these criteria, we selected the following databases: ACM Digital Library, IEEE Xplore, Science Direct, Scopus, and Springer Database. We adapted the search string (Eq. 1) for each database based on the search functionality it offers. Each search string consisted of two parts, S1 and S2, defined as follows:

  • S1 is a string composed of keywords related to maturity models, namely: maturity model, capability model, self assessment, health check, and team assessment; and

  • S2 is a string composed of keywords related to the search context of each SR. Table 1 presents the keywords used.

Table 1. Keywords used in the search string of each SR [78].

As Lean Startup is the newest of the three pillars, we chose to broaden its search context by including other Lean thinking schools, such as Lean UX.

Equation 1. Search criteria boolean expression.

$$S1 \text{ AND } S2 \tag{1}$$
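As a minimal sketch, the expression in Eq. 1 could be assembled as shown below. The S1 keywords are the ones listed above. The S2 keywords are hypothetical placeholders, since the actual ones are given in Table 1, and joining the keywords of each part with OR is an assumption about how S1 and S2 are composed. The function names are illustrative only.

```python
# Sketch: compose the boolean expression "S1 AND S2" for one search context.

# S1 keywords as listed in the text.
S1_KEYWORDS = [
    "maturity model",
    "capability model",
    "self assessment",
    "health check",
    "team assessment",
]

# HYPOTHETICAL S2 keywords for illustration; the real ones are in Table 1.
EXAMPLE_S2_KEYWORDS = ["agile", "scrum"]


def or_group(keywords):
    """Join keywords into a parenthesized OR group, quoting each phrase.

    Joining with OR is an assumption, not something the text states.
    """
    return "(" + " OR ".join(f'"{kw}"' for kw in keywords) + ")"


def build_search_string(s1_keywords, s2_keywords):
    """Return the expression S1 AND S2 from Eq. 1."""
    return f"{or_group(s1_keywords)} AND {or_group(s2_keywords)}"


query = build_search_string(S1_KEYWORDS, EXAMPLE_S2_KEYWORDS)
print(query)
```

In practice the resulting string would still need to be adapted to the query syntax of each database, as noted above.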

Afterwards, inclusion and exclusion criteria were applied by a varying number of researchers for each SR on the retrieved studies in two distinct rounds, as explained in Sect. 4.3. The first round consisted of title and abstract inspection to triage the candidate studies based on the inclusion and exclusion criteria. The second round consisted of a thorough inspection with full text reading to further filter the studies and to perform the data extraction procedure (Sect. 4.5).

4.3 Study Selection

To determine whether a study should be selected, all SRs applied the following selection criteria.

Inclusion Criteria: (I1) the study presents a maturity model for its SR context; (I2) the study is written in English; (I3) the study is fully written in electronic format; (I4) the study was retrieved from a conference, workshop, or journal.

Exclusion Criteria: (E1) the study does not present a maturity model for its SR context; (E2) the study is an extended abstract or editorial paper; (E3) the study is duplicated.

We only searched for studies published between 2001 and 2020. We chose 2001 as the lower bound as it is the publication date of the Agile Manifesto [5]. Additionally, we performed a manual, informal search on the internet and considered gray literature studies, as these concern very current issues which might have not yet been covered in academic literature [34].
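The selection rules above can be read as a single predicate over each retrieved study. The sketch below is one possible encoding, not the procedure actually used by the researchers: the `Candidate` record and its field names are hypothetical, and detecting duplicates (E3) by normalized title is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record for one retrieved study."""
    title: str
    year: int
    presents_maturity_model: bool   # I1 (its negation is E1)
    in_english: bool                # I2
    full_text_electronic: bool      # I3
    venue_type: str                 # I4: "conference", "workshop", "journal", ...
    is_extended_abstract_or_editorial: bool = False  # E2


ALLOWED_VENUES = {"conference", "workshop", "journal"}


def is_selected(study, seen_titles):
    """Apply I1-I4, E1-E3, and the 2001-2020 window to one study."""
    if not 2001 <= study.year <= 2020:
        return False
    if not study.presents_maturity_model:            # fails I1, i.e., E1
        return False
    if not (study.in_english and study.full_text_electronic):  # I2, I3
        return False
    if study.venue_type not in ALLOWED_VENUES:       # I4
        return False
    if study.is_extended_abstract_or_editorial:      # E2
        return False
    key = study.title.strip().lower()                # assumed duplicate key
    if key in seen_titles:                           # E3
        return False
    seen_titles.add(key)
    return True


def select(studies):
    """Return the studies that pass every criterion, in input order."""
    seen = set()
    return [s for s in studies if is_selected(s, seen)]
```

Gray literature found through the manual search would not satisfy I4 as encoded here, so it would have to be handled outside this predicate.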

4.4 Quality Assessment

We used a set of quality criteria proposed by Guyatt et al. [23]—later used by Dybå and Dingsøyr [14] in software engineering—to assess the methodological quality of the studies selected for review, as they cover thoroughness, trustworthiness, and significance of the studies [28]. The criteria are based on four quality assessment questions:

C1. Is the research objective clearly defined?

C2. Is the research context well addressed?

C3. Are the findings clearly stated?

C4. Based on the findings, how valuable is the research?

We scored the selected studies on each criterion using an ordinal scale instead of a dichotomous scale to obtain a more accurate assessment [28]. Table 2 shows the scoring scale for each criterion. When there was not an agreement on a study’s score, we had meetings to discuss the issue until we agreed upon the same score.

Table 2. Quality criteria scoring scheme [78].
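To illustrate how per-criterion scores can be combined into a single normalized value per study, the sketch below sums the four criterion scores and divides by the maximum attainable total. The scale values used here (0, 0.5, 1) are a hypothetical ordinal scale chosen for illustration; the actual scale is the one defined in Table 2, and the aggregation by simple summation is likewise an assumption.

```python
# HYPOTHETICAL ordinal scale; the actual values are defined in Table 2.
SCALE = (0.0, 0.5, 1.0)
CRITERIA = ("C1", "C2", "C3", "C4")


def normalized_score(scores):
    """Map per-criterion scores to a value in [0, 1].

    `scores` is a dict from criterion name to a value on SCALE.
    """
    missing = [c for c in CRITERIA if c not in scores]
    if missing:
        raise ValueError(f"missing criteria: {missing}")
    for c in CRITERIA:
        if scores[c] not in SCALE:
            raise ValueError(f"{c} has off-scale value {scores[c]}")
    return sum(scores[c] for c in CRITERIA) / (len(CRITERIA) * max(SCALE))


example = {"C1": 1.0, "C2": 0.5, "C3": 1.0, "C4": 0.5}
print(normalized_score(example))  # 0.75
```

Dividing by the maximum total keeps scores comparable even if a different ordinal scale is substituted.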

4.5 Data Extraction and Analysis

We performed a full text reading of each study to identify, categorize, and analyze the following items:

D1. Study identification (RQ1);

D2. Publication source and year (RQ1);

D3. Audience: the expected users of the model (RQ2);

D4. Aim: if the model determines necessary improvements for its use case (analysis) or if it presents best practices for comparison (benchmarking) (RQ2);

D5. Scope: if the model is generic or limited to a specific method (RQ2);

D6. Strategy: whether the model suggests a “big bang” or “gradual” approach to its adoption process (RQ2);

D7. Maturity levels: if the model has specified quantifiable standards (levels) of maturity and has definitions for each (RQ3);

D8. Maturity class: whether the model’s characterization of maturity fits into “practices adoption”, “continuous improvement”, “sustaining approach”, “project performance”, or “highly productive teams” (RQ3);

D9. Administration mechanism: if the model has defined a mechanism to apply the model (RQ4);

D10. Evaluation: if the model was evaluated, such as by having it applied in a real context (RQ5); and

D11. Evaluation type: whether the model’s evaluation process (if any) can be regarded as an “author evaluation”, “domain expert evaluation”, or “practical setting evaluation” (RQ5).

Most of these items were adapted from the guidelines for developing maturity grids by Maier, Moultrie, and Clarkson [39]. Although the guidelines concern maturity grids, we found them adequate to fulfill the needs of our study. We chose guideline elements that facilitate the categorization of maturity models.

New to this study are items D2, D3, D6, D8, and D11. We enhance our existing evaluation analysis (D10) by categorizing existing evaluation methods as defined by Helgesson, Höst, and Weyns [24] and using Salah and Cairns’ [58] nomenclature: “author evaluation”, an evaluation performed by the model’s authors to assess the model’s processes regarding its intended use or to compare it to similar models; “domain expert evaluation”, an evaluation performed by domain experts external to the model’s development; and “practical setting evaluation”, an evaluation that involves applying the model in a practical setting.

We also make use of the maturity definitions outlined by Fontana et al. [19] and analyze each model’s characterization of maturity, which can resolve into the following categories: “practices adoption”, maturity is increased when new practices are adopted; “continuous improvement”, maturity is similar to CMMI-DEV’s, i.e., mature organizations/teams focus on process improvement; “sustaining approach”, maturity implies organizations/teams with lean processes and adherence to agile values; “project performance”, maturity is a way to obtain results; and “highly productive teams”, maturity is related directly to team productivity.

Additionally, we draw on the work of Julian, Noble, and Anslow [30], who argue that having a strategy for adopting a new approach is crucial for its success, and categorize the suggested strategies of existing models as either “big bang”, in which a set of practices is adopted all at once and by the book, for teams to then learn and modify, or “gradual”, in which practices are gradually adopted and adapted alongside in-place non-agile practices over a typically longer transition period.
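The extraction items D1 to D11 and the closed category sets defined above lend themselves to a typed record. The following is only an illustrative data structure, not an artifact of the study: all class and field names are hypothetical, `None` (or an empty set) stands for “unspecified”, and representing strategy as a set is an assumption made so that a model suggesting both strategies can be recorded.

```python
from dataclasses import dataclass, fields
from enum import Enum
from typing import FrozenSet, Optional


class Aim(Enum):
    ANALYSIS = "analysis"
    BENCHMARKING = "benchmarking"


class Strategy(Enum):
    BIG_BANG = "big bang"
    GRADUAL = "gradual"


class MaturityClass(Enum):
    PRACTICES_ADOPTION = "practices adoption"
    CONTINUOUS_IMPROVEMENT = "continuous improvement"
    SUSTAINING_APPROACH = "sustaining approach"
    PROJECT_PERFORMANCE = "project performance"
    HIGHLY_PRODUCTIVE_TEAMS = "highly productive teams"


class EvaluationType(Enum):
    AUTHOR = "author evaluation"
    DOMAIN_EXPERT = "domain expert evaluation"
    PRACTICAL_SETTING = "practical setting evaluation"


@dataclass
class ExtractionRecord:
    study_id: str                                     # D1
    source: str                                       # D2
    year: int                                         # D2
    audience: Optional[str] = None                    # D3
    aim: Optional[Aim] = None                         # D4
    scope: Optional[str] = None                       # D5
    strategies: FrozenSet[Strategy] = frozenset()     # D6
    maturity_levels: Optional[int] = None             # D7 (number of levels)
    maturity_class: Optional[MaturityClass] = None    # D8
    administration_mechanism: Optional[str] = None    # D9
    evaluated: bool = False                           # D10
    evaluation_type: Optional[EvaluationType] = None  # D11

    def __post_init__(self):
        # D11 only makes sense when D10 records that an evaluation happened.
        if self.evaluation_type is not None and not self.evaluated:
            raise ValueError("D11 requires D10 to be true")


def unspecified_fields(record):
    """Names of fields left unspecified (None or an empty set)."""
    return [
        f.name
        for f in fields(record)
        if getattr(record, f.name) is None
        or getattr(record, f.name) == frozenset()
    ]


# Hypothetical record, not taken from the reviewed studies.
rec = ExtractionRecord(
    study_id="S-example",
    source="hypothetical venue",
    year=2015,
    aim=Aim.ANALYSIS,
    strategies=frozenset({Strategy.GRADUAL}),
    maturity_levels=5,
)
print(unspecified_fields(rec))
```

Using enumerations for D4, D6, D8, and D11 makes it harder to record a category outside the sets defined in the text.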

Each researcher received an equal number of studies from which to extract data and to which to apply the study selection criteria again. We made use of the data found in a similar literature review study [19] that focused on Agile maturity models, as our search yielded a superset of the models it identified.

Table 3. Number of identified studies during the distinct rounds of our systematic search for maturity models [78].

5 Results

This section summarizes the results of each SR. Table 3 presents the results of the search process in the electronic databases selected in Sect. 4.2. Next, we analyze the studies in light of our research questions, based on the data extracted using the procedure in Sect. 4.5. Entries marked as “—” in the following tables stand for “unspecified”.

Table 4. Selected maturity model studies and their sources.
Table 5. Selected maturity model studies and their respective quality scores [78].
Fig. 1. Venn diagram of maturity models for Agile, Lean, and UCD (adapted [78]).

Fig. 2. Normalized quality score distribution for maturity model studies.

Fig. 3. Publication frequency of maturity models [78].

5.1 RQ1. What Maturity Models Are Available?

As shown in Table 3, our systematic literature review identified a total of 29 studies establishing maturity models for Agile, Lean Startup, UCD, and their intersections. From our manual search, we selected an additional 4 academic studies [50, 56, 65, 77] and 2 gray literature studies [3, 54], for a total of 35 studies. Table 4 shows the selected maturity model studies and their publication year and venue/source. Our initial objective was to identify maturity models for a combined approach of Agile, UCD, and Lean Startup. This search, however, proved fruitless. There are few maturity models for intersections of the pillars (only 2 for a combined use of Agile and UCD), with a notable absence of models for all three pillars combined. Figure 1 shows the number of maturity models for each category using a Venn diagram. The higher number of maturity models for Agile is expected, as it is the most dominant approach to software engineering worldwide. Of the existing Lean studies, we point out that none concern the use of Lean Startup.

We assessed the quality of the papers as per systematic literature review guidelines [33] (see Table 5). The papers scored approximately 0.78 on average, with at most a 0.03 score difference between identified categories (Agile, UCD, Lean, and Agile with UCD). Studies with a low score (0.45 and below) tended to be short, with a low page count. Figure 2 shows the studies’ scores in a normalized fashion.

Figure 3 shows the publication frequency of the maturity model studies on a stacked bar chart. Agile maturity models see a fairly consistent publication rate throughout the years. Most Lean maturity models and all UCD ones were published in the past ten years, likely due to the rising popularity of Design Thinking and Lean Startup in software engineering.

Table 6. Overview of data extracted from selected maturity model studies (adapted [78]).

5.2 RQ2. How Are These Maturity Models Characterized?

Table 6 shows an overview of the maturity models. While earlier agile maturity models focused on XP [6, 38, 42] due to it being one of the first agile methods to become popular [4] (and also harboring a strong influence from CMMI), the majority of the maturity models are generic, focusing on the general idea and values of the method they adhere to. Although most of them come from a more general need for maturity models, some originate from very specific demands, such as Lui and Chan’s model [38], which was specifically developed to help Chinese companies dealing with commercial-off-the-shelf software; or Peres’ [53], which focuses on integrating Agile and UCD on CMMI level two compliant organizations.

We identified some benchmark-based maturity models (two for Agile [6, 49] and two for Lean [29, 31]) that were based on CMMI-DEV [12] and adapted from industry models to be used in a generic fashion. The remaining models have analysis as their focus, evaluating the situation of teams and organizations and making use of the notion that each organization has its own unique context and characteristics, where comparisons can be inadequate and faced with resistance [17].

The Lean maturity models were mostly influenced by the manufacturing industry and propose gradually evolving circumstances, focusing on a sustainable adoption of their method of choice [2, 11, 31, 63], with the exception of Cil and Turkan [11] with its analytic network process approach. Whilst Julian et al. [30] identify two paths to minimize the effects of adopting “big bang” and “gradual” strategies, Lui and Chan [38] interestingly consider both approaches.

5.3 RQ3. How Do These Maturity Models Envision Maturity?

Maturity can be seen as a state to be achieved by teams or organizations; as such, maturity models mostly quantify it by defining maturity levels. Models usually define four to six levels, most commonly defaulting to five (see Table 7). Some authors do not use levels to quantify maturity, however, such as Fontana et al. [17], who argue that “maturity is obtained when results are accomplished in various aspects of software development” and propose an outcome-based model; Cil and Turkan [11], who propose a model based on analytic network process; and Schröders and Cruz-Machado [63], whose model individually assesses and quantitatively rates several criteria pertaining to leadership, culture, knowledge, and process.

Table 7. Maturity class and levels of maturity models.
Table 8. Administration mechanisms of maturity models [78].

5.4 RQ4. How Are These Maturity Models Applied?

Table 8 shows the administration mechanisms of the selected maturity models. Administration mechanisms tend to be simple, not deviating much from instruments similar to questionnaires or checklists, attesting to the inexpensive and somewhat swift quality maturity models are known for. Of note are the work of Patel and Ramachandran [51], which has the support of a web-based tool, and that of Cil and Turkan [11], which employs the analytic network process to resolve assessments. Almost half of the maturity models, however, provide no mechanism at all, perhaps taking simplicity too far.

We notice a need for more studies and/or discussions on how administration mechanisms must be designed or used in a way that is adequate for the specific needs for which their corresponding maturity model was designed. As a mechanism is sensitive to the context it is applied in, it most likely needs highly dynamic features to properly assess and identify improvement needs and to perform follow-up measurements. Only a few maturity models present in-depth metrics, typically analysis-focused ones [11], but we speculate that using both a quantitative and qualitative approach to analysis might be a better path to proper assessment.

Table 9. Evaluations performed on maturity models (adapted [78]).

5.5 RQ5. How Are These Maturity Models Evaluated?

Table 9 shows how each maturity model was evaluated. Of the models that were in fact evaluated, most of them underwent either a domain-expert or practical setting evaluation. Given that author evaluations seem to be the cheapest alternative and that the majority of the models were not evaluated at all, this perhaps suggests that author evaluations are seen as not worthwhile by academics, or that there is a lack of industry involvement during the development of these artifacts. Indeed, only a few models seem to be actively used in the software industry and none had follow-up studies, revealing a concerning detachment between academia and industry with regard to maturity models.

6 Discussion

A large share of the maturity models resort to practice adoption as their main means of improving maturity, though many also see the ability to continuously improve as the key takeaway of maturity. We highlight that Lean maturity models generally consider the culture and behavior domains, while UCD models focus on development teams trying to adhere to UCD during their work process, which underscores the disparity in what maturity even means among different studies. This “confusion” about maturity is not without reason (after all, different models try to solve different things), but it makes it difficult to establish guidelines on proper maturity model development [7]. Several maturity models are developed to solve the problems and fulfill the specific needs of certain contexts and/or problem domains identified by academia or requested by the industry, even though most of them make use of generic methods. Furthermore, many go without a suitable evaluation procedure, and none were shown being applied to real teams or organizations in one or more follow-up studies, casting some doubt on the actual capabilities of some models, even though our quality assessment yielded generally positive scores. This could be due to the lack of established guides on maturity model development [52], which would reflect on the quality of the models themselves [21]; the inaccessibility of domain experts in some academic circles; or the higher costs of conducting a proper evaluation procedure with industry partners.

The combined approach of Agile, UCD, and Lean Startup still seems most promising, and it and its variants are being used by organizations worldwide. As larger institutions move to use it, the hardships of large-scale adoption will become more apparent, highlighting the need for maturity models to support them and likely instigating research on the topic. The maturity models reported in this paper hold a lot of knowledge that could help in developing a proper model for the combined approach, even if Lean Startup was not directly addressed by them.

6.1 Threats to Validity

As with any systematic literature review, most threats to validity concern study selection bias and inaccuracy during data extraction. We carried out procedures to reduce such threats, but our protocol is prone to faults: the first round of inclusion and exclusion criteria was applied only once by multiple researchers (no study was evaluated more than once); the studies that participated in the second round of inclusion and exclusion criteria were assessed by two researchers, but no metric to rate inter-rater agreement among the researchers was calculated; data extraction results obtained from a researcher were not checked by another; and no snowball search of any kind was executed.
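For reference, one commonly used inter-rater agreement metric for two raters is Cohen's kappa, which corrects the observed agreement for the agreement expected by chance. The sketch below is illustrative only: the review did not compute such a metric, and the include/exclude labels in the example are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equally long, non-empty label lists")
    n = len(rater_a)
    # Proportion of items on which both raters gave the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical second-round decisions by two researchers on four studies.
a = ["include", "include", "exclude", "exclude"]
b = ["include", "exclude", "exclude", "exclude"]
print(cohens_kappa(a, b))  # 0.5
```

A value of 1 indicates perfect agreement, while 0 indicates agreement no better than chance.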

The systematic literature review was conducted by one PhD, one PhD candidate, three graduate students, and two undergraduate students (all from IT-related education); with guidance from two senior researchers.

7 Conclusion

This paper reports on a systematic literature review of maturity models for Agile, Lean Startup, UCD, and their intersections in a software engineering context following existing systematic review guidelines [33]. We found a total of 35 maturity models, but none were of a combined approach of the three pillars. The methodological quality of the maturity model studies was evaluated using previously established criteria [14, 23, 28]. Then, we analyzed and categorized the maturity models using criteria adapted from maturity grid guidelines [39] and other studies [24, 58].

The absence of maturity models for the combined approach of the three pillars is likely due to its infancy. Research on the use of Lean Startup in software development is not as extensive as that on the other two pillars, which also already have a subject area specific to researching integration efforts between the two, i.e., Agile User-Centered Design Integration (AUCDI). Additionally, all three pillars lack a widely accepted theoretical basis that properly defines each pillar, leading back to questions like “what is a mature agile team?” and what issues each pillar should tackle individually or together, which makes integration efforts difficult. The yet unexplained inner workings of the combined approach make the development of a maturity model for it a daunting task.

Although we found some maturity models for Lean thinking, none were specifically for Lean Startup, which seems to be a major driving force behind the combined approach of Agile, Lean Startup, and UCD [22]. Lean Startup deals heavily with continuous experimentation, a practice that is very much intertwined with the method’s somewhat risk-tolerant mindset, perhaps making future maturity models for it focus on cultural concerns, much like the reported Lean models of this study.

The identified maturity models show a worrisome trend in evaluation procedures: about half of the studies did not report on evaluating their maturity model. Although many studies lack a sound theoretical basis and methodology [21], Pereira and Serrano [52] report that several maturity model development guidelines have been created, but that authors choose to follow their own method instead, which could be the cause of this trend. Nevertheless, the lack of evaluation shows an alarming disconnect from the industry, which is where the maturity models are to be applied in the first place.

For future work, the development of a maturity model for the combined approach of the three pillars is an evident next step, although a better understanding of how the pillars interact should be attained first, even if it has been suggested that cultural and mindset factors should be among the focus points of such a model [22]. The combined approach is a promising take on software development [22, 66], albeit an understudied one. As practices from Lean Startup continue to be adopted by the software industry, we hope to see improved interest in this subject.