
1 Introduction

Companies use a range of techniques to collect customer feedback in the early stages of product development. In the pre-development phase, techniques such as customer interviews, customer observation and customer surveys are typically used to get an understanding of customer perceptions of new product functionality [2, 6–10]. Furthermore, mock-ups and different prototyping techniques are commonly used to let customers try early versions of the product and to evaluate, e.g., user interfaces. In addition, there exist a number of techniques that can be used to validate customer needs during the development process, e.g. the HYPEX model [1]. Inspired by the ‘Build-Measure-Learn’ loop as outlined in the Lean Startup literature [11], these techniques emphasize the need to build smaller increments of features that can be frequently validated by customers, to avoid developing features that are not appreciated. In a number of recent studies [1, 12, 13], the notion of frequent customer validation is described as ‘innovation experiment systems’ in which the R&D organization responds to instant feedback from customers. Typically, these techniques provide a limited set of qualitative data reflecting individual customer needs [2].

In this paper, we explore the value of mock-ups as a tool for collecting consumer usage data from a product management perspective when prioritizing requirements for future releases. Mock-up techniques are widely used in software development to create user interfaces that show end consumers how the software will look without having to develop the underlying functionality. From a consumer interaction perspective, missing functionality would otherwise only be discovered once development of the application had started and the first interactions with the UI had been made. Populating the UI with a number of features that are possible requirements for the next release of the software makes it possible to collect quantitative data [4] about the usage of the specific application.

The question is how this data is valued by product managers when taking investment decisions for future releases pre-development (i.e. before any funding is given to development). In most software development companies, the pre-study is the phase in which decisions are taken on whether to develop a feature or not. In this phase, the expected value of a feature is estimated, and if the outcome is positive the feature is developed no matter what happens in the later stages of development. However, there are a number of problems associated with this.

First, the estimated value of a feature is typically based on very limited data that can prove whether the estimation is correct. Instead, previous experience and opinions held by product management are what guide decision-making and feature prioritization processes [1]. Due to the lack of mechanisms for validating feature value with customers after development has started, the outcome of the pre-study phase is difficult to question, and no continuous re-prioritization of features and feature content occurs. As a result, feature prioritization becomes an opinion-based process, leading software companies to invest in developing features that were considered value-adding in the pre-study phase, without employing the opportunity to continuously validate whether this still holds in later stages of development and close to release [1].

Second, the estimations made in the pre-study phase are typically based on limited amounts of qualitative feedback from customers. Data is collected by asking customers what they want and by observing what they do, and the output is a limited set of individual customer opinions and experiences regarding system use. While this feedback is valuable, it does not represent a large customer base and it does not reveal actual system usage. Ideally, qualitative customer feedback in the pre-study phase should be complemented with quantitative data that confirms individual perceptions, but this has proven difficult to accomplish [3].

Third, due to the lack of mechanisms to measure feature usage, companies invest in developing features that have no proven customer value. Often, and as recognized in previous research, the majority of features in a system are never, or only very seldom, used. This situation could be avoided if accurate data collection mechanisms were in place, allowing companies to allocate resources to development with proven customer value. In this paper, based on a case study conducted at Sony Mobile, we explore data collection techniques that allow for the collection of quantitative data also in the pre-development phase, i.e. before development of a feature starts.

The remainder of this paper is organized as follows. The next section presents the problem our research addresses. Section 3 describes the research methodology and research questions. Section 4 describes the case study and the data collected. Section 5 contains the final analysis and a discussion of the results, and Sect. 6 concludes the paper with possibilities for further work.

2 Problem Statement

Based on the research presented above as well as our own research [4], we have learned that three problems are likely to occur in companies developing software-intensive systems. Below we describe each problem in more detail.

2.1 Release Content Cast in Stone

Many companies use a release model [5] where the feature content of each release is decided upon before the start of development. If companies lack mechanisms to continuously validate feature content with customers, they will find it difficult to re-prioritize pre-study outcomes. This causes companies to complete the building of features even if it becomes obvious during development that a feature clearly does not provide value to customers. As a result, a sizeable part of R&D resources is allocated to wasteful activities, which deteriorates the competitive position of companies over time.

2.2 Featuritis

There is evidence that a majority of features are seldom or never used and that customers seldom use the full potential of the functionality they receive [3]. Often referred to as “featuritis” [15], this means that half or more of a company's R&D effort is wasted on non-value-adding activities. As with the previous problem, if competitors manage to have less waste in their R&D activities, the market position of the company is affected negatively over time.

2.3 Everything and the Kitchen Sink

Although often treated as atomic in research, features can be implemented iteratively and to a lesser or greater extent. Engineers often have a tendency to build features such that all use cases, exceptions and special situations are taken into account. Often, however, the value of a feature to customers is already realized after building the small part of the feature that provides the greatest value; further development does not lead to much more value for customers. Due to a lack of mechanisms for collecting feedback before and during development and after deployment of functionality, there is a risk that companies find it difficult to decide when and how to stop building a feature once further iterations fail to add value to customers [2].

2.4 Summary of Problems

There is a need to overcome these problems to stay competitive. The company that makes the best prediction (by prioritizing the features with the highest value) not only wins market share but also reduces waste in the development cycle. However, prioritization is a challenging part of the requirements engineering process since it attempts to predict the future. This is especially true in a market-driven context addressing end-users as customers. A number of qualitative prioritization methods are defined in requirements engineering processes. Qualitative prioritization methods are often subjective by nature and involve, for example, guessing or weighing requirements against each other. An alternative approach to finding the requirement priority is to quantitatively measure usage by introducing mock-ups that collect what users find interesting. Research in [4] shows that collecting quantitative feedback before development is feasible, that the data collected deviates from the original feature prioritization (i.e. it is beneficial), and that the data gives further insight into requirement prioritization than a qualitative method could have provided.

3 Research Methodology

3.1 The Case Company

A case study has been performed at Sony Mobile Communications Inc. (Sony Mobile). Sony Mobile is a wholly owned subsidiary of Tokyo-based Sony Corporation, a leading global innovator of audio, video, game, communications, key device and information technology products for both the consumer and professional markets.

3.2 The Research Questions

Three aspects of requirement prioritization are within the scope of the study. The goal is to capture whether and how quantitative data has an impact on requirement prioritization. The questions are formulated as follows:

  1. What impact does quantitative data that measures customer usage via a mock-up have on the prioritization made by product managers?

  2. How does the prioritization made by product managers match the consumer usage measured via a mock-up?

  3. How valuable do product managers believe that quantitative data is for prioritizing requirements?

3.3 Research Process

In our study, we focus on data collection practices, and especially on whether quantitative data collected in the pre-development phase impacts the decisions of senior product managers.

To study the described topic, we conducted case study research [17, 20] at the case company, where two of the authors also hold assignments. As a research method, case study research is typically used to contribute to our knowledge of individual, group, organizational and social phenomena, and is typically used to answer ‘how’ and ‘why’ questions that require an extensive and in-depth understanding of the phenomenon [17, 20]. The case study, with the associated outcome of each step, is described in the high-level flowchart in Fig. 1 below. Each step is then described in more detail.

Fig. 1. Process steps and outcome of the case study

  • STEP 1: Choice of Product and Application.

    The first step is to select a product and application that meet requirements 1–4 below:

    1. Possibility to change the feature set for selected users by showing a mock-up of a new feature set.

    2. A large number of interactions with end consumers.

    3. The main assumption, supported by statistics, is that people using the app are first-time users.

    4. Senior product managers working with the feature set who are willing to do a prioritization with and without the quantitative data collected.

Requirement 1 targets the technical aspects of Android applications: it must be possible to create a mock-up of an existing application and replace the application in the already deployed product. Requirements 2–4 address different aspects of the validity of the study and are further discussed in the validity threats section.

The choice of product landed on a Sony Xperia™ phone and a specific Android application. This choice of product and application was governed by compliance with all four requirements. For confidentiality reasons, the nature and name of the application cannot be disclosed. However, we can provide the following base data. The application was first deployed to live users in January 2013 and at present it is available on approximately 40 million devices globally. With a few exceptions, the application is shipped with every mobile phone and tablet that Sony Mobile ships. Every month, the application is used approximately 3.5 million times, where every use typically involves three main use cases. Based on collected data, we know that the original application has approximately 9.5 million interactions per month. The use of the application is consistent over the year and does not show, e.g., seasonal variations or variations due to product releases. The application is considered one of the baseline applications delivered with Sony Mobile. The mock-up was used 34,393 times during the 10 days it was published. The idea was that the users' preference for different features should indicate which features should be given the highest priority.

  • STEP 2: Elicit a feature set for the application.

    A competitive analysis was done to elicit the features to be included in the application. The App Store and Google Play were scanned for apps within the same domain or apps from competing brands with similar functionality. Each app was then investigated, and 12 features were chosen as suitable for the study.

  • STEP 3: Collect quantitative data for the feature set.

    As described in [4], a mock-up of the application was produced and deployed to the user base in December 2014. The application consisted of 12 features that were displayed to the user when starting the application. The order of the features was randomized in order to avoid the risk that users selected, e.g., the first or last option. To keep development cost and lead time as low as possible, the mock-up was only available in English and hence only available in English-speaking markets. The design was such that it was only shown once for each user. Returning users did not receive the mock-up behavior but were instead shown the actual first version of the application.
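The gating behavior described above can be illustrated with a short sketch. The following is a hypothetical illustration in Python; the actual implementation was part of an Android application, and all names below are invented for illustration only:

```python
import random

# The 12 features analyzed in the study, FT01-FT05 and FT07-FT13
# (FT06 excluded; see Sect. 4.1). Identifiers follow the paper.
FEATURES = [f"FT{i:02d}" for i in range(1, 14) if i != 6]

def features_for_session(user_has_seen_mockup: bool):
    """Return a randomized feature list on the user's first start,
    or None to signal fallback to the real application."""
    if user_has_seen_mockup:
        return None  # returning users are shown the actual first version
    order = list(FEATURES)
    random.shuffle(order)  # randomize order to avoid first/last position bias
    return order
```

Randomizing the presentation order on each first start is what guards against the position bias mentioned above; the show-once flag is what separates the mock-up experiment from the production behavior.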

  • STEP 4: First interview with product managers.

    Individual interviews were performed with five selected senior product managers from the case organization.

The product managers had twelve features to prioritize against each other so as to get the most out of their investment with respect to the goal of the application. Each interview lasted approximately one hour and was conducted by two of the researchers. During the interviews, open questions were asked related to the prioritization of the selected features for the specific Android application.

Also, in order to validate the feature set elicited by the researchers, the interviewees had the possibility to add features before, during and after setting their prioritization.

  • STEP 5: Second interview, adding results from the mock-up.

    In step 5 the product managers were once again presented with the purpose and context of the study. Once it had been secured that the context was understood, the prioritization made in the first interview and the mock-up data were presented to and acknowledged by the product managers. They were then asked to re-prioritize the order of the 12 features. The last question of the interview was an open question about the value of the presented data in their line of work.

  • STEP 6: Analysis.

    The analysis was done by comparing the prioritization made in the first interview with the prioritization made in the second interview. A difference between the prioritizations before and after the data can be seen as an indication that the quantitative data presented impacts the decision making of the product managers. The difference is shown per feature and per product manager. The difference can also show whether the prioritization is aligned with the usage of the mock-up: for example, high usage of a specific feature in the mock-up would indicate that this feature is of high interest to many users and should thus be placed high (low number) in the prioritized list.

3.4 Validity Threats

Requirement 2 in STEP 1 targets the external validity [16, 17] of the study. External validity refers to how well data and theories from one single case study can be generalized. Another aspect of external validity regards whether the domain of the case study (mobile applications) can be generalized to other domains, such as IT systems. This threat can be dealt with by conducting multiple case studies and is not in scope for this paper.

Requirement 3 in STEP 1 targets the internal validity [16, 17] of the study. Internal validity refers to how well the case study is designed to avoid confounding factors. A confounding factor is a possible independent variable causing the effect, rather than the variable concerned in the case study.

Requirement 4 in STEP 1 targets the construct validity [16, 17] by making sure the senior product managers have experience with the same feature set as the application chosen. Another threat to the construct validity is the discourse of the interviews. Since all participants are from the same company and project, it is believed that the discourse was the same in all interviews. However, to further secure the discourse aspect, the first part of the interview was used to explain the context for the prioritization. The context of the interview was explained as: (1) investment decisions for an existing application, and (2) AIDAS (Attention, Interest, Desire, Action and Satisfaction) is used to define the wanted effect of the investment [18, 19]. AIDAS is an acronym for a model that illustrates a theoretical consumer journey from the moment a brand or product attracts consumer attention to the point of action or purchase. This journey is often referred to as a conversion funnel.

To further secure the context, the following explanation was given in the first part of the interviews, describing the context of the prioritization from three different angles:

  1. The purpose of the list is to be used for prioritizing the investments that should be made in the next release of the application.

  2. The categories of the features chosen, and the investment to be made, should impact the ACTION segment of the AIDAS model.

  3. Finally, the interviewers explained that the feature set chosen was related to the usage of the device itself (e.g. not controlling or concerning another device or service).

4 Results

4.1 Data Collected from Mock-up Application

The features are enumerated FT01–FT05 and FT07–FT13; feature FT06 was excluded for semantic and technical reasons (see [4] for a thorough explanation of why FT06 had to be excluded from the feature list). Based on the 34,393 interactions, the relative selection distribution was as shown in Fig. 2 below.
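The relative distribution is simply each feature's selection count divided by the total number of interactions. A minimal Python sketch with hypothetical counts (the actual per-feature counts behind Fig. 2 are not disclosed):

```python
# Hypothetical selection counts; only three features are shown and
# the numbers are invented for illustration.
counts = {"FT01": 12000, "FT02": 4000, "FT13": 6000}

total = sum(counts.values())
shares = {ft: n / total for ft, n in counts.items()}  # relative distribution
print({ft: f"{share:.1%}" for ft, share in shares.items()})
```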

Fig. 2. Feature usage in the mock-up

Looking at Fig. 2, it is clear that FT01 is by far the most used feature; FT13 also stands out, and there seems to be a small preference for FT02, FT03, FT08 and FT09.

4.2 Variation in Prioritization Between First and Second Interview

The five product managers were asked to prioritize the same twelve features that were used in the mock-up, both with and without information about the results from the mock-up. The features were shown to the product managers with the same text that was used in the application. Additional explanation was given on request, although the features in this domain and context are to a large degree given for the product managers. No additional features were added during the interviews.

The different prioritizations are listed in Table 1. The prioritization based on usage of the mock-up is denoted Usage; S1-1 to S5-1 denote the five product managers in the first interview, and S1-2 to S5-2 denote the second interview.

Table 1. Prioritization based on mock-up usage, first interview and second interview. Items in priority order top to bottom.

Our hypothesis is that the priority of the features would align with the feature usage of the mock-up when that information is available to the product managers. To analyze and visualize this hypothesis, the prioritizations of the product managers have been compared to a prioritization constructed from the feature usage only. In Figs. 3 and 4 the rank of the prioritization made by the product managers is compared to the rank (1–12) of the prioritized list constructed from mock-up data. For each feature, the difference between the ranks is calculated as Rank(prioritization by product manager) − Rank(prioritization by feature usage). This is done for each of the product managers. The data is grouped per feature and is presented in Fig. 3 without information from the mock-up and in Fig. 4 with information about the mock-up.
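The calculation can be expressed compactly. Below is a minimal Python sketch under the assumption that each prioritization is a mapping from feature to rank 1–12; the rank values used here are hypothetical placeholders, the actual ones are listed in Table 1:

```python
# Features analyzed in the study (FT06 excluded; see Sect. 4.1).
FEATURES = [f"FT{i:02d}" for i in range(1, 14) if i != 6]

# Hypothetical ranks (1 = highest priority); real values are in Table 1.
usage_rank = {ft: i + 1 for i, ft in enumerate(FEATURES)}
pm_rank = dict(zip(FEATURES, [2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11]))

# Signed difference per feature: Rank(PM) - Rank(usage). A positive value
# means the product manager ranked the feature lower than usage suggests.
diff = {ft: pm_rank[ft] - usage_rank[ft] for ft in FEATURES}
print(diff)
```

Repeating this for each of the five product managers, before and after the mock-up data was shown, yields the per-feature groupings plotted in Figs. 3 and 4.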

Fig. 3. Difference in prioritization of each feature without knowledge about feature usage from mock-up

Fig. 4. Difference in prioritization of each feature with knowledge about feature usage from mock-up

Product Manager Prioritization – First Interview.

In Fig. 3 we see that the product managers differed from the usage-based ranking in 57 out of 60 possible prioritizations. Looking in more detail, the differences are more than just small variations, as almost one third (19 items) differ by more than 3 positions (see Table 2 for details).

Table 2. Difference in prioritization; number of items that differ per difference.

Visually, it also appears that the product managers' low-priority items differ the most from the mock-up usage, giving low priority to what consumers rank high. Investigating this further, however, we see that the average change per priority is about 16.5 with a standard deviation close to 6; this variation is considered random, in spite of its visual appearance in Fig. 3.

Product Manager Prioritization – Second Interview.

The differences after re-prioritization are listed in Table 3 below.

Table 3. Number of items with difference > X.

Visually, the difference between the prioritization done by the product managers and the prioritization constructed from the feature usage of the mock-up is smaller. After re-prioritization, the number of items that differ by three or more positions has decreased from 19 to 12, and the number of items with the same priority as the user group has gone from 3 to 11. The average difference is approximately 11.33 (down from 16.5) with a standard deviation of 4.8.
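The threshold counts and summary statistics behind Tables 2 and 3 can be reproduced with a few lines. A Python sketch, where the difference values are hypothetical placeholders rather than the study data:

```python
import statistics

# Hypothetical absolute rank differences (one value per feature/manager
# pair); the actual values are derived from Table 1.
diffs = [3, 5, 1, 0, 7, 2, 4, 6, 1, 3, 2, 5]

n_large = sum(1 for d in diffs if d >= 3)   # items differing 3+ positions
n_exact = sum(1 for d in diffs if d == 0)   # items matching the usage rank
print(n_large, n_exact,
      round(statistics.mean(diffs), 2),     # average difference
      round(statistics.stdev(diffs), 2))    # sample standard deviation
```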

4.3 Alignment Towards Consumer Data Between First and Second Interview

First Interview.

To further investigate whether the product managers prioritize differently, we look at the prioritizations from a feature perspective. The total difference in prioritization is 180, with an average prioritization difference per feature of 15. The difference is illustrated in Fig. 5 below. The largest difference is for FT13 (32) and the smallest for FT08 (8). Note that if all product managers had agreed with the consumer selection, the difference would have been 0, and if all product managers had prioritized one position above or below, the difference would be 5. The maximum possible difference is 55, had all five product managers prioritized with the maximum offset of 11 for that feature.
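As a small worked example, the per-feature total is the sum of the absolute rank offsets over the five product managers, bounded by 0 (full agreement) and 5 × 11 = 55 (maximum offset by all five). A Python sketch with hypothetical ranks:

```python
# One feature's rank according to mock-up usage, and hypothetical ranks
# assigned by the five product managers (real values are in Table 1).
usage_rank = 1
pm_ranks = [2, 1, 4, 3, 6]

total = sum(abs(r - usage_rank) for r in pm_ranks)  # 1 + 0 + 3 + 2 + 5
assert 0 <= total <= 5 * 11  # bounds: full agreement .. maximum offset
print(total)  # 11
```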

Fig. 5. Difference in prioritization per feature, first interview

Second Interview.

Also in the second round of prioritization we compare the prioritizations from a feature perspective. Producing the same graph as before, now after the product managers had seen the customer prioritization, we get the illustration shown in Fig. 6 below. Visually, it appears that all product managers have aligned with the data, and doing the same calculations as before we see that the total difference is 138 (was 180) with an average of 11.5 (was 15).

Fig. 6. Difference in prioritization per feature, second interview

There is still large disagreement on FT13, which differs by 26 (was 32). FT02, FT03, FT05, FT08 and FT11 now have a difference of 8 or lower (FT03 has 7).

4.4 Answers to Open Questions About Value of the Mockup Data

In response to the question “What is the basis for your new prioritization?”, all of the product managers agreed that the consumer data made them re-prioritize. Some highlights of the responses are quoted below. For confidentiality reasons the complete responses cannot be reported in the paper, since they included the specific features and the nature of the features.

  • “We have similar data from other sources that it would be interesting to compare”

  • “Would have been interesting to understand the impact of changing the naming of the features”

  • “Cannot prioritize solely on quantitative data, but gives valuable support when prioritizing”

  • “Important data for pre-development decisions”

  • “Did not know that this specific feature would be so popular”

  • “As a strong product manager I need to set the direction and use this data for experimentation”

  • “Important to understand what the consumer wants as presented by the data”

  • “Small differences do not change my prioritization, need to see considerable differences to change my prioritization”

  • “Triangulation with other data sources important”

  • “Would try to use this method for pre-development decisions”

  • “Need to consider the top usages but difficult how to interpret the rest”

  • “Need to understand the cost of the feature before prioritizing, even if customer wants the feature there might be no business case to develop”

4.5 Summary

Looking at the numbers, there seems to be a difference in prioritization before and after the product managers were shown the results of the mock-up. Looking at the prioritization order and the different features, we see that the product managers' prioritization differs less from the consumers' after the data was presented.

5 Discussion

5.1 Research Question 1: What Impact Does Quantitative Data that Measures Consumer Usage via a Mock-up Have on Prioritization Made by Product Managers?

According to Figs. 3 and 4, we see that a re-prioritization of features took place when the quantitative data was used. One could argue that the impact could be of a random nature, or that another factor impacted the prioritization. This threat was handled during the interviews by a direct question from the interviewers to re-prioritize based on input from the data. Thus our research indicates that this change stems from the availability of the quantitative data collected.

Another insight is the variation between the product managers in both the first and second interviews. An interpretation can be that product management is regarded as an individual skill based on individual opinion. This makes it even more important to have something to align with and to get feedback from. Peers in product management would have their own opinions; we therefore see the quantitative data as a valuable feedback source for each individual product manager to use in their daily work.

5.2 Research Question 2: How Does the Prioritization Made by Product Managers Match the Consumer Usage Measured via a Mock-up?

This is a challenging question in light of the variation in the first prioritization by the product managers. However, a trend towards alignment with the customer data can be seen by comparing Fig. 5 to Fig. 6. Both visually and by measuring the discrete distance for all product managers, it can be derived that the prioritization conforms to the quantitative data collected. There is, however, a non-negligible difference between individual product managers' prioritizations. This difference is interesting, as it indicates that there are more factors besides customer usage that impact the decision making.

5.3 Research Question 3: How Valuable Do Product Managers Believe that Quantitative Data Is for Prioritizing Requirements?

As an open question at the end of the interview, the senior product managers were asked whether the data presented was beneficial in their prioritization and gave more insight. All of the product managers were positive to using the data in the prioritization, although many of them did not prioritize the features with the most usage. This indicates that product managers have multiple factors to consider. Another aspect could be that their accountability is not towards consumer satisfaction only. Even if product managers decide not to follow the consumer data, they should be aware of the consequences of missing expectations [21]. The consequences are especially severe for software services, such as web and many smartphone applications, that target a wide consumer base: there is only a limited number of chances for consumer engagement. If the opportunities are missed, the consumer will engage with a competitor or an alternative service. The business opportunity is then lost for this particular instantiation of the conversion funnel, as described for example by the A, I and D steps in the AIDAS model. A new investment would be needed to regain the consumer base and to start a new instance of the AID steps.

Using a mock-up to evaluate features should be an integrated part of the prioritization of features. The strength is the ability to collect information about consumer expectations for possible new features relatively cheaply (compared to a complete development cycle). The collection could also be done iteratively to compare combinations of feature sets for the product. Candidate feature sets could be elicited by analyzing the scope of similar products or by looking at new opportunities.

6 Conclusion and Further Study

Concluding, the contributions of the paper are threefold. First, product managers change their prioritization when quantitative data is presented to them. Second, the changed prioritization converges towards the prioritization indicated by the quantitative data. Third, the quantitative data is regarded as beneficial by the product managers.

Finally, there are a number of opportunities for further study that we are considering, especially concerning the empirical design. Examples of further studies could be: expanding the analysis to more applications and other types of applications, considering the cost versus the value delivered for each feature, and including other case companies and other domains, for example IT systems, where producing a mock-up could be more expensive.