1 Introduction

Software engineering (SE) projects, processes, and artifacts are typical objects for which case studies are a feasible research approach. Case studies are characterized by their flexible nature: they evolve over the course of the study, focus on a phenomenon in context, and use multiple sources of evidence or methods of data collection. Selection of cases to study is not governed by sampling logic and representativeness; rather, cases are selected for being ‘typical’, ‘critical’, ‘revelatory’, or ‘unique’ in some respect (Yin 2014). Case studies, like any empirical research, are costly, and it is usually not possible to investigate all aspects of a phenomenon in one case study. What kind of generalization is possible from a single case, and how such generalizations might be established, are therefore important questions to investigate. These questions do not concern statistical generalization, for which there is established theory and practice.

However, progress in any scientific field depends on the accumulation of knowledge about diverse aspects of a phenomenon. It is therefore necessary to adopt approaches for integrating existing case studies and providing new interpretive explanations of them. Case study synthesis can help accomplish this goal by extending the overall evidence base beyond the single case (Runeson and Höst 2009; Runeson et al. 2012).

Research synthesis is a collective term for a family of methods to summarize, integrate, combine, and compare the findings of different studies on a specific topic or research question (Cruzes and Dybå 2011a). It builds on the observation that, no matter how well designed and executed, empirical findings from single studies are limited in the extent to which they may be generalized (Cruzes and Dybå 2011a). The synthesis of case studies must take into account the flexible nature of the cases, the mixed qualitative and quantitative characteristics of the data, and the type of cases being studied. The flexibility in the choice of methods for performing a case study is one of the characteristics that make the synthesis challenging.

The process of synthesis entails organizing the relevant evidence extracted from the included sources and then finding some way of bringing it together. The way the evidence is organized depends to some extent on the type(s) and scope of the evidence, the method(s) employed and on the preferences of the researcher (Pope et al. 2007). As with data extraction, the process of organizing the studies is often facilitated by the use of charts or tables summarizing key aspects of the studies. The formats of these largely depend on how many studies or pieces of evidence are included, but they need to be capable of allowing repeated examination and comparison of the relevant data from each study.

Synthesis methods are usually tailored to a particular type of evidence. For example, meta-analysis aggregates and averages findings from experimental or quasi-experimental studies, whereas qualitative synthesis (such as meta-ethnography, thematic synthesis, narrative synthesis, and cross-case analysis) synthesizes findings from qualitative studies (Bethel and Bernard 2010). In addition, there is a large variety of methods for synthesizing qualitative and mixed-methods evidence (Cruzes and Dybå 2010, 2011a; Pope et al. 2007). What these methods have in common is the idea of making a new whole out of the parts. The result can be novel concepts and higher-order interpretations, novel explanatory frameworks, an argument, new or enhanced theories, or new conclusions. Further, many similar methods appear under different names in different research traditions. Cruzes and Dybå (2011a) describe how some of these methods have been used in systematic literature reviews in SE, but the vast majority of the methods remain unexplored in SE.

For the purpose of this paper, three of the most relevant methods of case study synthesis are compared: thematic synthesis, cross-case analysis, and narrative synthesis. Our aim is to demonstrate the similarities and differences of the results and conclusions when applying different methods of synthesis, and to discuss the challenges of synthesizing evidence from reported case studies in SE. Our main research questions are:

What are the differences in the results when using narrative, cross-case, or thematic synthesis of case study evidence in SE?

What are the main challenges of performing case study synthesis in SE?

To investigate these research questions, we performed two independent syntheses of two published case studies on trust in outsourcing (Babar et al. 2007; Oza et al. 2006). The primary studies were selected because of their relative homogeneity, which allowed us to address the easier synthesis issues first. One team applied cross-case analysis to the two papers and the other team applied thematic synthesis. We compare and discuss the results of these two syntheses with each other and with a third, already published narrative synthesis of the same two papers (Babar et al. 2007). In addition, we discuss the challenges of performing the syntheses. Preliminary findings were reported as a short paper (Cruzes and Dybå 2011a). We have now explored the analysis in depth. We present a worked example to illustrate the methods and the challenges of applying them to published case studies.

The rest of this paper is organized as follows. Based on the literature on research synthesis, Section 2 discusses case study synthesis and describes the three methods of synthesis. The worked example is described in Section 3. The experiences, strengths, and differences of the syntheses are presented in Section 4. Section 5 concludes and outlines further work.

2 Case Study Synthesis

Most case studies in SE research are single-case or few-case studies, while large-sample comparative studies are still rare. As a result, knowledge about SE practices, methods, and techniques is spread over a myriad of diverse studies. Additionally, most of the data collected in these case studies are observations and interviews that are analyzed qualitatively.

The simplest and possibly the most widely used way to combine such studies is the traditional informal, narrative literature review. It is used to review every kind of conceptual and empirical literature, including case studies as well as quantitative studies. Such traditional reviews rely primarily on the subjective insight and knowledge of the researcher. They therefore lend themselves mainly to exploratory studies that summarize a certain research literature without a strict research question (Pope et al. 2007). The advantage is that researchers can apply their own judgment to particular studies and compare them in a flexible manner. The disadvantage is that researchers may be biased by their own experience and beliefs on the topic. Moreover, traditional reviews typically do not define clear criteria for which studies are included and how they are synthesized, so other researchers can hardly replicate the synthesis.

Systematic literature reviews (SLRs) have been the approach used in SE for synthesizing research from diverse primary studies since 2005 (Kitchenham and Charters 2007; Kitchenham et al. 2004). In an SLR, the researchers attempt to gather relevant studies, critically appraise them, and come to judgments about what works, using explicit, transparent, state-of-the-art methods. SLRs include details about each stage of the review process:

  • the questions guiding the review;
  • the search methods;
  • the inclusion and exclusion criteria;
  • the data extraction;
  • the methods and process of synthesis.

Synthesis is one of the phases of SE SLRs that suffers most from a lack of transparency and limited use of state-of-the-art methods. Methods of synthesis have been available for many years in other disciplines (Pope et al. 2007). Nevertheless, about half of the SLRs in SE limit themselves to mapping the area of study without synthesizing the evidence (Cruzes and Dybå 2011a). Even those that do synthesize evidence do not fully exploit the methods that are well established in other disciplines.

For case studies in particular, synthesis methods have been available for at least four decades (Larsson 1993; Lucas 1974; Newig and Fritsch 2009). These methods allow systematic and rigorous synthesis of previous case-based research. They generate findings and conclusions from rich case material created by different researchers, in different contexts, and with different study designs, and they allow much wider generalization than single cases do. The empirical evidence on which such syntheses depend is the data on which a conclusion or judgment may be based. Although there are many ways to generate evidence, case studies have a special ability to provide a deep understanding of the phenomena under study. They do so through direct observation of practice and rich, longitudinal, multi-sourced data. The synthesis must take into account the flexible nature of the case study, the qualitative and mixed character of the data, and the number and type of cases in each primary study.

Table 1 outlines some of the methods that are most relevant for synthesizing evidence across case studies (a more complete list is provided in Cruzes and Dybå 2010, 2011a). For the synthesis of qualitative case studies, no single method is likely to offer all the required features; the choice depends largely on the research goal and overall research approach. A combination of methods may therefore often be the best approach. In the following, we describe and compare three of the most used of these methods: thematic synthesis, cross-case analysis, and narrative synthesis (see Table 2). We use them in the worked example to explore some of the methodological challenges of case study synthesis in SE.

Table 1 Relevant case study synthesis methods (adapted from (Cruzes and Dybå 2010, 2011b))
Table 2 Detailed description of thematic, cross-case and narrative methods of synthesis

Thematic synthesis is a method for identifying, analyzing, and reporting patterns (themes) within data. It is one of the most common methods for synthesis of evidence in SE (Cruzes and Dybå 2011a). Thematic synthesis shares some characteristics with grounded theory analysis, in that the themes emerge from (are grounded in) the primary data. It minimally organizes and describes the data set in rich detail and frequently interprets various aspects of the research topic. It comprises the identification of the main, recurrent, or most important issues or themes arising from a body of evidence (Cruzes and Dybå 2011a). Importance is judged by the specific question being answered or by the theoretical position of the reviewer. The level of sophistication achieved by this method can vary. It ranges from a simple description of all the themes identified to analyses of how the different themes relate to one another in a conceptual map (Pope et al. 2007). The advantage of thematic synthesis is that it provides a means of organizing and combining the findings from a large, diverse body of research (Pope et al. 2007). It can handle qualitative and quantitative findings. It can be a deductive, theoretically driven approach or an inductive one, in which themes ‘emerge’ from the process of synthesis. However, thematic synthesis is often criticized for a lack of transparency, since there are many different ways to perform it. Recently, Cruzes and Dybå (2011b) extended existing approaches to thematic synthesis with relevant guidelines and recommendations. They conceptualize thematic synthesis in SE as a scientific inquiry consisting of five steps based on the existing literature (see also Table 2).

Cross-case analysis is a method that facilitates the comparison of commonalities and differences in the events, activities, and processes that are the units of analysis in case studies. The term cross-case analysis is sometimes used as a general umbrella term for the analysis of two or more case studies to produce a synthesized outcome (Khan and VanWynsberghe 2008). In some contexts, it has a narrower meaning: a specific method for performing the analysis that organizes the data from the cases in tables and graphs. We use the term in this specific sense, referring to a method to synthesize the findings of two or more case studies. Several cross-case analysis approaches and techniques are available to the case study researcher (Khan and VanWynsberghe 2008). So far, however, cross-case analysis has not been applied in SE systematic reviews (Cruzes and Dybå 2011a). Cross-case analysis, as proposed by Miles and Huberman (1984, 1994), was originally presented as a method to synthesize evidence from multiple cases within a multi-case study. It was not designed as a secondary analysis of different case studies. A possible reason is that the method is intended for cases that address the same research questions, which is not necessarily true of independent case studies. However, nothing in the method as such prevents it from being applied in secondary studies. The drawback in the secondary study context is that access to the raw data of the primary studies is limited by the publication format. This limitation, however, is common to all synthesis methods. Miles and Huberman’s process (1984, 1994) consists of three concurrent flows of activities: data reduction, data display, and conclusion drawing/verification (see Table 2).

Data reduction is the identification of items of evidence in the primary studies. It is worth noting that the major data reduction is conducted in the analyses within the primary studies themselves. The data are then clustered into meta-matrices and time-ordered displays, which are used to draw conclusions from the synthesized studies. The use of matrices and tables facilitates the comparison of the cases and highlights areas of agreement or disagreement across cases. Miles and Huberman classify cross-case clustering approaches as variable-oriented or case-oriented. In variable-oriented approaches, the variables identified in the cases take center stage. The inner dynamics of each case are replaced with a search for patterns and themes that cut across the cases. The researcher must interpret the answers so that they can be reduced to variables. In case-oriented approaches, commonalities across multiple instances of a phenomenon may contribute to conditional generalizations through the formation of types or families of studies. One advantage of the method is the transparency that the data matrices bring to the process of synthesis. One disadvantage is that it may lead to conclusions at the abstract level of variables and cases without considering the whole context of the studies.

Narrative synthesis refers to an approach to synthesis that relies primarily on words and text to condense and explain the findings of the synthesis. Narrative synthesis can involve the manipulation of statistical data. Its defining characteristic, however, is that it adopts a textual approach to ‘tell the story’ of the findings from the included studies (Popay et al. 2010; Pope et al. 2007). As used here, ‘narrative synthesis’ refers to a process of synthesis that addresses a wide range of questions, not only those relating to the effectiveness of a particular intervention. It is a general approach within which a wide range of specific methods of synthesis can be used. Popay et al. (2010) define four main elements of a narrative synthesis process (Table 2):

  • theory development;
  • development of a preliminary synthesis;
  • exploring relationships in the data;
  • testing the robustness of the synthesis.

Around 20 % of the synthesis methods in SE systematic reviews can be classified as narrative synthesis (Cruzes and Dybå 2011a). However, none of these systematic reviews is explicit about which approach was followed. The approach has several drawbacks: a lack of transparency, no authoritative body of knowledge, and no reliable and rigorous techniques. Data collection is also a point of debate. There is no systematically defined criterion for choosing the data, which is usually selected at the convenience of the analyst. The framework by Popay et al. (2010) has the potential to produce more transparent and more sophisticated narrative syntheses if it is adopted in SE.

3 Worked Example

To investigate the research questions posed in this paper, we conducted two independent syntheses of two published case studies on trust in outsourcing relationships (Babar et al. 2007; Oza et al. 2006). We defined a common synthesis goal and ran one synthesis in Sweden (using cross-case analysis) and the other in Norway (using thematic synthesis). These two syntheses were then compared with a third, independently performed and previously published narrative synthesis of the two case studies. The common goal of the syntheses was to:

  • Understand factors of trust in outsourcing relationships.

This is a knowledge support goal rather than a decision support goal (Ashrafian et al. 2011; Pope et al. 2007). A synthesis directed at knowledge support typically brings together and synthesizes research evidence on a particular topic, aiming to create new knowledge on that topic. We identified two papers that could help us fulfill our goal: Oza et al. (2006) and Babar et al. (2007). They were selected for their relatively high homogeneity. They investigate very similar research questions from a similar perspective, although in two different contexts, 2 years apart, and by two different sets of researchers. Preliminary versions of the two studies were published at the EASE conference in 2005 and 2006, respectively (Oza et al. 2005; Nguyen et al. 2006). At the 2006 conference, the similarity between the two studies was observed. This led to the latter study being extended with a narrative synthesis of the two when it was expanded into a journal version (Babar et al. 2007). Interestingly, despite their similarity, only one of the papers was included in an SLR of global software engineering (Šmite et al. 2010).

The Oza et al. study was based on interviews with 18 software development practitioners in India (Oza et al. 2006). The Babar et al. study was based on interviews with 12 Vietnamese practitioners developing software for Far Eastern, European, and American clients (Babar et al. 2007).

The goal of the Oza et al. study was to investigate the following research questions:

  i) What are the critical factors to achieving trust initially in an outsourcing relationship?

  ii) What are the critical factors to maintaining trust in an established outsourcing relationship?

The goal of the Babar et al. study was to investigate what factors are important for:

  i) Establishing trust in offshore software outsourcing relationships; and

  ii) Maintaining and strengthening trust in offshore software outsourcing relationships.

A secondary goal of the journal version of the study by Babar et al. was to compare their results with those of Oza et al. (the first study). This comparison was performed through narrative synthesis. We decided not to read the narrative synthesis before we had performed our own syntheses. For data collection, the Oza et al. study used standardized open-ended interviews to collect qualitative data. Babar et al. used semi-structured interviews based on a modified version of the questionnaire developed and used by Oza et al. Both studies used qualitative data analysis approaches to reach their conclusions. Both studies also have their own definitions of each factor of trust. These definitions are reproduced in Tables 3 and 4.

Table 3 Definition of trust as defined by Oza et al. (2006)
Table 4 Factors important to establish and maintain trust relationship, as defined by Babar et al. (2007)

In the following, we describe how we performed the syntheses and what the results of each synthesis process were: thematic, cross-case, and narrative synthesis.

3.1 Thematic Synthesis

The thematic synthesis followed the steps and checklist proposed by Cruzes and Dybå (2011b) (see also Table 2) and was performed by the Norwegian team. Five steps were performed (as described in Fig. 1):

  • initial reading of the data/text (extraction) and identification of specific segments of text;
  • labeling of the segments of text (coding);
  • translation of codes into themes;
  • creation of the model;
  • assessment of the trustworthiness of the model.

Fig. 1

Process of thematic synthesis followed in the worked example (adapted from Cruzes and Dybå (2011b))

The data extraction covered the publications’ details (authors, title, and publication year), the context (geography), and the study results (factors of trust in outsourcing relationships). We used NVivo to help identify the segments of text that refer to factors of trust in the two papers (Table 3 and Table 4). The coding was also done in NVivo, consulting the list of definitions of each factor as used by the authors of each paper. As shown in Fig. 1, we extracted 32 segments of text (references in NVivo) from the 22 pages of the two papers. From these segments, 27 codes were abstracted (as shown in Fig. 2). The abstraction considered the commonalities and differences in the definitions and in the text where the definitions were quoted. For each code, it is possible to retrieve the definition each paper gives to that concept and to follow a link to the original text the code came from. Fig. 3 shows the communication node. It has two segments of text that specifically describe communication as a factor of trust in outsourcing relationships. Oza et al. defined it as “How communication can help maintaining trust with the clients,” while Babar et al. defined it as “How effectiveness of communication with clients (maybe in clients’ native language) help to maintain the trust”. As these quotes show, the definitions of communication in the two papers differ slightly. In such cases, we created a new definition that encompasses both.
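The bookkeeping behind this step can be sketched in a few lines of Python: each code keeps links to its source segments and to each paper's own definition. This is a hypothetical model, not NVivo's internal data model or export format. The class names `Segment` and `Code` are illustrative. The two segments are the communication definitions quoted above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Segment:
    """A segment of text extracted from one primary study (hypothetical model)."""
    paper: str  # which primary study the segment comes from
    text: str   # the quoted definition or passage


@dataclass
class Code:
    """A code abstracted from segments; it keeps links back to its sources."""
    name: str
    segments: list[Segment] = field(default_factory=list)

    def papers(self) -> set[str]:
        """Primary studies that provide evidence for this code."""
        return {s.paper for s in self.segments}

    def definitions_by_paper(self) -> dict[str, list[str]]:
        """Each paper's own wording, kept separate so differences stay visible."""
        result: dict[str, list[str]] = {}
        for s in self.segments:
            result.setdefault(s.paper, []).append(s.text)
        return result


communication = Code("communication", [
    Segment("Oza et al. 2006",
            "How communication can help maintaining trust with the clients"),
    Segment("Babar et al. 2007",
            "How effectiveness of communication with clients (maybe in "
            "clients’ native language) help to maintain the trust"),
])

for paper, texts in communication.definitions_by_paper().items():
    print(f"{paper}: {texts[0]}")
```

When the two definitions differ, as they do here, a merged definition can be stored on the code next to the originals rather than replacing them. This preserves the trace back to each paper.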

Fig. 2

Codes in NVivo for the thematic synthesis

Fig. 3

Communication as a node in the synthesis. The left side of the figure shows the mind map of the thematic synthesis. The right side shows the notes: the definitions as described in the papers, and the references to the text describing the evidence on communication in both papers

We reduced overlap and translated the 27 codes into the following seven themes (as shown in Fig. 2):

  • Commitment;
  • Communication;
  • Development Process;
  • Investments in People, Technologies and Infrastructure;
  • Reputation;
  • Team Member Skills;
  • Team Performance.

For example, Communication (Fig. 4) is a theme composed of four codes: transparency, personal relationships, honesty, and communication. The definitions and quotes from these codes were all related to the more abstract concept (or theme) ‘communication’, which we defined as: “How a regular process by which information is exchanged between individuals through a common system of symbols, signs, or behavior can help maintaining trust with the clients.”
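Translating codes into themes is a many-to-one mapping. A simple consistency check helps when overlap is being reduced by hand, as here: no code should be left out, and none should be placed under two themes. A minimal Python sketch of such a check follows. `check_themes` is a hypothetical helper. Only the Communication theme is filled in, because the full assignment of the 27 codes is not listed in the text.

```python
def check_themes(theme_map: dict[str, list[str]],
                 codes: set[str]) -> tuple[set[str], dict[str, list[str]]]:
    """Check a many-to-one translation of codes into themes (hypothetical helper).

    theme_map: theme name -> the codes grouped under it
    codes:     every code produced in the coding step
    Returns the codes not yet assigned to any theme, and the codes
    assigned to more than one theme (with the themes involved).
    """
    owners: dict[str, list[str]] = {}
    for theme, members in theme_map.items():
        for code in members:
            owners.setdefault(code, []).append(theme)
    unassigned = codes - set(owners)
    duplicated = {c: ts for c, ts in owners.items() if len(ts) > 1}
    return unassigned, duplicated


# Only the Communication theme is filled in, using the four codes named above.
themes = {"Communication": ["transparency", "personal relationships",
                            "honesty", "communication"]}
codes = {"transparency", "personal relationships", "honesty", "communication"}
print(check_themes(themes, codes))  # → (set(), {})
```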

Fig. 4

Communication as a theme in the thematic synthesis. The left side of the figure shows the mind map of the thematic synthesis and the right side shows the notes from the researchers on the theme communication that consists of four nodes (transparency, personal relationships, honesty and communication)

Finally, we created a model of higher-order themes, mapping the seven themes onto three higher-order themes: Initial Trust, Maintain Trust, and Initial and Maintaining Trust. The final concept map is shown in Fig. 5. Each entity of the mind map has information associated with it:

  • the definitions from the papers;
  • references to the text backing up these definitions;
  • a note showing in which paper the factor appeared.

Each of the seven main themes also has a conclusion and a definition associated with it, as shown in Table 5. The strength of a conclusion is based on the number of times the factor was mentioned by the interviewees in each study.

Fig. 5

Concept map from the thematic synthesis. The map shows the final thematic map of the synthesis. Some factors were found in only one paper (1 for OZA and 2 for Babar), others in both. The factors mentioned most often by the interviewees are also marked. A red arrow indicates a relationship between nodes or themes

Table 5 Themes definitions and conclusions

Assessing the trustworthiness of the model was straightforward because we had only two papers to relate to. All the codes and references could therefore easily be mapped back to the original papers. In addition, two researchers did the work and assessed every step of the process.

3.2 Cross-Case Analysis

The cross-case analysis method is not a prescriptive step-by-step procedure. Instead, it offers a high-level three-step method and a “tool-box” of cross-case displays, primarily matrices, for organizing the data by variable and/or by case. The process is most clearly presented in (Miles et al. 1994), while the toolbox is introduced in (Miles et al. 1984). Note also that the method was originally presented as a synthesis method within a multi-case study, whereas here we use it to synthesize across two single-case studies. The method has three major steps: 1) data reduction, 2) data display, and 3) conclusion drawing and verification (see Table 2). As in any qualitative analysis method, the steps are iterated during the analysis until the final conclusion is reached.

In our worked example, the major part of the data reduction step had already been conducted in the analyses of the primary studies. The synthesis focused on reducing the material in both primary studies to the data that were stated to have an impact on trust in outsourcing relationships. We synthesized evidence from only two journal papers, which were quite condensed and homogeneous. We could therefore get an overview of the papers in their raw format and tag data directly on printouts of the papers. If more studies, or less homogeneous studies, are synthesized, the data have to be stored in a tool such as NVivo to allow easier navigation.

The data we derived from the papers were of two kinds: 1) characteristics of the case studies, and 2) factors of trust in outsourcing relationships. Most of these data were presented under easily found section headings in both primary studies and were hence straightforward to derive. All data about the characteristics available in the papers were collected, including:

  • the studies’ goals;
  • the companies and interviewees participating in the study;
  • the research methods used for data collection and analysis;
  • the theoretical frame of reference for the “trust” concept.

These data are tabulated in Table 6. The aspects of the case characteristics originate from those presented in the primary studies. It is worth noting that the maturity and size of the companies are measured using different characteristics in the two cases. The Oza et al. study uses CMMI level, employer experience, and size; the Babar et al. study uses employer experience and company age.

Table 6 Reported characteristics in the papers
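A meta-matrix such as Table 6 can be stored once and displayed in either orientation. One row per case loosely corresponds to the case-oriented view discussed in Section 2. One row per characteristic loosely corresponds to the variable-oriented view. The Python sketch below is a hypothetical helper. It is filled only with characteristics stated in the text of this paper (number and origin of interviewees, data collection method); the contents of Table 6 itself are not reproduced.

```python
# case -> characteristic -> value (hypothetical representation of a meta-matrix)
Matrix = dict[str, dict[str, str]]

cases: Matrix = {
    "Oza et al. 2006": {
        "interviewees": "18 practitioners, India",
        "data collection": "standardized open-ended interviews",
    },
    "Babar et al. 2007": {
        "interviewees": "12 practitioners, Vietnam",
        "data collection": "semi-structured interviews",
    },
}


def transpose(m: Matrix) -> Matrix:
    """Swap orientation: rows become columns and vice versa."""
    out: Matrix = {}
    for row_key, row in m.items():
        for col_key, value in row.items():
            out.setdefault(col_key, {})[row_key] = value
    return out


def render(m: Matrix) -> str:
    """Render a matrix as plain text; a missing cell is shown as '-'."""
    cols = list(dict.fromkeys(c for row in m.values() for c in row))
    lines = [" | ".join([""] + cols)]
    for row_key, row in m.items():
        lines.append(" | ".join([row_key] + [row.get(c, "-") for c in cols]))
    return "\n".join(lines)


print(render(cases))             # one row per case
print(render(transpose(cases)))  # one row per characteristic
```

The '-' placeholder matters in practice. As noted above, the two studies report maturity and size with different characteristics, so some cells in a characteristics matrix stay empty for one case.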

The trust factors identified in the two studies were given identification tags for Initial and Maintaining factors, respectively (e.g. MB4 is the 4th Maintaining factor listed by Babar et al.). Some factors included rather specific sub-characteristics (e.g. IB2 Creditability had one sub-component, IB2.1 References). As an act of data display, the factors were presented in an unordered matrix (which was later ordered, see Table 7), with one column for each study and without any consideration of the semantics of the terms. In the current synthesis, this process was fairly straightforward. The Oza et al. study listed the factors in an appendix presenting their codebook. The Babar et al. study presented them in two tables (which we reproduced in Tables 3 and 4). One additional factor was mentioned in their text but not handled as a factor in the tables. We considered it important and added it to our matrix (factor IO7 Representativeness).

Table 7 Related factors in the two studies, tagged XYn, where X = {I,M} for Initial and Maintaining factors, Y = {O,B} for the Oza and Babar studies, and n is the order number of the factor in the primary study
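The XYn tagging scheme is regular enough to be checked mechanically. The Python parser below is a hypothetical helper, not something the authors report using. It recovers the phase, study, order number, and optional sub-characteristic from a tag. The ".m" sub-characteristic suffix is not part of the Table 7 caption; it is assumed from the examples in the text (IB2.1, IO3.1, MO1.1).

```python
import re
from typing import NamedTuple, Optional

# XYn with an optional ".m" sub-characteristic suffix (assumed from IB2.1 etc.).
TAG = re.compile(r"^([IM])([OB])(\d+)(?:\.(\d+))?$")

PHASE = {"I": "Initial", "M": "Maintaining"}
STUDY = {"O": "Oza et al.", "B": "Babar et al."}


class FactorTag(NamedTuple):
    phase: str
    study: str
    number: int
    sub: Optional[int]  # sub-characteristic, e.g. the 1 in IB2.1


def parse_tag(tag: str) -> FactorTag:
    """Split a factor tag into its parts (hypothetical helper)."""
    m = TAG.match(tag)
    if m is None:
        raise ValueError(f"not an XYn tag: {tag!r}")
    phase, study, number, sub = m.groups()
    return FactorTag(PHASE[phase], STUDY[study], int(number),
                     int(sub) if sub else None)


print(parse_tag("MB4"))
# → FactorTag(phase='Maintaining', study='Babar et al.', number=4, sub=None)
print(parse_tag("IB2.1"))
# → FactorTag(phase='Initial', study='Babar et al.', number=2, sub=1)
```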

The next step in the analysis of trust factors was a new act of data reduction to analyze the semantics of the identified factors. (Remember that the three analysis steps of cross-case analysis are not sequential but iterative.) We identified synonyms and hyponyms based on the definitions of the factors in each case and their presentation in context. We then rearranged the factors table according to the semantic meaning of the terms in the two studies.

  • Same meaning: three of the identified factors had the same meaning in both studies (IO4/IB5 personal visits, IO6/IB6 investments, and MO4/MB10 performance).
  • Synonyms: two pairs of factors used different terms for approximately the same underlying concept (IO5 people background = IB3 capabilities; MO6 commitment = MB8 managing expectations).
  • Hyponyms: in five cases, one factor in one study had two, three, or four hyponym factors in the other study. For example, IO1 references, IO2 experience, and IO3 reputation in the Oza et al. study together correspond to the term IB2 creditability in the Babar et al. study, and are hence hyponyms of creditability.
  • No correspondent: eight factors appear in one study only and have no correspondent in the other (IO7, IB1, IB4, MB2, MB3, MB4, MB9, and MO9).
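Once the correspondences are recorded, the factors without a counterpart in the other study can be derived rather than listed by hand. The Python sketch below assumes a representation that is not the authors': a dict from each Babar et al. factor to its Oza et al. counterparts. `unmatched` and `relation` are hypothetical helpers. Only the correspondences named in this section are included, so the inventories are partial. The MB1 hyponyms are taken from the following paragraph, and three of the five one-to-many cases are not named in the text.

```python
# Correspondences named in the text (a partial view of Table 7):
# Babar et al. factor -> Oza et al. factor(s) with the same,
# synonymous, or hyponym meaning.
CORRESPONDENCES: dict[str, list[str]] = {
    "IB5": ["IO4"],                       # personal visits (same meaning)
    "IB6": ["IO6"],                       # investments (same meaning)
    "MB10": ["MO4"],                      # performance (same meaning)
    "IB3": ["IO5"],                       # capabilities = people background
    "MB8": ["MO6"],                       # managing expectations = commitment
    "IB2": ["IO1", "IO2", "IO3"],         # creditability and its hyponyms
    "MB1": ["MO5", "MO3", "MO1", "MO8"],  # communication and its hyponyms
}


def unmatched(babar: set[str], oza: set[str],
              correspondences: dict[str, list[str]]) -> set[str]:
    """Factors in either study with no counterpart in the other."""
    matched_babar = set(correspondences)
    matched_oza = {o for group in correspondences.values() for o in group}
    return (babar - matched_babar) | (oza - matched_oza)


def relation(oza_terms: list[str]) -> str:
    """Label a matrix row as one-to-one or one-to-many (hyponyms)."""
    return "one-to-one" if len(oza_terms) == 1 else f"1:{len(oza_terms)} hyponyms"


# Factor inventories limited to the factors named in this section.
babar = set(CORRESPONDENCES) | {"IB1", "IB4", "MB2", "MB3", "MB4", "MB9"}
oza = {o for g in CORRESPONDENCES.values() for o in g} | {"IO7", "MO9"}

for b, oza_terms in CORRESPONDENCES.items():
    print(b, "<->", ", ".join(oza_terms), f"({relation(oza_terms)})")
print("one study only:", sorted(unmatched(babar, oza, CORRESPONDENCES)))
# → one study only: ['IB1', 'IB4', 'IO7', 'MB2', 'MB3', 'MB4', 'MB9', 'MO9']
```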

Both studies used quasi-statistics in their analysis, which is a questionable practice if interpreted wrongly (Runeson et al. 2012). However, quasi-statistics may be used to bring forward the most frequently identified factors. We therefore site-ordered the meta-matrices by the sum of the term frequencies in the two studies, resulting in Table 7. The ordered table shows the following. IB2 creditability (Babar) and its hyponyms IO1 references, IO2 experience, and IO3 reputation (Oza) are the most frequently mentioned factors for trust establishment. MB1 communication (Babar) and its hyponyms MO5 communication, MO3 honesty, MO1 transparency, and MO8 understanding (Oza) are the most frequently mentioned maintaining factors.
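The ordering step itself is a sort on summed frequencies. The Python sketch below is a hypothetical helper under stated assumptions: each matrix row is a label plus the factor tags from both studies that it groups, and the counts come from the frequencies reported in each primary study. None of those counts are reproduced here. Like the procedure described above, the sketch sums raw counts and makes no correction for the different numbers of interviewees (18 and 12). This is one reason to treat such quasi-statistics with care.

```python
def order_rows(rows: list[tuple[str, list[str]]],
               counts: dict[str, int]) -> list[tuple[str, list[str]]]:
    """Site-order meta-matrix rows by summed term frequency, highest first.

    rows:   (row label, factor tags from both studies in that row)
    counts: factor tag -> number of mentions reported in its own study;
            tags without a reported count contribute 0
    Rows with equal totals keep their original order (sorted() is stable,
    also with reverse=True).
    """
    def total(row: tuple[str, list[str]]) -> int:
        return sum(counts.get(tag, 0) for tag in row[1])

    return sorted(rows, key=total, reverse=True)
```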

It is also clear from the matrix that IB1 cultural understanding and IB4 pilot project performance are identified as trust-establishing factors only in the Babar et al. study. Conversely, only the Oza et al. study mentions the importance of the representatives the company sends forward to represent it (IO7). Further, MB2 cultural understanding, MB3 capabilities, and MB4 contract conformance are identified as trust-maintaining factors only in the Babar et al. study. MO9 confidentiality is identified as a maintaining factor only in the Oza et al. study.

The synthesis activity of conclusion drawing was to identify the relations stated between the factors in each of the primary studies and express them in a graph, see Fig. 6. These relations were expressed qualitatively in the analysis text. For example, Oza et al. stated that: “vendors also consider their market reputation a critical factor to gain trust initially. This reputation was also part of the good references and long experience of software outsourcing. Reputation building was also reported to be based on the CMM and other quality certifications…” This statement was captured in the top-left corner of the graph in Fig. 6, where factors IO1 References, IO2 Experience, and the IO3.1 CMM level contribute to factor IO3 reputation, which is a subset of the factor IB2 creditability.

Fig. 6 Graph from cross-case analysis. The dashed lines are relations from Oza et al., while solid lines come from Babar et al.

For the maintenance of trust, Oza et al. continue: “the majority of vendors identified transparency as a critical factor to maintain trust with the clients. Vendors identified transparency in undertaking the process, communicating with the client and showing outcomes of the project. One vendor commented: We have a project office tool. With the use of this tool clients can view each employee’s timesheet information on a daily basis and the work status. We are always happy to make it open to the customer, if he wants, he can get it. When you open the whole system process to somebody it gives lots of confidence and they trust you.” This was interpreted as follows: having a project office tool (MO1.1) contributes to MO1 transparency, which in turn contributes to maintaining trust in the outsourcing relationship.

These relations were listed for each study and then displayed in graphs; see the example in Fig. 6. For readability, only factors with more than one relation are drawn, i.e. factors stated to have only a single relation to trust are omitted from the graph. Relations originating from the Oza et al. study are drawn with dashed lines, and relations from the Babar et al. study with solid lines.
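One reading of this drawing rule can be sketched in Python. Edges are (source, target, study) triples; the Oza et al. relations are taken from the quotes above, while factor_P, factor_Q and factor_S are hypothetical. The sketch assumes that "only a single relation to trust" means a factor whose sole relation is one edge to trust, so chains such as IO1 references → IO3 reputation stay in the graph even though IO1 has only one relation.

```python
from collections import defaultdict

TRUST = "trust"
# Line style per source study, following the convention of Fig. 6.
LINE_STYLE = {"Oza": "dashed", "Babar": "solid"}


def drawable_edges(edges):
    """Return (source, target, line_style) for the edges worth drawing.

    A factor is hidden when its only stated relation is one edge to trust;
    every edge between non-hidden factors (including trust) is kept.
    """
    neighbours = defaultdict(list)
    for source, target, _study in edges:
        neighbours[source].append(target)
        neighbours[target].append(source)
    hidden = {factor for factor, adj in neighbours.items()
              if factor != TRUST and adj == [TRUST]}
    return [(source, target, LINE_STYLE[study])
            for source, target, study in edges
            if source not in hidden and target not in hidden]


edges = [
    # Relations stated by Oza et al. in the quotes above.
    ("IO1 references", "IO3 reputation", "Oza"),
    ("IO2 experience", "IO3 reputation", "Oza"),
    ("IO3.1 CMM level", "IO3 reputation", "Oza"),
    ("IO3 reputation", TRUST, "Oza"),
    ("MO1.1 project office tool", "MO1 transparency", "Oza"),
    ("MO1 transparency", TRUST, "Oza"),
    # Hypothetical Babar-side factors, for illustration only.
    ("factor_P", "factor_Q", "Babar"),
    ("factor_P", TRUST, "Babar"),
    ("factor_Q", TRUST, "Babar"),
    ("factor_S", TRUST, "Babar"),  # single direct relation: hidden
]

for source, target, style in drawable_edges(edges):
    print(f"{source} -> {target} [{style}]")
```

The resulting triples can be handed to any graph-drawing tool; keeping the study as an edge attribute is what preserves the dashed/solid distinction.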

The conclusion drawing and verification step involved refining the above steps. We wrote condensed summaries of each paper’s view, for example on the factors for maintaining trust, as follows:

  • Oza et al. present different aspects of transparency as critical success factors. Trust grows when you demonstrate that you have nothing to hide. This is well in line with their definition of trust as “willingness to be vulnerable”. Examples of transparency are 1) having a project office tool, where the client can monitor all project data, 2) backing up statements and promises with real actions, 3) being honest and not hiding anything from the client, and 4) using processes as a framework against which to relate progress.

  • Babar et al. focus on communication and cultural understanding in their analysis. Communication, through both formal and informal relationships, is the basis for building and maintaining trust. Communication also leads to cultural understanding through mutual exchange visits, which in turn improves trust. Hence, communication is assumed to have one direct and one indirect impact on trust.

The cross-case analysis does not reveal any contradictions between the two studies with respect to trust factors; that is, neither study states the opposite of what the other states. The frequency ranking is also largely the same in the two studies. However, the studies put different emphasis on the few key factors they discuss qualitatively.

3.3 Narrative Synthesis

Apart from understanding the perspectives of Vietnamese vendors (Babar et al. study) on trust between clients and vendors in the context of offshore software development outsourcing, another objective of Babar et al.’s research was to compare the Vietnamese practitioners’ views with those of their Indian counterparts (Babar et al. 2007). In this section, we present the comparative analysis, made by the authors of the second paper (Babar et al.), of the factors that Vietnamese and Indian practitioners identified as important for establishing and maintaining trust relationships. The data on the Indian practitioners’ views were taken from the study reported by Oza et al. (2005), and the data and narrative synthesis for the Vietnamese vendors are fully described in the paper by Babar et al. (2007). Babar et al. structured their synthesis in two main sections, establishing trust and maintaining trust, and built the narrative around a comparison of the results from the two settings.

For establishing trust, Babar et al. synthesize the findings as follows:

“For factors perceived as important for establishing initial trust between clients and vendors, Indian and Vietnamese practitioners seem to agree on only two factors: ‘Client visits’ and ‘Investment’. A semantic analysis of the factors identified by both groups revealed that Indian practitioners mentioned ‘Customer references’, ‘Experience in outsourcing’, and ‘Reputation’ as important factors, which can be semantically considered quite close to ‘creditability’, a factor mentioned by the Vietnamese practitioners. Vietnamese practitioners also agree on the importance of some other factors (such as references, experience, reputation, and creditability) in gaining a client’s trust initially.”

Babar et al. also offer their interpretations of the main factors perceived as similar in the two settings, for instance in the following part of the narrative synthesis:

“Vietnamese companies are relatively new in the software outsourcing business and have relatively few significant customers, whose references they can use to gain the trust of new customers. However, Indian companies, being veterans of the software outsourcing business, do not face this situation, as they are able to use the references of large multi-national companies, who have outsourced their software development to Indian companies. One of the interviewees of study conducted by Oza et al. emphasized the role of customer references in gaining initial trust in these words: ‘References help us to a great extent, if I pull out a long customer list then people will say see they are working with scores of such companies, they can work for us’”.

An interesting part of the narrative synthesis was that they also discussed the main differences in the findings for initial trust:

“One significant difference between Vietnamese (Babar et al. study) and Indian practitioners’ views (Oza et al. study) is the role of ‘cultural understanding’ in establishing a trust relationship. As we discussed previously that Vietnamese vendors consider the understanding of a client’s culture as a critical factor in gaining initial trust. However, it appears that their Indian counterparts do not perceive the cultural understanding being of any importance, as they do not mention anything about the cultural understanding at all.”

The narrative synthesis approach was particularly useful in describing the differences and in making explicit the diversity in the study context. For example, Babar et al., when explaining the differences between the two contexts in terms of the factor “cultural understanding” wrote:

“One explanation for Indian practitioners not mentioning cultural understanding can be their familiarity with the culture and language of their major clients, usually Americans. Understanding the written and spoken language (English), coupled with a strong linkage to Western countries through expatriates has been widely cited for Indian companies’ success in attracting outsourced contracts (Carmel, 2003a). On the other hand, Vietnam has its own language and business ethos, which necessitates learning the languages and gaining a cultural understanding of their clients. That is why Vietnamese practitioners view ‘cultural understanding’ as one of the most important factors in gaining a client’s trust for establishing long-term relationships. This is particularly true for their Japanese clients, who because of the strong uncertainty avoidance nature of their culture (Hofstede, 1980) prefer long term business relationships to gaining short term benefits.”

The same structure of the narrative synthesis, similarities and differences, was used for both establishing and maintaining trust. Babar et al. summarize the main similarities in the findings on maintaining trust as:

“Both groups agree that processes, communication and performance are important factors for maintaining trust between clients and vendors. We also observe that both Indian and Vietnamese practitioners also seem to agree on some other factors as important in maintaining trust, though the description of these factors is lexically different. For example, Indian practitioners mentioned ‘Honesty’, ‘Commitment’, ‘Confidentiality’, ‘Cooperation’, and ‘Understanding’, while the Vietnamese used constructs like ‘Contract conformance’, ‘Managing expectations’, and ‘Personal relationship’ to describe similar factors. For instance, Oza et al. quoted one of the Indian practitioners emphasizing the importance of ‘Honesty’ in maintaining trust in ongoing relationships in following words:

‘You have to be upfront and honest with your client. You should not hide anything from him, whether it is good or bad, whether it is going to earn you a flack for that moment. This is very important for the long lasting relationship and to achieve trust.’

Compared with the above-mentioned views of the Indian practitioner on the role of Honesty, Vietnamese practitioners described the requirement of being Honest and open to clients to gain and maintain their trust in terms of contract conformance and managing expectations. One Vietnamese interviewee described the importance of being honest and upfront in the following words:

‘It is very important to demonstrate to your clients that there are certain measures in place to make all project team members aware of the criticality of the conformance with the contractual obligations; it might be conformance with the non-disclosure agreement (NDA) or keeping commitment to the deadlines and budgetary limits. Be upfront in explaining what can or cannot be done within the given time and budget. Change requests must be monitored carefully and customers should be taken into confidence if something goes against the plan, for example, key members of a project team leaving the company’”.

The main difference perceived in the findings on the factors for maintaining trust in outsourcing relationships is, again, the importance of cultural understanding. Babar et al. conclude that:

“The views of Vietnamese and Indian practitioners again differ on the importance of ‘Cultural understanding’. The Vietnamese considered this factor also vital to maintaining trust in ongoing relationships, while their Indian counterparts did not mention this factor again. Thus, a major difference between the Indian and the Vietnamese practitioners’ views is that the former seem to consider factors related to business process more important, while the latter not only realize the importance of business process related factors but also recognize the vital role of cultural understanding and personal relationships in maintaining a trust relationship. These two factors are considered very important for successful business partnership by Asian clients in general and by the Japanese in particular, who are major software outsourcing clients of Vietnamese vendors.”

To summarize, Babar et al. conclude the synthesis by saying:

“Another significant point revealed by this comparative analysis is that the Indian practitioners identified entirely different factors that they considered important for establishing trust and maintaining trust, while there are a few factors that the Vietnamese practitioners considered important for both the establishment and maintenance of trust in software outsourcing relationship. For example, the Vietnamese practitioners described cultural understanding and capabilities as important factors for both establishing and maintaining trust.”

4 Discussion

In this section we discuss and provide answers to the research questions of this study. Our aim is to demonstrate the similarities and differences of results and conclusions when applying different methods of synthesis and to discuss the challenges of synthesizing evidence from reported case studies in SE. Our main research questions were:

  i) What are the differences in the results when using narrative, cross-case or thematic synthesis of case study evidence?

  ii) What are the main challenges of performing case study synthesis?

4.1 Comparison of Results from Methods of Synthesis

For the purpose of this paper, three of the most relevant methods are compared: thematic synthesis, cross-case analysis, and narrative synthesis. The comparison is based on the worked example shown in the previous section. Before analyzing the comparison of the syntheses, we note that the thematic synthesis and the cross-case analysis were performed by researchers who were not involved in either of the two primary studies. The narrative synthesis, however, was performed by the three authors of the second study (Babar et al. study); they therefore also had access to the raw data of that study, which may have allowed them to go deeper in their synthesis. Whether the findings would differ with independent researchers is itself a researchable question. The example clearly shows that the narrative analysis brought in facts that were not in the original studies (cultural origin, maturity of insourcing in the country), which the more structured methods tried to avoid. At the same time, while synthesizing the results from the two papers, both teams, Norwegian (thematic synthesis) and Swedish (cross-case analysis), found that the quotes included in the papers were not sufficient for us to be fully confident that we were synthesizing the papers at the right level of abstraction and granularity.

In our example, the primary studies had the same goals and methodological framework. The main variations were the target culture (India vs. Vietnam) and the research groups. There was also a temporal variation, in that Babar et al.’s study was run based on Oza et al.’s previous paper and results; hence there is a threat that Babar et al.’s results may have been influenced by Oza et al.’s results. But they also added two important variations: definitions and target cultures. The terminology and definitions are partly different; e.g., the factor ‘performance’ was defined by Babar et al. as: “How performance (productivity/effectiveness) of staff in carrying out the projects help to maintain trusts with clients”, while in Oza et al. the same term was defined as: “You have to perform the work to gain the trust, it is based on performance”. Another example is ‘communication’, which in both papers is defined as “How effectiveness of communication with clients [maybe in clients’ native language] help to maintain the trusts”, while in Oza et al. we found three additional terms (transparency, honesty, and understanding) that together represent what Babar et al. referred to as communication.

However, in both cases important terms were well defined, which helped with understanding the differences between them. The Norwegian and Swedish teams were conscious about the definitions. In the cross-case analysis, the results table also includes pairing of the definitions across the two studies. In the thematic synthesis, the definitions were kept in the thematic network so the researchers could always see and compare the different definitions. In the narrative synthesis, the authors were more conservative when aggregating or redefining concepts. Consequently, the narrative analysis concludes that the two studies “identified entirely different factors”, while the other two analyses, when analyzing the more detailed meaning of each term, found fewer differences. Table 8 summarizes the main factors found when using each method of synthesis.

Table 8 Comparison of results

The thematic synthesis method produced a graph (Fig. 5) showing the relations between the concepts identified, with legends showing which ‘trust’ factors originate from one study, the other, or both. The cross-case analysis method produced tables, comparing the characteristics of the two cases, and comparing the ‘trust’ factors originating from the two studies. However, the toolbox of the cross-case analysis method did not enforce aggregation of factors into higher-level factors, as did the thematic synthesis, thus resulting in a longer list of more specific factors. The narrative synthesis produced a text explaining the commonalities and differences in the results from the papers.

Reassuringly, the conclusions on the synthesis of the two papers on factors of trust in outsourcing relationships were largely similar across the thematic and textual narrative syntheses. It is not clear from our study whether the variations are due to the different methods or to the two sets of analysts. The conclusions were dominated by the similarities of the results from each paper. The narrative synthesis focused on the differences between the findings in the studies. The thematic synthesis created a new category of factors, “initial and maintain”, where these differences were placed. One important distinction in the conclusions from the narrative and thematic syntheses was the interpretation of the factor “Understanding” from the Oza et al. study, which is defined as “Understanding between clients and vendors in transacting with each other”. The researchers performing the thematic synthesis understood this definition to include the definition of “Cultural Understanding” from Babar et al.: “How knowledge of the norms, beliefs, business ethos, and skill in the native language of potential clients helps vendors achieve trusts”. Babar et al., however, did not consider these definitions similar. Based on their interpretation of the factor, the analysts performing the thematic synthesis nevertheless concluded that Cultural Understanding influences both initial and maintained trust.

The narrative synthesis approach was particularly useful in describing the differences and in making explicit the diversity in the study’s context; for this reason Babar et al. affirm that they have identified entirely different factors compared to Oza et al., but the authors do not discuss the issue of similar concepts being expressed with different terms. The thematic synthesis extrapolated less and said little about the contexts of the findings; this method proved rather poor at examining contradictions in the data and at highlighting gaps in the evidence. The cross-case analysis focused more on the semantic similarities and differences between the two studies. Its results are hence primarily a synthesized list of factors, expressed in a common language. Further, the cause-effect graphing provides an initial understanding of causal relations between factors. However, the lack of access to raw data prevented the cross-case analysis from going as deep as the narrative analysis did.

Regarding the transparency of the synthesis, the thematic synthesis and the cross-case analysis proved to meet expectations, but transparency remained a problem in the narrative synthesis. For example, the choice of examples and quotes in the narrative synthesis relies on the judgment of the researchers, so it is not clear whether they chose the quotes to, e.g., reinforce the results from their own study, or chose the quotes that best represented the factors. In the thematic synthesis all the information is traceable and the whole process can be repeated. The same holds for the cross-case analysis, where all the matrices and charts can be remade. All the products of the thematic and cross-case syntheses can be debated and revisited, but that is not the case with the narrative synthesis.

The methods proved complementary in some respects. For example, the context tables in the cross-case analysis compensate for the lack of explicit focus on the context of the studies in the thematic synthesis. The thematic synthesis process also led the analysts to extrapolate from the evidence found in each paper and to draw conclusions based on the papers. This step was not foreseen by the cross-case analysis, but it was a natural step in the narrative synthesis.

4.2 Challenges of Synthesizing Case Studies

One important point to highlight is that, whatever the method of synthesis, the experience of the analysts will strongly influence the final conclusions of the synthesis, and this should be accounted for when comparing methods of synthesis. In addition, by performing the worked example, we observed other factors that can affect the use of a method of synthesis, including: goals and research questions, type and number of the case studies selected, variations in context, limited access to raw data, and quality of the case studies. Most probably, no single method will offer all the features required for a synthesis, so a combination of methods may often be the best approach. In the following subsections we describe the five most important challenges we identified when synthesizing case studies.

4.2.1 Goals and Research Questions

Several methods are broadly applicable to a variety of questions. It is therefore necessary to select a synthesis method that fits the underlying study aim and question. Typically, a synthesis focuses on a well-defined question and aims to provide an answer by synthesizing the findings from a relatively narrow range of quality-assessed studies. A fundamental distinction regarding the objective of such syntheses is whether they attempt to provide knowledge support or decision support (Ashrafian et al. 2011; Pope et al. 2007). A synthesis directed to knowledge support will typically bring together and synthesize evidence on a particular topic, while a synthesis aimed at decision support will be more specific and include analytical tasks to help make a decision within a particular context (Mays et al. 2005).

Although these are the two ultimate goals, the synthesis goal may range from pure factual knowledge to support for judgment and decision (Ashrafian et al. 2011), e.g. impacts of objects of study, comparison of objects of study, feasibility of objects of study, impacts of context on the object of study, etc. An object of study can be a technique, a method, an approach, or a phenomenon.

Knowledge of facts, such as whether a specific object of study is important or not, can suitably be obtained by a thematic synthesis, which can produce broad conclusions and is flexible to the buildup of knowledge (Ashrafian et al. 2011). Contextualizing an object by comparing different usage contexts can, for example, be performed with a cross-case analysis, as seen in our worked example. The impacts of an object of study on software development, as well as its feasibility, can be synthesized by thematic synthesis and cross-case analysis. More specific techniques and a more interpretative approach would be needed to provide guidelines for decision support; thematic synthesis can be extended for that purpose. In our worked example, it was important to keep the research question in mind in all steps of the synthesis, so that we would not start exploring other aspects that appeared during the synthesis, for example how the CMMI context influenced the interview answers, which would probably lead to another synthesis process.

4.2.2 Number of Case Studies

Some synthesis methods require more studies than others to be effectively applied, e.g. case surveys are tailored to synthesize many studies. However, it is not possible to say for sure how many studies are needed to answer a specific research question. For qualitative studies, the notion of ‘saturation’ must be taken into account, i.e., judging whether new studies add more knowledge on the research question. The number of studies needed depends on how broad the research question is and how many independent variables and factors affect the results of the object of study. For example, a narrative synthesis cannot be meaningfully performed with a large number of cases, as the data volume would be exceedingly large.
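Judging saturation remains a qualitative call, but a rough indicator can be sketched in Python, assuming each study has already been coded: count how many codes each successive study adds that no earlier study contributed. The study names, codes, and the `window` heuristic are hypothetical; a run of zero new codes suggests, but does not prove, that further studies would add little on the research question.

```python
def new_codes_per_study(coded_studies):
    """For studies in synthesis order, count codes not seen in any earlier study."""
    seen = set()
    added = []
    for study, codes in coded_studies:
        fresh = set(codes) - seen
        added.append((study, len(fresh)))
        seen |= fresh
    return added


def looks_saturated(added, window=1):
    """Heuristic: saturated if the last `window` studies added no new codes."""
    if window < 1:
        raise ValueError("window must be at least 1")
    return len(added) >= window and all(count == 0 for _, count in added[-window:])


# Hypothetical studies and codes, for illustration only.
studies = [
    ("S1", {"a", "b", "c"}),
    ("S2", {"b", "d"}),
    ("S3", {"a", "d"}),
]
added = new_codes_per_study(studies)
print(added)                             # [('S1', 3), ('S2', 1), ('S3', 0)]
print(looks_saturated(added))            # True
print(looks_saturated(added, window=2))  # False
```

A larger `window` makes the check more conservative, since several consecutive studies must contribute nothing new before saturation is suggested.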

If the synthesis comprises many studies, it will probably be more quantitative than qualitative. This is because whenever one attempts to incorporate a large number of cases into a single synthesis, it becomes necessary to reduce the evidence to a smaller number of dimensions (Gerring 2007). Thematic synthesis is a method suitable for this scenario. In our worked example, there were several ways to add another paper; in the coding process, for example, it would only be necessary to add references from the new paper to the existing codes, or to add new codes arising from the new study. It would be more complicated, however, if the new papers fostered new themes or required a reorganization of the established themes. Cross-case analysis can help handle the contexts of the cases, but it requires more effort to organize the evidence of the papers in tables; the larger the number of papers, the more complex the matrices and tables become.

No matter the method of synthesis, there is always a trade-off between the ability to generalize and the ability to understand fully all the nuances of individual cases. The use of different methods may result in different conclusions. This is a general issue for studies based on qualitative data and is an effect of the richness and lack of precision of such data. One measure to increase the validity of the analysis is to maintain a clear chain of evidence from the primary studies to the synthesized evidence; in our example, we used tool support to manage the traceability of the references to the themes and models. Without this type of support, it is almost impossible to maintain the rigor and transparency of the process. Another way to increase the validity of the synthesized knowledge could be to involve the authors of the primary studies in reviewing the synthesis.
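The chain of evidence can be kept explicit with a very simple structure. The following Python sketch is a hypothetical, minimal store (not the tool used in the worked example): each coded segment points to its source study, codes are grouped into themes, and any theme can be traced back to the segments that support it. It also shows the two ways of adding a new paper discussed above: attaching its segments to existing codes, or creating new codes that may later call for new themes. Segment identifiers and "Study3" are illustrative assumptions.

```python
from collections import defaultdict


class EvidenceChain:
    """Hypothetical minimal store linking coded text segments to codes and themes."""

    def __init__(self):
        self.code_segments = defaultdict(list)  # code -> [(study, segment_id)]
        self.theme_codes = defaultdict(set)     # theme -> {code}

    def code_segment(self, study, segment_id, code):
        """Attach a segment to a code; an unseen code is created on first use."""
        self.code_segments[code].append((study, segment_id))

    def assign(self, code, theme):
        """Place a code under a theme."""
        self.theme_codes[theme].add(code)

    def trace(self, theme):
        """All (study, segment_id) pairs supporting a theme, in sorted order."""
        return sorted({segment
                       for code in self.theme_codes.get(theme, set())
                       for segment in self.code_segments.get(code, [])})

    def codes_only_from(self, study):
        """Codes whose evidence comes solely from one study (candidate new themes)."""
        return sorted(code for code, segments in self.code_segments.items()
                      if {s for s, _ in segments} == {study})


chain = EvidenceChain()
# Two primary studies (segment ids are hypothetical).
chain.code_segment("Oza", "seg-01", "transparency")
chain.code_segment("Babar", "seg-07", "communication")
chain.assign("transparency", "maintaining trust")
chain.assign("communication", "maintaining trust")
# A hypothetical third study: one segment fits an existing code, one needs a new code.
chain.code_segment("Study3", "seg-02", "communication")
chain.code_segment("Study3", "seg-05", "code_x")

print(chain.trace("maintaining trust"))
print(chain.codes_only_from("Study3"))
```

Because every theme resolves back to concrete segments, a reviewer (or an author of a primary study) can check each synthesized claim against the passages it rests on.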

4.2.3 Variation in the Context of the Primary Studies

Context is a central concept in empirical software engineering. It is one of the distinctive features of the discipline and an indispensable part of software practice. It is likely responsible for one of the most challenging methodological and theoretical problems: study-to-study variation in research findings (Dybå et al. 2012, Dybå 2013). The settings in which practice takes place are rarely, if ever, the same. For example, one software organization will have a different environment, or be influenced by different environmental factors, than another software organization. Thus, Dybå et al. (2012) discuss the importance of drawing attention to the who, what, when, where and why of a study.

Gerring identifies two possible styles of co-variational evidence in a case study synthesis: temporal and spatial (Gerring 2007). Spatial variation refers to case studies that were run by different research groups/authors but with similar objectives and instruments of data collection (an example of the ‘Who’ and ‘Where’ dimensions). Moreover, when different groups perform the studies, they may use different measurement procedures, definitions, etc. Temporal variation refers to development over time: if a research group runs a series of case studies successively, the synthesis must consider context variations over time in the studies that may explain the change. Clearly, cases must be similar to each other in whatever respects might affect the causal relationship that the researcher is investigating, or such differences must be controlled for (Gerring 2007). Uncontrolled heterogeneity means that cases are “apples and oranges”, and that one cannot learn anything about underlying causal processes by comparing their histories.

Under circumstances of extreme case-heterogeneity, the researcher may decide that it is better to focus on a single case or a small number of relatively homogeneous cases (Gerring 2007). Cross-case evidence drawn from a handful of most-similar cases may be more useful than cross-case evidence from many studies, even though the ultimate interest of the investigation is in a broader population of cases. The issue of population heterogeneity/homogeneity may therefore be understood as a trade-off between the number of cases and the number of variables.

The two case studies investigated in this paper were run by different research groups but were seen as similar enough to make the analysis possible. The Babar et al. study was run after the Oza et al. study was published, so some of the definitions of the factors in the Babar et al. study may have been influenced by the Oza et al. study. Another important context variable in this case is the country in which the studies were performed; the Indian software companies were very much influenced by the CMMI model, while the Vietnamese were starting the process of getting CMMI level 2 or level 3 certification. Babar et al. stated that: ‘Vietnamese practitioners believe that being a relatively new player in the software outsourcing arena, they need to quickly build a reputation for being able to develop quality software by following rigorous and systematic processes. Most respondents reported that they have learned from Indian companies that certification is an important mechanism for building creditability and assists in convincing clients to trust in their capabilities’. These differences were brought out better by the narrative synthesis, because Babar et al. were immersed in the context of their own study, in contrast to the authors of this paper. Details like this may be overlooked when the primary studies do not describe the important factors moderating or influencing their results. In such cases no method of synthesis can guarantee the validity of the conclusions.

Terminology and definitions may also differ between studies. In some cases they are well defined, which helps but does not solve the problem. Well-established SE terminology may help address this challenge, but the area is not yet mature in this respect. Therefore, the authors of primary studies should be clearer about the definitions of the concepts used in their papers. The challenge here is that the underlying factors of interest have different meanings in different contexts (conceptual stretching), or that the causal relationships differ between contexts. In the example cited in the previous paragraph, Babar et al. found that Creditability was an important factor of trust, defined as “How references, certifications, previous experiences help to gain trust from clients”, but the “certifications” part of the definition is based on what they had heard about the experience of the Indians with CMM and CMMI certifications. Further, as mentioned before, Communication had a wider meaning in the Babar et al. study than in the Oza et al. study.

In the Indian context (Oza et al. study), “Reputation” was defined as a factor of trust meaning: “Vendors’ opinion about how certifications from international companies, successful project histories and other previous achievements lead to a good reputation of the company and in turn if becomes useful in achieving trust from the prospective client”, but the answers received from the Indian companies were based on their own experience with CMM and CMMI certifications. Among the Indian respondents, only 6 out of 18 mentioned this as an important factor for gaining initial trust, whereas 11 out of 12 Vietnamese respondents mentioned creditability as an important factor for gaining initial trust from clients. Clearly, the perceived importance of the concept for gaining trust varies between the two contexts, and this should be taken into account in the synthesis of the studies.
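Putting the two counts on the same scale makes the gap explicit. The Python sketch below only restates the figures given above (6 of 18 Indian respondents, 11 of 12 Vietnamese respondents) as proportions; because ‘Reputation’ and ‘Creditability’ are differently defined factors, the comparison is indicative rather than a like-for-like measurement.

```python
from fractions import Fraction


def mention_rate(mentioned: int, respondents: int) -> Fraction:
    """Share of respondents who mentioned a factor, kept as an exact fraction."""
    if respondents <= 0 or not 0 <= mentioned <= respondents:
        raise ValueError("need 0 <= mentioned <= respondents and respondents > 0")
    return Fraction(mentioned, respondents)


india = mention_rate(6, 18)     # 'Reputation', Oza et al.
vietnam = mention_rate(11, 12)  # 'Creditability', Babar et al.
print(f"India: {float(india):.0%}, Vietnam: {float(vietnam):.0%}")
# India: 33%, Vietnam: 92%
```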

4.2.4 Limited Access to Raw Data

Synthesis of evidence published in journals and conference proceedings involves a key challenge: limited access to raw data. Authors judge differently which level of information is important to report in their papers. Some authors, for example, may describe the context in great detail, while others may give only general information on the context of their studies. In shorter articles, certain aspects may not be covered due to space limitations, which could affect the analysis. Getting access to, or working with, other researchers’ (often undisclosed) data is not easy. Therefore, relying on the final reports of the primary studies may limit how far the synthesis can go into the details of the evidence.

For example, in the Babar et al. study, the authors wrote the following about the evidence on cultural understanding as a factor of trust: “Respondents described cultural understanding as knowledge of the norms, beliefs, business ethos, and skill in the native language of the potential client. Vietnamese practitioners believe that familiarity with the culture of a client’s country and ability to communicate in the native language of that country can help vendors get prospective clients to feel comfortable in starting business initiatives. One of interviewees elaborated on the importance of cultural understanding: ‘Ability to communicate in a client’s native language and familiarity with his/her culture can provide the biggest advantages or barriers to achieving initial trust’”.

Here, Babar et al. used one respondent’s answer to justify their conclusion that Vietnamese practitioners believe that familiarity with the culture of a client’s country and the ability to communicate in its native language can help vendors make prospective clients feel comfortable in starting business initiatives. It appears that they reached this conclusion after reading all the respondents’ answers and chose this quote because it expressed the conclusion most explicitly. In such cases, we, as analysts performing a synthesis, have to trust that their judgment was systematic and impartial and that the quote really represents the overall view of the evidence.

In the worked example presented here, it was clear that access to the raw data helped Babar et al. gain deeper insights in their narrative synthesis, since they had access to all the data from their own study. When the analysts were not involved in the primary studies included in the synthesis, however, there is no easy fix to this problem. The analysts should acknowledge this limitation and ensure that the papers included in the review are of sufficient quality for the evidence to be credible and complete.

4.2.5 Quality Assessment

As the number of primary studies in a synthesis increases, the variation in the quality of the studies also increases. If studies that fail to meet a certain number of quality criteria are excluded from the synthesis, little needs to be reported about the quality of individual studies. If such studies are included, however, the synthesis must comment on the quality of the individual studies as well as on the overall strength of evidence when synthesizing the findings (Dybå and Dingsøyr 2008). None of the methods discussed in this paper addresses these questions, so it is largely left to the researcher’s experience to take the quality of the studies into account when drawing the conclusions of the synthesis.

Evaluating the quality of papers is not a straightforward task, although several quality checklists have been proposed for evaluating empirical studies in SE (Dybå and Dingsøyr 2008). Even with these checklists, however, Kitchenham et al. (2012) found that different reviewers assess the quality of papers differently. They also concluded that the reliability achieved by pairs of judges after a round of discussion is generally quite good. We therefore recommend that more than one researcher perform the quality assessment of papers.

In our worked example, the two studies were very similar and their differences in quality did not impact the synthesis. Hence, we did not perform an evaluation of the quality of the papers.

4.3 Limitations of the Study

The main limitation of the study is the number of primary studies. With only two papers, we cannot be sure how well our results will generalize. Moreover, both papers report very simple case studies, based only on interview data. In addition, the papers are quite similar in their goals and design; the Babar et al. study was even designed on the basis of the Oza et al. study. This makes the papers easier to synthesize. On the other hand, for the purpose of this paper, we needed cases that would allow us to demonstrate the use of the synthesis methods and how they would be performed on case studies. The papers were therefore purposely chosen to make the synthesis relatively easy. Even so, we were still able to identify very important challenges during the synthesis.

Another limitation is that we, as a group of researchers, have extensive experience in empirical software engineering, so our results may be more closely aligned than those a random selection of researchers would have obtained. At the same time, this experience allowed us to be critical of the synthesis process itself, rather than reflecting only on the results of the syntheses of the two papers or on the differences in background among the researchers who performed them.

A further limitation is that the narrative synthesis was not performed for the purpose of this paper; it was performed by the authors of the second primary study. All our reflections on this method of synthesis are therefore based on our own judgment of its advantages and disadvantages as those authors applied it. Although this may have introduced bias, we believe it also enriched our results, since we could draw on an independent synthesis of the two included papers.

5 Conclusions and Future Work

While methods for qualitative synthesis have many similarities, there are clear differences in approach between them. In this paper, we have shown some similarities, differences, and challenges in applying three methods of synthesis to a single example.

The final conclusions reached by the two teams using thematic synthesis and cross-case analysis did not agree in every respect; rather, they offered different views of the two papers. The factors identified as most important for trust in outsourcing relationships were sometimes complementary and sometimes grouped under different perspectives. Overall, however, the two teams reached similar conclusions. Additionally, Babar et al. included a narrative synthesis (which the teams in this study did not read until after completing their own syntheses) focusing on hypothesized differences between the Indian and Vietnamese contexts, which were not part of the original studies.

There are implications both for the conduct of synthesis in secondary studies and for our understanding of the differences between methods. With respect to undertaking synthesis, our experience suggests that the process depends not only on the method of synthesis but also on other factors that can affect the use of a specific method, including: goals and research questions, types and number of the case studies selected, variations in context, limited access to raw data, and quality of the case studies. We therefore recommend that analysts be aware of these challenges and try to account for them when performing the synthesis. We also recommend that analysts consider using more than one method of synthesis to improve the reliability of the results and conclusions. However, as with the choice of other research methods, personal preferences, educational background, and experience also play an important role in the choice of methods for case study synthesis.

Future work includes increasing the number of papers in the investigation. We hypothesize that additional challenges will emerge and that some of the challenges described in this paper will affect the final conclusions of a synthesis more than others. We will also explore other methods of synthesis that may be suitable for synthesizing case studies in SE.