1 Introduction

Organizations operate in a competitive environment. In today’s dynamic markets, the pressure to improve quality and productivity and to maintain competitive advantages has made adaptability to new business requirements critical to the survival of organizations. Information technology, as an important tool for organizations, has followed this trend. Therefore, the construction of interoperable services that can be flexibly organized to quickly meet business needs, as described in the service-oriented architecture (SOA), has become a promising alternative.

The process of service-oriented modeling and architectural design consists of three general steps, namely identification, specification, and realization of services, components, and workflows [5]. The identification step (the main subject of this study) aims to determine which services are appropriate to be implemented in a SOA. Erl [16] defines three possible strategies for service identification. The first is the top–down strategy, which advocates the completion of an inventory analysis (definition of enterprise business model, technology architecture, and service inventory blueprint) prior to the physical design and development of services. The second is the bottom–up strategy, which is tactically focused and makes the fulfillment of immediate business requirements the prime objective of the SOA project. The last strategy is meet in the middle, which is a combination of both. The top–down strategy is used to promote alignment with business goals or processes, whereas the bottom–up strategy is used to evaluate the existing assets, such as information systems, service repositories, databases, and legacy documentation.

Service identification is one of the most practical phases, and it is a real challenge when designing and implementing a SOA [8]. Besides predicting which services an enterprise will eventually need and defining which functions should be part of each service, service identification should also take into account different levels of service granularity in order to promote reuse and, at the same time, to provide enough flexibility to enable service composition and orchestration without significant performance loss. Moreover, service identification should also produce a catalog of services that is meaningful to the business. In order to address these challenges, it is essential to have a methodology that supports examining the business from multiple perspectives.

Given the importance and complexity of the identification phase in a SOA process, many service identification methods (SIMs) have been proposed in the recent literature. These SIMs offer different techniques to identify SOA services, such as process decomposition, model-driven approaches, value analysis, source code extraction, and ontology mapping [13]. The ultimate goal of a SIM is to deal with challenges in the service identification phase in order to identify services that have correct functionality, granularity, reuse, and flexibility for service composition and orchestration. In this context, some surveys [9–13, 115, 116] have been published aiming at providing an overview of the existing SIMs. However, meet in the middle approaches are generally not addressed in these surveys. Moreover, none of them evaluates several perspectives pointed out as relevant to the industry, including standards such as reference architectures [14]. Furthermore, to date only one survey on SIMs [12] has been conducted by adopting the guidelines of a systematic literature review (SLR) [1], which provides a methodological, fair analysis of a given subject in a comprehensive and non-biased way. Together, these points reveal a gap that we seek to fill with this survey.

This survey intends to (i) take into account the different service perspectives stated by the SOA reference architecture presented in [14] and suggest which of them are relevant to the service identification phase; (ii) provide a comprehensive overview of existing SIMs by detailing the techniques they use to identify candidate services; and (iii) shed light on further opportunities for improvement in this field.

The remainder of this paper is structured as follows. Section 2 describes related work. Section 3 defines the systematic steps adopted in the survey. Section 4 presents the comparison criteria and reports the obtained results. Section 5 describes the threats to validity. Finally, Sect. 6 provides the conclusion.

2 Related work

Table 1 summarizes seven surveys focused on SIMs found in the current literature.

Table 1 Comparison among existing SIM surveys

Boerner and Goeken [9] propose six groups of SIM characteristics (basic characteristics, business aspects, technical aspects, economic aspects, components of method engineering, and principles of design science research) that were used to compare five existing SIMs. In turn, Birkmeier et al. [10] define 13 SIM characteristics classified into foundations (conceptual design), model and supporting measures, and procedure (technique). Kohlborn et al. [11] list eight SIM characteristics (type of services, strategies, lifecycle covered, degree of prescription of the methods, validity, adoption of existing notations/processes, adoption of consumers and providers perspectives, and use of service classification) and compare 30 existing identification methods by using these characteristics.

Gu and Lago [12] select 30 primary studies from a set of 237 examined studies and compare them regarding types of inputs, outputs, and strategies used in service identification. Cai et al. [13] assess 41 studies and propose a classification of high-value activities shared by different identification methods. Zadeh et al. [116] propose a criterion to evaluate SIM inputs regarding their machine readability, the level of interaction details among process, stakeholders, service choreography that they elicit, their level of abstraction, goals coverage, and possibility of being decomposed. Finally, Vale et al. [115] select the most significant criteria in previous surveys and compare 30 SIMs by considering the service granularity and type, strategy, inputs, outputs, activities, research method, validation formalism, economic aspects, and the industry sector in which the method is applied.

These surveys provide a good overview of SIMs and differ in terms of the adopted criteria for selection and comparison. However, they reveal a lack of systematic methods for service identification. Moreover, they propose that the method to be developed has to be configurable depending on the utilization constraints within the organizations (e.g., unavailability of an input or the need to apply the method to small domains), and some of them also suggest that economic aspects and non-functional requirements must be considered.

The main differences between these surveys and the survey presented in this paper concern (i) the research method that was employed; (ii) the inclusion of a significant number of new studies on SIMs, published between 2010 and 2013; and (iii) the adopted criteria for comparison and analysis of existing SIMs.

First, as we have already mentioned, only one of the existing SIM surveys [12] has followed the guidelines of a SLR [6], which provides a methodological, fair analysis of a given subject in a comprehensive and non-biased way. Our survey was undertaken with the SLR guidelines suggested by Kitchenham et al. [1, 3].

Second, our survey represents an update of the state of the art regarding SIMs. As also presented in Sect. 3.2, 36 studies were published from 2010 to June 2013, thus representing a significant number of recent studies that were not considered in the existing surveys. We have also observed that some of these new studies (for instance, [61]) are cited by more than 40 authors, thus indicating significant contributions published in recent years that are addressed in our survey.

Third, this survey compares and evaluates SIMs according to perspectives based on the OASIS’ reference architecture for SOA [14]. This reference architecture is well known in both academia and industry as it describes several characteristics of a reference SOA environment and assists in understanding and adopting SOA. Many of these characteristics are related to service definition, and their analysis may reveal opportunities for improvement in SIMs.

Finally, this survey also presents a greater number of findings when compared to the other surveys as it incorporates research based on the meet in the middle strategy, which was less analyzed in previous surveys. This is an important aspect of the present survey since approaches based on such a strategy are more comprehensive than those based on top–down or bottom–up strategies. Meet in the middle strategies evaluate both business and technical perspectives, and they are more aligned with the enterprise reality because they consider existing assets and quickly deliver recognizable benefits without neglecting the fact that services are designed for reuse and must comply with the business context.

3 Systematic literature review

SLRs are a means of evaluating and interpreting all available research relevant to a particular research question, topic area, or phenomenon of interest, thus aiming to present a fair evaluation of a research topic by using a rigorous methodology. This rigorous methodology is the main point that differentiates a SLR from a traditional literature review (as performed in the related surveys about SIMs in Sect. 2), as it seeks to minimize bias throughout the process, thus providing scientific value for the obtained findings. SLRs have recently been viewed as a useful way of dealing with research evidence, making it possible to systematically identify, select, analyze, and aggregate it to provide knowledge about a given research topic. They have been commonly used for synthesizing existing work from the literature in a comprehensive and non-biased way and for identifying research challenges and opportunities in the state of the art regarding a research subject.

As proposed by Kitchenham et al. [2], a SLR is structured as a systematic process that is typically divided into three main steps (Fig. 1):

  1. Planning, which defines the research questions, search strategy, selection criteria, data extraction, and synthesis methods to be used, and yields a protocol that will guide the whole SLR process;

  2. Conduction, in which the primary studies are identified, selected, and evaluated according to the established protocol; and

  3. Reporting (or Analysis), which aggregates information extracted from the relevant primary studies considering the research questions and draws conclusions from it.

Sections 3.1 and 3.2 detail the application of the Planning and Conduction steps to the systematic survey presented in this paper, whereas Sect. 4 presents the results of the performed SLR (Reporting step).

Fig. 1 SLR process

3.1 Planning

In this phase, the goals and protocol of the SLR were defined. Such a protocol consists of a predetermined plan that describes the research questions, how the SLR process itself will be conducted (i.e., the search strategy to be adopted), and establishes the selection criteria and the data extraction and synthesis methods. The research questions must have a clear and well-defined focus as they drive the whole SLR, so that the search procedure must identify the studies that help to answer the research questions, and the data extraction and analysis processes must produce data and knowledge to answer them.

3.1.1 Research questions

Despite the many published SIMs, gaps remain toward a comprehensive and systematic method for service identification. Identification methods can vary depending on the availability of inputs (business models, documentation, etc.) and the scope of identification (comprehensive and proactive identification or direct answer to a development project). Therefore, aiming at analyzing previous studies and then summarizing evidence about how existing SIMs work, the following research questions (RQs) were proposed based on these challenges:

RQ1: How do current SIMs address the different service perspectives stated by the SOA reference architecture presented in [14]?

RQ2: Which techniques are used by existing SIMs to identify candidate services?

3.1.2 Search strategy

In order to establish the search strategy based on the defined research questions, three main terms were initially identified, namely SOA, identification, and services. In addition, in order to ensure a greater coverage in the results, we have included the design and analysis terms, thus resulting in the following search string:

service-oriented architecture AND (service identification OR identify services OR service design OR service analysis)

in which the main terms were connected by using the AND logical operator and the possible variations by using the OR logical operator.
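As a minimal sketch of how such a search string can be assembled, the snippet below joins the variations with OR and attaches them to the main term with AND. The constant and function names are illustrative assumptions and are not part of the survey protocol; the adaptations made for each database engine are not reproduced here.

```python
# Illustrative sketch only: names are hypothetical, not taken from the survey protocol.
MAIN_TERM = "service-oriented architecture"
VARIATIONS = [
    "service identification",
    "identify services",
    "service design",
    "service analysis",
]


def build_search_string(main_term, variations):
    """Connect the variations with OR and attach them to the main term with AND."""
    alternatives = " OR ".join(variations)
    return f"{main_term} AND ({alternatives})"


search_string = build_search_string(MAIN_TERM, VARIATIONS)
print(search_string)
```

Keeping the variations in a list makes it straightforward to add or remove a term and regenerate the generic string before adapting it to a specific database.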

In order to select the most suitable databases for the search process, the following criteria discussed by Dieste et al. [7] were considered: (i) content update, i.e., whether the publications are regularly updated; (ii) availability, i.e., whether the full text of the primary study is available; and (iii) quality of results, which is related to the accuracy of the results obtained by the search. As shown in Table 2, four electronic databases were selected based on these criteria and because they are the most commonly used databases in systematic reviews in the Software Engineering domain, as pointed out by Kitchenham and Charters [3] and Dybå et al. [4].

Table 2 Electronic databases selected as sources for the search process in the conducted SLR

3.1.3 Inclusion and exclusion criteria

Some of the retrieved studies might contain the keywords used in the search string but be irrelevant to the research questions. Therefore, selection criteria are used to evaluate each primary study obtained by the search procedure according to the defined research questions, thus making it possible to include studies that are potentially relevant to answering them and to exclude studies that do not contribute to answering them.

The considered inclusion criteria (IC) were:

IC1: The study focuses on service identification.

IC2: The study should address different SIMs. If two different studies address improvements on the same SIM, then the most recent is considered.

The established exclusion criteria (EC) were:

EC1: The study is not publicly available in its complete form (full-access).

EC2: The study does not fall within the SOA context.

EC3: The study is not written in English, which is the most common language in scientific papers.

3.1.4 Data extraction and synthesis methods

The bibliographic details of each selected primary study were recorded by using JabRef [117]. We have also recorded the names of the authors, title of the study, venue (journal, proceedings of conferences, etc.), and year of publication in a spreadsheet for quantitative analysis purposes. Furthermore, data extraction spreadsheets related to each research question were built in order to synthesize the results and support the drawing of conclusions.

3.2 Conduction

In this phase, the primary studies were searched, selected, and evaluated according to the established protocol, thus resulting in a set of possibly relevant studies for the SLR. During the search process, the generic search string defined in the Planning phase underwent minor changes in order to make it compatible with the specificities of each electronic database engine. After that, the automated search of primary studies was performed over the selected electronic databases (see Table 2) by searching for primary studies that matched the adapted search string. The search procedure was initially limited to title, abstract, and keyword fields and covered publications from 2002 to June 2013; this start year was chosen by considering the oldest SIM reported by the surveys presented in Table 1. In addition, cross-reference checking (snowballing) was conducted by reviewing the related work sections of the surveys presented in Table 1, aiming to identify additional studies that are potentially relevant and were not retrieved by the search procedure. The relevant studies found in the cross-reference checking were included in the result set of selected studies.

As summarized in Table 3, 871 studies were retrieved from the electronic databases. From this initial set, 93 studies were removed as duplicate entries between the databases and 692 studies were removed based on the selection (inclusion/exclusion) criteria. All of the retrieved studies had their title, abstract, and keywords read for the evaluation against the selection criteria. In case of doubt, the full text was analyzed. Many studies were excluded because they used terms such as service identification to refer to the process of discovering services that had already been deployed and could be reused, whereas other studies used service design to address implementation issues for composing services. Moreover, 19 studies were included after the cross-references checking, thus resulting in a set of 105 primary studies (7 surveys and 98 methods) that were finally considered relevant to this SLR and then selected for data extraction. In this SLR, a given primary study is considered relevant if it does not meet any exclusion criterion and meets at least one inclusion criterion. Figure 2 depicts these steps for selecting the relevant studies.
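The selection arithmetic and the relevance rule described above can be checked with a short sketch. The counts are the ones reported in the text; the variable and function names are assumptions introduced only for illustration.

```python
# Counts reported in the text for the Conduction phase.
retrieved = 871             # studies returned by the electronic databases
duplicates = 93             # duplicate entries between databases
rejected_by_criteria = 692  # removed by the inclusion/exclusion criteria
added_by_snowballing = 19   # included after cross-reference checking

after_screening = retrieved - duplicates - rejected_by_criteria
selected = after_screening + added_by_snowballing


def is_relevant(met_inclusion, met_exclusion):
    """A study is relevant if it meets no exclusion criterion
    and at least one inclusion criterion (hypothetical helper)."""
    return len(met_exclusion) == 0 and len(met_inclusion) >= 1


print(after_screening, selected)  # 86 105
```

The result matches the 105 primary studies reported above (7 surveys and 98 methods). For example, `is_relevant({"IC1"}, set())` is true, whereas `is_relevant({"IC1"}, {"EC3"})` is false.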

Fig. 2 Selection of relevant studies

Table 3 Number of retrieved studies after the search procedure over the electronic databases

Figure 3 shows the number of selected studies classified per year of publication. From 2010 to June 2013 (a range that is not covered by the only systematic survey about SIMs [12], published in 2010), 36 studies were published. This significant number of recent studies (about 34.3 % of the 105 selected studies) reveals one of the main contributions of this systematic survey, namely updating the state of the art about SIMs.

Fig. 3 Number of selected studies per year

4 Analysis of the selected studies

4.1 Classification scheme

In order to compare the approaches for identification of candidate services, we have developed a classification scheme based on the OASIS’ reference architecture framework for SOA [14]. The proposed classification scheme is intended to support the analysis of how perspectives pointed out as relevant by industry are addressed by existing SIMs. The OASIS’ reference architecture is an abstract realization of SOA that focuses on the required elements and their relationships to enable SOA-based systems to be used, realized, and governed. It also provides a common language for understanding the important features of SOA, is independent of any technology, and has been adopted by industry, thus being an important guide for issues that should be considered during the service identification phase.

The OASIS’ reference architecture framework is structured upon views. Views are representations of the whole system from the perspective of a related set of concerns. Each view is comprised of models, which represent an abstraction or representation of some architectural aspect. Models are mainly described by class diagrams in which each class is an element or a concept involved in a SOA ecosystem. A SOA ecosystem is a network of processes and machines that, along with a community of people, creates, uses, and governs specific services [14].

According to the OASIS’ framework, three views are used to describe SOA concerns: (i) Participation in a SOA ecosystem; (ii) Realization of a SOA ecosystem; and (iii) Ownership in a SOA ecosystem. The Participation in a SOA ecosystem view focuses on the constraints and context in which people conduct business by using a SOA-based system. The Realization of a SOA ecosystem view focuses on elements that are needed to support the discovery of and interaction with services. Finally, the Ownership in a SOA ecosystem view focuses on the governance and management of SOA-based systems. Since service identification crosscuts the software development activities of elicitation, analysis, and design of services, our classification scheme only considers the first two views, which are directly related to such activities. Therefore, the third view is out of the scope of this work.

The first two views and their models were analyzed and the main elements and concepts of each model were identified. Figure 4 presents the classification scheme derived from the OASIS’ views and models. Table 4 shows each element/concept selected from the reference architecture and the classification perspective that was derived from them. For each element/concept, a classification perspective was derived according to service identification concerns in order to compose our classification scheme. For the Participation in a SOA ecosystem view, we have chosen the Participant and Ownership boundary elements, which are part of the Social structures in a SOA ecosystem model, and the Real-world effect element, which is part of the Actions in a SOA ecosystem model. For the Realization of a SOA ecosystem view, we have chosen the Service functionality, Behavior, and Information model elements, which are part of the Service description model. In addition, we have also chosen the Composability element, which is part of the Interacting with services model, and the Policy element, which is part of the Policies and contracts model.
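The selection of elements described above can be written down as a nested mapping from view to model to elements. This is only a sketch of the structure stated in the text: the variable and function names are assumptions, and the classification perspectives derived from each element (Table 4) are deliberately not encoded here.

```python
# View -> model -> selected elements, as listed in the text.
# Names of the constant and helper are illustrative assumptions.
SELECTED_ELEMENTS = {
    "Participation in a SOA ecosystem": {
        "Social structures in a SOA ecosystem": [
            "Participant",
            "Ownership boundary",
        ],
        "Actions in a SOA ecosystem": ["Real-world effect"],
    },
    "Realization of a SOA ecosystem": {
        "Service description": [
            "Service functionality",
            "Behavior",
            "Information model",
        ],
        "Interacting with services": ["Composability"],
        "Policies and contracts": ["Policy"],
    },
}


def elements_of(view):
    """Flatten all selected elements of a view, across its models."""
    return [e for elems in SELECTED_ELEMENTS[view].values() for e in elems]


all_elements = [e for view in SELECTED_ELEMENTS for e in elements_of(view)]
print(len(all_elements))  # 8
```

The Ownership in a SOA ecosystem view and the Service visibility model are absent from the mapping because the text places them out of scope.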

Fig. 4 Classification scheme based on the OASIS’ reference architecture for SOA

Table 4 Classification scheme

It is important to mention that the Realization of a SOA ecosystem view has a fourth model named Service visibility. Achieving visibility is one of the key requirements to enable participants to interact with each other in the context of SOA in terms of awareness, willingness, and reachability. Since these concepts are more related to processes aiming at maintaining services descriptions available and to service deployment issues than being related to the identification phase, such a model is out of the scope of this work.

Although one of the purposes of this survey is to analyze perspectives pointed out as relevant by industry, the proposed classification scheme illustrated in Fig. 4 is not disconnected from the criteria proposed by the surveys mentioned in Sect. 2. Several perspectives in Table 4 encompass the criteria used by the existing surveys as follows:

  • Participant Concerns encompasses Regard to Stakeholders in [11];

  • Context of Transactions encompasses Types of categorization in [9];

  • Service Value to the business encompasses Consideration of Strategic Perspectives in [9] and Inputs in [12, 115, 116];

  • Behavior Model Detailing and Information Model Detailing encompass Model Views in [10] and Output Format in [12, 115];

  • Service Granularity encompasses Granularity in [9, 115], Service Hierarchy in [10, 11], and Service Types in [12, 115];

  • Service Dependency encompasses Supported Objects in [9] and Dependencies in [10];

  • Type of Conversation encompasses Orchestration vs. Choreography in [9];

  • Quality Attributes Elicitation encompasses Legal Compliance, Internal Policies, and Service Level Agreements in [9].

Some perspectives such as Economic Perspectives [9, 115], Method Degree of Detail [9–11], Tool Support [10], SOA Lifecycle Coverage [9, 11], and Industry Sector [9, 115] have no correspondence in our classification criteria and were therefore not analyzed in this survey. Finally, the Techniques [9, 10, 12, 13, 115] and Identification Strategy [9–12, 115] perspectives employed in the service identification process cannot be directly correlated to any element of the reference architecture. In particular, the Techniques perspective implements the identification strategy and describes the method used to identify service candidates. Nevertheless, due to their relevance to the service identification phase and their recurrence in the existing SIM surveys, this study analyzes the Identification Strategy perspective within the scope of almost all perspectives described in Table 4.

4.2 Results

The following sections detail the achieved findings of the conducted survey by evaluating the perspectives derived from the reference architecture (Sect. 4.2.1) and then detailing the techniques employed in the surveyed SIMs according to the Technique perspective (Sect. 4.2.2). Furthermore, we discuss some existing gaps in the field of SIMs (Sect. 4.2.3).

4.2.1 Analysis of the classification perspectives derived from the reference architecture

After defining the classification scheme presented in Table 4, the findings were analyzed according to it. This analysis intends to identify which of these classification perspectives are addressed by existing SIMs, thus answering RQ1. A summary of how each perspective was addressed is presented in the following subsections.

Participant concerns

The Participant Concerns perspective evaluates providers’ or consumers’ concerns addressed by the SIMs. The most common concerns in this perspective are reuse and implementation issues. The reuse concern is addressed in SIMs by the identification of functions or tasks with high potential for reuse, i.e., functions required by several stakeholders inside or outside the organization.

The implementation concern is addressed in SIMs by (i) providing steps and guidelines for service realization and implementation [32, 67, 85, 86, 90, 114]; (ii) extracting legacy code for service packaging [17, 29, 33, 37, 48, 52, 53, 62, 64, 72, 77, 84, 97, 101–103]; and (iii) identifying points of variability. Variability refers to assumptions about how members of a family of products may differ from each other [40]. The most common types of variability addressed are variability of activities (in which activities can be optional or alternative to accomplish an action) [25, 36, 40, 44, 45, 47, 65, 95], variability in the interfaces [36, 44, 66, 95], and dynamic product reconfiguration based on context information [20].

Most of the top–down approaches aim at just identifying reusable services with no implementation concerns [19, 21–24, 26, 27, 30, 31, 34, 38, 39, 41–43, 46, 49–51, 54–56, 58–61, 63, 67, 69, 70, 74, 75, 78, 80, 81, 87–89, 92, 94, 98–100, 104–107, 109, 111, 113]. Top–down SIMs that have implementation concerns [20, 25, 32, 40, 44, 45, 47, 65–67, 90, 95] are those that address variability as aforementioned. On the other hand, bottom–up SIMs usually address implementation concerns [17, 29, 33, 37, 48, 52, 53, 62, 64, 72, 77, 84, 97, 101, 103, 114] since they deal with legacy code extraction and reorganization to define the service packaging.

It is noteworthy that, although meet in the middle approaches evaluate legacy systems, the majority of them only use this evaluation to detail the business domain [57, 91] or to identify whether any existing function corresponds to a requirement that a service candidate must fulfill [18, 57, 73, 76, 79, 83, 108, 110], without addressing implementation concerns.

Regarding the techniques detailed in Sect. 4.2.2, only product line, source code analysis, and wrapping tackle implementation concerns. The other techniques only tackle reutilization issues.

Context of transactions

The Context of Transactions perspective classifies SIMs by taking into account issues related to the distribution of resources and interactions of people and systems inside or outside the enterprise. The explicitness of these boundaries is important to identify the implications of crossing them, especially for analyzing their impact on aspects related to governance and security.

An example of how the Context of Transactions perspective is addressed inside of enterprise boundaries is the approach presented in [61], which decomposes business processes and identifies services goals, services interactions with applications or people, and the resources (inputs and outputs) of a service. In turn, the Context of Transactions perspective outside enterprise boundaries can be exemplified by the SIM proposed in [19], which uses the service responsibility analysis technique. Such a technique identifies services by using information sharing relationships between organizations as inputs, thus eliciting their responsibilities and rules that govern the exchange of information.

The majority of the studies do not detail interactions of actors (people or systems) with a service [17, 18, 20, 22–30, 32, 33, 35–37, 40, 42–48, 52–59, 62, 64–68, 71–93, 96–103, 105–108, 110–114]. The studies that provide such detail are top–down approaches that apply ontologies to describe services [21, 31, 61, 63, 69, 70, 95] or employ techniques such as value analysis [38, 39, 41, 49–51, 56, 60, 94, 109] and service responsibility analysis [19]. Ontologies provide means to describe service capabilities, resources, and actors, whereas value analysis and service responsibility analysis techniques identify services by making participants’ interactions explicit.

SIMs that deal with identification outside the enterprise boundaries [19, 34, 38, 39, 41, 49–51, 56, 60, 94, 104, 109] can be adapted to identify services inside enterprise boundaries. This can be accomplished by replacing external economic entities with internal entities, such as departments. Nevertheless, the opposite is harder to achieve because SIMs that deal with identification inside enterprise boundaries usually use intra-enterprise business processes or legacy applications as inputs to identify services.

Service value to the business

The SOA approach aims to align IT and business perspectives by building services that are business-focused and can be reused and deployed across multiple software applications. The Service Value to the business perspective then evaluates whether the effect expected when a consumer interacts with a service is related to business strategy, thus being directly perceived by the consumer or not. Services that are indirectly related to business are usually related to technical aspects or are fine-grained services used to compose other services that offer direct value to the business.

This classification perspective is directly related to the input used by the SIM. The majority of the top–down approaches identify services that are both directly and indirectly related to business. The other top–down approaches only identify services directly related to business. Meet in the middle approaches always identify services that are both directly and indirectly related to business, and the bottom–up ones identify services that are indirectly related to business.

Business goals [22, 38, 49, 58, 79, 81] are part of the business strategy. A SIM can identify services to achieve these goals, thus having direct traceability to the business needs. The same principle applies to business models [19, 21, 39, 41, 50, 51, 56, 63, 75, 80, 89, 94, 104, 109, 111]. As business models describe the enterprise mission, business requirements, and organizational architecture, they can be used as inputs to identify services that support information exchanges among organizations (e.g., [39]) or inside them (e.g., [63]). On the other hand, legacy code (existing software assets of an enterprise) [17, 28, 33, 35–37, 48, 52, 53, 62, 64, 73, 76, 77, 79, 82–85, 91, 96, 97, 101–103, 110] or documentation [18, 29, 57, 68, 71, 86, 93, 108, 112] inputs derive services that usually have indirect relation to business. These services correspond to technical aspects or they are fine-grained services used to compose other services (e.g., [29]).

Inputs such as business processes [18, 23, 24, 26, 27, 32, 36, 40, 42–44, 54, 55, 57, 59–61, 65, 67, 69, 73, 74, 76, 78, 83, 85–87, 90, 91, 98–100, 105, 107, 108, 110, 113], features [20, 25, 45, 47, 95], requirements [30, 31, 34, 46, 66, 70, 88, 92, 106, 114] or database [72] assets can originate services with both direct and indirect effects. The service effect depends on the process decomposition level. High-level processes originate services directly related to the businesses, whereas subsequent decompositions originate more fine-grained services that tend to be indirectly related to the businesses [100]. When using features as inputs, the effect depends on the relevance to the businesses of the feature’s product.

Service description

A good service description is essential to enable service reutilization by matching user requirements against service capabilities. Service descriptions can be started within the identification or the specification phase. The advantage of describing services in the identification phase is to have a detailed perspective of service capabilities by identifying whether the service really delivers the expected value, and also providing a better input to the specification phase. Descriptions can be semantic or syntax-based. Semantic descriptions describe a service based on semantically enriched formats, such as ontology annotation [118] and context information-based [20] methods. In turn, syntax-based descriptions hide service implementation details and expose the externally observable service behavior as input and output values of service interfaces [15]. Semantic descriptions convey real-world meaning to the services, so this type of description is important to provide a clear understanding of the effects resulting from the invocation of a service and a consistent interpretation of the handled data, in particular when the interaction occurs across ownership boundaries. It can also enable automatic service discovery and composition [119].

We have noticed that only 16 studies present semantic descriptions of services [20, 21, 31, 43, 55, 58, 61, 63, 69–71, 75, 82, 87, 94, 95]. Among them, only two are bottom–up approaches [71, 82] and the other 14 are top–down approaches. This result might reflect an influence of the traditional process modeling and software engineering methodologies that are used for modeling and developing systems, which do not have a strategy to document semantic aspects of information. The studies that semantically describe services usually adopt ontology-based techniques to identify services since ontologies are able to provide semantic classes to organize relevant domain aspects to service description, such as participants, resources, and operations.

Only one study [20] explicitly identifies and describes context information. This is accomplished by defining attributes (as data types and validity conditions) of each identified context parameter and specifying each situation as a logical expression of contextual parameters to enable service dynamic reconfiguration.
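The idea of typed context parameters with validity conditions, combined into situations by logical expressions, can be illustrated with a minimal Python sketch. The parameter names (`bandwidth_kbps`, `battery_pct`), the thresholds, and the `low_resources` situation are hypothetical and are not taken from [20]; they only show the shape of such a description.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class ContextParameter:
    """A context parameter described by a data type and a validity condition."""
    name: str
    data_type: type
    is_valid: Callable[[Any], bool]

    def check(self, value: Any) -> bool:
        return isinstance(value, self.data_type) and self.is_valid(value)


# Hypothetical parameters; names and ranges are illustrative assumptions.
PARAMETERS = {
    p.name: p
    for p in (
        ContextParameter("bandwidth_kbps", int, lambda v: v >= 0),
        ContextParameter("battery_pct", int, lambda v: 0 <= v <= 100),
    )
}


def validate(context: Dict[str, Any]) -> None:
    """Reject context readings that violate a validity condition."""
    for name, value in context.items():
        if not PARAMETERS[name].check(value):
            raise ValueError(f"invalid value for {name}: {value!r}")


def low_resources(context: Dict[str, Any]) -> bool:
    """A situation written as a logical expression over context parameters."""
    validate(context)
    return context["bandwidth_kbps"] < 256 or context["battery_pct"] < 20


def select_variant(context: Dict[str, Any]) -> str:
    """Reconfigure the (hypothetical) service according to the situation."""
    return "lightweight" if low_resources(context) else "full"
```

Under these assumptions, `select_variant({"bandwidth_kbps": 100, "battery_pct": 80})` selects the lightweight variant because the first disjunct of the situation holds.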

Behavior model

A behavior model is part of the service description, and it is essential to understand and foster the interaction with the service. A well-defined behavior model characterizes knowledge of the (i) actions invoked against the service, (ii) events, and (iii) temporal relationships associated with a service interaction. It should also describe activities involved in a workflow that represents a work unit [14]. Since the external behavior of a service depends on its internal actions, sequence, and events, SIMs should elicit these perspectives to support the behavior model description.

A behavior model description can be done after the service implementation, i.e., outside the scope of SIMs, but producing it during identification helps to elicit and understand the service scope and to enable assessment of service responsibilities (cohesion) and dependencies on other services. Existing SIMs deal with the behavior model in three ways:

  • External behavior only: The main purpose is to identify the external behavior of a service, but not its internal operation. The focus is on inputs and outputs or on the service purpose (real-world effect resulting from service execution) [38];

  • Action description: SIMs identify and describe internal service actions, but do not detail their internal behavior [78];

  • Actions and behavior description: SIMs identify and describe service internal actions and their behavior. Internal action behavior can be described as events, pre- and post-conditions, and actions sequence [87].
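To make the third category concrete, the following Python sketch describes internal actions by their pre- and post-conditions, the event each one raises, and their sequence. It is an illustration only: the `Action` structure, the `run` helper, and the order-handling example are assumptions, not a notation used by any surveyed SIM.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

State = Dict[str, Any]


@dataclass
class Action:
    """An internal service action with its conditions and the event it raises."""
    name: str
    precondition: Callable[[State], bool]
    effect: Callable[[State], State]
    postcondition: Callable[[State], bool]
    event: str


def run(actions: List[Action], state: State) -> Tuple[State, List[str]]:
    """Execute the actions in sequence, checking conditions around each one."""
    events: List[str] = []
    for action in actions:
        if not action.precondition(state):
            raise RuntimeError(f"precondition failed: {action.name}")
        state = action.effect(state)
        if not action.postcondition(state):
            raise RuntimeError(f"postcondition failed: {action.name}")
        events.append(action.event)
    return state, events


# Hypothetical behavior model of an order-handling service.
ORDER_SERVICE = [
    Action(
        name="reserve_stock",
        precondition=lambda s: s["stock"] >= s["quantity"],
        effect=lambda s: {**s, "stock": s["stock"] - s["quantity"], "reserved": True},
        postcondition=lambda s: s["stock"] >= 0,
        event="StockReserved",
    ),
    Action(
        name="charge_customer",
        precondition=lambda s: s.get("reserved", False),
        effect=lambda s: {**s, "paid": True},
        postcondition=lambda s: s["paid"],
        event="PaymentCharged",
    ),
]
```

An "external behavior only" description would expose just the input state and the resulting state of `run`, whereas an "action description" would list the action names without their conditions and events.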

All SIMs that have elicited only external behavior were top–down approaches [34, 38, 39, 41, 50, 51, 56, 80, 94, 104, 111]. Among the SIMs that elicit action descriptions, the one in [68] is bottom–up, [91] is meet in the middle, and the other SIMs are top–down [22, 25, 27, 30, 40, 42–44, 47, 59, 70, 78, 102]. With regard to the elicitation of events and conditions, most SIMs are top–down approaches [19, 21, 58, 63, 69, 75, 87], except [57], which is a meet in the middle one, and [71], which is bottom–up. Except for these studies, all bottom–up and meet in the middle approaches elicit the sequence of actions for a service [17, 18, 20, 23, 24, 26, 28, 29, 31–37, 45, 46, 48, 49, 52–55, 60–62, 64–67, 72–74, 76, 77, 79, 81–86, 88–90, 92, 93, 95–101, 103, 105–110, 112–114].

Some of the techniques detailed in Sect. 4.2.2 are related to behavior model description. Among the studies that only detail external behavior, the majority uses the value analysis technique [38, 39, 41, 50, 51, 56, 94, 104, 111]. In the studies that provide internal actions and internal behavior description, techniques such as model-driven ones, ontology mapping, and service responsibility analysis tend to foster the description of events and pre- and post-conditions in the behavior model. Platform-independent models (PIMs) are used to identify events and conditions in model-driven approaches [57, 58, 87, 90], whereas classes and relationship types are used in ontology mapping [21, 63, 69, 71, 75], and governance rules are used in service responsibility analysis [19]. On the other hand, techniques such as decomposition, requirement analysis, source code analysis, and wrapping tend to focus on the description of the sequence of internal actions. In these techniques, the sequence of actions is identified by process activities sequence [18, 23, 24, 26, 32, 54, 60, 61, 67, 73, 74, 76, 79, 81, 83, 85, 86, 89, 98–100, 105, 107–110, 113] in decomposition approaches, by process diagrams [36, 46] or use case description [31, 66, 88, 92, 106, 114] in requirement analysis, and by implementation sequence in source code analysis [28, 33, 35, 48, 52, 53, 62, 64, 72, 82, 84, 96, 97, 101, 103, 112], as well as in wrapping [17, 37, 77, 102].

Information model

Like the behavior model, the information model is part of the service description. A well-defined service information model describes the syntax and semantics of the messages and data payloads, exception conditions, and error handling in the event of faults [14], thus enabling meaningful exchange of information by matching the model semantics with the semantics of the service consumers.

SIMs deal with the information model in a variety of ways:

  • Not detailed: Information handled by the service is not identified or it is identified, but is not detailed. The focus is on the service functionalities or on identifying the service purpose and not its internal operation [38].

  • Information structure description: Information model is detailed with focus on the description of the information structure (classes and attributes) [18].

  • Messages and parameters: Information model is detailed with focus on identification and description of service messages and parameters structure [37].

  • Semantic description: Information model is detailed with focus on semantic description of the handled information or service messages [69].

The majority of the studies that do not detail the information model [22, 23, 25–30, 34, 36, 38–44, 47, 50, 51, 54, 56, 59, 65, 73, 78–81, 83, 92–96, 98–100, 104–107, 111–113] are top–down approaches. Bottom–up approaches detail information structure or messages and parameters [17–21, 24, 31–33, 35, 37, 45, 46, 48, 49, 52, 53, 55, 57, 58, 60, 62–64, 66–72, 74, 76, 77, 84–86, 88–91, 97, 101–103, 108, 109, 114]. Finally, meet in the middle approaches detail neither the information model nor the information structure.

Regarding the techniques, we have found some correlation with the information model description perspective. The techniques that traditionally do not focus on describing an information model are (i) pattern matching, which focuses on identifying patterns in processes [42, 59] or in legacy code [29]; (ii) product line, which focuses on functionalities performed by services [25, 40, 44, 47, 65, 95]; and (iii) value analysis [38, 39, 41, 50, 51, 56, 94, 104, 111], which focuses on identifying what a service should be and not on its internal description.

Another correlation was found with the SIMs that detail the behavior model. SIMs that use syntax-based service descriptions tend to focus on detailing information or message structures, whereas the ones that use semantic-based descriptions tend to focus on the semantic description of information and messages or on message structure [21, 61, 63, 69, 70, 75, 82, 87].

Service granularity

Services can be atomic, i.e., visible to a consumer via a single interface and described via a single service description that does not use or interact with other services, or composite, i.e., visible to a consumer via a single interface and described via a single service description comprised by the aggregation or composition of one or more other services [14]. Service composition can be performed by composing either atomic or composite services. When composing services, the business logic is implemented by several services, thus allowing the definition of increasingly complex solutions by progressively aggregating components at higher levels of abstraction. In this perspective, 57 of the considered SIMs deal with both atomic and composite services [17–20, 24, 26–28, 30, 31, 33, 36, 37, 39, 40, 45, 47–49, 55, 56, 58, 60, 61, 65, 67–72, 74–77, 79, 81, 83, 85–87, 90, 92–98, 100, 106–108, 110], whereas 41 SIMs specifically deal with atomic services [21, 22, 25, 32, 34, 35, 38, 41–43, 46, 50–54, 57, 59, 62–64, 66, 73, 78, 80, 82, 84, 88, 89, 91, 99, 101–105, 109, 111–114], and only three SIMs specifically deal with composite services [23, 29, 44].
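The distinction between atomic and composite services, and the fact that composites may themselves be composed, can be sketched as a small recursive data structure. The `Service` class and the example service names below are hypothetical and serve only to illustrate the definitions given above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Service:
    """A service exposed through a single interface and a single description."""
    name: str
    parts: List["Service"] = field(default_factory=list)

    @property
    def is_atomic(self) -> bool:
        # An atomic service does not use or interact with other services.
        return not self.parts

    def atomic_leaves(self) -> List[str]:
        """Names of the atomic services that ultimately realize this service."""
        if self.is_atomic:
            return [self.name]
        return [leaf for part in self.parts for leaf in part.atomic_leaves()]

    def depth(self) -> int:
        """Number of composition levels above the atomic services."""
        if self.is_atomic:
            return 0
        return 1 + max(part.depth() for part in self.parts)


# Hypothetical example: composites built from atomic and composite services.
check_credit = Service("check_credit")
reserve_stock = Service("reserve_stock")
validate_order = Service("validate_order", [check_credit])
place_order = Service("place_order", [validate_order, reserve_stock])
```

Here `place_order` is a composite built from another composite and an atomic service, which mirrors the progressive aggregation at higher levels of abstraction described above.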

The study in [28] is an example of how a SIM can address the identification of both atomic and composite services. Such a SIM defines a service layers model, which is a natural composition hierarchy. The definition of which functions are part of a service is done according to the layer responsibility. Therefore, the organization in layers contributes to the identification of services with the “right” granularity and cohesion.

It was not possible to identify any correlation between the techniques or approaches and the identification of atomic or composite services, as all techniques were able to identify both types of services. The decision of grouping functions in a coarse-grained service or creating only fine-grained services or even creating fine-grained services and composing them seems to be related only to the scope chosen by each SIM.

Service dependency

The Service Dependency perspective aims to describe whether SIMs identify relations between services or resources required by them. Resources and relationship elicitation is important to provide a broader knowledge of service utilization and operation, thus making service design and implementation easier. SIMs are classified in four categories according to the dependencies that they elicit: no dependencies, resources required by services, relationships between services, or both resources and relationships.

All SIMs that do not identify any relationship are top–down approaches. Among the SIMs that use a bottom–up approach, almost all (except two) identify resources required by candidate services or resources and relationships. The majority of the top–down approaches identifies service relationships, whereas the meet in the middle ones identifies both resources required and relationships between services.

The majority of the SIMs elicits relationships between services. This happens because the majority of the techniques used by the SIMs elicits relationships (dependencies) within the several inputs and these dependencies can derive service relationships. Process-oriented approaches such as decomposition [18, 23, 24, 26, 60, 61, 67, 74, 76, 79, 81, 83, 85, 86, 89, 91, 98–100, 105, 107–110, 113] and model-driven [27, 57, 58, 87, 90] ones usually identify control flow between activities. These activities (or groups of activities) are performed by service compositions and collaborations. Relationships between activities also reflect relationships between the services that implement them. Product line identifies service relationships from similarities and relationships between functions [20, 25, 40, 44, 45, 47, 65, 95], which can be achieved by service collaborations and compositions. Requirements analysis [30, 31, 34, 36, 70, 88, 92, 106, 114] and Service-oriented design aspect (SODA) [55, 78] identify service collaborations in order to fulfill a requirement or an aspect, and service responsibility identifies dependencies between task and data services [19]. Techniques such as asset identification [18, 57, 68, 86, 93, 108, 112] and ontology mapping [21, 43, 63, 69, 71, 75] focus on the identification of both relationships and resources, whereas source code analysis [28, 33, 35, 36, 48, 52, 53, 62, 64, 72, 73, 76, 79, 82–85, 96, 97, 101, 103] and wrapping [17, 37, 77, 102] tend to identify legacy resources (source code, data and existing services) that might be used for implementing a service. Finally, pattern matching [42, 59] and value analysis [38, 41, 50, 51, 56, 104, 111] usually do not identify service resources or relationships as the focus is on defining which operations should be grouped within a service or the service purpose, respectively.

Type of conversation

Services can be composed in a variety of ways, including direct consumer-to-service interaction by using programming techniques, or they can be aggregated by means of an aggregation approach such as choreography and orchestration. Choreography is used to characterize and to compose business collaborations based on ordered message exchanges between peer entities in order to achieve a common business goal. In turn, orchestration is used to compose hierarchical and self-contained service-oriented business processes that are executed and coordinated by a single agent [14].
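The difference between the two aggregation approaches can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the services `reserve` and `charge`, the message names, and the in-memory queue that stands in for message delivery are all hypothetical.

```python
from collections import deque
from typing import Any, Dict, List, Tuple

Order = Dict[str, Any]


def reserve(order: Order) -> Order:
    return {**order, "reserved": True}


def charge(order: Order) -> Order:
    return {**order, "charged": True}


def orchestrate(order: Order) -> Order:
    """Orchestration: a single agent decides the order of the service calls."""
    for step in (reserve, charge):
        order = step(order)
    return order


# Choreography: each peer only knows which message it reacts to and which
# message it emits afterwards; no peer holds the whole process definition.
PEERS = {
    "OrderPlaced": (reserve, "StockReserved"),
    "StockReserved": (charge, "PaymentCharged"),
}


def choreograph(order: Order, first_message: str = "OrderPlaced") -> Tuple[Order, List[str]]:
    """Deliver messages between peers; the queue only simulates the transport."""
    queue = deque([first_message])
    trace: List[str] = []
    while queue:
        message = queue.popleft()
        trace.append(message)
        if message in PEERS:
            handler, emitted = PEERS[message]
            order = handler(order)
            queue.append(emitted)
    return order, trace
```

Both functions produce the same final order in this example; what differs is where the sequencing knowledge lives, in the coordinating function for orchestration and in the per-peer message reactions for choreography.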

SIMs can detail the type of conversation by means of orchestration and/or choreography or none of them. Most studies do not detail the type of conversation between services [22, 24–26, 28–35, 37, 38, 40–43, 46–48, 50–54, 57, 59, 60, 62–64, 66–69, 71–73, 75, 78, 80–85, 88, 89, 91–93, 95–114]. Among the studies that detail this perspective, the majority are top–down approaches. Studies that mention both types of conversation do not detail how service collaborations are identified or implemented [27, 45, 86]. SIMs that mention choreography identify service collaborations by interaction flows [21, 44, 79] or define service adapters to enable dynamic composition [36]. Service orchestrations are also identified by interaction flows [17, 20, 23, 39, 56, 58, 61, 65, 70, 74, 76, 87, 90, 94] or by defining composition patterns [49, 55]. We have not found any correlation between conversation type detailing and the technique or approach employed by the SIM.

Quality attributes elicitation

Finally, the Quality Attributes elicitation perspective is related to the elicitation of the quality attributes that influence the design, policies, or execution contexts of the services. Quality depends on the stakeholders’ requirements, but some general service quality attributes can be identified in a SOA context. Erl [16] emphasizes that the basic software quality design principles of low coupling and high cohesion should be observed throughout the service creation cycle. Service granularity is also pointed out as a quality attribute because the granularity level of a service can affect its capabilities, performance, reusability, and coupling.

Only ten studies identify quality attributes. SIMs deal with service candidate quality by using metrics of coupling [22, 45, 100, 109], cohesion [22, 45, 100, 101, 109], granularity [22, 105], modularity [26, 33], reusability using the semantic distance between features [25], and QoS [23] by estimating a weight to execute activities. Due to this small number of reported SIMs that address service quality attributes, we can infer that such a perspective is still immature in the context of service identification. Almost all SIMs intend to identify service candidates, but they usually neither assess their quality nor make any effort to improve the identified candidates. Quality attribute elicitation does not seem to be related to techniques or approaches, thus being a consequence of the scope chosen by the SIM.
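As an illustration of what a coupling or cohesion metric for service candidates might look like, the sketch below uses two deliberately simple definitions. They are assumptions made for this example and are not the metrics defined in the cited studies: cohesion is taken as the mean pairwise Jaccard similarity of the data entities touched by the operations of one candidate, and coupling as the fraction of operation calls that cross candidate boundaries.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def cohesion(operation_data: Dict[str, Set[str]]) -> float:
    """Mean pairwise Jaccard similarity of the data used by the operations."""
    pairs = list(combinations(operation_data.values(), 2))
    if not pairs:
        return 1.0  # a single operation is trivially cohesive
    total = 0.0
    for a, b in pairs:
        union = a | b
        total += len(a & b) / len(union) if union else 1.0
    return total / len(pairs)


def coupling(service_of: Dict[str, str], calls: List[Tuple[str, str]]) -> float:
    """Fraction of calls whose caller and callee belong to different services."""
    if not calls:
        return 0.0
    crossing = sum(1 for src, dst in calls if service_of[src] != service_of[dst])
    return crossing / len(calls)


# Hypothetical service candidates and operations.
order_ops = {
    "create_order": {"Order", "Customer"},
    "cancel_order": {"Order"},
}
service_of = {
    "create_order": "OrderService",
    "cancel_order": "OrderService",
    "validate_order": "OrderService",
    "charge": "PaymentService",
    "refund": "PaymentService",
}
calls = [
    ("create_order", "validate_order"),
    ("create_order", "charge"),
    ("cancel_order", "validate_order"),
    ("cancel_order", "refund"),
]
```

With these inputs, both `cohesion(order_ops)` and `coupling(service_of, calls)` evaluate to 0.5; a SIM could use such values to compare alternative groupings of operations into candidates.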

4.2.2 Analysis of the techniques employed in the service identification process

In order to answer the RQ2, the studies were categorized in light of software engineering techniques employed in the service identification process and the proposed classification scheme (Sect. 4.1). Aiming to categorize each technique, we have used the high-value activities criteria proposed in [13] with some improvements: (i) the addition of Product line approach and Artifact-centric approach techniques and (ii) the adjustments of the delivery strategy for Ontology mapping and Service classification techniques in order to include the bottom–up strategy since these techniques can be used with both top–down and bottom–up strategies. Table 5 presents all selected techniques. For each one, we show their description, artifacts generated by them, and the delivery strategy adopted (top–down or bottom–up).

Table 5 Techniques employed in the service identification process

Decomposition is the most used technique with 38 occurrences among the analyzed SIMs, followed by Source code analysis (27), and Value analysis (13). More than half of the studies combine two or more techniques. The most used combination is Decomposition and Source code analysis, which is usually applied in meet in the middle approaches. Service classification, SODA, Asset identification, and Wrapping are always used together with other techniques, but only Service classification acts as a complementary technique often used to ensure services with the right granularity and high cohesion.

Service classification and Ontology mapping techniques can be applied in both top–down and bottom–up strategies. Considering all techniques, the top–down is the most used approach, being employed by 56 SIMs, while the bottom–up one is employed by 30 studies. The meet in the middle approach is less used, being employed only by 12 studies.

Table 6 correlates the multiple perspectives of the OASIS’ reference architecture for SOA described in Sect. 4.1 with the software engineering techniques employed by the existing SIMs. Table 6 aims to assist practitioners to reason about how the techniques can be applied to the service identification process in the light of the OASIS’ reference architecture classification scheme. The techniques presented in Table 5 (except Service Classification as it is a complementary technique) were analyzed in order to find out influences or correlations to each perspective. Since we have not found any correlation among Service Granularity, Type of Conversation, and Quality Attributes elicitation perspectives and the software engineering techniques employed by existing SIMs, they are not listed in Table 6. Despite the absence of direct correlations, some relevant observations can be drawn from the analysis of these perspectives. The identification of services with different granularities (Service Granularity perspective) can be accomplished by segregating services in layers as suggested in [18]. A precondition to address the Type of Conversation perspective is to elicit service relationships. Therefore, a technique that enables this type of identification must be chosen (for instance, a model-driven approach or the value analysis technique). The Quality Attributes elicitation perspective can be addressed by defining metrics to assess service quality according to the elicited requirements, independently of the applied technique. The other perspectives are addressed according to the correlations between techniques and classification perspectives presented in Table 6.

Table 6 Techniques and classification perspectives

As an example of usage of Table 6 to aid practitioners in the service identification phase, consider that a practitioner wants to know which Participant concerns can be used from the Decomposition technique. Since such perspective presents two concerns, namely reutilization issues only and implementation and realization issues (see Sect. 4.2.1), Table 6 shows that the answer is Reutilization only. Another example is when a practitioner wants to identify services’ implementation concerns and the resources required or provided by them, according to the Service Dependency perspective (see Sect. 4.2.1). In this case, Table 6 also indicates that (s)he can use Product line or Source code analysis or Wrapping techniques combined with the Decomposition or Ontology Mapping or Assets Identification techniques.

Table 6 can also assist practitioners in the selection of software engineering techniques to be used depending on the type of project they are developing. For example, if a practitioner is in charge of a SOA project focused on delivering fast results, he/she can choose the Source code analysis technique because it fosters the elicitation of implementation concerns (from the Participant Concerns perspective), thus contributing to speed up the construction process. In this same example, according to Table 6, the identified services will be indirectly related to businesses (Service Value perspective). Therefore, such services tend to deliver value to IT department rather than to the core enterprise business. As another example, if the practitioner is involved in a SOA project to integrate inter-organizational processes, Table 6 tells him/her that the Value Analysis or Service Responsibility Analysis techniques must be used because they are the only ones capable of eliciting cross-enterprise interactions (Context of Transactions perspective). These examples show how this type of correlation can be a useful tool to shed light on the consequences of practitioner decisions of using different software engineering techniques in the service identification phase of SOA projects.

Moreover, Table 6 can suggest which techniques are the most complete ones for service identification purposes. Service identification must support SOA goals of increasing organizational agility, increasing the return on investment, and promoting the alignment of business and IT domains [16]. In order to promote business and IT alignment, a software engineering technique must be able to identify services that deliver direct value to business and also IT services that support business services (C1). The business agility goal needs a clear understanding of the service candidate capability in order to provide services that deliver value to businesses and that are also reusable. Service capability comprehension can be enhanced by: eliciting the service actions and behavior (behavior model) (C2), eliciting the service information model (messages, parameters, and semantics) (C3), using semantic descriptions (C4), and eliciting service interactions with applications or people (C5). The return on investment can be increased by analyzing implementation issues (C6) during the identification in order to avoid the identification of candidates that are too complex to be implemented, and also by identifying services and resources that can be reused to realize a service capability (C7). The analysis of implementation issues in the identification step and the clear understanding of the service candidate capability also promote the delivery of a better input for the specification and realization steps, thus enhancing the overall quality of the service-oriented modeling and architectural design process.

The analysis of the aforementioned characteristics in light of Table 6 reveals that Ontology mapping would be the most complete software engineering technique for service identification. This technique addresses six of the aforementioned characteristics: (i) identification of services that deliver both direct and indirect value to businesses (C1); (ii) elicitation of the internal events and pre- and post-conditions (C2); (iii) elicitation of the information model (C3); (iv) utilization of semantic-based service description (C4); (v) elicitation of service interactions with applications or people (C5); and (vi) identification of services and resources dependencies (C7). Table 7 summarizes the characteristics addressed by the software engineering techniques analyzed in Sect. 4.2.2.
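The completeness argument amounts to a set comparison between the characteristics C1 to C7 and those addressed by a technique. The short Python sketch below encodes only the Ontology mapping coverage stated in the text; the rows of Table 7 for the other techniques are not reproduced here, and the function names are illustrative assumptions.

```python
from typing import Dict, Set

ALL_CHARACTERISTICS: Set[str] = {f"C{i}" for i in range(1, 8)}

# Coverage as stated in the text for Ontology mapping only.
COVERAGE: Dict[str, Set[str]] = {
    "Ontology mapping": {"C1", "C2", "C3", "C4", "C5", "C7"},
}


def missing(technique: str) -> Set[str]:
    """Characteristics that the given technique does not address."""
    return ALL_CHARACTERISTICS - COVERAGE[technique]


def most_complete(coverage: Dict[str, Set[str]]) -> str:
    """Technique addressing the largest number of characteristics."""
    return max(coverage, key=lambda name: len(coverage[name]))
```

For Ontology mapping, `missing` returns only C6 (analysis of implementation issues), which suggests that combining it with a technique that addresses C6 would cover all seven characteristics.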

Table 7 Service identification characteristics addressed by software engineering techniques

4.2.3 Existing gaps in the SIM field

Besides analyzing existing SIMs in the literature in the context of the reference architecture perspectives, the survey findings allowed us to identify gaps in current work that could be further exploited in future research. The most relevant gaps identified in the surveyed SIMs are (i) identification of both business and software services; (ii) analysis of non-functional requirements; and (iii) quality assessment of candidate services. In the following, we briefly discuss each one in the context of our classification scheme and/or of the analyzed techniques adopted by SIMs.

Identification of both business and software services. Service identification should be wide-ranging and consider multiple perspectives comprising business and technological issues [120]. The business perspective is related to business goals and requirements, and it is generally structured into processes or business models that express rules, constraints, and dependencies. The IT perspective is the automation of the business perspective organized into several technology solutions. The analysis of the business perspective is important to identify services that deliver direct value to business and promote business agility, which is one of the major SOA goals [16]. On the other hand, the analysis of the IT perspective promotes an alignment with the existing IT assets and helps to identify resources (data, application functions, and existing services) needed to realize service capabilities, thus producing a better input to the specification phase. These two perspectives are complementary, so that the analysis of only one perspective can compromise the achievement of SOA goals or lead to services that are not suitable to the organization’s reality. For example, the isolated analysis of the business perspective can lead to the identification of services that fulfill business requirements, but are very expensive to be developed and integrated with the existing software assets or IT architecture. Another example is the identification of services by functional analysis of existing software assets, which may lead to services with a limited range of reuse, thus compromising the SOA goals of promoting business agility and increasing the return on investment. This gap is directly related to the Service Value to the business perspective analyzed in Sect. 4.2.1. As reported in Table 7, few SIMs address both perspectives, and the majority of the SIMs addresses only the business perspective.

Analysis of non-functional requirements. Business requirements are not the only requirements that originate service candidates or affect the capabilities of service candidates. Non-functional, technical requirements might also reveal constraints, conditions of use of a service, or even additional service candidates that support the accomplishment of non-functional requirements. For instance, non-functional requirements regarding security can originate services to authenticate users, control access to specific functionalities, or limit access to some services depending on the user’s profile. Moreover, conflicting non-functional requirements might cause service redesign [12], thus impacting the whole process of service-oriented modeling and design. Despite the relevance of non-functional requirements, few SIMs consider this aspect. It is noteworthy that the software engineering technique employed by the SIM directly affects the capability of eliciting non-functional requirements. Some SIMs elicit non-functional requirements by using the SODA technique [52, 55, 78], which identifies services based on the decomposition of interactions, concerns and features into aspects, and composes them according to requirements. Dinh and Nguyen-Ngoc [19] elicit constraints and legal issues of the information exchange between organizations. Samavi et al. [41] identify non-functional requirements as service softgoals. Finally, the studies in [20, 22, 25, 34, 36, 38, 45, 47, 49, 51, 54, 65, 85, 87] have an activity to elicit non-functional requirements or receive the requirements as inputs. Regarding the classification scheme adopted in our work, the gap concerning non-functional requirements is a crosscutting aspect affecting several of the considered perspectives, such as Participant Concerns, Service Description, Behavior Model, Quality Attributes elicitation, and Service Dependency.

Quality assessment of candidate services. Most researchers agree on the importance of metrics to improve the quality of the identified services and of the SIM itself. Quality depends on stakeholder requirements, but some general service quality attributes can be identified in a SOA context. Erl [16] emphasizes that the basic software quality design principles of low coupling and high cohesion should be observed throughout the service lifecycle. Service granularity is also pointed out as a quality attribute as the granularity level of a service can affect its capabilities, performance, reusability, and coupling. SIMs deal with service candidate quality by using metrics of coupling [22, 45, 100, 109], cohesion [22, 45, 100, 101, 109], granularity [22, 88], modularity [26, 33], reusability using the semantic distance between features [25], and quality of service by estimating a weight to execute activities [23]. SIMs usually neither assess service quality nor make any effort to improve the quality attributes of identified candidates. Regardless of the adopted quality attribute, SIMs should provide means to assess service candidate quality. Services with low quality can affect reuse, thus compromising the achievement of the SOA goals of promoting business agility and improving the return on investment.

5 Threats to validity

The main threats to the validity of this survey are related to:

  • Its completeness. The electronic databases used in this systematic survey (see Table 2) are considered the most relevant available sources [3, 4], but some studies may have been missed due to technical limitations of the search engines themselves, which are out of the control of the researchers. Furthermore, these databases do not represent an exhaustive list of publication sources, so that other databases might also be included.

  • Reviewers’ reliability. Although the conclusions might have been influenced by the researchers’ opinions, a dual-revision strategy was adopted in order to minimize the effect of any bias or misinterpretation. Therefore, the studies were evaluated more than once, each time by a different researcher.

  • Data extraction. Since not all the information needed to answer the established research questions was stated explicitly in the studies, some data had to be interpreted. Nevertheless, whenever the researchers disagreed, discussions were conducted in order to ensure the validity of this systematic survey.

6 Conclusion

This paper reported the results of a systematic survey that quantitatively characterizes the SIMs reported in 105 studies, published from 2002 to June 2013 and retrieved from four electronic libraries. A classification scheme based on a reference architecture adopted by industry was proposed as a way to suggest which issues should be considered during the service identification phase and how the existing approaches address them. SIMs differ in the way they address the proposed classification perspectives. Nevertheless, the analysis presented in Sect. 4.2.1 demonstrates that existing SIMs address many perspectives of the adopted SOA reference architecture, thus suggesting that these SIMs are aligned with the concerns related to SOA adoption in the context of the service identification phase.

More than half of the proposed methods use more than one software engineering technique, but few are meet in the middle approaches. Meet in the middle approaches are more complete because they evaluate models from the highest level to the most detailed one, thus allowing the reuse of existing assets (services and applications) as well as the generation of both fine-grained (more reusable) services and coarse-grained services that generate immediate value to the businesses.

The technique chosen by each SIM can influence how each classification perspective is addressed. Perspectives such as Participant Concerns, Context of Transactions, Service Description, Behavior Model, Information Model, and Service Dependency are correlated with the technique. The Service Value perspective is influenced by the input used by the technique, but not by the technique itself, and perspectives such as Service Granularity, Type of Conversation, and Quality Attributes elicitation do not seem to be correlated with the employed techniques. With these findings, we intend to help practitioners understand the consequences of using the software engineering techniques employed by different SIMs and to encourage researchers to promote improvements in this field by combining techniques or creating new ways to address the service identification perspectives.