1 Introduction

Modern society is increasingly dependent on large-scale IT infrastructures. At the same time, the latest IT challenges impose higher levels of reliability and efficiency on computer systems. Because of the large increase in size and complexity of these systems, IT operators are increasingly challenged while performing tedious administration tasks manually. In recent years, this has sparked considerable interest in the study of self-managing and autonomic computing systems to improve the efficiency and responsiveness of IT services. While many static algorithmic solutions have been proposed, these automated solutions often show limitations in terms of adaptiveness and scalability. The presence of large data volumes in different modalities motivates the investigation of intelligent learning systems, able to adapt their behavior to new observations and situations.

Artificial Intelligence for IT Operations (AIOps) investigates the use of Artificial Intelligence (AI) for the management and improvement of IT services. AIOps relies on Machine Learning, Big Data, and analytic technologies to monitor computer infrastructures and provide proactive insights and recommendations to reduce failures, improve mean-time-to-recovery (MTTR) and allocate computing resources efficiently [3]. AIOps offers a wide, diverse set of tools for several applications, from efficient resource management and scheduling to complex failure management tasks such as failure prediction, anomaly detection and remediation [13, 23]. However, being a recent and cross-disciplinary field, AIOps is still a largely unstructured research area. The existing contributions are scattered across different conferences and apply different terminology conventions. Moreover, the high number of application areas makes it difficult to search for and collect relevant papers. Some previous systematic works only treat single tasks or subareas inside AIOps [20, 31]. This motivates the need for a complete and updated study of AIOps contributions.

In this paper, we present an in-depth analysis of AIOps to address these limitations. We have identified and extracted over 1000 AIOps contributions through a systematic mapping study, enabling us to delineate common trends, problems and tools. First, we provide an in-depth description of the methodology followed in our mapping study (Sect. 2), reporting and motivating our planning choices regarding problem definition, search, selection and mapping. Then, we present and discuss the results drawn from our study, including the identification of the most common topics, data sources, and target components (Sect. 3). Finally, Sect. 4 summarizes the outcomes and conclusions of this work.

2 Methodology

2.1 Systematic Mapping Studies

A systematic mapping study (SMS) is a research methodology widely adopted in many research areas, including software engineering [34]. The ultimate goal of an SMS is to provide an overview of a specific research area, to obtain a set of related papers and to delineate the trends present in that area. Relevant papers are collected via predefined search and selection techniques, and research trends are identified by categorizing the identified papers along different aspects, e.g. topic or contribution type. We choose to perform an SMS because we are interested in gathering contributions and obtaining statistical insights about AIOps, such as the distribution of works in different subareas and the presence of temporal trends for particular topics. SMSs have also been shown to increase the effectiveness of follow-up systematic literature reviews [34]. To this end, we have also used our systematic mapping study to collect references for a survey on failure management in AIOps, published separately.

2.2 Planning

According to the step outline followed in [34], a systematic mapping study is composed of:

  • Formulation, i.e. express the goals intended for the study through research questions. Equally important is to clearly define the scope of investigation;

  • Search, i.e. define strategies to obtain a sufficiently high number of papers within the scope of investigation. This comprises the selection of one or more search strategies (database search, manual search, reference search, etc.);

  • Selection (or screening), i.e. define and apply a set of inclusion/exclusion criteria for identifying relevant papers inside the search result set;

  • Data Extraction and Mapping, i.e. gather the information required to map the selected papers into predefined categorization scheme(s). Finally, results are presented in graphical form, such as histograms or bubble plots.

The next sections illustrate and motivate our choices regarding these four steps for our systematic mapping study in AIOps.

2.3 Formulation

The main goal of this mapping study is to identify the extent of past research in AIOps. In particular, we would like to identify a representative set of AIOps contributions which can be grouped based on the similarity of goals, employed data sources and target system components. We also wish to understand the relative distribution of publications within these categories and the temporal implications involved. Formally, we articulate the following research questions:

  • RQ1. What categories can be observed while classifying AIOps contributions in scientific literature?

  • RQ2. What is the distribution of papers in such categories?

  • RQ3. Which temporal trends can be observed for the field of AIOps?

In terms of scope, we express the boundaries of AIOps as the union of goals and problems in IT Operations when addressed with AI techniques. To circumvent ambiguity about the term AI, we adopt an inclusive convention where we consider as AI both data-driven approaches, such as Machine Learning and data mining, and goal-based approaches, such as reasoning, search and optimization. However, we mostly concentrate our efforts on the first category due to its stronger presence and connection to AIOps methodologies (e.g. data collection).

2.4 Search and Selection

Selection Criteria. We illustrate the selection principles first, so that the discussion is clearer when we describe our result collection strategy, which combines search and selection. In terms of inclusion criteria, we define only one relevance criterion, based on the main topic of the document. Following from our discussion on scoping, this inclusion criterion comprises two necessary conditions:

  • The document references one or more AI methods. These mentions can be part of either the implementation or the discussion/analysis (e.g. in a survey). Any mention of AI algorithms employed by others (i.e. mentioned in the related work section or as a baseline comparison) that is not strictly the focus of the document is not considered valid;

  • The document applies its concepts to some kind of IT system management. We therefore exclude papers with no specific target domain or with a target domain outside of IT Operations.

In terms of exclusion criteria, we define the following as exclusion rules:

  • The language of the document is not English;

  • The document is not accessible online;

  • The document does not belong to the following categories: scientific article (conference paper, journal article), book, white paper;

  • The main topic of the document is one of the following: cybersecurity, industrial process control, cyber-physical systems, and optical sensor networks.

For the special case of survey and review papers, we consider them relevant while carrying out our mapping study, but we then exclude them from our final result set: these articles are useful for finding other connected works through references, but they do not constitute novel contributions to the field.

Table 1. The two keyword sets obtained via PICO used for database search.

Database Search. For the search process, database search represents the first and most important step, as it aims to provide the highest number of results and perform an initial screening of irrelevant papers. We perform database search in three steps: keywording, query construction and result polling. For keywording we use the PICO technique presented in [34] to derive a set of keywords for AI and a set of keywords for IT Operations. The keywords are listed in Table 1. Then, following our scoping considerations, we construct queries so that they return results where both AI and IT Operations are present. In particular, we apply logical conjunction of keywords across all combinations of the two keyword sets (e.g. “logistic regression” and “cloud computing”). This helps enforce precision in our search results. For keywords with synonyms and abbreviations, we allow all equivalent expressions via OR disjunction. We also perform general search queries, related to the topic as a whole (e.g. “AIOps”). Finally, we group some queries with common terms to reduce the number of queries.
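The query construction step can be sketched as follows. This is a minimal illustration, not the tooling used in the study: apart from “logistic regression” and “cloud computing”, the keywords are hypothetical placeholders for the entries of Table 1, and the quoted AND/OR syntax is a generic Boolean form that may need adapting to each search database.

```python
from itertools import product

# Hypothetical excerpts of the two keyword sets. Each inner list holds the
# equivalent expressions (synonyms, abbreviations) of one keyword.
AI_KEYWORDS = [
    ["logistic regression"],
    ["machine learning", "ML"],
]
IT_OPS_KEYWORDS = [
    ["cloud computing"],
    ["data center", "datacenter"],
]


def or_group(variants):
    """Join equivalent expressions with OR, quoting each phrase."""
    quoted = [f'"{v}"' for v in variants]
    if len(quoted) == 1:
        return quoted[0]
    return "(" + " OR ".join(quoted) + ")"


def build_queries(ai_keywords, ops_keywords):
    """Return one conjunctive query per (AI keyword, IT Operations keyword) pair."""
    return [
        f"{or_group(ai)} AND {or_group(ops)}"
        for ai, ops in product(ai_keywords, ops_keywords)
    ]


queries = build_queries(AI_KEYWORDS, IT_OPS_KEYWORDS)
for query in queries:
    print(query)
```

Taking the Cartesian product of the two sets guarantees that every query contains one AI term and one IT Operations term, which is what enforces precision; the number of queries grows with the product of the set sizes, hence the grouping of queries with common terms.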

We select three online search databases that are appropriate for the scope of investigation: IEEE Xplore, ACM Digital Library and arXiv. For each query we restrict our analysis to the top 2000 results returned. We aggregate the results from all searches into one large set of papers, removing duplicates and annotating each item with the corresponding search metadata (e.g. number of hits, index position in the corresponding searches, etc.). The result of this step consists of 83817 unique articles. For each item we collect the title, authors, year, publication venue, contribution type and citation count (from Google Scholar).
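A possible way to aggregate the per-query results is sketched below. The `Paper` record, the `normalize` key and the `(title, year)` input format are assumptions made for illustration; the text does not specify how duplicates were detected.

```python
from dataclasses import dataclass, field


@dataclass
class Paper:
    title: str
    year: int
    hits: int = 0                                   # number of result lists containing the paper
    positions: list = field(default_factory=list)   # index of the paper in each of those lists


def normalize(title):
    """Case- and whitespace-insensitive duplicate key (an assumption of this sketch)."""
    return " ".join(title.lower().split())


def aggregate(results_per_query, top_n=2000):
    """Merge per-query result lists into one de-duplicated collection.

    results_per_query: list of result lists, each a ranked list of (title, year).
    Only the top_n results of each query are considered.
    """
    merged = {}
    for results in results_per_query:
        for position, (title, year) in enumerate(results[:top_n]):
            paper = merged.setdefault(normalize(title), Paper(title=title, year=year))
            paper.hits += 1
            paper.positions.append(position)
    return merged


# Toy input: two queries that share one paper under slightly different spellings.
merged = aggregate([
    [("A Study", 2001), ("B Study", 2010)],
    [("a  study", 2001)],
])
```

The `hits` and `positions` annotations kept for each paper are the kind of search metadata that a later ranking step can use.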

Preliminary Filtering and Ranking-Based Selection. In the filtering step we start improving the quality of our selection of papers. First, papers are automatically excluded based on publication venue, for those venues that are clearly irrelevant for topic reasons (e.g. meteorology). We also exclude based on the year of publication (year \({<}1990\)) as it precedes the advent of large-scale IT services. By doing so, we can exclude approximately 8000 elements.

Usually at this point, a full-text analysis would be performed on all the available papers to screen relevant contributions using the above cited selection rules. Although we partly filtered results, it is still not feasible to perform an exhaustive selection analysis, even as simple as filtering by title. It is also impractical to attempt an automated selection by content, as it is not clear how to perform an efficient, high-recall, high-precision text classification without supervision. Therefore, before proceeding with the rest of the search and selection steps, we apply a ranking procedure on these intermediate results, so that we can prioritize investigation of more relevant papers. We apply the exclusion and inclusion rules of Sect. 2.4 to the papers examined in ranking order.

This approximate procedure, however, raises the question of when it is convenient to stop our selection and discard the remaining items. To solve this, we develop a new approach from our observations of ranked items. We base the method on the following assumption: a considerable ratio of relevant papers can be identified by ranking and selecting top results using different relevance criteria (conference, position index in the query result set, number of hits in all queries, etc.), but in this sorting scenario we also observe a long-tail distribution of relevant documents, i.e. some relevant papers appear in the last positions even after sorting with our relevance heuristics (see Fig. 1). This is consistent with the known impossibility of performing exhaustive systematic literature reviews and mapping studies, as completing the long tail provides fewer results at the expense of a larger research effort. We assume the ratio of relevant papers in the long tail to be constant and comparable in magnitude to the ratio of relevant papers when sampled randomly from the result set. Based on this assumption, we proceed as follows:

  • We start screening all papers in the result set, ranked according to different relevance heuristics (e.g. number of hits in queries), and we observe the ratio of relevant papers identified over time;

  • We examine the same papers in random order, and measure the same ratio;

  • When the two ratios are comparable, we conclude that we have reached the tail of the distribution of relevant papers and stop examining and selecting new papers.
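The stopping rule outlined in the steps above can be sketched as follows. This is one possible reading, not the exact procedure of the study: the sliding `window`, the `tolerance` factor used to decide when two ratios are "comparable", and the function names are all assumptions introduced for illustration.

```python
from collections import deque


def screen_until_tail(ranked_papers, is_relevant, baseline_ratio,
                      window=200, tolerance=1.5):
    """Screen papers in ranked order until the recent ratio of relevant
    papers drops to the level observed in random order.

    ranked_papers  : list of papers sorted by a relevance heuristic, best first
    is_relevant    : callable applying the inclusion/exclusion criteria
    baseline_ratio : ratio of relevant papers measured on a random-order sample
    window         : number of most recent decisions used for the current ratio
    tolerance      : ratios are "comparable" when current <= tolerance * baseline
    Returns (selected papers, number of papers examined).
    """
    recent = deque(maxlen=window)
    selected = []
    for examined, paper in enumerate(ranked_papers, start=1):
        relevant = is_relevant(paper)
        recent.append(relevant)
        if relevant:
            selected.append(paper)
        window_full = len(recent) == window
        if window_full and sum(recent) / window <= tolerance * baseline_ratio:
            return selected, examined
    return selected, len(ranked_papers)


# Toy example: 100 ranked papers, of which only the first 10 are relevant.
selected, examined = screen_until_tail(
    list(range(100)), lambda p: p < 10,
    baseline_ratio=0.05, window=10, tolerance=1.0,
)
```

In the toy example, screening stops after 20 papers, once a full window of 10 consecutive papers contains no relevant item; the remaining 80 papers are never examined.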

As sorting criteria, we use the number of hits in the search performed in the previous step, as well as other more complex heuristics, taking into account the index position in result sets and the number of citations. When examining a paper, we look into the full content to identify concepts related to our selection criteria previously illustrated. As done previously with search results, we gather relevant papers in one unique group. Using this stopping criterion, we conclude this selection step when we have identified 430 relevant papers.

Fig. 1. Estimated relevance probability for collected papers (y-axis), as a function of the index in the result set (x-axis, in thousands), with papers arranged (a) in random order and (b) using a relevance heuristic based on search hits. Thanks to the heuristic in (b), the majority of relevant papers can be identified by examining only a small fraction of the set (the top results on the left side).

2.5 Additional Search Techniques

The “early stopping” criterion previously described, while allowing a feasible and comprehensive selection strategy across thousands of contributions, has a natural tendency to discard relevant papers. We also expect to miss other relevant papers that are not present in the initial set of 83817 because they were not identified by our database search. To address these limitations, we apply other search techniques in addition to database search. Unlike before, we apply our selection criteria exhaustively to each document retrieved.

Reference Search. For each of the 430 relevant papers identified in the previous step, we search inside their cited references. In particular, we adopt backward snowball sampling [18]: we include in our relevant set all papers previously cited by a relevant paper whenever they fulfill the selection requirements mentioned above. By doing so, we obtain 631 relevant elements, for a total of 1061.
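A single pass of backward snowballing can be sketched as below. The callables `references_of` and `is_relevant` are hypothetical stand-ins for reference extraction and for the manual application of the selection criteria; whether the study iterated beyond one pass is not stated, so the sketch performs one pass only.

```python
def backward_snowball(seed_papers, references_of, is_relevant):
    """One pass of backward snowballing over the references of the seed papers.

    seed_papers   : identifiers of papers already judged relevant
    references_of : callable mapping a paper id to the ids it cites
    is_relevant   : callable applying the selection criteria to a paper id
    Returns the set of newly found relevant papers.
    """
    seeds = set(seed_papers)
    examined = set()
    added = set()
    for paper in seeds:
        for cited in references_of(paper):
            if cited in seeds or cited in examined:
                continue  # already selected, or already screened once
            examined.add(cited)
            if is_relevant(cited):
                added.add(cited)
    return added


# Toy citation graph with hypothetical identifiers.
refs = {"p1": ["a", "b"], "p2": ["b", "c", "p1"]}
new_papers = backward_snowball(
    ["p1", "p2"],
    references_of=lambda p: refs.get(p, []),
    is_relevant=lambda p: p in {"a", "c"},
)
```

Tracking the `examined` set ensures that a reference cited by several seed papers is screened only once.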

Conference Search. Reference search allows us to identify prominent contributions frequently mentioned by other authors. A drawback is the introduction of bias towards specific research groups and authors. We also observed that reference search rewards specific tasks and research fields, as they are typically cited more often. We therefore apply other search techniques to compensate for these effects. We perform a manual search by inspecting papers published in relevant conferences. These relevant conferences are identified via correlation with other relevant papers and have also been confirmed by experts in the field. We look at the latest 3 editions of each conference, in an effort to compensate for the sampling of dated papers performed by reference search. We obtain 5 more papers with this method.

Table 2. Sample k-shingles with relevance probability (and total occurrences).

Iterative Search Improvement. To conclude our search, we attempt to improve our initial guess on IT Operations keywords via analysis of the available text content (title and abstract). Using our relevant paper set as positive samples, we perform a statistical analysis to identify k-shingles (sequences of k consecutive tokens) that appear often in relevant documents (Table 2). In particular, we measure the document relevance probability given the set of shingles observed in the available text content. We choose \(k=1, \dots , 5\). We use these shingles as keywords to construct new queries along with the previously used AI keywords. Here we limit the collection to 20 results per query. Thanks to this step, we identify 20 new relevant papers. As a by-product, we become acquainted with frequently cited concepts and keywords in AIOps, which later prove useful for taxonomy and classification.
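The shingle statistics can be approximated with a simple frequency estimate, sketched below. The estimator (share of relevant documents among all documents containing a shingle), the whitespace tokenizer and the per-document counting are assumptions of this sketch; the text does not detail the exact statistical analysis used.

```python
from collections import Counter


def shingles(tokens, k_values=range(1, 6)):
    """Return the set of k-shingles (tuples of k consecutive tokens), k = 1..5."""
    return {
        tuple(tokens[i:i + k])
        for k in k_values
        for i in range(len(tokens) - k + 1)
    }


def shingle_relevance(documents):
    """Estimate P(relevant | shingle) from labelled documents.

    documents: iterable of (text, is_relevant) pairs.
    Returns {shingle: (probability, number of documents containing it)}.
    """
    total, relevant = Counter(), Counter()
    for text, is_relevant in documents:
        doc_shingles = shingles(text.lower().split())
        total.update(doc_shingles)          # each shingle counted once per document
        if is_relevant:
            relevant.update(doc_shingles)
    return {s: (relevant[s] / n, n) for s, n in total.items()}


# Toy corpus with hypothetical titles and labels.
stats = shingle_relevance([
    ("disk failure prediction", True),
    ("disk scheduling", False),
    ("failure prediction model", True),
])
```

Shingles with a high estimated probability and enough occurrences are natural candidates for new IT Operations keywords, while shingles seen in very few documents give unreliable estimates and are better filtered out by a minimum-occurrence threshold.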

2.6 Data Extraction and Categorization

After obtaining the result set of relevant papers (1086 contributions: 430 from database search and selection, 631 from reference search, 5 from conference search and 20 from iterative search improvement), we analyze the available information to draw quantitative results and answer our research questions. We describe here the data extraction process and the analysis techniques employed to gather insights and trends for the AIOps field.

First, we classify the relevant papers according to target components and data sources. Target components indicate a high-level piece of software or hardware in an IT system that the document tries to enhance (e.g. hard drive for hard disk failure prediction). We group components into five high-level categories: code, application, hardware, network and datacenter. Data sources provide an indication of the input information of the algorithm (such as logs, metrics, or execution traces). Data sources are categorized into source code, testing resources, system metrics, key performance indicators (KPIs), network traffic, topology, incident reports, logs and traces. The “AI Method” axis denotes the actual algorithm employed, with similar methods aggregated in bigger classes to avoid excessive fragmentation (e.g. ‘clustering’ may contain both k-means and agglomerative hierarchical clustering approaches). Table 3 presents a selection of papers from the result set with the corresponding target, source and category annotation.

Table 3. Selection of result papers grouped by data sources, targets and (sub)categories.

Then, we use the result set to infer a taxonomy based on tasks and target goals. The taxonomy is depicted in Fig. 2. We divide AIOps contributions into failure management (FM), the study of how to deal with undesired behavior in the delivery of IT services; and resource provisioning, the study of the allocation of energetic, computational, storage and time resources for the optimal delivery of IT services. Within each of these macro-areas, we further divide approaches into categories based on the similarity of goals. In failure management, these categories are failure prevention, online failure prediction, failure detection, root cause analysis (RCA) and remediation. In resource provisioning, we divide contributions into resource consolidation, scheduling, power management, service composition, and workload estimation. We further choose to expand our analysis of FM (red box of Fig. 2) by applying to this macro-area an additional subcategorization based on specific problems. Examples of subcategories are checkpointing for failure prevention, or fault localization for root cause analysis (see also Table 3).

Fig. 2. Taxonomy of AIOps as observed in the identified contributions.

Fig. 3. Left: distribution of AIOps papers in macro-areas and categories. Right: percent distribution of failure management papers by category in corresponding sub-categories.

Fig. 4. Published papers in AIOps by year and categories from the described taxonomy.

3 Results

We now discuss the results of our mapping study. We first analyze the distribution of papers in our taxonomy. The left side of Fig. 3 visualizes the distribution of identified papers by macro-area and category. Excluding papers treating AIOps in general (8), we observe that the majority of items (670, 62.1%) are associated with failure management (FM), with most contributions concentrated in online failure prediction (26.4%), failure detection (33.7%), and root cause analysis (26.7%); the remaining resource provisioning papers largely address resource consolidation, scheduling and workload prediction. On the right side, we can observe that the most common problems in FM are software defect prediction, system failure prediction, anomaly detection, fault localization and root cause diagnosis. To analyze the temporal trends present in the AIOps field, we measured the number of publications in each category by year of publication. The corresponding bar plot is depicted in Fig. 4. Overall, we observe a large and growing number of publications in AIOps. Failure detection has gained particular traction in recent years (71 publications for the 2018–2019 period), with a contribution size larger than the entire resource provisioning macro-area (69 publications in the same time frame). Failure detection is followed by root cause analysis (39) and online failure prediction (34), while failure prevention and remediation are the areas with the smallest number of attested contributions (11 and 5, respectively).
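The two aggregations behind these results (share of papers per category, and publications per category and year) can be sketched as follows. The records are hypothetical annotations used only to show the computation; they are not taken from the actual result set.

```python
from collections import Counter

# Hypothetical (year, category) annotations, for illustration only.
records = [
    (2018, "failure detection"),
    (2018, "root cause analysis"),
    (2019, "failure detection"),
    (2019, "scheduling"),
]


def category_distribution(records):
    """Return {category: (count, percentage of all papers)}."""
    counts = Counter(category for _, category in records)
    total = sum(counts.values())
    return {c: (n, round(100 * n / total, 1)) for c, n in counts.items()}


def yearly_counts(records):
    """Return {(year, category): count}, the data behind a per-year bar plot."""
    return Counter(records)


distribution = category_distribution(records)
per_year = yearly_counts(records)
```

The first function yields the kind of percentages shown in Fig. 3, and the second yields the per-year counts plotted in Fig. 4.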

4 Conclusion

In this paper, we presented our contribution towards better structuring the AIOps field. We planned and conducted a systematic mapping study by means of pre-established formulation, search, selection, and categorization techniques, thanks to which we collected more than 1000 contributions. Using our proposed taxonomy, we grouped them into several categories that differ substantially in terms of goals, data sources and target components. In our results section, we have shown that the majority of papers address failures in different forms. From a time perspective, we observed a generalized, growing research interest, especially for tasks such as anomaly detection and root cause analysis.