1 Introduction

With the rapid development of cloud computing technology, a substantial amount of multivariate time series data is generated and stored in microservice systems (Di Francesco et al., 2017). The microservice architecture decomposes an application into multiple small services that work interdependently, creating a streamlined delivery pipeline that speeds up development and maintenance and provides greater flexibility. The distributed nature of microservices also gives them high scalability.

In real-world applications, a critical task is to detect anomalies in the multivariate time series data produced by microservice systems. Diverse anomalies arise from the cooperation of microservice components, such as memory leaks, network delays, and high CPU usage. Currently, most anomaly detection methods target one-class anomalies (Wen et al., 2022; Chen et al., 2022; Song et al., 2023). Compared with traditional anomaly detection, multi-classification of the diverse anomalies in a microservice architecture is more complex. Convolutional neural networks (CNNs) are commonly used in general time series classification, but they may perform poorly because they cannot fully capture spatial information and pay little attention to the correlations between convolutional channels during feature extraction (Fauvel et al., 2021). The deep learning network GDN (Deng & Hooi, 2021) uses graphs to model the spatial features of multivariate time series but does not consider temporal features. Multivariate time series have become a typical data type, and for multivariate streams both the temporal dependence and the correlation between observations should be considered. Thus, extracting both spatial and temporal features from monitoring data is the key challenge of multivariate time series anomaly detection.

The attention mechanism plays an important role in deep neural networks. Attention gives a model the ability to discriminate: among a large amount of information, it focuses on the information most critical to the current task (Woo et al., 2018). The attention mechanism improves the extraction of diverse features and makes neural network models more flexible.

An anomaly propagates along the connections among microservices and eventually affects the performance of the whole system if the fault cannot be located in time. Log-based methods (Yang et al., 2021) detect and locate bugs based on log parsing. Although they can discover more informative causes, they are hard to run in real time and require abnormal information to appear in log files. Thus, efficiently diagnosing a runtime system fault, identifying it, and locating it after anomaly detection is a great challenge. The graph structure provides an idea for fault diagnosis: we use a graph model and localize root causes with an algorithm similar to a random walk.

Our main motivation is to accurately detect anomalies in the multivariate time series monitoring data of microservice scenarios. Our main research questions are as follows: (1) How can we improve the accuracy of anomaly multi-classification in microservice systems, where the data collected from monitors are generally stored as multivariate time series? (2) When anomalies occur in the monitoring data, how can we effectively identify the dimensions that contribute most to the anomaly in order to better locate the root cause? We propose a method that classifies the various anomaly events in monitored microservice data in the cloud and identifies the abnormal time series most likely to be the cause of each anomaly in the system. The proposed PCAC model includes two parts, anomaly detection and fault localization, as shown in Fig. 1. The anomaly detection part contains two modules: feature capturing and anomaly multi-classification. First, we construct a convolutional structure with two parallel branches. To capture the association features in microservice data, one branch extracts channel features with attention and the other extracts spatial features with attention. Anomaly multi-classification is then performed on the combined features. The fault localization part includes anomalous graph construction and causal inference modules, in which we use causal inference methods to learn the fault propagation paths produced by the graph method.

Fig. 1

PCAC module structure

The main contributions of this work are summarized as follows:

  • To address the difficulty of extracting spatial and temporal features from multivariate time series, we design a parallel convolution architecture, which better captures the spatio-temporal dependencies of multivariate time series simultaneously and achieves better anomaly detection for microservice systems.

  • To solve the problem of incomplete feature extraction in ordinary CNNs, we propose a method with channel and spatial attention mechanisms to extract features in subnetworks independently and reduce the loss of feature representations.

  • To effectively determine the fault cause after anomaly detection, we analyze and compare several causal inference-based cause localization methods to identify the specific fault service.

  • We conduct experiments against eight state-of-the-art baseline methods on six public microservice datasets, achieving improvements of 37.9% in average macro-F1 and 4.4% in average micro-F1.

Section 2 reviews different anomaly detection methods for microservice systems. Section 3 introduces the proposed model in detail. Section 4 evaluates the effectiveness of the model through comparative experiments and ablation experiments, analyzes the abnormal cause, and diagnoses the fault service. Section 5 summarizes the work and presents potential future research.

2 Related work

The study of anomaly detection has been carried out for several decades and is an active research area gaining increasing attention in deep learning. At the same time, many anomaly multi-classification methods for microservice systems have been proposed. We mainly review related work on statistics-based, machine learning-based, deep learning-based, and root cause localization methods.

2.1 Statistics-based methods

Generalized autoregressive conditional heteroskedasticity (GARCH) (Engle, 1982) is a method for modeling the volatility of monitoring microservice metrics through their conditional mean and conditional heteroscedasticity. It calculates each point’s anomaly score, clusters the points, and then detects multiple categories of anomalies in the microservice system. Principal component analysis (PCA) (Shyu et al., 2003) extracts data features by dimensionality reduction and performs anomaly classification on the low-dimensional data.

2.2 Machine learning-based methods

Support vector machine (SVM) (Kriegel et al., 2011) is a binary classification model. Multiple binary classifiers are constructed for the microservice system, and the predicted probabilities are obtained by comparing the classifiers on the test set. The goal is to find a maximum-margin hyperplane that separates points of different classes. The K-nearest neighbor (KNN) algorithm (Kiss et al., 2014) predicts the label of a data point from the labels of its K nearest pre-labeled neighbors. A decision tree (Lewis, 2000) is a classification model built from the features and anomaly labels that encodes the relations among the data points.

2.3 Deep learning-based methods

Autoencoder (AE) (Fan et al., 2018; Xin et al., 2023) is a common neural network model that consists of an encoder and a decoder. The encoder extracts features through its neural architecture, and the decoder, which mirrors the encoder, converts the encoding back to the original data. After the AE is trained, the encoded features are used to train a classifier for anomaly classification. A CNN (convolutional neural network) (LeCun et al., 1998) extracts features of microservice system monitoring metrics with convolution kernels applied over sliding windows and performs classification using the cross-entropy function. An FCN (fully convolutional network) (Long et al., 2017) replaces the fully connected layers of a convolutional neural network with convolutional layers. LSTM (long short-term memory) (Graves & Graves, 2012) is a variant of the RNN (recurrent neural network) that mitigates the problems of vanishing and exploding gradients to some extent. The output of the LSTM passes through a fully connected layer and a softmax function to produce the probability distribution over the categories. TapNet (Zhang et al., 2020; Xu et al., 2022) stacks LSTM and CNN layers to model microservice system monitoring metrics and classifies them via softmax. MTEXCNN (Assaf et al., 2019) employs three cascaded 2D convolutions to extract spatial information, followed by a 1D convolution to extract temporal information and classify microservice system monitoring metrics. TranAD (Tuli et al., 2022) uses transformer-based adversarial training to detect anomalies, while GDN (Deng & Hooi, 2021) employs graph structure learning to capture the relationships between different sensors. Both TranAD and GDN are relatively novel anomaly detection methods with excellent performance. We apply the same modifications as those used for LSTM above to adapt these two models for multi-class anomaly classification, enabling a comparison with our proposed model.

2.4 Root cause localization method based on causal inference

The dependencies between services in a microservice application may cause the propagation of faults. Root cause localization helps our anomaly multi-classification models diagnose the source of anomalies and find the most fundamental reason for their occurrence. Based on fault propagation paths, graph-based methods have been developed to locate the root cause of faults. For example, AutoMap (Deng & Hooi, 2020) treats the different components in the system as individual nodes whose interdependencies form a graph, and then finds the root cause with the PC (Spirtes et al., 2000) and PageRank (Page et al., 1999) algorithms. CauseInfer (Chen et al., 2016) uses the PC algorithm to build a causal graph and then applies Breadth First Search (BFS) to infer the root cause on the causal graph. MicroDiag (Wu et al., 2021) uses the linear non-Gaussian acyclic model (LiNGAM) (Hyvärinen et al., 2010) to learn the fault propagation relationships between microservices, builds a fault propagation graph, and applies PageRank to the propagation graph for root cause localization. The above methods ignore the fault patterns of entity measurement data. However, faults in the measurement data of entities during a system fault may affect the final root cause localization results (Dongjie et al., 2023). Thus, capturing the fault patterns of measurement data and improving localization accuracy remain challenges in root cause localization.

3 Method

3.1 Overall architecture

The architecture of the proposed PCAC is shown in Fig. 2. PCAC is composed of four modules: feature capturing (1), anomaly multi-classification (2), anomalous graph (3), and causal inference (4). The input T of the model consists of multivariate time series system metrics from microservice data monitoring. First, the input enters the feature capturing module (1), which consists of a channel attention branch and a spatial attention branch. The two branches process the data in parallel and produce a weighted feature map. Module (1) solves the problem of incomplete feature extraction and captures spatio-temporal dependencies. Then, anomaly multi-classification is achieved by flattening and softmax operations on the attention map in module (2). Module (2) uses the cross-entropy loss to update the parameters and reduce the loss of features. Based on the multi-classification results, root cause analysis is carried out through anomalous graph generation in module (3). Finally, module (4) outputs the probabilities of faulty services for cause localization, effectively avoiding fault propagation among microservices.

Fig. 2

Architecture of PCAC

3.2 Feature capturing: parallel convolution with attention

Data is input into the upper and lower branches simultaneously. In the upper branch, the data passes through two one-dimensional convolutions with corresponding activation functions before entering the channel attention function, which adjusts the importance of each channel by learning attention weights and strengthens the ability to capture correlations between channels. In the lower branch, the spatial attention branch, the data undergoes the same convolutions and then enters the spatial attention function, which weights different positions so that the model can better focus on critical local features of the system metrics. Finally, the feature maps output by the two branches are combined to obtain the final feature map.

Conv1D denotes a one-dimensional convolution, and ReLU denotes the ReLU activation function. The feature map denotes the weighted feature map obtained by fusing the features from the channel attention and spatial attention branches. For a given input feature tensor F, we compute the channel attention map \(M_c(F)\) and the spatial attention map \(M_s(F)\) in two separate branches, and then compute the attention map M(F) as follows:

$$\begin{aligned} M(F)=M_c(F)+M_s(F) \end{aligned}$$
(1)
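As an illustration only, the following minimal PyTorch sketch shows the two parallel Conv1D branches described above and the fusion of the two attention maps by addition following Eq. (1). The hidden width, kernel sizes, and the use of identity placeholders for the attention modules are assumptions for readability; concrete sketches of the channel and spatial attention follow in the next two subsections.

```python
import torch
import torch.nn as nn

class ParallelConvBranches(nn.Module):
    """Sketch of the feature-capturing module: two parallel Conv1D branches with attention."""

    def __init__(self, in_channels, hidden_channels=64,
                 channel_attention=None, spatial_attention=None):
        super().__init__()
        # Upper branch: Conv1D -> ReLU -> Conv1D -> ReLU, followed by channel attention
        self.upper = nn.Sequential(
            nn.Conv1d(in_channels, hidden_channels, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(hidden_channels, hidden_channels, kernel_size=3, padding=1), nn.ReLU(),
        )
        # Lower branch: the same convolutional stem, followed by spatial attention
        self.lower = nn.Sequential(
            nn.Conv1d(in_channels, hidden_channels, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(hidden_channels, hidden_channels, kernel_size=3, padding=1), nn.ReLU(),
        )
        # Placeholders until concrete attention modules are plugged in (assumption)
        self.channel_attention = channel_attention or nn.Identity()
        self.spatial_attention = spatial_attention or nn.Identity()

    def forward(self, x):                                # x: (batch, channels, time)
        m_c = self.channel_attention(self.upper(x))      # M_c(F)
        m_s = self.spatial_attention(self.lower(x))      # M_s(F)
        return m_c + m_s                                 # M(F) = M_c(F) + M_s(F), Eq. (1)
```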

Channel attention

The process of channel attention based on the attention mechanism (Fauvel et al., 2021) is shown in detail in Fig. 3. To aggregate the feature map in each channel, we apply two global pooling operations to the feature F and produce the channel attention feature \(M_c(F)\). As shown in Fig. 3, channel attention mainly includes a shared multi-layer perceptron (MLP) network, a maximum pooling (MaxPool), and an average pooling (AvgPool).

Fig. 3

Channel attention

First, MaxPool and AvgPool are used to extract feature information, which is input into the shared MLP network to obtain MaxPool_OUT and AvgPool_OUT. These two outputs are then combined and activated by the sigmoid function to obtain the attention score matrix. Finally, the attention score matrix is multiplied with the original input feature tensor F to obtain the channel attention map \(M_c(F)\). The calculation is as follows:

$$\begin{aligned} M_c(F)=F\times \sigma (MLP(AvgPool(F))+MLP(MaxPool(F))) \end{aligned}$$
(2)

where \(\sigma\) is the sigmoid function and \(\times\) is matrix multiplication. Each channel has its own attention score because channel attention assigns a weight to its feature information. Using the sigmoid function ensures that the scores of different channels are independent. The attention score matrix multiplies the original input, assigning different weights according to importance; in this way the original data is filtered and selected.
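A minimal PyTorch sketch of the channel attention in Eq. (2) is given below: global average and max pooling over the time axis, a shared MLP, sigmoid scoring, and rescaling of the input tensor. The reduction ratio and hidden sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Sketch of Eq. (2): M_c(F) = F * sigmoid(MLP(AvgPool(F)) + MLP(MaxPool(F)))."""

    def __init__(self, channels, reduction_ratio=16):
        super().__init__()
        # Shared MLP applied to both pooled channel descriptors
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction_ratio),
            nn.ReLU(),
            nn.Linear(channels // reduction_ratio, channels),
        )
        self.sigmoid = nn.Sigmoid()

    def forward(self, f):                                 # f: (batch, channels, time)
        avg_out = self.mlp(f.mean(dim=-1))                # AvgPool over time -> MLP
        max_out = self.mlp(f.max(dim=-1).values)          # MaxPool over time -> MLP
        scores = self.sigmoid(avg_out + max_out)          # per-channel attention scores
        return f * scores.unsqueeze(-1)                   # M_c(F): reweight each channel of F
```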

Spatial attention

The proposed model introduces a spatial attention mechanism (Fauvel et al., 2021) to enhance the ability to capture features in different spatial locations. As shown in Fig. 4, the spatial attention map \(M_s(F)\) is calculated as follows:

$$\begin{aligned} M_s(F)=F\times \sigma (Conv1D(ReLU(Conv1D(F)))) \end{aligned}$$
(3)

where \(\sigma\) is the softmax function. The spatial attention mechanism assigns different weights to features at different positions, considering the relationship between the score at each position and the scores at other positions. Using the softmax function ensures that the scores sum to 1, thereby ensuring global consistency.
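A minimal PyTorch sketch of the spatial attention in Eq. (3) follows: two Conv1D layers produce a score per time position, softmax normalizes the scores over positions, and the input is reweighted. The kernel size and the choice to collapse the channel dimension to a single score per position are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Sketch of Eq. (3): M_s(F) = F * softmax(Conv1D(ReLU(Conv1D(F))))."""

    def __init__(self, channels, kernel_size=3):
        super().__init__()
        padding = kernel_size // 2
        self.conv1 = nn.Conv1d(channels, channels, kernel_size, padding=padding)
        self.conv2 = nn.Conv1d(channels, 1, kernel_size, padding=padding)  # one score per position
        self.relu = nn.ReLU()

    def forward(self, f):                                      # f: (batch, channels, time)
        scores = self.conv2(self.relu(self.conv1(f)))          # (batch, 1, time)
        weights = torch.softmax(scores, dim=-1)                # scores over positions sum to 1
        return f * weights                                     # M_s(F): reweight each position of F
```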

Fig. 4

Spatial attention

3.3 Anomaly multi-classification

The final attention map from the parallel convolution with attention module is fed into the flatten and softmax functions. The data is converted into a one-dimensional vector by the flatten function and then mapped to probabilities p for multi-classification by the softmax function. That is, the data is labeled as normal or abnormal, and abnormal data is further labeled as a specific type of anomaly based on the probabilities, such as memory leak, network delay, or CPU hog, which usually occur during service invocation in a microservice system.

In the training phase, the loss is calculated with the cross-entropy loss function defined in Eq. 4 and used to update the parameters. Using cross-entropy as the optimization objective allows the model to continuously adjust its parameters during training to minimize the difference between the predicted probabilities and the actual labels.

$$\begin{aligned} loss=-\frac{1}{n}\sum \limits _{i=0}^{n-1}\sum \limits _{j=0}^{m-1}y_{ij}\log (p_{ij}) \end{aligned}$$
(4)

where n is the number of training samples, m is the number of classes, y represents the actual label, and p represents the probability of the label predicted by the model.
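The following is a minimal sketch of the classification head and the cross-entropy loss in Eq. (4). Note that PyTorch's nn.CrossEntropyLoss applies log-softmax internally, so the head below outputs raw logits; the class count, feature size, and dummy tensors are assumptions, not the paper's values.

```python
import torch
import torch.nn as nn

num_classes = 4            # e.g. normal, CPU hog, memory leak, network latency (assumed)
feature_dim = 64 * 30      # flattened size of the attention map (assumed)

head = nn.Sequential(
    nn.Flatten(),                          # flatten the attention map into a 1-D vector
    nn.Linear(feature_dim, num_classes),   # logits for each class
)
criterion = nn.CrossEntropyLoss()          # implements Eq. (4) over softmax probabilities

features = torch.randn(128, 64, 30)        # dummy batch of attention maps
labels = torch.randint(0, num_classes, (128,))
loss = criterion(head(features), labels)
loss.backward()                            # gradients used to update the model parameters
```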

In the testing phase, the test data is processed the same way as in the training phase, but the model is not updated. Instead, the macro-F1 and micro-F1 scores of the model are calculated.

3.4 Fault localization

Once anomalies are detected, the fault localization engine in the microservice system starts to trace the execution paths and locate faulty services. The engine consists of two main procedures: anomalous graph construction and causal inference. The fault localization procedure is as follows:

  • Step 1: Select a causal inference algorithm and construct a directed acyclic graph (DAG) with minimum information loss as the anomalous graph G from the data after anomaly multi-classification.

  • Step 2: Use the PageRank algorithm on G to compute the score of each anomalous node.

  • Output: Anomalous graph G and probability of each anomalous node.

We choose four common causal inference algorithms to construct causal graphs in order to find the best one for root cause analysis: the Peter-Clark (PC) (Spirtes et al., 2000) algorithm, the Greedy Equivalence Search (GES) (Chickering & Boutilier, 2003) algorithm, and the linear non-Gaussian acyclic model (LINGAM) (Hyvärinen et al., 2010), which includes ICA-LINGAM (Shimizu et al., 2006) and Direct-LINGAM (Shimizu et al., 2011). To locate the faulty services, the graph centrality algorithm PageRank (Page et al., 1999) is applied to the anomalous graph and outputs the probability of each anomalous node. In the root cause inference phase, these probabilities serve as the basis for diagnosing which microservice is most likely to cause faults.
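A minimal sketch of the two localization steps is shown below, assuming the `lingam` package for Direct-LiNGAM and `networkx` for PageRank; the paper does not specify libraries, and the handling of edge weights, edge directions, and the input matrix shape (rows = time points, columns = microservice metrics) are simplifying assumptions.

```python
import numpy as np
import networkx as nx
import lingam  # assumed causal discovery package providing DirectLiNGAM

def localize_root_cause(X, service_names, top_k=5):
    # Step 1: learn a DAG over the anomalous metrics (Direct-LiNGAM as one of the four candidates)
    model = lingam.DirectLiNGAM()
    model.fit(X)                                   # X: (n_samples, n_services)
    adjacency = np.abs(model.adjacency_matrix_)    # weighted adjacency of the anomalous graph G

    # Step 2: run PageRank on G to score each anomalous node
    G = nx.from_numpy_array(adjacency, create_using=nx.DiGraph)
    scores = nx.pagerank(G)

    # Output: the top-k services most likely to be the root cause
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    return [(service_names[i], round(p, 3)) for i, p in ranked]
```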

4 Experiments

4.1 Datasets and experimental setup

Datasets

Sock Shop is a widely used microservice benchmark designed to test and evaluate microservices technology. It consists of 13 microservices, from which we mainly choose the front-end, catalogue, users, orders, payment, and shipping services. The microservice architecture of Sock Shop is shown in Fig. 5. The complex connections between these services make the multi-classification task more challenging for the multivariate time series data in the microservice system. We deploy Sock Shop using Kubernetes on multiple virtual machines (VMs) in the cloud. The Kubernetes cluster includes one master node and three worker nodes. We deploy the open-source monitoring and visualization tools Prometheus and Grafana on the master node to monitor the application and collect data. Furthermore, we use the load generation tool Locust on the master node to simulate workloads for the microservice application. All services of Sock Shop are automatically deployed on nodes allocated to different VMs.

Fig. 5

The microservice architecture of Sock Shop

To simulate realistic scenarios, we inject three types of anomalies into our experiment: CPU hog, memory leak, and network latency (Mariani et al., 2018; Chen et al., 2015). The Pumba tool is used to simulate network failures, and Docker container resources are stress-tested to induce anomalies. Each anomaly lasts 1 to 5 min, while the application runs normally for 10 to 30 min, and the process is repeated at least five times for each anomaly. Data is collected in real time every 5 s according to the Prometheus configuration and includes service-level and resource-level data. At the service level, the latency of each service is recorded. At the resource level, metrics related to container resources are collected, such as CPU usage, memory usage, and network transmit bytes.

Table 1 shows the details of the six microservice datasets, including the sizes of the training and test sets and the number of feature dimensions. The ratio of training to test size is seven to three. In addition, the proportions of the three anomaly types are reported.

Table 1 The details of datasets used in experiments

Metrics

We use macro-F1 and micro-F1 scores as evaluation metrics to verify the performance of the model and compare it with other baseline anomaly detection methods. Both macro-F1 and micro-F1 are commonly used to evaluate models in multi-classification scenarios. Macro-F1 averages the per-class precision and recall scores, treating all classes as equally important. Micro-F1 is suitable for datasets with an unbalanced class distribution.
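As a brief illustration, both metrics are available directly in scikit-learn through the averaging modes of f1_score; the label vectors below are toy examples, not the paper's data.

```python
from sklearn.metrics import f1_score

y_true = [0, 0, 1, 2, 2, 3]          # actual anomaly classes (toy example)
y_pred = [0, 1, 1, 2, 2, 3]          # predicted anomaly classes (toy example)

macro_f1 = f1_score(y_true, y_pred, average="macro")  # unweighted mean of per-class F1
micro_f1 = f1_score(y_true, y_pred, average="micro")  # global counts, robust to class imbalance
print(macro_f1, micro_f1)
```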

Baseline methods

We compare our model with different types of anomaly multi-classification models to validate its effectiveness. These include (i) the classical machine learning models GaussianNB, KNN, SVM, and SGD, and (ii) the deep learning models CNN, DNN, LSTM, OmniAnomaly, the transformer-based TranAD, and the graph structure-based GDN. TranAD and GDN are relatively new anomaly detection models.

Experimental settings

All experiments are implemented in Python 3.7.11 and PyTorch 1.6.0 using a single NVIDIA GeForce 940MX (12 G) GPU, an Intel(R) Core(TM) i7-7500U CPU @ 2.70 GHz, and 12 G of RAM. The convolutional kernel sizes in Conv1D are 3 and 5, the number of epochs is 80, the batch size is 128, and the neural networks are optimized with the Adam optimizer with an initial learning rate of \(10^{-4}\).
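For concreteness, the following sketch reproduces the stated training configuration (Adam, learning rate 1e-4, batch size 128, 80 epochs); the `model` and the synthetic dataset are placeholder assumptions, not the PCAC network or the paper's data.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

model = torch.nn.Linear(10, 4)                          # placeholder for the PCAC network (assumed)
train_dataset = TensorDataset(torch.randn(512, 10),     # placeholder monitoring features (assumed)
                              torch.randint(0, 4, (512,)))

loader = DataLoader(train_dataset, batch_size=128, shuffle=True)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = torch.nn.CrossEntropyLoss()

for epoch in range(80):
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
```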

4.2 Main results

We compare PCAC with eight baseline methods on six microservice datasets in Table 2 and Fig. 6 in terms of macro-F1 and micro-F1. The best performance is bolded.

Table 2 Performance of baseline models and ours
Fig. 6

Performance comparison

Table 2 shows that PCAC achieves the highest macro-F1 and micro-F1 scores overall among the baseline methods on the six datasets. Furthermore, we provide the ranking of PCAC and all baseline methods on macro-F1. The specific micro-F1 ranking scores differ slightly from those of macro-F1, but the model ranking is the same. The ranking results show that our model exceeds the other methods, demonstrating that it is effective for multi-classification.

The detailed anomaly detection results of our method are shown in Fig. 7. On the catalogue, front-end, orders, payment, shipping, and users datasets (Fig. 7a–f), the detection error rates are only 1.95%, 3.18%, 2.78%, 1.59%, 2.86%, and 4.12%, respectively. The users dataset requires more accurate detection of CPU hog and memory leak anomalies. In summary, our method achieves an average false alarm rate of only 2.75% on the six datasets, indicating its effectiveness for detecting the three types of anomalies and providing a solid basis for the subsequent fault diagnosis.

Fig. 7

Confusion matrix of anomaly detection results under the different microservices

4.3 Ablation study

To investigate the impact of each component branch on the PCAC performance, we repeat the experiments without channel attention or spatial attention successively on the six datasets.

Channel attention

The macro-F1 and micro-F1 results of PCAC and of PCAC without channel attention (CA) are shown in Fig. 8. Both the macro-F1 and micro-F1 values of PCAC without CA decrease on all datasets. In particular, the decrease of 7.45% in macro-F1 and 2.10% in micro-F1 on the shipping dataset indicates that channel attention increases the model’s attention to specific channels and improves performance.

Fig. 8

Performance comparison between PCAC and PCAC without CA

Spatial attention

Similarly, as shown in Fig. 9, PCAC performs better than PCAC without spatial attention (SA) on most datasets. The macro-F1 and micro-F1 values decrease by 2.15% and 0.62% on the payment and front-end datasets, respectively, indicating that spatial attention is beneficial for capturing spatial information.

Fig. 9

Performance comparison between PCAC and PCAC without SA

4.4 Parameter sensitivity analysis

Batch_size

This parameter represents the batch size. Sensitivity analysis of the batch size is helpful for hyperparameter tuning. We apply different batch sizes on the catalogue dataset used in the experiment. The experimental results are shown in Table 3.

Table 3 Results of different batch_size on catalogue dataset

Table 3 shows that the macro-F1 score is best when batch_size is 128, but the macro-F1 scores produced by other batch sizes differ only slightly. This indicates that the batch size has little effect on the classification results of the model: performance decreases slightly at a batch_size of 256 and is optimal at 128. Thus, the batch_size can be adjusted dynamically; in practice, we can choose different batch sizes to make full use of computational resources without sacrificing computational efficiency.

Reduction_ratio

This parameter controls the dimension reduction ratio of the fully connected layer in the channel attention module. For example, when reduction_ratio is 16, the output dimension of the fully connected layer is \(\nicefrac {1}{16}\) of the input dimension. We also perform this analysis on the catalogue dataset used in the experiment, and the experimental results are shown in Table 4.

Table 4 Results of different reduction_ratio on catalogue dataset

Table 4 shows that the macro-F1 score is highest when the reduction_ratio is 16, but it differs little from the macro-F1 scores obtained with other reduction_ratio values. Therefore, similar to batch_size, we can choose different reduction_ratio values according to the dimensionality of the dataset to balance the model’s performance and computational cost.

Parameter sensitivity experiments show that the two parameters batch_size and reduction_ratio in the model are stable, and their values do not have much influence on the results of anomaly detection. Therefore, the model proposed in this paper has strong robustness, and the performance does not fluctuate greatly with the values of the two parameters. In practical applications, we can adjust the values of batch_size and reduction_ratio according to the computational cost consideration.

4.5 Fault location

In this subsection, we select the front-end dataset as a sample to complete fault diagnosis based on the anomaly multi-classification results of our model. First, we apply the four common algorithms PC, GES, ICA-LINGAM, and Direct-LINGAM to find a directed acyclic graph corresponding to the anomalies with minimum information loss. Then, we use the PageRank algorithm to perform a random walk on the anomalous graph and calculate the probability of each anomalous node. Finally, based on the ranking of these probabilities, we analyze the most likely fault cause in the system.

The anomalous graphs generated by the Direct-LINGAM algorithm are shown in Table 5. The nodes represent seven microservices: 0 (front-end), 1 (user), 2 (catalogue), 3 (orders), 4 (carts), 5 (payment), and 6 (shipping).

Table 5 Anomalous graph

Different microservices may cause various anomalies. To obtain the microservice most fundamentally responsible for an anomaly and diagnose its source, we apply the PageRank algorithm to the anomalous propagation graphs in Table 5 to calculate the probability of an abnormality occurring at each node, and we select the top 5 nodes as the most likely root causes of the anomaly. The PageRank results are shown in Table 6.

Table 6 Probabilities of anomalous nodes

The table above shows that different microservices may cause various anomalies, and the fraction of exceptions attributed to each microservice varies. For example, for CPU anomalies, the most likely causes are orders and payment; for memory anomalies, front-end and shipping; and for latency anomalies, carts and front-end.

In a network, most failures are not caused by a single cause. In particular, in microservice systems, an exception in any service may lead to the failure of a series of related services, because the microservices call and affect each other. Table 6 reports the failure probability of each microservice. For example, network latency somewhere in the microservice system may be caused by the 4 (carts) and 0 (front-end) services, because the two services have a combined failure probability of 79%.

Metrics

To quantify the performance of each algorithm on a set of anomalies A, we use two widely used metrics: PR@k and AVG@k. PR@k represents the probability that the top k results given by an algorithm include the real root cause; a higher PR@k score, especially for small values of k, indicates that the algorithm correctly identifies the root cause of each anomaly a. AVG@k evaluates the overall performance of a method by computing the average of PR@k. They are defined as follows:

$$\begin{aligned} PR@k=\frac{1}{|A|}\sum _{a\in A}\frac{\sum _{i<k}\mathbb {1}\left( R^a(i)\in V^a\right) }{\min (k,|V^a|)} \end{aligned}$$
(5)
$$\begin{aligned} AVG@k=\frac{1}{k}\sum \limits _{1\le j \le k}PR@j \end{aligned}$$
(6)

Let \(R^a(i)\) be the rank of the i-th predicted cause, and \(V^a\) be the set of true root causes of anomaly a. This paper uses \(A=\{\text {CPU hog},\ \text {memory leak},\ \text {network latency}\}\); \(V^a\) contains the real root cause, and \(R^a\) is the predicted ranking. We set k from 1 to 5 for PR@k and compute AVG@5 as the average localization accuracy.
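The following is a minimal sketch of PR@k and AVG@k following Eqs. (5) and (6); the ranking lists and ground-truth sets below are toy examples, not the paper's data.

```python
def pr_at_k(rankings, ground_truth, k):
    """rankings / ground_truth: dicts mapping each anomaly type to a ranked list / set of causes."""
    total = 0.0
    for a, ranked in rankings.items():
        hits = sum(1 for cause in ranked[:k] if cause in ground_truth[a])  # sum over i < k
        total += hits / min(k, len(ground_truth[a]))
    return total / len(rankings)                                            # average over anomalies A

def avg_at_k(rankings, ground_truth, k):
    return sum(pr_at_k(rankings, ground_truth, j) for j in range(1, k + 1)) / k

# Toy example (assumed values for illustration only)
rankings = {"cpu hog": ["orders", "payment", "carts"],
            "memory leak": ["front-end", "shipping", "users"],
            "network latency": ["carts", "front-end", "orders"]}
ground_truth = {"cpu hog": {"orders"}, "memory leak": {"front-end"}, "network latency": {"carts"}}
print(avg_at_k(rankings, ground_truth, k=3))
```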

Table 7 shows the performance of cause locating three types of faults under different anomalous graphs.

Table 7 Performance of cause locating

The results show that the anomalous graph based on the Direct-LINGAM algorithm achieves the best AVG@5 and effectively locates the root causes of all three types of anomalies. The causal propagation graph obtained by Direct-LINGAM most accurately reflects the connections among the different microservices. In the future, other root cause algorithms could be investigated to improve the accuracy of fault diagnosis and suit more diverse time series distributions.

5 Summary

Since the multivariate time series data monitored in microservices can occasionally and unexpectedly become abnormal, it is necessary to classify the anomalies and analyze their root causes. This paper proposes an effective convolutional model with attention that uses a parallel structure to classify diverse anomalies and analyze the root cause from the classified anomaly data. Our model has better anomaly classification ability and achieves state-of-the-art results in a detailed set of empirical studies. For future research, we hope to design an unsupervised model to address the challenge of label collection in microservice environments. Furthermore, we would like to explore the root cause of anomalies at a finer granularity, not only at the service level but also at the host and server level. More generally, we plan to increase the generalization and universality of the model so that it can be applied in non-microservice environments such as Internet of Things (IoT) systems.