
1 Introduction

Let database D be a collection of n data instances (\(\mathbf{x}_1, \mathbf{x}_2,\cdots , \mathbf{x}_n\)). Each instance \(\mathbf{x}\) is an m-dimensional vector \(\langle x_{1},x_{2},\cdots ,x_{m}\rangle \) where each component is either a numeric attribute \(x_{i} \in \mathcal{R}\) (\(\mathcal{R}\) is the real domain) or a categorical attribute \(x_{i} \in \{v_{i_1},\cdots ,v_{i_w}\}\) (where \(v_{i_j}\) is a label out of w possible labels for attribute \(x_i\)). The problem of anomaly detection is to identify anomalous instances which are significantly different from the majority of instances in the database.

In the literature, anomaly detection has two main approaches [1]: (i) the supervised approach classifies an instance into either the anomaly or the normal class using a classification model trained on a labelled training set; (ii) the unsupervised approach trains an anomaly detector from unlabelled training data and identifies anomalies based on their anomaly scores. In many real-world applications, labelled training data are difficult to obtain; thus anomalies have to be identified using an unsupervised approach. In this paper, we focus on the unsupervised approach to anomaly detection.

As databases are growing rapidly in both volume and dimensionality, identifying anomalous patterns in massive databases is a challenging task. Traditional distance-based or density-based unsupervised methods using nearest neighbours, such as kNN [2] and LOF [3], are not applicable in large databases because of their high time complexity of order \(O(n^2m)\). Recently, simpler and more efficient methods such as iForest [4] and the nearest neighbour in a small subsample (Sp) [5] have been proposed for numeric domains. In categorical domains, anomaly detection methods have largely been based on frequent patterns (e.g., FPOF [6] and COMPREX [7]), which do not scale up to high dimensionality and large data size.

In the probabilistic approach, instances in low-density regions are considered anomalies, i.e., anomalies have a low probability of being generated from the distribution of normal instances. The simplest efficient probabilistic approach, estimating the multivariate probability as the product of univariate probabilities, has been used for anomaly detection in some domains [1, 8]. The intuition behind this Simple univariate Probabilistic Anomaly Detector (we call it SPAD) is that an anomalous instance is significantly different from normal instances in a few attributes in which it has low probability.

Despite its simplicity, effectiveness and efficiency, SPAD is not used as a benchmark when comparing the performance of recently proposed efficient unsupervised anomaly detectors [4–7]. These detectors have been shown to run orders of magnitude faster than traditional kNN-based methods and to produce better or competitive detection accuracy. However, it is not clear whether they are more effective and efficient than SPAD, the simplest traditional probabilistic method.

In this paper, we show that SPAD performs competitively with or better than all the state-of-the-art anomaly detection methods mentioned above on a wide range of 25 data sets from categorical only, numeric only, and mixed domains. It runs one order of magnitude faster than the simplest nearest neighbour anomaly detector Sp [5] on data sets with high dimensionality. In categorical domains, it runs up to five orders of magnitude faster than the existing state-of-the-art categorical methods [6, 7].

The rest of the paper is organised as follows. SPAD and five widely used state-of-the-art anomaly detection methods are discussed in Sect. 2, followed by experimental evaluations in Sect. 3. The parametrized version of SPAD is described in Sect. 4, and the conclusions are provided in the last section.

2 Related Work

In this section, we review six unsupervised anomaly detectors including five widely used state-of-the-art anomaly detection methods and SPAD. The first three are designed primarily for numeric domains and the last three are mainly for categorical domains. The pertinent details of these methods are described in the following six subsections.

2.1 Local Outlier Factor (LOF)

Breunig et al. (2000) [3] proposed a method based on the relative density of an instance with respect to its k-nearest neighbourhood. Let \(N^k(\mathbf{x})\) be the set of k-nearest neighbours of \(\mathbf{x}\), \(d(\mathbf{x}, \mathbf{y})\) be the distance between \(\mathbf{x}\) and \(\mathbf{y}\), and \(d^k(\mathbf{x}, D)\) be the distance between \(\mathbf{x}\) and its \(k^{th}\)-NN in D. The anomaly score of \(\mathbf{x}\) is defined as follows:

$$\begin{aligned} s_{lof}(\mathbf{x}) = \frac{\displaystyle \sum _{\mathbf{y} \in N^k(\mathbf{x})}lrd(\mathbf{y})}{|N^k(\mathbf{x})|\times lrd(\mathbf{x})} \end{aligned}$$
(1)

where \(lrd(\mathbf{x}) = \frac{|N^k(\mathbf{x})|}{\sum _{\mathbf{y} \in N^k(\mathbf{x})} \max \left( d^k(\mathbf{y}, D), d(\mathbf{x}, \mathbf{y})\right) }\).
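
For illustration only (the experiments in Sect. 3 use a JAVA/WEKA implementation), the following Python sketch computes LOF scores with scikit-learn's LocalOutlierFactor; the toy data and the choice of library are our own assumptions.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(100, 2)), [[8.0, 8.0]]])  # one obvious outlier

# k = 10 neighbours, the value used in the experiments of Sect. 3
lof = LocalOutlierFactor(n_neighbors=10)
lof.fit(X)

# negative_outlier_factor_ stores -s_lof(x): the more negative, the more anomalous
print(lof.negative_outlier_factor_[-1])
```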

It requires a distance measure to compute all pairwise distances between instances in D. Euclidean distance \(\left( d_{euc}(\mathbf{x}, \mathbf{y}) = \sqrt{\sum _{a=1}^m (x_a-y_a)^2}\right) \) is the most widely used distance measure. In the case of categorical attribute a, \(x_a-y_a = 0\) if \(x_a=y_a\); and 1 otherwise. An alternative measure in categorical domains advocated by Boriah et al. [9] is Occurrence Frequency (OF): \(d_{of}(\mathbf{x}, \mathbf{y}) = \frac{1}{m}\sum _{a=1}^m (x_a-y_a)\) where \(x_a-y_a=0\) if \(x_a=y_a\); and \(x_a-y_a=\log \frac{n}{f(x_a)} \times \log \frac{n}{f(y_a)}\) otherwise; where \(f(x_a)\) is the frequency of the categorical label \(x_a\) in D.
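
A minimal sketch of the OF measure defined above, assuming precomputed per-attribute label frequencies; the function name of_distance and the toy data are illustrative only.

```python
import math
from collections import Counter

def of_distance(x, y, freq, n):
    """OF distance [9]: mismatching labels on attribute a contribute
    log(n / f(x_a)) * log(n / f(y_a)); matching labels contribute 0."""
    total = 0.0
    for a in range(len(x)):
        if x[a] != y[a]:
            total += math.log(n / freq[a][x[a]]) * math.log(n / freq[a][y[a]])
    return total / len(x)

# toy database of n = 4 instances over 3 categorical attributes
D = [("a", "p", "u"), ("a", "q", "u"), ("b", "p", "v"), ("a", "p", "u")]
freq = [Counter(row[a] for row in D) for a in range(3)]
print(of_distance(D[0], D[2], freq, len(D)))
```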

2.2 Isolation Forest (iForest)

Instead of density, iForest [4] employs an isolation mechanism to isolate every instance in the given training set. This is done efficiently by random axis-parallel partitioning of the data space in a tree structure until every instance is isolated. A set of t trees is constructed, where each tree \(T_i\) is built using a subsample randomly selected from D. The anomaly score of an instance \(\mathbf{x}\) is measured as the average path length over the t trees as follows:

$$\begin{aligned} s_{iforest}(\mathbf{x}) = \frac{1}{t}\sum _{i=1}^t \ell _i(\mathbf{x}) \end{aligned}$$
(2)

where \(\ell _i(\mathbf{x})\) is the path length of \(\mathbf{x}\) in tree \(T_i\).

The intuition is that anomalies are more susceptible to isolation; isolation using trees means that anomalies have shorter average path lengths than normal instances. iForest is designed for numeric domains only. In this paper, we show that it can be effective in categorical and mixed domains as well, by simply converting each categorical label into a binary {0, 1} attribute and treating it as numeric (see the empirical evaluation in Sect. 3).
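
The conversion just described can be illustrated with the following sketch, which uses scikit-learn's IsolationForest as a stand-in for iForest (our experiments use a JAVA/WEKA implementation). Note that scikit-learn normalises the average path length into a score, but lower scores still indicate more anomalous instances, so the ranking is consistent with Eq. 2.

```python
import pandas as pd
from sklearn.ensemble import IsolationForest

# Toy mixed data: one numeric and one categorical attribute.
df = pd.DataFrame({
    "amount": [1.0, 1.2, 0.9, 1.1, 25.0],              # numeric
    "colour": ["red", "red", "blue", "red", "green"],  # categorical
})

# Convert each categorical label into a binary {0, 1} attribute
# and treat the result as numeric, as described above.
X = pd.get_dummies(df, columns=["colour"]).astype(float)

# t = 100 trees and subsample size psi = 256 are the settings used in Sect. 3
# (the subsample size is capped at the data size for this toy example).
clf = IsolationForest(n_estimators=100, max_samples=min(256, len(X)), random_state=0)
clf.fit(X)
print(clf.score_samples(X))  # the last instance should receive the lowest score
```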

2.3 Sampling (Sp)

Instead of searching for the k-nearest neighbours in D, Sugiyama and Borgwardt (2013) [5] proposed to search for the nearest neighbour (i.e., \(k=1\)) in a small random subsample \(\mathcal{D}\subset D\) to detect anomalies. The anomaly score is the distance to the nearest neighbour in \(\mathcal{D}\), defined as follows:

$$\begin{aligned} s_{sp}(\mathbf{x}) = \displaystyle \min _{\mathbf{y}\in \mathcal{D}} d(\mathbf{x},\mathbf{y}) \end{aligned}$$
(3)

They have shown that this method, called Sp, with a very small subsample (\(\psi =20\)) performs better than or competitively with LOF, while running several orders of magnitude faster. In [5], Sp is used only in numeric domains with the Euclidean distance. In this paper, we evaluate Sp in categorical and mixed domains as well, with the Euclidean and Occurrence Frequency (OF) distance measures [9].
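
A minimal numpy sketch of Eq. 3 with the Euclidean distance; the function sp_scores and the toy data are illustrative only.

```python
import numpy as np

def sp_scores(X, psi=20, seed=0):
    """Sp [5]: the anomaly score of each instance is its Euclidean distance
    to the nearest neighbour in a random subsample of size psi (Eq. 3)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    sample = X[rng.choice(len(X), size=min(psi, len(X)), replace=False)]
    # distances from every instance to each of the psi sampled instances
    d = np.linalg.norm(X[:, None, :] - sample[None, :, :], axis=2)
    return d.min(axis=1)  # higher score => more anomalous

# toy usage: the last instance is a clear outlier, so it usually gets a large
# score (unless it happens to be drawn into the subsample itself)
X = np.vstack([np.random.default_rng(1).normal(size=(100, 2)), [[10.0, 10.0]]])
print(sp_scores(X, psi=20)[-1])
```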

2.4 Frequent Pattern Outlier Factor (FPOF)

He et al. (2005) [6] proposed an anomaly detection method for categorical domains based on frequent patterns. It uses the Apriori algorithm [10] to generate frequent itemsets of maximum size \(\eta \) with minimum support threshold \(\delta \) in D, denoted as \(FPS(D, \eta , \delta )\). The score of an instance \(\mathbf{x}\) is estimated as follows:

$$\begin{aligned} s_{fpof}(\mathbf{x}) = \frac{\displaystyle \sum _{z\subseteq \mathbf{x} \wedge z\in FPS(D,\eta ,\delta )} support(z)}{|FPS(D,\eta ,\delta )|} \end{aligned}$$
(4)

where z is a frequent itemset, \(z\subseteq \mathbf{x}\) denotes that z is contained in \(\mathbf{x}\), and support(z) is the support of z.

The intuition of Eq. 4 is that \(\mathbf{x}\) is more likely to be an anomaly if it contains few or none of the frequent itemsets, i.e., the lower the score, the more likely \(\mathbf{x}\) is an anomaly.
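
The following brute-force sketch illustrates Eq. 4; it replaces the Apriori mining step of [6] with direct enumeration of itemsets up to size \(\eta \) and is therefore only practical for tiny data sets. The function fpof_scores and the toy data are illustrative only.

```python
from collections import Counter
from itertools import combinations

def fpof_scores(D, eta=5, delta=0.1):
    """FPOF [6] sketch: brute-force frequent itemset mining followed by Eq. 4.
    Each instance is a tuple of categorical labels; an item is an
    (attribute index, label) pair."""
    n = len(D)
    counts = Counter()
    for row in D:
        items = list(enumerate(row))
        for size in range(1, eta + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    # keep itemsets whose support (relative frequency) reaches delta
    fps = {z: c / n for z, c in counts.items() if c / n >= delta}
    scores = []
    for row in D:
        items = set(enumerate(row))
        contained = sum(s for z, s in fps.items() if set(z) <= items)
        scores.append(contained / len(fps))  # low score => likely anomaly
    return scores

# toy usage: the last instance shares no frequent itemsets with the rest
D = [("a", "p"), ("a", "p"), ("a", "q"), ("b", "p"), ("c", "z")]
print(fpof_scores(D, eta=2, delta=0.4))
```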

2.5 Pattern Based Compression Technique (COMPREX)

Recently, Akoglu et al. (2012) proposed a pattern-based compression technique called COMPREX for anomaly detection in categorical data [7]. It builds a collection of dictionaries (code tables) \(CT_1, CT_2,\cdots ,CT_k\) which are learnt from data using disjoint subsets of highly correlated features based on information gain. The anomaly score of \(\mathbf{x}\) is defined as the cost of encoding \(\mathbf{x}\) using the code tables.

$$\begin{aligned} s_{cmprx}(\mathbf{x}) = \sum _{F\in \mathcal{P}}\sum _{p \subseteq \pi _F(\mathbf{x})} L(code(p)|CT_F) \end{aligned}$$
(5)

where \(\mathcal{P}=\{F_1,F_2,\cdots ,F_k\}\) is a set of disjoint partitions of the feature set, p is a pattern, \(\pi _F(\mathbf{x})\) is a projection of \(\mathbf{x}\) into feature subset F, code(p) is a code word corresponding to p and \(L(\cdot )\) is the length of a code word.

The intuition is that the higher the cost of encoding \(\mathbf{x}\), the more likely it is to be an anomaly. Even though COMPREX produces better detection accuracy than other categorical methods, it is limited to low-dimensional data sets because of its high time complexity.

2.6 Simple Probabilistic Method (SPAD)

In the probabilistic approach, an instance \(\mathbf{x}\) is an anomaly if its probability, \(P(\mathbf{x})\), is low. Estimating \(P(\mathbf{x})\) directly requires a large amount of data (even for a moderate number of dimensions), which is usually infeasible in many applications. Assuming the attributes are independent of each other (as in naive Bayes [11]), it can be decomposed as:

$$\begin{aligned} \hat{P}(\mathbf{x}) = \prod _{i=1}^m \hat{P}(x_i) \end{aligned}$$
(6)

The one-dimensional \(\hat{P}(x_i)\) can be estimated from D using the Laplace-corrected estimate \(\hat{P}(x_i) = \frac{f(x_i)+1}{n+w_i}\), where \(f(x_i)\) is the occurrence frequency of \(x_i\) in D, and \(w_i\) is the number of possible values of attribute i. The same estimation can be used in numeric domains by converting numeric attributes into categorical attributes through discretisation.

In order to avoid floating-point underflow, instances are ranked using the logarithm of \(P(\mathbf{x})\) in Eq. 6, effectively using the sum of the logarithms of the univariate probabilities \(P(x_i)\):

$$\begin{aligned} s_{spad}(\mathbf{x}) = \sum _{i=1}^m \log {\hat{P}(x_i)} \end{aligned}$$
(7)

Instances are ranked in ascending order of their scores; instances with low probability, which are ranked at the top, are likely to be anomalies. Note that SPAD is parameter-free, like COMPREX. Goldstein and Dengel (2012) used a similar score to that defined in Eq. 7, though they called it a histogram-based method and evaluated it on three numeric data sets only [8].

SPAD shares two features with the commonly used frequent-pattern-based approach: both operate on categorical domains and both employ probability or frequency as the basis for detecting anomalies. Yet SPAD is significantly simpler and requires no search, whereas the frequent-pattern approach requires an expensive one.

SPAD is scalable to both high-dimensional and large data sets as it only needs to store the frequency count of every categorical label, which can be done in a single pass over the data in O(nm) time and requires O(mw) space (where w is the average number of labels per dimension). Having constructed the frequency count table (in a preprocessing step), computing the score of an instance needs only a table look-up, which costs O(m). Hence, the total runtime complexity of ranking n instances is O(nm), which is cheaper than iForest (provided \(m<t\log _2\psi \), which is generally the case unless the dimensionality is in the order of thousands); and it is \(\psi \) times faster than Sp.
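
The description above can be summarised in a short sketch (our experiments use a JAVA/WEKA implementation); here \(w_i\) is approximated by the number of labels observed in D, and the function names are illustrative only.

```python
import math
from collections import Counter

def spad_fit(D):
    """Build the per-attribute frequency tables in a single pass: O(nm) time."""
    m = len(D[0])
    freq = [Counter(row[a] for row in D) for a in range(m)]
    return freq, len(D)

def spad_score(x, freq, n):
    """Score of Eq. 7: sum of log Laplace-corrected univariate probabilities.
    Lower scores indicate likely anomalies; scoring costs O(m) table look-ups."""
    s = 0.0
    for a, label in enumerate(x):
        w = len(freq[a])  # labels observed for attribute a (approximates w_i)
        s += math.log((freq[a][label] + 1) / (n + w))
    return s

# toy usage on a small categorical data set
D = [("a", "p"), ("a", "p"), ("a", "q"), ("b", "p"), ("c", "z")]
freq, n = spad_fit(D)
ranked = sorted(D, key=lambda x: spad_score(x, freq, n))
print(ranked[0])  # ('c', 'z') is ranked at the top as the most likely anomaly
```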

The time and space complexities of the above six anomaly detectors are presented in Table 1.

Table 1. Time and space complexities

3 Empirical Evaluation

In this section, we present the evaluation results of the six anomaly detection methods discussed in Sect. 2, with LOF and Sp each evaluated using two distance measures: LOF-L2 (LOF with the Euclidean distance), LOF-OF (LOF with the occurrence-frequency-based distance), iForest, Sp-L2 (Sp with the Euclidean distance), Sp-OF (Sp with the occurrence-frequency-based distance), FPOF, COMPREX and SPAD.

Table 2. Characteristics of data sets. n: data size, m: # attributes, \(m_{num}\): # numeric attributes, \(m_{cat}\): # categorical attributes

We used 25 benchmark data sets from the UCI machine learning repository [12] with categorical only, numeric only and mixed attributes. The data sets come from different domains ranging from health and medicine, text and email filtering to digit recognition. Many of the data sets were originally classification problems. We converted them to anomaly detection problems by treating one or more of the larger classes as normal and one or more of the smaller classes as anomalies. Where the classes were more or less uniformly distributed, anomalies were random samples from the classes not used as normal. The data sets have different characteristics in terms of data size (\(366 \le n \le 5\) million), dimensionality (\(5\le m \le 4670\)) and anomaly proportion (\(0.5\,\% \le p \le 35\,\%\)). The characteristics of the data sets are given in Table 2.

Some algorithms require preprocessing. For iForest, which can handle numeric attributes only, each categorical label was converted into a binary attribute [13] (where 0 represents the absence of the label and 1 its presence), and all converted binary attributes were treated as numeric. For algorithms that can handle categorical attributes only, numeric attributes were converted into categorical ones using a modified version of equal-width binning which divides the range [\(\mu _j-3\sigma _j, \mu _j+3\sigma _j\)] into b equal-width bins (where \(\mu _j\) and \(\sigma _j\) are the mean and standard deviation of the values in dimension j). This modification reduces the distortion due to outliers. This discretisation technique with \(b=5\) was used to discretise numeric attributes for all categorical methods: SPAD, LOF-OF, Sp-OF, FPOF and COMPREX.
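
A sketch of this modified equal-width binning (the function discretise is an illustrative name, not the WEKA code used in the experiments):

```python
import numpy as np

def discretise(column, b=5):
    """Divide [mu - 3*sigma, mu + 3*sigma] into b equal-width bins; values beyond
    that range fall into the first or last bin, limiting distortion by outliers."""
    column = np.asarray(column, dtype=float)
    mu, sigma = column.mean(), column.std()
    edges = np.linspace(mu - 3 * sigma, mu + 3 * sigma, b + 1)
    return np.digitize(column, edges[1:-1])  # bin indices 0 .. b-1

# toy usage: the extreme value 100.0 lands in the last bin on its own
print(discretise([1.0, 1.2, 0.9, 1.1, 1.3, 100.0], b=5))
```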

Parameters in all the algorithms were set to the suggested values in respective papers. For LOF, k was set to a commonly used value of 10 [3, 5]. The parameter \(\psi \) in Sp was set to 20 as suggested in [5]. Parameters \(\eta \) and \(\delta \) in FPOF were set to 5 and 0.1, respectively, as suggested in [6]. Parameters \(t=100\) and \(\psi =256\) were used in iForest, as suggested in [4].

All the algorithms were implemented in JAVA using the WEKA [13] platform, except COMPREX, for which we used the MATLAB implementation provided by the authors [7]. All the experiments were conducted on a Linux machine with a 2.27 GHz processor and 8 GB memory.

We used the area under the receiver operating characteristic curve (AUC) as the measure of anomaly detection performance. AUC = 1 if all the anomalies are at the top of the ranked list, whereas a randomly ranked list yields AUC = 0.5. We conducted 10 runs for each of the randomised methods (iForest, Sp-L2 and Sp-OF) and report the average AUC (and runtime). A significance test was conducted using a confidence interval based on two standard errors over the 10 runs. A win or loss between two algorithms is counted only if the difference is significant; otherwise it is a draw.
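
As an illustration of the evaluation protocol, AUC can be computed from the anomaly scores as follows (using scikit-learn's roc_auc_score; the toy labels and scores are made up). Since SPAD and FPOF assign low scores to anomalies, their scores are negated so that higher values indicate anomalies; for LOF, Sp and COMPREX, whose high scores indicate anomalies, the raw scores can be used directly.

```python
from sklearn.metrics import roc_auc_score

y_true = [0, 0, 0, 0, 1]                      # 1 marks an anomaly
spad_scores = [-1.4, -1.4, -2.1, -2.1, -2.8]  # low score => likely anomaly
print(roc_auc_score(y_true, [-s for s in spad_scores]))  # 1.0: the anomaly is ranked first
```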

Table 3. Anomaly detection performance (AUC). The average rank, the number of data sets on which a method is among the top three performers, and win:loss:draw (w:l:d) counts of a method against SPAD are included in the last three rows.
Table 4. Total runtime (in seconds).

The AUC of all methods on the 25 data sets is provided in Table 3. Note that FPOF and COMPREX did not complete within 24 h on data sets with a high number of dimensions and/or large data size.

The overall performance is summarised using the average rank, shown in the bottom three rows of Table 3. SPAD is the best method, with an average rank of 1.8; the closest contenders are COMPREX, which has a rank of 2.3 (based on the results of 15 data sets only), and iForest, which has a rank of 2.9.

In the last row of Table 3, the pairwise win:loss:draw (w:l:d) counts of contenders against SPAD clearly show that LOF-L2, LOF-OF, FPOF, iForest, Sp-L2 and Sp-OF produced significantly worse AUC than SPAD (i.e., they have more losses than wins). COMPREX produced competitive detection accuracy in comparison with SPAD, with the same number of wins and losses on the 15 data sets on which COMPREX completed.

SPAD produced the best (or equivalent to the best) AUC on 13 data sets, followed by iForest (7), COMPREX (5), Sp-OF (3), Sp-L2 (2), FPOF (2), LOF-L2 (1) and LOF-OF (0). Note that SPAD is among the top three performers on almost every data set we have used, as shown in the second last row of Table 3. The only exception is Mnist, where SPAD is the fourth best performer. The two closest contenders, iForest and COMPREX, are among the top three performers on 17 out of 25 and 11 out of 15 data sets, respectively.

The total runtime, including preprocessing (discretisation or nominal-to-binary conversion), model building (if required), ranking the n instances and computing AUC, is provided in Table 4. Note that a direct comparison of the runtime of COMPREX with the other methods is not fair, as it was implemented in MATLAB while the others were implemented in JAVA; it is included in the table to give an indication of its runtime.

SPAD was faster than all the contenders on all data sets, except on a few small data sets where it ran slightly slower than Sp. Note that SPAD ran one order of magnitude faster than its closest contenders Sp and iForest on the two highest dimensional data sets (Reuters and Newsgroup), which have more than 4000 attributes; and SPAD ran one to five orders of magnitude faster than LOF, FPOF and COMPREX on all data sets, the only exception being Lymph when compared with LOF-L2. Note that the runtimes of LOF-L2 presented here are without an indexing scheme to expedite the k-nearest-neighbour search [2, 14, 15]. We did not use indexing because such schemes become ineffective as the number of dimensions increases and they do not work with non-metric distances such as OF.

It is interesting to note the longest time each method took across these data sets: SPAD took less than 20 s to complete on Linkage, the largest data set with more than 5 million instances. Sp and iForest took about 50 s and 256 s, respectively, on the same data set. Ignoring the data sets on which they could not complete, LOF took the longest, close to one day, on Covertype, which has fewer than 300,000 instances; COMPREX took about half a day on Linkage; and FPOF took about 22 h on Kddcup99, which has 41 attributes and fewer than 65,000 instances. Note also that COMPREX and FPOF would take more than a day to complete even on small data sets such as Isolete, Mfeat and Arrhythmia, which have fewer than 1000 instances and 700 attributes.

4 Detecting Anomalies Which Rely on Multiple Attributes

Despite its strong assumption of attribute independence, SPAD performed better than the other state-of-the-art anomaly detectors on many data sets. However, it has a limitation in detecting anomalies which rely on multiple attributes, as it does not capture the relationships between attributes. An example is shown in Fig. 1, where the anomalies are not different from normal instances in any single attribute; they exhibit outlierness only when both attributes are examined together.

Fig. 1. An example where SPAD fails.

In order to handle such a situation, we propose a parametrized model called SPAD\(_r\), in which data in the original m-dimensional space \({\mathcal X}\) are embedded into a new s-dimensional space \({\mathcal Z}\). Each dimension (or attribute) in \({\mathcal Z}\) is a product set of r dimensions (or attributes) in \({\mathcal X}\). The instances can then be ranked using Eq. 7 in the new space \({\mathcal Z}\). In the new space, we are able to inspect the data distribution along various combinations of the original attributes and thus identify outliers which are otherwise difficult to identify.

To guarantee that all possible combinations of r attributes are represented in \({\mathcal Z}\), a set S of \({m\atopwithdelims ()r}\) combinations is required, which is intractable even for moderate values of m and r. Instead, we randomly generate a subset of attribute combinations \({\mathcal S}\subset S\) \((|{\mathcal S}|=s\ll {m\atopwithdelims ()r})\) as follows: a random ordering of the m attributes (in \({\mathcal X}\)) is generated, and m new attributes are then formed from the ordered attributes through a moving window of size r (the attribute ordering is treated as a circular sequence). As an example, for \(r=2\) and a random ordering [\(x_3,x_1,x_2\)] of three attributes in \({\mathcal X}\), three new attributes (\(z_1,z_2,z_3\)) are formed as [(\(x_3\wedge x_1\)), (\(x_1\wedge x_2\)), (\(x_2 \wedge x_3\))]. The above is repeated t times to produce \(s=mt\) new attributes, where t is chosen such that \(s \ll {m\atopwithdelims ()r}\). Note that the simplest version, with \(r=1\) and \(t=1\), is the SPAD used in Sect. 3.

SPAD\(_r\) has the advantage that the size of \(\mathcal{S}\) is linear in t and m, and each attribute is used exactly r times per ordering. The time complexity of SPAD\(_r\) is O(nmtr) and the space complexity is \(O(mtw^r)\), since each attribute in \({\mathcal Z}\) has up to \(w^r\) labels.
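
A sketch of the embedding and scoring just described (t repetitions of a circular window of size r over a random attribute ordering, followed by the SPAD score of Eq. 7 in the new space); spad_r_scores is an illustrative name and w is again approximated by the labels observed in the data.

```python
import math
import random
from collections import Counter

def spad_r_scores(D, r=2, t=1, seed=0):
    """SPAD_r sketch: embed each instance into s = m*t compound attributes, each
    formed from r original attributes via a circular moving window over a random
    attribute ordering, then apply the score of Eq. 7 in the new space Z."""
    rng = random.Random(seed)
    m, n = len(D[0]), len(D)
    windows = []
    for _ in range(t):
        order = list(range(m))
        rng.shuffle(order)
        # circular moving window of size r over the random ordering
        windows += [tuple(order[(i + j) % m] for j in range(r)) for i in range(m)]
    # each compound attribute takes the tuple of its r original labels as its value
    Z = [[tuple(x[a] for a in w) for w in windows] for x in D]
    freq = [Counter(col) for col in zip(*Z)]
    scores = []
    for z in Z:
        s = 0.0
        for a, label in enumerate(z):
            s += math.log((freq[a][label] + 1) / (n + len(freq[a])))
        scores.append(s)  # lower score => more likely an anomaly
    return scores

# with r = 1 and t = 1 this reduces to the SPAD used in Sect. 3
D = [("a", "p", "u"), ("a", "p", "u"), ("a", "q", "v"), ("b", "p", "v"), ("b", "q", "u")]
print(spad_r_scores(D, r=2, t=1))
```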

If an anomaly \(\mathbf{x}\) relies on r attributes to be identified, then as long as one of the attributes in \({\mathcal Z}\) represents those r attributes in \({\mathcal X}\), it can be detected by SPAD\(_r\) because it has low probability in that particular attribute in \({\mathcal Z}\).

The appropriate setting of r in SPAD\(_r\) depends on the number of attributes required to detect the anomalies in a data set. Figure 2 shows three examples, where \(r=1\) is enough in Sick, \(r=10\) is required in Mushroom, and \(r=5\) is the best in Mfeat. Though we used \(t=1\) in these examples, we found that in some cases \(t>1\) can further improve the AUC of SPAD\(_r\).

Fig. 2. AUC of SPAD\(_r\) w.r.t. r (\(t=1\)); the curve for Sick uses the right y-axis.

SPAD\(_r\) has some similarity to subspace anomaly detection methods such as HICS [16]. The fundamental difference is that SPAD\(_r\) avoids the expensive search for subspaces performed in HICS by considering random subspaces, and it therefore runs significantly faster than such complex subspace-search-based methods.

5 Conclusions

We have shown that the simple parameter-free anomaly detection method based on univariate probabilities is the fastest method and is as effective as, if not better than, five state-of-the-art anomaly detection methods. Its anomaly detection performance is consistent across categorical only, numeric only and mixed domains. As far as we know, it is the method of choice for big data and data streams.

The parametrized version can be used to further improve the detection performance of the parameter-free version in data sets where the detection of anomalies relies on multiple attributes.