Abstract
We consider the problem of simultaneous sparse coding and anomaly detection in a collection of data vectors. The majority of the data vectors are assumed to conform with a sparse representation model, whereas the anomaly is caused by an unknown subset of the data vectors—the outliers—which significantly deviate from this model. The proposed approach utilizes the Alternating Direction Method of Multipliers (ADMM) to recover simultaneously the sparse representations and the outliers components for the entire collection. This approach provides a unified solution both for jointly sparse and independently sparse data vectors. We demonstrate the usefulness of the proposed approach for irregular heartbeats detection in Electrocardiogram (ECG) as well as for specular reflectance and shadows removal from natural images.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Avoid common mistakes on your manuscript.
1 Introduction
Anomaly detection is the problem of detecting patterns that significantly deviate from an expected model. This problem has numerous applications such as fraud detection for banking and businesses, intrusion detection for network security, fault detection for production systems, health problems detection for biomedical systems and more, see [1] for a review. In this paper we assume that the expected behavior of the data vectors is to conform with a sparse representation model [2], and address the problem of simultaneous sparse coding and anomaly detection. This problem can be applied to three different tasks: 1) anomaly detection within sparsely represented data vectors. 2) removal of interference from sparsely represented data vectors. 3) dictionary learning in the presence of outliers. In this paper we address the first two tasks, and the latter is beyond the scope of this work.
Related work
Joint-sparse coding was addressed by [3, 4] for cases in which all data vectors are contaminated by either a sparse or a sparsely-represented interference. Anomaly detection in video was addressed by [5] which proposed a sparse reconstruction cost to measure the normality of events, with respect to a dictionary with various spatio-temporal structures. This problem was addressed also by [6], which combined online dictionary learning with an objective function that measures the normality of events. The work of [7] utilized sparse representations to analyze stochastic processes over graphs for anomaly detection in SmartGrids.
Contributions
The contributions of this paper are two-fold: 1) A unified formulation for the problem of simultaneous sparse coding and anomaly detection is provided for jointly sparse as well as for independently sparse data vectors; and a numerical solver is provided for both cases. 2) the proposed approach is demonstrated to detect irregular heartbeats in ECG, and remove specular reflections and shadows from natural images.
Organization
Section 2 reviews sparse representations, Section 3 formulates the problem, Section 4 explains the proposed approach, and Section 5 demonstrates its performance.
2 Sparse Representation Modeling
Sparse representation modeling [2] assumes that a signal (data vector) \(\mathbf {y}\in \mathbb {R}^{N}\) can be described as y≈D x, where \(\mathbf {D} \in \mathbb {R}^{N\times M}\) is a dictionary and \(\mathbf {x}\in \mathbb {R}^{M}\) is sparse. Therefore, y is represented by a linear combination of few columns (atoms) of D. The recovery of the sparse representation, termed sparse coding, can be obtained by solving the following problem:
where ∥x∥0 is the l 0 pseudo-norm that counts the number of non-zero entries of x, and T 0 is the maximum number of non-zero entries. Problem (1) can be augmented for a collection of signals:
where \(\mathbf {Y}\in \mathbb {R}^{N\times L}\) contains L signals \(\{\mathbf {y}_{i}\in \mathbb {R}^{N}\}^{L}_{i=1}\), \(\mathbf {X}\in \mathbb {R}^{M\times L}\) contains L sparse representations \(\{\mathbf {x}_{i}\in \mathbb {R}^{M}\}^{L}_{i=1}\) and ∥X∥0 counts the number of non-zero entries of X. This type of model is referred to as the Single Measurement Vector (SMV), since each signal is assumed to be a single measurement associated with a unique non-zero pattern of its sparse representation (i.e. a unique combination of atoms). The case in which all the sparse representations share the same non-zero pattern is referred to as the Multiple Measurement Vector (MMV) [8] or joint-sparsity model, as illustrated in Fig. 1. For the MMV case, the following optimization problem recovers more accurately the sparse representations, by exploiting the joint-sparsity property:
where \(\left \|\mathbf {X}\right \|_{0,2}={\sum }_{j}{\mathcal {I}(\|\mathbf {X}(j,:)\|_{2})}\) counts the number of non-zero rows, X(j,:) is the j-th row of X and \(\mathcal {I}\) is the indicator function:
Note that problems (1)–(3) are NP-hard and their solutions can be approximated using convex relaxations: the l 1 norm \(\left \|\mathbf {x}\right \|_{1}={\sum }_{i}{|x_{i}|}\) often replaces ∥x∥0, and the l 1,p norm \(\left \|\mathbf {X}\right \|_{1,p}={\sum }_{j}{\|\mathbf {X}(j,:)\|_{p}}\) often replaces ∥X∥0 with p=1 and ∥X∥0,2 with p=2.
3 Problem Formulation
Let \(\mathbf {Y}\in \mathbb {R}^{N\times L}\) be a collection of signals that are well approximated by a sparse representations model, excluding a small number of signals—the outliers—which significantly deviate from this model. The collection Y is described as follows:
where D is assumed known, E has few non-zero columns that equal to the deviation of each outlier from the sparse representations model, and V is a low-energy noise component (\(\|\mathbf {V}\|_{F}^{2}\) is small compared to \(\|\mathbf {Y}\|_{F}^{2}\)).
Our objective is to detect the outliers in Y and recover the sparse representations. For the SMV case this objective can be obtained by solving the following problem:
where \(\left \|\mathbf {E}\right \|_{2,0}={\sum }_{i}{\mathcal {I}(\|\mathbf {E}(:,i)\|_{2})}\) counts the number of non-zero columns in E, E(:,i) is the i-th column of E, and K 0 is the maximum number of non-zero columns (i.e. outliers). Problem (5) encourages a solution in which X is sparse, however, for the outliers that cannot be represented exclusively by D, it permits non-zero columns in E. For the MMV case the objective can be obtained by solving the following problem:
where the constraints ensure at most T 0 non-zero rows in X and at most K 0 non-zero columns in E.
4 The Proposed Approach
The solutions to problemsFootnote 1 (5) and (6) can be approximated by solving the following unconstrained convex problem:Footnote 2
where p=1 for the SMV case, p=2 for the MMV case and α,β are a small positive scalars. In addition, \(\left \|\mathbf {E}\right \|_{2,1}={\sum }_{i}{\|\mathbf {E}(:,i)\|_{2}}\) is the l 2,1 norm which provides a convex relaxation to ∥E∥2,0, and was applied in [9, 10] for robust non-negative matrix factorization. We propose to solve problem (7) with the Alternating Direction Method of Multipliers (ADMM) [11] due to the following reasons: (i) it is suitable for our problem format, (ii) it has proven convergence properties, and (iii) it leads to a simple, coordinate-descent structure. In the following we describe the ADMM method and its application to our problem.
4.1 Alternating Direction Method of Multipliers
ADMM is a numerical method for solving problems of the following form:
where X,Z,A,B,C are matrices, and the objective function is either separable f(X,Z)=g(X)+h(Z) or bi-convex. ADMM solves (8) by minimizing its Augmented-Lagrangian:
where M is a Lagrange multiplier and μ is a penalty coefficient that controls the penalty level of deviation from the equality constraint. The minimization of \(\mathcal {L}_{A}(\mathbf {X},\mathbf {Z},\mu ,\mathbf {M})\) is performed iteratively, while alternating between the minimizations of X and Z:
ADMM can be extended to more than two variables, and its convergenceFootnote 3 properties are analyzed in [11].
4.2 Sparse Coding with Anomaly Detection
In order to apply ADMM to solve problem (7), we add an auxiliary variable Z and an equality constraint as follows:
Note that by converting Eq. (7) into a constrained problem, we have decoupled the first and second terms of Eq. (7), thus, avoiding the need for an iterated-shrinkage [17] solution for X. The addition of the auxiliary variable Z results in a closed-form solution for X and a one-shot shrinkage solution for Z. The Augmented-Lagrangian of problem (17) is given by:
The main stages of the ADMM-based solution are summarized in Algorithm 1, and in the following we describe the update equations of X k+1,Z k+1,E k+1. The update equation of X k+1 is closed-form (and derived in the Appendix):
The update equation of Z k+1 for the SMV case is obtained from:
which can be simplified to:
where \(\mathbf {P}=\mathbf {X}^{k+1}-\frac {1}{\mu ^{k}}\mathbf {M}^{k}\) and \(\gamma =\frac {\alpha }{\mu ^{k}}\). The solution to problem (18) is the element-wise soft thresholding operator [11]:
where:
The update equation of Z k+1 for the MMV case is given by:
which results in a row-shrinkage operator (as proved in [4]):
where P(j,:) is the j-th row of P.
The update equation of E k+1 is obtained from:
which results in a column-shrinkage operator (similar to the derivation of Eq. (21)):
where Q=Y−D X k+1 and Q(:,i) is the i-th column of Q.
4.3 Computational Complexity
The complexity of the proposed approach depends linearly on the number of signals L and polynomially on the number of atoms M. The complexity of a single ADMM iteration is given by O(M 3)+O(M N L)+O(M L)+O(M 2 L)+O(M L)+O(N L), where the leading four terms comprise the complexity of the update step of X k+1, the fifth term is the complexity of the update equation of Z k+1, and the sixth term is the complexity of the update equation of E k+1. Figure 2 depicts the measured complexity of the proposed approach (SMV mode) for a varying number of signals between L=50,000 to L=500,000 (using N=32 and M=128), demonstrating the linear dependence on the number of signals .
5 Performance Evaluation
The purpose of this section is to show the usefulness of the proposed approach, by demonstratingFootnote 4 it on two very different real life problems: The SMV mode of Algorithm 1 is utilized to detect irregular heartbeats in ECG signal; and the MMV mode of Algorithm 1 is utilized for the image processing task of specular reflectance and shadows removal from natural images. The simulations were performed on an i7 quad-core computer with 8GB of RAM memory.
5.1 Arrhythmia Detection in ECG Signals
Irregular heartbeats, know as Arrhythmia, is a collection of several types of abnormal cardiac electrical activity. Arrhythmia is detected by analyzing ECG, which is a non-invasive technique for monitoring cardiac electrical activity. The durations of ECG recordings often reach 24 h, which promoted research efforts for automatic Arrhythmia detection algorithms, see for example [12, 13]. In this experiment we focused on the detection of one type of Arrhythmia event: a Premature Ventricular Contraction (PVC), which is demonstrated in Fig. 3. Sparse representations have been proposed by [14] for ECG source separation problems, which motivated the utilization of the proposed approach for Arrhythmia detection: given an ECG signal that contains mostly normal heartbeats, the key idea is to decompose the signal into all possible N-samples windows (on the order of 1 s duration) and train a dictionary that will provide a sparse representation for these windows. Note that due to the multiplicity and periodicity of normal heartbeats, their corresponding windows are highly repetitive, and constitute the majority among all windows. The dictionary is expected to enable an accurate sparse representation for the windows that correspond to normal heartbeats, due to their high contribution to the training stage. However, the windows that correspond to Arrhythmia events are not expected to be accurately represented by this dictionary, due to their significant deviation from the normal heartbeats waveforms and their low contribution to the training stage (due to rareness of such events). Therefore, a possible strategy for Arrhythmia detection is to solve problem (7) for the SMV case, since each window is expected to be sparsely represent by a different combination of dictionary atoms, and mark columns of E with an l 2-norm above a threshold τ as irregular heartbeats locations.
We validated our approach using the MIT-BIH Arrhythmia Database [15] that contains a collection of 30 min fully annotated ECG recordings, sampled at 360 Hz. We analyzed ECG recording #109, which includes Footnote 5 40 PVC events and 2492 regular heartbeats, by extracting all possible 256-samples windows, leading to initial signal collection dimensions of 256×647,745. Due to normal sinus rhythm variations in this recording between 77 to 101, this collection was divided into six segments of 5 min that were processed independently: the dimension of all windows in a segment was reduced from 256 to 32 by projection onto the 32 leading PCA basis vectors of the segment, and an over-complete dictionary \(\mathbf {D}\in \mathbb R^{32\times 128}\) was trained using the K-SVD [16] algorithm for each segment,Footnote 6 as demonstrated in Fig. 4. The SMV mode of Algorithm 1was employed for each segment with the following parameters: μ 0=1.0,ρ=1.25,α=1.0,β=2.6,𝜖=0.0025, and Arrhythmia events were detected as column in E with l 2-norm above a threshold τ=0.1 (the processing time for the 30 min recording was 176 s). Figure 5 depicts ∥E(:,i)∥2 for the first 15 min (formed by concatenation of the results from the first 3 segments), demonstrating accurate matching between most of the non-zero l 2-norm columns and the ground truth annotations of this recording. Due to the randomness of the initial dictionary used in the K-SVD algorithm, the entire experiment was repeated 10 times, resulting in an average of 97.18 % true positive detections with standard deviation 1.89 %, and an average of 2.82 % false negatives with standard deviation 1.89 %. Additional 13 non-PVC events were detected on average, which corresponded to noise and waveform distortions. In order to demonstrate the effectiveness of the proposed approach, we repeated the same experiment using independent sparse coding per each ECG window: we employed the Orthogonal Matching Pursuit (OMP) algorithm to reconstruct each dimensionality-reduced window using the trained dictionary (with a fixed number of atoms). A window was marked as an outlier if its reconstruction error exceeded a threshold τ O M P . We set the threshold value to achieve (approximately) the same true positive rate of the proposed approach, and obtained the following result: an average (over 10 experiments) of 98.46 % true positive detections with standard deviation 1.79 %, and 55.2 non-PVC events. Namely, the OMP-based approach resulted in a significantly higher false positive rate, compared to the proposed approach.
5.2 Specular Reflectance and Shadows Removal from Natural Images
The reflection of light from surfaces is associated with two main components [18]: diffuse and specular. The diffuse component scatters light uniformly in all directions, whereas the specular component scatters light in a direction that depends on the angles of incident light and the surface normal. Light energy due to specular reflections is often concentrated, causing strong bright regions (highlights) to appear in the image, as demonstrated in Fig. 7 (left column). These bright regions can cause computer vision algorithms such as segmentation, shape from shading, stereo, and motion detection to produce errors. Therefore, there has been significant interest in specular reflectance removal, see [19] for a review. According to Phong shading model [18], the intensity of the diffuse component at image pixel k is given by:
where λ is the wavelength (color), L(λ) is the intensity profile of incident light, ρ(λ) is the albedo, \(\hat {\mathbf {n}}=[n_{x},n_{y},n_{z}]^{T}\) is the surface normal, and \(\hat {\mathbf {v}}=[v_{x},v_{y},v_{z}]^{T}\) is a unit vector pointing to the direction of incident light. Equation (23) is interpreted as follows: The measured intensity is given by the product of the source intensity, the albedo and the cosine of the angle 𝜃 i between the surface normal and direction of incident light. In the case that |𝜃 i |>π/2 the intensity equals zero, which results in a self-shadowFootnote 7 effect. By column-stacking m pixels, and neglecting the self shadowing effect (i.e. allowing |𝜃 i |>π/2), the following matrix formulation is obtained:
where
Given a collection of K images of a diffuse object, photographed from the same view-point and under varying light source directions, the following rank-3 model is obtained:
where
Therefore, the diffuse component can be modeled by a low-dimensional subspace, and the works [20, 21] proved that the dimension of this subspace is upper bounded by 9. The basis for this subspace can be computed from the PCA basis of the images. However, specular components and shadows are not represented by this subspace. Therefore, we can solve problem (7) with the MMV mode, in order to decompose the images Y (each column of Y is one column-stacked image) into diffuse components D X, and specular components and shadows E as follows: the diffuse components of all images are expected to be jointly-sparse with respect to the PCA basis D, whereas the specular components and shadows are assumed to appear in a subset of the images, thus, by minimizing ∥E∥2,1 the columns of E would contain those parts of the images that do not conform with the joint-sparsity model. In our experiment we used a collection of 37 images (195×317 pixels) of a wrist watch, photographed from the same view-point and using 37 different illumination conditions. We computed the PCA basis of \(\mathbf {Y}\in \mathbb {R}^{61,815\times 37}\) and used it as the dictionary D. We further employed Algorithm 1 and set p=2,α=4.5,β=0.5,μ=0.05,ρ=1.15,𝜖=10−10. Figure 6 presents convergence of the algorithm within 35 iterations (processing time was 22 s) to a joint-sparsity pattern with 3 non-zero rows (a 3-dimensional subspace). Figure 7 presents specular reflectance removal results (best viewed in the electronic version of this paper) for five images: the obtained diffuse components (equal to D X(:,l), where l is the corresponding index of each input image) are free of specular reflections and the shadows are significantly removed.
For the case of color images, the low-dimensional subspace model still holds since for a certain image pixel, only the albedo and incident light intensity are color (wavelength) dependent. Using an RGB color representation we obtain the following model:
Given a collection of K color images of a diffuse object, photographed from the same view-point and under varying light source directions, a rank-3 model is obtained, similar to Eq. (26). Figure 8 demonstrates specular reflectance and shadows removal results for color images: the obtained diffuse components are free of specular reflections and the shadows are significantly removed.
6 Conclusions
Sparse coding and anomaly detection are important tasks, with numerous signal processing applications. This paper presented a unified approach for simultaneous sparse coding and anomaly detection for both jointly-sparse and independently-sparse signal models. The usefulness of the proposed approach was demonstrated for two challenging real-life problems: Arrhythmia detection in ECGs and specular reflectance removal from natural images. Due to the constantly growing number of signals that are well modeled by sparse representations, the proposed approach could be combined in many existing and emerging applications.
Notes
The observant reader may notice that problem (5) is actually separable, implying that we can solve for each column of X independently from the others. Nevertheless, we choose in this paper a joint solver for two reasons: (i) Giving a unified view of the two problems (5 and 6); and (ii) Our approach loses nothing in terms of complexity nor elegance when compared to the independent sparse coding tasks.
Note that the related problem of a sparsely represented interference matrix E=ΨC, where Ψ is a dictionary and C is sparse, can be formulated as follows:
$$\begin{array}{*{20}l} \min\limits_{\mathbf{X},\mathbf{C}} \frac{1}{2}\|\mathbf{Y}-\mathbf{D} \mathbf{X}-{\Psi} \mathbf{C}\|_{F}^{2}+ \alpha\|\mathbf{X}\|_{1,p} + \beta\|\mathbf{C}\|_{1,1} , \end{array} $$and its solution can be also obtained using the proposed approach in this section.
All the results in this paper are reproducible with a MATLAB package that is freely available for distribution.
A total of six dictionaries were trained—each using all of the dimensionality-reduced samples from each segment.
This is in contrast to cast-shadows, where one part of an object is shadowed by another part.
References
Chandola, V., Banerjee, A., Kumar, V. (2009). Anomaly detection: a survey. ACM Computing Surveys, 41(3), 15:1–15:58.
Bruckstein, A.M., Donoho, D., Elad, M. (2009). From sparse solutions of systems of equations to sparse modeling of signals and images. Society for Industrial and Applied Mathematics Review, 51(1), 34–81.
Nguyen, N.H., Nasrabadi, N.M., Tran, T.D. (2011). Robust multi-sensor classification via joint sparse representation. In International conference on information fusion (FUSION).
Shekhar, S., Patel, V.M., Nasrabadi, M., Chellappa, R. (2012). Joint sparsity-based robust multimodal biometrics recognition. In The 12th European conference on computer vision (ECCV).
Cong, Y., Yuan, J., Liu, J. (2011). Sparse reconstruction cost for abnormal event detection. In Computer vision and pattern recognition (CVPR).
Zhao, B., Fei-Fei, L., Xing, E.P. (2011). Online detection of unusual events in videos via dynamic sparse coding. In Computer vision and pattern recognition (CVPR).
Levorato, M., & Mitra, U. (2012). Fast anomaly detection in smartgrids via sparse approximation theory. In Sensor array and multichannel signal processing workshop (SAM).
Cotter, S.F., Rao, B.D., Engan, K., Kreutz-Delgado, K. (2005). Sparse solutions to linear inverse problems with multiple measurement vectors. IEEE Transactions on Signal Processing, 53(7), 2477–2488.
Kong, D., Ding, C., Huang, H. (2011). Robust nonnegative matrix factorization using l21-norm. In The 20th ACM international conference on information and knowledge management(CIKM).
Dobigeon, N., & Févotte, C. (2013). Robust nonnegative matrix factorization for nonlinear unmixing of hyperspectral images. In The 5th workshop on hyperspectral image and signal processing: evolution in remote sensing (WHISPERS).
Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J. (2010). Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1), 1–122.
Ince, T., Kiranyaz, S., Gabbouj, M. (2009). A generic and robust system for automated patient-specific classification of ecg signals. IEEE Transactions on Biomedical Engineering, 56(5), 1415–1426.
Ye, C., Kumar, B.V., Coimbra, M.T. (2012). Heartbeat classification using morphological and dynamic features of ecg signals. IEEE Transactions on Biomedical Engineering, 59(10), 2930–2941.
Mailhe, B., Gribonval, R., Bimbot, F., Lemay, M., Vandergheynst, P., Vesin, J. M. (2009). Dictionary learning for the sparse modelling of atrial fibrillation in ecg signals. In Internatioanl conference on acoustics, speech and signal processing (ICASSP).
Moody, G.B., & Mark, R.G. (2001). The impact of the mit-bih arrhythmia database. IEEE Engineering in Medicine and Biology Magazine, 20(3), 45–50.
Aharon, M., Elad, M., Bruckstein, A.M. (2006). K-svd: an algorithm for designing overcomplete dictionaries for sparse representation. IEEE Transactions on Signal Processing, 54(11), 4311–4322.
Beck, A., & Teboulle, M. (2009). A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Image Science.
Szeliski, R. (2010). Computer vision: algorithms and applications. New York: Springer.
Artusi, A., Banterle, F., Chetverikov, D. (2011). A survey of specularity removal methods. Computer Graphics Forum, 30(11).
Basri, R., & Jacobs, D.W. (2003). Lambertian reflectance and linear subspaces. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(2).
Shashua, A. (1997). On photometric issues in 3d visual recognition from a single 2d image. International Journal of Computer Vision, 21(1–2).
Author information
Authors and Affiliations
Corresponding author
Additional information
A. Adler is the recipient of the 2011 Google European Doctoral Fellowship in Multimedia, and this research is partly supported by this Google Fellowship and partly by ERC Grant agreement no. 320649.
Appendix
Appendix
The update equation for X k+1 is obtained by solving:
The solution of Eq. (29) is computed from:
which results in:
and the update equation is given by:
Rights and permissions
About this article
Cite this article
Adler, A., Elad, M., Hel-Or, Y. et al. Sparse Coding with Anomaly Detection. J Sign Process Syst 79, 179–188 (2015). https://doi.org/10.1007/s11265-014-0913-0
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11265-014-0913-0