Sparse Coding with Anomaly Detection

Adler, Amir; Elad, Michael; Hel-Or, Yacov; Rivlin, Ehud

doi:10.1007/s11265-014-0913-0

Sparse Coding with Anomaly Detection

Published: 12 July 2014

Volume 79, pages 179–188, (2015)
Cite this article

Download PDF

Access provided by Autonomous University of Puebla

Journal of Signal Processing Systems Aims and scope Submit manuscript

Sparse Coding with Anomaly Detection

Download PDF

Amir Adler¹,
Michael Elad¹,
Yacov Hel-Or² &
…
Ehud Rivlin¹

1751 Accesses
66 Citations
Explore all metrics

Abstract

We consider the problem of simultaneous sparse coding and anomaly detection in a collection of data vectors. The majority of the data vectors are assumed to conform with a sparse representation model, whereas the anomaly is caused by an unknown subset of the data vectors—the outliers—which significantly deviate from this model. The proposed approach utilizes the Alternating Direction Method of Multipliers (ADMM) to recover simultaneously the sparse representations and the outliers components for the entire collection. This approach provides a unified solution both for jointly sparse and independently sparse data vectors. We demonstrate the usefulness of the proposed approach for irregular heartbeats detection in Electrocardiogram (ECG) as well as for specular reflectance and shadows removal from natural images.

Sparse Representation Based Anomalies Detection in Electrocardiography Signals

Sparse representation and overcomplete dictionary learning for anomaly detection in electrocardiograms

Article 12 October 2018

Adaptive Approximation Algorithms for Sparse Data Representation

Discover the latest articles, news and stories from top researchers in related subjects.

Artificial Intelligence

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

Anomaly detection is the problem of detecting patterns that significantly deviate from an expected model. This problem has numerous applications such as fraud detection for banking and businesses, intrusion detection for network security, fault detection for production systems, health problems detection for biomedical systems and more, see [1] for a review. In this paper we assume that the expected behavior of the data vectors is to conform with a sparse representation model [2], and address the problem of simultaneous sparse coding and anomaly detection. This problem can be applied to three different tasks: 1) anomaly detection within sparsely represented data vectors. 2) removal of interference from sparsely represented data vectors. 3) dictionary learning in the presence of outliers. In this paper we address the first two tasks, and the latter is beyond the scope of this work.

Related work

Joint-sparse coding was addressed by [3, 4] for cases in which all data vectors are contaminated by either a sparse or a sparsely-represented interference. Anomaly detection in video was addressed by [5] which proposed a sparse reconstruction cost to measure the normality of events, with respect to a dictionary with various spatio-temporal structures. This problem was addressed also by [6], which combined online dictionary learning with an objective function that measures the normality of events. The work of [7] utilized sparse representations to analyze stochastic processes over graphs for anomaly detection in SmartGrids.

Contributions

The contributions of this paper are two-fold: 1) A unified formulation for the problem of simultaneous sparse coding and anomaly detection is provided for jointly sparse as well as for independently sparse data vectors; and a numerical solver is provided for both cases. 2) the proposed approach is demonstrated to detect irregular heartbeats in ECG, and remove specular reflections and shadows from natural images.

Organization

Section 2 reviews sparse representations, Section 3 formulates the problem, Section 4 explains the proposed approach, and Section 5 demonstrates its performance.

2 Sparse Representation Modeling

Sparse representation modeling [2] assumes that a signal (data vector) $\mathbf {y}\in \mathbb {R}^{N}$ can be described as y≈D x, where $\mathbf {D} \in \mathbb {R}^{N\times M}$ is a dictionary and $\mathbf {x}\in \mathbb {R}^{M}$ is sparse. Therefore, y is represented by a linear combination of few columns (atoms) of D. The recovery of the sparse representation, termed sparse coding, can be obtained by solving the following problem:

$$\begin{array}{*{20}l} \hat{\mathbf{x}} & = \arg\min\limits_{\mathbf{x}} \|\mathbf{y} - \mathbf{D} \mathbf{x}\|_{2}^{2} \text{ s.t. }~ \left\|\mathbf{x}\right\|_{0} \le T_{0}, \end{array} $$

(1)

where ∥x∥₀ is the l ₀ pseudo-norm that counts the number of non-zero entries of x, and T ₀ is the maximum number of non-zero entries. Problem (1) can be augmented for a collection of signals:

$$\begin{array}{*{20}l} \hat{\mathbf{X}} & = \arg\min\limits_{\mathbf{X}} \|\mathbf{Y} - \mathbf{D} \mathbf{X} \|_{F}^{2} \text{ s.t. }~ \left\|\mathbf{X}\right\|_{0} \le LT_{0}, \end{array} $$

(2)

where $\mathbf {Y}\in \mathbb {R}^{N\times L}$ contains L signals $\{\mathbf {y}_{i}\in \mathbb {R}^{N}\}^{L}_{i=1}$, $\mathbf {X}\in \mathbb {R}^{M\times L}$ contains L sparse representations $\{\mathbf {x}_{i}\in \mathbb {R}^{M}\}^{L}_{i=1}$ and ∥X∥₀ counts the number of non-zero entries of X. This type of model is referred to as the Single Measurement Vector (SMV), since each signal is assumed to be a single measurement associated with a unique non-zero pattern of its sparse representation (i.e. a unique combination of atoms). The case in which all the sparse representations share the same non-zero pattern is referred to as the Multiple Measurement Vector (MMV) [8] or joint-sparsity model, as illustrated in Fig. 1. For the MMV case, the following optimization problem recovers more accurately the sparse representations, by exploiting the joint-sparsity property:

$$\begin{array}{*{20}l} \hat{\mathbf{X}} & = \arg\min\limits_{\mathbf{X}} \|\mathbf{Y} - \mathbf{D} \mathbf{X} \|_{F}^{2} \text{ s.t. }~ \left\|\mathbf{X}\right\|_{0,2} \le T_{0}, \end{array} $$

(3)

where $\left \|\mathbf {X}\right \|_{0,2}={\sum }_{j}{\mathcal {I}(\|\mathbf {X}(j,:)\|_{2})}$ counts the number of non-zero rows, X(j,:) is the j-th row of X and $\mathcal {I}$ is the indicator function:

$$\mathcal{I}(a) = \left\{\begin{array}{ll} 1 & \quad\text{if}~ |a| > 0\\ 0 & \quad\text{otherwise} \end{array}\right.. $$

Note that problems (1)–(3) are NP-hard and their solutions can be approximated using convex relaxations: the l ₁ norm $\left \|\mathbf {x}\right \|_{1}={\sum }_{i}{|x_{i}|}$ often replaces ∥x∥₀, and the l _1,p norm $\left \|\mathbf {X}\right \|_{1,p}={\sum }_{j}{\|\mathbf {X}(j,:)\|_{p}}$ often replaces ∥X∥₀ with p=1 and ∥X∥_0,2 with p=2.

3 Problem Formulation

Let $\mathbf {Y}\in \mathbb {R}^{N\times L}$ be a collection of signals that are well approximated by a sparse representations model, excluding a small number of signals—the outliers—which significantly deviate from this model. The collection Y is described as follows:

$$\begin{array}{*{20}l} \mathbf{Y}=\mathbf{D} \mathbf{X}+ \mathbf{E} + \mathbf{V}, \end{array} $$

(4)

where D is assumed known, E has few non-zero columns that equal to the deviation of each outlier from the sparse representations model, and V is a low-energy noise component ($\|\mathbf {V}\|_{F}^{2}$ is small compared to $\|\mathbf {Y}\|_{F}^{2}$).

Our objective is to detect the outliers in Y and recover the sparse representations. For the SMV case this objective can be obtained by solving the following problem:

$$\begin{array}{*{20}l} \{\mathbf{X},\mathbf{E}\}=\arg &\min\limits_{\mathbf{X},\mathbf{E}} \|\mathbf{Y} - \mathbf{D} \mathbf{X} -\mathbf{E} \|_{F}^{2} \\ \text{s.t. } & \|\mathbf{X}\|_{0} \le LT_{0} \\ & \|\mathbf{E}\|_{2,0} \le K_{0}, \end{array} $$

(5)

where $\left \|\mathbf {E}\right \|_{2,0}={\sum }_{i}{\mathcal {I}(\|\mathbf {E}(:,i)\|_{2})}$ counts the number of non-zero columns in E, E(:,i) is the i-th column of E, and K ₀ is the maximum number of non-zero columns (i.e. outliers). Problem (5) encourages a solution in which X is sparse, however, for the outliers that cannot be represented exclusively by D, it permits non-zero columns in E. For the MMV case the objective can be obtained by solving the following problem:

$$\begin{array}{*{20}l} \{\mathbf{X},\mathbf{E}\}=\arg &\min\limits_{\mathbf{X},\mathbf{E}} \|\mathbf{Y} - \mathbf{D} \mathbf{X} -\mathbf{E} \|_{F}^{2} \\ \text{s.t. } & \|\mathbf{X}\|_{0,2} \le T_{0} \\ & \|\mathbf{E}\|_{2,0} \le K_{0}, \end{array} $$

(6)

where the constraints ensure at most T ₀ non-zero rows in X and at most K ₀ non-zero columns in E.

4 The Proposed Approach

The solutions to problems^{Footnote 1} (5) and (6) can be approximated by solving the following unconstrained convex problem:^{Footnote 2}

$$ \min\limits_{\mathbf{X},\mathbf{E}} \frac{1}{2}\|\mathbf{Y}-\mathbf{D} \mathbf{X}-\mathbf{E}\|_{F}^{2}+ \alpha\|\mathbf{X}\|_{1,p} + \beta\|\mathbf{E}\|_{2,1} $$

(7)

where p=1 for the SMV case, p=2 for the MMV case and α,β are a small positive scalars. In addition, $\left \|\mathbf {E}\right \|_{2,1}={\sum }_{i}{\|\mathbf {E}(:,i)\|_{2}}$ is the l _2,1 norm which provides a convex relaxation to ∥E∥_2,0, and was applied in [9, 10] for robust non-negative matrix factorization. We propose to solve problem (7) with the Alternating Direction Method of Multipliers (ADMM) [11] due to the following reasons: (i) it is suitable for our problem format, (ii) it has proven convergence properties, and (iii) it leads to a simple, coordinate-descent structure. In the following we describe the ADMM method and its application to our problem.

4.1 Alternating Direction Method of Multipliers

ADMM is a numerical method for solving problems of the following form:

$$\begin{array}{*{20}l} \min\limits_{\mathbf{X},\mathbf{Z}}&\;f(\mathbf{X},\mathbf{Z}) \text{ s.t. }~\mathbf{A}\mathbf{X}+\mathbf{B}\mathbf{Z}=\mathbf{C}, \end{array} $$

(8)

where X,Z,A,B,C are matrices, and the objective function is either separable f(X,Z)=g(X)+h(Z) or bi-convex. ADMM solves (8) by minimizing its Augmented-Lagrangian:

$$\begin{array}{@{}rcl@{}} \mathcal{L}_{A}(\mathbf{X},\mathbf{Z},\mu,\mathbf{M})&=&f(\mathbf{X},\mathbf{Z}) + <\mathbf{M},\mathbf{A}\mathbf{X}+\mathbf{B}\mathbf{Z}-\mathbf{C}>\\ && + \frac{\mu}{2}\|\mathbf{A}\mathbf{X}+\mathbf{B}\mathbf{Z}-\mathbf{C}\|^{2}_{F}, \end{array} $$

(9)

where M is a Lagrange multiplier and μ is a penalty coefficient that controls the penalty level of deviation from the equality constraint. The minimization of $\mathcal {L}_{A}(\mathbf {X},\mathbf {Z},\mu ,\mathbf {M})$ is performed iteratively, while alternating between the minimizations of X and Z:

$$\begin{array}{@{}rcl@{}} \mathbf{X}^{k+1}&=&\arg \min\limits_{\mathbf{X}} \mathcal {L}_{A}(\mathbf{X},\mathbf{Z}^{k},\mu^{k},\mathbf{M}^{k})\\ \end{array} $$

(10)

$$\begin{array}{@{}rcl@{}} \mathbf{Z}^{k+1}&=&\arg \min\limits_{\mathbf{Z}} \mathcal {L}_{A}(\mathbf{X}^{k+1},\mathbf{Z},\mu^{k},\mathbf{M}^{k})\\ \end{array} $$

(11)

$$\begin{array}{@{}rcl@{}} \mathbf{M}^{k+1}&=&\mathbf{M}^{k}+\mu^{k}(\mathbf{A}\mathbf{X}^{k+1}+\mathbf{B}\mathbf{Z}^{k+1}-\mathbf{C})\\ \end{array} $$

(12)

$$\begin{array}{@{}rcl@{}} {\mu}^{k+1}&=&{\rho}{\mu}^{k}; \rho >1. \end{array} $$

(13)

ADMM can be extended to more than two variables, and its convergence^{Footnote 3} properties are analyzed in [11].

4.2 Sparse Coding with Anomaly Detection

In order to apply ADMM to solve problem (7), we add an auxiliary variable Z and an equality constraint as follows:

$$\begin{array}{*{20}l} \min\limits_{\mathbf{X},\mathbf{E},\mathbf{Z}} &\frac{1}{2}\|\mathbf{Y}-\mathbf{D} \mathbf{X}-\mathbf{E}\|_{F}^{2}+ \alpha\|\mathbf{Z}\|_{1,p} + \beta\|\mathbf{E}\|_{2,1} \text{ s.t. }~ \mathbf{Z}=\mathbf{X}. \end{array} $$

(14)

Note that by converting Eq. (7) into a constrained problem, we have decoupled the first and second terms of Eq. (7), thus, avoiding the need for an iterated-shrinkage [17] solution for X. The addition of the auxiliary variable Z results in a closed-form solution for X and a one-shot shrinkage solution for Z. The Augmented-Lagrangian of problem (17) is given by:

$$\begin{array}{@{}rcl@{}} \mathcal L_{p}(\mathbf{X},\mathbf{Z},\mathbf{E},\mathbf{M},\mu)& =& \frac{1}{2}\|\mathbf{Y}-\mathbf{D} \mathbf{X}-\mathbf{E}\|_{F}^{2} + \alpha\|\mathbf{Z}\|_{1,p} \\ && + \beta\|\mathbf{E}\|_{2,1} + <\mathbf{M},\mathbf{Z}-\mathbf{X}> \\ &&+ \frac{\mu}{2}\|\mathbf{Z}-\mathbf{X}\|^{2}_{F}. \end{array} $$

(15)

The main stages of the ADMM-based solution are summarized in Algorithm 1, and in the following we describe the update equations of X ^k+1,Z ^k+1,E ^k+1. The update equation of X ^k+1 is closed-form (and derived in the Appendix):

$$\begin{array}{*{20}l} \mathbf{X}^{k+1} = \left(\mathbf{D}^{T}\mathbf{D}+\mu^{k}\mathbf{I}\right)^{-1}\left(\mathbf{D}^{T}\left(\mathbf{Y}-\mathbf{E}^{k}\right) +\mathbf{M}^{k}+\mu^{k} \mathbf{Z}^{k}\right). \end{array} $$

(16)

The update equation of Z ^k+1 for the SMV case is obtained from:

$$\begin{array}{@{}rcl@{}} \mathbf{Z}^{k+1} &=& \arg \min\limits_{\mathbf{Z}} \alpha\|\mathbf{Z}\|_{1,1} + <\mathbf{M}^{k},\mathbf{Z}-\mathbf{X}^{k+1}> \\ && +\frac{\mu^{k}}{2}\left\|\mathbf{Z}-\mathbf{X}^{k+1}\right\|^{2}_{F}, \end{array} $$

(17)

which can be simplified to:

$$\begin{array}{*{20}l} & \mathbf{Z}^{k+1} =\arg \min\limits_{\mathbf{Z}} \frac{1}{2}\left\|\mathbf{P}-\mathbf{Z}\right\|^{2}_{F} + \gamma\|\mathbf{Z}\|_{1,1} , \end{array} $$

(18)

where $\mathbf {P}=\mathbf {X}^{k+1}-\frac {1}{\mu ^{k}}\mathbf {M}^{k}$ and $\gamma =\frac {\alpha }{\mu ^{k}}$. The solution to problem (18) is the element-wise soft thresholding operator [11]:

$$\begin{array}{*{20}l} & \mathbf{Z}^{k+1} = \mathcal S_{\gamma}(\mathbf{P}), \end{array} $$

(19)

where:

$$\mathcal S_{\gamma}(a) = \left\{\begin{array}{ll} a-\gamma & \;\text{if}~ a > \gamma \\ 0 & \;\text{if}~ |a|\le \gamma\\ a+\gamma & \;\text{if}~ a<-\gamma \end{array}\right.. $$

The update equation of Z ^k+1 for the MMV case is given by:

$$\begin{array}{*{20}l} & \mathbf{Z}^{k+1} =\arg \min\limits_{\mathbf{Z}} \frac{1}{2}\|\mathbf{P}-\mathbf{Z}\|^{2}_{F} + \gamma\|\mathbf{Z}\|_{1,2}, \end{array} $$

(20)

which results in a row-shrinkage operator (as proved in [4]):

$$ \mathbf{Z}^{k+1}(j,:) = \left\{\begin{array}{ll} \frac{\left\|\mathbf{P}(j,:)\right\|_{2}-\gamma}{\left\|\mathbf{P}(j,:)\right\|_{2}}\mathbf{P}(j,:) & \quad\text{if}~ \gamma < \|\mathbf{P}(j,:)\|_{2}\\ 0 & \quad\text{otherwise} \end{array}\right., $$

(21)

where P(j,:) is the j-th row of P.

Table 1

Full size table

The update equation of E ^k+1 is obtained from:

$$\begin{array}{*{20}l} \mathbf{E}^{k+1} =& \arg \min\limits_{\mathbf{E}} \frac{1}{2}\left\|\mathbf{Y}-\mathbf{D}\mathbf{X}^{k+1}-\mathbf{E}\right\|^{2}_{F}+\beta\left\|\mathbf{E}\right\|_{2,1}, \end{array} $$

(22)

which results in a column-shrinkage operator (similar to the derivation of Eq. (21)):

$$\mathbf{E}^{k+1}(:,i) = \left\{\begin{array}{ll} \frac{\|\mathbf{Q}(:,i)\|_{2}-\beta}{\|\mathbf{Q}(:,i)\|_{2}}\mathbf{Q}(:,i) & \text{if}~ \beta < \|\mathbf{Q}(:,i)\|_{2}\\ 0 & \text{otherwise} \end{array}\right., $$

where Q=Y−D X ^k+1 and Q(:,i) is the i-th column of Q.

4.3 Computational Complexity

The complexity of the proposed approach depends linearly on the number of signals L and polynomially on the number of atoms M. The complexity of a single ADMM iteration is given by O(M ³)+O(M N L)+O(M L)+O(M ² L)+O(M L)+O(N L), where the leading four terms comprise the complexity of the update step of X ^k+1, the fifth term is the complexity of the update equation of Z ^k+1, and the sixth term is the complexity of the update equation of E ^k+1. Figure 2 depicts the measured complexity of the proposed approach (SMV mode) for a varying number of signals between L=50,000 to L=500,000 (using N=32 and M=128), demonstrating the linear dependence on the number of signals .

5 Performance Evaluation

The purpose of this section is to show the usefulness of the proposed approach, by demonstrating^{Footnote 4} it on two very different real life problems: The SMV mode of Algorithm 1 is utilized to detect irregular heartbeats in ECG signal; and the MMV mode of Algorithm 1 is utilized for the image processing task of specular reflectance and shadows removal from natural images. The simulations were performed on an i7 quad-core computer with 8GB of RAM memory.

5.1 Arrhythmia Detection in ECG Signals

Irregular heartbeats, know as Arrhythmia, is a collection of several types of abnormal cardiac electrical activity. Arrhythmia is detected by analyzing ECG, which is a non-invasive technique for monitoring cardiac electrical activity. The durations of ECG recordings often reach 24 h, which promoted research efforts for automatic Arrhythmia detection algorithms, see for example [12, 13]. In this experiment we focused on the detection of one type of Arrhythmia event: a Premature Ventricular Contraction (PVC), which is demonstrated in Fig. 3. Sparse representations have been proposed by [14] for ECG source separation problems, which motivated the utilization of the proposed approach for Arrhythmia detection: given an ECG signal that contains mostly normal heartbeats, the key idea is to decompose the signal into all possible N-samples windows (on the order of 1 s duration) and train a dictionary that will provide a sparse representation for these windows. Note that due to the multiplicity and periodicity of normal heartbeats, their corresponding windows are highly repetitive, and constitute the majority among all windows. The dictionary is expected to enable an accurate sparse representation for the windows that correspond to normal heartbeats, due to their high contribution to the training stage. However, the windows that correspond to Arrhythmia events are not expected to be accurately represented by this dictionary, due to their significant deviation from the normal heartbeats waveforms and their low contribution to the training stage (due to rareness of such events). Therefore, a possible strategy for Arrhythmia detection is to solve problem (7) for the SMV case, since each window is expected to be sparsely represent by a different combination of dictionary atoms, and mark columns of E with an l ₂-norm above a threshold τ as irregular heartbeats locations.

We validated our approach using the MIT-BIH Arrhythmia Database [15] that contains a collection of 30 min fully annotated ECG recordings, sampled at 360 Hz. We analyzed ECG recording #109, which includes ^{Footnote 5} 40 PVC events and 2492 regular heartbeats, by extracting all possible 256-samples windows, leading to initial signal collection dimensions of 256×647,745. Due to normal sinus rhythm variations in this recording between 77 to 101, this collection was divided into six segments of 5 min that were processed independently: the dimension of all windows in a segment was reduced from 256 to 32 by projection onto the 32 leading PCA basis vectors of the segment, and an over-complete dictionary $\mathbf {D}\in \mathbb R^{32\times 128}$ was trained using the K-SVD [16] algorithm for each segment,^{Footnote 6} as demonstrated in Fig. 4. The SMV mode of Algorithm 1was employed for each segment with the following parameters: μ ⁰=1.0,ρ=1.25,α=1.0,β=2.6,𝜖=0.0025, and Arrhythmia events were detected as column in E with l ₂-norm above a threshold τ=0.1 (the processing time for the 30 min recording was 176 s). Figure 5 depicts ∥E(:,i)∥₂ for the first 15 min (formed by concatenation of the results from the first 3 segments), demonstrating accurate matching between most of the non-zero l ₂-norm columns and the ground truth annotations of this recording. Due to the randomness of the initial dictionary used in the K-SVD algorithm, the entire experiment was repeated 10 times, resulting in an average of 97.18 % true positive detections with standard deviation 1.89 %, and an average of 2.82 % false negatives with standard deviation 1.89 %. Additional 13 non-PVC events were detected on average, which corresponded to noise and waveform distortions. In order to demonstrate the effectiveness of the proposed approach, we repeated the same experiment using independent sparse coding per each ECG window: we employed the Orthogonal Matching Pursuit (OMP) algorithm to reconstruct each dimensionality-reduced window using the trained dictionary (with a fixed number of atoms). A window was marked as an outlier if its reconstruction error exceeded a threshold τ _{O
M
P}. We set the threshold value to achieve (approximately) the same true positive rate of the proposed approach, and obtained the following result: an average (over 10 experiments) of 98.46 % true positive detections with standard deviation 1.79 %, and 55.2 non-PVC events. Namely, the OMP-based approach resulted in a significantly higher false positive rate, compared to the proposed approach.

5.2 Specular Reflectance and Shadows Removal from Natural Images

The reflection of light from surfaces is associated with two main components [18]: diffuse and specular. The diffuse component scatters light uniformly in all directions, whereas the specular component scatters light in a direction that depends on the angles of incident light and the surface normal. Light energy due to specular reflections is often concentrated, causing strong bright regions (highlights) to appear in the image, as demonstrated in Fig. 7 (left column). These bright regions can cause computer vision algorithms such as segmentation, shape from shading, stereo, and motion detection to produce errors. Therefore, there has been significant interest in specular reflectance removal, see [19] for a review. According to Phong shading model [18], the intensity of the diffuse component at image pixel k is given by:

$$ i_{k}(\lambda) = L(\lambda)\rho_{k}(\lambda)\max (0,\hat{\mathbf{n}}_{k} \cdot \hat{\mathbf{v}}), $$

(23)

where λ is the wavelength (color), L(λ) is the intensity profile of incident light, ρ(λ) is the albedo, $\hat {\mathbf {n}}=[n_{x},n_{y},n_{z}]^{T}$ is the surface normal, and $\hat {\mathbf {v}}=[v_{x},v_{y},v_{z}]^{T}$ is a unit vector pointing to the direction of incident light. Equation (23) is interpreted as follows: The measured intensity is given by the product of the source intensity, the albedo and the cosine of the angle 𝜃 _i between the surface normal and direction of incident light. In the case that |𝜃 _i|>π/2 the intensity equals zero, which results in a self-shadow^{Footnote 7} effect. By column-stacking m pixels, and neglecting the self shadowing effect (i.e. allowing |𝜃 _i|>π/2), the following matrix formulation is obtained:

$$ \mathbf{i}(\lambda) = \mathbf{N}(\lambda)\hat{\mathbf{v}}\in\mathbb R^{m\times 1}, $$

(24)

where

$$ \mathbf{i}(\lambda)=\left[\begin{array}{c} i_{1}(\lambda)\\ i_{2}(\lambda)\\ \vdots \end{array}\right] \in\mathbb R^{m\times 1}, \mathbf{N}(\lambda)=\left[\begin{array}{c} L(\lambda)\rho_{1}(\lambda)\hat{\mathbf{n}}^{T}_{1}\\ L(\lambda)\rho_{2}(\lambda)\hat{\mathbf{n}}^{T}_{2}\\ \vdots \end{array}\right]\in\mathbb R^{m\times 3}. $$

(25)

Given a collection of K images of a diffuse object, photographed from the same view-point and under varying light source directions, the following rank-3 model is obtained:

$$ \mathbf{I}(\lambda) = \mathbf{N}(\lambda)\mathbf{V}, $$

(26)

where

$$ \mathbf{I}(\lambda)=\left[\begin{array}{c} \mathbf{i}_{1}(\lambda)\\ \mathbf{i}_{2}(\lambda)\\ \vdots \end{array}\right] \in\mathbb R^{m\times K}, \mathbf{V}=[\hat{\mathbf{v}}_{1}, \cdots,\hat{\mathbf{v}}_{K}]\in\mathbb R^{3\times K}. $$

(27)

Therefore, the diffuse component can be modeled by a low-dimensional subspace, and the works [20, 21] proved that the dimension of this subspace is upper bounded by 9. The basis for this subspace can be computed from the PCA basis of the images. However, specular components and shadows are not represented by this subspace. Therefore, we can solve problem (7) with the MMV mode, in order to decompose the images Y (each column of Y is one column-stacked image) into diffuse components D X, and specular components and shadows E as follows: the diffuse components of all images are expected to be jointly-sparse with respect to the PCA basis D, whereas the specular components and shadows are assumed to appear in a subset of the images, thus, by minimizing ∥E∥_2,1 the columns of E would contain those parts of the images that do not conform with the joint-sparsity model. In our experiment we used a collection of 37 images (195×317 pixels) of a wrist watch, photographed from the same view-point and using 37 different illumination conditions. We computed the PCA basis of $\mathbf {Y}\in \mathbb {R}^{61,815\times 37}$ and used it as the dictionary D. We further employed Algorithm 1 and set p=2,α=4.5,β=0.5,μ=0.05,ρ=1.15,𝜖=10⁻¹⁰. Figure 6 presents convergence of the algorithm within 35 iterations (processing time was 22 s) to a joint-sparsity pattern with 3 non-zero rows (a 3-dimensional subspace). Figure 7 presents specular reflectance removal results (best viewed in the electronic version of this paper) for five images: the obtained diffuse components (equal to D X(:,l), where l is the corresponding index of each input image) are free of specular reflections and the shadows are significantly removed.

For the case of color images, the low-dimensional subspace model still holds since for a certain image pixel, only the albedo and incident light intensity are color (wavelength) dependent. Using an RGB color representation we obtain the following model:

$$ \mathbf{i}_{RGB}=\left[\begin{array}{l} \mathbf{i}(R)\\ \mathbf{i}(G)\\ \mathbf{i}(B) \end{array}\right]= \left[\begin{array}{l} \mathbf{N}(R)\hat{\mathbf{v}}\\ \mathbf{N}(G)\hat{\mathbf{v}}\\ \mathbf{N}(B)\hat{\mathbf{v}} \end{array}\right]= \left[\begin{array}{l} \mathbf{N}(R)\\ \mathbf{N}(G)\\ \mathbf{N}(B) \end{array}\right]\hat{\mathbf{v}}\in\mathbb R^{3m\times 1}. $$

(28)

Given a collection of K color images of a diffuse object, photographed from the same view-point and under varying light source directions, a rank-3 model is obtained, similar to Eq. (26). Figure 8 demonstrates specular reflectance and shadows removal results for color images: the obtained diffuse components are free of specular reflections and the shadows are significantly removed.

6 Conclusions

Sparse coding and anomaly detection are important tasks, with numerous signal processing applications. This paper presented a unified approach for simultaneous sparse coding and anomaly detection for both jointly-sparse and independently-sparse signal models. The usefulness of the proposed approach was demonstrated for two challenging real-life problems: Arrhythmia detection in ECGs and specular reflectance removal from natural images. Due to the constantly growing number of signals that are well modeled by sparse representations, the proposed approach could be combined in many existing and emerging applications.

Notes

The observant reader may notice that problem (5) is actually separable, implying that we can solve for each column of X independently from the others. Nevertheless, we choose in this paper a joint solver for two reasons: (i) Giving a unified view of the two problems (5 and 6); and (ii) Our approach loses nothing in terms of complexity nor elegance when compared to the independent sparse coding tasks.
Note that the related problem of a sparsely represented interference matrix E=ΨC, where Ψ is a dictionary and C is sparse, can be formulated as follows:
$$\begin{array}{*{20}l} \min\limits_{\mathbf{X},\mathbf{C}} \frac{1}{2}\|\mathbf{Y}-\mathbf{D} \mathbf{X}-{\Psi} \mathbf{C}\|_{F}^{2}+ \alpha\|\mathbf{X}\|_{1,p} + \beta\|\mathbf{C}\|_{1,1} , \end{array} $$
and its solution can be also obtained using the proposed approach in this section.
Note that problem (7) is convex, therefore ADMM will approach the global minimum. Since the Frobenius-norm term in (7) is not separable, perfect convergence is not guaranteed, however, it was verified experimentally that the solution obtained by ADMM is sufficiently accurate after 50 iterations.
All the results in this paper are reproducible with a MATLAB package that is freely available for distribution.
http://www.physionet.org/physiobank/database/html/mitdbdir/records.htm
A total of six dictionaries were trained—each using all of the dimensionality-reduced samples from each segment.
This is in contrast to cast-shadows, where one part of an object is shadowed by another part.

References

Chandola, V., Banerjee, A., Kumar, V. (2009). Anomaly detection: a survey. ACM Computing Surveys, 41(3), 15:1–15:58.
Bruckstein, A.M., Donoho, D., Elad, M. (2009). From sparse solutions of systems of equations to sparse modeling of signals and images. Society for Industrial and Applied Mathematics Review, 51(1), 34–81.
Nguyen, N.H., Nasrabadi, N.M., Tran, T.D. (2011). Robust multi-sensor classification via joint sparse representation. In International conference on information fusion (FUSION).
Shekhar, S., Patel, V.M., Nasrabadi, M., Chellappa, R. (2012). Joint sparsity-based robust multimodal biometrics recognition. In The 12th European conference on computer vision (ECCV).
Cong, Y., Yuan, J., Liu, J. (2011). Sparse reconstruction cost for abnormal event detection. In Computer vision and pattern recognition (CVPR).
Zhao, B., Fei-Fei, L., Xing, E.P. (2011). Online detection of unusual events in videos via dynamic sparse coding. In Computer vision and pattern recognition (CVPR).
Levorato, M., & Mitra, U. (2012). Fast anomaly detection in smartgrids via sparse approximation theory. In Sensor array and multichannel signal processing workshop (SAM).
Cotter, S.F., Rao, B.D., Engan, K., Kreutz-Delgado, K. (2005). Sparse solutions to linear inverse problems with multiple measurement vectors. IEEE Transactions on Signal Processing, 53(7), 2477–2488.
Kong, D., Ding, C., Huang, H. (2011). Robust nonnegative matrix factorization using l21-norm. In The 20th ACM international conference on information and knowledge management(CIKM).
Dobigeon, N., & Févotte, C. (2013). Robust nonnegative matrix factorization for nonlinear unmixing of hyperspectral images. In The 5th workshop on hyperspectral image and signal processing: evolution in remote sensing (WHISPERS).
Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J. (2010). Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1), 1–122.
Ince, T., Kiranyaz, S., Gabbouj, M. (2009). A generic and robust system for automated patient-specific classification of ecg signals. IEEE Transactions on Biomedical Engineering, 56(5), 1415–1426.
Ye, C., Kumar, B.V., Coimbra, M.T. (2012). Heartbeat classification using morphological and dynamic features of ecg signals. IEEE Transactions on Biomedical Engineering, 59(10), 2930–2941.
Mailhe, B., Gribonval, R., Bimbot, F., Lemay, M., Vandergheynst, P., Vesin, J. M. (2009). Dictionary learning for the sparse modelling of atrial fibrillation in ecg signals. In Internatioanl conference on acoustics, speech and signal processing (ICASSP).
Moody, G.B., & Mark, R.G. (2001). The impact of the mit-bih arrhythmia database. IEEE Engineering in Medicine and Biology Magazine, 20(3), 45–50.
Aharon, M., Elad, M., Bruckstein, A.M. (2006). K-svd: an algorithm for designing overcomplete dictionaries for sparse representation. IEEE Transactions on Signal Processing, 54(11), 4311–4322.
Beck, A., & Teboulle, M. (2009). A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Image Science.
Szeliski, R. (2010). Computer vision: algorithms and applications. New York: Springer.
Artusi, A., Banterle, F., Chetverikov, D. (2011). A survey of specularity removal methods. Computer Graphics Forum, 30(11).
Basri, R., & Jacobs, D.W. (2003). Lambertian reflectance and linear subspaces. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(2).
Shashua, A. (1997). On photometric issues in 3d visual recognition from a single 2d image. International Journal of Computer Vision, 21(1–2).

Download references

Author information

Authors and Affiliations

Computer Science Department, Technion, Haifa, Israel
Amir Adler, Michael Elad & Ehud Rivlin
Computer Science Department, Interdisciplinary Center, Herzelia, Israel
Yacov Hel-Or

Authors

Amir Adler
View author publications
You can also search for this author in PubMed Google Scholar
Michael Elad
View author publications
You can also search for this author in PubMed Google Scholar
Yacov Hel-Or
View author publications
You can also search for this author in PubMed Google Scholar
Ehud Rivlin
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Amir Adler.

Additional information

A. Adler is the recipient of the 2011 Google European Doctoral Fellowship in Multimedia, and this research is partly supported by this Google Fellowship and partly by ERC Grant agreement no. 320649.

Appendix

The update equation for X ^k+1 is obtained by solving:

$$\begin{array}{*{20}l} \min\limits_{\mathbf{X}} &\frac{1}{2} \left\| \mathbf{Y}-\mathbf{D} \mathbf{X}-\mathbf{E}^{k} \right\|^{2}_{F} + <\mathbf{M}^{k},\mathbf{Z}^{k}-\mathbf{X}> + \frac{\mu^{k}}{2}\left\|\mathbf{Z}^{k}-\mathbf{X}\right\|^{2}_{F}. \end{array} $$

(29)

The solution of Eq. (29) is computed from:

$$\begin{array}{@{}rcl@{}} &&{}\frac{\partial}{\partial\mathbf{X}} \left(\frac{1}{2}\textbf{Tr}\left\{\left(\mathbf{Y}-\mathbf{D} \mathbf{X}-\mathbf{E}^{k}\right)\left(\mathbf{Y}-\mathbf{D} \mathbf{X}-\mathbf{E}^{k}\right)^{T}\right\}+ \textbf{Tr}\left\{(\mathbf{Z}^{k}-\mathbf{X})^{T} \mathbf{M}^{k}\right\} \right.\\ &&{\kern1.5pc}\left. + \frac{\mu^{k}}{2} \textbf{Tr}\left\{\left(\mathbf{Z}^{k}-\mathbf{X}\right)\left(\mathbf{Z}^{k}-\mathbf{X}\right)^{T}\right\}\right)=0, \end{array} $$

(30)

which results in:

$$\begin{array}{*{20}l} & \mathbf{D}^{T}\left(\mathbf{Y}-\mathbf{D} \mathbf{X}-\mathbf{E}^{k}\right) + \mathbf{M}^{k} + \mu^{k}\left(\mathbf{Z}^{k}-\mathbf{X}\right)=0, \end{array} $$

(31)

and the update equation is given by:

$$\begin{array}{*{20}l} \mathbf{X}^{k+1} = \left(\mathbf{D}^{T}\mathbf{D}+\mu^{k}\mathbf{I}\right)^{-1}\left(\mathbf{D}^{T}\left(\mathbf{Y}-\mathbf{E}^{k}\right) +\mathbf{M}^{k}+\mu^{k} \mathbf{Z}^{k}\right). \end{array} $$

(32)

Rights and permissions

Reprints and permissions

About this article

Cite this article

Adler, A., Elad, M., Hel-Or, Y. et al. Sparse Coding with Anomaly Detection. J Sign Process Syst 79, 179–188 (2015). https://doi.org/10.1007/s11265-014-0913-0

Download citation

Received: 19 January 2014
Revised: 26 May 2014
Accepted: 05 June 2014
Published: 12 July 2014
Issue Date: May 2015
DOI: https://doi.org/10.1007/s11265-014-0913-0

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Sparse Coding with Anomaly Detection

Abstract

Similar content being viewed by others

Sparse Representation Based Anomalies Detection in Electrocardiography Signals

Sparse representation and overcomplete dictionary learning for anomaly detection in electrocardiograms

Adaptive Approximation Algorithms for Sparse Data Representation