1 Introduction

Anomaly detection is the problem of detecting patterns that significantly deviate from an expected model. This problem has numerous applications, such as fraud detection for banking and businesses, intrusion detection for network security, fault detection for production systems, and health problem detection for biomedical systems; see [1] for a review. In this paper we assume that the expected behavior of the data vectors is to conform to a sparse representation model [2], and address the problem of simultaneous sparse coding and anomaly detection. This problem arises in three different tasks: 1) anomaly detection within sparsely represented data vectors; 2) removal of interference from sparsely represented data vectors; 3) dictionary learning in the presence of outliers. In this paper we address the first two tasks; the third is beyond the scope of this work.

Related work

Joint-sparse coding was addressed by [3, 4] for cases in which all data vectors are contaminated by either a sparse or a sparsely represented interference. Anomaly detection in video was addressed by [5], which proposed a sparse reconstruction cost to measure the normality of events with respect to a dictionary with various spatio-temporal structures. This problem was also addressed by [6], which combined online dictionary learning with an objective function that measures the normality of events. The work of [7] utilized sparse representations to analyze stochastic processes over graphs for anomaly detection in SmartGrids.

Contributions

The contributions of this paper are two-fold: 1) A unified formulation of the problem of simultaneous sparse coding and anomaly detection is provided for jointly sparse as well as independently sparse data vectors, and a numerical solver is provided for both cases. 2) The proposed approach is demonstrated to detect irregular heartbeats in ECG and to remove specular reflections and shadows from natural images.

Organization

Section 2 reviews sparse representations, Section 3 formulates the problem, Section 4 explains the proposed approach, and Section 5 demonstrates its performance.

2 Sparse Representation Modeling

Sparse representation modeling [2] assumes that a signal (data vector) \(\mathbf {y}\in \mathbb {R}^{N}\) can be described as \(\mathbf{y}\approx\mathbf{D}\mathbf{x}\), where \(\mathbf {D} \in \mathbb {R}^{N\times M}\) is a dictionary and \(\mathbf {x}\in \mathbb {R}^{M}\) is sparse. Therefore, y is represented by a linear combination of a few columns (atoms) of D. The recovery of the sparse representation, termed sparse coding, can be obtained by solving the following problem:

$$\begin{array}{*{20}l} \hat{\mathbf{x}} & = \arg\min\limits_{\mathbf{x}} \|\mathbf{y} - \mathbf{D} \mathbf{x}\|_{2}^{2} \text{ s.t. }~ \left\|\mathbf{x}\right\|_{0} \le T_{0}, \end{array} $$
(1)

where \(\left\|\mathbf{x}\right\|_{0}\) is the \(l_{0}\) pseudo-norm that counts the number of non-zero entries of x, and \(T_{0}\) is the maximum number of non-zero entries. Problem (1) can be extended to a collection of signals:

$$\begin{array}{*{20}l} \hat{\mathbf{X}} & = \arg\min\limits_{\mathbf{X}} \|\mathbf{Y} - \mathbf{D} \mathbf{X} \|_{F}^{2} \text{ s.t. }~ \left\|\mathbf{X}\right\|_{0} \le LT_{0}, \end{array} $$
(2)

where \(\mathbf {Y}\in \mathbb {R}^{N\times L}\) contains L signals \(\{\mathbf {y}_{i}\in \mathbb {R}^{N}\}^{L}_{i=1}\), \(\mathbf {X}\in \mathbb {R}^{M\times L}\) contains L sparse representations \(\{\mathbf {x}_{i}\in \mathbb {R}^{M}\}^{L}_{i=1}\), and \(\left\|\mathbf{X}\right\|_{0}\) counts the number of non-zero entries of X. This type of model is referred to as the Single Measurement Vector (SMV) model, since each signal is assumed to be a single measurement associated with a unique non-zero pattern of its sparse representation (i.e. a unique combination of atoms). The case in which all the sparse representations share the same non-zero pattern is referred to as the Multiple Measurement Vector (MMV) [8] or joint-sparsity model, as illustrated in Fig. 1. For the MMV case, the following optimization problem recovers the sparse representations more accurately, by exploiting the joint-sparsity property:

$$\begin{array}{*{20}l} \hat{\mathbf{X}} & = \arg\min\limits_{\mathbf{X}} \|\mathbf{Y} - \mathbf{D} \mathbf{X} \|_{F}^{2} \text{ s.t. }~ \left\|\mathbf{X}\right\|_{0,2} \le T_{0}, \end{array} $$
(3)

where \(\left \|\mathbf {X}\right \|_{0,2}={\sum }_{j}{\mathcal {I}(\|\mathbf {X}(j,:)\|_{2})}\) counts the number of non-zero rows, X(j,:) is the j-th row of X and \(\mathcal {I}\) is the indicator function:

$$\mathcal{I}(a) = \left\{\begin{array}{ll} 1 & \quad\text{if}~ |a| > 0\\ 0 & \quad\text{otherwise} \end{array}\right.. $$

Note that problems (1)–(3) are NP-hard, and their solutions can be approximated using convex relaxations: the \(l_{1}\) norm \(\left\|\mathbf{x}\right\|_{1}={\sum}_{i}{|x_{i}|}\) often replaces \(\left\|\mathbf{x}\right\|_{0}\), and the \(l_{1,p}\) norm \(\left\|\mathbf{X}\right\|_{1,p}={\sum}_{j}{\|\mathbf{X}(j,:)\|_{p}}\) often replaces \(\left\|\mathbf{X}\right\|_{0}\) (with p=1) and \(\left\|\mathbf{X}\right\|_{0,2}\) (with p=2).
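For concreteness, the sparsity measures above and their convex surrogates can be computed as follows (an illustrative NumPy sketch; the function names are not from the paper):

```python
import numpy as np

def l0(X):
    # ||X||_0: number of non-zero entries (a pseudo-norm, not convex).
    return int(np.count_nonzero(X))

def l1(X):
    # ||X||_1: sum of absolute values, the convex surrogate of ||X||_0.
    return float(np.abs(X).sum())

def l02(X):
    # ||X||_{0,2}: number of non-zero rows (the joint-sparsity measure).
    return int(np.count_nonzero(np.linalg.norm(X, axis=1)))

def l12(X):
    # ||X||_{1,2}: sum of row l2-norms, the surrogate of ||X||_{0,2}.
    return float(np.linalg.norm(X, axis=1).sum())
```

For an SMV-type matrix the non-zeros spread over many rows, so \(l_{02}\) is large, whereas a jointly sparse (MMV) matrix concentrates them in a few rows.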

Figure 1

The non-zeros (dark squares) of the sparse representations matrix X for the SMV (left) and MMV (right) models.

3 Problem Formulation

Let \(\mathbf {Y}\in \mathbb {R}^{N\times L}\) be a collection of signals that are well approximated by a sparse representations model, excluding a small number of signals—the outliers—which significantly deviate from this model. The collection Y is described as follows:

$$\begin{array}{*{20}l} \mathbf{Y}=\mathbf{D} \mathbf{X}+ \mathbf{E} + \mathbf{V}, \end{array} $$
(4)

where D is assumed known, E has few non-zero columns that equal the deviation of each outlier from the sparse representations model, and V is a low-energy noise component (\(\|\mathbf {V}\|_{F}^{2}\) is small compared to \(\|\mathbf {Y}\|_{F}^{2}\)).
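A synthetic instance of model (4) can be sketched as follows (the sizes are illustrative, not the dimensions used in the experiments):

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, L, T0, K0 = 32, 64, 200, 4, 5   # illustrative sizes

# Dictionary with unit-norm atoms.
D = rng.standard_normal((N, M))
D /= np.linalg.norm(D, axis=0)

# X: T0 non-zeros per column with independent supports (the SMV case).
X = np.zeros((M, L))
for i in range(L):
    X[rng.choice(M, T0, replace=False), i] = rng.standard_normal(T0)

# E: K0 non-zero columns holding the outliers' deviations.
E = np.zeros((N, L))
outliers = rng.choice(L, K0, replace=False)
E[:, outliers] = 5.0 * rng.standard_normal((N, K0))

# V: low-energy noise, so that ||V||_F^2 is small relative to ||Y||_F^2.
V = 0.01 * rng.standard_normal((N, L))
Y = D @ X + E + V
```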

Our objective is to detect the outliers in Y and recover the sparse representations. For the SMV case this objective can be obtained by solving the following problem:

$$\begin{array}{*{20}l} \{\mathbf{X},\mathbf{E}\}=\arg &\min\limits_{\mathbf{X},\mathbf{E}} \|\mathbf{Y} - \mathbf{D} \mathbf{X} -\mathbf{E} \|_{F}^{2} \\ \text{s.t. } & \|\mathbf{X}\|_{0} \le LT_{0} \\ & \|\mathbf{E}\|_{2,0} \le K_{0}, \end{array} $$
(5)

where \(\left\|\mathbf{E}\right\|_{2,0}={\sum}_{i}{\mathcal{I}(\|\mathbf{E}(:,i)\|_{2})}\) counts the number of non-zero columns in E, E(:,i) is the i-th column of E, and \(K_{0}\) is the maximum number of non-zero columns (i.e. outliers). Problem (5) encourages a solution in which X is sparse; however, for the outliers that cannot be represented exclusively by D, it permits non-zero columns in E. For the MMV case the objective can be obtained by solving the following problem:

$$\begin{array}{*{20}l} \{\mathbf{X},\mathbf{E}\}=\arg &\min\limits_{\mathbf{X},\mathbf{E}} \|\mathbf{Y} - \mathbf{D} \mathbf{X} -\mathbf{E} \|_{F}^{2} \\ \text{s.t. } & \|\mathbf{X}\|_{0,2} \le T_{0} \\ & \|\mathbf{E}\|_{2,0} \le K_{0}, \end{array} $$
(6)

where the constraints ensure at most \(T_{0}\) non-zero rows in X and at most \(K_{0}\) non-zero columns in E.

4 The Proposed Approach

The solutions to problems (5) and (6) can be approximated by solving the following unconstrained convex problem:

$$ \min\limits_{\mathbf{X},\mathbf{E}} \frac{1}{2}\|\mathbf{Y}-\mathbf{D} \mathbf{X}-\mathbf{E}\|_{F}^{2}+ \alpha\|\mathbf{X}\|_{1,p} + \beta\|\mathbf{E}\|_{2,1} $$
(7)

where p=1 for the SMV case, p=2 for the MMV case, and α, β are small positive scalars. In addition, \(\left \|\mathbf {E}\right \|_{2,1}={\sum }_{i}{\|\mathbf {E}(:,i)\|_{2}}\) is the \(l_{2,1}\) norm, which provides a convex relaxation of \(\left\|\mathbf{E}\right\|_{2,0}\) and was applied in [9, 10] for robust non-negative matrix factorization. We propose to solve problem (7) with the Alternating Direction Method of Multipliers (ADMM) [11] for the following reasons: (i) it is suitable for our problem format; (ii) it has proven convergence properties; and (iii) it leads to a simple, coordinate-descent structure. In the following we describe the ADMM method and its application to our problem.

4.1 Alternating Direction Method of Multipliers

ADMM is a numerical method for solving problems of the following form:

$$\begin{array}{*{20}l} \min\limits_{\mathbf{X},\mathbf{Z}}&\;f(\mathbf{X},\mathbf{Z}) \text{ s.t. }~\mathbf{A}\mathbf{X}+\mathbf{B}\mathbf{Z}=\mathbf{C}, \end{array} $$
(8)

where X,Z,A,B,C are matrices, and the objective function is either separable f(X,Z)=g(X)+h(Z) or bi-convex. ADMM solves (8) by minimizing its Augmented-Lagrangian:

$$\begin{array}{@{}rcl@{}} \mathcal{L}_{A}(\mathbf{X},\mathbf{Z},\mu,\mathbf{M})&=&f(\mathbf{X},\mathbf{Z}) + \langle\mathbf{M},\mathbf{A}\mathbf{X}+\mathbf{B}\mathbf{Z}-\mathbf{C}\rangle\\ && + \frac{\mu}{2}\|\mathbf{A}\mathbf{X}+\mathbf{B}\mathbf{Z}-\mathbf{C}\|^{2}_{F}, \end{array} $$
(9)

where M is a Lagrange multiplier and μ is a penalty coefficient that controls the penalty level of deviation from the equality constraint. The minimization of \(\mathcal {L}_{A}(\mathbf {X},\mathbf {Z},\mu ,\mathbf {M})\) is performed iteratively, while alternating between the minimizations of X and Z:

$$\begin{array}{@{}rcl@{}} \mathbf{X}^{k+1}&=&\arg \min\limits_{\mathbf{X}} \mathcal {L}_{A}(\mathbf{X},\mathbf{Z}^{k},\mu^{k},\mathbf{M}^{k})\\ \end{array} $$
(10)
$$\begin{array}{@{}rcl@{}} \mathbf{Z}^{k+1}&=&\arg \min\limits_{\mathbf{Z}} \mathcal {L}_{A}(\mathbf{X}^{k+1},\mathbf{Z},\mu^{k},\mathbf{M}^{k})\\ \end{array} $$
(11)
$$\begin{array}{@{}rcl@{}} \mathbf{M}^{k+1}&=&\mathbf{M}^{k}+\mu^{k}(\mathbf{A}\mathbf{X}^{k+1}+\mathbf{B}\mathbf{Z}^{k+1}-\mathbf{C})\\ \end{array} $$
(12)
$$\begin{array}{@{}rcl@{}} {\mu}^{k+1}&=&{\rho}{\mu}^{k}; \rho >1. \end{array} $$
(13)

ADMM can be extended to more than two variables, and its convergence properties are analyzed in [11].

4.2 Sparse Coding with Anomaly Detection

In order to apply ADMM to solve problem (7), we add an auxiliary variable Z and an equality constraint as follows:

$$\begin{array}{*{20}l} \min\limits_{\mathbf{X},\mathbf{E},\mathbf{Z}} &\frac{1}{2}\|\mathbf{Y}-\mathbf{D} \mathbf{X}-\mathbf{E}\|_{F}^{2}+ \alpha\|\mathbf{Z}\|_{1,p} + \beta\|\mathbf{E}\|_{2,1} \text{ s.t. }~ \mathbf{Z}=\mathbf{X}. \end{array} $$
(14)

Note that by converting Eq. (7) into a constrained problem, we have decoupled its first and second terms, thus avoiding the need for an iterated-shrinkage [17] solution for X. The addition of the auxiliary variable Z results in a closed-form solution for X and a one-shot shrinkage solution for Z. The Augmented-Lagrangian of problem (14) is given by:

$$\begin{array}{@{}rcl@{}} \mathcal L_{p}(\mathbf{X},\mathbf{Z},\mathbf{E},\mathbf{M},\mu)& =& \frac{1}{2}\|\mathbf{Y}-\mathbf{D} \mathbf{X}-\mathbf{E}\|_{F}^{2} + \alpha\|\mathbf{Z}\|_{1,p} \\ && + \beta\|\mathbf{E}\|_{2,1} + \langle\mathbf{M},\mathbf{Z}-\mathbf{X}\rangle \\ &&+ \frac{\mu}{2}\|\mathbf{Z}-\mathbf{X}\|^{2}_{F}. \end{array} $$
(15)

The main stages of the ADMM-based solution are summarized in Algorithm 1, and in the following we describe the update equations of \(\mathbf{X}^{k+1}\), \(\mathbf{Z}^{k+1}\) and \(\mathbf{E}^{k+1}\). The update equation of \(\mathbf{X}^{k+1}\) is closed-form (and derived in the Appendix):

$$\begin{array}{*{20}l} \mathbf{X}^{k+1} = \left(\mathbf{D}^{T}\mathbf{D}+\mu^{k}\mathbf{I}\right)^{-1}\left(\mathbf{D}^{T}\left(\mathbf{Y}-\mathbf{E}^{k}\right) +\mathbf{M}^{k}+\mu^{k} \mathbf{Z}^{k}\right). \end{array} $$
(16)
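Since Eq. (16) is a plain linear solve, it translates directly into code (an illustrative NumPy sketch, assuming the dictionary fits in memory):

```python
import numpy as np

def update_X(Y, D, E, Z, M_dual, mu):
    # Closed-form X-update of Eq. (16):
    # X = (D^T D + mu I)^(-1) (D^T (Y - E) + M + mu Z).
    M_atoms = D.shape[1]
    lhs = D.T @ D + mu * np.eye(M_atoms)
    rhs = D.T @ (Y - E) + M_dual + mu * Z
    return np.linalg.solve(lhs, rhs)
```

When μ is held fixed across iterations, a factorization of \(\mathbf{D}^{T}\mathbf{D}+\mu\mathbf{I}\) can be cached and reused.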

The update equation of \(\mathbf{Z}^{k+1}\) for the SMV case is obtained from:

$$\begin{array}{@{}rcl@{}} \mathbf{Z}^{k+1} &=& \arg \min\limits_{\mathbf{Z}} \alpha\|\mathbf{Z}\|_{1,1} + \langle\mathbf{M}^{k},\mathbf{Z}-\mathbf{X}^{k+1}\rangle \\ && +\frac{\mu^{k}}{2}\left\|\mathbf{Z}-\mathbf{X}^{k+1}\right\|^{2}_{F}, \end{array} $$
(17)

which can be simplified to:

$$\begin{array}{*{20}l} & \mathbf{Z}^{k+1} =\arg \min\limits_{\mathbf{Z}} \frac{1}{2}\left\|\mathbf{P}-\mathbf{Z}\right\|^{2}_{F} + \gamma\|\mathbf{Z}\|_{1,1} , \end{array} $$
(18)

where \(\mathbf {P}=\mathbf {X}^{k+1}-\frac {1}{\mu ^{k}}\mathbf {M}^{k}\) and \(\gamma =\frac {\alpha }{\mu ^{k}}\). The solution to problem (18) is the element-wise soft thresholding operator [11]:

$$\begin{array}{*{20}l} & \mathbf{Z}^{k+1} = \mathcal S_{\gamma}(\mathbf{P}), \end{array} $$
(19)

where:

$$\mathcal S_{\gamma}(a) = \left\{\begin{array}{ll} a-\gamma & \;\text{if}~ a > \gamma \\ 0 & \;\text{if}~ |a|\le \gamma\\ a+\gamma & \;\text{if}~ a<-\gamma \end{array}\right.. $$
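A minimal vectorized sketch of this operator:

```python
import numpy as np

def soft_threshold(P, gamma):
    # Element-wise soft thresholding S_gamma of Eq. (19):
    # shrinks every entry toward zero by gamma, zeroing |p| <= gamma.
    return np.sign(P) * np.maximum(np.abs(P) - gamma, 0.0)
```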

The update equation of \(\mathbf{Z}^{k+1}\) for the MMV case is given by:

$$\begin{array}{*{20}l} & \mathbf{Z}^{k+1} =\arg \min\limits_{\mathbf{Z}} \frac{1}{2}\|\mathbf{P}-\mathbf{Z}\|^{2}_{F} + \gamma\|\mathbf{Z}\|_{1,2}, \end{array} $$
(20)

which results in a row-shrinkage operator (as proved in [4]):

$$ \mathbf{Z}^{k+1}(j,:) = \left\{\begin{array}{ll} \frac{\left\|\mathbf{P}(j,:)\right\|_{2}-\gamma}{\left\|\mathbf{P}(j,:)\right\|_{2}}\mathbf{P}(j,:) & \quad\text{if}~ \gamma < \|\mathbf{P}(j,:)\|_{2}\\ 0 & \quad\text{otherwise} \end{array}\right., $$
(21)

where P(j,:) is the j-th row of P.
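The row-shrinkage operator of Eq. (21) can be sketched as follows (illustrative NumPy; the small constant guards against division by zero for all-zero rows):

```python
import numpy as np

def row_shrink(P, gamma):
    # Row-shrinkage operator of Eq. (21): rows with l2-norm <= gamma
    # are zeroed; the remaining rows are scaled toward zero by gamma.
    norms = np.linalg.norm(P, axis=1, keepdims=True)
    scale = np.maximum(1.0 - gamma / np.maximum(norms, 1e-12), 0.0)
    return scale * P
```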

Table 1

The update equation of \(\mathbf{E}^{k+1}\) is obtained from:

$$\begin{array}{*{20}l} \mathbf{E}^{k+1} =& \arg \min\limits_{\mathbf{E}} \frac{1}{2}\left\|\mathbf{Y}-\mathbf{D}\mathbf{X}^{k+1}-\mathbf{E}\right\|^{2}_{F}+\beta\left\|\mathbf{E}\right\|_{2,1}, \end{array} $$
(22)

which results in a column-shrinkage operator (similar to the derivation of Eq. (21)):

$$\mathbf{E}^{k+1}(:,i) = \left\{\begin{array}{ll} \frac{\|\mathbf{Q}(:,i)\|_{2}-\beta}{\|\mathbf{Q}(:,i)\|_{2}}\mathbf{Q}(:,i) & \text{if}~ \beta < \|\mathbf{Q}(:,i)\|_{2}\\ 0 & \text{otherwise} \end{array}\right., $$

where \(\mathbf{Q}=\mathbf{Y}-\mathbf{D}\mathbf{X}^{k+1}\) and Q(:,i) is the i-th column of Q.
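Putting the three updates together, a compact sketch of Algorithm 1 for the SMV case (p=1) could read as follows (illustrative defaults; setting ρ=1 keeps μ fixed, while ρ>1 reproduces the increasing-penalty schedule of Eq. (13)):

```python
import numpy as np

def sparse_code_anomaly(Y, D, alpha, beta, mu=1.0, rho=1.0, iters=300):
    # Sketch of Algorithm 1 for the SMV case (p = 1) of problem (14).
    N, L = Y.shape
    M = D.shape[1]
    X = np.zeros((M, L)); Z = np.zeros((M, L))
    E = np.zeros((N, L)); Mul = np.zeros((M, L))  # Mul: multiplier M
    for _ in range(iters):
        # X-update, Eq. (16): closed-form least-squares solution.
        X = np.linalg.solve(D.T @ D + mu * np.eye(M),
                            D.T @ (Y - E) + Mul + mu * Z)
        # Z-update, Eqs. (18)-(19): element-wise soft thresholding.
        P = X - Mul / mu
        Z = np.sign(P) * np.maximum(np.abs(P) - alpha / mu, 0.0)
        # E-update, Eq. (22): column-wise shrinkage of the residual.
        Q = Y - D @ X
        qn = np.linalg.norm(Q, axis=0, keepdims=True)
        E = np.maximum(1.0 - beta / np.maximum(qn, 1e-12), 0.0) * Q
        # Dual and penalty updates, Eqs. (12)-(13).
        Mul = Mul + mu * (Z - X)
        mu *= rho
    return X, E
```

For instance, with D = I and a single dense column among otherwise 1-sparse columns, the dense column emerges as the only column of E with a large \(l_{2}\)-norm.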

4.3 Computational Complexity

The complexity of the proposed approach depends linearly on the number of signals L and polynomially on the number of atoms M. The complexity of a single ADMM iteration is given by \(O(M^{3})+O(MNL)+O(ML)+O(M^{2}L)+O(ML)+O(NL)\), where the leading four terms comprise the complexity of the update step of \(\mathbf{X}^{k+1}\), the fifth term is the complexity of the update equation of \(\mathbf{Z}^{k+1}\), and the sixth term is the complexity of the update equation of \(\mathbf{E}^{k+1}\). Figure 2 depicts the measured computation time of the proposed approach (SMV mode) for a number of signals varying from L=50,000 to L=500,000 (using N=32 and M=128), demonstrating the linear dependence on the number of signals.

Figure 2

Computation time of the proposed approach (SMV mode) for a number of signals varying from 50,000 to 500,000.

5 Performance Evaluation

The purpose of this section is to show the usefulness of the proposed approach by demonstrating it on two very different real-life problems: the SMV mode of Algorithm 1 is utilized to detect irregular heartbeats in an ECG signal, and the MMV mode of Algorithm 1 is utilized for the image processing task of removing specular reflectance and shadows from natural images. The simulations were performed on an i7 quad-core computer with 8 GB of RAM.

5.1 Arrhythmia Detection in ECG Signals

Irregular heartbeats, known as Arrhythmia, comprise several types of abnormal cardiac electrical activity. Arrhythmia is detected by analyzing the ECG, a non-invasive technique for monitoring cardiac electrical activity. The durations of ECG recordings often reach 24 h, which has promoted research efforts toward automatic Arrhythmia detection algorithms; see for example [12, 13]. In this experiment we focused on the detection of one type of Arrhythmia event: a Premature Ventricular Contraction (PVC), which is demonstrated in Fig. 3. Sparse representations have been proposed by [14] for ECG source separation problems, which motivated the utilization of the proposed approach for Arrhythmia detection: given an ECG signal that contains mostly normal heartbeats, the key idea is to decompose the signal into all possible N-sample windows (on the order of 1 s duration) and train a dictionary that provides a sparse representation for these windows. Note that due to the multiplicity and periodicity of normal heartbeats, their corresponding windows are highly repetitive and constitute the majority of all windows. The dictionary is therefore expected to enable an accurate sparse representation of the windows that correspond to normal heartbeats, due to their high contribution to the training stage. However, the windows that correspond to Arrhythmia events are not expected to be accurately represented by this dictionary, due to their significant deviation from the normal heartbeat waveforms and their low contribution to the training stage (owing to the rareness of such events). Therefore, a possible strategy for Arrhythmia detection is to solve problem (7) for the SMV case, since each window is expected to be sparsely represented by a different combination of dictionary atoms, and to mark columns of E with an \(l_{2}\)-norm above a threshold τ as irregular heartbeat locations.
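The windowing and detection steps of this strategy can be sketched as follows (hypothetical helper names; the dictionary training and the sparse coding itself are omitted):

```python
import numpy as np

def extract_windows(signal, N):
    # All overlapping N-sample windows of a 1-D signal, as columns of Y.
    starts = np.arange(len(signal) - N + 1)
    return signal[np.arange(N)[:, None] + starts[None, :]]

def flag_anomalies(E, tau):
    # Indices of columns of E whose l2-norm exceeds the threshold tau.
    return np.flatnonzero(np.linalg.norm(E, axis=0) > tau)
```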

Figure 3

A Premature Ventricular Contraction (PVC) event interrupts a series of normal heartbeats in ECG recording #109 from the MIT-BIH Arrhythmia Database [15].

We validated our approach using the MIT-BIH Arrhythmia Database [15], which contains a collection of 30-min fully annotated ECG recordings, sampled at 360 Hz. We analyzed ECG recording #109, which includes 40 PVC events and 2492 regular heartbeats, by extracting all possible 256-sample windows, leading to initial signal collection dimensions of 256×647,745. Due to normal sinus rhythm variations in this recording between 77 and 101, this collection was divided into six segments of 5 min that were processed independently: the dimension of all windows in a segment was reduced from 256 to 32 by projection onto the 32 leading PCA basis vectors of the segment, and an over-complete dictionary \(\mathbf {D}\in \mathbb R^{32\times 128}\) was trained using the K-SVD [16] algorithm for each segment, as demonstrated in Fig. 4. The SMV mode of Algorithm 1 was employed for each segment with the following parameters: \(\mu^{0}=1.0\), \(\rho=1.25\), \(\alpha=1.0\), \(\beta=2.6\), \(\epsilon=0.0025\), and Arrhythmia events were detected as columns in E with an \(l_{2}\)-norm above a threshold τ=0.1 (the processing time for the 30-min recording was 176 s). Figure 5 depicts \(\|\mathbf{E}(:,i)\|_{2}\) for the first 15 min (formed by concatenating the results of the first three segments), demonstrating accurate matching between most of the non-zero \(l_{2}\)-norm columns and the ground-truth annotations of this recording. Due to the randomness of the initial dictionary used in the K-SVD algorithm, the entire experiment was repeated 10 times, resulting in an average of 97.18 % true positive detections with standard deviation 1.89 %, and an average of 2.82 % false negatives with standard deviation 1.89 %. An additional 13 non-PVC events were detected on average, which corresponded to noise and waveform distortions.
In order to demonstrate the effectiveness of the proposed approach, we repeated the same experiment using independent sparse coding for each ECG window: we employed the Orthogonal Matching Pursuit (OMP) algorithm to reconstruct each dimensionality-reduced window using the trained dictionary (with a fixed number of atoms). A window was marked as an outlier if its reconstruction error exceeded a threshold \(\tau_{OMP}\). We set the threshold value to achieve (approximately) the same true positive rate as the proposed approach, and obtained the following result: an average (over 10 experiments) of 98.46 % true positive detections with standard deviation 1.79 %, and 55.2 non-PVC events. Namely, the OMP-based approach resulted in a significantly higher false positive rate than the proposed approach.
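The OMP baseline can be sketched as follows (a textbook greedy implementation for a dictionary with unit-norm atoms, not the exact code used in the experiments):

```python
import numpy as np

def omp(D, y, T0):
    # Orthogonal Matching Pursuit: greedily select T0 atoms,
    # re-fitting the coefficients of the active set at every step.
    r = y.astype(float).copy()
    support = []
    coef = np.zeros(0)
    for _ in range(T0):
        # Atom most correlated with the current residual.
        support.append(int(np.argmax(np.abs(D.T @ r))))
        Ds = D[:, support]
        coef, *_ = np.linalg.lstsq(Ds, y, rcond=None)
        r = y - Ds @ coef
    x = np.zeros(D.shape[1])
    x[support] = coef
    return x, float(np.linalg.norm(r))
```

Windows whose returned residual norm exceeds \(\tau_{OMP}\) would be marked as outliers.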

Figure 4

ECG dictionary: 16 atoms of one ECG segment, displayed after reconstruction using the 32 leading PCA basis vectors of the segment.

Figure 5

Arrhythmia detection: columns of E with an \(l_{2}\)-norm above τ=0.1 indicate an ECG anomaly.

5.2 Specular Reflectance and Shadows Removal from Natural Images

The reflection of light from surfaces is associated with two main components [18]: diffuse and specular. The diffuse component scatters light uniformly in all directions, whereas the specular component scatters light in a direction that depends on the angles of the incident light and the surface normal. Light energy due to specular reflections is often concentrated, causing strong bright regions (highlights) to appear in the image, as demonstrated in Fig. 7 (left column). These bright regions can cause computer vision algorithms such as segmentation, shape from shading, stereo, and motion detection to produce errors. Therefore, there has been significant interest in specular reflectance removal; see [19] for a review. According to the Phong shading model [18], the intensity of the diffuse component at image pixel k is given by:

$$ i_{k}(\lambda) = L(\lambda)\rho_{k}(\lambda)\max (0,\hat{\mathbf{n}}_{k} \cdot \hat{\mathbf{v}}), $$
(23)

where λ is the wavelength (color), L(λ) is the intensity profile of the incident light, ρ(λ) is the albedo, \(\hat {\mathbf {n}}=[n_{x},n_{y},n_{z}]^{T}\) is the surface normal, and \(\hat {\mathbf {v}}=[v_{x},v_{y},v_{z}]^{T}\) is a unit vector pointing in the direction of the incident light. Equation (23) is interpreted as follows: the measured intensity is given by the product of the source intensity, the albedo and the cosine of the angle \(\theta_{i}\) between the surface normal and the direction of the incident light. In the case that \(|\theta_{i}|>\pi/2\) the intensity equals zero, which results in a self-shadow effect. By column-stacking m pixels, and neglecting the self-shadowing effect (i.e. allowing \(|\theta_{i}|>\pi/2\)), the following matrix formulation is obtained:

$$ \mathbf{i}(\lambda) = \mathbf{N}(\lambda)\hat{\mathbf{v}}\in\mathbb R^{m\times 1}, $$
(24)

where

$$ \mathbf{i}(\lambda)=\left[\begin{array}{c} i_{1}(\lambda)\\ i_{2}(\lambda)\\ \vdots \end{array}\right] \in\mathbb R^{m\times 1}, \mathbf{N}(\lambda)=\left[\begin{array}{c} L(\lambda)\rho_{1}(\lambda)\hat{\mathbf{n}}^{T}_{1}\\ L(\lambda)\rho_{2}(\lambda)\hat{\mathbf{n}}^{T}_{2}\\ \vdots \end{array}\right]\in\mathbb R^{m\times 3}. $$
(25)

Given a collection of K images of a diffuse object, photographed from the same view-point and under varying light source directions, the following rank-3 model is obtained:

$$ \mathbf{I}(\lambda) = \mathbf{N}(\lambda)\mathbf{V}, $$
(26)

where

$$ \mathbf{I}(\lambda)=\left[\mathbf{i}_{1}(\lambda), \mathbf{i}_{2}(\lambda), \cdots \right] \in\mathbb R^{m\times K}, \quad \mathbf{V}=[\hat{\mathbf{v}}_{1}, \cdots,\hat{\mathbf{v}}_{K}]\in\mathbb R^{3\times K}. $$
(27)

Therefore, the diffuse component can be modeled by a low-dimensional subspace, and the works [20, 21] proved that the dimension of this subspace is upper bounded by 9. The basis for this subspace can be computed from the PCA basis of the images. However, specular components and shadows are not represented by this subspace. Therefore, we can solve problem (7) in the MMV mode in order to decompose the images Y (each column of Y is one column-stacked image) into diffuse components D X, and specular components and shadows E, as follows: the diffuse components of all images are expected to be jointly sparse with respect to the PCA basis D, whereas the specular components and shadows are assumed to appear in a subset of the images; thus, by minimizing \(\left\|\mathbf{E}\right\|_{2,1}\) the columns of E contain those parts of the images that do not conform to the joint-sparsity model.

In our experiment we used a collection of 37 images (195×317 pixels) of a wrist watch, photographed from the same view-point under 37 different illumination conditions. We computed the PCA basis of \(\mathbf {Y}\in \mathbb {R}^{61,815\times 37}\) and used it as the dictionary D. We further employed Algorithm 1 with \(p=2\), \(\alpha=4.5\), \(\beta=0.5\), \(\mu^{0}=0.05\), \(\rho=1.15\), \(\epsilon=10^{-10}\). Figure 6 presents the convergence of the algorithm within 35 iterations (processing time was 22 s) to a joint-sparsity pattern with 3 non-zero rows (a 3-dimensional subspace). Figure 7 presents specular reflectance removal results (best viewed in the electronic version of this paper) for five images: the obtained diffuse components (equal to D X(:,l), where l is the corresponding index of each input image) are free of specular reflections, and the shadows are significantly removed.
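A possible construction of the PCA dictionary is sketched below (illustrative; no mean removal is performed, under the assumption that the illumination subspace of Eq. (26) passes through the origin):

```python
import numpy as np

def pca_dictionary(Y, d=9):
    # Orthonormal basis for the d leading principal directions of the
    # image stack Y (columns are vectorized images). The diffuse
    # component lies in a subspace of dimension at most 9 [20, 21].
    U, _, _ = np.linalg.svd(Y, full_matrices=False)
    return U[:, :d]
```

For an exactly rank-3 stack, the d=3 basis reproduces the images perfectly under orthogonal projection.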

Figure 6

Specular reflectance removal convergence: (a) number of non-zero rows in X; (b) approximation error \(\frac {\left \|\mathbf {Z}^{k}-\mathbf {X}^{k}\right \|_{F}^{2}}{\|\mathbf {X}^{k}\|_{F}^{2}}\).

Figure 7

Specular reflectance removal: input images (left), diffuse components (center) and specular components (right).

For the case of color images, the low-dimensional subspace model still holds since for a certain image pixel, only the albedo and incident light intensity are color (wavelength) dependent. Using an RGB color representation we obtain the following model:

$$ \mathbf{i}_{RGB}=\left[\begin{array}{l} \mathbf{i}(R)\\ \mathbf{i}(G)\\ \mathbf{i}(B) \end{array}\right]= \left[\begin{array}{l} \mathbf{N}(R)\hat{\mathbf{v}}\\ \mathbf{N}(G)\hat{\mathbf{v}}\\ \mathbf{N}(B)\hat{\mathbf{v}} \end{array}\right]= \left[\begin{array}{l} \mathbf{N}(R)\\ \mathbf{N}(G)\\ \mathbf{N}(B) \end{array}\right]\hat{\mathbf{v}}\in\mathbb R^{3m\times 1}. $$
(28)

Given a collection of K color images of a diffuse object, photographed from the same view-point and under varying light source directions, a rank-3 model is obtained, similar to Eq. (26). Figure 8 demonstrates specular reflectance and shadows removal results for color images: the obtained diffuse components are free of specular reflections and the shadows are significantly removed.

Figure 8

Specular reflectance and shadows removal from color images: input images (left), diffuse components (center) and specular components (right).

6 Conclusions

Sparse coding and anomaly detection are important tasks with numerous signal processing applications. This paper presented a unified approach to simultaneous sparse coding and anomaly detection for both jointly sparse and independently sparse signal models. The usefulness of the proposed approach was demonstrated on two challenging real-life problems: Arrhythmia detection in ECGs and specular reflectance removal from natural images. Due to the constantly growing number of signals that are well modeled by sparse representations, the proposed approach could be incorporated into many existing and emerging applications.