Abstract
Moving Object Segmentation (MOS) is an important topic in computer vision. MOS becomes a challenging problem in the presence of dynamic background and moving camera videos such as Pan–Tilt–Zoom cameras (PTZ). The MOS problem has been solved using unsupervised and supervised learning strategies. Recently, new ideas to solve MOS using semi-supervised learning have emerged inspired from the theory of Graph Signal Processing (GSP). These new algorithms are usually composed of several steps including: segmentation, background initialization, features extraction, graph construction, graph signal sampling, and a semi-supervised learning algorithm inspired from reconstruction of graph signals. In this work, we summarize and explain the theoretical foundations as well as the technical details of MOS using GPS. We also propose two architectures for MOS using semi-supervised learning and a new evaluation procedure for GSP-based MOS algorithms. GSP-based algorithms are evaluated in the Change Detection (CDNet2014) dataset for MOS, outperforming numerous State-Of-The-Art (SOTA) methods in several challenging conditions.
Access provided by Autonomous University of Puebla. Download conference paper PDF
Similar content being viewed by others
Keywords
1 Introduction
Moving Object Segmentation (MOS) is a fundamental topic in computer vision. MOS has several applications [17], and it is also a fundamental step in numerous computer vision and image/video processing tasks such as video surveillance, security, visual object tracking, and human activity analysis, among others. The target of MOS is to separate the moving from the static objects in a scene [4]. MOS is also considered as a classification problem, where the objective is to classify between background and foreground each pixel in the videos, and therefore MOS is also known as background subtraction [7]. MOS becomes very challenging in the presence of undesirable variations in the background scene for static and moving camera sequences [21].
The approaches to solve MOS can be categorized as unsupervised, semi-supervised, and supervised learning methods. Unsupervised methods usually rely on a background model [25] of the video to segment the moving pixels (or foreground) [4,5,6, 28]. However, these unsupervised methods usually fail in dynamic background and moving camera sequences. On the other hand, supervised methods usually train very deep Convolutional Neural Networks (CNN) [31] to predict the foreground [8, 22, 35, 50]. Common CNNs in the literature contain millions of parameters [23], and thus several tricks such as dropout and data augmentation techniques [30] are required to avoid the overfitting problem. Furthermore, a general answer about the sample complexity required to guarantee performance in deep learning regimen does not exist in CNNs [15, 47, 53]. Recently, a new semi-supervised learning category for MOS has emerged [18, 20, 21]. These semi-supervised methods for MOS can achieve good performance in several challenges with a small amount of labeled information, and they are inspired from the Graph Signal Processing (GSP) community [37, 48].
GSP tries to extend the classical concepts of digital signal processing to signals supported on graphs. Graphs can model complex relationships among data, and therefore GSP has attracted a lot of interest in the last few years. Specifically, the problems of sampling and reconstruction of graph signals have been broadly studied [2, 13, 39, 40, 46]. In the same way, GSP has been used in numerous applications such as computer vision [18, 20, 21], image and 3D point cloud processing [38, 52, 57], sensor networks [16], and the estimation of new cases of Coronavirus Disease 2019 [19].
In this work, we explain the basic notions, as well as the mathematical and technical foundations of MOS using GSP, for simplicity we call Graph Signal Processing-Moving Object Segmentation (GSP-MOS) to this new approach. We propose two new architectures and a new evaluation procedure for GSP-MOS. This new evaluation scheme takes into account the amount of labeled information used in the GSP-MOS algorithms. The idea of GSP-MOS is to model regions of pixels in videos as nodes living in a high dimensional space, and thereafter a semi-supervised method classifies if each node is either a static or moving object. GSP-MOS algorithms are usually composed of: 1) a segmentation method, 2) feature extraction procedure for nodes representation, 3) a graph construction method, 4) graph signal representation, 5) a graph signal sampling strategy, and 6) a semi-supervised learning algorithm inspired from reconstruction methods of GSP. GSP-MOS has the advantage of having good performance for static and moving camera videos with small amounts of labeled information. Furthermore, Giraldo et al. [21] answered the question of the sample complexity in semi-supervised learning regimen for GSP-MOS using the concept of Paley-Wiener space in graphs [44].
The rest of the paper is organized as follows. Section 2 presents the related work in GSP-MOS. Section 3 explains the basic concepts and mathematical notions of GSP-MOS algorithms. Section 4 introduces the experimental framework that has been used for GSP-MOS, and the new evaluation procedure. Finally, Sects. 5 and 6 present a summary of results and the conclusions of this work, respectively.
2 Related Work
There are few GSP-MOS algorithms in the literature. The first GSP-MOS algorithm was proposed in [20] and coined as Graph BackGround Subtraction with Total Variation minimization (GraphBGS-TV), this method uses a Mask Region CNN (Mask R-CNN) [24] with a set of handcrafted features. GraphBGS-TV solves the semi-supervised learning problem using the Total Variation (TV) of graph signals [29]. Giraldo and Bouwmans [18] proposed the GraphBGS method, where the segmentation step uses a Cascade Mask R-CNN [10], and the semi-supervised learning problem is solved with the Sobolev norm of graph signals [19]. Finally, Giraldo et al. [21] proposed Graph Moving Object Segmentation (GraphMOS). This method uses superpixel segmentation with handcrafted plus deep features. GraphMOS solves the semi-supervised learning problem with the Sobolev norm or TV of graph signals. Additionally, Giraldo et al. [21] extended the GSP-MOS ideas to Video Object Segmentation (VOS) [41], and they coined their algorithm as GraphVOS. This work summarizes and discusses the results of the GSP-MOS algorithms. Similarly, we present potential future directions in the field of GSP-MOS.
3 Moving Object Segmentation and Graph Signal Processing
This section presents the basic concepts and mathematical foundations of GSP-MOS. Figure 1 shows the general pipeline for GSP-MOS. Most GSP-MOS algorithms follow the same pipeline with some modifications, in this work we refer to architecture as the pipeline of a certain GSP-MOS algorithm.
3.1 Graph Signals Preliminaries
A graph \(G=(\mathcal {V},\mathcal {E})\) is represented with a set of nodes \(\mathcal {V}=\{1,\dots ,N\}\) and a set of edges \(\mathcal {E}=\{(i,j)\}\), where (i, j) is the edge between the nodes i and j. \(\mathbf {W} \in \mathbb {R}^{N\times N}\) is the adjacency matrix of the graph such that \(\mathbf {W}(i,j)\in \mathbb {R}^+\) is the weight of the edge (i, j) and \(\mathbf {W}(i,j)=0~\forall ~(i,j)\notin \mathcal {E}\). Therefore, \(\mathbf {W}\) is symmetric for undirected graphs. \(\mathbf {D} \in \mathbb {R}^{N\times N}\) is a diagonal matrix called the degree matrix of G such that \(\mathbf {D} = \mathrm {diag}(\mathbf {W1})\), where \(\mathbf {1} \in \mathbb {R}^{N}\) is a vector of ones. Similarly, \(\mathbf {L} = \mathbf {D}-\mathbf {W}\) is the combinatorial Laplacian operator. \(\mathbf {L}\) is positive semi-definite for undirected graphs, and thus \(\mathbf {L}\) has eigenvalues \(0=\lambda _1 \le \lambda _2 \le \dots \le \lambda _N\) and corresponding eigenvectors \(\{ \mathbf {u}_1,\mathbf {u}_2,\dots ,\mathbf {u}_N\}\). Finally, a graph signal is a function \(y:\mathcal {V} \rightarrow \mathbb {R}\), and we can represent this signal as \(\mathbf {y} \in \mathbb {R}^N\).
The graph Fourier basis of G is given by the eigenvalue decomposition of \(\mathbf {L}=\mathbf {U}\mathbf {\Lambda }\mathbf {U}^{\mathsf {T}}\) [37], where \(\mathbf {U}=[\mathbf {u}_1,\mathbf {u}_2,\dots ,\mathbf {u}_N]\) and \(\mathbf {\Lambda }=\mathrm {diag}(\lambda _1,\lambda _2,\dots ,\lambda _N)\). The Graph Fourier Transform (GFT) \(\mathbf {\hat{y}}\) of the graph signal is given by \(\mathbf {\hat{y}}=\mathbf {U}^{{\mathsf {T}}}\mathbf {y}\), and similarly the inverse GFT is given by \(\mathbf {y} = \mathbf {U}\mathbf {\hat{y}}\) [37]. A graph signal \(\mathbf {y}\) is called bandlimited if \(\exists ~\rho \in \{ 1,2,\dots ,N-1 \}\) such that its GFT satisfies \(\mathbf {\hat{y}}(i)=0~\forall ~i > \rho \). Similar to the classical digital signal processing theory, \(\rho \) is called the bandwidth of \(\mathbf {y}\) in GSP, and \(\lambda _{\rho }\) is the frequency associated with that bandwidth. Pesenson [44] used these concepts of frequency to define the space of all \(\omega \)-bandlimited graph signals such as \(PW_{\omega }(G)=\mathrm {span}(\mathbf {U}_{\rho }:\lambda _{\rho } \le \omega )\), where \(\mathbf {U}_{\rho }\) is the first \(\rho \) eigenvectors of \(\mathbf {L}\), and \(PW_{\omega }(G)\) is called the Paley-Wiener space of the graph.
3.2 Nodes Representation in GSP-MOS
GSP-MOS requires a representation for the nodes \(\mathcal {V}\) in the graph. A node of G is represented with a feature characterization of a group of pixels, where these regions are obtained with some segmentation algorithm. State-Of-The-Art (SOTA) GSP-MOS algorithms have mainly used superpixel [1], block-based [21], semantic [12], and instance segmentation [24] for the representation of nodes. Figure 2 shows an example of nodes representation with instance segmentation, where each instance is mapped to a node in the graph using certain feature representation. Figure 2 also shows the representation with the graph signals, where the green nodes are moving objects, the blue nodes are static objects, and the black nodes are unknown. Intuitively, the task of the semi-supervised learning algorithm is to color the graph signals (blue and green) as best as possible from the sampled graph (last graph to the right in Fig. 2).
Superpixel Segmentation. Giraldo et al. [21] used the Simple Linear Iterative Clustering (SLIC) method for superpixel segmentation [1]. SLIC is a well known classical method for superpixel segmentation. In the same way, more recent superpixel segmentation methods such as [56] can also be used. SLIC method outputs approximately \(\zeta \) superpixels per image, where \(\zeta \) is an important parameter for GSP-MOS algorithms. For example, bigger values of \(\zeta \) allow the processing of more detailed regions, but at the same time these values increase the computational complexity of the GSP-MOS algorithm. Indeed, Giraldo et al. [21] used a powerful Nvidia DGX-2 server machine to execute the experiments related to superpixels.
Block-Based Segmentation. Giraldo et al. [21] also used block-based segmentation for the representation of nodes in G. This method divides each image in equally sized blocks for further processing. Block-based has the advantage of being faster than SLIC, but it also introduces degradation in performance as shown in [21].
Semantic Segmentation. Some GSP-MOS algorithms in the literature have used Fully CNN based segmentation methods. For instance, Giraldo et al. [21] used the DeepLab method [12] for the representation of nodes. Semantic-based GSP-MOS algorithms work well in several challenging conditions, however it fails when there is interception of static and moving objects of the same semantic class. Figure 3 shows the output of a semantic segmentation method in a scene where there are moving cars just in front of parked cars in the background. In such cases, semantic methods fail for GSP-MOS since the method cannot differentiate between the static and moving object.
Instance Segmentation. Several GSP-MOS algorithms have focused on instance segmentation methods such as Mask R-CNN [24] and Cascade Mask R-CNN [10]. For example, GraphBGS-TV [20] used a Mask R-CNN with a ResNeXt-101 [55] as backbone, while GraphBGS [18] used a Cascade Mask R-CNN with a ResNeSt-200 [58] as backbone. Unlike semantic segmentation, instance segmentation for GSP-MOS algorithms has the advantage of differentiating between intercepted static and moving objects of the same semantic class as shown in Fig. 3. Instance segmentation also dramatically reduces the computational complexity with respect to superpixel and block-based segmentation [21].
3.3 Feature Extraction
The feature representation plays a central role for GSP-MOS. SOTA GSP-MOS algorithms have used combinations of handcrafted and deep features. For example, GraphBGS [18] and GraphBGS-TV [18] uses a combination of optical flow [33], intensity, background, and texture [36] features. In the same way, GraphMOS [21] uses the same combination of handcrafted features of GraphBGS and GraphBGS-TV, plus deep features inspired from the visual object tracking community [14, 27]. The deep features of GraphMOS are extracted from a pre-trained VGG-m model [11]. Each node in \(\mathcal {V}\) is represented by a M-dimensional vector \(\mathbf {x}_v \in \mathbb {R}^{M}\), where \(\mathbf {x}_v\) is the concatenation of all previously extracted features.
3.4 Graph Construction
GSP-MOS algorithms have little explored the construction of the graph, SOTA methods have used a k Nearest Neighborhood (k-NN) with a Gaussian kernel. Let \(\mathbf {X} \in \mathbb {R}^{N\times M}\) be the matrix of features of the N nodes such that \(\mathbf {X}=[\mathbf {x}_1,\mathbf {x}_2,\dots ,\mathbf {x}_N]^{\mathsf {T}}\). The construction of G is accomplished connecting the first k neighbors of each node, and thereafter the following kernel is applied to get the weight of each edge:
where \(\sigma \) is the standard deviation of the Gaussian kernel. \(\sigma \) is computed as follows:
Finally, we got the adjacency matrix \(\mathbf {W}(i,j)=w_{ij}~\forall ~(i,j) \in \mathcal {E}\) from this procedure.
3.5 Graph Signal
The graph signal is a matrix \(\mathbf {Y} \in \{0,1\}^{N\times Q}\), where Q is the number of classes of the problem. For example, Q is equal to two corresponding to static and moving objects in MOS. In the same way, Q is equal to the number of different objects for each video in VOS. Each row of \(\mathbf {Y}\) is a one-hot vector encoding the class of each node (static or moving object for MOS). Furthermore, the construction of \(\mathbf {Y}\) depends on the segmentation algorithm. One can construct the graph signal using the Intersection over Union (IoU) and the Intersection over Node (IoN) metrics proposed in [21]. The decision of whether a node is a static or moving object has been done using empirical formulations based on IoU, IoN, and the specific segmentation method. For further details, the readers are referred to [21].
3.6 Sample Complexity for Semi-supervised Learning
An important question in GSP-MOS is: how many labeled nodes are required to get perfect classification in the semi-supervised learning algorithm? The answer can be found in the Paley-Wiener space of G. Let \(\mathbf {y}\) be one of the columns of \(\mathbf {Y}\), if \(\mathbf {y} \in PW_{\omega }(G)\) then we can perfectly reconstruct the original graph signal \(\mathbf {y}\) with at least \(\rho \) samples. Formally, let \(\mathcal {S} \subset \mathcal {V}\) be a subset of nodes with \(\mathcal {S}=\{s_1,s_2,\dots ,s_m\}\), where \(m = \vert \mathcal {S}\vert \le N\) is the number of sampled nodes. Let \(\mathbf {y}(\mathcal {S})=\mathbf {My}\) be the sampled graph signal such that \(\mathbf {M}\) is a sampling matrix whose entries are given by \(\mathbf {M} = [\boldsymbol{\delta }_{s_1},\dots ,\boldsymbol{\delta }_{s_m}]^{\mathsf {T}}\) and \(\boldsymbol{\delta }_{v}\) is the one-hot vector centered at v. Let \(\mathbf {\tilde{y}} = \mathbf {\Phi My}\) be the reconstructed graph signal where \(\mathbf {\Phi } \in \mathbb {R}^{N \times m}\) is an interpolation operator. One can get perfect reconstruction if \(\mathbf {\Phi M}=\mathbf {I}\), i.e., \(\mathbf {\tilde{y}}=\mathbf {I}\mathbf {y}=\mathbf {y}\). In general, perfect reconstruction is not possible because \(\mathrm {rank}\,(\mathbf {\Phi M}) \le m \le N\). However, Chen et al. [13] proved that perfect reconstruction is possible when \(|\mathcal {S}| \ge \rho \). Giraldo et al. [21] used this idea to compute the sample complexity in GSP-MOS with the following corollary:
Corollary 1
(Giraldo’s corollary [21]). Let \(\mathbf {Y} \in \mathbb {R}^{N \times Q}\) be a graph signal matrix of a semi-supervised learning problem with Q classes. Let \(N_s\) be the sample complexity of the learning problem. \(\mathbf {Y}\) has a set of cutoff frequencies \(\{ \omega _1,\dots ,\omega _Q\}\), with corresponding bandwidths \(\{\rho _1,\dots ,\rho _Q\}\) for each column of the graph signal. As a consequence, \(N_s\) is bounded as follows:
Proof: see [21].
Corollary 1 provides an upper bound in the sample complexity for a semi-supervised learning problem. The upper bound is the maximum of the bandwidths \(\{\rho _1,\dots ,\rho _Q\}\) of each column of \(\mathbf {Y}\), i.e., we are assuming that each column of \(\mathbf {Y}\) is in the Paley-Wiener space of G.
3.7 Semi-supervised Learning Algorithms
GSP-MOS algorithms have mainly used semi-supervised learning algorithms inspired from reconstruction of graph signals. The objective functions of these semi-supervised learning problems usually involve a graph signal norm. For instance, GraphBGS [18] and GraphMOS [21] used the Sobolev norm, while GraphBGS-TV [20] used the TV norm.
Sobolev Norm. The Sobolev norm of graph signals was defined by Pesenson [45] as follows:
Definition 1
For parameters \(\epsilon \ge 0\) and \(\alpha \in \mathbb {R}\), the Sobolev norm of graph signals is introduced as follows:
where \(\mathbf {I}\) is the identity matrix.
The semi-supervised learning algorithm based on the Sobolev norm in Definition 1 can be expressed as follows:
where we need to find \(\mathbf {z}_q~\forall ~1 \le q \le Q\). Giraldo et al. [21] highlighted two valuable properties of Eqn. (5): 1) the optimization problem is convex, and 2) the Sobolev term \(\mathbf {L}+\epsilon \mathbf {I}\) is always invertible for \(\epsilon > 0\) in undirected graphs. For small values of N, the optimization function for all \(1\le q \le Q\) in Eqn. (5) can be solved as follows:
where \(\mathbf {Y}(\mathcal {S})\) is the sub-matrix of \(\mathbf {Y}\) with rows indexed by \(\mathcal {S}\). For larger values of N, one can solve Eqn. (5) using the interior-point method from quadratic programming or the GSP toolbox [43].
Total Variation. The optimization problem for TV minimization is given as follows:
where the TV norm is such that:
where \(\mathcal {N}_i\) is the set of first neighbors of the ith node. The minimization of the TV is related to the cluster assumption [29] and leads to piecewise constant signals as the solution of the optimization problem. GraphBGS-TV [20] solved Eqn. (7) using the GSP [43] and the Unlocbox [42].
4 Experimental Framework
The evaluation of GSP-MOS algorithms in [18, 20, 21] have been adjusted to the classical evaluation of MOS method (such as in background subtraction and VOS) for comparing with previous SOTA methods. These comparisons have been conducted with the F-measure metric defined as follows:
where TP, FP, and FN are the number of true positives, false positives, and false negatives pixels, respectively. However, another interesting approach for evaluation is to compute the F-measure along a set of different sampling densities (percentage of training images). As a result, we can see what is the performance of each method for a small labeled data regimen. In this work, we adopt both evaluation approaches: the classical procedure and the evaluation of GSP-MOS algorithms for several sampling densities.
4.1 Dataset
GSP-MOS algorithms have been evaluated on several datasets for background subtraction and VOS. Arguably, The most popular dataset is the Change Detection (CDNet2014) [54]. CDNet2014 contains 11 challenging categories including: bad weather, low frame rate, night videos, turbulence, baseline, dynamic backgrounds, Pan–Tilt–Zoom cameras (PTZ), camera jitter, intermittent object motion, shadow, and thermal. Each challenge of CDNet2014 contains from four up to six videos, and each video has a certain amount of labeled ground truth images.
4.2 Resources for GSP-MOS
There are some important resources to study GSP-MOS. For GSP, the survey papers [37, 48] are fundamental sources of information. Moreover, the paper [21] contains detailed theoretical explanations and proofs for GSP-MOS along with a GitHub repositoryFootnote 1 with publicly available codes. The readers are also encouraged to visit the GSP WebsiteFootnote 2.
5 Results and Discussion
Table 1 shows some visual results of the GSP-MOS algorithms GraphBGS-TV [19] and GraphBGS [18] on CDNet2014 against the deep learning method BSUV-Net [51]. Furthermore, Table 2 shows the numerical comparison of three GSP-MOS algorithms (GraphBGS [18], GraphBGS-TV [20], and GraphMOS [21]), two deep learning methods (FgSegNet v2 [32] and BSUV-Net [51]), and three unsupervised methods (SuBSENSE [49], IUTIS-5 [3], and SemanticBGS [9]). All methods in Table 2 have been evaluated with an agnostic video scheme [19, 34, 51] (unseen videos). One can see that GSP-MOS algorithms are good in almost all challenges except in intermittent object motion (I-O Motion) and in low frame rate (Low-F Rate). All GSP-MOS algorithms have used the median filter as the background initialization method, however more sophisticated methods such as [26] can be used to improve the performance of GSP-MOS in I-O Motion. Furthermore, the computation of the optical flow for the feature extraction can be affecting the performance of GSP-MOS algorithms due to the nature of Low-F Rate videos.
For the new evaluation strategy using several sampling densities, we adopt the following convention to name the architecture of a GSP-MOS algorithm: (semi-supervised learning method) - (segmentation method) - (k parameter for the k-NN method). Similarly, the following abbreviations are given: 1) SN means Sobolev norm with \(\epsilon =0.2\) and \(\alpha =1\), 2) TV means Total Variation, 3) M50 means instance segmentation with a Mask R-CNN with a ResNet-50, 4) X101 means instance segmentation with a Mask R-CNN with a ResNeXt-101, 5) C200 means instance segmentation with a Cascade Mask R-CNN and a ResNeSt-200, and 6) DL means semantic segmentation with DeepLab method. For example, SN-M50-30 is a GSP-MOS architecture that uses Sobolev norm minimization, a Mask R-CNN with ResNet-50, and k-NN with \(k=30\).
Figure 4 shows the results of several configurations of GSP-MOS algorithms. For example, TV-X101-20 corresponds to GraphBGS-TV [19], while SN-C200-30 corresponds to GraphBGS [18]. GraphMOS [21] was not evaluated using this procedure due to its high computational complexity. Similarly, we evaluate two new architectures namely SN-DL-30 and SN-X101-30. The design of an architecture for GSP-MOS should be focused on the specific challenges that we can have in our dataset. For instance, TV works better than Sobolev norm minimization in PTZ, camera jitter, and bad weather. Similarly, C200 works better than the others segmentation methods in almost all cases, however one should consider the high computational complexity of C200. Figure 4 also helps to evaluate certain algorithms given an amount of labeled information. For example, if one has a dataset with intermittent objects motions, and the percentage of labeled images in the dataset is less than \(2\%\), TV-X101-20 works better than the other architectures in that case.
6 Conclusions
In this study, we summarized the ideas and theoretical developments of SOTA GSP-MOS algorithms. Similarly, we proposed two new architectures of GSP-MOS and a new evaluation procedure for taking into account the sampling density of labeled information (the amount of training images). GSP-MOS architectures are usually composed of: 1) a segmentation method, 2) a background model, 3) a feature extraction for nodes representation, 4) a graph construction, 5) graph signal representation, 6) a sampling method, and 7) a semi-supervised learning method. GSP-MOS algorithms have shown good results for static as well as moving camera sequences. Furthermore, GSP-MOS algorithms have important theoretical guarantees rooted from the rich theory of GSP.
GSP-MOS algorithms have several directions for future research. For example, one can try to use light CNN architectures to learn better movement features than using handcrafted and pre-trained CNN features. Similarly, one can try to improve the performance of GSP-MOS using other tools such as more sophisticated background initialization methods, or feature selection/dimensionality reduction after the feature extraction procedure.
References
Achanta, R., et al.: SLIC superpixels compared to state-of-the-art superpixel methods. IEEE Trans. Pattern Anal. Mach. Intell. 34(11), 2274–2282 (2012)
Anis, A., Gadde, A., Ortega, A.: Efficient sampling set selection for bandlimited graph signals using graph spectral proxies. IEEE Trans. Signal Process. 64(14), 3775–3789 (2016)
Bianco, S., Ciocca, G., Schettini, R.: Combination of video change detection algorithms by genetic programming. IEEE Trans. Evol. Comput. 21(6), 914–928 (2017)
Bouwmans, T.: Traditional and recent approaches in background modeling for foreground detection: an overview. Comput. Sci. Rev. 11, 31–66 (2014)
Bouwmans, T., El Baf, F., Vachon, B.: Background modeling using mixture of Gaussians for foreground detection-a survey. Recent Patents Comput. Sci. 1(3), 219–237 (2008)
Bouwmans, T., Zahzah, E.H.: Robust PCA via principal component pursuit: a review for a comparative evaluation in video surveillance. Comput. Vis. Image Underst. 122, 22–34 (2014)
Bouwmans, T., et al.: Decomposition into low-rank plus additive matrices for background/foreground separation: a review for a comparative evaluation with a large-scale dataset. Comput. Sci. Rev. 23, 1–71 (2017)
Bouwmans, T., et al.: Deep neural network concepts for background subtraction: a systematic review and comparative evaluation. Neural Netw. 117, 8–66 (2019)
Braham, M., Piérard, S., Van Droogenbroeck, M.: Semantic background subtraction. In: IEEE ICIP (2017)
Cai, Z., Vasconcelos, N.: Cascade R-CNN: high quality object detection and instance segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 43, 1483–1498 (2019)
Chatfield, K., et al.: Return of the devil in the details: delving deep into convolutional nets. In: BMVC (2014)
Chen, L.C., et al.: DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 40(4), 834–848 (2017)
Chen, S., et al.: Discrete signal processing on graphs: sampling theory. IEEE Trans. Signal Process. 63(24), 6510–6523 (2015)
Danelljan, M., et al.: ECO: efficient convolution operators for tracking. In: IEEE CVPR (2017)
Du, S.S., et al.: How many samples are needed to estimate a convolutional neural network? In: NeurIPS (2018)
Egilmez, H.E., Ortega, A.: Spectral anomaly detection using graph-based filtering for wireless sensor networks. In: IEEE ICASSP (2014)
Garcia-Garcia, B., Bouwmans, T., Silva, A.J.: Background subtraction in real applications: challenges, current models and future directions. Comput. Sci. Rev. 35, 100204 (2020)
Giraldo, J.H., Bouwmans, T.: GraphBGS: background subtraction via recovery of graph signals. In: ICPR (2021)
Giraldo, J.H., Bouwmans, T.: On the minimization of Sobolev norms of time-varying graph signals: estimation of new Coronavirus disease 2019 cases. In: IEEE MLSP (2020)
Giraldo, J.H., Bouwmans, T.: Semi-supervised background subtraction of unseen videos: minimization of the total variation of graph signals. In: IEEE ICIP (2020)
Giraldo, J.H., Javed, S., Bouwmans, T.: Graph moving object segmentation. IEEE Trans. Pattern Anal. Mach. Intell. (2020)
Giraldo, J.H., Le, H.T., Bouwmans, T.: Deep learning based background subtraction: a systematic survey. In: Handbook of Pattern Recognition and Computer Vision, p. 51 (2020)
He, K., et al.: Deep residual learning for image recognition. In: IEEE CVPR (2016)
He, K., et al.: Mask R-CNN. In: IEEE CVPR (2017)
Javed, S., et al.: Spatiotemporal low-rank modeling for complex scene background initialization. IEEE Trans. Circuit Syst. Video Technol. 28(6), 1315–1329 (2016)
Javed, S., et al.: Background-foreground modeling based on spatiotemporal sparse subspace clustering. IEEE Trans. Image Process. 26(12), 5840–5854 (2017)
Javed, S., et al.: Robust structural low-rank tracking. IEEE Trans. Image Process. 29, 4390–4405 (2020)
Javed, S., et al.: Moving object detection in complex scene using spatiotemporal structured-sparse RPCA. IEEE Trans. Image Process. 28(2), 1007–1022 (2018)
Jung, A., et al.: Semi-supervised learning in network-structured data via total variation minimization. IEEE Trans. Signal Process. 67(24), 6256–6269 (2019)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: NeurIPS (2012)
LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444 (2015)
Lim, L.A., Keles, H.Y.: Learning multi-scale features for foreground segmentation. Pattern Anal. Appl. 23(3), 1369–1380 (2020)
Lucas, B.D., Kanade, T., et al.: An iterative image registration technique with an application to stereo vision (1981)
Mandal, M., Vipparthi, S.K.: Scene independency matters: an empirical study of scene dependent and scene independent evaluation for CNN-based change detection. IEEE Trans. Intell. Transp. Syst., 1–14 (2020)
Mandal, M., et al.: 3DCD: scene independent end-to-end spatiotemporal feature learning framework for change detection in unseen videos. IEEE Trans. Image Process. 30, 546–558 (2020)
Ojala, T., Pietikäinen, M., Mäenpää, T.: Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans. Pattern Anal. Mach. Intell. 7, 971–987 (2002)
Ortega, A., et al.: Graph signal processing: overview, challenges, and applications. Proc. IEEE 106(5), 808–828 (2018)
Pang, J., et al.: Optimal graph Laplacian regularization for natural image denoising. In: IEEE ICASSP (2015)
Parada-Mayorga, A., et al.: Blue-noise sampling on graphs. IEEE Trans. Signal Inf. Process. Netw. 5(3), 554–569 (2019)
Parada-Mayorga, A., et al.: Sampling of graph signals with blue noise dithering. In: IEEE DSW (2019)
Perazzi, F., et al.: A benchmark dataset and evaluation methodology for video object segmentation. In: IEEE CVPR (2016)
Perraudin, N., et al.: UNLocBoX a Matlab convex optimization toolbox using proximal splitting methods. arXiv preprint arXiv:1402.0779
Perraudin, N., et al.: GSPBOX: a toolbox for signal processing on graphs. arXiv preprint arXiv:1408.5781 (2014)
Pesenson, I.: Sampling in Paley-Wiener spaces on combinatorial graphs. Trans. Amer. Math. Soc. 360(10), 5603–5627 (2008)
Pesenson, I.: Variational splines and Paley-Wiener spaces on combinatorial graphs. Constructive Approximation 29(1), 1–21 (2009)
Romero, D., Ma, M., Giannakis, G.B.: Kernel-based reconstruction of graph signals. IEEE Trans. Signal Process. 65(3), 764–778 (2016)
Shalev-Shwartz, S., Ben-David, S.: Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, Cambridge (2014)
Shuman, D.I., et al.: The emerging field of signal processing on graphs: extending high-dimensional data analysis to networks and other irregular domains. IEEE Signal Process. Mag. 30(3), 83–98 (2013)
St-Charles, P.L., Bilodeau, G.A., Bergevin, R.: SuBSENSE: a universal change detection method with local adaptive sensitivity. IEEE Trans. Image Process. 24(1), 359–373 (2014)
Sultana, M., et al.: Unsupervised deep context prediction for background estimation and foreground segmentation. Mach. Vis. Appl. 30(3), 375–395 (2019)
Tezcan, O., Ishwar, P., Konrad, J.: BSUV-Net: a fully-convolutional neural network for background subtraction of unseen videos. In: IEEE WACV (2020)
Thanou, D., Chou, P.A., Frossard, P.: Graph-based compression of dynamic 3D point cloud sequences. IEEE Trans. Image Process. 25(4), 1765–1778 (2016)
Vapnik, V.: The Nature of Statistical Learning Theory. Springer, New York (2013). https://doi.org/10.1007/978-1-4757-2440-0
Wang, Y., et al.: CDnet 2014: an expanded change detection benchmark dataset. In: IEEE CVPR-W (2014)
Xie, S., et al.: Aggregated residual transformations for deep neural networks. In: IEEE CVPR (2017)
Yang, F., et al.: Superpixel segmentation with fully convolutional networks. In: IEEE CVPR (2020)
Zhang, C., Florencio, D., Loop, C.: Point cloud attribute compression with graph transform. In: IEEE ICIP (2014)
Zhang, H., et al.: ResNeSt: split-attention networks. arXiv preprint arXiv:2004.08955 (2020)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Giraldo, J.H., Javed, S., Sultana, M., Jung, S.K., Bouwmans, T. (2021). The Emerging Field of Graph Signal Processing for Moving Object Segmentation. In: Jeong, H., Sumi, K. (eds) Frontiers of Computer Vision. IW-FCV 2021. Communications in Computer and Information Science, vol 1405. Springer, Cham. https://doi.org/10.1007/978-3-030-81638-4_3
Download citation
DOI: https://doi.org/10.1007/978-3-030-81638-4_3
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-81637-7
Online ISBN: 978-3-030-81638-4
eBook Packages: Computer ScienceComputer Science (R0)