
1 Introduction

Computer vision has grown into a wide and active area over the past few years, and the analysis of images involving humans is one of its central problems. Human detection techniques are used in many areas, such as monitoring abnormal human behavior, robotics, automobile safety systems, and gait recognition. The main goal of a human detector is to decide whether humans are present in an image; if a human is identified, the image can be used for further analysis. Human detection remains an open and challenging problem in computer vision, owing to varied articulations and poses, different appearances of clothing, and accessories acting as occlusions. In this paper, humans are detected in static images. Identifying humans in a static image is more difficult than in a video sequence because no motion or background information is available to provide clues about the approximate human position. In our approach, the input to the human detector is an image and the output is a decision value indicating whether the image contains a human.

2 Related Works

Many human detection techniques have been proposed, following different approaches. The implicit shape model (ISM) [1], a part-based object detection algorithm proposed by Leibe et al., uses local features derived from a visual vocabulary or codebook as object parts; codebooks are generated using the SIFT [2] or shape context local feature descriptors. Lu et al. proposed a depth-based algorithm [3] that detects humans using the depth information of a given image. Jiaolong et al. proposed a part-based classifier technique [4] to detect humans in a given image window, in which a mixture-of-parts technique is used for part sharing among different aspects. Andriluka et al. proposed a generic approach for nonrigid object detection and articulated pose estimation based on the pictorial structures framework [5]. Gavrila et al. introduced a template matching approach for pedestrian detection [6], in which a template hierarchy of pedestrian silhouettes is built to capture the variety of pedestrian shapes; shapes are identified with the Canny edge detector [7].

Gradient orientation based feature descriptors, such as SIFT [2], HOG [8], and CoHOG [9], represent the recent trend in human detection. SIFT (scale-invariant feature transform) features, proposed by Lowe et al. [2], are used for human body part detection in [10]. Histogram-based features are popular in human recognition and object detection because of their robustness. Histograms of oriented gradients (HOG) [8] is a well-known and effective method for human detection; it uses histograms of oriented gradients as the feature descriptor, and HOG features are robust to illumination variation and object deformation. Co-occurrence histograms of oriented gradients (CoHOG) [9] extends HOG, achieving a higher detection rate and a lower miss rate. CoHOG has recently been used in many computer vision applications, such as object recognition [11], image classification [12], and character recognition [13]. In CoHOG, co-occurrence matrices are calculated over oriented gradients to strengthen the feature descriptor. However, CoHOG considers only the gradient direction and ignores the gradient magnitude. In the proposed method, the gradient magnitude component is also considered to improve the accuracy of the existing CoHOG.

The rest of the paper is organized as follows: Sect. 3 gives a brief overview of HOG and CoHOG. The proposed method, W-CoHOG, is discussed in detail in Sect. 4. Section 5 contains experimental results and a comparison with existing methods. Finally, the work is concluded in Sect. 6.

3 Background: HOG and CoHOG

3.1 HOG

In HOG, gradients are first computed at each pixel of the given image and quantized into nine orientations. Next, the image is divided into small nonoverlapping regions, typically of size 8 × 8 or 16 × 16 pixels. Histograms of oriented gradients are then calculated for each region. Finally, the histograms of all regions are concatenated into a single feature vector.
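To make this pipeline concrete, the following is a minimal NumPy sketch of the HOG computation just described (nine orientation bins, 8 × 8 cells). The function and variable names are ours, and the block normalization step of the full HOG descriptor is omitted to match the simplified description above.

```python
import numpy as np

def hog_features(img, n_bins=9, cell=8):
    """Minimal HOG sketch: per-pixel gradients, 9 orientation bins,
    per-cell histograms, concatenated into one vector."""
    img = img.astype(np.float64)
    # Central-difference gradients in x and y.
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]
    gy[1:-1, :] = img[2:, :] - img[:-2, :]
    mag = np.hypot(gx, gy)
    # Unsigned orientation in [0, 180), quantized into n_bins bins.
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    bins = np.minimum((ang / (180.0 / n_bins)).astype(int), n_bins - 1)

    h, w = img.shape
    feats = []
    for i in range(0, h - cell + 1, cell):      # nonoverlapping cells
        for j in range(0, w - cell + 1, cell):
            b = bins[i:i+cell, j:j+cell].ravel()
            m = mag[i:i+cell, j:j+cell].ravel()
            # Magnitude-weighted orientation histogram for this cell.
            feats.append(np.bincount(b, weights=m, minlength=n_bins))
    return np.concatenate(feats)
```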

3.2 CoHOG

Co-occurrence histograms of oriented gradients (CoHOG) is an extension of HOG and is more robust than HOG. In CoHOG, pairs of gradient orientations are used instead of single orientations: a co-occurrence matrix is calculated over the pairs of gradient orientations at each of several offsets.

$$C_{\Delta x,\Delta y} \left( {p,q} \right) = \mathop \sum \limits_{i = 1}^{n} \mathop \sum \limits_{j = 1}^{m} \left\{ {\begin{array}{*{20}l} 1 & {{\text{if}}\,O\left( {i,j} \right) = p\,{\text{and}}\,O\left( {i + \Delta x,\,j + \Delta y} \right) = q} \\ 0 & {\text{otherwise}} \\ \end{array} } \right.$$
(1)

Equation (1) shows the calculation of the co-occurrence matrix, where O denotes the gradient orientation at each pixel. Figure 1a shows a typical co-occurrence matrix of oriented gradients and Fig. 1b shows the possible offsets for CoHOG.

Fig. 1 a Typical co-occurrence matrix histogram. b Possible offsets
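A direct transcription of Eq. (1) into Python may help clarify the counting. In this sketch (names are ours), `orient` is assumed to be a 2-D array of orientation labels in {0, …, 7}, and pairs whose offset pixel falls outside the image are simply skipped.

```python
import numpy as np

def cooccurrence(orient, dx, dy, n_levels=8):
    """Co-occurrence matrix of Eq. (1): counts how often orientation p
    at (i, j) co-occurs with orientation q at (i + dx, j + dy)."""
    n, m = orient.shape
    C = np.zeros((n_levels, n_levels), dtype=np.int64)
    for i in range(n):
        for j in range(m):
            i2, j2 = i + dx, j + dy
            if 0 <= i2 < n and 0 <= j2 < m:   # skip out-of-image pairs
                C[orient[i, j], orient[i2, j2]] += 1
    return C
```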

In CoHOG, only the orientation values of the gradients are considered and the magnitude is ignored. In the proposed method the magnitude is also considered, as it carries discriminative information for human detection. Consider the following example: Fig. 2a is quite different from Fig. 2b because of different magnitude values, even though the gradient orientations are the same. Hence magnitude also describes image content, yet existing feature descriptors do not consider magnitude details.

Fig. 2 Magnitudes of two gradients having the same orientation. a and b have the same orientation but are not the same

4 Proposed Method (W-CoHOG)

4.1 Overview

In the CoHOG method, only the gradient directions are considered and the magnitude is ignored. In the proposed method the magnitude is also considered in order to extract a more robust feature: magnitude-weighted co-occurrence histograms of oriented gradients (W-CoHOG) is proposed as a better feature descriptor. Figure 3 outlines the classification process for human detection using the W-CoHOG extraction method.

Fig. 3 Our proposed classification process

Initially, the gradients of the image are computed in magnitude and direction form and converted into oriented gradients. Next, the image is divided into nonoverlapping cells of size 3 × 6 or 6 × 12. Then, a weighted co-occurrence matrix is computed for each cell. Finally, the co-occurrence matrices of all cells are combined.

4.2 Feature Extraction

For a given input image, gradients are computed at each pixel. In this method, the Sobel and Roberts filters are used to compute the gradients; Eqs. (2) and (3) show the gradient calculation using the Sobel and Roberts filters, respectively, for a given input image I.

Sobel gradient operator

$$({\text{a}})\,G_{x} = \left[ {\begin{array}{*{20}c} { - 1} & 0 & { + 1} \\ { - 2} & 0 & { + 2} \\ { - 1} & 0 & { + 1} \\ \end{array} } \right] \,^{*} \,I\quad \quad ( {\text{b)}}\,G_{y} = \left[ {\begin{array}{*{20}c} { + 1} & { + 2} & { + 1} \\ { 0} & { 0} & { 0} \\ { - 1} & { - 2} & { - 1} \\ \end{array} } \right]\,^{*} \,I$$
(2)

Roberts gradient operator

$$({\text{a}})\,G_{x} = \left[ {\begin{array}{*{20}c} { + 1} & { 0} \\ { 0} & { - 1} \\ \end{array} } \right] \,^{*} \,I\quad \quad ({\text{b}})\,G_{y} = \left[ {\begin{array}{*{20}c} { 0} & { + 1} \\ { - 1} & { 0} \\ \end{array} } \right]\,^{*} \,I$$
(3)

Then, the gradients are converted into magnitude and direction using Eq. (4). The gradient directions are quantized into eight equal bins at 45° intervals.

$$({\text{a}})\,\uptheta = \tan^{ - 1} \frac{{g_{y} }}{{g_{x} }}\quad \quad ({\text{b}})\,m = \sqrt {g_{x}^{2} + g_{y}^{2} }$$
(4)
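As an illustrative sketch (names are ours), the gradient computation of Eqs. (2) and (4) and the 45° quantization can be written as follows with SciPy convolution:

```python
import numpy as np
from scipy.ndimage import convolve

# Sobel kernels of Eq. (2).
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
SOBEL_Y = np.array([[ 1, 2, 1], [ 0, 0, 0], [-1, -2, -1]], dtype=np.float64)

def gradients(img):
    """Per-pixel gradient magnitude and 8-bin orientation (45-degree bins)."""
    img = img.astype(np.float64)
    gx = convolve(img, SOBEL_X)
    gy = convolve(img, SOBEL_Y)
    mag = np.sqrt(gx**2 + gy**2)                     # Eq. (4b)
    theta = np.rad2deg(np.arctan2(gy, gx)) % 360.0   # Eq. (4a), in [0, 360)
    orient = (theta // 45.0).astype(int) % 8         # eight 45-degree bins
    return mag, orient
```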

After that, the magnitude matrix is convolved with a mean mask to eliminate noise that may cause an aliasing effect. Equation (5) shows the 7 × 7 mean mask used in the proposed method. Figure 4 shows an overview of the W-CoHOG feature calculation process.

Fig. 4 Overview of W-CoHOG calculation

$${\text{Conv}}_{7 \times 7} = \frac{1}{49}\left[ {\begin{array}{*{20}c} 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ \end{array} } \right]$$
(5)
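Since the mask of Eq. (5) is a uniform averaging filter, this smoothing step can be applied with a standard mean filter. The snippet below is a sketch, with random data standing in for the gradient magnitude array from the previous step.

```python
import numpy as np
from scipy.ndimage import uniform_filter

# Placeholder for the gradient magnitude array of a 48 x 96 window.
mag = np.random.default_rng(0).random((96, 48))
# Apply the 7x7 mean mask of Eq. (5); uniform_filter is equivalent
# up to boundary handling.
mag_smooth = uniform_filter(mag, size=7)
```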

Weight Function

In the proposed method, the magnitude component of the gradient is used as a weight function to calculate the weighted co-occurrence matrix. To do so, magnitude weights are calculated for each pixel pair and applied to the co-occurrence matrix, so that each entry is influenced by the gradient magnitudes of the contributing pixels. The weight functions used in this method are described below.

Let I be a given input image, let i, j be any two of the eight orientations, and let Δx, Δy be the co-occurrence offset. \(C_{{\Delta x,\Delta y}} \left( {i,j} \right)\) is the weighted co-occurrence matrix for the offset Δx, Δy and orientations i, j. Equations (6) and (7) describe the calculation of the weighted co-occurrence matrix.

$$C_{\Delta x,\Delta y} \left( {i,j} \right) = \mathop \sum \limits_{p = 1}^{n} \mathop \sum \limits_{q = 1}^{m} W_{\left( {p,q} \right),\left( {p + \Delta x,\,q + \Delta y} \right)} \,*\,\alpha$$
(6)

where

$$\alpha = \left\{ {\begin{array}{*{20}l} 1 & {{\text{if}}\,O\left( {p,q} \right) = i \,{\text{and}}\,O\left( {p +\Delta x,q +\Delta y} \right) = j} \\ 0 & {\text{Otherwise }} \\ \end{array} } \right.$$
(7)

Let \(m_{p,q}\) be the gradient magnitude at pixel (p, q) of the input image I, and let \(\bar{M}\) and \(M_{\max}\) be the mean and maximum gradient magnitudes in I. The weight calculation involves only simple operations such as means and divisions. Equations (8) and (9) show two possible weight functions for the pixel pair (p, q) and (p + Δx, q + Δy); either can be used to calculate the weights for the weighted co-occurrence matrix. In this paper, Eq. (8) is used for the experimental results.

$$W_{\left( {p,q} \right),\left( {p + \Delta x,\,q + \Delta y} \right)} = \left(\frac{{m_{p,q} }}{{\bar{M}}}*\frac{{m_{p + \Delta x,\,q + \Delta y} }}{{\bar{M}}}\right) + \mu$$
(8)
$$W_{\left( {p,q} \right),\left( {p + \Delta x,\,q + \Delta y} \right)} = \left(\frac{{m_{p,q} }}{{M_{\max } }}*\frac{{m_{p + \Delta x,\,q + \Delta y} }}{{M_{\max } }}\right) + \mu$$
(9)

where μ is a constant, set to μ = 1.

After the magnitude-weighted co-occurrence matrices are computed for all regions, they are vectorized by simple concatenation of all matrix rows into a single row. As shown in Fig. 1b, 31 offsets are possible when calculating the co-occurrence matrices, but they need not all be used: in the calculation of W-CoHOG, two offsets are sufficient for the pedestrian detection problem, as sketched below.
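Putting Eqs. (6)–(8) together, the weighted co-occurrence computation can be sketched as follows. The names are ours, and the cell size of 6 × 12 and the particular pair of offsets are illustrative assumptions; the text above only states that two offsets suffice.

```python
import numpy as np

def weighted_cooccurrence(mag, orient, dx, dy, mean_mag, n_levels=8, mu=1.0):
    """Weighted co-occurrence matrix of Eq. (6): each co-occurring
    orientation pair contributes the magnitude weight of Eq. (8)
    instead of a plain count."""
    n, m = orient.shape
    C = np.zeros((n_levels, n_levels), dtype=np.float64)
    for p in range(n):
        for q in range(m):
            p2, q2 = p + dx, q + dy
            if 0 <= p2 < n and 0 <= q2 < m:        # alpha = 1 case of Eq. (7)
                w = (mag[p, q] / mean_mag) * (mag[p2, q2] / mean_mag) + mu
                C[orient[p, q], orient[p2, q2]] += w
    return C

def wcohog_vector(mag, orient, offsets=((0, 1), (1, 0)), cell=(6, 12)):
    """Concatenate weighted co-occurrence matrices over nonoverlapping cells."""
    mean_mag = mag.mean()                           # M-bar of Eq. (8), over all of I
    ch, cw = cell
    feats = []
    for i in range(0, mag.shape[0] - ch + 1, ch):
        for j in range(0, mag.shape[1] - cw + 1, cw):
            for dx, dy in offsets:
                C = weighted_cooccurrence(mag[i:i+ch, j:j+cw],
                                          orient[i:i+ch, j:j+cw],
                                          dx, dy, mean_mag)
                feats.append(C.ravel())
    return np.concatenate(feats)
```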

The size of the feature vector is very large in histogram-based feature descriptors, so a linear SVM classifier is suitable for these features. In the proposed method the LIBLINEAR classifier [14] is used; LIBLINEAR is an SVM-based classifier that trains much faster than the standard SVM classifier [15], even on millions of data instances. HOG and CoHOG also used SVM classifiers for classification.
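LIBLINEAR is exposed, for example, through scikit-learn's `LinearSVC`, so training a classifier on the extracted vectors might look like the sketch below; the random placeholder data stands in for real W-CoHOG feature vectors.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Placeholder data standing in for W-CoHOG vectors (label 1 = human, 0 = not).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1024))
y = rng.integers(0, 2, size=200)

clf = LinearSVC(C=1.0)             # linear SVM backed by LIBLINEAR
clf.fit(X, y)
scores = clf.decision_function(X)  # signed scores; threshold for detection
```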

5 Experimental Results

Experiments are conducted on two datasets, the Daimler Chrysler dataset [11] and the INRIA dataset [8], which are familiar benchmark datasets for human detection. From the Daimler Chrysler dataset, 4,800 human images and 5,000 nonhuman images are taken for training, and another 4,800 human images and 5,000 images are taken for testing; each image is 48 × 96 pixels. From the INRIA dataset, 1,208 positive images and 12,180 patches randomly sampled from person-free images are used for training and testing.

Figures 5 and 6 show sample positive and negative examples from the INRIA and Daimler Chrysler datasets, respectively. Negative images are generated by taking 64 × 128 patches from person-free images in the INRIA dataset. Simple Sobel and Roberts filters were used to calculate the gradients of the input images.

Fig. 5 Sample images from the INRIA dataset

Fig. 6 Sample images from the Daimler Chrysler dataset

ROC curves are used for performance evaluation of binary classification problems such as object detection. A sliding-window technique is used to detect humans in the image; a typical scanning window for the INRIA dataset has the same size as the positive images, 64 × 128. In this paper, the true positive rate versus false positives per window (FPPW) is plotted to evaluate the performance of the proposed method and to compare it with state-of-the-art methods; an ROC curve towards the top left of the graph indicates better classification performance. Figures 7 and 8 show that the curves obtained by the proposed method achieve a better detection rate than the existing methods at all false positive rates, or are at least comparable. The results show that our method reduces the miss rate by around 20 % compared with CoHOG, and the accuracy of the classifier is also better than the other state-of-the-art methods shown in the figures. Notably, only two offsets are used in the proposed method instead of all 31 possible offsets, yet good results are obtained by adding the gradient magnitude component.
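For reference, a curve of this kind can be traced by sweeping a threshold over the per-window decision values; below is a sketch with scikit-learn's `roc_curve`, using small hypothetical score and label arrays.

```python
import numpy as np
from sklearn.metrics import roc_curve

# Hypothetical decision values for scanned windows and ground-truth labels.
scores = np.array([1.2, -0.3, 0.8, -1.1, 0.4, -0.6])
labels = np.array([1, 0, 1, 0, 1, 0])

fpr, tpr, thresholds = roc_curve(labels, scores)
miss_rate = 1.0 - tpr   # miss rate at each operating point of the curve
```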

Fig. 7 Comparison of the proposed method with other state-of-the-art methods (INRIA dataset)

Fig. 8 Comparison of the proposed method with other state-of-the-art methods (Daimler Chrysler dataset)

6 Conclusion

In this paper a new method called weighted CoHOG (W-CoHOG) is proposed as an extension of CoHOG. The magnitude component is added to the feature vector to improve classification. The proposed method achieves improved accuracy on two benchmark datasets, and the experimental results show that its performance is better than that of the other state-of-the-art methods. Although calculating the weights adds computational complexity, the overall feature vector generation time is decreased by reducing the number of offsets to two. Future work involves applying the proposed feature descriptor to other applications such as person tracking.