1 Introduction

In computer vision tasks, an object extracted from a digital image has to be properly represented before further analysis or recognition. So-called descriptors are applied for this purpose. These algorithms work on particular features, e.g. luminance, colour, texture, shape, or the context of the information [1]. The selection of a feature is crucial, depends strongly on the application, and influences the obtained results. For example, shape descriptors perform better for rigid objects, e.g. machine parts, car license plates, or airplanes, and worse for the recognition of living beings, such as animals or humans. This does not mean that they cannot be used for such silhouettes, but in those cases more sophisticated algorithms or larger template databases are usually required. Colour is applicable to the analysis of art images, e.g. the classification of paintings made by an artist [2], while texture is better suited to the analysis of aerial images. Many other examples could easily be recalled. The combination of various features has recently become very popular and is effective in some applications [3, 4].

In this paper the usage of greyscale information for object representation and further recognition is analysed. Such an approach was applied, for example, in [5], where moment theory is used for shapes with greyscale attributes; the described algorithm was experimentally investigated using signatures and hand gestures. Another example is described in [6], where the gradient and curvature of the greyscale are applied to the recognition of handwritten numerals. A very similar application (character recognition) is described in [7]. The detection of three types of objects in an image based on greyscale information was described in [8]. These are only a few exemplary applications of the greyscale as a feature in various computer vision tasks. In this paper the Polar–Fourier Greyscale Descriptor is applied to the automatic recognition of traffic signs. The main goal is the experimental analysis of the algorithm's parameters when it is applied to greyscale objects that are significantly distorted by various factors. In [9] the most typical problems occurring when recognising road signs were identified: colour fading, similarity among various classes, varying standardisation across countries, weather conditions (e.g. rain, snow, fog, sunlight), objects visible in the scene that resemble signs, disorientation, occlusion, damage, car vibration and motion blur, variations in illumination, shadows, and highlights. Several distorted traffic sign images are provided in Fig. 1.

Fig. 1. Examples of distorted images containing traffic signs.

Recently, automatic road sign recognition has become popular. Many car manufacturers now offer driver-assistance systems of this kind, which is the result of many years of research on the problem. In this paper only the last stage is considered, namely the description and classification of previously located and extracted traffic signs, since the stress is put on the distortions hampering the recognition. Several approaches have been applied to the description of extracted road signs so far, e.g. the Scale-Invariant Feature Transform (SIFT) [10], Haar-like features [11], Error Correcting Output Codes (ECOC) [12], the FOveal System for Traffic Signs (FOSTS) [13], fractal reconstruction [14], Fourier descriptors [15], genetic algorithms [16], HOG features [17], the Colour Distance Transform (CDT) [18], the blob signature [19], Gabor wavelets [20], and Zernike moments [21]. In this paper the Polar–Fourier Greyscale Descriptor (P-FGD) is applied to the problem.

The rest of the paper is organised as follows. The second section describes the applied algorithm. The third section provides the experimental conditions and results, and finally, the last section concludes the paper.

2 The Polar–Fourier Greyscale Descriptor

The descriptor under consideration (Polar–Fourier Greyscale Descriptor, P-FGD) was introduced in [22]. So far it has been applied to the identification of erythrocyte types for the automatic diagnosis of some diseases [22], to biometric identification based on ear images [23], and to the recognition of objects similar in shape [24]. The algorithm is composed of several stages; the most important is the use of the polar and 2D Fourier transforms for the greyscale object. The extracted subspectrum (of size \(10 \times 10\)) describes the represented object. The P-FGD is invariant to size, rotation and location within the image plane, and it is also robust to some level of noise. In Fig. 2 some examples of various objects represented using the P-FGD are presented: the original greyscale images as well as the obtained representations, i.e. the normalised polar-transformed images.

In the research described in this paper an improved version of the descriptor is employed. The algorithm can be described as follows (a code sketch of the whole pipeline is given after the list):

  1. Median filtering of the input subimage I with a kernel of size 3.

  2. Low-pass convolution filtering using a square mask composed of nine ones and a normalisation parameter of 9 (i.e. a \(3 \times 3\) averaging filter).

  3. Derivation of the centroid denoted as O:

    $$\begin{aligned} m_{pq} = \sum _{x}^{} \sum _{y}^{} x^p y^q I(x,y), \end{aligned}$$
    (1)
    $$\begin{aligned} x_c = \frac{m_{10}}{m_{00}},\qquad \qquad y_c = \frac{m_{01}}{m_{00}}. \end{aligned}$$
    (2)
  4. Finding the maximal distances \(d_{maxX}\), \(d_{maxY}\) from the boundaries of I to the centroid O along the X- and Y-axes, respectively.

  5. Expanding the image in both directions by \(d_{maxX} - x_c\) and \(d_{maxY} - y_c\), respectively, and filling the newly created parts with grey level 127.

  6. Derivation of the polar coordinates and insertion into the image P:

    $$\begin{aligned} \rho _{i} =\sqrt{\left( x_{i} -x_c \right) ^{2} +\left( y_{i} -y_c \right) ^{2} }, \qquad \qquad \theta _{i} =\arctan \left( \frac{y_{i} -y_c }{x_{i} -x_c } \right) . \end{aligned}$$
    (3)
  7. Resizing the image P to a square size, e.g. \(128 \times 128\).

  8. Derivation of the absolute value of the two-dimensional Fourier transform [25]:

    $$\begin{aligned} C(k,l)=\frac{1}{HW} \left| \sum _{h=1}^{H}\sum _{w=1}^{W}P(h,w)\cdot e^{(-i\frac{2\pi }{H} (k-1)(h-1))} \cdot e^{(-i\frac{2\pi }{W} (l-1)(w-1))} \right| , \end{aligned}$$
    (4)

    where:

    H, W — height and width of P,

    k — sampling rate in vertical direction \((1 \le k \le H)\),

    l — sampling rate in horizontal direction \((1 \le l \le W)\),

    C(k, l) — the coefficient of the discrete Fourier transform in the \(k\)-th row and \(l\)-th column,

    P(h, w) — the value in the image plane at coordinates (h, w).

  9. Selection of a subpart of the spectrum, e.g. of size \(10 \times 10\), and concatenation into the vector V.
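Below is a minimal Python/NumPy sketch of the pipeline described above. It is an illustrative reconstruction, not the authors' reference implementation: the function name pfgd, the use of SciPy filters, and the sampling of the polar image directly onto a \(128 \times 128\) grid (which merges steps 6 and 7) are assumptions made for brevity.

```python
import numpy as np
from scipy.ndimage import median_filter, uniform_filter, map_coordinates


def pfgd(subimage, polar_size=128, spectrum_size=10):
    """Sketch of the P-FGD pipeline (steps 1-9); names are illustrative."""
    I = subimage.astype(np.float64)

    # Steps 1-2: 3x3 median filtering and 3x3 averaging
    # (a mask of nine ones normalised by 9).
    I = median_filter(I, size=3)
    I = uniform_filter(I, size=3)

    # Step 3: centroid from the geometric moments m_pq, Eqs. (1)-(2).
    y, x = np.indices(I.shape)
    m00 = I.sum()
    xc = (x * I).sum() / m00
    yc = (y * I).sum() / m00

    # Steps 4-5: pad the image so the centroid lies in its centre;
    # the new pixels are filled with grey level 127.
    H, W = I.shape
    d_max_x = max(xc, W - 1 - xc)
    d_max_y = max(yc, H - 1 - yc)
    pads = ((int(round(d_max_y - yc)), int(round(d_max_y - (H - 1 - yc)))),
            (int(round(d_max_x - xc)), int(round(d_max_x - (W - 1 - xc)))))
    I = np.pad(I, pads, constant_values=127)
    yc += pads[0][0]
    xc += pads[1][0]

    # Steps 6-7: resample onto a square polar (theta, rho) grid, Eq. (3).
    rho_max = np.hypot(I.shape[0] / 2, I.shape[1] / 2)
    rho = np.linspace(0, rho_max, polar_size)
    theta = np.linspace(-np.pi, np.pi, polar_size, endpoint=False)
    rr, tt = np.meshgrid(rho, theta)
    ys = yc + rr * np.sin(tt)
    xs = xc + rr * np.cos(tt)
    P = map_coordinates(I, [ys, xs], order=1, cval=127)

    # Step 8: magnitude of the 2D DFT normalised by H*W, Eq. (4).
    C = np.abs(np.fft.fft2(P)) / P.size

    # Step 9: keep the low-frequency subpart and concatenate into vector V.
    return C[:spectrum_size, :spectrum_size].ravel()
```

Two objects can then be compared simply by the Euclidean distance between their vectors V, which is how the classification in the experiment described below is performed.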

Fig. 2. Examples of various objects represented using the Polar–Fourier Greyscale Descriptor; the normalised polar-transformed images are shown, before the application of the two-dimensional Fourier transform.

3 Conditions and Results of the Experiment

As already mentioned, the main goal of the performed experiment was to investigate the efficiency of the Polar–Fourier Greyscale Descriptor when applied to strongly deformed and distorted objects extracted from digital images. Traffic signs were selected for this purpose, since in real-world conditions they are sometimes very difficult to recognise. Among the publicly available databases one of the most popular is the German Traffic Sign Recognition Benchmark [26], hence it was used as the source of images (already extracted road signs) for the described experiments. In total, 10000 images from 20 different classes were applied. Examples of the images employed in the experiment are presented in Fig. 1. For each class, 50 instances were randomly selected from the 500 images and used as the learning examples (i.e. the templates), and 200 random images were employed as the test data. This procedure was repeated ten times and the average recognition rate was obtained. The Polar–Fourier Greyscale Descriptor was employed for the representation of the objects, and the Euclidean distance was used to select the template closest to a test instance, which established the recognised class. The average efficiency for particular classes is provided in Table 1.
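A minimal sketch of this evaluation protocol is given below. The helper name recognition_rate, the dictionary input format, and the assumption that the 50 templates and 200 test images are drawn per class and kept disjoint are illustrative choices rather than details taken from the original set-up.

```python
import numpy as np


def recognition_rate(descriptors_by_class, n_templates=50, n_tests=200,
                     n_runs=10, seed=0):
    """Nearest-template classification with Euclidean distance, averaged over runs."""
    rng = np.random.default_rng(seed)
    rates = []
    for _ in range(n_runs):
        templates, template_labels, tests, test_labels = [], [], [], []
        for cls, vecs in descriptors_by_class.items():  # vecs: list of P-FGD vectors
            idx = rng.permutation(len(vecs))
            templates += [vecs[i] for i in idx[:n_templates]]
            template_labels += [cls] * n_templates
            tests += [vecs[i] for i in idx[n_templates:n_templates + n_tests]]
            test_labels += [cls] * n_tests
        T = np.asarray(templates)
        correct = 0
        for v, true_cls in zip(tests, test_labels):
            nearest = int(np.argmin(np.linalg.norm(T - v, axis=1)))  # Euclidean distance
            correct += int(template_labels[nearest] == true_cls)
        rates.append(correct / len(tests))
    return float(np.mean(rates))
```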

Table 1. The average efficiency obtained for particular classes.

The obtained average efficiency exceeds \(89\,\%\). This may not seem perfect, but it has to be stressed that the images used in the experiments were in many cases difficult to recognise even for humans; several examples are presented in Fig. 1. The analysis of the results leads to the conclusion that the most difficult cases are blurred images (resulting from the fast movement of the car carrying the recording camera) and unusual lighting conditions, when the images are too dark or too bright. In those cases the evaluated descriptor failed. Examples of wrongly recognised traffic signs are provided in Fig. 3.

Fig. 3. Examples of wrongly recognised speed limit traffic signs.

4 Concluding Remarks

In the paper the experimental results on the application of the Polar–Fourier Greyscale Descriptor to the recognition of traffic signs were described. The algorithm is based on a combination of the polar and Fourier transforms. Using the greyscale provides more information than using the shape alone. In the case of the analysed descriptor, the way it is derived allows the object's silhouette to be taken into consideration as well, although it is above all the greyscale that is taken into account. It is assumed that this makes the final representation more effective.

For the experiments, images from the German Traffic Sign Recognition Benchmark [26] were applied. Traffic signs were selected because the real data in this case exhibit strong distortions and deformations, and the main goal was the analysis of the efficiency of the P-FGD in this difficult case. In total, 10000 images were employed and an average efficiency above \(89\,\%\) was obtained, which can be considered a good result given the strong distortions of the experimental data (some examples can be seen in Figs. 1 and 3).