Abstract
We present a new parallel algorithm for image feature extraction, which uses a distance function based on the LZ-complexity of the string representations of two images. An input image is represented by a feature vector whose components are the distance values between its parts (sub-images) and a set of prototypes. The algorithm is highly scalable and computes these values in parallel. It is implemented on a massively parallel graphics processing unit (GPU) with several thousand cores, which yields a three-order-of-magnitude reduction in the time needed to process the images. Given a corpus of input images, the algorithm produces labeled cases that can be used by any supervised or unsupervised learning algorithm to learn image classification or image clustering. A main advantage is that no image processing or image analysis is needed: the user defines image features only once, through the simple process of choosing a few small images that serve as prototypes. Results for several image classification problems are presented.
Keywords
- Graphic Processing Unit
- Input Image
- Image Classification
- Image Feature Extraction
- Unsupervised Learning Algorithm
1 Introduction
Image classification research aims at finding representations of images that can be automatically used to categorize images into a finite set of classes. Typically, algorithms that classify images require some form of pre-processing of an image prior to classification. This process may involve extracting relevant features and segmenting images into sub-components based on some prior knowledge about their context [1, 2]. In [3] we introduced a new distance function, called the Universal Image Distance (UID), for measuring the distance between two images. The UID first transforms each of the two images into a string of characters from a finite alphabet and then uses the string distance of [4] to give the distance value between the images. According to [4], the distance between two strings x and y is a normalized difference between the complexity of the concatenation xy of the strings and the minimal complexity of each of x and y. By the complexity of a string x we mean its Lempel-Ziv complexity [5]. In [6] we presented a serial algorithm to convert images into feature vectors where the \(i^{th}\) dimension is a feature that measures the UID distance between the image and the \(i^{th}\) feature category. One of the advantages of the UID is that it can compare two images of different sizes, and thus the prototypes which are representative of the different feature categories may be relatively small. For instance, a prototype of the airplane category can be a small image of an airplane over a simple background such as blue sky.
In this paper we introduce a parallel distributed algorithm which is based on the serial algorithm of [6]. Compared to [6], the current version of the algorithm offers a very large acceleration in processing speed, which allows us to test the algorithm on more challenging image classification tasks. On a standard graphics processing unit (GPU) it improves the execution speed relative to [6] by more than three orders of magnitude. The algorithm converts an input image into a labeled case; doing this for the corpus of images, each labeled by its class, yields a data set that can be used to train any 'off-the-shelf' supervised or unsupervised learning algorithm. After describing our method in detail we report on the speed and accuracy that are achieved by this method.
It is noteworthy that our process for converting an image into a finite dimensional feature vector is very straightforward and does not involve any domain knowledge or image analysis expertise. Compared to other image classification algorithms that extract features based on sophisticated mathematical analysis, for instance, analyzing the texture, or checking for special properties of an image, our approach is very basic and universal. It is based on the complexity of the 'raw' string-representation of an image. Our approach is to extract features automatically just by computing distances from a set of prototype images that are selected once, at the first stage.
The algorithm that we present here is designed with the main aim of scalable distributed computations. Building on recent ideas [7], we designed it to take advantage of relatively cheap and massively-parallel processors that are ubiquitous in today’s technology. Our method extracts image features that are unbiased in the sense that they do not employ any heuristics in contrast to other common image-processing techniques [1, 2]. The features that we extract are based on information implicit in the image and obtained via a complexity-based UID distance which is an information-theoretic measure.
2 Distance
The UID distance function [3] is based on the LZ-complexity of a string. The definition of this complexity follows [4, 5]: let S, Q and R be strings of characters defined over the alphabet \(\mathcal{A}\). Denote by l(S) the length of S, and by S(i) the \(i^{th}\) element of S. We denote by S(i, j) the substring of S which consists of the characters of S between positions i and j (inclusive). An extension \(R=SQ\) of S is reproducible from S (denoted \(S\rightarrow R\)) if there exists an integer \(p\le l(S)\) such that \(Q(k)=R(p+k-1)\) for \(k=1,\ldots ,l(Q)\). For example, \(aacgt\rightarrow aacgtcgtcg\) with \(p=3\) and \(aacgt\rightarrow aacgtac\) with \(p=2\). R is obtained from S (the seed) by first copying all of S and then copying in a sequential manner l(Q) elements starting at the \(p^{th}\) location of S in order to obtain the Q part of R.
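This definition can be checked mechanically. The following sketch is our own illustration (not the paper's code): `is_reproducible` searches for a pointer \(p\le l(S)\) that generates Q by sequential copying, indexing into R so that a copy may run past the seed boundary, exactly as in the self-reproducing example \(aacgt\rightarrow aacgtcgtcg\):

```python
def is_reproducible(S: str, R: str) -> bool:
    """Check whether R = SQ is reproducible from the seed S (S -> R):
    there must exist a pointer p <= len(S), 1-based, such that every
    character of Q = R[len(S):] satisfies Q(k) = R(p + k - 1).
    Indexing into R (not S) lets the copy run past the seed boundary."""
    if not R.startswith(S) or len(R) == len(S):
        return False
    Q = R[len(S):]
    for p in range(1, len(S) + 1):  # candidate pointers, 1-based
        if all(Q[k] == R[p - 1 + k] for k in range(len(Q))):
            return True
    return False
```

The paper's examples behave as stated: `is_reproducible("aacgt", "aacgtcgtcg")` holds with \(p=3\), and `is_reproducible("aacgt", "aacgtac")` with \(p=2\), whereas `"aacgtacc"` is not reproducible from `"aacgt"` (it is only producible, as defined next).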
A string S is producible from its prefix S(1, j) (denoted \(S(1,j)\Rightarrow S\)), if \(S(1,j)\rightarrow S(1,l(S)-1)\). For example, \(aacgt\Rightarrow aacgtac\) and \(aacgt\Rightarrow aacgtacc\) both with pointers \(p=2\). The production adds an extra ‘different’ character at the end of the copying process which is not permitted in a reproduction.
Any string S can be built using a production process where at its \(i^{th}\) step we have the production \(S(1,h_{i-1})\Rightarrow S(1,h_{i})\), where \(h_{i}\) is the location of a character at the \(i^{th}\) step. (Note that \(S(1,0)\Rightarrow S(1,1)\).) An m-step production process of S results in a parsing of S in which \(H(S)=S(1,h_{1})\cdot S(h_{1}+1,h_{2})\cdots S(h_{m-1}+1,h_{m})\) is called the history of S and \(H_{i}(S)=S(h_{i-1}+1,h_{i})\) is called the \(i^{th}\) component of H(S). For example, for \(S=aacgtacc\) we have \(H(S)=a\cdot ac\cdot g\cdot t\cdot acc\) as a history of S. If \(S(1,h_{i})\) is not reproducible from \(S(1,h_{i-1})\) then the component \(H_{i}(S)\) is called exhaustive, meaning that the copying process cannot be continued and the component must be closed with a single-character innovation. A history is called exhaustive if each of its components is exhaustive. Every string S has a unique exhaustive history [5]. Let us denote by \(c_{H}(S)\) the number of components in a history of S. The LZ complexity of S is \(c(S)=\min \left\{ c_{H}(S)\right\} \), where the minimum is over all histories of S. It can be shown that \(c(S)=c_{E}(S)\), where \(c_{E}(S)\) is the number of components in the exhaustive history of S.
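The exhaustive history can be computed greedily: each component is the longest stretch that can be copied from the already-parsed prefix, plus one innovative character. The sketch below is our own serial illustration (not the paper's parallel LZMP procedure); it uses the fact that a stretch starting at position i is reproducible from the prefix exactly when it occurs as a substring beginning no later than position i, i.e. inside the string truncated one character before the stretch ends:

```python
def exhaustive_history(s: str) -> list:
    """Parse s into its (unique) exhaustive history, LZ76 style.
    Each component is the longest copyable stretch plus one new
    character; the final component may end without an innovation
    if the string ends mid-copy."""
    comps, i, n = [], 0, len(s)
    while i < n:
        m = 0
        # the copy part s[i:i+m] is reproducible from the prefix iff it
        # occurs inside s[:i+m-1] (a copy may overlap what it produces)
        while i + m + 1 < n and s[i:i + m + 1] in s[:i + m]:
            m += 1
        comps.append(s[i:i + m + 1])
        i += m + 1
    return comps

print(exhaustive_history("aacgtacc"))       # ['a', 'ac', 'g', 't', 'acc']
print(len(exhaustive_history("aacgtacc")))  # c(S) = 5
```

This reproduces the paper's example \(H(aacgtacc)=a\cdot ac\cdot g\cdot t\cdot acc\), so \(c_{E}(aacgtacc)=5\).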
A distance for strings based on the LZ-complexity was introduced in [4] and is defined as follows: given two strings X and Y over any finite alphabet, denote by XY their concatenation, and define

\[ d(X,Y):=\max \left\{ c(XY)-c(X),\; c(YX)-c(Y)\right\} \]

(see several normalized versions of d in [4]). In [3, 6] we have found that the following normalized distance

\[ d(X,Y):=\frac{c(XY)-\min \left\{ c(X),c(Y)\right\} }{\max \left\{ c(X),c(Y)\right\} } \qquad (1) \]

is useful for image classification.
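As an illustration (our own sketch, not the paper's code), the normalized distance of [3, 6] can be computed directly on top of a simple serial LZ76 parser:

```python
def lz_complexity(s: str) -> int:
    """LZ76 complexity: number of components in the exhaustive history.
    A component is the longest copyable stretch plus one new character."""
    c, i, n = 0, 0, len(s)
    while i < n:
        m = 0
        # copy part is reproducible iff it occurs inside s[:i+m-1]
        while i + m + 1 < n and s[i:i + m + 1] in s[:i + m]:
            m += 1
        c += 1
        i += m + 1
    return c

def uid_distance(x: str, y: str) -> float:
    """Normalized LZ-complexity distance:
    d(X,Y) = (c(XY) - min{c(X), c(Y)}) / max{c(X), c(Y)}."""
    cx, cy = lz_complexity(x), lz_complexity(y)
    return (lz_complexity(x + y) - min(cx, cy)) / max(cx, cy)

x, z = "abcd" * 8, "wxyz" * 8
print(uid_distance(x, x))  # 0.0: identical strings are at distance 0
```

Concatenating a string with itself adds no new components, so the distance of a string to itself is 0, while strings over disjoint alphabets come out near the top of the range.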
In [7] we introduced a parallel distributed processing algorithm (LZMP) for computing the complexity c(X) of a string X. Let us denote by \(\mathtt{dp}(X,Y)\) the distance between X and Y where the complexity c is computed by the LZMP algorithm. Thus (1) is now represented by its parallel counterpart

\[ \mathtt{dp}(X,Y):=\frac{c_{p}(XY)-\min \left\{ a,b\right\} }{\max \left\{ a,b\right\} } \]

where \(c_{p}\) denotes the LZMP-computed complexity and a, b are the LZ-complexity values of the strings X, Y, respectively; for efficiency they are pre-computed, as seen for instance in Procedure DMat, step 2(IV).
3 The Algorithm
We describe the parallel algorithm for image feature extraction, starting with a listing of several procedures followed by the main part, which is split into several sub-algorithms. Procedure LZMP computes the LZ-complexity of a given string. It runs in parallel over the symbols that comprise the string. The procedure appears in [7] and we include it here for completeness. Procedure VLZMP computes the LZ-complexity of a set of input strings in parallel. Procedure DMat computes the UID distance of every pair of strings from two given input lists, in parallel. The variable \(i_{p,q}\) denotes an index variable of the computing block \(B_{p,q}\) (each block has its own local memory and set of variables). The main algorithm is split into sub-algorithms (as done in [6]) which are numbered from 2 to 4, and the letter P denotes a parallel computing algorithm. Algorithm 2P selects the prototype images (its serial version comprises Algorithms 1 and 2 of [6]).
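The paper's procedures are GPU kernels; the sketch below is only our own CPU illustration of the structure of a DMat-style computation, with Python threads standing in for the computing blocks \(B_{p,q}\) (CPython threads give concurrency, not real parallel speedup, which here comes only from the GPU). The complexities a, b of the two input lists are pre-computed once, mirroring step 2(IV), so each \((p,q)\) cell needs only one concatenation complexity:

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def lz_complexity(s: str) -> int:
    """LZ76 complexity: number of components in the exhaustive history."""
    c, i, n = 0, 0, len(s)
    while i < n:
        m = 0
        while i + m + 1 < n and s[i:i + m + 1] in s[:i + m]:
            m += 1
        c += 1
        i += m + 1
    return c

def dmat(xs, ys):
    """DMat-style pairwise distance matrix between two string lists."""
    a = [lz_complexity(x) for x in xs]   # pre-computed once, as in 2(IV)
    b = [lz_complexity(y) for y in ys]
    D = [[0.0] * len(ys) for _ in xs]

    def cell(pq):
        p, q = pq
        D[p][q] = (lz_complexity(xs[p] + ys[q]) - min(a[p], b[q])) / max(a[p], b[q])

    # each (p, q) cell is independent, so the pair loop can run in parallel
    with ThreadPoolExecutor() as pool:
        list(pool.map(cell, product(range(len(xs)), range(len(ys)))))
    return D
```

The independence of the \((p,q)\) cells is what makes the matrix computation embarrassingly parallel once a and b are in place.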
Algorithm 3P computes the set of cases (feature vectors) for the images in the input corpus. The algorithm utilizes a number of computing blocks which begin to run in parallel in step 12. Steps 6 to 11, which run serially, are responsible for converting the input images into strings of symbols.
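As a toy sketch of this conversion (entirely our own illustration: plain strings stand in for serialized sub-images, and the function names and min-aggregation rule are our assumptions, not the paper's Algorithm 3P):

```python
def lz_complexity(s: str) -> int:
    """LZ76 complexity: number of components in the exhaustive history."""
    c, i, n = 0, 0, len(s)
    while i < n:
        m = 0
        while i + m + 1 < n and s[i:i + m + 1] in s[:i + m]:
            m += 1
        c += 1
        i += m + 1
    return c

def uid_distance(x: str, y: str) -> float:
    """Normalized LZ-complexity distance between two strings."""
    cx, cy = lz_complexity(x), lz_complexity(y)
    return (lz_complexity(x + y) - min(cx, cy)) / max(cx, cy)

def image_features(sub_images, prototype_categories):
    """One feature per category: the smallest distance between any
    sub-image and any prototype of that category (min is our chosen
    aggregation here; the paper's rule may differ)."""
    return [
        min(uid_distance(s, p) for s in sub_images for p in protos)
        for protos in prototype_categories
    ]
```

An image close to the prototypes of one category then gets a small component in that category's dimension and larger components elsewhere, which is exactly the signal the downstream learner exploits.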
Algorithm 4 is identical to that of [6] and we present it for completeness. It uses the training cases produced by Algorithm 3P together with any off-the-shelf supervised learning algorithm to produce a classifier.
4 Results
The following hardware was used: a 2.8 GHz AMD Phenom II X6 1055T processor with \(n=6\) cores and a Tesla K20C board with a single GK110 GPU from nVIDIA. This GPU is based on the Kepler architecture (compute capability 3.5). The CUDA release is 6.0 and the operating system is Ubuntu Linux (kernel 2.6.38-11-generic).
We tested the algorithm on several two-category image classification problems obtained from the CALTECH-101 corpus [1]. Due to lack of space, we present one such problem, whose categories are airplane and ketch (yacht). We chose 10 prototypes of each category simply by collecting small images of airplanes and boats. The prototypes of airplane are of size \(150\times 70\) pixels and the prototypes of ketch are of size \(150\times 130\). Figure 1 shows a few examples of such prototypes.
The corpus of input images consists of 74 images of airplanes of size \(420\times 200\) and 100 images of yachts of size \(300\times 300\). It takes 345 seconds for Algorithm 3P to produce the 174 cases from the image corpus. Figure 2 displays two examples of input images, one from the airplane category and one from ketch, with their corresponding divisions into sub-images of size \(150\times 150\) (obtained in Algorithm 3P, step 7). Note that the algorithm permits the size of the prototypes, and the size (or number) of the sub-images, to differ from one feature category to another. We ran four learning algorithms, namely multi-layer perceptrons, J48 decision trees, naive Bayes and lazy IB1, using ten-fold cross validation on the 174 cases. Table 1 presents the accuracy results versus the baseline algorithm (rules.ZeroR), which classifies based on the prior class probability. The configuration parameter values of the learning algorithms used in WEKA [8] are displayed under the accuracy results. As can be seen, the J48 decision tree learner achieves the highest accuracy of \(96.54\,\%\) (relative to the baseline accuracy of \(57.52\,\%\)).
Next, we considered the more challenging problem of recognizing different image textures. We obtained the 1000 images of the Texture Database [2], which has 25 categories of various types of real textures, for instance glass, water and wood, with 40 images of size \(640\times 480\) per category. We chose the categories themselves as feature categories and selected five small prototypes of size \(150\times 150\) from each one without using Algorithm 2P (just picking parts of images at random to be prototypes). It takes about 25 h for Algorithm 3P to produce the 1000 cases from the image corpus. We ran the following classification learning algorithms: lazy IB1, J48 decision trees, multi-layer perceptrons, naive Bayes and random forest. Ten-fold cross validation accuracy results are displayed in Table 2 (parameter settings are displayed under the accuracy results). As shown, the best accuracy obtained is \(70.73\,\%\), achieved by the random forest algorithm; this is 17.6 times better than the baseline ZeroR classification rule.
Considering how little effort and expertise are needed in our approach to image feature extraction, we believe that the results are impressive and that the approach can serve well in settings where very little domain knowledge is available, or as a starting point from which additional analysis and specialized feature extraction can proceed.
5 Conclusions
In this paper we introduce a new parallel processing algorithm for image feature extraction. Given an input corpus of raw RGB images, the algorithm computes feature vectors (cases) that represent the images, together with their associated classification target labels. Using these cases, any standard supervised or unsupervised learning algorithm can learn to classify or cluster the images in the database. A main advantage of our approach is that no kind of image or data analysis is needed. Aside from picking a few small image prototypes once at the start, the procedure is automatic and applies to any set of images. It can therefore be very useful in settings with little domain knowledge, or as a starting point for a more specialized image data analysis. Our experiments indicate that the algorithm yields relatively high accuracies on image texture classification problems.
References
Fei-Fei, L., Fergus, R., Perona, P.: Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories (2004)
Lazebnik, S., Schmid, C., Ponce, J.: A sparse texture representation using local affine regions. IEEE Trans. Pattern Anal. Mach. Intell. 27(8), 1265–1278 (2005)
Chester, U., Ratsaby, J.: Universal distance measure for images. In: Proceedings of the 27th IEEE Convention of Electrical Electronics Engineers in Israel (IEEEI 2012), pp. 1–4. Eilat, Israel, 14–17 November 2012
Sayood, K., Otu, H.H.: A new sequence distance measure for phylogenetic tree construction. Bioinformatics 19(16), 2122–2130 (2003)
Lempel, A., Ziv, J.: On the complexity of finite sequences. IEEE Trans. Inf. Theory 22(1), 75–81 (1976)
Chester, U., Ratsaby, J.: Machine learning for image classification and clustering using a universal distance measure. In: Brisaboa, N., Pedreira, O., Zezula, P. (eds.) SISAP 2013. LNCS, vol. 8199, pp. 59–72. Springer, Heidelberg (2013)
Belousov, A., Ratsaby, J.: Massively parallel computations of the LZ-complexity of strings. In: Proceedings of the 28th IEEE Convention of Electrical and Electronics Engineers in Israel (IEEEI 2014), pp. 1–5. Eilat, 3–5 December 2014
Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA data mining software: an update. SIGKDD Explor. 11(1), 10–18 (2009)
Acknowledgement
We acknowledge the support of the nVIDIA corporation for their donation of GPU hardware.
© 2015 Springer International Publishing Switzerland
Belousov, A., Ratsaby, J. (2015). A Parallel Distributed Processing Algorithm for Image Feature Extraction. In: Fromont, E., De Bie, T., van Leeuwen, M. (eds) Advances in Intelligent Data Analysis XIV. IDA 2015. Lecture Notes in Computer Science(), vol 9385. Springer, Cham. https://doi.org/10.1007/978-3-319-24465-5_6