Abstract
This chapter covers the metrics of general feature description, often used for whole images and image regions, including textural, statistical, model-based, and basis space methods. Texture, a key metric, is a well-known topic within image processing, and it is commonly divided into structural and statistical methods. Structural methods look for features such as edges and shapes, while statistical methods are concerned with pixel value relationships and statistical moments. Methods for modeling image texture also exist, primarily useful for image synthesis rather than for description. Basis spaces, such as the Fourier space, are also used for feature description.
Measure twice, cut once.
—Carpenter’s saying
Keywords
- Basis Space
- Local Binary Pattern
- Interest Point
- Image Code
- Bidirectional Reflectance Distribution Function
It is difficult to develop clean partitions between the related topics in image processing and computer vision that pertain to global vs. regional vs. local feature metrics; there is considerable overlap in the applications of most metrics. However, for this chapter, we divide these topics along reasonable boundaries, though those borders may appear to be arbitrary. Similarly, there is some overlap between discussions here on global and regional features and topics that are covered in Chap. 2 on image processing and that are discussed in Chap. 6 on local features. In short, many methods are used for local, regional, and global feature description, as well as image processing, such as the Fourier transform and the LBP.
But we begin with a brief survey of some key ideas in the field of texture analysis and general vision metrics.
Historical Survey of Features
To compare and contrast global, regional, and local feature metrics, it is useful to survey and trace the development of the key ideas, approaches, and methods used to describe features for machine vision. This survey includes image processing (textures and statistics) and machine vision (local, regional, and global features). Historically, the choice of feature metrics was limited to those that were computable at the time, given the limitations in compute performance, memory, and sensor technology. As time passed and technology developed, the metrics have become more complex to compute, consuming larger memory footprints. Images are becoming multimodal, combining intensity, color, multiple spectra, depth sensor information, multiple-exposure settings, high dynamic range imagery, faster frame rates, and more precision and accuracy in x, y, and Z depth. Increases in memory bandwidth and compute performance, therefore, have given rise to new ways to describe feature metrics and perform analysis.
Many approaches to texture analysis have been tried; these fall into the following categories:
- Structural, describing texture via a set of micro-texture patterns known as texels. Examples include the numerical description of natural textures such as fabric, grass, and water. Edges, lines, and corners are also structural patterns, and the characteristics of edges within a region, such as edge direction, edge count, and edge gradient magnitude, are useful as texture metrics. Histograms of edge features can be made to define texture, similar to the methods used in local feature descriptors such as SIFT (described in Chap. 6).

- Statistical, based on gray level statistical moments describing point and pixel-area properties, including methods such as the co-occurrence matrix or SDM. For example, regions of an image with color intensity within a close range could be considered as having the same texture. Regions with the same histogram could be considered as having the same texture.

- Model based, including fractal models, stochastic models, and various semi-random fields. Typically, the models can be used to generate synthetic textures, but they may not be effective in recognizing texture, and we do not cover texture generation.

- Transform or basis based, including methods such as Fourier, wavelets, Gabor filters, Zernike, and other basis spaces, which are treated here as a subclass of the statistical methods (statistical moments); however, basis spaces are used in transforms for image processing and filtering as well.
Key Ideas: Global, Regional, and Local Metrics
Let us take a brief look at a few major trends and milestones in feature metrics research. While this brief outline is not intended to be a precise, inclusive look at all key events and research, it describes some general trends in mainstream industry thinking and academic activity.
1960s, 1970s, 1980s—Whole-Object Approaches
During this period, metrics described mostly whole objects, larger regions, or images; pattern matching was performed on large targets via FFT spectral methods and correlation; recognition methods included object, shape, and texture metrics; and simple geometric primitives were used for object composition. Low-resolution images such as NTSC, PAL, and SECAM were common, primarily gray scale with some color when adequate memory was available. Some satellite images were available to the military with higher resolution, such as LANDSAT images from NASA and SPOT images from France.
Some early work on pattern recognition began to use local interest points and features: notably, Moravec [502] developed a local interest point detector in 1981, and in 1988 Harris and Stephens [148] developed local interest point detectors. Commercial systems began to appear, particularly the View PRB in the early 1980s, which used digital correlation and scale space super-pixels for coarse-to-fine matching, and real-time image processing and pattern recognition systems were introduced by Imaging Technology. Rack-mounted imaging and machine vision systems began to be replaced by workstations and high-end PCs with add-on imaging hardware, array processors, and software libraries and applications by companies such as Krig Research.
Early 1990s—Partial-Object Approaches
Compute power and memory were increasing, enabling more attention to local feature methods, such as developments from Shi and Tomasi [149] improving the Harris detector methods, Kitchen and Rosenfeld [200] developing gray level corner detection methods, and methods by Wang and Brady [205]. Image moments over polygon shapes were computed using Zernike polynomials in 1990 by Khotanzad and Hong [268]. Scale space theory was applied to computer vision by Lindeberg [502], and many other researchers followed this line of thinking, such as Lowe [153] in 2004.
Metrics described smaller pieces of objects or object components and parts of images; there was increasing use of local features and interest points. Large sets of sub-patterns or basis vectors were used and corresponding metrics were developed. There was increased use of color information; more methods appeared to improve invariance for scale, rotational, or affine variations; and recognition methods were developed based on finding parts of an object with appropriate metrics. Higher image resolution, increased pixel depths, and color information were increasingly used in the public sector (especially in medical applications), along with new affordable image sensors, such as the KODAK MEGA-PLUS, which provided a 1024 × 1024 image.
Mid-1990s—Local Feature Approaches
More focus was put on metrics that identify small local features surrounding interest points in images. Feature descriptors added more details from a window or patch surrounding each feature, and recognition was based on searching for sets of features and matching descriptors with more complex classifiers. Descriptor spectra included gradients, edges, and colors.
Late 1990s—Classified Invariant Local Feature Approaches
New feature descriptors were developed and refined to be invariant to changes in scale, lightness, rotation, and affine transformations. Work by Schmid and Mohr [340] advanced and generalized the local feature description methods. Features acted as an alphabet for spelling out complex feature descriptors or vectors whereby the vectors were used for matching. The feature matching and classification stages were refined to increase speed and effectiveness using neural nets and other machine learning methods [134].
Early 2000s—Scene and Object Modeling Approaches
Scenes and objects were modeled as sets of feature components or patterns with well-formed descriptors; spatial relationships between features were measured and used for matching; and new complex classification and matching methods used boosting and related methods to combine strong and weak features for more effective recognition. The SIFT [153] algorithm from Lowe was published; SURF was also published by Bay et al. [152], taking a different approach using HAAR features rather than just gradients. The Viola–Jones method [486] was published, using HAAR features and a boosted learning approach to classification, accelerating matching. The OpenCV library for computer vision was developed by Bradski at INTEL™, and released as open source.
Mid-2000s—Finer-Grain Feature and Metric Composition Approaches
The number of researchers in this field began to mushroom; various combinations of features and metrics (bags of features) were developed by Csurka et al. [226] to describe scenes and objects using key points as described by Sivic [503]; new local feature descriptors were created and old ones refined; and there was increased interest in real-time feature extraction and matching methods for commercial applications. Better local metrics and feature descriptors were analyzed, measured, and used together for increased pattern match accuracy. Also, feature learning and sparse feature codebooks were developed to decrease pattern space, speed up search time, and increase accuracy.
Post-2010—Multimodal Feature Metrics Fusion
There has been increasing use of depth sensor information and depth maps to segment images, describe features, and create voxel metrics; see, for example, Rusu et al. [380], where 2D texture metrics are extended into 3-space. 3D depth sensing methods have proliferated; increasing use of high-resolution images and high dynamic range (HDR) images enhances feature accuracy; and greater bit depth and accuracy of color images allow for valuable color-based metrics and computational imaging. Increased processing power and cheap, plentiful memory handle larger images on low-cost compute platforms. Faster and better feature descriptors using binary patterns have been developed and matched rapidly using Hamming distance, such as FREAK by Alahi et al. [122] and ORB by Rublee et al. [112]. Multimodal and multivariate descriptors [770, 771] are composed of image features combined with other sensor information, such as accelerometers and positional sensors.
Future computing research may even come full circle, when sufficient compute and memory capacity exist to perform the older methods, like correlation across multiple scales and geometric perspectives in real-time using parallel and fixed-function hardware methods. This would obviate some of the current focus on small invariant sets of local features and allow several methods to be used together, synergistically. Therefore, the history of development in this field is worth knowing, since it might repeat itself in a different technological embodiment.
Since there is no single solution for obtaining the right set of feature metrics, all the methods developed over time have applications today and are still in use.
Textural Analysis
One of the most basic metrics is texture, which is the description of the surface of an image channel, such as color intensity, like an elevation map or terrain map. Texture can be expressed globally or within local regions. Texture can be expressed locally by statistical relationships among neighboring pixels in a region, and it can be expressed globally by summary relationships of pixel values within an image or region. For a sampling of the literature covering a wide range of texture methods, see Refs. [13, 16–20, 52, 53, 302, 304, 305].
According to Gonzalez [4], there are three fundamental classes of texture in image analysis: statistical, structural, and spectral. Statistical measures include histograms, scatter plots, and SDMs. Structural techniques are more concerned with locating patterns or structural primitives in an image, such as parallel lines, regular patterns, and so on. These techniques are described in [1, 5, 8, 11]. Spectral texture is derived from analysis of the frequency domain representation of the data. That is, a fast Fourier transform is used to create a frequency domain image of the data, which can then be analyzed using Fourier techniques.
Histograms reveal overall pixel value distributions but say nothing about spatial relationships. Scatter plots are essentially two-dimensional histograms, and do not reveal any spatial relationships. A good survey is found in Ref. [307].
Texture has been used to achieve several goals:
- Texture-based segmentation (covered in Chap. 2).

- Texture analysis of image regions (covered in this chapter).

- Texture synthesis, creating images using synthetic textures (not covered in this book).
In computer vision, texture metrics are devised to describe the perceptual attributes of texture by using discrete methods. For instance, texture has been described perceptually with several properties, including:
- Contrast

- Color

- Coarseness

- Directionality

- Line-likeness

- Roughness

- Constancy

- Grouping

- Segmentation
If textures can be recognized, then image regions can be segmented based on texture, and the corresponding regions can be measured using shape metrics such as area, perimeter, and centroid (as discussed in Chap. 6). Chapter 2 included a survey of segmentation methods, some of which are based on texture. Segmented texture regions can be recognized and compared for computer vision applications. Micro-textures of a local region, such as the LBP discussed in detail in Chap. 6, can be useful as a feature descriptor, and macro-textures can be used to describe a homogeneous texture of a region such as a lake or field of grass, and therefore have natural applications to image segmentation. In summary, texture can be used to describe global image content, image region content, and local descriptor region content. The distinction between a feature descriptor and a texture metric may be small.
Sensor limitations combined with compute and memory capabilities of the past have limited the development of texture metrics to mainly 2D gray scale metrics. However, with the advances toward pervasive computational photography in every camera providing higher resolution images, higher frame rates, deeper pixels, depth imaging, more memory, and faster compute, we can expect that corresponding new advances in texture metrics will be made.
Here is a brief historical survey of texture metrics.
1950s Through 1970s—Global Uniform Texture Metrics
Auto-correlation, or cross-correlation, was developed by Kaizer [26] in 1955 as a method of looking for randomness and repeating pattern features in aerial photography. Auto-correlation is a statistical method of correlating a signal or image with a time-shifted version of itself, yielding a computationally simple method to analyze ground cover and structures.
Bajcsy [25] developed Fourier spectrum methods in 1973 using various types of filters in the frequency domain to isolate various types of repeating features as texture.
Gray level spatial dependency matrices, GLCMs, SDMs or co-occurrence matrices [6] were developed and used by Haralick in 1973, along with a set of summary statistical metrics from the SDMs to assist in numerical classification of texture. Some, but not all, of the summary metrics have proved useful; however, analysis of SDMs and development of new SDM metrics have continued, involving methods such as 2D visualization and filtering of the SDM data within spatial regions [23], as well as adding new SDM statistical metrics, some of which are discussed in this chapter.
1980s—Structural and Model-Based Approaches for Texture Classification
While early work focused on micro-textures describing statistical measures between small kernels of adjacent pixels, macro-textures were developed to address the structure of textures within a larger region. K. Laws developed texture energy-detection methods in 1979 and 1980 [27–29], as well as texture classifiers, which may be considered the forerunners of some of the modern classifier concepts. The Laws method could be implemented as a texture classifier in a parallel pipeline, with stages for taking gradients via a set of convolution masks over Gaussian-filtered images to isolate texture micro-features, followed by a Gaussian smoothing stage to deal with noise, followed by an energy calculation from the combined gradients, followed by a classifier that matches texture descriptors.
Eigenfilters were developed by Ade [30] in 1983 as an alternative to the Laws gradient or energy methods and SDMs; eigenfilters are implemented using a covariance matrix representation of local 3 × 3 pixel region intensities, which allows texture analysis and aggregation into structure based on the variance within eigenvectors in the covariance matrix.
Structural approaches were developed by Davis [31] in 1979 to focus on the gross structure of texture rather than primitives or micro-texture features. Hough transforms were invented in 1972 by Duda and Hart [220] as a method of finding lines and curves, and they were used by Eichmann and Kasparis [32] in 1988 to provide invariant texture description.
Fractal methods and Markov random field methods were developed into texture descriptors. While these methods may be good for texture synthesis, they do not map well to texture classification: both rely on random fields, and so are limited when applied to real-world textures that are not random.
1990s—Optimizations and Refinements to Texture Metrics
In 1993, Lam and Ip [33, 39] used pyramid segmentation methods to achieve spatial invariance: an image is segmented into homogeneous regions using Voronoi polygon tessellation and irregular pyramid segmentation techniques around Q points taken from a binary thresholded image, and five shape descriptors are calculated for each polygon (area, perimeter, roundness, orientation, and major/minor axis ratio), which are combined into texture descriptors.
Local binary patterns (LBP) were developed in 1994 by Ojala et al. [165] as a novel method of encoding both pattern and contrast to define texture [15, 16, 35, 36]. Since then, hundreds of researchers have added to the LBP literature in the areas of theoretical foundations, generalization into 2D and 3D, domain-specific interest point descriptors used in face detection, and spatiotemporal applications to motion analysis [34]. LBP research remains quite active at this time. LBPs are covered in detail in Chap. 6. There are many applications for the powerful LBP method as a texture metric, a feature descriptor, and an image processing operator, the latter of which was discussed in Chap. 2.
2000 to Today—More Robust Invariant Texture Metrics and 3D Texture
Feature metrics research is investigating texture metrics that are invariant to scale, rotation, lighting, perspective, and so on to approach the capabilities of human texture discrimination. In fact, texture is used interchangeably as a feature descriptor in some circles. The work by Pun and Lee [37] is an example of development of rotational invariant texture metrics, as well as scale invariance. Invariance attributes are discussed in the general taxonomy in Chap. 5.
The next wave of metrics being developed increasingly will take advantage of 3D depth information. One example is the surface shape metrics developed by Spence [38, 304] in 2003, which provide a bump-map type metric for affine invariant texture recognition and texture description with scale and perspective invariance. Chapter 6 also discusses some related 3D feature descriptors.
Statistical Methods
The topic of statistical methods is vast, and we can only refer the reader to selected literature as we go along. One useful and comprehensive resource is the online National Institute of Standards and Technology (NIST) Engineering Statistics Handbook,Footnote 1 including examples and links to additional resources and tools.
Statistical methods may be drawn upon at any time to generate novel feature metrics. Any feature, such as pixel values or local region gradients, can be expressed statistically by any number of methods. Simple methods, such as the histogram shown in Fig. 3.1, are invaluable. Basic statistics such as minimum, maximum, and average values can be seen easily in the histogram shown in Chap. 2 in Fig. 2.21. We survey several applications of statistical methods to computer vision here.
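As a minimal illustration, the histogram and basic statistics of a small region can be computed with NumPy in a few lines (a sketch only; the 8-bit patch values are purely illustrative):

```python
import numpy as np

# Illustrative 8-bit gray scale patch; any 2D intensity region works.
patch = np.array([[ 10,  10,  12, 200],
                  [ 11,  10, 198, 201],
                  [ 12, 199, 200, 202],
                  [197, 200, 201, 255]], dtype=np.uint8)

# 256-bin histogram: the pixel value distribution, with no spatial info.
hist, _ = np.histogram(patch, bins=256, range=(0, 256))

# Basic statistics that can be read directly off the histogram.
p_min, p_max = int(patch.min()), int(patch.max())
p_mean = float(patch.mean())
```

The histogram reveals the bimodal dark/bright structure of the patch at a glance, which the summary statistics alone would hide.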
Texture Region Metrics
Now we look in detail at the specific metrics for feature description based on texture. Texture is one of the most-studied classes of metrics. It can be thought of in terms of the surface—for example, a burlap bag compared to silk fabric. There are many possible textural relationships and signatures that can be devised in a range of domains, with new ones being developed all the time. In this section we survey some of the most common methods for calculating texture metrics:
- Edge metrics

- Cross-correlation

- Fourier spectrum signatures

- Co-occurrence matrix, Haralick features, extended SDM features

- Laws texture metrics

- Tessellation

- Local binary patterns (LBP)

- Dynamic textures
Within an image, each image region has a texture signature, where texture is defined as a common structure and pattern within that region. Texture signatures may be a function of position and intensity relationships, as in the spatial domain, or be based on comparisons in some other function basis and feature domain, such as frequency space using Fourier methods.
Texture metrics can be used to both segment and describe regions. Regions are differentiated based on texture homogeneousness, and as a result, texture works well as a method for region segmentation. Texture is also a good metric for feature description, and as a result it is useful for feature detection, matching, and tracking.
Appendix B contains several ground truth datasets with example images for computing texture metrics, including the CUReT reflectance and texture database from Columbia University. Several key papers describe the metrics used against the CUReT dataset [21, 40–42] including the appearance of a surface as a bidirectional reflectance distribution function (BRDF) and a bidirectional texture function (BTF).
These metrics are intended to measure texture as a function of direction and illumination, to capture coarse details and fine details of each surface. If the surface texture contains significant sub-pixel detail not apparent in single pixels or groups of pixels, the BRDF reflectance metrics can capture the coarse reflectance details. If the surface contains pixel-by-pixel difference details, the BTF captures the fine texture details.
Edge Metrics
Edges, lines, contours, or ridges are basic textural features [308, 309]. A variety of simple metrics can be devised just by analyzing the edge structure of regions in an image. There are many edge metrics in the literature, and a few are illustrated here.
Computing edges can be considered on a continuum of methods from interest points to edges, where an interest point may be a single pixel at a gradient maximum or minimum, with several connected gradient maxima pixels composed into corners, ridges, line segments, or contours. In summary, a gradient point is a degenerate edge, and an edge is a collection of connected gradient points.
The edge metrics can be computed locally or globally on image regions as follows:
- Compute the gradient g(d) at each pixel, selecting an appropriate gradient operator g() and an appropriate kernel size or distance d to target either micro or macro edge features.

- The distance d or kernel size can be varied to achieve different metrics; many researchers have used 3 × 3 kernels.

- Compute edge orientation by binning gradient directions for each edge into a histogram; for example, use 45° angle increment bins for a total of 8 bins at 0°, 45°, 90°, 135°, 180°, 225°, 270°, and 315°.
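The steps above can be sketched as follows, using a 3 × 3 Sobel gradient as one choice of operator g() with d = 1 (a minimal NumPy sketch; the step-edge test image and the helper function are illustrative):

```python
import numpy as np

def filter2d_same(img, k):
    """Sliding 3x3 dot product (correlation) with zero padding, 'same' size."""
    pad = np.pad(img, 1, mode="constant")
    out = np.zeros(img.shape, dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += k[dy, dx] * pad[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out

# 3x3 Sobel kernels: one choice of gradient operator g() with d = 1.
sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
sobel_y = sobel_x.T

# Illustrative test image: a vertical step edge.
img = np.zeros((8, 8))
img[:, 4:] = 255.0

gx = filter2d_same(img, sobel_x)
gy = filter2d_same(img, sobel_y)

magnitude = np.hypot(gx, gy)                      # gradient magnitude
direction = np.degrees(np.arctan2(gy, gx)) % 360  # direction in [0, 360)

# Bin the gradient directions of edge pixels into 8 bins of 45 degrees.
edge_mask = magnitude > 0
orientation_hist, _ = np.histogram(direction[edge_mask], bins=8, range=(0, 360))
```

For the vertical step edge, interior edge pixels fall into the 0° bin, as expected for a purely horizontal gradient.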
Several other methods can be used to compute edge statistics. Representative methods are shown here; see also Shapiro and Stockman [499] for a standard reference.
Edge Density
Edge density can be expressed as the average value of the gradient magnitudes g_m in a region.
Edge Contrast
Edge contrast can be expressed as the ratio of the average value of gradient magnitudes to the maximum possible pixel value in the region.
Edge Entropy
Edge randomness can be expressed as a measure of the Shannon entropy of the gradient magnitudes.
Edge Directivity
Edge directivity can be expressed as a measure of the Shannon entropy of the gradient directions.
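The four metrics above reduce to a few lines once per-pixel gradient magnitudes and directions are available (a minimal NumPy sketch; the magnitude and direction arrays are illustrative stand-ins for real gradient operator output):

```python
import numpy as np

# Illustrative per-pixel gradient magnitudes and directions for an 8-bit
# region; in practice these come from a gradient operator such as Sobel.
magnitude = np.array([[  0.0, 128.0, 255.0],
                      [  0.0, 128.0, 255.0],
                      [  0.0, 128.0, 255.0]])
direction = np.array([[0.0,  0.0,  0.0],
                      [0.0, 45.0, 90.0],
                      [0.0,  0.0,  0.0]])

def shannon_entropy(values, bins, value_range):
    """Shannon entropy (bits) of a histogram of the given values."""
    hist, _ = np.histogram(values, bins=bins, range=value_range)
    p = hist[hist > 0] / hist.sum()
    return float(-np.sum(p * np.log2(p)))

edge_density = float(magnitude.mean())            # average gradient magnitude
edge_contrast = edge_density / 255.0              # ratio to max pixel value
edge_entropy = shannon_entropy(magnitude, 16, (0, 256))
edge_directivity = shannon_entropy(direction, 8, (0, 360))
```

The bin counts (16 for magnitudes, 8 for the 45° direction bins) are one reasonable choice, not a fixed convention.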
Edge Linearity
Edge linearity measures the co-occurrence of collinear edge pairs using gradient direction, as shown by edges a–b in Fig. 3.2.
Edge Periodicity
Edge periodicity measures the co-occurrence of identically oriented edge pairs using gradient direction, as shown by edges a–c in Fig. 3.2.
Edge Size
Edge size measures the co-occurrence of opposite oriented edge pairs using gradient direction, as shown by edges a–d in Fig. 3.2.
Edge Primitive Length Total
Edge primitive length measures the total length of all gradient magnitudes along the same direction.
Cross-Correlation and Auto-correlation
Cross-correlation [26] is a metric showing similarity between two signals with a time displacement between them. Auto-correlation is the cross-correlation of a signal with a time-displaced version of itself. In the literature on signal processing, cross-correlation is also referred to as a sliding inner product or sliding dot product. Typically, this method is used to search a large signal for a smaller pattern.
Using the Wiener–Khinchin theorem as a special case of the general cross-correlation theorem, the auto-correlation can be written simply as the inverse Fourier transform of the absolute square of F(ν), the Fourier transform of the function f, as follows:

f ⋆ f = F^(-1)[ |F(ν)|^2 ]
In computer vision, the feature used for correlation may be a 1D line of pixels or gradient magnitudes, a 2D pixel region, or a 3D voxel volume region. By comparing the features from the current image frame and the previous image frame using cross-correlation derivatives, we obtain a useful texture change correlation metric.
By comparing displaced versions of an image with itself, we obtain a set of either local or global auto-correlation texture metrics. Auto-correlation can be used to detect repeating patterns or textures in an image, and also to describe the texture in terms of fine or coarse, where coarse textures show the auto-correlation function dropping off more slowly than fine textures. See also the discussion of correlation in Chap. 6 and Fig. 6.20.
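A minimal sketch of FFT-based auto-correlation on a 1D row of pixel values illustrates both the zero-lag energy peak and the secondary peaks that reveal a repeating pattern (the periodic test signal is illustrative):

```python
import numpy as np

# Periodic 1D "texture" row: a pattern of period 8 repeated 4 times.
signal = np.tile([0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 1.0], 4)

# Wiener-Khinchin: the auto-correlation is the inverse Fourier transform
# of the power spectrum |F(v)|^2.
spectrum = np.fft.fft(signal)
autocorr = np.real(np.fft.ifft(spectrum * np.conj(spectrum)))

# The zero-lag value is the signal energy; an exactly repeating texture
# produces an equal (circular) peak at its period, lag 8. A fine texture
# drops off quickly away from lag 0; a coarse one drops off more slowly.
zero_lag = autocorr[0]
period_peak = autocorr[8]
```

Note this computes circular auto-correlation; for non-periodic image rows, the signal is usually zero-padded first to avoid wrap-around effects.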
Fourier Spectrum, Wavelets, and Basis Signatures
Basis transforms, such as the FFT, decompose a signal into a set of basis vectors from which the signal can be synthesized or reconstructed. Viewing the set of basis vectors as a spectrum is a valuable method for understanding image texture and for creating a signature. Several basis spaces are discussed in this chapter, including Fourier, HAAR, wavelets, and Zernike.
Although computationally expensive and memory intensive, the Fast Fourier Transform (FFT) is often used to produce a frequency spectrum signature. The FFT spectrum is useful for a wide range of problems. The computations typically are limited to rectangular regions of fixed sizes, depending on the radix of the transform (see Bracewell [219]).
As shown in Fig. 3.3, Fourier spectrum plots reveal definite image features useful for texture and statistical analysis of images. For example, Fig. 3.10 shows an FFT spectrum of LBP pattern metrics. Note that the Fourier spectrum has many valuable attributes, such as rotational invariance, as shown in Fig. 3.3, where a texture image is rotated 90° and the corresponding FFT spectra exhibit the same attributes, only rotated 90°.
Wavelets [219] are similar to Fourier methods, and have become increasingly popular for texture analysis [303], discussed later in the section on basis spaces.
Note that the FFT spectrum as a texture metric or descriptor is rotational invariant, as shown in the bottom left image of Fig. 3.3. FFT spectra can be taken over rectangular 2D regions. Also, 1D arrays, such as annuli of the spectrum or the Cartesian coordinates of points taken around the perimeter of an object shape, can be used as input to an FFT, yielding an FFT shape descriptor metric.
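As a minimal illustration, the FFT magnitude spectrum of a synthetic vertical-stripe texture concentrates its energy at the DC term and at the stripe frequency along one spectrum axis (a NumPy sketch; the test image is illustrative):

```python
import numpy as np

# Synthetic vertical-stripe texture: columns 0, 4, 8, 12 set to 1
# in a 16 x 16 image, i.e., a periodic pattern of period 4 along x.
img = np.zeros((16, 16))
img[:, ::4] = 1.0

# Magnitude spectrum as a texture signature.
spectrum = np.abs(np.fft.fft2(img))

# The DC term equals the sum of all pixels; the stripe energy lands at
# horizontal frequency k = 16 / 4 = 4, with zero vertical frequency.
dc = spectrum[0, 0]
stripe_peak = spectrum[0, 4]
```

An oriented texture thus produces an oriented spectrum, which is the basis of spectrum-signature texture description.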
Co-occurrence Matrix, Haralick Features
Haralick [6] proposed a set of 2D texture metrics calculated from directional differences between adjacent pixels, referred to as co-occurrence matrices, spatial dependency matrices (SDM), or gray level co-occurrence matrices (GLCM). A complete set of four (4) matrices is calculated by evaluating the difference between adjacent pixels in the x, y, diagonal x, and diagonal y directions, as shown in Fig. 3.4, and further illustrated with a 4 × 4 image and corresponding co-occurrence tables in Fig. 3.5.
One benefit of the SDM as a texture metric is that it is easy to calculate in a single pass over the image. The SDM is also fairly invariant to rotation, which is often a difficult robustness attribute to attain. Within a segmented region or around an interest point, the SDM plot can be a valuable texture metric all by itself, therefore useful for texture analysis, feature description, noise detection, and pattern matching.
For example, if a camera has digital-circuit readout noise, it will show up in the SDM for the x direction only if the lines are scanned out of the sensor one at a time in the x direction, so using the SDM information will enable intelligent sensor processing to remove the readout noise. However, it should be noted that SDM metrics are not always useful alone, and should be qualified with additional feature information. The SDM is primarily concerned with spatial relationships, with regard to spatial orientation and frequency of occurrence. So, it is primarily a statistical measure.
The SDM is calculated in four orientations, as shown in Fig. 3.4. Since the SDM is only concerned with adjacent pairs of pixels, these four calculations cover all possible spatial orientations. SDMs could be extended beyond 2 × 2 regions by forming kernels extending into 5 × 5, 7 × 7, 9 × 9, and other dimensions.
A spatial dependency matrix is basically a count of how many times a given pixel value occurs next to another pixel value. Fig. 3.5 illustrates the concept. For example, assume we have an 8-bit image (0..255). If an SDM shows that pixel value x frequently occurs adjacent to pixels within the range x + 1 to x − 1, then we would say that there is a "smooth" texture at that intensity. However, if pixel value x frequently occurs adjacent to pixels within the range x + 70 to x − 70, we would say that there is quite a bit of contrast at that intensity, if not noise.
A critical point in using SDMs is to be sensitive to the varied results achieved when sampling over small vs. large image areas. By sampling the SDM over a smaller area (say 64 × 64 pixels), details will be revealed in the SDMs that would otherwise be obscured. The larger the size of the sample image area, the more the SDM will be populated. And the more samples taken, the more likely that detail will be obscured in the SDM image plots. Actually, smaller areas (e.g., 64 × 64 pixels) are a good place to start when using SDMs, since smaller areas are faster to compute and will reveal a lot about local texture.
The Haralick metrics are shown in Fig. 3.6.
The statistical characteristics of the SDM have been extended by several researchers to add more useful metrics [23], and SDMs have been applied to 3D volumetric data by a number of researchers with good results [22].
Extended SDM Metrics (Krig SDM Metrics)
Extensions to the Haralick metrics have been developed by the author [23], primarily motivated by a visual study of SDM plots as shown in Fig. 3.7. Applications for the extended SDM metrics include texture analysis, data visualization, and image recognition. The visual plots of the SDMs alone are valuable indicators of pixel intensity relationships, and are worth using along with histograms to get to know the data.
The extended SDM metrics include centroid, total coverage, low-frequency coverage, total power, relative power, locus length, locus mean density, bin mean density, containment, linearity, and linearity strength. The extended SDM metrics capture key information that is best observed by looking at the SDM plots. In many cases the extended SDM metrics are computed four times, once for each SDM direction of 0°, 45°, 90°, and 135°, as shown in Fig. 3.4.
The SDMs are interesting and useful all by themselves when viewed as an image. Many of the texture metrics suggested are obvious after viewing and understanding the SDMs; others are neither obvious nor apparently useful until developing a basic familiarity with the visual interpretation of SDM image plots. Next, we survey the following:
-
Example SDMs showing four directional SDM maps: A complete set of SDMs would contain four different plots, one for each orientation. Interpreting the SDM plots visually reveals useful information. For example, an image with a smooth texture will yield a narrow diagonal band of co-occurrence values; an image with wide texture variation will yield a larger spread of values; a noisy image will yield a co-occurrence matrix with outlier values at the extrema. In some cases, noise may only be distributed along one axis of the image—perhaps across rows or the x axis, which could indicate sensor readout noise as each line is read out of the sensor, suggesting a row- or line-oriented image preparation stage in the vision pipeline to compensate for the camera.
-
Extended SDM texture metrics: The addition of 12 other useful statistical measures to those proposed by Haralick.
-
Some code snippets: These illustrate the extended SDM computations; full source code is shown in Appendix D.
In Fig. 3.7, several of the extended SDM metrics can be easily seen, including containment and locus mean density. Note that the right image does not have a lot of outlier intensity points or noise (good containment); most of the energy is centered along the diagonal (tight locus), showing a rather smooth set of image pixel transitions and texture, while the left image shows a wider range of intensity values. For some images, the wider range may be noise spread across the spectrum (poor containment), revealing a wider band of energy and contrast between adjacent pixels.
Metric 1: Centroid
To compute the centroid, for each SDM bin p(i,j), the count of the bin is multiplied by the bin coordinate for x,y and also the total bin count is summed. The centroid calculation is weighted to compute the centroid based on the actual bin counts, rather than an unweighted “binary” approach of determining the center of the binning region based on only bin data presence. The result is the weighted center of mass over the SDM bins.
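The weighted centroid computation can be sketched as follows; the function name is ours, and the book's exact formulation is in Appendix D:

```python
import numpy as np

def sdm_centroid(sdm):
    """Weighted center of mass of the SDM bins: each bin coordinate is
    weighted by its bin count, then normalized by the total count."""
    total = sdm.sum()
    ys, xs = np.mgrid[0:sdm.shape[0], 0:sdm.shape[1]]
    cx = (xs * sdm).sum() / total
    cy = (ys * sdm).sum() / total
    return cx, cy

m = np.zeros((256, 256))
m[100, 100] = 3.0   # all mass in one bin, so the centroid lands there
```

Note the weighting: a bin with a count of 3 pulls the centroid three times as hard as a bin with a count of 1, unlike the unweighted "binary" alternative described above.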
Metric 2: Total Coverage
This is a measure of the spread, or range of distribution, of the binning. A small coverage percentage would be indicative of an image with few gray levels, which corresponds in some cases to image smoothness. For example, a random image would have a very large coverage number, since all or most of the SDM bins would be hit. The coverage feature metrics (2, 3, 4), taken together with the linearity features suggested below (11, 12), can give an indication of image smoothness.
Metric 3: Low-Frequency Coverage
For many images, any bins in the SDM with bin counts less than a threshold value, such as 3, may be considered as noise. The low-frequency coverage metric, or noise metric, provides an idea of how much of the binning is in this range. This may be especially true as the sample area of the image increases. For whole images, a threshold of 3 has proved to be useful for determining if a bin contains noise for a data range of 0–255, and using the SDM over smaller local kernel regions may use all the values with no thresholding needed.
Metric 4: Corrected Coverage
Corrected coverage is the total coverage with noise removed.
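The three coverage metrics (2, 3, 4) can be sketched together; this is our own formulation of the descriptions above, expressing each coverage as a fraction of the total bin count:

```python
import numpy as np

def coverage_metrics(sdm, noise_threshold=3):
    """Total coverage: fraction of SDM bins populated at all.
    Low-frequency (noise) coverage: fraction of bins with a count below
    the threshold. Corrected coverage: total coverage with noise removed."""
    n_bins = sdm.size
    total = np.count_nonzero(sdm) / n_bins
    noise = np.count_nonzero((sdm > 0) & (sdm < noise_threshold)) / n_bins
    return total, noise, total - noise

m = np.zeros((4, 4))
m[0, 0] = 10   # a well-populated bin
m[1, 1] = 1    # a bin below the noise threshold of 3
```

A random image would drive the total coverage toward 1.0, while a smooth image with few gray levels keeps it small.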
Metric 5: Total Power
The power metric provides a measure of the swing in value between adjacent pixels in an image, and is computed in four directions. A smooth image will have a low power number because the differences between pixels are smaller. Total power and relative power are inter-related, and relative power is computed using the total populated bins (z) and total difference power (t).
Metric 6: Relative Power
The relative power is calculated by scaling the total difference power (t) by the count of nonempty SDM bins (z), while the total power uses all bins.
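The exact formulas are given in Appendix D; as a rough sketch of the idea only (our own formulation, not necessarily the author's), the total power can be taken as the bin-count-weighted sum of adjacent-pixel differences |i − j|, and the relative power as that total scaled by the number of populated bins:

```python
import numpy as np

def power_metrics(sdm):
    """Total power t: the difference |i - j| between adjacent pixel
    values, weighted by how often each pair occurs. Relative power:
    t scaled by the number of populated bins z."""
    ys, xs = np.mgrid[0:sdm.shape[0], 0:sdm.shape[1]]
    t = (np.abs(ys - xs) * sdm).sum()   # total difference power
    z = np.count_nonzero(sdm)           # total populated bins
    return t, t / z if z else 0.0

m = np.zeros((256, 256))
m[10, 12] = 5   # value 10 next to value 12, five times: |10 - 12| * 5
```

A smooth image concentrates its counts near the diagonal where |i − j| is small, so both numbers stay low, matching the description above.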
Metric 7: Locus Mean Density
For many images, there is a “locus” area of high-intensity binning surrounding the bin axis (locus axis is where adjacent pixels are of the same value x = y) corresponding to a diagonal line drawn from the upper left corner of the SDM plot. The degree of clustering around the locus area indicates the amount of smoothness in the image. Binning from a noisy image will be scattered with little relation to the locus area, while a cleaner image will show a pattern centered about the locus.
The locus mean density is an average of the bin values within the locus area. The locus is the area around the center diagonal line, within a band of 7 pixels on either side of the identity line (x = y) that passes down the center of each SDM. The band width of 7 is not particularly special; based on experience, it simply gives a good indication of the desired feature over whole images. This feature is good for indicating smoothness.
Metric 8: Locus Length
The locus length measures the range of the locus concentration about the diagonal. The algorithm for locus length is a simple count of bins populated in the locus area; a threshold band of 7 pixels about the locus has been found useful.
y = length = 0;
while (y < 256) {
    x = count = 0;
    while (x < 256) {
        n = abs(y - x);                          /* distance from the locus diagonal x == y */
        if ((p[y][x] != 0) && (n < 7)) count++;  /* populated bin inside the locus band */
        x++;
    }
    if (count) length++;                         /* this row contributes to the locus length */
    y++;
}
Metric 9: Bin Mean Density
This is simply the average bin count from nonempty bins.
Metric 10: Containment
Containment is a measure of how well the binning in the SDM is contained within the boundaries or edges of the SDM. There are four edges or boundaries; for example, assuming a data range [0…255], there are containment boundaries along rows 0 and 255, and along columns 0 and 255. Typically, the bin count m is 256 bins, or possibly fewer, such as 64. To measure containment, the perimeter bins of the SDM are checked to see whether any binning has occurred there, where the perimeter region bins of the SDM represent extrema values adjacent to some other value. The left image in Fig. 3.7 has lower containment than the right image, especially for the low values.
If extrema are hit frequently, this probably indicates some sort of overflow condition such as numerical overflow, sensor saturation, or noise. The binning is treated unweighted. A high containment number indicates that all the binning took place within the boundaries of the SDM. A lower number indicates some bleeding. This feature appears visually very well in the SDM plots.
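A simple unweighted formulation of containment, assuming the definition above (this is our sketch, not the book's code), is the fraction of populated bins that do not touch the SDM perimeter:

```python
import numpy as np

def containment(sdm):
    """Fraction of populated bins (unweighted) that are NOT on the
    perimeter rows/columns of the SDM; 1.0 means no binning bled onto
    the extrema boundaries."""
    populated = np.count_nonzero(sdm)
    if populated == 0:
        return 1.0
    interior = np.count_nonzero(sdm[1:-1, 1:-1])
    return interior / populated

m = np.zeros((256, 256))
m[128, 128] = 4   # interior binning
m[0, 255] = 1     # one bin bleeding onto the boundary
```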
Metric 11: Linearity
The linearity characteristic may only be visible in a single orientation of the SDM, or by comparing SDMs. For example, the image in Fig. 3.8 reveals some linearity variations across the set of SDMs. This is consistent with the image sensor used (older tube camera).
Metric 12: Linearity Strength
The algorithm for linearity strength is shown in Metric 11. If there is any linearity present in a given angle of SDM, both linearity strength and linearity will be comparatively higher at this angle than the other SDM angles (Table 3.1).
Laws Texture Metrics
The Laws metrics [24, 27–29] provide a structural approach to texture analysis, using a set of masking kernels to measure texture energy or variation within fixed sized local regions, similar to the 2 × 2 region SDM approach but using larger pixel areas to achieve different metrics.
The basic Laws algorithm involves classifying each pixel in the image into texture based on local energy, using a few basic steps:
-
1.
The mean average intensity from each kernel neighborhood is subtracted from each pixel to compensate for illumination variations.
-
2.
The image is convolved at each pixel using a set of kernels, each of which sums to zero, followed by summing the results to obtain the absolute average value over each kernel window.
-
3.
The difference between the convolved image and the original image is measured, revealing the Laws energy metrics.
Laws defines a set of nine separable kernels to produce a set of texture region energy metrics, and some of the kernels work better than others in practice. The kernels are composed via matrix multiplication from a set of four vector masks L5, E5, S5, and R5, described below. The kernels were originally defined as 5 × 5 masks, but 3 × 3 approximations have also been used, as shown below.
5 × 5 form
3 × 3 approximations of 5 × 5 form
To create 2D masks, vectors Ln, En, Sn, and Rn (as shown above) are convolved together as separable pairs into kernels; a few examples are shown in Fig. 3.9.
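The separable construction can be sketched as outer products of the classic 1D Laws vectors (Level, Edge, Spot, Ripple); the helper function name is ours:

```python
import numpy as np

# Classic 1D Laws vectors (5-element form)
L5 = np.array([ 1,  4, 6,  4,  1])   # Level (local average)
E5 = np.array([-1, -2, 0,  2,  1])   # Edge
S5 = np.array([-1,  0, 2,  0, -1])   # Spot
R5 = np.array([ 1, -4, 6, -4,  1])   # Ripple

def laws_kernel(v1, v2):
    """A 2D Laws mask is the outer product of two 1D vectors."""
    return np.outer(v1, v2)

E5L5 = laws_kernel(E5, L5)   # responds to horizontal edge energy
```

Every kernel built from at least one zero-sum vector (E5, S5, R5) itself sums to zero, matching step 2 of the algorithm above; only L5L5 does not, which is why it is typically used for normalization rather than as an energy mask.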
Note that Laws texture metrics have been extended into 3D for volumetric texture analysis [43, 44].
LBP Local Binary Patterns
In contrast to the various structural and statistical methods of texture analysis, the LBP operator [18, 50] computes the local texture around each region as an LBP binary code, or micro-texture, allowing simple micro-texture comparisons to segment regions based on like micro-texture. (See the very detailed discussion on LBP in Chap. 6 for details and references to the literature, and especially Fig. 6.6.) The LBP operator [165] is quite versatile, easy to compute, consumes a low amount of memory, and can be used for texture analysis, interest points, and feature description. As a result, the LBP operator is discussed in several places in this book.
As shown in Fig. 3.10, the uniform set of LBP operators, composed of a subset of the possible LBPs that are by themselves rotation invariant, can be binned into a histogram, and the corresponding bin values are run through an FFT as a 1D array to create an FFT spectrum, which yields a robust metric with strong rotational invariance.
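The histogram-plus-FFT step can be sketched as follows; this is our own minimal illustration of why the spectrum magnitude adds robustness, not the book's implementation:

```python
import numpy as np

def lbp_histogram_fft(lbp_codes, bins=256):
    """Bin LBP codes into a histogram, then take the 1D FFT magnitude
    of the histogram. The magnitude spectrum is unchanged by circular
    shifts of the bins, which is the source of the added rotational
    robustness."""
    hist, _ = np.histogram(lbp_codes, bins=bins, range=(0, bins))
    return np.abs(np.fft.fft(hist))

codes = np.array([5, 5, 9, 200])
shifted = (codes + 3) % 256     # simulate a circular shift of the codes
a = lbp_histogram_fft(codes)
b = lbp_histogram_fft(shifted)  # identical magnitude spectrum
```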
Dynamic Textures
Dynamic textures are a concept used to describe and track textured regions as they change and morph dynamically from frame to frame [13–15, 45]. For example, dynamic textures may be textures in motion, like sea waves, smoke, foliage blowing in the wind, fire, facial expressions, gestures, and poses. The changes are typically tracked in spatiotemporal sets of image frames, where the consecutive frames are stacked into volumes for analysis as a group. The three dimensions are the XY frame sizes, and the Z dimension is derived from the stack of consecutive frames n − 2, n − 1, n.
A close cousin to dynamic texture research is the field of activity recognition (discussed in Chap. 6), where features are parts of moving objects that compose an activity—for example, features on arms and legs that are tracked frame to frame to determine the type of motion or activity, such as walking or running. One similarity between activity recognition and dynamic textures is that the features or textures change from frame to frame over time, so for both activity recognition and dynamic texture analysis, tracking features and textures often requires a spatiotemporal approach involving a data structure with a history buffer of past and current frames, which provides a volumetric representation to the data.
For example, VLBP and LBP-TOP (discussed in Chap. 6) provide methods for dynamic texture analysis by using the LBP constructed to operate over three dimensions in a volumetric structure, where the volume contains image frames n − 2, n − 1, and n stacked into the volume.
Statistical Region Metrics
Describing texture in terms of statistical metrics of the pixels is a common and intuitive method. Often a simple histogram of a region will be sufficient to describe the texture well enough for many applications. There are also many variations of the histogram, which lend themselves to a wide range of texture analysis. So this is a good point at which to examine histogram methods. Since statistical mathematics is a vast field, we can only introduce the topic here, dividing the discussion into image moment features and point metric features.
Image Moment Features
Image moments [4, 500] are scalar quantities, analogous to the familiar statistical measures such as mean, variance, skew, and kurtosis. Moments are well suited to describe polygon shape features and general feature metric information such as gradient distributions. Image moments can be based on either scalar point values or basis functions such as Fourier or Zernike methods discussed later in the section on basis space.
Moments can describe the projection of a function onto a basis space—for example, the Fourier transform projects a function onto a basis of harmonic functions. Note that there is a conceptual relationship between 1D and 2D moments in the context of shape description. For example, the 1D mean corresponds to the 2D centroid, and the 1D minimum and maximum correspond to the 2D major and minor axis. The 1D minimum and maximum also correspond to the 2D bounding box around the 2D polygon shape (also see Fig. 6.29).
In this work, we classify image moments under the term polygon shape descriptors in the taxonomy (see Chap. 5). Details on several image moments used for 2D shape description are covered in Chap. 6, under “Object Shape Metrics for Blobs and Objects.”
Common properties of moments in the context of 1D distributions and 2D images include:
-
Zeroth order moment is the mean or 2D centroid.
-
Central moments describe variation around the mean or 2D centroid.
-
First order central moments contain information about 2D area, centroid, and size.
-
Second order central moments are related to variance and measure 2D elliptical shape.
-
Third order central moments provide symmetry information about the 2D shape, or skewness.
-
Fourth order central moments measure whether the 2D distribution is tall, short, thin, or fat.
-
Higher-level moments may be devised and composed of moment ratios, such as co-variance.
Moments can be used to create feature descriptors that are invariant to several robustness criteria, such as scale, rotation, and affine variations. The taxonomy of robustness and invariance criteria is provided in Chap. 5. For 2D shape description, in 1961 Hu developed a theoretical set of seven 2D planar moments for character recognition work, derived using invariant algebra, that are invariant under scale, translation, and rotation [7]. Several researchers have extended Hu’s work. An excellent resource for this topic is Moments and Moment Invariants in Pattern Recognition, by Jan Flusser et al. [500].
Point Metric Features
Point metrics can be used for the following: (1) feature description, (2) analysis and visualization, (3) thresholding and segmentation, and (4) image processing via programmable LUT functions (discussed in Chap. 2). Point metrics are often overlooked. Using point metrics to understand the structure of the image data is one of the first necessary steps toward devising the image preprocessing pipeline to prepare images for feature analysis. Again, the place to start is by analysis of the histogram, as shown in Figs. 3.1 and 3.11. The basic point metrics can be determined visually, such as minima, maxima, peaks, and valleys. False coloring of the histogram regions for data visualization is simple using color lookup tables to color the histogram regions in the images.
Here is a summary of statistical point metrics:
-
Quantiles, median, rescale: By sorting the pixel values into an ordered list, as during the histogram process, the various quartiles can be found, including the median value. Also, the pixels can be rescaled from the list and used for pixel remap functions (as described in Chap. 2).
-
Min, max, mode: The minimum and maximum values, together with histogram analysis, can be used to guide image preprocessing to devise a threshold method to remove outliers from the data. The mode is the most common pixel value in the sorted list of pixels.
-
Mean, harmonic mean, and geometric mean: Various formulations of the mean are useful to learn the predominant illumination levels, dark or light, to guide image preprocessing to enhance the image for further analysis.
-
Standard deviation, skewness, and kurtosis: These moments can be visualized by looking at the SDM plots.
-
Correlation: Topic was covered earlier in this chapter under cross-correlation and auto-correlation.
-
Variance, covariance: The variance metric provides information on pixel distribution, and covariance can be used to compare variance between two images. Variance can be visualized to a degree in the SDM, also as shown in this chapter.
-
Ratios and multivariate metrics: Point metrics by themselves may be useful, but multivariate combinations or ratios using simple point metrics can be very useful as well. Depending on the application, the ratios themselves form key attributes of feature descriptors (as described in Chap. 6). For example, mean:min, mean:max, median:mean, area:perimeter.
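Most of the point metrics listed above are one-liners over a flattened pixel region; a minimal sketch (the dictionary keys and ratio choice are ours):

```python
import numpy as np

def point_metrics(pixels):
    """A few of the statistical point metrics listed above, computed
    over a flattened region of pixel values."""
    p = np.asarray(pixels, dtype=np.float64).ravel()
    mean = p.mean()
    std = p.std()
    return {
        "min": p.min(),
        "max": p.max(),
        "median": np.median(p),
        "mean": mean,
        "std": std,
        "skewness": ((p - mean) ** 3).mean() / std ** 3,
        "kurtosis": ((p - mean) ** 4).mean() / std ** 4,
        "mean_to_max": mean / p.max(),   # an example multivariate ratio
    }

m = point_metrics([0, 50, 100, 150, 200])
```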
Global Histograms
Global histograms treat the entire image. In many cases, image matching via global histograms is simple and effective, using a distance function such as SSD. As shown in Fig. 3.12, histograms reveal quantitative information on pixel intensity, but not structural information. All the pixels in the region contribute to the histogram, with no respect to the distance from any specific point or feature. As discussed in Chap. 2, the histogram itself is the basis of histogram modification methods, allowing the shape of the histogram to be stretched, compressed, or clipped as needed, and then used as an inverse lookup table to rearrange the image pixel intensity levels.
Local Region Histograms
Histograms can also be computed over local regions of pixels, such as rectangles or polygons, as well as over sets of feature attributes, such as gradient direction and magnitude or other spectra. To create a polygon region histogram feature descriptor, first a region may be segmented using morphology to create a mask shape around a region of interest, and then only the masked pixels are used for the histogram.
Local histograms of pixel intensity values can be used as attributes of a feature descriptor, and also used as the basis for remapping pixel values from one histogram shape to another, as discussed in Chap. 2, by reshaping the histogram and reprocessing the image accordingly. Chapter 6 discusses a range of feature descriptors such as SIFT, SURF, and LBP which make use of feature histograms to bin attributes such as gradient magnitude and direction.
Scatter Diagrams, 3D Histograms
The scatter diagram can be used to visualize the relationship or similarity between two image datasets for image analysis, pattern recognition, and feature description. Pixel intensity from two images or image regions can be compared in the scatter plot to visualize how well the values correspond. Scatter diagrams can be used for feature and pattern matching under limited translation invariance, but they are less useful for affine, scale, or rotation invariance. Fig. 3.13 shows an example using a scatter diagram to look for a pattern in an image; the target pattern is compared at different offsets, and the smaller the offset, the better the correspondence. In general, tighter sets of peak features indicate a strong structural or pattern correspondence; more spreading of the data indicates weaker correspondence. The farther away the pattern offset moves, the lower the correspondence.
Note that by analyzing the peak features compared to the low-frequency features, correspondence can be visualized. Fig. 3.14 shows scatter diagrams from two separate images. The lack of peaks along the axis and the presence of spreading in the data show low structural or pattern correspondence.
The scatter plot can be made, pixel by pixel, from two images, where each pixel pair forms a Cartesian coordinate for scatter plotting: the pixel intensity of image 1 is used as the x coordinate and the pixel intensity of image 2 as the y coordinate; the count of corresponding pixel pairs is then binned in the scatter plot. The bin count for each coordinate can be false colored for visualization. Fig. 3.15 provides some code for illustration purposes.
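The construction just described can be sketched as follows; this is our own stand-in illustration, not a reproduction of the code in Fig. 3.15:

```python
import numpy as np

def scatter_diagram(img1, img2, levels=256):
    """For each pixel position, use the intensity in img1 as x and the
    intensity in img2 as y, and count how often each (x, y) pair occurs.
    Identical images bin only along the diagonal x == y."""
    bins = np.zeros((levels, levels), dtype=np.int64)
    for a, b in zip(np.ravel(img1), np.ravel(img2)):
        bins[b, a] += 1   # row = img2 intensity (y), column = img1 (x)
    return bins

img = np.array([[7, 8], [9, 7]])
plot = scatter_diagram(img, img)   # self-comparison: diagonal only
```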
For feature detection, as shown in Fig. 3.12, the scatter plot may reveal enough correspondence at coarse translation steps to reduce the need for image pyramids in some feature detection and pattern matching applications. For example, the step size of the pattern search and compare could be optimized by striding or skipping pixels, searching the image at 8 or 16 pixel intervals, rather than at every pixel, reducing feature detection time. In addition, the scatter plot data could first be thresholded to a binary image, masked to show just the peak values, converted into a bit vector, and measured for correspondence using Hamming distance for increased performance.
Multi-resolution, Multi-scale Histograms
Multi-resolution histograms have been used for texture analysis [46], and also for feature recognition [47]. The PHOG descriptor described in Chap. 6 makes use of multi-scale histograms of feature spectra—in this case, gradient information. Note that the multi-resolution histogram provides scale invariance for feature description. For texture analysis [46], multi-resolution histograms are constructed using an image pyramid, and then a histogram is created for each pyramid level and concatenated together [10], which is referred to as a multi-resolution histogram. This histogram has the desirable properties of algorithm simplicity, fast computation, low memory requirements, noise tolerance, and high reliability across spatial and rotational variations. See Fig. 3.16. A variation on the pyramid is used in the method of Zhao and Pietikainen [15], employing a multidimensional pyramid image set from a volume.
Steps involved in creating and using multi-resolution histograms are as follows:
-
1.
Apply Gaussian filter to image.
-
2.
Create an image pyramid.
-
3.
Create histograms at each level.
-
4.
Normalize the histograms using L1 norm.
-
5.
Create cumulative histograms.
-
6.
Create difference histograms or DOG images (differences between pyramid levels).
-
7.
Renormalize histograms using the difference histograms.
-
8.
Create a feature vector from the set of difference histograms.
-
9.
Use L1 norm as distance function for comparisons between histograms.
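The steps above can be sketched end to end; this is a simplified illustration in which a 2 × 2 mean downsample stands in for proper Gaussian filtering and decimation, and the function name is ours:

```python
import numpy as np

def multires_histogram_descriptor(image, levels=3, bins=16):
    """Build a pyramid by 2x2 mean downsampling (a stand-in for
    Gaussian filter + decimation), take an L1-normalized histogram at
    each level, form cumulative histograms, difference adjacent levels,
    and concatenate the differences into one feature vector."""
    img = np.asarray(image, dtype=np.float64)
    cumhists = []
    for _ in range(levels):
        h, _ = np.histogram(img, bins=bins, range=(0, 256))
        h = h / h.sum()                      # L1 normalization
        cumhists.append(np.cumsum(h))        # cumulative histogram
        # 2x2 mean downsample to form the next pyramid level
        img = img[: img.shape[0] // 2 * 2, : img.shape[1] // 2 * 2]
        img = (img[0::2, 0::2] + img[0::2, 1::2] +
               img[1::2, 0::2] + img[1::2, 1::2]) / 4.0
    diffs = [cumhists[i + 1] - cumhists[i] for i in range(levels - 1)]
    return np.concatenate(diffs)

d = multires_histogram_descriptor(np.full((8, 8), 128.0))
```

Two descriptors are then compared with the L1 norm, per step 9; a constant image produces an all-zero descriptor because every pyramid level has the same histogram.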
Radial Histograms
For some applications, computing the histogram using radial samples originating at the shape centroid can be valuable [128, 129]. To do this, a line is cast from the centroid to the perimeter of the shape, and pixel values are recorded along each line and then binned into histograms. See Fig. 3.17.
Contour or Edge Histograms
The perimeter or shape of an object can be the basis of a shape histogram, which includes the pixel values of each point on the perimeter of the object binned into the histogram. Besides recording the actual pixel values along the perimeter, the chain code histogram (CCH) that is discussed in Chap. 6 shows the direction of the perimeter at connected edge point coordinates. Taken together, the CCH and contour histograms provide useful shape information.
Basis Space Metrics
Features can be described in a basis space, which involves transforming pixels into an alternative basis and describing features in the chosen basis, such as the frequency domain. What is a basis space and what is a transform? Consider the decimal system, which is base 10, and the binary system which is base 2. We can change numbers between the two number systems by using a transform. A Fourier transform uses sine and cosine as basis functions in frequency space, so that the Fourier transform can move pixels between the time-domain pixel space and the frequency space. Basis space moments describe the projection of a function onto a basis space [500]—for example, the Fourier transform projects a function onto a basis of harmonic functions.
Basis spaces and transforms are useful for a wide range of applications, including image coding and reconstruction, image processing, feature description, and feature matching. As shown in Fig. 3.18, image representation and image coding are closely related to feature description. Images can be described using coding methods or feature descriptors , and images also can be reconstructed from the encodings or from the feature descriptors. Many methods exist to reconstruct images from alternative basis space encodings, ranging from lossless RLE methods to lossy JPEG methods; in Chap. 4, we provide illustrations of images that have been reconstructed from only local feature descriptors (see Figs. 4.12, 4.13 and 4.14).
As illustrated in Fig. 3.18, a spectrum of basis spaces can be imagined, ranging from a continuous real function or live scene with infinite complexity, to a complete raster image, a JPEG compressed image, a frequency domain, or other basis representations, down to local feature descriptor sets. Note that the more detail that is provided and used from the basis space representation, the better the real scene can be recognized or reconstructed. So the trade-off is to find the best representation or description, in the optimal basis space, to reach the invariance and accuracy goals using the least amount of compute and memory.
Transforms and basis spaces are a vast field within mathematics and signal processing, covered quite well in other works, so here we only introduce common transforms useful for image coding and feature description. We describe their key advantages and applications, and refer the reader to the literature as we go. See Fig. 3.19.
Since we are dealing with discrete pixels in computer vision, we are primarily interested in discrete transforms, especially those which can be accelerated with optimized software or fixed-function hardware. However, we also cover a few integral transform methods that may be slower to compute and less used. Here is an overview:
-
Global or local feature description. It is possible to use transforms and basis space representations of images as a global feature descriptor, allowing scenes and larger objects to be recognized and compared. The 2D FFT spectrum is only one example, and it is simple to compare FFT spectrum features using SAD or SSD distance measures.
-
Image coding and compression. Many of the transforms have proved valuable for image coding and image compression. The basic method involves transforming the image, or block regions of the image, into another basis space. For example, transforming blocks of an image into the Fourier domain allows the image regions to be represented as sine and cosine waves. Then, based on the amount of energy in the region, a reduced amount of frequency space components can be stored or coded to represent the image. The energy is mostly contained in the lower-frequency components, which can be observed in the Fourier power spectrum such as shown in Fig. 2.16; the high-frequency components can be discarded and the significant lower-frequency components can be encoded, thus some image compression is achieved with a small loss of detail. Many novel image coding methods exist, such as that using a basis of scaled Laplacian features over an image pyramid [310].
Fourier Description
The Fourier family of transforms was covered in detail in Chap. 2, in the context of image preprocessing and filtering. However, the Fourier frequency components can also be used for feature description. Using the forward Fourier transform, an image is transformed into frequency components, which can be selectively used to describe the transformed pixel region, commonly done for image coding and compression, and for feature description.
The Fourier descriptor provides several invariance attributes, such as rotation and scale. Any array of values can be fed to an FFT to generate a descriptor—for example, a histogram. A common application is illustrated in Fig. 3.20, describing the circularity of a shape and finding the major and minor axis as the extrema frequency deviation from the sine wave. A related application is finding the endpoints of a flat line segment on the perimeter by fitting the FFT magnitudes of the harmonic series as polar coordinates against a straight line in Cartesian space.
In Fig. 3.20, a complex wave is plotted as a dark gray circle unrolled around a sine wave function or a perfect circle. Note that the Fourier transform of the lengths of each point around the complex function yields an approximation of a periodic wave, and the Fourier descriptor of the shape of the complex wave is visible. Another example illustrating Fourier descriptors is shown in Fig. 6.29.
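The circularity description can be sketched by sampling centroid-to-perimeter distances around the shape and taking the FFT; this is our own minimal illustration of the idea:

```python
import numpy as np

def fourier_descriptor(radii, n_coeffs=8):
    """FFT magnitude of centroid-to-perimeter distances sampled at
    regular angles around a shape. For a perfect circle all radii are
    equal, so only the DC term is nonzero; deviations from circularity
    appear as energy in the harmonics."""
    spectrum = np.abs(np.fft.fft(radii))
    return spectrum[:n_coeffs]

circle = np.full(64, 10.0)   # constant radius: a perfect circle
desc = fourier_descriptor(circle)
```

An ellipse sampled the same way would add a strong second harmonic, whose phase and magnitude encode the major and minor axis orientation and eccentricity.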
Walsh–Hadamard Transform
The Hadamard transform [4, 9] uses a series of square waves with the value of +1 or −1, which is ideal for digital signal processing. It is amenable to optimizations, since only signed addition is needed to sum the basis vectors, making this transform much faster than sinusoidal basis transforms. The basis vectors for the harmonic Hadamard series and corresponding transform can be generated by sampling Walsh functions, which make up an orthonormal basis set; thus, the combined method is commonly referred to as the Walsh–Hadamard transform; see Fig. 3.21.
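The ±1 basis can be sketched with the Sylvester construction; a production implementation would use the fast O(n log n) butterfly rather than this direct matrix multiply:

```python
import numpy as np

def hadamard(n):
    """Hadamard matrix of order n (n a power of two) by the Sylvester
    construction; entries are only +1 and -1, so applying the transform
    needs nothing but signed additions."""
    H = np.array([[1]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

def walsh_hadamard_transform(signal):
    return hadamard(len(signal)) @ np.asarray(signal)

out = walsh_hadamard_transform([1, 1, 1, 1])
```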
HAAR Transform
The HAAR transform [4, 9] is similar to the Fourier transform, except that the basis vectors are HAAR features resembling square waves, and similar to wavelets. HAAR features, owing to their orthogonal rectangular shapes, are suitable for detecting vertical and horizontal image features that have near-constant gray level. Any structural discontinuities in the data, such as edges and local texture, cannot be resolved very well by the HAAR features; see Figs. 3.21 and 6.21.
Slant Transform
The Slant transform [276], as illustrated in Fig. 3.21, was originally developed for television signal encoding, and was later applied to general image coding [4, 275]. The Slant transform is analogous to the Fourier transform, except that the basis functions are a series of slant, sawtooth, or triangle waves. The slant basis vector is suitable for applications where image brightness changes linearly over the length of the function. The slant transform is amenable to discrete optimizations in digital systems. Although the primary applications have been image coding and image compression, the slant transform is amenable to feature description. It is closely related to the Karhunen–Loeve transform and the Slant–Hadamard transform [494].
Zernike Polynomials
Fritz Zernike, 1953 Nobel Prize winner, devised Zernike polynomials during his quest to develop the phase contrast microscope, while studying the optical properties and spectra of diffraction gratings. The Zernike polynomials [264–266] have been widely used for optical analysis and modeling of the human visual system, and for assistance in medical procedures such as laser surgery. They provide an accurate model of optical wave aberrations expressed as a set of basis polynomials, illustrated in Fig. 3.22.
Zernike polynomials are analogous to steerable filters [370], which also contain oriented basis sets of filter shapes used to identify oriented features and take moments to create descriptors. The Zernike model uses radial coordinates and circular regions, rather than rectangular patches as used in many other feature description methods.
Zernike methods are widely used in optometry to model human eye aberrations. Zernike moments are also used for image watermarking [270] and image coding and reconstruction [271, 273]. The Zernike features provide scale and rotational invariance, in part due to the radial coordinate symmetry and increasing level of detail possible within the higher-order polynomials. Zernike moments are used in computer vision applications by comparing the Zernike basis features against circular patches in target images [268, 269].
Fast methods to compute the Zernike polynomials and moments exist [267, 272, 274], which exploit the symmetry of the basis functions around the x and y axes to reduce computations, and also to exploit recursion.
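The radial part of a Zernike polynomial has a closed form, which the fast methods above exploit. The sketch below implements the standard radial polynomial \(R_n^m(\rho)\) directly from its factorial series; it is a minimal reference version, not one of the optimized recursive schemes cited.

```python
from math import factorial
import numpy as np

def zernike_radial(n, m, rho):
    """Radial part R_n^m(rho) of the Zernike polynomial (requires n - |m| even).

    Computed from the standard factorial series; valid on the unit disk.
    """
    m = abs(m)
    rho = np.asarray(rho, dtype=float)
    out = np.zeros_like(rho)
    for k in range((n - m) // 2 + 1):
        c = ((-1) ** k * factorial(n - k)
             / (factorial(k)
                * factorial((n + m) // 2 - k)
                * factorial((n - m) // 2 - k)))
        out += c * rho ** (n - 2 * k)
    return out

rho = np.linspace(0, 1, 5)
print(zernike_radial(2, 0, rho))   # the defocus term, equal to 2*rho**2 - 1
```

The full polynomial multiplies this radial part by cos(mθ) or sin(mθ); Zernike moments are then inner products of an image patch, expressed in polar coordinates over the unit disk, with these basis functions.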
Steerable Filters
Steerable filters are loosely considered as basis functions here, and can be used for both filtering and feature description. Conceptually similar to Zernike polynomials, steerable filters [370, 382] are synthesized as steered, oriented linear combinations of chosen basis functions, such as quadrature pairs of Gaussian filters and oriented versions of each function, in a simple transform.
Many types of filter functions can be used as the basis for steerable filters [371, 373]. The filter transform is created by combining together the basis functions in a filter bank, as shown in Fig. 3.23. Gain is selected for each function, and all filters in the bank are summed, then adaptively applied to the image. Pyramid sets of basis functions can be created to operate over scale. Applications include convolving oriented steerable filters with target image regions to determine filter response strength, orientation and phase. Other applications include filtering images based on orientation of features, contour detection, and feature description.
For feature description, there are several methods that could work—for example, convolving each steerable basis function with an image patch. The highest one or two filter responses or moments from all the steerable filters can then be chosen as an ordinal feature descriptor, or all the filter responses can be used as the feature descriptor. As an optimization, an interest point can first be determined in the patch, and the orientation of the interest point can be used to select the one or two steerable filters closest to the orientation of the interest point; then the closest steerable filters are used as the basis to compute the descriptor.
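The steering property itself is simple: for a first-derivative-of-Gaussian pair, a filter at any angle θ is exactly cos(θ) times the x-derivative basis plus sin(θ) times the y-derivative basis, so two convolutions suffice for all orientations. A minimal sketch, with the kernel size and sigma as assumed parameters:

```python
import numpy as np

def gaussian_dx(size=9, sigma=1.5):
    """First x-derivative of a 2D Gaussian: one of the two basis filters."""
    r = np.arange(size) - size // 2
    x, y = np.meshgrid(r, r)
    g = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return -x * g / sigma**2

gx = gaussian_dx()
gy = gaussian_dx().T      # the y-derivative basis is the transpose

def steer(theta):
    """Synthesize an oriented derivative filter from the two basis filters."""
    return np.cos(theta) * gx + np.sin(theta) * gy

# The steered 45-degree filter is exactly the basis combination; by
# linearity of convolution, filter *responses* steer the same way, so a
# response at any orientation costs no extra image convolutions.
g45 = steer(np.pi / 4)
print(np.allclose(g45, (gx + gy) / np.sqrt(2)))   # True
```

Higher-order derivatives and quadrature pairs steer the same way, just with more basis filters and trigonometric interpolation weights.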
Karhunen–Loeve Transform and Hotelling Transform
The Karhunen–Loeve transform (KLT) [4, 9] was devised to describe a continuous random process as a series expansion, as opposed to the Fourier method of describing periodic signals. Hotelling later devised a discrete equivalent of the KLT using principal components. “KLT” is the most common name referring to both methods.
The basis functions are dependent on the eigenvectors of the underlying image, and computing eigenvectors is a compute-intensive process with no fast transform known. The KLT is not separable, so it cannot be optimized over image blocks; the KLT is therefore typically used for PCA on small datasets such as feature vectors used in pattern classification, clustering, and matching.
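A small PCA sketch makes the data-dependence concrete: the basis is the eigenvectors of the sample covariance, recomputed for every dataset rather than fixed in advance like the Fourier or Hadamard bases (the dimensions and synthetic data below are illustrative).

```python
import numpy as np

def klt(X, k):
    """Project rows of X (feature vectors) onto the top-k principal axes.

    The basis comes from the eigenvectors of the data covariance, i.e.
    it is derived from the data itself, unlike fixed transform bases.
    """
    mu = X.mean(axis=0)
    cov = np.cov(X - mu, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    basis = vecs[:, ::-1][:, :k]          # keep the top-k eigenvectors
    return (X - mu) @ basis, basis, mu

rng = np.random.default_rng(0)
# 100 4-D feature vectors with most variance along the first axis
X = rng.normal(size=(100, 4)) * np.array([5.0, 1.0, 0.5, 0.1])
Y, basis, mu = klt(X, 2)
print(Y.shape)    # (100, 2): a compact descriptor for matching/clustering
```

For D-dimensional vectors the eigendecomposition costs O(D^3), which is why this is practical for short feature vectors but not as a block image transform.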
Wavelet Transform and Gabor Filters
Wavelets, as the name suggests, are short waves or wave-lets [326]. Think of a wavelet as a short-duration pulse such as a seismic tremor, starting and ending at zero, rather than a continuous or resonating wave. Wavelets are convolved with a given signal, such as an image, to find similarity and statistical moments. Wavelets can therefore be implemented like convolution kernels in the spatial domain. See Fig. 3.24.
Wavelet analysis is a vast field [283, 284] with many applications and useful resources available, including libraries of wavelet families and analysis software packages [281]. Fast wavelet transforms (FWTs) exist in common signal and image processing libraries. Several variants of the wavelet transform include:
- Discrete wavelet transform (DWT)
- Stationary wavelet transform (SWT)
- Continuous wavelet transform (CWT)
- Lifting wavelet transform (LWT)
- Stationary wavelet packet transform (SWPT)
- Discrete wavelet packet transform (DWPT)
- Fractional Fourier transform (FRFT)
- Fractional wavelet transform (FRWT)
Wavelets are designed to meet various goals and are crafted for specific applications; there is no single wavelet function or basis. For example, a set of wavelets can be designed to represent the musical scale, where each note (such as middle C) is defined as a wavelet pulse with the duration of an eighth note, and each wavelet in the set is then convolved across a signal to locate the corresponding notes in the musical scale.
When designing wavelets, the mother wavelet is the basis of the wavelet family, and then daughter wavelets are derived using translation, scaling, or compression of the mother wavelet. Ideally, a set of wavelets are overlapping and complementary so as to decompose data with no gaps and be mathematically reversible.
Wavelets are used in transforms as a set of nonlinear basis functions, where each basis function can be designed as needed to optimally match a desired feature in the input function. So, unlike transforms which use a uniform set of basis functions—as the Fourier transform uses sine and cosine functions—wavelets use a dynamic set of basis functions that are complex and nonuniform in nature. See Fig. 3.25.
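The mother/daughter relationship above can be shown with a Mexican-hat (Ricker) mother wavelet: daughters are produced purely by scaling and translation, and correlating a daughter against a signal measures similarity at that scale and position. The wavelet choice, grid, and test pulse below are illustrative assumptions.

```python
import numpy as np

def ricker(t, scale=1.0, shift=0.0):
    """Mexican-hat (Ricker) mother wavelet; scale stretches it and shift
    slides it along t, producing a daughter wavelet. The 1/sqrt(scale)
    factor preserves energy across scales."""
    u = (t - shift) / scale
    return (1 - u**2) * np.exp(-u**2 / 2) / np.sqrt(scale)

# One cell of a crude continuous wavelet transform: correlate a single
# daughter wavelet against a signal containing a pulse at t = 2.
t = np.linspace(-8, 8, 512)
signal = np.exp(-(t - 2)**2)
aligned = np.dot(ricker(t, scale=1.0, shift=2.0), signal)    # on the pulse
off = np.dot(ricker(t, scale=1.0, shift=-5.0), signal)       # far from it
print(aligned > abs(off))   # True: the aligned daughter responds strongly
```

Sweeping shift and scale over a grid fills out the full time-scale response plane that wavelet transforms compute efficiently.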
Wavelets have been used as the basis for scale and rotation invariant feature description [280], image segmentation [277, 278], shape description [279], and obviously image and signal filtering of all the expected varieties, denoising, image compression, and image coding. A set of application-specific wavelets could be devised for feature description.
Gabor Functions
Wavelets can be considered an extension of the earlier concept of Gabor functions [285, 325], which can be derived for imaging applications as a set of 2D oriented bandpass filters. Gabor’s work was centered on the physical transmission of sound and problems with Fourier methods involving time-varying signals like sirens that could not be perfectly represented as periodic frequency information. Gabor proposed a more compact representation than Fourier analysis could provide, using a concept called atoms that recorded coefficients of the sound that could be transmitted more compactly. See Fig. 3.26.
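For imaging, a 2D Gabor filter is a sinusoidal carrier under a Gaussian envelope, rotated to a chosen orientation; a bank of such filters gives the oriented bandpass set described above. A minimal NumPy sketch, with kernel size, sigma, and wavelength as assumed parameters:

```python
import numpy as np

def gabor(size=21, sigma=3.0, theta=0.0, wavelength=6.0, phase=0.0):
    """2D Gabor filter: a sinusoidal carrier under a Gaussian envelope,
    yielding an oriented bandpass filter for texture and edge energy."""
    r = np.arange(size) - size // 2
    x, y = np.meshgrid(r, r)
    xr = x * np.cos(theta) + y * np.sin(theta)    # rotate coordinates
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr**2 + yr**2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * xr / wavelength + phase)
    return envelope * carrier

# A small filter bank over orientations; convolving each with an image
# region and collecting the response energies yields a texture descriptor.
bank = [gabor(theta=th) for th in np.linspace(0, np.pi, 4, endpoint=False)]
print(len(bank), bank[0].shape)   # 4 (21, 21)
```

Pairing each cosine filter with its sine (quadrature) counterpart gives phase-independent energy responses, the usual form for Gabor texture features.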
Hough Transform and Radon Transform
The Hough transform [220–222] and the Radon transform [291] are related, and the results are equivalent in the opinion of many [287, 292]; see Fig. 3.27. The Radon transform is an integral transform, while the Hough transform is a discrete method, and therefore much faster. The Hough method is widely used in image processing, and can be accelerated using a GPU [290] with data-parallel methods. The Radon algorithm is slightly more accurate and perhaps more mathematically sound, and is often associated with X-ray tomography, where it is applied to reconstruction from X-ray projections. We focus primarily on the Hough transform, since it is widely available in image processing libraries.
Key applications for the Hough and Radon transforms are shape detection and shape description of lines, circles, and parametric curves. The main advantages include:
- Robust to noise and partial occlusion
- Fills gaps in apparent lines, edges, and curves
- Can be parameterized to handle various edge and curve shapes
The disadvantages include:
- Looks for only one type or parameterization of a feature at a time, such as a line
- Collinear segments are not distinguished, but are lumped together
- May incorrectly fill in gaps and link edges that are not connected
- Length and position of lines are not determined, though this can be done in image space
The Hough transform is primarily a global or regional descriptor and operates over larger areas. It was originally devised to detect lines, and has been subsequently generalized to detect parametric shapes [293], such as curves and circles. However, adding more parameterization to the feature requires more memory and compute. Hough features can be used to mark region boundaries described by regular parametric curves and lines. The Hough transform is attractive for some applications, since it can tolerate gaps in the lines or curves and is not strongly affected by noise or some occlusion, but morphology and edge detection via other methods are often sufficient, so the Hough transform has limited applications.
The input to the Hough transform is a gradient magnitude image, which has been thresholded, leaving the dominant gradient information. The gradient magnitude is used to build a map revealing all the parameterized features in the image—for example, lines at a given orientation or circles with a given diameter. For example, to detect lines, we map each gradient point in the pixel space into the Hough parameter space, parameterized as a single point (d, θ) corresponding to all lines with orientation angle θ at distance d from the origin. Curve and circle parameterization uses different variables [293]. The parameter space is quantized into cells or accumulator bins, and each accumulator is updated by summing the number of gradient lines passing through the same Hough points. The accumulator method is modified for detecting parametric curves and circles. Thresholding the accumulator space and reprojecting only the highest accumulator values as overlays back onto the image is useful to highlight features.
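The voting step for line detection can be sketched directly: every edge pixel votes for all (d, θ) cells it could belong to, and collinear pixels pile their votes into one cell. This is a minimal NumPy version assuming a binary edge image and a 1-degree θ quantization.

```python
import numpy as np

def hough_lines(edges, n_theta=180):
    """Accumulate (d, theta) votes for every edge pixel; peaks are lines.

    Uses the normal parameterization d = x*cos(theta) + y*sin(theta);
    d is offset by d_max so negative distances index into the array.
    """
    h, w = edges.shape
    thetas = np.linspace(0, np.pi, n_theta, endpoint=False)
    d_max = int(np.ceil(np.hypot(h, w)))
    acc = np.zeros((2 * d_max, n_theta), dtype=np.int32)
    ys, xs = np.nonzero(edges)
    for th_i, th in enumerate(thetas):
        d = np.round(xs * np.cos(th) + ys * np.sin(th)).astype(int) + d_max
        np.add.at(acc, (d, th_i), 1)   # handles repeated indices correctly
    return acc, thetas, d_max

# A horizontal row of edge pixels votes into one strong accumulator cell.
edges = np.zeros((50, 50), dtype=bool)
edges[20, 5:45] = True
acc, thetas, d_max = hough_lines(edges)
d_i, th_i = np.unravel_index(acc.argmax(), acc.shape)
print(d_i - d_max, np.degrees(thetas[th_i]))   # d = 20 at ~90 degrees
```

Note the gap tolerance mentioned above falls out for free: missing pixels along the row only lower the peak's vote count, they do not move it.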
Summary
This chapter provides a selected history of global and regional metrics, with the treatment of local feature metrics deferred until Chaps. 4 and 6. Some historical context is provided on the development of structural and statistical texture metrics, as well as basis spaces useful for feature description, and several common regional and global metrics. A wide range of topics in texture analysis and statistical analysis are surveyed with applications to computer vision.
Since it is difficult to cleanly partition all the related topics in image processing and computer vision, there is some overlap between the topics here and those in Chaps. 2, 4, 5, and 6.
Chapter 3: Learning Assignments
1. Discuss when to use a global image processing operation vs. a local or regional image processing operation.
2. Discuss in general how global image statistics can guide image preprocessing for computer vision applications, and specifically name one global image metric and discuss how it can be applied.
3. Compare global image feature metrics and local feature descriptors in general, and discuss a specific example global feature metric and compare it to a specific local feature descriptor.
4. Describe global image texture in general terms.
5. Discuss how a 2D histogram of an image can be used to understand image texture.
6. Discuss how the 2D Fourier series of an image is used to understand image texture.
7. Discuss how the Haralick texture metrics based on the co-occurrence matrix are used to understand image texture.
8. Discuss how Spatial Dependency Matrix (SDM) plots are used to understand image texture.
9. Discuss statistical moments of an image histogram, including at least the mean value and variance, and how these features are useful as global image descriptors.
10. Describe a multi-resolution histogram built from an image pyramid, and how to interpret the results of the histogram.
11. Describe how a Fourier description of the shape of a circle is created from the Fourier series, and how it is useful as a shape descriptor.
12. Describe basis features for the HAAR transform, Slant transform, and Walsh–Hadamard transform.
13. Compare wavelet features to Fourier series features.
14. Describe the Hough transform and the Radon transform algorithms, and how they are used as a global image metric for shape detection.
Notes
1. See the NIST online resource for engineering statistics: http://www.itl.nist.gov/div898/handbook/
References
Bajcsy, R.: Computer description of textured surfaces. Int. Conf. Artif. Intell. Stat. (1973)
Bajcsy, R., Lieberman, L.: Texture gradient as a depth cue. Comput. Graph. Image Process. 5(1), (1976)
Cross, G.R., Jain, A.K.: Markov random field texture models. PAMI 54(1), (1983)
Gonzalez, R., Woods, R.: Digital Image Processing, 3rd edn. Prentice-Hall, Englewood Cliffs, NJ (2007)
Haralick, R.M.: Statistical and structural approaches to texture. Proc. Int. Joint Conf. Pattern Recogn. (1979)
Haralick, R.M., Shanmugan, R., Dinstein, I.: Textural features for image classification. IEEE Trans. Syst. Man Cybern. 3(6), (1973)
Hu, M.K.: Visual pattern recognition by moment invariants. IRE Trans. Inform. Theor. 8(2), (1962)
Lu, H.E., Fu, K.S.: A syntactic approach to texture analysis. Comput. Graph. Image Process. 7(3), (1978)
Pratt, W.K.: Digital image processing, 3rd edn. Wiley, Hoboken, NJ (2002)
Rosenfeld, A., Kak, A.C.: Digital picture processing, 2nd edn. Academic Press, New York (1982)
Tomita, F., Shirai, Y., Tsuji, S.: Description of texture by a structural analysis. Pattern. Anal. Mach. Intell. 4(2), (1982)
Wong, R.Y., Hall, E. L.: Scene matching with invariant moments. Comput. Graph. Image Process. 8 (1978)
Guoying, Z., Pietikainen, M.: Dynamic texture recognition using local binary patterns with an application to facial expressions. Trans. Pattern. Anal. Mach. Intell. 29(6), 915–928 (2007)
Kellokumpu, V., Guoying Z., Pietikäinen, M.: Human activity recognition using a dynamic texture based method
Guoying, Z., Pietikäinen, M.: Dynamic texture recognition using local binary patterns with an application to facial expressions. Pattern. Anal. Mach. Intell. 29(6), 915–928 (2007)
Eichmann, G., Kasparis, T.: Topologically invariant texture descriptors. Comput. Vis. Graph. Image Process. 41(3), (1988)
Lam, S.W.C., Ip, H.H.S.: Structural texture segmentation using irregular pyramid. Pattern Recogn. Lett. 15(7), (1994)
Pietikäinen, M., Guoying, Z., Hadid, A.: Computer Vision Using Local Binary Patterns. Springer, New York (2011)
Ojala, T., Pietikäinen, M., Hardwood, D.: Performance evaluation of texture measures with classification based on kullback discrimination of distributions. Proc. Int. Conf. Pattern. Recogn. (1994)
Ojala, T., Pietikäinen, M., Hardwood, D.: A comparative study of texture measures with classification based on feature distributions. Pattern Recogn. 29 (1996)
Van Ginneken, B., Koenderink, J.J.: Texture histograms as a function of irradiation and viewing direction. Int. J. Comput. Vis. 31(2/3), 169–184 (1999)
Stelu, A., Arati, K., Dong-Hui, X.: Texture analysis for computed tomography studies. Visual Computing Workshop DePaul University, (2004)
Krig, S.A.: Image texture analysis using spatial dependency matrices. Krig Research White Paper Series, (1994)
Laws, K.I.: Rapid texture identification. SPIE 238 (1980)
Bajcsy, R.K.: Computer identification of visual surfaces. Comput. Graph. Image Process. 2(2), 118–130 (1973)
Kaizer, H.: A quantification of textures on aerial photographs. MS Thesis, Boston University, (1955)
Laws, K.I.: Texture energy measures. Proceedings of the Image Understanding Workshop, (1979)
Laws, K.I.: Rapid texture identification. SPIE 238 (1980)
Laws, K.I.: Textured image segmentation. PhD Thesis, University of Southern California, (1980)
Ade, F.: Characterization of textures by “Eigenfilters.” Signal Process. 5 (1983)
Davis, L.S.: Computing the spatial structures of cellular texture. Comput. Graph. Image Process. 11(2), (1979)
Eichmann, G., Kasparis, T.: Topologically invariant texture descriptors. Comput. Vis. Graph. Image Process. 41(3), (1988)
Lam, S.W.C., Ip, H.H.S.: Structural texture segmentation using irregular pyramid. Pattern Recogn. Lett. 15(7), (1994)
Pietikäinen, M., Guoying, Z., Hadid, A.: Computer vision using local binary patterns. Springer, New York (2011)
Ojala, T., Pietikäinen, M., Hardwood, D.: Performance evaluation of texture measures with classification based on kullback discrimination of distributions. Proc. Int. Conf. Pattern. Recogn. (1994)
Ojala T., Pietikäinen, M., Hardwood, D.: A comparative study of texture measures with classification based on feature distributions. Pattern Recogn. 29 (1996)
Pun, C.M., Lee, M.C.: Log-polar wavelet energy signatures for rotation and scale invariant texture classification. Trans. Pattern. Anal. Mach. Intell. 25(5), (2003)
Spence, A., Robb, M., Timmins, M., Chantler, M.: Real-time per-pixel rendering of textiles for virtual textile catalogues. Proceedings of INTEDEC, Edinburgh, (2003)
Lam, S.W.C., Horace, H.S.I.: Adaptive pyramid approach to texture segmentation. Comput. Anal. Images Patterns Lect. Notes Comput. Sci. 719, 267–274 (1993)
Dana, K.J., van Ginneken, B., Nayar, S.K., Koenderink, J.J.: Reflectance and Texture of Real World Surfaces. Technical Report CUCS-048-96, Columbia University, (1996)
Dana, K.J., van Ginneken, B., Nayar, S.K., Koenderink, J.J.: Reflectance and texture of real world surfaces. Conf. Comput. Vis. Pattern Recogn. (1997)
Dana, K.J., van Ginneken, B., Nayar, S.K., Koenderink, J.J.: Reflectance and texture of real world surfaces. ACM Trans. Graph. (1999)
Suzuki, M.T., Yaginuma, Y.: A solid texture analysis based on three dimensional convolution kernels. Proc. SPIE 6491, (2007)
Suzuki, M.T., Yaginuma, Y., Yamada, T., Shimizu, Y.: A shape feature extraction method based on 3D convolution masks. Eighth IEEE International Symposium on Multimedia, ISM’06. (2006)
Guoying, Z., Pietikainen, M.: Dynamic texture recognition using local binary patterns with an application to facial expressions. Trans. Pattern. Anal. Mach. Intell. 29 (2007)
Hadjidemetriou, E., Grossberg, M.D., Nayar, S.K.: Multiresolution histograms and their use for texture classification. IEEE PAMI 26
Hadjidemetriou, E., Grossberg, M.D., Nayar, S.K.: Multiresolution histograms and their use for recognition. IEEE PAMI 26(7), (2004)
Lee, K.L., Chen, L.H.: A new method for coarse classification of textures and class weight estimation for texture retrieval. Pattern Recogn. Image Anal. 12(4), (2002)
Van Ginneken, B., Koenderink, J.J.: Texture histograms as a function of irradiation and viewing direction. Int. J. Comput. Vis. 31(2/3), 169–184 (1999)
Shu, L., Chung, A.C.S.: Texture classification by using advanced local binary patterns and spatial distribution of dominant patterns. ICASSP 2007. IEEE Int. Conf. Acoust. Speech Signal Process. (2007)
Stelu, A., Arati, K., Dong-Hui, X.: Texture analysis for computed tomography studies. Visual Computing Workshop DePaul University, (2004)
Ade, F.: Characterization of textures by “Eigenfilters.” Signal Process. 5 (1983)
Rosin, P.L.: Measuring corner properties. Comput. Vis. Image Understand. 73(2)
Russel, B., Jianxiong, X., Torralba, A.: Localizing 3D cuboids in single-view images. Conf. Neural Inform. Process. Syst. (2012)
Snavely, N., Seitz, S.M., Szeliski, R.: Photo tourism: exploring photo collections in 3D. ACM Trans. Graph. (SIGGRAPH Proc.) (2006)
Snavely, N., Seitz, S.M., Szeliski, R.: Modeling the world from internet photo collections. Int. J. Comput. Vis. (TBP)
Furukawa, Y., Curless, B., Seitz, S.M., Szeliski, R.: Towards internet-scale multi-view stereo. Conf. Comput. Vis. Pattern Recogn. (2010)
Yunpeng, L., Snavely, N., Huttenlocher, D., Fua, P.: Worldwide pose estimation using 3D point clouds. Eur. Conf. Comput. Vis. (2012)
Russell, B., Torralba, A., Murphy, K., Freeman, W.T.: LabelMe: A database and web-based tool for image annotation. Int. J. Comput. Vis. 77 (2007).
Oliva, A., Torralba, A.: Modeling the shape of the scene: a holistic representation of the spatial envelope. Int. J. Comput. Vis. 42 (2001)
Lai, K., Bo, L., Ren, X., Fox, D.: A large-scale hierarchical multi-view RGB-D object dataset. Int. Conf. Robot Autom. (2011)
Xiao, J., Hays, J., Ehinger, K., Oliva, A., Torralba, A.: SUN database: large-scale scene recognition from abbey to zoo. Conf. Comput. Vis. Pattern Recogn. (2010)
Fei-Fei, L., Fergus, R., Perona, P.: Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories. Conf. Comput. Vis. Pattern Recogn. (2004)
Fei-Fei, L.: ImageNet: crowdsourcing, benchmarking & other cool things. CMU VASC Semin. (2010)
Pirsiavash, H., Ramanan, D.: Detecting activities of daily living in first-person camera views. Conf. Comput. Vis. Pattern Recogn. (2012)
Quattoni, A., Torralba, A.: Recognizing indoor scenes. Conf. Comput. Vis. Pattern Recogn. (2009)
Lai, K., Bo, L., Ren, X., Fox, D.: A large-scale hierarchical multi-view RGB-D object dataset. Int. Conf. Robot Autom. (2011)
Silberman, N., Hoiem, D., Kohli, P., Fergus, R.: Indoor segmentation and support inference from RGBD images. Eur. Conf. Comput. Vis. (2012)
Xiaofeng R., Philipose, M.: Egocentric recognition of handled objects: benchmark and analysis. CVPR Workshops, (2009)
Xiaofeng, R., Gu, C.: Figure-ground segmentation improves handled object recognition in egocentric video. Conf. Comput. Vis. Pattern Recogn. (2009)
Fathi, A., Li, Y., Rehg, J.M.: Learning to recognize daily actions using gaze. Eur. Conf. Comput. Vis. (2012)
Dana, K.J., van Ginneken, B., Nayar, S.K. Koenderink, J. J.: Reflectance and texture of real world surfaces. Trans. Graph. 18(1), (1999)
Ce, L., Sharan, L., Adelson, E.H., Rosenholtz, R.: Exploring features in a Bayesian framework for material recognition. Conf. Comput. Vis. Pattern Recogn. (2010)
Huang, G.B., Ramesh, M., Berg, T., Learned-Miller, E.: Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments. Technical report 07-49, University of Massachusetts, Amherst, (2007)
Gross, R., Matthews, I., Cohn, J.F., Kanade, T., Baker, S.: Multi-PIE. Proceedings of the Eighth IEEE International Conference on Automatic Face and Gesture Recognition, (2008)
Yao, B., Jiang, X., Khosla, A., Lin, A.L., Guibas, L.J., Fei-Fei, L.: Human action recognition by learning bases of action attributes and parts. Int. Conf. Comput. Vis. (2011)
LeCun, Y., Huang, FJ., Bottou, L.: Learning methods for generic object recognition with invariance to pose and lighting. Proc. Conf. Comput. Vis. Pattern Recogn. (2004)
McCane, B., Novins, K., Crannitch, D., Galvin, B.: On benchmarking optical flow. Comput. Vis. Image Understand. 84(1), (2001)
Pirsiavash, H., Ramanan, D.: Detecting activities of daily living in first-person camera views. Conf. Comput. Vis. Pattern Recogn. Provid. Rhode Island. (2012)
Hamarneh, G., Jassi, P., Tang, L.: Simulation of ground-truth validation data via physically- and statistically-based warps. MICCAI 2008, the 11th International Conference on Medical Image Computing and Computer Assisted Intervention
Prastawa, M., Bullitt, E., Gerig, G.: Synthetic ground truth for validation of brain tumor MRI segmentation. MICCAI 2005, the 8th International Conference on Medical Image Computing and Computer Assisted Intervention
Vedaldi, A., Ling, H., Soatto, S.: Knowing a good feature when you see it: ground truth and methodology to evaluate local features for recognition. Comput. Vis. Stud. Comput. Intell. 285, 27–49 (2010)
Dutagaci, H., Cheung, C.P., Godil, A.: Evaluation of 3D interest point detection techniques via human-generated ground truth. The Visual Computer 28 (2012)
Rosin, PL.: Augmenting corner descriptors. Graph. Model. Image Process. 58(3), (1996)
Rockett, P.I.: Performance assessment of feature detection algorithms: a methodology and case study on corner detectors. Trans. Image Process. 12(12), (2003)
Shahrokni, A., Ellis, A., Ferryman, J.: Overall evaluation of the PETS2009 results. IEEE PETS (2009)
Over, P., Awad, G., Sanders, G., Shaw, B., Martial, M., Fiscus, J., Kraaij, W., Smeaton, AF.: TRECVID 2013: An Overview of the Goals, Tasks, Data, Evaluation Mechanisms, and Metrics, NIST USA, (2013)
Horn, B.K.P., Schunck, B.G.: Determining Optical Flow. AI Memo 572, Massachusetts Institute of Technology, (1980)
Everingham, M., Van Gool, L., Williams, C. K. I., Winn, J., Zisserman, A.: The PASCAL visual object classes (VOC) challenge. Int. J. Comput. Vis. 88(2), (2010)
Liu, J., Luo, J., Shah, M.: Recognizing realistic actions from videos “in the Wild.” Conf. Comput. Vis. Pattern Recogn. (2009)
Arbelaez, P., Maire, M., Fowlkes, C., Malik, J.: Contour detection and hierarchical image segmentation. Trans. Pattern. Anal. Mach. Intell. 33(5), (2011)
Fisher, R.B.: PETS04 surveillance ground truth data set. Proc. IEEE PETS. (2004)
Quan Y., Thangali, A., Ablavsky, V., Sclaroff, S.: Learning a family of detectors via multiplicative kernels. Pattern. Anal. Mach. Intell. 33(3), (2011)
Ericsson, A., Karlsson, J.: Measures for benchmarking of automatic correspondence algorithms. J. Math. Imaging Vis. (2007)
Takhar, D., et al.: A new compressive imaging camera architecture using optical-domain compression. In: Proceedings of IS&T/SPIE Symposium on Electronic Imaging (2006)
Marco, F.D., Baraniuk, R.G.: Kronecker compressive sensing. IEEE Trans. Image Process. 21(2), (2012)
Weinzaepfel, P., Jegou, H., Perez, P.: Reconstructing an image from its local descriptors. Conf. Comput. Vis. Pattern Recogn. (2011)
Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. Conf. Comput. Vis. Pattern Recogn. (2005)
Tuytelaars, T., Mikolajczyk, K.: Local invariant feature detectors: a survey. Found. Trends Comput. Graph. Vis. 3(3), 177–280 (2007)
Hartigan, J.A.: Clustering Algorithms. Wiley, New York (1975)
Fischler, M.A., Bolles, RC.: Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 24(6), (1981)
Sunglok, C., Kim, T., Yu, W.: Performance evaluation of RANSAC family. Br. Mach. Vis. Assoc. (2009)
Hartigan, J.A., Wong, M.A.: Algorithm AS 136: A K-means clustering algorithm. J. Royal Stat. Soc. Ser. C Appl. Stat. 28(1), 100–108 (1979)
Voronoi, G.: Nouvelles applications des paramètres continus à la théorie des formes quadratiques. Journal für die Reine und Angewandte Mathematik 133 (1908)
Capel, D.: Random forests and ferns. Penn. State University Computer Vision Laboratory, seminar lecture notes online: ForestsAndFernsTalk.pdf
Xiaofeng, R., Malik, J.: Learning a classification model for segmentation
Lai, K., Bo, L., Ren, X., Fox, D.: Sparse distance learning for object recognition combining RGB and depth information
Xiaofeng, R., Ramanan, D.: Histograms of sparse codes for object detection. Conf. Comput. Vis. Pattern Recogn. (2013)
Liefeng, B., Ren, X., Fox, D.: Multipath sparse coding using hierarchical matching pursuit. Conf. Comput. Vis. Pattern Recogn. (2013)
Herbst, E., Ren, X., Fox, D.: RGB-D flow: dense 3-D motion estimation using color and depth. IEEE Int. Conf. Robot Autom. (ICRA) (2013)
Xiaofeng, R., Bo, L.: Discriminatively trained sparse code gradients for contour detection. Conf. Neural Inform. Process. Syst. (2012)
Rublee, E., Rabaud, V., Konolige, K., Bradski, G.: ORB: an efficient alternative to SIFT or SURF. ICCV ’11 Proceedings of the 2011 International Conference on Computer Vision
Rosenfeld, A., Pfaltz, J.L.: Distance functions on digital images. Pattern Recog. 1, 33–61 (1968)
Richardson, A., Olson, E.: Learning convolutional filters for interest point detection. IEEE Int. Conf. Robot Autom. ICRA’13 IEEE, 631–637, (2013)
Moon, T.K., Stirling, W.C.: Mathematical Methods and Algorithms for Signal Processing. Prentice-Hall, Englewood Cliffs, NJ (1999)
Liefeng, B, Ren, X., Fox, D.: Multipath sparse coding using hierarchical matching pursuit. Conf. Comput. Vis. Pattern Recogn. (2013)
Ren, X., Ramanan, D.: Histograms of sparse codes for object detection. Conf. Comput. Vis. Pattern Recogn. (2013)
Olshausen, B., Field, D.: Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature 381(6583), 607–609 (1996)
d’Angelo, E., Alahi, A., Vandergheynst, P.: Beyond bits: reconstructing images from local binary descriptors. Swiss Federal Institute of Technology, 21st International Conference on Pattern Recognition (ICPR), (2012)
Dengsheng, Z., Lu, G.: Review of shape representation and description techniques. J. Pattern Recogn. Soc. 37, 1–19 (2004)
Yang M., Kidiyo, K., Joseph, R.: A survey of shape feature extraction techniques. Pattern Recogn. 43–90, (2008)
Alahi, A., Ortiz, R., Vandergheynst, P.: Freak: fast retina keypoint. Conf. Comput. Vis. Pattern Recogn. (2012)
Leutenegger, S., Chli, M., Siegwart, R.Y.: BRISK: binary robust invariant scalable keypoints. Int. Conf. Comput. Vis. (2011)
Calonder, M., Lepetit, V., Strecha, C., Fua, P.: BRIEF: binary robust independent elementary features. ECCV’10 Proceedings of the 11th European Conference Computer Vision: Part IV, (2010)
Calonder, M., et al.: BRIEF: computing a local binary descriptor very fast. Pattern. Anal. Mach. Intell. 34 (2012)
Rublee, E., Rabaud, V., Konolige, K., Bradski, G.: ORB: an efficient alternative to SIFT or SURF. ICCV ’11 Proceedings of the 2011 International Conference on Computer Vision, (2011)
von Hundelshausen, F., Sukthankar, R.: D-Nets: beyond patch-based image descriptors. Conf. Comput. Vis. Pattern Recogn. (2012)
Krig, S.: RFAN radial fan descriptors. Picture Center Imaging and Visualization System, White Paper Series (1992)
Krig, S.: Picture Center Imaging and Visualization System. Krig Research White Paper Series (1994)
Rosten, E., Drummond, T.: FAST machine learning for high-speed corner detection. Eur. Conf. Comput. Vis. (2006)
Rosten, E., Drummond, T.: Fusing points and lines for high performance tracking. Int. Conf. Comput. Vis. (2005)
Liefeng, B., Ren, X., Fox, D.: Hierarchical matching pursuit for image classification: architecture and fast algorithms. Conf. Neural Inform. Process. Syst. (2011)
Miksik, O., Mikolajczyk, K.: Evaluation of local detectors and descriptors for fast feature matching. Int. Conf. Pattern. Recogn. (2012)
Freund, Y., Schapire, R.E.: A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 55(1), 119–139 (1997)
Gleason, J.: BRISK (Presentation by Josh Gleason) at International Conference on Computer Vision, (2011)
Mikolajczyk, K., Schmid, C.: A performance evaluation of local descriptors. Pattern. Anal. Mach. Intell. IEEE Trans. 27(10), (2005)
Gauglitz, S., Höllerer, T., Turk, M.: Evaluation of interest point detectors and feature descriptors for visual tracking. Int. J. Comput. Vis. 94(3), (2011)
Viola, P., Jones, M.: Robust real time face detection. Int. J. Comput. Vis. 57(2), (2004)
Thevenaz, P., Ruttimann, U.E., Unser, M.: A pyramid approach to subpixel registration based on intensity. IEEE Trans. Image Process. 7(1), (1998)
Qi, T., Huhns, M.N.: Algorithms for subpixel registration. Comput. Vis. Graph. Image Process. 35 (1986)
Zhu, J., Yang, L.: Subpixel eye gaze tracking. Autom. Face Gesture Recogn. Conf. (2002)
Cheezum, M.K., Walker, W.F., Guilford, W.H.: Quantitative comparison of algorithms for tracking single fluorescent particles. Biophys. J. 81(4), 2378–2388 (2001)
Guizar-Sicairos, M., Thurman, S.T., Fienup, J.R.: Efficient subpixel image registration algorithms. Opt. Lett. 33(2), 156–158 (2008)
Hadjidemetriou, E., Grossberg, M.D., Nayar, S.K.: Multiresolution histograms and their use for texture classification. Int. Workshop Texture Anal. Synth. 26(7), (2003)
Mikolajczyk, K., et al.: A comparison of affine region detectors. Conf. Comput. Vis. Pattern Recogn. (2006)
Canny, J.: A computational approach to edge detection. Trans. Pattern. Anal. Mach. Intell. 8(6), 679–698 (1986)
Gunn, S.R.: Edge detection error in the discrete Laplacian of Gaussian. International Conference on Image Processing, ICIP 98. Proceedings. vol 2, (1998)
Harris, C., Stephens, M.: A combined corner and edge detector. Proceedings of the 4th Alvey Vision Conference, (1988)
Shi, J., Tomasi, C.: Good features to track. Conf. Comput. Vis. Pattern Recogn. (1994)
Turk, M., Pentland, A.: Eigenfaces for recognition. J. Cogn. Neurosci. 3(1), 71–86 (1991)
Haja, A., Jahne, B., Abraham, S.: Localization accuracy of region detectors. IEEE CVPR (2008)
Bay, H., Ess, A., Tuytelaars, T., Van Gool, L.: Speeded-up robust features (SURF). Comput. Vis. Image Understand. 110(3), 346–359 (2008)
Lowe, D.G.: SIFT distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60(2), 91–110 (2004)
Kadir, T., Zisserman, A., Brady, M.: An affine invariant salient region detector. Eur. Conf. Comput. Vis. (2004)
Kadir, T., Brady, J.M.: Scale, saliency and image description. Int. J. Comput. Vis. 45(2), 83–105 (2001)
Smith, S.M., Brady, J.M.: SUSAN—a new approach to low level image processing. Technical report TR95SMS1c (patented), Crown Copyright, Defence Research Agency, UK (1995)
Smith, S.M., Brady, J.M.: SUSAN—a new approach to low level image processing. Int. J. Comput. Vis. 23(1), 45–78 (1997)
Yuan, B., Cao, H., Chu, J.: Combining local binary pattern and local phase quantization for face recognition. Int. Symp. Biometr. Secur. Technol. (2012)
Ojansivu, V., Heikkilä, J.: Blur insensitive texture classification using local phase quantization. Proc. Image Signal Process. (2008)
Chan, C.H., Tahir, M.A., Kittler, J., Pietikäinen, M.: Multiscale local phase quantization for robust component-based face recognition using kernel fusion of multiple descriptors. PAMI (2012)
Ojala, T., Pietikäinen, M., Harwood, D.: Performance evaluation of texture measures with classification based on Kullback discrimination of distributions. Proc. Int. Conf. Pattern. Recogn. (1994)
Ojala, T., Pietikäinen, M., Harwood, D.: A comparative study of texture measures with classification based on feature distributions. Pattern Recogn. 29 (1996)
Pietikäinen, M., Heikkilä, J.: Tutorial on image and video description with local binary pattern variants. Conf. Comput. Vis. Pattern Recogn. (2011)
Liao, S., Chung, A.C.S.: Texture classification by using advanced local binary patterns and spatial distribution of dominant patterns. IEEE Int. Conf. Acoust. Speech Signal Process. ICASSP (2007)
Pietikäinen, M., Hadid, A., Zhao, G., Ahonen, T.: Computer Vision Using Binary Patterns. Computational Imaging and Vision Series, vol. 40. Springer, New York (2011)
Arandjelović, R., Zisserman, A.: Three things everyone should know to improve object retrieval. Conf. Comput. Vis. Pattern Recogn. (2012)
Zhao, G., Pietikäinen, M.: Dynamic texture recognition using local binary patterns with an application to facial expressions. Pattern. Anal. Mach. Intell. IEEE Trans. 29(6), (2007)
Kellokumpu, V., Zhao, G., Pietikäinen, M.: Human activity recognition using a dynamic texture based method. Br. Mach. Vis. Conf. (2008)
Zabih, R., Woodfill, J.: Nonparametric local transforms for computing visual correspondence. Eur. Conf. Comput. Vis. (1994)
Lowe, D.G.: Object recognition from local scale-invariant features. The Proceedings of the Seventh IEEE International Conference on Computer Vision, (1999)
Abdel-Hakim, A.E., Farag, A.A.: CSIFT: a SIFT descriptor with color invariant characteristics. Conf. Comput. Vis. Pattern Recogn. (2006)
Vinukonda, P.: A study of the scale-invariant feature transform on a parallel pipeline. Thesis Project
Alcantarilla, P.F., Bergasa, L.M., Davison, A.: Gauge-SURF Descriptors. Elsevier, (2011)
Evans, C.: Notes on the OpenSURF Library. University of Bristol Technical Paper (2009)
Ke, Y., Sukthankar, R.: PCA-SIFT: a more distinctive representation for local image descriptors. Conf. Comput. Vis. Pattern Recogn. (2004)
Gauglitz, S., Höllerer, T., Turka, M.: Evaluation of interest point detectors and feature descriptors for visual tracking. Int. J. Comput. Vis. 94 (2011)
Agrawal, M., Konolige, K., Blas, M.R.: CenSurE: center surround extremas for realtime feature detection and matching. Eur. Conf. Comput. Vis. (2008)
Viola, P., Jones, M.: Robust real-time object detection. Int. J. Comput. Vis. 57(2), 137–154 (2002)
Grigorescu, S.E., Petkov, N., Kruizinga, P.: Comparison of texture features based on Gabor filters. IEEE Trans. Image Process. 11(10), (2002)
Alcantarilla, P., Bergasa, L.M., Davison, A.: Gauge-SURF descriptors. Image Vis. Comput. 31(1), 103–116 (2013)
Agrawal, M., Konolige, K., Blas, M.R.: CenSurE: center surround extremas for realtime feature detection and matching. Eur. Conf. Comput. Vis. (2008)
Morse, B.S.: Lecture 11: Differential Geometry. Brigham Young University, (1998/2000). http://morse.cs.byu.edu/650/lectures/lect10/diffgeom.pdf
Bosch, A., Zisserman, A., Munoz, X.: Representing shape with a spatial pyramid kernel. CIVR ’07 Proceedings of the 6th ACM International Conference on Image and Video Retrieval
Rubner, Y., Tomasi, C., Guibas, L.J.: The earth mover’s distance as a metric for image retrieval. Int. J. Comput. Vis. 40(2), 99–121 (2000)
Oliva, A., Torralba, A.: Modeling the shape of the scene: a holistic representation of the spatial envelope. Int. J. Comput. Vis. (2001)
Matas, J., Chum, O., Urban, M., Pajdla, T.: Robust wide baseline stereo from maximally stable extremal regions. Proc. Br. Mach. Vis. Conf. (2002)
Scovanner, P., Ali, S., Shah, M.: A 3-dimensional SIFT descriptor and its application to action recognition. ACM Proceedings of the 15th International Conference on Multimedia, pp. 357–360, (2007)
Klaser, A., Marszalek, M., Schmid, C.: A spatio-temporal descriptor based on 3D-gradients. Br. Mach. Vis. Conf. (2008)
Laptev, I.: On space-time interest points. Int. J. Comput. Vis. 64 (2005)
Oreifej, O., Liu, Z.: HON4D: histogram of oriented 4D normals for activity recognition from depth sequences. Conf. Comput. Vis. Pattern Recogn. (2013)
Ke, Y., et al.: Efficient visual event detection using volumetric features. Int. Conf. Comput. Vis. (2005)
Zhang, L., da Fonseca, M.J., Ferreira, A.: Survey on 3D shape descriptors. Technical report, project POSC/EIA/59938/2004
Tangelder, J.W.H., Veltkamp, R.C.: A Survey of Content-Based 3D Shape Retrieval Methods. Springer, New York (2007)
Heikkila, M., Pietikäinen, M., Schmid, C.: Description of interest regions with center-symmetric local binary patterns. Comput. Vis. Graph. Image Process. Lect. Notes Comput. Sci. 4338, 58–69 (2006)
Schmidt, A., Kraft, M., Fularz, M., Domagała, Z.: The comparison of point feature detectors and descriptors in the context of robot navigation. Workshop on Perception for Mobile Robots Autonomy, (2012)
Jun, B., Kim, D.: Robust face detection using local gradient patterns and evidence accumulation. Pattern Recogn. 45(9), 3304–3316 (2012)
Fröba, B., Ernst, A.: Face detection with the modified census transform. Int. Conf. Autom. Face Gesture Recogn. (2004)
Freeman, H.: On the encoding of arbitrary geometric configurations. IRE Trans. Electron. Comput. (1961)
Salem, A.B.M., Sewisy, A.A., Elyan, U.A.: A vertex chain code approach for image recognition. Int. J. Graph. Vis. Image Process. ICGST-GVIP, (2005)
Kitchen, L., Rosenfeld, A.: Gray-level corner detection. Pattern Recogn. Lett. 1, 95–102 (1982)
Koenderink, J., Richards, W.: Two-dimensional curvature operators. J. Opt. Soc. Am. 5(7), 1136–1141 (1988)
Bretzner, L., Lindeberg, T.: Feature tracking with automatic selection of spatial scales. Comput. Vis. Image Understand. 71(3), 385–392 (1998)
Lindeberg, T.: Junction detection with automatic selection of detection scales and localization scales. Proceedings of First International Conference on Image Processing, (1994)
Lindeberg, T.: Feature detection with automatic scale selection. Int. J. Comput. Vis. 30(2), 79–116 (1998)
Wang, H., Brady, M.: Real-time corner detection algorithm for motion estimation. Image Vis. Comput. 13(9), 695–703 (1995)
Trajkovic, M., Hedley, M.: Fast corner detection. Image Vis. Comput. 16(2), 75–87 (1998)
Tola, E., Lepetit, V., Fua, P.: DAISY: an efficient dense descriptor applied to wide baseline stereo. PAMI 32(5), (2010)
Arbeiter, G., et al.: Evaluation of 3D feature descriptors for classification of surface geometries in point clouds. Int. Conf. Intell. Robots Syst. (2012) IEEE/RSJ
Ruppel, A., Weisshardt, F., Verl, A.: A rotation invariant feature descriptor O-DAISY and its FPGA implementation. IROS (2011)
Ambai, M., Yoshida, Y.: CARD: compact and real-time descriptors. Int. Conf. Comput. Vis. (2011)
Takacs, G., et al.: Unified real-time tracking and recognition with rotation-invariant fast features. Conf. Comput. Vis. Pattern Recogn. (2010)
Taylor, S., Rosten, E., Drummond, T.: Robust feature matching in 2.3 μs. Conf. Comput. Vis. Pattern Recogn. (2009)
Grauman, K., Darrell, T.: The pyramid match kernel: discriminative classification with sets of image features. Tenth IEEE Int. Conf. Comput. Vis. 2 (2005)
Takacs, G., et al.: Unified real-time tracking and recognition with rotation-invariant fast features. Conf. Comput. Vis. Pattern Recogn. (2010)
Chandrasekhar, V., et al.: CHoG: compressed histogram of gradients, a low bitrate descriptor. Conf. Comput. Vis. Pattern Recogn. (2009)
Mainali, P., et al.: SIFER: scale-invariant feature detector with error resilience. Int. J. Comput. Vis. (2013)
Fowers, S.G., Lee, D.J., Ventura, D., Wilde, D.K.: A novel, efficient, tree-based descriptor and matching algorithm (BASIS). Conf. Comput. Vis. Pattern Recogn. (2012)
Fowers, S.G., Lee, D.J., Ventura, D.A., Archibald, J. K.: Nature inspired BASIS feature descriptor and its hardware implementation. IEEE Trans. Circ. Syst. Video Technol. (2012)
Bracewell, R.: The Fourier Transform & Its Applications, 3 ed., McGraw-Hill Science/Engineering/Math, (1999)
Duda, R.O., Hart, P.E.: Use of the Hough transformation to detect lines and curves in pictures. Commun. ACM. (1972)
Ballard, D.H.: Generalizing the Hough transform to detect arbitrary shapes. Pattern Recogn. 13(2), (1981)
Illingworth, J., Kittler, J.: A survey of the Hough transform. Comput. Vis. Graph. Image Process. (1988)
Salton, G., McGill, M.J.: Introduction to Modern Information Retrieval. McGraw-Hill, New York (1983)
Niebles, J.C., Wang, H., Fei-Fei, L.: Unsupervised learning of human action categories using spatial-temporal words. Int. J. Comput. Vis. (2008)
Bosch, A., Zisserman, A., Muñoz, X.: Scene classification via pLSA. Eur. Conf. Comput. Vis. (2006)
Csurka, G., Bray, C., Dance, C., Fan, L.: Visual categorization with bags of key-points. SLCV workshop, Eur. Conf. Comput. Vis. (2004)
Dean, T., Washington, R., Corrado, G.: Sparse spatiotemporal coding for activity recognition. Brown Univ. Tech. Rep. (2010)
Le, Q.V., Zou, W.Y., Yeung, S.Y., Ng, A.Y.: Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis. Conf. Comput. Vis. Pattern Recogn. (2011)
Olshausen, B., Field, D.: Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature 381, 607–609 (1996)
Belongie, S., Malik, J., Puzicha, J.: Matching with shape context. CBAIVL ’00 Proceedings of the IEEE Workshop on Content-based Access of Image and Video Libraries
Belongie, S., Malik, J., Puzicha, J.: Shape context: a new descriptor for shape matching and object recognition. Conf. Neural Inform. Process. Syst. (2000)
Belongie, S., Malik, J., Puzicha, J.: Shape matching and object recognition using shape contexts. PAMI 24(4), (2002)
Belongie, S., Malik, J., Puzicha, J.: Matching shapes with shape context. CBAIVL ’00 Proceedings of the IEEE Workshop on Content-based Access of Image and Video Libraries
Bo, L., Ren, X., Fox, D.: Unsupervised feature learning for RGB-D based object recognition. ISER, Springer Tracts in Advanced Robotics, vol. 88, pp. 387–402. Springer (2012)
Loy, G., Zelinsky, A.: A fast radial symmetry transform for detecting points of interest. Eur. Conf. Comput. Vis. (2002)
Wolf, L., Hassner, T., Taigman, Y.: Descriptor based methods in the wild. Eur. Conf. Comput. Vis. (2008)
Kurz, D., Benhimane, S.: Inertial sensor-aligned visual feature descriptors. Conf. Comput. Vis. Pattern Recogn. (2011)
Kingsbury, N.: Rotation-invariant local feature matching with complex wavelets. Proc. Eur. Conf. Signal Process. (EUSIPCO), (2006)
Shen, D., Ip, H.H.S.: Discriminative wavelet shape descriptors for recognition of 2-D patterns. Pattern Recogn. 32(2), 151–165 (1999)
Edelman, S., Intrator, N., Poggio, T.: Complex cells and object recognition. Conf. Neural Inform. Process. Syst. (1997)
Hunt, R.W.G., Pointer, M.R.: Measuring Colour. Wiley, Hoboken, NJ (2011)
Hunt, R.W.G.: The Reproduction of Colour, 6th edn. Wiley (2004)
Berns, R.S.: Billmeyer and Saltzman’s Principles of Color Technology. Wiley, Hoboken, NJ (2000)
Morovic, J.: Color Gamut Mapping. Wiley, Hoboken, NJ (2008)
Fairchild, M.D.: Color Appearance Models, 1st edn. Addison Wesley Longman (1998)
Ito, M., Tsubai, M., Nomura, A.: Morphological operations by locally variable structuring elements and their applications to region extraction in ultrasound images. Syst. Comput. Jpn. 34(3), 33–43 (2003)
Tsubai, M., Ito, M.: Control of variable structure elements in adaptive mathematical morphology for boundary enhancement of ultrasound images. Electron. Commun. Jpn. Part 3 Fund. Electron. Sci. 87(11), 20–33
Mazille, J.E.: Mathematical morphology and convolutions. J. Microsc. 156, 257 (1989)
Achanta, R., et al.: SLIC superpixels compared to state-of-the-art superpixel methods. PAMI 34(11), (2012)
Achanta, R., et al.: SLIC superpixels. EPFL technical report no. 149300, (2010)
Felzenszwalb, P., Huttenlocher, D.: Efficient graph-based image segmentation. Int. J. Comput. Vis. (2004)
Levinshtein, A., et al.: Turbopixels: fast superpixels using geometric flows. PAMI (2009)
Lucchi, A., et al.: A fully automated approach to segmentation of irregularly shaped cellular structures in EM images. MICCAI (2010)
Shi, J., Malik, J.: Normalized cuts and image segmentation. PAMI (2000)
Vedaldi, A., Soatto, S.: Quick shift and kernel methods for mode seeking. Eur. Conf. Comput. Vis. (2008)
Felzenszwalb, P.F., Huttenlocher, D.P.: Efficient graph-based image segmentation. Int. J. Comput. Vis. 59(2), 167–181 (2004)
Felzenszwalb, P., Huttenlocher, D.: Efficient graph-based image segmentation. Int. J. Comput. Vis. 59 (2004)
Comaniciu, D., Meer, P.: Mean shift: a robust approach toward feature space analysis. PAMI 24(5), (2002)
Vedaldi, A., Soatto, S.: Quick shift and kernel methods for mode seeking. Eur. Conf. Comput. Vis. (2008)
Vincent, L., Soille, P.: Watersheds in digital spaces: an efficient algorithm based on immersion simulations. PAMI 13(6), (1991)
Levinshtein, A., et al.: Turbopixels: fast superpixels using geometric flows. PAMI 31(12), (2009)
Scharstein, D., Pal, C.: Learning conditional random fields for stereo. Conf. Comput. Vis. Pattern Recogn. (2007)
Hirschmüller, H., Scharstein, D.: Evaluation of cost functions for stereo matching. Conf. Comput. Vis. Pattern Recogn. (2007)
Goodman, J.W.: Introduction to Fourier Optics. McGraw-Hill, New York (1968)
Gaskill, J.D.: Linear Systems, Fourier Transforms, Optics. Wiley, Hoboken, NJ (1978)
Thibos, L., Applegate, R.A., Schweigerling, J.T., Webb, R.: Standards for reporting the optical aberrations of eyes. In: Lakshminarayanan, V. (ed.) OSA Trends in Optics and Photonics, Vision Science and its Applications. Optical Society of America, Washington, DC (2000)
Hwang, S.-K., Kim, W.-Y.: A novel approach to the fast computation of Zernike moments. Pattern Recogn. 39 (2006)
Khotanzad, A., Hong, Y.H.: Invariant image recognition by Zernike moments. PAMI 12 (1990)
Kan, C., Srinath, M.D.: Invariant character recognition with Zernike and orthogonal Fourier-Mellin moments. Pattern Recogn. 35 (2002)
Kim, H.S., Lee, H.-K.: Invariant image watermark using Zernike moments. IEEE Trans. Circ. Syst. Video Technol. 13(8), (2003)
Papakostas, G.A., Karras, D.A., Mertzios, B.G.: Image coding using a wavelet based Zernike moments compression technique. In: Proceeding of: Digital Signal Processing, vol 2, DSP, (2002)
Mukundan, R., Ramakrishnan, K.R.: Fast computation of Legendre and Zernike moments. Pattern Recogn. 28(9), 1433–1442 (1995)
Xin, Y., Pawlak, M., Liao, S.: Image reconstruction with polar Zernike moments. ICAPR’05 Proceedings of the Third International Conference on Pattern Recognition and Image Analysis—Volume Part II (2005)
Singh, C., Upneja, R.: Fast and accurate method for high order Zernike moments computation. Appl. Math. Comput. 218(15), 7759–7773 (2012)
Pratt, W., Chen, W.-H., Welch, L.: Slant transform image coding. IEEE Trans. Commun. 22(8), (1974)
Enomoto, H., Shibata, K.: Orthogonal transform coding system for television signals. IEEE Trans. Electromagn. Compatibil. 13(3), (1971)
Dutra da Silva, R., Schwartz, W.R., Pedrini, H.: Image segmentation based on wavelet feature descriptor and dimensionality reduction applied to remote sensing. Chilean J. Stat. 2 (2011)
Arun, N., Kumar, M., Sathidevi, P.S.: Wavelet SIFT feature descriptors for robust face recognition. Springer Adv. Intell. Syst. Comput. 177 (2013)
Shen, D., Ip, H.H.S.: Discriminative wavelet shape descriptors for recognition of 2-D patterns. Pattern Recogn. 32 (1999)
Kingsbury, N.: Rotation-invariant local feature matching with complex wavelets. Proc. Eur. Conf. Signal Process. EUSIPCO (2006)
Wolfram Research Mathematica Wavelet Analysis Libraries
Strang, G.: Wavelets. Am. Sci. 82(3), (1994)
Mallat, S.: A Wavelet Tour of Signal Processing: The Sparse Way, 3rd ed., Elsevier, (2008)
Percival, D.B., Walden, A.T.: Wavelet Methods for Time Series Analysis. Cambridge University Press, Cambridge (2006)
Gabor, D.: Theory of communication. J. IEE. 93 (1946)
Minor, L.G., Sklansky, J.: Detection and segmentation of blobs in infrared images. IEEE Trans. Syst. Man Cybern. 11(3), (1981)
van Ginkel, M., Luengo Hendriks, C.K., van Vliet, L.J.: A short introduction to the Radon and Hough transforms and how they relate to each other. Number QI-2004-01 in the Quantitative Imaging Group Technical Report Series (2004)
Toft, P.A.: Using the generalized Radon transform for detection of curves in noisy images. 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP-96. Conference Proceedings, vol. 4 (1996)
Radon, J.: Über die Bestimmung von Funktionen durch ihre Integralwerte längs gewisser Mannigfaltigkeiten. Berichte Sächsische Akademie der Wissenschaften, Leipzig, Mathematisch-Physikalische Klasse 69 (1917)
Fung, J., Mann, S., Aimone, C.: OpenVIDIA: parallel GPU computer vision. Proc. ACM Multimed. (2005)
Bazin, M.J., Benoit, J.W.: Off-line global approach to pattern recognition for bubble chamber pictures. Trans. Nuclear Sci. 12 (1965)
Deans, S.R.: Hough transform from the Radon transform. Trans. Pattern. Anal. Mach. Intell. 3(2), 185–188 (1981)
Rosenfeld, A.: Digital Picture Processing by Computer. Academic Press, New York (1982)
Tomasi, C., Manduchi, R.: Bilateral filtering for gray and color images. ICCV ’98 Proceedings of the Sixth International Conference on Computer Vision (1998)
See the documentation for the ImageJ, ImageJ2 or Fiji software package for complete references to each method, [global] Auto Threshold command and Auto Local Threshold command. http://fiji.sc/ImageJ2
Garg, R., Mittal, B., Garg, S.: Histogram equalization techniques for image enhancement. Int. J. Electron. Commun. Technol. 2 (2011)
Sung, A.P., Wang, C.: Spatial-temporal antialiasing. Trans. Visual. Comput. Graph. 8 (2002)
Mikolajczyk, K., Schmid, C.: Scale & affine invariant interest point detectors. Int. J. Comput. Vis. 60 (2004)
Ozuysal, M., Calonder, M., Lepetit, V., Fua, P.: Fast keypoint recognition using random ferns. PAMI 32 (2010)
Schaffalitzky, F., Zisserman, A.: Automated scene matching in movies. CIVR 2004, In: Proceedings of the Challenge of Image and Video Retrieval, London, LNCS 2383
Tola, E., Lepetit, V., Fua, P.: A fast local descriptor for dense matching. Conf. Comput. Vis. Pattern Recogn. (2008)
Davis, L.S.: Computing the spatial structures of cellular texture. Comput. Graph. Image Process. 11(2), (1979)
Pun, C.M., Lee, M.C.: Log-polar wavelet energy signatures for rotation and scale invariant texture classification. Trans. Pattern. Anal. Mach. Intell. 25(5), (2003)
Spence, A., Robb, M., Timmins, M., Chantler, M.: Real-time per-pixel rendering of textiles for virtual textile catalogues. Proc. INTEDEC. (2003)
Lam, S.W.C., Ip, H.H.S.: Adaptive pyramid approach to texture segmentation. Comput. Anal. Images Patterns Lect. Notes Comput. Sci. 719, 267–274 (1993)
Jin, Y., Fayad, L., Laine, A.: Contrast enhancement by multi-scale adaptive histogram equalization. Proc. SPIE. 4478 (2001)
Zhang, J., Tan, T.: Brief review of invariant texture analysis methods. Pattern Recogn. 35 (2002)
Tomita, F., Shirai, Y., Tsuji, S.: Description of textures by a structural analysis. IEEE Trans. Pattern. Anal. Mach. Intell. Arch. 4 (1982)
Tomita, F., Tsuji, S.: Computer Analysis of Visual Textures. Springer, New York (1990)
Burt, P.J., Adelson, E.H.: The Laplacian pyramid as a compact image code. IEEE Trans. Commun. (1983)
Otsu, N.: A threshold selection method from gray-level histograms. IEEE Trans. Syst. Man Cybern. 9(1), 62–66 (1979)
Sezgin, M., Sankur, B.: Survey over image thresholding techniques and quantitative performance evaluation. SPIE J. Electron. Imaging (2004)
Haralick, R.M., Shapiro, L.G.: Image segmentation techniques. Comput. Vis. Graph. Image Process. 29, 100–132 (1985)
Raja, Y., Gong, S.: Sparse multiscale local binary patterns. Br. Mach. Vis. Conf. (2006)
Fleuret, F.: Fast binary feature selection with conditional mutual information. J. Mach. Learn. Res. 5 (2004)
Szeliski, R.: Computer Vision: Algorithms and Applications. Springer, New York (2011)
Pratt, W.K.: Digital Image Processing: PIKS Scientific Inside. 4 ed., Wiley-Interscience, (2007)
Russ, J.C.: The Image Processing Handbook, 5 ed., CRC Press, (2006)
Klein, G., Murray, D.: Parallel tracking and mapping for small AR workspaces. ISMAR (2007)
Newcombe, R.A., et al.: KinectFusion: real-time dense surface mapping and tracking. ISMAR ’11 Proceedings of the 2011 10th IEEE International Symposium on Mixed and Augmented Reality (2011)
Izadi, S., et al.: KinectFusion: real-time 3D reconstruction and interaction using a moving depth camera. ACM Symp. User Interf. Software Technol. (2011)
Moravec, H.: Obstacle avoidance and navigation in the real world by a seeing robot rover. Tech Report CMU-RI-TR-3, Robotics Institute, Carnegie-Mellon University, (1980)
Mikolajczyk, K., Schmid, C.: Indexing based on scale invariant interest points. Int. Conf. Comput. Vis. (2001)
Turcot, P., Lowe, D.G.: Better matching with fewer features: the selection of useful features in large database recognition problems. Int. Conf. Comput. Vis. (2009)
Feichtinger, H.G., Strohmer, T.: Gabor Analysis and Algorithms, 1997 ed., Birkhäuser, (1997)
Ricker, N.: Wavelet contraction, wavelet expansion, and the control of seismic resolution. Geophysics 18, 769–792 (1953)
Goshtasby, A.: Description and discrimination of planar shapes using shape matrices. PAMI 7(6), (1985)
Vapnik, V.N., Levin, E., LeCun, Y.: Measuring the dimension of a learning machine. Neural Comput. 6(5), 851–876 (1994)
Cowan, J. D., Tesauro, G., Alspector, J.: Learning curves: asymptotic values and rate of convergence. Adv. Neural Inform. Process. 6 (1994)
Vapnik, V.N.: The Nature of Statistical Learning Theory. Springer, New York (1995)
LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition: intelligent signal processing. Proc. IEEE 86(11), 2278–2324 (1998)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. Conf. Neural Inform. Process. Syst. (2012)
Boser, B.E., Guyon, I.M., Vapnik, V.N.: A training algorithm for optimal margin classifiers. COLT ’92 Proceedings of the Fifth Annual Workshop on Computational Learning Theory, (1992)
Cortes, C., Vapnik, V.N.: Support-vector networks. Mach. Learn. 20 (1995)
Burges, C.J.C.: A tutorial on support vector machines for pattern recognition. Kluwer Data Mining Discov. 2 (1998)
Weinzaepfel, P., Revaud, J., Harchaoui, Z., Schmid, C.: DeepFlow: large displacement optical flow with deep matching. Int. Conf. Comput. Vis. (2013)
Keysers, D., Deselaers, T., Gollan, C., Ney, H.: Deformation models for image recognition. Trans. PAMI 29(8), (2007)
Kim, J., Liu, C., Sha, F., Grauman, K.: Deformable spatial pyramid matching for fast dense correspondences. Conf. Comput. Vis. Pattern Recogn. (2013)
Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. ICML, 27th International Conference on Machine Learning, Haifa, Israel (2010)
Schmid, C., Mohr, R.: Object recognition using local characterization and semi-local constraints. PAMI 19(3), (1997)
Ferrari, V., Tuytelaars, T., Gool, L.V.: Simultaneous object recognition and segmentation from single or multiple model views. Int. J. Comput. Vis. 67 (2005)
Schaffalitzky, F., Zisserman, A.: Automated scene matching in movies. CIVR. (2002)
Estivill-Castro, V.: Why so many clustering algorithms—a position paper. ACM SIGKDD Explor. Newslett. 4(1), (2002)
Kriegel, H.-P., Kröger, P., Sander, J., Zimek, A.: Density-based clustering. Wiley Interdisciplinary Rev. Data Mining Knowl. Discov. 1(3), 231–240 (2011)
Hartigan, J.A.: Clustering Algorithms. Wiley, Hoboken, NJ (1975)
Hartigan, J.A., Wong, M.A.: Algorithm AS 136: a K-means clustering algorithm. J. Roy. Stat. Soc. Ser. C 28(1), 100–108 (1979)
Hastie, T., Tibshirani, R., Friedman, J.: Hierarchical Clustering: The Elements of Statistical Learning, 2nd edn. Springer, New York (2009)
Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Stat. Soc. Ser. B 39(1), 1–38 (1977)
Pearson, K.: On lines and planes of closest fit to systems of points in space. Phil. Mag. (1901)
Hotelling, H.: Relations between two sets of variates. Biometrika 28(3–4), 321–377 (1936)
Cortes, C., Vapnik, V.N.: Support-vector networks. Mach. Learn. 20(3), 273–297 (1995)
Haykin, S.: Neural Networks: A Comprehensive Foundation, 2nd edn. Prentice-Hall, Englewood Cliffs, NJ (1999)
Vapnik, V.: Statistical Learning Theory. Wiley, Hoboken, NJ (1998)
Hofmann, T., Schölkopf, B., Smola, A.J.: Kernel methods in machine learning. Ann. Stat. 36(3), 1171–1220 (2008)
Raguram, R., Frahm, J.-M., Pollefeys, M.: A comparative analysis of RANSAC techniques leading to adaptive real-time random sample consensus. Eur. Conf. Comput. Vis. (2008)
Weinberger, K.Q., Blitzer, J., Saul, L.K.: Distance metric learning for large margin nearest neighbor classification. Conf. Neural Inform. Process. Syst. (2004)
Schmid, C., Mohr, R.: Local gray value invariants for image retrieval. PAMI 19(5), (1997)
Dorkó, G., Schmid, C.: Object class recognition using discriminative local features. Technical Report RR-5497, INRIA—Rhone-Alpes (2005)
Schölkopf, B., Smola, A.J.: Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, Cambridge, MA (2001)
Ferrari, V., Tuytelaars, T., Gool, L.V.: Simultaneous object recognition and segmentation from single or multiple model views. Int. J. Comput. Vis. 67(2), (2006)
Cinbis, R.G., Verbeek, J., Schmid, C.: Segmentation driven object detection with fisher vectors. Int. Conf. Comput. Vis. (2013)
Fischler, M., Bolles, R.: Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 24(6), (1981)
Freund, Y., Schapire, R.E.: A short introduction to boosting. Jpn. Soc. Artif. Intell. 14(5), (1999)
Freund, Y., Schapire, R.E.: A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 55(1), 119–139 (1997)
Heckerman, D.: A tutorial on learning with Bayesian networks. Microsoft Res. Tech. Rep. (1996)
Amit, Y., Geman, D.: Shape quantization and recognition with randomized trees. Neural Comput. 9(7), (1997)
Rabiner, L.R., Juang, B.H.: An introduction to hidden Markov models. IEEE Acoust. Speech Signal Process. Mag. (1986)
Krogh, A., Larsson, B., von Heijne, G., Sonnhammer, E.L.: Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J. Mol. Biol. (2001)
Nister, D., Stewenius, H.: Scalable recognition with a vocabulary tree. Conf. Comput. Vis. Pattern Recogn. (2006)
Freeman, W.T., Adelson, E.H.: The design and use of steerable filters. PAMI 13(9), (1991)
Leung, T., Malik, J.: Representing and recognizing the visual appearance of materials using three-dimensional textons. Int. J. Comput. Vis. 43(1) (2001)
Schmid, C.: Constructing models for content-based image retrieval. Conf. Comput. Vis. Pattern Recogn. (2001)
Alahi, A., Vandergheynst, P., Bierlaire, M., Kunt, M.: Cascade of descriptors to detect and track objects across any network of cameras. Comput. Vis. Image Understand. 114(6), 624–640 (2010)
Simard, P., Bottou, L., Haffner, P., LeCun, Y.: Boxlets: a fast convolution algorithm for signal processing and neural networks. Conf. Neural Inform. Process. Syst. (1999)
Vedaldi, A., Zisserman, A.: Efficient additive kernels via explicit feature maps. PAMI 34(3), (2012)
Brox, T., Malik, J.: Large displacement optical flow: descriptor matching in variational motion estimation. PAMI 33(3), (2010)
Ester, M., Kriegel, H.-P., Sander, J., Xu, X.: A density-based algorithm for discovering clusters in large spatial databases with noise. In: Second International Conference on Knowledge Discovery and Data Mining, pp. 226–231 (1996)
Ankerst, M., Breunig, M.M., Kriegel, H.-P., Sander, J.: OPTICS: ordering points to identify the clustering structure. SIGMOD ’99 Proceedings of the 1999 ACM SIGMOD International Conference on Management of Data
Muja, M., Rusu, R.B., Bradski, G., Lowe, D.G.: REIN—a fast, robust, scalable recognition infrastructure. Int. Conf. Robot Autom. (2011)
Rusu, R.B., Bradski, G., Thibaux, R., Hsu, J.: Fast 3D recognition and pose using the viewpoint feature histogram. Intell. Robots Syst. (2010)
Martinez, M., Collet, A., Srinivasa, S.S.: MOPED: a scalable and low latency object recognition and pose estimation system. Int. Conf. Robot Autom. (2010)
Jacob, M., Unser, M.: Design of steerable filters for feature detection using Canny-like criteria. PAMI 26(8), (2004)
Moré, J.J.: The Levenberg-Marquardt algorithm implementation and theory. Numer. Anal. Lect. Notes Math. 630, 105–116 (1978)
LeCun, Y.: Learning invariant feature hierarchies. Eur. Conf. Comput. Vis. (2012)
Ranzato, M.A., Huang, F.-J., Boureau, Y.-L., LeCun, Y.: Unsupervised learning of invariant feature hierarchies with applications to object recognition. Conf. Comput. Vis. Pattern Recogn. (2007)
Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in vision algorithms. Int. Conf. Mach. Learn. (2010)
Kingma, D., LeCun, Y.: Regularized estimation of image statistics by score matching. Conf. Neural Inform. Process. Syst. (2010)
Losson, O., Macaire, L., Yang, Y.: Comparison of color demosaicing methods. Adv. Imaging Electron Phys. 162, 173–265 (2010)
Li, X., Gunturk, B., Zhang, L.: Image demosaicing: a systematic survey. Proceedings of SPIE 6822, Visual Communications and Image Processing, 68221J (2008)
Tanbakuchi, A.A., et al.: Adaptive pixel defect correction. Proceedings of SPIE 5017, Sensors and Camera Systems for Scientific, Industrial, and Digital Photography Applications IV, (2003)
Ibenthal, A.: Image sensor noise estimation and reduction. ITG Fachausschuss 3.2 Digitale Bildcodierung (2007)
Aptina: An objective look at FSI and BSI. Aptina White Paper
Cossairt, O., Miau, D., Nayar, S.K.: Gigapixel computational imaging. IEEE Int. Conf. Comput. Photogr. (2011)
Eastman Kodak Company, E-58 technical data/color negative film. Kodak 160NC Technical Data Manual, (2000)
Kuthirummal, S., Nayar, S.K.: Multiview radial catadioptric imaging for scene capture. ACM Trans. Graph. (also Proc. of ACM SIGGRAPH), (2006)
Zhou, C., Nayar, S.K.: Computational cameras: convergence of optics and processing. IEEE Trans. Image Process. 20(12), (2011)
Krishnan, G., Nayar, S.K.: Towards a true spherical camera. Proceedings of SPIE 7240, Human Vision and Electronic Imaging XIV, 724002 (2009)
Reinhard, E., Ward, G., Pattanaik, S., Debevec, P.: High Dynamic Range Imaging: Acquisition, Display, and Image-Based Lighting, 2nd ed. Morgan Kaufmann, (2010)
Gallo, O., et al.: Artifact-free high dynamic range imaging. IEEE Int. Conf. Comput. Photogr. (2009)
Grossberg, M.D., Nayar, S.K.: High dynamic range from multiple images: which exposures to combine? Int. Conf. Comput. Vis. (2003)
Nayar, S.K., Krishnan, G., Grossberg, M.D., Raskar, R.: Fast separation of direct and global components of a scene using high frequency illumination. Proc. SIGGRAPH (2006)
Wilson, T., Juskaitis, R., Neil, M., Kozubek, M.: Confocal microscopy by aperture correlation. Opt. Lett. 21(23), 1879–1881 (1996)
Corle, T.R., Kino, G.S.: Confocal Scanning Optical Microscopy and Related Imaging Systems. Academic Press, New York (1996)
Fitch, J.P.: Synthetic Aperture Radar. Springer, New York (1988)
Ng, R., et al.: Light field photography with a hand-held plenoptic camera. Stanford Tech Report CTSR 2005-02
Ragan-Kelley, J., et al.: Decoupling algorithms from schedules for easy optimization of image processing pipelines. ACM Trans. Graph. 31(4), (2012)
Levoy, M.: Experimental platforms for computational photography. Comput. Graph. Appl. 30 (2010)
Adams, A., et al.: The Frankencamera: an experimental platform for computational photography. Proc. SIGGRAPH. (2010)
Salsman, K.: 3D vision for computer based applications. Technical Report, Aptina, Inc., (2010).
Cossairt, O., Nayar, S.: Spectral focal sweep: extended depth of field from chromatic aberrations. IEEE Int. Conf. Comput. Photogr. (2010). (see also US Patent EP2664153A1)
Fife, K., El Gamal, A., Wong, H.-S.P.: A 3D multi-aperture image sensor architecture. Proc. IEEE Custom Integr. Circ. Conf. 281–284, (2006)
Wang, A., Gill, P., Molnar, A.: Light field image sensors based on the Talbot effect. Appl. Optics 48(31), 5897–5905 (2009)
Shankar, M., et al.: Thin infrared imaging systems through multichannel sampling. Appl. Optics 47(10), B1–B10 (2008)
Zitová, B., Flusser, J.: Image registration methods: a survey. Image Vis. Comput. 21(11), 977–1000 (2003)
Hirschmüller, H.: Accurate and efficient stereo processing by semi-global matching and mutual information. Conf. Comput. Vis. Pattern Recogn. (2005)
Tuytelaars, T., Van Gool, L.: Wide baseline stereo matching based on local, affinely invariant regions. Br. Mach. Vis. Conf. (2000)
Faugeras, O.: Three Dimensional Computer Vision. MIT Press, Cambridge, MA (1993)
Maybank, S.J., Faugeras O.D.: A theory of self-calibration of a moving camera. Int. J. Comput. Vis. 8(2), (1992)
Hartley, R., Zisserman, A.: Multiple View Geometry in Computer Vision. Cambridge University Press, Cambridge (2004)
Luong, Q.-T., Faugeras, O.D.: The fundamental matrix: theory, algorithms, and stability analysis. Int. J. Comput. Vis. 17 (1995)
Hartley, R.I.: Theory and practice of projective rectification. Int. J. Comput. Vis. 35 (1999)
Scharstein, D., Szeliski, R.: A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. Int. J. Comput. Vis. 47 (2002)
Lazaros, N., Sirakoulis, G.C., Gasteratos, A.: Review of stereo vision algorithms: from software to hardware. Int. J. Optomechatroni. 2(4), 435–462 (2008)
Clark, D.E., Ivekovic, S.: The Cramer-Rao lower bound for 3-D state estimation from rectified stereo cameras. IEEE Fusion (2010)
Nayar, S.K., Gupta, M.: Diffuse structured light. Int. Conf. Comput. Photogr. (2012)
Cattermole, K.W.: Principles of Pulse Code Modulation, 1st ed., American Elsevier Pub. Co., (1969)
Pagès, J., Salvi, J.: Coded light projection techniques for 3D reconstruction. J3eA, Journal sur l’enseignement des sciences et technologies de l’information et des systèmes 4(1), (2005) (Hors-Série 3)
Gu, J., et al.: Compressive structured light for recovering inhomogeneous participating media. Eur. Conf. Comput. Vis. (2008)
Nayar, S.K.: Computational cameras: approaches, benefits and limits. Technical Report, Computer Science Department, Columbia University, (2011)
Lehmann, M., et al.: CCD/CMOS lock-in pixel for range imaging: challenges, limitations and state-of-the-art. CSEM, Swiss Center for Electronics and Microtechnology, (2004)
Andersen, J.F., Busck, J., Heiselberg, H.: Submillimeter 3-D laser radar for space shuttle tile inspection. Danish Defence Research Establishment, Copenhagen, Denmark, (2013)
Grzegorzek, M., Theobalt, C., Koch, R., Kolb, A. (eds.): Time-of-Flight and Depth Imaging: Sensors, Algorithms, and Applications. Lecture Notes in Computer Science, Springer (2013)
Levoy, M., Hanrahan, P.: Light field rendering. SIGGRAPH ’96 Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques (1996)
Curless, B., Levoy, M.: A volumetric method for building complex models from range images. SIGGRAPH ’96 Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques (1996)
Drebin, R.A., Carpenter, L., Hanrahan, P.: Volume rendering. SIGGRAPH (1988)
Levoy, M.: Display of surfaces from volume data. CG&A (1988)
Levoy, M.: Volume rendering using the Fourier projection slice theorem. Technical report CSL-TR-92-521, Stanford University, (1992)
Klein, G., Murray, D.: Parallel tracking and mapping on a camera phone. ISMAR ’09 Proceedings of the 2009 8th IEEE International Symposium on Mixed and Augmented Reality (2009)
Klein, G., Murray, D.: Parallel tracking and mapping for small AR workspaces. In: Proceedings of International Symposium on Mixed and Augmented Reality (ISMAR’07), Nara, (2007)
Lucas, B.D., Kanade, T.: An iterative image registration technique with an application to stereo vision. Proceedings of Image Understanding Workshop, (1981)
Beauchemin, S.S., Barron, J.L.: The computation of optical flow. ACM Comput. Surv. 27(3), (1995)
Barron, J., Fleet, D., Beauchemin, S.: Performance of optical flow techniques. Int. J. Comput. Vis. 12(1), 43–77 (1994)
Baker, S., et al.: A database and evaluation methodology for optical flow. Int. J. Comput. Vis. 92(1), 1–31 (2009)
Quénot, G.M., Pakleza, J., Kowalewski, T.A.: Particle image velocimetry with optical flow. In: Experiments in Fluids, vol 25(3), pp. 177–189, (1998)
Trulls, E., Sanfeliu, A., Moreno-Noguer, F.: Spatiotemporal descriptor for wide-baseline stereo reconstruction of non-rigid and ambiguous scenes. Eur. Conf. Comput. Vis. (2012)
Steinman, S.B., Steinman, B.A., Garzia, R.P.: Foundations of Binocular Vision: A Clinical Perspective. McGraw-Hill, New York (2000)
Roy, S., Meunier, J., Cox, I.J.: Cylindrical rectification to minimize epipolar distortion. Conf. Comput. Vis. Pattern Recogn. (1997)
Oram, D.: Rectification for any epipolar geometry. Br. Mach. Vis. Conf. (2001)
Takita, K., et al.: High-accuracy subpixel image registration based on phase-only correlation. Institute of Electronics, Information and Communication Engineers (IEICE), (2003)
Tian, Q., Huhns, M.N.: Algorithms for subpixel registration. Comput. Vis. Graph. Image Process. 35, (1986)
Foroosh, H. (Shekarforoush), Zerubia, J.B., Berthod, M.: Extension of phase correlation to subpixel registration. IEEE Trans. Image Process. (2002)
Zitnick, C.L., Kanade, T.: A cooperative algorithm for stereo matching and occlusion detection. Carnegie Mellon University, Technical report CMU-RI-TR-99-35
Sun, J., Li, Y., Kang, S.B., Shum, H.-Y.: Symmetric stereo matching for occlusion handling. In: CVPR ’05 Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), vol 2, (2005)
Kang, S.B., Szeliski, R., Chai, J.: Handling occlusions in dense multi-view stereo. Conf. Comput. Vis. Pattern Recogn. (2001)
Curless, B., Levoy, M.: A volumetric method for building complex models from range images. SIGGRAPH Proc. (1996)
Izadi, S., et al.: KinectFusion: real-time 3D reconstruction and interaction using a moving depth camera. UIST ’11 Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology, (2011)
Newcombe, R.A., et al.: KinectFusion: real-time dense surface mapping and tracking. ISMAR ’11 Proceedings of the 2011 10th IEEE International Symposium on Mixed and Augmented Reality (2011)
Durrant-Whyte, H., Bailey, T.: Simultaneous localisation and mapping (SLAM): part I the essential algorithms. IEEE Robotics Autom. Mag. (2006)
Bailey, T., Durrant-Whyte, H.: Simultaneous localisation and mapping (SLAM): part II state of the art. IEEE Robotics Autom. Mag. (2006)
Seitz, S., et al.: A comparison and evaluation of multi-view stereo reconstruction algorithms. CVPR 1, 519–526 (2006)
Scharstein, D., Szeliski, R.: A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. Int. J. Comput. Vis. 47 (2002)
Baker, S., Matthews, I.: Lucas-Kanade 20 years on: a unifying framework. Int. J. Comput. Vis. 56 (2004)
Gallup, D., Pollefeys, M., Frahm, J.M.: 3D reconstruction using an n-layer heightmap. Pattern Recogn. Lect. Notes Comput. Sci. 6376 (2010)
Newcombe, R.A., Lovegrove, S.J., Davison, A.J.: DTAM: dense tracking and mapping in real-time. Int Conf Comput Vis (ICCV) IEEE, 2320–2327, (2011)
Hwangbo, M., Kim, J.-S., Kanade, T.: Inertial-aided KLT feature tracking for a moving camera. Intell. Robots Syst. (IROS)—IEEE. (2009)
Lovegrove, S.J., Davison, A.J.: Real-time spherical Mosaicing using whole image alignment. Eur. Conf. Comput. Vis. (2010)
Malis, E.: Improving vision-based control using efficient second-order minimization techniques. Int. Conf. Robot Autom. (2004)
He, K., Sun, J., Tang, X.: Guided image filtering. Eur. Conf. Comput. Vis. (2010)
Rhemann, C., et al.: Fast cost-volume filtering for visual correspondence and beyond. CVPR, IEEE, 3017–3024, (2011)
Fattal, R.: Edge-avoiding wavelets and their applications. SIGGRAPH (2009)
Gastal, E.S.L., Oliveira, M.M.: Domain transform for edge-aware image and video processing. In: ACM SIGGRAPH 2011 Papers, Article No. 69, (2011)
Wolberg, G.: Digital Image Warping. Wiley, Hoboken, NJ (1990)
Baxes, G.: Digital Image Processing: Principles and Applications. Wiley, Hoboken, NJ (1994)
Fergus, R., et al.: Removing camera shake from a single photograph. ACM Trans. Graph. 25(3), (2006)
Rohr, K.: Landmark-Based Image Analysis Using Geometric and Intensity Models. Kluwer Academic Publishers, Dordrecht (2001)
Corbet, J., Rubini, A., Kroah-Hartman, G.: Linux Device Drivers, 3rd ed., O’Reilly Media, (2005)
Zinner, C., Kubinger, W., Isaacs, R.: PfeLib—a performance primitives library for embedded vision. EURASIP, (2007)
Houston, M.: OpenCL overview. SIGGRAPH OpenCL BOF (2011), also on KHRONOS website
Zinner, C., Kubinger, W.: ROS-DMA: a DMA double buffering method for embedded image processing with resource optimized slicing. IEEE RTAS 2006, Real-Time and Embedded Technology and Applications Symposium (2006)
Kreahling, W.C., et al.: Branch elimination by condition merging. Euro-Par 2003 Parallel Process. Lect. Notes Comput. Sci. 2790, (2003)
Aho, A.V., Ullman, J.D.: Principles of Compiler Design. Addison-Wesley, (1977)
Ragan-Kelley, J., et al.: Decoupling algorithms from schedules for easy optimization of image processing pipelines. ACM Trans. Graph. SIGGRAPH 31(4), (2012)
Alcantarilla, P.F., Bartoli, A., Davison, A.J.: KAZE features. Eur. Conf. Comput. Vis. (2012)
Schneider, C.A., Rasband, W.S., Eliceiri, K.W.: NIH image to ImageJ: 25 years of image analysis. Nat. Meth. 9 (2012)
Muja, M.: Recognition pipeline and object detection scalability. Summer 2010 Internship Presentation, University of British Columbia
Viola, P.A., Jones, M.J.: Rapid object detection using a boosted cascade of simple features. Conf. Comput. Vis. Pattern Recogn. (2001)
Swain, M., Ballard, D.H.: Color indexing. Int. J. Comput. Vis. 7 (1991)
Zhang, Z.: A flexible new technique for camera calibration. IEEE Trans. Pattern Anal. Mach. Intell. 22(11), 1330–1334 (2000)
Viola, P.A., Jones, M.J.: Robust real time object detection. Int. J. Comput. Vis. (2001)
Murase, H., Nayar, S.K.: Visual learning and recognition of 3-D objects from appearance. Int. J. Comput. Vis. 14 (1995)
Grosse, R., et al.: Ground-truth dataset and baseline evaluations for intrinsic image algorithms. Int. Conf. Comput. Vis. (2009)
Haltakov, V., Unger, C., Ilic, S.: Framework for generation of synthetic ground truth data for driver assistance applications. Pattern Recogn. Lect. Notes Comput. Sci. 8142 (2013)
Buades, A., Coll, B., Morel, J.-M.: A non-local algorithm for image denoising. Comput. Vis. Pattern Recogn. 2 (2005)
Agaian, S.S., Tourshan, K., Noonan, J.P.: Parametric Slant-Hadamard transforms. Proc. SPIE, (2003)
Sauvola, J., Pietikäinen, M.: Adaptive document image binarization. Pattern Recogn. 33(2), (2000)
Yen, J.C., Chang, F.J., Chang, S.: A new criterion for automatic multilevel thresholding. Trans. Image Process. 4(3), (1995)
Sezgin, M., Sankur, B.: Survey over image thresholding techniques and quantitative performance evaluation. J. Electron. Imaging 13(1), (2004)
Gaskill, J.D.: Linear Systems, Fourier Transforms, and Optics. Wiley, Hoboken, NJ (1978)
Shapiro, L.G., Stockman, G.C.: Computer Vision. Prentice-Hall, Upper Saddle River, NJ (2001)
Flusser, J., Suk, T., Zitova, B.: Moments and Moment Invariants in Pattern Recognition. Wiley, Hoboken, NJ (2009)
Mikolajczyk, K., Schmid, C.: An affine invariant interest point detector. Eur. Conf. Comput. Vis. (2002)
Moravec, H.P.: Obstacle avoidance and navigation in the real world by a seeing robot rover. Tech. report CMU-RI-TR-80-03, Robotics Institute, Carnegie Mellon University & doctoral dissertation, Stanford University, (1980)
Sivic, J., Zisserman, A.: Efficient visual search of videos cast as text retrieval. PAMI 31 (2009)
Tan, X., Triggs, B.: Enhanced local texture feature sets for face recognition under difficult lighting conditions. AMFG’07 Proceedings of the 3rd International Conference on Analysis and Modeling of Faces and Gestures (2007)
Lindeberg, T.: Scale-space. In: Encyclopedia of Computer Science and Engineering. Wiley, Hoboken, NJ, (2008)
Lindeberg, T.: Scale-space theory: a basic tool for analysing structures at different scales. J. Appl. Stat 21(2), 224–270 (1994)
Bengio, Y.: Learning Deep Architectures for AI, Foundations and Trends in Machine Learning. Now Publishers Inc USA, (2009)
Hinton, G.E., Osindero, S.: A fast learning algorithm for deep belief nets. Neural Comput. 18(7), (2006)
Olson, E.: AprilTag: a robust and flexible visual fiducial system. Int. Conf. Robotics Autom. (2011)
Farabet, C., et al.: Hardware accelerated convolutional neural networks for synthetic vision systems. ISCAS IEEE 257–260, (2010)
Tuytelaars, T., Van Gool, L.: Matching widely separated views based on affine invariant regions. Int. J. Comput. Vis. 59 (2004)
Fischler, M.A., Elschlager, R.A.: The representation and matching of pictorial structures. IEEE Trans. Comput. (1973)
Felzenszwalb, P.F., Girshick, R.B., McAllester, D., Ramanan, D.: Object detection with discriminatively trained part-based models. PAMI 32(9), (2010)
Yang, Y., Ramanan, D.: Articulated pose estimation with flexible mixtures-of-parts. Conf. Comput. Vis. Pattern Recogn. (2011)
Amit, Y., Trouve, A.: POP: patchwork of parts models for object recognition. Int. J. Comput. Vis. 75 (2007)
Lazebnik, S., Schmid, C., Ponce, J.: Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. Conf. Comput. Vis. Pattern Recogn. (2006)
Grauman, K., Darrell, T.: The pyramid Match Kernel: discriminative classification with sets of image features. Int. Conf. Comput. Vis. (2005)
Aharon, M., Elad, M., Bruckstein, A.: K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation. IEEE Trans. Signal Process. 54(11), (2006)
Fei-Fei, L., Fergus, R., Torralba, A.: Recognizing and learning object categories. Conf. Comput. Vis. Pattern Recogn. (2007)
Johnson, A.: Spin-Images: A Representation for 3-D Surface Matching Ph.D. dissertation, technical report CMU-RI-TR-97-47, Robotics Institute, Carnegie Mellon University, (1997)
Marton, Z.-C., Pangercic, D., Blodow, N., Beetz, M.: Combined 2D-3D categorization and classification for multimodal perception systems. Int. J. Robotics Res. Arch. 30(11), (2011)
Kass, M., Witkin, A., Terzopoulos, D.: Snakes: active contour models. Int. J. Comput. Vis. (1988)
Tombari, F., Salti, S., Di Stefano, L.: A combined texture-shape descriptor for enhanced 3D feature matching. Int. Conf. Image Process. (2011)
Mikolajczyk, K., Schmid, C.: Indexing based on scale invariant interest points. Int. Conf. Comput. Vis. (2001)
Ragan-Kelley, J., et al.: Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines. PLDI ’13 Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation, (2013)
Kindratenko, V.V., et al.: GPU clusters for high-performance computing. In: Proceedings of Workshop on Parallel Programming on Accelerator Clusters—PPAC’09, (2009)
Munshi, A., et al.: OpenCL Programming Guide, 1 ed., Addison-Wesley Professional, (2011)
Prince, S.: Computer Vision: Models, Learning, and Inference. Cambridge University Press, Cambridge (2012)
Lindeberg, T.: Scale Space Theory in Computer Vision. Springer, New York (2010)
Pele, O.: Distance Functions: Theory, Algorithms and Applications. Ph.D. Thesis, Hebrew University, (2011)
Schapire, R.E., Singer, Y.: Improved boosting algorithms using confidence-rated predictions. Mach. Learn. (1999)
Bache, K., Lichman, M.: UCI Machine Learning Repository (http://archive.ics.uci.edu/ml), University of California, School of Information and Computer Science, Irvine, CA, (2013)
Zach, C.: Fast and high quality fusion of depth maps. 3DPVT Joint 3DIM/3DPVT Conference 3D Imaging, Modeling, Processing, Visualization, Transmission (2008)
Krig, S.: Visual Genomes for Synthetic Vision. TBP (2016)
Grimes, D.B., Rao, R.P.N.: Bilinear sparse coding for invariant vision. Neural Comput. 17(1), 47–73 (2005)
Grosse, R., Raina, R., Kwong, H., Ng, A.Y.: Shift-invariant sparse coding for audio classification. In: Proceedings of the 23rd Conference in Uncertainty in Artificial Intelligence (UAI’07), (2007)
Bergstra, J., Courville, A., Bengio, Y.: The statistical inefficiency of sparse coding for images (or, one Gabor to rule them all). Technical Report, (2011)
Erhan, D., Szegedy, C., Toshev, A., Anguelov, D.: Scalable object detection using deep neural networks. Conf. Comput. Vis. Pattern Recogn. (2014)
Hinton, G., Osindero, S., Teh, Y.: A fast learning algorithm for deep belief nets. Neural Comput. 18, 1527–1554 (2006)
Hinton, G., Salakhutdinov, R.: Reducing the dimensionality of data with neural networks. Science 313(5786), 504–507 (2006)
Nguyen, A., Yosinski, J., Clune, J.: Deep neural networks are easily fooled: high confidence predictions for unrecognizable images. CVPR (2015)
He, K., Zhang, X., Ren, S., Sun, J.: Spatial pyramid pooling in deep convolutional networks for visual recognition. ECCV (2014)
Mutch, J., Lowe, D.G.: Object class recognition and localization using sparse features with limited receptive fields. IJCV (2008)
Serre, T., Wolf, L., Poggio, T.: Object recognition with features inspired by visual cortex. CVPR (2005)
Sanchez, J., Perronnin, F., Mensink, T., Verbeek, J.: Image classification with the fisher vector: theory and practice. IJCV (2013)
Lin, M., Chen, Q., Yan, S.: Network in network. In: ICLR (2014)
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. (2014)
Behnke, S.: Hierarchical neural networks for image interpretation. Draft submitted to Springer Published as volume 2766 of Lecture Notes in Computer Science ISBN: 3-540-40722-7, Springer (2003)
Girshick, R., Iandola, F., Darrell, T., Malik, J.: Deformable part models are convolutional neural networks. CVPR (2014)
van de Sande, E.A., Snoek, C.G.M., Smeulders, A.W.M.: Fisher and VLAD with FLAIR. In: IEEE Conference on Computer Vision and Pattern Recognition (2014)
Ranzato, M., Boureau, Y., LeCun, Y.: Sparse feature learning for deep belief networks. In: Proceedings of Neural Information Processing Systems (NIPS), (2007)
Schmidhuber, J.: Deep learning in neural networks: an overview, Technical Report IDSIA-03-14/arXiv:1404.7828 v4
Deng, L., Yu, D.: Deep learning: methods and applications. Found. Trends Signal Process. 7, (2014)
Bengio, Y., Goodfellow, I.J., Courville, A.: Deep Learning. MIT Press, (2016) (in preparation)
Anderson, J.A., Rosenfeld, E. (eds.): Neurocomputing: Foundations of Research. MIT Press, Cambridge, MA, (1988). Also Neurocomputing vol. 2: Directions for Research. MIT Press, Cambridge, MA, (1991)
Jackson, P.: Introduction to Expert Systems, 3 ed., Addison Wesley, (1998)
Rosenblatt, F.: The Perceptron: a probabilistic model for information storage and organization in the brain. Psychol. Rev. (1958)
Joseph, R.D.: Contributions to Perceptron Theory. PhD thesis, Cornell Univ. (1961)
Hubel, D.H., Wiesel, T.N.: Receptive fields of single neurones in the cat’s striate cortex. J. Physiol. (1959)
Hubel, D.H., Wiesel, T.N.: Receptive fields, binocular interaction, and functional architecture in the cat’s visual cortex. J. Physiol. 160, 106–154 (1962)
McCulloch, W., Pitts, W.: A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys. (1943)
Hebb, D.O.: The Organization of Behavior. Wiley, New York (1949)
Rosenblatt, F.: The Perceptron—a perceiving and recognizing automaton. Report 85-460-1, Cornell Aeronautical Laboratory (1957)
Ivakhnenko, A.G.: The group method of data handling—a rival of the method of stochastic approximation. Soviet Autom. Contr. (1968)
Ivakhnenko, A.G., Lapa, V.G.: Cybernetic predicting devices. CCM Inform. Corp. (1965)
Ivakhnenko, A.G., Lapa, V.G., McDonough, R.N.: Cybernetics and Forecasting Techniques. American Elsevier, NY, (1967)
Ivakhnenko, A.G.: Polynomial theory of complex systems. IEEE Trans. Syst. Man Cybern. 4, 364–378 (1971)
Hinton, G.E., Srivastava, N., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.R.: Improving neural networks by preventing co-adaptation of feature detectors. CoRR, abs/1207.0580, (2012)
Ikeda, S., Ochiai, M., Sawaragi, Y.: Sequential GMDH algorithm and its application to river flow prediction. IEEE Trans. Syst. Man Cybern. 7, 473–479 (1976)
Fukushima, K.: Neural network model for a mechanism of pattern recognition unaffected by shift in position—Neocognitron. Trans. IECE J. 62(10), 658–665 (1979)
Fukushima, K.: Neocognitron: a self-organizing neural network for a mechanism of pattern recognition unaffected by shift in position. Biol. Cybern. 36(4), 193–202 (1980)
Dreyfus, S.E.: The numerical solution of variational problems. J. Math. Anal. Appl. 5(1), 30–45 (1962)
Dreyfus, S.E.: The computational solution of optimal control problems with time lag. IEEE Trans. Autom. Contr. (1973)
LeCun, Y.: Une procédure d’apprentissage pour réseau à seuil asymétrique. Proceedings of Cognitiva, vol 85, Paris, pp. 599–604, (1985)
LeCun, Y.: A theoretical framework for back-propagation. In: Touretzky, D., Hinton, G., Sejnowski, T., (eds.) Proceedings of the 1988 Connectionist Models Summer School, CMU, Morgan Kaufmann, Pittsburgh, PA, pp. 21–28, (1988)
LeCun, Y., Boser, B., Denker, J.S., Henderson, D., Howard, R.E., Hubbard, W., Jackel, L.D.: Back-propagation applied to handwritten zip code recognition. Neural Comput. 1(4), 541–551 (1989)
LeCun, Y., Boser, B., Denker, J.S., Henderson, D., Howard, R.E., Hubbard, W., Jackel, L.D.: Handwritten digit recognition with a back-propagation network. In: Touretzky, D. S., (ed.) Advances in Neural Information Processing Systems, vol 2, Morgan Kaufmann, pp. 396–404, (1990a)
Kelley, H.J.: Gradient theory of optimal flight paths. ARS J. 30(10), 947–954 (1960)
Bryson, A.E.: A gradient method for optimizing multi-stage allocation processes. In: Proc. Harvard Univ. Symposium on Digital Computers and Their Applications, (1961)
Bryson, Jr., A.E., Denham, W.F.: A steepest-ascent method for solving optimum programming problems. Technical Report BR-1303, Raytheon Company, Missile and Space Division, (1961)
Werbos, P.J.: The roots of backpropagation: from ordered derivatives to neural networks and political forecasting. Wiley, (1994)
Schmidhuber, J.: Learning complex, extended sequences using the principle of history compression. Neural Comput. (1992)
Graves, A., Wayne, G., Danihelka, I.: Neural Turing machines. (2014)
Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
Ng, A.: Support vector machines. Stanford CS229 lecture notes
Shawe-Taylor, J., Cristianini, N.: Support vector machines and other kernel-based learning methods, Cambridge University Press, (2000)
Hinton, G.E., Sejnowski, T.J., Rumelhart, D.E., McClelland, J.L.: Learning and relearning in Boltzmann machines, PDP Research Group (1986)
Ackley, D.H., Hinton, G.E., Sejnowski, T.J.: A learning algorithm for Boltzmann machines. Cogn. Sci. (1985)
Hopfield, J.J.: Neural networks and physical systems with emergent collective computational abilities. Proc. Natl. Acad. Sci. U. S. A. (1982)
Smolensky, P.: Chapter 6: information processing in dynamical systems: foundations of harmony theory. In: Rumelhart, D.E., McLelland, J.L. (eds.) Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol 1, Foundations. MIT Press (1986)
Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. arXiv (2014)
Also see NiN slides from ILSVRC (2014) http://www.image-net.org/challenges/LSVRC/2014/slides/ILSVRC2014_NUS_release.pdf
LeCun, Y.: A theoretical framework for back-propagation. In: Touretzky, D., Hinton, G., Sejnowski, T., (eds.) Proceedings of the 1988 Connectionist Models Summer School, CMU, pp. 21–28, Morgan Kaufmann, Pittsburgh, PA, (1988)
Vapnik, V., Lerner, A.: Pattern recognition using generalized portrait method. Autom. Remote Contr. (1963)
Boser, B.E., Guyon, I.M., Vapnik, V.N.: A training algorithm for optimal margin classifiers. ACM COLT ’92, (1992)
Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. (1995)
Vapnik, V.: Estimation of Dependences Based on Empirical Data [in Russian]. Nauka, Moscow, (1979). English translation, Springer, New York, (1982)
Vapnik, V.: The Nature of Statistical Learning Theory. Springer, New York (1995)
Vapnik, V.: Statistical Learning Theory. John Wiley and Sons, Inc., New York (1998)
Powell, M.J.D.: An efficient method for finding the minimum of a function of several variables without calculating derivatives. Comput. J. (1964)
Carreira-Perpignan, M.A., Hinton, G.E.: On contrastive divergence learning. In: Artificial Intelligence and Statistics, (2005)
Cireşan, D., Meier, U., Schmidhuber, J.: Multi-column deep neural networks for image classification. CVPR (2012)
Coates, A., Lee, H., Ng, A.: An analysis of single-layer networks in unsupervised feature learning, AISTATS (2011)
Rosenblatt, F.: Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms. Spartan Books, Washington, DC (1961)
Baddeley, A., Eysenck, M., Anderson, M.: Memory. Psychology Press, (2009)
Goldman-Rakic, P.S.: Cellular basis of working memory. Neuron 14(3), 477–485 (1995)
Rumelhart, D.E., McClelland, J.L., PDP Research Group: Parallel Distributed Processing, vol 1. MIT Press, (1986)
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. arXiv:1409.4842, (2014)
Von Neumann, J.: First draft of a report on the EDVAC. (1945)
Goodfellow, I.J., Warde-Farley, D., Mirza, M., Courville, A., Bengio, Y.: Maxout networks. In: International Conference on Machine Learning (ICML), (2013)
Breiman, L.: Bagging predictors. Mach. Learn. 24(2), 123–140 (1996)
Stollenga, M., Masci, J., Gomez, F., Schmidhuber, J.: Deep networks with internal selective attention through feedback connections. ICML (2014)
Srivastava, R.K., Masci, J., Kazerounian, S., Gomez, F., Schmidhuber, J.: Compete to compute. In: NIPS, (2013)
Buciluă, C., Caruana, R., Niculescu-Mizil, A.: Model compression. ACM SIGKDD (2006)
Mansimov, E., Srivastava, N., Salakhutdinov, R.: Initialization Strategies of Spatio-Temporal Convolutional Neural Networks, Technical Report, (2014)
Weng, J., Ahuja, N., Huang, T.S.: Cresceptron: a self-organizing neural network which grows adaptively. In: Proceedings of Int’l Joint Conference on Neural Networks, Baltimore, MD, (1992)
Cadieu, C.F., Hong, H., Yamins, D.L.K., Pinto, N., Ardila, D., Solomon, E.A., Majaj, N.J., DiCarlo, J.J.: Deep neural networks rival the representation of primate IT cortex for core visual object recognition. PLoS Comput. Biol. (2014). doi:10.1371/journal.pcbi.1003963
Coates, A., Ng, A.Y.: The importance of encoding versus training with sparse coding and vector quantization. ICML (2011)
Jarrett, K., Kavukcuoglu, K., Ranzato, M., Le-Cun, Y.: What is the best multi-stage architecture for object recognition?, ICCV (2009)
Hinton, G., Vinyals, O., Dean, J.: Distilling the Knowledge in a Neural Network. NIPS (2014)
Hinton, G.E., Osindero, S., Teh, Y.W.: A fast learning algorithm for deep belief nets. Neural Comput. (2006)
Bengio, Y., Lamblin, P., Popovici, D., Larochelle, H.: Greedy layer-wise training of deep networks. NIPS (2007)
Kandel, E.R., Schwartz, J.H., Jessell, T.M. (eds.): Principles of Neural Science, 4th ed., McGraw-Hill, (2000)
Rao, R.P.N., Ballard, D.H.: Predictive coding in the visual cortex: A functional interpretation of some extra-classical receptive-field effects. Nat Neurosci. (1999)
Rosenfeld, A., Hummel, R.A., Zucker, S.W.: Scene labeling by relaxation operations. IEEE Trans. Syst. Man Cybernetics (1976)
Métin, C., Frost, D.O.: Visual responses of neurons in somatosensory cortex of hamsters with experimentally induced retinal projections to somatosensory thalamus. Proc. Natl. Acad. Sci. U. S. A. 86(1), 357–361 (1989)
Roe, A.W., Pallas, S.L., Kwon, Y.H., Sur, M.: Visual projections routed to the auditory pathway in ferrets: receptive fields of visual neurons in primary auditory cortex. J. Neurosci. 12(9), 3651–3664 (1992)
Bach-y-Rita, P., Kaczmarek, K.A., Tyler, M.E., Garcia-Lara, J.: Form perception with a 49-point electrotactile stimulus array of the tongue: a technical note. J. Rehabil. Res. Dev. (1998)
Bach-y-Rita, P., Tyler, M.E., Kaczmarek, K.A.: Seeing with the brain. IJHCI (2003)
Wiskott, L.: How does our visual system achieve shift and size invariance? In: Problems in Systems Neuroscience. Oxford University Press, (2002)
Thomas Yeo, B.T., Krienen, F.M., Sepulcre, J., Sabuncu, M.R., Lashkari, D., Hollinshead, M., Roffman, J.L., Smoller, J.W., Zöllei, L., Polimeni, J.R., Fischl, B., Liu, H., Buckner, R.L.: The organization of the human cerebral cortex estimated by intrinsic functional connectivity. J. Neurophysiol. (2011)
Andersen, P., Gross, G.N., Lømo, T., Sveen, O.: Participation of inhibitory and excitatory interneurones in the control of hippocampal cortical output. In: The Interneuron. University of California Press, Los Angeles, (1969)
Eccles, J.C., Ito, M., Szentágothai, J.: The Cerebellum as a Neuronal Machine. Springer, New York, (1967)
Stefanis, C.: Interneuronal mechanisms in the cortex. In: The Interneuron. University of California Press, Los Angeles, (1969)
Stephen, G.: Contour enhancement, short-term memory, and constancies in reverberating neural networks, Studies in Applied Mathematics, (1973)
Parikh, D., Zitnick, C.L.: The role of features, algorithms and data in visual recognition. CVPR (2010)
Christopher, B.: Pattern Recognition and Machine Learning, Springer, (2006)
Eigen, D., Rolfe, J., Fergus, R., LeCun, Y.: Understanding deep architectures using a recursive convolutional network, arXiv:1312.1847 [cs.LG]
NIPS.: Tutorial—Deep Learning for Computer Vision (Rob Fergus) (2013)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet Classification with Deep Convolutional Neural Networks. NIPS (2012)
Zeiler, M.D., Fergus, R.: Visualizing and Understanding Convolutional Networks. ECCV (2014)
Zeiler, M., Taylor, G., Fergus, R.: Adaptive deconvolutional networks for mid and high level feature learning. In: ICCV, (2011)
Olga, R., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A.C., Fei-Fei, L.: Large scale visual recognition challenge. ImageNet http://arxiv.org/abs/1409.0575, (2015)
Random Search for Hyper-Parameter Optimization James Bergstra JAMES.BERGSTRA@UMONTREAL.CA Yoshua Bengio, JMLR (2012)
Donahue, J., Jia, Y., Vinyals, O., Hoffman, J., Zhang, N., Tzeng, E., Darrell, T.: DeCAF: A deep convolutional activation feature for generic visual recognition. CVPR (2013)
Yamins, D.L., Hong, H., Cadieu, C., DiCarlo, J.J.: Hierarchical modular optimization of convolutional networks achieves representations similar to macaque IT and human ventral stream. NIPS (2013)
Haykin, S.: Neural Networks: a comprehensive foundation. Pearson Educ. (1999)
Pascanu, R., Mikolov, T., Bengio, Y.: On the difficulty of training recurrent neural networks. (2013)
Daniel L.K.Y., Honga, H., Cadieua, C.F., Solomona, E.A., Seiberta, D., DiCarloa, J.J.: Performance-optimized hierarchical models predict neural responses in higher visual cortex. Natl. Acad. Sci. (2015)
US Government BRAIN Initiative.: http://www.artificialbrains.com/darpa-synapse-program
European Union Human Brain Project.: https://www.humanbrainproject.eu
Canadian Government Computation & Adaptive Perception Canadian Institute For Advanced Research CIFAR. http://www.cifar.ca/neural-computation-and-adaptive-perception-research-progress
Tatyana, V., Sharpee, O., Kouh M., Reynolds, J.H.: Trade-off between curvature tuning and position invariance in visual area. PNAS. (2013)
Neural Networks, Tricks of the Trade, 2nd ed., Springer, (2012)
LeCun, Y.: Convolutional networks and applications in vision, Comput. Sci. Dept., New York Univ., New York, NY, USA, Kavukcuoglu, K., Farabet, C., ISCAS. (2010)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. ICLR. (2015)
Lyu, S., Simoncelli, E.P.: Nonlinear image representation using divisive normalization. CVPR. (2008)
Pinto, N., Cox, D.D., DiCarlo, J.J.: Why is real-world visual object recognition hard? PLoS Comput Biol. (2008)
Yang Y., Hospedales, T.M.: Deep neural networks for sketch recognition. (2015)
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting, JMLR. (2014)
Wan, L., Zeiler, M., Zhang, S., LeCun, Y., Fergus, R.: Regularization of neural network using drop connect. Int. Conf. Mach. Learn. (2013)
Breiman, L.: Bagging predictors. Mach. Learn. (1994)
Zeiler, M.D., Fergus, R.: Stochastic pooling for regularization of deep convolutional. Neural Netw.
Mamalet, F., Garcia, C.: Simplifying convnets for fast learning. ICANN. (2012)
Gens, R., Domingos, P.: Deep symmetry networks. NIPS (2014) see also slides at http://research.microsoft.com/apps/video/default.aspx?id=219488
Yosinski, J., Clune, J., Bengio, Y., Lipson, H.: How transferable are features in deep neural networks?, NIPS (2014)
Uijlings, J.R.R., van de Sande, K.E.A., Gevers, T., Smeulders, A.W.M.: Selective search for object recognition. IJCV (2013)
Hagan, M.T., Demuth, H.B., Beale, M.H.: Neural network design. PWS Publishing, (1996)
Dominik S., M¨uller, A., Behnke, S.: Evaluation of pooling operations in convolutional architectures for object recognition. ICANN. (2010)
Kaiming, H., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. CVPR (2015)
Field, G., Gauthier, J., Sher, A., Greschner, M., Machado, T., Jepson, L., Shlens, J., Gunning, D., Mathieson, K., Dabrowski, W., et al.: Functional connectivity in the retina at the resolution of photoreceptors. Nature. (2010)
Rosenblatt, F.: The Perceptron: A theory of statistical separability in cognitive systems. Cornell Aeronautical Laboratory, Buffalo, Inc. Rep. No. VG-1196-G-1, (1958)
Auer, P., Burgsteiner, H., Maass, W.: A learning rule for very simple universal approximators consisting of a single layer of perceptrons. Austr. Sci. Fund (2008)
Vapnik, V., Chervonenkis, A., Moskva, N.: Pattern Recognition Theory, Statistical Learning Problems. (1974)
Hearst, M.A., Berkeley, U.C.: Support vector machines. IEEE Intell. Syst. (1998)
John P.: How to implement SVM’s, Microsoft Research. IEEE Intelligent Systems, (1998)
Fukushima, K.: Cognitron: a self-organizing multilayered neural network, Biological Cybernetics, Springer, (1975)
Fukushima, K.: Artificial vision by multi-layered neural networks: and its advances. Neural Netw. 37, 103–119
Fukushima, K.: Training multi-layered neural network Neocognitron. Neural Netw. 40, 18–31
Joan, B., Zaremba, W., Szlam, A., LeCun, Y.: Spectral networks and locally connected networks on graphs. arXiv:1312.6203 [cs.LG] (2014)
Pascanu, R., Gulcehre, C., Cho, K., Bengio, Y.: How to construct deep recurrent neural networks. ICLR. (2014)
LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE (1998)
http://www.imagemagick.org/Usage/convolve/#convolve_vs_correlate
Springenberg, J.T., Dosovitskiy, A., Brox, T., Riedmiller, M.: Striving for simplicity: the all convolutional net. CVPR. (2015)
Fractional max-pooling Benjamin Graham. CVPR. (2014)
The Human Connectome Project is a consortium of leading neurological research labs which are mapping out the pathways in the brain. See http://www.humanconnectomeproject.org/about/
Cun, Y.L., Denker, J.S., Solla, S.A.: Optimal brain damage. NIPS. (1990)
Waibel, A.: Consonant recognition by modular construction of large phonemic time-delay neural networks. IEEE ASSP (1989)
Farabet, C., LeCun, Y., Kavukcuoglu, K., Culurciello, E., Martini, B., Akselrod, P., Talay, S.: Large-scale FPGA-based convolutional networks. (2011)
Clement, F., LeCun, Y., Kavukcuoglu, K., Culurciello, E., Martini, B., Akselrod, P., Talay, S.: Hardware accelerated convolutional neural networks for synthetic vision systems. ISCAS. (2010)
Sermanet, P., Eigen, D., Zhang X., Mathieu M., Fergus R., LeCun, Y.: OverFeat: integrated recognition, localization and detection using convolutional networks. CVPR. (2014)
Dong, J., Xia, W., Chen, Q., Feng, J., Huang, Z., Yan, S.: Subcategory-aware object classification. CVPR. (2013)
Jun, Y., Ni, B., Kassim, A.A.: Half-CNN: a general framework for whole-image regression. CVPR. (2014)
Hugo, L., Bengio, Y., Louradour, J., Lamblin, P.: Exploring strategies for training deep neural networks. JMLR. (2009)
Yu, C., Yu, F.X., Feris, R.S., Kumar, S., Choudhary, A., Chang, S.-F.: Fast neural networks with circulant projections. (2015)
Jochem, T., Dean Pomerleau, AI.: Life in the fast lane the evolution of an adaptive vehicle control system. Magazine (1996)
Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks. JMLR. (2010)
Hastie, T., Friedman.: The Elements of Statistical Learning. 2nd ed., Springer, (2009)
Boureau, Y.-L., Le Roux, N., Bach, F., Ponce, J., Lecun, Y.: Ask the locals: multi-way local pooling for image recognition ICCV’11
Ren, W., Yan, S., Shan, Y., Dang, Q., Sun, G.: Deep image: scaling up image recognition. CVPR. (2015)
Karen, S., Simonyan, K.: http://imagenet.org/tutorials/cvpr2015/recent.pdf, ILSVRC Submission Essentials in the light of recent developments. ImageNet, Tutorial (2015)
Jon Shlens Google Research.: Directions in convolutional neural networks at Google, (2015), http://vision.stanford.edu/teaching/cs231n/slides/jon_talk.pdf
Sergey, I., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. CVPR. (2015)
Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. CVPR. (2014)
Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks. Int. Conf. Artif. Intell. Stat. (2010)
Chunhui, G., Lim, J.J., Arbelaez, P., Malik, J.: Recognition using regions. CVPR. (2009)
Ross G.: Fast R-CNN. CVPR. (2015)
Volodymyr, M., Heess, N., Graves, A., Kavukcuoglu, K.: Recurrent models of visual attention. NIPS. (2014)
Oriol, V., Toshev, A., Bengio, S., Erhan, D.: Show and tell: a neural image caption generator. (2015)
Ren, M., Kiros, R., Zemel, R.: Exploring models and data for image question answering. ICML (2015)
Subhashini, V., Rohrbach, M., Donahue, J., Mooney, R., Darrell, T., Saenko, K.: Sequence to sequence—video to text. (2015)
Graves, A.: Generating sequences with recurrent neural networks. (2014)
Schmidhuber, J., Wierstra, D., Gagliolo, M., Gomez, F.: Training recurrent networks by evolino. Neural Comput. (2007)
Weston, J., Chopra, S., Bordes, A.: Memory networks. ICLR. (2015)
LaRue, J.P.: A Bi-directional Neural Network Based on a Convolutional Neural Network and Associative Memory Matrices That Meets the Universal Approximation Theorem, Jadco Signals, Charleston, SC, USA, 1 315 717 9009 james@jadcosignals.com
Zhou, R.W., Quek, C.: DCBAM: A discrete chainable bidirectional associative memory. Pattern Recogn. Lett. (1991)
Kosko, B.: Bidirectional associative memories. IEEE Trans. Syst. Man Cybern. 7, 49–60 (1988)
Kohonen, T.: Correlation matrix memories. IEEE Trans. Comput. 353–359, (1972)
Hopfield, J.J.: Neural networks and physical systems with emergent collective computational abilities. Proc. Natl. Acad. Sci. U. S. A. 79(8), 2554–2558 (1982)
Schmidhuber, J.: Long Short-Term Memory: Tutorial on LSTM Recurrent Networks, http://people.idsia.ch/~juergen/lstm/
Hochreiter, S., Steven, Y.A., Conwell, P.R.: Learning to learn using gradient descent. ICANN. (2001)
Schmidhuber, J.: Learning to control fast-weight memories: an alternative to recurrent nets. Neural Comput. (1992)
Jeff, D., Hendricks, L.A., Guadarrama, S., Rohrbach, M., Venugopalan, S., Saenko, K., Darrell, T.: Long-term recurrent convolutional networks for visual recognition and description. CVPR. (2015)
Mengye, R., Kiros, R., Zemel, R.: Exploring models and data for image question answering. ICML. (2015)
Alex, G., Doktors der Naturwissenschaften.: Supervised Sequence Labelling with Recurrent Neural Networks
Graves, A., Fernandez, S., Schmidhuber, J.: Multi-dimensional recurrent neural networks. ICANN. (2007)
Baldi, P., Pollastri, G.: The principled design of large-scale recursive neural network architectures—DAG-RNN’s and the protein structure prediction problem. JMLR. (2003)
Karol, G., Danihelka, I., Graves, A., Rezende, D., Wierstra, D.: DRAW: a recurrent neural network for image generation. ICML. (2015)
Richard, S., Huval, B., Bhat, B., Manning, C.D., Ng, A.Y.: Convolutional-recursive deep learning for 3D object classification. NIPS. (2012)
B., Shuai, Zuo, Z., Gang, W.: Quaddirectional 2D-recurrent neural networks for image labeling. IEEE SPL. (2015)
Zuo, Z., Shuai, B., Wang, G., Liu, X., Wang, X., Wang, B., Chen, Y.: Convolutional recurrent neural networks: learning spatial dependencies for image representation. CVPR. (2015)
Alex, G., Schmidhuber, J.: Offline handwriting recognition with multidimensional recurrent neural networks. NIPS. (2008)
Graves, A., Fernandez, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. ICML. (2012)
Kyunghyun, C., van Merrienboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., Bengio, Y.: Learning phrase representations using RNN encoder-decoder for statistical machine translation. EMNLP. (2014)
Kyunghyun, C., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: encoder-decoder approaches. SSST-8. (2014)
Peter, T., Horne, B.G., Lee Giles, C.: Collingwood, P.C.: Finite state machines and recurrent neural networks—automata and dynamical systems approaches. Neural Networks Pattern Recogn. Chapter 6, (1998)
Arai, K., Nakano, R.: Stable behavior in a recurrent neural network for a finite state machine. Neural Netw. 13(6), (2000)
Wojciech, Z., Sutskever, I.: Learning to execute
Rumelhart, D.E., McClelland, J.L.: Parallel Distributed processing: explorations in the microstructure of cognition. (1986)
Elman, J.L.: Finding structure in time. Cogn. Sci. (1990)
Elman, J.L.: Distributed representations, simple recurrent networks, and grammatical structure. Mach. Learn. (1991)
Elman, J.L.: Learning and development in neural networks: the importance of starting small. Cognition (1993)
Williams, R.J., Zipser, D.: Gradient-Based Learning Algorithms for Recurrent Networks and Their Computational Complexity. Back-propagation: Theory, Architectures and Applications, Lawrence Erlbaum Publishers, (1995)
Robinson, A.J., Fallside, F.: The Utility Driven Dynamic Error Propagation Network. Technical Report CUED/F-INFENG/TR.1, Cambridge, (1987)
Werbos, P.: Backpropagation through time: what it does and how to do it. Proc. IEEE (1990)
Boden, M.: A guide to recurrent neural networks and backpropagation. (2014)
Ders, F.: Long Short-Term Memory in Recurrent Neural Networks, PhD Dissertation, (2001)
Qi, L., Zhu, J.: Revisit long short-term memory: an optimization perspective. NIPS. (2015)
Sutskever, I., Vinyals, O., Le, QV.: Sequence to sequence learning with neural networks. NIPS. (2014)
Kyunghyun, C., van Merrienboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., Bengio, Y.: Learning Phrase Representations Using RNN Encoder-Decoder for Statistical Machine Translation. (2014)
Liang, M., Hu, X.: Recurrent convolutional neural network for object recognition. CVPR. (2015)
Socher, R., Lin, C.C., Manning, C., Ng, A.Y.: Parsing natural scenes and natural language with recursive neural networks. In: Proceedings of the 28th International Conference on Machine Learning (ICML), (2011)
Socher, R., Manning, C.D., Ng, A.Y.: Learning continuous phrase representations and syntactic parsing with recursive neural networks. In: Advances in Neural Information Processing Systems, NIPS. (2010)
Volodymyr, M., Heess, N., Graves, A., Kavukcuoglu, K.: Recurrent Models of Visual Attention
Steve, B., Wah, C., Schroff, F., Babenko, B., Welinder, P., Perona, P., Belongie, S.: Visual recognition with humans in the loop. In Computer Vision–ECCV, Springer, (2010)
Tom, S., Glasmachers, T., Schmidhuber, J.: High dimensions and heavy tails for natural evolution strategies. Proceedings of the 13th Annual Conference on Genetic and Evolutionary Computation. ACM. (2011)
Zaremba, W., Sutskever, I.: Reinforcement Learning Neural Turing Machines. (2015)
Hebb, D.: The Organization of Behaviour. Wiley, New York (1949)
Liefeng, B., Lai, K., Ren, X., Fox, D.: Object recognition with hierarchical kernel descriptors. CVPR. (2011)
Ivakhnenko, G.A., Cerda R.: Inductive Self-Organizing GMDH Algorithms for Complex Systems Modeling and Forecasting, http://www.gmdh.net/articles/index.html, see the general GMDH website for several other resources, http://www.gmdh.net
The review of problems solvable by algorithms of the group method of data handling. Pattern Recogn. Image Anal. (1995), www.gmdh.net/articles/
Ladislav, Z.: Learning simple dependencies by polynomial neural network. J. Inform. Contr. Manag. Syst. 8(3), (2010)
Liefeng, B., Sminchisescu, C.: Efficient match kernel between sets of features for visual recognition. NIPS. (2009)
Julesz, B.: Textons, the elements of texture perception and their interactions. Nature 290, 91–97 (1981)
Zhang, J., Marszałek, M., Lazebnik, S., Schmid, C.: Local features and kernels for classification of texture and object categories: a comprehensive study. IJCV. (2007)
Lazebnik, S., Schmid, C., Ponce, J.: A maximum entropy framework for part-based texture and object recognition. IEEE CV. (2005)
Lampert, C.H.: Kernel methods in computer vision. Found. Trends Comput. Graph. Vis. 4(3), 193–285 (2009)
Jurie, F., Triggs, B.: Creating efficient codebooks for visual recognition. ICCV. (2005)
Youngmin, C., Saul, L.K.: Kernel methods for deep learning. NIPS. (2009)
Vedaldi, A., Gulshan, V., Varma, M., Zisserman, A.: Multiple kernels for object detection. (2009)
Varma, M., Ray, D.: Learning the discriminative power-invariance trade-off. Int. Conf. Comput. Vis. (2007)
Klaus-Robert, M., Mika, S., Rätsch, G., Tsuda, K., Schölkopf, B.: An introduction to kernel-based learning algorithms. IEEE TNN. (2001)
Nilsback, M.-E., Zisserman, A.: A visual vocabulary for flower classification. In: CVPR. (2006)
Liefeng, B., Ren, X., Fox, D., Kernel descriptors for visual recognition. NIPS. (2010)
Boswell, D.: Introduction to Support Vector Machines. (2002)
Radu Tudor, I., Popescu, M., Grozea, C.: Local learning to improve bag of visual words model for facial expression recognition. ICML. (2013)
Haussler. D.: Convolution kernels on discrete structures. Tech. Rep. (1999)
Pati, Y.C., Rezaiifar, R., Krishnaprasad, P.S.: Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition. Asilomar Conf. Signals Syst. Comput. (1993)
Aharon, M., Elad, M., Bruckstein, A.: K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation. IEEE Trans. Signal Process. 54(11), 4311–4322 (2006)
Bruna, J., Mallat, S.: Invariant Scattering Convolution Networks. (2012)
Wonmin, B., Breuel, T.M., Raue, F., Liwicki, M.: Scene labeling with LSTM recurrent neural networks. CVPR. (2015)
Du, Y., Wei, W., Liang, W.: Hierarchical recurrent neural network for skeleton based action recognition. CVPR. (2015)
Jianchao, Y., Yu, K., Lv, F., Huang, Yihong Gong, T.: Locality-constrained Linear Coding for image classification. CVPR (2001) Jinjun Wang Akiira Media Syst., Palo Alto, CA, USA
Reubold, J.: Kernel descriptors in comparison with hierarchical matching pursuit. Seminar Thesis, Proceedings of the Robot Learning Seminar, (2010)
John, S.-T., Cristianini, N.: Kernel Methods for Pattern Analysis. Cambridge University Press, (2004)
Hofmann, T., Scholkopf, B., Smola, A.J.: Kernel methods in machine learning. Ann. Stat.
Rojas, R: Neural Networks—A Systematic Introduction, Springer, (1996)
Teknomo, K.: Support Vector Machines Tutorial
Vladimir, C., Mulier, F.M.: Learning from Data: Concepts, Theory, and Methods, 2nd ed., Wiley, (2007)
Dan, C., Meier, U., Schmidhuber, J.: Multi-column Deep Neural Networks for Image Classification. CVPR. (2012)
Amnon, S., Hazan, T.: Algebraic set kernels with application to inference over local image representations. (2005)
Gehler, P, Nowozin, S.: On feature combination for multiclass object classification. CVPR. (2009)
Lanckriet, G.R.G., Cristianini, N., Bartlett, P., El Ghaoui, L., Jordan, M.I.: Learning the kernel matrix with semidefinite programming. JMLR. (2004)
Mairal, J., Koniusz, P., Harchaoui, Z., Schmid, C.: Convolutional kernel networks. NIPS. (2009)
Candes, E., Romberg, J.: Sparsity and incoherence in compressive sampling. Inverse Probl. 23, 969 (2007)
Kai, Y., Lin, Y., Lafferty, J.: Learning image representations from the pixel level via hierarchical sparse coding. CVPR. (2011)
Jian, Z.F., Song, L., Yang X.K., Zhang, W.: Sub clustering K-SVD: size variable dictionary learning for sparse representations. ICIP. (2009)
Olshausen, B., Field, D.: Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature. (1996)
Mallat, S.G., Zhang, Z.: Matching pursuits with time-frequency dictionaries. IEEE Trans. Signal Process. 3397–3415, (1993)
Kwon, S., Wang, J., Shim, B.: Multipath matching pursuit. IEEE Trans. Inform. Theor. (2014)
Lloyd, S.P.: Least square quantization in PCM. Bell Telephone Laboratories Paper. Published in journal much later: Lloyd, S.P.: Least squares quantization in PCM, IEEE Trans. Inform. Theor. (1957/1982)
Voronoi, G.: Nouvelles applications des paramètres continus à la théorie des formes quadratiques. Journal für die Reine und Angewandte Mathematik 133(133), 97–178 (1908)
Mairal, J.: Sparse Coding for Machine Learning, Image Processing and Computer Vision. PhD thesis. Ecole Normale Superieure de Cachan. (2010)
Mairal, J., Sapiro, G., Elad, M.: Multiscale sparse image representation with learned dictionaries. In: IEEE International Conference on Image Processing, San Antonio, Texas, USA, (2007), Oral Presentation
Mairal, J., Sapiro, G., Elad, M.: Learning multiscale sparse representations for image and video restoration. SIAM Multiscale Model. Simul. 7(1), 214–241 (2008)
Mairal, J., Jenatton, R., Obozinski, G., Bach, F.: Learning hierarchical and topographic dictionaries with structured sparsity. In: Proceeding of the SPIE Conference on Wavelets and Sparsity XIV. (2011)
Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification, 2nd edn. Wiley-Interscience, New York (2000)
Ethem, A.: Introduction to Machine Learning, MIT Press, (2004)
Tom, M.: Machine Learning, McGraw Hill, (1997)
LeCun, Y., Chopra, S., Hadsell, R., Huang, F.-J., Ranzato, M.-A.: A Tutorial on Energy-Based Learning, in Predicting Structured Outputs, MIT Press, (2006)
Pursuit, R.R., Zibulevsky, M., Elad, M.: Efficient Implementation of the K-SVD algorithm using Batch Orthogonal Matching. Technical Report—CS Technion, (2008)
Riesenhuber, M., Poggio, T.: Hierarchical models of object recognition in cortex. Nature. (1999)
Logothetis, N.K., Pauls, J., Poggio, T.: Shape representation in the inferior temporal cortex of monkeys. Curr. Biol. 5(5), 552–563 (1995)
Tarr, M.: News on views: pandemonium revisited. Nat. Neurosci. (1999)
Selfridge, O.G.: Pandemonium: a paradigm for learning. Proceedings of the Symposium on Mechanisation of Thought Processes (1959)
Bülthoff, H., Edelman, S.: Psychophysical support for a two-dimensional view interpolation theory of object recognition. Proc. Natl. Acad. Sci. U. S. A. 89, 60–64 (1992)
Logothetis, N., Pauls, J., Bülthoff, H., Poggio, T.: Shape representation in the inferior temporal cortex of monkeys. Curr. Biol. 4, 401–414 (1994)
Tarr, M.: Rotating objects to recognize them: a case study on the role of viewpoint dependency in the recognition of three-dimensional objects. Psychonom Bull. Rev. 2, 55–82 (1995)
Booth, M., Rolls, E.: View-invariant representations of familiar objects by neurons in the inferior temporal visual cortex. Cereb. Cortex 8, 510–523 (1998)
Kobatake, E., Wang, G., Tanaka, K.: Effects of shape-discrimination training on the selectivity of inferotemporal cells in adult monkeys. J. Neurophysiol. 80, 324–330 (1998)
Perrett, D., et al.: Viewer-centred and object-centred coding of heads in the macaque temporal cortex. Exp. Brain Res. 86, 159–173 (1991)
Perrett, D.I., Rolls, E.T., Caan, W.: Visual neurons responsive to faces in the monkey temporal cortex. Exp. Brain Res. 47, 329–342 (1982)
Tanaka, K., Saito, H.-A., Fukada, Y. & Moriya, M.: Coding visual images of objects in the inferotemporal cortex of the macaque monkey. J. Neurophysiol. 66, 170–189
Parental olfactory experience influences behavior and neural structure in subsequent generations. Nat. Neurosci. 17, 89–96, (2014)
Gjoneska, E., Pfenning, A., Mathys, H., Quon, G., Kundage, A., Tsai, L.H., Kellis, M.: Conserved epigenomic signals in mice and humans reveal immune basis of Alzheimer’s disease. Nature (2015), doi: 10.1038/nature14252
Tanaka, K.: Inferotemporal cortex and object vision. Annu. Rev. Neurosci. 19, 109–139 (1996)
Logothetis, N.K., Sheinberg, D.L.: Visual object recognition. Annu. Rev. Neurosci. 19, 577–621 (1996)
Mutch, J., Lowe, D.: Multiclass object recognition with sparse, localized features. CVPR. (2006)
Serre, R.: Realistic modeling of simple and complex cell tuning in the HMAX model, and implications for invariant object recognition in cortex. CBL Memo. 239 (2004)
Hu, X.-L., Zhang, J.-W., Li, J.-M., Zhang, B.: Sparsity-regularized HMAX for visual recognition. PLOS One. 9(1), (2014)
Charles, C., Kouh, M., Riesenhuber, M., & Poggio, T.: Shape Representation in V4: Investigating Position-Specific Tuning for Boundary Conformation with the Standard Model of Object Recognition. AI Memo 2004-024 (2004)
Christian, T., Thome, N., Cord, M.: HMAX-S: deep scale representation for biologically inspired image categorization. ICIP. (2011)
Riesenhuber, M., Poggio, T.: Neural mechanisms of object recognition. Curr. Opin. Neurobiol. 12, 162–168 (2002)
Ungerleider, L.G., Haxby, J.V.: “What” and “Where” in the human brain. Curr. Opin. Neurobiol. 4, 157–165a, (1994), National Institute of Mental Health, Bethesda, USA
Serre, T., Wolf, L., Bileschi, S., Riesenhuber, M., Poggio, T.: Robust object recognition with cortex-like mechanisms. PAMI. (2007)
Mutch, J.: HMAX architecture models slide presentation. (2010)
Perronnin, F., Dance, C.: Fisher kernels on visual vocabularies for image categorization. In: Proceedings of CVPR, (2006)
Florent, P., Sánchez, J., Mensink, T.: Improving the fisher kernel for large-scale image classification. ECCV. (2010)
Giorgos, T., Avrithis, Y., Jégou, H.: To aggregate or not to aggregate: selective match kernels for image search. ICCV. (2013)
Jaakkola, T., Haussler, D.: Exploiting generative models in discriminative classifiers. In: NIPS, (1999)
Jegou, H., Douze, M., Schmid, C., Perez, P.: Aggregating local descriptors into a compact image representation. INRIA Rennes, Rennes, France, CVPR. (2010)
Relja, A., Zisserman, A.: All about VLAD. CVPR. (2013)
Chatfield, K., Lempitsky, V., Vedaldi, A., Zisserman, A.: The devil is in the details: an evaluation of recent feature encoding methods. Br. Mach. Vis. Conf. (2011)
Zhou, X., Yu, K., Zhang, T., Huang, T.S.: Image classification using super-vector coding of local image descriptors. In: Proceedings of ECCV, (2010)
van Gemert, J.C., Geusebroek, J.M., Veenman, C.J., Smeulders, A.W.M.: Kernel codebooks for scene categorization. In: Proceedings of ECCV, (2008)
Perronnin, F., Liu, Y., S´anchez, J., Poirier, H.: Large-scale image retrieval with compressed fisher vectors. CVPR. (2010)
Perronnin, F., Sánchez, J., Mensink, T.: Improving the fisher kernel for large-scale image classification. In: Proceedings of ECCV, (2010)
J´egou, H., Douze, M., Schmid, C.: Improving bag-of-features for large scale image search. Int. J. Comput. Vis. 87(3), 316–336 (2010)
Farabet, C., Couprie, C., Najman, L., LeCun, Y.: Learning hierarchical features for scene labeling. IEEE PAMI. (2012)
Hong Lau, K., Tay, Y.H., Lo, F.L.: A HMAX with LLC for visual recognition. CVPR. (2015)
Smith, K.: Brain decoding: reading minds. Nature 502(7472), (2013)
Smith, K.: Mind-reading with a brain scan. Nature (2008)
Bartholomew-Biggs, M., Brown, S., Christianson, B., Dixon, L.: “Automatic differentiation of algorithms” (PDF). J. Comput. Appl. Math. 124(1-2), 171–190 (2000)
Plaut, D., Nowlan, S., Hinton, G.: Experiments on Learning by Back Propagation, Carnegie Mellon University, (1986)
Cayley, A.: On the theory of groups, as depending on the symbolic equation θ n = 1. Phil. Mag. 7, (1854)
Cayley, A.: On the theory of groups. Am. J. Math. 11 (1889)
Voytek, B.: Brain metrics. Nature (2013)
Langleben Daniel, D., Dattilio Frank, M.: Commentary: the future of forensic functional brain imaging. J. Am. Acad. Psychiatry Law 36(4), 502–504 (2008)
Finn, E.S., Shen, X., Scheinost, D., Rosenberg, M.D., Huang, J., Chun, M.M., Papademetris, X., Todd Constable, R.: Functional connectome fingerprinting: identifying individuals using patterns of brain connectivity. Nature (2015)
Bergami, M., Masserdotti, G., Temprana, S.G., Motori, E., Eriksson, T.M., Göbel, J., Yang, S.M., Conzelmann, K.-K., Schinder, A.F., Götz, M., Berninger, B.: A critical period for experience-dependent remodeling of adult-born neuron connectivity. Neuron (2015)
Allen Lee, W.-C., Huang, H., Feng, G., Sanes, J.R., Brown, E.N., So, P.T., Nedivi, E.: Dynamic remodeling of dendritic arbors in gabaergic interneurons of adult visual cortex. PLoS 4(2), e29 (2006)
Wu, Z., Shuran, S., Aditya, K., Fisher, Y., Linguang, Z., Xiaoou, T., Jianxiong, X.: 3D ShapeNets: a deep representation for volumetric shapes. CVPR. (2015)
Xiang, Y., Wongun, C., Yuanqing, L., Silvio, S.: Data-driven 3D voxel patterns for object category recognition. CVPR. (2015)
Papazov, C., Marks, T.K., Jones, M.: Real-time 3D head pose and facial landmark estimation from depth images using triangular surface patch features. CVPR. (2015)
Martinovic, A., Jan, K., Riemenschneider, H., Van Gool, L.: 3D All the way: semantic segmentation of urban scenes from start to end in 3D. CVPR. (2015)
Rock, J., Tanmay, G., Justin, T., JunYoung, G., Daeyun, S., Derek, H.: Completing 3D object shape from one depth image. CVPR. (2015)
Yub, J., Lee, H., Seok Heo, S., Dong Yun, Y., II.: Random tree walk toward instantaneous 3D human pose estimation. CVPR. (2015)
Shape Priors Karimi Mahabadi, R., Hane, C., Pollefeys, M.: Segment based 3D object shape priors. CVPR (2015)
Xiaowei, Z., Spyridon, L., Xiaoyan, H., Kostas, D.: D shape estimation from 2D landmarks: a convex relaxation approach. CVPR (2015)
Levi, G., Hassner, T.: LATCH: learned arrangements of three patch codes, arXiv preprint arXiv:1501.03719 (2015)
He, K., Zhang, X., Ren, S., Sun, J.: Deep Residual Learning for Image Recognition. (2015)
Hinton, G.E., Vinyals, O., Dean, J.: Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 (2015)
Romero, A., Nicolas, B., Samira Ebrahimi, K., Antoine, C., Carlo, G., Yoshua, B.: FitNets: hints for thin deep nets. arXiv:1412.6550 [cs], (2014)
Bucila, C., Caruana, R., Niculescu-Mizil, A.: Model compression. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’06, ACM (2006)
Bengio, Y.: Learning deep architectures for AI. Found. Trends Mach. Learn. (2009)
Nikolaus, M., Eddy, I., Philip H., Philipp F., Daniel C., Alexey D., Thomas B.: A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation. CVPR, (2016)
Horn, B.K.P.: Shape from Shading: A Method for Obtaining the Shape of a Smooth Opaque Object from One View, MIT DARPA report, (1970)
Mutto, C.D., Zanuttigh, P., Cortelazzo, G.M.: Microsoft Kinect™ Range Camera. Springer, (2014)
Mojsilovic, A.: A method for color naming and description of color composition in images, ICIP, (2002)
van de Weijer, J., Schmid, C., Verbeek, J.: Learning color names from real world images. CVPR, (2007)
Khan, R., Van de Weijer, J., Shahbaz Khan, F., Muselet, D., Ducottet, C., Barat, C.: Discriminative Color Descriptors. CVPR, (2013)
van de Weijer, J., Schmid, C.: Coloring Local Feature Extraction. ECCV, (2006)
Sung-Hyauk Cha.: Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions, IJMMMAS, (see also Duda [826])
Deza, E., Deza, M.M.: Dictionary of Distances, Elsevier, (2006)
Glasner, D., Bagon, S., Irani, M.: Super-Resolution From a Single Image. ICCV, (2009)
Vedaldi, V., Varma, G.M., Zisserman, A.: Multiple Kernels for Object Detection A. (2009)
Vondrick, C., Khosla, A., Malisiewicz, T., Torralba, A.: HOGgles: Visualizing Object Detection Features. ICCV, (2013)
Huang, Y., Nat. Lab. of Pattern Recognition (NLPR); Inst. of Autom.; Beijing, China; Wu, Z., Wang, L., Tan, T., PAMI.: Feature Coding in Image Classification: A Comprehensive Study, (2014)
Hornik, K., Stinchcombe, M., White, H.: Multilayer feedforward networks are universal approximators. Neural Networks 2(5), 359–366 (1989)
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., Li, F.-F.: Imagenet: a large-scale hierarchical image database. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pp. 248–255. IEEE, 2009
Targ, S., Almeida, D., Lyman K.: Resnet in Resnet: generalizing residual architectures, arXiv: 1603.08029. (2016)
Szegedy, C., Ioffe, S., Vanhoucke, V.: Inception-v4, Inception-ResNet and the impact of residual connections on learning. arXiv: 1602.07261, (2016)
© 2016 Springer International Publishing Switzerland
Krig, S. (2016). Global and Regional Features. In: Computer Vision Metrics. Springer, Cham. https://doi.org/10.1007/978-3-319-33762-3_3
Print ISBN: 978-3-319-33761-6
Online ISBN: 978-3-319-33762-3