Abstract
This chapter covers the metrics of general feature description, often used for whole images and image regions, including textural, statistical, model-based, and basis space methods. Texture, a key metric, is a well-known topic within image processing, and it is commonly divided into structural and statistical methods. Structural methods look for features such as edges and shapes, while statistical methods are concerned with pixel value relationships and statistical moments. Methods for modeling image texture also exist, primarily useful for image synthesis rather than for description. Basis spaces, such as the Fourier space, are also used for feature description.
Measure twice, cut once.
—Carpenter’s saying
Keywords
- Basis Space
- Local Binary Pattern
- Interest Point
- Image Code
- Bidirectional Reflectance Distribution Function
It is difficult to develop clean partitions between the related topics in image processing and computer vision that pertain to global vs. regional vs. local feature metrics; there is considerable overlap in the applications of most metrics. However, for this chapter, we divide these topics along reasonable boundaries, though those borders may appear to be arbitrary. Similarly, there is some overlap between discussions here on global and regional features and topics that are covered in Chap. 2 on image processing and that are discussed in Chap. 6 on local features. In short, many methods are used for local, regional, and global feature description, as well as image processing, such as the Fourier transform and the LBP.
But we begin with a brief survey of some key ideas in the field of texture analysis and general vision metrics.
Historical Survey of Features
To compare and contrast global, regional, and local feature metrics, it is useful to survey and trace the development of the key ideas, approaches, and methods used to describe features for machine vision. This survey includes image processing (textures and statistics) and machine vision (local, regional, and global features). Historically, the choice of feature metrics was limited to those that were computable at the time, given the limitations in compute performance, memory, and sensor technology. As time passed and technology developed, the metrics have become more complex to compute, consuming larger memory footprints. Images are becoming multimodal, combining intensity, color, multiple spectra, depth sensor information, multiple-exposure settings, high dynamic range imagery, faster frame rates, and more precision and accuracy in x, y, and Z depth. Increases in memory bandwidth and compute performance, therefore, have given rise to new ways to describe feature metrics and perform analysis.
Many approaches to texture analysis have been tried; these fall into the following categories:
- Structural, describing texture via a set of micro-texture patterns known as texels. Examples include the numerical description of natural textures such as fabric, grass, and water. Edges, lines, and corners are also structural patterns, and the characteristics of edges within a region, such as edge direction, edge count, and edge gradient magnitude, are useful as texture metrics. Histograms of edge features can be made to define texture, similar to the methods used in local feature descriptors such as SIFT (described in Chap. 6).

- Statistical, based on gray level statistical moments describing point and pixel-area properties, including methods such as the co-occurrence matrix or SDM. For example, regions of an image with color intensity within a close range could be considered as having the same texture. Regions with the same histogram could be considered as having the same texture.

- Model based, including fractal models, stochastic models, and various semi-random fields. Typically, the models can be used to generate synthetic textures, but they may not be effective in recognizing texture, and we do not cover texture generation.

- Transform or basis based, including methods such as Fourier, wavelets, Gabor filters, Zernike, and other basis spaces, which are treated here as a subclass of the statistical methods (statistical moments); however, basis spaces are used in transforms for image processing and filtering as well.
Key Ideas: Global, Regional, and Local Metrics
Let us take a brief look at a few major trends and milestones in feature metrics research. While this brief outline is not intended to be a precise, inclusive look at all key events and research, it describes some general trends in mainstream industry thinking and academic activity.
1960s, 1970s, 1980s—Whole-Object Approaches
During this period, metrics described mostly whole objects, larger regions, or images; pattern matching was performed on large targets via FFT spectral methods and correlation; recognition methods included object, shape, and texture metrics; and simple geometric primitives were used for object composition. Low-resolution images such as NTSC, PAL, and SECAM were common, primarily gray scale with some color when adequate memory was available. Some satellite images were available to the military with higher resolution, such as LANDSAT images from NASA and SPOT images from France.
Some early work on pattern recognition began to use local interest points and features: notably, Moravec [502] developed a local interest point detector in 1981, and in 1988 Harris and Stephens [148] developed local interest point detectors. Commercial systems began to appear, particularly the View PRB in the early 1980s, which used digital correlation and scale space super-pixels for coarse-to-fine matching, and real-time image processing and pattern recognition systems were introduced by Imaging Technology. Rack-mounted imaging and machine vision systems began to be replaced by workstations and high-end PCs with add-on imaging hardware, array processors, and software libraries and applications by companies such as Krig Research.
Early 1990s—Partial-Object Approaches
Compute power and memory were increasing, enabling more attention to local feature methods, such as developments from Shi and Tomasi [149] improving the Harris detector methods, Kitchen and Rosenfeld [200] developing gray level corner detection methods, and methods by Wang and Brady [205]. Image moments over polygon shapes were computed using Zernike polynomials in 1990 by Khotanzad and Hong [268]. Scale space theory was applied to computer vision by Lindeberg [502], and many other researchers followed this line of thinking, such as Lowe [153] in 2004.
Metrics described smaller pieces of objects or object components and parts of images; there was increasing use of local features and interest points. Large sets of sub-patterns or basis vectors were used and corresponding metrics were developed. There was increased use of color information; more methods appeared to improve invariance for scale, rotational, or affine variations; and recognition methods were developed based on finding parts of an object with appropriate metrics. Higher image resolution, increased pixel depths, and color information were increasingly used in the public sector (especially in medical applications), along with new affordable image sensors, such as the KODAK MEGA-PLUS, which provided a 1024 × 1024 image.
Mid-1990s—Local Feature Approaches
More focus was put on metrics that identify small local features surrounding interest points in images. Feature descriptors added more details from a window or patch surrounding each feature, and recognition was based on searching for sets of features and matching descriptors with more complex classifiers. Descriptor spectra included gradients, edges, and colors.
Late 1990s—Classified Invariant Local Feature Approaches
New feature descriptors were developed and refined to be invariant to changes in scale, lightness, rotation, and affine transformations. Work by Schmid and Mohr [340] advanced and generalized the local feature description methods. Features acted as an alphabet for spelling out complex feature descriptors or vectors whereby the vectors were used for matching. The feature matching and classification stages were refined to increase speed and effectiveness using neural nets and other machine learning methods [134].
Early 2000s—Scene and Object Modeling Approaches
Scenes and objects were modeled as sets of feature components or patterns with well-formed descriptors; spatial relationships between features were measured and used for matching; and new complex classification and matching methods used boosting and related methods to combine strong and weak features for more effective recognition. The SIFT [153] algorithm from Lowe was published; SURF was also published by Bay et al. [152], taking a different approach using HAAR features rather than just gradients. The Viola–Jones method [486] was published, using HAAR features and a boosted learning approach to classification, accelerating matching. The OpenCV library for computer vision was developed by Bradski at INTEL™, and released as open source.
Mid-2000s—Finer-Grain Feature and Metric Composition Approaches
The number of researchers in this field began to mushroom; various combinations of features and metrics (bags of features) were developed by Csurka et al. [226] to describe scenes and objects using key points as described by Sivic [503]; new local feature descriptors were created and old ones refined; and there was increased interest in real-time feature extraction and matching methods for commercial applications. Better local metrics and feature descriptors were analyzed, measured, and used together for increased pattern match accuracy. Also, feature learning and sparse feature codebooks were developed to decrease pattern space, speed up search time, and increase accuracy.
Post-2010—Multimodal Feature Metrics Fusion
There has been increasing use of depth sensor information and depth maps to segment images, describe features, and create voxel metrics; see, for example, Rusu et al. [380], where 2D texture metrics are extended into 3-space. 3D depth sensing methods have proliferated; increasing use of high-resolution images and high dynamic range (HDR) images enhances feature accuracy; and greater bit depth and accuracy of color images allow for valuable color-based metrics and computational imaging. Increased processing power and cheap, plentiful memory handle larger images on low-cost compute platforms. Faster and better feature descriptors using binary patterns have been developed and matched rapidly using Hamming distance, such as FREAK by Alahi et al. [122] and ORB by Rublee et al. [112]. Multimodal and multivariate descriptors [770, 771] are composed of image features combined with other sensor information, such as accelerometers and positional sensors.
Future computing research may even come full circle, when sufficient compute and memory capacity exist to perform the older methods, like correlation across multiple scales and geometric perspectives in real-time using parallel and fixed-function hardware methods. This would obviate some of the current focus on small invariant sets of local features and allow several methods to be used together, synergistically. Therefore, the history of development in this field is worth knowing, since it might repeat itself in a different technological embodiment.
Since there is no single solution for obtaining the right set of feature metrics, all the methods developed over time have applications today and are still in use.
Textural Analysis
One of the most basic metrics is texture, which is the description of the surface of an image channel, such as color intensity, like an elevation map or terrain map. Texture can be expressed globally or within local regions. Texture can be expressed locally by statistical relationships among neighboring pixels in a region, and it can be expressed globally by summary relationships of pixel values within an image or region. For a sampling of the literature covering a wide range of texture methods, see Refs. [13, 16–20, 52, 53, 302, 304, 305].
According to Gonzalez [4], there are three fundamental classes of texture in image analysis: statistical, structural, and spectral. Statistical measures include histograms, scatter plots, and SDMs. Structural techniques are more concerned with locating patterns or structural primitives in an image, such as parallel lines, regular patterns, and so on. These techniques are described in [1, 5, 8, 11]. Spectral texture is derived from analysis of the frequency domain representation of the data. That is, a fast Fourier transform is used to create a frequency domain image of the data, which can then be analyzed using Fourier techniques.
Histograms reveal overall pixel value distributions but say nothing about spatial relationships. Scatter plots are essentially two-dimensional histograms, and do not reveal any spatial relationships. A good survey is found in Ref. [307].
Texture has been used to achieve several goals:
- Texture-based segmentation (covered in Chap. 2).

- Texture analysis of image regions (covered in this chapter).

- Texture synthesis, creating images using synthetic textures (not covered in this book).
In computer vision, texture metrics are devised to describe the perceptual attributes of texture by using discrete methods. For instance, texture has been described perceptually with several properties, including:
- Contrast

- Color

- Coarseness

- Directionality

- Line-likeness

- Roughness

- Constancy

- Grouping

- Segmentation
If textures can be recognized, then image regions can be segmented based on texture, and the corresponding regions can be measured using shape metrics such as area, perimeter, and centroid (as discussed in Chap. 6). Chapter 2 included a survey of segmentation methods, some of which are based on texture. Segmented texture regions can be recognized and compared for computer vision applications. Micro-textures of a local region, such as the LBP discussed in detail in Chap. 6, can be useful as a feature descriptor, and macro-textures can be used to describe a homogeneous texture of a region such as a lake or field of grass, and therefore have natural applications to image segmentation. In summary, texture can be used to describe global image content, image region content, and local descriptor region content. The distinction between a feature descriptor and a texture metric may be small.
Sensor limitations combined with compute and memory capabilities of the past have limited the development of texture metrics to mainly 2D gray scale metrics. However, with the advances toward pervasive computational photography in every camera providing higher resolution images, higher frame rates, deeper pixels, depth imaging, more memory, and faster compute, we can expect that corresponding new advances in texture metrics will be made.
Here is a brief historical survey of texture metrics.
1950s Through 1970s—Global Uniform Texture Metrics
Auto-correlation, or cross-correlation, was developed by Kaizer [26] in 1955 as a method of looking for randomness and repeating pattern features in aerial photography. Auto-correlation is a statistical method of correlating a signal or image with a time-shifted version of itself, yielding a computationally simple method to analyze ground cover and structures.
Bajcsy [25] developed Fourier spectrum methods in 1973 using various types of filters in the frequency domain to isolate various types of repeating features as texture.
Gray level spatial dependency matrices, GLCMs, SDMs or co-occurrence matrices [6] were developed and used by Haralick in 1973, along with a set of summary statistical metrics from the SDMs to assist in numerical classification of texture. Some, but not all, of the summary metrics have proved useful; however, analysis of SDMs and development of new SDM metrics have continued, involving methods such as 2D visualization and filtering of the SDM data within spatial regions [23], as well as adding new SDM statistical metrics, some of which are discussed in this chapter.
1980s—Structural and Model-Based Approaches for Texture Classification
While early work focused on micro-textures describing statistical measures between small kernels of adjacent pixels, macro-textures were developed to address the structure of textures within a larger region. K. Laws developed texture energy-detection methods in 1979 and 1980 [27–29], as well as texture classifiers, which may be considered the forerunners of some of the modern classifier concepts. The Laws method could be implemented as a texture classifier in a parallel pipeline, with stages for taking gradients via a set of convolution masks over Gaussian-filtered images to isolate texture micro-features, followed by a Gaussian smoothing stage to deal with noise, followed by an energy calculation from the combined gradients, followed by a classifier that matches texture descriptors.
Eigenfilters were developed by Ade [30] in 1983 as an alternative to the Laws gradient or energy methods and SDMs; eigenfilters are implemented using a covariance matrix representation of local 3 × 3 pixel region intensities, which allows texture analysis and aggregation into structure based on the variance within eigenvectors in the covariance matrix.
Structural approaches were developed by Davis [31] in 1979 to focus on the gross structure of texture rather than primitives or micro-texture features. Hough transforms were invented in 1972 by Duda and Hart [220] as a method of finding lines and curves, and they were used by Eichmann and Kasparis [32] in 1988 to provide invariant texture description.
Fractal methods and Markov random field methods were developed into texture descriptors. While these methods may be good for texture synthesis, they do not map well to texture classification: both rely on random fields, and so are limited when applied to real-world textures that are not random.
1990s—Optimizations and Refinements to Texture Metrics
In 1993, Lam and Ip [33, 39] used pyramid segmentation methods to achieve spatial invariance: an image is segmented into homogeneous regions using Voronoi polygon tessellation and irregular pyramid segmentation techniques around Q points taken from a binary thresholded image, and five shape descriptors are calculated for each polygon (area, perimeter, roundness, orientation, and major/minor axis ratio), which are combined into texture descriptors.
Local binary patterns (LBP) were developed in 1994 by Ojala et al. [165] as a novel method of encoding both pattern and contrast to define texture [15, 16, 35, 36]. Since then, hundreds of researchers have added to the LBP literature in the areas of theoretical foundations, generalization into 2D and 3D, domain-specific interest point descriptors used in face detection, and spatiotemporal applications to motion analysis [34]. LBP research remains quite active at this time. LBPs are covered in detail in Chap. 6. There are many applications for the powerful LBP method as a texture metric, a feature descriptor, and an image processing operator, the latter of which was discussed in Chap. 2.
2000 to Today—More Robust Invariant Texture Metrics and 3D Texture
Feature metrics research is investigating texture metrics that are invariant to scale, rotation, lighting, perspective, and so on to approach the capabilities of human texture discrimination. In fact, texture is used interchangeably as a feature descriptor in some circles. The work by Pun and Lee [37] is an example of development of rotational invariant texture metrics, as well as scale invariance. Invariance attributes are discussed in the general taxonomy in Chap. 5.
The next wave of metrics being developed increasingly will take advantage of 3D depth information. One example is the surface shape metrics developed by Spence [38, 304] in 2003, which provide a bump-map type metric for affine invariant texture recognition and texture description with scale and perspective invariance. Chapter 6 also discusses some related 3D feature descriptors.
Statistical Methods
The topic of statistical methods is vast, and we can only refer the reader to selected literature as we go along. One useful and comprehensive resource is the online National Institute of Standards and Technology (NIST) Engineering Statistics Handbook,Footnote 1 including examples and links to additional resources and tools.
Statistical methods may be drawn upon at any time to generate novel feature metrics. Any feature, such as pixel values or local region gradients, can be expressed statistically by any number of methods. Simple methods, such as the histogram shown in Fig. 3.1, are invaluable. Basic statistics such as minimum, maximum, and average values can be seen easily in the histogram shown in Chap. 2 in Fig. 2.21. We survey several applications of statistical methods to computer vision here.
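As a minimal illustration, the histogram and basic statistics of a small region can be computed with NumPy in a few lines (a sketch only; the 8-bit patch values are purely illustrative):

```python
import numpy as np

# Illustrative 8-bit gray scale patch; any 2D intensity region works.
patch = np.array([[ 10,  10,  12, 200],
                  [ 11,  10, 198, 201],
                  [ 12, 199, 200, 202],
                  [197, 200, 201, 255]], dtype=np.uint8)

# 256-bin histogram: the pixel value distribution, with no spatial info.
hist, _ = np.histogram(patch, bins=256, range=(0, 256))

# Basic statistics that can be read directly off the histogram.
p_min, p_max = int(patch.min()), int(patch.max())
p_mean = float(patch.mean())
```

The histogram reveals the bimodal dark/bright structure of the patch at a glance, which the summary statistics alone would hide.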
Texture Region Metrics
Now we look in detail at the specific metrics for feature description based on texture. Texture is one of the most-studied classes of metrics. It can be thought of in terms of the surface—for example, a burlap bag compared to silk fabric. There are many possible textural relationships and signatures that can be devised in a range of domains, with new ones being developed all the time. In this section we survey some of the most common methods for calculating texture metrics:
- Edge metrics

- Cross-correlation

- Fourier spectrum signatures

- Co-occurrence matrix, Haralick features, extended SDM features

- Laws texture metrics

- Tessellation

- Local binary patterns (LBP)

- Dynamic textures
Within an image, each image region has a texture signature, where texture is defined as a common structure and pattern within that region. Texture signatures may be a function of position and intensity relationships, as in the spatial domain, or be based on comparisons in some other function basis and feature domain, such as frequency space using Fourier methods.
Texture metrics can be used to both segment and describe regions. Regions are differentiated based on texture homogeneousness, and as a result, texture works well as a method for region segmentation. Texture is also a good metric for feature description, and as a result it is useful for feature detection, matching, and tracking.
Appendix B contains several ground truth datasets with example images for computing texture metrics, including the CUReT reflectance and texture database from Columbia University. Several key papers describe the metrics used against the CUReT dataset [21, 40–42] including the appearance of a surface as a bidirectional reflectance distribution function (BRDF) and a bidirectional texture function (BTF).
These metrics are intended to measure texture as a function of direction and illumination, to capture coarse details and fine details of each surface. If the surface texture contains significant sub-pixel detail not apparent in single pixels or groups of pixels, the BRDF reflectance metrics can capture the coarse reflectance details. If the surface contains pixel-by-pixel difference details, the BTF captures the fine texture details.
Edge Metrics
Edges, lines, contours, or ridges are basic textural features [308, 309]. A variety of simple metrics can be devised just by analyzing the edge structure of regions in an image. There are many edge metrics in the literature, and a few are illustrated here.
Computing edges can be considered on a continuum of methods from interest points to edges, where an interest point may be a single pixel at a gradient maximum or minimum, with several connected gradient maxima pixels composed into corners, ridges, line segments, or contours. In summary, a gradient point is a degenerate edge, and an edge is a collection of connected gradient points.
The edge metrics can be computed locally or globally on image regions as follows:
- Compute the gradient g(d) at each pixel, selecting an appropriate gradient operator g() and an appropriate kernel size or distance d to target either micro or macro edge features.

- The distance d or kernel size can be varied to achieve different metrics; many researchers have used 3 × 3 kernels.

- Compute edge orientation by binning gradient directions for each edge into a histogram; for example, use 45° angle increment bins for a total of 8 bins at 0°, 45°, 90°, 135°, 180°, 225°, 270°, and 315°.
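The steps above can be sketched as follows, using a 3 × 3 Sobel gradient as one choice of operator g() with d = 1 (a minimal NumPy sketch; the step-edge test image and the helper function are illustrative):

```python
import numpy as np

def filter2d_same(img, k):
    """Sliding 3x3 dot product (correlation) with zero padding, 'same' size."""
    pad = np.pad(img, 1, mode="constant")
    out = np.zeros(img.shape, dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += k[dy, dx] * pad[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out

# 3x3 Sobel kernels: one choice of gradient operator g() with d = 1.
sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
sobel_y = sobel_x.T

# Illustrative test image: a vertical step edge.
img = np.zeros((8, 8))
img[:, 4:] = 255.0

gx = filter2d_same(img, sobel_x)
gy = filter2d_same(img, sobel_y)

magnitude = np.hypot(gx, gy)                      # gradient magnitude
direction = np.degrees(np.arctan2(gy, gx)) % 360  # direction in [0, 360)

# Bin the gradient directions of edge pixels into 8 bins of 45 degrees.
edge_mask = magnitude > 0
orientation_hist, _ = np.histogram(direction[edge_mask], bins=8, range=(0, 360))
```

For the vertical step edge, interior edge pixels fall into the 0° bin, as expected for a purely horizontal gradient.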
Several other methods can be used to compute edge statistics. Representative methods are shown here; see also Shapiro and Stockman [499] for a standard reference.
Edge Density
Edge density can be expressed as the average value of the gradient magnitudes g_m in a region.
Edge Contrast
Edge contrast can be expressed as the ratio of the average value of gradient magnitudes to the maximum possible pixel value in the region.
Edge Entropy
Edge randomness can be expressed as a measure of the Shannon entropy of the gradient magnitudes.
Edge Directivity
Edge directivity can be expressed as a measure of the Shannon entropy of the gradient directions.
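The four metrics above reduce to a few lines once per-pixel gradient magnitudes and directions are available (a minimal NumPy sketch; the magnitude and direction arrays are illustrative stand-ins for real gradient operator output):

```python
import numpy as np

# Illustrative per-pixel gradient magnitudes and directions for an 8-bit
# region; in practice these come from a gradient operator such as Sobel.
magnitude = np.array([[  0.0, 128.0, 255.0],
                      [  0.0, 128.0, 255.0],
                      [  0.0, 128.0, 255.0]])
direction = np.array([[0.0,  0.0,  0.0],
                      [0.0, 45.0, 90.0],
                      [0.0,  0.0,  0.0]])

def shannon_entropy(values, bins, value_range):
    """Shannon entropy (bits) of a histogram of the given values."""
    hist, _ = np.histogram(values, bins=bins, range=value_range)
    p = hist[hist > 0] / hist.sum()
    return float(-np.sum(p * np.log2(p)))

edge_density = float(magnitude.mean())            # average gradient magnitude
edge_contrast = edge_density / 255.0              # ratio to max pixel value
edge_entropy = shannon_entropy(magnitude, 16, (0, 256))
edge_directivity = shannon_entropy(direction, 8, (0, 360))
```

The bin counts (16 for magnitudes, 8 for the 45° direction bins) are one reasonable choice, not a fixed convention.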
Edge Linearity
Edge linearity measures the co-occurrence of collinear edge pairs using gradient direction, as shown by edges a–b in Fig. 3.2.
Edge Periodicity
Edge periodicity measures the co-occurrence of identically oriented edge pairs using gradient direction, as shown by edges a–c in Fig. 3.2.
Edge Size
Edge size measures the co-occurrence of opposite oriented edge pairs using gradient direction, as shown by edges a–d in Fig. 3.2.
Edge Primitive Length Total
Edge primitive length measures the total length of all gradient magnitudes along the same direction.
Cross-Correlation and Auto-correlation
Cross-correlation [26] is a metric showing similarity between two signals with a time displacement between them. Auto-correlation is the cross-correlation of a signal with a time-displaced version of itself. In the literature on signal processing, cross-correlation is also referred to as a sliding inner product or sliding dot product. Typically, this method is used to search a large signal for a smaller pattern.
Using the Wiener–Khinchin theorem as a special case of the general cross-correlation theorem, the auto-correlation can be written simply as the inverse Fourier transform of the absolute square of F(ν), the Fourier transform of the function f, as follows:

f ⋆ f = F^(-1)[ |F(ν)|^2 ]
In computer vision, the feature used for correlation may be a 1D line of pixels or gradient magnitudes, a 2D pixel region, or a 3D voxel volume region. By comparing the features from the current image frame and the previous image frame using cross-correlation derivatives, we obtain a useful texture change correlation metric.
By comparing displaced versions of an image with itself, we obtain a set of either local or global auto-correlation texture metrics. Auto-correlation can be used to detect repeating patterns or textures in an image, and also to describe the texture in terms of fine or coarse, where coarse textures show the auto-correlation function dropping off more slowly than fine textures. See also the discussion of correlation in Chap. 6 and Fig. 6.20.
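A minimal sketch of FFT-based auto-correlation on a 1D row of pixel values illustrates both the zero-lag energy peak and the secondary peaks that reveal a repeating pattern (the periodic test signal is illustrative):

```python
import numpy as np

# Periodic 1D "texture" row: a pattern of period 8 repeated 4 times.
signal = np.tile([0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 1.0], 4)

# Wiener-Khinchin: the auto-correlation is the inverse Fourier transform
# of the power spectrum |F(v)|^2.
spectrum = np.fft.fft(signal)
autocorr = np.real(np.fft.ifft(spectrum * np.conj(spectrum)))

# The zero-lag value is the signal energy; an exactly repeating texture
# produces an equal (circular) peak at its period, lag 8. A fine texture
# drops off quickly away from lag 0; a coarse one drops off more slowly.
zero_lag = autocorr[0]
period_peak = autocorr[8]
```

Note this computes circular auto-correlation; for non-periodic image rows, the signal is usually zero-padded first to avoid wrap-around effects.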
Fourier Spectrum, Wavelets, and Basis Signatures
Basis transforms, such as the FFT, decompose a signal into a set of basis vectors from which the signal can be synthesized or reconstructed. Viewing the set of basis vectors as a spectrum is a valuable method for understanding image texture and for creating a signature. Several basis spaces are discussed in this chapter, including Fourier, HAAR, wavelets, and Zernike.
Although computationally expensive and memory intensive, the Fast Fourier Transform (FFT) is often used to produce a frequency spectrum signature. The FFT spectrum is useful for a wide range of problems. The computations typically are limited to rectangular regions of fixed sizes, depending on the radix of the transform (see Bracewell [219]).
As shown in Fig. 3.3, Fourier spectrum plots reveal definite image features useful for texture and statistical analysis of images. For example, Fig. 3.10 shows an FFT spectrum of LBP pattern metrics. Note that the Fourier spectrum has many valuable attributes, such as rotational invariance, as shown in Fig. 3.3, where a texture image is rotated 90° and the corresponding FFT spectra exhibit the same attributes, only rotated 90°.
Wavelets [219] are similar to Fourier methods, and have become increasingly popular for texture analysis [303], discussed later in the section on basis spaces.
Note that the FFT spectrum as a texture metric or descriptor is rotational invariant, as shown in the bottom left image of Fig. 3.3. FFT spectra can be taken over rectangular 2D regions. Also, 1D arrays, such as annuli of the spectrum or the Cartesian coordinates of points taken around the perimeter of an object shape, can be used as input to an FFT, yielding an FFT shape descriptor metric.
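As a minimal illustration, the FFT magnitude spectrum of a synthetic vertical-stripe texture concentrates its energy at the DC term and at the stripe frequency along one spectrum axis (a NumPy sketch; the test image is illustrative):

```python
import numpy as np

# Synthetic vertical-stripe texture: columns 0, 4, 8, 12 set to 1
# in a 16 x 16 image, i.e., a periodic pattern of period 4 along x.
img = np.zeros((16, 16))
img[:, ::4] = 1.0

# Magnitude spectrum as a texture signature.
spectrum = np.abs(np.fft.fft2(img))

# The DC term equals the sum of all pixels; the stripe energy lands at
# horizontal frequency k = 16 / 4 = 4, with zero vertical frequency.
dc = spectrum[0, 0]
stripe_peak = spectrum[0, 4]
```

An oriented texture thus produces an oriented spectrum, which is the basis of spectrum-signature texture description.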
Co-occurrence Matrix, Haralick Features
Haralick [6] proposed a set of 2D texture metrics calculated from directional differences between adjacent pixels, referred to as co-occurrence matrices, spatial dependency matrices (SDM), or gray level co-occurrence matrices (GLCM). A complete set of four (4) matrices is calculated by evaluating the difference between adjacent pixels in the x, y, diagonal x, and diagonal y directions, as shown in Fig. 3.4, and further illustrated with a 4 × 4 image and corresponding co-occurrence tables in Fig. 3.5.
One benefit of the SDM as a texture metric is that it is easy to calculate in a single pass over the image. The SDM is also fairly invariant to rotation, which is often a difficult robustness attribute to attain. Within a segmented region or around an interest point, the SDM plot can be a valuable texture metric all by itself, therefore useful for texture analysis, feature description, noise detection, and pattern matching.
For example, if a camera has digital-circuit readout noise, it will show up in the SDM for the x direction only if the lines are scanned out of the sensor one at a time in the x direction, so using the SDM information will enable intelligent sensor processing to remove the readout noise. However, it should be noted that SDM metrics are not always useful alone, and should be qualified with additional feature information. The SDM is primarily concerned with spatial relationships, with regard to spatial orientation and frequency of occurrence. So, it is primarily a statistical measure.
The SDM is calculated in four orientations, as shown in Fig. 3.4. Since the SDM is only concerned with adjacent pairs of pixels, these four calculations cover all possible spatial orientations. SDMs could be extended beyond 2 × 2 regions by forming kernels extending into 5 × 5, 7 × 7, 9 × 9, and other dimensions.
A spatial dependency matrix is basically a count of how many times a given pixel value occurs next to another pixel value. Fig. 3.5 illustrates the concept. For example, assume we have an 8-bit image (0..255). If an SDM shows that pixel value x frequently occurs adjacent to pixels within the range x + 1 to x − 1, then we would say that there is a "smooth" texture at that intensity. However, if pixel value x frequently occurs adjacent to pixels within the range x + 70 to x − 70, we would say that there is quite a bit of contrast at that intensity, if not noise.
A critical point in using SDMs is to be sensitive to the varied results achieved when sampling over small vs. large image areas. By sampling the SDM over a smaller area (say 64 × 64 pixels), details will be revealed in the SDMs that would otherwise be obscured. The larger the size of the sample image area, the more the SDM will be populated. And the more samples taken, the more likely that detail will be obscured in the SDM image plots. Actually, smaller areas (e.g., 64 × 64 pixels) are a good place to start when using SDMs, since smaller areas are faster to compute and will reveal a lot about local texture.
The Haralick metrics are shown in Fig. 3.6.
The statistical characteristics of the SDM have been extended by several researchers to add more useful metrics [23], and SDMs have been applied to 3D volumetric data by a number of researchers with good results [22].
Extended SDM Metrics (Krig SDM Metrics)
Extensions to the Haralick metrics have been developed by the author [23], primarily motivated by a visual study of SDM plots as shown in Fig. 3.7. Applications for the extended SDM metrics include texture analysis, data visualization, and image recognition. The visual plots of the SDMs alone are valuable indicators of pixel intensity relationships, and are worth using along with histograms to get to know the data.
The extended SDM metrics include centroid, total coverage, low-frequency coverage, total power, relative power, locus length, locus mean density, bin mean density, containment, linearity, and linearity strength. The extended SDM metrics capture key information that is best observed by looking at the SDM plots. In many cases the extended SDM metrics are computed four times, once for each SDM direction of 0°, 45°, 90°, and 135°, as shown in Fig. 3.4.
The SDMs are interesting and useful all by themselves when viewed as an image. Many of the texture metrics suggested are obvious after viewing and understanding the SDMs; others are neither obvious nor apparently useful until developing a basic familiarity with the visual interpretation of SDM image plots. Next, we survey the following:
-
Example SDMs showing four directional SDM maps: A complete set of SDMs would contain four different plots, one for each orientation. Interpreting the SDM plots visually reveals useful information. For example, an image with a smooth texture will yield a narrow diagonal band of co-occurrence values; an image with wide texture variation will yield a larger spread of values; a noisy image will yield a co-occurrence matrix with outlier values at the extrema. In some cases, noise may only be distributed along one axis of the image—perhaps across rows or the x axis, which could indicate sensor readout noise as each line is read out of the sensor, suggesting a row- or line-oriented image preparation stage in the vision pipeline to compensate for the camera.
-
Extended SDM texture metrics: The addition of 12 other useful statistical measures to those proposed by Haralick.
-
Some code snippets: These illustrate the extended SDM computations; full source code is shown in Appendix D.
In Fig. 3.7, several of the extended SDM metrics can be easily seen, including containment and locus mean density. Note that the right image does not have a lot of outlier intensity points or noise (good containment); most of the energy is centered along the diagonal (tight locus), showing a rather smooth set of image pixel transitions and texture, while the left image shows a wider range of intensity values. For some images, the wider range may be noise spread across the spectrum (poor containment), revealing a wider band of energy and contrast between adjacent pixels.
Metric 1: Centroid
To compute the centroid, for each SDM bin p(i,j), the count of the bin is multiplied by the bin coordinate for x,y and also the total bin count is summed. The centroid calculation is weighted to compute the centroid based on the actual bin counts, rather than an unweighted “binary” approach of determining the center of the binning region based on only bin data presence. The result is the weighted center of mass over the SDM bins.
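The weighted centroid computation can be sketched as follows; the function name is ours, and the book's exact formulation is in Appendix D:

```python
import numpy as np

def sdm_centroid(sdm):
    """Weighted center of mass of the SDM bins: each bin coordinate is
    weighted by its bin count, then normalized by the total count."""
    total = sdm.sum()
    ys, xs = np.mgrid[0:sdm.shape[0], 0:sdm.shape[1]]
    cx = (xs * sdm).sum() / total
    cy = (ys * sdm).sum() / total
    return cx, cy

m = np.zeros((256, 256))
m[100, 100] = 3.0   # all mass in one bin, so the centroid lands there
```

Note the weighting: a bin with a count of 3 pulls the centroid three times as hard as a bin with a count of 1, unlike the unweighted "binary" alternative described above.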
Metric 2: Total Coverage
This is a measure of the spread, or range of distribution, of the binning. A small coverage percentage would be indicative of an image with few gray levels, which corresponds in some cases to image smoothness. For example, a random image would have a very large coverage number, since all or most of the SDM bins would be hit. The coverage feature metrics (2, 3, 4), taken together with the linearity features suggested below (11, 12), can give an indication of image smoothness.
Metric 3: Low-Frequency Coverage
For many images, any bins in the SDM with bin counts less than a threshold value, such as 3, may be considered as noise. The low-frequency coverage metric, or noise metric, provides an idea of how much of the binning is in this range. This may be especially true as the sample area of the image increases. For whole images, a threshold of 3 has proved to be useful for determining if a bin contains noise for a data range of 0–255, and using the SDM over smaller local kernel regions may use all the values with no thresholding needed.
Metric 4: Corrected Coverage
Corrected coverage is the total coverage with noise removed.
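The three coverage metrics (2, 3, 4) can be sketched together; this is our own formulation of the descriptions above, expressing each coverage as a fraction of the total bin count:

```python
import numpy as np

def coverage_metrics(sdm, noise_threshold=3):
    """Total coverage: fraction of SDM bins populated at all.
    Low-frequency (noise) coverage: fraction of bins with a count below
    the threshold. Corrected coverage: total coverage with noise removed."""
    n_bins = sdm.size
    total = np.count_nonzero(sdm) / n_bins
    noise = np.count_nonzero((sdm > 0) & (sdm < noise_threshold)) / n_bins
    return total, noise, total - noise

m = np.zeros((4, 4))
m[0, 0] = 10   # a well-populated bin
m[1, 1] = 1    # a bin below the noise threshold of 3
```

A random image would drive the total coverage toward 1.0, while a smooth image with few gray levels keeps it small.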
Metric 5: Total Power
The power metric provides a measure of the swing in value between adjacent pixels in an image, and is computed in four directions. A smooth image will have a low power number because the differences between pixels are smaller. Total power and relative power are inter-related, and relative power is computed using the total populated bins (z) and total difference power (t).
Metric 6: Relative Power
The relative power is calculated by scaling the total difference power (t) by the count of nonempty SDM bins (z), while the total power uses all bins.
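The exact formulas are given in Appendix D; as a rough sketch of the idea only (our own formulation, not necessarily the author's), the total power can be taken as the bin-count-weighted sum of adjacent-pixel differences |i − j|, and the relative power as that total scaled by the number of populated bins:

```python
import numpy as np

def power_metrics(sdm):
    """Total power t: the difference |i - j| between adjacent pixel
    values, weighted by how often each pair occurs. Relative power:
    t scaled by the number of populated bins z."""
    ys, xs = np.mgrid[0:sdm.shape[0], 0:sdm.shape[1]]
    t = (np.abs(ys - xs) * sdm).sum()   # total difference power
    z = np.count_nonzero(sdm)           # total populated bins
    return t, t / z if z else 0.0

m = np.zeros((256, 256))
m[10, 12] = 5   # value 10 next to value 12, five times: |10 - 12| * 5
```

A smooth image concentrates its counts near the diagonal where |i − j| is small, so both numbers stay low, matching the description above.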
Metric 7: Locus Mean Density
For many images, there is a “locus” area of high-intensity binning surrounding the bin axis (locus axis is where adjacent pixels are of the same value x = y) corresponding to a diagonal line drawn from the upper left corner of the SDM plot. The degree of clustering around the locus area indicates the amount of smoothness in the image. Binning from a noisy image will be scattered with little relation to the locus area, while a cleaner image will show a pattern centered about the locus.
The locus mean density is an average of the bin values within the locus area. The locus is the area around the center diagonal line, within a band of 7 pixels on either side of the identity line (x = y) that passes down the center of each SDM. The band width of 7 is not particularly special; based on experience, it simply gives a good indication of the desired feature over whole images. This feature is good for indicating smoothness.
Metric 8: Locus Length
The locus length measures the range of the locus concentration about the diagonal. The algorithm for locus length is a simple count of bins populated in the locus area; a threshold band of 7 pixels about the locus has been found useful.
y = length = 0;
while (y < 256) {
    x = count = 0;
    while (x < 256) {
        n = abs(y - x);                          /* distance from the locus diagonal x == y */
        if ((p[y][x] != 0) && (n < 7)) count++;  /* populated bin inside the locus band */
        x++;
    }
    if (count) length++;                         /* this row contributes to the locus length */
    y++;
}
Metric 9: Bin Mean Density
This is simply the average bin count from nonempty bins.
Metric 10: Containment
Containment is a measure of how well the binning in the SDM is contained within the boundaries or edges of the SDM. There are four edges or boundaries; for example, assuming a data range [0…255], there are containment boundaries along rows 0 and 255, and along columns 0 and 255. Typically, the bin count m is 256 bins, or possibly fewer, such as 64. To measure containment, the perimeter bins of the SDM are checked to see whether any binning has occurred there, where the perimeter region bins of the SDM represent extrema values adjacent to some other value. The left image in Fig. 3.7 has lower containment than the right image, especially for the low values.
If extrema are hit frequently, this probably indicates some sort of overflow condition such as numerical overflow, sensor saturation, or noise. The binning is treated unweighted. A high containment number indicates that all the binning took place within the boundaries of the SDM. A lower number indicates some bleeding. This feature appears visually very well in the SDM plots.
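A simple unweighted formulation of containment, assuming the definition above (this is our sketch, not the book's code), is the fraction of populated bins that do not touch the SDM perimeter:

```python
import numpy as np

def containment(sdm):
    """Fraction of populated bins (unweighted) that are NOT on the
    perimeter rows/columns of the SDM; 1.0 means no binning bled onto
    the extrema boundaries."""
    populated = np.count_nonzero(sdm)
    if populated == 0:
        return 1.0
    interior = np.count_nonzero(sdm[1:-1, 1:-1])
    return interior / populated

m = np.zeros((256, 256))
m[128, 128] = 4   # interior binning
m[0, 255] = 1     # one bin bleeding onto the boundary
```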
Metric 11: Linearity
The linearity characteristic may only be visible in a single orientation of the SDM, or by comparing SDMs. For example, the image in Fig. 3.8 reveals some linearity variations across the set of SDMs. This is consistent with the image sensor used (older tube camera).
Metric 12: Linearity Strength
The algorithm for linearity strength is shown in Metric 11. If there is any linearity present in a given angle of SDM, both linearity strength and linearity will be comparatively higher at this angle than the other SDM angles (Table 3.1).
Laws Texture Metrics
The Laws metrics [24, 27–29] provide a structural approach to texture analysis, using a set of masking kernels to measure texture energy or variation within fixed sized local regions, similar to the 2 × 2 region SDM approach but using larger pixel areas to achieve different metrics.
The basic Laws algorithm involves classifying each pixel in the image into texture based on local energy, using a few basic steps:
-
1.
The mean average intensity from each kernel neighborhood is subtracted from each pixel to compensate for illumination variations.
-
2.
The image is convolved at each pixel using a set of kernels, each of which sums to zero, followed by summing the results to obtain the absolute average value over each kernel window.
-
3.
The difference between the convolved image and the original image is measured, revealing the Laws energy metrics.
Laws defines a set of nine separable kernels to produce a set of texture region energy metrics, and some of the kernels work better than others in practice. The kernels are composed via matrix multiplication from a set of four vector masks L5, E5, S5, and R5, described below. The kernels were originally defined as 5 × 5 masks, but 3 × 3 approximations have also been used, as shown below.
5 × 5 form
3 × 3 approximations of 5 × 5 form
To create 2D masks, vectors Ln, En, Sn, and Rn (as shown above) are convolved together as separable pairs into kernels; a few examples are shown in Fig. 3.9.
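The separable construction can be sketched as outer products of the classic 1D Laws vectors (Level, Edge, Spot, Ripple); the helper function name is ours:

```python
import numpy as np

# Classic 1D Laws vectors (5-element form)
L5 = np.array([ 1,  4, 6,  4,  1])   # Level (local average)
E5 = np.array([-1, -2, 0,  2,  1])   # Edge
S5 = np.array([-1,  0, 2,  0, -1])   # Spot
R5 = np.array([ 1, -4, 6, -4,  1])   # Ripple

def laws_kernel(v1, v2):
    """A 2D Laws mask is the outer product of two 1D vectors."""
    return np.outer(v1, v2)

E5L5 = laws_kernel(E5, L5)   # responds to horizontal edge energy
```

Every kernel built from at least one zero-sum vector (E5, S5, R5) itself sums to zero, matching step 2 of the algorithm above; only L5L5 does not, which is why it is typically used for normalization rather than as an energy mask.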
Note that Laws texture metrics have been extended into 3D for volumetric texture analysis [43, 44].
LBP Local Binary Patterns
In contrast to the various structural and statistical methods of texture analysis, the LBP operator [18, 50] computes the local texture around each region as an LBP binary code, or micro-texture, allowing simple micro-texture comparisons to segment regions based on like micro-texture. (See the very detailed discussion on LBP in Chap. 6 for details and references to the literature, and especially Fig. 6.6.) The LBP operator [165] is quite versatile, easy to compute, consumes a low amount of memory, and can be used for texture analysis, interest points, and feature description. As a result, the LBP operator is discussed in several places in this book.
As shown in Fig. 3.10, the uniform set of LBP operators, composed of a subset of the possible LBPs that are by themselves rotation invariant, can be binned into a histogram, and the corresponding bin values are run through an FFT as a 1D array to create an FFT spectrum, which yields a robust metric with strong rotational invariance.
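The histogram-plus-FFT step can be sketched as follows; this is our own minimal illustration of why the spectrum magnitude adds robustness, not the book's implementation:

```python
import numpy as np

def lbp_histogram_fft(lbp_codes, bins=256):
    """Bin LBP codes into a histogram, then take the 1D FFT magnitude
    of the histogram. The magnitude spectrum is unchanged by circular
    shifts of the bins, which is the source of the added rotational
    robustness."""
    hist, _ = np.histogram(lbp_codes, bins=bins, range=(0, bins))
    return np.abs(np.fft.fft(hist))

codes = np.array([5, 5, 9, 200])
shifted = (codes + 3) % 256     # simulate a circular shift of the codes
a = lbp_histogram_fft(codes)
b = lbp_histogram_fft(shifted)  # identical magnitude spectrum
```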
Dynamic Textures
Dynamic textures are a concept used to describe and track textured regions as they change and morph dynamically from frame to frame [13–15, 45]. For example, dynamic textures may be textures in motion, like sea waves, smoke, foliage blowing in the wind, fire, facial expressions, gestures, and poses. The changes are typically tracked in spatiotemporal sets of image frames, where the consecutive frames are stacked into volumes for analysis as a group. The three dimensions are the XY frame sizes, and the Z dimension is derived from the stack of consecutive frames n − 2, n − 1, n.
A close cousin to dynamic texture research is the field of activity recognition (discussed in Chap. 6), where features are parts of moving objects that compose an activity—for example, features on arms and legs that are tracked frame to frame to determine the type of motion or activity, such as walking or running. One similarity between activity recognition and dynamic textures is that the features or textures change from frame to frame over time, so for both activity recognition and dynamic texture analysis, tracking features and textures often requires a spatiotemporal approach involving a data structure with a history buffer of past and current frames, which provides a volumetric representation to the data.
For example, VLBP and LBP-TOP (discussed in Chap. 6) provide methods for dynamic texture analysis by using the LBP constructed to operate over three dimensions in a volumetric structure, where the volume contains image frames n − 2, n − 1, and n stacked into the volume.
Statistical Region Metrics
Describing texture in terms of statistical metrics of the pixels is a common and intuitive method. Often a simple histogram of a region will be sufficient to describe the texture well enough for many applications. There are also many variations of the histogram, which lend themselves to a wide range of texture analysis. So this is a good point at which to examine histogram methods. Since statistical mathematics is a vast field, we can only introduce the topic here, dividing the discussion into image moment features and point metric features.
Image Moment Features
Image moments [4, 500] are scalar quantities, analogous to the familiar statistical measures such as mean, variance, skew, and kurtosis. Moments are well suited to describe polygon shape features and general feature metric information such as gradient distributions. Image moments can be based on either scalar point values or basis functions such as Fourier or Zernike methods discussed later in the section on basis space.
Moments can describe the projection of a function onto a basis space—for example, the Fourier transform projects a function onto a basis of harmonic functions. Note that there is a conceptual relationship between 1D and 2D moments in the context of shape description. For example, the 1D mean corresponds to the 2D centroid, and the 1D minimum and maximum correspond to the 2D major and minor axis. The 1D minimum and maximum also correspond to the 2D bounding box around the 2D polygon shape (also see Fig. 6.29).
In this work, we classify image moments under the term polygon shape descriptors in the taxonomy (see Chap. 5). Details on several image moments used for 2D shape description are covered in Chap. 6, under “Object Shape Metrics for Blobs and Objects.”
Common properties of moments in the context of 1D distributions and 2D images include:
-
Zeroth order moment is the mean or 2D centroid.
-
Central moments describe variation around the mean or 2D centroid.
-
First order central moments contain information about 2D area, centroid, and size.
-
Second order central moments are related to variance and measure 2D elliptical shape.
-
Third order central moments provide symmetry information about the 2D shape, or skewness.
-
Fourth order central moments measure whether the 2D distribution is tall, short, thin, or fat.
-
Higher-level moments may be devised and composed of moment ratios, such as co-variance.
Moments can be used to create feature descriptors that are invariant to several robustness criteria, such as scale, rotation, and affine variations. The taxonomy of robustness and invariance criteria is provided in Chap. 5. For 2D shape description, in 1961 Hu developed a theoretical set of seven 2D planar moments for character recognition work, derived using invariant algebra, that are invariant under scale, translation, and rotation [7]. Several researchers have extended Hu’s work. An excellent resource for this topic is Moments and Moment Invariants in Pattern Recognition, by Jan Flusser et al. [500].
Point Metric Features
Point metrics can be used for the following: (1) feature description, (2) analysis and visualization, (3) thresholding and segmentation, and (4) image processing via programmable LUT functions (discussed in Chap. 2). Point metrics are often overlooked. Using point metrics to understand the structure of the image data is one of the first necessary steps toward devising the image preprocessing pipeline to prepare images for feature analysis. Again, the place to start is by analysis of the histogram, as shown in Figs. 3.1 and 3.11. The basic point metrics can be determined visually, such as minima, maxima, peaks, and valleys. False coloring of the histogram regions for data visualization is simple using color lookup tables to color the histogram regions in the images.
Here is a summary of statistical point metrics:
-
Quantiles, median, rescale: By sorting the pixel values into an ordered list, as during the histogram process, the various quartiles can be found, including the median value. Also, the pixels can be rescaled from the list and used for pixel remap functions (as described in Chap. 2).
-
Min, max, mode: The minimum and maximum values, together with histogram analysis, can be used to guide image preprocessing to devise a threshold method to remove outliers from the data. The mode is the most common pixel value in the sorted list of pixels.
-
Mean, harmonic mean, and geometric mean: Various formulations of the mean are useful to learn the predominant illumination levels, dark or light, to guide image preprocessing to enhance the image for further analysis.
-
Standard deviation, skewness, and kurtosis: These moments can be visualized by looking at the SDM plots.
-
Correlation: Topic was covered earlier in this chapter under cross-correlation and auto-correlation.
-
Variance, covariance: The variance metric provides information on pixel distribution, and covariance can be used to compare variance between two images. Variance can be visualized to a degree in the SDM, also as shown in this chapter.
-
Ratios and multivariate metrics: Point metrics by themselves may be useful, but multivariate combinations or ratios using simple point metrics can be very useful as well. Depending on the application, the ratios themselves form key attributes of feature descriptors (as described in Chap. 6). For example, mean:min, mean:max, median:mean, area:perimeter.
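Most of the point metrics listed above are one-liners over a flattened pixel region; a minimal sketch (the dictionary keys and ratio choice are ours):

```python
import numpy as np

def point_metrics(pixels):
    """A few of the statistical point metrics listed above, computed
    over a flattened region of pixel values."""
    p = np.asarray(pixels, dtype=np.float64).ravel()
    mean = p.mean()
    std = p.std()
    return {
        "min": p.min(),
        "max": p.max(),
        "median": np.median(p),
        "mean": mean,
        "std": std,
        "skewness": ((p - mean) ** 3).mean() / std ** 3,
        "kurtosis": ((p - mean) ** 4).mean() / std ** 4,
        "mean_to_max": mean / p.max(),   # an example multivariate ratio
    }

m = point_metrics([0, 50, 100, 150, 200])
```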
Global Histograms
Global histograms treat the entire image. In many cases, image matching via global histograms is simple and effective, using a distance function such as SSD. As shown in Fig. 3.12, histograms reveal quantitative information on pixel intensity, but not structural information. All the pixels in the region contribute to the histogram, with no respect to the distance from any specific point or feature. As discussed in Chap. 2, the histogram itself is the basis of histogram modification methods, allowing the shape of the histogram to be stretched, compressed, or clipped as needed, and then used as an inverse lookup table to rearrange the image pixel intensity levels.
Local Region Histograms
Histograms can also be computed over local regions of pixels, such as rectangles or polygons, as well as over sets of feature attributes, such as gradient direction and magnitude or other spectra. To create a polygon region histogram feature descriptor, first a region may be segmented using morphology to create a mask shape around a region of interest, and then only the masked pixels are used for the histogram.
Local histograms of pixel intensity values can be used as attributes of a feature descriptor, and also used as the basis for remapping pixel values from one histogram shape to another, as discussed in Chap. 2, by reshaping the histogram and reprocessing the image accordingly. Chapter 6 discusses a range of feature descriptors such as SIFT, SURF, and LBP which make use of feature histograms to bin attributes such as gradient magnitude and direction.
Scatter Diagrams, 3D Histograms
The scatter diagram can be used to visualize the relationship or similarity between two image datasets for image analysis, pattern recognition, and feature description. Pixel intensity from two images or image regions can be compared in the scatter plot to visualize how well the values correspond. Scatter diagrams can be used for feature and pattern matching under limited translation invariance, but they are less useful for affine, scale, or rotation invariance. Fig. 3.13 shows an example using a scatter diagram to look for a pattern in an image; the target pattern is compared at different offsets, and the smaller the offset, the better the correspondence. In general, tighter sets of peak features indicate a strong structural or pattern correspondence; more spreading of the data indicates weaker correspondence. The farther away the pattern offset moves, the lower the correspondence.
Note that by analyzing the peak features compared to the low-frequency features, correspondence can be visualized. Fig. 3.14 shows scatter diagrams from two separate images. The lack of peaks along the axis and the presence of spreading in the data show low structural or pattern correspondence.
The scatter plot can be made, pixel by pixel, from two images, where each pixel pair forms a Cartesian coordinate for scatter plotting: the pixel intensity of image 1 is used as the x coordinate and the pixel intensity of image 2 as the y coordinate; the count of corresponding pixel pairs is then binned in the scatter plot. The bin count for each coordinate can be false colored for visualization. Fig. 3.15 provides some code for illustration purposes.
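The construction just described can be sketched as follows; this is our own stand-in illustration, not a reproduction of the code in Fig. 3.15:

```python
import numpy as np

def scatter_diagram(img1, img2, levels=256):
    """For each pixel position, use the intensity in img1 as x and the
    intensity in img2 as y, and count how often each (x, y) pair occurs.
    Identical images bin only along the diagonal x == y."""
    bins = np.zeros((levels, levels), dtype=np.int64)
    for a, b in zip(np.ravel(img1), np.ravel(img2)):
        bins[b, a] += 1   # row = img2 intensity (y), column = img1 (x)
    return bins

img = np.array([[7, 8], [9, 7]])
plot = scatter_diagram(img, img)   # self-comparison: diagonal only
```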
For feature detection, as shown in Fig. 3.12, the scatter plot may reveal enough correspondence at coarse translation steps to reduce the need for image pyramids in some feature detection and pattern matching applications. For example, the step size of the pattern search and compare could be optimized by striding or skipping pixels, searching the image at 8 or 16 pixel intervals, rather than at every pixel, reducing feature detection time. In addition, the scatter plot data could first be thresholded to a binary image, masked to show just the peak values, converted into a bit vector, and measured for correspondence using Hamming distance for increased performance.
Multi-resolution, Multi-scale Histograms
Multi-resolution histograms have been used for texture analysis [46], and also for feature recognition [47]. The PHOG descriptor described in Chap. 6 makes use of multi-scale histograms of feature spectra—in this case, gradient information. Note that the multi-resolution histogram provides scale invariance for feature description. For texture analysis [46], multi-resolution histograms are constructed using an image pyramid, and then a histogram is created for each pyramid level and concatenated together [10], which is referred to as a multi-resolution histogram. This histogram has the desirable properties of algorithm simplicity, fast computation, low memory requirements, noise tolerance, and high reliability across spatial and rotational variations. See Fig. 3.16. A variation on the pyramid is used in the method of Zhao and Pietikainen [15], employing a multidimensional pyramid image set from a volume.
Steps involved in creating and using multi-resolution histograms are as follows:
-
1.
Apply Gaussian filter to image.
-
2.
Create an image pyramid.
-
3.
Create histograms at each level.
-
4.
Normalize the histograms using L1 norm.
-
5.
Create cumulative histograms.
-
6.
Create difference histograms or DOG images (differences between pyramid levels).
-
7.
Renormalize histograms using the difference histograms.
-
8.
Create a feature vector from the set of difference histograms.
-
9.
Use L1 norm as distance function for comparisons between histograms.
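The steps above can be sketched end to end; this is a simplified illustration in which a 2 × 2 mean downsample stands in for proper Gaussian filtering and decimation, and the function name is ours:

```python
import numpy as np

def multires_histogram_descriptor(image, levels=3, bins=16):
    """Build a pyramid by 2x2 mean downsampling (a stand-in for
    Gaussian filter + decimation), take an L1-normalized histogram at
    each level, form cumulative histograms, difference adjacent levels,
    and concatenate the differences into one feature vector."""
    img = np.asarray(image, dtype=np.float64)
    cumhists = []
    for _ in range(levels):
        h, _ = np.histogram(img, bins=bins, range=(0, 256))
        h = h / h.sum()                      # L1 normalization
        cumhists.append(np.cumsum(h))        # cumulative histogram
        # 2x2 mean downsample to form the next pyramid level
        img = img[: img.shape[0] // 2 * 2, : img.shape[1] // 2 * 2]
        img = (img[0::2, 0::2] + img[0::2, 1::2] +
               img[1::2, 0::2] + img[1::2, 1::2]) / 4.0
    diffs = [cumhists[i + 1] - cumhists[i] for i in range(levels - 1)]
    return np.concatenate(diffs)

d = multires_histogram_descriptor(np.full((8, 8), 128.0))
```

Two descriptors are then compared with the L1 norm, per step 9; a constant image produces an all-zero descriptor because every pyramid level has the same histogram.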
Radial Histograms
For some applications, computing the histogram using radial samples originating at the shape centroid can be valuable [128, 129]. To do this, a line is cast from the centroid to the perimeter of the shape, and pixel values are recorded along each line and then binned into histograms. See Fig. 3.17.
Contour or Edge Histograms
The perimeter or shape of an object can be the basis of a shape histogram, which includes the pixel values of each point on the perimeter of the object binned into the histogram. Besides recording the actual pixel values along the perimeter, the chain code histogram (CCH) that is discussed in Chap. 6 shows the direction of the perimeter at connected edge point coordinates. Taken together, the CCH and contour histograms provide useful shape information.
Basis Space Metrics
Features can be described in a basis space, which involves transforming pixels into an alternative basis and describing features in the chosen basis, such as the frequency domain. What is a basis space and what is a transform? Consider the decimal system, which is base 10, and the binary system which is base 2. We can change numbers between the two number systems by using a transform. A Fourier transform uses sine and cosine as basis functions in frequency space, so that the Fourier transform can move pixels between the time-domain pixel space and the frequency space. Basis space moments describe the projection of a function onto a basis space [500]—for example, the Fourier transform projects a function onto a basis of harmonic functions.
Basis spaces and transforms are useful for a wide range of applications, including image coding and reconstruction, image processing, feature description, and feature matching. As shown in Fig. 3.18, image representation and image coding are closely related to feature description. Images can be described using coding methods or feature descriptors , and images also can be reconstructed from the encodings or from the feature descriptors. Many methods exist to reconstruct images from alternative basis space encodings, ranging from lossless RLE methods to lossy JPEG methods; in Chap. 4, we provide illustrations of images that have been reconstructed from only local feature descriptors (see Figs. 4.12, 4.13 and 4.14).
As illustrated in Fig. 3.18, a spectrum of basis spaces can be imagined, ranging from a continuous real function or live scene with infinite complexity, to a complete raster image, a JPEG compressed image, a frequency domain, or other basis representations, down to local feature descriptor sets. Note that the more detail that is provided and used from the basis space representation, the better the real scene can be recognized or reconstructed. So the trade-off is to find the best representation or description, in the optimal basis space, to reach the invariance and accuracy goals using the least amount of compute and memory.
Transforms and basis spaces are a vast field within mathematics and signal processing, covered quite well in other works, so here we only introduce common transforms useful for image coding and feature description. We describe their key advantages and applications, and refer the reader to the literature as we go. See Fig. 3.19.
Since we are dealing with discrete pixels in computer vision, we are primarily interested in discrete transforms, especially those which can be accelerated with optimized software or fixed-function hardware. However, we also cover a few integral transform methods that may be slower to compute and less used. Here is an overview:
-
Global or local feature description. It is possible to use transforms and basis space representations of images as a global feature descriptor, allowing scenes and larger objects to be recognized and compared. The 2D FFT spectrum is only one example, and it is simple to compare FFT spectrum features using SAD or SSD distance measures.
-
Image coding and compression. Many of the transforms have proved valuable for image coding and image compression. The basic method involves transforming the image, or block regions of the image, into another basis space. For example, transforming blocks of an image into the Fourier domain allows the image regions to be represented as sine and cosine waves. Then, based on the amount of energy in the region, a reduced amount of frequency space components can be stored or coded to represent the image. The energy is mostly contained in the lower-frequency components, which can be observed in the Fourier power spectrum such as shown in Fig. 2.16; the high-frequency components can be discarded and the significant lower-frequency components can be encoded, thus some image compression is achieved with a small loss of detail. Many novel image coding methods exist, such as that using a basis of scaled Laplacian features over an image pyramid [310].
Fourier Description
The Fourier family of transforms was covered in detail in Chap. 2, in the context of image preprocessing and filtering. However, the Fourier frequency components can also be used for feature description. Using the forward Fourier transform, an image is transformed into frequency components, which can be selectively used to describe the transformed pixel region, commonly done for image coding and compression, and for feature description.
The Fourier descriptor provides several invariance attributes, such as rotation and scale. Any array of values can be fed to an FFT to generate a descriptor—for example, a histogram. A common application is illustrated in Fig. 3.20, describing the circularity of a shape and finding the major and minor axis as the extrema frequency deviation from the sine wave. A related application is finding the endpoints of a flat line segment on the perimeter by fitting the FFT magnitudes of the harmonic series as polar coordinates against a straight line in Cartesian space.
In Fig. 3.20, a complex wave is plotted as a dark gray circle unrolled around a sine wave function or a perfect circle. Note that the Fourier transform of the lengths of each point around the complex function yields an approximation of a periodic wave, and the Fourier descriptor of the shape of the complex wave is visible. Another example illustrating Fourier descriptors is shown in Fig. 6.29.
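The circularity description can be sketched by sampling centroid-to-perimeter distances around the shape and taking the FFT; this is our own minimal illustration of the idea:

```python
import numpy as np

def fourier_descriptor(radii, n_coeffs=8):
    """FFT magnitude of centroid-to-perimeter distances sampled at
    regular angles around a shape. For a perfect circle all radii are
    equal, so only the DC term is nonzero; deviations from circularity
    appear as energy in the harmonics."""
    spectrum = np.abs(np.fft.fft(radii))
    return spectrum[:n_coeffs]

circle = np.full(64, 10.0)   # constant radius: a perfect circle
desc = fourier_descriptor(circle)
```

An ellipse sampled the same way would add a strong second harmonic, whose phase and magnitude encode the major and minor axis orientation and eccentricity.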
Walsh–Hadamard Transform
The Hadamard transform [4, 9] uses a series of square waves with the value of +1 or −1, which is ideal for digital signal processing. It is amenable to optimizations, since only signed addition is needed to sum the basis vectors, making this transform much faster than sinusoidal basis transforms. The basis vectors for the harmonic Hadamard series and corresponding transform can be generated by sampling Walsh functions, which make up an orthonormal basis set; thus, the combined method is commonly referred to as the Walsh–Hadamard transform; see Fig. 3.21.
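The ±1 basis can be sketched with the Sylvester construction; a production implementation would use the fast O(n log n) butterfly rather than this direct matrix multiply:

```python
import numpy as np

def hadamard(n):
    """Hadamard matrix of order n (n a power of two) by the Sylvester
    construction; entries are only +1 and -1, so applying the transform
    needs nothing but signed additions."""
    H = np.array([[1]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

def walsh_hadamard_transform(signal):
    return hadamard(len(signal)) @ np.asarray(signal)

out = walsh_hadamard_transform([1, 1, 1, 1])
```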
HAAR Transform
The HAAR transform [4, 9] is similar to the Fourier transform, except that the basis vectors are HAAR features resembling square waves, and similar to wavelets. HAAR features, owing to their orthogonal rectangular shapes, are suitable for detecting vertical and horizontal image features that have near-constant gray level. Any structural discontinuities in the data, such as edges and local texture, cannot be resolved very well by the HAAR features; see Figs. 3.21 and 6.21.
Slant Transform
The Slant transform [276], as illustrated in Fig. 3.21, was originally developed for television signal encoding, and was later applied to general image coding [4, 275]. The Slant transform is analogous to the Fourier transform, except that the basis functions are a series of slant, sawtooth, or triangle waves. The slant basis vector is suitable for applications where image brightness changes linearly over the length of the function. The slant transform is amenable to discrete optimizations in digital systems. Although the primary applications have been image coding and image compression, the slant transform is amenable to feature description. It is closely related to the Karhunen–Loeve transform and the Slant–Hadamard transform [494].
Zernike Polynomials
Fritz Zernike, 1953 Nobel Prize winner, devised Zernike polynomials during his quest to develop the phase contrast microscope, while studying the optical properties and spectra of diffraction gratings. The Zernike polynomials [264–266] have been widely used for optical analysis and modeling of the human visual system, and for assistance in medical procedures such as laser surgery. They provide an accurate model of optical wave aberrations expressed as a set of basis polynomials, illustrated in Fig. 3.22.
Zernike polynomials are analogous to steerable filters [370], which also contain oriented basis sets of filter shapes used to identify oriented features and take moments to create descriptors. The Zernike model uses radial coordinates and circular regions, rather than rectangular patches as used in many other feature description methods.
Zernike methods are widely used in optometry to model human eye aberrations. Zernike moments are also used for image watermarking [270] and image coding and reconstruction [271, 273]. The Zernike features provide scale and rotational invariance, in part due to the radial coordinate symmetry and increasing level of detail possible within the higher-order polynomials. Zernike moments are used in computer vision applications by comparing the Zernike basis features against circular patches in target images [268, 269].
Fast methods to compute the Zernike polynomials and moments exist [267, 272, 274], which exploit the symmetry of the basis functions around the x and y axes to reduce computations, and also to exploit recursion.
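The radial part of a Zernike polynomial has a closed form, which the fast methods above exploit. The sketch below implements the standard radial polynomial \(R_n^m(\rho)\) directly from its factorial series; it is a minimal reference version, not one of the optimized recursive schemes cited.

```python
from math import factorial
import numpy as np

def zernike_radial(n, m, rho):
    """Radial part R_n^m(rho) of the Zernike polynomial (requires n - |m| even).

    Computed from the standard factorial series; valid on the unit disk.
    """
    m = abs(m)
    rho = np.asarray(rho, dtype=float)
    out = np.zeros_like(rho)
    for k in range((n - m) // 2 + 1):
        c = ((-1) ** k * factorial(n - k)
             / (factorial(k)
                * factorial((n + m) // 2 - k)
                * factorial((n - m) // 2 - k)))
        out += c * rho ** (n - 2 * k)
    return out

rho = np.linspace(0, 1, 5)
print(zernike_radial(2, 0, rho))   # the defocus term, equal to 2*rho**2 - 1
```

The full polynomial multiplies this radial part by cos(mθ) or sin(mθ); Zernike moments are then inner products of an image patch, expressed in polar coordinates over the unit disk, with these basis functions.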
Steerable Filters
Steerable filters are loosely considered as basis functions here, and can be used for both filtering and feature description. Conceptually similar to Zernike polynomials, steerable filters [370, 382] are synthesized as steered, oriented linear combinations of chosen basis functions, such as quadrature pairs of Gaussian filters and oriented versions of each function, in a simple transform.
Many types of filter functions can be used as the basis for steerable filters [371, 373]. The filter transform is created by combining together the basis functions in a filter bank, as shown in Fig. 3.23. Gain is selected for each function, and all filters in the bank are summed, then adaptively applied to the image. Pyramid sets of basis functions can be created to operate over scale. Applications include convolving oriented steerable filters with target image regions to determine filter response strength, orientation and phase. Other applications include filtering images based on orientation of features, contour detection, and feature description.
For feature description, there are several methods that could work—for example, convolving each steerable basis function with an image patch. The highest one or two filter responses or moments from all the steerable filters can then be chosen as an ordinal feature descriptor, or all the filter responses can be used as the feature descriptor. As an optimization, an interest point can first be determined in the patch, and the orientation of the interest point can be used to select the one or two steerable filters closest to the orientation of the interest point; then the closest steerable filters are used as the basis to compute the descriptor.
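The steering property itself is simple: for a first-derivative-of-Gaussian pair, a filter at any angle θ is exactly cos(θ) times the x-derivative basis plus sin(θ) times the y-derivative basis, so two convolutions suffice for all orientations. A minimal sketch, with the kernel size and sigma as assumed parameters:

```python
import numpy as np

def gaussian_dx(size=9, sigma=1.5):
    """First x-derivative of a 2D Gaussian: one of the two basis filters."""
    r = np.arange(size) - size // 2
    x, y = np.meshgrid(r, r)
    g = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return -x * g / sigma**2

gx = gaussian_dx()
gy = gaussian_dx().T      # the y-derivative basis is the transpose

def steer(theta):
    """Synthesize an oriented derivative filter from the two basis filters."""
    return np.cos(theta) * gx + np.sin(theta) * gy

# The steered 45-degree filter is exactly the basis combination; by
# linearity of convolution, filter *responses* steer the same way, so a
# response at any orientation costs no extra image convolutions.
g45 = steer(np.pi / 4)
print(np.allclose(g45, (gx + gy) / np.sqrt(2)))   # True
```

Higher-order derivatives and quadrature pairs steer the same way, just with more basis filters and trigonometric interpolation weights.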
Karhunen–Loeve Transform and Hotelling Transform
The Karhunen–Loeve transform (KLT) [4, 9] was devised to describe a continuous random process as a series expansion, as opposed to the Fourier method of describing periodic signals. Hotelling later devised a discrete equivalent of the KLT using principal components. “KLT” is the most common name referring to both methods.
The basis functions are dependent on the eigenvectors of the underlying image, and computing eigenvectors is a compute-intensive process with no fast transform known. The KLT is not separable, so it cannot be optimized over image blocks; the KLT is therefore typically used for PCA on small datasets such as feature vectors used in pattern classification, clustering, and matching.
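A small PCA sketch makes the data-dependence concrete: the basis is the eigenvectors of the sample covariance, recomputed for every dataset rather than fixed in advance like the Fourier or Hadamard bases (the dimensions and synthetic data below are illustrative).

```python
import numpy as np

def klt(X, k):
    """Project rows of X (feature vectors) onto the top-k principal axes.

    The basis comes from the eigenvectors of the data covariance, i.e.
    it is derived from the data itself, unlike fixed transform bases.
    """
    mu = X.mean(axis=0)
    cov = np.cov(X - mu, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    basis = vecs[:, ::-1][:, :k]          # keep the top-k eigenvectors
    return (X - mu) @ basis, basis, mu

rng = np.random.default_rng(0)
# 100 4-D feature vectors with most variance along the first axis
X = rng.normal(size=(100, 4)) * np.array([5.0, 1.0, 0.5, 0.1])
Y, basis, mu = klt(X, 2)
print(Y.shape)    # (100, 2): a compact descriptor for matching/clustering
```

For D-dimensional vectors the eigendecomposition costs O(D^3), which is why this is practical for short feature vectors but not as a block image transform.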
Wavelet Transform and Gabor Filters
Wavelets, as the name suggests, are short waves or wave-lets [326]. Think of a wavelet as a short-duration pulse such as a seismic tremor, starting and ending at zero, rather than a continuous or resonating wave. Wavelets are convolved with a given signal, such as an image, to find similarity and statistical moments. Wavelets can therefore be implemented like convolution kernels in the spatial domain. See Fig. 3.24.
Wavelet analysis is a vast field [283, 284] with many applications and useful resources available, including libraries of wavelet families and analysis software packages [281]. Fast wavelet transforms (FWTs) exist in common signal and image processing libraries. Several variants of the wavelet transform include:
- Discrete wavelet transform (DWT)
- Stationary wavelet transform (SWT)
- Continuous wavelet transform (CWT)
- Lifting wavelet transform (LWT)
- Stationary wavelet packet transform (SWPT)
- Discrete wavelet packet transform (DWPT)
- Fractional Fourier transform (FRFT)
- Fractional wavelet transform (FRWT)
Wavelets are designed to meet various goals and are crafted for specific applications; there is no single wavelet function or basis. For example, a set of wavelets can be designed to represent the musical scale, where each note (such as middle C) is defined as a wavelet pulse with the duration of an eighth note, and each wavelet in the set is then convolved across a signal to locate the corresponding notes in the musical scale.
When designing wavelets, the mother wavelet is the basis of the wavelet family, and then daughter wavelets are derived using translation, scaling, or compression of the mother wavelet. Ideally, a set of wavelets are overlapping and complementary so as to decompose data with no gaps and be mathematically reversible.
Wavelets are used in transforms as a set of nonlinear basis functions, where each basis function can be designed as needed to optimally match a desired feature in the input function. So, unlike transforms which use a uniform set of basis functions—as the Fourier transform uses sine and cosine functions—wavelets use a dynamic set of basis functions that are complex and nonuniform in nature. See Fig. 3.25.
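The mother/daughter relationship above can be shown with a Mexican-hat (Ricker) mother wavelet: daughters are produced purely by scaling and translation, and correlating a daughter against a signal measures similarity at that scale and position. The wavelet choice, grid, and test pulse below are illustrative assumptions.

```python
import numpy as np

def ricker(t, scale=1.0, shift=0.0):
    """Mexican-hat (Ricker) mother wavelet; scale stretches it and shift
    slides it along t, producing a daughter wavelet. The 1/sqrt(scale)
    factor preserves energy across scales."""
    u = (t - shift) / scale
    return (1 - u**2) * np.exp(-u**2 / 2) / np.sqrt(scale)

# One cell of a crude continuous wavelet transform: correlate a single
# daughter wavelet against a signal containing a pulse at t = 2.
t = np.linspace(-8, 8, 512)
signal = np.exp(-(t - 2)**2)
aligned = np.dot(ricker(t, scale=1.0, shift=2.0), signal)    # on the pulse
off = np.dot(ricker(t, scale=1.0, shift=-5.0), signal)       # far from it
print(aligned > abs(off))   # True: the aligned daughter responds strongly
```

Sweeping shift and scale over a grid fills out the full time-scale response plane that wavelet transforms compute efficiently.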
Wavelets have been used as the basis for scale and rotation invariant feature description [280], image segmentation [277, 278], shape description [279], and obviously image and signal filtering of all the expected varieties, denoising, image compression, and image coding. A set of application-specific wavelets could be devised for feature description.
Gabor Functions
Wavelets can be considered an extension of the earlier concept of Gabor functions [285, 325], which can be derived for imaging applications as a set of 2D oriented bandpass filters. Gabor’s work was centered on the physical transmission of sound and problems with Fourier methods involving time-varying signals like sirens that could not be perfectly represented as periodic frequency information. Gabor proposed a more compact representation than Fourier analysis could provide, using a concept called atoms that recorded coefficients of the sound that could be transmitted more compactly. See Fig. 3.26.
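For imaging, a 2D Gabor filter is a sinusoidal carrier under a Gaussian envelope, rotated to a chosen orientation; a bank of such filters gives the oriented bandpass set described above. A minimal NumPy sketch, with kernel size, sigma, and wavelength as assumed parameters:

```python
import numpy as np

def gabor(size=21, sigma=3.0, theta=0.0, wavelength=6.0, phase=0.0):
    """2D Gabor filter: a sinusoidal carrier under a Gaussian envelope,
    yielding an oriented bandpass filter for texture and edge energy."""
    r = np.arange(size) - size // 2
    x, y = np.meshgrid(r, r)
    xr = x * np.cos(theta) + y * np.sin(theta)    # rotate coordinates
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr**2 + yr**2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * xr / wavelength + phase)
    return envelope * carrier

# A small filter bank over orientations; convolving each with an image
# region and collecting the response energies yields a texture descriptor.
bank = [gabor(theta=th) for th in np.linspace(0, np.pi, 4, endpoint=False)]
print(len(bank), bank[0].shape)   # 4 (21, 21)
```

Pairing each cosine filter with its sine (quadrature) counterpart gives phase-independent energy responses, the usual form for Gabor texture features.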
Hough Transform and Radon Transform
The Hough transform [220–222] and the Radon transform [291] are related, and the results are equivalent in the opinion of many [287, 292]; see Fig. 3.27. The Radon transform is an integral transform, while the Hough transform is a discrete method, and therefore much faster. The Hough method is widely used in image processing, and can be accelerated using a GPU [290] with data-parallel methods. The Radon algorithm is slightly more accurate and perhaps more mathematically sound, and is often associated with X-ray tomography, where it is applied to reconstruction from X-ray projections. We focus primarily on the Hough transform, since it is widely available in image processing libraries.
Key applications for the Hough and Radon transforms are shape detection and shape description of lines, circles, and parametric curves. The main advantages include:
- Robust to noise and partial occlusion
- Fills gaps in apparent lines, edges, and curves
- Can be parameterized to handle various edge and curve shapes
The disadvantages include:
- Looks for only one type or parameterization of a feature at a time, such as a line
- Collinear segments are not distinguished, but are lumped together
- May incorrectly fill in gaps and link edges that are not connected
- Length and position of lines are not determined, though this can be done in image space
The Hough transform is primarily a global or regional descriptor and operates over larger areas. It was originally devised to detect lines, and has been subsequently generalized to detect parametric shapes [293], such as curves and circles. However, adding more parameterization to the feature requires more memory and compute. Hough features can be used to mark region boundaries described by regular parametric curves and lines. The Hough transform is attractive for some applications, since it can tolerate gaps in the lines or curves and is not strongly affected by noise or some occlusion, but morphology and edge detection via other methods are often sufficient, so the Hough transform has limited applications.
The input to the Hough transform is a gradient magnitude image, which has been thresholded, leaving the dominant gradient information. The gradient magnitude is used to build a map revealing all the parameterized features in the image—for example, lines at a given orientation or circles with a given diameter. For example, to detect lines, we map each gradient point in the pixel space into the Hough parameter space, parameterized as a single point (d, θ) corresponding to all lines with orientation angle θ at distance d from the origin. Curve and circle parameterization uses different variables [293]. The parameter space is quantized into cells or accumulator bins, and each accumulator is updated by summing the number of gradient lines passing through the same Hough points. The accumulator method is modified for detecting parametric curves and circles. Thresholding the accumulator space and reprojecting only the highest accumulator values as overlays back onto the image is useful to highlight features.
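The voting step for line detection can be sketched directly: every edge pixel votes for all (d, θ) cells it could belong to, and collinear pixels pile their votes into one cell. This is a minimal NumPy version assuming a binary edge image and a 1-degree θ quantization.

```python
import numpy as np

def hough_lines(edges, n_theta=180):
    """Accumulate (d, theta) votes for every edge pixel; peaks are lines.

    Uses the normal parameterization d = x*cos(theta) + y*sin(theta);
    d is offset by d_max so negative distances index into the array.
    """
    h, w = edges.shape
    thetas = np.linspace(0, np.pi, n_theta, endpoint=False)
    d_max = int(np.ceil(np.hypot(h, w)))
    acc = np.zeros((2 * d_max, n_theta), dtype=np.int32)
    ys, xs = np.nonzero(edges)
    for th_i, th in enumerate(thetas):
        d = np.round(xs * np.cos(th) + ys * np.sin(th)).astype(int) + d_max
        np.add.at(acc, (d, th_i), 1)   # handles repeated indices correctly
    return acc, thetas, d_max

# A horizontal row of edge pixels votes into one strong accumulator cell.
edges = np.zeros((50, 50), dtype=bool)
edges[20, 5:45] = True
acc, thetas, d_max = hough_lines(edges)
d_i, th_i = np.unravel_index(acc.argmax(), acc.shape)
print(d_i - d_max, np.degrees(thetas[th_i]))   # d = 20 at ~90 degrees
```

Note the gap tolerance mentioned above falls out for free: missing pixels along the row only lower the peak's vote count, they do not move it.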
Summary
This chapter provides a selected history of global and regional metrics, with the treatment of local feature metrics deferred until Chaps. 4 and 6. Some historical context is provided on the development of structural and statistical texture metrics, as well as basis spaces useful for feature description, and several common regional and global metrics. A wide range of topics in texture analysis and statistical analysis are surveyed with applications to computer vision.
Since it is difficult to cleanly partition all the related topics in image processing and computer vision, there is some overlap between the topics here and those in Chaps. 2, 4, 5, and 6.
Chapter 3: Learning Assignments
1. Discuss when to use a global image processing operation vs. a local or regional image processing operation.
2. Discuss in general how global image statistics can guide image preprocessing for computer vision applications, and specifically name one global image metric and discuss how it can be applied.
3. Compare global image feature metrics and local feature descriptors in general, and discuss a specific example global feature metric and compare it to a specific local feature descriptor.
4. Describe global image texture in general terms.
5. Discuss how a 2D histogram of an image can be used to understand image texture.
6. Discuss how the 2D Fourier series of an image is used to understand image texture.
7. Discuss how the Haralick texture metrics based on the co-occurrence matrix are used to understand image texture.
8. Discuss how Spatial Dependency Matrix (SDM) plots are used to understand image texture.
9. Discuss statistical moments of an image histogram, including at least the mean value and variance, and how these features are useful as global image descriptors.
10. Describe a multi-resolution histogram built from an image pyramid, and how to interpret the results of the histogram.
11. Describe how a Fourier description of the shape of a circle is created from the Fourier series, and how it is useful as a shape descriptor.
12. Describe basis features for the HAAR transform, Slant transform, and Walsh–Hadamard transform.
13. Compare wavelet features to Fourier series features.
14. Describe the Hough transform and the Radon transform algorithms, and how they are used as a global image metric for shape detection.
Notes
1. See the NIST online resource for engineering statistics: http://www.itl.nist.gov/div898/handbook/
References
Bajcsy, R.: Computer description of textured surfaces. Int. Conf. Artif. Intell. Stat. (1973)
Bajcsy, R., Lieberman, L.: Texture gradient as a depth cue. Comput. Graph. Image Process. 5(1), (1976)
Cross, G.R., Jain, A.K.: Markov random field texture models. PAMI 54(1), (1983)
Gonzalez, R., Woods, R.: Digital Image Processing, 3rd edn. Prentice-Hall, Englewood Cliffs, NJ (2007)
Haralick, R.M.: Statistical and structural approaches to texture. Proc. Int. Joint Conf. Pattern Recogn. (1979)
Haralick, R.M., Shanmugan, R., Dinstein, I.: Textural features for image classification. IEEE Trans. Syst. Man Cybern. 3(6), (1973)
Hu, M.K.: Visual pattern recognition by moment invariants. IRE Trans. Inform. Theor. 8(2), (1962)
Lu, H.E., Fu, K.S.: A syntactic approach to texture analysis. Comput. Graph. Image Process. 7(3), (1978)
Pratt, W.K.: Digital image processing, 3rd edn. Wiley, Hoboken, NJ (2002)
Rosenfeld, A., Kak, A.C.: Digital picture processing, 2nd edn. Academic Press, New York (1982)
Tomita, F., Shirai, Y., Tsuji, S.: Description of texture by a structural analysis. Pattern. Anal. Mach. Intell. 4(2), (1982)
Wong, R.Y., Hall, E. L.: Scene matching with invariant moments. Comput. Graph. Image Process. 8 (1978)
Guoying, Z., Pietikainen, M.: Dynamic texture recognition using local binary patterns with an application to facial expressions. Trans. Pattern. Anal. Mach. Intell. 29(6), 915–928 (2007)
Kellokumpu, V., Guoying Z., Pietikäinen, M.: Human activity recognition using a dynamic texture based method
Guoying, Z., Pietikäinen, M.: Dynamic texture recognition using local binary patterns with an application to facial expressions. Pattern. Anal. Mach. Intell. 29(6), 915–928 (2007)
Eichmann, G., Kasparis, T.: Topologically invariant texture descriptors. Comput. Vis. Graph. Image Process. 41(3), (1988)
Lam, S.W.C., Ip, H.H.S.: Structural texture segmentation using irregular pyramid. Pattern Recogn. Lett. 15(7), (1994)
Pietikäinen, M., Guoying, Z., Hadid, A.: Computer Vision Using Local Binary Patterns. Springer, New York (2011)
Ojala, T., Pietikäinen, M., Hardwood, D.: Performance evaluation of texture measures with classification based on kullback discrimination of distributions. Proc. Int. Conf. Pattern. Recogn. (1994)
Ojala, T., Pietikäinen, M., Hardwood, D.: A comparative study of texture measures with classification based on feature distributions. Pattern Recogn. 29 (1996)
Van Ginneken, B., Koenderink, J.J.: Texture histograms as a function of irradiation and viewing direction. Int. J. Comput. Vis. 31(2/3), 169–184 (1999)
Stelu, A., Arati, K., Dong-Hui, X.: Texture analysis for computed tomography studies. Visual Computing Workshop DePaul University, (2004)
Krig, S.A.: Image texture analysis using spatial dependency matrices. Krig Research White Paper Series, (1994)
Laws, K.I.: Rapid texture identification. SPIE 238 (1980)
Bajcsy, R.K.: Computer identification of visual surfaces. Comput. Graph. Image Process. 2(2), 118–130 (1973)
Kaizer, H.: A quantification of textures on aerial photographs. MS Thesis, Boston University, (1955)
Laws, K.I.: Texture energy measures. Proceedings of the Image Understanding Workshop, (1979)
Laws, K.I.: Rapid texture identification. SPIE 238 (1980)
Laws, K.I.: Textured image segmentation. PhD Thesis, University of Southern California, (1980)
Ade, F.: Characterization of textures by “Eigenfilters.” Signal Process. 5 (1983)
Davis, L.S.: Computing the spatial structures of cellular texture. Comput. Graph. Image Process. 11(2), (1979)
Eichmann, G., Kasparis, T.: Topologically invariant texture descriptors. Comput. Vis. Graph. Image Process. 41(3), (1988)
Lam, S.W.C., Ip, H.H.S.: Structural texture segmentation using irregular pyramid. Pattern Recogn. Lett. 15(7), (1994)
Pietikäinen, M., Guoying, Z., Hadid, A.: Computer vision using local binary patterns. Springer, New York (2011)
Ojala, T., Pietikäinen, M., Hardwood, D.: Performance evaluation of texture measures with classification based on kullback discrimination of distributions. Proc. Int. Conf. Pattern. Recogn. (1994)
Ojala T., Pietikäinen, M., Hardwood, D.: A comparative study of texture measures with classification based on feature distributions. Pattern Recogn. 29 (1996)
Pun, C.M., Lee, M.C.: Log-polar wavelet energy signatures for rotation and scale invariant texture classification. Trans. Pattern. Anal. Mach. Intell. 25(5), (2003)
Spence, A., Robb, M., Timmins, M., Chantler, M.: Real-time per-pixel rendering of textiles for virtual textile catalogues. Proceedings of INTEDEC, Edinburgh, (2003)
Lam, S.W.C., Horace, H.S.I.: Adaptive pyramid approach to texture segmentation. Comput. Anal. Images Patterns Lect. Notes Comput. Sci. 719, 267–274 (1993)
Dana, K.J., van Ginneken, B., Nayar, S.K., Koenderink, J.J.: Reflectance and Texture of Real World Surfaces. Technical Report CUCS-048-96, Columbia University, (1996)
Dana, K.J., van Ginneken, B., Nayar, S.K., Koenderink, J.J.: Reflectance and texture of real world surfaces. Conf. Comput. Vis. Pattern Recogn. (1997)
Dana, K.J., van Ginneken, B., Nayar, S.K., Koenderink, J.J.: Reflectance and texture of real world surfaces. ACM Trans. Graph. (1999)
Suzuki, M.T., Yaginuma, Y.: A solid texture analysis based on three dimensional convolution kernels. Proc. SPIE 6491, (2007)
Suzuki, M.T., Yaginuma, Y., Yamada, T., Shimizu, Y.: A shape feature extraction method based on 3D convolution masks. Eighth IEEE International Symposium on Multimedia, ISM’06. (2006)
Guoying, Z., Pietikainen, M.: Dynamic texture recognition using local binary patterns with an application to facial expressions. Trans. Pattern. Anal. Mach. Intell. 29 (2007)
Hadjidemetriou, E., Grossberg, M.D., Nayar, S.K.: Multiresolution histograms and their use for texture classification. IEEE PAMI 26
Hadjidemetriou, E., Grossberg, M.D., Nayar, S.K.: Multiresolution histograms and their use for recognition. IEEE PAMI 26(7), (2004)
Lee, K.L., Chen, L.H.: A new method for coarse classification of textures and class weight estimation for texture retrieval. Pattern Recogn. Image Anal. 12(4), (2002)
Van Ginneken, B., Koenderink, J.J.: Texture histograms as a function of irradiation and viewing direction. Int. J. Comput. Vis. 31(2/3), 169–184 (1999)
Shu, L., Chung, A.C.S.: Texture classification by using advanced local binary patterns and spatial distribution of dominant patterns. ICASSP 2007. IEEE Int. Conf. Acoust. Speech Signal Process. (2007)
Stelu, A., Arati, K., Dong-Hui, X.: Texture analysis for computed tomography studies. Visual Computing Workshop DePaul University, (2004)
Ade, F.: Characterization of textures by “Eigenfilters.” Signal Process. 5 (1983)
Rosin, P.L.: Measuring corner properties. Comput. Vis. Image Understand. 73(2)
Russel, B., Jianxiong, X., Torralba, A.: Localizing 3D cuboids in single-view images. Conf. Neural Inform. Process. Syst. (2012)
Snavely, N., Seitz, S.M., Szeliski, R.: Photo tourism: exploring photo collections in 3D. ACM Trans. Graph. (SIGGRAPH Proc.) (2006)
Snavely, N., Seitz, S.M., Szeliski, R.: Modeling the world from internet photo collections. Int. J. Comput. Vis. (TBP)
Furukawa, Y., Curless, B., Seitz, S.M., Szeliski, R.: Towards internet-scale multi-view stereo. Conf. Comput. Vis. Pattern Recogn. (2010)
Yunpeng, L., Snavely, N., Huttenlocher, D., Fua, P.: Worldwide pose estimation using 3D point clouds. Eur. Conf. Comput. Vis. (2012)
Russell, B., Torralba, A., Murphy, K., Freeman, W.T.: LabelMe: A database and web-based tool for image annotation. Int. J. Comput. Vis. 77 (2007).
Oliva, A., Torralba, A.: Modeling the shape of the scene: a holistic representation of the spatial envelope. Int. J. Comput. Vis. 42 (2001)
Lai, K., Bo, L., Ren, X., Fox, D.: A large-scale hierarchical multi-view RGB-D object dataset. Int. Conf. Robot Autom. (2011)
Xiao, J., Hays, J., Ehinger, K., Oliva, A., Torralba, A.: SUN database: large-scale scene recognition from abbey to zoo. Conf. Comput. Vis. Pattern Recogn. (2010)
Fei-Fei, L., Fergus, R., Perona, P.: Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories. Conf. Comput. Vis. Pattern Recogn. (2004)
Fei-Fei, L.: ImageNet: crowdsourcing, benchmarking & other cool things. CMU VASC Semin. (2010)
Pirsiavash, H., Ramanan, D.: Detecting activities of daily living in first-person camera views. Conf. Comput. Vis. Pattern Recogn. (2012)
Quattoni, A., Torralba, A.: Recognizing indoor scenes. Conf. Comput. Vis. Pattern Recogn. (2009)
Lai, K., Bo, L., Ren, X., Fox, D.: A large-scale hierarchical multi-view RGB-D object dataset. Int. Conf. Robot Autom. (2011)
Silberman, N., Hoiem, D., Kohli, P., Fergus, R.: Indoor segmentation and support inference from RGBD images. Eur. Conf. Comput. Vis. (2012)
Xiaofeng R., Philipose, M.: Egocentric recognition of handled objects: benchmark and analysis. CVPR Workshops, (2009)
Xiaofeng, R., Gu, C.: Figure-ground segmentation improves handled object recognition in egocentric video. Conf. Comput. Vis. Pattern Recogn. (2009)
Fathi, A., Li, Y., Rehg, J.M.: Learning to recognize daily actions using gaze. Eur. Conf. Comput. Vis. (2012)
Dana, K.J., van Ginneken, B., Nayar, S.K. Koenderink, J. J.: Reflectance and texture of real world surfaces. Trans. Graph. 18(1), (1999)
Ce, L., Sharan, L., Adelson, E.H., Rosenholtz, R.: Exploring features in a Bayesian framework for material recognition. Conf. Comput. Vis. Pattern Recogn. (2010)
Huang, G.B., Ramesh, M., Berg, T., Learned-Miller, E.: Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments. Technical report 07-49, University of Massachusetts, Amherst, (2007)
Gross, R., Matthews, I., Cohn, J.F., Kanade, T., Baker, S.: Multi-PIE. Proceedings of the Eighth IEEE International Conference on Automatic Face and Gesture Recognition, (2008)
Yao, B., Jiang, X., Khosla, A., Lin, A.L., Guibas, L.J., Fei-Fei, L.: Human action recognition by learning bases of action attributes and parts. Int. Conf. Comput. Vis. (2011)
LeCun, Y., Huang, FJ., Bottou, L.: Learning methods for generic object recognition with invariance to pose and lighting. Proc. Conf. Comput. Vis. Pattern Recogn. (2004)
McCane, B., Novins, K., Crannitch, D., Galvin, B.: On benchmarking optical flow. Comput. Vis. Image Understand. 84(1), (2001)
Pirsiavash, H., Ramanan, D.: Detecting activities of daily living in first-person camera views. Conf. Comput. Vis. Pattern Recogn. Provid. Rhode Island. (2012)
Hamarneh, G., Jassi, P., Tang, L.: Simulation of ground-truth validation data via physically- and statistically-based warps. MICCAI 2008, the 11th International Conference on Medical Image Computing and Computer Assisted Intervention
Prastawa, M., Bullitt, E., Gerig, G.: Synthetic ground truth for validation of brain tumor MRI segmentation. MICCAI 2005, the 8th International Conference on Medical Image Computing and Computer Assisted Intervention
Vedaldi, A., Ling, H., Soatto, S.: Knowing a good feature when you see it: ground truth and methodology to evaluate local features for recognition. Comput. Vis. Stud. Comput. Intell. 285, 27–49 (2010)
Dutagaci, H., Cheung, C.P., Godil, A.: Evaluation of 3D interest point detection techniques via human-generated ground truth. The Visual Computer 28 (2012)
Rosin, PL.: Augmenting corner descriptors. Graph. Model. Image Process. 58(3), (1996)
Rockett, P.I.: Performance assessment of feature detection algorithms: a methodology and case study on corner detectors. Trans. Image Process. 12(12), (2003)
Shahrokni, A., Ellis, A., Ferryman, J.: Overall evaluation of the PETS2009 results. IEEE PETS (2009)
Over, P., Awad, G., Sanders, G., Shaw, B., Martial, M., Fiscus, J., Kraaij, W., Smeaton, AF.: TRECVID 2013: An Overview of the Goals, Tasks, Data, Evaluation Mechanisms, and Metrics, NIST USA, (2013)
Horn, B.K.P., Schunck, B.G.: Determining Optical Flow. AI Memo 572, Massachusetts Institute of Technology, (1980)
Everingham, M., Van Gool, L., Williams, C. K. I., Winn, J., Zisserman, A.: The PASCAL visual object classes (VOC) challenge. Int. J. Comput. Vis. 88(2), (2010)
Liu, J., Luo, J., Shah, M.: Recognizing realistic actions from videos “in the Wild.” Conf. Comput. Vis. Pattern Recogn. (2009)
Arbelaez, P., Maire, M., Fowlkes, C., Malik, J.: Contour detection and hierarchical image segmentation. Trans. Pattern. Anal. Mach. Intell. 33(5), (2011)
Fisher, R.B.: PETS04 surveillance ground truth data set. Proc. IEEE PETS. (2004)
Quan Y., Thangali, A., Ablavsky, V., Sclaroff, S.: Learning a family of detectors via multiplicative kernels. Pattern. Anal. Mach. Intell. 33(3), (2011)
Ericsson, A., Karlsson, J.: Measures for benchmarking of automatic correspondence algorithms. J. Math. Imaging Vis. (2007)
Takhar, D., et al.: A new compressive imaging camera architecture using optical-domain compression. In: Proceedings of IS&T/SPIE Symposium on Electronic Imaging (2006)
Marco, F.D., Baraniuk, R.G.: Kronecker compressive sensing. IEEE Trans. Image Process. 21(2), (2012)
Weinzaepfel, P., Jegou, H., Perez, P.: Reconstructing an image from its local descriptors. Conf. Comput. Vis. Pattern Recogn. (2011)
Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. Conf. Comput. Vis. Pattern Recogn. (2005)
Tuytelaars, T., Mikolajczyk, K.: Local invariant feature detectors: a survey. Found. Trends Comput. Graph. Vis. 3(3), 177–280 (2007)
Hartigan, J.A.: Clustering Algorithms. Wiley, New York (1975)
Fischler, M.A., Bolles, RC.: Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 24(6), (1981)
Sunglok, C., Kim, T., Yu, W.: Performance evaluation of RANSAC family. Br. Mach. Vis. Assoc. (2009)
Hartigan, J.A., Wong, M.A.: Algorithm AS 136: A K-means clustering algorithm. J. Royal Stat. Soc. Ser. C Appl. Stat. 28(1), 100–108 (1979)
Voronoi, G.: Nouvelles applications des paramètres continus à la théorie des formes quadratiques. Journal für die Reine und Angewandte Mathematik 133 (1908)
Capel, D.: Random forests and ferns. Penn. State University Computer Vision Laboratory, seminar lecture notes online: ForestsAndFernsTalk.pdf
Xiaofeng, R., Malik, J.: Learning a classification model for segmentation
Lai, K., Bo, L., Ren, X., Fox, D.: Sparse distance learning for object recognition combining RGB and depth information
Xiaofeng, R., Ramanan, D.: Histograms of sparse codes for object detection. Conf. Comput. Vis. Pattern Recogn. (2013)
Liefeng, B., Ren, X., Fox, D.: Multipath sparse coding using hierarchical matching pursuit. Conf. Comput. Vis. Pattern Recogn. (2013)
Herbst, E., Ren, X., Fox, D.: RGB-D flow: dense 3-D motion estimation using color and depth. IEEE Int. Conf. Robot Autom. (ICRA) (2013)
Xiaofeng, R., Bo, L.: Discriminatively trained sparse code gradients for contour detection. Conf. Neural Inform. Process. Syst. (2012)
Rublee, E., Rabaud, V., Konolige, K., Bradski, G.: ORB: an efficient alternative to SIFT or SURF. ICCV ’11 Proceedings of the 2011 International Conference on Computer Vision
Rosenfeld, A., Pfaltz, J.L.: Distance functions on digital images. Pattern Recog. 1, 33–61 (1968)
Richardson, A., Olson, E.: Learning convolutional filters for interest point detection. IEEE Int. Conf. Robot Autom. ICRA’13 IEEE, 631–637, (2013)
Moon, T.K., Stirling, W.C.: Mathematical Methods and Algorithms for Signal Processing. Prentice-Hall, Englewood Cliffs, NJ (1999)
Liefeng, B, Ren, X., Fox, D.: Multipath sparse coding using hierarchical matching pursuit. Conf. Comput. Vis. Pattern Recogn. (2013)
Ren, X., Ramanan, D.: Histograms of sparse codes for object detection. Conf. Comput. Vis. Pattern Recogn. (2013)
Olshausen, B., Field, D.: Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature 381(6583), 607–609 (1996)
d’Angelo, E., Alahi, A., Vandergheynst, P.: Beyond bits: reconstructing images from local binary descriptors. Swiss Federal Institute of Technology, 21st International Conference on Pattern Recognition (ICPR), (2012)
Dengsheng, Z., Lu, G.: Review of shape representation and description techniques. J. Pattern Recogn. Soc. 37, 1–19 (2004)
Yang M., Kidiyo, K., Joseph, R.: A survey of shape feature extraction techniques. Pattern Recogn. 43–90, (2008)
Alahi, A., Ortiz, R., Vandergheynst, P.: Freak: fast retina keypoint. Conf. Comput. Vis. Pattern Recogn. (2012)
Leutenegger, S., Chli, M., Siegwart, R.Y.: BRISK: binary robust invariant scalable keypoints. Int. Conf. Comput. Vis. (2011)
Calonder, M., Lepetit, V., Strecha, C., Fua, P.: BRIEF: binary robust independent elementary features. ECCV’10 Proceedings of the 11th European Conference Computer Vision: Part IV, (2010)
Calonder, M., et al.: BRIEF: computing a local binary descriptor very fast. Pattern. Anal. Mach. Intell. 34 (2012)
Rublee, E., Rabaud, V., Konolige, K., Bradski, G.: ORB: an efficient alternative to SIFT or SURF. ICCV ’11 Proceedings of the 2011 International Conference on Computer Vision, (2011)
von Hundelshausen, F., Sukthankar, R.: D-Nets: beyond patch-based image descriptors. Conf. Comput. Vis. Pattern Recogn. (2012)
Krig, S.: RFAN radial fan descriptors. Picture Center Imaging and Visualization System, White Paper Series (1992)
Krig, S.: Picture Center Imaging and Visualization System. Krig Research White Paper Series (1994)
Rosten, E., Drummond, T.: FAST machine learning for high-speed corner detection. Eur. Conf. Comput. Vis. (2006)
Rosten, E., Drummond, T.: Fusing points and lines for high performance tracking. Int. Conf. Comput. Vis. (2005)
Liefeng, B., Ren, X., Fox, D.: Hierarchical matching pursuit for image classification: architecture and fast algorithms. Conf. Neural Inform. Process. Syst. (2011)
Miksik, O., Mikolajczyk, K.: Evaluation of local detectors and descriptors for fast feature matching. Int. Conf. Pattern. Recogn. (2012)
Freund, Y., Schapire, R.E.: A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 55(1), 119–139 (1997)
Gleason, J.: BRISK (Presentation by Josh Gleason) at International Conference on Computer Vision, (2011)
Mikolajczyk, K., Schmid, C.: A performance evaluation of local descriptors. Pattern. Anal. Mach. Intell. IEEE Trans. 27(10), (2005)
Gauglitz, S., Höllerer, T., Turk, M.: Evaluation of interest point detectors and feature descriptors for visual tracking. Int. J. Comput. Vis. 94(3), (2011)
Viola, P., Jones, M.: Robust real time face detection. Int. J. Comput. Vis. 57(2), (2004)
Thevenaz, P., Ruttimann, U.E., Unser, M.: A pyramid approach to subpixel registration based on intensity. IEEE Trans. Image Process. 7(1), (1998)
Qi, T., Huhns, M.N.: Algorithms for subpixel registration. Comput. Vis. Graph. Image Process. 35 (1986)
Zhu, J., Yang, L.: Subpixel eye gaze tracking. Autom. Face Gesture Recogn. Conf. (2002)
Cheezum, M.K., Walker, W.F., Guilford, W.H.: Quantitative comparison of algorithms for tracking single fluorescent particles. Biophys. J. 81(4), 2378–2388 (2001)
Guizar-Sicairos, M., Thurman, S.T., Fienup, J.R.: Efficient subpixel image registration algorithms. Opt. Lett. 33(2), 156–158 (2008)
Hadjidemetriou, E., Grossberg, M.D., Nayar, S.K.: Multiresolution histograms and their use for texture classification. Int. Workshop Texture Anal. Synth. 26(7), (2003)
Mikolajczyk, K., et al.: A comparison of affine region detectors. Conf. Comput. Vis. Pattern Recogn. (2006)
Canny, J.: A computational approach to edge detection. Trans. Pattern. Anal. Mach. Intell. 8(6), 679–698 (1986)
Gunn, S.R.: Edge detection error in the discrete Laplacian of Gaussian. International Conference on Image Processing, ICIP 98. Proceedings. vol 2, (1998)
Harris, C., Stephens, M.: A combined corner and edge detector. Proceedings of the 4th Alvey Vision Conference, (1988)
Shi, J., Tomasi, C.: Good features to track. Conf. Comput. Vis. Pattern Recogn. (1994)
Turk, M., Pentland, A.: Eigenfaces for recognition. J. Cogn. Neurosci. 3(1), 71–86 (1991)
Haja, A., Jahne, B., Abraham, S.: Localization accuracy of region detectors. IEEE CVPR (2008)
Bay, H., Ess, A., Tuytelaars, T., Van Gool, L.: Speeded-up robust features (SURF). Comput. Vis. Image Understand. 110(3), 346–359 (2008)
Lowe, D.G.: SIFT distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60(2), 91–110 (2004)
Kadir, T., Zisserman, A., Brady, M.: An affine invariant salient region detector. Eur. Conf. Comput. Vis. (2004)
Kadir, T., Brady, J.M.: Scale, saliency and image description. Int. J. Comput. Vis. 45(2), 83–105 (2001)
Smith, S.M., Brady, J.M.: SUSAN—a new approach to low level image processing. Technical report TR95SMS1c (patented), Crown Copyright, Defence Research Agency, UK (1995)
Smith, S.M., Brady, J.M.: SUSAN—a new approach to low level image processing. Int. J. Comput. Vis. 23(1), 45–78 (1997)
Yuan, B., Cao, H., Chu, J.: Combining local binary pattern and local phase quantization for face recognition. Int. Symp. Biometr. Secur. Technol. (2012)
Ojansivu, V., Heikkilä, J.: Blur insensitive texture classification using local phase quantization. Proc. Image Signal Process. (2008)
Chan, C.H., Tahir, M.A., Kittler, J., Pietikäinen, M.: Multiscale local phase quantization for robust component-based face recognition using kernel fusion of multiple descriptors. PAMI (2012)
Ojala, T., Pietikäinen, M., Harwood, D.: Performance evaluation of texture measures with classification based on Kullback discrimination of distributions. Proc. Int. Conf. Pattern. Recogn. (1994)
Ojala, T., Pietikäinen, M., Harwood, D.: A comparative study of texture measures with classification based on feature distributions. Pattern Recogn. 29 (1996)
Pietikäinen, M., Heikkilä, J.: Tutorial on image and video description with local binary pattern variants. Conf. Comput. Vis. Pattern Recogn. (2011)
Liao, S., Chung, A.C.S.: Texture classification by using advanced local binary patterns and spatial distribution of dominant patterns. IEEE Int. Conf. Acoust. Speech Signal Process. ICASSP (2007)
Pietikäinen, M., Hadid, A., Zhao, G., Ahonen, T.: Computer Vision Using Binary Patterns. Computational Imaging and Vision Series, vol. 40. Springer, New York (2011)
Arandjelović, R., Zisserman, A.: Three things everyone should know to improve object retrieval. Conf. Comput. Vis. Pattern Recogn. (2012)
Zhao, G., Pietikäinen, M.: Dynamic texture recognition using local binary patterns with an application to facial expressions. Pattern. Anal. Mach. Intell. IEEE Trans. 29(6), (2007)
Kellokumpu, V., Zhao, G., Pietikäinen, M.: Human activity recognition using a dynamic texture based method. Br. Mach. Vis. Conf. (2008)
Zabih, R., Woodfill, J.: Nonparametric local transforms for computing visual correspondence. Eur. Conf. Comput. Vis. (1994)
Lowe, D.G.: Object recognition from local scale-invariant features. The Proceedings of the Seventh IEEE International Conference on Computer Vision, (1999)
Abdel-Hakim, A.E., Farag, A.A.: CSIFT: a SIFT descriptor with color invariant characteristics. Conf. Comput. Vis. Pattern Recogn. (2006)
Vinukonda, P.: A study of the scale-invariant feature transform on a parallel pipeline. Thesis Project
Alcantarilla, P.F., Bergasa, L.M., Davison, A.: Gauge-SURF Descriptors. Elsevier, (2011)
Evans, C.: Notes on the OpenSURF Library. University of Bristol Technical Paper (2009)
Ke, Y., Sukthankar, R.: PCA-SIFT: a more distinctive representation for local image descriptors. Conf. Comput. Vis. Pattern Recogn. (2004)
Gauglitz, S., Höllerer, T., Turka, M.: Evaluation of interest point detectors and feature descriptors for visual tracking. Int. J. Comput. Vis. 94 (2011)
Agrawal, M., Konolige, K., Blas, M.R.: CenSurE: center surround extremas for realtime feature detection and matching. Eur. Conf. Comput. Vis. (2008)
Viola, P., Jones, M.: Robust real-time object detection. Int. J. Comput. Vis. 57(2), 137–154 (2002)
Grigorescu, S.E., Petkov, N., Kruizinga, P.: Comparison of texture features based on Gabor filters. IEEE Trans. Image Process. 11(10), (2002)
Alcantarilla, P., Bergasa, L.M., Davison, A.: Gauge-SURF descriptors. Image Vis. Comput. 31(1), 103–116 (2013)
Agrawal, M., Konolige, K., Blas, M.R.: CenSurE: center surround extremas for realtime feature detection and matching. Eur. Conf. Comput. Vis. (2008)
Morse, B.S.: Lecture 11: Differential Geometry. Brigham Young University, (1998/2000). http://morse.cs.byu.edu/650/lectures/lect10/diffgeom.pdf
Bosch, A., Zisserman, A., Munoz, X.: Representing shape with a spatial pyramid kernel. CIVR ’07 Proceedings of the 6th ACM International Conference on Image and Video Retrieval
Rubner, Y., Tomasi, C., Guibas, L.J.: The earth mover’s distance as a metric for image retrieval. Int. J. Comput. Vis. 40(2), 99–121 (2000)
Oliva, A., Torralba, A.: Modeling the shape of the scene: a holistic representation of the spatial envelope. Int. J. Comput. Vis. (2001)
Matas, J., Chum, O., Urban, M., Pajdla, T.: Robust wide baseline stereo from maximally stable extremal regions. Proc. Br. Mach. Vis. Conf. (2002)
Scovanner, P., Ali, S., Shah, M.: A 3-dimensional SIFT descriptor and its application to action recognition. ACM Proceedings of the 15th International Conference on Multimedia, pp. 357–360, (2007)
Klaser, A., Marszalek, M., Schmid, C.: A spatio-temporal descriptor based on 3D-gradients. Br. Mach. Vis. Conf. (2008)
Laptev, I.: On space-time interest points. Int. J. Comput. Vis. 64 (2005)
Oreifej, O., Liu, Z.: HON4D: histogram of oriented 4D normals for activity recognition from depth sequences. Conf. Comput. Vis. Pattern Recogn. (2013)
Ke, Y., et al.: Efficient visual event detection using volumetric features. Int. Conf. Comput. Vis. (2005)
Zhang, L., da Fonseca, M.J., Ferreira, A.: Survey on 3D shape descriptors. Technical report, project POSC/EIA/59938/2004
Tangelder, J.W.H., Veltkamp, R.C.: A Survey of Content-Based 3D Shape Retrieval Methods. Springer, New York (2007)
Heikkila, M., Pietikäinen, M., Schmid, C.: Description of interest regions with center-symmetric local binary patterns. Comput. Vis. Graph. Image Process. Lect. Notes Comput. Sci. 4338, 58–69 (2006)
Schmidt, A., Kraft, M., Fularz, M., Domagała, Z.: The comparison of point feature detectors and descriptors in the context of robot navigation. Workshop on Perception for Mobile Robots Autonomy, (2012)
Jun, B., Kim, D.: Robust face detection using local gradient patterns and evidence accumulation. Pattern Recogn. 45(9), 3304–3316 (2012)
Fröba, B., Ernst, A.: Face detection with the modified census transform. Int. Conf. Autom. Face Gesture Recogn. (2004)
Freeman, H.: On the encoding of arbitrary geometric configurations. IRE Trans. Electron. Comput. (1961)
Salem, A.B.M., Sewisy, A.A., Elyan, U.A.: A vertex chain code approach for image recognition. Int. J. Graph. Vis. Image Process. ICGST-GVIP, (2005)
Kitchen, L., Rosenfeld, A.: Gray-level corner detection. Pattern Recogn. Lett. 1, 95–102 (1982)
Koenderink, J., Richards, W.: Two-dimensional curvature operators. J. Opt. Soc. Am. 5(7), 1136–1141 (1988)
Bretzner, L., Lindeberg, T.: Feature tracking with automatic selection of spatial scales. Comput. Vis. Image Understand. 71(3), 385–392 (1998)
Lindeberg, T.: Junction detection with automatic selection of detection scales and localization scales. Proceedings of First International Conference on Image Processing, (1994)
Lindeberg, T.: Feature detection with automatic scale selection. Int. J. Comput. Vis. 30(2), 79–116 (1998)
Wang, H., Brady, M.: Real-time corner detection algorithm for motion estimation. Image Vis. Comput. 13(9), 695–703 (1995)
Trajkovic, M., Hedley, M.: Fast corner detection. Image Vis. Comput. 16(2), 75–87 (1998)
Tola, E., Lepetit, V., Fua, P.: DAISY: an efficient dense descriptor applied to wide baseline stereo. PAMI 32(5), (2010)
Arbeiter, G., et al.: Evaluation of 3D feature descriptors for classification of surface geometries in point clouds. Int. Conf. Intell. Robots Syst. (2012) IEEE/RSJ
Ruppel, A., Weisshardt, F., Verl, A.: A rotation invariant feature descriptor O-DAISY and its FPGA implementation. IROS (2011)
Ambai, M., Yoshida, Y.: CARD: compact and real-time descriptors. Int. Conf. Comput. Vis. (2011)
Takacs, G., et al.: Unified real-time tracking and recognition with rotation-invariant fast features. Conf. Comput. Vis. Pattern Recogn. (2010)
Taylor, S., Rosten, E., Drummond, T.: Robust feature matching in 2.3 μs. Conf. Comput. Vis. Pattern Recogn. (2009)
Grauman, K., Darrell, T.: The pyramid match kernel: discriminative classification with sets of image features. Tenth IEEE Int. Conf. Comput. Vis. 2 (2005)
Takacs, G., et al.: Unified real-time tracking and recognition with rotation-invariant fast features. Conf. Comput. Vis. Pattern Recogn. (2010)
Chandrasekhar, V., et al.: CHoG: compressed histogram of gradients, a low bitrate descriptor. Conf. Comput. Vis. Pattern Recogn. (2009)
Mainali, P., et al.: SIFER: scale-invariant feature detector with error resilience. Int. J. Comput. Vis. (2013)
Fowers, S.G., Lee, D.J., Ventura, D., Wilde, D.K.: A novel, efficient, tree-based descriptor and matching algorithm (BASIS). Conf. Comput. Vis. Pattern Recogn. (2012)
Fowers, S.G., Lee, D.J., Ventura, D.A., Archibald, J. K.: Nature inspired BASIS feature descriptor and its hardware implementation. IEEE Trans. Circ. Syst. Video Technol. (2012)
Bracewell, R.: The Fourier Transform & Its Applications, 3 ed., McGraw-Hill Science/Engineering/Math, (1999)
Duda, R.O., Hart, P.E.: Use of the Hough transformation to detect lines and curves in pictures. Commun. ACM. (1972)
Ballard, D.H.: Generalizing the Hough transform to detect arbitrary shapes. Pattern Recogn. 13(2), (1981)
Illingworth, J., Kittler, J.: A survey of the Hough transform. Comput. Vis. Graph. Image Process. (1988)
Salton, G., McGill, M.J.: Introduction to Modern Information Retrieval. McGraw-Hill, New York (1983)
Niebles, J.C., Wang, H., Fei-Fei, L.: Unsupervised learning of human action categories using spatial-temporal words. Int. J. Comput. Vis. (2008)
Bosch, A., Zisserman, A., Muñoz, X.: Scene classification via pLSA. Eur. Conf. Comput. Vis. (2006)
Csurka, G., Bray, C., Dance, C., Fan, L.: Visual categorization with bags of key-points. SLCV workshop, Eur. Conf. Comput. Vis. (2004)
Dean, T., Washington, R., Corrado, G.: Sparse spatiotemporal coding for activity recognition. Brown Univ. Tech. Rep. (2010)
Le, Q.V., Zou, W.Y., Yeung, S.Y., Ng, A.Y.: Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis. Conf. Comput. Vis. Pattern Recogn. (2011)
Olshausen, B., Field, D.: Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature 381, 607–609 (1996)
Belongie, S., Malik, J., Puzicha, J.: Matching with shape context. CBAIVL ’00 Proceedings of the IEEE Workshop on Content-based Access of Image and Video Libraries
Belongie, S., Malik, J., Puzicha, J.: Shape context: a new descriptor for shape matching and object recognition. Conf. Neural Inform. Process. Syst. (2000)
Belongie, S., Malik, J., Puzicha, J.: Shape matching and object recognition using shape contexts. PAMI 24(4), (2002)
Belongie, S., Malik, J., Puzicha, J.: Matching shapes with shape context. CBAIVL ’00 Proceedings of the IEEE Workshop on Content-based Access of Image and Video Libraries
Bo, L., Ren, X., Fox, D.: Unsupervised feature learning for RGB-D based object recognition. ISER, Springer Tracts in Advanced Robotics, vol. 88, pp. 387–402. Springer (2012)
Loy, G., Zelinsky, A.: A fast radial symmetry transform for detecting points of interest. Eur. Conf. Comput. Vis. (2002)
Wolf, L., Hassner, T., Taigman, Y.: Descriptor based methods in the wild. Eur. Conf. Comput. Vis. (2008)
Kurz, D., Benhimane, S.: Inertial sensor-aligned visual feature descriptors. Conf. Comput. Vis. Pattern Recogn. (2011)
Kingsbury, N.: Rotation-invariant local feature matching with complex wavelets. Proc. Eur. Conf. Signal Process. (EUSIPCO), (2006)
Shen, D., Ip, H.H.S.: Discriminative wavelet shape descriptors for recognition of 2-D patterns. Pattern Recogn. 32(2), 151–165 (1999)
Edelman, S., Intrator, N., Poggio, T.: Complex cells and object recognition. Conf. Neural Inform. Process. Syst. (1997)
Hunt, R.W.G., Pointer, M.R.: Measuring Colour. Wiley, Hoboken, NJ (2011)
Hunt, R.W.G.: The Reproduction of Colour, 6th edn. Wiley (2004)
Berns, R.S.: Billmeyer and Saltzman’s Principles of Color Technology. Wiley, Hoboken, NJ (2000)
Morovic, J.: Color Gamut Mapping. Wiley, Hoboken, NJ (2008)
Fairchild, M.D.: Color Appearance Models, 1st edn. Addison Wesley Longman (1998)
Ito, M., Tsubai, M., Nomura, A.: Morphological operations by locally variable structuring elements and their applications to region extraction in ultrasound images. Syst. Comput. Jpn. 34(3), 33–43 (2003)
Tsubai, M., Ito, M.: Control of variable structure elements in adaptive mathematical morphology for boundary enhancement of ultrasound images. Electron. Commun. Jpn. Part 3 Fund. Electron. Sci. 87(11), 20–33
Mazille, J.E.: Mathematical morphology and convolutions. J. Microsc. 156, 257 (1989)
Achanta, R., et al.: SLIC superpixels compared to state-of-the-art superpixel methods. PAMI 34(11), (2012)
Achanta, R., et al.: SLIC superpixels. EPFL technical report no. 149300, (2010)
Felzenszwalb, P., Huttenlocher, D.: Efficient graph-based image segmentation. Int. J. Comput. Vis. (2004)
Levinshtein, A., et al.: Turbopixels: fast superpixels using geometric flows. PAMI (2009)
Lucchi, A., et al.: A fully automated approach to segmentation of irregularly shaped cellular structures in EM images. MICCAI (2010)
Shi, J., Malik, J.: Normalized cuts and image segmentation. PAMI (2000)
Vedaldi, A., Soatto, S.: Quick shift and kernel methods for mode seeking. Eur. Conf. Comput. Vis. (2008)
Felzenszwalb, P.F., Huttenlocher, D.P.: Efficient graph-based image segmentation. Int. J. Comput. Vis. 59(2), 167–181 (2004)
Felzenszwalb, P., Huttenlocher, D.: Efficient graph-based image segmentation. Int. J. Comput. Vis. 59 (2004)
Comaniciu, D., Meer, P.: Mean shift: a robust approach toward feature space analysis. PAMI 24(5), (2002)
Vedaldi, A., Soatto, S.: Quick shift and kernel methods for mode seeking. Eur. Conf. Comput. Vis. (2008)
Vincent, L., Soille, P.: Watersheds in digital spaces: an efficient algorithm based on immersion simulations. PAMI 13(6), (1991)
Levinshtein, A., et al.: Turbopixels: fast superpixels using geometric flows. PAMI 31(12), (2009)
Scharstein, D., Pal, C.: Learning conditional random fields for stereo. Conf. Comput. Vis. Pattern Recogn. (2007)
Hirschmüller, H., Scharstein, D.: Evaluation of cost functions for stereo matching. Conf. Comput. Vis. Pattern Recogn. (2007)
Goodman, J.W.: Introduction to Fourier Optics. McGraw-Hill, New York (1968)
Gaskill, J.D.: Linear Systems, Fourier Transforms, Optics. Wiley, Hoboken, NJ (1978)
Thibos, L., Applegate, R.A., Schweigerling, J.T., Webb, R.: Standards for reporting the optical aberrations of eyes. In: Lakshminarayanan, V. (ed.) OSA Trends in Optics and Photonics, Vision Science and its Applications. Optical Society of America, Washington, DC (2000)
Hwang, S.-K., Kim, W.-Y.: A novel approach to the fast computation of Zernike moments. Pattern Recogn. 39 (2006)
Khotanzad, A., Hong, Y.H.: Invariant image recognition by Zernike moments. PAMI 12 (1990)
Kan, C., Srinath, M.D.: Invariant character recognition with Zernike and orthogonal Fourier-Mellin moments. Pattern Recogn. 35 (2002)
Kim, H.S., Lee, H.-K.: Invariant image watermark using Zernike moments. IEEE Trans. Circ. Syst. Video Technol. 13(8), (2003)
Papakostas, G.A., Karras, D.A., Mertzios, B.G.: Image coding using a wavelet based Zernike moments compression technique. In: Proceeding of: Digital Signal Processing, vol 2, DSP, (2002)
Mukundan, R., Ramakrishnan, K.R.: Fast computation of Legendre and Zernike moments. Pattern Recogn. 28(9), 1433–1442 (1995)
Xin, Y., Pawlak, M., Liao, S.: Image reconstruction with polar Zernike moments. ICAPR’05 Proceedings of the Third International Conference on Pattern Recognition and Image Analysis—Volume Part II (2005)
Singh, C., Upneja, R.: Fast and accurate method for high order Zernike moments computation. Appl. Math. Comput. 218(15), 7759–7773 (2012)
Pratt, W., Chen, W.-H., Welch, L.: Slant transform image coding. IEEE Trans. Commun. 22(8), (1974)
Enomoto, H., Shibata, K.: Orthogonal transform coding system for television signals. IEEE Trans. Electromagn. Compatibil. 13(3), (1971)
Dutra da Silva, R., Schwartz, W.R., Pedrini, H.: Image segmentation based on wavelet feature descriptor and dimensionality reduction applied to remote sensing. Chilean J. Stat. 2 (2011)
Arun, N., Kumar, M., Sathidevi, P.S.: Wavelet SIFT feature descriptors for robust face recognition. Springer Adv. Intell. Syst. Comput. 177 (2013)
Shen, D., Ip, H.H.S.: Discriminative wavelet shape descriptors for recognition of 2-D patterns. Pattern Recogn. 32 (1999)
Kingsbury, N.: Rotation-invariant local feature matching with complex wavelets. Proc. Eur. Conf. Signal Process. EUSIPCO (2006)
Wolfram Research Mathematica Wavelet Analysis Libraries
Strang, G.: Wavelets. Am. Sci. 82(3), (1994)
Mallat, S.: A Wavelet Tour of Signal Processing: The Sparse Way, 3rd ed., Elsevier, (2008)
Percival, D.B., Walden, A.T.: Wavelet Methods for Time Series Analysis. Cambridge University Press, Cambridge (2006)
Gabor, D.: Theory of communication. J. IEE. 93 (1946)
Minor, L.G., Sklansky, J.: Detection and segmentation of blobs in infrared images. IEEE Trans. Syst. Man Cybern. 11(3), (1981)
van Ginkel, M., Luengo Hendriks, C.K., van Vliet, L.J.: A short introduction to the Radon and Hough transforms and how they relate to each other. Number QI-2004-01 in the Quantitative Imaging Group Technical Report Series (2004)
Toft, P.A.: Using the generalized Radon transform for detection of curves in noisy images. 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP-96. Conference Proceedings, vol. 4 (1996)
Radon, J.: Über die Bestimmung von Funktionen durch ihre Integralwerte längs gewisser Mannigfaltigkeiten. Berichte Sächsische Akademie der Wissenschaften, Leipzig, Mathematisch-Physikalische Klasse 69 (1917)
Fung, J., Mann, S., Aimone, C.: OpenVIDIA: parallel GPU computer vision. Proc. ACM Multimed. (2005)
Bazin, M.J., Benoit, J.W.: Off-line global approach to pattern recognition for bubble chamber pictures. Trans. Nuclear Sci. 12 (1965)
Deans, S.R.: Hough transform from the Radon transform. Trans. Pattern. Anal. Mach. Intell. 3(2), 185–188 (1981)
Rosenfeld, A.: Digital Picture Processing by Computer. Academic Press, New York (1982)
Tomasi, C., Manduchi, R.: Bilateral filtering for gray and color images. ICCV ’98 Proceedings of the Sixth International Conference on Computer Vision (1998)
See the documentation for the ImageJ, ImageJ2 or Fiji software package for complete references to each method, [global] Auto Threshold command and Auto Local Threshold command. http://fiji.sc/ImageJ2
Garg, R., Mittal, B., Garg, S.: Histogram equalization techniques for image enhancement. Int. J. Electron. Commun. Technol. 2 (2011)
Sung, A.P., Wang, C.: Spatial-temporal antialiasing. Trans. Visual. Comput. Graph. 8 (2002)
Mikolajczyk, K., Schmid, C.: Scale & affine invariant interest point detectors. Int. J. Comput. Vis. 60 (2004)
Ozuysal, M., Calonder, M., Lepetit, V., Fua, P.: Fast keypoint recognition using random ferns. PAMI 32 (2010)
Schaffalitzky, F., Zisserman, A.: Automated scene matching in movies. CIVR 2004, In: Proceedings of the Challenge of Image and Video Retrieval, London, LNCS 2383
Tola, E., Lepetit, V., Fua, P.: A fast local descriptor for dense matching. Conf. Comput. Vis. Pattern Recogn. (2008)
Davis, L.S.: Computing the spatial structures of cellular texture. Comput. Graph. Image Process. 11(2), (1979)
Pun, C.M., Lee, M.C.: Log-polar wavelet energy signatures for rotation and scale invariant texture classification. Trans. Pattern. Anal. Mach. Intell. 25(5), (2003)
Spence, A., Robb, M., Timmins, M., Chantler, M.: Real-time per-pixel rendering of textiles for virtual textile catalogues. Proc. INTEDEC. (2003)
Lam, S.W.C., Ip, H.H.S.: Adaptive pyramid approach to texture segmentation. Comput. Anal. Images Patterns Lect. Notes Comput. Sci. 719, 267–274 (1993)
Jin, Y., Fayad, L., Laine, A.: Contrast enhancement by multi-scale adaptive histogram equalization. Proc. SPIE. 4478 (2001)
Zhang, J., Tan, T.: Brief review of invariant texture analysis methods. Pattern Recogn. 35 (2002)
Tomita, F., Shirai, Y., Tsuji, S.: Description of textures by a structural analysis. IEEE Trans. Pattern. Anal. Mach. Intell. Arch. 4 (1982)
Tomita, F., Tsuji, S.: Computer Analysis of Visual Textures. Springer, New York (1990)
Burt, P.J., Adelson, E.H.: The Laplacian pyramid as a compact image code. IEEE Trans. Commun. (1983)
Otsu, N.: A threshold selection method from gray-level histograms. IEEE Trans. Syst. Man Cybern. 9(1), 62–66 (1979)
Sezgin, M., Sankur, B.: Survey over image thresholding techniques and quantitative performance evaluation. SPIE J. Electron. Imaging (2004)
Haralick, R.M., Shapiro, L.G.: Image segmentation techniques. Comput. Vis. Graph. Image Process. 29, 100–132 (1985)
Raja, Y., Gong, S.: Sparse multiscale local binary patterns. Br. Mach. Vis. Conf. (2006)
Fleuret, F.: Fast binary feature selection with conditional mutual information. J. Mach. Learn. Res. 5 (2004)
Szeliski, R.: Computer Vision: Algorithms and Applications. Springer, New York (2011)
Pratt, W.K.: Digital Image Processing: PIKS Scientific Inside. 4 ed., Wiley-Interscience, (2007)
Russ, J.C.: The Image Processing Handbook, 5 ed., CRC Press, (2006)
Klein, G., Murray, D.: Parallel tracking and mapping for small AR workspaces. ISMAR (2007)
Newcombe, R.A., et al.: KinectFusion: real-time dense surface mapping and tracking. ISMAR ’11 Proceedings of the 2011 10th IEEE International Symposium on Mixed and Augmented Reality (2011)
Izadi, S., et al.: KinectFusion: real-time 3D reconstruction and interaction using a moving depth camera. ACM Symp. User Interf. Software Technol. (2011)
Moravec, H.: Obstacle avoidance and navigation in the real world by a seeing robot rover. Tech Report CMU-RI-TR-3, Robotics Institute, Carnegie-Mellon University, (1980)
Mikolajczyk, K., Schmid, C.: Indexing based on scale invariant interest points. Int. Conf. Comput. Vis. (2001)
Turcot, P., Lowe, D.G.: Better matching with fewer features: the selection of useful features in large database recognition problems. Int. Conf. Comput. Vis. (2009)
Feichtinger, H.G., Strohmer, T.: Gabor Analysis and Algorithms, 1997 ed., Birkhäuser, (1997)
Ricker, N.: Wavelet contraction, wavelet expansion, and the control of seismic resolution. Geophysics 18, 769–792 (1953)
Goshtasby, A.: Description and discrimination of planar shapes using shape matrices. PAMI 7(6), (1985)
Vapnik, V.N., Levin, E., LeCun, Y.: Measuring the dimension of a learning machine. Neural Comput. 6(5), 851–876 (1994)
Cowan, J. D., Tesauro, G., Alspector, J.: Learning curves: asymptotic values and rate of convergence. Adv. Neural Inform. Process. 6 (1994)
Vapnik, V.N.: The Nature of Statistical Learning Theory. Springer, New York (1995)
LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition: intelligent signal processing. Proc. IEEE 86(11), 2278–2324 (1998)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. Conf. Neural Inform. Process. Syst. (2012)
Boser, B.E., Guyon, I.M., Vapnik, V.N.: A training algorithm for optimal margin classifiers. COLT ’92 Proceedings of the Fifth Annual Workshop on Computational Learning Theory, (1992)
Cortes, C., Vapnik, V.N.: Support-vector networks. Mach. Learn. 20 (1995)
Burges, C.J.C.: A tutorial on support vector machines for pattern recognition. Kluwer Data Mining Discov. 2 (1998)
Weinzaepfel, P., Revaud, J., Harchaoui, Z., Schmid, C.: DeepFlow: large displacement optical flow with deep matching. Int. Conf. Comput. Vis. (2013)
Keysers, D., Deselaers, T., Gollan, C., Ney, H.: Deformation models for image recognition. Trans. PAMI 29(8), (2007)
Kim, J., Liu, C., Sha, F., Grauman, K.: Deformable spatial pyramid matching for fast dense correspondences. Conf. Comput. Vis. Pattern Recogn. (2013)
Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. ICML, 27th International Conference on Machine Learning, Haifa, Israel (2010)
Schmid, C., Mohr, R.: Object recognition using local characterization and semi-local constraints. PAMI 19(3), (1997)
Ferrari, V., Tuytelaars, T., Gool, L.V.: Simultaneous object recognition and segmentation from single or multiple model views. Int. J. Comput. Vis. 67 (2005)
Schaffalitzky, F., Zisserman, A.: Automated scene matching in movies. CIVR. (2002)
Estivill-Castro, V.: Why so many clustering algorithms—a position paper. ACM SIGKDD Explor. Newslett. 4(1), (2002)
Kriegel, H.-P., Kröger, P., Sander, J., Zimek, A.: Density-based clustering. Wiley Interdisciplinary Rev. Data Mining Knowl. Discov. 1(3), 231–240 (2011)
Hartigan, J.A.: Clustering Algorithms. Wiley, Hoboken, NJ (1975)
Hartigan, J.A., Wong, M.A.: Algorithm AS 136: a K-means clustering algorithm. J. Roy. Stat. Soc. Ser. C 28(1), 100–108 (1979)
Hastie, T., Tibshirani, R., Friedman, J.: Hierarchical Clustering: The Elements of Statistical Learning, 2nd edn. Springer, New York (2009)
Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Stat. Soc. Ser. B 39(1), 1–38 (1977)
Pearson, K.: On lines and planes of closest fit to systems of points in space. Phil. Mag. (1901)
Hotelling, H.: Relations between two sets of variates. Biometrika 28(3–4), 321–377 (1936)
Cortes, C., Vapnik, V.N.: Support-vector networks. Mach. Learn. 20(3), 273–297 (1995)
Haykin, S.: Neural Networks: A Comprehensive Foundation, 2nd edn. Prentice-Hall, Englewood Cliffs, NJ (1999)
Vapnik, V.: Statistical Learning Theory. Wiley, Hoboken, NJ (1998)
Hofmann, T., Schölkopf, B., Smola, A.J.: Kernel methods in machine learning. Ann. Stat. 36(3), 1171–1220 (2008)
Raguram, R., Frahm, J.-M., Pollefeys, M.: A comparative analysis of RANSAC techniques leading to adaptive real-time random sample consensus. Eur. Conf. Comput. Vis. (2008)
Weinberger, K.Q., Blitzer, J., Saul, L.K.: Distance metric learning for large margin nearest neighbor classification. Conf. Neural Inform. Process. Syst. (2004)
Schmid, C., Mohr, R.: Local gray value invariants for image retrieval. PAMI 19(5), (1997)
Dorkó, G., Schmid, C.: Object class recognition using discriminative local features. Technical Report RR-5497, INRIA—Rhone-Alpes (2005)
Schölkopf, B., Smola, A.J.: Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, Cambridge, MA (2001)
Ferrari, V., Tuytelaars, T., Gool, L.V.: Simultaneous object recognition and segmentation from single or multiple model views. Int. J. Comput. Vis. 67(2), (2006)
Cinbis, R.G., Verbeek, J., Schmid, C.: Segmentation driven object detection with fisher vectors. Int. Conf. Comput. Vis. (2013)
Fischler, M., Bolles, R.: Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 24(6), (1981)
Freund, Y., Schapire, R.E.: A short introduction to boosting. Jpn. Soc. Artif. Intell. 14(5), (1999)
Freund, Y., Schapire, R.E.: A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 55(1), 119–139 (1997)
Heckerman, D.: A tutorial on learning with Bayesian networks. Microsoft Res. Tech. Rep. (1996)
Amit, Y., Geman, D.: Shape quantization and recognition with randomized trees. Neural Comput. 9(7), (1997)
Rabiner, L.R., Juang, B.H.: An introduction to hidden Markov models. IEEE Acoust. Speech Signal Process. Mag. (1986)
Krogh, A., Larsson, B., von Heijne, G., Sonnhammer, E.L.: Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J. Mol. Biol. (2001)
Nister, D., Stewenius, H.: Scalable recognition with a vocabulary tree. Conf. Comput. Vis. Pattern Recogn. (2006)
Freeman, W.T., Adelson, E.H.: The design and use of steerable filters. PAMI 13(9), (1991)
Leung, T., Malik, J.: Representing and recognizing the visual appearance of materials using three-dimensional textons. Int. J. Comput. Vis. 43(1) (2001)
Schmid, C.: Constructing models for content-based image retrieval. Conf. Comput. Vis. Pattern Recogn. (2001)
Alahi, A., Vandergheynst, P., Bierlaire, M., Kunt, M.: Cascade of descriptors to detect and track objects across any network of cameras. Comput. Vis. Image Understand. 114(6), 624–640 (2010)
Simard, P., Bottou, L., Haffner, P., LeCun, Y.: Boxlets: a fast convolution algorithm for signal processing and neural networks. Conf. Neural Inform. Process. Syst. (1999)
Vedaldi, A., Zisserman, A.: Efficient additive kernels via explicit feature maps. PAMI 34(3), (2012)
Brox, T., Malik, J.: Large displacement optical flow: descriptor matching in variational motion estimation. PAMI 33(3), (2010)
Ester, M., Kriegel, H.-P., Sander, J., Xu, X.: A density-based algorithm for discovering clusters in large spatial databases with noise. In: Second International Conference on Knowledge Discovery and Data Mining, pp. 226–231 (1996)
Ankerst, M., Breunig, M.M., Kriegel, H.-P., Sander, J.: OPTICS: ordering points to identify the clustering structure. SIGMOD ’99 Proceedings of the 1999 ACM SIGMOD International Conference on Management of Data
Muja, M., Rusu, R.B., Bradski, G., Lowe, D.G.: REIN—a fast, robust, scalable recognition infrastructure. Int. Conf. Robot Autom. (2011)
Rusu, R.B., Bradski, G., Thibaux, R., Hsu, J.: Fast 3D recognition and pose using the viewpoint feature histogram. Intell. Robots Syst. (2010)
Martinez, M., Collet, A., Srinivasa, S.S.: MOPED: a scalable and low latency object recognition and pose estimation system. Int. Conf. Robot Autom. (2010)
Jacob, M., Unser, M.: Design of steerable filters for feature detection using Canny-like criteria. PAMI 26(8), (2004)
Moré, J.J.: The Levenberg-Marquardt algorithm implementation and theory. Numer. Anal. Lect. Notes Math. 630, 105–116 (1978)
LeCun, Y.: Learning invariant feature hierarchies. Eur. Conf. Comput. Vis. (2012)
Ranzato, M.A., Huang, F.-J., Boureau, Y.-L., LeCun, Y.: Unsupervised learning of invariant feature hierarchies with applications to object recognition. Conf. Comput. Vis. Pattern Recogn. (2007)
Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in vision algorithms. Int. Conf. Mach. Learn. (2010)
Kingma, D., LeCun, Y.: Regularized estimation of image statistics by score matching. Conf. Neural Inform. Process. Syst. (2010)
Losson, O., Macaire, L., Yang, Y.: Comparison of color demosaicing methods. Adv. Imaging Electron Phys. 162, 173–265 (2010)
Li, X., Gunturk, B., Zhang, L.: Image demosaicing: a systematic survey. Proceedings of SPIE 6822, Visual Communications and Image Processing, 68221J (2008)
Tanbakuchi, A.A., et al.: Adaptive pixel defect correction. Proceedings of SPIE 5017, Sensors and Camera Systems for Scientific, Industrial, and Digital Photography Applications IV, (2003)
Ibenthal, A.: Image sensor noise estimation and reduction. ITG Fachausschuss 3.2 Digitale Bildcodierung (2007)
Aptina: An objective look at FSI and BSI. Aptina White Paper
Cossairt, O., Miau, D., Nayar, S.K.: Gigapixel computational imaging. IEEE Int. Conf. Comput. Photogr. (2011)
Eastman Kodak Company, E-58 technical data/color negative film. Kodak 160NC Technical Data Manual, (2000)
Kuthirummal, S., Nayar, S.K.: Multiview radial catadioptric imaging for scene capture. ACM Trans. Graph. (also Proc. of ACM SIGGRAPH), (2006)
Zhou, C., Nayar, S.K.: Computational cameras: convergence of optics and processing. IEEE Trans. Image Process. 20(12), (2011)
Krishnan, G., Nayar, S.K.: Towards a true spherical camera. Proceedings of SPIE 7240, Human Vision and Electronic Imaging XIV, 724002 (2009)
Reinhard, E., Ward, G., Pattanaik, S., Debevec, P.: High Dynamic Range Imaging: Acquisition, Display, and Image-Based Lighting, 2nd ed. Morgan Kaufmann, (2010)
Gallo, O., et al.: Artifact-free high dynamic range imaging. IEEE Int. Conf. Comput. Photogr. (2009)
Grossberg, M.D., Nayar, S.K.: High dynamic range from multiple images: which exposures to combine? Int. Conf. Comput. Vis. (2003)
Nayar, S.K., Krishnan, G., Grossberg, M.D., Raskar, R.: Fast separation of direct and global components of a scene using high frequency illumination. Proc. SIGGRAPH (2006)
Wilson, T., Juskaitis, R., Neil, M., Kozubek, M.: Confocal microscopy by aperture correlation. Opt. Lett. 21(23), 1879–1881 (1996)
Corle, T.R., Kino, G.S.: Confocal Scanning Optical Microscopy and Related Imaging Systems. Academic Press, New York (1996)
Fitch, J.P.: Synthetic Aperture Radar. Springer, New York (1988)
Ng, R., et al.: Light field photography with a hand-held plenoptic camera. Stanford Tech Report CTSR 2005-02
Ragan-Kelley, J., et al.: Decoupling algorithms from schedules for easy optimization of image processing pipelines. ACM Trans. Graph. 31(4), (2012)
Levoy, M.: Experimental platforms for computational photography. Comput. Graph. Appl. 30 (2010)
Adams, A., et al.: The Frankencamera: an experimental platform for computational photography. Proc. SIGGRAPH. (2010)
Salsman, K.: 3D vision for computer based applications. Technical Report, Aptina, Inc., (2010).
Cossairt, O., Nayar, S.: Spectral focal sweep: extended depth of field from chromatic aberrations. IEEE Int. Conf. Comput. Photogr. (2010). (see also US Patent EP2664153A1)
Fife, K., El Gamal, A., Wong, H.-S.P.: A 3D multi-aperture image sensor architecture. Proc. IEEE Custom Integr. Circ. Conf. 281–284, (2006)
Wang, A., Gill, P., Molnar, A.: Light field image sensors based on the Talbot effect. Appl. Optics 48(31), 5897–5905 (2009)
Shankar, M., et al.: Thin infrared imaging systems through multichannel sampling. Appl. Optics 47(10), B1–B10 (2008)
Zitová, B., Flusser, J.: Image registration methods: a survey. Image Vis. Comput. 21(11), 977–1000 (2003)
Hirschmüller, H.: Accurate and efficient stereo processing by semi-global matching and mutual information. Conf. Comput. Vis. Pattern Recogn. (2005)
Tuytelaars, T., Van Gool, L.: Wide baseline stereo matching based on local, affinely invariant regions. Br. Mach. Vis. Conf. (2000)
Faugeras, O.: Three Dimensional Computer Vision. MIT Press, Cambridge, MA (1993)
Maybank, S.J., Faugeras O.D.: A theory of self-calibration of a moving camera. Int. J. Comput. Vis. 8(2), (1992)
Hartley, R., Zisserman, A.: Multiple View Geometry in Computer Vision. Cambridge University Press, Cambridge (2004)
Luong, Q.-T., Faugeras, O.D.: The fundamental matrix: theory, algorithms, and stability analysis. Int. J. Comput. Vis. 17 (1995)
Hartley, R.I.: Theory and practice of projective rectification. Int. J. Comput. Vis. 35 (1999)
Scharstein, D., Szeliski, R.: A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. Int. J. Comput. Vis. 47 (2002)
Lazaros, N., Sirakoulis, G.C., Gasteratos, A.: Review of stereo vision algorithms: from software to hardware. Int. J. Optomechatroni. 2(4), 435–462 (2008)
Clark, D.E., Ivekovic, S.: The Cramer-Rao lower bound for 3-D state estimation from rectified stereo cameras. IEEE Fusion (2010)
Nayar, S.K., Gupta, M.: Diffuse structured light. Int. Conf. Comput. Photogr. (2012)
Cattermole, K.W.: Principles of Pulse Code Modulation, 1st ed., American Elsevier Pub. Co., (1969)
Pagès, J., Salvi, J.: Coded light projection techniques for 3D reconstruction. J3eA, Journal sur l’enseignement des sciences et technologies de l’information et des systèmes 4(1), (2005) (Hors-Série 3)
Gu, J., et al.: Compressive structured light for recovering inhomogeneous participating media. Eur. Conf. Comput. Vis. (2008)
Nayar, S.K.: Computational cameras: approaches, benefits and limits. Technical Report, Computer Science Department, Columbia University, (2011)
Lehmann, M., et al.: CCD/CMOS lock-in pixel for range imaging: challenges, limitations and state-of-the-art. CSEM, Swiss Center for Electronics and Microtechnology, (2004)
Andersen, J.F., Busck, J., Heiselberg, H.: Submillimeter 3-D laser radar for space shuttle tile inspection. Danish Defence Research Establishment, Copenhagen, Denmark, (2013)
Grzegorzek, M., Theobalt, C., Koch, R., Kolb, A. (eds.): Time-of-Flight and Depth Imaging: Sensors, Algorithms, and Applications. Lecture Notes in Computer Science, Springer (2013)
Levoy, M., Hanrahan, P.: Light field rendering. SIGGRAPH ’96 Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques (1996)
Curless, B., Levoy, M.: A volumetric method for building complex models from range images. SIGGRAPH ’96 Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques (1996)
Drebin, R.A., Carpenter, L., Hanrahan, P.: Volume rendering. SIGGRAPH (1988)
Levoy, M.: Display of surfaces from volume data. CG&A (1988)
Levoy, M.: Volume rendering using the Fourier projection slice theorem. Technical report CSL-TR-92-521, Stanford University, (1992)
Klein, G., Murray, D.: Parallel tracking and mapping on a camera phone. ISMAR ’09 Proceedings of the 2009 8th IEEE International Symposium on Mixed and Augmented Reality (2009)
Klein, G., Murray, D.: Parallel tracking and mapping for small AR workspaces. In: Proceedings of International Symposium on Mixed and Augmented Reality (ISMAR’07), Nara, (2007)
Lucas, B.D., Kanade, T.: An iterative image registration technique with an application to stereo vision. Proceedings of Image Understanding Workshop, (1981)
Beauchemin, S.S., Barron, J.L.: The computation of optical flow. ACM Comput. Surv. 27(3), (1995)
Barron, J., Fleet, D., Beauchemin, S.: Performance of optical flow techniques. Int. J. Comput. Vis. 12(1), 43–77 (1994)
Baker, S., et al.: A database and evaluation methodology for optical flow. Int. J. Comput. Vis. 92(1), 1–31 (2009)
Quénot, G.M., Pakleza, J., Kowalewski, T.A.: Particle image velocimetry with optical flow. In: Experiments in Fluids, vol 25(3), pp. 177–189, (1998)
Trulls, E., Sanfeliu, A., Moreno-Noguer, F.: Spatiotemporal descriptor for wide-baseline stereo reconstruction of non-rigid and ambiguous scenes. Eur. Conf. Comput. Vis. (2012)
Steinman, S.B., Steinman, B.A., Garzia, R.P.: Foundations of Binocular Vision: A Clinical Perspective. McGraw-Hill, New York (2000)
Roy, S., Meunier, J., Cox, I.J.: Cylindrical rectification to minimize epipolar distortion. Conf. Comput. Vis. Pattern Recogn. (1997)
Oram, D.: Rectification for any epipolar geometry. Br. Mach. Vis. Conf. (2001)
Takita, K., et al.: High-accuracy subpixel image registration based on phase-only correlation. Institute of Electronics, Information and Communication Engineers (IEICE), (2003)
Tian, Q., Huhns, M.N.: Algorithms for subpixel registration. Comput. Vis. Graph. Image Process. 35, (1986)
Foroosh, H. (Shekarforoush), Zerubia, J.B., Berthod, M.: Extension of phase correlation to subpixel registration. IEEE Trans. Image Process. (2002)
Zitnick, C.L., Kanade, T.: A cooperative algorithm for stereo matching and occlusion detection. Carnegie Mellon University, Technical report CMU-RI-TR-99-35
Sun, J., Li, Y., Kang, S.B., Shum, H.-Y.: Symmetric stereo matching for occlusion handling. In: CVPR ’05 Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), vol 2, (2005)
Kang, S.B., Szeliski, R., Chai, J.: Handling occlusions in dense multi-view stereo. Conf. Comput. Vis. Pattern Recogn. (2001)
Curless, B., Levoy, M.: A volumetric method for building complex models from range images. SIGGRAPH Proc. (1996)
Izadi, S., et al.: KinectFusion: real-time 3D reconstruction and interaction using a moving depth camera. UIST ’11 Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology, (2011)
Newcombe, R.A., et al.: KinectFusion: real-time dense surface mapping and tracking. ISMAR ’11 Proceedings of the 2011 10th IEEE International Symposium on Mixed and Augmented Reality (2011)
Durrant-Whyte, H., Bailey, T.: Simultaneous localisation and mapping (SLAM): part I the essential algorithms. IEEE Robotics Autom. Mag. (2006)
Bailey, T., Durrant-Whyte, H.: Simultaneous localisation and mapping (SLAM): part II state of the art. IEEE Robotics Autom. Mag. (2006)
Seitz, S., et al.: A comparison and evaluation of multi-view stereo reconstruction algorithms. CVPR 1, 519–526 (2006)
Scharstein, D., Szeliski, R.: A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. Int. J. Comput. Vis. 47 (2002)
Baker, S., Matthews, I.: Lucas-Kanade 20 years on: a unifying framework. Int. J. Comput. Vis. 56 (2004)
Gallup, D., Pollefeys, M., Frahm, J.M.: 3D reconstruction using an n-layer heightmap. Pattern Recogn. Lect. Notes Comput. Sci. 6376 (2010)
Newcombe, R.A., Lovegrove, S.J., Davison, A.J.: DTAM: dense tracking and mapping in real-time. Int Conf Comput Vis (ICCV) IEEE, 2320–2327, (2011)
Hwangbo, M., Kim, J.-S., Kanade, T.: Inertial-aided KLT feature tracking for a moving camera. Intell. Robots Syst. (IROS)—IEEE. (2009)
Lovegrove, S.J., Davison, A.J.: Real-time spherical Mosaicing using whole image alignment. Eur. Conf. Comput. Vis. (2010)
Malis, E.: Improving vision-based control using efficient second-order minimization techniques. Int. Conf. Robot Autom. (2004)
He, K., Sun, J., Tang, X.: Guided image filtering. Eur. Conf. Comput. Vis. (2010)
Rhemann, C., et al.: Fast cost-volume filtering for visual correspondence and beyond. CVPR, IEEE, 3017–3024, (2011)
Fattal, R.: Edge-avoiding wavelets and their applications. SIGGRAPH (2009)
Gastal, E.S.L., Oliveira, M.M.: Domain transform for edge-aware image and video processing. In: ACM SIGGRAPH 2011 Papers, Article No. 69, (2011)
Wolberg, G.: Digital Image Warping. Wiley, Hoboken, NJ (1990)
Baxes, G.: Digital Image Processing: Principles and Applications. Wiley, Hoboken, NJ (1994)
Fergus, R., et al.: Removing camera shake from a single photograph. ACM Trans. Graph. 25(3), (2006)
Rohr, K.: Landmark-Based Image Analysis Using Geometric and Intensity Models. Kluwer Academic Publishers, Dordrecht (2001)
Corbet, J., Rubini, A., Kroah-Hartman, G.: Linux Device Drivers, 3rd ed., O’Reilly Media, (2005)
Zinner, C., Kubinger, W., Isaacs, R.: PfeLib—a performance primitives library for embedded vision. EURASIP, (2007)
Houston, M.: OpenCL overview. SIGGRAPH OpenCL BOF (2011), also on KHRONOS website
Zinner, C., Kubinger, W.: ROS-DMA: a DMA double buffering method for embedded image processing with resource optimized slicing. IEEE RTAS 2006, Real-Time and Embedded Technology and Applications Symposium (2006)
Kreahling, W.C., et al.: Branch elimination by condition merging. Euro-Par 2003 Parallel Process. Lect. Notes Comput. Sci. 2790, (2003)
Aho, A.V., Ullman, J.D.: Principles of Compiler Design. Addison-Wesley, (1977)
Ragan-Kelley, J., et al.: Decoupling algorithms from schedules for easy optimization of image processing pipelines. ACM Trans. Graph. SIGGRAPH 31(4), (2012)
Alcantarilla, P.F., Bartoli, A., Davison, A.J.: KAZE features. Eur. Conf. Comput. Vis. (2012)
Schneider, C.A., Rasband, W.S., Eliceiri, K.W.: NIH image to ImageJ: 25 years of image analysis. Nat. Meth. 9 (2012)
Muja, M.: Recognition pipeline and object detection scalability. Summer 2010 Internship Presentation, University of British Columbia
Viola, P.A., Jones, M.J.: Rapid object detection using a boosted cascade of simple features. Conf. Comput. Vis. Pattern Recogn. (2001)
Swain, M., Ballard, D.H.: Color indexing. Int. J. Comput. Vis. 7 (1991)
Zhang, Z.: A flexible new technique for camera calibration. IEEE Trans. Pattern Anal. Mach. Intell. 22(11), 1330–1334 (2000)
Viola, P.A., Jones, M.J.: Robust real time object detection. Int. J. Comput. Vis. (2001)
Murase, H., Nayar, S.K.: Visual learning and recognition of 3-D objects from appearance. Int. J. Comput. Vis. 14 (1995)
Grosse, R., et al.: Ground-truth dataset and baseline evaluations for intrinsic image algorithms. Int. Conf. Comput. Vis. (2009)
Haltakov, V., Unger, C., Ilic, S.: Framework for generation of synthetic ground truth data for driver assistance applications. Pattern Recogn. Lect. Notes Comput. Sci. 8142 (2013)
Buades, A., Coll, B., Morel, J.-M.: A non-local algorithm for image denoising. Comput. Vis. Pattern Recogn. 2 (2005)
Agaian, S.S., Tourshan, K., Noonan, J.P.: Parametric Slant-Hadamard transforms. Proc. SPIE, (2003)
Sauvola, J., Pietikäinen, M.: Adaptive document image binarization. Pattern Recogn. 33(2), (2000)
Yen, J.C., Chang, F.J., Chang, S.: A new criterion for automatic multilevel thresholding. Trans. Image Process. 4(3), (1995)
Sezgin, M., Sankur, B.: Survey over image thresholding techniques and quantitative performance evaluation. J. Electron. Imaging 13(1), (2004)
Gaskill, J.D.: Linear Systems, Fourier Transforms, and Optics. Wiley, Hoboken, NJ (1978)
Shapiro, L.G., Stockman, G.C.: Computer Vision. Prentice-Hall, Upper Saddle River, NJ (2001)
Flusser, J., Suk, T., Zitova, B.: Moments and Moment Invariants in Pattern Recognition. Wiley, Hoboken, NJ (2009)
Mikolajczyk, K., Schmid, C.: An affine invariant interest point detector. Eur. Conf. Comput. Vis. (2002)
Moravec, H.P.: Obstacle avoidance and navigation in the real world by a seeing robot rover. Tech. report CMU-RI-TR-80-03, Robotics Institute, Carnegie Mellon University & doctoral dissertation, Stanford University, (1980)
Sivic, J., Zisserman, A.: Efficient visual search of videos cast as text retrieval. PAMI 31 (2009)
Tan, X., Triggs, B.: Enhanced local texture feature sets for face recognition under difficult lighting conditions. AMFG’07 Proceedings of the 3rd International Conference on Analysis and Modeling of Faces and Gestures (2007)
Lindeberg, T.: Scale-space. In: Encyclopedia of Computer Science and Engineering. Wiley, Hoboken, NJ, (2008)
Lindeberg, T.: Scale-space theory: a basic tool for analysing structures at different scales. J. Appl. Stat 21(2), 224–270 (1994)
Bengio, Y.: Learning Deep Architectures for AI, Foundations and Trends in Machine Learning. Now Publishers Inc USA, (2009)
Hinton, G.E., Osindero, S.: A fast learning algorithm for deep belief nets. Neural Comput. 18(7), (2006)
Olson, E.: AprilTag: a robust and flexible visual fiducial system. Int. Conf. Robotics Autom. (2011)
Farabet, C., et al.: Hardware accelerated convolutional neural networks for synthetic vision systems. ISCAS IEEE 257–260, (2010)
Tuytelaars, T., Van Gool, L.: Matching widely separated views based on affine invariant regions. Int. J. Comput. Vis. 59 (2004)
Fischler, M.A., Elschlager, R.A.: The representation and matching of pictorial structures. IEEE Trans. Comput. (1973)
Felzenszwalb, P.F., Girshick, R.B., McAllester, D., Ramanan, D.: Object detection with discriminatively trained part-based models. PAMI 32(9), (2010)
Yang, Y., Ramanan, D.: Articulated pose estimation with flexible mixtures-of-parts. Conf. Comput. Vis. Pattern Recogn. (2011)
Amit, Y., Trouve, A.: POP: patchwork of parts models for object recognition. Int. J. Comput. Vis. 75 (2007)
Lazebnik, S., Schmid, C., Ponce, J.: Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. Conf. Comput. Vis. Pattern Recogn. (2006)
Grauman, K., Darrell, T.: The pyramid Match Kernel: discriminative classification with sets of image features. Int. Conf. Comput. Vis. (2005)
Aharon, M., Elad, M., Bruckstein, A.: K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation. IEEE Trans. Signal Process. 54(11), (2006)
Fei-Fei, L., Fergus, R., Torralba, A.: Recognizing and learning object categories. Conf. Comput. Vis. Pattern Recogn. (2007)
Johnson, A.: Spin-Images: A Representation for 3-D Surface Matching Ph.D. dissertation, technical report CMU-RI-TR-97-47, Robotics Institute, Carnegie Mellon University, (1997)
Marton, Z.-C., Pangercic, D., Blodow, N., Beetz, M.: Combined 2D-3D categorization and classification for multimodal perception systems. Int. J. Robotics Res. Arch. 30(11), (2011)
Kass, M., Witkin, A., Terzopoulos, D.: Snakes: active contour models. Int. J. Comput. Vis. (1988)
Tombari, F., Salti, S., Di Stefano, L.: A combined texture-shape descriptor for enhanced 3D feature matching. Int. Conf. Image Process. (2011)
Mikolajczyk, K., Schmid, C.: Indexing based on scale invariant interest points. Int. Conf. Comput. Vis. (2001)
Ragan-Kelley, J., et al.: Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines. PLDI ’13 Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation, (2013)
Kindratenko, V.V., et al.: GPU clusters for high-performance computing. In: Proceedings of Workshop on Parallel Programming on Accelerator Clusters—PPAC’09, (2009)
Munshi, A., et al.: OpenCL Programming Guide, 1 ed., Addison-Wesley Professional, (2011)
Prince, S.: Computer Vision: Models, Learning, and Inference. Cambridge University Press, Cambridge (2012)
Lindeberg, T.: Scale Space Theory in Computer Vision. Springer, New York (2010)
Pele, O.: Distance Functions: Theory, Algorithms and Applications. Ph.D. Thesis, Hebrew University, (2011)
Schapire, R.E., Singer, Y.: Improved boosting algorithms using confidence-rated predictions. Mach. Learn. (1999)
Bache, K., Lichman, M.: UCI Machine Learning Repository (http://archive.ics.uci.edu/ml), University of California, School of Information and Computer Science, Irvine, CA, (2013)
Zach, C.: Fast and high quality fusion of depth maps. 3DPVT Joint 3DIM/3DPVT Conference 3D Imaging, Modeling, Processing, Visualization, Transmission (2008)
Krig, S.: Visual Genomes for Synthetic Vision. TBP (2016)
Grimes, D.B., Rao, R.P.N.: Bilinear sparse coding for invariant vision. Neural Comput. 17(1), 47–73 (2005)
Grosse, R., Raina, R., Kwong, H., Ng, A.Y.: Shift-invariant sparse coding for audio classification. In: Proceedings of the 23rd Conference in Uncertainty in Artificial Intelligence (UAI’07), (2007)
Bergstra, J., Courville, A., Bengio, Y.: The statistical inefficiency of sparse coding for images (or, one Gabor to rule them all). Technical Report, (2011)
Erhan, D., Szegedy, C., Toshev, A., Anguelov, D.: Scalable object detection using deep neural networks. Conf. Comput. Vis. Pattern Recogn. (2014)
Hinton, G., Osindero, S., Teh, Y.: A fast learning algorithm for deep belief nets. Neural Comput. 18, 1527–1554 (2006)
Hinton, G., Salakhutdinov, R.: Reducing the dimensionality of data with neural networks. Science 313(5786), 504–507 (2006)
Nguyen, A., Yosinski, J., Clune, J.: Deep neural networks are easily fooled: high confidence predictions for unrecognizable images. CVPR (2015)
He, K., Zhang, X., Ren, S., Sun, J.: Spatial pyramid pooling in deep convolutional networks for visual recognition. ECCV (2014)
Mutch, J., Lowe, D.G.: Object class recognition and localization using sparse features with limited receptive fields. IJCV (2008)
Serre, T., Wolf, L., Poggio, T.: Object recognition with features inspired by visual cortex. CVPR (2005)
Sanchez, J., Perronnin, F., Mensink, T., Verbeek, J.: Image classification with the fisher vector: theory and practice. IJCV (2013)
Lin, M., Chen, Q., Yan, S.: Network in network. In: ICLR (2014)
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. (2014)
Behnke, S.: Hierarchical neural networks for image interpretation. Draft submitted to Springer Published as volume 2766 of Lecture Notes in Computer Science ISBN: 3-540-40722-7, Springer (2003)
Girshick, R., Iandola, F., Darrell, T., Malik, J.: Deformable part models are convolutional neural networks. CVPR (2014)
van de Sande, E.A., Snoek, C.G.M., Smeulders, A.W.M.: Fisher and VLAD with FLAIR. In: IEEE Conference on Computer Vision and Pattern Recognition (2014)
Ranzato, M., Boureau, Y., LeCun, Y.: Sparse feature learning for deep belief networks. In: Proceedings of Neural Information Processing Systems (NIPS), (2007)
Schmidhuber, J.: Deep learning in neural networks: an overview, Technical Report IDSIA-03-14/arXiv:1404.7828 v4
Deng, L., Yu, D.: Deep learning: methods and applications. Found. Trends Signal Process. 7, (2014)
Bengio, Y., Goodfellow, I.J., Courville, A.: Deep Learning. MIT Press, (2016) (in preparation)
Anderson, J.A., Rosenfeld, E. (eds.): Neurocomputing: Foundations of Research. MIT Press, Cambridge, MA, (1988). Also Neurocomputing vol. 2: Directions for Research. MIT Press, Cambridge, MA, (1991)
Jackson, P.: Introduction to Expert Systems, 3 ed., Addison Wesley, (1998)
Rosenblatt, F.: The Perceptron: a probabilistic model for information storage and organization in the brain. Psychol. Rev. (1958)
Joseph, R.D.: Contributions to Perceptron Theory. PhD thesis, Cornell Univ. (1961)
Hubel, D.H., Wiesel, T.N.: Receptive fields of single neurones in the cat’s striate cortex. J. Physiol. (1959)
Hubel, D.H., Wiesel, T.N.: Receptive fields, binocular interaction, and functional architecture in the cat’s visual cortex. J. Physiol. 160, 106–154 (1962)
McCulloch, W., Pitts, W.: A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys. (1943)
Hebb, D.O.: The Organization of Behavior. Wiley, New York (1949)
Rosenblatt, F.: The Perceptron—a perceiving and recognizing automaton. Report 85-460-1, Cornell Aeronautical Laboratory (1957)
Ivakhnenko, A.G.: The group method of data handling—a rival of the method of stochastic approximation. Soviet Autom. Contr. (1968)
Ivakhnenko, A.G., Lapa, V.G.: Cybernetic predicting devices. CCM Inform. Corp. (1965)
Ivakhnenko, A.G., Lapa, V.G., McDonough, R.N.: Cybernetics and Forecasting Techniques. American Elsevier, NY, (1967)
Ivakhnenko, A.G.: Polynomial theory of complex systems. IEEE Trans. Syst. Man Cybern. 4, 364–378 (1971)
Hinton, G.E., Srivastava, N., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.R.: Improving neural networks by preventing co-adaptation of feature detectors. CoRR, abs/1207.0580, (2012)
Ikeda, S., Ochiai, M., Sawaragi, Y.: Sequential GMDH algorithm and its application to river flow prediction. IEEE Trans. Syst. Man Cybern. 7, 473–479 (1976)
Fukushima, K.: Neural network model for a mechanism of pattern recognition unaffected by shift in position—Neocognitron. Trans. IECE J. 62(10), 658–665 (1979)
Fukushima, K.: Neocognitron: a self-organizing neural network for a mechanism of pattern recognition unaffected by shift in position. Biol. Cybern. 36(4), 193–202 (1980)
Dreyfus, S.E.: The numerical solution of variational problems. J. Math. Anal. Appl. 5(1), 30–45 (1962)
Dreyfus, S.E.: The computational solution of optimal control problems with time lag. IEEE Trans. Autom. Contr. (1973)
LeCun, Y.: Une procédure d’apprentissage pour réseau à seuil asymétrique. Proceedings of Cognitiva, vol 85, Paris, pp. 599–604, (1985)
LeCun, Y.: A theoretical framework for back-propagation. In: Touretzky, D., Hinton, G., Sejnowski, T., (eds.) Proceedings of the 1988 Connectionist Models Summer School, CMU, Morgan Kaufmann, Pittsburgh, PA, pp. 21–28, (1988)
LeCun, Y., Boser, B., Denker, J.S., Henderson, D., Howard, R.E., Hubbard, W., Jackel, L.D.: Back-propagation applied to handwritten zip code recognition. Neural Comput. 1(4), 541–551 (1989)
LeCun, Y., Boser, B., Denker, J.S., Henderson, D., Howard, R.E., Hubbard, W., Jackel, L.D.: Handwritten digit recognition with a back-propagation network. In: Touretzky, D. S., (ed.) Advances in Neural Information Processing Systems, vol 2, Morgan Kaufmann, pp. 396–404, (1990a)
Kelley, H.J.: Gradient theory of optimal flight paths. ARS J. 30(10), 947–954 (1960)
Bryson, A.E.: A gradient method for optimizing multi-stage allocation processes. In: Proc. Harvard Univ. Symposium on Digital Computers and Their Applications, (1961)
Bryson, Jr., A.E., Denham, W.F.: A steepest-ascent method for solving optimum programming problems. Technical Report BR-1303, Raytheon Company, Missile and Space Division, (1961)
Werbos, P.J.: The roots of backpropagation: from ordered derivatives to neural networks and political forecasting. Wiley, (1994)
Schmidhuber, J.: Learning complex, extended sequences using the principle of history compression. Neural Comput. (1992)
Graves, A., Wayne, G., Danihelka, I.: Neural Turing machines. (2014)
Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
Ng, A.: Support vector machines. Stanford CS229 lecture notes
Shawe-Taylor, J., Cristianini, N.: Support vector machines and other kernel-based learning methods, Cambridge University Press, (2000)
Hinton, G.E., Sejnowski, T.J., Rumelhart, D.E., McClelland, J.L.: Learning and relearning in Boltzmann machines, PDP Research Group (1986)
Ackley, D.H., Hinton, G.E., Sejnowski, T.J.: A learning algorithm for Boltzmann machines. Cogn. Sci. (1985)
Hopfield, J.J.: Neural networks and physical systems with emergent collective computational abilities. Proc. Natl. Acad. Sci. U. S. A. (1982)
Smolensky, P.: Chapter 6: information processing in dynamical systems: foundations of harmony theory. In: Rumelhart, D.E., McLelland, J.L. (eds.) Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol 1, Foundations. MIT Press (1986)
Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. arXiv (2014)
Also see NiN slides from ILSVRC (2014) http://www.image-net.org/challenges/LSVRC/2014/slides/ILSVRC2014_NUS_release.pdf
LeCun, Y.: A theoretical framework for back-propagation. In: Touretzky, D., Hinton, G., Sejnowski, T., (eds.) Proceedings of the 1988 Connectionist Models Summer School, CMU, pp. 21–28, Morgan Kaufmann, Pittsburgh, PA, (1988)
Vapnik, V., Lerner, A.: Pattern recognition using generalized portrait method. Autom. Remote Contr. (1963)
Boser, B.E., Guyon, I.M., Vapnik, V.N.: A training algorithm for optimal margin classifiers. ACM COLT ’92, (1992)
Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. (1995)
Vapnik, V.: Estimation of Dependences Based on Empirical Data [in Russian]. Nauka, Moscow, (1979). English translation, Springer, New York, (1982)
Vapnik, V.: The Nature of Statistical Learning Theory. Springer, New York (1995)
Vapnik, V.: Statistical Learning Theory. John Wiley and Sons, Inc., New York (1998)
Powell, M.J.D.: An efficient method for finding the minimum of a function of several variables without calculating derivatives. Comput. J. (1964)
Carreira-Perpignan, M.A., Hinton, G.E.: On contrastive divergence learning. In: Artificial Intelligence and Statistics, (2005)
Cireşan, D., Meier, U., Schmidhuber, J.: Multi-column deep neural networks for image classification. CVPR (2012)
Coates, A., Lee, H., Ng, A.: An analysis of single-layer networks in unsupervised feature learning, AISTATS (2011)
Rosenblatt, F.: Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms. Spartan Books, Washington, DC (1961)
Baddeley, A., Eysenck, M., Anderson, M.: Memory. Psychology Press, (2009)
Goldman-Rakic, P.S.: Cellular basis of working memory. Neuron 14(3), 477–485 (1995)
Rumelhart, D.E., McClelland, J.L., PDP Research Group: Parallel Distributed Processing, vol 1. MIT Press, (1986)
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. arXiv:1409.4842, (2014)
Von Neumann, J.: First draft of a report on the EDVAC. (1945)
Goodfellow, I.J., Warde-Farley, D., Mirza, M., Courville, A., Bengio, Y.: Maxout networks. In: International Conference on Machine Learning (ICML), (2013)
Breiman, L.: Bagging predictors. Mach. Learn. 24(2), 123–140 (1996)
Stollenga, M., Masci, J., Gomez, F., Schmidhuber, J.: Deep networks with internal selective attention through feedback connections. ICML (2014)
Srivastava, R.K., Masci, J., Kazerounian, S., Gomez, F., Schmidhuber, J.: Compete to compute. In: NIPS, (2013)
Buciluă, C., Caruana, R., Niculescu-Mizil, A.: Model compression. ACM SIGKDD (2006)
Mansimov, E., Srivastava, N., Salakhutdinov, R.: Initialization Strategies of Spatio-Temporal Convolutional Neural Networks, Technical Report, (2014)
Weng, J., Ahuja, N., Huang, T.S.: Cresceptron: a self-organizing neural network which grows adaptively. In: Proceedings of Int’l Joint Conference on Neural Networks, Baltimore, MD, (1992)
Cadieu, C.F., Hong, H., Yamins, D.L.K., Pinto, N., Ardila, D., Solomon, E.A., Majaj, N.J., DiCarlo, J.J.: Deep neural networks rival the representation of primate IT cortex for core visual object recognition. PLoS Comput. Biol. (2014). doi:10.1371/journal.pcbi.1003963
Coates, A., Ng, A.Y.: The importance of encoding versus training with sparse coding and vector quantization. ICML (2011)
Jarrett, K., Kavukcuoglu, K., Ranzato, M., Le-Cun, Y.: What is the best multi-stage architecture for object recognition?, ICCV (2009)
Hinton, G., Vinyals, O., Dean, J.: Distilling the Knowledge in a Neural Network. NIPS (2014)
Hinton, G.E., Osindero, S., Teh, Y.W.: A fast learning algorithm for deep belief nets. Neural Comput. (2006)
Bengio, Y., Lamblin, P., Popovici, D., Larochelle, H.: Greedy layer-wise training of deep networks. NIPS (2007)
Kandel, E.R., Schwartz, J.H., Jessell, T.M. (eds.): Principles of Neural Science, 4th ed., McGraw-Hill, (2000)
Rao, R.P.N., Ballard, D.H.: Predictive coding in the visual cortex: A functional interpretation of some extra-classical receptive-field effects. Nat Neurosci. (1999)
Rosenfeld, A., Hummel, R.A., Zucker, S.W.: Scene labeling by relaxation operations. IEEE Trans. Syst. Man Cybernetics (1976)
Métin, C., Frost, D.O.: Visual responses of neurons in somatosensory cortex of hamsters with experimentally induced retinal projections to somatosensory thalamus. Proc. Natl. Acad. Sci. U. S. A. 86(1), 357–361 (1989)
Roe, A.W., Pallas, S.L., Kwon, Y.H., Sur, M.: Visual projections routed to the auditory pathway in ferrets: receptive fields of visual neurons in primary auditory cortex. J. Neurosci. 12(9), 3651–3664 (1992)
Bach-y-Rita, P., Kaczmarek, K.A., Tyler, M.E., Garcia-Lara, J.: Form perception with a 49-point electrotactile stimulus array of the tongue: a technical note. J. Rehabil. Res. Dev. (1998)
Bach-y-Rita, P., Tyler, M.E., Kaczmarek, K.A.: Seeing with the brain. IJHCI (2003)
Wiskott, L.: How does our visual system achieve shift and size invariance? In: Problems in Systems Neuroscience. Oxford University Press, (2002)
Thomas Yeo, B.T., Krienen, F.M., Sepulcre, J., Sabuncu, M.R., Lashkari, D., Hollinshead, M., Roffman, J.L., Smoller, J.W., Zöllei, L., Polimeni, J.R., Fischl, B., Liu, H., Buckner, R.L.: The organization of the human cerebral cortex estimated by intrinsic functional connectivity. J. Neurophysiol. (2011)
Andersen, P., Gross, G.N., Lømo, T., Sveen, O.: Participation of inhibitory and excitatory interneurones in the control of hippocampal cortical output. In: The Interneuron. University of California Press, Los Angeles, (1969)
Eccles, J.C., Ito, M., Szentágothai, J.: The Cerebellum as a Neuronal Machine. Springer, New York, (1967)
Stefanis, C.: Interneuronal mechanisms in the cortex. In: The Interneuron. University of California Press, Los Angeles, (1969)
Stephen, G.: Contour enhancement, short-term memory, and constancies in reverberating neural networks, Studies in Applied Mathematics, (1973)
Parikh, D., Zitnick, C.L.: The role of features, algorithms and data in visual recognition. CVPR (2010)
Christopher, B.: Pattern Recognition and Machine Learning, Springer, (2006)
Eigen, D., Rolfe, J., Fergus, R., LeCun, Y.: Understanding deep architectures using a recursive convolutional network, arXiv:1312.1847 [cs.LG]
NIPS.: Tutorial—Deep Learning for Computer Vision (Rob Fergus) (2013)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet Classification with Deep Convolutional Neural Networks. NIPS (2012)
Zeiler, M.D., Fergus, R.: Visualizing and Understanding Convolutional Networks. ECCV (2014)
Zeiler, M., Taylor, G., Fergus, R.: Adaptive deconvolutional networks for mid and high level feature learning. In: ICCV, (2011)
Olga, R., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A.C., Fei-Fei, L.: Large scale visual recognition challenge. ImageNet http://arxiv.org/abs/1409.0575, (2015)
Random Search for Hyper-Parameter Optimization James Bergstra JAMES.BERGSTRA@UMONTREAL.CA Yoshua Bengio, JMLR (2012)
Donahue, J., Jia, Y., Vinyals, O., Hoffman, J., Zhang, N., Tzeng, E., Darrell, T.: DeCAF: A deep convolutional activation feature for generic visual recognition. CVPR (2013)
Yamins, D.L., Hong, H., Cadieu, C., DiCarlo, J.J.: Hierarchical modular optimization of convolutional networks achieves representations similar to macaque IT and human ventral stream. NIPS (2013)
Haykin, S.: Neural Networks: a comprehensive foundation. Pearson Educ. (1999)
Pascanu, R., Mikolov, T., Bengio, Y.: On the difficulty of training recurrent neural networks. (2013)
Daniel L.K.Y., Honga, H., Cadieua, C.F., Solomona, E.A., Seiberta, D., DiCarloa, J.J.: Performance-optimized hierarchical models predict neural responses in higher visual cortex. Natl. Acad. Sci. (2015)
US Government BRAIN Initiative.: http://www.artificialbrains.com/darpa-synapse-program
European Union Human Brain Project.: https://www.humanbrainproject.eu
Canadian Government Computation & Adaptive Perception Canadian Institute For Advanced Research CIFAR. http://www.cifar.ca/neural-computation-and-adaptive-perception-research-progress
Tatyana, V., Sharpee, O., Kouh M., Reynolds, J.H.: Trade-off between curvature tuning and position invariance in visual area. PNAS. (2013)
Neural Networks, Tricks of the Trade, 2nd ed., Springer, (2012)
LeCun, Y.: Convolutional networks and applications in vision, Comput. Sci. Dept., New York Univ., New York, NY, USA, Kavukcuoglu, K., Farabet, C., ISCAS. (2010)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. ICLR. (2015)
Lyu, S., Simoncelli, E.P.: Nonlinear image representation using divisive normalization. CVPR. (2008)
Pinto, N., Cox, D.D., DiCarlo, J.J.: Why is real-world visual object recognition hard? PLoS Comput Biol. (2008)
Yang Y., Hospedales, T.M.: Deep neural networks for sketch recognition. (2015)
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting, JMLR. (2014)
Wan, L., Zeiler, M., Zhang, S., LeCun, Y., Fergus, R.: Regularization of neural network using drop connect. Int. Conf. Mach. Learn. (2013)
Breiman, L.: Bagging predictors. Mach. Learn. (1994)
Zeiler, M.D., Fergus, R.: Stochastic pooling for regularization of deep convolutional. Neural Netw.
Mamalet, F., Garcia, C.: Simplifying convnets for fast learning. ICANN. (2012)
Gens, R., Domingos, P.: Deep symmetry networks. NIPS (2014) see also slides at http://research.microsoft.com/apps/video/default.aspx?id=219488
Yosinski, J., Clune, J., Bengio, Y., Lipson, H.: How transferable are features in deep neural networks?, NIPS (2014)
Uijlings, J.R.R., van de Sande, K.E.A., Gevers, T., Smeulders, A.W.M.: Selective search for object recognition. IJCV (2013)
Hagan, M.T., Demuth, H.B., Beale, M.H.: Neural network design. PWS Publishing, (1996)
Dominik S., M¨uller, A., Behnke, S.: Evaluation of pooling operations in convolutional architectures for object recognition. ICANN. (2010)
Kaiming, H., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. CVPR (2015)
Field, G., Gauthier, J., Sher, A., Greschner, M., Machado, T., Jepson, L., Shlens, J., Gunning, D., Mathieson, K., Dabrowski, W., et al.: Functional connectivity in the retina at the resolution of photoreceptors. Nature. (2010)
Rosenblatt, F.: The Perceptron: A theory of statistical separability in cognitive systems. Cornell Aeronautical Laboratory, Buffalo, Inc. Rep. No. VG-1196-G-1, (1958)
Auer, P., Burgsteiner, H., Maass, W.: A learning rule for very simple universal approximators consisting of a single layer of perceptrons. Austr. Sci. Fund (2008)
Vapnik, V., Chervonenkis, A., Moskva, N.: Pattern Recognition Theory, Statistical Learning Problems. (1974)
Hearst, M.A., Berkeley, U.C.: Support vector machines. IEEE Intell. Syst. (1998)
John P.: How to implement SVM’s, Microsoft Research. IEEE Intelligent Systems, (1998)
Fukushima, K.: Cognitron: a self-organizing multilayered neural network, Biological Cybernetics, Springer, (1975)
Fukushima, K.: Artificial vision by multi-layered neural networks: and its advances. Neural Netw. 37, 103–119
Fukushima, K.: Training multi-layered neural network Neocognitron. Neural Netw. 40, 18–31
Joan, B., Zaremba, W., Szlam, A., LeCun, Y.: Spectral networks and locally connected networks on graphs. arXiv:1312.6203 [cs.LG] (2014)
Pascanu, R., Gulcehre, C., Cho, K., Bengio, Y.: How to construct deep recurrent neural networks. ICLR. (2014)
LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE (1998)
http://www.imagemagick.org/Usage/convolve/#convolve_vs_correlate
Springenberg, J.T., Dosovitskiy, A., Brox, T., Riedmiller, M.: Striving for simplicity: the all convolutional net. CVPR. (2015)
Fractional max-pooling Benjamin Graham. CVPR. (2014)
The Human Connectome Project is a consortium of leading neurological research labs which are mapping out the pathways in the brain. See http://www.humanconnectomeproject.org/about/
Cun, Y.L., Denker, J.S., Solla, S.A.: Optimal brain damage. NIPS. (1990)
Waibel, A.: Consonant recognition by modular construction of large phonemic time-delay neural networks. IEEE ASSP (1989)
Farabet, C., LeCun, Y., Kavukcuoglu, K., Culurciello, E., Martini, B., Akselrod, P., Talay, S.: Large-scale FPGA-based convolutional networks. (2011)
Clement, F., LeCun, Y., Kavukcuoglu, K., Culurciello, E., Martini, B., Akselrod, P., Talay, S.: Hardware accelerated convolutional neural networks for synthetic vision systems. ISCAS. (2010)
Sermanet, P., Eigen, D., Zhang X., Mathieu M., Fergus R., LeCun, Y.: OverFeat: integrated recognition, localization and detection using convolutional networks. CVPR. (2014)
Dong, J., Xia, W., Chen, Q., Feng, J., Huang, Z., Yan, S.: Subcategory-aware object classification. CVPR. (2013)
Jun, Y., Ni, B., Kassim, A.A.: Half-CNN: a general framework for whole-image regression. CVPR. (2014)
Hugo, L., Bengio, Y., Louradour, J., Lamblin, P.: Exploring strategies for training deep neural networks. JMLR. (2009)
Yu, C., Yu, F.X., Feris, R.S., Kumar, S., Choudhary, A., Chang, S.-F.: Fast neural networks with circulant projections. (2015)
Jochem, T., Dean Pomerleau, AI.: Life in the fast lane the evolution of an adaptive vehicle control system. Magazine (1996)
Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks. JMLR. (2010)
Hastie, T., Friedman.: The Elements of Statistical Learning. 2nd ed., Springer, (2009)
Boureau, Y.-L., Le Roux, N., Bach, F., Ponce, J., Lecun, Y.: Ask the locals: multi-way local pooling for image recognition ICCV’11
Ren, W., Yan, S., Shan, Y., Dang, Q., Sun, G.: Deep image: scaling up image recognition. CVPR. (2015)
Karen, S., Simonyan, K.: http://imagenet.org/tutorials/cvpr2015/recent.pdf, ILSVRC Submission Essentials in the light of recent developments. ImageNet, Tutorial (2015)
Jon Shlens Google Research.: Directions in convolutional neural networks at Google, (2015), http://vision.stanford.edu/teaching/cs231n/slides/jon_talk.pdf
Sergey, I., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. CVPR. (2015)
Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. CVPR. (2014)
Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks. Int. Conf. Artif. Intell. Stat. (2010)
Chunhui, G., Lim, J.J., Arbelaez, P., Malik, J.: Recognition using regions. CVPR. (2009)
Ross G.: Fast R-CNN. CVPR. (2015)
Volodymyr, M., Heess, N., Graves, A., Kavukcuoglu, K.: Recurrent models of visual attention. NIPS. (2014)
Oriol, V., Toshev, A., Bengio, S., Erhan, D.: Show and tell: a neural image caption generator. (2015)
Ren, M., Kiros, R., Zemel, R.: Exploring models and data for image question answering. ICML (2015)
Subhashini, V., Rohrbach, M., Donahue, J., Mooney, R., Darrell, T., Saenko, K.: Sequence to sequence—video to text. (2015)
Graves, A.: Generating sequences with recurrent neural networks. (2014)
Schmidhuber, J., Wierstra, D., Gagliolo, M., Gomez, F.: Training recurrent networks by evolino. Neural Comput. (2007)
Weston, J., Chopra, S., Bordes, A.: Memory networks. ICLR. (2015)
LaRue, J.P.: A Bi-directional Neural Network Based on a Convolutional Neural Network and Associative Memory Matrices That Meets the Universal Approximation Theorem, Jadco Signals, Charleston, SC, USA, 1 315 717 9009 james@jadcosignals.com
Zhou, R.W., Quek, C.: DCBAM: A discrete chainable bidirectional associative memory. Pattern Recogn. Lett. (1991)
Kosko, B.: Bidirectional associative memories. IEEE Trans. Syst. Man Cybern. 7, 49–60 (1988)
Kohonen, T.: Correlation matrix memories. IEEE Trans. Comput. 353–359, (1972)
Hopfield, J.J.: Neural networks and physical systems with emergent collective computational abilities. Proc. Natl. Acad. Sci. U. S. A. 79(8), 2554–2558 (1982)
Schmidhuber, J.: Long Short-Term Memory: Tutorial on LSTM Recurrent Networks, http://people.idsia.ch/~juergen/lstm/
Hochreiter, S., Steven, Y.A., Conwell, P.R.: Learning to learn using gradient descent. ICANN. (2001)
Schmidhuber, J.: Learning to control fast-weight memories: an alternative to recurrent nets. Neural Comput. (1992)
Jeff, D., Hendricks, L.A., Guadarrama, S., Rohrbach, M., Venugopalan, S., Saenko, K., Darrell, T.: Long-term recurrent convolutional networks for visual recognition and description. CVPR. (2015)
Mengye, R., Kiros, R., Zemel, R.: Exploring models and data for image question answering. ICML. (2015)
Alex, G., Doktors der Naturwissenschaften.: Supervised Sequence Labelling with Recurrent Neural Networks
Graves, A., Fernandez, S., Schmidhuber, J.: Multi-dimensional recurrent neural networks. ICANN. (2007)
Baldi, P., Pollastri, G.: The principled design of large-scale recursive neural network architectures—DAG-RNN’s and the protein structure prediction problem. JMLR. (2003)
Karol, G., Danihelka, I., Graves, A., Rezende, D., Wierstra, D.: DRAW: a recurrent neural network for image generation. ICML. (2015)
Richard, S., Huval, B., Bhat, B., Manning, C.D., Ng, A.Y.: Convolutional-recursive deep learning for 3D object classification. NIPS. (2012)
B., Shuai, Zuo, Z., Gang, W.: Quaddirectional 2D-recurrent neural networks for image labeling. IEEE SPL. (2015)
Zuo, Z., Shuai, B., Wang, G., Liu, X., Wang, X., Wang, B., Chen, Y.: Convolutional recurrent neural networks: learning spatial dependencies for image representation. CVPR. (2015)
Alex, G., Schmidhuber, J.: Offline handwriting recognition with multidimensional recurrent neural networks. NIPS. (2008)
Graves, A., Fernandez, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. ICML. (2012)
Kyunghyun, C., van Merrienboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., Bengio, Y.: Learning phrase representations using RNN encoder-decoder for statistical machine translation. EMNLP. (2014)
Kyunghyun, C., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: encoder-decoder approaches. SSST-8. (2014)
Peter, T., Horne, B.G., Lee Giles, C.: Collingwood, P.C.: Finite state machines and recurrent neural networks—automata and dynamical systems approaches. Neural Networks Pattern Recogn. Chapter 6, (1998)
Arai, K., Nakano, R.: Stable behavior in a recurrent neural network for a finite state machine. Neural Netw. 13(6), (2000)
Wojciech, Z., Sutskever, I.: Learning to execute
Rumelhart, D.E., McClelland, J.L.: Parallel Distributed processing: explorations in the microstructure of cognition. (1986)
Elman, J.L.: Finding structure in time. Cogn. Sci. (1990)
Elman, J.L.: Distributed representations, simple recurrent networks, and grammatical structure. Mach. Learn. (1991)
Elman, J.L.: Learning and development in neural networks: the importance of starting small. Cognition (1993)
Williams, R.J., Zipser, D.: Gradient-Based Learning Algorithms for Recurrent Networks and Their Computational Complexity. Back-propagation: Theory, Architectures and Applications, Lawrence Erlbaum Publishers, (1995)
Robinson, A.J., Fallside, F.: The Utility Driven Dynamic Error Propagation Network. Technical Report CUED/F-INFENG/TR.1, Cambridge, (1987)
Werbos, P.: Backpropagation through time: what it does and how to do it. Proc. IEEE (1990)
Boden, M.: A guide to recurrent neural networks and backpropagation. (2014)
Ders, F.: Long Short-Term Memory in Recurrent Neural Networks, PhD Dissertation, (2001)
Qi, L., Zhu, J.: Revisit long short-term memory: an optimization perspective. NIPS. (2015)
Sutskever, I., Vinyals, O., Le, QV.: Sequence to sequence learning with neural networks. NIPS. (2014)
Kyunghyun, C., van Merrienboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., Bengio, Y.: Learning Phrase Representations Using RNN Encoder-Decoder for Statistical Machine Translation. (2014)
Liang, M., Hu, X.: Recurrent convolutional neural network for object recognition. CVPR. (2015)
Socher, R., Lin, C.C., Manning, C., Ng, A.Y.: Parsing natural scenes and natural language with recursive neural networks. In: Proceedings of the 28th International Conference on Machine Learning (ICML), (2011)
Socher, R., Manning, C.D., Ng, A.Y.: Learning continuous phrase representations and syntactic parsing with recursive neural networks. In: Advances in Neural Information Processing Systems, NIPS. (2010)
Volodymyr, M., Heess, N., Graves, A., Kavukcuoglu, K.: Recurrent Models of Visual Attention
Steve, B., Wah, C., Schroff, F., Babenko, B., Welinder, P., Perona, P., Belongie, S.: Visual recognition with humans in the loop. In Computer Vision–ECCV, Springer, (2010)
Tom, S., Glasmachers, T., Schmidhuber, J.: High dimensions and heavy tails for natural evolution strategies. Proceedings of the 13th Annual Conference on Genetic and Evolutionary Computation. ACM. (2011)
Zaremba, W., Sutskever, I.: Reinforcement Learning Neural Turing Machines. (2015)
Hebb, D.: The Organization of Behaviour. Wiley, New York (1949)
Liefeng, B., Lai, K., Ren, X., Fox, D.: Object recognition with hierarchical kernel descriptors. CVPR. (2011)
Ivakhnenko, G.A., Cerda R.: Inductive Self-Organizing GMDH Algorithms for Complex Systems Modeling and Forecasting, http://www.gmdh.net/articles/index.html, see the general GMDH website for several other resources, http://www.gmdh.net
The review of problems solvable by algorithms of the group method of data handling. Pattern Recogn. Image Anal. (1995), www.gmdh.net/articles/
Ladislav, Z.: Learning simple dependencies by polynomial neural network. J. Inform. Contr. Manag. Syst. 8(3), (2010)
Liefeng, B., Sminchisescu, C.: Efficient match kernel between sets of features for visual recognition. NIPS. (2009)
Julesz, B.: Textons, the elements of texture perception and their interactions. Nature 290, 91–97 (1981)
Zhang, J., Marszałek, M., Lazebnik, S., Schmid, C.: Local features and kernels for classification of texture and object categories: a comprehensive study. IJCV. (2007)
Lazebnik, S., Schmid, C., Ponce, J.: A maximum entropy framework for part-based texture and object recognition. IEEE CV. (2005)
Lampert, C.H.: Kernel methods in computer vision. Found. Trends Comput. Graph. Vis. 4(3), 193–285 (2009)
Jurie, F., Triggs, B.: Creating efficient codebooks for visual recognition. ICCV. (2005)
Youngmin, C., Saul, L.K.: Kernel methods for deep learning. NIPS. (2009)
Vedaldi, A., Gulshan, V., Varma, M., Zisserman, A.: Multiple kernels for object detection. (2009)
Varma, M., Ray, D.: Learning the discriminative power-invariance trade-off. Int. Conf. Comput. Vis. (2007)
Klaus-Robert, M., Mika, S., Rätsch, G., Tsuda, K., Schölkopf, B.: An introduction to kernel-based learning algorithms. IEEE TNN. (2001)
Nilsback, M.-E., Zisserman, A.: A visual vocabulary for flower classification. In: CVPR. (2006)
Liefeng, B., Ren, X., Fox, D., Kernel descriptors for visual recognition. NIPS. (2010)
Boswell, D.: Introduction to Support Vector Machines. (2002)
Radu Tudor, I., Popescu, M., Grozea, C.: Local learning to improve bag of visual words model for facial expression recognition. ICML. (2013)
Haussler. D.: Convolution kernels on discrete structures. Tech. Rep. (1999)
Pati, Y.C., Rezaiifar, R., Krishnaprasad, P.S.: Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition. Asilomar Conf. Signals Syst. Comput. (1993)
Aharon, M., Elad, M., Bruckstein, A.: K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation. IEEE Trans. Signal Process. 54(11), 4311–4322 (2006)
Bruna, J., Mallat, S.: Invariant Scattering Convolution Networks. (2012)
Wonmin, B., Breuel, T.M., Raue, F., Liwicki, M.: Scene labeling with LSTM recurrent neural networks. CVPR. (2015)
Du, Y., Wei, W., Liang, W.: Hierarchical recurrent neural network for skeleton based action recognition. CVPR. (2015)
Jianchao, Y., Yu, K., Lv, F., Huang, Yihong Gong, T.: Locality-constrained Linear Coding for image classification. CVPR (2001) Jinjun Wang Akiira Media Syst., Palo Alto, CA, USA
Reubold, J.: Kernel descriptors in comparison with hierarchical matching pursuit. Seminar Thesis, Proceedings of the Robot Learning Seminar, (2010)
John, S.-T., Cristianini, N.: Kernel Methods for Pattern Analysis. Cambridge University Press, (2004)
Hofmann, T., Scholkopf, B., Smola, A.J.: Kernel methods in machine learning. Ann. Stat.
Rojas, R: Neural Networks—A Systematic Introduction, Springer, (1996)
Teknomo, K.: Support Vector Machines Tutorial
Vladimir, C., Mulier, F.M.: Learning from Data: Concepts, Theory, and Methods, 2nd ed., Wiley, (2007)
Dan, C., Meier, U., Schmidhuber, J.: Multi-column Deep Neural Networks for Image Classification. CVPR. (2012)
Amnon, S., Hazan, T.: Algebraic set kernels with application to inference over local image representations. (2005)
Gehler, P, Nowozin, S.: On feature combination for multiclass object classification. CVPR. (2009)
Lanckriet, G.R.G., Cristianini, N., Bartlett, P., El Ghaoui, L., Jordan, M.I.: Learning the kernel matrix with semidefinite programming. JMLR. (2004)
Mairal, J., Koniusz, P., Harchaoui, Z., Schmid, C.: Convolutional kernel networks. NIPS. (2009)
Candes, E., Romberg, J.: Sparsity and incoherence in compressive sampling. Inverse Probl. 23, 969 (2007)
Kai, Y., Lin, Y., Lafferty, J.: Learning image representations from the pixel level via hierarchical sparse coding. CVPR. (2011)
Jian, Z.F., Song, L., Yang X.K., Zhang, W.: Sub clustering K-SVD: size variable dictionary learning for sparse representations. ICIP. (2009)
Olshausen, B., Field, D.: Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature. (1996)
Mallat, S.G., Zhang, Z.: Matching pursuits with time-frequency dictionaries. IEEE Trans. Signal Process. 3397–3415, (1993)
Kwon, S., Wang, J., Shim, B.: Multipath matching pursuit. IEEE Trans. Inform. Theor. (2014)
Lloyd, S.P.: Least square quantization in PCM. Bell Telephone Laboratories Paper. Published in journal much later: Lloyd, S.P.: Least squares quantization in PCM, IEEE Trans. Inform. Theor. (1957/1982)
Voronoi, G.: Nouvelles applications des paramètres continus à la théorie des formes quadratiques. Journal für die Reine und Angewandte Mathematik 133(133), 97–178 (1908)
Mairal, J.: Sparse Coding for Machine Learning, Image Processing and Computer Vision. PhD thesis. Ecole Normale Superieure de Cachan. (2010)
Mairal, J., Sapiro, G., Elad, M.: Multiscale sparse image representation with learned dictionaries. In: IEEE International Conference on Image Processing, San Antonio, Texas, USA, (2007), Oral Presentation
Mairal, J., Sapiro, G., Elad, M.: Learning multiscale sparse representations for image and video restoration. SIAM Multiscale Model. Simul. 7(1), 214–241 (2008)
Mairal, J., Jenatton, R., Obozinski, G., Bach, F.: Learning hierarchical and topographic dictionaries with structured sparsity. In: Proceeding of the SPIE Conference on Wavelets and Sparsity XIV. (2011)
Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification, 2nd edn. Wiley-Interscience, New York (2000)
Ethem, A.: Introduction to Machine Learning, MIT Press, (2004)
Tom, M.: Machine Learning, McGraw Hill, (1997)
LeCun, Y., Chopra, S., Hadsell, R., Huang, F.-J., Ranzato, M.-A.: A Tutorial on Energy-Based Learning, in Predicting Structured Outputs, MIT Press, (2006)
Pursuit, R.R., Zibulevsky, M., Elad, M.: Efficient Implementation of the K-SVD algorithm using Batch Orthogonal Matching. Technical Report—CS Technion, (2008)
Riesenhuber, M., Poggio, T.: Hierarchical models of object recognition in cortex. Nature. (1999)
Logothetis, N.K., Pauls, J., Poggio, T.: Shape representation in the inferior temporal cortex of monkeys. Curr. Biol. 5(5), 552–563 (1995)
Tarr, M.: News on views: pandemonium revisited. Nat. Neurosci. (1999)
Selfridge, O.G.: Pandemonium: a paradigm for learning. Proceedings of the Symposium on Mechanisation of Thought Processes (1959)
Bülthoff, H., Edelman, S.: Psychophysical support for a two-dimensional view interpolation theory of object recognition. Proc. Natl. Acad. Sci. U. S. A. 89, 60–64 (1992)
Logothetis, N., Pauls, J., Bülthoff, H., Poggio, T.: Shape representation in the inferior temporal cortex of monkeys. Curr. Biol. 4, 401–414 (1994)
Tarr, M.: Rotating objects to recognize them: a case study on the role of viewpoint dependency in the recognition of three-dimensional objects. Psychonom Bull. Rev. 2, 55–82 (1995)
Booth, M., Rolls, E.: View-invariant representations of familiar objects by neurons in the inferior temporal visual cortex. Cereb. Cortex 8, 510–523 (1998)
Kobatake, E., Wang, G., Tanaka, K.: Effects of shape-discrimination training on the selectivity of inferotemporal cells in adult monkeys. J. Neurophysiol. 80, 324–330 (1998)
Perrett, D., et al.: Viewer-centred and object-centred coding of heads in the macaque temporal cortex. Exp. Brain Res. 86, 159–173 (1991)
Perrett, D.I., Rolls, E.T., Caan, W.: Visual neurons responsive to faces in the monkey temporal cortex. Exp. Brain Res. 47, 329–342 (1982)
Tanaka, K., Saito, H.-A., Fukada, Y. & Moriya, M.: Coding visual images of objects in the inferotemporal cortex of the macaque monkey. J. Neurophysiol. 66, 170–189
Parental olfactory experience influences behavior and neural structure in subsequent generations. Nat. Neurosci. 17, 89–96, (2014)
Gjoneska, E., Pfenning, A., Mathys, H., Quon, G., Kundage, A., Tsai, L.H., Kellis, M.: Conserved epigenomic signals in mice and humans reveal immune basis of Alzheimer’s disease. Nature (2015), doi: 10.1038/nature14252
Tanaka, K.: Inferotemporal cortex and object vision. Annu. Rev. Neurosci. 19, 109–139 (1996)
Logothetis, N.K., Sheinberg, D.L.: Visual object recognition. Annu. Rev. Neurosci. 19, 577–621 (1996)
Mutch, J., Lowe, D.: Multiclass object recognition with sparse, localized features. CVPR. (2006)
Serre, R.: Realistic modeling of simple and complex cell tuning in the HMAX model, and implications for invariant object recognition in cortex. CBL Memo. 239 (2004)
Hu, X.-L., Zhang, J.-W., Li, J.-M., Zhang, B.: Sparsity-regularized HMAX for visual recognition. PLOS One. 9(1), (2014)
Charles, C., Kouh, M., Riesenhuber, M., & Poggio, T.: Shape Representation in V4: Investigating Position-Specific Tuning for Boundary Conformation with the Standard Model of Object Recognition. AI Memo 2004-024 (2004)
Christian, T., Thome, N., Cord, M.: HMAX-S: deep scale representation for biologically inspired image categorization. ICIP. (2011)
Riesenhuber, M., Poggio, T.: Neural mechanisms of object recognition. Curr. Opin. Neurobiol. 12, 162–168 (2002)
Ungerleider, L.G., Haxby, J.V.: “What” and “Where” in the human brain. Curr. Opin. Neurobiol. 4, 157–165a, (1994), National Institute of Mental Health, Bethesda, USA
Serre, T., Wolf, L., Bileschi, S., Riesenhuber, M., Poggio, T.: Robust object recognition with cortex-like mechanisms. PAMI. (2007)
Mutch, J.: HMAX architecture models slide presentation. (2010)
Perronnin, F., Dance, C.: Fisher kernels on visual vocabularies for image categorization. In: Proceedings of CVPR, (2006)
Florent, P., Sánchez, J., Mensink, T.: Improving the fisher kernel for large-scale image classification. ECCV. (2010)
Giorgos, T., Avrithis, Y., Jégou, H.: To aggregate or not to aggregate: selective match kernels for image search. ICCV. (2013)
Jaakkola, T., Haussler, D.: Exploiting generative models in discriminative classifiers. In: NIPS, (1999)
Jegou, H., Douze, M., Schmid, C., Perez, P.: Aggregating local descriptors into a compact image representation. INRIA Rennes, Rennes, France, CVPR. (2010)
Relja, A., Zisserman, A.: All about VLAD. CVPR. (2013)
Chatfield, K., Lempitsky, V., Vedaldi, A., Zisserman, A.: The devil is in the details: an evaluation of recent feature encoding methods. Br. Mach. Vis. Conf. (2011)
Zhou, X., Yu, K., Zhang, T., Huang, T.S.: Image classification using super-vector coding of local image descriptors. In: Proceedings of ECCV, (2010)
van Gemert, J.C., Geusebroek, J.M., Veenman, C.J., Smeulders, A.W.M.: Kernel codebooks for scene categorization. In: Proceedings of ECCV, (2008)
Perronnin, F., Liu, Y., S´anchez, J., Poirier, H.: Large-scale image retrieval with compressed fisher vectors. CVPR. (2010)
Perronnin, F., Sánchez, J., Mensink, T.: Improving the fisher kernel for large-scale image classification. In: Proceedings of ECCV, (2010)
J´egou, H., Douze, M., Schmid, C.: Improving bag-of-features for large scale image search. Int. J. Comput. Vis. 87(3), 316–336 (2010)
Farabet, C., Couprie, C., Najman, L., LeCun, Y.: Learning hierarchical features for scene labeling. IEEE PAMI. (2012)
Hong Lau, K., Tay, Y.H., Lo, F.L.: A HMAX with LLC for visual recognition. CVPR. (2015)
Smith, K.: Brain decoding: reading minds. Nature 502(7472), (2013)
Smith, K.: Mind-reading with a brain scan. Nature (2008)
Bartholomew-Biggs, M., Brown, S., Christianson, B., Dixon, L.: “Automatic differentiation of algorithms” (PDF). J. Comput. Appl. Math. 124(1-2), 171–190 (2000)
Plaut, D., Nowlan, S., Hinton, G.: Experiments on Learning by Back Propagation, Carnegie Mellon University, (1986)
Cayley, A.: On the theory of groups, as depending on the symbolic equation θ n = 1. Phil. Mag. 7, (1854)
Cayley, A.: On the theory of groups. Am. J. Math. 11 (1889)
Voytek, B.: Brain metrics. Nature (2013)
Langleben Daniel, D., Dattilio Frank, M.: Commentary: the future of forensic functional brain imaging. J. Am. Acad. Psychiatry Law 36(4), 502–504 (2008)
Finn, E.S., Shen, X., Scheinost, D., Rosenberg, M.D., Huang, J., Chun, M.M., Papademetris, X., Todd Constable, R.: Functional connectome fingerprinting: identifying individuals using patterns of brain connectivity. Nature (2015)
Bergami, M., Masserdotti, G., Temprana, S.G., Motori, E., Eriksson, T.M., Göbel, J., Yang, S.M., Conzelmann, K.-K., Schinder, A.F., Götz, M., Berninger, B.: A critical period for experience-dependent remodeling of adult-born neuron connectivity. Neuron (2015)
Allen Lee, W.-C., Huang, H., Feng, G., Sanes, J.R., Brown, E.N., So, P.T., Nedivi, E.: Dynamic remodeling of dendritic arbors in gabaergic interneurons of adult visual cortex. PLoS 4(2), e29 (2006)
Wu, Z., Shuran, S., Aditya, K., Fisher, Y., Linguang, Z., Xiaoou, T., Jianxiong, X.: 3D ShapeNets: a deep representation for volumetric shapes. CVPR. (2015)
Xiang, Y., Wongun, C., Yuanqing, L., Silvio, S.: Data-driven 3D voxel patterns for object category recognition. CVPR. (2015)
Papazov, C., Marks, T.K., Jones, M.: Real-time 3D head pose and facial landmark estimation from depth images using triangular surface patch features. CVPR. (2015)
Martinovic, A., Jan, K., Riemenschneider, H., Van Gool, L.: 3D All the way: semantic segmentation of urban scenes from start to end in 3D. CVPR. (2015)
Rock, J., Tanmay, G., Justin, T., JunYoung, G., Daeyun, S., Derek, H.: Completing 3D object shape from one depth image. CVPR. (2015)
Yub, J., Lee, H., Seok Heo, S., Dong Yun, Y., II.: Random tree walk toward instantaneous 3D human pose estimation. CVPR. (2015)
Shape Priors Karimi Mahabadi, R., Hane, C., Pollefeys, M.: Segment based 3D object shape priors. CVPR (2015)
Xiaowei, Z., Spyridon, L., Xiaoyan, H., Kostas, D.: D shape estimation from 2D landmarks: a convex relaxation approach. CVPR (2015)
Levi, G., Hassner, T.: LATCH: learned arrangements of three patch codes, arXiv preprint arXiv:1501.03719 (2015)
He, K., Zhang, X., Ren, S., Sun, J.: Deep Residual Learning for Image Recognition. (2015)
Hinton, G.E., Vinyals, O., Dean, J.: Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 (2015)
Romero, A., Nicolas, B., Samira Ebrahimi, K., Antoine, C., Carlo, G., Yoshua, B.: FitNets: hints for thin deep nets. arXiv:1412.6550 [cs], (2014)
Bucila, C., Caruana, R., Niculescu-Mizil, A.: Model compression. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’06, ACM (2006)
Bengio, Y.: Learning deep architectures for AI. Found. Trends Mach. Learn. (2009)
Nikolaus, M., Eddy, I., Philip H., Philipp F., Daniel C., Alexey D., Thomas B.: A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation. CVPR, (2016)
Horn, B.K.P.: Shape from Shading: A Method for Obtaining the Shape of a Smooth Opaque Object from One View, MIT DARPA report, (1970)
Mutto, C.D., Zanuttigh, P., Cortelazzo, G.M.: Microsoft Kinect™ Range Camera. Springer, (2014)
Mojsilovic, A.: A method for color naming and description of color composition in images, ICIP, (2002)
van de Weijer, J., Schmid, C., Verbeek, J.: Learning color names from real world images. CVPR, (2007)
Khan, R., Van de Weijer, J., Shahbaz Khan, F., Muselet, D., Ducottet, C., Barat, C.: Discriminative Color Descriptors. CVPR, (2013)
van de Weijer, J., Schmid, C.: Coloring Local Feature Extraction. ECCV, (2006)
Sung-Hyauk Cha.: Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions, IJMMMAS, (see also Duda [826])
Deza, E., Deza, M.M.: Dictionary of Distances, Elsevier, (2006)
Glasner, D., Bagon, S., Irani, M.: Super-Resolution From a Single Image. ICCV, (2009)
Vedaldi, V., Varma, G.M., Zisserman, A.: Multiple Kernels for Object Detection A. (2009)
Vondrick, C., Khosla, A., Malisiewicz, T., Torralba, A.: HOGgles: Visualizing Object Detection Features. ICCV, (2013)
Huang, Y., Nat. Lab. of Pattern Recognition (NLPR); Inst. of Autom.; Beijing, China; Wu, Z., Wang, L., Tan, T., PAMI.: Feature Coding in Image Classification: A Comprehensive Study, (2014)
Hornik, K., Stinchcombe, M., White, H.: Multilayer feedforward networks are universal approximators. Neural Networks 2(5), 359–366 (1989)
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., Li, F.-F.: Imagenet: a large-scale hierarchical image database. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pp. 248–255. IEEE, 2009
Targ, S., Almeida, D., Lyman K.: Resnet in Resnet: generalizing residual architectures, arXiv: 1603.08029. (2016)
Szegedy, C., Ioffe, S., Vanhoucke, V.: Inception-v4, Inception-ResNet and the impact of residual connections on learning. arXiv: 1602.07261, (2016)
© 2016 Springer International Publishing Switzerland
Krig, S. (2016). Global and Regional Features. In: Computer Vision Metrics. Springer, Cham. https://doi.org/10.1007/978-3-319-33762-3_3
Print ISBN: 978-3-319-33761-6
Online ISBN: 978-3-319-33762-3