
1 Introduction

Analysis of visual features of domestic and farm animals, from the point of view of both biometric and breed identification, is a relatively new field. In this work, we analyze visual images of pigs not from a biometric standpoint, but from the point of view of breed identification. Every pig breed has distinct characteristics in terms of its morphology, size, immunity to common epidemics as a function of the environment in which it is bred, and finally its appeal to the social palate as food. Hence, breed selectivity, identification, and tracking play a role in ensuring commercial viability and stability across the supply chain.

A majority of the available literature has focused on biometric analysis, and more specifically on cattle (mainly cows). Initially, the main challenge was the identification and selection of an appropriate portion of the animal in the imaging domain that contains a biometric identifier. Once this region of interest and the mode of acquisition are established, further processing can be done to extract interest points and match the derived features across different animals.

For cattle biometrics, Local Binary Pattern (LBP) descriptors from facial images were used by Cai and Li [1]. Semiautomatic or partially manual alignment and cropping were required to ensure that the images were perfectly registered before being subjected to LBP analysis. Problems in the spatial registration of the patch LBP descriptors affected the classification results.

An attempt was made to use the muzzle pattern of cows for biometric analysis by Awad et al. [2]. SIFT [3] points were extracted from the muzzle and matched using the RANSAC algorithm [4]. SIFT is effective as a feature as long as the neighborhood profile around each interest point of significance is distinctive. Matching between two interest points from two different muzzle images is done by comparing their abstracted neighborhood profiles. This registration process is extremely difficult because of the diversity with which pores and ciliary patterns within the same muzzle are captured, owing to pose/camera variations, illumination changes, and self-shadowing effects. The same problems exist for SURF features, as these are also interest point based [5].

In comparison with Cai and Li [1], LBP codes were generated for every pixel and eventually concatenated to form a histogram over the entire muzzle; this histogram was taken as the base feature. The intra-class and inter-class feature covariance matrices were then determined, and the optimal linear transform that maximized separability across classes was computed. Results from this approach were promising, as the training set was augmented with rotational variations of the muzzle images from the same subject.

While all these papers provide useful cues toward the selection of features for biometric identification, breed classification in pigs is a different problem altogether for the following reasons:

  • While it is possible that visual descriptors or features selected for biometric recognition offer good classification rates, it remains to be seen whether these features are stable over time. Only when these descriptors remain largely static over several months and years do they qualify as viable biometric identifiers.

  • It is hard to tell whether the biometric identifiers from the earlier papers are merely a current (one-shot) representation of the face or the muzzle of the farm animal, as their progression over time has not been examined.

  • In the case of breed identification, however, there is potential to identify common traits across animal subjects within the breed which qualify as ancestral, genetically driven traits. Therefore, in some sense, breed identification should precede individual identification, as the temporal element associated with the stability of the feature is now replaced by an ensemble statistic or characteristic.

Genetic confirmation of any visual descriptor (or statistic) is possible from a breed analysis, and when this is augmented by an implicit assumption of ERGODICITY, the same visual descriptor can be used for biometric identification. Simply put:

Breed analysis is a prerequisite for Biometric analysis.

The rest of the paper is organized as follows: In Sect. 2, we provide the motivation for selecting the muzzle, particularly from the point of view of breed analysis. The mechanism for amplifying the key features in the muzzle, detecting them (or their positions), and computing the patch descriptors is given in Sect. 3. The distinctiveness of the density maps across breeds is discussed in Sect. 4. Finally, the inferencing procedure and classification test results are presented in Sect. 5.

2 Problem Statement: MUZZLE for Breed Analysis of Pigs

The front portion of the nose of a pig is called the muzzle. The muzzle of a pig is a relatively smooth area and consists of two nostrils. These nostrils, coupled with the nasal linings have the following functionalities:

  • Breathing: Inhalation and Exhalation.

  • Regulation of body temperature.

  • Sampling and discriminating between different types of odors.

Hair follicles and pores are distributed all over the muzzle surface. The density of these follicles tends to vary from breed to breed. For instance, the density of follicles found on the muzzle of the Ghungroo (a West Bengal breed [6]) is the highest, while the density found on the muzzle of the Yorkshire (an exotic breed originating in Europe [7]) is the least. As far as the Ghungroo is concerned, this increased hair density can be attributed to the relatively hot and humid areas in which it is reared. As humidity increases, the rate of evaporation from the skin tends to drop. To counterbalance this and to ensure a balanced release of internal heat, animals and humans from high-humidity zones tend to have more hair follicles on the exposed and active areas of their skin. In breeds like the Yorkshire, however, which are reared in colder areas, the density of hair follicles is found to be considerably lower, preventing excessive body heat loss, as compared to breeds which originated closer to the equator: the Ghungroo (West Bengal) and the Duroc (originally African [8]).

Furthermore, the relatively narrow nostrils of the Hampshire [9] and Yorkshire, as compared to the Ghungroo and Duroc, can be justified on account of the climatic conditions in which they have been reared. The Hampshire and Yorkshire breeds of pigs come from the colder regions of Europe and North America. It is common knowledge that the nose plays a critical role in preparing air for the lungs. Ideally, air should be warm and moist before it enters the lungs, because the microscopic hairs in the nasal passage, called cilia, that help keep pathogens and dust from entering the lungs work better with warm, moist air than with dry, cold air. For pigs from hot, wet climates, there is less need to prepare the air for the lungs, so they have retained wider nostrils. Longer, narrower nostrils increase contact between the air and the mucosal tissue in the nose, and this contact plays a role in warming and moistening the air before it enters the lungs [10]. Hence, the overall shape of the muzzle, the relative sizes of the nostrils, and the density of pores and cilia tend to be a function of the environment in which these pigs have been reared.

In the following sections, we propose our algorithm for amplifying and extracting relevant features from the muzzle image which contain information about the position and distribution of nostrils, hair follicles, and pores on the muzzle surface. These features are then refined to learn breed-specific characteristics which can be further deployed toward breed identification.

3 Gradient Profiling and Patch Statistics

As discussed in the previous section, this paper attempts breed identification of pigs based on the density profile of hair follicles (or cilia) and pores distributed over the surface of the muzzle and its periphery. As a preprocessing step, these pores and cilia are highlighted so that relevant features can be extracted from them. The process starts by smoothing the image to remove any noise present and then taking horizontal and vertical derivatives to highlight the pores and cilia. A suitable function for these two operations is the derivative of a Gaussian, shown below

$$\begin{aligned} {\frac{\mathrm {d} }{\mathrm {d} x}}\left[ \frac{1}{\sqrt{2\pi \sigma ^{2}}}\exp \left( \frac{-x^{2}}{2\sigma ^{2}}\right) \right]= & {} -\frac{x}{\sqrt{2\pi }\,\sigma ^{3}}\exp \left( -\frac{x^{2}}{2\sigma ^{2}}\right) \nonumber \\= & {} -c_{0}\,x\exp \left( \frac{-x^{2}}{2\sigma ^{2}}\right) \end{aligned}$$
(1)

Two discrete kernels \(K_x\) and \(K_y\) (corresponding to horizontal and vertical gradients) obtained by sampling the above function are convolved with the image to obtain the horizontal and vertical gradient maps \(G_x\) and \(G_y\) respectively. The gradient magnitude profile is then computed from \(G_x\) and \(G_y\) according to

$$\begin{aligned} G_{mag}(i,j) = \sqrt{G_{x}^{2}(i,j) + G_{y}^{2}(i,j)} \end{aligned}$$
(2)
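
As an illustration of Eqs. (1) and (2), the following minimal Python sketch (not part of the original work; it assumes NumPy and SciPy are available, and all function names are ours) samples the derivative of a Gaussian into separable kernels \(K_x\) and \(K_y\) and computes the gradient magnitude map:

```python
import numpy as np
from scipy.ndimage import convolve

def dog_kernel_1d(sigma=1.0, radius=3):
    # Sample the derivative-of-Gaussian of Eq. (1) on a (2*radius + 1)-point grid.
    x = np.arange(-radius, radius + 1, dtype=float)
    return -x / (np.sqrt(2.0 * np.pi) * sigma**3) * np.exp(-x**2 / (2.0 * sigma**2))

def gradient_magnitude(gray, sigma=1.0, radius=3):
    """Smooth and differentiate with separable kernels K_x, K_y, then apply Eq. (2)."""
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-x**2 / (2.0 * sigma**2))
    g /= g.sum()                        # 1-D Gaussian used for smoothing
    d = dog_kernel_1d(sigma, radius)    # 1-D derivative of Gaussian
    # outer(g, d): smooth along rows, differentiate along columns (horizontal gradient);
    # outer(d, g): differentiate along rows, smooth along columns (vertical gradient).
    g_x = convolve(gray.astype(float), np.outer(g, d), mode="nearest")
    g_y = convolve(gray.astype(float), np.outer(d, g), mode="nearest")
    return np.sqrt(g_x**2 + g_y**2)     # Eq. (2)
```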

The gradient magnitude profile thus computed is normalized with respect to its mean and then thresholded to obtain a binary image, B.

$$\begin{aligned} \mu _G = \frac{1}{N_1 N_2}\sum _{i=1}^{N_1}\sum _{j=1}^{N_2} G_{mag}(i,j) \end{aligned}$$
(3)
$$\begin{aligned} B(i,j) = \left\{ \begin{array}{cl} 1 &{} \text {if } \frac{G_{mag}(i,j)}{\mu _G} > \delta _G \\ 0 &{} \text {otherwise} \\ \end{array}\right. \end{aligned}$$
(4)

where \(\delta _G\) is a relative threshold, generally set to “1” as a compromise between picking up noise and amplifying relevant information pertaining to pores and cilia on the muzzle. The Gaussian standard deviation parameter \(\sigma \) for smoothing, on the other hand, is set to “1” (\(7 \times 7\) window) to ensure that sufficient detail is captured when taking the derivative of the image. This binary matrix is what we call the Gradient Significance Map (GSM). Figure 1 shows the muzzle images of several pigs: the first two images belong to Duroc (a, b), the second two belong to Ghungroo (c, d), the next two belong to Hampshire (e, f), and the last two to Yorkshire (g, h). If the threshold \(\delta _G\) is too small (say 0.25), there is an uncontrolled classification of pixels as significant ones, and all binary images, irrespective of breed, appear the same (Fig. 3). On the other hand, if \(\delta _G\) is too large (say 2), very little detail is captured and, once again, all the density images begin to appear the same, irrespective of breed (Fig. 4). As a compromise that facilitates sufficient base feature separation while still representing the actual concentration of pores and cilia on the muzzle, this threshold is set to the in-between value “1” (Fig. 2).
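
A corresponding sketch of the GSM construction (again an illustrative implementation under the parameter choices above, not the authors' code) normalizes the gradient magnitude by its mean and thresholds it as in Eqs. (3) and (4):

```python
import numpy as np

def gradient_significance_map(g_mag, delta_g=1.0):
    """Eqs. (3)-(4): normalize by the global mean and threshold into a binary map B."""
    mu_g = g_mag.mean()                                # Eq. (3)
    return (g_mag / mu_g > delta_g).astype(np.uint8)   # Eq. (4)

# Example usage with the values discussed above (sigma = 1, delta_g = 1):
# gsm = gradient_significance_map(gradient_magnitude(gray, sigma=1.0), delta_g=1.0)
```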

Fig. 1

Muzzle images of several pigs corresponding to different breeds: a, b Duroc, c, d Ghungroo, e, f Hampshire, g, h Yorkshire

Fig. 2

Gradient Significance Map (GSM) for optimal choice of threshold \(\delta _G = 1\) (Gradient smoothing parameter, \(\sigma = 1\))

Fig. 3

Gradient Significance Map (GSM) for low threshold \(\delta _G = 0.25\) (Gradient smoothing parameter, \(\sigma = 1\)). Almost every point is treated as a significant point and hence all binary images turn white to appear identical

Fig. 4

Gradient Significance Map (GSM) for a high threshold \(\delta _G = 2\) (Gradient smoothing parameter, \(\sigma = 1\)). Very few significant points are picked up and the patch densities corresponding to the actual concentration of pores and cilia are not captured accurately

The GSMs are then divided into equal-size patches and a suitable statistic is calculated for each patch, as shown in Fig. 5. The statistic calculated for each patch is the percentage of significant pixels in that patch. Thus, if the patch size is \(m \times n\) and \(n_s\) is the number of significant pixels in that patch as obtained from the GSM, then the corresponding patch statistic is obtained as

$$\begin{aligned} S(patch) = \frac{n_s}{m\times n} \times 100\% \end{aligned}$$
(5)
Fig. 5

Division of the GSM into patches and the corresponding patch statistic

The patch statistics computed from each of the patches in Fig. 5 are eventually concatenated to form a feature vector for that muzzle image.
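
As a sketch of Eq. (5) (illustrative only; the default patch size of 250 pixels anticipates the choice made in Sect. 4), the patch statistics over a GSM can be computed as follows:

```python
import numpy as np

def patch_statistics(gsm, patch_size=250):
    """Eq. (5): percentage of significant pixels in every patch_size x patch_size patch."""
    rows, cols = gsm.shape[0] // patch_size, gsm.shape[1] // patch_size
    stats = np.empty((rows, cols))
    for i in range(rows):
        for j in range(cols):
            patch = gsm[i * patch_size:(i + 1) * patch_size,
                        j * patch_size:(j + 1) * patch_size]
            stats[i, j] = 100.0 * patch.mean()  # n_s / (m * n) * 100%
    return stats
```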

4 Diversity in Patch Statistics Across Breeds

At first, we formulate a patch diversity conjecture across breeds as follows:

Patch diversity conjecture: The patch density profile is expected to be a function of the environment in which these animals are reared and is therefore expected to vary from breed to breed. More importantly, the density maps derived from different spatial locations are expected to be different. It is precisely this diversity in these patch distributions that we wish to use for our final inferencing and decision-making procedure related to breed identification.

The database contained a total of 311 muzzle prints corresponding to 30 animals across four breeds (Duroc, Ghungroo, Hampshire, and Yorkshire), with around 8–15 muzzle variations from each animal. Duroc and Ghungroo had six animals each, while Hampshire and Yorkshire had nine animals each. The muzzle images were taken with a high-resolution handheld camera, with another person holding the nose of the pig tightly to avoid excessive blurring due to relative motion between the camera and the muzzle surface.

First, the RGB color image is converted to grayscale, resized to \(1000\times 1000\), and the GSM is constructed. Regarding the patch size \(N_P\): if the patch size is too small, the patch statistic becomes too sensitive to camera panning, pig head movement (pose variation), and illumination changes during image acquisition. On the other hand, large patch sizes are not desirable because spatial detail is lost. As a compromise, we choose a patch size \(N_P = 250\). With this patch size, and with images resized to \(1000 \times 1000\) as mentioned earlier, \(4 \times 4 = 16\) patches are generated, out of which the top-left and the top-right patches were left out of the analysis owing to background interference.

The training phase was set up as follows:

  • Using the proposed feature extraction algorithm, the patch density statistics were computed for every muzzle image in the training set, which comprised 159 muzzle prints (out of a total of 311) coming from 30 different animals across the four breeds. The patch statistics were computed for a patch size of \(25\%\) of the image dimension (viz. each muzzle print was split into a \(4\times 4\) grid).

  • Thus each print was mapped to a \(4\times 4\) matrix:

    $$\begin{aligned} \mathbf P (\text {Image}-k) = \left( \begin{array}{cccc} S_{11}(k) &{} S_{12}(k) &{} S_{13}(k) &{} S_{14}(k) \\ S_{21}(k) &{} S_{22}(k) &{} S_{23}(k) &{} S_{24}(k) \\ S_{31}(k) &{} S_{32}(k) &{} S_{33}(k) &{} S_{34}(k) \\ S_{41}(k) &{} S_{42}(k) &{} S_{43}(k) &{} S_{44}(k) \\ \end{array} \right) \end{aligned}$$
    (6)

    with \(S_{ij}(k) \in [0,1]\) indicating the fraction of significant points in the patch located at position (i, j), where \(i,j \in \{1,2,3,4\}\), corresponding to image k.

  • From the \(N_T = 159\) training muzzle prints across the four breeds, the \(4\times 4\) patch matrices for each breed were concatenated:

    $$\begin{aligned} \mathbf DUROC _T= & {} \left\{ \mathbf D _1, \mathbf D _2,\ldots ,\mathbf D _{N_D}\right\} \nonumber \\ \mathbf GHUNG _T= & {} \left\{ \mathbf G _1, \mathbf G _2,\ldots ,\mathbf G _{N_G}\right\} \nonumber \\ \mathbf HAMP _T= & {} \left\{ \mathbf H _1, \mathbf H _2,\ldots ,\mathbf H _{N_H}\right\} \nonumber \\ \mathbf YORK _T= & {} \left\{ \mathbf Y _1, \mathbf Y _2,\ldots ,\mathbf Y _{N_Y}\right\} \end{aligned}$$
    (7)

    with, \(N_D + N_G + N_H + N_Y = N_T = 159\).

  • If \(\mathbf BR _k\) corresponds to a patch matrix from PIG-k in breed type \(\mathbf BR \), this can be written down as

    $$\begin{aligned} \mathbf BR _k = \left( \begin{array}{cccc} S_{11}(BR_k) &{} S_{12}(BR_k) &{} S_{13}(BR_k) &{} S_{14}(BR_k) \\ S_{21}(BR_k) &{} S_{22}(BR_k) &{} S_{23}(BR_k) &{} S_{24}(BR_k) \\ S_{31}(BR_k) &{} S_{32}(BR_k) &{} S_{33}(BR_k) &{} S_{34}(BR_k) \\ S_{41}(BR_k) &{} S_{42}(BR_k) &{} S_{43}(BR_k) &{} S_{44}(BR_k) \\ \end{array} \right) \end{aligned}$$
    (8)

    where \(S_{ij}(BR_k) \in [0,1]\). All the patch statistics from a breed corresponding to a spatial index (i, j) were concatenated to create a location-specific conditional histogram. For instance, the conditional histograms for a patch location (i, j), with \(i,j \in \{1,2,3,4\}\), for the four breeds can be created by first sorting the values of the patch statistics (from a specific breed) in ascending order and then binning the count of the values falling within a fixed range. If M is the number of histogram bins, this process is represented as follows:

    $$\begin{aligned} \hat{\mathbf{f }}_{S_{ij}/DUROC}(x)= & {} BIN_M\left[ Sort\left( \left\{ S_{ij}(D_1), S_{ij}(D_2),\ldots ,S_{ij}(D_{N_D})\right\} \right) \right] \\ \hat{\mathbf{f }}_{S_{ij}/GHUNG}(x)= & {} BIN_M\left[ Sort\left( \left\{ S_{ij}(G_1), S_{ij}(G_2),\ldots ,S_{ij}(G_{N_G})\right\} \right) \right] \\ \hat{\mathbf{f }}_{S_{ij}/HAMP}(x)= & {} BIN_M\left[ Sort\left( \left\{ S_{ij}(H_1), S_{ij}(H_2),\ldots ,S_{ij}(H_{N_H})\right\} \right) \right] \\ \hat{\mathbf{f }}_{S_{ij}/YORK}(x)= & {} BIN_M\left[ Sort\left( \left\{ S_{ij}(Y_1), S_{ij}(Y_2),\ldots ,S_{ij}(Y_{N_Y})\right\} \right) \right] \end{aligned}$$

    where Sort(.) sorts the array of scalars in the ASCENDING order and \(BIN_M(.)\) generates the fractional count of values in M equi-spaced bins over the range [0, 1].

  • To ensure there is some form of parametric fit for the histograms, the Gaussian density function, which has two degrees of freedom, has been chosen as a reference:

    $$\begin{aligned} f_{S_{ij}/BR}(x) = \frac{1}{\sqrt{2\pi \sigma ^2}} e^{-\frac{(x-\mu )^2}{2\sigma ^2}} \end{aligned}$$
    (9)

    with \(BR \in \left\{ \text {DUROC, GHUNG, HAMP, YORK} \right\} \). In the training phase, we learn the parameters of this Gaussian fit, i.e., \(\mu , \sigma \); a minimal sketch of this fitting step is given below. A plot of all the learnt Gaussian distributions is shown in Fig. 6. Note that each sub-figure in the set (a–n) comprises a parametric fit for each of the four histograms corresponding to a specific patch location (i, j), with patch locations (1, 1) and (1, 4) not considered on account of extreme background information and irrelevant details.
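
The sketch below illustrates this fitting step (our own illustrative code, not the original implementation): the per-patch, per-breed samples are collected and \(\mu , \sigma \) are estimated directly from them, which for a Gaussian fit is equivalent to fitting the binned histogram of Sect. 4.

```python
import numpy as np

# Hypothetical layout: training_set maps a breed name to a list of 4x4 patch-statistic
# matrices (Eq. 6), with values scaled to [0, 1]. Patch locations (1,1) and (1,4)
# (0-based (0,0) and (0,3)) are excluded, as in the paper.
EXCLUDED = {(0, 0), (0, 3)}

def fit_conditional_gaussians(training_set):
    """Learn (mu, sigma) of Eq. (9) for every retained patch location and breed."""
    params = {}
    for breed, matrices in training_set.items():
        stack = np.stack(matrices)                  # shape: (N_breed, 4, 4)
        for i in range(4):
            for j in range(4):
                if (i, j) in EXCLUDED:
                    continue
                samples = stack[:, i, j]
                params[(breed, i, j)] = (samples.mean(), samples.std(ddof=1))
    return params
```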

Fig. 6

Patchwise conditional densities for each of the four breeds

The following are some observations regarding the sub-figures:

  • Sub-figures 6a, b still show considerable overlap in the conditional density functions, on account of the background information prevalent in all the patches extracted from these two spatial locations, irrespective of the breed type.

  • A clear discrimination between the conditional densities begins with sub-figure (c) and continues all the way to sub-figure (n).

  • As predicted by the patch diversity conjecture, since the Yorkshire has been reared in colder areas, the fraction of significant points corresponding to pores and cilia is much smaller than for the other breeds (corroborated by Fig. 2g, h). This is depicted by the “black” conditional density function (Fig. 6g, h, k, n), which has the smallest mean in almost all the patches.

  • On the other hand, because of the high density of pores and cilia for the Ghungroo, the conditional mean is much higher than that of the other breeds for most of the patches (green Gaussian curve in Fig. 6d–l, corroborated by Fig. 2c, d).

5 Testing and Inferencing Procedure

When a query muzzle template is supplied to this spatial conditional patch distribution model, the patch density statistics are first computed using the procedure outlined in Sect. 3.

Thus, this query muzzle print becomes a 14-point vector:

$$\begin{aligned} \nonumber \bar{Q}= & {} [q_{1,2}, q_{1,3}, q_{2,1}, q_{2,2}, q_{2,3}, q_{2,4}, q_{3,1},\\&q_{3,2}, q_{3,3}, q_{3,4}, q_{4,1}, q_{4,2}, q_{4,3}, q_{4,4}] \end{aligned}$$
(10)

The patchwise inferencing is done as follows: for each patch corresponding to the spatial location (i, j), the breed with which the corresponding query patch is most closely associated is extracted through a simple MAXIMUM LIKELIHOOD test.

$$\begin{aligned} \hat{BR}_Q(i,j) = \underset{BR}{ARG\text { }MAX} \left\{ \mathbf f _{S_{ij}/DUROC}(q_{ij}), \mathbf f _{S_{ij}/GHUNG}(q_{ij}), \mathbf f _{S_{ij}/HAMP}(q_{ij}), \mathbf f _{S_{ij}/YORK}(q_{ij}) \right\} \end{aligned}$$
(11)

where \(BR \in \{DUROC,GHUNG,HAMP,YORK\}\). The overall association of the query vector with one of the breeds is obtained by taking a MAJORITY VOTE across all patch decisions:

$$\begin{aligned} BR_Q(FINAL) = MAJORITY_{VOTE}\left[ \hat{BR}_Q(1,2), \hat{BR}_Q(1,3),\hat{BR}_Q(2,1),\ldots ,\hat{BR}_Q(4,4)\right] \end{aligned}$$
(12)
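
A minimal sketch of this decision procedure (illustrative only, assuming the query patch statistics are on the same [0, 1] scale used in training and that the params dictionary comes from the fitting sketch of Sect. 4) is:

```python
import numpy as np
from collections import Counter

BREEDS = ("DUROC", "GHUNG", "HAMP", "YORK")
EXCLUDED = {(0, 0), (0, 3)}  # 0-based indices of the discarded patches (1,1) and (1,4)

def gaussian_pdf(x, mu, sigma):
    return np.exp(-(x - mu)**2 / (2.0 * sigma**2)) / np.sqrt(2.0 * np.pi * sigma**2)

def classify(query_matrix, params):
    """Eq. (11): per-patch maximum likelihood decision; Eq. (12): majority vote."""
    votes = []
    for i in range(4):
        for j in range(4):
            if (i, j) in EXCLUDED:
                continue
            q = query_matrix[i, j]
            scores = {br: gaussian_pdf(q, *params[(br, i, j)]) for br in BREEDS}
            votes.append(max(scores, key=scores.get))   # Eq. (11)
    return Counter(votes).most_common(1)[0][0]          # Eq. (12), ties broken arbitrarily
```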

If \(BR_Q(FINAL)\) is the same as the original breed, the query has been IDENTIFIED correctly; otherwise, there is a misclassification. For testing, 29 muzzle prints from Duroc, 34 from Ghungroo, 45 from Hampshire, and 44 from Yorkshire were deployed out of the total of 311, and the number of correct detections is noted in Table 1. Since Yorkshire demonstrated a low conditional mean and a small variance across most patches, the classification percentage was, as expected, high (100%).

Table 1 Results of the breed classification algorithm with patch size \(N_P = 250\)
Table 2 Confusion matrix associated with the breed classification algorithm: patch size set as \(N_P = 250\). Ideally the diagonal elements must be as close to “1” as possible

Duroc and Ghungroo showed moderate (slightly poorer) classification results of 75% and 70% respectively, as the variances of their conditional density functions were larger, leading to significant overlap between the functions. Since some Duroc, Ghungroo, and Hampshire pigs have partial white patches, Duroc tends to be confused with Ghungroo and Hampshire (Ghungroo more, because of the similarity in contour and overall muzzle structure) and Yorkshire (least), while Ghungroo tends to be confused with Duroc and Hampshire (both high) and Yorkshire (least). This can be seen in the confusion matrix in Table 2.

Hampshire shows the worst classification result of \(58\%\), as the diversity of white patches in terms of size and distribution across the muzzle is largest within this class. This is confirmed by the fact that a significant fraction of Hampshire muzzle prints were misclassified as Yorkshire (8/45, Table 2).

6 Conclusion

In this paper, we proposed a location-specific feature learning algorithm for breed classification of pigs based on muzzle images. A Gradient Significance Map was constructed from each muzzle image by thresholding normalized Gaussian gradients. This Gradient Significance Map was divided into patches, and a patch statistic was computed for each patch. For each of the patches, four Gaussian distributions were learnt from the training data, corresponding to the four classes. In the testing phase, a maximum likelihood test was used to assign each patch to a particular breed, and a majority vote across all the patches assigned the final class label to a muzzle image. The classification rates for Duroc, Ghungroo, Hampshire, and Yorkshire are 75.86%, 70.59%, 58.78%, and 100%, respectively.