1 Introduction

Asian and African elephants are the largest land animals and require extensive areas of habitat (Van Aarde et al. 2008; Santiapillai et al. 2010). Consequently, their home ranges cannot be officially confined to a particular territory.

Elephants are threatened by poaching, habitat loss, and deforestation driven by human settlement and industrial activity (Douglas-Hamilton 2008; Lemieux and Clarke 2009).

India is home to about 60% of the Asian elephant population, most of which lives in mountainous forests close to human settlements. Southern India hosts half of this population, numbering about 6300 elephants (Kumara et al. 2012). Human population growth, and the expanding agricultural and industrial activity that accompanies it, deprives elephants of necessities such as grazing area and causes acute shortages of water. The result has been severe human–elephant conflict (HEC), which threatens the well-being of both. Elephants in search of food enter agricultural fields, and the humans who drive them away provoke further conflict (Hoare and Du Toit 1999; Pinter-Wolman 2012). This happens especially in human settlements in the vicinity of forests (Sugumar and Jayaparvathy 2013). Tracking these groups of elephants is difficult because of their size and changing behaviour.

With the growth of technology, considerable research has been devoted to the visual analysis of human beings and human-related events, and automated analysis of elephants has likewise been pursued within object recognition (Zeppelzauer 2013). Recognizing elephants in images is an interesting and difficult object-recognition task that can play an important role in mitigating human–elephant conflict; animals are among the most difficult objects to classify and recognize. An object recognition system depends heavily on segmentation, feature extraction and decision making (Kamavisdar et al. 2013). In monitoring applications the camera is usually fixed and the background mostly static. In such a scenario a background model can easily be learned, and objects of interest identified by detecting changes against the background.

The main significance of this work lies in image datasets containing elephants in different poses and sizes, in groups or as individuals. Detecting elephants is especially hard because their skin does not exhibit a salient texture pattern and thus lacks distinctive visual features (Ardovini et al. 2008). Infrared (IR) images are used because at night the object cannot be seen with the naked eye; image perception is instead based on the temperature of the object (animal), which depends on its internal and external temperatures. The brightness of an IR image depends on the object's temperature: a cold object appears dark, a warm object bright. A variety of IR cameras are available to capture such images. The captured infrared images are processed to detect the presence of an object in the field of view. The object is separated from the background, processed and compared with a specific reference feature set (Suen et al. 1998). If the object is determined to be an elephant, the recognition status is ‘yes’; otherwise it is ‘no’.

Implementing image processing algorithms on general purpose processors (GPPs) is inefficient, since GPPs cannot keep up with the pace at which data are growing. Real-time image processing is difficult to achieve on a serial processor (Chikkali and Prabhushetty 2011), owing to factors such as the large data set represented by an image and the complex operations that may need to be performed on it (Hwang et al. 2010). One way to accelerate data analysis and overcome the limitations of GPPs is to use hardware in the form of field programmable gate arrays (FPGAs). FPGAs are often used as implementation platforms for image processing applications because their structure can exploit spatial and temporal parallelism. Hence, the objective of this paper is to implement the elephant recognition algorithm on an FPGA to reduce the computation time. However, the system cannot be used standalone, as image pre-processing for the FPGA implementation is performed on a CPU.

The rest of this paper is organised as follows. Section 2 describes the methodology used for elephant recognition. Section 3 presents the FPGA implementation of elephant recognition, which reduces the computation time compared with software. Section 4 presents the experimental results and discussion, and Sect. 5 gives the conclusions and future work.

1.1 Related works

There are many techniques for automatic detection of elephants, such as satellite tracking using the global positioning system (GPS) (Douglas-Hamilton et al. 2005) and systems based on vibration sensors in the ground (Sugumar and Jayaparvathy 2013), which are costly and not applicable to larger populations. Today, techniques such as acoustic and visual monitoring are used to sample populations and to obtain reliable estimates of species presence and, potentially, abundance in a cost- and detection-effective manner (Wrege et al. 2010; Blumstein et al. 2011). Dabarera and Rodrigo (2010) proposed appearance-based recognition algorithms for identifying elephants: given the front face image of an elephant, a vision algorithm decides whether the elephant is new or has already been recognized. Goswami et al. (2012) estimated population by comparing the tusk and ear-lobe parameters of captured elephants. In reality, however, it is difficult to capture the front face of an elephant.

Sugumar and Jayaparvathy (2013) gave an analytical model for surveillance and tracking of elephant herds using a three-state Markov chain; the study was conducted in the dense forest region of the Western Ghats in the Coimbatore region. Sugumar and Jayaparvathy (2014) presented a real-time imaging system for detecting the intrusion of elephants at the forest border. Zeppelzauer and Stoeger (2015) used the acoustic and visual domains to detect elephants; visual detectors work even in the daytime and do not require the elephants to vocalize. Various methods are available in image processing for detecting and identifying objects (Kumar and Parameswaran 2013). Earlier works (Ramanan et al. 2006; Hannuna et al. 2005; Lahiri et al. 2011) used a single skin feature to detect wild animals in forests, skin texture helping to identify the animal easily. One feature alone is insufficient; colour and skin texture together help in identifying different elephants (Sugumar and Jayaparvathy 2014; Wang et al. 2011). After filtering, various edge detection mechanisms can be used to identify the desired objects; several such mechanisms are explored by Muthukrishnan and Radha (2011). Edge-based mechanisms such as the Sobel, Canny, Roberts and Prewitt methods have been discussed specifically for elephants (Shaukin 2015).

Image segmentation was found to be the best approach for recognising an elephant against the background. Since the method operates only on images, it does not disturb either elephants or human beings. Segmentation techniques can be classified into thresholding (Suen et al. 1998), template matching, region growing, edge detection (Chikkali and Prabhushetty 2011) and clustering.

Clustering techniques have been employed in various fields, among which k-means clustering is widely used (Hussain et al. 2012). The K-NN classifier (Hussain and Seker 2012), which is trustworthy and produces few errors, is used for classification. Mangai et al. (2015) implemented a clustering-based image segmentation approach for elephant recognition. The experiments and analysis used 23 different IR images, all of which were properly segmented using the k-means clustering technique with k = 2. Shape features were extracted from the segmented objects and classified with the K-NN classifier; elephants were accurately detected for K = 3 and 5.

Several automated systems arising from scientific advancement have been discussed. Arivazhagan and Ramakrishnan (2010) presented a paper on the conservation of elephants with a focus on the connectivity of elephant habitat, in a study conducted in the southern part of the Indian sub-continent. Olivier et al. (2009) discussed a method to detect elephants from their dung; with dung decay rates and distance sampling techniques, they estimated the population size and age structure in the southern Mozambique region. Fernando et al. (2012) proposed conflict avoidance through direct observation. Balch et al. (2006) proposed an RFID system for tracking, which has the drawbacks of short detection range and a low location update rate.

Prabu (2016) used seismic sensors for detection. Recently, an automated vision-based method was proposed to detect elephants (Ramesh et al. 2017).

2 Methodology

The block diagram shown in Fig. 1 depicts the various stages of automated elephant recognition; each stage is described in detail in this section.

Fig. 1
figure 1

Various stages of automated elephant recognition

The input images are infrared images that may contain different animals against a static background. The animals may appear singly or in groups, performing different actions such as eating or moving. The pre-processing stage converts the infrared (IR) image into a grayscale image. Subsequently, thresholding-based segmentation is used to segment the object of interest. Next, different shape-based features are extracted from the detected object. Finally, object recognition is performed with a k-NN classifier based on the extracted features. Each stage is described in detail in the following subsections.

2.1 Image segmentation by thresholding

The first step of object recognition is segmenting the image so that pixels with the same characteristics belong to one group; at the end of segmentation all pixels are assigned to distinct groups. Segmentation has been used extensively in many areas, such as pattern recognition, machine learning and statistics. The segmentation algorithm used in our methodology is thresholding-based segmentation. The objective of image binarization is to divide an image into two groups: foreground (object) and background. In image processing applications, the gray level values of an object differ from those of the background, so thresholding is an effective way to separate foreground from background. The output of a thresholding process is a binary image, obtained by assigning zeros to pixels with values below the threshold and ones to the remaining pixels.

Let us consider an image f of size M × N (M rows and N columns) with L gray levels in the range [0, L − 1]. The gray level, or brightness, of the pixel with coordinates (i, j) is denoted f(i, j). The threshold T is a value in the range [0, L − 1]. The thresholding technique determines an optimum value of T from the histogram of the input image, so that:

$$g\left( {i,j} \right)=\begin{cases} 0&{\text{for}}\;f(i,j)<T \\ 1&{\text{for}}\;f(i,j) \geq T \end{cases}$$
(1)
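As a concrete illustration, the following Matlab sketch binarizes a grayscale IR frame in the spirit of Eq. (1). The use of Otsu's method (graythresh) to derive T from the histogram is an assumption, since the exact threshold-selection rule is not specified here, and the file name is hypothetical.

% Thresholding sketch (assumptions: Otsu's method for T,
% hypothetical file name 'elephant_ir.png').
f = imread('elephant_ir.png');      % input IR frame
if size(f, 3) == 3
    f = rgb2gray(f);                % pre-processing: convert to grayscale
end
T = graythresh(f);                  % normalized threshold in [0, 1] from the histogram (Otsu)
g = im2bw(f, T);                    % binary image: 1 where the pixel value exceeds T, else 0
imshow(g);                          % display the segmented foreground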

The data available may not be useful in their collected form unless they are computationally processed to extract meaningful results. For object recognition, features must be extracted from the segmented image; these features determine the class to which the object belongs. Classification is done using the K-NN classifier (Hussain and Seker 2012), which is robust to noisy training data, accurate, and effective when the training set is large. A database holding the characteristics of the particular object being searched for is maintained, and the segmented image is compared against it. If the segmented image matches the characteristics stored in the database, the object recognition output is positive; otherwise it is negative. Positive means the object is present in the image, and negative means it is not.

2.2 Feature extraction

The various shape features extracted from the segmented object(s) in this work are major axis length, minor axis length, eccentricity, orientation, convex area, equiv diameter, solidity, extent, area and centroid. A detailed description of each is given below.

  • Major axis length The longest line that can be drawn through the object connecting the two farthest points on its boundary is called the major axis. If the major axis end points are (x1, y1) and (x2, y2), the major axis length is calculated as,

$$D=\sqrt {{{({x_2} - {x_1})}^2}+{{({y_2} - {y_1})}^2}}$$
(2)
  • Minor axis length The width of the object can be calculated using the minor axis, the line that can be drawn through the object while remaining perpendicular to the major axis.

  • Eccentricity This feature measures the elongatedness of the object. Eccentricity is the ratio of the distance between the foci of the object, ‘C’, to its major axis length, ‘D’, defined as,

$${\text{Eccentricity}}=\frac{C}{D}$$
(3)
  • Orientation The orientation of the object is determined by the angle between the major axis and the x-axis of the image, described as,

$${\text{Orientation}}={\tan ^{ - 1}}\left( {\frac{{({y_2} - {y_1})}}{{({x_2} - {x_1})}}} \right)$$
(4)
  • Convex area The number of pixels in the convex hull (the smallest convex polygon that encloses the object).

  • Equiv diameter It specifies the diameter of a circle with the same area as the detected object, computed as,

$${\text{Equiv}}\;{\text{diameter}}=\sqrt {\frac{{4 \times {\text{area}}\;{\text{of}}\;{\text{an}}\;{\text{object}}}}{\pi }}$$
(5)

where area is the total number of pixels of the object.

  • Solidity The ratio of the area of the object to the convex area given as,

$${\text{Solidity}}=\frac{{{\text{area}}}}{{{\text{convex}}\;{\text{area}}}}$$
(6)
  • Extent The ratio of the total number of pixels in the object region to the number of pixels in the minimum bounding box (the smallest rectangle that encloses the object). It is expressed as,

$${\text{Extent}}=\frac{{{\text{area}}}}{{{\text{area}}\;{\text{of}}\;{\text{the}}\;{\text{minimum}}\;{\text{bounding}}\;{\text{box}}}}$$
(7)
  • Area is a scalar; the actual number of pixels in the region.

  • Centroid is a vector that specifies the centre of mass of the region. The first element of Centroid is the horizontal coordinate (or x-coordinate) of the centre of mass, and the second element is the vertical coordinate (or y-coordinate).

It is to be noted that the above features of the segmented object are computed in our work using the Matlab command ‘regionprops’.
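A minimal Matlab sketch of this step is given below; the variable names are illustrative, and the binary image g is assumed to come from the thresholding stage of Sect. 2.1.

% Shape feature extraction from the binary (segmented) image g;
% the names follow the regionprops property names.
stats = regionprops(g, 'MajorAxisLength', 'MinorAxisLength', ...
    'Eccentricity', 'Orientation', 'ConvexArea', 'EquivDiameter', ...
    'Solidity', 'Extent', 'Area', 'Centroid');
% One row vector of the eight scalar shape features for, e.g.,
% the first detected object:
s = stats(1);
featureVec = [s.MajorAxisLength, s.MinorAxisLength, s.Eccentricity, ...
    s.Orientation, s.ConvexArea, s.EquivDiameter, s.Solidity, s.Extent];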

2.3 Recognition

Object recognition identifies the object from a set of known labels. The recognition analysis is done with the help of the k-NN classifier. Animals such as elephant, pig, deer, lion, tiger, cow, bear, horse and dog are assigned to classes 1–9, respectively. If the object belongs to class 1, the recognition output is “detected object is elephant”; otherwise it is “detected object is not an elephant”.

The K-NN classification algorithm is as follows:

  • Step 1 Calculate the distance between the feature set extracted from the detected object and the feature dataset, i.e., training vectors using Euclidean distance measure.

  • Step 2 Select the k closest training sets based on the calculated distance measures.

  • Step 3 Recognize the object based on the class of the majority training set selected.
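These three steps translate directly into Matlab; in the sketch below, trainFeats (one feature row per training image), trainLabels (class numbers 1–9) and featureVec (from the feature extraction stage) are assumed variables.

% K-NN recognition sketch (K = 3, the value used in the experiments).
K = 3;
% Step 1: Euclidean distance from the test vector to every training vector.
d = sqrt(sum((trainFeats - repmat(featureVec, size(trainFeats, 1), 1)).^2, 2));
% Step 2: select the K closest training sets.
[~, idx] = sort(d);
nearest = trainLabels(idx(1:K));
% Step 3: the majority class among the K neighbours decides the object.
if mode(nearest) == 1
    disp('detected object is elephant');
else
    disp('detected object is not an elephant');
end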

3 FPGA implementation

The need to process images with less computational time led to implementing the image processing algorithm at the hardware level, which offers parallelism and thus significantly reduces processing time (Saegusa and Maruyama 2007; Hussain et al. 2012). The drawback of most such methods is that they require coding in a hardware description language. This motivated the use of Xilinx System Generator, a tool with a high-level graphical interface built on Matlab/Simulink blocks, which makes it much easier to handle than other software for hardware description (Christe et al. 2011). Xilinx System Generator is an extension of Simulink and consists of models called ‘Xilinx Blocks’, which are mapped into architectures, entities, signals, ports and attributes, producing script files for synthesis on FPGAs, HDL simulation and development tools (Ladgham et al. 2012).

Figure 2 shows the entire flow of the FPGA implementation of image processing using Simulink and Xilinx blocks. It comprises three phases:

Fig. 2
figure 2

Various stages of FPGA implementation of image processing using XSG

  • Image pre-processing blocks

  • Functional processing using XSG

  • Image post-processing blocks

3.1 Image pre-processing block-sets

In software-level simulation using Simulink block-sets alone, the image is used as a two-dimensional (2D) M × N arrangement and no image pre-processing is needed. At the hardware level, however, this matrix must be converted into a one-dimensional (1D) array, namely a vector, which requires image pre-processing. The Simulink blocks for image pre-processing are shown in Fig. 3. All image pre-processing is done on a CPU.

Fig. 3
figure 3

Simulink blocks for image pre-processing
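In software terms, this pre-processing amounts to a reshape. In the Matlab sketch below the row-wise scan order is an assumption and must match whatever order the downstream Xilinx blocks expect; the file name is hypothetical.

% Pre-processing sketch: serialize a grayscale M x N frame into the
% 1-D pixel stream required by the hardware model.
gray = rgb2gray(imread('elephant_ir.png'));  % hypothetical input frame
[M, N] = size(gray);
pixelStream = reshape(gray.', M * N, 1);     % transpose -> row-wise 1-D vector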

Moreover, many segmentation methods require considerable processing; although they perform well, they are not generally suitable for hardware implementation. The basic requirements for the thresholding method here are adaptability and efficiency, with minimal dependence on image pre-processing. The Xilinx blocks for the thresholding technique are given in Fig. 4.

Fig. 4
figure 4

XSG blocks for image segmentation using thresholding

3.2 Functional processing using XSG

System Generator is a DSP design tool from Xilinx that enables the use of the MathWorks model-based Simulink design environment for FPGA design. Xilinx block-sets for image segmentation and subsequent feature extraction are built, from which the corresponding HDL code is generated.

3.2.1 Image segmentation by thresholding using XSG

For hardware implementation, the effectiveness of a thresholding method is judged by parameters such as speed and complexity, which become very important in hardware image processing applications. All of the highly ranked cluster-based techniques must compute some image features, such as maximum/minimum gray level values or image variance, before segmentation, so the image must be pre-processed pixel by pixel; this imposes a large processing overhead. These techniques also require complex computations, such as logarithms.

3.2.2 Feature extraction using XSG: area

Area is a scalar that gives the actual number of ‘on’ pixels in the image. Figure 5 gives the Xilinx blocks for extracting the area of the object in the binary image.

Fig. 5
figure 5

XSG blocks for area feature extraction

3.2.3 Feature extraction using XSG: centroid

Centroid is a vector that specifies the centre of mass of the region: the first element is the horizontal coordinate (or x-coordinate) of the centre of mass, and the second element is the vertical coordinate (or y-coordinate). The Xilinx blocks for extracting the centroid of the object in the image are given in Fig. 6.

Fig. 6
figure 6

XSG blocks for centroid feature extraction

3.2.4 Feature extraction using XSG: equiv diameter

Equiv diameter is a scalar that specifies the diameter of a circle with the same area as the region, computed as sqrt(4 × area/π). This property is supported only for 2-D input label matrices. Figure 7 gives the Xilinx blocks for extracting the equiv diameter of the object in the image.

Fig. 7
figure 7

XSG blocks for Equiv diameter feature extraction
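To mirror what the XSG blocks of Figs. 5, 6 and 7 compute, the following Matlab sketch accumulates area, centroid and equiv diameter in a single pass over the serialized binary pixel stream; the row-wise stream order matches the pre-processing sketch above and is an assumption about the actual block wiring.

% Streaming computation of area, centroid and equiv diameter over a
% 1-D binary pixel stream binStream of length M*N (software model of
% the XSG dataflow; binStream is the thresholded version of pixelStream).
area = 0; sumX = 0; sumY = 0;
for k = 1:M*N
    if binStream(k)                    % 'on' pixel
        x = mod(k - 1, N) + 1;         % column index (row-wise scan assumed)
        y = floor((k - 1) / N) + 1;    % row index
        area = area + 1;
        sumX = sumX + x;
        sumY = sumY + y;
    end
end
centroid = [sumX, sumY] / area;        % [x, y] centre of mass
equivDiameter = sqrt(4 * area / pi);   % diameter of the equal-area circle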

3.3 Image post-processing block-sets

For post-processing, a Buffer block converts scalar samples to frame output at a lower sampling rate, followed by a 1D to 2D (matrix) format signal block; finally, a sink displays the output image on the monitor, using the Simulink block-sets. The image post-processing block-sets are shown in Fig. 8.

Fig. 8
figure 8

Simulink blocks for image post-processing
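In software terms, post-processing simply inverts the earlier serialization; outStream is an assumed name for the processed 1-D result, and the same row-wise scan order is assumed.

% Post-processing sketch: rebuild the M x N image from the processed
% 1-D stream (inverse of the row-wise serialization used earlier).
outImage = reshape(outStream, N, M).';  % vector -> matrix, undo the transpose
imshow(outImage);                       % display the output image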

3.4 Hardware-software co-simulation

Usually several issues arise when a model is transformed into hardware. System Generator provides several methods to transform models built using Simulink into hardware; one of these methods is called hardware co-simulation.

Hardware co-simulation enables building a hardware version of the model; using the flexible simulation environment of Simulink, several tests can be performed to verify the functionality of the system in hardware. Hardware co-simulation supports Xilinx FPGAs on boards with JTAG or Ethernet connectivity. The steps involved are:

  • Configuring System Generator for hardware co-simulation by compiling a new board is shown in Fig. 9.

  • Generation of the hardware co-simulation block is shown in Fig. 10.

  • Hardware co-simulation block and synthesis report generation are shown in Fig. 11.

  • Modification of the design for hardware co-simulation is given in Fig. 12.

Fig. 9
figure 9

Hardware co-simulation—compiling a new board

Fig. 10
figure 10

Building the design netlist

Fig. 11
figure 11

Netlist building complete

Fig. 12
figure 12

Modified design for hardware co-simulation

The advantage of this methodology is that it does not require HDL code to be written.

4 Results and discussion

In this section, the recognition performance for various animals (elephant, bear, horse, pig, tiger, deer, cow and lion) is presented and discussed. The outputs of segmentation, feature extraction and recognition in both software and hardware processing are presented in detail, together with the device utilization summary of the hardware implementation and a comparison of the computation delays in software and hardware. Software implementation results are obtained using Matlab R2012a; hardware implementation results use Matlab Simulink R2012a and Xilinx System Generator with Xilinx ISE Design Suite 14.4, prototyped on a Virtex-4 FPGA board.

4.1 Experimental datasets

The input infrared images for the experiment contain objects against a fixed background. The database includes infrared images of nine classes: elephant, pig, deer, lion, tiger, cow, bear, dog and horse. Detailed information about the number of images in each class is tabulated in Table 1, and Fig. 13 shows sample infrared images from each class.

Table 1 Experimental datasets
Fig. 13
figure 13

Sample infrared images

The datasets are categorised into classes 1 to 9, and 24 images are chosen as the training data. The categories and corresponding classes, with the number of images in each class, are shown in Table 1. One sample image from each class is given in Fig. 14. E1, P1, DE1, L1, T1, C1, B1, H1 and D1 correspond to the first sample images of the categories elephant, pig, deer, lion, tiger, cow, bear, horse and dog, respectively.

Fig. 14
figure 14

Segmentation output for software processing

4.2 Software implementation results

The Matlab implementation results for image segmentation, feature extraction and object recognition are shown in this section. The entire algorithm is run on a CPU.

4.2.1 Segmentation results

The segmentation outputs for sample images from each class are shown in Fig. 14. The segmented outputs are based on the thresholding process.

4.2.2 Feature extraction

Table 2 presents the eight shape-based features, i.e., major axis length, minor axis length, eccentricity, orientation, convex area, equiv diameter, solidity and extent, extracted from the sample images in each animal category (elephant, pig, deer, lion, tiger, cow, bear, horse and dog). The feature values are computed from the segmented object.

Table 2 Sample feature set-software

4.2.3 Recognition results

Table 3 shows the recognition status of each IR image; recognition is done for elephants. From the recognition results it is observed that, when the IR database images are given as inputs, images 1–7, 18 and 23 are detected as elephants (recognition status “yes”), while all other animals are not detected as elephants (recognition status “no”). For verification, the categories of the input IR images are also taken into consideration; the verification results are shown in Table 4. Whenever the input IR image is an elephant, the recognition status is “yes”, so the elephant is recognised. The recognition rate, evaluated from Table 4, is shown in Table 5: when the category is elephant and the recognition status is “yes”, the recognition rate is 100%.

Table 3 Recognition results software
Table 4 Verification results-software
Table 5 Class wise recognition rate software

Likewise, if the category is not an elephant and the recognition status is “no”, the recognition rate is 100%. The low performance shown for certain categories is due to their smaller training sets in the K-NN classifier. The classification and recognition results shown in Table 4 use K = 3. Observations are also made by changing the value of K to 5 and 7, and the resulting recognition rates are compared with those obtained for K = 3.

Recognition rates for different K values are given in Table 6. Averaged over all animal categories, K = 7 would be preferred, since it gives an average recognition rate of 93.96%. However, elephants are detected perfectly for K = 3 and 5, whereas for K = 7 the elephant recognition rate drops to 85.71%. Hence K = 3 is chosen for recognition, since it uses fewer neighbours than K = 5, which reduces the computation time and the complexity of the algorithm.

Table 6 Recognition rates with different K values

4.3 Hardware implementation results

The hardware implementation results are obtained by hardware co-simulation using Xilinx System Generator (XSG). Results for image segmentation using thresholding technique, feature extraction and object recognition using K-NN classifier are shown in this section.

4.3.1 Segmentation results

The FPGA board used is Virtex-4 xc4vlx25-10ff668. The board is connected to the computer using JTAG for hardware co-simulation. Figure 15 shows the hardware co-simulation result for segmentation.

Fig. 15
figure 15

Hardware co-simulation for segmentation using XSG

4.3.2 Feature extraction

From the segmented image, the area, centroid and equiv diameter features are extracted using XSG. A sample feature set for each class is shown in Table 7.

Table 7 Sample feature set-hardware

4.3.3 Recognition results

The recognition results obtained from the K-NN classifier are given in Table 8; the recognition status is set to ‘yes’ if an elephant is present in the image and ‘no’ otherwise. From the recognition results it is observed that images 1–7, 12 and 14 are detected as elephants, while all other animals are not detected as elephants.

Table 8 Recognition results-hardware

For verification, the corresponding input images are taken into consideration; the verification results are shown in Table 9. The classification and recognition results shown in Table 8 use K = 3. The class-wise recognition rates are evaluated and shown in Table 10; for elephants the recognition rate is 100%. The very low performance shown for certain categories is due to their smaller training sets in the k-NN classifier. The recognition rates for lion, cow, horse and dog differ between software and hardware because of differences in the shape features extracted from the segmented image.

Table 9 Verification results-hardware
Table 10 Class wise recognition rates-hardware

4.3.4 Device utilization summary and computation time analysis

The resource utilization of the Virtex-4 FPGA when the segmentation and feature extraction algorithms are computed is shown in Table 11.

Table 11 Device utilization summary

Table 12 details the computation time taken for segmentation and feature extraction (area, centroid, equiv diameter) in software and hardware. The results clearly indicate that hardware processing takes nanoseconds while software processing takes seconds. Table 13 considers the entire elephant recognition process run as a software–hardware co-simulation: the total computation time for the object recognition algorithms is 2.649 s in software and 0.274 s in software–hardware co-simulation, a reduction in computation delay of 89.65%. Hardware processing here means the entire algorithm runs on an FPGA, and the computation time differs with the FPGA architecture; the objective of this work, however, is to show that, irrespective of the FPGA architecture, the computation time is lower than when implementing on a CPU.

Table 12 Computational time analysis between software simulation and hardware simulation for image processing
Table 13 Computational time analysis of object recognition in software simulation and software–hardware co-simulation
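As a check, the reported reduction follows directly from the two totals in Table 13:

$$\frac{{2.649 - 0.274}}{{2.649}} \times 100\% \approx 89.65\%$$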

5 Conclusions and future work

In this paper, thresholding-based segmentation and k-NN classification have been applied to elephant recognition in segmented infrared images, prototyped on a Virtex-4 xc4vlx25-10ff668 FPGA board. The effectiveness of our approach and implementation is evaluated by calculating the recognition rate and the computation time. The results indicate that the approach is capable of recognizing elephants in the infrared database images and that, when implemented in hardware using the software–hardware co-simulation method, the computation time is reduced by 89.65% compared with software simulation. In future work, the recognition rate for animals other than elephants may be improved, and the analysis, currently performed on 24 images, can be extended to a larger number of images. FPGA implementation of the K-NN classifier can also be undertaken to further reduce the computation time.