1 Introduction

Every educational institution or organization employs an attendance recording system. Some continue with the traditional method of taking attendance manually, while others have adopted biometric techniques [1]. The traditional method makes it hard to authenticate every student in a large classroom environment. Moreover, the manual labor involved in computing the attendance percentage becomes a major task.

Radio frequency identification (RFID) helps to identify large crowds using radio waves; it offers high efficiency and hands-free access control, but it is observed that it can be misused. The RFID-based automatic attendance recording system of [2] uses RFID tags, transponders, and RFID terminals for attendance management of employees and students, and the captured data is processed by the server to update the database. The attendance recording system using Bluetooth technology proposed by Bhalla et al. [3] requires the students to carry their mobile phones to the classroom, so that the software installed on the instructor’s mobile phone can detect them via a Bluetooth connection and MAC protocols. The common drawback of these two approaches is proxy attendance, as there is no provision for verification.

A biometric-based system indeed provides a solution, as it measures characteristics that are unique to every human being and therefore extremely difficult to duplicate. The probability of two humans sharing the same biometric data is extremely low, and such data can only be lost through a serious accident. Biometrics has proven its usefulness and reliability in many organizations, government bodies, and commercial banks. Biometric measurements can be subdivided into physiological and behavioral, and each biometric method has its own advantages and disadvantages. A physiological method derives its data directly from the human body; examples include fingerprint scan, iris scan, retina scan, hand scan, and facial recognition. A behavioral method derives its data from an action performed by a human being; examples include voice scan, signature scan [4, 5], and keystroke scan.

Various biometric-based authentication systems have been developed and implemented in the past to yield maximum efficiency. These methods include fingerprint, retina, and voice recognition, among others.

1.1 Related Work

Rashid et al. [6] proposed a biometric voice recognition technology that uses an individual’s voiceprint for authentication. This system is useful for people who have difficulty using their hands or other biometric traits. However, it is sensitive to background noise, and a person’s voice tends to change with age. The voice recognition system may also fail to identify a person accurately when he/she is suffering from a throat infection or flu. Hence, this system is not reliable.

The retina scanning-based methodology [7] uses the blood vessel pattern to authenticate a person. The pattern remains the same over time and is not affected by aging. However, the device can be used by only one person at a time, which proves time-consuming for a large crowd, and it requires the person to be in close contact with it for authentication. Since it is open to the public, it is susceptible to vandalism. Alternately, optical sensors are used for scanning an individual’s fingerprints [8]. This system is the most commonly used in organizations because of its high reliability. However, the optical sensor can also serve only one person at a time, which wastes a considerable amount of time for large crowds, and because the sensor comes in direct contact with the student, it is exposed to a high risk of getting dirty or damaged.

To overcome the disadvantages of existing ARS, face recognition-based attendance authentication techniques are being developed. Joseph et al. [9] proposed a face recognition-based ARS using principal component analysis (PCA) [10]. Roshan Tharanga et al. [11] proposed a smart ARS based on PCA and the Haar transform. Shireesha et al. [12] have used PCA, LBPH, and LDA for face recognition in their research [13]. Yohei et al. [14] proposed an attendance management system which can estimate the position and attendance of each student through continuous observation and recording.

The biometric systems described above are efficient and reliable and provide immense security when compared to the traditional method. However, these systems have some disadvantages as well. Most of the devices are unable to enroll a small percentage of users, and the performance of the system can deteriorate over time. Table 1 gives the advantages and disadvantages of various biometric traits.

Table 1 A comparison of various biometric technologies

1.2 Advantages of Face Recognition-Based ARS

Face recognition-based ARS has proven [15] to be promising due to the following advantages:

  • No physical interaction is required from the user.

  • It is very accurate and provides a high level of security.

  • It has the advantage of ubiquity and universality over other biometrics, i.e., everyone has a face and everyone readily displays it.

  • Non-intrusive nature.

  • The same biometric data can be used in different environments.

  • Any camera can be used to capture the biometric data of the faces.

This chapter focuses on the development of a face recognition-based attendance recording and management system under both controlled and uncontrolled environments.

Section 2 gives an overview of face recognition-based ARS, elaborates the various requirements, and reviews the state of the art. Section 3 describes the proposed ARS, which is based on partial face recognition algorithms. Section 4 discusses the experiments conducted and the associated results for both the controlled and uncontrolled environments. Finally, Sect. 5 concludes the chapter.

2 Overview of Face Recognition-Based ARS

Face recognition is defined as the process of identifying a person with the help of his/her facial features [16]. Face recognition technology involves scanning the distinctive features of the human face to authorize the student.

2.1 Overview

The general block diagram of a face recognition-based ARS is shown in Fig. 1. A database of the students’ personal information along with their face images is created first; the images in this database are known as the “standard images.” Images or video of the students are acquired using a digital camera or video recorder placed in the classroom. Detection of faces from the images or video frames is then performed; locating the faces is a challenging job in real-time applications. The spatial features of the detected faces are then extracted as part of dimensionality reduction; these features also characterize the image. A recognition algorithm is then applied to match each real-time face image against the database. A matching score indicates how well two images match, and the score obtained against the database images reveals the identity of the student.
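
As a minimal illustration of this matching step, the sketch below assigns an identity from matching scores; the feature vectors, labels, cosine score, and threshold are illustrative placeholders, and the descriptors actually used in the proposed system are introduced in Sect. 3.

```matlab
% Minimal sketch of identity assignment from matching scores.
% dbFeatures: one column per enrolled ("standard") image; dbLabels: student IDs.
% These variables and the cosine score are illustrative placeholders.
function [studentId, bestScore] = matchFace(testFeature, dbFeatures, dbLabels, threshold)
    % Cosine similarity between the test feature and every database feature
    scores = (dbFeatures' * testFeature) ./ ...
             (sqrt(sum(dbFeatures.^2, 1))' * norm(testFeature) + eps);
    [bestScore, idx] = max(scores);        % best-matching database image
    if bestScore >= threshold
        studentId = dbLabels{idx};         % accept: identity of the student
    else
        studentId = 'unknown';             % reject: score too low
    end
end
```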

Fig. 1 Steps involved in the face recognition-based ARS

2.1.1 Requirements

The factors to be considered while selecting face recognition-based ARS are as follows [17]:

  • Uniqueness: Every person has facial features that are unique when compared with every other person; however, this is not true for identical twins, who share nearly identical features. A face recognition system must nevertheless have the ability to identify every person.

  • Universality: Every person’s appearance differs from that of others. For this reason, a face recognition algorithm might work for some people while failing for another set of people. Factors such as long hair, a beard, or spectacles might create extra difficulty in recognizing the faces, and the solutions to these problems might not work equally well for everyone.

  • Permanence: The appearance of the human face changes with age, so the face might not look the same over a long period of time. It is also subject to permanent changes such as plastic surgery and to temporary changes such as a veil or sunglasses.

  • Collectability: Collectability accounts for whether the biometric features can be measured quantitatively. This biometric system does not require direct physical contact with the individual whose biometrics are to be captured, and capturing facial images is easy; in fact, a person’s photograph can be taken without his/her notice. However, facial recognition requires proper lighting, correct positioning of the person, and a long scanning time, so a facial recognition system demands a carefully prepared setup.

  • Performance: Performance includes the speed of acquiring the images and their processing time, which determine the accuracy of correctly recognizing the right faces against the images in the database. The speed of operation depends on the face recognition algorithm and on the number of images stored in the database, as a large number would take a long processing time.

  • Acceptability: Acceptability defines the user friendliness of the system in daily use. The face recognition technique is highly user friendly as it captures the biometric information of the person in a non-intrusive way, and it provides easier access control compared with the other biometric solutions.

  • Circumvention: Circumvention states whether the biometric system can be fooled or hacked by fraudsters. It depends on the technical implementation, the quality of the camera, the surrounding background, and the algorithm.

2.2 Performance Metrics

The accuracy of any biometric-based system is determined by measuring two kinds of error rates.

  • False Acceptance Rate (FAR): FAR is the measure of how many unknown students are falsely accepted by the ARS as known students. This is called a “Type-I error.”

  • False Rejection Rate (FRR): FRR is the measure of how many known students are falsely rejected by the ARS as unknown students. This is called a “Type-II error.”

The authentication procedure [18] requires a low false acceptance rate (FAR), which means the matching score must be high enough before a match is accepted. It also requires a low false rejection rate (FRR) to avoid known students being prevented from marking their attendance.
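
As a sketch of how these two error rates could be estimated from recorded matching scores, the function below assumes score arrays for genuine and impostor attempts and an accept-above-threshold decision rule; both are assumptions for illustration only.

```matlab
% Sketch: estimate FAR and FRR from matching scores at a given threshold.
% genuineScores : scores of known students matched against their own entries
% impostorScores: scores of unknown people matched against enrolled students
% A face is accepted when its score is >= threshold (assumed decision rule).
function [FAR, FRR] = errorRates(genuineScores, impostorScores, threshold)
    FAR = mean(impostorScores >= threshold);  % unknowns wrongly accepted
    FRR = mean(genuineScores  <  threshold);  % known students wrongly rejected
end
```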

2.3 State of the Art

Attendance recording and management systems (ARS) using facial recognition techniques have evolved tremendously over the past decade. Various methods such as principal component analysis (PCA) [19], local binary patterns [20], eigenfaces [21], AdaBoost [22], the Haar classifier [23], the two-dimensional Fisher’s linear discriminant (2DFLD) [24], and 3D modeling [25] have been used for this purpose.

In the method proposed by Jha et al. [26], a color-based technique is used for face detection. This method detects the skin color of humans and its variations; the skin area is then segmented and fed as an input to the recognition process. For face recognition and feature extraction, principal component analysis [27] is used. The PCA technique [28] is a statistical approach based on matrix computations. The entire system is implemented in MATLAB. However, skin tones vary dramatically within and across individuals, changes in ambient light and shadows alter the apparent color of the image, and the movement of objects causes blurring of colors.

Shehu and Dika [29] in their work used a real-time face detection algorithm integrated into a learning management system (LMS). This system automatically detects and registers students present in the classroom. Their approach uses a digital camera installed in the classroom that scans the room every 5 min to capture images of the students. A Haar classifier is used for face detection. However, the students are required to pay attention to the camera while images are captured, and the method sometimes detects objects as faces, creating a large number of false positives. For face recognition, the eigenface methodology is implemented; a drastic change in a student’s appearance causes false recognition of the student. A manual step of cropping the region of interest is performed to increase its efficiency.

Balcoh [30] proposed a method that uses the Viola–Jones algorithm for face detection and the eigenface methodology for face recognition. However, cropping of images is required after the face detection process in order to recognize the faces of the students. Mao et al. [31] performed multiobject tracking to convert detected faces into tracklets. Their method uses sparse representation to cluster the face instances of each tracklet into a small number of clusters. Experiments were performed on the Honda/UCSD database; real-time face detection and recognition is not used.

Tsun et al. [32] performed their experiment by placing a Webcam on a laptop to continuously capture video of the students. At regular time intervals, frames of the video are captured and used for further processing. The Viola–Jones algorithm is used for face detection due to its high efficiency, and the eigenface methodology is used for face recognition. However, the students are required to remain alert, as the eigenface methodology is not capable of recognizing tilted faces captured in the frames. Also, a small classroom was used due to the limited field of view of the laptop’s Webcam.

Yet another method proposed for ARS by Fuzil et al. [33] used a Haar classifier for face detection and the eigenface methodology for face recognition. This system is intended only for frontal images; different facial poses cannot be recognized. Moreover, faces which are not detected by the Haar classifier require manual cropping of the facial features, which lowers the efficiency of the overall system. Rekha and Chethan, in their proposed method [34], use the Viola–Jones algorithm for face detection and a correlation method for recognition. This method also uses manual cropping of the region of interest, which is then compared with the existing database. However, for images where multiple people are captured in the same or different sequences, the face recognition efficiency is very low.

Shirodkar et al. [35] have proposed an ARS using a Webcam to capture the facial features of the students. Face detection uses the Viola–Jones algorithm, comprising Haar features, the integral image, AdaBoost, and cascading, while local binary patterns (LBP) are used for facial recognition. In this algorithm, the image is divided into several parts, and LBP is applied to each part. An accuracy of 83.2 % was achieved with this system. However, it overlooks the pose variations which can occur during lecture hours. Muthu Kalyani et al. [36] proposed a methodology which uses 3D face recognition to provide higher accuracy in recognizing faces against the images stored in the database. It uses a CCTV camera fixed at the entrance of the classroom. However, facial recognition from still images remains a problem under varying illumination, pose, and expression. Moreover, the installation cost of a CCTV camera is high.

Kanti et al. [37], in their ARS, use the Viola–Jones algorithm for face detection and principal component analysis for face recognition. PCA can be used for holistic faces but not for partial faces; thus, the facial recognition efficiency of the overall system is reduced. In the method proposed by Rode et al. [38], faces are detected using the skin classification method [39], and eigenfaces are generated for facial recognition. However, this algorithm is not implemented for real-time images, and the region of interest must be selected in the images for further processing. The comparison of different face recognition-based ARS is given in Table 2.

Table 2 Summary of face recognition-based ARS

Table 2 indicates that the existing automated attendance systems have proved effective only for frontal faces. This chapter focuses on these issues and presents an ARS that can improve the performance of existing automated systems, using a modified Viola–Jones algorithm to detect faces and an alignment-free partial face recognition algorithm for face recognition.

3 Proposed ARS

A database is created with the students’ personal data along with their face images. Figure 2 shows the block diagram of the system. A camera or video recorder is used to capture still images of the faces or real-time video. The face images of the students to be recognized are fed to the image processing block, which performs the preprocessing, face detection, and face recognition tasks. Preprocessing includes tasks such as cropping of the image and enhancement procedures. The processed images are fed to the face recognition algorithm, and the database images are then compared with the detected real-time faces to identify each student.
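
A minimal MATLAB sketch of the preprocessing block is given below; the file name, cropping rectangle, and choice of enhancement are placeholder assumptions rather than the exact settings of the proposed system.

```matlab
% Sketch of the preprocessing block: crop, gray-scale conversion, enhancement.
img  = imread('classroom_frame.jpg');    % placeholder file name
face = imcrop(img, [120 80 160 160]);    % [x y width height], example values
gray = rgb2gray(face);                   % detection/recognition work on gray scale
enh  = imadjust(gray);                   % simple contrast enhancement
imshow(enh);
```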

Fig. 2 Block diagram of the proposed ARS

3.1 Face Detection Using Viola–Jones Algorithm [40]

The face detection in the proposed ARS is performed by employing a modified Viola–Jones algorithm. The images of the students are captured at fixed intervals by the camera placed at the top center of the blackboard. These images are then preprocessed and converted to gray scale before face detection is performed. The Viola–Jones method uses integral images to compute the features of the faces. The advantages of the Viola–Jones algorithm are as follows:

  • High accuracy,

  • Low false detection rate,

  • Fast feature extraction is possible,

  • Location and scale invariant feature detector,

  • Features are scalable.

A subwindow is swept across the real-time image to catch the faces. A common approach is to rescale the image to different sizes and then run a fixed-size locator across each rescaled image. In the Viola–Jones algorithm, however, the detector is rescaled rather than the image, and the locator is run through the image once for each detector size. The Viola–Jones algorithm thus uses a scale invariant detector, built from the integral image and Haar-like features, and it uses the AdaBoost algorithm to select the important features. Background regions of the image are eliminated quickly, so the majority of the computational time is spent on face regions. Bounding boxes are then inserted around the detected faces; the size of each bounding box depends on the size of the face detected in the image or video frame.
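
A minimal sketch of this detection step using the Computer Vision Toolbox cascade object detector, which implements the Viola–Jones framework, is shown below; the file name, merge threshold, and minimum size are illustrative choices, not the exact settings of the proposed system.

```matlab
% Sketch: Viola-Jones face detection on one captured frame.
frame    = imread('lecture_frame.jpg');          % placeholder image
gray     = rgb2gray(frame);                      % detection on gray scale
detector = vision.CascadeObjectDetector( ...     % Viola-Jones cascade detector
               'MergeThreshold', 4, ...          % suppress spurious detections
               'MinSize', [40 40]);              % ignore very small regions
bboxes   = step(detector, gray);                 % one [x y w h] row per face
if ~isempty(bboxes)
    labels = repmat({'face'}, size(bboxes, 1), 1);
    frame  = insertObjectAnnotation(frame, 'rectangle', bboxes, labels);
end
imshow(frame);                                   % faces with bounding boxes
```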

3.2 Face Recognition Using MKD-SRC Representation

The proposed system employs the MKD-SRC method of partial face recognition proposed by Liao et al. [41], irrespective of whether the detected face is holistic or partial. This approach represents database images and real-time images as multikeypoint descriptors (MKD) and then applies sparse representation-based classification (SRC) for face recognition. This kind of representation performs well even when the available training data is small. Figure 3 shows the block diagram of the MKD-SRC-based face recognition employed in the proposed system.

Fig. 3 Block diagram of the face recognition algorithm

Each face in the database is represented by a set of descriptors, where the number of descriptors depends on the information available about the face. Therefore, the descriptor set for a frontal image is larger than that for a partial face in the database. A scale and affine invariant detector, namely CanAff, is first used to detect the keypoints. This detector is robust to viewpoint changes and hence works best in uncontrolled environments, where the captured images contain more partial faces.

Each detected keypoint region is first enclosed by an ellipse and then normalized to a circle of uniform size using an affine transformation. After normalizing the detected keypoint regions to a fixed size, a Gabor ternary pattern (GTP) descriptor is constructed for each keypoint region. GTP features are robust to illumination changes and noise and have proven to be effective local features. The procedure is as follows:

Apply a Gabor filter to each detected and normalized region. Only odd Gabor kernels are used, with a single scale and just four orientations (0°, 45°, 90°, and 135°), as they are efficient at detecting edges in these directions. The results of these four Gabor filters are combined to form a ternary pattern called the Gabor ternary pattern (GTP), which has the same size as the original detected region. Each GTP region is then divided into subregions, and a histogram is extracted for every subregion. The resulting histograms are concatenated to form a feature vector. To eliminate extremes and outliers and to obtain a fixed size, a normalization step is applied. Finally, principal component analysis (PCA) is applied to reduce the dimensionality of the feature vector to M. The resultant feature vector is called the multikeypoint descriptor (MKD).
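
The sketch below illustrates a GTP-like descriptor for one normalized keypoint patch. The Gabor parameters, ternary threshold, and 4 × 4 subregion grid are assumptions made for illustration and are not the exact settings of [41]; the final PCA projection, which would be learned over many descriptors, is only indicated in a comment.

```matlab
% Sketch of a GTP-like descriptor for one normalized gray-scale keypoint patch.
function hist_vec = gtpDescriptor(patch)
    patch  = im2double(patch);
    thetas = [0 45 90 135] * pi/180;               % four orientations
    lambda = 8; sigma = 4; gamma = 0.5;            % assumed Gabor parameters
    tpat   = zeros([size(patch) 4]);
    [x, y] = meshgrid(-10:10, -10:10);             % 21x21 kernel support
    for k = 1:4
        xr =  x*cos(thetas(k)) + y*sin(thetas(k));
        yr = -x*sin(thetas(k)) + y*cos(thetas(k));
        g  = exp(-(xr.^2 + gamma^2*yr.^2)/(2*sigma^2)) .* sin(2*pi*xr/lambda);
        r  = conv2(patch, g, 'same');              % odd Gabor response
        tau = 0.05 * max(abs(r(:))) + eps;         % assumed ternary threshold
        tpat(:,:,k) = (r > tau) - (r < -tau);      % ternary values: -1, 0, +1
    end
    % Combine the four ternary maps into one GTP code per pixel (0..80)
    code = zeros(size(patch));
    for k = 1:4
        code = code + (tpat(:,:,k) + 1) * 3^(k-1);
    end
    % Histogram the codes over a 4x4 grid of subregions and concatenate
    [H, W] = size(code); hist_vec = [];
    rs = round(linspace(0, H, 5)); cs = round(linspace(0, W, 5));
    for i = 1:4
        for j = 1:4
            block = code(rs(i)+1:rs(i+1), cs(j)+1:cs(j+1));
            h = histc(block(:), 0:80);             % 81-bin histogram
            hist_vec = [hist_vec; h(:)];           %#ok<AGROW>
        end
    end
    hist_vec = min(hist_vec, 0.2*norm(hist_vec));  % clip extremes and outliers
    hist_vec = hist_vec / (norm(hist_vec) + eps);  % fixed-scale normalization
    % PCA over the descriptors of many keypoints would then reduce the
    % dimensionality to M, yielding the entries of the MKD.
end
```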

The MKDs obtained from each image in a class form a subdictionary. A class here is defined as the set of images with different poses collected for the same student. A dictionary is composed of all the subdictionaries and represents the features of all images in a class.

Let C denote the total number of classes. Then, the class dictionary

$$ D = \left( D_{1} ,D_{2} , \ldots ,D_{C} \right) $$
(1)

where each dictionary $D_{i}$ corresponds to one class and is given by

$$ D_{i} = \left( d_{i1} ,d_{i2} , \ldots ,d_{in} \right) $$
(2)

where n is the total number of images in a class and $d_{in}$ is the subdictionary for the nth image of class i, which represents one MKD. The dictionary D gives a complete description of the database of images. As the dictionary size depends on the size of the input real-time detected face image, filtering is applied to keep only the largest values of the descriptors.
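
A minimal sketch of how such a dictionary could be assembled from the per-image descriptor sets is given below; the cell-array layout and variable names are assumptions made for illustration.

```matlab
% Sketch: build the gallery dictionary D = (D_1, ..., D_C) from MKDs.
% classMKDs{c}{j} is an M-by-k_j matrix holding the descriptors (one MKD)
% of the j-th image of class c; layout and names are illustrative assumptions.
function [D, classIndex] = buildDictionary(classMKDs)
    D = []; classIndex = [];
    for c = 1:numel(classMKDs)                                 % one class per student
        Dc = [classMKDs{c}{:}];                                % concatenate subdictionaries
        D  = [D, Dc];                                          %#ok<AGROW>
        classIndex = [classIndex, repmat(c, 1, size(Dc, 2))];  %#ok<AGROW>
    end
end
```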

The application of sparse representation-based classification (SRC) is preferred in this chapter for its effectiveness in classification. From the theory of compressed sensing (CS) [41], “a sparse solution is possible for an overcomplete dictionary and hence any descriptor from a real-time image can be expressed by a sparse linear combination of the dictionary D, with a high probability, using ℓ1 minimization.” Inspired by this statement, the multitask SRC based on the least residual, proposed by Liao et al. [41], is employed directly to determine the identity of the real-time partial face image.
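
The sketch below illustrates the SRC decision rule for a single probe descriptor, using MATLAB’s lasso function (Statistics Toolbox) as a stand-in ℓ1 solver; the variable names, the fixed regularization value, and the single-descriptor case are simplifications of the multitask formulation of [41].

```matlab
% Sketch: classify one probe descriptor y against dictionary D by least residual.
% D is M-by-N (columns = gallery descriptors); classIndex (1xN) maps each
% column to its class. lasso() is used here only as an illustrative l1 solver.
function predictedClass = srcClassify(y, D, classIndex)
    B = lasso(D, y, 'Lambda', 0.01);          % sparse coefficients (l1-regularized)
    classes  = unique(classIndex);
    residual = zeros(size(classes));
    for k = 1:numel(classes)
        idx = (classIndex == classes(k));     % keep coefficients of one class only
        xk  = zeros(size(B)); xk(idx) = B(idx);
        residual(k) = norm(y - D*xk);         % class-wise reconstruction error
    end
    [~, best] = min(residual);                % the smallest residual wins
    predictedClass = classes(best);
end
```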

3.3 Controlled Environment

In the controlled environment, the camera is fixed to the wall, and each student is required to come in front of this camera to have his/her image captured. A motion detection algorithm (built into MATLAB) is used to detect when a student comes in front of the camera.
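
A minimal sketch of such a capture trigger, using the foreground detector from the Computer Vision Toolbox, is shown below; the acquisition adaptor, pixel-count threshold, and capture logic are illustrative assumptions rather than the exact implementation used here.

```matlab
% Sketch: capture a still image when motion is detected in front of the camera.
vid      = videoinput('winvideo', 1);            % assumed webcam adaptor/device
fgDetect = vision.ForegroundDetector('NumTrainingFrames', 30);
faceDet  = vision.CascadeObjectDetector();       % Viola-Jones detector
for t = 1:500                                    % poll a fixed number of frames
    frame = getsnapshot(vid);
    mask  = step(fgDetect, rgb2gray(frame));     % foreground (moving) pixels
    if nnz(mask) > 0.05 * numel(mask)            % enough motion: a student arrived
        bbox = step(faceDet, rgb2gray(frame));   % detect the face
        if ~isempty(bbox)
            face = imcrop(frame, bbox(1, :));    % crop and pass to recognition
            % ... face recognition and attendance update would follow here
        end
    end
end
delete(vid);
```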

3.4 Uncontrolled Environment

In the uncontrolled environment, the camera is placed on top of the blackboard to capture video of the students during the lecture. The wide capturing angle of the camera enables it to capture all the students present in the classroom. Images can be captured at regular intervals, or video can be recorded using a high-definition camera. Figure 4 shows the camera arrangement in the classroom.

Fig. 4 Demonstration of the camera arrangement in the classroom

Figure 5 shows the flowchart of the proposed ARS for one lecture. At the start of the lecture, the camera is initiated to capture video of the students attending the lecture. One second of video consists of approximately 30 frames, and these frames are processed one at a time.

Fig. 5 Flowchart of the uncontrolled environment

The faces of the students are detected using the Viola–Jones algorithm, and the detected faces are fed as input to the face recognition algorithm. If all the students are not recognized in the first frame, then another frame is given as input to the face detection algorithm, and the detected faces are again passed to the face recognition algorithm. This process continues until all students are detected and recognized. At the end of the lecture, the recognized students are awarded points. This data is stored in the attendance sheet, and the respective percentage of attendance is calculated.
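
A minimal sketch of this per-lecture loop is given below; the video file name, enrolled roll numbers, and the helper functions detectFaces and recognizeFace (standing in for the detection and recognition stages described above) are hypothetical, and the attendance sheet is written with xlswrite.

```matlab
% Sketch: process lecture video frames until all enrolled students are marked.
% 'lecture.avi', detectFaces() and recognizeFace() are illustrative placeholders.
reader   = vision.VideoFileReader('lecture.avi');
enrolled = {'R001', 'R002', 'R003'};             % roll numbers in the database
present  = false(size(enrolled));
while ~isDone(reader) && ~all(present)
    frame = step(reader);                        % next frame (~30 frames/s)
    faces = detectFaces(frame);                  % Viola-Jones stage (cell array)
    for f = 1:numel(faces)
        id = recognizeFace(faces{f});            % MKD-SRC stage, returns roll no.
        present(strcmp(enrolled, id)) = true;    % mark the student as present
    end
end
release(reader);
xlswrite('attendance.xlsx', [enrolled', num2cell(double(present'))]);  % update sheet
```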

4 Results and Discussions

MATLAB version 2012b is used for the implementation of the software algorithms. The Computer Vision Toolbox is used for the implementation of various algorithms related to feature detection, feature extraction, feature matching, object detection and tracking, motion estimation, and video processing. A DSLR 3100 (digital single-lens reflex) camera is used for image acquisition and video capturing. The proposed algorithms and system are defined, implemented, tested, and evaluated for two scenarios: controlled and uncontrolled environments.

4.1 Database Creation

Initially, a database is created for 20 students with 10 different poses each. Table 3 shows the database images along with the students’ roll numbers.

Table 3 Database of 20 students

4.2 Results of Controlled Environment

Table 4 presents the results of face recognition in the controlled environment. A test image is the real-time image captured by the camera when a student comes in front of it. A recognized image is the database photograph of a student whose features score highest against the test image. After recognition, the algorithm updates the Excel sheet accordingly. Table 4 shows that all 20 students’ faces were successfully detected and recognized. Hence, for the proposed attendance recording system, both the face detection efficiency and the face recognition efficiency are 100 % in the controlled environment.

Table 4 Results of proposed ARS under controlled environment

4.3 Results of Uncontrolled Environment

Figure 6 shows a sample frame from the real-time video captured by the camera placed above the blackboard. Table 5 shows the results of the proposed ARS under the uncontrolled environment: it lists the number of students’ faces detected and whether these detected faces were recognized.

Fig. 6 A frame captured from the video

Table 5 Results of proposed ARS under uncontrolled environment

The proposed system is trained with classes of frontal and partial face images. For each class, a dictionary containing the biometric information is created at the training stage. During face recognition, the features extracted from every real-time input image are matched against the trained dictionary; the sparse representation is used to calculate the minimum distance between them.

However, there is a chance of an unknown student being identified as a known student. Thus, a threshold is maintained to avoid unknown students being falsely recognized. Ideally, the matching result of any unknown student must be lower than that of every known student, but achieving this ideal case in a real-world scenario is quite a challenge. If the threshold is chosen so high that no unknown student’s result exceeds it, the false acceptance of the system is reduced; on the other hand, some known students’ results may then fall below the highest unknown student’s result and be rejected. To avoid this situation, the threshold can be chosen so low that no known student’s images are falsely rejected, but this results in the false acceptance of unknown students. Therefore, choosing a threshold as a compromise between the two is necessary.
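
One way to select such a compromise threshold is sketched below by sweeping candidate values over recorded genuine and impostor scores and reusing the illustrative errorRates function from Sect. 2.2; the equal-error criterion is only one possible compromise, and the score arrays are assumed to be available.

```matlab
% Sketch: choose a threshold where false acceptance and false rejection balance.
thresholds = linspace(0, 1, 101);                % candidate score thresholds
gap = zeros(size(thresholds));
for k = 1:numel(thresholds)
    [FAR, FRR] = errorRates(genuineScores, impostorScores, thresholds(k));
    gap(k) = abs(FAR - FRR);                     % distance from the equal error rate
end
[~, kBest] = min(gap);
chosenThreshold = thresholds(kBest);             % compromise between FAR and FRR
```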

In the proposed system, under the uncontrolled environment, only 16 students out of 20 were detected; thus, the face detection efficiency is 80 %. Of these, only 12 students were recognized, so the face recognition efficiency is 60 %. Note that the output differed from frame to frame, because the number of students detected varies in each frame, which affects the face recognition performance. Comparing Tables 4 and 5 shows that the results of the controlled environment are more accurate than those of the uncontrolled environment. The consolidated results for the controlled and uncontrolled environments are shown in Table 6. Note that the image quality affects the output results: the image quality in the controlled environment is better than in the uncontrolled environment.

Table 6 Consolidated results of proposed ARS

4.3.1 Advantages

The face recognition method employed in the system is alignment free [41]. This method is capable of recognizing the partial faces under the following conditions:

  • Self-occlusion: this includes blocking the face due to non-frontal poses.

  • Facial accessories: this includes blocking of the face due to facial accessories.

  • Limited view: this includes faces which lie partially outside the camera’s field of view.

  • Extreme illumination: this includes images in which the facial area is very dark or very brightly lit.

  • Sensor saturation: this includes the underexposure or overexposure of the facial areas in the images.

  • External occlusion: this includes blocking of the faces due to other objects or faces in the image.

Further, the alignment-free partial face recognition algorithm has the following advantages over the other existing algorithms:

  • No prealignment of the images is required,

  • No presence of the eyes or any other facial component is required,

  • No prior knowledge of the input face is required, i.e., whether the face is holistic or partial.

4.4 Limitations

It is also observed that the proposed ARS works under some limitations. Due to the low image quality in the uncontrolled scenario, the efficiency of the system is reduced. Also, the computation time required is large when the number of images stored in the database is large. Further work can be carried out to make the system more efficient in the real-time scenario. The number of training images per student can also be increased to make the system more robust to the recognition problem. Moreover, different ways can be explored to reduce the computation time, for instance by using more efficient algorithms.

The camera can be mounted higher to prevent the blockage created by tall students. Using a larger number of cameras mounted across the classroom, together with image stitching algorithms, can enhance the coverage.

An independent system can be implemented by extending the hardware to an Internet Protocol (IP) camera. This can make the system more accurate, as it can continuously monitor the students. An FPGA-based system can be implemented as future work.

5 Conclusion

A smart attendance capturing and management system, based on existing partial face recognition algorithms, is proposed to overcome the drawbacks present in biometric-based attendance management systems. The proposed system was implemented in two phases: a controlled environment and an uncontrolled environment. The facial features of the students are captured and recognized, and the names of the recognized students are then updated in the Excel sheet. This system could be used for attendance marking of students and staff in any organization. It saves the time and manual effort otherwise required of the lecturer and helps lecturers efficiently manage the large number of students present in the classroom. The system will also help discourage students from skipping the daily classes.

The proposed system proved accurate under the controlled environment. While the efficiency in the uncontrolled environment is quite low, the system remains user friendly. The performance of the proposed attendance system depends on the resolution of the camera used and on the number of students detected and hence recognized. Further, a large number of images stored in the database increases the computation time. The effort to make the existing ARS more efficient and user friendly is never ending.