1 Introduction

Tracking attendance and checking students’ school uniforms are important tasks in the university environment. In this paper, the term “school uniforms” refers to the attire specified in the academic regulations of a university, specifically Can Tho University (CTU) in Vietnam. According to Decision 1813/QD-DHCT dated June 18, 2021, on the academic regulations of CTU, students are required to wear student ID cards as well as neat, polite, and discreet clothes when entering the university premises. In particular, students at CTU must wear student ID cards, sleeved shirts, long pants, and sandals. Based on our observation, some students do not follow the dress regulations, which may negatively affect the campus’s aesthetics and the classroom atmosphere. In addition, students are not permitted to take the final exam if they are absent for more than 20% of the lecture hours. Checking attendance and school uniforms manually is time-consuming and monotonous. To the best of our knowledge, no existing application supports both student attendance monitoring and school uniform checking. This study proposes a machine learning-based approach for building such a system.

The remaining parts of this paper are organized as follows. Section 2 presents related work. The system architecture is proposed in Sect. 3. Sections 4 and 5 discuss our data collection and experimental results. Section 6 concludes the paper.

2 Related Work

Several attendance checking applications rely on face recognition or on scanning student ID cards. Saparkhojayev and Guvercin [15] used an RFID reader to read the student ID card and a webcam to simultaneously take a picture of the student, which was then sent to a computer and stored in a database. Alghamdi [2] built a mobile application that recognizes students by using a Turck device to detect the RFID tags on their university ID cards within a distance of 3 m. Islam et al. [8] reported that tracking attendance using RFID technology is slower, less effective, and more expensive than using a smartphone. Therefore, they built a system comprising a mobile application that provides an interface for teacher-student communication and attendance tracking. Attendance data was stored in databases on the mobile phone and on a web server.

Some studies use biometrics for attendance checking. Kadry and Smaili [16] used wireless iris recognition, a microcontroller, and RF wireless techniques to develop an attendance tracking system. However, this attendance checking system seems unsuitable for a university environment. Rufai et al. [14] designed a biometric access system for screening exams and checking attendance. Biometric information is collected from each student during enrollment, and during the examination each student’s biometric information is captured and compared with the stored data.

Neural networks have been intensively developed and are now used to monitor student attendance automatically. Filippidou and Papakostas [7] built a real-time attendance checking model by fine-tuning pre-trained convolutional neural networks (CNNs) to recognize students’ faces. In particular, their model captures images of the classroom using a webcam, detects faces using the MTCNN algorithm, and then recognizes students’ faces using a CNN-based model. Dang [5] introduced a smart attendance system by enhancing the FaceNet facial recognition model based on a MobileNetV2 backbone with an SSD subsection. The author performed experiments on mobile devices and achieved an accuracy of 95% on a small dataset. The YOLO model [13] has also been used for checking attendance. Alon et al. [3] built a student attendance system using YOLOv3, which obtained an accuracy of 94%. Mardiana et al. [10] developed a YOLOv5-based system for checking library attendance. Other approaches include a Local Binary Pattern Histogram face recognizer to identify students [12] and QR codes [1, 9, 11].

3 System Architecture

This study uses pre-trained YOLOv8 (footnote 1) models to detect and predict objects in a frame. Figure 1 presents the system architecture. At the beginning of each class, the class monitor starts the application, selects the class information, and activates the webcam to begin verifying attendance and uniforms. The application automatically creates a file for storing the attendance and uniform verification data. After the attendance checking process is complete, the monitor can export a .csv file containing the attendance information.

Fig. 1. The system architecture.
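The file creation and export step described above can be sketched with Python's standard `csv` module. This is only an illustration under stated assumptions: the column names in `FIELDS` and the helper names `append_record` and `export_csv` are hypothetical, since the paper does not specify the layout of the exported file.

```python
import csv
import io
from datetime import datetime

# Hypothetical column layout; the paper does not specify the file's schema.
FIELDS = ["timestamp", "student_id", "name", "shirt_ok", "pants_ok", "card_ok"]


def append_record(writer, student_id, name, shirt_ok, pants_ok, card_ok, when=None):
    """Write one attendance/uniform row; `when` defaults to the current time."""
    when = when or datetime.now()
    writer.writerow({
        "timestamp": when.isoformat(timespec="seconds"),
        "student_id": student_id,
        "name": name,
        "shirt_ok": int(shirt_ok),
        "pants_ok": int(pants_ok),
        "card_ok": int(card_ok),
    })


def export_csv(records):
    """Return the CSV text for a list of record tuples.

    Each tuple is (student_id, name, shirt_ok, pants_ok, card_ok[, when]).
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for record in records:
        append_record(writer, *record)
    return buffer.getvalue()
```

In a deployed application the text returned by `export_csv` would be written to a file named after the selected class; that naming scheme is likewise an assumption.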

Figure 2 shows in detail the YOLOv8-based models for monitoring attendance and checking uniforms. First, when a person enters the frame, the YOLOv8-based student detection model (named YOLOv8Student), trained on the MS COCO dataset, detects and localizes the person. Then, the student’s face is detected using the YOLOv8-based facial recognition model (referred to as YOLOv8Face), which is trained on the WIDER FACE dataset [17], and is identified using ArcFace [6] by returning the most similar face found in the database. The YOLOv8-based models for checking shirts and pants, called YOLOv8Shirt and YOLOv8Pants, respectively, detect and predict the types of shirts and pants. After the shirt is checked, another YOLOv8-based model detects and verifies the student ID card worn in front of the student’s chest. The system sends out a notification signal if it detects that the student does not follow the university’s requirements.
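The identification step, returning the most similar face in the database, can be illustrated as a nearest-neighbor search over face embeddings. The sketch below is a minimal example under assumptions that are not stated in the paper: embeddings are compared by cosine similarity (the usual choice for ArcFace-style embeddings), and a hypothetical `threshold` rejects matches that are too weak. The names `cosine_similarity` and `identify` are illustrative only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify(query, gallery, threshold=0.5):
    """Return (name, score) for the gallery embedding most similar to `query`.

    `gallery` maps student names to enrolled embeddings. The threshold is an
    assumption: if the best score falls below it, the name is None (unknown).
    """
    best_name, best_score = None, -1.0
    for name, embedding in gallery.items():
        score = cosine_similarity(query, embedding)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score
    return best_name, best_score
```

Without a rejection threshold, every face would be assigned to some enrolled student, which is one way an attendance record could end up under the wrong name.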

Fig. 2. YOLOv8-based models for monitoring attendance and checking uniforms.
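The decision logic that follows the detections can be sketched as a simple rule check. The example below is a hedged illustration, not the authors' implementation: the allowed labels are taken from the class names used for the datasets, while the function names, the treatment of a missing detection as a violation, and the use of the shirt box as a proxy for the chest region are all assumptions.

```python
# Labels treated as compliant, following the regulation described in the paper.
ALLOWED_SHIRTS = {"sleeved shirt"}
ALLOWED_PANTS = {"long pants"}


def box_center_inside(inner, outer):
    """True if the center of `inner` lies inside `outer`; boxes are (x1, y1, x2, y2)."""
    cx = (inner[0] + inner[2]) / 2
    cy = (inner[1] + inner[3]) / 2
    return outer[0] <= cx <= outer[2] and outer[1] <= cy <= outer[3]


def check_uniform(shirt, pants, card_box):
    """Return the list of violated requirements for one student.

    `shirt` and `pants` are (label, box) tuples or None when nothing was
    detected; `card_box` is a box or None. A missing detection is counted
    as a violation (assumption), and the card must sit within the shirt
    box, used here as a stand-in for the chest region (assumption).
    """
    violations = []
    if shirt is None or shirt[0] not in ALLOWED_SHIRTS:
        violations.append("shirt")
    if pants is None or pants[0] not in ALLOWED_PANTS:
        violations.append("pants")
    if card_box is None or shirt is None or not box_center_inside(card_box, shirt[1]):
        violations.append("student ID card")
    return violations
```

A non-empty result would trigger the notification signal; an empty list means the student meets all three checked requirements.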

4 Data Collection

Our datasets comprise images taken by ourselves, extracted from the Internet, and provided by Roboflow Universe as follows:

  • Attendance dataset: has 80 images of 11 volunteers, who are students at the College of Information and Communication Technology (CICT) of CTU.

  • Shirt and pants datasets: obtained from Roboflow (footnote 2) and supplemented with images from the Internet and our own photos. The shirt dataset consists of 517 images, including 96 images of jackets and vests, 69 images of sleeved shirts, 97 images of sleeveless shirts, 168 images extracted from the Internet, and 87 images taken by ourselves. The pants dataset has 297 images, consisting of 48 images of short pants, 92 images of long pants, 27 images of pantalones, 121 images from the Internet, and 9 images gathered manually.

  • Student card dataset: comprises 155 images collected manually from the ID cards of students at CICT. We observe that the student cards are rectangular, and can be worn horizontally or vertically. This dataset includes 42 horizontal card images, 106 vertical card images, and 7 unspecified-dimension card images.
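As a quick consistency check, the per-source counts listed above can be summed and compared with the reported dataset sizes. The dictionary keys below are shorthand labels chosen for this sketch; the numbers are those given in the list.

```python
# Image counts per source, as listed for each dataset.
SHIRT = {"jackets and vests": 96, "sleeved shirts": 69, "sleeveless shirts": 97,
         "Internet": 168, "own photos": 87}
PANTS = {"short pants": 48, "long pants": 92, "pantalones": 27,
         "Internet": 121, "gathered manually": 9}
CARDS = {"horizontal": 42, "vertical": 106, "unspecified": 7}


def total(parts):
    """Sum the image counts of one dataset."""
    return sum(parts.values())


for name, parts, reported in (("shirt", SHIRT, 517),
                              ("pants", PANTS, 297),
                              ("student card", CARDS, 155)):
    status = "matches" if total(parts) == reported else "differs from"
    print(f"{name}: {total(parts)} images, {status} the reported {reported}")
```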

The Make Sense toolkit (footnote 3) is used to label images, and a Roboflow project is used to manage and augment them. We apply several augmentation techniques, including horizontal flipping, 12-degree rotation, brightness adjustment, and Gaussian blur. After augmentation, the total number of images used for the experiments is 1,343 images of shirts (labeled “sleeved shirt” and “sleeveless shirt”), 796 images of pants (labeled “long pants” and “short pants”), and 395 images of student cards.
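Geometric augmentations such as horizontal flipping change the bounding-box labels as well as the pixels. Annotation tools normally apply this adjustment automatically, so the sketch below only illustrates what happens to a label in the YOLO text format (`class x_center y_center width height`, all normalized to [0, 1]); the function name is hypothetical.

```python
def flip_label_horizontally(line):
    """Mirror one YOLO-format label line around the vertical image axis.

    Only the normalized x-center changes (x -> 1 - x); the class, y-center,
    width, and height are unaffected by a horizontal flip.
    """
    cls, xc, yc, w, h = line.split()
    flipped_xc = 1.0 - float(xc)
    return f"{cls} {flipped_xc:.6f} {float(yc):.6f} {float(w):.6f} {float(h):.6f}"
```

Rotation requires recomputing the enclosing box from the rotated corners, while brightness changes and blur leave the labels untouched.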

5 Experimental Results

Our experiments with YOLOv8, provided by Ultralytics (footnote 4), are performed in the Google Colab environment with a T4 GPU. The models are trained for 150 epochs with a batch size of 16. For other parameters, we use the values suggested by Ultralytics. Each dataset is divided into three portions: 75% for training, 20% for validation, and 5% for testing.
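The split and the training configuration can be sketched as follows. The split helper is plain Python; the training helper uses the Ultralytics `YOLO` class with the epoch count and batch size given above. The starting weights `yolov8n.pt`, the dataset YAML path, the random seed, and both function names are assumptions, since the paper states neither the model size nor how the split was drawn.

```python
import random


def split_dataset(items, train=0.75, val=0.20, seed=0):
    """Shuffle `items` and split them into train/validation/test lists.

    The test portion receives whatever remains after the train and
    validation portions (5% with the default ratios).
    """
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = round(len(items) * train)
    n_val = round(len(items) * val)
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])


def train_uniform_model(data_yaml, weights="yolov8n.pt"):
    """Fine-tune a pre-trained YOLOv8 model (requires `pip install ultralytics`).

    `data_yaml` is the path to a dataset description file; the default
    weights file is an assumed model size, not taken from the paper.
    """
    from ultralytics import YOLO  # imported lazily so the split runs without it

    model = YOLO(weights)
    return model.train(data=data_yaml, epochs=150, batch=16)
```

For a set of 1,343 items, `split_dataset` yields 1,007 training, 269 validation, and 67 test items.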

We perform experiments in the classrooms at CICT. The application is deployed on a computer located inside a classroom, opposite the classroom’s main entrance. There must be no obstacles between the camera and the door so that the camera can properly identify the objects. The Asus ROG Eye S (footnote 5) is used as an external camera for monitoring and capturing student images. This webcam is cost-effective, has a compact design, and provides relatively good image quality at 60 frames per second.

The school uniform checking system consists of a shirt checking model, a pants checking model, and a student ID card checking model. We evaluate each model separately, as presented in Table 1. The mAP50-95 values of the shirt checking model, pants checking model, and student ID card checking model are 0.902, 0.859, and 0.687, respectively. Figure 3 shows the confusion matrices for predicting shirts, pants, and student ID cards.

Table 1. Experimental results of the school uniform checking model.

Fig. 3. Confusion matrices for predicting shirts (left), pants (middle), and student ID cards (right).
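For readers unfamiliar with the mAP50-95 metric reported above, the sketch below shows its two building blocks: the intersection over union (IoU) between a predicted and a ground-truth box, and the averaging over the ten IoU thresholds from 0.50 to 0.95 in steps of 0.05. The per-threshold mAP values are assumed to be computed elsewhere (for example by the training framework); the function names are illustrative.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# The ten IoU thresholds 0.50, 0.55, ..., 0.95 used by mAP50-95.
IOU_THRESHOLDS = [0.50 + 0.05 * i for i in range(10)]


def map50_95(map_per_threshold):
    """Average ten mAP values, one per IoU threshold, into mAP50-95."""
    if len(map_per_threshold) != len(IOU_THRESHOLDS):
        raise ValueError("expected one mAP value per IoU threshold")
    return sum(map_per_threshold) / len(map_per_threshold)
```

Because the stricter thresholds demand tighter localization, small objects such as ID cards tend to score lower on this metric than large ones such as shirts.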

When the distance between the camera and the objects is between 1.3 m and 2.9 m, the models detect and identify the objects correctly. The models perform best when this distance is between 1.8 m and 1.9 m, and they become less accurate when it is less than 1.3 m or greater than 3.0 m.

We evaluate the school uniform checking model in real-life conditions under two lighting scenarios. In the first, experiments are performed with both natural (sunlight) and LED (classroom) light. Because students occasionally enter the classroom without turning on the lights, the second scenario uses only natural light. Our application performs more accurately in well-lit conditions. In particular, the facial recognition model achieves an accuracy of 85% in optimal lighting conditions. Without LED lighting and under cloudy conditions, this model’s accuracy drops to 20%. Table 2 presents the accuracies of the models in different lighting conditions. Figure 4 illustrates the impact of lighting on our application’s accuracy.

Table 2. The accuracies of models in different light conditions.

Fig. 4. Example of the light’s effect on the application: from left to right, the brightness decreases, causing the accuracies of the models to go down.

Currently, the model cannot distinguish between real and fake student ID cards. In other words, if students are wearing cards that resemble the shape of student ID cards, the system predicts that they are wearing student ID cards. We will enhance the model to recognize student ID cards and extract the information on them in real time. In addition, students are not permitted to wear short pants, but may wear skirts. Our model currently does not differentiate between short pants and skirts. For future work, we may use a skirt dataset [4] to help the model learn an additional type of clothing allowed at CTU. Moreover, when a student wears a mask or raises or lowers their head, the facial recognition model cannot identify the student correctly. When a group of students enters the class, the model performs very well if no one is obscured; otherwise, the obscured objects are not detected correctly. Figure 5 illustrates some examples in which the models do not perform well. After the attendance checking process is complete, the monitor can export the .csv file containing the attendance information, as shown in Fig. 6.

Fig. 5. Some situations in which the models do not perform well: a parking ticket can be recognized as a student ID card (left side); attendance can be checked with a wrong name due to the student’s lowered head (middle); the student’s uniform in the middle cannot be recognized because it is covered (right side).

Fig. 6. A spreadsheet for checking the attendance and school uniforms of students.
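One possible use of the exported attendance data is to check the rule stated in the introduction, under which a student absent for more than 20% of the lecture hours may not take the final exam. The sketch below is hypothetical: the function name and the way absent hours are counted are assumptions, and the paper does not describe such a feature.

```python
def may_take_final_exam(absent_hours, total_hours, max_absence_ratio=0.20):
    """Return True if the absence ratio does not exceed the allowed maximum.

    A student absent for exactly 20% of the lecture hours is still
    eligible; only "more than 20%" disqualifies.
    """
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    return absent_hours / total_hours <= max_absence_ratio
```

For a 45-hour course, a student may therefore miss at most 9 lecture hours.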

6 Conclusion

This paper details a system for automatically checking the attendance and school uniforms of students using YOLOv8. The main goal of this system is to eliminate the disadvantages of manual attendance checking, which is tedious and time-consuming. The system performs fairly well in optimal lighting conditions. Our model is simple, accurate, and deployable at other institutions. More work is required to enable the system to work in less favorable lighting conditions.