1 Introduction

Many countries are paying increasing attention to STEAM education, which reflects the importance they attach to interdisciplinary courses. Researchers, educators, governments, and industrial and commercial organizations emphasize that STEAM disciplines and the related skills cultivate creativity and innovation, which in turn positively impact the future society and economy [1,2,3]. STEAM education aims to cultivate the ability to integrate knowledge in science, technology, engineering, arts, and mathematics. Learners use what they have learned in these fields to identify and solve problems that cannot be solved by a single-disciplinary approach [4].

STEAM education is not limited to integrating multiple disciplines; it also needs to turn cross-field knowledge into problem-solving capabilities for various problems in life [5]. The general teaching strategy is project-based learning, which mainly involves a specific course or activity built on the concepts of constructivism, contextual learning theory, and cognitive psychology [6]. Influenced by the idea of learning by doing, STEAM education centered on hands-on learning has gradually been disseminated and promoted [7]. Hands-on practice in STEAM can cultivate learners’ core competence in engineering through the process of producing a product, and can even serve as a bridge between theory and reality for learners [8, 9]. Moreover, compared to traditional teacher-centered teaching methods, hands-on learning can stimulate learners’ creativity and independent learning ability [10].

On the other hand, scales or questionnaires are commonly used for quantitative analysis of learners’ learning performance [11,12,13,14,15], but this method is prone to bias from respondents’ self-awareness [16]. Therefore, this pilot study uses object detection to identify various materials and tools in videos recorded of learners during STEAM hands-on activities. In this way, the study can serve as a basic prototype for computer vision applications in STEAM education. Its ultimate goal is to identify and judge learners’ participation in STEAM activities.

2 Related Work

2.1 STEAM Education

With the launch of the first artificial satellite by the Soviet Union, countries began to realize the importance of integrating science, mathematics, and technology; they thus sought to keep up with the development of technology and engineering to avoid military threats from other countries [17]. Furthermore, to maintain their leading positions in the rapidly changing and expanding global economy, the leaders of various countries began to regard industrial innovation and production as a necessary condition for economic progress and national competitiveness. A series of educational reform projects for science, mathematics, and technology were therefore proposed [18, 19].

The most representative is STEM education, proposed by the National Science Foundation (NSF) in the 1990s. The term “STEM” is an abbreviation of Science, Technology, Engineering, and Math; it indicates that education is no longer a model of teaching independent subjects but moves towards integrating multiple subjects through the combination of science and engineering [5]. STEM integrates basic science and mathematics knowledge into engineering and technology and extends it to other fields [20, 21].

On the other hand, Yakman [22] indicated that most STEM education regards the various disciplines as independent fields without achieving proper integration. Earlier STEM education also emphasized the knowledge and skills of multiple disciplines, while culture and humanities were often overlooked in the process from concept to implementation [23]. Therefore, Yakman [22] proposed integrating the arts into STEM to create a STEAM framework for teaching across disciplines (see Fig. 1). The STEAM framework establishes more effective interdisciplinary associations and improves learning within and between disciplines [24,25,26].

Fig. 1. A framework for STEAM education [22].

2.2 STEAM Hands-on Learning

Hands-on learning originated from the early garage culture in the United States. In the hands-on process, learners must use their senses and make products by themselves, and then test what they have learned [27]. Learners form abstract concepts or theories from the experience of trial, knowledge, and thinking, and achieve reflective thinking through analysis and induction [28]. Holstermann, Grube, and Bögeholz [29] reported that learners with hands-on experience showed higher interest and learning motivation than others. Hands-on learning also has a significant positive impact on learning difficulties [30, 31].

The core of STEAM education is to cultivate learners who use cross-domain knowledge to solve different real-world challenges. Meanwhile, challenging and interesting learning situations ignite curiosity and the desire to explore and find specific solutions [32]. However, the traditional teacher-centered approach is ineffective for particular topics in STEM or even STEAM education [33,34,35]. Many studies have thus integrated hands-on learning into STEAM education, trying to find the most suitable teaching method [7, 36].

3 Methodology

3.1 Participants

In this study, the Micro:bit Obstacles Avoidance Car training workshop was held as an experimental activity. A total of 28 students (17 boys and 11 girls) were recruited from elementary schools in southern Taiwan. All participants were asked to engage in a series of STEAM hands-on learning activities, namely the obstacles avoidance car tasks. Because three students could not complete the experimental procedure, their data were excluded, leaving 25 students. These 25 students were then assigned to homogeneous groups of three persons according to their past achievement scores; that is, students with similar scores were grouped together to ensure better interaction in completing every task. The detailed grouping information is shown in Table 1.

Table 1. The details of grouping in this study.

3.2 STEAM Activity

In this experiment, the Micro:bit obstacles avoidance car activity was designed to progress from simple to more advanced tasks, guiding learners to gradually understand the operation and concept of the obstacles avoidance car and to acquire knowledge related to micro:bit and MakeCode programming. The details of the task design and arrangement are shown in Table 2.

Table 2. The task of Micro:bit obstacle avoidance car in STEAM activity.

3.3 System Design

Given the excellent detection speed and accuracy of the YOLOv4 model [37] in the field of object detection, this study uses YOLOv4 to identify objects commonly used in STEAM hands-on activities. The videos collected from the STEAM hands-on activities were used as training data, and seven object classes were marked: human, keyboard, mouse, tablet PC, obstacles avoidance car, mobile phone, and pen. The marking tool, LabelImg, is open-source marking software developed in Python [38]. In LabelImg, each target object is marked by frame selection; the operation process is shown in Fig. 2. The final output for each object in an image is a bounding box, denoted BBox_i, which contains the category number Class_i, the X and Y coordinates of the center point (X_i and Y_i, expressed relative to the image size), the width of the bounding box W_i, and the height H_i, for a total of five values in YOLO format (see Fig. 2).

Fig. 2. The marking process of LabelImg.
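The five-value YOLO label described above can be illustrated with a minimal sketch. It assumes the common YOLO convention in which the center coordinates, width, and height are divided by the image size; the function names, the class index, and the example box are hypothetical and are not taken from the study.

```python
def to_yolo_line(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-space box into one YOLO-format label line.

    Output fields: Class_i, X_i, Y_i, W_i, H_i, with the last four
    normalized by the image width and height (assumed convention).
    """
    x_c = (x_min + x_max) / 2.0 / img_w
    y_c = (y_min + y_max) / 2.0 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{class_id} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}"


def from_yolo_line(line, img_w, img_h):
    """Convert a YOLO-format label line back into a pixel-space box."""
    class_id, x_c, y_c, w, h = line.split()
    x_c, y_c, w, h = float(x_c), float(y_c), float(w), float(h)
    x_min = (x_c - w / 2.0) * img_w
    y_min = (y_c - h / 2.0) * img_h
    x_max = (x_c + w / 2.0) * img_w
    y_max = (y_c + h / 2.0) * img_h
    return int(class_id), x_min, y_min, x_max, y_max


# Hypothetical example: class index 2, a 200 x 200 px box in a 1280 x 720 frame.
line = to_yolo_line(2, 100, 200, 300, 400, 1280, 720)
print(line)  # 2 0.156250 0.416667 0.156250 0.277778
```

Because the stored values are relative to the image size, the same label remains valid when the image is resized for a different network input resolution.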

In the proposed approach, the YOLOv4-Large model [39] is first trained on the originally labeled dataset, and the trained YOLOv4-Large is then used to train the original YOLOv4. In this way, the object features learned by YOLOv4-Large are passed on to the original YOLOv4, which keeps the same detection speed while improving the recognition accuracy of small objects without increasing the cost of marking time. Specifically, YOLOv4-Large uses a deeper network architecture and adjusts the input image size to 1280 × 1280, which allows it to capture the characteristics of more objects. As the input image size increases, the details of small objects can be recognized.
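The effect of the input size on small objects can be illustrated with a rough calculation. This is only a sketch: the source frame width, the object width, and the baseline input size of 608 are assumptions chosen for illustration (the text only reports the 1280 × 1280 input of YOLOv4-Large), and the resize is treated as a simple proportional scaling.

```python
def size_at_network_input(object_px, frame_px, input_px):
    """Approximate object size in pixels after the frame is resized to the network input."""
    return object_px * input_px / frame_px


FRAME_WIDTH = 1920    # assumed source video width (not stated in the text)
OBJECT_WIDTH = 30     # assumed on-screen width of a small object such as a pen
BASELINE_INPUT = 608  # assumed baseline input size (not stated in the text)
LARGE_INPUT = 1280    # input size reported for YOLOv4-Large

for input_px in (BASELINE_INPUT, LARGE_INPUT):
    px = size_at_network_input(OBJECT_WIDTH, FRAME_WIDTH, input_px)
    print(f"input {input_px}: object spans about {px:.1f} px")
```

Under these assumed numbers, the same object covers roughly twice as many pixels per side at the larger input, which is consistent with the argument that small objects become easier to recognize.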

4 Results

4.1 The Description of Datasets

This research uses YOLOv4 to identify the objects operated in the Micro:bit Obstacles Avoidance Car activities. The learning materials and tools include: 1) human; 2) keyboard; 3) mouse; 4) tablet; 5) car; 6) mobile phone; and 7) pen.

The dataset consists of 872 labeled images extracted from videos captured in the field; the images were divided into a training dataset of 785 images and a testing dataset of 87 images, a ratio of 9:1. Figure 3(a) shows the number of objects of each class in the training dataset; the distribution is relatively uneven, with mobile phones (class #1) and pens (class #5) being the least frequent. Figure 3(b) shows the distribution of the center points of the objects in the training dataset; the objects are evenly scattered throughout the frame. Figure 3(c) shows the aspect ratio distribution of the objects in the training dataset; the result indicates that most objects in the training dataset are small.

Fig. 3. The description of the training dataset.
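Statistics of the kind summarized in Fig. 3 can be derived directly from YOLO-format label lines. The following is a minimal sketch under stated assumptions: the sample labels are invented, and the 1% relative-area threshold used to call an object "small" is a hypothetical choice, not a definition from the study.

```python
from collections import Counter


def summarize_labels(label_lines):
    """Count objects per class and collect relative box areas (width * height)."""
    counts = Counter()
    areas = []
    for line in label_lines:
        parts = line.split()
        if len(parts) != 5:
            continue  # skip malformed lines
        counts[int(parts[0])] += 1
        areas.append(float(parts[3]) * float(parts[4]))
    return counts, areas


# Invented sample labels: class, x_center, y_center, width, height (all relative).
sample = [
    "0 0.50 0.50 0.40 0.80",
    "1 0.30 0.70 0.20 0.10",
    "6 0.60 0.65 0.02 0.05",
    "0 0.20 0.40 0.30 0.70",
]
counts, areas = summarize_labels(sample)

SMALL_AREA = 0.01  # hypothetical threshold: box covers under 1% of the image
small_share = sum(a < SMALL_AREA for a in areas) / len(areas)
print(dict(counts), small_share)
```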

4.2 The Results of YOLOv4 Object Detection

Because the training and testing images are taken from the same videos, images in the two datasets may be very similar, and the accuracy calculated from the testing dataset may then be too optimistic to evaluate the model reliably. Therefore, to ensure that the testing dataset is completely independent of the training dataset and that the accuracy evaluation is valid, random splitting was not used to separate the training and testing datasets. Instead, the images in the training dataset and the testing dataset come from different groups; for example, part of the training data may come from the 1st group, while part of the testing data comes from the 3rd group. This way of splitting the datasets avoids the problem of excessive similarity and further ensures the validity of the accuracy evaluation.
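The group-based split described above can be sketched as follows. The file names, group numbers, and the choice of group 3 as the held-out group are hypothetical; the point is only that every image from a given group lands on one side of the split, so near-duplicate frames from the same video never appear in both datasets.

```python
def split_by_group(frames, test_groups):
    """Split (image_name, group_id) pairs so that no group spans both datasets."""
    train, test = [], []
    for name, group in frames:
        if group in test_groups:
            test.append(name)
        else:
            train.append(name)
    return train, test


# Hypothetical frames extracted from the videos of three student groups.
frames = [
    ("g1_0001.jpg", 1),
    ("g1_0002.jpg", 1),
    ("g2_0001.jpg", 2),
    ("g3_0001.jpg", 3),
    ("g3_0002.jpg", 3),
]
train, test = split_by_group(frames, test_groups={3})
print(train)
print(test)
```

A purely random split over individual frames would, by contrast, tend to place consecutive frames of the same scene in both datasets and inflate the measured accuracy.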

After 20,000 epochs of training, the final average loss is 2.8, and the mAP (mean average precision) on the testing dataset is 88.7%. The accuracy for each type of object is: 1) human = 98.9%; 2) keyboard = 98.3%; 3) mouse = 98.1%; 4) tablet = 93.6%; 5) car = 93.5%; 6) phone = 62.7%; and 7) pen = 76.0%. The recognition result is shown in Fig. 4. Larger objects are recognized well, but small objects, such as pens and phones, are recognized relatively poorly and often go undetected.

Fig. 4. A sample of the result in YOLOv4 object detection.
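As a consistency check, the reported mAP can be reproduced from the per-class values listed above. This sketch assumes that the per-class "accuracy" figures are per-class average precision values and that mAP is their unweighted mean; the text does not state this explicitly, although the numbers agree.

```python
# Per-class values (%) reported for the original YOLOv4 on the testing dataset.
yolov4_ap = {
    "human": 98.9,
    "keyboard": 98.3,
    "mouse": 98.1,
    "tablet": 93.6,
    "car": 93.5,
    "phone": 62.7,
    "pen": 76.0,
}

# mAP as the unweighted mean over classes (assumed definition).
yolov4_map = sum(yolov4_ap.values()) / len(yolov4_ap)
print(round(yolov4_map, 1))  # 88.7
```

Because every class carries the same weight in this mean, the two weakest classes (phone and pen) pull the overall score down noticeably even though they are the least frequent objects.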

4.3 The Results of YOLOv4-Large Object Detection

In this study, the proposed approach first trains the YOLOv4-Large model on the originally labeled dataset. Notably, if YOLOv4-Large is used directly as the object detection model, the detection time is lengthened because the model is relatively complex. That is, the original 30 FPS (frames per second) is reduced to 10 FPS, which means that 20 fewer images are detected per second. This speed loss would affect the operation speed of the subsequent overall motion recognition system and cause a bottleneck; therefore, this study only uses YOLOv4-Large as an automatic labeling model. In other words, YOLOv4-Large automatically labels the remaining images, and the automatically labeled images are used as a new training dataset to retrain the original YOLOv4 model.
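The automatic labeling step can be sketched in outline. This is not the authors' implementation: `teacher_detect` is a hypothetical callable standing in for the trained YOLOv4-Large, its return format is assumed, and the confidence threshold is an added assumption that the text does not mention.

```python
def auto_label(image_names, teacher_detect, conf_threshold=0.5):
    """Generate YOLO-format labels for unlabeled images with a teacher model.

    teacher_detect(name) is assumed to return a list of tuples
    (class_id, x_center, y_center, width, height, confidence).
    """
    labels = {}
    for name in image_names:
        lines = []
        for class_id, x_c, y_c, w, h, conf in teacher_detect(name):
            if conf >= conf_threshold:  # keep only confident detections (assumption)
                lines.append(f"{class_id} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}")
        labels[name] = lines
    return labels


def fake_teacher(name):
    """Stand-in detector that returns fixed detections, for illustration only."""
    return [
        (6, 0.5, 0.5, 0.05, 0.02, 0.9),  # confident detection, kept
        (5, 0.2, 0.3, 0.03, 0.04, 0.3),  # low confidence, discarded
    ]


pseudo_labels = auto_label(["frame_0001.jpg"], fake_teacher)
print(pseudo_labels)
```

The resulting label lines would then be written next to the corresponding images and used as the new training dataset for the smaller, faster YOLOv4 model, so the slow model is needed only offline.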

Finally, after 20,000 epochs of training, the average loss is 2.2 and the mAP on the testing dataset is 96.1%. The accuracy for each type of object is: 1) human = 99.1%; 2) keyboard = 98.6%; 3) mouse = 99.3%; 4) tablet = 95.5%; 5) car = 98.3%; 6) phone = 87.4%; and 7) pen = 94.7%. The comparison of the different models for object detection in the STEAM hands-on activity is shown in Table 3.

Table 3. Comparison of different models on STEAM hands-on activity

5 Discussion and Conclusion

Overall, when the original YOLOv4 is used as the model to recognize objects in STEAM hands-on activities, the recognition effect is better for larger objects; however, due to the complexity of STEAM activities, there are often smaller objects that are not easy to identify. Therefore, this study proposed a solution that trains a small model (i.e., YOLOv4) with the help of a large model (i.e., YOLOv4-Large) to improve the recognition accuracy of small objects while maintaining the same detection speed and not increasing the cost of marking time.

Compared to the original YOLOv4, the findings of this study show that the mAP of the YOLOv4-Large approach increased by 7.4 percentage points. Moreover, the accuracy for smaller objects in STEAM activities improved: the mobile phone increased by 24.7 percentage points, and the pen increased by 18.7 percentage points. Based on this solution, objects of large and small sizes can be accurately identified in STEAM implementation activities to facilitate the implementation of advanced systems, such as human pose and behavior recognition.
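The reported improvements follow directly from the per-class results in Sections 4.2 and 4.3. The short calculation below uses only the values given in the text; the dictionary names are illustrative, and the gains are differences in percentage points.

```python
# Per-class results (%) reported in Section 4.2 (baseline) and Section 4.3 (proposed).
baseline = {"human": 98.9, "keyboard": 98.3, "mouse": 98.1, "tablet": 93.6,
            "car": 93.5, "phone": 62.7, "pen": 76.0}
proposed = {"human": 99.1, "keyboard": 98.6, "mouse": 99.3, "tablet": 95.5,
            "car": 98.3, "phone": 87.4, "pen": 94.7}

# Gain per class, in percentage points.
gains = {name: round(proposed[name] - baseline[name], 1) for name in baseline}

# Gain in the reported mAP values (88.7% and 96.1%).
map_gain = round(96.1 - 88.7, 1)

for name, gain in sorted(gains.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: +{gain}")
print(f"mAP: +{map_gain}")
```

The two smallest objects, phone and pen, show by far the largest gains, while the classes that were already above 98% change only marginally.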