Introduction

STEM (an acronym for Science, Technology, Engineering and Mathematics) has been applied to form educational policies, programs, or practices in one or more STEM disciplines (Bybee, 2010). Interest in STEM education has been growing all over the world, and the advances that emerge from STEM fields affect every part of everyday life. STEM education has the ability to prepare a workforce of inventors and innovators who will lead the economic prosperity of a country and thus improve the country’s economic ranking at the international level (Brown et al., 2011; Langdon et al., 2011).

However, implementing STEM education faces many challenges because STEM education is defined ambiguously and from multiple standpoints (Lin et al., 2019; Tam et al., 2020). A common view treats STEM education as an intervention that combines all four STEM disciplines (Çevik, 2018; Chonkaew et al., 2016). On the other hand, the integration of any two STEM disciplines within an authentic context would be adequate to be categorised as a STEM-based lesson (Kelley & Knowles, 2016). This idea is supported by another study, with the condition that one of the disciplines must be technology or engineering (Sanders, 2012).

The diversity of STEM education-based studies in the literature is the best indicator of the rapid developments that STEM education has experienced over the last decade. Evolutionary trends of STEM education can be observed from studies that investigated the effect of STEM education on students in different aspects. Recent STEM intervention has been performed by applying STEM practices (Kang et al., 2018; Selcen Guzey & Aranda, 2017) and STEM elements such as problem solving (Gülen, 2019; Lin et al., 2018), critical thinking (Ardianti et al., 2020) and inquiry ability (Chittum et al., 2017; Lin et al., 2019; Ong et al., 2016a, 2016b) on teachers and students. It was found that an ideal STEM practice should lay emphasis on the inclusion of technology and engineering (Brown et al., 2011; Bybee, 2010; Sanders, 2009a) in common science and mathematics learning.

Among the most noticeable recent trends is the usage of the term ‘integrated’ or ‘integrative’, which highlights combining multiple disciplines with STEM (English, 2016; Gardner & Tillotson, 2019; Thibaut et al., 2018; Thibaut et al., 2018). Integrated STEM can be performed by teaching and learning of multidisciplinary subjects and by overlapping and sequencing across engineering practices and engineering design of relevant technology as part of science and/or mathematics (Bybee, 2013; Johnson et al., 2016). Besides, the emergence of other concepts like STEAM and STREAM as an extension of STEM can also be noticed. The added letter ‘A’ represents the arts discipline and the letter ‘R’ represents the integration of either robotics, religion, or reading elements in STEM. However, there are still many unexplored STEM developments that can be found in diverse STEM education-based studies conducted to date. As identifying the STEM education trend is relatively challenging, a social network analysis (SNA) can be introduced and applied to visualise the interconnection between the relevant studies.

SNA is an analysis of a network between social entities, which is commonly modelled by graphs in which the vertices represent the entities and edges represent the connections between the entities (Tabassum et al., 2018). The entities in the network are shown as an intertwined mesh of connections (Scott, 1988). SNA shows the connection between the available research papers, where each paper acts as an entity or domain. The connection established helps researchers to understand the dependencies between the domains, characterise the emerging trends and the effect of the domain on the network (Tabassum et al., 2018).
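The graph model described above can be illustrated with a minimal sketch. It assumes a co-citation reading of the network, in which two references are connected whenever the same paper cites both of them. The record names and the helper functions build_cocitation_edges and degree are hypothetical and are not taken from CiteSpace or from the data of this study.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each citing paper maps to the references it cites.
citing_papers = {
    "paper_A": ["ref_1", "ref_2", "ref_3"],
    "paper_B": ["ref_1", "ref_2"],
    "paper_C": ["ref_2", "ref_4"],
}


def build_cocitation_edges(papers):
    """Return a Counter mapping a (ref_a, ref_b) pair to its co-citation count."""
    edges = Counter()
    for refs in papers.values():
        # Sort and de-duplicate so each unordered pair has one canonical key.
        for ref_a, ref_b in combinations(sorted(set(refs)), 2):
            edges[(ref_a, ref_b)] += 1
    return edges


def degree(edges):
    """Return the number of distinct neighbours of each vertex."""
    neighbours = {}
    for ref_a, ref_b in edges:
        neighbours.setdefault(ref_a, set()).add(ref_b)
        neighbours.setdefault(ref_b, set()).add(ref_a)
    return {vertex: len(linked) for vertex, linked in neighbours.items()}


edges = build_cocitation_edges(citing_papers)
degrees = degree(edges)
```

In this toy network the vertices are the cited references and the edge weights count how often two references are cited together, so a vertex with many neighbours is a candidate for an influential entity in the network.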

Besides, CiteSpace software has been developed by utilising the concept of SNA to provide a visual of emerging trends and patterns that illustrate the time–variant mapping from a paper to its intellectual origin (Chen, 2006). In order to achieve this, CiteSpace provides time-series snapshots of a domain before merging the snapshots (Chen, 2004). This open-source software can further enhance the functionality of SNA in identifying the current trends in STEM education and enriching the findings of systematic review studies.

The systematic review is a review of literature using a systematic method to summarise the evidence of a question/query in a detailed and comprehensive manner (Tawfik et al., 2019). In other words, a systematic review is an organised method of evaluating and outlining research studies that have met the specified requirements by adopting a clearly defined protocol. Bennett et al. (2005, p. 387) stated that:

Systematic reviews of educational research aim to answer specific review questions from published research reports by identifying relevant studies, characterizing such studies to form a systematic map of research in the area, extracting relevant data to establish the value of the findings, and synthesizing and reporting the outcomes.

A systematic review allows researchers to explore the status and trends in multiple disciplines (Li et al., 2020) in addition to minimising the bias during the review process (Hanley & Cutts, 2013). A systematic review assists researchers in identifying precise evidence on a query posed by narrowing down the relevant literature. Nevertheless, a systematic review alone is not able to visualise the interconnection between the review papers that specifically illustrate the current trends of STEM education research. Hence, this paper specifically aims to review the development of STEM education intervention for the secondary school level by using a series of procedures that included the use of SNA by CiteSpace and in-depth systematic analysis. In this study, “STEM education” was used as the keyword instead of the common stand-alone term to avoid confusion related to plants and stem cells in medicine as well as excluding the studies that related to STEM workforce supply. Similarly, this study defined STEM education as a learning intervention that involved any one of the STEM disciplines with the application of STEM practices and STEM skills such as teamwork, design thinking, and justification skills through inquiry group activities with real-world connection for the secondary level (Hafiz & Ayob, 2019). The findings obtained contribute significant insights into the STEM education trend with existing evidence base as reference for future STEM education research and development.

Background

STEM education

The STEM acronym was originally known as SMET, but it was later amended to STEM as a better term to demonstrate the disciplines of science and technology, which are the main support for engineering and mathematics (Sanders, 2009a, 2009b). The emphasis on STEM disciplines and the idea of integrating these subjects were stimulated when the USA, via the National Research Council (NRC), noticed the shrinking qualified workforce in STEM-associated industries (NRC, 2012). This decreasing trend also affected the fulfillment of the growing demand for STEM careers worldwide (Wang, 2013). Subsequently, the US government took action in establishing the idea of STEM learning at schools in order to cultivate interest towards science subjects and inspire youth in pursuing STEM careers.

STEM education aspires to develop a strong basis through the incorporation of a science, technology, engineering and mathematics curriculum framework that enables students to be innovative, promotes critical thinking and enhances the ability in problem-solving (Kennedy & Odell, 2014; Langdon et al., 2011). STEM learning promotes a systematic process through guided steps, resulting in interesting and comprehensible learning, especially with the application of educational technology. NRC (2012, p. 42) suggested that STEM learning should include eight steps, which are usually mentioned as part of science and engineering practices:

(1) Asking questions (for science) and defining problems (for engineering). (2) Developing and using models. (3) Planning and carrying out investigations. (4) Analysing and interpreting data. (5) Using mathematics and computational thinking. (6) Constructing explanations (for science) and designing solutions (for engineering). (7) Engaging in argument from evidence. (8) Obtaining, evaluating, and communicating information.

Participation in the proposed practices based on a constructivist approach enhances students’ comprehension of the crosscutting principles along with disciplinary ideas of science and engineering, inspiring and initiating students’ interest via a truly meaningful self-learning process (Chu et al., 2021; NRC, 2012).

According to Kolb (2014), learning takes place when learners combine theory and practice by interacting with real-life components. However, in the context of science learning, students often have difficulties in understanding abstract concepts as they fail to link conceptual knowledge to real-life application (Camarao & Nava, 2017; Hochberg et al., 2014). The lack of exposure to real-life learning experience indirectly affects college students’ training and career (Bishop, 2019). As a solution, STEM education should be highly promoted to significantly influence students’ approaches to everyday issues (Gülen, 2019).

Numerous advantages have been reported through the implementation of STEM education, which allow students to engage with current policy issues and cultivate the necessary skills for decision making in daily life (NRC, 2012). STEM education has been proven useful in improving students’ understanding, cultivating positive attitude, and increasing students’ participation and interest in STEM professions (Cwikla et al., 2014; Supalo et al., 2014; Wallace et al., 2014). Moreover, STEM education provides the opportunity and context for the practical use of science concepts (NRC, 2012), developing skills and thinking (Bybee, 2010; Gülen, 2019; Kubat & Guray, 2018), enhancing understanding and transfer of knowledge between subjects (Bell, 2015). Results have demonstrated that students who possess STEM skills will be better prepared to confront the challenges of the twenty-first century (Wan Husin et al., 2016). Apart from that, STEM education intends to increase the number of STEM graduates eligible for employment in the STEM industry. Implementation of STEM education also benefits STEM graduates more in terms of employment and career prospects compared to non-STEM graduates (Langdon et al., 2011).

Global recognition of the critical needs of STEM education has urged academics and educators around the world to respond to this ongoing demand. Several publications presented an overview of the approaches used for STEM education literature trend analysis (Brown, 2012; Ibáñez & Delgado-Kloos, 2018; Li et al., 2020; Thibaut et al., 2018). The study of Li et al. (2020) investigated STEM education research from 2000 to early 2018 and analysed 798 papers from different time periods, publications, and countries, dealing with different topics and research methodologies. The results revealed that STEM education research has grown significantly since 2010. A large number of recent publications also suggested that research on STEM education has received its own prominence as a trending and important subject. Apart from that, it was found that a vast majority of published STEM education research has been contributed from the USA, where the idea of STEM originated (Li et al., 2020; Margot & Kettler, 2019). A series of papers published between 2010 and 2018 were compiled as there had been limited studies offering detailed examples of educational effects and outcomes of technology such as Augmented Reality (AR) technology on the STEM fields (Ibáñez & Delgado-Kloos, 2018; Sirakaya & Alsancak Sirakaya, 2020). Both the above-cited reviews showed that AR implementation in STEM education happened at all ages, but it mainly focused on middle school students for determining the achievement level and affective factors such as motivation and attitude (Ibáñez & Delgado-Kloos, 2018; Sirakaya & Alsancak Sirakaya, 2020). However, studies exploring the gender aspect and practices among students in STEM education are still insufficient. In order to fill the research gap, this study examined empirical STEM education publications on the gender aspect and practices, in addition to the understanding and attitude, of junior and senior high school students.

In terms of research methodology, systematic review analysis of STEM education found that most of the empirical studies employed a quantitative approach (Ibáñez & Delgado-Kloos, 2018; Li et al., 2020; Sirakaya & Alsancak Sirakaya, 2020). Apart from that, it was found that STEM education research frequently focused on designing and describing a STEM program, event or activity rather than emphasising its impact (Brown, 2012). Therefore, this study focused on researching the impact of STEM education intervention and excluded those articles that solely focused on development.

In summary, only a few reviews paid attention to the current STEM education trends (Brown, 2012; Li et al., 2020), STEM education integration (Margot & Kettler, 2019; Thibaut et al., 2018) as well as AR teaching approaches in STEM learning (Ibáñez & Delgado-Kloos, 2018; Sirakaya & Alsancak Sirakaya, 2020). Part of these reviews focused on specific constraints on STEM education studies, while others analysed the overall development of STEM education. Thus, there was a need for a comprehensive yet in-depth study on the design and effect of STEM education intervention on secondary school students’ understanding, attitude, gender aspect and practices as well as visualising the growth of STEM education.

Systematic review with the support of CiteSpace software

Lately, CiteSpace software has gained attention as it is capable of linking and visualising the citation history and citation structure of publications in graphic mode (Chen, 2006; Chen et al., 2012). The integrated usage and assistance of CiteSpace software improves time productivity and complements the decision and comprehension criteria of a systematic review study. It offers evidence from reported publications that can lead and strengthen researchers' judgements and viewpoints. The CiteSpace software (http://cluster.ischool.drexel.edu/~cchen/citespace/) is free software able to accomplish multiple operations to optimise the interpretation and clarification of the chronology context and connections between previous research trends by ‘identifying the fast-growth topical areas, finding citation hotspots in the land of publications, decomposing a network into clusters, automatic labelling clusters with terms from citing articles, geospatial patterns of collaboration, and unique areas of international collaboration’ (Chen, 2004, para. 1). This user-friendly, open-source, Java-based software:

supports a unique type of co-citation network analysis – progressive network analysis – based on a time slicing strategy and then synthesizing a series of individual network snapshots defined on consecutive time slices [to identify] nodes that play critical roles in the evolution of a network [, which] are candidates of intellectual turning points. (Chen et al., 2010, p. 1393).

Systematic reviews or research review papers based on CiteSpace software analysis have been conducted in broad fields of studies ranging from medicine (Chen et al., 2012), social commerce (Cui et al., 2018) and urban transportation networks (Jia et al., 2019) to environmental science (Widziewicz-Rzońca & Tytła, 2020). Specifically, in the education field, recent studies investigated the chronological development of remote laboratories in science education (Tho et al., 2017), AR in education (Liu et al., 2018) and Massive Open Online Courses (Zheng et al., 2019). However, there is a lack of systematic reviews on STEM education, which plays a significant role in the modern world, that visualise the growth of STEM education.

Design

A systematic review of STEM education with growth visualisation, using the CiteSpace software together with the EPPI method, which provides a comprehensive coding procedure for identifying the main characteristics of studies (Tho et al., 2017; Bennett et al., 2005), was considered appropriate and worthwhile for this study. It was also found that there are significant advantages linked to STEM education that inform further research and development work on infusing STEM in science classrooms. Several essential questions remained unanswered and were explored:

  • What is the growth of STEM education for secondary school students?

  • What is the design of STEM education intervention for secondary school students?

  • Does learning through STEM education help secondary students understand better?

  • Does learning through STEM education enhance secondary students’ attitudes towards this new mode of learning?

  • Are there any gender differences in learning through STEM education for secondary school students?

  • Does learning through STEM education develop secondary students’ practices and processes for this new mode of learning?

The following sections begin by considering the methods and design steps on how to review previous research in STEM education. The evidence of the study was then reviewed and analysed. Finally, we reported on the findings and the scope of future work.

Methodology

This study analysed previous STEM education research studies found in the Social Science category of the SCOPUS database (https://www.scopus.com/). The analysis was divided into two main parts that complement each other. First, the STEM education studies were identified and isolated through the CiteSpace analysis. Second, a number of articles were selected based on the criteria and the adapted Evidence for Policy and Practice Information and Co-ordinating (EPPI, 2007) guide for in-depth document analysis to identify trends, design principles and areas of further research. CiteSpace provides a general visualisation and trend on STEM education, which offers an initial idea for further analysis using EPPI.

Part 1—CiteSpace analysis procedures

The identified STEM education research studies from the SCOPUS database were imported into CiteSpace for detecting citation bursts and tracking the evolution of the STEM education field in two complementary visualisation views: the cluster view and the time-zone view. The burst terms were retrieved from titles, abstracts, descriptors, identifiers of bibliographic collections, and the frequency of the term bursts over time (Chen, 2006). The CiteSpace software utilised the burst terms as labels of clusters and identified other important highly cited articles that were not listed in the SCOPUS database through the cited references or bibliographic collections (Tho et al., 2017; Chen, 2006).

The objective of this study included searching for the important STEM education articles that were most cited for analysis. Figure 1 shows the setup for CiteSpace analysis, where the time slicing was set to divide a timespan from 2000 to 2020 into a series of smaller windows. The CiteSpace software enables users to generate several sorts of networks by choosing a single node type or multiple concurrent node types. The selected node types included keywords and references of articles after pruning the sliced network and the merged network. Based on the findings, 45,540 valid references were detected from 1093 database records between 2004 and 2020. First, all the anonymous authors were removed from the list as the unidentified information would not contribute to current analysis and these missing data might repeat the same year of publication with the unidentified information set by publishers.
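The time slicing step mentioned above can be sketched as follows. This is a minimal illustration only: the slice length of five years, the record identifiers and the helper names time_slices and assign_to_slices are assumptions made for the example and do not reproduce the settings of this study or the internals of CiteSpace.

```python
def time_slices(start_year, end_year, years_per_slice=1):
    """Split [start_year, end_year] into consecutive inclusive windows."""
    if years_per_slice < 1:
        raise ValueError("years_per_slice must be at least 1")
    slices = []
    year = start_year
    while year <= end_year:
        last = min(year + years_per_slice - 1, end_year)
        slices.append((year, last))
        year = last + 1
    return slices


def assign_to_slices(records, slices):
    """Group (record_id, publication_year) pairs by the slice containing the year."""
    grouped = {window: [] for window in slices}
    for record_id, year in records:
        for first, last in slices:
            if first <= year <= last:
                grouped[(first, last)].append(record_id)
                break
    return grouped


# Hypothetical records; the slice length is an assumption, not the study's setting.
records = [("rec_1", 2004), ("rec_2", 2011), ("rec_3", 2020)]
slices = time_slices(2000, 2020, years_per_slice=5)
grouped = assign_to_slices(records, slices)
```

Each window can then be analysed as a separate network snapshot before the snapshots are merged, which is the idea behind the progressive network analysis described by Chen et al. (2010).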

Fig. 1 Set-up for CiteSpace analysis

Toggle legend colour and background colour were adjusted for clearer visualisation. These data were further assigned to 85 clusters identified, but due to the software restriction, only 16 keywords of major clusters were listed under cluster visualisation and timeline visualisation in Figs. 3 and 4. The key terms were extremely significant for recognising the emerging trend of recent active STEM education research as well as the guidelines for future studies.

The strongest citation burst in terms of the strength and burst duration was analysed to explore the highly cited authors. A narrative summary was created to explore the largest cluster in STEM education based on the size, silhouette value, mean year and label produced by the term frequency-inverse document frequency (TFIDF), log-likelihood ratio (LLR) and mutual information (MI) algorithms. In this study, the default LLR algorithm was chosen as the calculation technique for obtaining clustering outcomes. The likelihood ratio or its logarithm was applied to compute the maximum likelihood according to the probability density function in extracting the presumable cluster label.
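As an illustration of how a log-likelihood ratio can rank candidate cluster labels, the sketch below computes the G-squared statistic for a 2x2 contingency table of term occurrence inside and outside a cluster. This is one common formulation of the LLR and is offered as an assumption; the exact computation used inside CiteSpace is not described in this paper, and the term names and counts are hypothetical.

```python
import math


def log_likelihood_ratio(k11, k12, k21, k22):
    """G-squared statistic for a 2x2 contingency table.

    k11: documents in the cluster that contain the term
    k12: documents in the cluster that do not contain the term
    k21: documents outside the cluster that contain the term
    k22: documents outside the cluster that do not contain the term
    """
    total = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    observed = ((k11, k12), (k21, k22))
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            count = observed[i][j]
            if count > 0:  # an empty cell contributes nothing to the sum
                expected = rows[i] * cols[j] / total
                g2 += count * math.log(count / expected)
    return 2.0 * g2


# Hypothetical counts for two candidate labels of one cluster.
candidates = {
    "term_a": (10, 0, 0, 10),   # occurs only inside the cluster
    "term_b": (10, 10, 10, 10),  # occurs equally inside and outside
}
scores = {term: log_likelihood_ratio(*table) for term, table in candidates.items()}
best_label = max(scores, key=scores.get)
```

A term that is spread evenly across the collection scores zero, whereas a term concentrated in one cluster scores highly, which is why such a term is a plausible label for that cluster.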

Part 2—In-depth analysis of selected articles

The SCOPUS database was accessed on 16 December 2020 with the permission of the institution to search for the articles meeting the criteria of this study. The search result was refined to articles published in English in the field of social science. The exported data were then filtered by the research title followed by abstracting based on the research questions and criteria during the screening process. The articles chosen were examined by the other researchers to ensure fulfillment of the review criteria.

This section of the analysis focused on a total of 38 in-depth review research papers based on two elements of the adapted EPPI (2007) guide, specifically reporting standards and quality of the study. Each of the studies was subjected to double evaluation by five researchers, which included judgements on further classification and identification based on all the criteria established for this review. As a consequence, the review process can be said to be accurate, reliable and transparent, which could be further reinforced and modified. The overall goal of the systematic review for STEM education was to seek evidence on whether STEM education contributes to the development and improvement of understanding, attitudes, gender aspect and practices in ways beneficial for secondary school students. Studies included in the review met the following criteria:

  • Their principal focus is the effects of STEM education intervention on design, understanding, attitudes, gender aspect and practices.

  • They report educational evaluations of student’s learning in STEM education, not only development.

  • They have been published in English language journals during the period 2004–2020.

  • Level of the sample is secondary school students.

These review phases facilitated the analysis, which included identifying research gaps and combining ideas of different research topics. Furthermore, these phases helped to reduce research bias, for instance, the influence of the results reported in study abstracts on the review procedure. Where the nature of the review process involved a clear or detailed review of the methodology and results, researchers used this review format to summarise different constructs of evaluation, which should be very useful in research discussion, conclusions and suggestions.

Analysis and findings

SCOPUS database

The educational research studies were found from the large pool of STEM education studies available in the SCOPUS database. Figure 2 demonstrates 1093 social science articles in English language at the final publication stage that resulted from the SCOPUS database search between 2004 and 2020.

Fig. 2 SCOPUS database search result for the topic STEM education

CiteSpace analysis

The narrative overview was created for the purpose of retrieving information using the key terminology. The network was distributed into 18 co-citation clusters. The automatically selected cluster labels of the seven largest clusters were displayed along with their size, identity number and silhouette value in brackets. The size of a cluster corresponded to the number of articles within the cluster, and the mean year was taken from the period in which the cluster was updated. The silhouette value, ranging from –1 to 1, was used to measure the homogeneity of a cluster; a value of 1 indicates perfect isolation from other clusters, where no single article is grouped into two or more clusters (Rousseeuw, 1987; Chen et al., 2010). Furthermore, Chen et al. (2010) stated that ‘cluster labelling or other aggregation tasks will become more straightforward for clusters with the silhouette value in the range of 0.7 ~ 0.9 or higher’ (p. 1391). The largest seven clusters identified were engineering education (#0, 0.623) with 50 papers followed by social support (#1, 0.748) with 51 articles, interdisciplinary STEM education (#2, 0.7) with 31 articles, gender difference (#3, 0.797) with 38 articles, STEM discipline (#4, 0.852) with 29 articles, engineering design (#5, 0.84) with 31 articles and kinetic friction coefficient (#8, 0.853) with 26 articles.
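The silhouette value can be made concrete with a short sketch following the definition of Rousseeuw (1987): for each item, a is the mean distance to the other members of its own cluster, b is the smallest mean distance to the members of any other cluster, and the silhouette is (b - a) / max(a, b). The one-dimensional points, the cluster labels and the function names below are hypothetical; how CiteSpace derives distances between articles is not covered here.

```python
def silhouette_values(points, labels, distance):
    """Return the silhouette s(i) = (b - a) / max(a, b) of every point."""
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            scores.append(0.0)  # convention for a cluster with a single member
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(distance(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of another cluster
        b = min(
            sum(distance(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denominator = max(a, b)
        scores.append((b - a) / denominator if denominator > 0 else 0.0)
    return scores


def cluster_silhouette(scores, labels, label):
    """Return the mean silhouette of the members of one cluster."""
    members = [s for s, current in zip(scores, labels) if current == label]
    return sum(members) / len(members)


# Hypothetical one-dimensional data forming two well separated clusters.
points = [0.0, 1.0, 10.0, 11.0]
labels = ["c0", "c0", "c1", "c1"]
scores = silhouette_values(points, labels, lambda x, y: abs(x - y))
mean_c0 = cluster_silhouette(scores, labels, "c0")
```

Because the two toy clusters are far apart relative to their internal spread, every item scores close to 1, which corresponds to the well-isolated, homogeneous clusters described above.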

According to the cluster visualisation shown in Fig. 3, engineering education (cluster #0), social support (cluster #1), interdisciplinary STEM education (cluster #2), gender difference (cluster #3) and STEM discipline (cluster #4) were actively connected, whereas engineering design (cluster #5) and kinetic friction coefficient (cluster #8) have a weaker connection with the main cluster. The timeline visualisation shown in Fig. 4 establishes the crucial evolution of STEM education research by showing the clusters between 2000 and 2020. The clusters of engineering education and engineering design started to gain attention in early 2004, followed by STEM studies on social support and gender differences, which began in 2007. Subsequently, studies in interdisciplinary STEM education and STEM disciplines became popular starting in 2010. All these six major themes were still active in current STEM research. The earliest cluster, which emerged in the STEM field in early 2001, was college student pathway, followed by the clusters of state-supported residential academies and prediction of STEM enrolment. However, these clusters have received less attention in recent years.

Fig. 3 Terms generated from 2000 to 2020 in cluster visualisation

Fig. 4 Terms generated from 2000 to 2020 in timeline visualisation

Another significant highlight of this study is an account of the respective authors with strong citation bursts over the past 20 years. A citation burst occurs when a study receives a large number of citations over a particular time frame. Figure 5 shows the top 10 papers with the strongest citation bursts in terms of the strength value and burst period. The analysis revealed that the highly cited authors in the STEM education field according to the strength value were Maltese and Tai (2011); Bybee (2010); Langdon et al. (2011); Wang (2013); NRC (2012); Crisp et al. (2009); Brown et al. (2011); Carlone and Johnson (2007). The selected articles published in recent years showed citation burst strength values ranging from 3.34 to 5.12. The study of Maltese and Tai (2011), with the highest strength value (5.12), focused on the association between students’ educational experiences and the pursuit of a degree in STEM, and gained the attention of the research community during the period 2015 to 2017. Besides, it can be observed that the majority of the articles experienced citation bursts only several years after publication, which is consistent with the study by Li et al. (2020) stating that STEM education research developed significantly after 2010.

Fig. 5 Top 10 keywords with strongest citation bursts

Systematic review through reporting studies based on adapted EPPI

Overview of relevant studies

The 38 research articles identified through the screening procedures were systematically reviewed using the adapted EPPI (2007) criteria for reporting standards and quality of study. The reporting details are summarised in Table 1. Out of the 38 articles that fulfilled all the criteria, there were 20 studies designed for STEM intervention, 16 studies reported on understanding and 30 studies reported on attitudes. There were only four studies that investigated the gender aspect, while 11 studies examined the skills and science practices. The outline of these articles and the evidence presented therein are discussed in the following sections.

Table 1 Reporting details on evaluation of the 38 studies included

The countries of origin for the data in the articles can be classified into three geographical regions: 21 studies in North America, 16 studies in Asia and one study in Europe. It can be seen that the largest number of STEM studies were conducted in the USA, which corresponded to the outcomes of Li et al. (2020). More than 50% of the studies were funded by foundations that supported the STEM programme to completion. Nearly 77% of the studies involved small-scale samples of fewer than 100 participants. The limited sample size might reduce the power of the study, and the data cannot be generalised to the population. The small sample size in most of the studies might be due to insufficient funding, fewer coordinators and time constraints in guiding the participants in hands-on group activities or field trips. However, as some studies involved a qualitative design, only a small number of participants were required to focus on in-depth exploration of the STEM intervention.

Among all the STEM studies that were conducted at the secondary level, there were nine articles related to middle school students and 27 studies related to high school students. Out of the 27 studies, 10 primarily focused on upper secondary or senior (11th grade to 12th grade) students. Only one study incorporated both middle and high school students (Hotaling et al., 2012), whereas another one involved high school and college students (Supalo et al., 2014). Nonetheless, there was a confusion in the study of Outlay et al. (2017), where the camp designed was exclusively for middle school girls entering the 6th, 7th and 8th grades, but the sample for data collection only involved 6th and 7th grade students.

Based on the research methodology, there were 17 studies that applied experimental designs, while 21 studies used non-experimental designs. Furthermore, 19 studies implemented mixed methods, 13 used quantitative approaches and six used qualitative approaches for data collection and analysis. Only 17 out of 38 studies obtained ethics board consent and permission from the authorised community for conducting the studies, where the participation was absolutely voluntary. Nearly 58% of studies did not mention the pilot test, which is crucial for researchers to foresee the challenges and problems that might be addressed. Nevertheless, the findings about effect size were reported in only four studies (Çevik, 2018; Chang et al., 2015; Parno et al., 2020; Wallace et al., 2014).

Most of the research studies published on STEM fields with the total of eight studies (Cantrell & Ewing-Taylor, 2009; Chacko et al., 2015; Keller & John, 2020; Lin et al., 2018; Outlay et al., 2017; Sabo et al., 2014; Serrano Pérez & Juárez López, 2019; Tam et al., 2020) focused specifically on technology and engineering disciplines, which were trending research topics. Besides, there were three adapted STEM studies with additional disciplines such as health (Wallace et al., 2014), imagination of STEM (Tsai et al., 2018) and digital literacy (Jiang et al., 2019). STEM disciplines chosen in these studies will lead to different designs for the intervention.

Evidence of design in STEM education

Overall, 20 studies designed their own STEM education intervention, whereas 18 studies implemented existing or adapted STEM programmes. The interventions introduced to participants included STEM workshops, lessons, modules, programmes, field trips, experiments, hands-on activities, research projects, learning software, peer mentoring and others. The STEM interventions in these studies differed in duration. Some studies were carried out within a short time frame of less than a week due to time constraints (Ghadiri Khanaposhtani et al., 2018; Keller & John, 2020; Mohd Shahali et al., 2019; Supalo et al., 2014; Tam et al., 2020). Seven studies lasted one to four weeks, and 13 studies had an intervention duration of two to six months. Another nine articles were longitudinal studies that observed the long-term effectiveness of STEM intervention over a period of two to three years. However, the studies of Ardianti et al. (2020), Fung (2020) and De Leo-Winkler et al. (2019) did not clearly state the duration of the entire STEM intervention. In general, an educational intervention should be carried out for at least eight weeks in order to assess its impact on the participants effectively.

While several STEM studies have been undertaken in recent years, not all of them have reflected a proper understanding of STEM education. Many studies claimed that a curriculum integrating science, technology, engineering and mathematics can be considered STEM education. In fact, the definition of STEM education goes far beyond merely combining the subjects. A common deficiency found in the prior studies was a lack of emphasis on STEM practices and STEM skills among learners. As an example, the study of De Leo-Winkler et al. (2019) was teacher-centred rather than based on students’ independent, active learning in exploring new concepts. Therefore, only articles in which the authors had implemented the intervention with the application of STEM skills and elements were identified for inclusion in the current study. Articles with ‘self-claimed STEM education’ were excluded from further analysis.

Additionally, some improvements could be made on other prior research. For example, the studies of Chacko et al. (2015) and Chang et al. (2015) stated their objectives unclearly. The studies of Chacko et al. (2015) and Cwikla et al. (2014) also did not present their results precisely. Apart from that, certain general limitations reported in the studies can serve as guidelines for future research. The most common limitation reported concerned constraints on students’ enrolment that resulted in a small sample size and a limited response rate from participants (Leonard et al., 2016; Mohd Shahali et al., 2019; Sabo et al., 2014). Self-selection of participants was another issue, as the findings, as well as the parametric tests, might have been affected by participants’ initial attitude and motivation before the intervention (Bamberger, 2014; Chittum et al., 2017; Hughes et al., 2013). The group experimental designs in the studies of Huri & Karpudewan (2019) and Mohd Shahali et al. (2019) lacked a control group for comparison of the findings. Moreover, the lack of randomisation (Chang et al., 2015; Tam et al., 2020; Thomas et al., 2015; Wallace et al., 2014) and the possibility of interaction with control groups (Bamberger, 2014) made it impossible to perform an experimental design. The studies of Blustein et al. (2013) and Fung (2020) claimed that geographical and cultural factors can also threaten the validity of a study. Specifically, for qualitative studies, the interpretation of the narrative might be a source of bias in the research findings (Blustein et al., 2013). On the other hand, quantitative research with supporting qualitative data gave in-depth and meaningful findings by examining the convergence, complementarity and coherence of triangulated data (Creswell & Creswell, 2018). As an example, the study of Barak and Assal (2018) used examination questions and class assignments to evaluate students’ understanding in STEM learning through qualitative and quantitative approaches.

Evidence of understanding through STEM education

The evidence on understanding in STEM education came from 16 studies. As an alternative to Bloom’s taxonomy, students’ understanding can be defined as a learning process in which students can explain, interpret, apply, empathise, have perspective, and have self-knowledge (Wiggins et al., 2005). The majority of these studies examined students’ understanding by utilising conceptual assessment tests, and two studies used examination scores (Barak & Assal, 2018; Thomas et al., 2015). Four studies (Chacko et al., 2015; Cwikla et al., 2014; Sabo et al., 2014; Serrano Pérez & Juárez López, 2019) reported their data with descriptive statistics such as the percentage of correct responses, whereas another 11 studies provided data with inferential statistics. In addition, the study of Ghadiri Khanaposhtani et al. (2018) interpreted students’ understanding from their drawings through inductive and content analysis. Almost all of the studies showed a positive change in conceptual understanding, except the study of Thomas et al. (2015), which showed a trivial, non-significant improvement in academic achievement, and the study of Gülen (2019), which reported that STEM roles did not have any significant impact on boosting academic achievement. Furthermore, no research studies focused on encouraging low achievers to improve their academic achievement, performance and understanding, or on creating awareness of their future study and career paths, even where they had demonstrated a poor attitude towards learning.

Evidence of attitudes through STEM education

Among the 30 studies providing evidence on attitude in STEM education, the majority used survey questionnaire items and written open-ended questions for data collection. Fewer studies used interviews, observations and drawing tests to code students’ responses and interpret their perceptions. Nevertheless, the study of Supalo et al. (2014) did not state the instrumentation and method used for data analysis. The aspects explored in these studies included motivation, interest, enjoyment, self-efficacy, awareness and career perceptions. Six quantitative studies evaluated the outcome only by way of descriptive statistics, using the mean, standard deviation and percentage, without comparing the findings. Another 14 quantitative studies performed inferential statistical analysis, whereas qualitative studies classified the written responses into themes. Nearly all of the studies reported a change in participants’ attitude in a positive direction. However, the scale of the positive change varied between studies. A few studies reported no significant differences in students’ attitude towards STEM education (Barak & Assal, 2018; Leonard et al., 2016) or career choices (Cantrell & Ewing-Taylor, 2009). On the other hand, the study of Mohd Shahali et al. (2019) reported reduced interest in STEM subjects, the study of Leonard et al. (2016) showed a significant decline in self-efficacy in computer use, and the study of Bamberger (2014) demonstrated negative perceptions of women scientists or engineers. Thus, future researchers should pay attention to the content of the designed intervention to optimise participants’ engagement in STEM learning. As an extension of the study of Bamberger (2014), more participants and professional scientists of both genders should be involved to explore the gender aspect of STEM careers.

Evidence of gender through STEM education

The gender dimension of STEM education was given the least consideration, as only four studies included this aspect. This section includes only those studies that examined the gender aspect by including participants of both sexes. Gender viewpoints were analysed using quantitative, qualitative and mixed methods. Studies reported that the two genders demonstrated slightly different mean scores for engagement and achievement in STEM learning (John et al., 2016) and for making career choices (Blustein et al., 2013; Cantrell & Ewing-Taylor, 2009). Unfortunately, most of these studies did not use inferential statistics to draw conclusions on the gender effect, except the study of Hughes et al. (2013), who concluded that the effectiveness of the programme was affected by the type of pedagogy but not by gender. Hence, the findings indirectly imply that more consideration should be given to addressing the issue of gender stereotyping in STEM ability or even STEM practices.

Evidence of practices outcome through STEM education

The evidence of practices in STEM education came from 11 studies, which mostly used assessment tests, rubrics or questionnaires for evaluation. The practices investigated comprised hands-on or science process skills and higher-order thinking such as problem-solving, inquiry, analytical thinking, computational skills, and reflective and STEM-integrated thinking, in addition to technical skills. Eight of those 11 studies presented positive practice outcomes. The study of Gülen (2018) found no significant difference in the development of reflective thinking skills, whereas the study of Lin et al. (2019) found no statistically significant difference in technological inquiry abilities compared to control groups. Lastly, the study of Leonard et al. (2016) reported students’ computational thinking skills based on the nature of instruction. Accordingly, STEM interventions with beneficial initiatives should be continued to nurture more valuable STEM skills in potential leaders.

The linkage between CiteSpace and EPPI analyses

Based on the findings, the EPPI results were related to the CiteSpace analysis. The largest cluster obtained from the CiteSpace analysis was engineering education, and five of the 38 studies from the EPPI analysis mentioned the engineering discipline as their focus. This indicated that engineering education was the early leading concept in the STEM field. However, recent STEM interventions have gradually expanded beyond the engineering field to other disciplines. Besides, the CiteSpace analysis identified gender difference as the third-largest cluster, but the EPPI analysis showed limited STEM education intervention on the gender aspect. This finding suggested that previous STEM studies that included the gender aspect, such as those on integrated STEM teacher preparation, predicting STEM enrolment and student perception, were mostly survey studies without the application of a STEM education intervention. These linkages provide an overall idea for implementing relevant STEM studies in future.

Conclusion and future work

This study presents an overview of recent STEM education research conducted using two types of analysis procedures: CiteSpace analysis and in-depth analysis based on the adapted EPPI (2007) guidelines. The practicality and feasibility of these procedures have been demonstrated by providing a well-defined review of research methodologies and findings, which are useful in analysing various constructs of empirical studies (Tho et al., 2017). The CiteSpace analysis identified several top-cited authors such as Maltese and Tai (2011), Bybee (2010) and Langdon et al. (2011), and detected engineering education as the largest cluster within the STEM education articles. Hence, the finding demonstrated that most of the studies placed different weights on different fields, as it was challenging to balance each of the disciplines in STEM education. STEM research studies that fulfilled the adapted EPPI (2007) criteria were screened and shortlisted by all five researchers to improve the trustworthiness of the research. The in-depth analysis demonstrated that STEM education is highly beneficial for secondary school students. Most studies indicated a positive impact of STEM education on students’ understanding, attitude and practices, without a major gender gap. The findings indicated that STEM education intervention and other efforts need to be carried out continuously for secondary school students. A STEM education intervention should be well planned to ensure that it truly represents STEM education principles in terms of content and instruction. STEM intervention design should consider the suitability of the activity for the participants’ cognitive, emotional and physical maturity in order to maximise the effect of the intervention. Future studies should examine the gender aspect, as there are limited studies involving this aspect in the STEM education field.

This study also highlights the significance of sampling, data collection and statistical analysis methods. The sampling method should strive to include a larger sample, to ensure the generalisability of the findings and to extend the benefits of STEM education to more students at all levels of academic performance. In addition, inferential statistics should be employed in data analysis to clearly illustrate the difference in STEM achievements. Effect size measures can be used to determine the impact of STEM intervention on participants. Furthermore, issues such as self-selection, lack of control groups and randomisation, and self-interpretation of collected data should be avoided in future studies. Pilot tests, informed consent and human research ethics approval also need to be addressed in order to accomplish successful STEM education research. The outcome of this paper has contributed to the literature on STEM education and perhaps will also contribute to improving the quality of future STEM education intervention and research.
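As an illustration of how an effect size could be reported alongside a significance test, the following is a minimal sketch of Cohen's d with a pooled standard deviation for two independent groups. It is not the procedure used by any of the reviewed studies; the function name `cohens_d` and the post-test scores are hypothetical and serve only to demonstrate the calculation.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(treatment, control):
    """Cohen's d for two independent groups, using the pooled standard deviation.

    d = (mean_treatment - mean_control) / s_pooled, where
    s_pooled^2 = ((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2).
    """
    n1, n2 = len(treatment), len(control)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")
    # statistics.variance returns the sample variance (n - 1 denominator).
    pooled_var = (
        (n1 - 1) * variance(treatment) + (n2 - 1) * variance(control)
    ) / (n1 + n2 - 2)
    if pooled_var == 0:
        raise ValueError("pooled variance is zero; d is undefined")
    return (mean(treatment) - mean(control)) / sqrt(pooled_var)


if __name__ == "__main__":
    # Hypothetical post-test scores, invented for illustration only.
    stem_group = [72, 78, 81, 69, 85]
    comparison_group = [65, 70, 74, 62, 71]
    print(f"Cohen's d = {cohens_d(stem_group, comparison_group):.2f}")
```

By Cohen's conventional benchmarks, values of about 0.2, 0.5 and 0.8 are commonly read as small, medium and large effects, although such thresholds should be interpreted in the context of the intervention and the outcome measured.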

Although this systematic review using the adapted EPPI (2007) guidelines and CiteSpace was carried out to reduce selection bias, the articles selected were limited to the SCOPUS database for the period 2004 to 2020. More information on the necessity and benefit of STEM education intervention could be provided by opting for other high-impact databases such as Web of Science in future work. The retrieval method for database searching was limited to the topic of “STEM education”. Future studies should expand the sources of literature by using other keywords to explore more STEM education studies that may not use the same term. Additionally, this study focused solely on STEM education intervention at the secondary level. Future analysis can explore the STEM education trend among elementary or university students. More research is required to explore other aspects of practices in STEM education intervention, along with the aspects of understanding, attitude and gender, especially for underachievers, who have received minimal attention in empirical STEM studies. The connection between STEM education intervention and STEM career awareness, which has been mentioned in previous studies, should be examined by future researchers. Further studies should also consider appraising the high-impact studies via authoritative published appraisal tools.