1 Introduction

With the rapid development of artificial intelligence (AI), the demand for K-12 computer science (CS) education continues to rise (El-Hamamsy et al., 2021). However, there has long been a shortage of trained CS teachers due to reduced budgets (Margolis et al., 2011). As a result, school administrators have increasingly been forced to ask teachers with little formal training in CS to teach CS courses (Moin et al., 2005). In China, the national government advanced the development and implementation of an AI and programming curriculum for K-12 education through the national policy “The Development Plan of the New Generation Artificial Intelligence”; however, because they lack AI subject knowledge, teaching skills, and supportive attitudes, large numbers of in-service CS teachers are not yet competent to teach AI courses or modules in schools.

Teaching competency reflects the potential for effective teacher behavior in specific teaching situations and is necessary for enhancing in-class professionalism (Kim & Kim, 2016); it involves the combination of psychological qualities, such as the knowledge, skills, and dispositions, that teachers need to complete teaching tasks (Caena, 2014). In K-12 AI education, although researchers have conducted studies and implemented interventions directly among primary and secondary school students (Goel & Joyner, 2017; Henry et al., 2021), few studies have targeted in-service CS teachers (Ng et al., 2021). To fill this gap, the current research designed and implemented a professional development (PD) program based on the technological pedagogical content knowledge (TPACK) framework that combined online and offline activities to promote in-service CS teachers’ AI teaching competency, and the program’s effectiveness was examined with both quantitative and qualitative data. The specific TPACK-based framework oriented toward AI teaching competency (TPACKAI) and its corresponding teacher learning activities, developed with several effective elements of PD, may contribute theoretically to the literature and, practically, help fill the gap in CS teacher PD initiatives in China.

The paper is organized as follows: Section 2 provides a literature review, Section 3 describes the research methodology, Section 4 shows the results, Section 5 discusses the findings, and the last section presents the conclusion.

2 Literature review

2.1 AI teaching competency

Teaching competency is defined as an integration of the knowledge, skills, and attitudes required for the successful implementation of subject matter education (Koster et al., 2005; Prajugjit & Kaewkuekool, 2020), and it is necessary for enhancing professionalism in class (Doodewaard, 2020). Among the various models of teacher competency, the COACTIV model of teachers’ professional competency by Baumert and Kunter (2013) is highly relevant to this study. This model assumes that a diverse set of capacities, including cognitive characteristics (i.e., knowledge and skills) and noncognitive characteristics (i.e., beliefs), constitutes the key determinants of successful teaching (Kunter & Baumert, 2013). Drawing on the COACTIV model, we conceptualize three aspects of AI teaching competency that are particularly linked to teachers’ AI teaching practice: AI teaching knowledge, AI teaching skills, and AI teaching beliefs (self-efficacy).

Regarding the cognitive area, the TPACK framework (Mishra & Koehler, 2006) is usually adopted to identify teaching knowledge and skills. Kim et al. (2021) used the TPACK framework to determine which teaching abilities are necessary for CS teachers to improve AI teaching at the K-12 stage. In terms of pedagogical knowledge, teachers are required to implement problem-based learning (PBL) and game-based learning with appropriate attention to AI ethics. Regarding content knowledge, teachers are required to reach the knowledge level of undergraduate students majoring in AI. In terms of technological knowledge, teachers are required to master programming technology and choose appropriate AI platforms.

Regarding the noncognitive area, teaching self-efficacy is one of the most important constructs in teaching competency (Lauermann & König, 2016). Based on Bandura’s (1997) work, teachers’ self-efficacy denotes their beliefs about their abilities to succeed in specific situations. For teachers’ professional development (TPD), teaching self-efficacy is a potential construct that affects teachers’ actual teaching knowledge, ability, behavior, and thinking (Orakcı et al., 2020; Tschannen-Moran & Hoy, 2007). Burić and Kim (2020) reported that teaching self-efficacy is one of the most significant motivational features affecting teachers’ teaching quality and students’ motivational beliefs. Other studies have also indicated a strong correlation between teachers’ teaching self-efficacy and students’ achievements (Tschannen-Moran & Hoy, 2007). The extent to which teachers perceive such efficacy may influence whether they take action, how much effort they invest in an action, and how long they persist in the face of challenges (Tschannen-Moran & Hoy, 2001). Therefore, this study considers teachers’ AI teaching self-efficacy to be a decisive element of CS teachers’ success in AI teaching.

2.2 Professional development for CS teachers

TPD mainly refers to the PD of in-service teachers and is defined as a broad set of activities that develop teachers’ skills, knowledge, expertise, and other characteristics (OECD, 2009). Effective PD can facilitate the transformation of teaching practice and ultimately improve students’ learning (Desimone, 2009). Even for qualified in-service CS teachers, PD is necessary, as it allows them to enhance their disciplinary knowledge, acquire innovative pedagogical approaches, and deepen their pedagogical content skills (Brandes & Armoni, 2019; Tokmak et al., 2013).

In terms of PD approaches for CS teachers, face-to-face, online, and blended activities have been designed and implemented in previous research (Armoni, 2017; Lazarinis et al., 2019; Murai & Muramatsu, 2020). For instance, Rich et al. (2021) conducted a face-to-face PD program to train CS teachers to teach coding to K–6 students. The year-long program included open-ended, hands-on activities, model lessons, and other activities, and the results showed statistically significant increases in computing teaching efficacy and teaching values. Considering cost and distance, Goode et al. (2020) transformed face-to-face PD activities into fully online activities to make K-12 CS teachers competent in teaching new and equitable CS courses. To improve K-12 CS teachers’ Scratch programming skills, Lazarinis et al. (2019) implemented a blended PD program on the Moodle platform and found that it promoted the development of teachers’ computational thinking.

In terms of PD content for CS teachers, few studies have addressed AI knowledge and related pedagogy; most research has focused on CS concepts (El-Hamamsy et al., 2021), knowledge (Chai et al., 2020), programming (El-Hamamsy et al., 2021; Martinez et al., 2016), computational thinking (Monjelat & Lantz-Andersson, 2019; Reimer & Blank, 2018), and pedagogical skills (Qian et al., 2018; Sentance et al., 2018). For instance, Martinez et al. (2016) carried out an introductory CS PD course for K-12 teachers that integrated pedagogical content knowledge and teacher classroom practice. The results showed that the PD program effectively improved inquiry-based CS teaching; however, the course did not include AI-related TPACK, focusing only on teachers’ fundamental programming concepts. Kandlhofer et al. (2019) presented an educational project for training and certifying teachers and school students in AI and robotics and described the four stages and the detailed content of its training modules and curricula. Although the appropriateness of the project’s teaching methods and materials was explored among 16 teachers, the impact on teachers’ learning outcomes has not yet been examined. Vazhayil et al. (2019) conducted a PD workshop with only two learning activities, on the technical application of text and image recognition, to train CS teachers to introduce AI in their schools; however, the program lasted only 2 days, and only qualitative data were collected from teachers to examine its effectiveness.

2.3 In-service teacher professional development based on the TPACK framework

Initially, the TPACK framework was utilized to conceptualize teachers’ technology integration competency in teaching (Rosenberg & Koehler, 2015). Mishra and Koehler (2006) summarized this complex intersection of knowledge domains as TPACK, expanding Shulman’s (1986) original pedagogical content knowledge (PCK) model. In teacher education, the TPACK framework has been widely used for in-service teacher PD. Research has shown that TPACK-based PD programs improve in-service teachers’ knowledge, self-efficacy, and technology integration skills (Baran, 2010; Blonder et al., 2013; Hong & Stonier, 2015; Koehler et al., 2007; Oda et al., 2020). For instance, Oda et al. (2020) organized 24 school teachers to learn about the integration of GIS into science or social science classes and used the TPACK framework to understand the properties and impacts of the PD; the authors found that the framework helped teachers introduce and use GIS technologies in their classes more often.

However, for in-service CS teachers, the literature on TPACK-based PD is limited, with a focus mainly on computational thinking (CT). Angeli et al. (2016) discussed the TPACK that teachers need to teach a K-6 CS curriculum based on a CT framework and developed a corresponding PD course, but they provided little empirical evidence of the effectiveness of the CT-based TPACK framework. Kong et al. (2020) implemented a TPACK-based PD program with two 39-h courses for 76 in-service primary school teachers that mainly focused on CT concepts and practice. Although knowledge tests, surveys, and reflections were used to examine the teachers’ understanding of TPACK-related CT knowledge and their perceptions of the PD, AI knowledge was not included, nor were related teaching skills and self-efficacy examined. Chen and Cao (2022) examined a virtual PD program among 43 in-service school teachers of CS, mathematics, science, and other subjects. The program aimed to improve K-12 teachers’ knowledge, attitudes, and beliefs in maker-centered instruction rather than their AI teaching competency. The researchers reported that the study relied heavily on self-reports to assess teachers’ capabilities and attitudes, which may affect the reliability and validity of the research. For AI-related PD, Kim et al. (2021) used the TPACK framework to determine which teaching abilities are necessary for CS teachers to enact AI teaching in K-12 education; however, that study did not extend the framework into a PD program or examine its effectiveness on teachers’ AI competency.

In sum, general TPACK-based frameworks for CS teachers’ PD have been proposed, and their effectiveness has been examined in prior studies, but few studies have designed specific TPACK-based PD programs to improve CS teachers’ AI teaching competency, namely, their AI knowledge, skills, and teaching self-efficacy. To address this gap, the current research intentionally designed a TPACK-based PD program incorporating some effective elements of TPD in a blended environment (as described in detail in Sect. 3.3) and then examined the effectiveness of this PD program.

This research investigated the following research questions (RQs):

  • RQ1: Will the AI knowledge of K-12 CS teachers be significantly improved by the TPACK-based PD program?

  • RQ2: Will the AI teaching skills of K-12 CS teachers be developed by the TPACK-based PD program?

  • RQ3: Will the AI teaching self-efficacy of K-12 CS teachers be significantly improved by the TPACK-based PD program?

3 Methods

3.1 Participants

Forty K-12 CS teachers from a northwestern province in China participated in this research voluntarily; their demographic information is described in Table 1. Among these participants, 22 (55.0%) were male and 18 (45.0%) were female; 16 (40.0%) were aged 36–40 years; 26 (65.0%) had been teaching for more than 10 years; and 33 (82.5%) taught in urban schools. Notably, none of the participants attended any other PD program during this TPACK-based PD.

Table 1 Participants’ demographic information

3.2 Research design

This study adopted a single-group pretest–posttest quasi-experimental design, as shown in Fig. 1. The PD activities consisted of two phases: face-to-face activities and online activities. Before the PD program, teachers were invited to take a pretest to evaluate their AI knowledge and AI teaching self-efficacy. After 25 days of blended PD activities, their AI knowledge and AI teaching self-efficacy were tested again, individual teachers’ AI programming works and group AI lesson plans were collected, and semistructured interviews were conducted.

Fig. 1
figure 1

The quasi-experimental design

3.3 Intervention: TPACK-based PD

In this research, a TPACK-based PD program was designed to promote the AI teaching competency of CS teachers, especially emphasizing content knowledge of AI (CKAI), technological content knowledge of AI (TCKAI), pedagogical content knowledge of AI (PCKAI), technological pedagogical knowledge of AI (TPKAI), and technological pedagogical content knowledge of AI (TPACKAI), as shown in Fig. 2. CKAI concerns the introduction of AI, including the “five big ideas about AI” (Touretzky et al., 2019) and the application of AI. TCKAI concerns block-based programming for AI, including the use of digital software and physical hardware to learn AI. PCKAI consists of two parts, including teaching strategies and instructional design for AI. TPKAI focuses on tools for teaching AI, including related apps, websites, and tools. TPACKAI emphasizes the integration of programming, technologies, and pedagogy for teaching AI, for example, the development of a school-based AI textbook. The specific PD modules and content are described in Table 2.

Fig. 2
figure 2

TPACK-based PD framework

Table 2 The PD modules and content based on the TPACK framework

The whole TPACK-based PD program combined 45 h of face-to-face workshops and a 30-h online course. Considering the effective teacher PD elements—“active learning” and “cooperation”—proposed by Darling-Hammond et al. (2017), most of the face-to-face PD activities followed project-based learning pedagogy, which included hands-on assembly and programming of electronic blocks, design and sharing of lesson plans with peers, observation of classroom teaching practice, and on-site visits, as shown in Fig. 3, with more detailed activities shown in Table 3.

Fig. 3
figure 3

Active learning and cooperation in face-to-face activities

Table 3 Face-to-face workshop schedule

During project-based learning, teachers were required to solve challenges or complete projects in groups of 4 to 5, and coaches or experts walked around the lab to assist teachers and provide dynamic scaffolding. For instance, during the lesson planning activity, the 40 teachers were divided into 8 groups of 5; each group collaborated to design an AI lesson plan and finally shared it with the whole class.

The online phase of the CS teachers’ PD lasted 15 days and included a self-paced online course and a community of practice (CoP) in WeChat. The online course comprised 11 video lectures about AI and AI programming, as well as 3 assignments involving block-based AI programming projects, which ranged from easy to difficult and focused on emotion recognition, speech recognition, and augmented reality. In the online CoP, participants discussed academic content with each other, shared their programming artifacts, and asked their facilitators for help (Fig. 4).

Fig. 4
figure 4

Online learning tools and activities. Note: (a) table of contents of the AI programming course, including the introduction of AI, the AI expert system, and block-based programming for AI, (b) the community of practice in WeChat, where the participants were discussing how to design a rotated symmetric figure in Python, and (c) two examples of block-based AI programming, which are image recognition and data processing
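To give a concrete sense of the project logic targeted by the block-based assignments described above, the sketch below translates the flow of the first assignment (detect an emotion, then respond to it) into Python. It is purely illustrative: the paper does not publish the assignment code, and detect_emotion is a hypothetical stand-in for the recognition block that the platform wraps around a cloud service.

```python
def detect_emotion(image_path):
    """Hypothetical stand-in for the platform's emotion-recognition block,
    which would typically call a cloud recognition service."""
    raise NotImplementedError("replace with the block platform's recognition service")

def respond_to_emotion(image_path):
    """Branch on the recognized emotion, mirroring the if/else blocks
    that teachers assembled in the block-based environment."""
    emotion = detect_emotion(image_path)
    replies = {
        "happy": "Glad to see you smiling!",
        "sad": "Here is a cheerful song for you.",
        "angry": "Take a deep breath and count to ten.",
    }
    return replies.get(emotion, "I am not sure how you feel.")
```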

3.4 Instruments

In this research, an AI knowledge test was developed to measure the change in teachers’ AI knowledge before and after the intervention. The evaluation criteria for AI lesson plans and AI programming artifacts were used to measure teachers’ AI teaching skills. An AI teaching self-efficacy scale was adopted to assess the improvement of teachers’ AI teaching self-efficacy before and after the intervention. Finally, a semistructured interview was used to support and supplement the quantitative data.

3.4.1 AI knowledge tests

The Artificial Intelligence for K-12 Initiative in the United States proposed “5 big ideas” for AI education: perception, representation and reasoning, machine learning, interaction, and social impact (Touretzky et al., 2019). Based on these ideas, an AI knowledge test with two homogeneous versions was developed for the pre- and posttests by a team comprising an AI expert and two researchers, whose opinions supported the content validity of the test. Then, an educational technology expert reviewed the test for face validity, including its accuracy and appropriateness for the PD. Each version of the test contained 15 multiple-choice questions, with 3 items for each of the “5 big ideas”. The test included 10 single-answer questions, scored 1 point for a correct answer and 0 for a wrong answer, and 5 multiple-answer questions, scored 2 points for selecting all the correct answers, 1 point for selecting some of the correct answers, and 0 for including any wrong answer. Some examples of the pre- and posttest items are shown in Table 4. The overall Cronbach’s alpha of the AI knowledge tests was 0.608. Although this value is not ideal, since AI knowledge has not been accurately defined in previous research and no relevant instrument has been released yet, a slightly lower internal consistency is acceptable (Chai et al., 2016). The discrimination of each version of the test ranged from 0.32 to 0.41, and the difficulty coefficient ranged from 0.487 to 0.623.

Table 4 Example items of the AI knowledge test
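The scoring rules above can be stated compactly. The following sketch simply restates them in Python for clarity; the function names are ours, not part of the instrument.

```python
def score_single_answer(selected, correct):
    """Single-answer item: 1 point for the correct option, otherwise 0."""
    return 1 if selected == correct else 0

def score_multiple_answer(selected, correct):
    """Multiple-answer item: 2 points for exactly the correct set,
    1 point for a nonempty correct subset, 0 if any wrong option is chosen."""
    selected, correct = set(selected), set(correct)
    if not selected or not selected <= correct:
        return 0
    return 2 if selected == correct else 1

# Example: choosing {"A", "C"} when {"A", "C", "D"} is correct earns 1 point.
assert score_multiple_answer({"A", "C"}, {"A", "C", "D"}) == 1
```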

3.4.2 Evaluation criteria for AI lesson plans

To measure CS teachers’ AI lesson planning ability, the researchers developed a rubric for AI lesson plans comprising 3 categories and 9 items (as shown in Table 5). The categories, learning objectives, learning content, and learning activities, were adapted from the Lesson Plan Evaluation Criteria for K-12 Education originally developed by Ye and Wu (2003). The specific items were integrated from 3 related rubrics: the STEM Lesson Plan Evaluation Criteria (Kim et al., 2015), the Information Technology Curriculum Standard for Senior High Schools (Ministry of Education, 2017), and the Lesson Plan Evaluation Criteria for K-12 Education. To improve the structural validity of the rubric, the researchers discussed it together, conducted a trial study, and revised it accordingly. In addition, an educational technology expert reviewed the structure and level of the rubric for content validity, and the researchers refined the final version based on the expert’s feedback. Each item is scored 1 or 0, for a total possible score of 9.

Table 5 AI lesson plan evaluation criteria

3.4.3 Evaluation criteria for AI programming artifacts

To assess teachers’ teaching skills, this study also used block-based programming artifact evaluation criteria to analyze teachers’ individual AI programming artifacts. These criteria were originally developed by Moreno-León et al. (2015) and include seven dimensions: flow control, synchronization, parallelism, user interactivity, logical thinking, data representation, and abstraction and problem decomposition. The total score is 21, divided into three levels: basic (0–7 points), developing (8–14 points) and proficient (15–21 points).
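As a compact restatement of these criteria, the mapping from a total score to a level band looks like the sketch below. It assumes, as in Moreno-León et al.’s instrument, that each of the seven dimensions is scored from 0 to 3 (so 7 × 3 = 21 is the maximum); the identifier names are ours.

```python
DIMENSIONS = [
    "flow_control", "synchronization", "parallelism", "user_interactivity",
    "logical_thinking", "data_representation", "abstraction_decomposition",
]

def artifact_level(scores):
    """scores: dict mapping each dimension to 0-3 points.
    Returns the total and its level band from the evaluation criteria."""
    total = sum(scores[d] for d in DIMENSIONS)
    if total <= 7:
        return total, "basic"
    if total <= 14:
        return total, "developing"
    return total, "proficient"
```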

3.4.4 AI teaching self-efficacy scale

Riggs and Enochs (1990) developed a 5-point Likert scale to assess teachers’ science teaching efficacy beliefs; it includes two core dimensions: science teaching efficacy belief and science teaching outcome expectancy. The scale contains 25 items, each scored from 1 (strongly disagree) to 5 (strongly agree). To measure the AI teaching self-efficacy of CS teachers, the scale was adapted with some modifications and deletions in the current research, as shown in Table 6. The adapted scale contains 10 items reflecting two core dimensions: AI teaching efficacy belief and AI teaching outcome expectancy. Among the 10 items, items 4, 6, and 9 are reverse-worded; their reversed encoding was converted into forward encoding during data analysis. The validity and reliability of the adapted scale were then tested with 111 K-12 CS teachers. The KMO value was 0.791 (> 0.7), indicating sufficient correlation among the items; the factor loadings ranged from 0.52 to 0.79 (> 0.45); and the Cronbach’s alpha coefficient was 0.822 (> 0.7), indicating acceptable stability and consistency.

Table 6 Factor loading matrix of the AI teaching self-efficacy scale
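Reverse coding on a 5-point scale maps a rating r to 6 − r (so 1 becomes 5 and 2 becomes 4). A minimal sketch of how the scale score could be computed, assuming responses keyed by item number:

```python
REVERSED_ITEMS = {4, 6, 9}  # reverse-worded items stated in the text

def ai_efficacy_score(responses):
    """responses: dict {item_number: rating on the 1-5 Likert scale}.
    Reverse-worded items are converted to forward encoding before summing."""
    return sum((6 - rating) if item in REVERSED_ITEMS else rating
               for item, rating in responses.items())

# Example: answering 4 on every item scores 7*4 + 3*(6 - 4) = 34 out of 50.
assert ai_efficacy_score({i: 4 for i in range(1, 11)}) == 34
```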

3.4.5 Semistructured interview outline

For AI teaching competency and the teachers’ PD, the semistructured interview focused on the following questions: (1) What are your gains and feelings from these PD activities? (2) Do you feel more confident in carrying out AI-related courses/activities or guiding students in the future? Why? (3) What challenges do you think you may face in carrying out AI-related courses/activities after these PD activities?

3.5 Data collection and analysis

To efficiently collect quantitative data, the AI knowledge test and the AI teaching self-efficacy scale were integrated into one online questionnaire. Meanwhile, the pairwise deletion method was used to screen valid questionnaires (Graham, 2009). After invalid responses were eliminated, 32 valid questionnaires remained, for a valid rate of 80%. The Shapiro-Wilk test was used to test the normality of the change scores since the sample size was less than 50. The results showed that the change scores between the pretest and posttest for AI knowledge (w = 1.934, p = 0.469 > 0.05) and AI teaching self-efficacy (w = 0.972, p = 0.561 > 0.05) were normally distributed. Therefore, paired samples t tests were used to analyze the data. To avoid the effect of sample size, Cohen’s d was used to indicate the standardized difference between the pre- and posttest means. Effect sizes with Cohen’s d > 0.5, 0.8, 1.2, and 2.0 were considered medium, large, very large, and huge, respectively (Sawilowsky, 2009).
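For readers wishing to reproduce this analysis pipeline, a minimal sketch in Python with SciPy follows. It assumes paired pre/post score arrays and uses the standard-deviation-of-differences convention for Cohen’s d in paired designs; the paper does not specify which d formula was used.

```python
import numpy as np
from scipy import stats

def paired_analysis(pre, post):
    """Normality check on change scores, paired samples t test, and Cohen's d."""
    pre, post = np.asarray(pre, float), np.asarray(post, float)
    diff = post - pre
    w, p_norm = stats.shapiro(diff)        # Shapiro-Wilk on the change scores
    t, p = stats.ttest_rel(post, pre)      # paired samples t test
    d = diff.mean() / diff.std(ddof=1)     # Cohen's d from the difference scores
    return {"shapiro_W": w, "shapiro_p": p_norm, "t": t, "p": p, "cohens_d": d}
```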

In terms of AI lesson plans, the works of the 8 groups were collected. To assess these works, 2 lesson plans were first randomly selected and assessed by both evaluators in a trial evaluation, for which the Cronbach’s alpha was 0.809 (> 0.8). The two evaluators then discussed their differences and reached a consensus. Finally, all the lesson plans were scored separately by the two evaluators, and the Cronbach’s alpha coefficient was 0.904.

In terms of AI programming projects, 30 teachers submitted all three online assignments, yielding 90 artifacts in total. To improve the reliability of the assessment, the researchers first discussed the evaluation criteria together to ensure that the two evaluators fully understood them. Then, four AI programming artifacts were randomly selected from each project and assessed by the two evaluators in a trial evaluation, for which the Cronbach’s alpha was 0.922 (> 0.8), indicating high internal consistency. The two evaluators then discussed the results and reached a consensus on their differences. Finally, the 90 programming artifacts were randomly divided into two halves, and each evaluator separately rated the three works of 15 teachers.
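The Cronbach’s alpha values reported here can be computed directly from the definition, alpha = k/(k−1) · (1 − Σ rater variances / variance of totals). A brief sketch, assuming a score matrix with one row per rated work and one column per rater:

```python
import numpy as np

def cronbach_alpha(scores):
    """scores: 2-D array, rows = artifacts or lesson plans, columns = raters."""
    scores = np.asarray(scores, float)
    k = scores.shape[1]                            # number of raters
    rater_vars = scores.var(axis=0, ddof=1).sum()  # sum of per-rater variances
    total_var = scores.sum(axis=1).var(ddof=1)     # variance of row totals
    return k / (k - 1) * (1 - rater_vars / total_var)
```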

To conduct semistructured interviews, researchers purposefully selected 4 participants, namely, teacher T1 with high AI teaching self-efficacy and low AI knowledge, teacher T2 with high AI teaching self-efficacy and high AI knowledge, teacher T3 with low AI teaching self-efficacy and low AI knowledge, and teacher T4 with low AI teaching self-efficacy and high AI knowledge. The qualitative data were analyzed following four steps: transcribing the interviews, reading the transcripts, coding the data and interpreting the data (Ruona, 2005). Prior research-driven coding nodes were developed based on AI knowledge, AI teaching skills and AI teaching self-efficacy. Initially, some samples from the interview data were coded by two researchers individually. Then, the two researchers discussed the results together and reached a consensus regarding the codes and subthemes, and finally, they coded all interview data individually again.

4 Results

4.1 RQ1: CS teachers’ AI knowledge

The results of the paired samples t test of AI knowledge are shown in Table 7. The average score of CS teachers’ AI knowledge was 10.88 before the intervention and increased to 13.56 after the intervention. There was a statistically significant difference in the average score of AI knowledge before and after the PD program (t = 5.241, p = 0.000, Cohen’s d = 1.033), especially in representation and reasoning (t = 6.723, p = 0.000, Cohen’s d = 1.107), interaction (t = 4.274, p = 0.000, Cohen’s d = 0.701), and social impact (t = 3.388, p = 0.002, Cohen’s d = 0.554). However, there was no significant change in the other two dimensions, including perception (t = 1.161, p = 0.255, Cohen’s d = 0.220) and machine learning (t = 0.232, p = 0.818, Cohen’s d = 0.044).

Table 7 Paired samples t test of AI knowledge (n = 32)

4.2 RQ2: CS teachers’ AI teaching skills

4.2.1 AI lesson plan

Descriptive statistics for the AI lesson plan scores are shown in Table 8. The mean overall score was 6.04 (SD = 3.67). For each item, a mean value above 0.5 indicates that more than half of the lesson plans met the standard. For most groups, the learning objectives were clear and diversified (M = 0.88), and most of these objectives reflected the core literacy of CS (M = 0.88). The learning content of most groups embodied AI knowledge (M = 0.88) and related it to real life (M = 0.88). In terms of learning activity design, more than half of the groups designed activities that were open and flexible (M = 0.63), and most groups’ activities were consistent with the learning objectives (M = 0.88). However, the description of the learning content was not accurate and specific for most groups (M = 0.38), and only one group’s learning activities reflected project-based learning (M = 0.13).

Table 8 Descriptive statistics for AI lesson plan scores

4.2.2 AI programming

Table 9 shows that the mean overall score of the AI programming projects increased gradually, from 10.00 (the first project, emotion recognition) to 12.93 (the second project, speech recognition) and 16.76 (the third project, augmented reality). Furthermore, the average scores of the first and second projects were at the developing level (8–14), and the average score of the third project was at the proficient level (15–21). This indicates that the AI programming skills of the CS teachers improved steadily, especially given that the complexity of the three projects increased gradually.

Table 9 Descriptive statistics for AI programming skills

4.3 RQ3: CS teachers’ AI teaching self-efficacy

The results of the paired samples t test are shown in Table 10. The average score of CS teachers’ AI teaching self-efficacy was 31.66 before PD and increased to 38.19 after PD. This showed that the AI teaching self-efficacy of CS teachers was significantly improved (t = 6.361, p = 0.000, Cohen’s d = 1.124) in the two subdimensions of “AI teaching efficacy belief” (t = 3.123, p = 0.004, Cohen’s d = 1.164) and “AI teaching outcome expectancy” (t = 6.577, p = 0.000, Cohen’s d = 0.551).

Table 10 Paired samples t test of AI teaching self-efficacy (n = 32)

4.4 Interview data analysis results

According to the three research questions, the codes of the interview data were categorized into 3 themes and 5 subthemes. The frequency counts of AI teaching competency dimensions are provided in Table 11. All 4 interviewees expressed positive views about “AI lesson plan ability”, with these views expressed 12 times in total; “AI teaching efficacy belief”, expressed 12 times; “AI programming skills”, expressed 8 times; and “AI knowledge”, expressed 7 times.

Table 11 Frequency counts of the AI teaching competency dimension

In terms of AI knowledge, teacher T3 (low AI teaching self-efficacy, low AI knowledge) mentioned, "I have learned a lot of AI knowledge this time. For example, I have seen some hardware with my own eyes, and I have learned how to assemble and program them." Teacher T1 (high AI teaching self-efficacy, low AI knowledge) said, "I didn't know what AI was before this training program. Now I have a general understanding of AI, and I feel more confident in teaching AI."

In terms of AI teaching skills, teacher T4 (low AI teaching self-efficacy, high AI knowledge) said, "Our school has a programming club using block-based software such as Scratch, and I only taught students how to program with software since there was a lack of supportive hardware and poor hands-on practice. After this training, I feel I am particularly interested in AI programming with hardware." Teacher T3 (low AI teaching self-efficacy, low AI knowledge) noted, "After training, I know how to apply Beidou navigation to teach AI in my class, and all the lesson plans provided by the experts can be taught to my students. In addition, I learned to carry out AI instructional design based on the core CS literacy, and I felt that I have made great progress."

Regarding AI teaching self-efficacy, teacher T2 (high AI teaching self-efficacy, high AI knowledge) stated, "When I return to school, I will adjust my CS course, add some AI-related content, one or two lessons or a unit into a semester, carry out some related activities to make AI and block-based programming popular among students." Teacher T4 (low AI teaching self-efficacy, high AI knowledge) said, "I will apply (at school) to buy some Arduino (products) and learn to do some creative works by myself. I have a strong expectation to study further and guide my students to combine their ideas with creation."

5 Discussion

Previous research has indicated that TPACK is an effective knowledge framework for teachers’ PD programs (Akyuz, 2018; Archambault & Barnett, 2010; Chai & Koh, 2017); however, few studies have used the TPACK model to promote CS teachers’ AI teaching competency. To our knowledge, this study may be the first empirical study to promote the AI teaching competency of K-12 in-service CS teachers using a TPACK-based PD approach. Meanwhile, the findings indicated that this approach improved CS teachers’ AI teaching competency, including their AI knowledge, AI teaching skills and AI teaching self-efficacy. These findings, as well as practical implications, are discussed in the following sections.

5.1 TPACK-based PD design and its effectiveness on AI teaching competency

To improve CS teachers’ AI teaching competency, this TPACK-based PD program focused on multiple content designs embedded in a series of blended learning activities. In this study, CKAI, PCKAI, TCKAI, TPKAI and TPACKAI were intentionally developed among in-service CS teachers. This content design extends many previous studies on CS teachers’ PD, which have mainly focused on basic CS knowledge, concepts, programming, or computational thinking (Angeli et al., 2016; Chen & Cao, 2022; El-Hamamsy et al., 2021; Kong et al., 2020). In terms of activity design, this TPACK-based program combined various face-to-face activities with an online teacher learning course, especially stressing activities that reflected effective TPD elements, including active learning, collaboration, and the use of models and modeling (Darling-Hammond et al., 2017; Desimone, 2009). In particular, the activities in this PD program focused on hands-on assembly and programming and on designing lesson plans with peers.

Regarding the effectiveness of this PD program, first, CS teachers’ AI knowledge improved significantly after the 25-day intervention, especially in representation and reasoning (d = 1.107), interaction (d = 0.701), and social impact (d = 0.554). This finding is in line with previous research showing that robot programming activities in a PD program promoted teachers’ better understanding of the ideas and technologies of AI (El-Hamamsy et al., 2021). To some extent, this TPACK-based PD program was successful in general; however, teachers’ knowledge of the perception and machine learning subtopics did not improve. The reason probably lies in the PD content of this program not sufficiently strengthening the knowledge and activities related to these two AI subtopics.

Second, after the intervention, most CS teachers’ AI lesson plans met the criteria, especially in the subdimensions “be clear and diversified”, “reflect core literacy of computer science”, “embody AI knowledge”, “relate to real life”, and “be consistent with the learning objectives”. This finding indicates that teachers’ pedagogical content knowledge related to AI improved through this TPACK-based PD program. However, project-based learning was not reflected in most groups’ lesson plans, which reveals a weakness of this PD program. As a student-centered constructivist approach, project-based learning has often been applied to programming education with positive outcomes (Hsu et al., 2018; Wang & Hwang, 2017). Xue and Wang (2022) also stressed that teachers should reasonably integrate teaching methods and information technology and cultivate students’ creative thinking and information literacy in AI courses. Therefore, this weakness of the PD program should be addressed in the future.

Regarding the effect of the program on CS teachers’ AI programming skills, the scores of the three programming projects increased gradually, indicating that the TPACK-based PD program had a positive effect. Since programming technology is an important element of technological knowledge related to the AI discipline (Kim et al., 2021), it is necessary to purposefully improve teachers’ programming skills. Toward this goal, block-based coding sessions, both online and offline, were implemented in this study. This approach is consistent with previous research that used robotics and block-based programming to inspire teachers’ creativity and develop their understanding of advanced CT concepts (Kim et al., 2015; El-Hamamsy et al., 2021; Dorotea et al., 2021).

Finally, CS teachers’ AI teaching self-efficacy was significantly improved in terms of both “AI teaching outcome expectancy” and “AI teaching efficacy belief” after the intervention. This finding aligns with Kapici and Akcay’s (2020) study, which found that lesson planning practice supported by the environment and technical equipment could significantly improve teachers’ self-efficacy. In the present study, the TPACK-based program, embedded with various professional learning activities, especially group-based lesson planning accompanied by the hands-on assembly and programming of electronic blocks, may have played an important role in increasing CS teachers’ teaching beliefs. In turn, high AI teaching self-efficacy may have a positive impact on teachers’ related teaching practice and students’ learning outcomes, according to previous studies on teaching self-efficacy (Burić & Kim, 2020; Orakcı et al., 2020).

5.2 Contributions and implications

To sum up the contributions of this research, we suppose that our PD intervention is one of the first empirical and evidence-based attempts to promote CS teachers’ AI competency. The intervention in this study was designed particularly based on the TPACK framework (Rosenberg & Koehler, 2015), and it also integrated certain elements of effective TPD (Darling-Hammond et al., 2017; Desimone, 2009). Another contribution of this study was that we used multiple instruments, including knowledge-based tests, artifact-based assessments (programming artifacts and lesson plans), self-reports, and interview data, to assess CS teachers’ AI teaching competency. Compared with previous research that examined the effect of TPACK-based PD programs on CS teachers (Angeli et al., 2016; Chen & Cao, 2022; Kong et al., 2020), the assessment approach in our research was more comprehensive and might improve the reliability and validity of the research.

The main implication of the present study lies in the effective design of in-service CS teachers’ PD programs for practitioners. Given the lack of CS teachers competent to teach AI-related courses in K-12 schools (Akpinar & Bal, 2006), it is urgent to design high-quality PD programs that improve in-service CS teachers’ AI teaching competency in both cognitive and noncognitive aspects. According to the effective elements of TPD, PD content should first focus on “teaching strategies associated with specific curriculum content [that] supports teacher learning within teachers’ classroom contexts” (Darling-Hammond et al., 2017). In the present study, this element was reflected in an intentional focus on AI-specific disciplinary content and constructivist pedagogies, which were mainly categorized into CKAI, TCKAI, PCKAI, TPKAI and TPACKAI. The study revealed that this TPACK-based content design had a positive impact on CS teachers’ AI teaching knowledge, skills, and self-efficacy in general. Meanwhile, the content design incorporated active teacher learning into the PD program, engaging CS teachers directly in constructing AI-related artifacts and designing lesson plans based on constructionism theory (Papert, 1991). Furthermore, this PD program created spaces for in-service CS teachers to share ideas and collaborate in their professional learning, which reflected the “collaboration” element of effective PD to some extent. This holistic TPACK-based design may have practical implications for many CS teacher PD initiatives in China.

6 Conclusion and limitations

To promote CS teachers’ AI competency, this research constructed and implemented a TPACK-based PD program. With a 25-day intervention, including 45 h of offline activities and 30 h of online activities, the TPACK-based PD program a) significantly improved CS teachers’ AI knowledge, especially in representation and reasoning, interaction, and social impact; b) developed CS teachers’ AI teaching skills, including their AI lesson plan ability and AI programming skills; and c) significantly improved CS teachers’ AI teaching self-efficacy, both in AI teaching efficacy belief and AI teaching outcome expectancy. These findings revealed the positive effectiveness of TPACK-based PD in improving the AI teaching competency of CS teachers in K-12 education, which helps to expand and enrich the existing research and practice on the design of effective PD programs for CS teachers, especially in AI education.

In terms of the limitations of this study, on the one hand, a one-group pretest–posttest quasi-experimental design was employed, which may make it hard to attribute the observed effects solely to the intervention due to the lack of a control group. However, during the TPACK-based intervention, no other PD programs were offered to the participants, which helped to rule out some confounding variables. In the future, a more rigorous pretest–posttest (quasi-)experimental design with a control group should be conducted to examine the effectiveness of the program. On the other hand, the TPACK-based PD program in this study did not involve all the effective PD elements, such as “coaching and expert support” and “feedback and reflection”, which focus on improving teaching practice in the classroom. In future research, the program could be strengthened through job-embedded PD for CS teachers, and its impact on teaching practice could be examined further. Furthermore, students’ AI learning outcomes also need to be assessed to provide stronger evidence of the PD program’s effectiveness.