1 Introduction

Lesson study (LS), which originated in Japan, is a student-focused, teacher-oriented, collaborative professional development (PD) approach (Stigler & Hiebert, 1999) that has been adapted globally (Huang et al., 2019; Seleznyov, 2019). An LS typically includes four core phases: study, plan, teach, and reflect (Lewis, 2016). Furthermore, researchers have identified important features of effective LS, including (1) studying relevant materials and considering goals for student learning and development, (2) developing research lesson proposals in alignment with these goals, (3) carefully observing and collecting data on student learning and development during the research lesson, (4) using these data to reflect on the lesson and instruction more broadly, (5) revising and reteaching the research lesson if desired, (6) involving knowledgeable others (KOs) during the LS process, and (7) disseminating the LS results (e.g., Akiba et al., 2019; Takahashi & McDougal, 2016). Studies have documented various effects of the implementation of LS on teaching (Huang et al., 2017), teacher learning (Vermunt et al., 2019), student learning (Lewis & Perry, 2017), curriculum reform (Lewis & Takahashi, 2013), professional learning community building (Aas, 2021), and research-informed classroom practice (Wei & Huang, 2022).

Internationally, research on LS has experienced dramatic development over the past 20 years as both the number of publications and the range of topics associated with LS have increased [e.g., the establishment of the International Journal for Lesson and Learning Studies (IJLLS) in 2012; the annual conference of the World Association of Lesson Studies, held since 2007]. Recent reviews of LS (Cheung & Wong, 2014; Larssen et al., 2018; Seleznyov, 2019; Willems & Van den Bossche, 2019) indicate that researchers have taken different approaches in efforts to synthesize studies on LS published through 2018. Two systematic literature reviews have documented the effectiveness of LS (Cheung & Wong, 2014; Willems & Van den Bossche, 2019). Reviewing papers published from 2000 to 2010, Cheung and Wong (2014) found that LS brought positive benefits to teachers’ PD and students’ learning. As a follow-up, Willems and Van den Bossche (2019) reviewed papers published between 2010 and 2018 and concluded that LS positively impacted teachers’ professional learning in terms of their beliefs, behaviors, knowledge and skills.

Despite the reported successes of LS—especially in Japan and China, where LS has been used for over a century (Lewis, 2016)—numerous challenges have been uncovered in recent studies of adaptations of LS in other countries. Two major issues have been identified. The first issue relates to the LS process. Prior LS studies have demonstrated inconsistent understanding of LS, leading to different implementations (Larssen et al., 2018). For instance, while plan is one of the four phases commonly included in LS, studies outside Japan and China often downplayed the role of lesson planning, which may lead to unsuccessful LS (Fujii, 2016). Similarly, KOs, a critical element in Japanese and Chinese LS, have often been overlooked when LS was conducted in other countries.

The second issue relates to LS impacts (Lewis, 2016). While most studies reported the impact of LS on teacher learning, there was little evidence of impact on teaching practice, student learning, and school culture/community (Seleznyov, 2019). There was also a lack of consistent outcome measures across studies (Cheung & Wong, 2014). Moreover, most research studies appeared to be qualitative case studies that lacked rigorous design (Willems & Van den Bossche, 2019). Together, these issues raise questions about the validity and reliability of the reported impact of LS.

The above issues suggest a need for closer assessment of the most recent research on LS. Furthermore, to our knowledge, no recent reviews have examined LS implementation in mathematics education, a field in which LS has been widely used. Given that mathematics teaching and learning is an important but challenging subject that calls for global attention, this study conducts a review of recent LS in mathematics education. Specifically, we focus on two research questions:

  1. How has LS been implemented in recent mathematics education?
  2. What are the major impacts of LS in recent mathematics education?

To explore these research questions, our review is guided by Lewis’s (2016) theoretical model that indicates an LS cycle and pathways of impact. Below we introduce this model, which helps frame this review.

2 A theoretical model of LS (Lewis, 2016)

Based on extensive studies of Japanese LS and building on theories of cognitive and situated learning, Lewis et al. (2009) proposed a theoretical model for how implementation of lesson study improved instruction through three intervening pathways and provided a North American LS case to illustrate the mechanisms by which LS can improve instruction. Lewis (2016) further refined the model by diagrammatizing the LS cycle and defining four pathways of impact and outcomes. As shown in Fig. 1, the four LS phases produce the outcomes of instruction and student learning by simultaneously improving four basic inputs: teachers’ knowledge; teachers’ beliefs and dispositions (e.g., their interest in student thinking and belief that students can learn); teacher learning community (e.g., norms and routines of collaborative improvement); and curriculum (e.g., instructional tasks and support materials for teachers). Using this model and evidence from 11 empirical studies reported in a special issue (Huang & Shimizu, 2016), Lewis further illustrated why and how LS works both in Japan and other countries. For our study, we adapted Lewis’s model, combining teachers’ knowledge, beliefs and dispositions as one category and viewing the two outcomes of instruction and learning as part of the LS impact. As described below, this model allowed us to simultaneously examine variations in how the LS cycle was implemented and studied as well as to consider how dimensions of impacts and outcomes were assessed and emphasized across recent literature.

Fig. 1 A theoretical model of LS cycles and pathways of impact (Lewis, 2016)

3 Methods

3.1 Search and screening process

We first identified and searched relevant peer-reviewed journals published in recent years (from 2015 to April 2022). Although our study is not a systematic review, we use a modified PRISMA diagram (Fig. 2) to illustrate how records were identified and screened for inclusion and took an approach similar to that used in a recent review about STEM education (Li et al., 2020). LS is an emerging research field with relevant publications appearing in different journals; thus, we identified and then searched five types of research journals using the search term “lesson study”: a specialized journal on lesson study (IJLLS), general education journals (e.g., Teaching and Teacher Education, Journal of Teacher Education), mathematics education journals (e.g., Journal for Research in Mathematics Education, Journal of Mathematics Teacher Education, ZDM – Mathematics Education), STEM education journals (e.g., International Journal of STEM Education), and cognitive science and educational psychology journals (e.g., Learning and Instruction). Overall, we examined 28 journals and identified 204 articles (see Appendix 1).

Fig. 2 Modified PRISMA diagram for the screening process based on Page et al. (2021)

We then screened each identified article (n = 204) for its research focus, resulting in an exclusion of 37 articles that did not primarily focus on lesson study. The remaining articles (n = 167) were retrieved and assessed for eligibility. About half of the articles (n = 85) were excluded because they did not focus on lesson study in mathematics. The remaining 82 articles were included in further review. Given that we intended to use Lewis’s model for coding, we excluded seven more non-empirical articles, yielding a final sample of 75 studies to answer our research questions.
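The screening arithmetic above can be checked with a few lines of code. The following is a minimal sketch rather than part of the study's procedure: the function name `screening_flow` and the paraphrased exclusion labels are hypothetical, and only the counts (204, 37, 85, 7) are taken from the text.

```python
def screening_flow(identified, exclusions):
    """Return (reason, records remaining) after each exclusion step.

    `exclusions` is a list of (reason, count) pairs applied in order.
    """
    remaining = identified
    steps = []
    for reason, count in exclusions:
        if count > remaining:
            raise ValueError(f"cannot exclude {count} from {remaining} records")
        remaining -= count
        steps.append((reason, remaining))
    return steps


# Counts reported in Section 3.1; the labels are paraphrased for illustration.
steps = screening_flow(
    204,
    [
        ("not primarily focused on lesson study", 37),
        ("not focused on lesson study in mathematics", 85),
        ("non-empirical", 7),
    ],
)
for reason, remaining in steps:
    print(f"after excluding '{reason}': {remaining} articles remain")
```

Applying the exclusions in the order described leaves 167, 82, and finally 75 articles, which matches the sample sizes reported in this section.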

3.2 Coding and analysis

For the 75 empirical LS articles that involved mathematics, we first coded the following aspects of each article: study type (e.g., empirical, theoretical, review), methods used (qualitative such as observation, quantitative such as correlational analysis, or mixed methods), grade level (elementary, secondary), teacher level (preservice, in-service), country, and lesson study time (e.g., 1 month, 8 weeks, 3 years). Results were documented in a spreadsheet. We further coded LS implementation and impact. For LS implementation, we coded whether an article addressed any or all LS phases: study, plan, teach, and reflect. For LS impact, we coded whether an article reported its impact on any of the following dimensions: (a) knowledge/belief and dispositions, (b) teacher learning community, (c) curriculum, (d) instruction, and (e) learning. Note that even though the non-empirical articles were not coded with Lewis’s model, we reviewed them (e.g., Huang & Shimizu, 2016; Lewis, 2016; Pang & Marton, 2017), which provided further support for our interpretation of findings.
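One way to picture the coding scheme is as a small record per article. The sketch below is illustrative only and does not reproduce the spreadsheet we used: the class `CodedArticle`, the function `tally_impacts`, and the two placeholder records are hypothetical, while the phase and impact labels follow the categories listed above.

```python
from collections import Counter
from dataclasses import dataclass, field

PHASES = ("study", "plan", "teach", "reflect")
IMPACTS = (
    "knowledge/beliefs and dispositions",
    "teacher learning community",
    "curriculum",
    "instruction",
    "learning",
)


@dataclass
class CodedArticle:
    label: str
    methods: str          # "qualitative", "quantitative", or "mixed"
    grade_level: str      # "elementary" or "secondary"
    teacher_level: str    # "preservice" or "in-service"
    country: str
    phases: set = field(default_factory=set)   # LS phases the article addresses
    impacts: set = field(default_factory=set)  # impact dimensions it reports

    def __post_init__(self):
        # Reject codes that are not part of the scheme.
        unknown = (self.phases - set(PHASES)) | (self.impacts - set(IMPACTS))
        if unknown:
            raise ValueError(f"unknown codes: {sorted(unknown)}")


def tally_impacts(articles):
    """Count how many articles report each impact dimension."""
    counts = Counter()
    for article in articles:
        counts.update(article.impacts)
    return {impact: counts[impact] for impact in IMPACTS}


# Placeholder records, NOT data from the reviewed articles.
articles = [
    CodedArticle("placeholder A", "qualitative", "elementary", "in-service",
                 "Country X", phases=set(PHASES),
                 impacts={"instruction", "learning"}),
    CodedArticle("placeholder B", "mixed", "secondary", "preservice",
                 "Country Y", phases={"plan"}, impacts={"instruction"}),
]
tally = tally_impacts(articles)
print(tally)
```

Storing phases and impacts as sets reflects that an article could address several phases and report several impacts at once, so the per-dimension tallies need not sum to the number of articles.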

Next, we conducted three rounds of in-depth analysis. First, we noted the study focus and research questions, LS procedures reported in each article, and the successes and challenges in terms of LS implementation and impact. Given that our review focused on LS in mathematics education in a global context, we paid attention to mathematics-specific and cultural factors. Next, we memoed all notes, highlighting the main findings in each article with our research questions in mind. We then organized the main findings into two tables, one related to implementation and one related to impact. For implementation, we further analyzed whether a finding indicated successes or challenges of an LS phase. For impact, we analyzed what type of impact the findings illustrated. If an article reported several impacts, we recorded each separately. Finally, we compared findings across articles to identify emerging themes in alignment with Lewis’s (2016) model. Articles with similar or contrasting findings were put together under each theme. We then organized these themes into LS implementation and LS impact, respectively.

4 Findings

4.1 Overview

The 75 articles described LS in a total of 28 countries; the United States appeared most frequently (20 articles). How the LS team was formed and how the LS process was conducted varied greatly across cultural contexts (Amador & Carter, 2018). LS was most often conducted within a single school (Takahashi & McDougal, 2016), although it could also occur at district, regional, and national levels (Groves et al., 2016). It was frequently conducted by in-service (87%) elementary and middle school teachers (Corey et al., 2021), although a growing body of research has explored the implementation of LS in secondary classrooms (e.g., Huang et al., 2017) and with preservice teachers enrolled in teacher training programs (e.g., Guner & Akyuz, 2020). LS cycles were often led by expert teachers (Seino & Foster, 2021) or supported by KOs who provided feedback during the LS process (Hernández-Rodríguez et al., 2021; Lewis & Perry, 2017). Outside of Japan, university faculty often initiated and facilitated LS cycles (e.g., Calleja & Camilleri, 2021).

4.2 LS implementation in mathematics education

Among the 75 articles, 56 (75%) mentioned all four LS phases or considered the LS cycle as a whole. Some articles explicitly focused on particular LS phases such as plan (n = 6; e.g., Aas, 2021; Fauskanger et al., 2019; Fujii, 2016; Hernández-Rodríguez et al., 2021), reflect (n = 7; e.g., Gu & Gu, 2016; Kager et al., 2022; Seino & Foster, 2021), or both (n = 2; e.g., Vermunt et al., 2019). However, the study phase was rarely singled out as a research focus. In fact, in some studies where the full LS cycle was addressed, the study phase was not explicitly listed (n = 13; e.g., Chua, 2019; Groves et al., 2016; Guner & Akyuz, 2020; Huang & Shimizu, 2016; Widjaja et al., 2017). It seems that the study process was combined with planning, obscuring the study phase (Watanabe, 2018). In contrast, the teach phase (i.e., the research lesson) received a great deal of attention and was always explored in conjunction with other phases.

Below we present our main findings, including six emergent themes about LS implementation. Given that some themes occurred across LS phases, we present our findings according to these themes and relevant successes and challenges. We conclude this section with observations regarding new developments in LS implementation.

4.2.1 Research purpose/question in LS

Takahashi and McDougal (2016) listed “a clear research purpose” as the first critical element of their proposed “collaborative lesson research” LS model. However, the literature suggested that LS was often undertaken without a clear research purpose. For instance, during the “plan” stage, Fujii (2016) found that teachers’ planning in countries outside Japan often lacked a research question. As such, the selected task examples lacked a clear, research-driven objective. Relatedly, during the teach stage, many teachers did not have a clear purpose for their research lesson. Takahashi and McDougal (2016) found that teachers had often misconceived the goal of the LS research lesson as simply to develop a good product. This differs from the goal of enhancing teachers’ teaching competency, which is shared by Japanese and Chinese LS (Leavy & Hourigan, 2016), although the latter also stresses the importance of developing a good product (Huang et al., 2019). Lack of a clear research focus may negatively affect teacher reflections, resulting in poor reflection quality. This was found in Sekao and Engelbrecht (2021) with South African primary mathematics teachers.

4.2.2 Materials and the study of materials

Quality of teaching materials was found to be one of the three design features of LS that supported teachers’ collaborative learning (Akiba et al., 2019), but quality materials were not always available. Groves et al. (2016) reported that “material” was a constraint for Australian teachers as they sought suitable problem-solving tasks. Challenges in accessing high-quality materials, coupled with limitations of existing materials, make the study phase of LS even more critical. Through the study phase, teachers should spend time studying teaching materials, a prerequisite of lesson planning (Miyakawa & Winsløw, 2019) and a process leading to LS success (Takahashi & McDougal, 2016; Watanabe, 2018). However, as mentioned earlier, many studies of LS combined the study phase with planning, likely due to the close connection between studying materials and designing tasks. This practice does not seem to align with the essence of Japanese LS, which regards studying teaching materials as a critical separate phase.

4.2.3 The focus on students’ thinking

Across the literature, there is a consensus that it is most critical to focus on students’ thinking and learning during LS (e.g., Akiba et al., 2019; Bruce et al., 2016; Confrey & Shah, 2021; Lewis, 2016). Whether or not one focuses on children’s thinking in LS distinguishes expert teachers from novice ones (Bocala, 2015). Consider, for example, that Japanese KOs’ final comments clearly emphasized children’s ideas and work (Seino & Foster, 2021). Akiba et al. (2019) also found that facilitators’ focus on students was a critical design feature of LS that supported teacher learning.

However, many studies outside Japan (e.g., in Iran, Kazakhstan, and Laos) reported challenges in focusing on children’s thinking (Arani, 2017; Khokhotva, 2018). For instance, during lesson planning, teachers’ talk tended to be descriptive rather than analytical (Grimsæth & Hallås, 2015, as cited in Fauskanger et al., 2019), indicating a lack of focus on students’ thinking. Bjuland and Mosvold (2015) reported that during research lessons, pre-service teachers in Norway did not observe students’ learning and that those lessons were not organized to make students’ learning visible. Similarly, Gero (2015) found that during reflection, US teachers tended to focus on social, less critical aspects of the LS process and conducted minimal analysis of students’ thinking. Bakker et al. (2022) reported similar findings with Dutch teachers. Given that the LS goal should ultimately be to enhance students’ learning (Lewis, 2016), the lack of focus on student thinking may lead to unproductive discussions. In contrast, Warwick et al. (2016) suggested that a focus on students’ learning enabled teachers to collaborate effectively on developing plans to address student needs.

4.2.4 Knowledgeable others (KOs) and facilitators

KOs play an important role in both Japanese and Chinese LS. Although the exact roles of KOs in both countries can differ—for example, they may or may not be part of the LS team or serve as a facilitator—these experts provide invaluable feedback to teachers (Groves et al., 2016; Hernández-Rodríguez et al., 2021; Seino & Foster, 2021) and offer mathematical and practical knowledge (Gu & Gu, 2016). Seino and Foster (2021) found that in their final comments, Japanese KOs asked teachers to think about content-specific aspects of lessons like decimals, subtraction, and fractions, encouraging teachers to ask students to think more deeply about why they approached solving a problem in a certain way. Groves et al. (2016) reported that the KO in their LS served as a contributing factor to Australian teachers’ success in professional learning. Nevertheless, in some LS groups, there were no KOs or facilitators involved, particularly in LS conducted outside of Japan and China (Khokhotva, 2018; Takahashi & McDougal, 2016), which may have limited LS outcomes.

Although KOs play a critical role in supporting LS, some studies revealed room for this role to improve. For instance, in Gu and Gu (2016), the Chinese KO, a research specialist, paid little attention to teachers’ questions and did not engage them in conversation. Similarly, Amador and Carter (2018) reported that the lengthy talk between the facilitator and the cooperating teacher in a US setting diminished preservice teachers’ professional noticing. In fact, Kager et al. (2022) found that different LS groups had significantly different learning trajectories and needs. Therefore, it is important for KOs and facilitators to be aware of teachers’ learning needs and provide corresponding support (Huang et al., 2021).

4.2.5 Collaboration and needed support

Collaboration has been widely identified as a critical but challenging factor in LS. First, collegial questioning and critique can be lacking. For example, Sekao and Engelbrecht (2021) found that South African teachers received over-critical feedback in debriefing sessions, making them feel personally attacked. Consequently, these teachers viewed the reflection process as less useful and enjoyable than planning and teaching phases and were concerned about being observed in research lessons. Gero (2015) reported similar concerns among US teachers. These findings call for grounding LS in a cultural context that facilitates collaboration and collegiality. Based on a study of a Singapore LS, Lee and Tan (2020) reported that collegial questioning and critique enabled teacher learning. Therefore, in a country where teaching is perceived as individualized and confidential, LS must be conducted in a way that promotes collaboration. In fact, Lewis and Hurd (2011) provided guidelines for conducting debriefings.

Another element of collaboration lies in school and district support for LS. Akiba and Wilkinson (2016) reported that even though LS was mandated by the state of Florida in the US, insufficient funding was provided to support teachers, which challenged LS implementation. This contrasts with Japan and China where additional funding is not a requirement because LS is a routine part of teachers’ daily work. Gero (2015) noted that the district’s high degree of control over the LS process threatened teachers’ ability to take responsibility for student learning. Such top-down approaches to LS conflict with its nature as a teacher-oriented PD format. Other studies (e.g., Sekao & Engelbrecht, 2021; Shingphachanh, 2018) called for principals’ support for teachers’ engagement with LS. In addition to the level of control, studies (e.g., Akiba & Wilkinson, 2016; Sekao & Engelbrecht, 2021) also indicated that external obstacles such as limited time, inflexible routines, and organizational and structural challenges could result in a lack of systematic support.

4.2.6 Time and duration of LS

Our findings also revealed issues with the time and duration of LS during implementation. Akiba et al. (2019) found that LS duration was one of the three design features that mattered to LS success, and it was significantly associated with teachers’ participation in an effective inquiry process. In their study, Akiba et al. found that the time span of LS ranged from one day to 118 days and the active time ranged from 2 to 23 h. Similarly, across the articles in our sample, the duration of LS varied widely, ranging from one cycle within 1 or 2 days (e.g., Confrey & Shah, 2021; Gero, 2015; Hernández-Rodríguez et al., 2021; Miyakawa & Winsløw, 2019) to multiple cycles over several years (e.g., Aas, 2021; Bruce et al., 2016; Groves et al., 2016; Kager et al., 2022). This variation is noteworthy and an area for future research.

The time concern is also visible within LS phases such as planning and debriefing. In Japan, a school usually spends 4–6 weeks carefully developing one lesson plan (Seino & Foster, 2021), yet a research lesson plan outside Japan may be developed in 1–2 h (e.g., Confrey & Shah, 2021; Hernández-Rodríguez et al., 2021). Khokhotva (2018) also reported that Kazakhstan teachers were concerned about insufficient time for planning. Similarly, teachers in Lee and Tan (2020) felt rushed with discussions during LS meetings.

4.2.7 New practice in LS implementation

Our findings revealed new developments in LS in recent years, contributing to LS research and practice. First, a type of theory-informed LS has developed. For instance, researchers in the US (Confrey & Shah, 2021; Huang et al., 2019) explored how teachers incorporated learning trajectory and/or variation into research lessons. In Confrey and Shah (2021), middle school teachers who received PD on learning trajectories about ratios conducted an LS to enact what they had learned. The teachers scanned the data report to pinpoint students’ learning trajectories and modified their lessons according to their findings. Huang et al. (2019) reported on a group of elementary teachers who conducted an LS focused on comparative addition. After analyzing the learning trajectory of this topic, teachers incorporated variation strategies into research lessons. Additionally, Pang (2016) reported how an LS incorporated the “five practices” (Smith & Stein, 2018) in mathematics discussion in a Korean context. The “five practices” theory included anticipating, monitoring, selecting, sequencing, and connecting students’ responses to key topics. This theory served as a lens for discourse analysis, guiding Korean teachers in reflecting on and enhancing their lessons during the LS.

The other new development is related to online and/or hybrid LS, likely due to the increased use of technology and the COVID-19 pandemic. During these LS, video was often used. These LS were often conducted online (Calleja & Camilleri, 2021; Huang et al., 2021; Widjaja et al., 2017). Consequently, there has been an increase in cross-cultural LS in which researchers from different countries (e.g., US and China; Japan and Iran) come together through online platforms to conduct LS (Arani, 2015; Huang et al., 2021).

Both types of new practices support better LS implementation. For instance, with an integration of theory and LS, the LS team can develop a clear research question to guide all phases. This would require an LS team to spend time studying teaching materials to link theory and practice. However, teachers may need support to enhance their understanding of the learning theories in actual classrooms. For example, in Confrey and Shah (2021), teachers intended to use student learning data to adjust their instruction, but their interpretations of the data often appeared to be either too narrow or too broad. As a result, their modified instruction did not effectively align with the targeted goals on the learning trajectories.

Likewise, with online and hybrid LS, teachers and facilitators can conduct LS regardless of their geographical distance, which could help to address issues related to scheduling and a lack of qualified facilitators. The video-based LS also allows teachers to observe research lessons in their own time, which may help them to overcome the structural/organization challenges reported in recent LS studies. Nevertheless, potential challenges persist. For instance, while teacher interaction is crucial during LS, how easily can we promote collegial interactions among teachers in an online LS? As another example, how may online LS help to address equity issues in mathematics education? How may online LS help to sustain LS at scale? Additionally, while online LS offers the flexibility for teachers to independently watch the pre-recorded lessons, how can we ensure that this does not add to the time burdens on teachers who are already juggling busy schedules? Future studies may explore these questions.

4.3 Major impacts of LS in recent mathematics education

Our coding of the 75 empirical articles based on Lewis’s (2016) LS model indicated that the literature mainly reported impact on teachers’ knowledge, beliefs, and dispositions (n = 46, 61%). The other two pathways were less frequently mentioned (learning community: n = 17, 23%; curriculum: n = 16, 21%). Similarly, the two outcomes (instruction and learning) were also less frequently reported. In comparison with instruction (n = 22, 29%), there were only 12 studies that reported students’ learning (16%). In addition, 20 articles (27%) did not report any impact of LS. Overall, current LS research has mainly reported its impact on teacher learning but not on other areas.
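The percentages in this paragraph follow directly from the counts and the sample size of 75. The short sketch below recomputes them; the dictionary keys and the helper name `share` are hypothetical labels chosen for illustration, and only the counts come from the text.

```python
TOTAL_ARTICLES = 75

# Counts reported above; an article may appear in more than one category.
impact_counts = {
    "knowledge, beliefs, and dispositions": 46,
    "learning community": 17,
    "curriculum": 16,
    "instruction": 22,
    "student learning": 12,
    "no impact reported": 20,
}


def share(count, total=TOTAL_ARTICLES):
    """Return the share of `total` as a whole-number percentage."""
    return round(100 * count / total)


shares = {category: share(n) for category, n in impact_counts.items()}
for category, pct in shares.items():
    print(f"{category}: n = {impact_counts[category]}, {pct}%")
```

Because an article that reported several impacts was recorded under each of them, the counts sum to more than 75 and the percentages to more than 100.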

Our examination of articles and systematic reviews (e.g., Larssen et al., 2018; Seleznyov, 2019; Willems & Van den Bossche, 2019) identified two challenges in understanding LS impact. First, while Lewis’s (2016) model indicates pathways of impact that lead to anticipated outcomes, studies that have reported changes across domains are rare (Willems & Van den Bossche, 2019). Lewis and Perry (2017) is an exception. Through a randomized, controlled trial, the researchers found that LS supported by appropriate resources (curriculum) enhanced teacher knowledge, which seemed to serve as a pathway to improved instructional practice and student learning. More studies like Lewis and Perry (2017) are needed.

The second challenge for understanding the LS impact relates to a lack of rigorous design and consistent outcome measures (Willems & Van den Bossche, 2019), as findings were mainly based on small-scale qualitative research. Of the empirical research articles we collected, 55% employed case study methods. Seleznyov (2019) additionally noted that previous measures of LS effects overemphasized short-term outcomes. With these two challenges in mind, below we share more details about the identified impact reported in recent studies.

4.3.1 Most reported impact: teacher knowledge

LS studies that reported impacts primarily emphasized LS’s effects on teachers’ professional knowledge, beliefs, and dispositions (e.g., Akiba et al., 2019; Leavy & Hourigan, 2016; Nguyen & Tran, 2022). Corey et al. (2021) reported that when the instructional products generated from LS focused on student mathematical thinking that was specific to a task or a mathematical topic (e.g., descriptions of multiple solutions and ways of reasoning), they contributed to the development of teachers’ knowledge. Vermunt et al. (2019) also reported that teachers of grades 5–8 in London improved their mathematical subject and pedagogical knowledge. In particular, less experienced teachers had a sharp increase in meaning-oriented learning. Although some findings were based on teacher surveys rather than observations, these findings are encouraging because when teachers felt they lacked knowledge, they were nervous and lacked confidence, which inhibited LS success (e.g., Sekao & Engelbrecht, 2021).

Additionally, we found that some studies reported teachers’ knowledge improvement through the lens of mathematical knowledge for teaching (MKT; e.g., Huang et al., 2021) and its subcategories, knowledge of content and students (KCS) and knowledge of content and teaching (KCT; e.g., Leavy & Hourigan, 2016). Other studies reported that LS impacted teacher knowledge in specific areas. These areas include enhanced knowledge of mathematical content (Arani, 2015; Chua, 2019; Suh & Seshaiyer, 2015), students’ thinking, ideas, and competency (Guner & Akyuz, 2020; Pang, 2016), curriculum and planning (e.g., Arani, 2015; Barber, 2018; Pang, 2016), and teaching and enhanced noticing (Pang, 2016).

4.3.2 Less reported pathways of impact: curriculum and learning community

Only a few studies reported the impact of LS on curriculum and teachers’ learning community, two of Lewis’s (2016) pathways of impact. Regarding curriculum, Druken et al. (2021) reported that collaboration between university methods and content faculty through LS resulted in enhanced teaching materials. Pang (2016) also reported that a Korean LS resulted in more specified lesson goals for students. Similarly, Fujii (2016) found that the post-discussion in LS provided a context for revising tasks, resulting in changed curriculum. The limited research on LS’s impact on curriculum (n = 16, 21%) may be partially due to the fact that the study of teaching materials in LS implementation has been largely overlooked.

Similarly, only a limited number of studies (n = 17, 23%) reported LS’s impact on teachers’ professional learning community. For instance, Chua (2019) and Lundbäck and Egerhag (2020) reported that teachers who participated in LS developed greater appreciation for their professional learning community. However, there was little evidence that LS made a sustainable difference by influencing schools’ professional learning cultures and structures in countries outside Japan (Akiba & Wilkinson, 2016; Seleznyov, 2019). Lewanowski-Breen et al. (2021) reported that LS had a long-term impact on the learning community even after six years. However, in another study, Takahashi and McDougal (2016) reported that despite a longstanding public commitment to research in city schools, “all the schools that piloted lesson study in the early years discontinued after a few years” (p. 516). The limited LS impact on teachers’ learning community is likely related to collaboration challenges and the lack of systematic support during LS implementation.

4.3.3 Less reported outcomes: instruction and student learning

Even though, as indicated by Lewis (2016), improvements of instruction and student learning are goals of LS, most studies did not investigate the impact on these outcomes. As mentioned earlier, the effect of LS on student learning received the least attention (n = 12, 16%).

Regarding impact on instruction, most articles reported positive teaching results, which were primarily demonstrated through analysis of research lessons (Barber, 2018; Guner & Akyuz, 2020; Huang & Shimizu, 2016; Huang et al., 2017; Pang, 2016). Guner and Akyuz (2020) reported that by adapting Japanese LS into an Australian context, teachers had improved their teaching as evidenced by substantial time spent on sharing solutions and more probing questions to elicit students’ explanations. Goei et al. (2021) found that all groups in their LS made changes in teaching. These changes included organizing lessons toward focusing on students’ work, promoting negotiation among solutions, and making comparisons and connections between students’ ideas. A few studies did report challenges in teaching. As mentioned earlier, Confrey and Shah (2021) investigated how teachers modified their lessons based on student data across learning trajectories. They found that teachers either focused too narrowly on a single data point or interpreted data too broadly; neither approach sufficiently supported instructional changes. Regardless of the reported successes or challenges, a common thread across these articles is that the expected teaching shift is geared toward better responses to students’ mathematical thinking and learning.

In terms of the reported LS impact on student learning, existing studies mainly reported how students’ knowledge improved based on the research lessons (Confrey & Shah, 2021; Huang et al., 2019; Lewis & Perry, 2017; Lundbäck & Egerhag, 2020). For instance, Lewis and Perry (2017) reported changes in students’ fraction knowledge after LS supported by a resource kit in the US. Huang et al. (2019) found that students increased their knowledge of solving comparison word problems based on an LS integrated with learning trajectory and variation in China. In addition to reported successes, a few studies also revealed remaining challenges in student learning. For instance, Huang et al. (2019) found that students still had difficulty understanding the equivalence of rewordings of comparison word problems (e.g., how many more, how many less, and difference).

The above findings about the positive impact of LS on teaching and learning, although limited, are encouraging. However, for both areas, the reported impacts are limited to the research lesson or assessment conducted during LS. It is unclear whether there is a long-term impact on teaching and learning. In fact, long-term impacts may be further complicated for LS implemented outside of Japan, where LS may not be sustained if funding is not available. Future research should continue investigating ways to document LS outcomes on teaching and learning in the long term using validated instruments.

5 Discussions and future directions

LS is one of the most effective PD approaches (Gersten et al., 2014; Lewis & Perry, 2017) and has drawn global attention. However, our findings echo prior literature (e.g., Larssen et al., 2018; Seleznyov, 2019; Takahashi & McDougal, 2016) in showing that the success of LS is inconclusive in contexts outside Japan (and China). Our review calls for a deeper, culturally relevant understanding of LS, suggesting opportunities for future research and practice.

5.1 Deeper understanding of LS

Studies that reported successes were often small-scale and qualitative, and a majority of the successes related to teachers’ knowledge. However, evidence of how LS implementation made an impact on teaching and learning is lacking. In addition, how changes in the pathways (teacher knowledge, curriculum, and teachers’ learning community) lead to outcome changes is also unclear. These findings reveal research gaps that call for future investigation of LS that links LS implementation (cycles) to its impact (Lewis, 2016). Consider, for example, Aas’s (2021) focus on talk during planning. This study collected data from the first, second, eighth, and ninth LS cycles, which could have allowed an investigation of changes in teacher practices beyond characterizing teacher talk.

Furthermore, we noticed a need to foster deeper understanding of LS itself. For instance, while studying and planning should serve as central prerequisites to follow-up activities (e.g., Fujii, 2016; Miyakawa & Winsløw, 2019), the importance of studying teaching materials and lesson planning was often minimized, which may limit impact on curriculum. In addition, across planning, teaching, and debriefing, the purpose or focus of LS is often unclear. For example, teacher reflections may not focus on students’ mathematical thinking and learning. This may even be traced back to the lack of research questions in many LS studies. Moreover, the durations of LS in some studies were quite brief, yet careful lesson planning and deep reflection require significant time. Teachers in some LS reported that they felt rushed in both planning and discussions (Khokhotva, 2018; Lee & Tan, 2020). The above challenges indicate a lack of deep understanding of LS in mathematics education. Perhaps KOs can provide some guidance to ensure the LS’s direction and quality (Hernández-Rodríguez et al., 2021; Lewis & Perry, 2017). Unfortunately, in much of the reported LS literature, KOs were not included. Research studies also indicated that there is room for KOs to improve, especially in ways that engage teachers in deep reflection, suiting the teacher-oriented nature of LS.

One additional note: during our analysis, we monitored the mathematical specificity of the articles in the context of LS. While many LS studies used mathematics as a subject for research, the mathematics content did not receive sufficient attention. Overall, our findings call for a deeper understanding of the LS cycle so that the LS impact across the pathways and outcomes (Lewis, 2016) can be observed. Perhaps the reported new developments in LS could provide some support in this endeavor.

5.2 Cultural relevance of LS

As teaching is a cultural activity (Stigler & Hiebert, 1999), teacher learning and professional growth must be culturally relevant. As such, it is critical to consider cultural relevance when studying and implementing LS in diverse nations. As Robutti et al. (2016) noted, LS is so integral to Japanese school culture that it is taken for granted. However, in many other countries, school structure, organization, and time do not support LS. Additionally, since LS has a long tradition in Japan, there is a supportive environment that enables the collective effort and productive collaboration needed at all stages of LS. However, other countries often appear to lack the necessary collaboration between districts and schools (Gero, 2015; Sekao & Engelbrecht, 2021) and even among LS team members (Akiba & Wilkinson, 2016). When such a culture does not exist, LS mechanisms may not function in a way that facilitates teaching and learning changes. For example, Sekao and Engelbrecht (2021) noted that some participating South African teachers’ reluctance to permit others to observe their teaching might reflect a disconnect between local norms and global LS principles. Resistance to observation and feedback might discourage teachers’ full participation in LS and thus limit potential for positive impacts.

Existing literature has reported cross-cultural collaborations (e.g., Clivaz & Miyakawa, 2020; Sekao & Engelbrecht, 2021) that may help improve the impact of LS. Cultural relevance should inform such collaboration. For instance, Groves et al. (2016) questioned how to implement LS in ways that fit the existing Australian curriculum and a Western teaching culture that stressed small-group rather than whole-class discussion. The authors observed, “Clearly the cultural differences between Japan and Australia are of critical importance in a faithful implementation of JLS in Australia. However, as demonstrated by this project, changing the teaching environment is not impossible, but overcoming long-term cultural practices may be a more serious obstacle” (p. 511). Despite cultural differences, one common observation across global LS studies is that LS should still focus on students’ mathematical thinking and learning (Prediger et al., 2019). For instance, in Clivaz and Miyakawa (2020), teachers in Japan and Switzerland co-developed a grade 4 problem-solving geometry lesson plan and implemented it in both countries, but the lessons enacted in the two countries were quite different. While it is necessary to acknowledge cultural factors like different curricula, researchers may also consider what one may learn from this difference. Are there common goals for learning and thinking that students in both countries should achieve? Are there instructional insights that may transcend cultural contexts to support those goals? Can cross-cultural insights inform lesson improvements? Through follow-up experiments and continued cross-cultural collaboration, LS can be implemented in more meaningful ways.

5.3 Future directions

Moving forward, we propose three questions to guide research on LS in mathematics education: (a) How may we investigate LS implementation and its impact in a systematic way? (b) How may we improve understanding and implementation of LS to bring forth anticipated impact that goes beyond teacher knowledge? (c) How may we provide systematic, culturally relevant support for LS in countries outside Japan? Future studies may explore each of these questions to enrich the big picture of LS in mathematics education. For instance, there is a need to clearly document whether changes have occurred in teaching and student learning after LS has been implemented. What aspects of the LS implementation have contributed to the observed change? How will the observed changes be scaled up and sustained with school district support? In addition, future studies may contribute to conceptualizing the key elements of LS (e.g., the role of KOs and their relation to facilitators, as well as their respective effects on teacher learning), addressing misconceptions (e.g., the purpose of LS and its necessary duration), and establishing validated outcome measures of LS.