Introduction

Literacy in mathematics involves the ability to reason and communicate both orally and in writing (Dunston & Tyminski, 2013). According to the Common Core State Standards, students must construct logical statements supported by data to communicate with others (National Governors Association Center for Best Practices & Council of Chief State School Officers, 2010). Additionally, the inclusion of mathematics-writing prompts on high-stakes tests to evaluate reasoning and communication in mathematics has become common in recent years. In fact, nearly half of states use mathematics-writing prompts for Grades 4 and 5 on high-stakes mathematics assessments (Powell & Hebert, 2022). Due to the increasing focus on mathematics writing within standards and assessments, researchers have begun to examine how students write in mathematics and how to support students in mathematics writing (Arsenault et al., 2022; Hughes et al., 2020; Powell et al., 2017). Although educators report that using mathematics writing in the classroom holds value for improving both mathematics and writing outcomes (Powell et al., 2021), little research has examined the efficacy of mathematics-writing instructional practices across studies (Powell et al., 2017).

In this synthesis, we reviewed research studies in which authors provided instruction on mathematics writing for students in Kindergarten through Grade 12. We examined the mathematics-writing and mathematics outcomes of mathematics-writing instruction, the instructional components used to teach mathematics writing, and whether the identified instructional components differed for students with and without mathematics difficulty (MD). In the introduction, we first define the types of mathematics writing (i.e., exploratory, informative, argumentative, and mathematically creative; Casa et al., 2016). Next, we review common methods for supporting students in mathematics writing. Then, we present the challenges of mathematics writing for all students and for students with MD. Finally, we present the purpose and research questions of the synthesis.

Defining mathematics writing

Educators perceive mathematics writing as a valuable component of mathematics. In fact, over half of educators reported including mathematics writing in the classroom at least once a week (Powell et al., 2021). The inclusion of mathematics writing in the classroom targets the development and assessment of student understanding of mathematical concepts (Casa et al., 2016; Powell et al., 2021). For this synthesis, we defined mathematics writing as the learning and assessment of student mathematical understanding in the four mathematics-writing categories outlined by the Elementary Mathematical Writing Task Force: exploratory, informative, argumentative, and mathematically creative (Casa et al., 2016).

Exploratory mathematics writing

Within exploratory mathematics writing, students act as their own audience to support their understanding of mathematical concepts (Casa et al., 2016). A prompt for exploratory writing may require a student to write about the difficulties they encountered while working through a mathematics problem and how they overcame those difficulties (Tan & Garces-Bacsal, 2016). Exploratory writing can help students develop mathematical ideas and can also lead students to engage in other types of mathematics writing (Casa et al., 2016).

Informative mathematics writing

For informative mathematics writing, an educator or another student acts as the audience as students write explanations about a mathematical concept (Casa et al., 2016). For example, an informative prompt may present the work of a pseudo student and ask the student to identify the mistakes and explain how they would solve the problem correctly (Namkung et al., 2019). Powell et al. (2017) reported that three-fourths of intervention studies focused on informative mathematics writing. When surveyed, 60% of Kindergarten through Grade 12 educators reported that they required students to explain their work through informative mathematics writing at least once a week (Powell et al., 2021).

Argumentative mathematics writing

In argumentative writing, an educator or another student acts as the audience as the student constructs an argument or critiques the reasoning of others (Casa et al., 2016). In argumentative writing prompts, students write a claim based on solving a problem and defend the truth of their claim (Kosko & Zimmerman, 2019). Argumentative writing becomes especially important as students write mathematics proofs in geometry (Fuentes, 2011). Powell et al. (2021) reported that 51% of educators required students to write arguments in mathematics.

Mathematically creative writing

In mathematically creative writing, students write for a wider audience to document original mathematical ideas or create problems. For mathematically creative writing prompts, students may create their own word problems or mathematical stories (Casa et al., 2016; Levenberg, 2014). Across grade levels, only 36% of educators reported the use of mathematically creative writing in the classroom (Powell et al., 2021).

Supporting mathematics writing

Previous research indicates the potential benefits of instruction and practice opportunities for students engaging in mathematics writing. Educators may provide mathematics-writing instruction as the primary focus of instruction or as part of an integrated approach to supporting mathematics learning (Baxter et al., 2005; Swanson et al., 2019). When integrating mathematics writing into mathematics instruction, educators have frequently included journal writing, letter writing, explaining problem solving, writing mathematics-vocabulary definitions, and writing word problems (Powell et al., 2017, 2021), many of which may not require a tremendous amount of educator support for students to engage in them successfully. When providing targeted mathematics-writing instruction, educators reported including explanations of mathematics writing, modeling, opportunities for practice in mathematics writing, and feedback about mathematics writing (Powell et al., 2021).

Although many U.S. students must write in mathematics on high-stakes assessments, and educators include mathematics writing in instruction (Powell & Hebert, 2022; Powell et al., 2021), little consensus exists on how to effectively support mathematics writing (Powell et al., 2017). Across studies with mathematics-writing instruction, the reporting of student outcomes is highly variable: some studies provide data only about mathematics outcomes or only about mathematics-writing outcomes, and others only describe features of students' mathematics writing without formal scoring (Powell et al., 2017). Outcome measures targeting mathematics-writing instruction also vary across the evidence base, and these inconsistencies have made it challenging to draw conclusions about the efficacy of mathematics-writing instruction. With this synthesis, we aimed to provide a comprehensive overview of mathematics-writing instructional practices to inform teaching and research related to mathematics writing.

To date, we have identified one synthesis about mathematics writing (Powell et al., 2017). The authors identified 29 studies in which mathematics writing was featured in instruction, in academic assessments, or in surveys. With their synthesis, they were most interested in surveying the empirical research base of mathematics writing to determine how many mathematics-writing studies had been published. Of the 29 studies, 17 provided information about mathematics-writing instruction. Powell et al. (2017) coded each study for the mathematics content, mathematics-writing type (e.g., informative, argumentative), implementer, type of assessment, and results. The authors noted that most studies used informative mathematics writing, with journal writing as the most popular method for writing in mathematics. In only seven studies did educators ask students to engage in organized classroom writing (i.e., instruction), and only a handful of studies collected data about the efficacy of mathematics-writing instruction. Powell et al. (2017) did not evaluate study quality, and they did not specifically examine literature focused on students with MD or include grey literature, such as dissertations, which would be important given the exploratory nature of many studies of mathematics writing. Furthermore, the authors reported difficulty drawing conclusions about the efficacy of mathematics writing because of the sparse data available and called for future research in all areas of mathematics writing.

Challenges of mathematics writing

To successfully communicate and reason through mathematics writing, students must use general writing skills, computation, mathematics language, and (on occasion) visual representations within their writing (Arsenault et al., 2022; Casa et al., 2016; Hebert & Powell, 2016; Hughes et al., 2020; Powell & Hebert, 2016). However, mathematics writing continues to strain students with and without MD. In the following paragraphs, we outline each prerequisite skill area and describe the associated challenges.

Within general writing skills, planning and organization are especially challenging for students in mathematics writing (Correnti et al., 2013; Hebert & Powell, 2016). In informative mathematics writing, students must organize their writing with an introduction and conclusion in addition to the problem-solving procedures (Hughes et al., 2020). For argumentative mathematics writing, students must also include a rebuttal (Hughes et al., 2020). Although including an introduction, a conclusion, and (when needed) a rebuttal supports accurate responses, students often include only the problem-solving features (Hebert & Powell, 2016). Students may include an introductory sentence, but they rarely write a concluding sentence (Correnti et al., 2013; Hebert & Powell, 2016). Therefore, developing general writing skills can help students convey mathematical ideas effectively.

A second challenge for mathematics writing may involve completing the computations within a problem. Mathematics-writing prompts frequently ask students to review computations completed by a pseudo student or to complete computation problems themselves (Hebert & Powell, 2016; Hughes & Lee, 2019). Yet students frequently make computation errors or do not attempt the computations needed to respond to mathematics-writing prompts (Arsenault et al., 2022; Hebert & Powell, 2016). When responding to informative mathematics-writing prompts with a pseudo student, only 38.9% of students attempted to check the pseudo student's work by doing the computation themselves (Arsenault et al., 2022). Additionally, only one-third of students wrote equations in their mathematics-writing responses. When students did include equations, only 32% set up their equations correctly (Hebert & Powell, 2016). Mathematics writing cannot be separated from computational understanding, likely because students must complete computations accurately before they can express mathematical understanding in writing. Therefore, student computational skills must be considered when supporting mathematics writing.

Another hurdle for mathematics writing may relate to the use of mathematics vocabulary. To perform well on mathematics-writing prompts, students must include specific and clear mathematics vocabulary (Stonewater, 2002). Precise and clear mathematics vocabulary includes technical (e.g., integer, quadrilateral), subtechnical (e.g., degrees), general (e.g., difference, more), and symbolic vocabulary (e.g., +, 5; Monroe & Panchyshyn, 1995). The inclusion or exclusion of specific mathematics vocabulary affects the quality of mathematics-writing responses (Hebert & Powell, 2016; Hughes et al., 2020; Stonewater, 2002).

Another challenge may be the use of visual representations within mathematics writing. Visual representations support clear mathematics writing by going beyond words to communicate mathematically and can indicate a high level of understanding of a problem (Casa et al., 2016; Utami et al., 2019). The inclusion of visual representations correlates with the production of correct mathematics-writing responses (Hughes et al., 2020). Yet students frequently do not include visual representations in their writing, especially when the prompt does not include one (Arsenault et al., 2022; Hebert & Powell, 2016).

Students with mathematics difficulty (MD)

We also focus on mathematics writing for students with MD. Students with MD perform below grade level or below the average range in mathematics based on researcher assessments or school diagnosis (Nelson & Powell, 2018). This group includes both students with Individualized Education Programs with goals in mathematics and students who perform below a cut-off score set by researchers. Cut-off scores frequently range from the 10th to the 45th percentile (Geary et al., 2012; Hecht & Vagi, 2010). Students with MD typically perform below their same-age peers in early numeracy, computation, rational number, and word-problem skills (Arsenault & Powell, 2022; Nelson & Powell, 2018).
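To make the cut-off approach concrete, the sketch below flags students whose percentile rank falls below a chosen cut-off. The function names, the scores, and the use of the sample as its own norm group are hypothetical assumptions for illustration only; the studies cited above used their own measures and norms. Comparing a 25th-percentile cut-off with a 10th-percentile cut-off shows how the choice of cut-off alone changes which students are identified as having MD.

```python
from typing import Dict, List


def percentile_rank(score: float, norm_scores: List[float]) -> float:
    """Percentage of norm-group scores strictly below `score`.

    This is one simple definition of percentile rank; normed tests may use
    other conventions (e.g., counting half of tied scores).
    """
    below = sum(1 for s in norm_scores if s < score)
    return 100.0 * below / len(norm_scores)


def flag_md(scores: Dict[str, float], cutoff: float) -> Dict[str, bool]:
    """Flag each student as MD when their percentile rank is below `cutoff`.

    Hypothetical: the sample itself serves as the norm group here, whereas
    studies typically rely on normed assessments or school identification.
    """
    norm = list(scores.values())
    return {sid: percentile_rank(s, norm) < cutoff for sid, s in scores.items()}


# Hypothetical screening scores for five students.
scores = {"s1": 12, "s2": 18, "s3": 25, "s4": 30, "s5": 41}
print(flag_md(scores, cutoff=25))  # 25th percentile: s1 and s2 flagged
print(flag_md(scores, cutoff=10))  # 10th percentile: only s1 flagged
```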

For students with MD, mathematics writing can be especially challenging because it requires students to draw on both general writing skills and mathematical prerequisite skills, which can challenge all students (Arsenault et al., 2022; Hebert & Powell, 2016; Hughes et al., 2020). Students with MD typically perform lower than their peers on mathematics-writing prompts (Arsenault et al., 2022). Additionally, students with MD write fewer words, numbers, and symbols than their typically achieving peers when responding to mathematics-writing prompts (Arsenault et al., 2022; Hebert & Powell, 2016).

When writing in mathematics, students with MD demonstrate difficulty with general writing skills as well as mathematics-specific skills (Hughes et al., 2020). Hughes et al. (2020) reported that when responding to a mathematics-writing prompt, students with MD experienced difficulty with general writing skills as they tried to explain computational procedures and engage in reasoning. Students with MD also had trouble completing computation problems within mathematics-writing prompts (Hughes et al., 2020). For example, when responding to an informative mathematics-writing prompt with a fraction word problem, only 28 of 51 students wrote the correct answer to the fraction problem (Hughes et al., 2020). Frequently, students with MD do not even attempt the needed computations (Arsenault et al., 2022).

Purpose and research questions

Educators perceive mathematics writing as valuable for supporting mathematics understanding (Powell et al., 2017, 2021). Research also indicates that mathematics writing can be included in the classroom in a variety of formats, but only one past synthesis has examined mathematics-writing practices for supporting students' mathematics-writing performance (Powell et al., 2017). Given the growing importance of mathematics writing and the challenges students frequently experience with it, especially students with MD, an updated synthesis is needed to examine student outcomes after participation in mathematics-writing instruction, particularly for students with MD. It is also necessary to investigate the quality of mathematics-writing studies and all data related to the mathematics-writing and general-mathematics outcomes of mathematics-writing instruction to understand the depth of the literature on mathematics-writing instruction.

In this synthesis, we reviewed Kindergarten to Grade 12 studies focused on either (a) mathematics-writing instruction or (b) mathematics instruction with mathematics writing as a component of instruction. We investigated the following research questions:

1. How do students who participate in mathematics-writing instruction perform on measures of mathematics writing and general mathematics?
2. Do mathematics-writing and general-mathematics outcomes differ across study and instructional features of mathematics-writing instruction (i.e., type of mathematics writing, instructional focus, participant type)?
3. What methods are used to practice mathematics writing during mathematics-writing instruction?
4. For the studies including students with MD, what methods are used to practice mathematics writing during mathematics-writing instruction?

Methods

Search procedures

We conducted a comprehensive search of the literature dated January 2000 to June 2022 to identify studies of mathematics instruction with mathematics writing as the main instructional goal or as a component of instruction. We selected 2000 as the start date because the National Council of Teachers of Mathematics released its standards in 2000. These standards marked a transition point for instructional planning due to the increased focus on rigorous standards in mathematics for Grades K to 12 (National Council of Teachers of Mathematics, 2014). We searched three databases: PsycINFO, Education Source, and ERIC. For the search, we used one line of search terms: "written math*" OR "math* writing" OR ("word problem*" AND math* AND writ*) OR ("open response*" AND math* AND assessment) OR ("open ended" AND math* AND assessment) OR (math* AND (writ* n2 assess*)) OR (calculus AND writing) OR (geometry AND writing) OR (algebra AND writing) OR TI(math* AND writ*) OR AB(math* n2 writ*) OR SU(mathematics AND writing) OR SU(mathematics AND "written communication"). Figure 1, the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flowchart (Page et al., 2021), shows the results of the search and the screening of the artifacts. The initial search resulted in 4665 artifacts, which decreased to 3559 artifacts after deduplication. Next, we reviewed titles and abstracts, excluding 3093 artifacts and leaving 466 for full-text screening. During full-text screening, we excluded 446 artifacts, resulting in 19 articles that met the inclusion criteria. Next, we conducted forward and backward searches on the included articles, which yielded one additional article from the forward search and one from the backward search. We also conducted forward and backward searches on these two additional articles but identified no further articles for inclusion. After the forward and backward searches, 21 articles met inclusion criteria for the synthesis.

Fig. 1 PRISMA diagram

Inclusion and exclusion criteria

To be included in this synthesis, studies had to meet the following seven criteria: (a) The study was published in English. (b) The study was published from 2000 through 2022. (c) The study design was a randomized control trial, quasi-experimental, regression discontinuity, or single case. For quasi-experimental design studies, treatment and control groups were required, pretests were completed, and a method of establishing pretest equivalence was reported. Table 3 reports the pretest equivalence for each study with a quasi-experimental design. We included these four designs because of previous reports of the limited number of studies including mathematics-writing instruction (Powell et al., 2017). By including a wide range of study designs, we gained a broad representation of the effects and methods in studies with mathematics-writing instruction. (d) The study included mathematics-writing instruction. We defined mathematics writing in four categories: exploratory, informative, argumentative, and mathematically creative (Casa et al., 2016). We defined mathematics-writing instruction to include educator explaining, educator modeling, educator demonstrating, and/or student practice in mathematics writing (Powell et al., 2021). (e) The authors included a measure of mathematics writing or mathematics for all students who participated in the instruction (except for attrition). Measures of mathematics writing included quantitative mathematics-writing assessments in which students wrote mathematically to respond to prompts or questions, or mathematics-writing artifacts coded for included features or scored with a rubric (Powell et al., 2017). Measures of mathematics included quantitative mathematics assessments or mathematics artifacts scored with a rubric. (f) The study was conducted with school-aged students from Kindergarten to Grade 12. (g) The study was published in a peer-reviewed journal.

We excluded articles that met any of the following exclusion criteria: (a) The article included no participants, such as when the study referenced a vignette without actual student participants (e.g., Fello & Paquette, 2009). (b) The article included no quantitative mathematics-writing or other mathematics assessment data. For instance, Alvi and Nausheen (2019) included interview data, student engagement data, and qualitative data describing student problem-solving episodes, but we excluded the study because it included no quantitative mathematics-writing or other mathematics assessment data. (c) The participants in the study were outside the Kindergarten to Grade 12 range, including cases in which student data were not disaggregated from educator data (e.g., Kosko & Norton, 2012). (d) The study did not include mathematics writing according to our operational definition. For example, if a study defined mathematics writing as handwritten algorithms or writing a word problem rather than exploratory, informative, argumentative, or mathematically creative writing, we excluded the study (e.g., Broto & Greer, 2014). (e) The study did not include mathematics-writing instruction. For instance, Kosko and Zimmerman (2019) included only a mathematics-writing assessment without instruction. (f) The study was a case-study design with only a treatment group and no control group, or it did not report pretest equivalence between groups (e.g., Baxter et al., 2005). (g) The study was written in a language other than English (e.g., Çontay & Duatepe-Paksu, 2018). (h) The instruction focused on science, technology, or engineering combined with mathematics. For example, we excluded Casler-Failing (2018) because of its focus on robotics with mathematics. (i) The study did not include mathematics (e.g., van Drie et al., 2005).

Coding procedures

We coded the 21 articles identified for the synthesis for study demographics, mathematics-writing category, mathematics category, intervention characteristics, student outcomes, and study quality. For study demographics, we coded for study design (i.e., randomized control trial, quasi-experimental, regression discontinuity, or single case), study location (i.e., country), student grade level and age, student gender (i.e., female or male), student race or ethnicity, number of students in treatment and control groups, sample MD status, and English-language status. For sample MD status, we coded the percentage of students with MD included in the sample. If the authors reported the percentage of students with MD, we coded the sample as “Yes” and recorded the exact percentage. We coded the sample as “No” for MD if the authors did not report the percentage of students with MD, as when the authors reported that the sample included a class of students but did not report how many of those students had MD.

For mathematics categories, we documented all mathematics content areas covered in each study. The categories were early numeracy, algebra, fractions, geometry, life skills, measurement, operations, word-problem solving, pre-algebra and algebra, and calculus. We also included an “other” option to record mathematics categories not listed.

For mathematics-writing categories, we classified the mathematics writing completed in each study as exploratory, informative, argumentative, or mathematically creative (Casa et al., 2016). Exploratory mathematics writing included samples in which students wrote to make sense of their own thoughts about mathematics. We coded studies as including informative mathematics writing if students provided information about or explained mathematics concepts. Writing qualified as argumentative mathematics writing if students used writing to construct mathematical arguments and critique the reasoning of others. Last, mathematically creative writing included writing samples in which students wrote creatively to communicate original ideas, problems with written responses, or solutions (Casa et al., 2016).

For mathematics-writing instruction, we coded instructional features and mathematics-writing instructional methods. The instructional features included total instruction time in minutes (calculated from the reported number, duration, and frequency of sessions), group size (i.e., whole class, small groups of 2 to 8 students, small groups with group size not reported, individual), and implementer (educator, researcher, preservice educator, peer tutor, other). We also coded the purpose of instruction: increasing mathematics-writing performance or improving mathematics performance. We defined a study as focused on mathematics writing if the study purpose included a statement about mathematics-writing performance, if mathematics writing was the only instructional component, or if the instruction included modeling with practice in mathematics writing. We defined a study as focused on mathematics content with a mathematics-writing component if mathematics writing was only one component of a multicomponent mathematics intervention. Next, we coded the mathematics-writing instructional methods: self-regulated strategy development (SRSD; Graham et al., 2005), attack strategy, paraphrasing, journal writing, letter writing, defining vocabulary, note taking, responding to mathematics-writing prompts, explaining after solving a mathematics problem, argumentative writing, modeling of mathematics writing, or other mathematics-writing practice (see Table 1 for operational definitions).

Table 1 Operational definitions of instructional methods

Next, we coded measure information and student outcomes. We recorded the type of measure (i.e., mathematics test, mathematics-writing test, mathematics-vocabulary test, reading test on mathematics) and the timing of the measure (i.e., completed during instruction, pretest/posttest). We also coded the type of comparison (i.e., pretest/posttest, treatment/control, baseline/treatment) and documented the mathematics content area (i.e., early numeracy, algebra, fractions, geometry, life skills, measurement, operations, word-problem solving, pre-algebra/algebra, calculus, other). For the mathematics-writing measures, we recorded the mathematics-writing type (i.e., exploratory, informative, argumentative, mathematically creative, or other; Casa et al., 2016). For mathematics-vocabulary measures and reading measures on mathematics, we coded the content description. Finally, for all measure types, we coded the assessment name and, if applicable, the subcategory names.

After documenting the measure information, we coded student outcomes on all mathematics and mathematics-writing measures. For studies that included pretest and posttest outcomes (i.e., randomized control trials and quasi-experimental designs), we coded the means with the units provided, standard deviations, and group sizes at pretest and posttest for all groups. We also recorded significance testing and effect sizes (i.e., Cohen’s d, Eta Squared, and Hedges’ g) when reported. For single-case design studies, we recorded progress-monitoring data during baseline and instruction. We also recorded any pretest and posttest means and standard deviations, the mean and range of nonoverlap of all pairs (NAP), effect sizes (i.e., Tau-U), confidence intervals, and significance testing when reported. Finally, we coded for study quality using the Cook et al. (2015) quality indicator checklist, which measures study quality for both group design and single-case design studies.
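For readers less familiar with single-case metrics, the sketch below shows how NAP is computed from baseline and treatment data, together with the simple A-versus-B Tau from which Tau-U is built. It is an illustration under stated assumptions, not the procedure used in the included studies: the data and function names are hypothetical, higher scores are assumed to indicate improvement, and `tau_ab` omits the baseline-trend correction that some Tau-U variants apply. Under these definitions, Tau equals 2 * NAP - 1.

```python
from itertools import product
from typing import Sequence, Tuple


def pair_counts(baseline: Sequence[float], treatment: Sequence[float]) -> Tuple[int, int, int]:
    """Compare every baseline point with every treatment point.

    Assumes higher scores indicate improvement.
    Returns (improving pairs, deteriorating pairs, tied pairs).
    """
    pos = neg = ties = 0
    for a, b in product(baseline, treatment):
        if b > a:
            pos += 1
        elif b < a:
            neg += 1
        else:
            ties += 1
    return pos, neg, ties


def nap(baseline: Sequence[float], treatment: Sequence[float]) -> float:
    """Nonoverlap of all pairs: improving pairs plus half of ties, over all pairs."""
    pos, neg, ties = pair_counts(baseline, treatment)
    return (pos + 0.5 * ties) / (pos + neg + ties)


def tau_ab(baseline: Sequence[float], treatment: Sequence[float]) -> float:
    """A-versus-B Tau: (improving - deteriorating) / all pairs.

    Hypothetical simplification: no baseline-trend correction is applied.
    """
    pos, neg, ties = pair_counts(baseline, treatment)
    return (pos - neg) / (pos + neg + ties)


# Hypothetical progress-monitoring scores (e.g., rubric points per probe).
baseline = [2, 3, 3]
treatment = [3, 5, 6, 7]
print(round(nap(baseline, treatment), 3))     # 11/12, about 0.917
print(round(tau_ab(baseline, treatment), 3))  # 10/12, about 0.833
```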

Reliability coding

The first author coded all studies, and the third author double-coded 24% of the studies in the synthesis. The first author trained the third author on the coding sheet and manual in a 1-h training session, during which the first and third authors coded a practice article. The third author then independently double-coded five of the articles in the synthesis, and we obtained an overall inter-rater reliability of 80% on these articles. The first and third authors then met to discuss all disagreements and reached consensus, resulting in 100% agreement on the five double-coded articles.
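The section does not state which reliability index produced the 80% figure; assuming it is simple item-by-item percent agreement, the calculation can be sketched as follows. The function name and the codes below are hypothetical and chosen only to illustrate how two disagreements across ten coded items yield 80%.

```python
from typing import Sequence


def percent_agreement(coder_a: Sequence[str], coder_b: Sequence[str]) -> float:
    """Share of coded items on which two coders assigned the same code (0-100)."""
    if len(coder_a) != len(coder_b):
        raise ValueError("Both coders must code the same items")
    agreements = sum(a == b for a, b in zip(coder_a, coder_b))
    return 100.0 * agreements / len(coder_a)


# Hypothetical codes for ten items from one double-coded article.
first = ["RCT", "Grade 4", "informative", "whole class", "educator",
         "Yes", "fractions", "journal writing", "posttest", "Cohen's d"]
third = ["RCT", "Grade 4", "argumentative", "whole class", "educator",
         "Yes", "fractions", "letter writing", "posttest", "Cohen's d"]
print(percent_agreement(first, third))  # 8 of 10 items match -> 80.0
```

Percent agreement does not correct for chance agreement, so it is best read alongside the consensus procedure described above.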

Data analysis

For studies that reported data we could analyze further, we used two data analysis methods. First, when studies presented data only in graphs, we extracted the mean scores using WebPlotDigitizer, which converts the positions of plotted points into values based on the scales calibrated for the graph's x- and y-axes (Rohatgi, 2020). Second, when studies provided means, standard deviations, and group sizes, we calculated Cohen’s d using an effect size calculator for comparing groups with different sample sizes (Lenhard & Lenhard, 2016). When the information needed to calculate Cohen’s d was not provided, we reported the effect sizes provided in the study (i.e., Cohen’s d, Hedges’ g, Tau-U, Eta Squared, and NAP).
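Assuming the online calculator uses the standard pooled-standard-deviation formula for groups of unequal size, the Cohen's d calculation can be reproduced as below. The `hedges_g` function shows the common approximate small-sample correction that distinguishes g from d, which helps when comparing the two metrics across studies. Function names and input values are hypothetical.

```python
import math


def pooled_sd(sd1: float, n1: int, sd2: float, n2: int) -> float:
    """Pooled standard deviation for two independent groups of unequal size."""
    return math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))


def cohens_d(m1: float, sd1: float, n1: int, m2: float, sd2: float, n2: int) -> float:
    """Standardized mean difference; positive values favor group 1 (treatment)."""
    return (m1 - m2) / pooled_sd(sd1, n1, sd2, n2)


def hedges_g(d: float, n1: int, n2: int) -> float:
    """Apply the approximate small-sample correction J = 1 - 3 / (4 * df - 1)."""
    df = n1 + n2 - 2
    return d * (1 - 3 / (4 * df - 1))


# Hypothetical posttest summary statistics (treatment vs. control).
d = cohens_d(55.0, 10.0, 20, 50.0, 10.0, 30)
print(round(d, 3))                    # 0.5
print(round(hedges_g(d, 20, 30), 3))  # 0.492
```

With equal standard deviations the pooled SD equals the common SD, which makes the example easy to check by hand.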

We also calculated the total number of minutes of instruction by multiplying the average minutes per session by the average number of sessions. If a study reported a range for the minutes per session or the number of sessions, we first averaged the minimum and maximum values and then multiplied the minutes per session by the number of sessions.
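The rule for handling reported ranges can be sketched as follows. The function names and study values are hypothetical; the logic mirrors the description above (take the midpoint of any reported range, then multiply minutes per session by the number of sessions).

```python
from typing import Tuple, Union

# A reported value is either a single number or a (minimum, maximum) range.
Reported = Union[float, Tuple[float, float]]


def average(value: Reported) -> float:
    """Return the value itself, or the midpoint of a (min, max) range."""
    if isinstance(value, tuple):
        low, high = value
        return (low + high) / 2
    return float(value)


def total_minutes(minutes_per_session: Reported, sessions: Reported) -> float:
    """Total instructional minutes = average minutes per session x average sessions."""
    return average(minutes_per_session) * average(sessions)


# Hypothetical study reporting 30-45 min sessions over 10-14 sessions.
print(total_minutes((30, 45), (10, 14)))  # 37.5 * 12 -> 450.0
# Hypothetical study reporting exact values.
print(total_minutes(40, 20))  # 800.0
```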

Results

We identified 21 articles for the synthesis; one article included two separate studies (Hacker et al., 2019), resulting in 22 studies included in the synthesis. Only 10 studies reported race and ethnicity demographics. When reported, the proportion of African American/Black participants ranged from 0.00 to 0.49, Hispanic participants from 0.08 to 1.00, White participants from 0.00 to 0.85, and Asian/Pacific Islander participants from 0.02 to 0.11. Other race and ethnicity categories reported included Bosnian (0.17), Native American (0.02), Indonesian (0.11), Two or More Races (0.04 to 0.06), and Other (0.02). Table 2 provides a summary of additional study characteristics, including study location, research design, study sample size, participant types, language status, grade level, age, implementer, group size, and study quality.

Table 2 Summary of study characteristics (N = 22)

Mathematics-writing instruction outcomes

Across the 22 studies, we examined mathematics-writing instruction based on student performance on mathematics measures, mathematics-writing measures, mathematics-vocabulary measures, and reading comprehension measures on mathematics topics.

Mathematics measures

A total of 13 studies included a mathematics test as an outcome measure. Of these, six group design studies included the data needed to calculate Cohen’s d, four group design studies included alternative data, and three single-case design studies reported Tau-U and nonoverlap of all pairs (NAP).

Quantitative

Among the studies measuring mathematics, six studies included the information needed to compare treatment and control groups based on significance testing and Cohen’s d. For these six studies, measures focused on algebra, geometry, fractions, and word-problem solving; four studies included standardized measures, and two included researcher-created measures. Significance testing was completed for 23 comparisons on measures of general mathematics between treatment groups receiving mathematics-writing instruction and control groups. Of these comparisons, 14 were reported as significant and nine as not significant. For the significant comparisons, Cohen’s d ranged from 0.03 to 1.88.

In Cross et al. (2009), significance testing (with Cohen’s d) was reported for one treatment group but was not completed for two additional treatment groups that also received mathematics-writing instruction. The author team used a researcher-created measure focused on algebra.

Qualitative

Four group-design studies included alternative data. Two of these studies reported significance testing between treatment and control groups on measures of mathematics, but without the information needed to calculate Cohen’s d (Hacker et al., 2019; Swanson et al., 2019). These studies included 12 comparisons between treatment and control groups on measures of mathematics, with seven significant and five nonsignificant effects. Swanson et al. (2019) used standardized measures of word-problem solving or computational fluency but did not report the information needed to calculate effect sizes. The standardized measures included the Comprehensive Math Achievement Test (CMAT), KeyMath, Test of Math Ability (TOMA-3), and Wechsler Individual Achievement Test: Arithmetic Computations (WIAT-R). Hacker et al. (2019) reported a Hedges’ g of 0.60 on the standardized fractions measure easyCBM Numbers and Operations.

Two additional studies reported on measures of mathematics but did not include significance testing across the treatment and control groups. Blanton et al. (2019) reported that the treatment group grew at a significantly faster rate than the control group from pretest to posttest in Grade 3, with a 21% advantage over the control group on general mathematics by the end of the school year. In Grades 4 and 5, the treatment group performed marginally significantly higher than the control group on general mathematics, with a 2% advantage. The author team used a researcher-created measure focused on algebra. In Stoyle and Morris (2017), treatment and control groups were included but not statistically compared. Treatment group one made greater gains from pretest (M = 6.95) to posttest (M = 39.73) than the control group did from pretest (M = 22.21) to posttest (M = 43.05). Treatment group two also made greater gains from pretest (M = 10.35) to posttest (M = 40.50) than the control group. This study used a standardized fractions measure from the Bridges mathematics program.
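The raw gains behind the Stoyle and Morris (2017) comparison can be checked directly from the reported means. The sketch below is descriptive only: it subtracts pretest from posttest means and performs no significance test, because only means were reported. It also makes visible that both treatment groups started well below the control group and still ended below its posttest mean (43.05), so the larger gains may partly reflect more room to grow.

```python
# Pretest and posttest means as reported for Stoyle and Morris (2017).
# Only means were available, so this is a descriptive comparison of raw gains,
# not a test of statistical significance.
groups = {
    "treatment_1": (6.95, 39.73),
    "treatment_2": (10.35, 40.50),
    "control": (22.21, 43.05),
}


def raw_gain(pre, post):
    """Posttest mean minus pretest mean."""
    return post - pre


# Round to two decimals to match the precision of the reported means
gains = {name: round(raw_gain(pre, post), 2) for name, (pre, post) in groups.items()}

for name, gain in gains.items():
    print(f"{name}: {gain:.2f}")
```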

Single-case design

Three single-case design studies included measures of mathematics (Bundock et al., 2021; Hacker et al., 2019; Kong & Swanson, 2019). Two studies reported Tau-U values between 0.66 and 0.77 across all students, and one study reported an NAP of 0.32 across all students. Two of the three used standardized measures of fractions and computational fluency; however, the name of the fractions measure was not reported. The computational fluency measure was AIMSweb Math Concepts and Applications (M-CAP). Two of the three also used researcher-created measures focused on rate of change or word-problem solving.
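NAP and Tau-U, used for the single-case design studies, are both built from pairwise comparisons between baseline-phase and treatment-phase data points. The sketch below assumes that higher scores mean better performance and uses hypothetical session scores (not data from any reviewed study). It computes NAP, where ties count as half an improving pair, and the simple A-versus-B Tau, which equals 2 × NAP − 1. The full Tau-U statistic additionally corrects for baseline trend, which this sketch deliberately omits. On the untransformed scale, NAP runs from 0 to 1 with 0.5 indicating chance-level overlap, and some analysts report a rescaled version instead.

```python
def pairwise_counts(baseline, treatment):
    """Count improving, tied, and worsening baseline-treatment pairs,
    assuming higher scores mean better performance."""
    pos = ties = neg = 0
    for b in baseline:
        for t in treatment:
            if t > b:
                pos += 1
            elif t == b:
                ties += 1
            else:
                neg += 1
    return pos, ties, neg


def nap(baseline, treatment):
    """Nonoverlap of All Pairs: ties count as half an improving pair."""
    pos, ties, neg = pairwise_counts(baseline, treatment)
    return (pos + 0.5 * ties) / (pos + ties + neg)


def tau_ab(baseline, treatment):
    """A-versus-B Tau without baseline-trend correction: (pos - neg) / pairs.
    Numerically equal to 2 * NAP - 1."""
    pos, ties, neg = pairwise_counts(baseline, treatment)
    return (pos - neg) / (pos + ties + neg)


# Hypothetical session scores for one student (assumed for illustration only)
baseline = [2, 3, 3, 4]
treatment = [3, 5, 6, 7]
print(nap(baseline, treatment), tau_ab(baseline, treatment))
```

Counting pairs directly keeps the relationship between the two indices transparent; a published Tau-U analysis would also subtract the baseline-phase trend before comparing phases.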

Mathematics-writing measures

A total of 11 studies included mathematics-writing measures. Of these, five group-design studies compared a treatment group to a control group with the information needed to calculate Cohen’s d, four group-design studies included alternative data, and two single-case design studies measured the difference in student performance from baseline to treatment conditions.

Quantitative

Across the five studies that included the information needed to calculate Cohen’s d, 23 significance comparisons were completed on mathematics writing, including measures of writing organization, mathematics vocabulary, and mathematics content in writing. Of these comparisons, 14 were reported as significant and nine as not significant. For the significant effects, Cohen’s d ranged from 0.36 to 2.88. All of these studies used researcher-created measures, and the mathematics-writing measures focused on mathematics content including word-problem solving, fractions, and geometry.

Qualitative

Of the four group-design studies that included alternative data, three compared groups and one compared pretest and posttest scores. Across the three studies that compared groups without the information needed to calculate Cohen’s d, nine significance comparisons were completed on writing organization, mathematics vocabulary, and mathematics content in writing, all of which were reported as significant. Chasanah et al. (2020) reported that the two treatment groups significantly outperformed the control group on a researcher-created measure of mathematics writing. Hacker et al. (2019) reported that the treatment group significantly outperformed the control group on researcher-created measures of mathematical reasoning (g = 1.82), number of argumentative elements (g = 3.20), and total words written (g = 1.04). Last, Uswatun and Mariani (2020) reported that the treatment group significantly outperformed the control group on completeness of mathematical communication ability, proportion of completeness, and written mathematical communication enhancement; the authors did not report whether these were researcher-created or standardized measures. The mathematics content of the measures was word-problem solving for Chasanah et al. (2020) and fractions for Hacker et al. (2019) and Uswatun and Mariani (2020).

One study used an alternative method for measuring growth in mathematics writing. In Gearing and Hart (2019), students demonstrated significant growth from pretest to posttest on their mathematics-writing scores, as measured by a researcher-created measure with word-problem solving as the mathematics topic.

Single-case design

Two single-case design studies measured the difference in student performance from baseline to treatment conditions. In Bundock et al. (2021), students demonstrated a non-significant overall Tau-U of 0.33 from baseline to treatment. In Hacker et al. (2019), NAP across all students was 0.61 for mathematics-writing reasoning, 0.45 for rhetorical elements, and 0.14 for total words written. Both studies used researcher-created measures: the measure in Bundock et al. (2021) focused on rate of change, and the measure in Hacker et al. (2019) focused on fractions.

Other assessments

Two studies included alternative measures related to mathematics and mathematics writing.

Qualitative

Yang and Lin (2012) measured reading comprehension of geometric proofs through a researcher-created measure. Students in the treatment group significantly outperformed students in the control group in reading comprehension of geometric proofs for foundational knowledge, logical status, summary, and generality, but not for application.

Single-case design

Fore III et al. (2007) included a researcher-created measure of mathematics vocabulary. Students made significant growth, with a 43% increase in performance from baseline to treatment (NAP = 0.78).

Outcomes based on study and instructional features

We also synthesized outcomes of studies with mathematics-writing instruction based on three study and instructional features: type of mathematics writing, instructional focus, and participant type.

Type of mathematics writing

We first categorized study outcomes by the type of mathematics writing used during instruction.

Exploratory writing

Five studies focused only on exploratory writing during instruction. Two of the studies with exploratory writing completed significance testing comparing treatment and control groups with the needed data to calculate Cohen’s d. They reported 10 significant comparisons for general mathematics (Cohen’s d = 0.06 to 1.04) and eight non-significant comparisons for general mathematics (Moran et al., 2014; Swanson et al., 2014). Two additional studies reported significance testing to compare treatment and control groups without the needed data to calculate Cohen’s d. Swanson et al. (2019) reported six significant comparisons and five non-significant comparisons for general mathematics. Uswatun and Mariani (2020) reported three significant comparisons for mathematics writing. Last, one single-case study demonstrated growth on general mathematics from baseline to treatment (Tau-U = 0.66; Kong & Swanson, 2019).

Informative writing

An additional six studies focused exclusively on informative writing during instruction. Of the six studies, three completed significance testing with the data needed to calculate Cohen’s d. There were two significant comparisons for mathematics (Cohen’s d = 0.03 to 0.06; Thayer & Giebelhaus, 2001) and five significant comparisons for mathematics writing (Cohen’s d = 0.84 to 2.41; Hebert et al., 2019; Hughes & Lee, 2020). There was also one non-significant comparison for mathematics writing (Hughes et al., 2019). Three studies used alternative methods of analyzing results. Blanton et al. (2019) reported significant gains for the treatment group compared to the control group in Grade 3, but only marginally significant gains in Grades 4 and 5 on a measure of general mathematics. Gearing and Hart (2019) reported significant pretest-to-posttest gains on a measure of mathematics writing for the treatment group, but not for the control group. Using a single-case design, Fore III et al. (2007) reported significant gains from baseline to treatment in mathematics vocabulary.

Argumentative writing

Five studies focused only on argumentative mathematics writing during instruction. Of these studies, two completed significance testing with the data needed to calculate Cohen’s d. There was one significant comparison for mathematics (Cohen’s d = 0.82; Kiuhara et al., 2019) and three significant comparisons for mathematics writing (Cohen’s d = 1.66 to 2.88; Kiuhara et al., 2019). Two additional studies reported comparisons between a treatment and control group for mathematics and mathematics writing without the data needed to calculate Cohen’s d. These included one significant comparison for mathematics, three significant comparisons for mathematics writing, four significant comparisons for reading comprehension of geometric proofs, and one non-significant comparison for reading comprehension of geometric proofs (Hacker et al., 2019; Yang & Lin, 2012). Last, one single-case design study reported that students made growth from baseline to treatment on mathematics (NAP = 0.32) and on mathematics writing for reasoning (NAP = 0.61), rhetorical elements (NAP = 0.45), and total words written (NAP = 0.14; Hacker et al., 2019).

Mathematically-creative writing

We coded for studies that used only mathematically-creative writing, but no studies included in this synthesis reported only using mathematically-creative writing.

Multiple types of writing

The remaining six studies included multiple types of mathematics writing. Five of the six studies included argumentative and informative mathematics writing. For the studies with argumentative and informative mathematics writing, three studies reported significance testing between a treatment and control group with the needed data to calculate Cohen’s d. The studies reported eight significant comparisons for mathematics writing (Cohen’s d = 0.36 to 0.59), two significant comparisons for mathematics (Cohen’s d = 0.37 to 1.88), eight non-significant comparisons for mathematics writing, and one non-significant comparison for mathematics (Cohen et al., 2015; Cross, 2009; Gavin et al., 2013). Although Cross (2009) reported significance testing for one treatment group, they did not report significance testing for two additional treatment groups. Additionally, Stoyle and Morris (2017) did not report significance testing for two treatment groups but did report the treatment groups made greater gains than the control group from pretest to posttest on mathematics. Last, in Bundock et al. (2021), the students demonstrated a non-significant overall Tau-U of 0.33 from baseline to treatment on mathematics writing and a significant overall Tau-U of 0.77 from baseline to treatment on mathematics.

The sixth study with multiple types of mathematics writing, Chasanah et al. (2020), included informative and creative writing. In this study, the two treatment groups significantly outperformed the control group on mathematics writing.

Instructional focus

Although the majority of the studies focused instruction on mathematics writing, several studies also focused instruction on mathematics content or mathematics vocabulary with mathematics writing as a component of the instruction.

Mathematics writing

A total of 13 studies focused instruction on mathematics writing. Of these, six studies included comparisons between treatment and control groups with the data needed to calculate Cohen’s d. Two of these six reported comparisons on mathematics, and in both the treatment group significantly outperformed the control group (Cohen’s d = 0.37 to 0.82; Cross, 2009; Kiuhara et al., 2019). Although Cross (2009) reported that one treatment group significantly outperformed the control group, the authors did not statistically compare two additional treatment groups with the control group. For mathematics writing, the treatment groups significantly outperformed the control groups on 16 comparisons (Cohen’s d = 0.36 to 2.88), but there was no significant difference between the treatment and control groups on nine comparisons (Cohen et al., 2015; Hebert et al., 2019; Hughes & Lee, 2020; Hughes et al., 2019; Kiuhara et al., 2019).

Five additional studies included treatment and control groups but did not include the data needed to calculate Cohen’s d. For mathematics writing, the treatment groups significantly outperformed the control groups on eight comparisons (Chasanah et al., 2020; Hacker et al., 2019; Uswatun & Mariani, 2020). For mathematics, the treatment group significantly outperformed the control group on one comparison (Hacker et al., 2019). Two studies also reported significant pretest-to-posttest gains for treatment groups on three comparisons of mathematics and mathematics writing (Gearing & Hart, 2019; Stoyle & Morris, 2017).

Two single-case design studies had a main instructional focus of mathematics writing. Bundock et al. (2021) reported significant gains for general mathematics (Tau-U = 0.77) but not for mathematics writing (Tau-U = 0.33). Hacker et al. (2019) reported gains for both mathematics writing and general mathematics.

Mathematics content

Eight studies focused on mathematics content and used mathematics writing as a component of the instruction. Four of these studies included significance testing with the data needed to calculate Cohen’s d. These studies included 22 comparisons of mathematics between treatment and control groups and no comparisons of mathematics writing. For the mathematics comparisons, 13 showed the treatment groups significantly outperforming the control group (Cohen’s d = 0.03 to 1.88), and nine showed no significant difference (Gavin et al., 2013; Moran et al., 2014; Swanson et al., 2014; Thayer & Giebelhaus, 2001). Two additional studies reported significance testing on 17 mathematics comparisons between treatment and control groups without the data needed to calculate Cohen’s d; 11 were significant and six were not (Swanson et al., 2019; Yang & Lin, 2012). One additional study reported significant pretest-to-posttest gains in mathematics for Grade 3, but only marginally significant gains for Grades 4 and 5 (Blanton et al., 2019). Last, Kong and Swanson (2019) demonstrated growth in mathematics from baseline to treatment.

One final study included mathematics writing as a component of mathematics-vocabulary instruction; students made significant gains in mathematics vocabulary from baseline to treatment (Fore III et al., 2007).

Participant type

We also examined the effects based on participant type. We categorized the participant types as studies which identified all students in the sample as students with MD, studies which identified a portion of the students in the sample as students with MD, and studies which did not identify any students in the sample as students with MD.

Students with MD (100%)

A total of eight studies identified all students in the sample as students with MD. Of these eight studies, three included significance testing between treatment and control groups with the data needed to calculate Cohen’s d. There were 11 significant comparisons for mathematics (Cohen’s d = 0.06 to 1.04) and eight non-significant comparisons for mathematics (Kiuhara et al., 2019; Moran et al., 2014; Swanson et al., 2014). Kiuhara et al. (2019) also reported one significant comparison for mathematics writing (Cohen’s d = 2.88). Hacker et al. (2019) also identified all students in the sample as students with MD and compared a treatment group with a control group, but did not include the data needed to calculate Cohen’s d. The treatment group significantly outperformed the control group for mathematics (g = 0.60) and for mathematics writing in relation to mathematical reasoning (g = 1.82), number of argumentative elements (g = 3.20), and total words written (g = 1.04). Three additional single-case design studies also identified all students in the sample as students with MD. In these studies, students made significant gains in mathematics vocabulary (Fore III et al., 2007) and positive gains in mathematics writing (Hacker et al., 2019), whereas gains in mathematics writing were non-significant in Bundock et al. (2021). For mathematics, both Hacker et al. (2019) and Bundock et al. (2021) reported positive gains.

Students with MD (< 100%)

Three studies reported students with MD as a portion of the study sample. Gearing and Hart (2019) reported that 4.7% of the students in the study were identified as students with MD. They did not disaggregate the results for students with MD; instead, they reported that the treatment group made significant gains on mathematics writing from pretest to posttest (Cohen’s d = 0.87), but the control group did not. In contrast, Cohen et al. (2015) and Swanson et al. (2019) disaggregated the results of students with MD from the rest of the sample. In Cohen et al. (2015), 55% of the students were identified with MD. Compared to the control group, the students with MD made significant gains in mathematics writing for linking words, reasoning, formal vocabulary count, formal vocabulary terms used correctly, and complete sentences, but not for informal vocabulary content, informal vocabulary terms used correctly, or attempted mathematical writing. This differed from the students without MD, who significantly outperformed the control group in mathematics writing for reasoning, formal vocabulary count, and formal vocabulary terms used correctly, but not for linking words, informal vocabulary content, informal vocabulary terms used correctly, attempted mathematical writing, or complete sentences. Swanson et al. (2019) reported that 33% of the sample were students with MD. The author team disaggregated English language learners with MD from English language learners with average mathematics performance. The students with MD in treatment group one did not significantly outperform the control group on any measure of mathematics, unlike the students without MD, who significantly outperformed the control group on the TOMA-2 and CMAT. The students with MD in treatment group two significantly outperformed the control group on KeyMath (g = 0.23), but not on the TOMA-2, CMAT, or WIAT-R, whereas the students without MD in treatment group two significantly outperformed the control group on the TOMA-2, CMAT, and KeyMath. The students with MD in treatment group three significantly outperformed the control group on the TOMA-2 and CMAT (g = 0.22), but not on KeyMath or WIAT-R, aligning with the students without MD.

Students without MD

The remaining 11 studies did not identify any students as students with MD. Five of these studies included significance testing between treatment and control groups with the data needed to calculate Cohen’s d. For mathematics, the treatment group significantly outperformed the control group on four comparisons (Cohen’s d = 0.03 to 1.88) and did not on one comparison (Cross et al., 2009; Gavin et al., 2013; Thayer & Giebelhaus, 2001). However, Cross et al. (2009) did not report significance testing for two of the three treatment groups compared to the control group. For mathematics writing, the treatment group significantly outperformed the control group on five comparisons (Cohen’s d = 0.84 to 2.41) and did not on one comparison (Hebert et al., 2019; Hughes & Lee, 2020; Hughes et al., 2019). The remaining studies provided information on treatment and control groups, but without the data needed to calculate Cohen’s d. Chasanah et al. (2020) and Uswatun and Mariani (2020) reported five significant comparisons between the treatment and control groups on mathematics writing. Yang and Lin (2012) reported four significant comparisons and one non-significant comparison for reading comprehension of geometric proofs. Last, Blanton et al. (2019) and Stoyle and Morris (2017) reported greater pretest-to-posttest gains in mathematics for treatment groups than for control groups: Blanton et al. (2019) reported significant gains for Grade 3 and marginally significant gains for Grades 4 and 5, and Stoyle and Morris (2017) reported significant gains for two treatment groups.

Instructional methods

We examined the methods that the 22 studies used to practice mathematics writing during mathematics-writing instruction (see Table 3). Across the studies, seven included educator modeling of mathematics writing. Discussion was more common, appearing in 16 studies. Other embedded oral practices included think-alouds (2 studies), educator questioning (1 study), and feedback (3 studies).

Table 3 Study intervention components, methods, and results

Several studies included planning and organization methods for mathematics writing. Eight studies featured attack strategies for mathematics writing. For example, Hughes et al. (2019) used PRISM-Check, an attack strategy in which students completed a six-step process for solving a mathematics problem and responding to it in writing. Similarly, eight studies used graphic organizers, and five studies used self-regulated strategy development (SRSD). In Moran et al. (2014), students also revised their mathematics writing. In Cohen et al. (2015), the educators and students referenced an interactive word wall.

The format of mathematics-writing tasks also varied across studies. Journal writing occurred in two studies, using paper journals and online journals (i.e., blogging). No studies included letter writing. In eight studies, students completed informative writing tasks after solving mathematics problems, and in five studies, students paraphrased mathematics concepts, procedures, and problems. Students also used argumentative writing to justify their mathematics problem solving in nine studies and responded to formal mathematics-writing prompts in three studies. In a single study, students created problems (Chasanah et al., 2020). Students also used mathematics writing to define vocabulary in two studies and to take notes in three studies.

Instructional methods for students with MD

Finally, we reviewed the instructional methods for practicing mathematics writing in the eight studies that identified all students in the sample as students with MD. Four studies included educator modeling of mathematics writing. In five studies, students participated in discussions related to mathematics writing, and in two of these, students also engaged in think-alouds. Additionally, in one study that included discussion, the educator also provided feedback to the students (Swanson et al., 2014).

These studies also included instructional methods related to organization and planning. Of the studies with instructional methods related to organization and planning, five studies included an attack strategy, and five studies included graphic organizers. Three of the studies with an attack strategy and graphic organizers used the SRSD framework. Additionally, in one study students revised their mathematics writing (Moran et al., 2014).

The studies supporting students with MD through mathematics-writing instruction also varied in mathematics-writing formats. Although no studies included journal writing, letter writing, or formal mathematics-writing prompts, studies included argumentative writing, note taking, informative writing, paraphrasing, and defining vocabulary. Students justified their reasoning through argumentative writing in three studies and completed informative writing tasks after solving mathematics problems in three studies. Students with MD also took notes in three studies, paraphrased in three studies, and defined vocabulary in only one study (Kiuhara et al., 2019).

Discussion

In this synthesis, we reviewed studies with mathematics-writing instruction to examine the outcomes and methods of mathematics-writing instruction for students with and without MD. First, we asked about mathematics-writing and mathematics outcomes for students who participated in mathematics-writing instruction. Second, we investigated whether outcomes varied based on study and instructional features. Third, we examined which methods authors used to practice mathematics writing. Fourth, we determined which methods authors used to practice mathematics writing in studies with only students with MD. Examining the outcomes and methods of mathematics-writing instructional practices identifies considerations for future research and practice.

Is mathematics-writing instruction helpful?

Across Grades 1 to 12, authors generally reported positive academic outcomes of instruction with mathematics writing. As in the previous mathematics-writing synthesis (Powell et al., 2017), the studies used a wide variety of outcome measures. For studies that compared treatment and control groups on mathematics measures, author teams reported that treatment groups significantly outperformed control groups more frequently than they reported no significant difference. For studies using pretest-to-posttest or baseline-to-intervention differences, author teams also reported growth in mathematics outcomes.

For studies that compared treatment and control groups on mathematics-writing measures, authors reported that treatment groups significantly outperformed control groups more frequently than they reported no significant difference. For studies using pretest-to-posttest or baseline-to-intervention differences, author teams reported significant mathematics-writing gains in one study (Gearing & Hart, 2019), growth in mathematics writing in a second study (Hacker et al., 2019), and no significant gains in a third study (Bundock et al., 2021). One study measured mathematics vocabulary, and students demonstrated significant gains (Fore III et al., 2007). Last, one author team measured reading comprehension of geometric proofs, with students demonstrating mixed outcomes (Yang & Lin, 2012).

The generally positive results suggest that student mathematics and mathematics-writing outcomes may improve after exposure to instruction with mathematics writing. This trend aligns with the previous synthesis on mathematics writing, which indicated promising results for practices such as journal writing and organized mathematics writing in the classroom (Powell et al., 2017). The results of this synthesis suggest that mathematics-writing instruction may support student outcomes in mathematics and mathematics writing, both of which are increasingly evaluated on high-stakes assessments. Yet, given the high variability in type of measure, mathematics content, and methods of analysis, more research is needed to determine the efficacy of mathematics-writing instruction for student outcomes in mathematics and mathematics writing.

Do the study and instructional features impact mathematics-writing efficacy?

Type of mathematics writing

First, we examined the type of mathematics writing used within mathematics-writing instruction. Studies focused on either one type or multiple types of mathematics writing. Of the studies focused on one type, author teams most frequently included only informative writing (6 studies), aligning with previous research (Powell et al., 2017). Unlike previous research, nearly as many studies focused exclusively on exploratory writing (5 studies). Overall, informative writing studies demonstrated positive mathematics and mathematics-writing outcomes. The exploratory writing studies demonstrated higher rates of non-significant outcomes, especially for measures of mathematics. Slightly fewer studies (5 studies) exclusively included argumentative writing, with positive mathematics and mathematics-writing outcomes. No studies included only creative writing. Generally, we noted improved outcomes for students when they participated in mathematics-writing instruction including informative, exploratory, argumentative, or creative writing.

Six other studies included multiple types of mathematics writing within instruction. Five of these included both argumentative and informative writing and showed a roughly even spread of significant and non-significant outcomes for mathematics and mathematics writing when comparing the treatment and control groups. One study focused on informative and creative mathematics writing, and the students in the treatment group significantly outperformed the students in the control group on mathematics writing. Across the six studies that included multiple types of mathematics writing, the relatively low proportion of significant results may suggest that involving multiple types of mathematics writing within instruction restricts student outcomes. Yet, given the limited number of studies, no conclusions can be drawn about the limitations of including multiple types of mathematics writing in instruction.

Instructional focus

In addition to types of mathematics writing, we examined efficacy based on instructional focus. Just over half of the studies (13 studies) focused on mathematics writing as the primary area of instruction. For these studies, author teams frequently reported positive outcomes for mathematics, mathematics writing, or both. Such trends align with educator perspectives that mathematics writing effectively supports students to learn mathematics and communicate mathematically (Powell et al., 2021).

The remaining nine studies focused on mathematics or mathematics vocabulary, including mathematics writing as one component of overall instruction. For the mathematics-focused studies, author teams reported a mix of significant and non-significant outcomes for mathematics, and none reported mathematics-writing outcomes. The inclusion of mathematics writing may support student mathematics outcomes; however, because these interventions had multiple components, conclusions cannot be drawn about how mathematics writing itself affected mathematics outcomes. Last, in the one study focused on mathematics vocabulary, students demonstrated significant gains in mathematics vocabulary. Including mathematics writing as a component of mathematics-vocabulary instruction may therefore be of value, but more research is needed to determine whether mathematics writing was an active component.

Participant type

We also examined outcomes for studies with a full range of students and for studies with only students with MD in the treatment group(s). Eight studies exclusively included students with MD, three studies reported that students with MD made up a portion of the sample, and 11 studies did not report including students with MD. In the studies with only students with MD, significant effects occurred most frequently on mathematics measures and less frequently on mathematics-writing measures. Of the three studies in which students with MD made up a portion of the sample (between 4.7% and 55%), only two disaggregated results for students with MD (Cohen et al., 2015; Swanson et al., 2019). In these two studies, students in the treatment group significantly outperformed students in the control group on only some measures of mathematics and mathematics writing. In the remaining 11 studies, which did not report the number of students with MD, students in the treatment groups frequently outperformed students in the control groups on mathematics and mathematics writing. Although these results come from a limited number of studies with varied measures and mixed findings, they indicate that students with MD can benefit from mathematics-writing instruction. These trends suggest that mathematics writing may effectively support mathematics and mathematics-writing outcomes for students with MD as well as for a full range of students.

How do educators teach mathematics writing?

Across studies, authors used a wide variety of methods to teach mathematics writing. The two most frequently used methods were discussion and explaining problem solving through informative writing. The frequent use of discussion aligns with reports that educators who taught mathematics writing engaged students in guided practice and provided corrective feedback, practices that involve discussion between educators and students (Powell et al., 2021). These trends suggest that discussion is an important element of mathematics-writing instruction. Furthermore, the common use of responding to problem-solving prompts reflects the value that researchers and educators place on informative writing (Powell et al., 2021).

All other strategies occurred in 10 or fewer studies. Between 5 and 10 studies included graphic organizers, argumentative writing to justify mathematics problem solving, attack strategies, modeling, paraphrasing, and SRSD. Fewer than five studies included note taking, defining vocabulary, journal writing, formal mathematics-writing prompts, feedback, think aloud, educator questioning, revising mathematics writing, word walls, and creating problems. No studies involved letter writing, responding to discussion, or drawing. Several of these methods align with practices commonly used in mathematics instruction, writing instruction, or both. For example, attack strategies are frequently used in mathematics instruction to support word-problem solving (Fuchs et al., 2021; Powell et al., 2020), whereas SRSD is frequently used as an explicit instructional framework for general writing skills (Graham, 2005). Other strategies, such as graphic organizers, vocabulary instruction, and feedback, support students in both mathematics and writing (Alghamdi et al., 2020; Graham et al., 2013; Kim et al., 2021; Powell & Driver, 2015). The wide variety of methods used across studies suggests that few methods have been established as best practices for mathematics-writing instruction; instead, authors tend to rely on methods commonly used in mathematics instruction or writing instruction.

How do educators teach mathematics writing to students with MD?

Many of the methods used in the studies with a full range of students were also used in the studies with only students with MD in the treatment group(s). As noted, only eight studies focused exclusively on students with MD. Five of these studies included discussion, attack strategies, and graphic organizers, and four included modeling. Three studies included argumentative writing, informative writing, SRSD, paraphrasing, and note taking. Fewer than three studies included think aloud, feedback, revising, and defining vocabulary. No studies with a sample of only students with MD included journal writing, letter writing, or formal mathematics-writing prompts. Across both the studies focused exclusively on students with MD and those with a full range of students, discussion was one of the most common methods for teaching mathematics writing. This pattern may indicate the value of discussion as a tool for supporting mathematics-writing development for a full range of students as well as for students with MD. Other methods frequently used with students with MD, such as attack strategies and graphic organizers, also overlapped with methods used in mathematics and writing instruction (Alghamdi et al., 2020; Hebert et al., 2018; Powell et al., 2020). These results indicate that, as across all studies, the methods used in mathematics-writing instruction for students with MD overlap with those used in mathematics and writing instruction. More research is needed to identify the efficacy of these methods for directly supporting students with MD in mathematics writing.

Limitations

Before providing considerations for research and practice, we note several limitations. First, because of variability in research designs and the limited quantitative data provided by authors, we could not calculate effect sizes for every study. Although we reported standardized effect sizes (Cohen’s d) for 10 group-design studies, we compared the remaining eight group-design studies qualitatively because they did not report the information needed to calculate Cohen’s d. Additionally, the four single-case design studies reported either Tau-U or NAP. Consistent effect sizes across all studies would allow easier comparison and a meta-analytic approach to the data. Similarly, the absence of standardized mathematics-writing measures and the inconsistent use of mathematics measures prevent comparison of results across studies. Second, the results are limited by publication bias. This synthesis includes only the significance tests that author teams reported; some author teams did not report significance testing (Cross et al., 2009), and others may have excluded measures that did not show significant differences. Third, although 22 studies may appear adequate, the studies spanned Grades 1 to 12. As a result, only a handful of studies focused on any one grade level, which is often too few to draw strong conclusions about mathematics-writing instructional practices. Similarly, fewer than 10 studies focused exclusively on students with MD. Taken together, these limitations indicate a need for more studies with larger samples, which would allow more robust conclusions about mathematics writing.
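The effect sizes named in the limitations depend on different inputs, which is why some studies could not enter the quantitative comparison. As a minimal illustrative sketch (not the procedure used in this synthesis), the Python below computes Cohen’s d from the summary statistics a group-design study must report (each group's mean, standard deviation, and sample size), assuming the common pooled-standard-deviation form with no small-sample correction. It also computes NAP (nonoverlap of all pairs) for a single-case design as the share of baseline and intervention data-point pairs in which the intervention point is higher, with ties counted as half. Tau-U is omitted because it has several variants, some of which adjust for baseline trend. The function names and example values are hypothetical.

```python
from itertools import product
from math import sqrt


def cohens_d(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Cohen's d from summary statistics, using the pooled standard deviation.

    Hypothetical helper: requires each group's mean, SD, and sample size.
    No small-sample (Hedges' g) correction is applied.
    """
    pooled_var = ((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2)
    return (mean_t - mean_c) / sqrt(pooled_var)


def nap(baseline, intervention):
    """Nonoverlap of All Pairs for one single-case data series.

    Compares every baseline point with every intervention point: a pair scores 1
    if the intervention point is higher, 0.5 if tied, and 0 otherwise (assumes
    higher scores indicate improvement).
    """
    pairs = list(product(baseline, intervention))
    score = sum(1.0 if b < t else 0.5 if b == t else 0.0 for b, t in pairs)
    return score / len(pairs)


if __name__ == "__main__":
    # Illustrative (made-up) values, not data from the reviewed studies.
    print(cohens_d(10, 2, 10, 8, 2, 10))       # prints 1.0
    print(round(nap([2, 3, 3], [3, 5, 6]), 3))  # prints 0.889
```

Because this form of Cohen’s d needs each group's standard deviation and sample size, a study that reports only means or only p values cannot be converted with it, which illustrates why effect sizes could not be computed for every group-design study.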

Considerations for research and practice

Our results demonstrate a strong need for future research on mathematics-writing instruction. First, despite positive results indicating that mathematics writing supports students, more high-quality studies are needed to strengthen the knowledge base on mathematics-writing instruction. Studies with clear instructional plans, formal mathematics and mathematics-writing measures, and rigorous data-analysis plans would provide a stronger understanding of the efficacy of mathematics-writing instruction and practices. Second, researchers should develop mathematics-writing instruction for students with MD. Students with MD tend to experience more difficulty than their peers without MD on mathematics-writing measures (Arsenault et al., 2022; Hughes et al., 2020), yet few studies have focused on students with MD. More rigorous research could identify how to intensify mathematics-writing instruction to improve outcomes for students with MD.

Considering the positive outcomes and the increased prevalence of mathematics writing (National Governors Association Center for Best Practices & Council of Chief State School Officers, 2010; Powell & Hebert, 2022), we recommend that practitioners provide mathematics-writing instruction both in classes with a full range of students and in intervention groups that include students with MD. When providing mathematics-writing instruction, practitioners should consider focusing on one type of mathematics writing, especially informative mathematics writing, given its frequent use and the high rate of positive outcomes for students practicing informative mathematics writing. Practitioners should also consider using best practices emphasized in the literature on writing and mathematics, such as discussion, graphic organizers, or attack strategies, within mathematics-writing instruction.

Conclusion

We examined mathematics-writing instruction in the elementary and secondary grades. Many studies reported positive outcomes in mathematics and mathematics writing, although these outcomes varied across study and instructional features, such as type of mathematics writing, instructional focus, and participant type. For studies with a full range of students and studies primarily supporting students with MD, instructional methods aligned with those used in mathematics and writing instruction (Alghamdi et al., 2020; Graham et al., 2013; Kim et al., 2021; Powell & Driver, 2015). While many studies in this synthesis demonstrated improved mathematics writing, the results also indicate a strong need for rigorous future research to provide high-quality analysis of mathematics-writing instruction for students with and without MD.