Introduction

Digital technologies are exciting pedagogical tools that can enhance the delivery, clarity, and precision of mathematics instruction. Incorporating technology in the classroom makes an essential contribution to student success in mathematics (Nepo 2017). Reflecting this view, research examining the effective use of technology in the mathematics classroom has grown substantially. Over the last three decades, numerous meta-analytic studies have investigated technology's effects on mathematics achievement and the factors that moderate these effects (Chan and Leung 2014; Li and Ma 2010). These meta-analyses provide summary effect size estimates as well as moderators of the effect sizes across studies. Traditional meta-analyses often focus on summary effect sizes and place less emphasis on the moderators of these effects.

Effect size reporting, and its role in meta-analytic thinking, is a significant concern for both the consumption and the reporting of mathematics education research. The American Psychological Association (APA 2010) and the American Educational Research Association (AERA 2006) advocate for the reporting of effect sizes and, more recently, have treated meta-analytic thinking as an extension of these reporting practices. Numerous mathematics education scholars cite the benefits of effect size reporting and of meta-analytic thinking through the presentation and interpretation of confidence intervals (Young et al. 2013; Young and Young 2016; Cumming 2012; Zientek et al. 2008). Effect sizes and confidence intervals are integral elements of meta-analytic research: they provide common metrics for comparing and summarizing effects across studies. Therefore, reviewing trends in prior meta-analytic research on the moderators of technology integration effects on mathematics achievement is vital to the fidelity of technology integration research in the mathematics classroom.
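To make the pairing of an effect size with its confidence interval concrete, the following Python sketch computes a standardized mean difference (Cohen's d) for a hypothetical technology-versus-traditional comparison, using the pooled standard deviation and a common large-sample approximation for the standard error of d. The function name `cohens_d_with_ci` and all input values are illustrative assumptions, not data from any reviewed study.

```python
import math


def cohens_d_with_ci(mean_t, sd_t, n_t, mean_c, sd_c, n_c, z=1.96):
    """Return (d, lower, upper) for a treatment-vs-control comparison.

    Hypothetical helper: d uses the pooled SD; the standard error uses the
    usual large-sample approximation, and z=1.96 gives a ~95% interval.
    """
    pooled_sd = math.sqrt(
        ((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2)
    )
    d = (mean_t - mean_c) / pooled_sd
    se = math.sqrt((n_t + n_c) / (n_t * n_c) + d**2 / (2 * (n_t + n_c)))
    return d, d - z * se, d + z * se


# Illustrative numbers only: technology group vs. traditional instruction.
d, lower, upper = cohens_d_with_ci(78.0, 10.0, 40, 74.0, 10.0, 40)
print(f"d = {d:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
# prints: d = 0.40, 95% CI [-0.04, 0.84]
```

In this made-up example the point estimate is small to medium, yet the interval includes zero; reporting both is what lets readers judge precision as well as magnitude.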

However, to enhance the teaching and learning of mathematics with technology, researchers must also refine theoretical constructs through empirical specification, which can and should guide classroom applications. Moderators are often directly related to classroom implementation and can be used to refine theoretical constructs, thereby supporting empirical specification. Unfortunately, moderators of effect sizes are rarely synthesized in the empirical literature. Synthesizing the moderators of effect sizes across prior meta-analyses therefore has both empirical and practical importance for the effective implementation of technology-enhanced teaching in the mathematics classroom.

The purpose of this systematic review was to examine the moderator analysis results of prior meta-analytic research to identify trends in empirical research and practice. We hope that the results of this study provide recommendations for future research and instructional praxis. These results are relevant because they demonstrate how the expansion of meta-analytic thinking supports effective classroom teaching with technology.

Literature review

Prior syntheses and meta-analyses combine knowledge from individual studies to inform teaching and learning practice with technology. Recent syntheses have examined the influence of technology-enhanced instruction on learning across a multitude of disciplines and contexts (Chang et al. 2018; Fu and Hwang 2018; Wang et al. 2017); however, few studies have systematically reviewed prior meta-analyses to synthesize results across first-order meta-analyses (Young et al. 2018; Gurevitch et al. 2018; Tamim et al. 2011). Within mathematics education research, numerous studies have examined the unique influences of specific technology integration on the teaching and learning of mathematics, and several have examined the relationship between teachers' pedagogical beliefs and their use of technology in the mathematics classroom. The majority of prior meta-analytic research has focused on the unique effects of integrating specific technological tools in the mathematics classroom. In the sections that follow, the researchers review the effects of several common classroom technologies on student achievement in mathematics.

Computer-assisted instruction

The use of computers to guide and enhance mathematics learning is well documented. Two of the most common applications of computers in the mathematics classroom are computer-assisted instruction (CAI) and computer-based instruction (CBI). CAI and CBI are similar applications of computers in the classroom, but their instructional purposes differ in nuanced ways. CAI is the more precise term, often referring to the use of computers for drill and practice, tutorials, or simulation activities offered as a substitute for or a supplement to traditional, teacher-directed instruction (Hicks and Holden 2007), whereas CBI is broadly defined as the use of computers in the delivery of instruction (Kulik 1983). The effects of CAI and CBI on student achievement, in general and in mathematics education specifically, have been examined across a multitude of grade levels and diverse contexts (Yung and Paas 2015). Despite their nuances, CAI and CBI are often operationalized as learning delivered primarily by means of the computer, typically incorporating drill and practice, simulations, and well-defined feedback mechanisms. Because CBI and CAI have been used interchangeably in prior meta-analyses in mathematics education research, they are treated as one and the same here.

The results of prior meta-analyses suggest that the effects of CBI/CAI on mathematics achievement range from small to medium by conventional effect size benchmarks (Cohen 1992). These meta-analyses were conducted across grade levels and various types of mathematics content (Chadwick 1997; Chen 1994; Hsu 2003; Larwin and Larwin 2011; Lee 1990). CBI/CAI studies consistently concluded that duration and mode of instructional use were statistically significant moderators of study effects. These results are particularly pertinent because they relate to the length of treatment and the instructional modality necessary to enhance mathematics teaching and learning. Calculator use also has a rich tradition within mathematics education, and unlike CAI or CBI, calculators are viewed as a more content-specific instructional technology.
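The benchmarks referred to above can be expressed as a small lookup. The sketch below assumes Cohen's conventional cutoffs for d (.20 small, .50 medium, .80 large) and labels anything below .20 as "negligible"; that last label is a common convention rather than part of Cohen's scheme. The function name and the loop values are hypothetical and purely illustrative.

```python
def cohen_benchmark(d):
    """Label an effect size by Cohen's (1992) conventional cutoffs for d.

    The 'negligible' band below .20 is an assumed convention.
    """
    magnitude = abs(d)  # benchmarks describe size, not direction
    if magnitude < 0.20:
        return "negligible"
    if magnitude < 0.50:
        return "small"
    if magnitude < 0.80:
        return "medium"
    return "large"


# Illustrative effect sizes only.
for es in (0.09, 0.35, 0.62, 1.02):
    print(es, cohen_benchmark(es))
```

Using the absolute value keeps a negative effect of the same size in the same band; the sign still has to be reported separately.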

Calculators

Mathematics educators continue to debate, in both research and policy documents, when calculators should be used in the mathematics classroom. The affordances of calculators as pedagogical tools cannot be denied, and the variety of hand-held calculators continues to evolve. Today, calculators range from simple arithmetic calculators to scientific, graphing, and symbolic calculators with a variety of calculating modes, including algebraic systems and spreadsheets (Close et al. 2012). The National Council of Teachers of Mathematics (NCTM) contends that calculators are fundamental technologies in mathematics classrooms that enrich student understanding (NCTM 2000). Given the multiple perspectives on calculator use in the mathematics classroom, the results of a prior meta-analysis on the effects of calculators on mathematics achievement were instrumental to the acceptance of calculators as pedagogically meaningful tools.

The results of prior meta-analyses investigating calculator use and mathematics achievement tend to converge at a moderate level of effectiveness. The statistically significant moderators of calculator effects on mathematics achievement are grade level and assessment type (Ellington 2006; Hembree and Dessart 1986; Nikolaou 2001; Tokpah 2008). This is not surprising given that the appropriate grade level for calculator use remains a point of contention. Many concerns remain regarding early exposure to calculators in the mathematics classroom, due in part to inconsistencies in calculator access during examinations. For instance, the results of the 2009 National Assessment of Educational Progress (NAEP) indicate that 66% of fourth graders reported never using a calculator for exams or quizzes, compared with only 28% of eighth graders surveyed (Planty et al. 2009). These results are further substantiated by trends observed in prior meta-analyses. Hembree and Dessart (1986) conclude, “average students (except fourth grade) who use calculators in concert with traditional mathematics instruction improve their basic skills with paper–pencil tasks, both in computational operations and in problem-solving” (p. 96). Therefore, assessment type and grade level require additional pedagogical consideration.

Mathematics software and emerging trends in mobile technology

Mathematics software applications range from general to specific forms, such as digital geometry software (DGS) and virtual manipulatives. Compared with CAI/CBI and calculator use in the mathematics classroom, these tools are relatively under-researched; thus, fewer meta-analyses exist. The overall effect sizes for mathematics software applications range from 0.09 to 1.02 (Chan and Leung 2014; Cheung and Slavin 2013; Moyer-Packenham and Westenskow 2013; Steenbergen-Hu and Cooper 2013). The moderators consistently found to be statistically significant in this literature are grade level, duration, and mathematics subject matter (e.g., algebra, geometry). This suggests that the divergence in effect sizes across these studies may be attributable to these moderators.

Emerging research tends to focus on the effects of mobile technology on the teaching and learning of mathematics. For example, Bano et al. (2018) identified three themes among the pedagogical approaches in the literature on mathematics and science instruction with mobile devices: collaboration, inquiry-based learning, and realistic learning. Fabian et al. (2016) found that the overall mean effect of mobile technology on achievement in elementary mathematics was .48. The researchers also found that the results of studies in middle grades classrooms were positive overall, but the effects in high school environments were mixed. Because the results of prior meta-analyses lend credence to the use of technology-enhanced teaching methods in the mathematics classroom but lack overarching prescriptive conclusions for general praxis with technology, a summary of effects across prior meta-analyses is warranted.

Moderators and meta-analytic thinking

A meta-analytic lens may be the most suitable empirical tool for identifying best practices with technology in the mathematics classroom. Meta-analysis is a research synthesis tool that uses summaries of effect sizes to generate empirical conclusions from ostensibly similar studies. It involves (1) summarizing the effect sizes of several studies and (2) combining the results to make summative inferences (Cooper 2016). This process involves calculating the average effect size, testing for homogeneity, detecting moderators, and explaining any heterogeneity (Hunter and Schmidt 2004). The detection of moderators is the critical feature of any meta-analytic study, because it is where differences in the strength and direction of effect sizes are identified. Rosenthal (1991) argues, “The search for moderators is not only an exciting intellectual enterprise but indeed…it is the very heart of scientific enterprise” (p. 447). Moderators specify the theorized conditions under which effects occur, informing researchers of the circumstances in which the effects under investigation can be considered reliable (Schmidt and Hunter 2014). This information is vital to the successful implementation of technology in the mathematics classroom across instructional contexts.
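The first two steps listed above (averaging effect sizes and testing for homogeneity) can be sketched as follows. This is an illustration under stated assumptions: it uses the inverse-variance, fixed-effect approach with Cochran's Q and the I² index, which is not the Hunter and Schmidt (2004) psychometric procedure, and the function name and example inputs are hypothetical. Under homogeneity, Q follows a chi-square distribution with k − 1 degrees of freedom, so a p-value could be obtained with, for example, `scipy.stats.chi2.sf(Q, df)`.

```python
def fixed_effect_summary(effects, variances):
    """Inverse-variance (fixed-effect) summary with Cochran's Q and I^2.

    effects   -- one effect size per study (e.g., Cohen's d)
    variances -- the sampling variance of each effect size
    """
    weights = [1.0 / v for v in variances]  # more precise studies count more
    total_w = sum(weights)
    mean = sum(w * es for w, es in zip(weights, effects)) / total_w
    se = (1.0 / total_w) ** 0.5
    # Cochran's Q: weighted squared deviations from the summary effect.
    q = sum(w * (es - mean) ** 2 for w, es in zip(weights, effects))
    df = len(effects) - 1
    # I^2: share of observed variation beyond sampling error (floored at 0).
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return {"mean": mean, "se": se, "Q": q, "df": df, "I2": i2}


# Hypothetical effect sizes from three studies with equal variances.
summary = fixed_effect_summary([0.2, 0.5, 0.8], [0.04, 0.04, 0.04])
print(summary)
```

A sizable I² indicates that between-study variation exceeds what sampling error alone would produce, which is the condition that motivates a search for moderators.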

Using the lens of meta-analytic thinking, researchers can make better decisions about technology integration in the mathematics classroom. Meta-analysis can help researchers find specific variables that account for the variance in the effectiveness of technology integration in the mathematics classroom. Moderators quantify qualitative variables that influence the strength or direction of relationships in meta-analytic research (Steel et al. 2002). Moderators are also important because they identify statistical interactions, which do not imply causation but rather add context to effect size results (Cooper and Patall 2009). Given the distinctions among the associations they identify, moderators are commonly grouped into three categories: (1) methodological variations, (2) theoretical constructs, and (3) study characteristics (DeCoster 2004).

Methodological variations refer to components of the experimental design, such as sample size, random assignment, or treatment duration. Theoretical constructs are moderators grounded in theory or based on the application of recognized theoretical trends. The final category, study characteristics, includes study-related artifacts such as publication status or publication year. Moderators are recognized for their ability to enhance theory development and increase the general richness of empirical work (Aguinis et al. 2011). Given the empirical merit of meta-analytic research and the contextualization offered by moderator analysis, systematically summarizing results across studies is practically and scientifically necessary. Therefore, to examine the effect of technology on mathematics achievement across multiple contexts, a literature survey was conducted to identify and characterize pertinent moderators of effect size. The present study was guided by the following research question:

1. How are the moderators of effect size characterized in prior meta-analyses of technology-enhanced mathematics instruction?

Method

The current systematic review followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) protocol. According to Moher et al. (2009), PRISMA is a set of evidence-based items that represent accepted practices for conducting and reporting systematic reviews and meta-analyses. Eligible studies were limited to meta-analyses written between 1980 and 2015. Because the review focused on meta-analyses, systematic reviews, literature syntheses, and traditional qualitative or quantitative primary studies were excluded.

Data sources were electronic databases covering education, psychology, and the social sciences: JSTOR, ERIC, EBSCO, PsycINFO, and ProQuest. In each database, an initial search was performed against the abstracts using the Boolean search string “(meta-analysis OR systematic review) AND (mathematics OR STEM) AND (technology OR digital)”. Whenever possible, search limiters were used to align the initial search results more closely with the eligibility criteria; for example, most databases allow the search to be limited to a specific date range. The search was concluded in January 2016.
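The logic of the Boolean string can be illustrated with a small Python sketch that applies the grouping (meta-analysis OR systematic review) AND (mathematics OR STEM) AND (technology OR digital) to an abstract. This is a simplification under stated assumptions: real databases add stemming, field codes, and their own phrase rules, and the helper name and sample abstracts are hypothetical. Whole-word matching is used so that, for example, "STEM" does not match inside "system".

```python
import re

# Each inner tuple is an OR-group; every group must match (AND).
QUERY = (
    ("meta-analysis", "systematic review"),
    ("mathematics", "STEM"),
    ("technology", "digital"),
)


def matches_query(abstract, query=QUERY):
    """Return True if the abstract satisfies every OR-group (hypothetical helper)."""

    def has_term(term):
        # Whole-word, case-insensitive match.
        pattern = r"\b" + re.escape(term) + r"\b"
        return re.search(pattern, abstract, flags=re.IGNORECASE) is not None

    return all(any(has_term(term) for term in group) for group in query)


print(matches_query("A meta-analysis of digital tools in mathematics classrooms"))  # True
print(matches_query("A systematic review of ecosystem technology in biology"))  # False
```

The second abstract fails only the (mathematics OR STEM) group; the word boundaries keep "ecosystem" from counting as "STEM".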

Screening process

Figure 1 presents the complete study inclusion and exclusion process. The screening criteria shown in Table 1 guided the selection of articles from the initial pool. First, study titles and abstracts were screened for relevance to the research question and study topic of interest. Then, the remaining studies were screened against the criteria provided in Table 1. Using this process, the initial pool of 42 studies was reduced to a final pool of 18. As shown in Fig. 1, most studies were removed for lack of effect size reporting or the absence of a digital technology focus. Pertinent data related to the research question were then extracted from the remaining studies.

Fig. 1 Technology meta-analysis study inclusion flowchart

Table 1 Inclusion Screening Process

Data collection and analysis

The extraction protocol presented in Table 2 guided data extraction from the retained articles. Data extracts included citation, purpose, mean effect sizes, number of independent effect sizes, moderators, and key findings. Extracted data were stored in a database indexed by article. In addition, the Results, Discussion, and Conclusion sections of each article were extracted and stored in a database for critical analysis.

Table 2 Data Extraction Protocol

To examine the methodological, theoretical, and study characteristic moderators affecting the strength and direction of each meta-analysis's results, the researchers used a semi-structured coding protocol based on an adapted list of features and trends found in the systematic review. Moderators were initially coded verbatim and then coded categorically after all studies were reviewed. Moderator categories were based on operational definitions that emerged during the coding discussions and data extraction process. The researchers assessed coding reliability by comparing their independent codings of each study. The initial inter-rater agreement was 95%, and the researchers met to resolve the remaining inconsistencies in the coding results.
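Percent agreement, the reliability index reported above, is the proportion of coding decisions on which two coders matched. The sketch below computes it and, as an optional addition not reported in this study, Cohen's kappa, which adjusts agreement for chance. The function names and the example codes are hypothetical, and both coders' lists are assumed to be the same length.

```python
from collections import Counter


def percent_agreement(codes_a, codes_b):
    """Proportion of items on which two coders assigned the same category."""
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return matches / len(codes_a)


def cohens_kappa(codes_a, codes_b):
    """Chance-corrected agreement between two coders (undefined if p_expected == 1)."""
    n = len(codes_a)
    p_observed = percent_agreement(codes_a, codes_b)
    counts_a, counts_b = Counter(codes_a), Counter(codes_b)
    # Expected agreement if each coder assigned categories independently.
    p_expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n**2
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical moderator codes: M = methodological, T = theoretical,
# S = study characteristic.
coder_1 = ["M", "M", "T", "S"]
coder_2 = ["M", "T", "T", "S"]
print(percent_agreement(coder_1, coder_2))  # 0.75
print(round(cohens_kappa(coder_1, coder_2), 3))  # 0.636
```

Kappa is lower than raw agreement because some matches would be expected even if the coders assigned categories at random in proportion to how often each used them.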

Data were analyzed descriptively to characterize trends in the influence of moderators on effect size variability. For each moderator, the frequency of investigation was recorded along with the between-groups heterogeneity statistic (Q_B) and its p-value. Each moderator was then rated high, medium, or low according to the ratio of the number of meta-analyses in which it was a statistically significant moderator to the total number of meta-analyses that examined it. These ratios serve as a measure of each moderator's impact across the meta-analyses reviewed in the current study.
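For readers unfamiliar with Q_B, the between-groups statistic tests whether subgroup mean effects differ more than sampling error would allow. A minimal fixed-effect sketch follows, assuming inverse-variance weights; the grouping variable (grade band), the effect sizes, and the function name are hypothetical. Q_B is referred to a chi-square distribution with (number of groups − 1) degrees of freedom; with one degree of freedom, the .05 critical value is about 3.84.

```python
from collections import defaultdict


def between_groups_q(effects, variances, groups):
    """Fixed-effect between-groups heterogeneity statistic Q_B and its df.

    Q_B = sum over groups of W_g * (mean_g - grand_mean)^2, where W_g is the
    total inverse-variance weight in group g.
    """
    weights = [1.0 / v for v in variances]
    grand_mean = sum(w * es for w, es in zip(weights, effects)) / sum(weights)

    # Accumulate [sum of weights, sum of weight * effect] per group.
    totals = defaultdict(lambda: [0.0, 0.0])
    for w, es, g in zip(weights, effects, groups):
        totals[g][0] += w
        totals[g][1] += w * es

    q_b = sum(w_g * (wes_g / w_g - grand_mean) ** 2 for w_g, wes_g in totals.values())
    return q_b, len(totals) - 1


# Hypothetical effect sizes coded by grade band.
q_b, df = between_groups_q(
    effects=[0.2, 0.3, 0.6, 0.7],
    variances=[0.04, 0.04, 0.04, 0.04],
    groups=["elementary", "elementary", "secondary", "secondary"],
)
print(f"Q_B = {q_b:.2f} on {df} df")
```

Here Q_B = 4.00 exceeds 3.84, so in this illustrative data grade band would count as a statistically significant moderator at the .05 level.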

Results

The final pool comprised 18 meta-analyses conducted between 1986 and 2014, representing 1193 independent effect sizes (Table 3). The median year of publication was 2007, and publication years spanned 28 years. A complete list of study characteristics is presented in Table 3, which shows that the majority of the meta-analyses were journal articles (10 of 18) and the remaining meta-analyses were dissertations. All studies except one included either an overall mean effect size or sufficient data to calculate it. Only one study reported an overall negative effect size, and overall effect sizes ranged from −.11 to 1.02.

Table 3 Description of Included Meta-Analyses

To answer the research question, the researchers screened the moderator frequency data using IBM SPSS Statistics 20 and then, based on characterizations and definitions presented in prior research, narrowed the list to 17 specific moderators. The researchers operationalized each moderator for fidelity. Table 4 presents the operational definition of each moderator, its frequency of investigation, and its impact, defined as the ratio of the number of times the moderator was a statistically significant predictor to the number of times it was investigated. The researchers then ranked moderator impact as low (< .50), medium (≈ .50), or high (> .50) and listed the moderators in descending order of frequency of observation for explanatory transparency.
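The impact rating can be expressed as a short function. The sketch assumes that "≈ .50" means a ratio equal to .50 within a small tolerance, since the width of the medium band is not specified; the function name and the `tol` parameter are hypothetical. The role and duration counts in the example (9 of 10 and 8 of 9) follow from the percentages reported in the Discussion; the remaining counts are illustrative.

```python
def impact_rating(n_significant, n_investigated, tol=1e-9):
    """Rate a moderator by the share of meta-analyses finding it significant.

    low: ratio < .50, medium: ratio approximately .50, high: ratio > .50.
    `tol` (an assumed parameter) sets how close to .50 counts as 'medium'.
    """
    ratio = n_significant / n_investigated
    if abs(ratio - 0.5) <= tol:
        return ratio, "medium"
    return ratio, ("high" if ratio > 0.5 else "low")


examples = {
    "role": (9, 10),
    "duration": (8, 9),
    "hypothetical moderator A": (2, 4),
    "hypothetical moderator B": (1, 5),
}
for name, (sig, total) in examples.items():
    ratio, rating = impact_rating(sig, total)
    print(f"{name}: {ratio:.2f} -> {rating}")
```

Widening `tol` would move ratios near one-half into the medium band; the choice matters most for moderators investigated only a few times.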

Table 4 Moderator Operational Definitions, Frequencies, and Impact

Grade level was the most frequently observed moderator, while access was the least frequently observed. The remaining moderators in the upper quartile, in descending order, were role, duration, ability, and mode. The lower-quartile moderators, in ascending order, were organization, teacher, community, and race. The results in Table 4 suggest that four of the five moderators in the upper quartile of frequency had a high impact on the variability of meta-analysis results, whereas most lower-quartile moderators had a low impact on the variability of effect sizes. Teacher was the only lower-quartile moderator with a high impact on the variability of meta-analysis results. In summary, eight moderators were rated high, three medium, and six low. In the discussion that follows, the researchers provide substantive conclusions and implications for teachers, administrators, and researchers based on these results.

Discussion

The purpose of this systematic review was to examine the moderator analysis results of prior meta-analytic research to identify trends in empirical research and practice. The analysis compared different conceptualizations of learning with technology, examined measurement in mathematics classrooms, and identified common and generalizable findings across the meta-analyses regarding the moderators of the effectiveness of technology integration in mathematics classrooms. The results suggest that the effects of technology integration on mathematics achievement range from negligible to large but are most often small. Even so, given the practical significance of a small effect size in the mathematics classroom, this finding has educative merit for teachers and administrators (Hill et al. 2008).

Moderators are important tools for evaluating and planning technology-mediated learning in the mathematics classroom; thus, this study focused on the moderators of the effects reported in prior meta-analyses. The 17 moderators investigated in this study varied in frequency of investigation and in impact on effect size differences. Grade level, role, and duration were the three most investigated moderators, and all had a high impact on effect size variation across the meta-analyses examined. Of the 12 studies examining grade level, 66% found that grade level had a statistically significant influence on the variability of student achievement effects when technology was integrated in the mathematics classroom. Many included studies favored technology in middle and high school classrooms over early elementary settings. This conclusion is consistent with prior studies finding that technology is used less in elementary schools than in high school classrooms (Brown et al. 2007). Researchers should continue to assess this phenomenon, and teachers and administrators should examine the grade-level implications of technology integration closely when designing and applying interventions.

The results of the ten studies examining the instructional role of technology indicate that role was a statistically significant moderator of effect sizes in 90% of the meta-analyses examined. Only one study concluded that the instructional role of technology was not a statistically significant moderator; the other nine found that effect sizes were larger when technology was used to supplement or augment, rather than substitute for or replace, traditional instruction in the classroom. In addition to traditional instructional tools such as software resources and standard tools such as graphing calculators, educators are exploiting a variety of technological tools in mathematics instruction, including cell phones and other mobile technologies (Gay and Burbridge 2016; Young and Young 2012; Davis 2010; Valk et al. 2010). As studies continue to use these tools, it will be important to revisit the effects of the instructional role of technology on achievement.

Third, duration was assessed in nine studies and found to be a significant moderator of effect sizes in 89% of them. Most studies found that the effect of a technology intervention wanes at approximately three weeks. Thus, researchers and educators need to be cognizant of overexposure and the novelty factor. Other notable instructional findings were that mode of instruction, assessment, subject matter, concentration, and teacher instructional orientation all statistically significantly influenced the variability of effect sizes in the meta-analyses; however, these moderators were investigated less often across studies. Student demographic variables such as race, gender, SES, and community were not consistent moderators of the effects of technology on mathematics achievement.

Limitations

Summarizing the effects of moderators on effect sizes across meta-analyses has limitations. Chief among them, much of the data pertinent to each moderator resides at the individual study level. This is problematic because the precise influence of every moderator assessed in prior meta-analyses would be difficult to estimate, even through second-order meta-analysis (Young 2017). Thus, the present study selected for systematic review a representative sample of moderators that could be assessed at the meta-analysis level rather than the study level.

Conclusion

This study provides a comprehensive systematic review and literature survey of meta-analytic research conducted between 1986 and 2014. Based on this summary of nearly three decades of research, the study offers important conclusions about the effectiveness and moderators of technology integration in mathematics classrooms. In conclusion, the results of this systematic review indicate that technology integration supports mathematics achievement across prior meta-analytic research; however, the statistically significant moderators of these effects vary across studies.

Based on these results, the researchers recommend that teachers and researchers continue to implement technology in the mathematics classroom while emphasizing optimization of the effects of grade level, the role of technology, and duration. The researchers also recommend further research into demographic variables, which were investigated less frequently across studies. In addition, more research is necessary to capture the unique influences of teachers on the effects of technology integration in the mathematics classroom. Finally, the researchers recommend further investigation into variables such as student access to technology at home and the influence of instructional context on the duration effects of technology integration. Armed with these recommendations, researchers and educators are better equipped to make informed decisions about when, where, and how to integrate technology in the mathematics classroom.