This study examines the impact of computer technology (CT) on mathematics education through a systematic review of existing literature. This extensive meta-analysis of existing empirical evidence focuses on student mathematics achievement and optimal conditions for mathematics learning in K-12 (kindergarten to grade 12) classrooms.

The use of CT has become increasingly popular in elementary and secondary schools over the past several decades. There is little doubt that technology has become a ubiquitous tool for teaching and learning. The National Council of Teachers of Mathematics (NCTM 2000) emphasized the importance of the use of technology in mathematics education, stating that “technology is essential in teaching and learning mathematics; it influences the mathematics that is taught and enhances students’ learning” (2000, p. 11).

Although CT has great potential to impact the teaching and learning of mathematics, the presence of CT hardware does not automatically produce desirable schooling outcomes in mathematics education (Clark 1983; Li 2004). Successful and effective use of technology for the teaching and learning of mathematics depends upon sound teaching and learning strategies that come from a thorough understanding of the effects of technology on mathematics education (Albright and Graf 1992; Coley et al. 2000).

The increasingly popular use of CT in education has set off a flurry of research studies that focus on the success and effectiveness of technology in elementary and secondary education. For example, Braden et al. (1991) evaluated a computer-assisted instruction (CAI) program for elementary hearing-impaired students in Florida. The results indicated that the CAI treatment led to better in-class mathematics quiz scores. Xin (1999) compared the use of CAI in cooperative learning and whole-class instruction. Although mathematics performance increased for students using CAI in both cooperative and whole-class settings, gains in mathematics performance were significantly greater among students under cooperative learning, indicating that mathematics performance could be enhanced if students were given opportunities to work within a CT-assisted cooperative learning environment.

Irish (2002) studied the effectiveness of a multimedia software program to teach students with learning and cognitive disabilities and reported that the learning of mathematics among disabled students benefited from this multimedia communication approach. SBC Knowledge Ventures (2005) has systematically documented the advantages of using videoconferencing on learning: (a) helping meet state and national curriculum standards, (b) helping students take classes not offered at their school, and especially (c) helping students involved in the operation of the videoconferencing equipment learn both technical skills and content skills.

One of the hypermedia-based learning programs in mathematics is the “Adventures of Jasper Woodbury,” a mathematics program developed at Vanderbilt University. It has been widely used around the world. Based on the theory of anchored instruction, the program uses video and multimedia computing technology to provide problem scenarios that aim to help students develop the skills and knowledge necessary for problem solving and critical thinking. Implementation of this program has yielded some interesting findings. According to Mushi (2000), learning mathematics through media was interesting to students and had a positive impact on their attitude toward mathematics. Shyu (1999) found that computer-assisted video-based anchored instruction could enhance student problem-solving skills. Funkhouser (1993) observed significant gains in problem-solving ability and knowledge of mathematics content among secondary students after they engaged in activities using exploratory problem-solving software.

Intelligent tutoring systems, based on the ACT (Adaptive Control of Thought) and ACT-R theories of learning and problem solving, are a type of computer-based instructional technology focusing on the learning of geometry, algebra, and LISP (Anderson et al. 1985). Early evaluations (Anderson et al. 1995; Koedinger et al. 1997) of the tutors usually, though not always, showed significant achievement gains. Empirical investigation into such systems showed that students learned best when the tutor provided immediate feedback consisting of short and directed error messages. Further, students gained more when the intelligent tutors were presented as non-human tools rather than as emulations of human tutors.

In mathematics classrooms, particularly at the elementary school level, manipulatives have been used extensively to help build a foundation for students to understand abstract concepts. The increasing access to CT in schools has inevitably resulted in some enthusiasm for the use of virtual manipulatives in mathematics instruction. Virtual manipulatives are usually replicas of real manipulatives and are accessed via the Internet. A variety of studies have examined virtual manipulative tools in mathematics classrooms and found positive impacts of such tools on student achievement in and attitude toward mathematics (Char 1989; Kieran and Hillel 1990; Kaput 1992). One advantage of virtual manipulatives, according to Reimer and Moyer (2005), is the capability of connecting dynamic visual images with abstract symbols, something regular manipulatives cannot do. For example, virtual manipulatives could be effectively used to teach fraction concepts to elementary students (Suh et al. 2005). Others have found that junior high students benefited from using virtual pattern blocks, platonic solids, and geoboards to explore geometric concepts (Reimer and Moyer 2005). When virtual manipulatives were used in combination with regular manipulatives, researchers also found positive results (Ball 1988; Terry 1996). Results from other studies, however, indicated no significant gains in mathematics achievement after students used virtual manipulatives (Kim 1993; Nute 1997).

In a nutshell, how technology can be used successfully and effectively to affect the teaching and learning of mathematics in K-12 classrooms is the key research question that many primary studies have attempted to address. Unsurprisingly, findings have not been consistent, especially when CT use is compounded with other factors such as student characteristics, e.g., gender, ability (Royer et al. 1994; Salerno 1995), student group composition (Brush 1997; Xin 1999), and teaching methods (Hecht et al. 1995; Farrell 1996; Shyu 1999).

As research evidence accumulates on this educational issue, research synthesis becomes useful in helping to make sense of a large body of research literature. The need for the present review is twofold. First, primary studies have provided inconsistent results concerning the effects of CT in mathematics classrooms, calling “for a systematic integration of the literature both for theory development and for pedagogical guidance” (Lou et al. 2001, p. 451).

Second, although several meta-analyses exist examining the effect of learning with CT (e.g., Kulik et al. 1980, 1982; Kulik and Kulik 1986; Niemiec et al. 1987; Kulik and Kulik 1991; Lou et al. 2001), less attention has focused specifically on CT for mathematics learning, one of the core school academic subjects. Among the few existing studies (Hartley 1977; Kulik et al. 1983; Niemiec and Walberg 1985) focusing on mathematics, Hartley’s (1977) dissertation was the first meta-analysis specifically looking at the math achievement of elementary and secondary students, reporting that computer-based instruction (CBI) raised student achievement by 0.4 standard deviations, or from the 50th percentile to the 66th percentile. Later, Burns and Bozeman (1981) examined 40 primary studies focusing on math achievement and reached similar conclusions. A few other extensive reviews of CAI (Kulik et al. 1983; Niemiec and Walberg 1985), each of which included a separate analysis for math achievement, also reported that CAI improved student learning.
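The percentile interpretation of an effect size quoted above can be checked with a short calculation. The following is a minimal sketch, assuming normally distributed achievement scores with equal variance in the treatment and control groups; the function name `effect_size_to_percentile` is hypothetical and is not taken from any of the reviewed studies.

```python
from statistics import NormalDist


def effect_size_to_percentile(effect_size: float) -> float:
    """Percentile of the control-group distribution reached by the average
    treated student, assuming normal scores with equal variance (an
    assumption of this sketch, not a claim made by the cited studies)."""
    return 100.0 * NormalDist().cdf(effect_size)


# An effect size of 0.4 SD moves the average student from the 50th
# percentile to roughly the 66th percentile of the control distribution.
print(round(effect_size_to_percentile(0.4)))  # prints 66
```

Under these assumptions, an effect size of zero corresponds to the 50th percentile, that is, no shift relative to the control group.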

The work of Christmann et al. (1997) was the only review study published after 1990 in which mathematics was explicitly examined, along with seven other curricular areas. They compared academic achievement of grade 6 to 12 students who received traditional instruction with that of grade 6 to 12 students who received traditional instruction supplemented with CAI. The primary studies of mathematics included in their meta-analysis, however, were journal articles, conference presentations, and dissertations published before 1990.

The significant change in technology, and consequently in CT for mathematics instruction, calls for an up-to-date investigation of the literature. The vast majority of schools have gained access to CT hardware and networks since 1990, with computer hardware becoming increasingly affordable and computer software programs emerging in great quantity from different philosophical approaches. In addition, CT is now used differently than in its earlier days. For example, there has been a shift from using CT to promote drill-and-practice in traditional mathematics classrooms to using CT to create an interactive learning environment in learner-centered mathematics classrooms (Lou et al. 2001). All these changes call for a new systematic examination of the effect of CT on mathematics instruction.

Definition of Terms and Types of Technology

We begin with a clarification of three terms that are fundamental for this study. The first term is computer technology (CT). In this work, CT refers to computer software rather than computer hardware. Four-function calculators and handheld graphing calculators are not included in this work because previous research has already systematically reviewed these topics. For instance, Hembree and Dessart’s (1986) meta-analysis examined the effects of K-12 student calculator use. Analyzing 79 primary studies focusing on students’ achievement and attitude, the authors concluded that the use of calculators may increase student achievement and confidence levels. Smith et al. (1997) conducted another meta-analysis of research studies published between 1984 and 1995 that supported and extended the result of Hembree and Dessart (1986). Burrill et al. (1992) synthesized scientific research on the use of graphing calculators, concluding that

Students who use handheld graphing technology have a better understanding of functions, of variables, of solving algebra problems in applied contexts, and of interpreting graphs than those who did not use the technology. … No significant differences in procedural skills were found between students who use handheld graphing technology and those who do not. This indicates that extensive use of the technology does not necessarily interfere with students’ acquisition of skills.

(p. v)

The second term that needs attention is mathematics achievement. Mathematics achievement refers to performance scores (on solving mathematical problems) that are measured by mathematics tests, either standardized ones or teacher-made (researcher-made) ones.

Different types of CT, ranging from early computer-assisted instruction (CAI) or computer-based instruction (CBI), to simulations, or computer-mediated communication, to the Internet or Logo, have been developed and applied in an attempt to enhance mathematics teaching and learning in the past several decades. Researchers (Means 1994; Lou et al. 2001) have classified various types of CT into five main categories: (a) tutorial, (b) communication media, (c) exploratory environment, (d) tools, and (e) programming language. Specifically, the programming language category refers to the teaching of mathematics directly using specific computer programming languages such as Python or C++. Because the vast majority of existing research studies apply types of CT that fall into the first four categories, this meta-analysis focuses on these categories. The following sections describe each category and present some examples to illustrate research on the use of CT under each category. Throughout this meta-analytical review, CT refers to the use of one of these four types or a combination of these four types of technology.

Tutorial pertains to programs that directly teach mathematics by setting up a stimulating environment where information, demonstration, drill, and practice are provided to students (Lou et al. 2001). This type of CT includes, but is not limited to, computer-assisted instruction (CAI), various mathematics games (e.g., Math Blaster), and numerous drill-and-practice software programs (e.g., A+Math, Math Facts in a Flash, Maple 13, and Math Realm). In its contemporary use, the term CAI refers to direct-instruction programs (e.g., mimics of programmed instruction) or drill and practice. CAI could be an effective mechanism for teaching students with special needs certain mnemonic strategies, which in turn could help increase their performance and accuracy on basic mathematical tasks.

Communication Media refers to communication tools such as email, computer-supported collaborative learning systems, videoconferencing, and the Internet (Lou et al. 2001). These tools enable effective communication and information sharing.

Exploratory Environments seek to encourage active learning through discovery and exploration (Lou et al. 2001). Logo, simulations, and hypermedia-based learning are examples of this type of CT. One of the hypermedia-based learning programs in mathematics is the “Adventures of Jasper Woodbury,” which uses video and multimedia to provide problem scenarios that aim to help students enhance their problem-solving and critical-thinking skills.

Tools serve the technological purpose of making teaching and learning fun, effective, and efficient (Lou et al. 2001). Word processors, PowerPoint, spreadsheets, Geometer’s Sketchpad, Cabri Jr., data analysis software, and various virtual manipulatives are some examples of this type of CT. Fully aware of the distinction between the use of technology in instruction and the use of technology to manage instruction, we also include in this category instructional management software such as Accelerated Math™, as long as the software has been used for instructional purposes rather than solely for assessment purposes. For example, Ysseldyke and his colleagues have studied the use of Accelerated Math™, which allows teachers to “match instruction to an individual student’s skill level, providing appropriate practice, monitoring student progress and giving corrective feedback” (Ysseldyke et al. 2003, p. 164).

Context Matters to CT Application

The literature on the use of CT in mathematics classrooms suggests that the effects of technology on learning may depend on the learning environment to which CT is applied. Some researchers have attributed student academic success and attitudinal change not only to the use of technology itself but also to the embedded method of teaching developed from pedagogical reform. Two distinct pedagogical approaches have been cited most frequently in the research studies: traditional and constructivist teaching. We define the traditional approach of teaching as teacher-centered whole-class instruction and the constructivist approach of teaching as student-centered instruction that emphasizes strategies such as discovery-based (inquiry-oriented) learning, problem-based (application-oriented) learning, and situated cognition based on constructivism. We emphasize that these terms refer to families of approaches and that instruction within a family can be carried out well or poorly, or through different variants of the general approach. To some extent, these definitions were specific to this review, and as a result, all findings were classified into either traditional instruction or constructivist instruction.

For example, Connell (1998) explored mathematics teaching and learning with technology in two rural classes over a period of 1 year. Both classes used the same technology but with different teaching approaches. One class adopted constructivist pedagogy, and technology was used as a student tool for mathematics exploration. A behaviorist approach was used in the other class, where technology was mainly used as a presentation tool. By the end of the study, both classes easily surpassed state and district goals and had shown a significant improvement from their baseline. However, the performance of the students in the constructivist class was significantly and consistently higher than that of the students in the behaviorist class. The longer CT was used in the above fashion, the greater the difference in favor of the constructivist class.

Another important context factor is the use of group learning or individual learning approaches. For example, Lou and her colleagues conducted a meta-analysis on small group versus individual learning with CT on achievement and affective variables. Their analysis of 486 independent findings extracted from 122 studies indicates that, on average, small group learning is significantly more effective than individual learning on student individual achievement (mean ES = +0.15), group task performance (mean ES = +0.31), and several affective outcomes. Because the findings on achievement and group task performance are significantly heterogeneous, they identified several technology, task, grouping, and learner characteristics that account for such variability.

Other factors that may contribute to mathematics teaching and learning with technology are the learner characteristics. Students’ achievement, behavior, and attitude towards mathematics and technology may depend on their gender, grade level, ability level, and socioeconomic status (SES).

Braden et al. (1991) evaluated a computer-assisted instructional (CAI) program for elementary hearing-impaired students in Florida. The results indicated that the CAI treatment had led to better in-class math quiz scores. Other outcomes such as reading and math scores on the Florida Statewide Student Achievement Tests were also measured, but no significant relationship was found. Irish (2002) studied the effectiveness of a multimedia software program to teach students with learning and cognitive disabilities. Using a single-subject, multiple-baseline design across subjects (Cooper et al. 1987), the study was systematically replicated across three pairs of grade 5 students. Although the sample size (i.e., six students in total) was minimal, the results of this study showed that CAI could be an effective mechanism for teaching these special needs students certain mnemonic strategies, which in turn could help increase their performance and accuracy on basic multiplication tasks.

In sum, the research reviewed on learning of mathematics with technology suggests that the effectiveness of mathematics learning with technology is highly dependent on many other characteristics such as teaching approaches, type of programs, and type of learners. We included these contextual characteristics in our meta-analysis in an attempt to identify the moderating study features that may affect the effectiveness of technology in mathematics education.

Objective and Research Questions

The objective of this meta-analysis is to assess the impact of CT on mathematics learning for students in grades K-12. Meta-analysis provides the best statistical approach to synthesize inconsistent empirical evidence by taking retrieved (existing) studies as a “random” sample of studies from a certain population. By examining this sample, one takes into account design characteristics as a way to control for the quality of each study. Our present meta-analysis seeks to answer the following research questions:

  1. Does mathematics learning with CT impact mathematics achievement of K-12 students, compared to mathematics learning without CT? If so, to what extent?

  2. What study features moderate the effects of CT on K-12 students’ mathematics achievement?

  3. What are the optimal conditions for effective mathematics learning with CT in terms of K-12 students’ mathematics achievement?

Method

This systematic review quantitatively integrated the findings from the primary research on the educational use of technology in the teaching and learning of mathematics for students in grades K-12. The following section outlines the procedures employed under the following headings: information retrieval, study inclusion criteria, coding of outcome measures and study features, calculating effect size, and data analyses.

Information retrieval

In this study, we focused on recent educational studies of CT (i.e., journal articles and doctoral dissertations published since 1990). The rationale for choosing 1990 was twofold. First, since 1990, there have been dramatic changes both in technology development, particularly CT, and how technology is used in mathematics education (Lou et al. 2001). Second, a previous meta-analysis of a similar topic focused on studies published before 1990 (Christmann et al. 1997).

We used a three-step approach to search for relevant studies. The initial step included an electronic search of the following databases: (a) Educational Resources Information Center (ERIC, 1990–2006), (b) PsycINFO (1990–2006), (c) Education Full Text (1995–2006, a database with research papers published since 1995), (d) Education Abstracts, (e) Dissertation Abstracts International (1990–2006), (f) Australian Education Index, (g) British Education Index, (h) Canadian Education Index, and (i) ProQuest Digital Dissertations and Theses Fulltext.

Depending on the database, the specific search strategy was modified to fit the terms, search symbols, and keywords appropriate to the overall topic. We systematically used ERIC thesaurus for our ERIC search, and Library of Congress Subject Headings for searches of other North American databases. The following keywords and their related cognates provided examples of the types of terms that we used to locate potentially relevant studies:

Intervention Keywords included technology, computer or educational technology, computer, multimedia (including simulation), online, e-learning, web-based courseware or software, and distance education.

Outcome Keywords included mathematics, mathematics learning, mathematics teaching, algebra, arithmetic, geometry, calculus, graphing, data analysis, statistics, number theory, number sense, and mathematics achievement.

Target Population Keywords included elementary or secondary school students, middle school, junior high school, high school, public school, and private school.

To enrich our pool of studies, we (a) searched for both qualitative and quantitative reviews published since 1990 to use their reference lists; (b) checked reference lists from articles with extensive literature reviews such as Christmann et al. (1997), Clements (1998), Woodward and Reith (1995), and Woodward (1995); (c) used author searches and the Citation Indices to look for potential papers by leading scholars of this field; and (d) conducted a manual search of the leading journals related to this study, including Educational Technology Research and Development, British Journal of Educational Technology, Journal of Research on Technology in Education, Journal of Computers in Mathematics and Science Teaching, Journal for Research in Mathematics Education, Educational Studies in Mathematics, Journal of Mathematical Behavior, and For the Learning of Mathematics. Steps (a) and (b) resulted in 1,938 articles, and steps (c) and (d) in seven articles. Together, these search efforts produced a total of 1,945 articles.

Selection criteria

Studies for this review were subjected to a two-stage process of screening for inclusion. Stage 1 was the screening of titles and abstracts. Stage 2 was the screening of full articles. In order to advance to the second stage of the process, a study had to meet the following criteria:

  • The study uses CT for instructional (or learning) purposes.

  • Participants of the study are students in regular classrooms in grades K-12.

  • The study employs an experimental or quasi-experimental design (as defined later).

  • The study was published between 1990 and 2006 (without restriction to geographical area or language).

  • The study uses mathematics achievement as outcome.

  • The study reports quantitative data in sufficient detail for the calculation of an intervention effect size.

In the first stage, two reviewers evaluated titles and abstracts retrieved from electronic searches and references from primary studies and review articles, using the above criteria to identify potential studies. For those titles and abstracts judged to be definitely or potentially appropriate for inclusion, a full copy of each article was obtained. When there was disagreement on whether to advance a specific title or abstract for inclusion, the original complete paper was also obtained. A total of 183 articles were identified for more complete inspection. In the second stage, two reviewers read all complete articles to examine the appropriateness of each study. Any ambiguities or questions about eligibility were resolved through discussion.

Coding of outcome measures and study features

To identify the methodological and substantive characteristics that might contribute to the significant variation in the empirical findings, characteristics of study, design, intervention, and outcome measures were coded using a three-stage coding procedure. First, a set of broad categories was established based on a review of the related literature, resulting in the coding scheme. The information on sample characteristics, intervention characteristics, outcome measures, and research findings was then extracted and coded by two reviewers. Inter-rater agreement was 100% after discussion of a few disputed findings. Finally, for studies with insufficient data, we made an effort to contact the authors for further information. About 10% of the authors responded, and the information they provided was added to the coding.

We piloted the coding scheme with a random sample of 10 primary studies to ensure the inclusion of salient study features in the literature and avoid researcher bias (Lou et al. 2001). We then developed a codebook from our original coding scheme based on the result of the pilot.

Appendix 1 describes 15 features coded for each study in terms of publication, sample, design, implementation, and technology. Publication features included publication type and publication year. Sample characteristics included gender composition, racial composition, socioeconomic composition, student type, grade level (later collapsed into two categories of elementary and secondary due to a small number at each grade level), and whether class was the unit of analysis. The categories for gender composition are adapted from the work of Lipsey and Wilson (2001), where students are divided into predominantly female (males ≤ 45%), predominantly male (males ≥ 55%), and gender-balanced (45% < males < 55%) groups.
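The gender composition thresholds above can be written as a simple coding rule. This is only an illustrative sketch; the function name `gender_composition` and the choice to express the share of males as a percentage are assumptions of the example, not part of the original codebook.

```python
def gender_composition(percent_male: float) -> str:
    """Code a sample's gender composition from its percentage of male students,
    using the thresholds stated in the text (45% and 55%)."""
    if not 0.0 <= percent_male <= 100.0:
        raise ValueError("percent_male must be between 0 and 100")
    if percent_male <= 45.0:
        return "predominantly female"
    if percent_male >= 55.0:
        return "predominantly male"
    return "gender balanced"


# Hypothetical samples with 40%, 50%, and 60% male students.
for share in (40.0, 50.0, 60.0):
    print(share, gender_composition(share))
```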

Student type distinguished between general education students and special needs students, a category comprising low-achieving students, students with mental, physical, or emotional disabilities (e.g., ADHD, blindness), and at-risk (drop-out) students. Design characteristics focused on the research design of each primary study and the confidence rating of the effect size (ES) estimation. The research design referred to whether the research was experimental or quasi-experimental based on the definitions provided by Creswell (2003). In a true experiment, researchers have both treatment and control groups and randomly assign participants to these groups. In a quasi-experiment, where random assignment is not used, researchers may have treatment and control groups or may use a single-group interrupted time-series design (i.e., a single group before and after a treatment). The confidence of ES rating was based on the categories developed by Lipsey and Wilson (2001). The calculation of each ES, based on the available data reported in each study, was rated as a more reliable estimation, a less reliable estimation, or no estimation.

Implementation features coded duration of intervention, teaching methods employed (constructivist or traditional), type of testing instrument, and country (collapsed into developing or developed countries). Considering the duration of intervention ranged from a minimum of eight classes in a week to multiple school years, we decided to use a term (6 months) as a cutoff time since it was commonly used in the studies to describe interventions. We defined traditional approach of teaching as teacher-centered whole-class instruction and constructivist approach of teaching as student-centered instruction that emphasized strategies such as discovery-based (inquiry-oriented) learning, problem-based (application-oriented) learning, and situated cognition based on constructivism. To some extent, these definitions were specific to this review, and as a result, all findings were classified into either traditional instruction or constructivist instruction.

Technology type included tools, exploratory environment, communication media, and tutorial. Mathematics domains (e.g., problem-solving, computation) and assignment level were two other study features coded initially but later eliminated from analysis due to the lack of sufficient data from primary studies (i.e., most primary studies did not include those aspects in their research design).

Appendix 2 provides a detailed summary of primary studies with information (i.e., study features) extracted and analyzed in this meta-analysis. The overall effects of CT on both mathematics achievement and attitudes toward mathematics were first calculated. The moderated effects of CT by study features were then computed for mathematics achievement only due to limited data available for attitude toward mathematics.

Calculating effect size

For studies that reported means and standard deviations (SD), effect size was calculated using the mean difference (between experimental and control groups) as the numerator and the pooled standard deviation (PSD) as the denominator (see Rosenthal 1991). Effect size for studies with data in the form of t value, F value, p level, frequency, and proportion was calculated using formulas provided by Lipsey and Wilson (2001). Since all studies included in this review employed either an experimental design (i.e., randomly assigned participants) or a quasi-experimental design (i.e., participants not randomly assigned), each effect size was calculated based on a comparison of an experimental group using CT to support mathematics instruction with a control group not using CT.

Some studies provided data collected at different time periods (i.e., longitudinal data). In line with the study of Lou et al. (2001), when pre-scores (collected at the beginning of an intervention) and post-scores (collected at the end of an intervention) were available, we used post-scores only. That is, we did not perform statistical adjustment for pretest variance or differences. The mean difference in post-scores was the numerator and the PSD in post-scores was the denominator.
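The effect size computation described above can be illustrated with a short sketch. It assumes the common degrees-of-freedom-weighted pooled standard deviation; the exact pooling formula in Rosenthal (1991) may differ, and the function names and the example numbers are hypothetical.

```python
import math


def pooled_sd(n_exp: int, sd_exp: float, n_ctrl: int, sd_ctrl: float) -> float:
    """Pooled SD of two groups, weighting each variance by its degrees of
    freedom (an assumed, commonly used form of the pooled SD)."""
    pooled_variance = (
        (n_exp - 1) * sd_exp ** 2 + (n_ctrl - 1) * sd_ctrl ** 2
    ) / (n_exp + n_ctrl - 2)
    return math.sqrt(pooled_variance)


def effect_size(mean_exp: float, sd_exp: float, n_exp: int,
                mean_ctrl: float, sd_ctrl: float, n_ctrl: int) -> float:
    """Post-score mean difference (CT group minus control) over the pooled SD."""
    return (mean_exp - mean_ctrl) / pooled_sd(n_exp, sd_exp, n_ctrl, sd_ctrl)


# Hypothetical post-test scores: CT group 75 (SD 10, n 30), control 70 (SD 10, n 30).
es = effect_size(75.0, 10.0, 30, 70.0, 10.0, 30)
print(round(es, 2))  # prints 0.5
```

A positive value indicates that the group using CT scored higher on the post-test than the control group.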

Some studies reported multiple findings per study. Regarding the number of findings to be extracted from each study, two approaches have often been used in meta-analysis (a single finding per study or multiple findings per study). According to Lou et al. (2001), the advantage of extracting one finding per study is a guarantee of the independence of each effect size. The disadvantage is that the differences within a study between different sample groups (e.g., elementary versus secondary students) or between different treatments under investigation (e.g., groups using one kind of computer system versus another kind) are lost.

Extracting multiple effect sizes from a single study, on the other hand, might result in a violation of the independence assumption for effect sizes, which in turn might increase type I or type II errors (Glass et al. 1981). In this study, if a study had multiple effect sizes from the same subjects (students) for mathematics achievement, we chose the one most directly addressing the study purpose (i.e., the one providing valid data on as many study features as possible). In the case where all effect sizes shared a similar background of study features, we used the arithmetical average of the effect sizes for each affected study. This allowed us to eliminate the problem of dependence while ensuring that all levels of all study features are represented (Lou et al. 2001).

The strategies reported above also effectively addressed the concern of inflated effect sizes resulting from clustering of effect sizes within some primary studies. Clustering occurs when there is more than one effect size from a primary study. As discussed earlier, we took only one effect size from each primary study unless the study included independent samples (e.g., samples from different grade levels). The selected effect size was the one most representative of a primary study, with information on as many study features as possible. We aggregated effect sizes within a primary study only when all of them shared very similar study features (e.g., from the same sample, with similar treatment design, with similar implementation procedure). Although these strategies do not represent a perfect solution to the clustering effects in meta-analysis, they do reduce the chance of inflated effect sizes by a substantial degree.
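
The averaging rule for effect sizes that share the same sample and similar study features can be sketched as below. The record layout (keys `study`, `sample`, `d`) and the data are hypothetical, and the sketch covers only the averaging case; choosing the single most representative effect size requires coder judgment that is not modeled here.

```python
from statistics import mean

def one_effect_per_sample(effects):
    """Collapse dependent effect sizes to one value per independent sample.

    effects: list of dicts with hypothetical keys 'study', 'sample', 'd'.
    Effect sizes from the same study and the same sample are averaged;
    independent samples within a study (e.g., different grade levels)
    are kept as separate entries.
    """
    groups = {}
    for e in effects:
        groups.setdefault((e["study"], e["sample"]), []).append(e["d"])
    return {key: mean(ds) for key, ds in groups.items()}

# Made-up data: study A reports two outcomes on the same grade 4 sample
# plus one outcome on an independent grade 9 sample.
raw = [
    {"study": "A", "sample": "grade 4", "d": 0.20},
    {"study": "A", "sample": "grade 4", "d": 0.40},
    {"study": "A", "sample": "grade 9", "d": 0.10},
    {"study": "B", "sample": "all", "d": 0.55},
]
independent = one_effect_per_sample(raw)
print(len(independent))  # 3
```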

Data analyses

Data screening was first performed using SPSS for Windows (SPSS, 2005) with frequency and descriptive procedures. If a study feature (e.g., assignment level) had almost no variability or had over 90% missing data, the study feature was eliminated from further analysis. The rationale for this elimination was twofold. First, such a small number of studies might not have sufficient statistical power for meaningful analysis. Second, these studies might actually bring in systematic sampling errors.
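
The screening rule can be illustrated with a short sketch. It is not the SPSS procedure itself: the function name, the record layout, and the data are hypothetical, and "almost no variability" is approximated here, as an assumption, by a feature taking a single distinct value among the coded studies.

```python
def screen_features(records, max_missing=0.90):
    """Return the study features that survive screening.

    records: list of dicts mapping feature name -> coded value (None = missing).
    A feature is dropped when more than `max_missing` of its values are
    missing, or when its observed values show no variability (assumed
    here to mean at most one distinct value).
    """
    features = sorted({name for r in records for name in r})
    kept = []
    for name in features:
        values = [r.get(name) for r in records]
        present = [v for v in values if v is not None]
        missing_share = 1 - len(present) / len(values)
        if missing_share > max_missing or len(set(present)) <= 1:
            continue
        kept.append(name)
    return kept

# Made-up coding sheet for 20 effect sizes
records = [
    {
        "level": "elementary" if i % 2 == 0 else "secondary",
        "assignment": "random" if i == 0 else None,  # 95% missing
        "subject": "mathematics",                    # no variability
    }
    for i in range(20)
]
print(screen_features(records))  # ['level']
```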

Analytically, the overall effects of CT on mathematics achievement were computed by averaging effect sizes. Although there are different ways to estimate means, we used weighted mean effect sizes, with sample sizes as weights, to correct for bias (Lou et al. 2001). That is, findings based on larger sample sizes were given more weight. We chose this approach because it is simple to administer and easy to interpret. A 95% confidence interval was then used to determine the statistical significance of these means.
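
A minimal sketch of the sample-size-weighted mean follows. The function name and the three effect sizes are made up for illustration; the confidence interval is left out because the formula used for it is not specified in the text.

```python
def weighted_mean_effect(effects, sample_sizes):
    """Mean effect size with each finding weighted by its sample size."""
    total_n = sum(sample_sizes)
    return sum(d * n for d, n in zip(effects, sample_sizes)) / total_n

# Made-up findings: the largest study reports the smallest effect
ds = [0.10, 0.50, 0.90]
ns = [400, 100, 50]

unweighted = sum(ds) / len(ds)
weighted = weighted_mean_effect(ds, ns)
print(round(unweighted, 2), round(weighted, 2))  # 0.5 0.25
```

In this made-up case the weighted mean is pulled toward the large study, which shows how weighted and unweighted averages can differ noticeably when small studies report large effects.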

Homogeneity analysis

Next we tested the homogeneity of all effect sizes extracted from studies (Hedges and Olkin 1985). The weighted individual effect sizes were then aggregated to form an overall weighted mean estimate of the CT effects (d+). Homogeneity statistics (QT) were used for the set of effect sizes to determine whether the set of effect sizes varied statistically significantly, that is, whether the findings shared a common effect size in the population. If QT was not statistically significant, a fixed-effects model would be adopted for data analysis. If QT was statistically significant, a random-effects model would be adopted for data analysis.
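
The homogeneity statistic can be sketched as follows. This sketch assumes the standard inverse-variance form of Q associated with Hedges and Olkin (1985), together with the usual large-sample variance of d; whether this study used exactly these weights is an assumption, and all names and numbers are illustrative. Q is referred to a chi-square distribution with k − 1 degrees of freedom, where k is the number of effect sizes (the p-value lookup is omitted here).

```python
def d_variance(d, n_e, n_c):
    """Large-sample variance of a standardized mean difference."""
    return (n_e + n_c) / (n_e * n_c) + d ** 2 / (2 * (n_e + n_c))

def homogeneity_q(effects):
    """Return (d_plus, Q, df) for a list of (d, n_e, n_c) tuples.

    Assumption: each effect size is weighted by the inverse of its variance.
    """
    weights = [1.0 / d_variance(d, n_e, n_c) for d, n_e, n_c in effects]
    d_plus = (sum(w * e[0] for w, e in zip(weights, effects))
              / sum(weights))
    q = sum(w * (e[0] - d_plus) ** 2 for w, e in zip(weights, effects))
    return d_plus, q, len(effects) - 1

# Made-up effect sizes with experimental and control group sizes
example = [(0.10, 200, 200), (0.50, 50, 50), (0.90, 25, 25)]
d_plus, q, df = homogeneity_q(example)
print(round(d_plus, 2), round(q, 2), df)
```

A Q value near zero means the effect sizes agree closely; identical effect sizes give Q = 0.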

Subgroup (Moderator) analysis

With the adoption of a model, subgroup (moderator) analysis followed to examine differences in the average effects among groups as the moderated effects of CT by study features. To control for confounding variables and to avoid the “fishing-trip” type of data analysis, we used ANOVA within the multiple regression framework as our primary statistical tool. Specifically, based on the ANOVA analogues for categorical data, a weighted least-squares (WLS) multiple regression analysis was performed on effect sizes. Sample sizes were used as the weighting variable. The first analysis aimed to identify study features that accounted for significant unique variance in the findings. This was done individually, feature by feature. For example, all gender-related variables formed a block. The block of gender composition (of the studies) was examined individually for absolute gender effects. As a result, we determined absolute effects of study features.
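
For a single dummy-coded study feature, the WLS analysis reduces to a closed form that can be sketched without a statistics package. The sketch below is illustrative only: the function name, the dummy coding (1 = special education sample, 0 = general education sample), and the data are hypothetical, and significance testing is omitted.

```python
def wls_simple(x, y, w):
    """Weighted least-squares fit of y = intercept + slope * x.

    Returns (intercept, slope, r_square), where r_square is the
    weighted proportion of variance in y explained by x.
    """
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ss_res = sum(wi * (yi - intercept - slope * xi) ** 2
                 for wi, xi, yi in zip(w, x, y))
    ss_tot = sum(wi * (yi - y_bar) ** 2 for wi, yi in zip(w, y))
    return intercept, slope, 1 - ss_res / ss_tot

# Made-up data: effect sizes (y), dummy-coded feature (x), sample sizes (w)
x = [1, 1, 0, 0]
y = [1.2, 1.4, 0.2, 0.4]
w = [20, 20, 100, 300]

intercept, slope, r_square = wls_simple(x, y, w)
print(round(intercept, 2), round(slope, 2))  # 0.35 0.95
```

With a dummy-coded feature, the intercept is the weighted mean effect size of the reference group and the slope is the difference between the two weighted group means, which is why an unstandardized coefficient can be read directly in SD units.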

The second analysis aimed to examine relative effects of study features. In this step, all statistically significant predictors identified from the above ANOVA analogues (see the first analysis) were entered together in a simple WLS regression model (Lipsey and Wilson 2001; Lou et al. 2001). Again, sample sizes were used as weights. Features that were not statistically significant were removed one by one from the model until all remaining features were statistically significant in the model. These features are deemed as having statistically significant relative effects which would represent the most important predictors of the effects of CT on mathematics achievement. In both first and second analyses, proportion of variance explained (by a WLS regression model), R-square, was obtained as a statistic to measure the adequacy of the model.

Results

In total, 85 independent effect sizes were extracted from 46 primary studies involving a total of 36,793 learners to examine the effects of CT on mathematics achievement. About half of the mathematics achievement outcomes were measured by locally developed or teacher-made instruments and the other half by standardized tests. Almost all studies were well controlled, employing either random assignment of students to experimental/control conditions or using statistical control for quasi-experimental designs. Nearly two thirds of the studies were published journal articles, and the rest were doctoral dissertations or unpublished reports.

Overall effects of technology

A total of 85 effect sizes were extracted from retrieved empirical research studies that investigated whether the use of technology has statistically significant effects on mathematics achievement. The average weighted Cohen’s d was 0.28 SD with a 95% confidence interval from 0.13 to 0.43. With the confidence interval entirely above zero, this average was statistically significant at the alpha level of 0.05, indicating statistically significant (positive) effects of technology on mathematics achievement. Rosenthal and Rosnow (1984) classified effect sizes of more than 0.50 SD as large, between 0.30 and 0.50 SD as moderate, and less than 0.30 SD as small. The weighted average therefore indicated small positive effects of technology on mathematics achievement after sample size was controlled.
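
The two decision rules in this paragraph can be written out as a short sketch. The function names are hypothetical, and two details are assumptions: a value of exactly 0.30 or 0.50 SD is treated as moderate, and magnitudes are classified by absolute value.

```python
def classify_magnitude(d):
    """Label an effect size using the Rosenthal and Rosnow (1984) cutoffs.

    Assumptions: boundary values (0.30, 0.50) count as moderate, and the
    sign of d is ignored when judging magnitude.
    """
    size = abs(d)
    if size > 0.50:
        return "large"
    if size >= 0.30:
        return "moderate"
    return "small"

def interval_excludes_zero(ci_low, ci_high):
    """True when a confidence interval lies entirely on one side of zero."""
    return ci_low > 0 or ci_high < 0

# Values reported above: weighted mean d = 0.28, 95% CI from 0.13 to 0.43
print(classify_magnitude(0.28))            # small
print(interval_excludes_zero(0.13, 0.43))  # True
```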

The 85 effect sizes retrieved from empirical research studies had a minimum of −0.66 SD and a maximum of 3.76 SD, with seven out of 85 effect sizes being negative (indicating negative impacts of technology on mathematics achievement). The 75th percentile was 0.87, the median was 0.43, and the 25th percentile was 0.13. The distribution of effect sizes showed mild measures of skewness (1.61) and kurtosis (2.50). The test of homogeneity of effect sizes was not statistically significant (Q = 0.14, df = 84). We therefore adopted a fixed-effects model assuming that effect sizes differ by sampling error only.

Study features moderate the effects of CT on mathematics achievement

This section addresses the issue of what study features moderate the effects of CT on mathematics achievement. A total of 13 study features were analyzed to identify factors that significantly moderated the effects of CT. Table 1 breaks down effect sizes by these study features to provide descriptive statistics. Table 2 presents inferential results identifying features that statistically significantly influenced the effects of CT. We emphasize that effect sizes were weighted by sample sizes through the use of WLS regression. We note that the unit of the unstandardized coefficient is SD because effect size was Cohen’s d in our meta-analysis. Therefore, each unstandardized coefficient could directly be compared with some established criteria for an evaluation of its magnitude or strength. In this meta-analysis, we adopted Rosenthal and Rosnow (1984) and already applied their criterion earlier.

Table 1 Descriptive Statistics on Effect Size, by Study Characteristics (N = 85)
Table 2 Statistical Results on Study Characteristics Individually Explaining Variance in Effect Size (N = 85)

Effects of sample-related characteristics

We paid particular attention to the sample characteristics of the studies, coding (a) gender composition, (b) racial composition, (c) socioeconomic composition, (d) special education status, (e) type (or level) of education (elementary or secondary), and (f) class as the unit of analysis. We investigated whether these sample characteristics would explain the variance in effect size measures by examining in a separate manner whether each block of variables was related to the effects of technology on mathematics achievement.

The block of gender composition showed no statistically significant effects of CT on mathematics achievement. This fact could be appreciated not only from the small effects of the three gender composition variables but also from the trivial R-square that indicated the unique variance for which this block accounted. Therefore, gender groups did not differ in terms of the effects of CT on mathematics achievement. In other words, all gender groups benefited equally from technology. Similarly, neither racial groups nor socioeconomic groups differed in terms of the effects of CT on mathematics achievement.

When we compared the effects of technology on mathematics achievement between special need students and general education students, we found statistically significant effects among students with special need (1.31 SD). According to Rosenthal and Rosnow (1984), such a magnitude represents large effects. Technology was much more effective in promoting mathematics achievement when used to help special need students than when used to help general education students. In fact, special need students represented the largest effects among all sample-related characteristics of the studies.

The block concerning the type (or level) of education also showed statistically significant effects of technology on mathematics achievement. The effects were negative, indicating that elementary school students demonstrated larger effects than secondary school students (0.22 SD). This magnitude is small, indicating that elementary school students had a small advantage in using technology to promote mathematics achievement over secondary school students.

Finally, we examined whether the unit of analysis mattered to the effects of CT on mathematics achievement. Most primary studies used either class or grade level as the unit of analysis, and a few used school as the unit of analysis. We found that primary studies that did not use class as the unit of analysis reported larger effects than primary studies that used class as the unit of analysis. The magnitude of the effects was moderate (0.45 SD). The unit of analysis did play a moderate role in determining the effects of CT on mathematics achievement.

Effects of implementation-related characteristics

We also paid particular attention to the implementation characteristics of the studies, coding (a) duration of the intervention (or treatment), (b) method of teaching, (c) type of testing instrument, and (d) type of country (where studies were conducted). Similar to the case of sample characteristics, we investigated whether these implementation characteristics would explain the variance in effect size measures by examining in a separate manner whether each block of variables was related to the effects of technology on mathematics achievement. Table 2 shows that larger effects were related to a shorter duration. Specifically, technology interventions that lasted one term showed larger effects than technology interventions that lasted longer than one term (but shorter than one year) (0.35 SD). Meanwhile, technology interventions that lasted longer than 1 year showed no statistically significant advantage over technology interventions that lasted one term. These findings indicated that shorter technology interventions were much more effective in promoting mathematics achievement than longer technology interventions.

We found that method of teaching showed large effects on the relationship between technology and mathematics achievement of students (1.00 SD). When used in settings where teachers practiced a constructivist approach to teaching, technology had much stronger effects on mathematics achievement than in settings where teachers practiced a traditional approach to teaching. In addition, we found that type of testing instrument showed small effects (0.27 SD). Studies that measured mathematics achievement with non-standardized tests showed larger effects of technology than studies that used standardized tests. Finally, we found that where studies were conducted also accounted for variance in effect size measures. Studies conducted in developing countries reported larger effects of technology on mathematics achievement of students than studies conducted in developed countries (0.31 SD). This magnitude indicated a moderate difference.

Effects of publication-related characteristics

We considered some basic publication characteristics of our retrieved studies: (a) year of publication and (b) type of publication. We grouped studies into those published before the turn of the century (i.e., before 1999) and those published after the turn of the century (i.e., after 1999). We found that publications before the turn of the century reported larger effects of CT on mathematics achievement than publications after the turn of the century (see Table 2). The magnitude of 0.38 SD indicated a moderate difference. Meanwhile, we found that unpublished documents reported larger effects of technology on mathematics achievement than theses and dissertations. The magnitude of 0.86 SD indicated a large difference.

Effects of technology-, design-, and quality-related characteristics

We distinguished four types of technology as they applied to educational (school) settings: (a) tutorial, (b) communication media, (c) exploratory environment, and (d) tools. When this block was examined, we found that all types of technology demonstrated the same effects on mathematics achievement of students. We also considered the type of experiment that was used to assess the effects of technology on mathematics achievement. Specifically, we compared studies employing true experimental design with studies employing quasi-experimental design. We found that both types of research design revealed the same effects of technology on mathematics achievement.

Finally, we examined the impact of study quality on effect size measures, grouping effect sizes into three categories. We found that both effect sizes estimated more reliably and effect sizes estimated less reliably actually produced the same effects of technology on mathematics achievement as effect sizes whose estimation reliability could not be reasonably determined with available information.

Relative effects of technology on mathematics achievement

In the previous sections, we reported the effects on mathematics achievement of individual blocks of characteristics of the studies (i.e., sample-related characteristics, implementation-related characteristics, publication-related characteristics, technology-related characteristics, design-related characteristics, and quality-related characteristics). Although separate analyses of these individual blocks were informative, a collective analysis of all characteristics of the studies was necessary to discern salient characteristics that demonstrated the most important effects of technology on mathematics achievement of students. For this purpose, all characteristics of the studies that were statistically significant in Table 2 were tested together for relative importance. Table 3 shows the results of salient characteristics that demonstrated the most important effects.

Table 3 Statistical Results on Study Characteristics Collectively Explaining Variance in Effect Size (N = 85)

Four characteristics of the studies remained statistically significant collectively. Two of them indicated large effects. With other statistically significant variables controlled, special education status showed a magnitude of 1.02 SD in favor of applying technology to special need students over general education students, and method of teaching showed a magnitude of 0.79 SD in favor of using technology in school settings where teachers practiced constructivist approach to teaching over school settings where teachers practiced traditional approach to teaching.

Meanwhile, two characteristics indicated moderate and small effects of technology on mathematics achievement. Year of publication showed a moderate magnitude of 0.32 SD in favor of publications before the turn of the century (before 1999) over publications after the turn of the century (after 1999), with other statistically significant variables controlled. Type (or level) of education showed a small magnitude of 0.22 SD in favor of using technology at the elementary school level over the secondary school level, with other statistically significant variables controlled.

We noted that this combined regression model was quite adequate in accounting for variance in effect size measures (Lou et al. 2001). The R-square indicated that this combined regression model explained more than a third (34%) of the variance in effect size measures. In comparison, the R-square values in Table 2, which measured the adequacy of each individual block model with statistically significant predictors, were much smaller. Nevertheless, even those models indicated satisfactory performance in light of Gaur and Gaur (2006), who claimed that “while in natural science research it is not uncommon to get R square values as high as 0.99, a much lower value (0.10–0.20) of R square is acceptable in social science research” (p. 109).

Therefore, study characteristics presented in Table 3 were strong predictors of the effects of technology on mathematics achievement. In fact, when all positive conditions were present (effects associated with year of publication were not counted), students learning mathematics with CT could achieve 2.03 SD higher than those learning mathematics without CT, representing an exceptional gain in mathematics achievement through the application of CT in mathematics classrooms. The positive conditions in this case meant that CT was applied to special need students in elementary school setting where teachers practiced constructivist approach to teaching. Because each coefficient represented the unique contribution of the corresponding variable (with statistical control of other variables in the model), 2.03 SD was the result of direct addition of coefficients, a common practice of meta-analysis (Lou et al. 2001), associated with special need students, elementary students, and constructivist approach to teaching.
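
The addition behind the 2.03 SD figure can be checked directly from the coefficients reported for Table 3. The dictionary below is only a convenience for the arithmetic; the coefficient for year of publication (0.32 SD) is left out, as stated above.

```python
# Coefficients (in SD units) reported for the combined model (Table 3)
coefficients = {
    "special education status": 1.02,
    "constructivist approach to teaching": 0.79,
    "elementary school level": 0.22,
}

# Direct addition of the three coefficients
combined = round(sum(coefficients.values()), 2)
print(combined)  # 2.03
```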

Discussion

CT has been used in mathematics classrooms in the last several decades. As computers become ubiquitous tools for learning and instruction, an important question is to what extent CT impacts student mathematics learning.

Our meta-analysis extends the existing literature by focusing on the impact of CT on mathematics, a core school academic subject, and by examining primary studies conducted after 1990. This up-to-date investigation of the literature allows us to take into consideration the significant changes in technology and in the use of technology in schools over the last two decades.

For example, Lou et al. (2001) synthesized the empirical research on the effectiveness between small group and individual learning when using computer technology. Their major finding was that small group learning had more positive effects. Although such a synthesis is important, it dealt with only one aspect of classroom instruction involving computer technology. Our meta-analysis extends their effort in understanding the effects of social context (related to computer technology) by synthesizing the empirical research on the effectiveness between traditional and constructivist approach of classroom instruction. As a matter of fact, our meta-analysis goes beyond Lou et al. (2001) by including quite a few variables new to Lou et al. (2001) in an effort to broaden our understanding on the effects of social context when using computer technology.

In this meta-analysis, we have examined the impact of CT on mathematics learning. A broad goal of this work is to extract important factors that contribute to the effective use of technology in schools, and we have achieved this goal by identifying the critical “ingredients” of the effective use of technology in mathematics learning. This broad goal certainly limited our effort to the “whether” aspect rather than the “why” aspect of using technology to promote mathematics learning. The why issue requires an in-depth analytical approach with research (particularly experimental research) specifically designed to manipulate key elements in, say, using technology for mathematics learning of special education students. Few empirical studies we retrieved were specific enough in this regard. Nevertheless, our meta-analysis did imply for educational practice that the use of technology in general is an effective tool to promote mathematics learning of special education students. We do believe that our key ingredients would effectively lead future researchers to investigate the why aspect of effective technology use in mathematics learning.

Technology effects

The objective of this systematic review was to examine the impact of CT on mathematics achievement for students in grades K-12. Based on a total of 85 independent findings extracted from 46 primary studies involving 36,793 learners, we found overall positive effects of CT on mathematics achievement. On average, there was a small but statistically significant positive effect of CT on mathematics achievement (weighted mean ES = +0.28). This indicates that, in general, students learning mathematics with the use of CT had higher mathematics achievement than those learning without CT.

Such a result, of course, should not diminish the importance of good teaching. Kozma argues that any tool, technology or otherwise, simply cannot replace good teaching (Kozma 2001). Rather, our result indicates that in this information era where technology has become essential in teaching and learning (NCTM 2000), CT should be a necessary component of any good teaching.

Not all approaches of using CT, however, resulted in equally good performance and not all students learning with CT learned better than those learning without CT under all conditions. To achieve maximum benefit, the way to use CT matters. Both our meta-analysis and the previous one by Christmann et al. (1997) raise the concern that using CT purely as a way to deliver instruction would “not influence student achievement any more than the truck that delivers our groceries causes changes in our nutrition” (Clark 1983, p. 449). Yet, our nutrition can be damaged by a bad choice in delivery methods (e.g., food may get rotten if not delivered with refrigerated trucks). We therefore need to consider the optimal conditions for learning mathematics with the support of CT. The multiple regression analysis indicates that the significant variability in mathematics achievement could indeed be accounted for by a few salient design and learner characteristics.

Pedagogical and context factors that moderate the effect of CT

Study features that account for the most variability in mathematics achievement include type of students, level of education, method of teaching, and year of publication (see Table 3). The effects of CT on mathematics achievement were significantly enhanced when CT was used (a) on special need students, (b) in elementary mathematics classrooms, and (c) where a constructivist approach to teaching was practiced. When all these positive conditions are present, large positive effects of CT (mean ES = 2.03) may be expected. The differential effects of CT for students of different ability levels are consistent with those found in Niemiec and Walberg (1985). Their review of 48 primary studies of the math learning of elementary students (grade 8 and under) showed that lower achievers and at-risk learners (e.g., special needs) achieved more than other students when CAI was used. This result also parallels the series of efficacy studies conducted by Fuchs and colleagues in the late 1990s and early 2000s (e.g., Fuchs and Fuchs 1998; Fuchs et al. 2002).

We went back to reexamine the primary studies and realized that special needs students had often been engaged in math learning in the following ways: (1) involving both collaborative small group learning and individual practice, and/or (2) using technology in ways that help students build appropriate mental models of mathematics (Moreno and Mayer 2000). In addition, the types of CT used often involved multimedia (whether commercially built or researcher-built math software) for the presentation of information in symbolic, visual, and verbal forms.

Even when CT is applied in educational settings lacking any of the above conditions, CT may still be an effective strategy to promote the learning of mathematics, according to Table 2, where individual study features are examined. Stated differently, in the absence of the optimal conditions discussed above, the impact of CT on mathematics achievement can still be enhanced if CT is used (a) on students grouped by grade level (or by school in a few cases) rather than by class, (b) with non-standardized tests (instead of standardized tests), and (c) in developing countries (vs. developed countries).

Further, we found that shorter technology interventions (6 months or less) were much more effective in promoting mathematics achievement than longer technology interventions (between 6 and 12 months). It is possible that such gain in mathematics achievement is a result of the novelty effects of technology, as suggested in some previous research (Kulik et al. 1982; Kozma 2001). As students get familiar with the technology, such novelty effects tend to decrease, which often results in diminished motivation to use CT in a serious way for learning.

Some findings were unexpected. The first noteworthy one is that different types of CT did not differ significantly in their effects on mathematics achievement. Whether CT was used as communication media, tutorial device, or exploratory environment, students showed similar results in their math achievement. In a similar vein, differences in sample characteristics such as gender composition, racial composition, or SES composition were associated with similar effect sizes in math achievement.

Difference in research methodology across primary studies appeared to produce little difference in effect size measures. Studies with more scientifically rigorous methods (e.g., random assignment) did not produce different outcomes from studies with less scientifically rigorous methods (e.g., naturalistic approach).

Policy implications

Our policy implications center on five salient characteristics of primary studies that were statistically significant. One characteristic, which showed small effects (0.22 SD), speaks to the advantage of technology in promoting mathematics achievement of elementary school students over secondary school students. This result is not a total surprise to us in that we expect technology, with all of its visual and hands-on approaches and attractions, to be better suited to the learning styles of elementary school students than to those of secondary school students. It demonstrates that the current way that technology is applied to elementary school students seems to work well, and further development and advancement along this pathway may impact mathematics achievement of elementary school students even more dramatically.

Another characteristic of the studies, which showed large effects, speaks to the fact that unpublished documents tend to report higher levels of impact of technology on mathematics achievement than theses and dissertations. We believe that theses and dissertations are likely to have stronger research methodology and therefore make more reliable knowledge claims than unpublished documents. In this sense, unpublished documents tend to overestimate the impact of technology on the learning of mathematics. Therefore, educational decision makers who rely mainly on a digest of unpublished reports and documents may need to be more cautious when considering the advancement of technology for educational purposes.

The remaining three characteristics concern special education status, method of teaching, and type of testing instrument. It is encouraging to report that technology showed larger effects on mathematics achievement of special need students than general education students. Technology has indeed been a very valuable tool to help students with learning difficulties, as witnessed by the many technologies developed particularly for special need students. This belief that technology can help special need students learn mathematics is supported by our meta-analysis. The magnitude is large (1.31 SD individually and 1.02 SD with other statistically significant variables controlled), which indicates the potential of technology for the learning of mathematics of special need students. We expect further technological advancement will produce larger effects on the learning of mathematics among students with special need.

It is also encouraging to report that using technology in school settings where teachers practiced a constructivist approach to teaching showed larger effects on mathematics achievement than using technology in school settings where teachers practiced a traditional approach to teaching. This implies that technology may work better in a certain type of learning environment. This is exciting news to us in that technology does require a context in which to influence the learning of mathematics. With available data, we can only compare the constructivist approach and the traditional approach to teaching. The result indicates that a constructivist approach helps technology to impact the learning of mathematics. This should, of course, not nullify the usefulness of directed instruction. Rather, it highlights the importance of the contextual consideration of technology in promoting the learning of mathematics, just as in the case of special need students.

Finally, studies that used non-standardized tests as instruments to measure mathematics achievement reported larger effects of technology on mathematics achievement than studies that used standardized tests. This finding raises a methodological concern. Often, standardized tests have good psychometric properties. On the other hand, non-standardized tests often need to deal with the issues of reliability and validity. From this perspective, the use of non-standardized tests could artificially inflate (and thus distort) the effects of technology on the learning of mathematics. Another possibility is that teachers/researchers who build their own measures are also those who are heavily invested in implementing the interventions. The implementation fidelity of intervention programs therefore may be a factor contributing to such a difference.

Of course, the notion that standardized tests tend to measure low-level thinking abilities has been around. There will be attempts to bring up alternative assessments on the learning of mathematics. Therefore, there is a need for further research studies to distinguish between traditional assessment instruments and alternative assessment instruments. The contribution of this finding is to show that what instrument to use could indeed be a deciding factor in the examination of technology as it impacts the learning of mathematics.

The above implications come from a digest of Table 3, which examined the collective effects of the characteristics of primary studies. We believe that some individual effects of the characteristics of primary studies, as reported in Table 2, could also be informative to educational research and practice. Since intervention programs lasting one term or shorter had more impact on mathematics achievement than longer programs, we suggest that technology interventions may need to be short term, or to change constantly in type or format if a long-term goal is set. The novelty effects established in the literature (Kulik et al. 1982; Kozma 2001) may need to be maintained for the effects of CT to be seen. Technology interventions with interval breaks and with regular changes in the type, equipment, and manner of application of CT are basic strategies for maintaining novelty effects.

Different units may be used to examine the effects of CT, including class, grade level, and school. Results of our meta-analysis showed that the effects of CT were enhanced when it was applied to units of students other than the class. To a large extent, this finding suggests that the scale of CT intervention does matter. Larger scales of CT application produced larger effects on mathematics achievement. Although the reason for these effects is not clear at this time, we speculate that a larger scale of CT intervention may convey to students, in a more serious way, the importance of using CT. We suggest that CT should be applied to at least a whole grade level to maximize its effects on mathematics achievement.

Limitations and future research

This meta-analysis, like all others, has limitations. First, because a meta-analysis cannot be conducted in an experimental fashion, we are unable to control sample sizes and missing data. Small samples decrease the sensitivity of data analysis. Weighting effect sizes by sample sizes restores sensitivity only to a certain extent. Second, it is impossible for any meta-analysis to evaluate (and code) the design quality of the programs (CT interventions in our case) used in primary studies. In other words, we cannot control intervention integrity and implementation fidelity. The results of a meta-analysis evaluating the effectiveness of a given intervention are therefore most likely produced without such knowledge. Third, even though the types of CT used in this study are adapted from previous review studies (Means 1994; Lou et al. 2001), we acknowledge that the categories are broad and that particular uses within a category can be instructionally useful or damaging for given goals. Future research should consider the nature of the use of technology. Fourth, the moderate effects of CT on special needs students identified in this study warrant further attention. Yet the scope of this study prevents us from conducting a more in-depth investigation focusing on special needs students. Comprehensive analyses of the use of CT with students with special needs (e.g., disabilities) are recommended for future research, considering the multiple ways in which CT is used (e.g., drill and practice, remediation, instructional enhancement, and differentiated instruction).
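To illustrate the point about weighting, the following is a minimal sketch of a sample-size-weighted mean effect size, assuming the simplest scheme in which each study's weight equals its sample size. The function name and the numbers are hypothetical and are not taken from the primary studies; the exact weighting procedure used in a given meta-analysis may differ.

```python
def weighted_mean_effect(effect_sizes, sample_sizes):
    """Return the mean effect size, weighting each study by its sample size.

    Assumption: weight = sample size. Other schemes (for example,
    inverse-variance weights) are also common in meta-analysis.
    """
    if len(effect_sizes) != len(sample_sizes):
        raise ValueError("effect_sizes and sample_sizes must have equal length")
    total_n = sum(sample_sizes)
    if total_n <= 0:
        raise ValueError("total sample size must be positive")
    weighted_sum = sum(d * n for d, n in zip(effect_sizes, sample_sizes))
    return weighted_sum / total_n


# Hypothetical data: one small study with a large effect, two larger studies.
effects = [0.80, 0.20, 0.50]
sizes = [10, 100, 40]

unweighted = sum(effects) / len(effects)
weighted = weighted_mean_effect(effects, sizes)
print(f"unweighted mean: {unweighted:.2f}")
print(f"weighted mean:   {weighted:.2f}")
```

In this made-up example the small study pulls the unweighted mean upward, while the weighted mean gives it less influence. Weighting reduces the influence of small samples but does not make their estimates more precise, which is consistent with the limitation noted above.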

Although we initially planned to examine the same issues for attitude toward mathematics, we encountered very limited data and therefore had to drop this outcome and focus our attention on mathematics achievement. As a result, we have learned very little about the effects of CT on affective outcomes such as attitude toward mathematics. Overall, the impact of CT on the affective domain in mathematics is an uncharted field. We encourage more researchers to engage in empirical studies on the effects of CT on affective outcomes, such as attitude toward mathematics and mathematics anxiety, in order to accumulate primary studies for a future meta-analysis. Our meta-analysis has to some extent exposed the narrow focus of existing primary studies on the effects of CT on mathematics education. For a fuller understanding of the impact of CT, further research would need to broaden outcome measures in both the cognitive domain (e.g., specific mathematical areas such as algebra and geometry) and the affective domain (e.g., attitude, anxiety, self-concept) as a way to provide comprehensive information on how CT may affect mathematics learning.

In addition, we do not have data to examine whether changes in technology have led to changes in the uses of CT in classrooms. This issue is worth exploring in future research. We also recognize that our definition of mathematics achievement does not distinguish among different types of potentially assessable mathematical knowledge. Future studies are recommended to consider measures of mathematics achievement that reflect the subtleties of the developing theories of mathematical knowledge produced in various research traditions. In our study, if a primary study had multiple effect sizes from the same subjects, we chose the one most directly addressing the study purpose. We understand that such an approach loses data or information that could be potentially useful, and we suggest that future studies use a multi-level modeling approach.
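The information loss from keeping one effect size per primary study can be sketched as follows. This is only an illustration under stated assumptions: the `relevance` score is a hypothetical stand-in for the coders' judgment of which effect size most directly addresses the study purpose, and the records are invented.

```python
from collections import defaultdict


def select_one_per_study(records):
    """Keep the single most relevant effect size for each study.

    records: iterable of (study_id, effect_size, relevance) tuples, where
    relevance is a hypothetical coder-assigned score (higher = more
    directly tied to the study purpose).
    Returns (selected, discarded_count).
    """
    by_study = defaultdict(list)
    for study_id, effect_size, relevance in records:
        by_study[study_id].append((relevance, effect_size))

    selected = {}
    for study_id, candidates in by_study.items():
        # Ties in relevance are resolved in favor of the first record seen.
        best = max(candidates, key=lambda c: c[0])
        selected[study_id] = best[1]

    discarded_count = sum(len(c) for c in by_study.values()) - len(selected)
    return selected, discarded_count


# Hypothetical records: study "A" reports three effect sizes, "B" reports one.
records = [
    ("A", 0.30, 2),
    ("A", 0.60, 5),
    ("A", 0.10, 1),
    ("B", 0.25, 4),
]
selected, discarded = select_one_per_study(records)
print(selected)
print(f"effect sizes discarded: {discarded}")
```

A multi-level model would instead retain all effect sizes and treat them as nested within studies, so that the within-study dependence is modeled rather than removed by selection.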