1 Background

There exist several software aids for speaking mathematical expressions in web pages and elsewhere (e.g., MathPlayer, JAWS, Safari + VoiceOver, ChromeVox). Spoken math is comprehensible only when the expression being spoken is short. For most people, working memory is limited to around 7 words [1], and it may be even smaller for mathematics because of the density of its notation. This makes larger expressions difficult to comprehend through speech alone. One obvious solution is to let users navigate expressions so they can rehear parts and better understand the structure of the expression.

Several systems have implemented some form of navigation, including the earliest systems for speaking math: Aster [2] and MathTalk [3]. Aster used a strict tree-based model of navigation. MathTalk and subsequent systems rejected that model as too complicated and used a tree only for two-dimensional notations such as fractions and roots. Later research efforts, including MathGenie [4] and AudioMath [5], also supported navigation. Currently available math-to-speech systems include MathPlayer [6], ChattyInfty [7], ChromeVox [8], Safari + VoiceOver, and JAWS; all of them support navigation. ChromeVox, Safari + VoiceOver, and JAWS navigate math similarly to MathPlayer’s simple mode (see below); ChattyInfty’s navigation is similar to MathPlayer’s character mode.

In collaboration with the Educational Testing Service (ETS) as part of an IES grant, Design Science added expression navigation to MathPlayer. Both NVDA and Window-Eyes use MathPlayer to generate math speech, and several other assistive technology companies are looking into using it. The MathPlayer navigation work includes many capabilities not found in prior work; these are discussed in the next section.

Only MathTalk and MathGenie have published user studies, and both studies were conducted with sighted users. The IES study is the first to compare the comprehension and usability of speech versus braille and large print for mathematical expressions among blind and low vision students. The findings are discussed in the remainder of this paper.

2 Implementation

Navigation was added to MathPlayer for a navigation study and then modified somewhat for the MathPlayer 4 release based on feedback from that study. Navigation in MathPlayer is performed via keyboard commands. Features include:

  • Moving/Zooming: This is the basic form of navigation. Three modes of moving around an expression are supported (see below). Arrow keys are used to move left/right and to zoom in/out of expressions.

  • Descriptions/Overviews: Users can choose between hearing the expression read to them or hearing a description (overview) of the expression (e.g., “fraction plus something plus 1”). Overviews can be set as a default when moving around or can be heard via key commands.

  • Place markers: Ten place markers are supported. At any point, users can set a place marker, move to it, or hear what is at it. This is particularly useful for cancelling common factors in fractions, marking coefficients in systems of equations, etc.

  • Where am I: recalls the context of the current location without moving (e.g., “x + 1 inside of the fraction with numerator x + 1 and denominator x squared minus one”). Users can request progressively more context, or the entire context at once.
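“Where am I” can be sketched as a function over the chain of structures that enclose the current item. This is a minimal illustration under stated assumptions, not MathPlayer’s implementation: the names `where_am_i` and `nesting_depth`, the `levels` parameter (for progressively more context), and the `outermost_first` option (the whole-context-first order that one student asked for in the navigation study) are all hypothetical.

```python
from typing import List, Optional


def where_am_i(path: List[str], levels: Optional[int] = None,
               outermost_first: bool = False) -> str:
    """Describe the current location without moving (hypothetical sketch).

    path[0] is the outermost enclosing structure and path[-1] is the current
    item. `levels` limits how many enclosing structures are mentioned
    (None means all of them), so repeated requests can pass 1, 2, 3, ...
    """
    current = path[-1]
    enclosing = path[-2::-1]            # innermost enclosing structure first
    if levels is not None:
        enclosing = enclosing[:levels]  # only the nearest `levels` structures
    if outermost_first:
        # Whole context first, then the current location.
        return " containing ".join(enclosing[::-1] + [current])
    return " inside of ".join([current] + enclosing)


def nesting_depth(path: List[str]) -> int:
    """Number of structures that enclose the current item."""
    return len(path) - 1


PATH = ["the fraction with numerator x + 1 and denominator x squared minus one",
        "x + 1"]
print(where_am_i(PATH))
print(nesting_depth(PATH))
```

Reading innermost-first matches the example above; flipping `outermost_first` gives the bigger-to-smaller order, and `nesting_depth` is one simple way to report how deeply nested the user is.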

A unique aspect of MathPlayer’s navigation is the ability to navigate in different modes: character, simple, and enhanced. To illustrate the differences, this sample expression is used:

$$ 2\sqrt{x^{2} - 4} + 3a\sqrt{x + 1} \tag{1} $$
  • Character/Word: navigates the leaves of the tree. For example, pressing the right arrow key repeatedly in expression (1) results in the user hearing “2”, “inside square root, in base, x”, “in exponent, 2”, “out of exponent, minus”, etc. Character and word modes differ only for multi-digit numbers such as 128 and multi-character identifiers or operators such as “sin”.

  • Simple: navigates by word except for 2D notations such as fractions and exponents, which are spoken in their entirety. Users zoom into and out of a notation to hear its parts. This is the common model implemented in many systems such as Safari, ChromeVox, and JAWS. Moving right through expression (1) in simple mode, the user hears “2”, “times the square root of x squared minus 4”, “plus”, “3”, “a”, “times the square root of x plus 1”.

  • Enhanced: infers the expression tree for the math, and moving left/right follows that structure. For example, in expression (1) one would hear “2 times the square root of x squared minus 4”, “plus”, “3 a times the square root of x plus 1”.
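The three modes differ mainly in which nodes of the expression tree count as stops for the arrow keys. The Python sketch below illustrates this for expression (1) using a hypothetical, much-simplified tree (`Leaf`, `Row`, `Sqrt`, `Power`) and a rough `speak` function that omits MathPlayer’s context words such as “times” and “in exponent”; it is not how MathPlayer represents or speaks math.

```python
from dataclasses import dataclass
from typing import List, Union


# Hypothetical, simplified expression tree (not MathPlayer's internal model).
@dataclass
class Leaf:
    text: str        # a number, identifier, or operator as written: "128", "sin", "+"


@dataclass
class Row:
    children: list   # a linear group; nested Rows encode the inferred structure


@dataclass
class Sqrt:
    body: "Node"     # a 2D notation


@dataclass
class Power:
    base: "Node"     # a 2D notation
    exp: "Node"


Node = Union[Leaf, Row, Sqrt, Power]
OPERATOR_WORDS = {"+": "plus", "-": "minus"}


def speak(n: Node) -> str:
    """Rough linear speech, without context words such as 'times'."""
    if isinstance(n, Leaf):
        return OPERATOR_WORDS.get(n.text, n.text)
    if isinstance(n, Row):
        return " ".join(speak(c) for c in n.children)
    if isinstance(n, Sqrt):
        return "square root of " + speak(n.body)
    if n.exp == Leaf("2"):  # remaining case: Power
        return speak(n.base) + " squared"
    return f"{speak(n.base)} to the {speak(n.exp)}"


def word_stops(n: Node) -> List[Leaf]:
    """Word mode: every leaf, in reading order."""
    if isinstance(n, Leaf):
        return [n]
    if isinstance(n, Row):
        return [s for c in n.children for s in word_stops(c)]
    if isinstance(n, Sqrt):
        return word_stops(n.body)
    return word_stops(n.base) + word_stops(n.exp)


def character_stops(n: Node) -> List[Leaf]:
    """Character mode: like word mode, but multi-character leaves are split."""
    return [Leaf(ch) for leaf in word_stops(n) for ch in leaf.text]


def simple_stops(n: Node) -> List[Node]:
    """Simple mode: flatten all grouping, but keep each 2D notation as one stop."""
    if isinstance(n, Row):
        return [s for c in n.children for s in simple_stops(c)]
    return [n]


def enhanced_stops(n: Node) -> List[Node]:
    """Enhanced mode: step through the children of the inferred tree's root."""
    return list(n.children) if isinstance(n, Row) else [n]


x, two, minus, plus = Leaf("x"), Leaf("2"), Leaf("-"), Leaf("+")
# Expression (1), with each product grouped as the inferred tree would group it.
EXPR1 = Row([
    Row([two, Sqrt(Row([Power(x, two), minus, Leaf("4")]))]),
    plus,
    Row([Leaf("3"), Leaf("a"), Sqrt(Row([x, plus, Leaf("1")]))]),
])

for mode in (character_stops, simple_stops, enhanced_stops):
    print(mode.__name__, [speak(s) for s in mode(EXPR1)])
```

Here `simple_stops` yields the six stops listed for simple mode and `enhanced_stops` the three listed for enhanced mode. Word and character modes coincide for expression (1) because every leaf is a single character; for a leaf such as `Leaf("128")`, `word_stops` gives one stop while `character_stops` gives three.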

Another unique aspect of MathPlayer’s navigation is “auto zoom in”/“auto zoom out”, which is described in [6]. Several power users (those who read at very high TTS speeds) requested that auto zoom out be turned off. These users said that they commonly “bang” on the arrow key multiple times and want the end of a structure to act as a wall that stops them. No student in the IES study requested this. The ability to turn off auto zoom out was added to the final release of MathPlayer. Even when it is turned off, “Shift arrow” will auto zoom out. This provides a way to avoid having to “back out” (zoom out) of a nested 2D notation.
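The effect of the auto zoom out setting on the right arrow key can be modelled with a stack of nested levels. In this hypothetical sketch (the `Navigator` class and its methods are illustrative names; the actual behavior is described in [6]), moving right past the end of a nested notation either zooms out and continues, or, when auto zoom out is off and Shift is not held, stops at the “wall” and leaves the position unchanged.

```python
from typing import List, Optional


class Navigator:
    """Hypothetical model of moving right through nested notations."""

    def __init__(self, auto_zoom_out: bool = True):
        self.auto_zoom_out = auto_zoom_out
        self.levels: List[list] = []  # stack of [items, index]; innermost level last

    def zoom_in(self, items: List[str]) -> str:
        """Enter a notation (or load the whole expression) at its first item."""
        self.levels.append([items, 0])
        return self.current()

    def current(self) -> str:
        items, index = self.levels[-1]
        return items[index]

    def move_right(self, shift: bool = False) -> Optional[str]:
        """Return the new item, or None if the move hits a wall."""
        may_zoom_out = self.auto_zoom_out or shift
        depth = len(self.levels) - 1
        # Find the innermost level that still has an item to the right.
        while True:
            items, index = self.levels[depth]
            if index + 1 < len(items):
                break
            if not may_zoom_out or depth == 0:
                return None            # wall: position is unchanged
            depth -= 1
        del self.levels[depth + 1:]    # zoom out of the finished notations
        self.levels[depth][1] += 1
        return self.current()


nav = Navigator(auto_zoom_out=False)
nav.zoom_in(["2", "square root of x squared minus 4", "plus", "3", "a",
             "square root of x plus 1"])
nav.move_right()                          # on the first square root
nav.zoom_in(["x squared", "minus", "4"])  # step inside it
nav.move_right()
nav.move_right()                          # now on "4", the last item inside
print(nav.move_right())                   # prints None: the end is a wall
print(nav.move_right(shift=True))         # prints plus: Shift zooms out anyway
```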

3 Study Results

The IES grant consisted of MathPlayer development, four feedback studies, and a final pilot study covering all aspects of the grant. The four feedback studies examined a new speech style (ClearSpeak [9]), various forms of prosody and lexical cues to resolve speech ambiguities, navigation, and document authoring (aimed at teachers). After changes were made based on these studies, the features were evaluated in a final pilot study [10]. This paper discusses the navigation study and the pilot study.

IRB approval of the studies was obtained, and all participants signed consent forms. As thanks for participating, the students received gift cards ranging from $25 to $125, depending on the length of the study.

3.1 Navigation Study

The initial navigation study involved 20 students with blindness or low vision in classes ranging from Algebra 1 to pre-calculus. Each participant worked through an interactive tutorial to learn and practice MathPlayer’s navigation features. Based on their experience with each navigation feature in the tutorial, the study asked “how easy/hard was it to learn…” and “how likely are you to use…”. The students found most features easy to learn: on a scale from 0 to 3, with 3 being “very easy”, the mean ratings ranged from 2.44 to 2.79. Three features were rated as less likely to be used (mean rating in parentheses):

  • Describe/Overview (1.78)

  • Place markers (2.21)

  • Where am I (2.28)

Describe/Overview was the least developed feature in MathPlayer, so it came as no surprise that it was the least liked. We were already aware of two problems with it:

  • More effort was needed to determine how much detail to provide. For example, the expression

$$ x^{2} + \frac{1}{x^{2} + 1} + 1 \tag{2} $$

is read as “something plus fraction plus 1”. It would probably be better to read it as “x squared plus one over something plus 1”, which is only slightly longer but provides much more detail.¹

  • We debated using words such as “term”, “factor”, and “exponent” instead of “something”. Ultimately, we used the generic word “something” because the semantics of the expression aren’t fully known, and we felt that using a wrong word might be misleading. When asked what they would like to see changed, one student suggested using “term”, etc.; most students had no suggestions for improvement.
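To make the trade-off concrete, the following sketch contrasts the current overview style with the more detailed reading suggested for expression (2). It assumes a hypothetical `detail` setting and a simplified tree (`Leaf`, `Row`, `Frac`, `Power`), none of which are MathPlayer’s own: at `detail=0` it reproduces “something plus fraction plus 1”, and at `detail=1` it speaks leaves and simple powers in full and abbreviates only the compound parts of a fraction.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical, simplified expression tree (not MathPlayer's internal model).
@dataclass
class Leaf:
    text: str


@dataclass
class Row:
    children: list


@dataclass
class Frac:
    num: "Node"
    den: "Node"


@dataclass
class Power:
    base: "Node"
    exp: "Node"


Node = Union[Leaf, Row, Frac, Power]


def is_small(n: Node) -> bool:
    """A leaf, or a power whose base and exponent are leaves (e.g. x squared)."""
    return isinstance(n, Leaf) or (
        isinstance(n, Power) and isinstance(n.base, Leaf) and isinstance(n.exp, Leaf))


def speak_small(n: Node) -> str:
    if isinstance(n, Leaf):
        return n.text
    if n.exp.text == "2":  # remaining case: a small Power
        return f"{n.base.text} squared"
    return f"{n.base.text} to the {n.exp.text}"


def describe(n: Node, detail: int) -> str:
    if isinstance(n, Leaf):
        return n.text
    if detail == 0:  # the overview style described in the text
        return "fraction" if isinstance(n, Frac) else "something"
    if is_small(n):
        return speak_small(n)
    if isinstance(n, Frac):
        num = speak_small(n.num) if is_small(n.num) else "something"
        den = speak_small(n.den) if is_small(n.den) else "something"
        return f"{num} over {den}"
    return "something"


def overview(expr: Row, detail: int = 0) -> str:
    return " ".join(describe(child, detail) for child in expr.children)


x2 = Power(Leaf("x"), Leaf("2"))
# Expression (2): x^2 + 1/(x^2 + 1) + 1
EXPR2 = Row([x2, Leaf("plus"), Frac(Leaf("1"), Row([x2, Leaf("plus"), Leaf("1")])),
             Leaf("plus"), Leaf("1")])
print(overview(EXPR2, detail=0))
print(overview(EXPR2, detail=1))
```

The generic “something” stays in both styles; choosing words like “term” instead would need the semantic information the text notes is not fully known.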

Students raised two issues with place markers. For simplicity of implementation, place markers are local to each expression: they can only reference the current expression and disappear when the user exits the expression being navigated. A couple of students didn’t seem to realize this and asked for a way to clear the place markers. One student asked for more than 10 place markers (for simplicity, place markers are currently bound to the keys 0–9).
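A per-expression place-marker store is simple to model: if the markers live in an object that exists only while one expression is being navigated, they vanish automatically on exit, which is the behavior some students found surprising. The class and method names below are hypothetical; only the ten keys 0–9 and the set/move-to/hear operations come from the description of the feature.

```python
from typing import Dict, List


class ExpressionSession:
    """Hypothetical per-expression navigation state. Discarding the session
    when the user leaves the expression discards its place markers too."""

    NUM_MARKERS = 10  # markers are bound to the keys 0-9

    def __init__(self, stops: List[str]):
        self.stops = stops
        self.cursor = 0
        self.markers: Dict[int, int] = {}  # key -> index into stops

    def move_right(self) -> str:
        self.cursor = min(self.cursor + 1, len(self.stops) - 1)
        return self.stops[self.cursor]

    def _check_key(self, key: int) -> None:
        if not 0 <= key < self.NUM_MARKERS:
            raise ValueError(f"place marker key must be 0-{self.NUM_MARKERS - 1}")

    def set_marker(self, key: int) -> None:
        self._check_key(key)
        self.markers[key] = self.cursor

    def hear_marker(self, key: int) -> str:
        """Speak what is at the marker without moving."""
        self._check_key(key)
        return self.stops[self.markers[key]]

    def go_to_marker(self, key: int) -> str:
        self._check_key(key)
        self.cursor = self.markers[key]
        return self.stops[self.cursor]


session = ExpressionSession(["2", "square root of x squared minus 4", "plus", "3"])
session.set_marker(1)             # mark "2"
session.move_right()
session.move_right()              # now on "plus"
print(session.hear_marker(1))     # hears "2" but stays on "plus"
print(session.go_to_marker(1))    # jumps back to "2"
```

Raising the limit beyond ten would only change `NUM_MARKERS` in this model; the harder question is which keys to bind the extra markers to.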

There were two comments about “where am I”: one student wanted it to go from the broader context to the narrower (whole context first, then current location), and one wanted an indication of how deeply nested they were. The rest either had no comment or thought it was fine as it was.

In the final pilot (see next section), students were again asked about specific navigation features and how those features helped them understand and solve math problems in the pilot. Table 1 (below) shows the responses from the pilot study (one student didn’t answer this question). The results are similar to those found in the navigation study. Several questions asked how the students liked the three navigation modes. Students’ preferred modes varied widely, although many students said they used all three modes and found each useful in different situations.

Table 1. Ratings of navigation features (adapted from [10])

3.2 Final Pilot

The final pilot involved 21 students, 17 of whom had also participated in the navigation study. Each student was given two similar documents: a Word document with math problems (accessible via TTS + NVDA + MathPlayer + MathType²) and a braille, regular print (for CCTV), or large print document, depending on their preference or previous usage. Students were divided randomly into two groups, and each group received the paired documents in a different order (speech first or last). Each document contained 16 questions (32 in total). This design allowed a comparison between our speech-based solution and each student’s preferred non-electronic format.

Before the experiment, students familiarized themselves with MathPlayer by going through a tutorial. On the day of their study session, they practiced with two problems to make sure they remained comfortable with the system. Each part of the pilot began with a sample problem and its answer, followed by problems for the student to solve. Here are a few examples:

  1. How many zeroes are there to the right of the decimal point in the number \( 3.0000001 \)?

  2. The following questions are based on the polynomial

     $$ 12x^{6} + 18x^{2} + 35x^{7} + 5x^{15} + 45 + 16x^{12} $$

     (a) How many terms does the polynomial have?

     (b) What is the coefficient of \( x^{2} \)?

  3. Simplify the expression \( 4 + 3x - 2 + 8y - 2x - 3y + 5 - 4y + 10x \).

  4. What is the value of the expression \( 3\left( (6 + 5) - (8 - 4) \right) - 2 \)?

  5. Simplify the algebraic fraction \( \frac{(x + 1)(2x - 3)}{(2x + 1)(x + 1)(2x + 3)} \).

     (a) What is the numerator of the simplified fraction?

     (b) What is the denominator of the simplified fraction?
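For reference, working the arithmetic in examples 3 to 5 gives the following; these are standard simplifications of the expressions as printed, not answer keys taken from the study materials. (Example 1 has six zeroes after the decimal point; the polynomial in example 2 has six terms, and the coefficient of \( x^{2} \) is 18.)

$$ 4 + 3x - 2 + 8y - 2x - 3y + 5 - 4y + 10x = (3 - 2 + 10)x + (8 - 3 - 4)y + (4 - 2 + 5) = 11x + y + 7 $$

$$ 3\left( (6 + 5) - (8 - 4) \right) - 2 = 3(11 - 4) - 2 = 3 \cdot 7 - 2 = 19 $$

$$ \frac{(x + 1)(2x - 3)}{(2x + 1)(x + 1)(2x + 3)} = \frac{2x - 3}{(2x + 1)(2x + 3)}, \qquad x \neq -1 $$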

Net scores were computed for the paired (spoken and other format) problems as follows:

  • 0: student answered both the spoken question and its non-spoken clone correctly, or both incorrectly

  • 1: student answered the spoken question correctly but not its non-spoken clone

  • −1: student answered the spoken question incorrectly but its non-spoken clone correctly

The average net score per question was 0.125 (Std. Dev. 2.73). This indicates that the students’ performance using speech was similar to their performance using their usual format, with an insignificant bias towards speech. In other words, despite being less familiar with the speech solution, students performed as well with it as with the familiar but more costly printed format.
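The net-score rule amounts to subtracting the two correctness indicators. The sketch below shows one plausible way to aggregate such scores per student and per question pair; the exact aggregation behind the reported mean and standard deviation is an assumption here, the helper names `net_score` and `summarize` are hypothetical, and the data are made up for illustration rather than taken from the study.

```python
from statistics import mean, stdev
from typing import Dict, List, Tuple


def net_score(spoken_correct: bool, other_correct: bool) -> int:
    """+1: only the spoken version correct; -1: only the other format; 0: both or neither."""
    return int(spoken_correct) - int(other_correct)


def summarize(results: Dict[str, List[Tuple[bool, bool]]]):
    """results maps each student to (spoken_correct, other_correct) per question pair."""
    per_student = {student: sum(net_score(*pair) for pair in pairs)
                   for student, pairs in results.items()}
    n_questions = len(next(iter(results.values())))
    per_question = [sum(net_score(*pairs[q]) for pairs in results.values())
                    for q in range(n_questions)]
    return per_student, per_question


# Made-up data for three students and two question pairs (NOT the study's data).
toy = {
    "S1": [(True, True), (True, False)],
    "S2": [(False, True), (True, True)],
    "S3": [(True, False), (False, False)],
}
per_student, per_question = summarize(toy)
print(per_student)
print(per_question, mean(per_question), stdev(per_question))
print(max(abs(v) for v in per_student.values()))  # largest per-student net difference
```

Summing per question (across students) versus per student (across questions) corresponds to the two views reported in Table 2 and Table 3.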

Table 2 (below) shows that, for most questions, students performed similarly on the two formats. There were two exceptions: question 3.2 (example 3 above), which students answered much worse with speech, and question 4.3 (example 4 above), which they answered much better with speech.

Table 2. Performance on each question and format (adapted from [10])

Table 3 shows the data per student along with each student’s favorite modality for accessing math. The largest net difference for any student was just 2, showing that speech is a viable option for all students in the study, regardless of their preferred format. Despite this similar performance across formats, students expressed a slight preference for their usual format on a feedback question. Among those who answered that they would always or usually prefer their usual method, math scores were slightly higher with speech. Also, the time they spent on the problems in each format did not correlate with their preference.

Table 3. Net (spoken vs. non-spoken) and per-format scores by student (N = 21), aggregated across question pairs and sorted by net score, showing selected demographic information from the background questionnaire. *B = “blind”, LV = “Low Vision” (Adapted from [10])

At the end of each document, students were asked how easy or difficult it was to understand the math in that document. Almost all of the students rated understanding the speech as “somewhat easy”, compared to “very easy” for their preferred format.