1 Introduction

Simplifying texts affords better comprehension for struggling readers. Text simplification generally involves manipulation at the syntactic, lexical, and discourse levels. All simplified texts share the same goal: reducing a reader’s cognitive load and increasing text comprehensibility [1, 2]. The basis for text simplification is the notion that if written content is accessible, then beginning level readers, such as second language (L2) readers, can use the input to better test and confirm language hypotheses [3]. In general, much of the language to which beginning level readers are exposed has been simplified to make it easier to comprehend. For instance, most readings provided to L2 students contain less sophisticated words, fewer rare words, lower syntactic complexity, and more explicit cohesive devices such as connectives or lexical overlap between text segments [1, 2]. However, in almost all cases, a human has to manually simplify the text at the grammatical, syntactic, morphological, or lexical levels [4].

The aim of this paper is to propose a novel method of automatically simplifying texts, using sequence-to-sequence machine learning models to paraphrase certain expressions into equivalent forms that are easier to understand. Such an approach has strong potential to help practitioners, teachers, and textbook writers better meet the needs of students with lower reading skills.

2 Method

2.1 Corpora

Three datasets were used in the simplification algorithm. First, phrases and paraphrases were collected from the Paraphrase Database (PPDB) [5], which consists of English phrase-paraphrase pairs together with their associated alignment and entailment properties. PPDB covers three types of paraphrases: lexical, phrasal, and syntactic. For the purpose of this project, the PPDB XXXL English pack was filtered such that only those source-target pairs that correspond to equivalence entailments remained. Within each pair, the target was chosen as the phrase rated more readable by the Dale-Chall readability formula [6].
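The readability-based ordering of a pair can be sketched as follows. This is a minimal illustration, assuming the standard New Dale-Chall formula, under which a lower score indicates easier text. The tiny `EASY_WORDS` set is a hypothetical stand-in for the full list of familiar words, and the function names are ours rather than part of the original pipeline.

```python
import re

# Hypothetical stand-in for the Dale-Chall list of familiar words.
EASY_WORDS = {"a", "the", "to", "of", "and", "buy", "get", "help",
              "use", "show", "start", "end", "need", "about"}

def dale_chall_score(text, easy_words=EASY_WORDS):
    """New Dale-Chall score of a text; lower means easier to read."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    n_sentences = max(len(sentences), 1)
    difficult = sum(1 for w in words if w not in easy_words)
    pct_difficult = 100.0 * difficult / len(words)
    score = 0.1579 * pct_difficult + 0.0496 * (len(words) / n_sentences)
    if pct_difficult > 5.0:
        score += 3.6365  # adjustment applied when difficult words exceed 5%
    return score

def pick_simpler(phrase_a, phrase_b):
    """Order an equivalent pair as (source, target), target being easier."""
    if dale_chall_score(phrase_a) >= dale_chall_score(phrase_b):
        return phrase_a, phrase_b
    return phrase_b, phrase_a

print(pick_simpler("buy", "purchase"))  # ('purchase', 'buy')
```

With a realistic word list, the same comparison would be applied to every equivalence pair that survives the entailment filter.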

The second source of simplified data came from WordNet synonym sets. The WordNet lexical database [7] contains synsets (i.e., sets of synonyms) which can be used to generate synonym pairs by intersecting the synsets of various dictionary terms. Using these, we supplemented our paraphrasing data with additional pairs of synonyms to expand the number and range of potential rephrases. Age of acquisition (AoA) scores were used for establishing a simplification criterion (i.e., we selected which words in the synonym set were easier to understand based on AoA scores).
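A simplified reading of this step is sketched below: words that share a synset are paired, and each pair is oriented from the later-acquired word to the earlier-acquired one. The synsets and AoA values shown are hypothetical toy data, not entries taken from WordNet or from any published AoA norms, and the function name is illustrative.

```python
# Hypothetical synsets; a real run would read them from WordNet.
SYNSETS = {
    "buy.v.01": {"buy", "purchase", "acquire"},
    "get.v.01": {"get", "acquire", "obtain"},
}

# Hypothetical age-of-acquisition scores in years (lower = learned earlier).
AOA = {"buy": 4.1, "get": 3.2, "purchase": 8.6, "acquire": 9.4, "obtain": 8.9}

def simplification_pairs(synsets, aoa):
    """Return sorted (harder, easier) synonym pairs, ordered by AoA."""
    pairs = set()
    for members in synsets.values():
        scored = [w for w in members if w in aoa]  # skip words without AoA
        for hard in scored:
            for easy in scored:
                if aoa[easy] < aoa[hard]:
                    pairs.add((hard, easy))
    return sorted(pairs)

for hard, easy in simplification_pairs(SYNSETS, AOA):
    print(f"{hard} -> {easy}")
```

Words without an AoA score are skipped in this sketch; how the original pipeline handled such words is not stated.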

The third dataset consists of sentence-aligned pairs drawn from Simple English Wikipedia entries and their corresponding English Wikipedia entries [8]. This corpus has previously been used for text simplification and offers a diverse set of simplified sentence pairs.

The three simplified paraphrase sources in our corpus differ significantly in the scope and nature of the simplifications they provide, allowing for more robust model development. Synonyms from WordNet tend to be only one word long, PPDB typically has phrases of 6 to 8 words in length, and the Simple Wikipedia aligned dataset uses entire sentences.

2.2 Model Architectures

The Transformer we used [9] followed an encoder-decoder architecture. The inputs consisted of sequences of word embeddings, which were then modified by adding a positional encoding that uniquely identifies each position in the text. The resulting embeddings were processed by a multi-head attention layer, in which self-attention is distributed across a number of heads. Attention maps a query Q and a set of key-value pairs (K-V) to an output: a compatibility function between the query and each key yields weights, which are then used to form a weighted sum of the values. The relationships modeled by self-attention do not necessarily correspond to those typically understood in natural language (e.g., syntactic structure or coreference), but are rather latent dependencies that arise from the text.
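For concreteness, the attention computation can be sketched for a single head using the scaled dot-product form from [9], in which the compatibility of a query with each key is their dot product divided by the square root of the key dimension. The code below is an illustrative pure-Python version with hypothetical function names; it is not the tensor2tensor implementation.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Compatibility of this query with every key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs

# Two identical keys receive equal weight, so the output averages the values.
print(attention([[1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]], [[0.0, 2.0], [4.0, 6.0]]))
```

In multi-head attention, several such computations run in parallel on learned projections of the queries, keys, and values, and their outputs are concatenated.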

A variation of the Transformer is the Universal Transformer [10], an extension of the original architecture that can be shown to be Turing complete under certain assumptions. The Universal Transformer applies its layers recurrently, using as its transition function either a separable convolution or a feed-forward neural network consisting of a rectified linear unit activation between two affine transformations [10].
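Written out, the feed-forward variant of this transition takes the familiar position-wise form below. The symbols are our own notation, introduced only for illustration: x is the representation at one position, and W_1, b_1, W_2, b_2 are the learned parameters of the two affine transformations.

```latex
\mathrm{Transition}(x) = \max(0,\; x W_1 + b_1)\, W_2 + b_2
```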

3 Results

BLEU scores [11], a frequently used metric for machine translation, were used to evaluate the models. BLEU scores range from 0 to 100, where 100 indicates that the translation is identical to the reference translation. The BLEU score is usually computed as a geometric mean of the individual n-gram precision scores, combined with a brevity penalty that penalizes translations shorter than the reference. In addition to the deep learning models described previously, we report BLEU scores for a “Repeater” baseline that outputs the source phrase unchanged, which provides an estimate of the similarity between the normal and simplified phrases. Both the evaluation and the model training were conducted using the tensor2tensor library [12] (Table 1).
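The structure of the metric can be illustrated with a minimal sentence-level sketch. It assumes a single reference, whitespace tokenization, lowercasing, and no smoothing, so its values will not match those of the tensor2tensor evaluation, which we do not reproduce here. The function names are illustrative.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Unsmoothed, single-reference, uncased BLEU on a 0-100 scale."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: applies only when the candidate is shorter.
    brevity = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return 100.0 * brevity * math.exp(sum(log_precisions) / max_n)

reference = "the cat sat on the mat"
print(bleu(reference, reference))         # 100.0
print(bleu("the cat sat on", reference))  # below 100 because of the brevity penalty
```

Under the assumption that the Repeater simply returns its input, its score for one pair would be `bleu(source, simplified_reference)`.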

Table 1. BLEU scores for the tested models.

Transformer-based models attain BLEU scores that indicate good generalization, with the Universal Transformer model exhibiting less overfitting. Simplification is performed only at the phrase level, rather than on paragraphs or whole texts, because the units in the corpus are no longer than a sentence. Table 2 presents examples of paraphrase suggestions generated by the Transformer model.

Table 2. Sample paraphrases generated for an input essay in ascending order of BLEU scores.

As a post-hoc analysis, we used a corpus of 100 texts [4], each simplified to three levels (advanced, intermediate, and elementary), to better assess the performance of the model on real-world texts. We measured the uncased BLEU score of the Transformer model's paraphrases of the advanced texts against their intermediate and elementary forms. We also tried various probability thresholds, each setting the minimum joint probability required of a candidate simplification. All evaluations were performed using the Transformer model. The results in Table 3 indicate that the more alterations the model is allowed to make (lower thresholds), the worse it performs. One reason for this may be the manner in which the human experts altered these texts, such as the use of sentence fusion, phrase splitting, phrase reordering, and the wholesale elimination of certain sequences of text. These alterations are beyond what our model has been trained to perform, although they provide insight into future directions for analysis.
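The effect of the threshold can be illustrated with a short sketch. It assumes that the joint probability of a candidate is the product of the decoder's per-token probabilities, which is our reading rather than a detail stated above. The candidate triples and probabilities are hypothetical.

```python
import math

def joint_probability(token_probs):
    """Product of per-token probabilities, computed in log space."""
    return math.exp(sum(math.log(p) for p in token_probs))

def accept_simplifications(candidates, threshold):
    """Keep (source, target) pairs whose joint probability meets the threshold."""
    accepted = []
    for source, target, token_probs in candidates:
        if target != source and joint_probability(token_probs) >= threshold:
            accepted.append((source, target))
    return accepted

# Hypothetical candidates: (source phrase, proposed paraphrase, token probabilities).
candidates = [
    ("purchase", "buy", [0.9, 0.9]),
    ("in the event that", "if", [0.5, 0.6]),
]
print(accept_simplifications(candidates, 0.5))  # only the first pair passes
print(accept_simplifications(candidates, 0.2))  # both pairs pass
```

Lowering the threshold admits more candidate replacements, which corresponds to the setting in which the model is allowed to make more alterations.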

Table 3. BLEU scores for the Transformer model’s translations on the real-life testing corpus.

4 Conclusions

In this paper, we analyzed the capabilities of modern Neural Machine Translation models in the context of text simplification via paraphrasing. Expanding on previous work by Kauchak [8], we generated a text simplification dataset that includes samples of varying scopes: synonyms, few-word idioms, and entire sentences. We set up our learning problem such that the models were trained to transform an English sequence into another, equivalent sequence with higher readability. We then trained Machine Translation architectures consisting of encoder-decoder Neural Networks in order to evaluate how well they can transduce text written in English into a simpler form.

Our results suggest that human modifications to the text diverge from those found in the textual simplification corpora we used. The reference simplifications tended to include stylistic and structural alterations, such as fusing or breaking up phrases, eliminating portions of the text, and changing the structure of the document.

Our constructed dataset expands on those commonly used in text simplification, and we show that the neural models examined in this study are indeed capable of generalizing on these data. A future avenue of research for this topic is the construction of a dataset that is better aligned with the kind of alterations humans make during essay simplification. This might require the addition of syntactic parsers, part of speech taggers, and tools that can measure elements of text cohesion, including vectors of connectives and semantic representations across texts. This work and future endeavors of this kind have strong potential to make crucial contributions to students’ capacity to understand and learn from text, a concern shared by a broad range of practitioners and researchers.