1 Introduction

Reading comprehension is one of the most important parts of foreign language learning and a key criterion for evaluating learners’ language skills. According to the famous input hypothesis [1] of Stephen D. Krashen and the research of many other linguists, the most efficient way to improve reading comprehension is to provide reading materials whose level is slightly above the reader’s current reading ability. Reading too many easy texts becomes meaningless repetition; on the other hand, second language learners lose confidence and interest if the texts are too difficult [2]. Therefore, classifying reading materials by readability plays a crucial role in foreign language learning.

Readability classification of traditional paper-based reading exercises usually requires the manual work of language experts, which incurs a large human resource overhead. To overcome this high manual cost, researchers have focused on computer-based automatic readability sorting.

With the development of distributed networks and web service technology, online learning and testing systems such as TOEFL iBT® [3] have become increasingly popular. Internet-based learning and testing systems tend to contain more and more reading materials, so manual classification becomes increasingly impractical. Furthermore, online systems should exploit their inherent advantage of real-time operation: it would be a great pity if a web-based learning system could not update texts’ readability rankings in real time according to the correct rate of learners’ answers and display reading materials of different readability to users with different levels of reading comprehension.

The most intuitive criteria of readability include word length, number of affixes, abstraction level of words, number of polysemous words and sentence length [4]. From such indicators, scholars constructed the Flesch Reading Ease formula [5]. To classify documents by readability automatically, the studies in [6–8] use variants of Flesch Reading Ease. Such methods are easy to deploy; however, the weights of the indicators are experience-dependent and the classification results are rather subjective.
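Formula-based scoring of this kind is straightforward to implement. The sketch below computes the standard Flesch Reading Ease score (206.835 − 1.015 · words/sentences − 84.6 · syllables/words); the vowel-group syllable counter is a rough heuristic of our own, not the dictionary-based count the original formula assumes.

```python
import re

def count_syllables(word: str) -> int:
    # Rough heuristic: count groups of consecutive vowels.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text: str) -> float:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    # Flesch Reading Ease: higher scores mean easier texts.
    return (206.835
            - 1.015 * len(words) / len(sentences)
            - 84.6 * syllables / len(words))

score = flesch_reading_ease("The cat sat on the mat. It was happy.")
```

Short, common words and short sentences push the score up, which illustrates why such formulas are easy to deploy but blind to word meaning.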

To enhance classification accuracy, Socher et al. [9] introduced statistical models into readability sorting. However, the runtime complexity also rises rapidly along with the performance improvement. Schwarm and Ostendorf [10] applied a support vector machine (SVM) to measuring the perplexity of reading materials. The results are outstanding if time and computational consumption are not considered.

The work in [11] investigated readability classification with scarce training data and presented a machine learning-based comparator that reduces the dependence on the training set. However, the large granularity of its readability levels is sometimes a disadvantage in finding difficulty levels that suit individual readers.

A measure based on an extension of multinomial Naïve Bayes classification, which combines multiple language models to estimate the most likely grade level for a given passage, is proposed in [12]; it achieves high precision with low time and computational overhead.

Unfortunately, none of the above methods can update readability in real time or respond to the reading comprehension of users.

The idea of classifier committees stems from the intuition that a team of experts, combining their knowledge, produces better results than a single expert alone [13]; this has been demonstrated especially in the AdaBoost algorithm family [14–16]. Moreover, when language learners are used as the base classifiers and the reading materials as the training set, Boosting algorithms can obtain the real-time readability of texts and the reading comprehension of users. The problem is that Boosting-based algorithms have difficulty achieving readability initialization.

In this paper, we investigate the readability classification problem for online foreign language learning. This article contributes the following:

  1. A modified method based on the Smoothed Unigram model [12] that better adapts to online document readability sorting.

  2. A Boosting-based classification algorithm that updates texts’ readability in real time according to readers’ responses.

  3. A novel weighting mechanism for evaluating users’ reading comprehension.

  4. A novel online reading system that automatically performs readability classification and adaptively selects suitable reading materials for learners at different levels.

In the remaining sections, we start by reviewing the key problems of text classification, especially readability classification (Sect. 2). In Sect. 3, the modified Smoothed Unigram model for initializing the readability of online documents is presented. A Boosting-based classification algorithm is proposed in Sect. 4 for real-time updating of texts’ readability and learners’ reading comprehension. Adaptive selection of reading materials according to users’ language levels is described in Sect. 5. The overall structure of the novel system is shown in Sect. 6. Finally, Sect. 7 summarizes the paper.

2 Background

2.1 Review of text categorization

Similar to other data classification tasks, text classification, also called text categorization (TC), can be defined as the problem of approximating an unknown category assignment function F: D × C → {0, 1}, where D is the set of all possible documents and C is the set of predefined categories:

$$ F(d,c) = \left\{ {\begin{array}{*{20}l} {1,} \hfill & {d \in D\,{\text{and}}\,d\,{\text{belong}}\,{\text{to}}\,{\text{class}}\,c} \hfill \\ {0,} \hfill & {\text{otherwise}} \hfill \\ \end{array} } \right. $$
(1)

The approximating function F: D × C → {0, 1} is called a classifier. The task is to build a classifier that produces results as close as possible to the true category assignment [17]. The most popular text classification algorithms include Naïve Bayes [18], decision trees [19], SVM, neural networks, Rocchio [20], k-nearest neighbors (kNN) [21] and Boosting-based algorithms such as AdaBoost [22]. Generally speaking, Naïve Bayes has the worst precision and the best efficiency, decision trees and neural networks are middle-level algorithms, while kNN, SVM and AdaBoost have the highest classification accuracy [23]. However, the computational consumption of SVM is larger than that of many other TC methods.

2.2 Readability classification

Text classification is widely used in many aspects of the real world. The most common use of TC is to determine the topic of documents, for example, whether a text downloaded from www.time.com belongs to politics, economics or entertainment. Spam filtering, orientation analysis and readability sorting are also major applications of text classification. A schematic of different applications of TC is shown in Fig. 1.

Fig. 1
figure 1

Different applications of TC

As an important issue in text mining, readability sorting has wide applications in education, publication and search engines. The two major methods for distinguishing the difficulty of texts are readability formulas and statistical language model-based methods.

Readability formula-based classification has an 80-year history and more than 100 variants. Currently, the most classic readability formulas are Flesch Reading Ease, the Gunning Fog Index [24] and the Automated Readability Index [25]. Despite its ease of use, formula-based readability sorting has gradually become less attractive because of its unsatisfactory classification accuracy.

Instead of processing texts as plain strings, statistical language model-based readability classification can probe the interior structure of language [26]. It has developed considerably in the past decade. Similar to other machine learning-based text classification problems, readability sorting needs a labeled document collection as the training set. The training set must be preprocessed; preprocessing includes word segmentation, part-of-speech (POS) tagging and exclusion of stop words. Features are then selected or extracted from the pretreated training set. After the computation of features, a readability model is built by referring to the manual labels. Then, the testing set is classified by the built model according to readability. The flow chart of statistical language model-based readability sorting is shown in Fig. 2.

Fig. 2
figure 2

Flow chart of statistical language model-based readability sorting
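As a minimal illustration of the preprocessing stage in this flow, the sketch below performs word segmentation, removes stop words from a small hand-picked list (a stand-in for a full stop-word lexicon), and counts term frequencies as features; POS tagging is omitted for brevity.

```python
import re
from collections import Counter

# A tiny stand-in stop-word list; a real system would use a full lexicon.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "is", "are", "and", "to"}

def preprocess(text: str) -> Counter:
    tokens = re.findall(r"[a-z']+", text.lower())          # word segmentation
    content = [t for t in tokens if t not in STOP_WORDS]   # stop-word exclusion
    return Counter(content)                                # term-frequency features

features = preprocess("The readability of the text is measured on the training set.")
```

The resulting term-frequency counts are the raw material for the feature selection and modeling stages that follow.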

How to select the most representative features and how to avoid dependence on large-scale training sets are the two important problems that deeply affect the performance of readability classification.

3 Initial readability sorting

Although a Boosting-based classification algorithm can be very powerful for readability updating and the evaluation of learners’ reading comprehension, it cannot be used for initial readability sorting because it is inefficient at finding suitable features to represent reading difficulty. Therefore, we draw on a traditional readability classification algorithm to determine the initial readability levels of reading materials.

The language modeling Naïve Bayes-based readability classification algorithm [12] proposed by Kevyn Collins-Thompson and Jamie Callan is an accurate tool for text difficulty prediction. It is called the Smoothed Unigram model, and it has no domain limitation. Its authors excluded sentence length from their measurements; however, unlike for native speakers, sentence length is an important readability indicator for foreign language learners [27]. Therefore, the Smoothed Unigram model must be modified to meet the requirements of foreign language learning.

3.1 Original Smoothed Unigram model

The Smoothed Unigram model is based on a variation of the multinomial Naïve Bayes classifier. In text classification terms, each class is described by a language model corresponding to a predefined level of difficulty. For English online texts, 12 language models were trained, corresponding to the 12 English grade levels.

The language models used in [12] are simple: they are based on unigrams and assume that the probability of a token is independent of the surrounding tokens, given the grade language model. A unigram language model is defined by a list of types (words) and their individual probabilities. Although this is a weak model, it can be trained from less data than more complex models and turns out to give good accuracy for this problem.

In the Smoothed Unigram model, a document D is assumed to be generated according to the following steps:

  1. Using the prior distribution P(L i ), choose a readability level model L i from the set of unigram models L. The distribution of L i over the word space W is multinomial.

  2. Using the distribution \( P(N|L_{i} ) \), determine the length N of the passage in tokens.

  3. Model the passage as a bag of words: under the Naïve assumption that tokens are mutually independent, sample from L i N times and build the passage from the N sampled tokens.

Let c(t) be the count of type t in D. The probability of D given model L i is:

$$ P(D|L_{i} ) = P(N|L_{i} ) \cdot N!\prod\limits_{t \in D} {\frac{{P(t|L_{i} )^{c(t)} }}{c(t)!}} $$
(2)

Obviously, the runtime complexity of Eq. (2) is huge because of the high powers and factorials. Therefore, Smoothed Unigram uses Bayes’ rule to find the most likely level language model for the text D, i.e., the model L i that maximizes \( P(L_{i} |D) \):

$$ P(L_{i} |D) = \frac{{P(L_{i} )P(D|L_{i} )}}{P(D)} $$
(3)

Assuming that the levels have a uniform prior distribution and that the passage length N is independent of the level, the former function can be simplified in logarithmic form as:

$$ N(L_{i} |D) = \log \,Z + \sum\limits_{t \in D} {c(t)\log P(t|L_{i} )} $$
(4)

Logarithms are used in function (4), and log Z represents combined factors involving the passage length and the uniform prior P(L i ). Hitherto, the Smoothed Unigram model is built simply by Eq. (4).
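Equation (4) reduces grade prediction to a maximum log-likelihood search over the level models. A minimal sketch follows, with hypothetical toy models and simple add-one smoothing standing in for the smoothing scheme used in [12]:

```python
import math
from collections import Counter

def log_likelihood(doc_counts, model, vocab_size, alpha=1.0):
    # N(L_i | D) up to the constant log Z: sum over types of c(t) * log P(t | L_i),
    # with add-one (alpha) smoothing for unseen types.
    total = sum(model.values())
    return sum(c * math.log((model.get(t, 0) + alpha) / (total + alpha * vocab_size))
               for t, c in doc_counts.items())

def classify(text, models):
    counts = Counter(text.lower().split())
    vocab = {t for m in models.values() for t in m}
    # Choose the level model that maximizes the smoothed log-likelihood.
    return max(models, key=lambda lvl: log_likelihood(counts, models[lvl], len(vocab)))

# Hypothetical toy level models (type -> count) for two grade levels.
models = {
    "easy": Counter("the cat sat on the mat the dog ran".split()),
    "hard": Counter("the hypothesis presupposes considerable lexical sophistication".split()),
}
level = classify("the cat ran", models)
```

A passage built from everyday vocabulary scores higher under the easy model, so the argmax returns the easier grade.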

3.2 Modified Smoothed Unigram

The effectiveness of the original Smoothed Unigram model is proved in [12]. However, two characteristics make it less than ideal for online language learning and second language learning. Firstly, the original Smoothed Unigram model does not consider sentence length as a syntactic component. This is certainly suitable for native speakers; for foreign language learners, however, sentence length is a major factor affecting understanding [28]. Therefore, the average sentence length and maximum sentence length should be taken into the readability calculation. Secondly, a 12-level division of text difficulty leads to large online time consumption. Moreover, classifying reading materials into too many grades blurs the readability discrimination between adjacent grades and limits the system’s accuracy.

To solve the problems above, we modify the Smoothed Unigram model in two aspects: we take sentence length into consideration and change the readability division to seven levels.

Assuming the number of sentences in a text is n and the length of sentence i is S i , a comprehensive sentence length σ can be used as an additional readability indicator:

$$ \sigma = \frac{{\sum\nolimits_{i = 1}^{n} {S_{i} \cdot \hbox{max} (S_{i} )} }}{n} $$
(5)

The function above only considers the number of words in the longest sentence. It is easy to use but may be sensitive to noisy sentences. A noisy sentence is one that is extremely long but appears only once in the text and has very low semantic importance for understanding the full passage. A sentence length threshold δ can be introduced to minimize the interference of noisy sentences: the length of the longest sentence is replaced by the cumulative length of over-long sentences as:

$$ \sigma = \frac{{\sum\nolimits_{i = 1}^{n} {S_{i} \cdot \sum\nolimits_{j = 1}^{m} {S_{j}^{{\prime }} } } }}{n} $$
(6)

where S′ is the set of sentences whose length exceeds δ, m is the number of sentences in S′, and \( S_{j}^{{\prime }} \) is the length of the jth sentence in S′. According to the statistics in [29], the average length of English sentences is roughly 15.4 words across different corpora, and we use 22 as the value of δ in this article.
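The two sentence-length variants can be sketched directly from Eqs. (5) and (6); δ = 22 follows the choice above, and the list of sentence lengths is hypothetical input.

```python
def sigma_max(lengths):
    # Eq. (5): mean sentence length scaled by the longest sentence.
    return sum(L * max(lengths) for L in lengths) / len(lengths)

def sigma_threshold(lengths, delta=22):
    # Eq. (6): replace the longest-sentence factor with the cumulative
    # length of over-long sentences (beyond delta), damping noisy outliers.
    overlong = sum(L for L in lengths if L > delta)
    return sum(L * overlong for L in lengths) / len(lengths)

lengths = [12, 18, 25, 30, 9]   # sentence lengths in words (illustrative)
```

Note that Eq. (6) yields zero when no sentence exceeds δ, so the supplementary indicator only activates for texts that actually contain over-long sentences.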

Using the supplementary indicator σ to improve the ability to distinguish text readability, the final equation of the original Smoothed Unigram model can be modified as:

$$ N^{{\prime }} (L_{i} ) = k\sigma \left[ {\log \,Z + \sum\limits_{t \in D} {c(t)\log \,P(t|L_{i} )} } \right] $$
(7)

where N′(L i ) is the final readability score and \( k \in [0,1] \) is an empirical coefficient.
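Folding σ into the score is then a one-line change on top of the unigram log-likelihood; the value of k and the inputs below are illustrative assumptions.

```python
def modified_score(log_likelihood, sigma, k=0.5):
    # Eq. (7): scale the unigram log-likelihood (including log Z) by the
    # sentence-length indicator sigma and an empirical coefficient k in [0, 1].
    return k * sigma * log_likelihood
```

Since σ is computed per document, it shifts the document's overall readability score rather than the ranking among level models.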

To increase the dissimilarity between classes, we merge the 12 readability grades of Smoothed Unigram into seven levels. This modification reduces the system’s runtime complexity and makes differences in readability more apparent to users. The seven levels of readability are represented by the uppercase letters A–G in order of increasing difficulty. The correspondence between letters and readability is shown in Table 1.

Table 1 Correspondence between letters and readability

Finally, the initial readability sorting based on the modified Smoothed Unigram model works in the following steps:

  1. Preprocessing: pretreat the input documents at the vocabulary level.

  2. Feature selection: represent the documents by their most representative features.

  3. Modeling: construct the Smoothed Unigram model.

  4. Calculation: compute the readability of each document using the model and the sentence length indicator.

  5. Output: emit the letters that represent the readability of the input reading materials.

4 Online-Boost algorithm

Boosting is a voting-based method that improves classification performance by using a group of weak classifiers instead of attempting to build a single powerful strong classifier [29]. In Boosting, the classifiers are trained sequentially. Before training the ith classifier, the training set is reweighted, with greater weight given to the documents that were misclassified by the previous classifiers. This reweighting strategy is suitable for updating readability. What is more, online learners can be used as a group of base classifiers. Therefore, Boosting is a helpful methodology for online readability updating and the evaluation of learners’ reading comprehension.

The original Boosting algorithm uses three weak classifiers \( (c_{1} ,c_{2} ,c_{3} ) \) to form a committee. It divides a large training set into three parts \( (X_{1} ,X_{2} ,X_{3} ) \) randomly and first uses X 1 to train c 1. Then, it uses the subset of X 1 misclassified by c 1 together with X 2 as the training set of c 2. The rest proceeds in the same manner.

4.1 AdaBoost algorithm family

The AdaBoost algorithm is the best-known example of the Boosting approach [30]. Because of its high performance, AdaBoost has grown into a large algorithm family. AdaBoost.MH [31], AdaBoost.P [32], AdaBoost.L [33], Multi-class AdaBoost [34] and asymmetric AdaBoost [35] are the most important variants of the AdaBoost family. Schapire [36] restated the original AdaBoost, and later researchers refined its details [37–39].

To control the computational cost in a reasonable range, AdaBoost uses a dual-weighted process to choose training sets and classifiers. The detailed steps of AdaBoost are as follows:

  1. Given a training set \( (x_{1} ,y_{1} ),(x_{2} ,y_{2} ), \ldots ,(x_{n} ,y_{n} ) \), where x i is a training sample and \( y_{i} \in \{ 1, - 1\} \) denotes x i’s category label \( (1 \le i \le n). \)

  2. Let \( f_{j} (x_{i} ) \) denote the jth feature of sample x i.

  3. Define the initial distribution of documents in the training set as \( D_{1} (i) = \frac{1}{n}. \)

  4. Search for the weak classifier \( c_{t} (t = 1,2, \ldots ,T) \): for the jth feature of every sample, a weak classifier c j can be obtained by finding the threshold \( \theta_{j} \) and polarity P j that minimize the error \( \varepsilon_{j} \):

    $$ \varepsilon_{j} = \sum\limits_{i = 1}^{n} {D_{t} (i)\left| {c_{j} (x_{i} ) \ne y_{i} } \right|} $$
    (8)

    The weak classifier c j is therefore:

    $$ c_{j} (x) = \left\{ {\begin{array}{*{20}l} {1,} \hfill & {P_{j} f_{j} (x) < P_{j} \theta_{j} } \hfill \\ { - 1,} \hfill & {\text{otherwise}} \hfill \\ \end{array} } \right. $$
    (9)

  5. Choose the c j from the whole feature space that has the minimal error \( \varepsilon_{j} \) as the round’s weak classifier.

  6. Recalculate the distribution over the samples:

    $$ D_{t + 1} (i) = \frac{{D_{t} (i){\text{e}}^{{( - \alpha_{t} y_{i} c_{t} (x_{i} ))}} }}{{Z_{t} }} $$
    (10)

    where Z t is a normalization factor that makes \( \sum\nolimits_{i = 1}^{n} {D_{t + 1} (i) = 1} \) and \( \alpha_{t} \) is the classifier weight.

  7. Repeat the steps above T times to obtain T optimal weak classifiers with different weights.

  8. Combine the weak classifiers according to their weights to construct a strong classifier:

    $$ C_{\text{strong}} (x) = {\text{sign}}\left( {\sum\limits_{t = 1}^{T} {\alpha_{t} c_{t} (x)} } \right) $$
    (11)

The algorithm above enhances training set utilization by adjusting the weights of misclassified texts. In addition, the performance of the base classifiers is evaluated by the weighting process. These attributes of AdaBoost are useful for online updating of readability and reading comprehension.
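The dual-weighted procedure above can be sketched end to end with one-dimensional decision stumps as the weak classifiers; the toy data are hypothetical.

```python
import math

def train_stump(xs, ys, w):
    # Find the threshold/polarity pair minimizing the weighted error (Eqs. 8-9).
    best = None
    for theta in xs:
        for p in (1, -1):
            pred = [1 if p * x < p * theta else -1 for x in xs]
            err = sum(wi for wi, pi, yi in zip(w, pred, ys) if pi != yi)
            if best is None or err < best[0]:
                best = (err, theta, p)
    return best

def adaboost(xs, ys, T=10):
    n = len(xs)
    w = [1.0 / n] * n                       # initial uniform distribution
    committee = []
    for _ in range(T):
        err, theta, p = train_stump(xs, ys, w)
        err = max(err, 1e-10)               # guard against log(0)
        if err >= 0.5:
            break
        alpha = 0.5 * math.log((1 - err) / err)
        committee.append((alpha, theta, p))
        # Eq. (10): up-weight misclassified samples, then normalize by Z_t.
        pred = [1 if p * x < p * theta else -1 for x in xs]
        w = [wi * math.exp(-alpha * yi * pi) for wi, yi, pi in zip(w, ys, pred)]
        z = sum(w)
        w = [wi / z for wi in w]
    return committee

def predict(committee, x):
    # Eq. (11): sign of the weighted vote of the weak classifiers.
    s = sum(a * (1 if p * x < p * theta else -1) for a, theta, p in committee)
    return 1 if s >= 0 else -1

xs = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
ys = [1, 1, 1, -1, -1, -1]
committee = adaboost(xs, ys)
```

Each round concentrates weight on the hardest samples, which is exactly the property the system reuses to re-rank documents that learners keep answering incorrectly.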

However, the direct use of AdaBoost in an online language learning system is impossible, since several significant differences exist between traditional AdaBoost and the requirements of online readability classification. First of all, online learning systems emphasize user experience; therefore, readability should be judged by users, whereas in traditional AdaBoost this work is undertaken by machine learning algorithms such as SVM or neural networks. Secondly, the classification ability of the base classifiers is normally transparent to the user; in an online learning system, classification ability equals the user’s reading comprehension when learners serve as the weak classifiers, so it is important to let users know their level of language skill. Thirdly, AdaBoost cannot give base classifiers documents corresponding to their classification ability; this disadvantage must be overcome so that the system can automatically select reading materials of suitable difficulty for users with different reading comprehension. In addition, the online system must have higher robustness, because human classifiers are more random than machine classifiers; in other words, the online learning system may face more noise.

4.2 Readability updating

An advantage of using online learners as base classifiers is that the system no longer needs preprocessing: word segmentation, POS tagging, feature selection and document representation are all unnecessary for humans to make classifications. More than 60 % of the time cost is saved in this way [40].

The most intuitive criterion of learner-based readability classification is the average correct rate: a high average correct rate reveals low semantic difficulty. Without considering users’ language skills, a text whose questions are answered correctly by all users would be regarded as extremely easy, and vice versa. Three assumptions are important in the initial step of readability updating and reading comprehension evaluation:

  1. The readability of texts is independent of each other.

  2. The language skills of learners are independent of each other.

  3. All learners have the same initial reading comprehension.

Assuming the initial online readability of document k is R 1(k), it can be calculated as:

$$ R_{1} (k) = \frac{{nQ}}{{\theta \sum\nolimits_{i = 1}^{n} {C_{i} } }} $$
(12)

where θ is the initial readability of document k determined by the modified Smoothed Unigram model, C i is the number of right answers given by learner i, n is the number of learners and Q is the number of questions for document k.

Similarly to this first step of readability updating based on the average correct rate, the initial reading comprehension S 1(j) of learner j can be computed as:

$$ S_{1} (j) = \frac{{\sum\nolimits_{i = 1}^{m} {\theta_{i} T_{i} } }}{{\sum\nolimits_{i = 1}^{m} {Q_{i} } }} $$
(13)

where m is the number of texts, θ i is the readability of text i given by the modified Smoothed Unigram model, T i is the number of right answers for text i and Q i is the number of questions for text i.
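The two bootstrap formulas, Eqs. (12) and (13), can be sketched directly; the answer counts below are hypothetical data.

```python
def initial_readability(theta, correct_counts, Q):
    # Eq. (12): inverse of the average correct rate, scaled by the initial
    # readability theta from the modified Smoothed Unigram model.
    n = len(correct_counts)
    return (n * Q) / (theta * sum(correct_counts))

def initial_comprehension(thetas, rights, questions):
    # Eq. (13): readability-weighted right answers over total questions.
    return sum(t * r for t, r in zip(thetas, rights)) / sum(questions)

# Hypothetical data: 4 learners answered 10 questions on one document.
R1 = initial_readability(theta=3.0, correct_counts=[8, 7, 9, 6], Q=10)
# One learner practiced 3 texts with the listed readabilities.
S1 = initial_comprehension(thetas=[2.0, 3.0, 4.0], rights=[9, 7, 5], questions=[10, 10, 10])
```

Note how fewer correct answers drive R 1(k) up (harder text), while correct answers on harder texts drive S 1(j) up (stronger learner).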

Once the system has calculated and recorded the data on users’ language skills, the second step of readability updating can take more information into account to obtain a more precise result. It is easy to imagine that a reading material with a low correct rate among high-skill users is probably very difficult; similarly, a text with a high correct rate among low-comprehension learners is probably less difficult. In other words, the correct rate of low-skill users and the error rate of high-skill users are more informative and should have greater weights.

To distinguish high and low reading comprehension, two thresholds α and β are introduced into the system. When the language skill of learner j is lower than α, his contribution to readability is replaced by a lower bound R MIN; when his language skill is higher than β, his contribution is replaced by an upper bound R MAX. Assuming the readability of document k in the second update round is R 2(k), it can be calculated as:

$$ R_{2} (k) \, = R_{2} (A) + R_{2} (B) + R_{2} (C) $$
(14)

where R 2(A), R 2(B) and R 2(C) are defined as:

$$ \left\{ {\begin{array}{*{20}l} {R_{2} (A) = R_{\text{MIN}} ,} \hfill & {S_{1} (j) \le \alpha } \hfill \\ {R_{2} (B) = \left( {\prod\nolimits_{j = 1}^{n} {S_{1} (j)C_{j} } } \right)^{ - 1} ,} \hfill & {\alpha \le S_{1} (j) \le \beta } \hfill \\ {R_{2} (C) = R_{\text{MAX}} ,} \hfill & {S_{1} (j) \ge \beta } \hfill \\ \end{array} } \right. $$
(15)

The readability update procedure can keep on running by an iterative way according to the function above:

$$ \left\{ {\begin{array}{*{20}l} {R_{i} (A) = R_{\text{MIN}} ,} \hfill & {S_{i - 1} (j) \le \alpha } \hfill \\ {R_{i} (B) = \left( {\prod\nolimits_{j = 1}^{n} {S_{i - 1} (j)C_{j} } } \right)^{ - 1} ,} \hfill & {\alpha \le S_{i - 1} (j) \le \beta } \hfill \\ {R_{i} (C) = R_{\text{MAX}} ,} \hfill & {S_{i - 1} (j) \ge \beta } \hfill \\ \end{array} } \right. $$
(16)

Using the iterative equation above, the system updates the readability of reading materials in real time according to learners’ performance in understanding the articles. The work steps of readability updating are shown in Fig. 3.

Fig. 3
figure 3

Work steps of readability update

The functions above only consider the symmetric case in which all learners are in the same round. However, the most common situation is asymmetric: the randomness of online behavior makes it impossible for all users to practice the same number of documents.

Fortunately, formula (16) can meet the requirements of the asymmetric situation with a small modification. Assuming learner i has practiced n i texts, the former definition of R i(B) is replaced by:

$$ R_{i} (B) = \frac{{\sum\nolimits_{i = 1}^{n} {S_{i} (n_{i - 1} )} }}{{\sum\nolimits_{i = 1}^{n} {n_{i - 1} \cdot S_{i} (n_{i - 1} )} }} $$
(17)

Hitherto, the system can update the readability of reading materials in real time in different situations. Furthermore, according to the analysis above, the online readability update algorithm is insensitive to noise.
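One update round in the spirit of Eqs. (14)–(16) can be sketched as follows. The thresholds α, β, the bounds R_MIN, R_MAX, and the per-learner clamping interpretation are assumptions of this sketch, not values fixed by the paper.

```python
R_MIN, R_MAX = 0.1, 10.0     # assumed contribution bounds
ALPHA, BETA = 0.3, 0.9       # assumed skill thresholds

def readability_round(skills, corrects):
    # Eqs. (14)-(16): clamp the contributions of very weak / very strong
    # learners, and invert the skill-weighted product for the middle band.
    total = 0.0
    middle_product = 1.0
    has_middle = False
    for s, c in zip(skills, corrects):
        if s <= ALPHA:
            total += R_MIN            # R_i(A)
        elif s >= BETA:
            total += R_MAX            # R_i(C)
        else:
            middle_product *= s * c   # accumulate S(j) * C_j
            has_middle = True
    if has_middle:
        total += 1.0 / middle_product  # R_i(B)
    return total
```

Clamping keeps a single outlier learner, however weak or strong, from dominating a document's updated readability.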

4.3 Reading comprehension evaluation

In the former subsection, only average correct rates are considered in reading comprehension evaluation. This is the simplest, but not the best, way to examine users’ language skills. To improve the performance of reading ability evaluation, some statistical characteristics are introduced.

Stability is an important indicator for performance evaluation. If the correct rates of a learner’s answers fluctuate greatly across different texts, his language ability is considered weaker even if his average correct rate equals that of other users. To avoid the runtime complexity caused by computing powers, the following variant of the correct rate variance can be used:

$$ V = \sum\limits_{i = 1}^{n} {\left| {\frac{{T_{i} }}{{Q_{i} }} - \frac{1}{n}\sum\limits_{k = 1}^{n} {\frac{{T_{k} }}{{Q_{k} }}} } \right|} $$
(18)

The formula above also acts as a filter. Imagine a weak language learner who encounters an article he has already read: his correct rate would be inflated by such passages. On the other hand, when a high-comprehension user merely tests the system’s functions and gives random answers, his correct rate is deflated. The influence of these interfering factors is limited by function (18).

Correct rates on texts of different readability should play different roles in reading comprehension evaluation. The readability can be used as the weight of the correct rate:

$$ S(j)^{A} = \frac{1}{m}\left( {\sum\limits_{j = 1}^{m} {R(j)T_{j} /C_{j} } } \right) $$
(19)

Another valuable statistical characteristic is a learner’s correct rate on difficult reading materials. When two users have approximately the same average correct rate, the one with the higher correct rate on harder tests probably has better language skills. The standard of difficulty can be defined as:

$$ R(j) > \frac{1}{2}\left( {\mathop {\hbox{max} }\limits_{j \in m} \left\{ {R(j)} \right\} + \frac{1}{m}\sum\limits_{j = 1}^{m} {R(j)} } \right) $$
(20)

When readability meets the above constraint, the reading comprehension can be calculated as:

$$ S(j)^{B} = \xi \frac{1}{m}\left( {\sum\limits_{j = 1}^{m} {R(j)T_{j} /C_{j} } } \right) $$
(21)

where \( \xi > 1 \) is an empirical parameter.

Similarly, the concept of an easy text can be defined as:

$$ R(j) < \frac{1}{2}\left( {\mathop {\hbox{min} }\limits_{j \in m} \left\{ {R(j)} \right\} + \frac{1}{m}\sum\limits_{j = 1}^{m} {R(j)} } \right) $$
(22)

When the readability of document j satisfies the above constraint, its reading difficulty is very low. Therefore, the correct rates on these reading materials should contribute less to the comprehension evaluation. An empirical parameter \( \zeta \in [0.5,1] \) can be used:

$$ S(j)^{C} = \zeta \frac{1}{m}\left( {\sum\limits_{j = 1}^{m} {R(j)T_{j} /C_{j} } } \right) $$
(23)

In this way, the weights of correct rates on easy texts are limited by function (23). Integrating the analysis above, the final reading comprehension S(j) of learner j can be evaluated by the following function:

$$ S(j) = S(j)^{A} \cdot S(j)^{B} \cdot S(j)^{C} $$
(24)

When learners use the system more than once, their reading comprehension can be evaluated and updated by applying Eq. (24) iteratively.
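The three components of Eq. (24) can be sketched as one function. The empirical parameters ξ and ζ, the restriction of S(j)^B and S(j)^C to the texts satisfying Eqs. (20) and (22), the empty-band fallback, and the per-text data are all assumptions of this sketch.

```python
XI, ZETA = 1.5, 0.7   # empirical parameters: xi > 1, zeta in [0.5, 1]

def weighted_rate(texts, m):
    # Readability-weighted correct rate in the style of Eq. (19);
    # falls back to a neutral 1.0 when the band is empty (assumption).
    return sum(R * t / q for R, t, q in texts) / m if texts else 1.0

def comprehension(R, T, Q):
    m = len(R)
    hard_cut = 0.5 * (max(R) + sum(R) / m)   # Eq. (20)
    easy_cut = 0.5 * (min(R) + sum(R) / m)   # Eq. (22)
    texts = list(zip(R, T, Q))
    s_a = weighted_rate(texts, m)                                          # Eq. (19)
    s_b = XI * weighted_rate([x for x in texts if x[0] > hard_cut], m)     # Eq. (21)
    s_c = ZETA * weighted_rate([x for x in texts if x[0] < easy_cut], m)   # Eq. (23)
    return s_a * s_b * s_c                                                 # Eq. (24)

# Hypothetical data: three texts of increasing readability, 10 questions each.
S = comprehension(R=[1.0, 2.0, 4.0], T=[9, 7, 6], Q=[10, 10, 10])
```

Multiplying the three components means a learner must do well across the board, and especially on hard texts, to score highly.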

However, similarly to function (16), the function above only considers the symmetric case in which all reading materials are in the same round. Function (19) can be modified to broaden the system’s scope of application by using the number of documents that learner j has actually read instead of the total number of texts. The details of reading comprehension evaluation are shown in its pseudo code (Fig. 4):

Fig. 4
figure 4

Pseudo code of reading comprehension evaluation

where t 1 is the lower bound of readability and t 2 is the upper bound of readability. By introducing the parameter ρ, the variance of correct rates between different reading materials is considered in the reading comprehension evaluation.

4.4 Overview of Online-Boost

As the former analysis shows, updating readability requires the results of reading comprehension evaluation, and evaluating language skills requires the readability. Therefore, a gradation strategy must be designed to prevent the system from falling into an infinite loop.

As described above, the initial readability of the reading materials is given by the modified Smoothed Unigram model. The initial readability can be used for the first-step reading comprehension evaluation, and the evaluated reading comprehension can then be used for the first-step readability update. Since their language skills are initially unknown, readers choose the level of their first reading text themselves. In this way, a cross-iterative mechanism serves as the final integration strategy. The algorithm that jointly handles readability updating and reading comprehension evaluation is called Online-Boost in this paper. Its operation sequence is shown in Fig. 5.

Fig. 5
figure 5

Operation sequences of Online-Boost

Following the steps above, the online language learning system can update documents’ readability and evaluate users’ reading comprehension in real time. The comprehensive weighting method guarantees the system’s accuracy. Furthermore, the efficiency of the system is enhanced significantly by omitting the procedures of feature selection and feature computation.

5 Preparing for application

The Online-Boost algorithm constructed in the previous section is the core of the online foreign language learning system. However, the algorithm alone is not enough to build the system, because several practical conditions are still missing.

5.1 Scalability

The reading materials of the system should be dynamic: system managers should be able to add documents after the system goes online. Constantly updated reading resources increase the attractiveness of the system and allow it to meet the needs of more users with different interests. Therefore, the system should be highly scalable with respect to new texts.

To achieve scalability, we create dedicated tables in the database for storing new reading materials. When new documents are entered, they are stored in this dedicated database space. The modified Smoothed Unigram model is then applied to these new texts to compute their initial readability. After the initial readability sorting, the new reading materials are treated as documents of known difficulty, so they are stored and updated together with the other texts, and the dedicated tables are cleared to receive new documents. The detailed steps are as follows:

  1. The database is divided into two parts: D1 for storing documents whose readability is unknown and D2 for storing documents whose readability is known.

  2. Input a new reading material set R1 and store it in D1.

  3. Call the modified Smoothed Unigram model to calculate the initial readabilities of R1.

  4. Dump all the documents in R1 together with their initial readabilities into D2 and empty D1.

  5. Use the texts in D2 as learning materials and keep D1 ready to receive new documents.
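The five steps above can be sketched as follows. The in-memory dictionaries standing in for the database tables and the `score_readability` placeholder (mean word length instead of the modified Smoothed Unigram model) are assumptions for illustration only.

```python
def score_readability(text):
    # Placeholder for the modified Smoothed Unigram model: here we use
    # mean word length as a crude stand-in difficulty score.
    words = text.split()
    return sum(len(w) for w in words) / max(len(words), 1)

def ingest(db, new_docs):
    """Move new documents through D1 (readability unknown) into D2 (known)."""
    db["D1"].extend(new_docs)                   # step 2: store new docs in D1
    for doc in db["D1"]:                        # step 3: initial scoring
        db["D2"][doc] = score_readability(doc)  # step 4: dump into D2 ...
    db["D1"].clear()                            # ... and empty D1 (step 5)
    return db

db = {"D1": [], "D2": {}}                       # step 1: two storage areas
ingest(db, ["a short easy text", "considerably sophisticated terminology"])
```

After ingestion, D1 is empty and ready for the next batch, while every document in D2 carries a difficulty estimate and participates in the normal update cycle.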

5.2 Concurrency control mechanism

As an online system that may face many users simultaneously, it must be capable of concurrency control. In the online learning system, the most important concurrent scenario is several users submitting answers for the same document at the same time. Unlike the classic online ticketing problem, in which only one user's submission can be accepted, the language learning system should accept all results submitted at the same time by all users in order to draw a comprehensive conclusion about the document's readability.

Many mechanisms can be used for concurrency control in different scenarios, such as lock-based protocols, timestamp-based protocols, validation-based mechanisms and multi-version concurrency control (MVCC) [41]. The characteristics of database transactions in the online foreign language learning system are as follows:

  1. Read operations have a higher requirement on response speed than write operations.

  2. Read operations are more frequent than write operations.

  3. Write conflicts are infrequent because many reading materials of the same difficulty grade are available.

In a word, the system has high read concurrency, strict read-response requirements and low write concurrency. Therefore, MVCC should be used to improve its efficiency.
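To make this choice concrete, the following toy sketch shows the multi-version idea: readers never block because every read returns an already committed version, while each write appends a new version under a brief lock. The class and method names are our own; production databases implement MVCC far more elaborately.

```python
import threading

class MVCCStore:
    """Toy multi-version store: lock-free snapshot reads, versioned writes."""

    def __init__(self):
        self._versions = {}           # key -> list of (txn_id, value)
        self._lock = threading.Lock()
        self._txn = 0

    def write(self, key, value):
        with self._lock:              # brief lock only to append a version
            self._txn += 1
            self._versions.setdefault(key, []).append((self._txn, value))
            return self._txn

    def read(self, key, snapshot=None):
        # Return the newest version visible at `snapshot`; never blocks.
        for txn, value in reversed(self._versions.get(key, [])):
            if snapshot is None or txn <= snapshot:
                return value
        return None

store = MVCCStore()
t1 = store.write("doc42_readability", 0.61)
store.write("doc42_readability", 0.65)
# A reader holding snapshot t1 still sees the old value;
# a fresh read sees the newest committed value.
```

This matches the transaction profile listed above: the many concurrent readers pay no locking cost, and the rare writes conflict only on the same key.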

5.3 Anonymous access

Registration is needed to record and update users' reading comprehension and to update readability. However, the registration process may make potential users of the system lose patience. Anonymous access, on the other hand, may introduce serious interference such as malicious submission of random answers and repeated tests on the same documents. The former falsely increases the estimated difficulty of reading materials, while the latter leads to over-estimation of readability.

To solve the problems caused by anonymous access described above, a differentiated treatment strategy can be used.
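The text does not spell the strategy out, so the following sketch is purely our assumption of what differentiated treatment could look like: anonymous submissions carry reduced weight in the readability update, and repeated attempts on the same document are ignored.

```python
def submission_weight(user_id, doc_id, seen):
    """Hypothetical weight a submission carries in the readability update.

    user_id : None for anonymous visitors, otherwise a registered user id
    seen    : set of (user_id, doc_id) pairs already counted this session
    """
    key = (user_id, doc_id)
    if key in seen:
        return 0.0          # repeat test: do not count it again
    seen.add(key)
    if user_id is None:
        return 0.2          # anonymous: reduced influence on readability
    return 1.0              # registered user: full weight
```

Under this policy, random answers from anonymous visitors can only mildly distort a document's difficulty estimate, and repeated tests cannot inflate its apparent readability.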

6 Experiment

We deployed the novel system for foreign language (English) education in The Second Junior High School of Pingluo. Fifty-three students used the new system during one semester. We adopted two methods to evaluate the effect of the novel system. The first is a satisfaction survey: each student rated the effect of the system on one of six levels: very helpful, helpful, a little helpful, hard to say, helpless and totally helpless. As the students' subjective impressions of the language learning system, the results are shown in Table 2.

Table 2 Results of effect survey

Changes in the academic performance of the students can be used as an objective criterion for the effect of the novel method and system. We chose six test papers from a standard examination paper library, graded into standard difficulty levels with two test papers at each level. The exams were taken before and after the use of the new system, four months apart. The changes in average score are shown in Table 3.

Table 3 Changes of average scores before and after the use of new system

The experimental results show that the system can significantly improve learners' language skills, and the students' impression of the learning system is also quite positive.

Moreover, we tested the efficiency of the novel method: when we simulated 100 users submitting their answers at the same time, the response delay of the system stayed below 1 s.

7 Conclusion

Readability is a crucial indicator for second language learning. In this article, a readability-based language learning system is constructed in full. It uses the modified Smoothed Unigram model for the initial readability classification of reading materials. After the system goes online, the correct rate of users' answers is used for readability updates and reading comprehension evaluation. The updating and evaluation are performed by the Online-Boost algorithm, which is the core of the system. Like AdaBoost, Online-Boost follows the Boosting methodology of forming a committee from a group of experts. Unlike the members of the AdaBoost algorithm family, however, Online-Boost uses language learners as its base classifiers. In this way, the time- and computation-intensive feature selection procedure is avoided, and the classification results agree better with users' subjective experience. In addition, in traditional Boosting-based categorization algorithms it does not matter whether a weak classifier has high classification ability; users only care about the final classification result. In Online-Boost, by contrast, because the users themselves are the weak classifiers, their classification ability (reading comprehension) is obtained as a by-product at no extra cost. Moreover, the system is designed with scalability in mind. Since the system was used in a junior high school for English learning, its effect on foreign language study could be evaluated in a real teaching environment. The experimental results reveal that the method proposed in this article helps improve language learners' reading comprehension.