Keywords

1 Introduction

Language is the most convenient means of human communication. As economic globalization deepens, contact between people grows ever closer, yet people from different countries and regions still find it difficult to communicate with each other. Enabling efficient language learning is therefore an important part of modern education. Complete language learning covers four skills: listening, speaking, reading and writing. Of these, the teaching of “speaking”, which relies on one-to-one oral instruction, has long been a weak point of teaching all over the world [1]. In addition, oral tests, whether traditional interviews or the more recent machine-administered tests, suffer from heavy and tedious scoring workloads and from subjective differences between evaluators [2]. It is therefore urgent to improve the evaluation of pronunciation quality. This paper proposes an evaluation model of English continuous pronunciation teaching quality based on a cloud computing platform. It introduces cloud computing technology, constructs the cloud computing system, sets up an evaluation framework for English continuous pronunciation teaching quality, and realizes the evaluation model under cloud computing.

2 Related Work

Scholars have carried out a good deal of related research. Reference [3] proposes a holographic mobile application that helps Spanish-speaking children practice the pronunciation of basic English vocabulary. To stimulate learning motivation and improve the practice experience, the application uses multi-channel stimulation (sound, image and interaction). One experimental group used the mobile application without holographic games, and the other used it with holographic games. Performance evaluation, a satisfaction survey and emotion analysis were conducted before and after the test. The results show that the holographic mobile application has a significant impact on children’s motivation and also improves their performance compared with the traditional methods used in the classroom. Reference [4] proposes that differences in vowel duration should be considered from the perspective of pronunciation evaluation and stress. Based on evaluations of stress and pronunciation, it investigates how Korean primary school students produce word and phrase stress and how their vowel durations differ, analyzing word and phrase stress in terms of vowel length. Compared with the low-level group, the high-level group pronounced stressed syllables significantly longer than unstressed syllables. A comparison of stress positions shows that a stressed vowel in the second syllable takes much longer to pronounce in all pronunciation evaluation groups. Across the evaluation, stressed syllables were produced with much longer durations than unstressed syllables.
The results show that segmental and suprasegmental teaching should be carried out at the same time during instruction.

3 Overall Design and Research of Teaching Quality Evaluation Model for Continuous Pronunciation

The English continuous pronunciation teaching quality evaluation system is mainly divided into an automatic speech recognition part and a pronunciation quality evaluation part.

3.1 Design of Automatic Continuous Speech Recognition Technology

Automatic continuous speech recognition is the process by which a computer converts speech into transcribed text, and it is an important means for the computer to “understand” speech. Although large-vocabulary continuous speech recognition has not yet reached a practical level, recognition is relatively easy in pronunciation quality evaluation, because the student reads a known text aloud. The main task of speech recognition here is to align the student’s pronunciation with the target text. Some CALL (computer-assisted language learning) systems also use speech recognition to detect abnormal reading such as addition, omission and readback. As shown in Fig. 1, the system first generates recognition information from the given text, then performs Viterbi decoding with a pre-trained acoustic model, and finally outputs the recognition results.

Fig. 1. Automatic speech recognition model
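The alignment step in Fig. 1 can be illustrated with a minimal sketch. It assumes that the reference text has already been expanded into a fixed left-to-right sequence of units and that a per-frame log-likelihood matrix is available from some acoustic model. The function name `forced_align` and the toy matrix are hypothetical and are not the decoder used in this system.

```python
import math


def forced_align(loglik, n_units):
    """Align T frames to n_units units that must occur in a fixed order.

    loglik[t][k] is an assumed acoustic log-likelihood of frame t under
    unit k. Every unit covers at least one frame.
    Returns (best total log-likelihood, unit index for each frame).
    """
    n_frames = len(loglik)
    if n_frames < n_units:
        raise ValueError("need at least one frame per unit")
    neg_inf = -math.inf
    # best[t][k]: score of the best path that ends at frame t in unit k
    best = [[neg_inf] * n_units for _ in range(n_frames)]
    back = [[0] * n_units for _ in range(n_frames)]
    best[0][0] = loglik[0][0]
    for t in range(1, n_frames):
        for k in range(n_units):
            stay = best[t - 1][k]                      # remain in unit k
            move = best[t - 1][k - 1] if k > 0 else neg_inf  # enter unit k
            if move > stay:
                best[t][k] = move + loglik[t][k]
                back[t][k] = k - 1
            else:
                best[t][k] = stay + loglik[t][k]
                back[t][k] = k
    # Trace back from the last unit at the last frame.
    path = [n_units - 1]
    for t in range(n_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return best[n_frames - 1][n_units - 1], path


# Toy example: four frames, two units; frames 0-1 fit unit 0, frames 2-3 fit unit 1.
toy = [[0.0, -5.0], [0.0, -5.0], [-5.0, 0.0], [-5.0, 0.0]]
score, path = forced_align(toy, 2)
```

The returned path gives the time segmentation (which frames belong to which unit), and the score is the total log-likelihood of that segmentation.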

A main characteristic of the speech signal is its short-time average energy. The short-time average energy \(E\) is the sum of squares of all sample values in one frame of the speech signal.

$$ E = \sum\limits_{n = 1}^{N} {x\left( n \right)}^{2} $$
(1)
$$ E\left( {\varepsilon_{i} } \right) = 0 $$
(2)

\(x\left( n \right)\): the \(n\)-th sample point of the signal in the frame.

\(N\): the total number of samples in the frame.
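Equation (1) can be sketched in a few lines. The framing parameters `frame_len` and `hop` are assumptions introduced for illustration; the text does not specify frame length or overlap.

```python
def short_time_energy(samples, frame_len, hop):
    """Return the energy of each frame: the sum of squared sample values.

    frame_len is the number of samples N per frame and hop is the shift
    between consecutive frames (both are illustrative parameters).
    """
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame))
    return energies


# Two non-overlapping frames of two samples each.
energies = short_time_energy([1, 2, 3, 4], frame_len=2, hop=2)
```

With a hop smaller than the frame length the frames overlap, which is a common choice in speech processing but is not stated in the source.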

3.2 Design of English Continuous Pronunciation Teaching Quality Evaluation

After getting the result of speech recognition, we can extract the corresponding scoring features that can describe the quality of students’ pronunciation, and then compute the machine scores, as shown in Fig. 2. The scoring model is trained on a dataset with manual scores.

Fig. 2. Pronunciation quality evaluation model
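As an illustration of training a scoring model on manually scored data, the sketch below fits a one-feature linear model by least squares. The choice of a linear model, the single feature and the toy numbers are all assumptions; the source does not state which form the scoring model takes.

```python
def fit_linear(features, human_scores):
    """Least-squares fit of human_score ~ slope * feature + intercept."""
    n = len(features)
    mean_x = sum(features) / n
    mean_y = sum(human_scores) / n
    sxx = sum((x - mean_x) ** 2 for x in features)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(features, human_scores))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, feature):
    """Map a scoring feature to a machine score with the fitted model."""
    slope, intercept = model
    return slope * feature + intercept


# Made-up training pairs (feature value, manual score), for illustration only.
model = fit_linear([1.0, 2.0, 3.0], [3.0, 5.0, 7.0])
machine_score = predict(model, 4.0)
```

In practice several scoring features would be combined, so a multivariate model would be needed; the one-feature case only shows the idea of fitting machine scores to manual scores.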

Teachers’ evaluation of pronunciation quality examines how standard, how fluent and how complete the pronunciation is. The computer should therefore, like the teacher, compute from the input speech and text separate measures of standardness, fluency and completeness (i.e. the scoring features), and then give the machine score after a comprehensive judgment based on prior knowledge (the scoring model). In current evaluation research, the frame-normalized log posterior probability is commonly used to measure how standard a student’s pronunciation is, while speech rate and duration scores are commonly used to measure fluency. If a recognition network or language model is used, the recognition results need to be compared with the reference text in order to extract scoring features that describe fluency or completeness, such as addition, omission and readback [5].
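The comparison between recognized text and reference text can be sketched with a word-level alignment. The example below uses Python's standard `difflib.SequenceMatcher` as a stand-in for whatever alignment the system uses; the function names and the treatment of a repeated word as an addition are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def reading_errors(reference, recognized):
    """Count omitted, added and substituted words relative to the reference."""
    ref = reference.lower().split()
    hyp = recognized.lower().split()
    counts = {"omission": 0, "addition": 0, "substitution": 0}
    matcher = SequenceMatcher(a=ref, b=hyp, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "delete":        # words in the reference that were not read
            counts["omission"] += i2 - i1
        elif tag == "insert":      # extra words, including repeated ones
            counts["addition"] += j2 - j1
        elif tag == "replace":     # words read as something else
            counts["substitution"] += max(i2 - i1, j2 - j1)
    return counts


def speech_rate(n_words, duration_seconds):
    """Words per second, one simple fluency measure."""
    return n_words / duration_seconds


errors = reading_errors("the cat sat on the mat", "the cat sat sat on mat")
rate = speech_rate(6, 3.0)
```

Here the repeated "sat" is counted as an addition and the missing "the" as an omission. A real system would work on phoneme or word lattices with time stamps instead of plain strings.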

3.3 Construction of English Continuous Speech Teaching Quality Evaluation Model Under Cloud Computing

In order to construct the evaluation model of English continuous pronunciation teaching under cloud computing, cloud computing technology is introduced to change the traditional statistical method and set up the evaluation framework.

3.3.1 Introduction of Cloud Computing

The English continuous pronunciation teaching quality evaluation model based on cloud computing is a computing system for evaluating the quality of English continuous pronunciation teaching. To improve the accuracy of the computed data, cloud computing technology is introduced. The initial parameters of continuous speech evaluation are obtained and used to construct a set of interrelated data. Cloud computing technology then performs the data processing tasks and outputs the results according to the quality evaluation algorithm for continuous pronunciation teaching under cloud computing.

The evaluation model of English continuous pronunciation teaching is built mainly on a cloud computing platform. Cloud computing is a product of the development and convergence of traditional computer and network technologies such as utility computing, parallel computing, distributed computing, virtualization, network storage and load balancing [6]. Its goal is to offer “computing power” as a common infrastructure, just like the power, water and financial systems. Cloud computing services fall into three main categories: infrastructure as a service (IaaS), platform as a service (PaaS) and software as a service (SaaS). With infrastructure services, users obtain computer infrastructure over the network and deploy and run various software on it, such as operating systems and application programs. With platform services, the platform includes the operating system, programming language environment, database and web server [7]. Users deploy their applications on the platform, which provides hosting services for them. Users cannot manage the platform’s infrastructure, but they can set the amount of some of the underlying resources being used. A software service is a model for providing software over a network. Users do not need to develop their own software; instead they rent web-based software from the service provider. Cloud providers install and run applications on the cloud, and users access the software through cloud clients (often browsers). Users cannot manage the underlying infrastructure, but they can configure applications in a limited way [8, 9]. The cloud computing architecture is shown in Fig. 3.

Fig. 3. Cloud computing architecture

3.3.2 Framework of the Evaluation Model

Figure 4 illustrates the basic structure of the English continuous pronunciation teaching quality evaluation system: the original speech is converted into a digital signal through the microphone, passed through the speech processing module, and then sent to the recognition module, which outputs the recognized text.

Natural speech is difficult to recognize because speakers may not rigorously adhere to grammatical structures, and word reversals and redundancies may occur [10, 11]. It is therefore necessary to establish a blank model for some non-phonetic units in continuous speech and to discard these units during recognition to improve accuracy.

Fig. 4. Framework of the evaluation model

3.3.3 Determination of Evaluation Algorithm

The evaluation algorithm is the core program of English continuous pronunciation teaching quality evaluation and analysis under cloud computing. Once the evaluation algorithm is established, the evaluation model of English continuous pronunciation teaching quality is embedded.

First, acoustic features are extracted from the student’s speech. The basic models of standard speech are then assembled into a forced alignment network according to the script that the student is required to read, and the feature sequence is input into this network for forced alignment. The output is the time segmentation of the actual speech and the log-likelihood of each phoneme. The segmentation result \(X\) is matched against both the standard pronunciation model and the model in this paper. The log likelihood ratio score of the feature sequence \(X\) under cloud computing is defined as follows:

$$ V\left( X \right) = \log p\left( X \mid \lambda_{m} \right) - \log p\left( X \mid \lambda_{n} \right) $$
(3)

In this formula, \(\lambda_{m}\) is the standard pronunciation model of the speech to be graded and \(\lambda_{n}\) is the model of this paper. The larger the log likelihood ratio \(V\left( X \right)\), the closer \(X\) is to \(\lambda_{m}\).
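The log likelihood ratio can be illustrated with a minimal sketch in which both models are single one-dimensional Gaussians. This is a deliberate simplification made for illustration: the text implies mixture models over feature vectors, and the function names and the (mean, variance) parameter values below are hypothetical.

```python
import math


def gaussian_loglik(frames, mean, var):
    """Total log-likelihood of 1-D feature values under N(mean, var)."""
    return sum(
        -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)
        for x in frames
    )


def llr_score(frames, model_m, model_n):
    """V(X) = log p(X | model_m) - log p(X | model_n)."""
    return gaussian_loglik(frames, *model_m) - gaussian_loglik(frames, *model_n)


# Hypothetical (mean, variance) pairs standing in for the two models.
standard_model = (0.0, 1.0)
competing_model = (3.0, 1.0)
close_to_standard = llr_score([0.0, 0.1], standard_model, competing_model)
close_to_competing = llr_score([3.0, 2.9], standard_model, competing_model)
```

Features near the standard model give a positive score and features near the competing model give a negative one, which matches the reading that a larger \(V\left( X \right)\) means \(X\) is closer to \(\lambda_{m}\).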

Introducing the cloud computing model into the speech scoring calculation can reduce the influence of various interfering factors, which is of great importance to pronunciation quality evaluation. The cloud computing model can specifically represent incorrect recognition results that are easily confused with the standard pronunciation model [12, 13]. This is because the evaluation model under cloud computing represents a large spatial distribution: when the model evaluates a feature vector, only a small number of mixture components contribute to the final likelihood value [14, 15]. These few components are the ones that are easily confused with the standard pronunciation model.

3.3.4 Determination of Maximum Likelihood Parameter

The purpose of maximum likelihood estimation is to find the appropriate model parameter \(P\) that maximizes the likelihood function of the model given the training vector set [16, 17]. Assuming that the available training vector set is \(Y = \left\{ {Y_{1} ,Y_{2} ,Y_{3} , \cdots ,Y_{n} } \right\}\) and that the training vectors are independent, the likelihood function is as follows:

$$ P\left( Y \mid \lambda \right) = \prod\limits_{i = 1}^{n} p\left( Y_{i} \mid \lambda \right) $$
(4)

An important property of maximum likelihood estimation is that the model estimate converges to the true model parameters only when there are enough training feature vectors [18]. However, the likelihood of the model has no closed-form solution, so an iterative method, the expectation maximization (EM) algorithm, is needed to solve for the model parameters [19]. The quantity \(W\) is obtained by introducing the maximum likelihood function:

$$ W = \sum\limits_{y = 1}^{Y} p\left( Y \mid \lambda \right) \log p\left( y \mid \lambda \right) $$
(5)

Finally, the maximum likelihood parameter \(P_{n}\) is solved:

$$ P_{n} = \sum\limits_{y = 1}^{Y} {W\left( {X_{n} Y_{n} } \right)} $$
(6)

Using the above method to obtain the parameter model of each phoneme not only reduces the amount of training data required, but also reduces the amount of computation needed for the log likelihood ratio. In practical calculation, therefore, this method is often used to obtain the maximum likelihood parameters.
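The iterative estimation can be sketched for a one-dimensional Gaussian mixture. This is a textbook EM loop under stated assumptions (scalar features, a fixed number of components, a variance floor, hand-picked starting values); it is not claimed to be the exact procedure or parameterization used in this paper, and all names are hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


def log_likelihood(data, weights, means, variances):
    """Log-likelihood of independent data points under the mixture."""
    return sum(
        math.log(sum(w * normal_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)))
        for x in data
    )


def em_step(data, weights, means, variances, var_floor=1e-3):
    """One EM iteration: responsibilities (E-step), then re-estimation (M-step)."""
    resp = []
    for x in data:
        joint = [w * normal_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
        total = sum(joint)
        resp.append([j / total for j in joint])
    new_w, new_m, new_v = [], [], []
    for c in range(len(weights)):
        n_c = sum(r[c] for r in resp)  # soft count of component c
        mean = sum(r[c] * x for r, x in zip(resp, data)) / n_c
        var = sum(r[c] * (x - mean) ** 2 for r, x in zip(resp, data)) / n_c
        new_w.append(n_c / len(data))
        new_m.append(mean)
        new_v.append(max(var, var_floor))  # floor keeps the variance positive
    return new_w, new_m, new_v


def fit_gmm(data, weights, means, variances, n_iter=50):
    for _ in range(n_iter):
        weights, means, variances = em_step(data, weights, means, variances)
    return weights, means, variances


# Toy data with two clear clusters; the starting values are arbitrary.
data = [-0.2, 0.0, 0.2, 4.8, 5.0, 5.2]
init = ([0.5, 0.5], [1.0, 4.0], [1.0, 1.0])
ll_before = log_likelihood(data, *init)
weights, means, variances = fit_gmm(data, *init)
ll_after = log_likelihood(data, weights, means, variances)
```

Each iteration does not decrease the log-likelihood, so comparing `ll_before` with `ll_after` is a simple sanity check on the loop.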

3.3.5 Evaluation Model Embedding

Obviously, a single measure cannot evaluate a student’s pronunciation quality comprehensively and objectively. The raw results of the scoring mechanism are rather abstract for students and do not match human perception, so the machine score is mapped to a coarser classification based on pronunciation expertise [20]. In this system, the score is divided into different levels, which is more in line with human perceptual habits and also gives a certain stability.
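Mapping a machine score to levels can be sketched as a threshold lookup. The level names and cut-off values below are hypothetical; the source does not give the actual levels or thresholds.

```python
from bisect import bisect_right

# Hypothetical levels and cut-offs on a 0-100 scale, for illustration only.
LEVELS = ["D", "C", "B", "A"]
THRESHOLDS = [60, 75, 90]


def score_to_level(score):
    """Return the level whose score range contains the given machine score."""
    return LEVELS[bisect_right(THRESHOLDS, score)]


levels = [score_to_level(s) for s in (59.9, 60, 75, 90, 100)]
```

Because small changes in the raw score usually stay within one level, the reported level changes less often than the score itself, which is one way to read the stability mentioned above.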

The English continuous pronunciation teaching quality evaluation model under cloud computing is built on the evaluation algorithm and generates files from the results of the calculation. The embedding process is shown in Fig. 5.

Fig. 5. Evaluation model embedding process

With the cloud computing platform introduced, the evaluation model of English continuous pronunciation teaching quality under cloud computing is established by determining the evaluation algorithm and embedding the evaluation model.

4 Analysis of Experimental Results

In order to verify the validity and feasibility of the evaluation model of English continuous pronunciation teaching under cloud computing, English courses are selected for experimental analysis.

4.1 Experiment Object

The subjects were randomly selected from five classes of English majors at a university, with 120 students in each class and 600 students in total. Taking the English course of each class as a reference, the teaching quality of English continuous pronunciation was evaluated.

Through the evaluation model of English continuous pronunciation teaching quality, the network course of college English is established, and the evaluation of pronunciation teaching quality is carried out.

The selected 600 students were randomly divided into three groups, and three students, A, B and C, were selected as the leaders of the three groups. Each group leader is mainly responsible for supervising the overall learning situation of the group. When teachers answer questions, they judge the students’ learning situation in real time according to the situation of each group. Because students at different teaching stages have different tasks, they encounter various difficulties in learning, and because each student’s learning ability and foundation differ, their ability to solve problems in learning is not the same.

4.2 Case Results and Analysis

In order to verify the effectiveness of the evaluation model of English continuous pronunciation teaching under cloud computing, a comprehensive survey of English majors was conducted. According to the survey results, the group led by A is set as Group A, the group led by B is set as Group B, and the group led by C is set as Group C. The results of the survey include monthly examination, mid-term examination and final examination. Different groups of students have different perceptions of the four independent factors: ideological orientation, interactive education, public opinion rendering, and effect monitoring. The comparison of the improvement rates of the continuous pronunciation teaching in the three groups is shown in Fig. 6:

Fig. 6. Comparison of improvement rate of continuous pronunciation teaching in three groups

As can be seen from Fig. 6, the survey covers freshmen in Class 1 of English majors. The results show that students’ scores in continuous pronunciation teaching have improved. Across the different exams, the students in Group A, led by student A, showed the most significant improvement in English scores.

On the basis of cloud computing, the survey data show that the undergraduates of the school have a high level of awareness of the four factors of the self-media platform. By training English majors in continuous pronunciation, the effect of the four factors of the self-media platform on English education is studied. The improvement in the teaching quality of continuous pronunciation brought by the talent training path platform is shown in Fig. 7:

Fig. 7. Effect of the teaching quality of English continuous pronunciation on the efficiency of performance improvement

As can be seen from Fig. 7, overall performance improvement shows an upward trend as the running time of the talent training path for English majors increases. As the running time of the self-media platform increases, changes in the platform’s independent variables significantly improve the running efficiency of English majors, and the independent variables are positively correlated with the operating efficiency. This shows that the teaching quality of English continuous pronunciation can improve the efficiency of performance improvement through college students’ orientation toward English professional knowledge. On the whole, it is positively correlated with the dependent variables, but its degree of influence is weaker than that of the independent variables.

5 Conclusion

This paper proposes a cloud-based evaluation model of English continuous pronunciation teaching quality. The evaluation system is mainly divided into an automatic speech recognition part and a pronunciation quality evaluation part. Based on the analysis and discussion of the two parts, this paper constructs the framework of the evaluation model, determines the evaluation algorithm and the maximum likelihood parameters, and establishes the cloud-based evaluation model by embedding the evaluation algorithm, so that the teaching quality of English continuous pronunciation can be improved. It is hoped that this study can provide a theoretical basis for the systematic analysis of English continuous pronunciation teaching quality assessment.

Fund Projects

  1. A Study on the Development and Utilization of Student-centered College English Curriculum Resources in the 12th Five-Year Plan of the Chinese Institute of Education (0106129-DX21).

  2. A Study on the Construction of College English Microcourse Resources and the Flipping Classroom Teaching Practice in the Construction of Digital Foreign Language Teaching Resources by Higher Education Press in 2017.

  3. A Book on College English Application Ability Cultivation and Special training in the Textbook Construction Project of Wuhan Institute of Design and Sciences (JC201801).