1 Introduction to AAC systems in English language teaching

AAC techniques supplement or replace speech or writing for people with impaired English language development (Caron et al., 2021). AAC is used by individuals with a range of speech and language disabilities, including chronic conditions such as neurological disorders, cognitive disabilities, and autism (Huifeng et al., 2020). AAC covers all forms of communication that can help people with little or no functional speech to interact with others (Su et al., 2015).

AAC encompasses a broad range of techniques, including technological tools that support the independence of people with specific communication needs (Manogaran et al., 2020). Augmentative and alternative communication raises awareness and improves overall teaching based on personal and human-computer interaction (Rodríguez et al., 2020). Many pupils with learning and speech impairments in English can communicate through intelligent devices that connect people and machines (Dash et al., 2021).

Alternative communication refers to a system of interaction for those who cannot speak orally (Soh et al., 2020). If instructors can communicate verbally, they use signs to interact with students; this is augmentative communication (Khamparia et al., 2020). In the first case, a student may not want to communicate because of selective mutism caused by anxiety, even though the student can produce English speech or sounds (Gheisari et al., 2021). If the same student cannot speak at all, perhaps because of an injury at birth or later, AAC substitutes for spoken English (Sedik et al., 2020). The alternative approach uses educational techniques to replace oral speech (Hu et al., 2018). If instructors can speak, they use verbal interaction to communicate with students and exchange information (Amudha, 2021).

Augmentative and alternative communication in the school environment is essential (Nguyen et al., 2021). Almost 80% of children aged 3 to 17 years experience some form of speech disorder (Gao et al., 2020). A trusted and effective adult needs to reassure these students that nothing is wrong with them (Samuel et al., 2020). Students who feel anxious because of a speech impediment or their vocal signals may avoid learning essential communication abilities (Amudha et al., 2019). It is well established that good communication in English is one of the essential soft skills in any sector (Kuthadi et al., 2021). Augmentative communication is necessary to ensure that a child communicates in the best possible way without developing psychological distress (Zhou et al., 2020).

Augmentative and alternative communication can be a time-consuming process in the education sector (Light et al., 2019). Fortunately, educational technology has made the AAC process fast and uncomplicated. Educators have previously been compelled to rely on artistic interaction and symbolic structures (Abrahamson et al., 2019). With educational technology, educators now have more possibilities for reaching their learners in the English language (Kurilovas & Kubilinskiene, 2020), and several educational institutions are far more aware of this and therefore provide educational and financial support. Educational institutions shape education in several ways: they organize the social process of education and encourage the spirit of competition and the formation of social identity. With solid lecture plans and policies in place, it is straightforward to offer efficient integrated courses (Kumar et al., 2020). Whenever an augmentative technique is used, it is almost always used for academic purposes, and speech-generating devices can often be incorporated effectively in the classroom alongside many other options. The main contributions of the proposed Big Data Integrated Artificial Intelligence for AAC (BDIAI-AAC) are as follows.

  • BDIAI-AAC is a voice recognition model trained on an interactive video network. The trained network operates on three layers of Artificial Intelligence (AI).

  • The input layer is a speech recognition model that transforms speech spoken by a teacher into a string. The hidden layer matches the string data to the corresponding video graphics. Finally, the output layer displays the animated version together with an AAC phrase.

The remainder of this article is organized as follows: Sect. 2 reviews background studies on Augmentative and Alternative Communication systems in the education field. Section 3 elaborates the proposed BDIAI-AAC for training individuals with neurological disorders in the English language. Section 4 presents the results that validate the performance, with corresponding descriptions. Finally, the conclusion and future perspectives are discussed in Sect. 5.

2 Background study on the AAC system in the education field

This section discusses several works carried out by various researchers. Rosanna Yuen-Yan Chan et al. (Chan et al., 2019) developed a Context-Aware Augmentative and Alternative Communication System (CAACS). To meet children's English language and learning requirements, CAACS enables daily interaction for non-verbal children with intellectual disability (ID). The contribution is two-fold: first, the analytical response is implemented within the AAC field and demonstrates its efficacy for ID users; secondly, CAACS opens several new possibilities for mentally challenged users by turning traditionally self-contained AAC devices into a subsystem and enabling Internet-of-Things usability through a system-of-systems (SoS) framework.

Rashed Aldabas et al. (Aldabas, 2019) proposed Special Education Teachers' Perspectives (SETP). SETP examined special education teachers' viewpoints concerning barriers to and facilitators of using AAC with students with multiple disabilities in Riyadh (Saudi Arabia). A sample of 172 teachers responded to a two-part survey on AAC barriers. The results show that school environmental factors present the most significant barriers, over and above teacher and learner factors. The findings point to successful methods for enhancing the use of AAC and improving students' learning experiences.

Jocelyn Mngomezulu et al. (Mngomezulu et al., 2019) discussed Zulu core vocabulary (ZCV). To inform vocabulary selection for peers using AAC, ZCV aimed to identify the vocabulary most often and most commonly used by Zulu-speaking pre-school children. Interaction samples were collected from six Zulu-speaking participants without disabilities during regular pre-school activities. Both orthographic words and a morphological analysis of formatives were used in the evaluation. Zulu's linguistic and orthographic composition made a developmental approach more helpful for analyzing core vocabulary.

Susheel Joginder Singh et al. (Joginder Singh et al., 2020) introduced the Malaysian teachers' experience (MTE) study. MTE employed a two-phase mixed design to investigate Malaysian teachers' use, knowledge, and perceptions of AAC. Phase 1 involved 252 teachers who completed a survey to gather national information on their use and general perceptions of AAC. Phase 2 consisted of semi-structured interviews with 13 educators experienced in supporting students who use AAC. Around half of those who completed the survey were familiar with AAC and used it with their learners.

Jareen Meinzen-Derr et al. (Meinzen-Derr et al., 2019) discussed Enhancing Language (EL). The EL experiment's main aim was to evaluate whether integrating core-word linguistic communication techniques into a speech-language treatment scheme for young children who are deaf/hard of hearing (D/HH) enhances spoken vocabulary outcomes. The findings of EL indicated that AAC core-word display screens supplied through iPad technology helped school-age children with D/HH develop verbal language skills steadily and quickly.

Kerstin M. Tönsing et al. (Tönsing et al., 2019) introduced Multilingualism and Augmentative and Alternative Communication (MAAC). MAAC explains the self-reported communicative abilities of multilingual South African older children who use AAC and characterizes their vocabulary access and social interaction methods. Participants expressed a desire to improve their communicative language skills, citing obstacles related to communication technology and literacy skills.

Based on this survey, BDIAI-AAC is proposed to teach students English with Big Data Integrated Artificial Intelligence. BDIAI-AAC is a voice recognition model trained on an interactive video system.

3 Big data integrated artificial intelligence for AAC (BDIAI-AAC)

This paper discusses big data and artificial intelligence-based English speech recognition with AAC. Automatic speech recognition (ASR) represents significant progress for well-defined fields, such as voice processing, in a fairly controlled environment. This article introduces a method that incorporates several pronunciation trends for improved recognition of English. This integration is carried out by weighting the ASR system's responses under different language models. The weight of each response is obtained from Hidden Markov Models (HMMs).

Figure 1 shows big data-based English oral training. The key component of the proposed system is hidden Markov model (HMM)-based language recognition. To extend the Markov idea, hidden units are used to evaluate how state transitions develop, enhancing conventional Markov systems. HMM is exceedingly rich in mathematical structure and can form the theoretical framework for a wide variety of language recognition applications. As Fig. 1 shows, HMM can capture many details of students' and teachers' training sessions.

The entire English oral teaching system is administered and used by three groups of respondents: students, teachers, and administrators. The groups are aligned to levels of authentication in their individual sectors, and three types of ID verification and approval are available in the proposed system. ID verification resides in the respondent's system and operates through the machine-control mechanism; ID validation is mandatory to guarantee the respondent's access to learning information. To support future management of machine access, identification is the first step at the start of the learning system. Two major factors drive user demand: one is the need for information, and the other is the need for protection. The first applies to the information resources and support for a particular person and their content; the second refers to an identified user with a unique ID. Teachers, for example, are involved in building an online archive of questions, content libraries, evaluations and analyses, teaching guidelines, and answering questions. Administrator services, by contrast, do not cover the teaching system's learning content.

The training stage is bound to the entire English language education process and unified under user identity authentication management. The training stages cover the consumption of English material, the content of oral training, and the organization and consumption of the data. The user requirements apply to the entire oral English learning system; the proposed system is intended for general use rather than for special instructions or dedicated training systems. Student English language recognition proceeds in three steps: (i) the input layer speech recognition model, (ii) the hidden layer process, and (iii) the output layer-based displayed animated video.

Fig. 1

Big data-based English oral training
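To make the three-layer flow of Fig. 1 concrete, the following minimal Python sketch chains the three steps. Every function and file name here is illustrative rather than taken from the paper, which does not publish an implementation.

```python
# Illustrative sketch of the three-stage BDIAI-AAC pipeline described above.
# All names (recognize_speech, match_animation, render_output) are hypothetical.

def recognize_speech(audio_frames):
    """Input layer: decode teacher/student speech into a text string."""
    # In the paper this is an HMM-based ASR system (Sect. 3.1).
    return "hello class"  # placeholder transcript

def match_animation(transcript, animation_index):
    """Hidden layer: map the recognized string to stored video graphics."""
    return animation_index.get(transcript, animation_index["<default>"])

def render_output(animation, transcript):
    """Output layer: show the animated video together with the AAC phrase."""
    print(f"Playing '{animation}' with caption: {transcript}")

animation_index = {"hello class": "greeting.mp4", "<default>": "idle.mp4"}
text = recognize_speech(audio_frames=[])
render_output(match_animation(text, animation_index), text)
```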

3.1 Input layer speech recognition model

The phoneme sequences used to estimate the confusion matrix were derived from the phoneme output of an adapted ASR system, i.e., from phonetic transcripts of its word output. The performance of an ASR system is estimated by the following mathematical model in Eq. (1):

$$\widehat{Z}=\underset{Z}{\mathrm{arg\,max}}\,Q\,\left(R|Z\right)\,Q\left(Z\right)$$
(1)

As shown in Eq. (1), \(\widehat{Z}\) is the most probable ASR output (a sequence of vocabulary units), \(R\) is the acoustic observation, \(Q(Z)\) is the prior linguistic and pronunciation probability (the language model), and \(Q(R|Z)\) is the probability that the acoustic observation \(R\) was produced by the word sequence \(Z\) (the acoustic model). \(Q(Z)\) is usually computed by N-gram grammars, while \(Q(R|Z)\) is commonly modeled by HMMs or ANNs. For word recognition tasks, the ASR method computes the most probable sequence of phoneme (HMM) models representing the speech \(R\).
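As a minimal illustration of the decoding rule in Eq. (1), the sketch below scores a toy candidate list in log space; the candidate strings and log-probabilities are invented stand-ins for a real acoustic model \(Q(R|Z)\) and language model \(Q(Z)\).

```python
# Hedged sketch of Eq. (1): choose the word sequence Z maximizing
# Q(R|Z) * Q(Z). Working in log space avoids numerical underflow.

def decode(candidates, acoustic_logprob, language_logprob):
    return max(candidates,
               key=lambda z: acoustic_logprob(z) + language_logprob(z))

candidates = ["the cat", "the cap"]
am = {"the cat": -3.9, "the cap": -4.1}   # toy log Q(R|Z) scores
lm = {"the cat": -1.2, "the cap": -2.5}   # toy log Q(Z) scores
best = decode(candidates, am.get, lm.get)  # -> "the cat"
```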

Aligning two phoneme strings is used to estimate the phoneme confusion matrix. \(P\) is the (correct) phoneme transcript of the word sequence \(Z\) spoken by the speaker, and \(\tilde{p}_{i}^{*}\) is the \(i\)th phoneme of the sequence decoded by the ASR method.

For the language recognition task, the purpose of modeling the phoneme confusion matrix is to approximate \(Z\) from \(\tilde{P}^{*}\). This is expressed in Eq. (2):

$${Z}^{*}=\underset{Z}{\mathrm{arg\,max}}\prod_{i=1}^{N}QO\left({p}_{i}\right)\,QO\left({\tilde{p}}_{i}^{*}|{p}_{i}\right)$$
(2)

Figure 2 illustrates the proposed diagram. As described in Eq. (2), confusion matrix modeling approximates the most probable word sequence given the phonemes observed from a speaker. Here \(p_{i}\) is the \(i\)th phoneme of the reference sequence \(P\), and \(\tilde{p}_{i}^{*}\) is the \(i\)th phoneme of the decoded sequence \(\tilde{P}^{*}\) (of length \(M\)). The term \(QO\left(\tilde{p}_{i}^{*}|p_{i}\right)\) represents the likelihood of the phoneme \(p_{i}\) being recognized as \(\tilde{p}_{i}^{*}\) and is obtained from the speaker's confusion matrix. As described in Fig. 2, this aspect has been incorporated into the ASR mechanism. Initially, \(QO\left(\tilde{p}_{i}^{*}|p_{i}\right)\) is estimated from a decoded collection of voice and video during training. Then, at assessment time, \(\tilde{P}^{*}\) (now taken from test speech) is decoded into \(Z\)-sequences using the trained model. The correction is performed at the phonetic level, and a more precise approximation of \(Z\) is achieved by integrating a word-language model.

Fig. 2

Proposed diagram
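A small sketch of how the confusion matrix \(QO(\tilde{p}_{i}^{*}|p_{i})\) of Eq. (2) could be estimated and applied is given below. It assumes the reference and decoded phoneme strings are already aligned one-to-one (a real system would use an edit-distance alignment), and the phoneme priors are toy values.

```python
import math
from collections import Counter, defaultdict

def estimate_confusion(pairs):
    """Estimate QO(decoded | reference) from aligned phoneme pairs."""
    counts = defaultdict(Counter)
    for ref, hyp in pairs:                  # ref = p_i, hyp = decoded p~_i*
        counts[ref][hyp] += 1
    return {r: {h: n / sum(c.values()) for h, n in c.items()}
            for r, c in counts.items()}

def score(ref_phonemes, decoded, conf, prior):
    """Log of Eq. (2)'s product for one candidate phoneme sequence."""
    return sum(math.log(prior[p]) + math.log(conf[p].get(q, 1e-6))
               for p, q in zip(ref_phonemes, decoded))

pairs = [("b", "b"), ("b", "p"), ("p", "p"), ("b", "b")]
conf = estimate_confusion(pairs)            # conf["b"]["p"] == 1/3
prior = {"b": 0.5, "p": 0.5}                # toy priors QO(p)
print(score(["b", "p"], ["p", "p"], conf, prior))
```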

The integration of confusion matrices is treated as a mixing problem, analogous to the Gaussian mixture model used for HMM emission probabilities in Eq. (3):

$${A}_{i}\left({R}_{s}\right)={\sum }_{l=1}^{L}{D}_{il}\,M\left({R}_{s},{\mu }_{il},{\Sigma }_{il}\right)$$
(3)

As obtained in Eq. (3) and shown in Fig. 3, \(A_{i}\left(R_{s}\right)\) is the probability of the observation vector \(R_{s}\) being generated from HMM state \(i\). \(L\) is the number of mixture components, \(D_{il}\) is the weight of the \(l\)th mixture component, which satisfies \({\sum }_{l=1}^{L}{D}_{il}=1\), and \(M\left({R}_{s},{\mu }_{il},{\Sigma }_{il}\right)\) denotes a Gaussian density with mean vector \(\mu_{il}\) and covariance matrix \(\Sigma_{il}\) for state \(i\).

Fig. 3

Probability vector observation
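The Gaussian-mixture emission \(A_{i}(R_{s})\) of Eq. (3) can be computed as in the sketch below, assuming diagonal covariances; all parameter values are toys chosen only to show the shapes involved.

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    """Element-wise univariate Gaussian densities (diagonal covariance)."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

def emission_prob(x, weights, means, variances):
    """Eq. (3): weighted sum of diagonal-Gaussian component densities."""
    return sum(w * np.prod(gaussian_pdf(x, m, v))
               for w, m, v in zip(weights, means, variances))

x = np.array([0.2, -0.1])
weights = [0.6, 0.4]                        # D_il, summing to one
means = [np.zeros(2), np.array([1.0, -1.0])]
variances = [np.ones(2), np.full(2, 0.5)]
print(emission_prob(x, weights, means, variances))
```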

\(L\) is the number of confusion matrices in our solution, each collected under a different speech condition. \(D_{l}\) is the mixture weight of the \(l\)th confusion matrix, and \(QO{\left(\tilde{p}_{i}^{*}|p_{i}\right)}^{l}\) is the discrete probability distribution associated with it. Consequently, the cumulative phoneme confusion across several pronunciation trends is defined as follows in Eq. (4):

$$QO{\left({\tilde{p}}_{i}^{*}|{p}_{i}\right)}_{mix}={\sum }_{l=1}^{L}{D}_{l}\,QO{\left({\tilde{p}}_{i}^{*}|{p}_{i}\right)}^{l}$$
(4)

Equation (4) mixes the discrete confusion-matrix distributions at the phoneme level. Step 1 above obtains the student speech recognition results, which are then forwarded to the hidden layer.
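Mixing the per-condition confusion matrices as in Eq. (4) amounts to a convex combination. The sketch below reuses the nested-dict representation from the earlier estimation sketch; the accent matrices and weights are invented for illustration.

```python
def mix_confusions(matrices, weights):
    """Eq. (4): convex combination of confusion matrices; weights sum to 1."""
    mixed = {}
    for mat, w in zip(matrices, weights):
        for ref, row in mat.items():
            for hyp, p in row.items():
                mixed.setdefault(ref, {}).setdefault(hyp, 0.0)
                mixed[ref][hyp] += w * p
    return mixed

accent_a = {"b": {"b": 0.9, "p": 0.1}}      # toy per-condition matrices
accent_b = {"b": {"b": 0.6, "p": 0.4}}
mixed = mix_confusions([accent_a, accent_b], weights=[0.7, 0.3])
# mixed["b"]["p"] == 0.7 * 0.1 + 0.3 * 0.4 == 0.19
```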

3.2 Hidden layer process

The RNN input layer reflects the input sample, and the input data dimension equals the number of input cells. The inputs' characteristics are weighted and passed to the hidden level, where information from the current input and the previous hidden state is combined. Once the activation feature is enabled, the activation function and the hidden layer are joined: the hidden layer applies a sigmoid activation across multiple neurons. The hidden layer encloses the linear activation layer and the recurrent unit; these activation and hidden layers are used together with the hidden Markov method. Equation (5) shows how data entering the hidden RNN layer is computed.

$$\begin{aligned} G_{s} = & G\left( {Z_{{yg}} Y_{s} + Z_{{gg}} G_{{s - 1}} + a} \right) \\ X_{s} = & R\left( {Z_{{gr}} G_{s} + a_{r} } \right) \\ \end{aligned}$$
(5)

Figure 4 explores the hidden layer implicit function. As inferred from Eq. (5), the hidden state and network output are calculated: \({Y}_{s}\) is the network input at step \(s\), \(G\) is the activation function of the hidden layer, \(R\) is the output layer activation, and \({X}_{s}\) is the network output at time \(s\). The weights \({Z}_{yg}\), \({Z}_{gg}\), \({Z}_{gr}\) and biases \(a, {a}_{r}\) are network parameters that must be optimized by training.

Fig. 4

Hidden layer implicit function
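A numpy sketch of the recurrent update in Eq. (5) is shown below; tanh and sigmoid stand in for the generic activations \(G\) and \(R\), and all weight shapes are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rnn_step(y_s, g_prev, Z_yg, Z_gg, Z_gr, a, a_r):
    g_s = np.tanh(Z_yg @ y_s + Z_gg @ g_prev + a)    # hidden state G_s
    x_s = sigmoid(Z_gr @ g_s + a_r)                  # output X_s
    return g_s, x_s

rng = np.random.default_rng(0)
g, x = rnn_step(rng.normal(size=8), np.zeros(16),
                rng.normal(size=(16, 8)), rng.normal(size=(16, 16)),
                rng.normal(size=(4, 16)), np.zeros(16), np.zeros(4))
```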

Currently, RNNs are trained with gradient-based approaches. The gradient usually vanishes progressively during back-propagation because of the long expansion steps over time. Consequently, a standard RNN cannot capture long-term dependencies within time-series data in practical applications. The LSTM therefore connects three gates and a memory unit to the neuron, which can efficiently control the neuron's input, storage, and output.

The input-output relation of the LSTM neurons is expressed in Eq. (6):

$$\begin{aligned} J_{s} = & \alpha \left( {Z_{{yj}} Y_{s} + Z_{{gj}} G_{{s - 1}} + Z_{{dj}} D_{{s - 1}} + a_{j} } \right) \\ E_{s} = & \alpha \left( {Z_{{ye}} Y_{s} + Z_{{ge}} G_{{s - 1}} + Z_{{de}} D_{{s - 1}} + a_{e} } \right) \\ D_{s} = & E_{s} D_{{s - 1}} + J_{s} \tanh \left( {Z_{{yd}} Y_{s} + Z_{{gd}} G_{{s - 1}} + a_{d} } \right) \\ R_{s} = & \alpha \left( {Z_{{yr}} Y_{s} + Z_{{gr}} G_{{s - 1}} + Z_{{dr}} D_{{s - 1}} + a_{r} } \right) \\ g_{s} = & R_{s} \tanh \left( {D_{s} } \right) \\ \end{aligned}$$
(6)

Equation (6) evaluates the values of the input gate, the forget gate, the output gate, and the storage unit. Here \(\alpha\) is the gate activation function, and \(J, E, R, D\) correspond to the input gate, forget gate, output gate, and cell state through which the LSTM network monitors the information flow; the gate functions and storage components reduce gradient deterioration during training. The standard RNN is still unidirectional and passes only the historical part of the sequence for speech recognition. Equation (7) shows how the bidirectional RNN hidden layer is estimated:

$${g}_{s}={\overrightarrow{Z}}_{g}\,{\overrightarrow{g}}_{s}+{\overleftarrow{Z}}_{g}\,{\overleftarrow{g}}_{s}$$
(7)

Figure 5 shows the hidden layer process diagram. The hidden layer implicit function described in Eq. (7) is thereby estimated. The recurrence of the RNN is partly reflected in the hidden layer state being bound to the original input series; the posterior likelihood is given by the following calculation in Eq. (8):

Fig. 5

Hidden layer process diagram

$$Q\left({X}_{1}^{S}|{Y}_{1}^{S}\right)\approx {\prod }_{s=1}^{S}Q\left({X}_{s}|{Y}_{1}^{s}\right)={\prod }_{s=1}^{S}Q\left({X}_{s}|{g}_{s}\right)$$
(8)

As found in Eq. (8), the posterior likelihood has been determined. The hidden and output layers are conceived as in Eq. (9):

$$\begin{aligned} g_{s} = & \alpha \left( {Y_{s} ,G_{{s - 1}} } \right) \\ X_{s} = & h\left( {g_{s} } \right) \\ \end{aligned}$$
(9)

Here \(\alpha\) is an activation function and \(h\) the output function in Eq. (9); their particular forms are given in Eq. (10):

$$\begin{aligned} \alpha \left( y \right) = & \,\mathrm{sigmoid}\left( y \right) = \frac{1}{{1 + e^{{ - y}} }} \\ \alpha \left( y \right) = & \tanh \left( y \right) = \frac{{e^{y} - e^{{ - y}} }}{{e^{y} + e^{{ - y}} }} \\ h\left( w \right)_{i} = & \,\mathrm{softmax}\left( w \right)_{i} = \frac{{e^{{w_{i} }} }}{{\sum _{{l = 1}}^{L} e^{{w_{l} }} }} \\ \end{aligned}$$
(10)

Figure 6 shows the overall LSTM process. Equation (10) formulates the hidden layer activations. In step 2, the hidden layer processes the string data and matches it with the corresponding video animation composed in the section above.

Fig. 6

LSTM overall process
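The sketch below walks one LSTM step through the gate structure of Eq. (6) and the activations of Eq. (10). For brevity it uses one concatenated weight matrix per gate and omits the peephole terms \(Z_{d\cdot}D_{s-1}\) of Eq. (6); it is a simplified illustration, not the paper's implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(w):
    e = np.exp(w - w.max())                 # h(w) in Eq. (10), stabilized
    return e / e.sum()

def lstm_step(y_s, g_prev, d_prev, W, b):
    z = np.concatenate([y_s, g_prev])
    j = sigmoid(W["j"] @ z + b["j"])                   # input gate J_s
    e = sigmoid(W["e"] @ z + b["e"])                   # forget gate E_s
    d = e * d_prev + j * np.tanh(W["d"] @ z + b["d"])  # cell state D_s
    r = sigmoid(W["r"] @ z + b["r"])                   # output gate R_s
    return r * np.tanh(d), d                           # hidden state g_s

H, I = 4, 3
rng = np.random.default_rng(1)
W = {k: 0.1 * rng.normal(size=(H, H + I)) for k in "jedr"}
b = {k: np.zeros(H) for k in "jedr"}
g, d = lstm_step(rng.normal(size=I), np.zeros(H), np.zeros(H), W, b)
```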

3.3 Output layer-based displayed animated video

As seen in Fig. 7, the audio-visual ASR model comprises three stages: the visual front-end architecture, the speech-video integration approach, and speech recognition. The proposed method uses the bimodal integration of audio and visual input in the perception of voice; the bimodal integration consists of a single unit section for the audio-visual configuration and forms the main control block in automated speech-video voice recognition. Compared with conventional audio-only ASR, the visual stage and speech-video fusion modules introduce a more demanding schedule into automated speech recognition. This activity area improves ASR over the conventional audio modality by leveraging the visual modality of the speaker's mouth, contributing to speech-video language recognition systems. The parameters used in traditional audio classification are high-frequency components, the energy function, and speech function parameters. Automatic audio-visual speech recognition thus presents new and demanding challenges relative to conventional ASR. The activity area of the audio recognition and numbering stage increases speech-level identification. The acoustic model of an automatic speech recognition system splits audio signals into small frames. ASR implements robust facial sensing and position prediction, followed by retrieving the appropriate visual characteristics and tracking the speaker's mouth or lips; this robust facial sensing and position prediction provides a better retrieval module for capturing voice signals than existing methods, whereas conventional techniques operate on featured signals with lower quality assurance. Compared to audio-only recognizers, two streams of features are now available, one per modality: the speech stream and the video stream, which are integrated for a further leveling process. The integration of the speech-video streams can ensure decent device consistency and can significantly outperform two individual single-modality recognizers. These two subjects, namely the visual front end and audio-visual fusion, are complicated, and the scientific community has carried out considerable theoretical work on them.

Fig. 7

Output layer video animation
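One common way to realize the speech-video fusion described above is feature-level concatenation after synchronizing the two frame rates. The sketch below assumes random stand-in features and nominal frame counts; it is an illustration of the fusion idea, not the paper's pipeline.

```python
import numpy as np

audio = np.random.randn(100, 13)   # 100 frames of 13-dim acoustic features
visual = np.random.randn(25, 32)   # 25 frames of 32-dim mouth-ROI features

# Upsample the video stream to the audio frame rate, then concatenate
# per frame into a single fused observation stream.
idx = np.linspace(0, len(visual) - 1, len(audio)).round().astype(int)
fused = np.concatenate([audio, visual[idx]], axis=1)   # shape (100, 45)
```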

The vector transformations mentioned above are better adapted to region-of-interest (ROI) compression than to ROI classification into speech classes. Linear discriminant analysis (LDA) is better suited to the latter task, as it maps the data into a new space for improved class separation. LDA was first suggested for automated speech reading and is applied directly to the vector; it is considered in cascade following a projection of the single-frame ROI vector by principal component analysis (PCA), or on the concatenation of neighboring PCA-projected vectors. LDA assumes that the set of classes \(D\) is pre-selected and that the training vectors \({Y}_{k}, k=1,\dots ,K\) are labeled with classes \(D(K)\in D\) (such as HMM states). It seeks the matrix \({Q}_{LDA}\) such that the projected training samples \(\left\{{Q}_{LDA}{Y}_{k},k=1,\dots ,K\right\}\) are well separated into the classes of \(D\), where \({T}_{Z}\) is the within-class scatter matrix and \({T}_{A}\) the between-class scatter matrix of the training sample. These matrices are given in Eq. (11):

$${T}_{Z}=\sum_{d\in D}Qo(d)\,{\Sigma }^{(d)} \quad \mathrm{and} \quad {T}_{A}=\sum_{d\in D}Qo(d)\left({n}^{(d)}-n\right){\left({n}^{(d)}-n\right)}^{\intercal }$$
(11)

In Eq. (11), the scatter matrices of the training sample are formulated. \(Qo(d)\) is the observed class prior, \({K}_{d}\) denotes the number of training samples of class \(d\), \({n}^{(d)}\) and \({\Sigma }^{(d)}\) represent the class-conditional mean and covariance, and \(n=\sum_{d\in D}Qo(d){n}^{(d)}\) is the overall mean. To evaluate \({Q}_{LDA}\), the generalized eigenvectors of the matrix pair \(\left({T}_{Z},{T}_{A}\right)\) that satisfy \({T}_{A}E=\lambda {T}_{Z}E\) are first computed, giving the matrix \(E=\left[{E}_{1},\dots ,{E}_{C}\right]\).
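The LDA projection behind Eq. (11) can be computed from the two scatter matrices via a generalized eigenproblem, as in the sketch below. The small ridge added to \(T_{Z}\) is an assumption for numerical stability, not part of Eq. (11).

```python
import numpy as np
from scipy.linalg import eigh

def lda_projection(X, y, n_components):
    """Q_LDA from within-class (T_Z) and between-class (T_A) scatter."""
    mean = X.mean(axis=0)
    d = X.shape[1]
    Tz, Ta = np.zeros((d, d)), np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Tz += (Xc - mc).T @ (Xc - mc)                 # within-class scatter
        diff = (mc - mean)[:, None]
        Ta += len(Xc) * (diff @ diff.T)               # between-class scatter
    vals, vecs = eigh(Ta, Tz + 1e-6 * np.eye(d))      # Ta v = lambda Tz v
    order = np.argsort(vals)[::-1]
    return vecs[:, order[:n_components]]

X = np.vstack([np.random.randn(50, 5) + m for m in (0, 3)])
y = np.array([0] * 50 + [1] * 50)
Q_lda = lda_projection(X, y, n_components=1)          # (5, 1) projection
```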

Single-stream HMMs are used for modeling a series of audio or visual observation sequences of dimension \({C}_{t}\), where \(t\in \{A,U\}\) denotes the audio or visual modality; the emission is a Gaussian mixture determined as follows in Eq. (12):

$$Qo\left[\left.{R}_{s}^{(t)}\right|d\right]=\sum_{l=1}^{{L}_{td}}{Z}_{tdl}\,{M}_{{C}_{t}}\left({R}_{s}^{(t)};{N}_{tdl},{T}_{tdl}\right)$$
(12)

As calculated in Eq. (12), the Gaussian mixture is determined for all classes \(d\in D\), which specify the HMM transitions between different states. The HMM function variable is therefore given in Eq. (13):

$${B}_{s}={\left[{R}_{s}^{(A)\intercal },\,{R}_{s}^{(U)\intercal }\right]}^{\intercal }\quad \mathrm{where}\quad {A}_{t}={\left[\left\{\left[{Z}_{tdl},\,{N}_{tdl}^{\intercal },\,{T}_{tdl}^{\intercal }\right],\,l=1,\dots ,{L}_{td},\,d\in D\right\}\right]}^{\intercal }$$
(13)

Equation (13) formulates the HMM function variable. In Eqs. (12) and (13), \(d\in D\) represents the HMM-dependent state, the mixture weights \({Z}_{tdl}\) sum to one, \({L}_{td}\) denotes the number of mixture components, and \({M}_{C}\left(R,N,T\right)\) is the \(C\)-dimensional Gaussian distribution with mean \(N\) and a diagonal covariance whose diagonal is denoted by \(T\).
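To show how the single-stream emission of Eq. (12) applies to the stacked audio-visual observation of Eq. (13), the sketch below evaluates a diagonal-Gaussian mixture on a concatenated feature vector; dimensions and parameters are toy values.

```python
import numpy as np

def emission_prob(x, weights, means, variances):
    """Eq. (12): diagonal-Gaussian mixture likelihood of observation x."""
    return sum(w * np.prod(np.exp(-0.5 * (x - m) ** 2 / v)
                           / np.sqrt(2 * np.pi * v))
               for w, m, v in zip(weights, means, variances))

audio_obs = np.array([0.1, -0.3])
video_obs = np.array([0.8])
stacked = np.concatenate([audio_obs, video_obs])   # Eq. (13)-style stacking

params = dict(weights=[1.0], means=[np.zeros(3)], variances=[np.ones(3)])
print(emission_prob(stacked, **params))
```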

Step 3 of the proposed method yields the animated video along with a sentence using AAC. The proposed method achieves high efficiency, performance, language recognition rate, prediction rate, accuracy, stimulus, and disorder identification rate.

4 Results and discussion

BDIAI-AAC has been validated based on word recognition rate and prediction rate. BDIAI-AAC involves English speakers in the conversation procedure to promote an English-speaking environment and to obtain English language samples. The speech input device was placed so as to avoid distortion and echo; part of the recorded speech was used for training the model and the remainder for testing. The interactive media measurements depend on the type and potential of English language development in the classroom session; they are captured under the specific conditions of the different datasets in the database and evaluated by testing the structural efficiency of the model on paired information. The communicative session must always be based on the development of the student's mental strength, which predefined datasets can achieve under related circumstances of the data source. The efficiency of the proposed BDIAI-AAC is shown in Fig. 8.

Fig. 8

The efficiency of BDIAI-AAC
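The paper does not specify its exact scoring protocol, so the following is only one plausible way a word recognition rate like the one reported here could be computed: the fraction of reference words the system reproduces correctly, position by position.

```python
def word_recognition_rate(references, hypotheses):
    """Percentage of reference words recognized correctly, per position."""
    correct = total = 0
    for ref, hyp in zip(references, hypotheses):
        ref_words, hyp_words = ref.split(), hyp.split()
        total += len(ref_words)
        correct += sum(r == h for r, h in zip(ref_words, hyp_words))
    return 100.0 * correct / total

refs = ["the cat sat", "open the book"]
hyps = ["the cat sat", "open a book"]
print(word_recognition_rate(refs, hyps))   # ~83.3
```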

The performance on verbal-visual data gathered in the English language, and the voice data therein, can shift radically over time in functional language development. Modeling English language development therefore requires capturing the fluctuations and the structure-rate dependence of the constants. This is accomplished by first estimating local environmental conditions and then searching for workable models linking the environmental estimates and stream variables with the pre-computed constants. The performance of BDIAI-AAC is shown in Fig. 9.

Fig. 9

The performance rate of BDIAI-AAC

The rapid decrease is primarily due to the massive impact of the critical measuring component's cost. The BDIAI-AAC method is characterized by the combined effect of the delivery components of numerous language allocation models, dispersing their matrices in probability space. Word embedding input variables are used in the language recognition scheme to define the confidence interval of the voice attribute set of specific languages. The language recognition rate of BDIAI-AAC is shown in Fig. 10.

Fig. 10

The language recognition rate of BDIAI-AAC

The completed set of mappings has been used to generate language development from the given input and output units once teaching is finished. BDIAI-AAC is then trained to decrease the separability of the output linear regression. Using the BDIAI-AAC feature vectors, the proposed scheme is trained through functional and spatial modeling instruction in language development for persons with neurological disorders. The BDIAI-AAC weights are initialized randomly and refined through optimization. The disorder identification rate achieved by BDIAI-AAC is shown in Fig. 11.

Fig. 11

Disorder identification rate

Even with a single layer, the string data attain under half of the errors of a full neural net. The proposed method achieves an overall accuracy above that of the specific attributes evaluated in the entire AAC system studied, up to one hidden unit per layer. BDIAI-AAC also attained quite high accuracy in structure categorization on the same data. This precise identification of semi-essential speech components may enable related receptive language recognition and knowledge structures within the complete identification pipeline. The accuracy of BDIAI-AAC is shown in Fig. 12.

Fig. 12

The accuracy of BDIAI-AAC

In the BDIAI-AAC sound training stage, the transformative characteristics of the input are retained in the English language. Since most computations occur in the output units, the large number of language statements contributes considerably to the computational complexity; AAC is used in the first hidden divisions to reduce this load. The conversion of an English sentence is obtained through video evaluation in the prediction process. The prediction rate of BDIAI-AAC is shown in Table 1.

Table 1 The prediction rate of BDIAI-AAC

BDIAI-AAC is intended to examine English under the English language identification task and to define English syntax through the AAC model. The AI layers are suitable for extracting visuals, and the recurrent structure is well suited to classifying speech that can be identified solely from the animated video. The hidden layer evaluates the string data and matches it with the corresponding animated video. The range of stimulus of normal speech is shown in Table 2.

Table 2 The range of stimulus of normal speech

The proposed method achieves the highest word recognition and prediction rates compared to the existing Context-Aware Augmentative and Alternative Communication System (CAACS), Zulu core vocabulary (ZCV), and Special Education Teachers' Perspectives (SETP) methods.

5 Conclusion

This paper presents BDIAI-AAC for training people with cognitive disorders in the English language. BDIAI-AAC is evaluated by focusing on an interactive video network for voice recognition. The trained Artificial Intelligence (AI) network works on three levels. The input layer is a speech recognition model that converts the pupil's speech into strings. The hidden layer processes the string data and matches the video visuals. Finally, the output unit displays the animated video together with the AAC sentence. Thus, AI-capable systems and AAC prototypes transform English phrases into related videos or graphics. The comparative analysis of the proposed BDIAI-AAC against existing methods shows that it reaches a word recognition rate of 98.01%, a prediction rate of 97.89%, high efficiency (95.34%), performance (96.45%), accuracy (95.14%), stimulus (94.2%), and disorder identification rate (91.12%).