
1 Introduction

Communication is fundamental to the social development of human beings. When, for any reason, speech is impeded, the possibility of achieving true social fulfillment is considerably reduced. As a means of socialization and a compensatory mechanism, deaf individuals have developed their own language: sign language. Given that deaf individuals cannot rely on spoken words, and as an act of solidarity with this sector of the population, society must commit to promoting the dissemination of sign language among hearing individuals. Society continues to advance, and new technologies are created to assist specific segments of the population, which raises the question: with so much technology available, why is it not being used to create a more effective communication system between people with hearing disabilities and those without? Current systems are slow and costly, leading to significant inclusion challenges for individuals with such disabilities in social, professional, and other contexts.

It is currently estimated that 1.3 billion people experience significant disability, representing 16% of the world’s population, or 1 in 6 people. Persons with disabilities face stigma, discrimination, poverty, exclusion from education and employment, and barriers within the systems themselves [1].

In Mexico, 7.6% of the population lives with a hearing disability. According to data from the Ministry of Health, about 2.3 million people in Mexico have a hearing disability; of these, more than 50% are over 60 years old, just over 30% are between 30 and 59 years old, and about 2% are children [2].

Artificial intelligence is a rapidly growing area of science that encompasses several emerging technologies. Some of them are being used to support the inclusion of people with sensory disabilities; examples include the use of Augmented Reality for people with visual disabilities [3] and Tangible Interfaces, Gamification, and Extended Reality for blind and autistic people [4], to mention a few. This work, in turn, addresses the emerging technology of voice recognition applied to the inclusion of people with hearing disabilities [5].

In response to this need, an innovative application has emerged: a translator from recognized speech to sign language. This application combines powerful voice recognition technology with a deep understanding of sign language, providing a revolutionary tool to facilitate seamless communication between hearing and deaf individuals. It should be noted that sign language can vary in certain aspects between regions or communities; therefore, the application must consider these variations and generate translations adapted to the specific characteristics and conventions of each context. This application has the potential to enable real-time communication, allowing deaf individuals to interact more effectively and participate fully in various environments.

A series of applications have been developed with the aim of addressing the inclusion issues of people with hearing disabilities. However, they are not sufficiently accurate in tackling this problem since they only translate word by word and take time to display the entire sentence that the user wants to understand. Additionally, glove sensor-based systems have been explored, but their accessibility and costs make them less viable for the general population. The designed application seeks to solve the issues of other systems, as well as to innovate and offer highly precise and high-quality natural language processing. To achieve this, a series of important concepts were needed to address the development of the translation application effectively.

To do this, it is necessary to understand what artificial intelligence (AI) is. According to Oracle Mexico [6], AI is a discipline of computer science focused on designing and developing systems capable of performing tasks that generally require human intelligence, based on the idea that machines can learn from experience and improve their performance on specific tasks over time. For this project, a branch of AI focused on speech recognition was used. As stated by Rodríguez, J. L. O. [7], computer speech recognition is a complex task involving pattern recognition and biometric systems. Typically, the speech signal is sampled in a range between 8 and 16 kHz; in the experiments reported in that work, a sampling frequency of 11025 Hz was used. Once digitized, the speech signal must be analyzed to extract the relevant information.

According to MDN [8], the Web Speech API is an application programming interface (API), specified through the World Wide Web Consortium (W3C), that allows website and web application developers to integrate speech recognition and speech synthesis functionalities into their products [9]. It is also important that the application achieves a high level of effective communication. Zendesk [10] defines effective communication as one in which a message is shared, received, and understood without altering its ultimate purpose; in other words, the sender and the receiver interpret the same meaning. This way, doubts and confusion are avoided while meeting expectations regarding what has been conveyed.
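
As a brief illustration of how this API can be used, the following sketch shows a minimal speech recognition setup in a web front end. It is a schematic example rather than the application’s actual code; identifiers such as startRecognition and onTranscript are hypothetical.

```typescript
// Minimal sketch of continuous speech recognition with the Web Speech API.
// Illustrative only; function and handler names are hypothetical.
type TranscriptHandler = (text: string) => void;

function startRecognition(onTranscript: TranscriptHandler): void {
  // Some browsers expose the API under a vendor prefix (webkitSpeechRecognition).
  const SpeechRecognitionImpl =
    (window as any).SpeechRecognition || (window as any).webkitSpeechRecognition;
  const recognition = new SpeechRecognitionImpl();

  recognition.lang = "es-MX";          // recognize Mexican Spanish
  recognition.continuous = true;       // keep listening across pauses
  recognition.interimResults = false;  // deliver only final transcripts

  recognition.onresult = (event: any) => {
    // Concatenate the transcripts produced so far and hand them to the caller.
    const text = Array.from(event.results as ArrayLike<any>)
      .map((result: any) => result[0].transcript)
      .join(" ");
    onTranscript(text);
  };

  recognition.onerror = (event: any) => console.error("Recognition error:", event.error);
  recognition.start();
}

// Usage: startRecognition(text => console.log("Heard:", text));
```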

The database is another key technology, as it stores and organizes the information used to process the output of the Web Speech API. Oracle defines a database as an organized collection of structured information or data, usually stored electronically in a computer system [11]; typically, it is managed by a database management system (DBMS). The database is closely tied to natural language processing (NLP). According to Jurafsky and Martin [12], NLP is a branch of artificial intelligence that focuses on the interaction between computers and human language; its goal is to enable computers to understand, process, and generate human language naturally, just as a person would. Regarding the front end, or visual interface, of the application, usability and accessibility must be considered.

The International Organization for Standardization states that usability refers to the extent to which a product or system can be used effectively, efficiently, and satisfactorily by its users to achieve specific goals in a given context; it is a discipline focused on designing products and systems that are easy to use. Likewise, the International Organization for Standardization states that accessibility refers to the ability of a product, service, or environment to be used by all people, regardless of their physical, cognitive, or sensory abilities [13].

2 Methodology

The methodology employed in this work is based on experimental research, which consists of several stages: data collection, data analysis, and measurement. To carry out this process, it was broken down into the procedures shown in Fig. 1. First, an introductory course on Mexican Sign Language (LSM) was taken in order to understand how deaf individuals interpret information and thus perform effective natural language processing.

Subsequently, semantic fields were defined based on vocabulary, identifying the most frequently used fields and words. Once this step was completed, each LSM sign was created, totaling approximately 650 signs [14]. Next, speech recognition was implemented using the Web Speech API to integrate it with the database creation and application design.

After all these elements were defined, the most complex phase was tackled: natural language processing, which allowed for the adaptation of information for deaf individuals to comprehend. Following this, a sequential search in the database was conducted to display each corresponding sign to the user. This process culminated successfully with an accuracy of over 90% and a processing time of less than 2 s.

Fig. 1. “Experimental Research” Methodology

3 Results

The methodological process previously outlined has been carried out. First, the selection of semantic fields was undertaken, as can be seen in Fig. 2. It is worth noting that a total of 650 signs were recorded [15], a sufficient quantity to establish basic and intermediate dialogues. Once these semantic fields were defined, the conception and design of the application, as detailed in Fig. 3, took place, and the voice recognition functionality was integrated. When the corresponding button is activated, as shown in Fig. 4, the application begins to transcribe and process the user’s auditory input.

Fig. 2. Semantic Fields in Spanish

Fig. 3. Main Interface

Fig. 4. Voice Recognition Interface

Once these interfaces are defined and when the Web Speech API transmits the information obtained from speech recognition, the next step is to process that information in order to generate the corresponding signs for the message intended for people with hearing impairments. To carry out natural language processing with this information, the process began by understanding the difference in communication for deaf individuals as compared to hearing individuals. In this context, three fundamental rules were established, which are:

  • Verbs are used in their infinitive form.

  • Connectors and words that are not essential to conveying the message are removed.

  • If the sentence does not otherwise indicate the tense, it is specified explicitly whether it is past, present, or future (this applies more to signing than to the written form).

With these three rules, a message can be transformed from the way a hearing person communicates into the way a deaf person communicates. An example would be:

Hearing person: “The movie will be at 5:00 in the afternoon.”

Deaf person: “movie future be 5:00 afternoon.”

An algorithm, working in conjunction with a lookup table, was developed to apply the rules described above. First, verbs are converted to their infinitive form using the table in Fig. 5: the algorithm scans the text string containing the message, detects the verbs present in it, and replaces each one with its corresponding infinitive. In the same pass, the third rule is applied by determining the tense of the message; a numerical value is placed in the position before the verb: 0 for past, 1 for present, and 2 for future.
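
To make this concrete, the following sketch shows one way the step could be implemented. It is a simplified reconstruction, not the application’s actual code, and the few entries in VERB_TABLE merely stand in for the much larger table of Fig. 5.

```typescript
// Sketch of the "synthesizer" step (rules 1 and 3): each conjugated verb is
// replaced by its infinitive, and a tense marker (0 = past, 1 = present,
// 2 = future) is inserted immediately before it. VERB_TABLE is a tiny
// illustrative subset of the table in Fig. 5.
interface VerbEntry { infinitive: string; tense: 0 | 1 | 2; }

const VERB_TABLE: Record<string, VerbEntry> = {
  "será": { infinitive: "ser", tense: 2 },  // "will be" -> future
  "es":   { infinitive: "ser", tense: 1 },  // "is"      -> present
  "fue":  { infinitive: "ser", tense: 0 },  // "was"     -> past
};

function toInfinitivesWithTense(message: string): string[] {
  const output: string[] = [];
  for (const word of message.toLowerCase().split(/\s+/)) {
    const entry = VERB_TABLE[word];
    if (entry) {
      output.push(String(entry.tense)); // tense marker placed before the verb
      output.push(entry.infinitive);
    } else {
      output.push(word);
    }
  }
  return output;
}

// Example: "La película será a las 5:00"
//   -> ["la", "película", "2", "ser", "a", "las", "5:00"]
```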

Fig. 5. Synthesizer (conversion of verbs to their infinitive form)

Next, the algorithm proceeds to the second phase, which removes connectors and terms that do not contribute to the understanding of the message. For this purpose, the table described in Fig. 6 is used: the algorithm performs an exhaustive search to determine whether the message contains any of the terms listed in the table and, if so, removes them.
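
A minimal sketch of this filtering step is shown below; the STOP_WORDS set stands in for the table of Fig. 6 and contains only a few illustrative entries.

```typescript
// Sketch of the "eliminator" step (rule 2): tokens that appear in the table of
// connectors and non-essential words (Fig. 6) are dropped. STOP_WORDS is a
// small illustrative subset of that table.
const STOP_WORDS = new Set(["el", "la", "los", "las", "de", "del", "a", "en", "que", "y"]);

function removeConnectors(tokens: string[]): string[] {
  return tokens.filter(token => !STOP_WORDS.has(token));
}

// Example: ["la", "película", "2", "ser", "a", "las", "5:00"]
//   -> ["película", "2", "ser", "5:00"]
```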

Fig. 6. Eliminator (Removes unnecessary words and connectors)

Once the message has been adapted for the understanding of people with hearing disabilities, the search for each sign required to convey that message begins. To carry out this process, the approach described in Fig. 7 is employed: the algorithm searches a predefined table for the sign corresponding to each word. If a sign is not found in the table, the system breaks the word down into its individual letters and retrieves the sign for each one; in other words, any word not present in the table is fingerspelled letter by letter, ensuring that every term can be translated effectively and accurately.
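
The sketch below illustrates this lookup-with-fallback logic; SIGN_TABLE and the media paths it contains are hypothetical placeholders for the application’s sign database.

```typescript
// Sketch of the sign-search step (Fig. 7): each token is looked up in the sign
// table; if it is missing, the word is fingerspelled letter by letter.
// SIGN_TABLE maps a word (or a single letter) to the media file for its sign;
// the entries and paths are hypothetical.
const SIGN_TABLE: Record<string, string> = {
  "casa": "signs/casa.mp4",
  "ser":  "signs/ser.mp4",
  "c":    "signs/letters/c.mp4",
  "a":    "signs/letters/a.mp4",
  // ... one entry per recorded sign and per letter of the alphabet
};

function lookupSigns(tokens: string[]): string[] {
  const signs: string[] = [];
  for (const token of tokens) {
    const direct = SIGN_TABLE[token];
    if (direct) {
      signs.push(direct);
    } else {
      // Fallback: spell out the unknown word, one sign per letter.
      for (const letter of token) {
        const letterSign = SIGN_TABLE[letter];
        if (letterSign) signs.push(letterSign);
      }
    }
  }
  return signs;
}
```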

Fig. 7. Language (Sign Search)

Once natural language processing is complete, the signs identified in the previous step are displayed on the interface, as illustrated in Fig. 8.
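
As an illustration of how the retrieved signs might be rendered in the interface of Fig. 8, the following sketch appends one video element per sign to a container; the element id "sign-output" is a hypothetical placeholder, not taken from the application’s source.

```typescript
// Sketch of the display step: each retrieved sign (e.g., a short video clip)
// is appended to a container in the sign-display interface (Fig. 8).
// The element id "sign-output" is hypothetical.
function displaySigns(signUrls: string[]): void {
  const container = document.getElementById("sign-output");
  if (!container) return;

  container.innerHTML = ""; // clear the previously displayed message
  for (const url of signUrls) {
    const video = document.createElement("video");
    video.src = url;
    video.controls = true;
    container.appendChild(video);
  }
}
```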

Fig. 8. Sign Language Display Interface

Once the message has been presented in sign language, the interface through which the hearing-impaired user can input the message they wish to convey to the hearing user is displayed, as illustrated in Fig. 9. Clicking the speaker icon activates voice output, establishing high-quality communication between both users, free from delays, signal-quality degradation, and other limitations.
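
This reverse direction can be implemented with the speech-synthesis half of the Web Speech API; the sketch below is a minimal, illustrative example rather than the application’s actual code.

```typescript
// Sketch of the text-to-speech step (Fig. 9): the text typed by the deaf user
// is spoken aloud when the speaker icon is pressed.
function speakMessage(text: string): void {
  const utterance = new SpeechSynthesisUtterance(text);
  utterance.lang = "es-MX"; // use a Mexican Spanish voice, if available
  window.speechSynthesis.speak(utterance);
}

// Usage: speakMessage("Hola, ¿cómo estás?");
```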

Fig. 9. Text-to-Speech Translation Interface

Once the application was completed, it underwent evaluation by several professionals specialized in the field. Subsequently, a presentation was held at the Aguascalientes Special Education Institute, as illustrated in Fig. 10, where an invitation was extended to teachers from the institute to evaluate the application. These educators, both men and women, used the application among themselves, and they were provided with detailed information about the objectives and concepts presented in this article.

Fig. 10. Application Evaluation

In total, three presentations were conducted during the technical council meetings. In addition to involving individuals with hearing disabilities, their family members were also included in the process. Once each presentation was concluded and after several participants had the opportunity to test the application, a questionnaire was provided to them. Figure 11 shows question 1 of the questionnaire, where the majority of the respondents were education professionals, including teachers. Other types of users, such as social workers, students, managers, and administrative staff, also participated.

Fig. 11. Question 1 of the assessment.

Below, in Fig. 12, the results of question 2 are presented, which proved to be very useful as it highlights the diversity of users in terms of age. This is of particular importance when specifying the applicability of the tool, as age can sometimes correlate with experience in using Mexican Sign Language (LSM).

Fig. 12. Question 2 of the assessment.

Then, participants were asked for their opinion about the application, and after analyzing all the responses, a significant level of acceptance was observed. Many of them expressed a genuine desire for the application to be released into production, which would undoubtedly contribute significantly to improving inclusion in society. Finally, they were asked about possible improvements or suggestions for the ongoing development of the application, and these comments have been tracked in detail, as described in the Discussion section of this article.

4 Discussion

In an increasingly digital and connected world, smart applications are playing a fundamental role in the inclusion of people with hearing disabilities. These tools not only facilitate everyday communication but also open new opportunities for participation in society. From voice recognition apps and automatic sign language translation to connected assistive hearing devices, technology is paving the way for more accessible and equitable communication. However, it is essential to consider aspects such as accessibility and data privacy when developing and using these applications, ensuring that they truly achieve their goal of improving the quality of life for people with hearing disabilities.

Advancements have already been made on future work that is in development. In the interface shown in Fig. 7, in addition to each sign language phrase, a graphical representation of its meaning will be displayed next to it. For example, to show the sign for a house, the sign will appear alongside an image of a house. This is because not all deaf individuals are proficient in sign language, so the support of images will aid in a quicker understanding of the message the speaker wants to convey to the deaf person.

Additionally, efforts are being made to expand the sign language vocabulary, creating signs for different semantic fields such as medicine, engineering, education, sports, etc. This way, more technical communication can be established without the algorithm having to spell out words extensively because certain signs are not in the database.

The final improvement will be to make the application usable throughout Mexico. Because sign language differs in some states or regions, signs will be created and grouped by region, and the translation will be based on the corresponding regional variant. This ensures that, no matter where in the country the application is used, it will always work with 100% precision and efficiency.

5 Conclusions

It is striking how significant an impact an application developed with artificial intelligence can have in helping society. In the case of this research, it is concluded that the development of an intelligent application for inclusion, together with the questionnaire that was created and applied, was of great assistance. Experimentation and testing made it evident that people were highly interested in the application, which leads to the conclusion of how necessary the application is and what sets it apart from other applications on the market.

The development of an application that translates speech into Mexican Sign Language represents a significant advancement in inclusion and accessibility for individuals with hearing impairments in Mexico. This application has the potential to break down existing communication barriers, enabling a smoother and more natural interaction between people who use different forms of communication.

By providing real-time translation from speech to Mexican Sign Language, the application facilitates communication in various settings, such as education, employment, healthcare, and social interactions. Deaf individuals will be able to access real-time information, actively participate in conversations, learn about various subjects, and express their thoughts and feelings more effectively.

It is crucial to emphasize that the inclusion of people with hearing impairments requires a collective effort from society, institutions, teachers, administrators, social workers, students, and individuals. Adequate policies and legislation are needed, along with the promotion of accessible environments and tailored resources.