Introduction

Today’s smart cities use technology and systems to enhance interactivity by revamping infrastructure, management, communications, and transparency. The “Internet of Things” refers to connections between systems and appliances within our everyday environment, such as houses, vehicles, and office buildings. The Internet of Things can encompass “(1) Personal and Home; (2) Enterprise; (3) Utilities; and (4) Mobile” (Gubbi et al. 2013). Smart cities are one step up from the Internet of Things, representing a “System of Systems” (Hernández-Muñoz et al. 2011). In effect, a smart city aggregates the Internet of Things, with systems talking to systems on a metropolitan level.

Smart systems and smart cities present a host of possibilities, particularly for people with physical disabilities. They can also help monitor people with a variety of medical conditions, for example by tracking pacemaker data for those with heart conditions (Hernández-Muñoz et al. 2011). Theoretically, a smart city could help a deaf resident with real-time visual information, or a blind resident via Bluetooth audio. As Macagnano (2008) notes, “the disabled/elderly have only to synchronize onto a chosen menu (transport, shopping, health care, entertainment) and the surrounding environment becomes ‘really’ barrier-free, by offering visual/audio appropriate information. In this context, audible messages (Bluetooth, WI-FI and next generation wireless technologies) could also verbally describe to the user, in real time, what is happening (next bus stop name, name of shop owner, timetables, etc.) according to his/her 3D position in the city.”

In working to improve outcomes for all in the smart city, we must consider the educational system. Smart schools will be both creators and consumers of data and of information and communication technologies. But the pedagogical impact on students is foremost. Smartly interconnected systems and smartly interconnected cities will be for naught if no accommodations are made for educating students with communication or learning disabilities, or those whose native language is not English. The K-12 and higher education systems are referred to collectively as P-20 education (Futrell 2014). Science, technology, engineering, and math (STEM) students with disabilities are at a particular disadvantage, since empirical observation is vital to scientific research, discovery, and understanding. Captioning of audio and video lectures can help to alleviate these issues. The Universal Video Captioning (UVC) platform provides a semi-automatic approach to synchronizing captions with STEM-related videos, making those videos accessible. It has the potential to transform learning and teaching for students with disabilities by integrating synchronized captioned educational videos into undergraduate and graduate STEM disciplines. It also takes advantage of multiple learning styles, since research shows that students are more likely to learn the material and to express satisfaction with courses that accommodate their learning styles. The Accessible Educational STEM Videos Project permits us to combine the advantages of captioned/transcribed multimedia with the recognized potency of audio and video to reinforce learning through re-listening to or re-watching content. Multimedia objects with captioning may also assist students whose native language is not English; prior research describes this process in more detail (Zhuhadar 2015; Zhuhadar et al. 2007, 2009a, b, c, 2010, 2015; Zhuhadar and Nasraoui 2008a, b; Zhuhadar and Yang 2013).
In 2009, 32.7 % of all US students in the STEM disciplines were foreign nationals (Wasem 2012). Use of subtitles is very common in foreign-language and English as a Second Language classes. Several authors (Cristea et al. 2006; Stewart et al. 2006; Stewart and Pertusa 2004) have noted that native English speakers studying Spanish who viewed films with the audio and subtitles in the same language learned more vocabulary, showed a gain in comprehension, and had higher levels of engagement and satisfaction with the material. Similarly, Hayati and Mohmedi (2011) reported a significant improvement among international students learning English, noting that multimodal delivery “[enhances] comprehension better than simply processing subtitles through silent reading” (p. 190). This line of research suggests that the use of captioned multimedia may also improve learning outcomes among STEM students whose native language is not English. The UVC’s significance will extend from the integration of multimodal delivery of audio and video in STEM courses to the transformation of curricular activities in P-20 education to accommodate students with disabilities.

Rationale and Background

In recent years, P-20 education (pre-kindergarten through graduate education) has strived to serve students with disabilities more effectively. Information in Table 1 shows individuals with disabilities who are served by K-12 (kindergarten through secondary school) schools while information in Table 2 shows the relative distribution of students with disabilities in colleges and universities. In 2011–2012, K-12 schools enrolled 6,401,000 students with disabilities, while in 2013, higher education institutions enrolled 2,417,400 students who identified as having a disability.

Table 1 Children 3 to 21 years old served under the Individuals with Disabilities Education Act, part B, by type of disability: school year 2011–2012
Table 2 Major field of study of undergraduates, by disability status: 2012

Several statutes and standards require learning objects to be accessible to students with disabilities. These include the Individuals with Disabilities Education Act, 20 U.S.C. § 1400 et seq. (1990, amended 2004); the Rehabilitation Act (1973); the Americans with Disabilities Act (1990); and the World Wide Web Consortium’s Web Content Accessibility Guidelines (2008). Accessible learning, while often presented as an issue for distance education, is also a barrier for students with disabilities in face-to-face classes. In fact, distance education solutions hold some promise for helping students with disabilities enrolled in traditional face-to-face courses. Technologies used in distance learning courses can also assist non-native speakers who struggle to access the information in coursework. To make online coursework accessible to students with disabilities, colleges and universities provide captioning services, which transcribe the speech in audio and video files so that students with auditory disabilities can access and learn course content. Moreover, figures, graphs, and other visual materials are tagged with explanations so that students with visual impairments know the title and purpose of these images. Transcribing lectures for disabled students in online courses could also benefit disabled students enrolled in face-to-face courses. However, transcription is a very labor-intensive and costly endeavor, and it poses an even greater challenge for disabled students in traditional face-to-face coursework, where new content is disseminated in lectures, labs, and discussion groups multiple times per week.

A review of the literature on learning shows that simultaneous multimedia experiences represent effective instructional design practices (González et al. 2015).

Metacognition is one important factor in the multimedia design puzzle. Cognitive learning theories emphasize the importance of context, which allows individuals to store, process, and remember learned information in terms of pre-learning, learning, and performance (Tessmer et al. 1997, p. 90). Metacognition is an individual’s awareness of how he/she is learning. In simple terms, it is “thinking about thinking” and is a prerequisite for higher-order learning (Zohar 1999, p. 418). Given this, instructional designers and students need to take control of the cognitive process, which requires an understanding of learning styles. These styles are an extension of personality and lead to very different ways of engaging with instructional material (Hauptman and Cohen 2011).

While assistive technology may help students with sensory or learning disabilities, the most important issue is proper pedagogical design (Moallem 2007–2008). As Thomas Klein (1995) pointed out in a much-cited article, “[C]omputers are not a panacea; they cannot rescue a school from weak teachers, a weak curriculum, or the absence of sufficient funding.” Universal design—making courses accessible for all students regardless of disabilities—goes beyond making courses usable and accessible. It involves “creating instructional goals, methods, materials, and assessments that work for everyone—not a single, one-size-fits-all solution but rather flexible approaches that can be customized and adjusted for individual needs” (CAST 2012).

Proper course design from the very beginning, with attention to multimodal environments, will help all students, not just those identified as having a disability (Moreno and Mayer 2007). Because necessary accommodations vary by type of disability, designing courses that meet this diversity of needs can seem overwhelming to faculty members. For example, a blind student might be able to hear an online video but will miss the visual details, while a deaf student may be able to see the visual details but cannot hear the professor’s detailed explanation or elaboration of the graphic. Simultaneous captioning of the professor’s explanation would benefit the deaf student, and a combined approach that describes both the action and the dialogue will benefit all learners, especially those with visual or hearing impairments. Blind students can use a screen reader to listen to the captioning details, while deaf students can read the transcription. Students with learning disabilities will also benefit from the simultaneous multimodal presentation of information and materials, which helps reinforce the material, concepts, theories, or research under investigation in the course.

Disabled students vary in their learning styles and in the type of accommodation they require. Hearing-impaired students are at a significant disadvantage in a traditional lecture classroom because they cannot hear the formal lectures and explanations provided by the instructor. This is one of the primary motivating factors for online learning among students with hearing impairments, as video-captured lectures are transcribed for their use (Kisner 2015, p. 12). Likewise, multimedia captioning may also help visually impaired students by allowing them to use screen reader technology.

The science, technology, engineering, and mathematics (STEM) disciplines are no exception in their need to serve students with disabilities (Jerome et al. 2008). The federal Minority Science and Engineering Improvement Program (20 U.S.C. § 1067) requires that universities make special efforts to recruit disabled students into these fields. STEM students with disabilities face especially serious challenges, given that color, sound, and observation are all important in science-based courses and disciplines. For example, only one in four deaf students persists to graduation (Marschark et al. 2005). Technology, combined with a diversity of multimodal and instructional design approaches, may provide ways to overcome these challenges, particularly for students with sensory and learning difficulties. Given the importance of STEM disciplines to national competitiveness and future economic development, the US Government and private and non-profit entities have allocated resources specifically for programs designed to promote the recruitment, education, and retention of students in STEM fields.

The emergence of distance education has resulted in innovative ways of teaching STEM programs using video capture, reusable multimedia, and interactive simulated labs. There are many advantages to adding captions, or transcriptions, to audio and video files. The capacity of multimedia to reinforce learning through re-listening or re-watching is powerful, and adding captions to these multimedia files enhances the learning experience further. These multimodal solutions help make face-to-face, hybrid, and online classes and distance education accessible to all disabled students (Kidd 2009). Interactive learning that is “highly immersive” allows for the presentation of “three-dimensional objects” to teach relationships and higher-order thinking (p. 2109). This is especially useful in STEM disciplines where spatial intelligence and image processing are required (p. 2115). Students with cognitive impairments may also benefit from alternative formats for delivering learning material (van Hoorebeek 2009, p. 232).

Use of subtitles is also very common in foreign-language and English as a Second Language (ESL) classes, suggesting that captioned multimedia may also improve learning outcomes among STEM students whose native language is not English at English-speaking colleges and universities. Our global society requires smart cities to accommodate those who come from other cultures and who speak different languages, including those who know some English (or another language of instruction) but do not speak it fluently. The P-20 education system is no exception. In 2012, according to the National Center for Education Statistics, there were 4,397,318 English language learners in K-12 (kindergarten through the final year of high school) and 966,333 international students in American colleges and universities (National Center for Education Statistics 2015; U.S. Immigration and Customs Enforcement 2014, p. 2). The data further reveal that 344,299 (36 %) of these international students are studying in STEM fields (p. 23). Simultaneous multimedia learning may help all of these students.

There have been a number of studies of simultaneous multimedia use among language learners. Stewart and Pertusa (2004) noted that native English speakers studying Spanish who viewed films with the audio and subtitles in the same language learned more vocabulary, showed a gain in comprehension, and had higher levels of engagement and satisfaction with the material. Similarly, Hayati and Mohmedi (2011) reported a significant improvement among international students learning English, noting that multimodal delivery “[enhances] comprehension better than simply processing subtitle through silent reading” (p. 190).

In another study, Aldera and Mohsen (2013) randomly assigned 50 Arabic-speaking learners of English to three groups. One group learned with animation, captions, and keyword annotation (ACA), the second with animation and captions (AC), and the control group with animation alone (A). Participants were tested on vocabulary recognition, vocabulary production, listening comprehension, and listening recall, immediately after the lessons and again 4 weeks later. Interestingly, the results for vocabulary differed from those for listening comprehension and recall over time. On vocabulary recognition and production, the ACA group scored significantly higher than the AC group, which in turn scored significantly higher than the A group. However, the effect was reversed for listening comprehension and recall over time: on those tests, the A group scored better than the AC or ACA groups (p. 70). This research suggests the importance of presenting the same material in multiple formats and of varying the approach throughout the course.

The importance of mindful (metacognitive) design cannot be overstated. With 12 % of the population identified as having disabilities (NSF 2013), and with international students making up over 4 % of the higher education community (Institute of International Education 2014), the need for well-designed learning objects will only increase. Instructional designers play an important role in ensuring that learning objects are accessible to all. Universal design and metacognitive practices hold the promise of increasing student learning for everyone. Students with sensory, communication, or learning disabilities and those whose native language is not English deserve the opportunity to learn at the same level as everyone else. Simultaneous captioning in multimedia provides the tools to make this happen.

Objectives and Novelty of the Universal Video Captioning Platform

The UVC platform will provide significant assistance to each type of disability listed in Table 2. Alternative text such as captions or transcripts expands students’ understanding of media. Because disabled students vary in their learning styles, some may learn most effectively through oral means, while others learn best through visual or textual means. The significance of our approach therefore lies in providing a combined format in which video is both captioned and transcribed. This solution holds the possibility of reaching students who have traditionally been at risk of not persisting. Research shows that the keys to serving disabled students are to make materials accessible in more than one medium and to ensure that “mental, physical, or other sensory disabilities” are considered when designing the learning experience (Richter and Paretti 2009; Seale 2013).

Why Universal Video Captioning Platform (Accessible STEM-Video)?

Automated closed captioning is considered a hard problem. Currently, no single product provides a complete automated solution to this problem or can accurately recognize a variety of voices instantly. Table 3 compares most existing products with our proposed solution.

Table 3 Comparison between most of the known products and UVC infrastructure

The infusion of video capturing with captions and/or transcripts will immerse students with disabilities in STEM programs through a multimodal presentation of materials so that they can interpret or reinterpret information in a more effective way. Students with disabilities will gain a better understanding of STEM materials through listening, watching, and/or reading, thereby broadening their understanding and deepening their skills. We anticipate that simultaneous multimodal presentations of materials will help disabled learners increase their grade point average and decrease their attrition. Our assumption is based on multiple studies that have found significant increases in grade point average and decreases in attrition among disabled learners who used assistive technology (Gutknecht 2015; Kuo and Kuo 2015; Owings et al. 2015; Raskind and Higgins 1999; Simoncelli 2010).

Proposed UVC Infrastructure Design

The Web and new data-intensive computing platforms and technologies, such as WebDAV, Hadoop, MapReduce, Solr, and others, offer effective ways to handle large amounts of data and to search for specific resources (Fig. 1). We therefore propose a solution that manages these resources in a timely manner. The proposed infrastructure is a web-based platform that will be deployed on a server so that it can be accessed worldwide. It uses WebDAV (Web-based Distributed Authoring and Versioning) to identify resources, users, content, etc. The front end will consist of the Solr search engine, Flex, and PHP, while the back end will consist of MySQL and Apache Tomcat servers.
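To make the role of WebDAV more concrete, the following is a minimal Python sketch, assuming a generic WebDAV server, of how a client could list resources with a standard PROPFIND request and read back the 207 Multi-Status reply. The URL, the chosen properties, and the sample response are illustrative assumptions, not part of the UVC implementation, and the request is only built, never sent.

```python
from __future__ import annotations

import urllib.request
import xml.etree.ElementTree as ET

DAV = "{DAV:}"  # ElementTree's Clark notation for the "DAV:" namespace


def build_propfind(url: str, depth: str = "1") -> urllib.request.Request:
    """Build (but do not send) a PROPFIND asking for a few standard properties."""
    ET.register_namespace("D", "DAV:")
    root = ET.Element(DAV + "propfind")
    prop = ET.SubElement(root, DAV + "prop")
    for name in ("displayname", "getcontenttype", "getcontentlength"):
        ET.SubElement(prop, DAV + name)
    body = ET.tostring(root, encoding="utf-8", xml_declaration=True)
    return urllib.request.Request(
        url,
        data=body,
        method="PROPFIND",
        headers={"Depth": depth, "Content-Type": "application/xml; charset=utf-8"},
    )


def parse_multistatus(xml_text: str) -> list[tuple[str, str]]:
    """Return (href, content type) pairs from a 207 Multi-Status body."""
    root = ET.fromstring(xml_text)
    pairs = []
    for response in root.findall(DAV + "response"):
        href = response.findtext(DAV + "href", default="")
        ctype = response.findtext(".//" + DAV + "getcontenttype", default="")
        pairs.append((href, ctype))
    return pairs


# Hypothetical server reply describing one uploaded lecture video.
SAMPLE_RESPONSE = """<D:multistatus xmlns:D="DAV:">
  <D:response>
    <D:href>/uvc/videos/lecture01.mp4</D:href>
    <D:propstat>
      <D:prop><D:getcontenttype>video/mp4</D:getcontenttype></D:prop>
      <D:status>HTTP/1.1 200 OK</D:status>
    </D:propstat>
  </D:response>
</D:multistatus>"""
```

Here `build_propfind("https://uvc.example.org/dav/videos/")` yields a request whose method is `PROPFIND` with a `Depth: 1` header, and `parse_multistatus(SAMPLE_RESPONSE)` returns the resource's path and MIME type. A `Depth` of `1` lists a collection's immediate members, which suits a dashboard view of one course folder.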

Fig. 1 UVC Infrastructure

Components of Platform Infrastructure

The platform infrastructure consists of an administrative management system, a faculty/staff user interface, a transcriber user interface, and a synchronized captioning applet.

The Admin Layer Interface

The purpose of the administration layer (Fig. 2) is to control user access, provide upload and download interfaces, maintain the database system, troubleshoot problems, and generate statistical reports. In addition, the administrative module provides a search engine that allows searching for a specific course, faculty member, video, caption, or script.
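The platform itself is meant to rely on Solr for this search. Purely to illustrate what field-specific search over courses, faculty, videos, captions, and scripts involves, here is a hypothetical in-memory inverted index in Python. The class name, field names, and records are assumptions, and a real deployment would delegate indexing and querying to Solr.

```python
from __future__ import annotations

import re
from collections import defaultdict

TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> list[str]:
    """Lower-case alphanumeric tokens; a crude stand-in for Solr's analyzers."""
    return TOKEN.findall(text.lower())


class AdminSearchIndex:
    """Hypothetical field-scoped inverted index (not the Solr API)."""

    def __init__(self) -> None:
        # (field, term) -> ids of resources whose field contains that term
        self._postings: defaultdict[tuple[str, str], set[str]] = defaultdict(set)

    def add(self, resource_id: str, fields: dict[str, str]) -> None:
        for field, text in fields.items():
            for term in tokenize(text):
                self._postings[(field, term)].add(resource_id)

    def search(self, field: str, query: str) -> set[str]:
        """Ids whose `field` contains every query term (AND semantics)."""
        terms = tokenize(query)
        if not terms:
            return set()
        return set.intersection(*(self._postings.get((field, t), set()) for t in terms))


# Illustrative records; real ones would come from the platform's database.
index = AdminSearchIndex()
index.add("v1", {"course": "CS 101 Intro to Programming", "faculty": "Dr. Smith",
                 "caption": "A loop repeats a block of code"})
index.add("v2", {"course": "CHEM 201 Organic Chemistry", "faculty": "Dr. Lee",
                 "caption": "Benzene has a ring structure"})
```

Scoping postings by field lets an administrator search captions without matching course titles. `index.search("course", "chemistry")` finds only `v2`, while a caption query for `"ring loop"` finds nothing because no single caption contains both terms.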

Fig. 2 Administration layer

The User Layer Interface

The user layer interface module (Fig. 3) provides registration and login access. The uploading dashboard allows users to upload a video file, an audio file, or a reference to a video or audio resource using WebDAV technology. A universal identifier is associated with each resource. If the multimedia item is already located on the web, only a reference to that resource will be added to the database. The downloading dashboard lists each submitted request with its status (“in process” or “completed”) and the options to download the captioned video and the script.
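The following is a minimal sketch of the uploading dashboard's bookkeeping. It assumes that UUIDs serve as the universal identifiers and uses Python's built-in sqlite3 in place of the MySQL back end. It shows how the dashboard could record either an uploaded file or only a reference to a resource already on the web, and how the downloading dashboard could list request status. The table layout and function names are hypothetical.

```python
from __future__ import annotations

import sqlite3
import uuid

SCHEMA = """
CREATE TABLE resource (
    id         TEXT PRIMARY KEY,   -- universal identifier (a UUID, assumed)
    owner      TEXT NOT NULL,
    kind       TEXT NOT NULL CHECK (kind IN ('video', 'audio')),
    source_url TEXT,               -- set when only a web reference is stored
    content    BLOB,               -- set when the file itself is uploaded
    status     TEXT NOT NULL DEFAULT 'in process'
)
"""


def register_resource(db: sqlite3.Connection, owner: str, kind: str, *,
                      data: bytes | None = None, url: str | None = None) -> str:
    """Store an uploaded file or, for media already on the web, just its URL."""
    if (data is None) == (url is None):
        raise ValueError("provide exactly one of data or url")
    rid = str(uuid.uuid4())
    db.execute(
        "INSERT INTO resource (id, owner, kind, source_url, content) "
        "VALUES (?, ?, ?, ?, ?)",
        (rid, owner, kind, url, data),
    )
    return rid


def download_dashboard(db: sqlite3.Connection, owner: str) -> list[tuple[str, str, str]]:
    """Rows for the downloading dashboard: (id, kind, status), oldest first."""
    return db.execute(
        "SELECT id, kind, status FROM resource WHERE owner = ? ORDER BY rowid",
        (owner,),
    ).fetchall()


def mark_completed(db: sqlite3.Connection, rid: str) -> None:
    """Called once captions and script are ready for download."""
    db.execute("UPDATE resource SET status = 'completed' WHERE id = ?", (rid,))


db = sqlite3.connect(":memory:")
db.execute(SCHEMA)
```

Requiring exactly one of `data` or `url` mirrors the rule that a web-hosted item is stored as a reference only. Both kinds of entry still share one identifier scheme, so the transcriber and download views can treat them uniformly.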

Fig. 3 User layer interface module

The Transcriber Layer Interface

The transcriber layer (Fig. 4) provides registration and login access, as well as task management. Four dashboards are available to transcribers. Transcribers will receive notices of requested audio or video files that need transcription. When a transcriber accepts a task, it is moved to his/her personal interface and removed from the queue, and the status of the resource changes to “in process.” A separate dashboard lists all of the queued resources on which the transcriber is working. Each transcriber sees a different (adaptive) interface listing his/her individual tasks. In the future, we are considering opening access to this platform and using crowdsourcing for transcription. Once a project has been assigned, the transcriber listens to the audio and transcribes it on the platform in real time. Scripts in progress can be saved as drafts; when the transcriber logs into the system again, all assigned and uncompleted work will show up. Once the work has been completed, the captioned video and script are moved from the transcriber interface to the live user interface, and the status of the task changes to “completed.” Instead of a hardware foot pedal, playback control is provided through the arrow keys on the keyboard. An enhanced web-based captioning applet (Zhuhadar et al. 2015) allows transcribers to automatically generate synchronized captions with time stamps. Transcripts with content descriptions will also be provided in plain text as additional content.
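The task lifecycle described above can be modeled as a small state machine: a task is "queued", becomes "in process" once a transcriber takes it, and ends as "completed". This Python sketch is hypothetical. Its class and method names are illustrative, and it keeps everything in memory rather than in the platform's database.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass
class Task:
    resource_id: str
    status: str = "queued"
    assignee: str | None = None
    draft: str = ""


class TranscriptionBoard:
    """In-memory model of the shared queue and per-transcriber dashboards."""

    def __init__(self) -> None:
        self.queue: deque[Task] = deque()
        self.tasks: dict[str, Task] = {}

    def request(self, resource_id: str) -> None:
        """A user requests transcription; the task joins the shared queue."""
        task = Task(resource_id)
        self.tasks[resource_id] = task
        self.queue.append(task)

    def claim(self, transcriber: str) -> Task | None:
        """Move the oldest queued task to this transcriber's personal dashboard."""
        if not self.queue:
            return None
        task = self.queue.popleft()
        task.status, task.assignee = "in process", transcriber
        return task

    def save_draft(self, resource_id: str, text: str) -> None:
        self.tasks[resource_id].draft = text

    def my_tasks(self, transcriber: str) -> list[Task]:
        """Assigned, uncompleted work shown when the transcriber logs in again."""
        return [t for t in self.tasks.values()
                if t.assignee == transcriber and t.status == "in process"]

    def complete(self, resource_id: str) -> None:
        """Publish the finished script; only in-process tasks can complete."""
        task = self.tasks[resource_id]
        if task.status != "in process":
            raise ValueError(f"{resource_id} is not in process")
        task.status = "completed"
```

Removing a task from the shared queue at claim time ensures that no two transcribers work on the same resource. Keeping drafts on the task, rather than on the session, is what lets unfinished work reappear at the next login.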

Fig. 4 Transcriber layer

Synchronized Captioning Applet

As a starting point, we plan to develop an enhanced web-based version of our current software (Footnote 1). The current software has three drawbacks: (1) it is not web-based and cannot be accessed in real time over the web; (2) the accuracy of the synchronization between the audio and the script degrades when there are long pauses; and (3) the application only works with specific video formats.

We therefore propose a JavaScript applet embedded in the platform, which transcribers can access easily. To improve accuracy, we propose segmenting the audio file before aligning the script with it. We will use short-time energy (ShorttimeEn) to detect long pauses in the audio.

This measure can differentiate silence from speech and is calculated by the following equation [1]:

$$ \mathrm{ShorttimeEn}(n)=\sum_{m=-\infty}^{\infty}\left[x(m)\,w(n-m)\right]^2=\sum_{m=n-N+1}^{n}x(m)^2 $$
(1)

where x(m) is the audio signal, w(m) is a rectangular window of length N (w(m) = 1 for 0 ≤ m ≤ N − 1 and 0 otherwise), and n is the frame index of ShorttimeEn. When ShorttimeEn drops below a certain threshold, we consider the frame a pause, and once a pause has been detected, we save its location. As a result, no captions are placed during long pauses. The process is fully automated and makes no physical changes to the original audio.
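A minimal Python sketch of Eq. (1) and the pause rule follows. It assumes a rectangular window, a fixed energy threshold, and a minimum pause length in samples; the text does not specify either tuning parameter. The running sum adds the newest squared sample and drops the one leaving the window, so each ShorttimeEn value costs O(1). Pause boundaries are reported as frame indices n, so a silent stretch is flagged only once the whole window has gone quiet, which lags the true onset by up to N − 1 samples.

```python
from __future__ import annotations


def short_time_energy(x: list[float], N: int) -> list[float]:
    """ShorttimeEn(n) = sum of x(m)^2 for m = n-N+1 .. n (x(m) = 0 for m < 0)."""
    energy, running = [], 0.0
    for n, sample in enumerate(x):
        running += sample * sample          # newest sample enters the window
        if n >= N:
            running -= x[n - N] * x[n - N]  # oldest sample leaves the window
        energy.append(running)
    return energy


def find_long_pauses(x: list[float], N: int, threshold: float,
                     min_pause: int) -> list[tuple[int, int]]:
    """Return (start, end) frame ranges (end exclusive) of long pauses.

    Frames whose energy is below `threshold` are silent; only runs of at
    least `min_pause` silent frames count, so short gaps between words are
    not treated as long pauses.
    """
    pauses, start = [], None
    for n, e in enumerate(short_time_energy(x, N)):
        if e < threshold:
            if start is None:
                start = n
        elif start is not None:
            if n - start >= min_pause:
                pauses.append((start, n))
            start = None
    if start is not None and len(x) - start >= min_pause:
        pauses.append((start, len(x)))  # recording ends in a long pause
    return pauses
```

For example, ten loud samples, twenty silent ones, and ten loud ones with `N=4`, `threshold=0.5`, and `min_pause=5` yield a single pause over frames 13 to 30. A six-sample gap under the same settings is ignored as an ordinary inter-word gap.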

Once the transcriber has converted the speech to text in dashboard 2 (the transcriber interface), long pauses are identified, the words of the script are divided over the length of the audio that remains after the long pauses are excluded, and each word is placed in its corresponding time frame, with no captions during the pauses. Because manual captioning is labor intensive, this automation should significantly improve the system’s efficiency.
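The alignment step can be sketched as follows, under stated assumptions. Pause intervals come from the short-time energy detection described above, converted to seconds. Every word receives an equal share of the pause-free speech time. Cues are written in WebVTT, a caption format chosen here for illustration, since the text does not name the applet's output format. All function names are hypothetical.

```python
from __future__ import annotations


def speech_segments(duration: float,
                    pauses: list[tuple[float, float]]) -> list[tuple[float, float]]:
    """Complement of the sorted, non-overlapping pauses within [0, duration]."""
    segments, cursor = [], 0.0
    for start, end in pauses:
        if start > cursor:
            segments.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < duration:
        segments.append((cursor, duration))
    return segments


def to_clock(t: float, segments: list[tuple[float, float]], *,
             is_end: bool = False) -> float:
    """Map t seconds of pause-free speech time onto the original timeline.

    A start time exactly on a segment boundary moves to the next segment, so no
    word begins inside a pause; an end time on a boundary stays in its segment.
    """
    for start, end in segments:
        length = end - start
        if t < length or (is_end and t <= length):
            return start + t
        t -= length
    return segments[-1][1]


def align_words(words: list[str], duration: float,
                pauses: list[tuple[float, float]]) -> list[tuple[str, float, float]]:
    """Give each word an equal share of speech time, skipping long pauses."""
    segments = speech_segments(duration, pauses)
    step = sum(end - start for start, end in segments) / len(words)
    return [(w, to_clock(i * step, segments), to_clock((i + 1) * step, segments, is_end=True))
            for i, w in enumerate(words)]


def vtt_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT hh:mm:ss.ttt timestamp."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"


def to_webvtt(aligned: list[tuple[str, float, float]], words_per_cue: int = 2) -> str:
    """Group aligned words into cues; each cue spans its first start to last end."""
    lines = ["WEBVTT", ""]
    for i in range(0, len(aligned), words_per_cue):
        cue = aligned[i:i + words_per_cue]
        lines.append(f"{vtt_timestamp(cue[0][1])} --> {vtt_timestamp(cue[-1][2])}")
        lines.append(" ".join(word for word, _, _ in cue))
        lines.append("")
    return "\n".join(lines)
```

With a 6-second clip, a long pause from 2 s to 4 s, and four words, the words land at 0 to 1, 1 to 2, 4 to 5, and 5 to 6 seconds, so no cue covers the pause. A word whose equal share straddles a pause would still span it; a fuller version could split such words at the segment boundary.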

Research Questions and Evaluation Methods

Our research questions involve the effectiveness of using captioning to improve outcomes, as follows:

1. Does the integration of accessible STEM videos improve study skills for students with disabilities in STEM programs?

2. Does the integration of accessible STEM videos improve student grade point averages for students with disabilities in STEM programs?

3. Does the integration of accessible STEM videos increase student persistence in STEM programs among students with disabilities?

Conclusion and Future Work

We presented a UVC infrastructure that provides a universal online model for video captioning to accommodate the unprecedented growth of video lectures. The UVC platform can serve as a repository for uploading videos and scripts. Captions can be automatically embedded inside educational videos using an enhanced and extended web-based version of our current software, to accommodate a variety of video formats. STEM teachers will be able to use the online UVC platform to upload audio and video and will receive a notification to download the original multimedia with synchronized captions and the transcript.

The proposed platform promises to (1) give students with disabilities in STEM programs better opportunities to understand their course materials and (2) open new avenues for research on students’ learning styles. Combining these online resources can deepen our understanding of how students with disabilities benefit from online learning and how they collaborate with their teachers.

After the implementation phase, we plan to use quasi-experiments comparing captioned and non-captioned educational videos to measure the effect of captioning on the study skills, grades, and persistence of students with disabilities. This will increase our understanding of students with disabilities in STEM programs and will shed light on their diverse talents and ways of learning. By transforming the online components of STEM education, we hope to help enhance career opportunities for students with disabilities.