1 Introduction

For the last two decades, the development of image retrieval technology in business and industry has opened a new dimension to conveniently and efficiently access visual information (Yang et al. 1998; Yuh-Shyan et al. 2004; Datta et al. 2008; Picard et al. 2008). However, the adoption of image retrieval technology in education has lagged behind and is still at the prototype stage. Some educators have designed and implemented mobile learning applications with image retrieval technology to help learners identify the names and types of birds and butterflies (Yuh-Shyan et al. 2004), provide augmented learning content (Han et al.), or enhance second language learning (Starostenko et al. 2009). The evaluation of these projects demonstrates significant results in the learning process and the engagement of learners. The use of image retrieval technology in learning is not limited to a few specific areas. It can be widely applied to health, creative arts, design, teaching and learning, data analysis, and many other disciplines, and it is regarded as one of the important trends in future mobile teaching and learning. The following section reviews the literature on mobile learning. Section 3 introduces the latest image retrieval technology and some of its applications to mobile learning. Section 4 discusses the advantages and disadvantages of applying image retrieval to education. The last section summarizes the findings of this chapter and sheds light on the future of image retrieval technology in mobile teaching and learning.

2 Literature and Empirical Studies

The mobile telecommunication industry has evolved rapidly in the last decade, with 95% of the global population covered by mobile cellular signals (Zhang 2012a; ITU 2016). The traditional mobile voice service has expanded into various multimedia and social communication services, such as taking and sending images or videos, listening to music, watching TV, playing games, checking emails, managing personal schedules, and surfing the Internet (see Chap. 2, “Characteristics of Mobile Teaching and Learning”). The growth of telecommunications not only provides users with a means to communicate but also brings significant profits from value-added services, including learning anytime and anywhere (Vogel et al. 2009; Zhang 2012a; Qiu and McDougall 2013). The development of 3G (third-generation networks) and 4G (next-generation cellular wireless access standards) creates new markets and opportunities (Zhang 2012a). New mobile and wearable devices, such as the Apple Watch, have led market trends and opened a new area of mobile learning (Hennig 2016) (see Chaps. 2, “Characteristics of Mobile Teaching and Learning” and 79, “VR and AR for Future Education”). Combined with virtual reality and augmented reality technologies, they have brought new opportunities to mobile education (Alkhezzi and Al-Dousari 2016; Hennig 2016; Yousafzai et al. 2016; Metzgar 2017; Sun and Looi 2017) (see Chaps. 79, “VR and AR for Future Education” and 77, “Augmented Reality in Education”). Nonetheless, surveys of mobile phone users demonstrate that consumers view the benefits of mobile services as saving money, saving time, and providing useful information (Evans 2008; Zhang 2012a; Alkhezzi and Al-Dousari 2016; Zidoun et al. 2016).

There is a continuing discussion concerning the advantages and disadvantages of mobile learning compared to traditional face-to-face teaching (Evans 2008; Mishra 2013; Qiu and McDougall 2013; Rennie and Morrison 2013; Alhassan 2016; Yousafzai et al. 2016). Mobile learning faces many challenges, such as small screen size, limited computing capability, short battery life, high cost of telecommunication services, low bandwidth, unreliable network connections, the design of learning functions, distraction from learning, and physiological issues (Mishra 2013; Rennie and Morrison 2013; Alkhezzi and Al-Dousari 2016; Hwang and Chang 2016; Yousafzai et al. 2016). With the growth of new technologies and mobile devices, these limitations will be reduced (Hennig 2016). Many designers, educators, and developers have worked together to bring new technologies into universities, high schools, and primary schools (Alley 2009; Cheon et al. 2012; Fraga 2012). In 2016, education applications were the third most popular category in the Apple App Store, accounting for 8.55% of all apps (Statista 2016). By 2016, more than 130 billion apps had been downloaded from the Apple App Store, including more than 11 billion educational apps, benefiting teachers and students all over the world. The growth of touch screens, wearable technologies, and 3D technologies has brought opportunities for mobile learning to benefit learners of all abilities (Alkhezzi and Al-Dousari 2016; Hennig 2016; Yousafzai et al. 2016). The adoption of mobile technology in education has increased self-learning and lifelong learning (Sharples 2000; Demouy et al. 2015). With new growth in technologies and industries, the requirements of job markets have changed, posing more challenges to universities and educational institutions (Hunt and Zhou 2017). With its focus on personal abilities, technology and environment, and international communication, mobile learning is playing an important role in the future of education (Zhang 2015; Metzgar 2017).

3 Advanced Image Retrieval Technology

Image retrieval finds images from large databases to meet the requirements set by a user (Smeulders et al. 2000; Datta et al. 2008; Zheng et al. 2017). This technology has become important and indispensable with the wide use of images, because users need efficient access to the visual information in image databases, as well as the ability to search for a specific image. The application of image retrieval in mobile teaching and learning can be intuitively understood, because visual aids such as graphs, diagrams, illustrations, and pictures have become an essential component of modern teaching and learning. The use of these visual aids is becoming even more extensive with the popularity of multimedia tools, wireless communication networks, and mobile computing platforms (Fig. 1).

Fig. 1 An illustration of an image retrieval system. (Source: From the author)

Image retrieval systems can be categorized into text-based image retrieval (TBIR) systems and content-based image retrieval (CBIR) systems. These two kinds of systems can be differentiated by the queries they accept. In a retrieval system, a user submits queries to the system via an interface to express his or her information need. A query can usually be submitted in two formats. The first and most common format is the free text query, consisting of a small number of keywords or a short textual description of the images to retrieve. A system working with text-based queries is often called a TBIR system. In such a system, each image in the database has been associated with keywords or a textual annotation. The relevance of an image to a given query is measured by the similarity between its textual annotation and the query. In this case, image retrieval is essentially converted to text retrieval, which has been well investigated (Roediger and Pyc 2012; Rattanarungrot et al. 2014). The second query format provides an example of the image to retrieve. A system accepting such queries is often called a CBIR system. In this case, the relevance of an image to a given query is measured by the similarity of the visual content of the two images, for example, the similarity of their color, texture, or shape information.
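To make the TBIR idea concrete, the following minimal sketch scores annotated images against a free text query using TF-IDF weighting and cosine similarity, a standard text retrieval technique. The file names and annotations are illustrative assumptions, not data from any system described in this chapter.

```python
# Minimal TBIR sketch: each image carries a textual annotation, and
# relevance is the cosine similarity between annotation and query.
# Annotations and file names below are illustrative placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

annotations = {
    "img_001.jpg": "monarch butterfly resting on a milkweed leaf",
    "img_002.jpg": "red sports car parked near the beach",
    "img_003.jpg": "butterfly specimens displayed in a museum case",
}

vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(annotations.values())

query_vec = vectorizer.transform(["butterfly on a leaf"])

# Rank images by descending similarity to the query.
scores = cosine_similarity(query_vec, doc_matrix).ravel()
for name, score in sorted(zip(annotations, scores), key=lambda p: -p[1]):
    print(f"{name}: {score:.3f}")
```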

Since its early days, image retrieval has been treated as an application of text retrieval, and an image retrieval system is often developed within a database management system. This leads to the TBIR system. To date, most commercial image retrieval systems are still based on TBIR due to the success and usability of database management and information retrieval techniques. CBIR started attracting attention in the 1990s. As the use of digital imaging equipment increases, image databases become increasingly large. As a result, it is time-consuming and labor-intensive for annotators to manually add keywords to each image. Moreover, a small number of keywords can hardly provide an accurate and comprehensive description of an image, which is especially true for images in a broad domain such as the Internet. Finally, because human perception is subjective, people may describe the same image differently, so the keywords given by annotators may not match the queries given by users. In these cases, it becomes difficult to retrieve relevant images with text-based image retrieval.

CBIR has been intensively researched during the past two decades (Yuh-Shyan et al. 2004; Picard et al. 2008). CBIR does not need human annotators but instead uses computers to extract visual features to represent an image. The visual features are based on the color, texture, or shape of an image. In this way, each image is associated with a set of visual features, which are conceptually comparable to the text annotation used in TBIR. The similarity of two images is evaluated by comparing their associated visual features; two images having similar visual features are deemed relevant. CBIR can effectively address the three issues mentioned above. However, CBIR suffers from a critical issue called the “semantic gap.” Human beings describe image content with high-level concepts such as “desk,” “car,” or “airplane,” whereas computers describe it only with low-level visual features. Because of the semantic gap, two semantically related images may not share similar visual features, and vice versa. That said, the TBIR and CBIR approaches are not contradictory but complementary. As long as images have been annotated with textual information, the two approaches can be integrated to effectively improve retrieval performance.
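As a toy illustration of content-based matching with a low-level feature, the sketch below compares two images by the intersection of their normalized RGB color histograms, one of the simplest color descriptors. The file names are placeholders, and real systems combine several far more sophisticated descriptors.

```python
# Toy CBIR feature: a normalized joint RGB color histogram.
# Two images are compared by histogram intersection (1.0 = identical
# color distributions). File names below are placeholders.
import numpy as np
from PIL import Image

def color_histogram(path, bins=8):
    """Return a normalized RGB histogram as a 1-D feature vector."""
    pixels = np.asarray(Image.open(path).convert("RGB")).reshape(-1, 3)
    hist, _ = np.histogramdd(pixels, bins=(bins,) * 3, range=((0, 256),) * 3)
    hist = hist.ravel()
    return hist / hist.sum()

def similarity(a, b):
    """Histogram intersection between two normalized histograms."""
    return np.minimum(a, b).sum()

feat_query = color_histogram("query.jpg")
feat_db = color_histogram("database_image.jpg")
print(f"color similarity: {similarity(feat_query, feat_db):.3f}")
```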

The applications of image retrieval can be categorized as narrow-domain and broad-domain applications. In the former, the images in a database are related to a specific application or restricted to a specific scope, and they often have less diverse content; searches for medical images, trademark images, or astronomical photos typically belong to narrow-domain image retrieval. In the latter, there are few or no restrictions on image content, and the images in a database can relate to arbitrary topics, scopes, and applications, resulting in very diverse content. Searching for images on the Internet is a good example of broad-domain retrieval. Both types of image retrieval can find applications in mobile teaching and learning.

As previously mentioned, TBIR only deals with text-based queries, whereas CBIR can handle more flexible query modes. The most common query mode in CBIR is query by example: users can directly submit to the system an example of the images to be retrieved, or delineate a region of an image to request the system to search for images containing this specific region. Some CBIR systems also allow users to submit a line sketch or color composition to search for images (Kumar et al. 2010). In addition to these query modes, advanced image retrieval systems allow users to interact with the system to improve retrieval performance, a process called “relevance feedback.” The relevance feedback mechanism was originally used in TBIR; in the 1990s, it was introduced into CBIR and has since received much attention. Through this mechanism, users label retrieved images as relevant or irrelevant and feed this evaluation back to the system. By analyzing the user’s feedback, the system refines the retrieval in the next iteration. This mechanism can effectively improve retrieval performance and is an effective means of dealing with the notorious “semantic gap” problem, because it brings human users into the loop of image retrieval. These flexible query modes and the interactive relevance feedback mechanism can help users of mobile teaching and learning acquire information and knowledge more effectively.
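One classic way to implement relevance feedback is a Rocchio-style update that moves the query's feature vector toward the images the user marked relevant and away from those marked irrelevant. The sketch below assumes this particular technique, with conventional but tunable weights; the four-dimensional vectors are illustrative only.

```python
# Rocchio-style relevance feedback sketch: refine a query feature
# vector from user-labeled relevant and irrelevant images.
import numpy as np

def rocchio_update(query_vec, relevant, irrelevant,
                   alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query toward relevant examples, away from irrelevant ones."""
    new_query = alpha * query_vec
    if relevant:
        new_query += beta * np.mean(relevant, axis=0)
    if irrelevant:
        new_query -= gamma * np.mean(irrelevant, axis=0)
    return new_query

# Illustrative 4-D features; real systems use hundreds of dimensions.
query = np.array([0.2, 0.8, 0.1, 0.5])
relevant = [np.array([0.3, 0.9, 0.2, 0.4]), np.array([0.25, 0.7, 0.1, 0.6])]
irrelevant = [np.array([0.9, 0.1, 0.8, 0.2])]

refined = rocchio_update(query, relevant, irrelevant)
print("refined query:", np.round(refined, 3))
# The refined vector replaces the original query in the next retrieval
# iteration, usually pulling more relevant images to the top.
```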

Many image retrieval systems have been developed in the past decades. Query by Image Content (QBIC), developed by IBM in the mid-1990s, is among the earliest content-based image retrieval systems. It allows users to find images from a large database in terms of color, shape, texture, etc., and accepts queries including example images, sketches and drawings, and designated color or texture patterns. Other famous pioneering image retrieval systems include the VisualSEEk and WebSEEk systems from Columbia University, the Photobook system from MIT, and the PicHunter system from NEC Research Institute. More recently, new companies such as TinEye and Kooaba have focused on image retrieval.

Early CBIR research focused on developing effective visual descriptors to describe image content, for example, its color, texture, or shape information. Typical visual descriptors are found in the MPEG-7 visual standard for content description (Sikora 2001). During the last two decades, with the advance of computer vision and machine learning technology, more sophisticated visual descriptors and retrieval algorithms have been designed, significantly boosting the performance of content-based retrieval. The state-of-the-art retrieval algorithms are built upon the convolutional neural networks developed in the field of deep learning. This approach describes an image with unprecedentedly effective feature representations, upon which a simple Euclidean distance can be used to measure image similarity, achieving groundbreaking retrieval performance (Razavian et al. 2014).

As previously mentioned, this retrieval approach is built upon deep learning technology. In the past several years, deep learning has received intensive attention due to its record-breaking performance in many pattern recognition and machine learning tasks, including image retrieval. Deep learning is realized through deep neural networks, that is, artificial neural networks (ANNs) with many layers. ANNs were intensively researched several decades ago; their resurgence in the past several years is attributed to the following changes: (1) more powerful computers, (2) larger-scale datasets, and (3) more advanced algorithms to train neural networks. Benefiting from these changes, deep neural networks can now learn feature representations directly from raw images, and these are far better than representations designed empirically from prior knowledge or domain theories. The directly learned feature representations yield significant improvements to image retrieval and have changed the techniques used in state-of-the-art retrieval systems. Compared with the bag-of-features model commonly used a decade ago, this approach also makes image retrieval easier and simpler to use (Fig. 2).

Fig. 2 Extracting feature representation for an image by using a pre-trained convolutional neural network. (Source: From the author)

To obtain a feature representation, each image is presented to a deep convolutional neural network. This neural network is pre-trained on a large-scale image dataset, for example, the ImageNet dataset (Deng et al. 2009). Because the training dataset contains a large number of images, the resulting neural network can produce feature representations that effectively characterize generic images. Instead of designing features to describe the color, texture, or shape of an image, the state-of-the-art approach employs a pre-trained deep network as a feature extractor. This considerably simplifies the feature extraction process while achieving better retrieval performance. A typical content-based image retrieval system built upon deep learning technology is illustrated in Fig. 3. Given an image database, each image is fed into a pre-trained deep convolutional neural network to extract its feature representation, and post-processing operations are conducted to further improve it. In doing so, each image in the database is characterized by a high-dimensional vector. Once a query image is submitted, its feature representation is extracted in the same manner. Since each image is now associated with a vector, image retrieval can be carried out by evaluating the similarity of the associated vectors. To speed up the retrieval process, an indexing structure is often employed. This deep learning-based retrieval approach achieves excellent retrieval performance, and during the past several years, a number of advanced variants have been developed to further improve retrieval accuracy and computational efficiency.
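The following sketch illustrates this pipeline under some stated assumptions: it uses torchvision's ImageNet pre-trained ResNet-50 as one plausible off-the-shelf feature extractor (the chapter does not prescribe a particular network), L2-normalizes the 2048-dimensional output, and ranks a toy two-image database by Euclidean distance to the query. The file names are placeholders.

```python
# Deep learning-based retrieval sketch: a pre-trained CNN is used as
# a feature extractor, and images are ranked by feature distance.
import torch
import torchvision.models as models
import torchvision.transforms as transforms
from PIL import Image

# Load an ImageNet pre-trained ResNet-50 and drop its classifier so
# that the output is a 2048-D feature vector per image.
backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
extractor = torch.nn.Sequential(*list(backbone.children())[:-1]).eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def extract_feature(path):
    """Map an image file to an L2-normalized feature vector."""
    x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    feat = extractor(x).flatten(1)  # shape: (1, 2048)
    return torch.nn.functional.normalize(feat, dim=1).squeeze(0)

# Placeholder file names standing in for a real image database.
database = {p: extract_feature(p) for p in ["img1.jpg", "img2.jpg"]}
query = extract_feature("query.jpg")

# With normalized vectors, Euclidean distance and cosine similarity
# give the same ranking; smaller distance means more relevant.
ranking = sorted(database, key=lambda p: torch.dist(query, database[p]))
print(ranking)
```

For a large database, the brute-force comparison over all vectors would be replaced by an indexing structure, as noted above.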

Fig. 3 Illustration of an image retrieval system based on deep learning technology. (Source: From the author)

A successful implementation of this system requires sufficient hardware and software support. The retrieval system consists of a server and a group of clients. The image database, the pre-trained deep neural network model, the feature representations of the images in the database, and the indexing structure are stored on the server side. When conducting retrieval, a client sends the query information to the server and then receives the retrieval result. In this setting, the computational power of the mobile platform acting as a client and the bandwidth between client and server become critical. When a query image is submitted, a straightforward way is to send the query image to the server and process it there (e.g., extracting its feature representation). This keeps the computational requirement for the client low, which is important for mobile platforms, but it places heavy demands on the bandwidth between the server and clients and on the computational capability of the server. Another way is to process the query image, as much as possible, on the client side and minimize the information sent to the server. Although this reduces the bandwidth requirement and removes burden from the server, it demands more computational capability from the client. This becomes an issue for low-end mobile platforms, especially when the latest deep learning-based retrieval approach is used, because extracting feature representations with deep neural networks involves significant computation. These two ways correspond to the “centralized” and “decentralized” approaches commonly encountered in distributed computing systems. The choice for a specific mobile image retrieval system must be made by weighing all the factors related to the system.
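A back-of-the-envelope comparison makes the trade-off concrete; the numbers below are illustrative assumptions (a roughly 3 MB camera JPEG versus a 2048-dimensional float32 feature vector), not measurements from any particular system.

```python
# Rough payload comparison of the two designs discussed above.
# Centralized: the client uploads the full query image.
# Decentralized: the client extracts features locally and uploads
# only the feature vector. All numbers are illustrative assumptions.

image_bytes = 3 * 1024 * 1024      # ~3 MB JPEG from a phone camera
feature_bytes = 2048 * 4           # 2048-D vector of 32-bit floats

print(f"centralized upload:   {image_bytes / 1024:7.0f} KiB per query")
print(f"decentralized upload: {feature_bytes / 1024:7.0f} KiB per query")
# The decentralized design cuts upload size by roughly 99% here, but
# shifts the cost of running the neural network onto the mobile client.
```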

With the development and popularity of mobile computing platforms, image retrieval has been applied to mobile teaching and learning tasks. The following part presents three examples, from the fields of medical practice, outdoor ecology learning, and archival research, respectively. Images play an important role in medical teaching and training (White et al. 2014; Matzke et al. 2017), which inspired the development of a mobile medical image retrieval system called MedSearch Mobile (Duc et al. 2011). It is a mobile search system built upon the existing MedSearch system. It is web-based and works on a variety of mobile platforms, with a focus on the iPhone and iPad, as shown in Fig. 4. It performs text- and content-based retrieval of medical images from open access medical literature. Users can type in free text queries or take pictures with the phone’s camera to conduct retrieval. In addition, the system tests different screen layouts to investigate how to utilize the display space most efficiently. This is an important issue for mobile image retrieval systems, because mobile computing platforms usually have limited space for users to interact with the system. By addressing these potential issues, such a system will be able to provide efficient access to medical information and is expected to improve the performance of medical teaching and learning.

Fig. 4 The MedSearch Mobile system on iPhone and iPad. (Source: Duc et al. 2011)

Content-based image retrieval has also been used in outdoor ecology learning (Yuh-Shyan et al. 2004). Taking advantage of wireless transmission technology and handheld devices, a mobile firefly-watching learning system was developed (Yu et al. 2004). This system allows students to take pictures of fireflies in an outdoor environment and transfer the pictures to the server side. Each picture is matched against the images in a database, and similar ones are retrieved. By cross-referencing the retrieved images and the captured image, students can identify their commonalities and differences on site and access the textual information associated with the retrieved images. With such a system, students can learn independently, less constrained by time and place. Fig. 5 shows the interface of this image retrieval system: the left panel displays the query image, and the right panel displays the retrieval result.

Fig. 5 A mobile firefly-watching learning system. (Source: From Yu et al. 2004)

Image retrieval has also been used to retrieve photos from archival photographic collections. Most existing archival photo search systems are based on text annotation and use text-based image retrieval to find relevant photos. However, with the increasing volume of archival photo collections, text-based retrieval becomes less efficient due to the need for manual annotation and the limited expressive power of keywords. Taking advantage of content-based retrieval techniques, a system was developed to reduce the dependence on text annotation and provide public users with efficient access to visual information for research, teaching, and learning. This system was previously built upon the bag-of-features model for image retrieval and has now been upgraded to the state-of-the-art deep learning-based approach. Fig. 6 displays a snapshot of this system. When a user is interested in an archival photo, he or she can click the photo. The photo is then displayed at the top as the query, and the system retrieves relevant photos from the database and displays them on the screen. In this manner, all the information associated with the retrieved photos can be passed to the user, and the system can be extended to mobile platforms.

Fig. 6 A retrieval example of the archival photo search system. (Source: From the author)

4 Advantages and Disadvantages

The advantages of image retrieval technology in education are that it improves learning efficiency, enhances memory by providing similar learning content, and engages students in learning. Some prototype products and their evaluations demonstrated that image retrieval technology helped students in their learning and increased their interest in learning and discussion (Yuh-Shyan et al. 2004; Datta et al. 2008). A picture is worth a thousand words (Larive 2008); ten or twenty similar pictures benefit students even more. The searching ability, linked memory (Zhang 2012b), and group discussion not only enhance learning but also lead to lifelong learning (Sharples 2000; Mishra 2013).

Although several studies (Yuh-Shyan et al. 2004; Datta et al. 2008; Picard et al. 2008) indicated that image retrieval technology in mobile learning had a positive influence on the learning process and received positive evaluations in qualitative analyses, as discussed above, there are some disadvantages or barriers to this technology that should be taken into consideration in future design and development. Mobile devices, compared to personal computers, have limited computing capability and battery life due to the limitations of their hardware (Rennie and Morrison 2013; Alhassan 2016; Yousafzai et al. 2016). Although the gaps are decreasing, mobile devices still have lower computing capability. Therefore, image retrieval applications should be adapted to mobile devices by employing better algorithms to speed up the process.

Mobile devices are also limited by their screen size and image resolution (Mishra 2013; Alkhezzi and Al-Dousari 2016). Images with fine details or small text are not suitable for mobile devices, and high-resolution images take more time to load and transfer; small images with simple content are a better fit. Network connection via 3G is costly and not always reliable on mobile devices (Zhang 2012b; Alhassan 2016; Yousafzai et al. 2016; Metzgar 2017). Although both fixed broadband prices and mobile broadband prices dropped dramatically from 2013 to 2016, affordability is still the major barrier to mobile adoption in many developing countries (ITU 2016). Some mobile devices switch from a Wi-Fi connection to 3G automatically, which increases the costs to customers; in Australia, for example, a video transfer via a 3G connection may cost 200 AUD per hour. Therefore, image retrieval technology should focus on reducing the size of transferred files and messages.

Mobile device users prefer to learn in short time slots rather than watching their devices for more than an hour (Qiu and McDougall 2013; Alkhezzi and Al-Dousari 2016). They are interested in applications with convenient and simple functions, more colors and interactions, or social communication functions. Distraction from other mobile applications is another problem in mobile learning (Peter and Gina 2008; Sana et al. 2013; Alhassan 2016). A well-designed mobile learning application should meet these requirements (Hennig 2016).

5 Conclusion

Mobile teaching and learning is a growing trend in higher education, and advanced image retrieval technology can add value to this field due to the wide use of images and videos in teaching and learning. With the development of image retrieval technology over the last two decades, image retrieval is no longer a simple extension of text retrieval but has opened a new dimension for conveniently and efficiently accessing visual information. This is important for mobile teaching and learning advocates, who expect free, flexible, and efficient ways to acquire knowledge. The extensive use of image retrieval technology in areas related to mobile teaching and learning can be expected in the very near future.

At the same time, a variety of issues still need to be resolved to make mobile image retrieval more reliable and efficient. One is the computational capability of mobile computing platforms. With the increasing application of mobile image retrieval systems, more sophisticated image-processing algorithms and graphical user interfaces could be implemented on the client side, leading to more computational overhead and greater memory and storage usage. In this case, it may not be effective to pursue high-end mobile platforms; instead, better systems, algorithms, and communication protocols should be designed to make mobile image retrieval cost-effective. This is important for making mobile teaching and learning affordable for everyone in the era of big data and deep learning.

Another critical issue is the continued development of image retrieval technology itself. Although content-based image retrieval has made significant progress, its performance still requires more research and development. The effectiveness of image retrieval depends on progress in image understanding, a central issue in computer science and artificial intelligence. As previously described, this field has evolved from the traditional bag-of-features model to the more advanced deep learning models; the latter have achieved remarkable performance in image understanding and still have significant potential to be exploited. The advance of image retrieval technology will not only expand the scope of its applications to mobile teaching and learning but also help improve the quality of these applications.

6 Cross-References