1 Introduction

In state-of-the-art methods for (large) image transmission, a user usually has to wait passively until the transmission completes before seeing the entire image, and user interaction behaviors (e.g., tapping the screen) are not allowed to influence the transmission (e.g., to trade lower image quality for higher transmission efficiency). Image transmission in a resource-constrained mobile wireless network (MWN) usually has three main characteristics: 1) Mobility of users: most MWN users are constantly moving, so the spatial position of each user varies with time; 2) Lower processing power: the processing and battery capacities of most mobile client devices (e.g., cell phones, PDAs) are very limited, which motivates us to devise an energy-efficient technique that supports image transmission at lower cost; 3) Instability and heterogeneity of the MWN: the MWN is often unstable, that is, some nodes may be down or only intermittently connected to the network, and the bandwidth between any two nodes may vary with time.

For the transmission and browsing of large images in an MWN, the network transmission cost consumes a large percentage of the overall interaction time [19]. Reducing the transmission cost is therefore critical to improving transmission performance, especially in an MWN. An image usually contains some salient objects, whose corresponding regions are called Regions of Salient Objects (RSOs). The total image data size can be effectively reduced if the pixel resolution of the non-RSO region is moderately reduced according to the network bandwidth. This hardly affects the viewing experience, but it can often decrease the total transmission cost significantly (i.e., higher image transmission efficiency at the price of inferior image quality in unimportant regions). Moreover, state-of-the-art methods do not consider human interactions (e.g., tapping the receiver screen) during large image transmission, so a user may have to wait for a long time if the current network bandwidth is limited and unstable, as is common in an MWN environment. The user experience is poor in this case, since the user is anxious to see the image. If the pixel resolutions of non-RSO replicas can be interactively adjusted by the receiver during transmission, both the transmission performance and the user experience can be significantly improved. Based on the above motivations, we propose a content-based and bandwidth-aware Interactive Image Transmission (IIT) scheme that considers not only the image content and the network bandwidth, but also user interactions. Exploiting user interactions to optimize an active image transmission process and to improve the user experience is a new research topic that has received little attention so far.

The basic idea behind the transmission scheme is as follows. Given an image I S to be transmitted, its RSOs are first detected automatically by the approach of Girshick et al. [11]. As the RSOs are the main content of the image, they are critically important to the IIT processing, and their original pixel resolutions are kept. For the rest of the image (i.e., the non-RSO part), both the pixel resolution and the transmission priority are lower than those of the RSOs, so that the key part of the image can be transmitted and displayed first. Note that the optimal resolution of the non-RSO part can be derived from factors such as the current network bandwidth, the corresponding data size, and user interactions. The IIT processing of I S then becomes an active transmission of the image blocks (IBs) into which I S has been partitioned in the preprocessing step (ref. Section 3.2). The IB replicas with different resolutions and transmission priorities are stored at the slave node level (N L ). During transmission, if the transmission efficiency is unsatisfactory, the user's anxiety to see the image is implicitly indicated by the frequency of tapping the receiver's screen. In that case, IB replicas with lower pixel resolutions are chosen as candidates for transmission, reconstruction, and display at the receiver node. For IBs displayed at lower resolutions, the user can also request new IB replicas with higher resolutions by tapping the corresponding region on the screen.

The challenges of designing such a high-performance interactive image transmission method include the following five main aspects:

  1) Applying user interaction (i.e., screen tapping) to image transmission: as tapping the screen means that the user hopes the image transmission can be completed earlier, the problem is how to establish a relationship between the frequency of screen tapping and the resolution of the non-RSO part.

  2) High computation cost in image transmission: most images are characterized by high pixel resolution, high dimension, and large scale, so the transmission cost of such images is very high.

  3) Mobility of MWN users: as most users in the MWN are often moving, the spatial position of each user varies with time, so how to perform an optimal data placement is also a challenging issue.

  4) Resource-constrained MWN: the power capacities of mobile devices are very limited, and their display resolutions are often low. Furthermore, the bandwidth in the MWN is limited, so how to transmit a large image in the resource-constrained MWN is challenging.

  5) Instability and heterogeneity of the MWN: the nodes in the MWN are often unstable, that is, some nodes may be down or only intermittently connected to the network. The bandwidth between any two nodes in the MWN varies with time, so there is no guarantee that the total response time of each transmission will be similar.

To address the above challenges, we propose two enabling techniques in the IIT, i.e., a multi-resolution-based interactive RIB replica selection scheme and an optimal IB replica placement scheme. We have implemented the IIT method and extensive experiments indicate that our approach is specifically suitable for large image transmission in a relatively low network bandwidth with much enhanced user interaction experiences. Our contributions can be summarized as follows:

  • We introduce a framework of an interactive image transmission scheme in the resource-constraint mobile network (IIT).

  • We propose an interactive user model to support the IIT processing.

  • We present a multi-resolution-based interactive RIB replica selection scheme to adaptively reduce the communication cost in the MWN environment and improve user experiences.

  • We design an optimal IB replica placement scheme to adaptively reduce the storage cost.

The remainder of the paper is organized as follows. Section 2 reviews related techniques and Section 3 presents preliminary definitions and preprocessing step. After that, to effectively and efficiently facilitate the interactive image transmission processing, we present two enabling techniques, i.e., a multi-resolution-based interactive RIB replica selection scheme and an optimal IB replicas placement scheme in Sections 4 and 5, respectively. In Section 6, we propose an interactive image transmission scheme called the IIT. In Section 7, we perform comprehensive experiments to evaluate the efficiency of our proposed approach before we conclude the paper in Section 8.

2 Related work

In this section, we review key related work on image data transmission techniques, which have been studied for about 20 years [1, 8, 16, 18, 19, 21]. The state-of-the-art methods can be mainly divided into two categories: 1) improved design of transmission protocols [8, 9, 16, 19, 21–23]; and 2) image data encoding and compression [3–7, 9, 17, 18, 22, 26].

Charles et al. [8] first proposed an end-to-end wireless image data transmission method and provided an experimental analysis. Images are usually transmitted across the Internet using a lossless protocol such as TCP/IP, and lossless protocols require retransmission of lost packets, which substantially increases transmission time. John et al. [16] presented a fast lossy Internet image transmission scheme (FLIIT) for compressed images, which eliminates retransmission delays by shielding important portions of the image with redundancy bits. They described a joint source and channel coding algorithm for images that minimizes the expected distortion of transmitted images. After that, Raman et al. [21] proposed an image transmission protocol called ITP, which is more suitable for image data transmission than the traditional TCP protocol. Gao et al. [9] presented a robust image transmission scheme for wireless channels based on compressive sensing. To cope with high packet error rates and the need for retransmission, Aziz et al. [23] designed a novel architecture for energy-efficient image processing and communication over wireless sensor networks. Recently, Maani et al. [19] introduced parallelism to provide an efficient method of medical image transmission based on parallel TCP connections.

Since most image transmission methods use the same pixel interpolation scheme for the entire picture, without considering the differences between its parts, Chang et al. [7] presented a progressive image transmission (PIT) scheme that transmits the most significant part of a picture first, followed by the less important parts. Lin et al. [18] presented a compound image compression algorithm for real-time computer screen image transmission called Shape Primitive Extraction and Coding (SPEC). Ruiz et al. [22] designed an image compression algorithm to support progressive image transmission. Available PIT mechanisms and systems include spatial-domain [5], transform-domain, and pyramid-structured [17] progressive transmission. In the transform domain, an image undergoes block compression and the transformed coefficients are transmitted progressively in order of relative importance (e.g., Progressive JPEG). Alternatively, an early and intuitive method for progressive image transmission in the spatial domain is the Bit Plane Method (BPM) [4, 26]. In this method, the final transmitted image is identical to the original. However, the high transmission bit rate of BPM is a major disadvantage.

Due to this drawback of BPM, lossy PIT techniques have received more attention. To provide a fast PIT scheme, Chang et al. [6] improved the BPM method with a color-guessing technique called guessing by neighbors (GBN), which uses interleaved pixels for transmission: 50 % of the pixels are transmitted while the other 50 % are "guessed". Sun et al. [24] proposed a progressive image transmission system over wireless channels that combines joint source-channel coding (JSCC), space-time coding, and orthogonal frequency division multiplexing (OFDM). Based on the Reed-Solomon coding scheme, Boluk et al. [3] proposed a robust image transmission method over wireless sensor networks. Victor et al. [27] devised a 3-D scalable image compression method with optimized volume of interest coding. Arslan et al. [2] proposed generalized unequal error protection LT codes for progressive data transmission. Hu et al. [12] introduced an attention-model-based progressive image transmission method in which the region of interest (ROI) in an image is transmitted first and with high resolution; however, human interaction is not allowed to affect the transmission efficiency. Wu and Abouzeid [28] devised a power-aware image transmission method for energy-constrained wireless networks. Gelogo and Kim [10] reviewed the issues and solutions of compressed image transmission. Zhuang et al. [30] proposed a content-aware and multi-resolution-based image transmission scheme in which only two factors, i.e., the image content and the network bandwidth, are considered to optimize the transmission processing. Recently, Xua et al. [29] proposed an adaptive FEC coding and cooperative relayed wireless image transmission method.

Different from the above state-of-the-art methods, the paper proposes an interactive image data transmission method in which the three factors, i.e., the receiver’s interaction behaviors, the image content, and current network bandwidth are holistically analyzed to obtain an optimal transmission pixel resolution. To the best of our knowledge, this is the first attempt to improve the transmission performance from the perspectives of the user interactions and the image contents.

3 Preliminaries and preprocessing

3.1 Preliminaries

The list of symbols to be used throughout the rest of this paper is summarized in Table 1.

Table 1 Meaning of symbols used

Definition 1. A mobile wireless network (MWN ) is a graph which is represented by a triplet:

$$ MWN:=<N,E,T> $$
(1)

where N refers to the set of nodes, E refers to a set of edges representing the network bandwidths for data transmission at time T.

In Fig. 1, due to the instability and heterogeneity of the MWN environment, the bandwidth of any two nodes in MWN may be different and vary with time. In addition, the data transmission distance in the mobile network is limited.

Fig. 1 Three-layer architecture in a MWN

Definition 2. The nodes in the MWN can be logically divided into three categories: the sender node (N S ), the slave node (N L ), and the receiver node (N R ), formally denoted as N = N S ∪ N L ∪ N R , where

  • N S is responsible for obtaining an optimal IB transmission pixel resolution based on the collection and analysis of the network bandwidth and user interaction behaviors;

  • N L is responsible for: 1) storing the IB replicas with different pixel resolution and transmission priorities, and 2) sending the IB replicas to the receiver;

  • N R is responsible for: 1) receiving, reconstructing, and displaying the images; 2) sending user interactions (e.g., tapping the screen) to N S .

For each image, in most cases, there exist some salient objects that users are interested in. The regions of such salient objects are called region of salient objects (RSO).

Definition 3. A region of salient object (RSO) in an image can be modeled by a four-tuple:

$$ RS{O}_i:=<i,S,pos,dpi> $$
(2)

where i is the ID number of the RSO, S is the area value of the RSO, pos is the coordinate position of the RSO in the image, and dpi refers to the dots per inch(dpi) for the RSO.

Definition 4. A non-RSO part of an image, denoted as R, can be modeled by a two-tuple:

$$ R:=<S,dpi> $$
(3)

where S is the area value of the R, dpi refers to the dots per inch for the R.

Based on Definitions 3 and 4, the image in Fig. 2 contains one RSO (i.e., RSO 1) and one non-RSO part (i.e., R). The RSO area is preliminarily detected by the approach proposed by Girshick et al. [11] and outlined by a black rectangle.

Fig. 2 One RSO in an image

As mentioned before, an image can be equally partitioned into blocks called image blocks (IBs).

Definition 5 (Image Block). Given an image block IB i , it can be modeled by a five-tuple:

$$ I{B}_i:=<i,S,pos,TP,dpi> $$
(4)

where i refers to the ID number of the IB, S is the area value of the IB, pos is the coordinate position of the IB in the image, TP refers to the transmission priority of the IB (TP ∈ [0,1]), and dpi means the corresponding pixel resolution of the IB.

Note that, there are two kinds of IBs (i.e., SIB and RIB) in terms of their positions in the image, which are defined below:

Definition 6 (SIB). A SIB is an image block which intersects with a RSO or is contained by a RSO, formally represented as:

$$ SIB=\left\{I{B}_i\Big|I{B}_i\cap RS{O}_j\ne \varnothing \right\} $$
(5)

where i∈[1,α + β] and j∈[1,|RSO|].

Definition 7 (RIB). A RIB is an image block contained by the non-RSO part(R) of an image, formally denoted by:

$$ RIB=\left\{I{B}_i\left|I{B}_i\cap R=I{B}_i\right.\right\} $$
(6)

where i∈[1,α + β].

Based on Definitions 6 and 7, also in Fig. 2, the RSO has 16(4 × 4) SIBs (i.e., IB 23, IB 24, IB 25, IB 26, IB 33, IB 34, IB 35, IB 36, IB 43, IB 44, IB 45, IB 46, IB 53, IB 54, IB 55 and IB 56). The rest of the IBs are RIBs.
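The partition and the SIB/RIB split of Definitions 6 and 7 can be illustrated with a minimal Python sketch. The 7 × 8 grid, the 80 × 70 image size, the RSO rectangle, and the reading of an ID such as IB 23 as (row 2, column 3) are assumptions chosen only to mirror the 4 × 4 example above; blocks that merely touch the RSO border (zero overlap area) are treated as RIBs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle; (x0, y0) is the top-left, (x1, y1) the bottom-right corner."""
    x0: float
    y0: float
    x1: float
    y1: float

    def overlap(self, other: "Rect") -> float:
        """Area of the intersection with another rectangle (0 if disjoint)."""
        w = min(self.x1, other.x1) - max(self.x0, other.x0)
        h = min(self.y1, other.y1) - max(self.y0, other.y0)
        return max(0.0, w) * max(0.0, h)


def partition(width, height, rows, cols):
    """Equally partition an image into rows x cols blocks keyed by 1-based (row, col)."""
    bw, bh = width / cols, height / rows
    return {
        (r, c): Rect((c - 1) * bw, (r - 1) * bh, c * bw, r * bh)
        for r in range(1, rows + 1)
        for c in range(1, cols + 1)
    }


def classify(blocks, rsos):
    """SIB: block overlapping some RSO (Definition 6); RIB: every other block (Definition 7)."""
    sib = {key for key, blk in blocks.items() if any(blk.overlap(r) > 0 for r in rsos)}
    rib = set(blocks) - sib
    return sib, rib


# Hypothetical layout: 80 x 70 image, 7 rows x 8 columns of 10 x 10 blocks.
blocks = partition(80, 70, rows=7, cols=8)
rso_1 = Rect(20, 10, 60, 50)          # covers rows 2-5 and columns 3-6
sib, rib = classify(blocks, [rso_1])  # 16 SIBs, 40 RIBs
```

With this layout the SIB set is exactly the blocks (2,3) … (5,6), matching the 16 IDs listed above under the assumed row-column numbering.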

Our proposed IIT method aims at transmitting a large image adaptively and efficiently in a limited network bandwidth and a user-preferred time (T θ ) that is dynamically set through the user interactions (e.g., tapping the receiver screen) during the transmission processing.

3.2 Preprocessing

As mentioned above, in the preprocessing step of the IIT, each image I i in Ω is physically and equally partitioned into several IBs. The corresponding SIBs and RIBs are placed at the slave node level with different pixel resolutions and transmission priorities. Algorithm 1 details the initial IB replicas placement processing.

Algorithm 1 Initial IB replicas placement

Generally speaking, in most state-of-the-art image data transmission schemes, an image is transferred as a whole object in which the transmission priorities of all IBs are equal. Thus, the important RSOs in the image are often displayed later than the rest of the image. For images with high pixel resolutions, such methods also lead to an increased transmission failure rate. Once a transmission failure occurs, the image needs to be re-transmitted, resulting in an even higher transmission overhead. To overcome this technical bottleneck, we propose a Transmission Priority (TP) Assignment scheme for IB replicas, called TPA, to support the successive and robust transmission of large image data. The TP of each IB is defined in Eq. (7).

$$ I{B}_i.TP=\left\{\begin{array}{ll}1, & \mathrm{if}\ I{B}_i\cap RS{O}_j=I{B}_i\\ 0.5, & \mathrm{if}\ I{B}_i\cap RS{O}_j\ne \varnothing \\ 0, & \mathrm{otherwise}\end{array}\right. $$
(7)

The IBs are then transmitted in descending order of their TPs, which not only ensures the robustness of data transmission but also guarantees that the important information is transmitted first.
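As a rough illustration of Eq. (7) and the descending-TP ordering, the following Python sketch assigns a priority to each block and sorts the block IDs. The box representation `(x0, y0, x1, y1)`, the function names, and the tie-breaking by ascending ID are assumptions made for this example; the cases of Eq. (7) are evaluated top-down, so 0.5 applies only to partial overlap.

```python
def overlap_area(a, b):
    """Intersection area of two axis-aligned boxes given as (x0, y0, x1, y1)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0, w) * max(0, h)


def box_area(a):
    return (a[2] - a[0]) * (a[3] - a[1])


def transmission_priority(block, rsos):
    """Eq. (7): 1 if the block lies inside some RSO, 0.5 if it only overlaps one, else 0."""
    priority = 0.0
    for rso in rsos:
        inter = overlap_area(block, rso)
        if inter == box_area(block):
            return 1.0          # IB_i ∩ RSO_j = IB_i
        if inter > 0:
            priority = 0.5      # IB_i ∩ RSO_j ≠ ∅ but not contained
    return priority


def transmission_order(blocks, rsos):
    """Block IDs in descending TP; ties are broken by ascending ID (an assumption)."""
    return sorted(blocks, key=lambda i: (-transmission_priority(blocks[i], rsos), i))


# Hypothetical one-row image with four 10 x 10 blocks and one RSO.
blocks = {1: (0, 0, 10, 10), 2: (10, 0, 20, 10), 3: (20, 0, 30, 10), 4: (30, 0, 40, 10)}
rsos = [(10, 0, 25, 10)]
order = transmission_order(blocks, rsos)  # [2, 3, 1, 4]
```

Block 2 lies fully inside the RSO (TP = 1), block 3 is cut by its border (TP = 0.5), and blocks 1 and 4 are RIBs (TP = 0), so they are sent last.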

4 Multi-resolution-based interactive RIB replica selection scheme

As mentioned before, high pixel resolutions of digital images usually lead to a large data size accordingly. It is non-trivial to transmit an image of such a big size to the receiver nodes directly, especially in a resource-constraint mobile network environment in which the network bandwidth is limited and unstable.

Based on the above analysis, in this section, we propose a Multi-resolution-based Interactive image data Replica selection scheme (MIR) by uniformly analyzing the relationship of the image content, network bandwidth, and human interactions.

4.1 Modeling user interaction behaviors

To facilitate the interactive image transmission processing, it is critical to model the user's interactive behaviors. Specifically, the anxiety of a receiver user U R can be modeled by the number of times the user taps the screen during a certain short time interval (ΔT). So in this subsection, we first give a definition of the user anxiety degree (UAD).

Definition 8 (User Tapping). A user tapping (UT) can be modeled by a triple:

$$ U{T}_i:=<i,UID,T> $$
(8)

where UID means the user ID and T refers to the time when the tapping is performed.

Definition 9 (User Anxiety Degree). For a user U i , the UAD is defined as the number of taps on the screen during a short time interval (i.e., ΔT), formally represented as:

$$ UA{D}_i={\displaystyle \sum_{U{T}_i.T\in \varDelta T}1} $$
(9)

where UT i .T means the time when the i-th user tapping is performed and \( \varDelta T\le {T}_{\theta}^{Max} \).
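Definitions 8 and 9 amount to counting a user's taps inside a time window. The sketch below is one possible reading in Python; taking ΔT as the trailing window (now − ΔT, now], as well as the class and field names, are assumptions of this example rather than part of the definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserTap:
    """Definition 8: <i, UID, T>."""
    tap_id: int
    uid: str
    t: float  # time of the tap, in seconds


def user_anxiety_degree(taps, uid, now, delta_t):
    """Definition 9: number of taps by `uid` whose time falls in (now - delta_t, now]."""
    return sum(1 for tap in taps if tap.uid == uid and now - delta_t < tap.t <= now)


taps = [
    UserTap(1, "u1", 0.2),
    UserTap(2, "u1", 1.1),
    UserTap(3, "u2", 1.3),  # another user, ignored for u1
    UserTap(4, "u1", 1.8),
    UserTap(5, "u1", 2.0),
]
uad = user_anxiety_degree(taps, "u1", now=2.0, delta_t=1.0)  # 3
```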

Based on Definition 9, the larger the UAD is, the more anxious the user is. Next, we establish the relationship between the transmission deadline (T θ ) and the UAD. Suppose that the image transmission must finish within T θ , which is dynamically adjusted according to the user's taps on the screen. T θ can then be represented as:

$$ {T}_{\theta }={T}_{\theta}^{Min}+\frac{T_{\theta}^{Max}-{T}_{\theta}^{Min}}{\delta}\cdot k $$
(10)

where \( {T}_{\theta}\in \left[{T}_{\theta}^{Min},{T}_{\theta}^{Max}\right] \) (see Fig. 3), \( {T}_{\theta}^{Min} \) is a minimal time threshold, namely the time needed to transmit the RSOs alone (i.e., \( {T}_{\theta}^{Min}={T}_0+\frac{{\sum}_{i=1}^{\left|RSO\right|}\left(RS{O}_i.S\cdot RS{O}_i.dp{i}^2\right)\cdot Bit\cdot CR}{E_j\cdot TR} \)), and \( {T}_{\theta}^{Max} \) is a maximal one, namely the time needed to transmit the RSOs together with the non-RSO part (i.e., \( {T}_{\theta}^{Max}={T}_0+\frac{\left({\sum}_{i=1}^{\left|RSO\right|}\left(RS{O}_i.S\cdot RS{O}_i.dp{i}^2\right)+R.S\cdot R.dp{i}^2\right)\cdot Bit\cdot CR}{E_j\cdot TR} \)). δ is a granularity value that can be tuned by users and k ∈ [1, δ].

Fig. 3 Selection of ΔT

The more anxious the user is, the smaller T θ should be. User anxiety is measured by the UAD defined in Definition 9, and T θ increases linearly with k in Eq. (10). Therefore, the larger the user's UAD is, the smaller k should be. Based on this observation, the relationship between k and UAD can be approximately represented as in Fig. 4.

Fig. 4 k and UAD

With the increase of the UAD, k is decreasing gradually. k and UAD can be approximately modeled by Eq.(11):

$$ k=\left\lceil \frac{\delta }{UAD+1}\right\rceil $$
(11)

Based on Eqs. (10) and (11), T θ can be derived in Eq. (12):

$$ \begin{array}{l}{T}_{\theta }={T}_{\theta}^{Min}+\frac{T_{\theta}^{Max}-{T}_{\theta}^{Min}}{\delta}\cdot \left\lceil \frac{\delta }{UAD+1}\right\rceil \\ {}\kern0.75em \approx {T}_{\theta}^{Min}+\frac{T_{\theta}^{Max}-{T}_{\theta}^{Min}}{UAD+1}\end{array} $$
(12)
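A small numeric sketch may help to see how Eqs. (10)–(12) turn the UAD into a deadline. The function names and the sample values (T θ Min = 2 s, T θ Max = 10 s, δ = 8) are hypothetical and chosen only for illustration.

```python
import math


def granularity_step(uad, delta):
    """Eq. (11): k = ceil(delta / (UAD + 1))."""
    return math.ceil(delta / (uad + 1))


def deadline(t_min, t_max, uad, delta):
    """Eq. (12), exact form: T_theta = T_min + (T_max - T_min) / delta * k."""
    k = granularity_step(uad, delta)
    return t_min + (t_max - t_min) / delta * k


# Hypothetical thresholds: T_min = 2 s, T_max = 10 s, delta = 8.
for uad in (0, 1, 3, 7, 100):
    print(uad, granularity_step(uad, 8), deadline(2.0, 10.0, uad, 8))
```

With no tapping (UAD = 0) the deadline equals the maximal threshold; each additional tap shortens it. Because k ≥ 1, the exact form never drops below T θ Min + (T θ Max − T θ Min)/δ, however large the UAD becomes.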

4.2 Optimal ID of RIB replica

The basic idea of the MIR method is that for a certain image, its transmission pixel resolution needs to be adjusted based on the variance of the network bandwidth. Specifically, with higher network bandwidth, a higher-resolution image can be transferred in a reasonable short period of time (T θ ) (ref. Eq.(12)), where T θ can be adjusted by a user interactively during the transmission processing. On the contrary, in order to get a shorter response time, a lower-resolution version of the same image can be sent to the receiver node with lower network bandwidth.

Although reducing the pixel resolution of the whole image reduces the transmission overhead, the salient objects (i.e., the RSOs) can then no longer be viewed clearly. Therefore, we only moderately adjust the resolution of the non-RSO part of the image, based on the network bandwidth and the user interactions. In contrast to the non-RSO part (R) of the image, the RSOs are best displayed at the original pixel resolution. As shown in Fig. 5a, the non-RSO area can be equally partitioned into RIBs, marked by yellow dashed lines.

Fig. 5 Comparison of the pixel resolutions of a same image

Based on the above analysis, the objective of our method is to get a trade-off between the quality of image and the transmission cost under different resolutions and available network bandwidth.

Suppose that the image transmission must finish within the deadline T θ set by the user. Then we have:

$$ {T}_T={T}_O+\frac{Size\left({I}_S\right)}{BWidth\left({E}_j\right)}\le {T}_{\theta } $$
(13)

where

  • Size(I S ) is the data size of I S , represented as: \( Size\left({I}_S\right)=\left({\displaystyle {\sum}_{i=1}^{\left|RSO\right|}\left(RS{O}_i.S\cdot RS{O}_i.dp{i}^2\right)+R.S\cdot R.dp{i}^2}\right)\cdot Bit\cdot CR \), where Bit means color bit, and Bit can be 8, 16, or 24, CR is an image compression ratio and CR∈[0,1];

  • T 0 (also written T O in the equations) is the start-up transmission time;

  • BWidth(E j ) is the real network bandwidth, denoted as BWidth(E j ) = E j  ⋅ TR, where E j is the theoretical network bandwidth, TR is an attenuation rate of the bandwidth, and TR∈[0,1];

  • T θ is a time threshold and \( {T}_{\theta}\in \left[{T}_{\theta}^{Min},{T}_{\theta}^{Max}\right] \), where the definitions of \( {T}_{\theta}^{Min} \) and \( {T}_{\theta}^{Max} \) are the same as in Eq. (10).
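The quantities in Eq. (13) can be evaluated directly. The following Python sketch does so under assumed units (areas in square inches, bandwidth in bits per second) and with hypothetical sample values; the function names are not from the paper.

```python
def image_size_bits(rsos, r_area, r_dpi, bit, cr):
    """Size(I_S) = (sum_i RSO_i.S * RSO_i.dpi^2 + R.S * R.dpi^2) * Bit * CR.

    rsos is a list of (area, dpi) pairs, one per RSO.
    """
    pixels = sum(s * dpi ** 2 for s, dpi in rsos) + r_area * r_dpi ** 2
    return pixels * bit * cr


def transmission_time(t0, size_bits, e_j, tr):
    """T_T = T_0 + Size(I_S) / BWidth(E_j), with BWidth(E_j) = E_j * TR."""
    return t0 + size_bits / (e_j * tr)


# Hypothetical image: one 4 sq-in RSO at 300 dpi, a 20 sq-in non-RSO part at 100 dpi,
# 24-bit color, CR = 0.5, theoretical bandwidth 8e6 bit/s, TR = 0.5, start-up time 0.5 s.
size = image_size_bits([(4, 300)], r_area=20, r_dpi=100, bit=24, cr=0.5)
t_total = transmission_time(0.5, size, e_j=8e6, tr=0.5)
meets_deadline = t_total <= 3.0  # compare against a user deadline T_theta of 3 s
```

Here the image amounts to 560,000 pixels, i.e., 6,720,000 bits after compression, and needs about 2.18 s, so a 3 s deadline is met.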

Substituting Eq. (12) for T θ , Eq. (13) can be rewritten as follows:

$$ {T}_O+\frac{\left({\displaystyle {\sum}_{i=1}^{\left|RSO\right|}\left(RS{O}_i.S\cdot RS{O}_i.dp{i}^2\right)+R.S\cdot R.dp{i}^2}\right)\cdot Bit\cdot CR}{E_j\cdot TR}\le {T}_{\theta}^{Min}+\frac{T_{\theta}^{Max}-{T}_{\theta}^{Min}}{\delta}\cdot \left\lceil \frac{\delta }{UAD+1}\right\rceil $$
(14)

Solving Eq. (14) for R.dpi yields:

$$ R.dpi\le \sqrt{\frac{\frac{\left({T}_{\theta}^{Min}+\frac{T_{\theta}^{Max}-{T}_{\theta}^{Min}}{\delta}\cdot \left\lceil \frac{\delta }{UAD+1}\right\rceil -{T}_O\right)\cdot {E}_j\cdot TR}{Bit\cdot CR}-{\displaystyle \sum_{i=1}^{\left|RSO\right|}\left(RS{O}_i.S\cdot RS{O}_i.dp{i}^2\right)}}{R.S}} $$
(15)

To obtain a relatively high resolution of the non-RSO part, its pixel resolution can be approximately represented in Eq. (16).

$$ R.dpi\approx \sqrt{\frac{\frac{\left({T}_{\theta}^{Min}+\frac{T_{\theta}^{Max}-{T}_{\theta}^{Min}}{\delta}\cdot \left\lceil \frac{\delta }{UAD+1}\right\rceil -{T}_O\right)\cdot {E}_j\cdot TR}{Bit\cdot CR}-{\displaystyle \sum_{i=1}^{\left|RSO\right|}\left(RS{O}_i.S\cdot RS{O}_i.dp{i}^2\right)}}{R.S}} $$
(16)
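Eq. (16) can be read as a pixel budget: the deadline fixes how many pixels fit into the transmission, the RSOs consume their share at original resolution, and the remainder is spread over the non-RSO area. The Python sketch below follows this reading; it takes T θ as an input (as produced by Eq. (12)), and raising an error when the deadline cannot even accommodate the RSOs is an assumption of this example.

```python
import math


def non_rso_dpi(t_theta, t0, e_j, tr, bit, cr, rsos, r_area):
    """Eq. (16): largest R.dpi that still meets the deadline t_theta.

    rsos is a list of (area, dpi) pairs; r_area is R.S.
    """
    budget_pixels = (t_theta - t0) * e_j * tr / (bit * cr)
    rso_pixels = sum(s * dpi ** 2 for s, dpi in rsos)
    remaining = budget_pixels - rso_pixels
    if remaining < 0:
        # Assumption: a deadline below the RSO-only time is treated as infeasible.
        raise ValueError("deadline too short to transmit the RSOs at original resolution")
    return math.sqrt(remaining / r_area)


# Hypothetical values: deadline 2.18 s, start-up 0.5 s, 8e6 bit/s, TR = 0.5,
# 24-bit color, CR = 0.5, one 4 sq-in RSO at 300 dpi, 20 sq-in non-RSO part.
dpi_r = non_rso_dpi(2.18, 0.5, 8e6, 0.5, 24, 0.5, [(4, 300)], 20)
```

For these values the non-RSO part can be sent at about 100 dpi; a shorter deadline lowers this value, while the RSO resolution stays untouched.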

Next, we study how to choose the optimal ID of each IB replica among its Δ replicas. For the non-RSO part of the image, the lower and upper bounds of the dpi (i.e., D L , D U ), the ID number (i) of the non-RSO (R) replica, and the granularity (Δ) satisfy Eq. (17).

$$ \begin{array}{l}\varDelta \left({E}_U\right)=\left\lceil \frac{D_U-{D}_L}{\sqrt{\frac{\left({\theta}_T-{T}_0\right)\cdot {E}_U\cdot TR}{Bit\cdot CR\cdot \left({\sum}_{i=1}^{\left|\varOmega \right|}{\sum}_{j=1}^{\alpha_i}SI{B}_{ij}.S+R.S\right)}}-{D}_U}\right\rceil \\ {}\varDelta \left({E}_L\right)=\left\lceil \frac{D_U-{D}_L}{\sqrt{\frac{\left({\theta}_T-{T}_0\right)\cdot {E}_L\cdot TR}{Bit\cdot CR\cdot \left({\sum}_{i=1}^{\left|\varOmega \right|}{\sum}_{j=1}^{\alpha_i}SI{B}_{ij}.S+R.S\right)}}-{D}_L}\right\rceil \end{array} $$
(17)

where Δ is a granularity value.

As i is an integer, solving Eq.(17), we have:

$$ i=\left\lceil \frac{\left(R.dpi-{D}_L\right)\cdot \varDelta }{D_U-{D}_L}\right\rceil $$
(18)
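Eq. (18) maps a target resolution onto one of the Δ stored replicas. A minimal Python sketch is given below; clamping the result to [1, Δ] and the sample bounds (D L = 50, D U = 300, Δ = 5) are assumptions added for this example and are not part of Eq. (18).

```python
import math


def replica_id(r_dpi, d_l, d_u, granularity):
    """Eq. (18): i = ceil((R.dpi - D_L) * Delta / (D_U - D_L))."""
    i = math.ceil((r_dpi - d_l) * granularity / (d_u - d_l))
    # Assumption: keep the ID inside the valid range of stored replicas.
    return min(max(i, 1), granularity)


# Hypothetical bounds: D_L = 50 dpi, D_U = 300 dpi, Delta = 5 replicas (50 dpi apart).
ids = [replica_id(d, 50, 300, 5) for d in (50, 100, 101, 300)]  # [1, 1, 2, 5]
```

Because of the ceiling, a target of 101 dpi already selects replica 2, i.e., the next stored resolution step above the target.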

Combining Eq. (18) with Eq. (16), the replica ID for the non-RSO part can be derived as:

$$ i=\left\lceil \left(\sqrt{\frac{\frac{\left({T}_{\theta}^{Min}+\frac{T_{\theta}^{Max}-{T}_{\theta}^{Min}}{\delta}\cdot \left\lceil \frac{\delta }{UAD+1}\right\rceil -{T}_O\right)\cdot {E}_j\cdot TR}{Bit\cdot CR}-{\displaystyle \sum_{i=1}^{\left|RSO\right|}\left(RS{O}_i.S\cdot RS{O}_i.dp{i}^2\right)}}{R.S}}-{D}_L\right)\cdot \frac{\varDelta }{D_U-{D}_L}\right\rceil $$
(19)

Algorithm 2 summarizes a dynamic optimal RIB replica selection process in which two cases are considered, i.e., with user tapping (lines 7–10) and without user tapping (lines 2–5).

Algorithm 2 Dynamic optimal RIB replica selection

4.3 Optimal Δ

Since the value of Δ affects the storage and maintenance costs of the replicas to some extent, in this subsection we derive an optimal Δ.

Specifically, let us first denote the average resolution of the image I S as dpi ∈ [D L , D U ], where D L and D U denote the lower and upper bound dpi of I S , respectively. For simplicity of description, we use image fragment (IF) to refer to the RSOs and R in the image. Since the IFs are composed of IBs, the pixel resolution of the IB replicas can be obtained from that of the IFs. Therefore, the data sizes of the IFs and their corresponding pixel resolutions approximately satisfy Eq. (20):

$$ {\displaystyle {\sum}_{i=1}^{\left|RSO\right|}\left(RS{O}_i.S\cdot RS{O}_i.dp{i}^2\right)+R.S\cdot R.dp{i}^2=}\left({\displaystyle {\sum}_{i=1}^{\left|RSO\right|}RS{O}_i.S+R.S}\right)\cdot dp{i}^2 $$
(20)

where R.S means the area of the non-RSO region.

Solving Eq.(20), the average dpi of the whole image can be derived in Eq. (21):

$$ dpi=\sqrt{\frac{{\displaystyle {\sum}_{i=1}^{\left|RSO\right|}\left(RS{O}_i.S\cdot RS{O}_i.dp{i}^2\right)+R.S\cdot R.dp{i}^2}}{{\displaystyle {\sum}_{i=1}^{\left|RSO\right|}RS{O}_i.S+R.S}}} $$
(21)
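Eq. (21) is an area-weighted root mean square of the fragment resolutions. The short Python sketch below evaluates it for a hypothetical image (one 4 sq-in RSO at 300 dpi and a 20 sq-in non-RSO part at 100 dpi); the function name and the sample values are assumptions.

```python
import math


def average_dpi(rsos, r_area, r_dpi):
    """Eq. (21): sqrt(total pixel count / total area); rsos is a list of (area, dpi)."""
    total_pixels = sum(s * dpi ** 2 for s, dpi in rsos) + r_area * r_dpi ** 2
    total_area = sum(s for s, _ in rsos) + r_area
    return math.sqrt(total_pixels / total_area)


avg = average_dpi([(4, 300)], r_area=20, r_dpi=100)
```

The result lies between the non-RSO and RSO resolutions (roughly 153 dpi here) and is closer to the non-RSO value because R covers most of the area.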

In addition, the bandwidth of the j-th edge satisfies E j  ∈ [E L , E U ], where E L and E U are the lower and upper bounds of the bandwidth of the j-th edge, respectively. Note that this bandwidth is a theoretical value that is larger than the actual one. For the current network bandwidth E j , we have \( {E}_j\in \left[{E}_L+\frac{\left(i-1\right)\cdot \left({E}_U-{E}_L\right)}{\varDelta },{E}_L+\frac{i\cdot \left({E}_U-{E}_L\right)}{\varDelta}\right] \). Since i is an integer, \( i=\left\lceil \frac{\left({E}_j-{E}_L\right)\cdot \varDelta }{E_U-{E}_L}+1\right\rceil \).

As illustrated in Fig. 6, given a transmission deadline T θ , when the mobile network cannot provide enough bandwidth, the image transmission may not complete successfully within T θ . Conversely, when the network bandwidth is sufficient, we assume that the average dpi of an image is proportional to the network bandwidth (E j ), so the dpi of I S under the current network bandwidth (E j ) can be derived as follows:

$$ dpi={D}_L+\frac{i\cdot \left({D}_U-{D}_L\right)}{\varDelta } $$
(22)
$$ dpi={D}_L+\left\lceil \frac{\left({E}_j-{E}_L\right)\cdot \varDelta }{E_U-{E}_L}+1\right\rceil \cdot \frac{D_U-{D}_L}{\varDelta } $$
(23)
where i∈[1, Δ].

Fig. 6 Relationship between E j and dpi

In Eq. (23), the whole image is treated as a single object whose pixel resolution is adjusted as the network bandwidth varies. This, however, may decrease the pixel resolution of the RSOs so much that the user cannot view them clearly. Therefore, in the preprocessing step, the RSOs in the image (e.g., RSO 1) are first identified automatically [11] and marked by red solid-line rectangles, as shown in Fig. 5a. Comparing Fig. 5a and b, the resolution of the RSO is kept fixed, whereas the resolution of the remaining part (R) in Fig. 5b is decreased significantly.

Combining Eqs. (20) and (23), we have:

$$ \sqrt{\frac{{\displaystyle {\sum}_{i=1}^{\left|RSO\right|}\left(RS{O}_i.S\cdot RS{O}_i.dp{i}^2\right)+R.S\cdot R.dp{i}^2}}{{\displaystyle {\sum}_{i=1}^{\left|RSO\right|}RS{O}_i.S+R.S}}}={D}_L+\left\lceil \frac{\left({E}_j-{E}_L\right)\cdot \varDelta }{E_U-{E}_L}+1\right\rceil \cdot \frac{D_U-{D}_L}{\varDelta } $$
(24)

Δ can be approximately derived by solving Eq.(24):

$$ \varDelta =\left\lceil \frac{D_U-{D}_L}{\sqrt{\frac{{\displaystyle {\sum}_{i=1}^{\left|RSO\right|}\left(RS{O}_i.S\cdot RS{O}_i.dp{i}^2\right)+R.S\cdot R.dp{i}^2}}{{\displaystyle {\sum}_{i=1}^{\left|RSO\right|}RS{O}_i.S+R.S}}}-{D}_L-\left\lceil \frac{\left({E}_j-{E}_L\right)\cdot \left({D}_U-{D}_L\right)}{E_U-{E}_L}\right\rceil}\right\rceil $$
(25)

To obtain an optimal Δ, Δ should be minimized so that the storage cost of the IB replicas is minimal. Letting RSO i .dpi = R.dpi = dpi, we have:

$$ {\varDelta}_{opt}=\left\lceil \frac{D_U-{D}_L}{dpi-{D}_L-\left\lceil \frac{\left({E}_j-{E}_L\right)\cdot \left({D}_U-{D}_L\right)}{E_U-{E}_L}\right\rceil}\right\rceil $$
(26)

From a theoretical perspective, the network bandwidth (i.e., E j ) may take any value in the full range [E L, E U], from E L = 10 MB/s to E U = 100 MB/s. In most real-life applications, however, E j is relatively stable (i.e., it varies in a small range \( \left[{E}_L^{\prime },{E}_U^{\prime}\right] \), where \( {E}_L^{\prime}\ge {E}_L \) and \( {E}_U^{\prime}\le {E}_U \)). Then,

$$ {\varDelta}_{opt}\in \left[\left\lceil \frac{D_U-{D}_L}{dpi-{D}_L-\left\lceil \frac{\left({E}_L-{E}_L\right)\cdot \left({D}_U-{D}_L\right)}{E_U-{E}_L}\right\rceil}\right\rceil, \left\lceil \frac{D_U-{D}_L}{dpi-{D}_L-\left\lceil \frac{\left({E}_U-{E}_L\right)\cdot \left({D}_U-{D}_L\right)}{E_U-{E}_L}\right\rceil}\right\rceil \right] $$
(27)

Based on Eq.(27), the optimal Δ can be approximately represented as:

$$ {\varDelta}_{opt}\approx \left\lceil \frac{D_U-{D}_L}{dpi-{D}_L-\left\lceil \frac{\left({E}_U-{E}_L\right)\cdot \left({D}_U-{D}_L\right)}{E_U-{E}_L}\right\rceil}\right\rceil $$
(28)
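The granularity formulas in Eqs. (26)-(28) can be checked numerically with a short Python sketch. It transcribes Eq. (26) and evaluates it at the two ends of the bandwidth range, which gives the interval of Eq. (27); the upper end is the approximation of Eq. (28). The function name, the parameter values, and the guard against a non-positive denominator are assumptions added for illustration and are not part of the original derivation.

```python
import math

def delta_opt(e_j, e_l, e_u, d_l, d_u, dpi):
    """Transcription of Eq. (26): granularity for bandwidth e_j."""
    shift = math.ceil((e_j - e_l) * (d_u - d_l) / (e_u - e_l))
    denominator = dpi - d_l - shift
    if denominator <= 0:
        # Assumed guard: Eq. (26) is undefined unless dpi > d_l + shift.
        raise ValueError("dpi must exceed d_l + shift")
    return math.ceil((d_u - d_l) / denominator)

# Hypothetical parameters (bandwidth in MB/s, resolutions in dpi).
params = dict(e_l=10, e_u=100, d_l=72, d_u=300, dpi=350)
lower = delta_opt(e_j=10, **params)    # left end of the interval in Eq. (27)
upper = delta_opt(e_j=100, **params)   # right end, i.e., Eq. (28)
print(lower, upper)
```

With these sample values the interval of Eq. (27) is [1, 5], so Eq. (28) would select a granularity of 5.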

5 Optimal IB replicas placement scheme

Since data placement and storage are critically important for data transmission, in this section we propose an optimal IB replicas placement scheme based on the available network bandwidth to better facilitate the IIT processing.

For example, in Fig. 7, assume that an image contains three RSOs (\( RSO_1 \), \( RSO_2 \), and \( RSO_3 \)), represented by the blue rectangles. The image is first equally partitioned into 56 IBs, of which 9, 9, 6, and 32 IBs belong to \( RSO_1 \), \( RSO_2 \), \( RSO_3 \), and R, respectively. The SIBs are stored at the original resolution, and the RIBs are stored at different pixel resolutions based on Δ.

Fig. 7 IB replicas placement at the slave node level
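The partitioning in this example can be sketched in Python as follows. The 8 x 7 grid and the rectangle coordinates are hypothetical, chosen only so that the block counts match the 9, 9, 6, and 32 IBs mentioned above; the actual layout in Fig. 7 may differ.

```python
def classify_blocks(cols, rows, rsos):
    """Assign every image block (IB) to an RSO or to the rest part R.

    rsos: dict name -> (col0, row0, col1, row1), inclusive block coordinates.
    Returns dict name -> list of (col, row); the key "R" holds the RIBs.
    """
    groups = {name: [] for name in rsos}
    groups["R"] = []
    for row in range(rows):
        for col in range(cols):
            for name, (c0, r0, c1, r1) in rsos.items():
                if c0 <= col <= c1 and r0 <= row <= r1:
                    groups[name].append((col, row))  # block is an SIB
                    break
            else:
                groups["R"].append((col, row))       # block is an RIB
    return groups

# Hypothetical 8 x 7 layout with three non-overlapping RSO rectangles.
rsos = {
    "RSO1": (0, 0, 2, 2),
    "RSO2": (4, 0, 6, 2),
    "RSO3": (1, 4, 3, 5),
}
groups = classify_blocks(cols=8, rows=7, rsos=rsos)
counts = {name: len(blocks) for name, blocks in groups.items()}
print(counts)
```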

Based on Eq. (25), different network bandwidths (i.e., \( E_j\in \left[E_L,E_U\right] \)) correspond to different granularities. The initial granularity (\( {\varDelta}_{ini} \)) can be represented by Eq. (29):

$$ {\varDelta}_{ini}= \arg \max \left\{\varDelta \left({E}_L\right),\varDelta \left({E}_U\right)\right\} $$
(29)

where \( \varDelta \left({E}_U\right)=\left\lceil \frac{D_U-{D}_L}{\sqrt{\frac{\left({\theta}_T-{T}_O\right)\cdot {E}_U\cdot TR}{Bit\cdot CR\cdot \left(\sum_{i=1}^{\left|\varOmega \right|}\sum_{j=1}^{\alpha_i}SI{B}_{ij}.S+R.S\right)}}-{D}_U}\right\rceil \), and \( \varDelta \left({E}_L\right)=\left\lceil \frac{D_U-{D}_L}{\sqrt{\frac{\left({\theta}_T-{T}_O\right)\cdot {E}_L\cdot TR}{Bit\cdot CR\cdot \left(\sum_{i=1}^{\left|\varOmega \right|}\sum_{j=1}^{\alpha_i}SI{B}_{ij}.S+R.S\right)}}-{D}_L}\right\rceil \).

Since the pixel resolutions of the SIBs are fixed, we next focus on the optimal granularity value for the RIB replicas. In Algorithm 1, the initial granularity value of the RIB replicas (\( {\varDelta}_{ini} \)) is designed for the whole bandwidth range from \( E_L \) = 10 MB/s to \( E_U \) = 100 MB/s. A larger \( {\varDelta}_{ini} \), however, leads to a larger storage cost of the IB replicas. In most real-life applications, the network bandwidth \( E_j \) is relatively stable (i.e., it varies in a small range) and the RSOs in the images are identified in advance, which motivates us to investigate an optimal RIB replicas storage scheme based on the available network bandwidth. Therefore, to further reduce the storage cost of the RIB replicas, we propose a batch updating algorithm for the optimal RIB replicas storage.

First, suppose that the small range of the network bandwidth is denoted as \( \left[{E}_L^{\prime },{E}_U^{\prime}\right] \). Combining Eqs. (23) and (25), the optimal replica ID is derived as follows:

$$ {i}_{opt}\approx \left\lceil \frac{RIB.dpi-{D}_L}{\sqrt{\frac{\left({\theta}_T-{T}_0\right)\cdot {E}_j\cdot TR}{Bit\cdot CR\cdot \left({\displaystyle {\sum}_{i=1}^{\left|\varOmega \right|}{\displaystyle {\sum}_{j=1}^{\alpha_i}SI{B}_{ij}.S+R.S}}\right)}-{D}_L-\frac{\left({E}_j-{E}_L\right)\cdot \left({D}_U-{D}_L\right)\;}{E_U-{E}_L}}\;}\right\rceil $$
(30)

where \( {E}_L^{\prime}\ge {E}_L \) and \( {E}_U^{\prime}\le {E}_U \).

Based on Eq. (30), the optimal replica IDs for the RIBs lie in \( \left[{i}_{opt}\left({E}_L^{\prime}\right),{i}_{opt}\left({E}_U^{\prime}\right)\right] \) if \( {i}_{opt}\left({E}_L^{\prime}\right)\le {i}_{opt}\left({E}_U^{\prime}\right) \); otherwise, they lie in \( \left[{i}_{opt}\left({E}_U^{\prime}\right),{i}_{opt}\left({E}_L^{\prime}\right)\right] \). The replicas whose granularity IDs fall outside this range can therefore be removed. Algorithm 3 summarizes a batch updating method for the optimal storage of the RIB replicas.

Algorithm 3 Batch updating method for the optimal storage of the RIB replicas
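The range filter described above can be sketched in Python as follows. This is not a reproduction of Algorithm 3, whose listing is not available here; it only illustrates the rule that replicas with IDs outside the range given by \( {i}_{opt}\left({E}_L^{\prime}\right) \) and \( {i}_{opt}\left({E}_U^{\prime}\right) \) can be removed. The function name and the sample IDs are hypothetical.

```python
def prune_rib_replicas(replica_ids, i_opt_low, i_opt_high):
    """Split RIB replica IDs into those to keep and those to remove.

    i_opt_low and i_opt_high stand for i_opt(E'_L) and i_opt(E'_U) from
    Eq. (30); they may arrive in either order, so the range is sorted first.
    """
    lo, hi = sorted((i_opt_low, i_opt_high))
    kept = [i for i in replica_ids if lo <= i <= hi]
    removed = [i for i in replica_ids if not lo <= i <= hi]
    return kept, removed

# Hypothetical case: ten stored replicas, optimal IDs 7 and 4 at the two ends.
kept, removed = prune_rib_replicas(range(1, 11), i_opt_low=7, i_opt_high=4)
print(kept, removed)
```

Sorting the two bounds covers both orderings mentioned in the text with a single comparison.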

6 The IIT algorithm

With the support of the above enabling techniques, an image can be efficiently and interactively transferred in the MWN. Before introducing the IIT algorithm, we first provide a system overview of the transmission method.

6.1 System overview

Figure 8 shows the system overview of the IIT processing. Given an image \( I_S \), RSO detection is first conducted using the approach of Girshick et al. [11]. Then, the image is physically and equally partitioned into several IBs of two types: SIBs and RIBs. The SIB replicas are transmitted to the receiver at their original resolutions with top priority. The RIB replicas have lower pixel resolutions and a lower transmission priority than the SIBs, so that the key part of the image is transmitted and displayed first. Once the preprocessing step is completed, the IIT processing is performed. Specifically, when an image \( I_S \) is ready to be transmitted, information such as the current network bandwidth and the receiver's interaction behaviors is collected and analyzed jointly to derive the optimal pixel resolutions of the RIB replicas. Finally, based on their transmission priorities, the candidate IBs are transmitted to the receiver node (\( N_R \)), where the IBs are reconstructed and displayed.

Fig. 8 System overview of the IIT processing

6.2 The algorithm

Algorithm 4 summarizes the detailed steps of the proposed IIT algorithm. First, a user \( U_R \) sends an image transmission request (i.e., the ID number of \( I_S \)) to the sender node level \( N_S \) (line 1), and the RSOs in the image are identified (line 2). After that, the interactive transmission processing starts (line 3). In line 4, the SIB replicas are first sent to the receiver node at their original resolution. Then, for the RIB replicas, the dynamic optimal RIB replica selection is performed to obtain the optimal RIB replica ID (line 5), and the selected replicas are sent to the receiver node (line 6).

Algorithm 4 The IIT algorithm
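The transmission order described for Algorithm 4 can be sketched in Python as follows. The sketch covers only the ordering of lines 4-6 (SIBs first at original resolution, then RIBs with a selected replica ID). The function name, the `ORIGINAL` replica ID, and the constant selection callback are hypothetical stand-ins; the replica selection itself is assumed to be provided elsewhere.

```python
def iit_transmission_order(sibs, ribs, select_rib_replica):
    """Yield (block_id, replica_id) pairs in transmission order.

    sibs, ribs: lists of block identifiers.
    select_rib_replica: callable returning the replica ID to use for an RIB;
        a stand-in for the dynamic optimal RIB replica selection step.
    """
    ORIGINAL = 0  # hypothetical ID of the original-resolution replica
    for block in sibs:   # SIBs go first, at their original resolution
        yield block, ORIGINAL
    for block in ribs:   # RIBs follow, using the selected replica
        yield block, select_rib_replica(block)

# Hypothetical blocks; the selector always picks replica 3 for illustration.
order = list(iit_transmission_order(["s1", "s2"], ["r1", "r2"],
                                    lambda block: 3))
print(order)
```

Using a generator lets the selector be called just before each RIB is sent, so a bandwidth change or a user tap during the transfer could influence the replicas chosen for the remaining blocks.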

7 Experimental evaluation

To evaluate the efficiency and reliability of the proposed IIT method, we conduct simulation experiments on its transmission performance.

7.1 Experiment setup

The prototype image receiver client is implemented in Java on the Android platform [25]. The user interactions (i.e., tapping the screen) are randomly simulated by the users. Each node has a 2.7 GHz Xeon processor, 2.0 GB of memory, and a 1 TB hard disk. The nodes in the local area network are connected via 1 Gbps network links. The number of nodes in our system varies from 10 to 100. All experiments are performed in a 4G cellular network in which the average and maximum data communication rates are 10 Mbps and 30 Mbps, respectively. On the slave nodes, the IB replicas with different pixel resolutions are stored in a file system, and some information is recorded in a MySQL [20] database.

  • Datasets. To objectively and extensively evaluate the IIT method, we adopt two image datasets: 1) Real dataset: 100,000 images downloaded from the Internet, with image data sizes ranging from 0.2 to 1 MB; 2) Synthetic dataset: to evaluate the effect of data size on the image transmission performance, we synthesized five groups of images whose data sizes are 1 MB, 5 MB, 10 MB, 50 MB, and 100 MB, respectively.

  • Competitors. There are two competitors in our experiments. The first is a baseline, the traditional transmission method (i.e., transmitting the whole image without partitioning); the second is the Cbmr scheme [30].

7.2 A prototype transmission system

We have implemented a prototype transmission system for images as illustrated in Fig. 9. Figure 9a shows an example of the backend interface of the offline image processing. One RSO in this figure has been identified by a blue rectangle. Figure 9b demonstrates the receiver client interface in which the IBs in the two IFs (e.g., one RSO represented by the blue-line rectangle and the rest part R) have been reconstructed and displayed. Moreover, the pixel resolution of the RSO is higher than that of the rest part.

Fig. 9 An interactive image transmission system

7.3 Effect of image size

In the first experiment, we study the effect of the image size on the performance of the IIT processing using the two kinds of image data described above. Method 1 uses the traditional transmission method (i.e., transmitting the whole image without partitioning), method 2 adopts the Cbmr scheme [30], and method 3 adopts the IIT. In Fig. 10a and b, when the bandwidth (e.g., 100 MB/s) is relatively stable, the total transmission time of the IIT is shorter than that of the other two methods. Moreover, the performance gap between the IIT and the traditional approach widens as the image data size increases. This is because, with the traditional approach, the amount of data to be transmitted grows so rapidly that the images cannot be sent to the destination nodes quickly, whereas our hybrid pixel resolution approach effectively reduces the transmitted image data, especially for large images.

Fig. 10 Effect of image size

7.4 Effect of network bandwidth

Next, we investigate the effect of the network bandwidth on the performance of the IIT processing using the two kinds of images. The three methods compared are the same as those in Section 7.3. Figure 11 shows that when the image data sizes are fixed (e.g., 0.5 MB for the real data and 50 MB for the synthetic data), the total response time of the proposed IIT method is shorter than that of the other two methods. Moreover, as the bandwidth increases, the response time decreases gradually and the performance gap becomes larger, especially for the large synthetic images. This is because the IIT method significantly reduces the original image data size based on three factors: the network bandwidth, the image content, and the user interaction.

Fig. 11 Effect of network bandwidth

7.5 Effect of Δ

In this experiment, we test the effect of Δ on the transmission cost, the storage cost, and the mean opinion score (MOS) defined in Eq. (31).

$$ MOS=\frac{{\displaystyle {\sum}_{i\in \left[1,\left|{U}_i\right|\right]}{U}_i. OS}}{\left|{U}_i\right|} $$
(31)

where \( U_i.OS \) is the opinion score (OS) given by the user \( U_i \) (see Table 2) and \( \left|U_i\right| \) is the total number of users.

Table 2 The description of the opinion scores

According to Eq. (31), 100 randomly selected volunteers judge whether an image replica is clear to view by answering ‘excellent’, ‘good’, ‘fair’, ‘poor’, or ‘bad’. The synthetic image dataset is used in this experiment.
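The computation in Eq. (31) can be illustrated with a minimal Python sketch. The mapping from answers to numeric scores (5 for ‘excellent’ down to 1 for ‘bad’) follows the common five-point convention and is an assumption here, since the contents of Table 2 are not reproduced in this section; the sample answers are hypothetical.

```python
# Assumed five-point scale; the actual values are those given in Table 2.
SCORES = {"excellent": 5, "good": 4, "fair": 3, "poor": 2, "bad": 1}

def mean_opinion_score(answers):
    """Eq. (31): average opinion score over all users."""
    if not answers:
        raise ValueError("at least one answer is required")
    return sum(SCORES[a] for a in answers) / len(answers)

# Hypothetical answers from five users.
answers = ["excellent", "good", "good", "fair", "poor"]
mos = mean_opinion_score(answers)
print(mos)
```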

In Fig. 12a, the transmission cost gradually decreases as Δ increases. This is because when Δ is small, it is hard to find suitable IB replicas: the gap between the data sizes of the available IB replicas and the optimal ones is larger, which leads to a higher transmission cost. In contrast, as illustrated in Fig. 12b, the storage cost increases rapidly with Δ, since the total number of IB replicas grows as Δ becomes larger. Finally, we study the effect of Δ on the MOS. In Fig. 12c, the MOS no longer increases once Δ exceeds 20. When Δ is too small, the quality of the reconstructed image is relatively low, which results in a poor user experience. Therefore, as a tradeoff among transmission cost, storage cost, and image quality, the optimal value of Δ is found to be 20.

Fig. 12 Effect of Δ

7.6 Back end vs. client end

In this experiment, we compare the overheads of the back end and the client end of our transmission method with those of the Cbmr [30]. Fifty transmission requests are randomly generated and executed to obtain the average time consumed at each end. The synthetic dataset is used. Figure 13 illustrates that when the image data size increases from 20 MB to 100 MB, the transmission time of the IIT gradually increases but remains smaller than that of the Cbmr. Additionally, the overheads of the back end and the client end for the Cbmr method are slightly larger than those of the IIT. This is because both the computational cost of image compression at the back end and the decoding cost at the client end are larger for the Cbmr than for the IIT.

Fig. 13 Back end vs. client end

7.7 Effect of optimized storage scheme

This experiment evaluates the effect of the storage scheme using the real dataset. Figure 14 shows that as the number of images increases, the storage cost of the traditional approach is much larger than that of the optimized one. This is because, with the optimized storage scheme, the data size of the image replicas is significantly reduced when the network bandwidth varies in a small range.

Fig. 14 Effect of storage scheme

7.8 Effect of the TPA

In the last experiment, we first test the efficiency of two transmission schemes, 1) our proposed TPA method and 2) the Cbmr approach [30], using the synthetic dataset. In Fig. 15a, when the image data size increases from 20 MB to 100 MB, the transmission time of the TPA gradually increases but remains shorter than that of the Cbmr. This is because the data compression rate of the TPA method is higher than that of the Cbmr, so the data size to be transferred by the Cbmr is larger than that of the TPA.

Fig. 15 Effect of TPA

To evaluate the effect of image size on the transmission robustness, we then use the synthetic image dataset, in which the images are divided into five groups by data size: 5 MB, 10 MB, 20 MB, 50 MB, and 100 MB. The transmission reliability (TR) is defined in Eq. (32).

$$ TR=\frac{\text{Number of successful data transmissions}}{\text{Total number of data transmissions}} $$
(32)
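Eq. (32) amounts to a simple ratio, as the following Python sketch shows. The function name and the sample outcome log are hypothetical and do not correspond to the measurements reported in this section.

```python
def transmission_reliability(outcomes):
    """Eq. (32): fraction of data transmissions that succeeded.

    outcomes: iterable of booleans, True for a successful transmission.
    """
    outcomes = list(outcomes)
    if not outcomes:
        raise ValueError("at least one transmission is required")
    return sum(1 for ok in outcomes if ok) / len(outcomes)

# Hypothetical log of four transmissions, one of which failed.
log = [True, True, False, True]
tr = transmission_reliability(log)
print(tr)
```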

As shown in Fig. 15b, the successful data transmission ratio remains 100 % as the data size increases when our image blocking technique is used. Without the TPA method, the successful transmission ratio is 100 % when the transmission data size is less than 10 MB; however, when the data size is 20 MB, the average TR decreases to 87 %, and when the data size is larger than 50 MB, the average TR is zero, since it is hard to transmit such a large image successfully in an MWN. These results indicate that, under limited network bandwidth, a large image can be transferred with a high success ratio only through the image blocking method.

8 Conclusions and future work

In this paper, we have presented an interactive large image transmission scheme (IIT) for the resource-constrained MWN, which allows users to actively affect the transmission processing by tapping the receiver's screen. Two enabling techniques, i.e., a multi-resolution-based IB replica selection scheme and an optimal IB replicas placement scheme, are proposed to optimize the image transmission. Our experiments demonstrate that the proposed IIT method is well suited to large image transmission: it reduces the network communication cost while improving the rate of successful image transmission and user satisfaction.

In the future, to further enhance the transmission efficiency for multiple images in a transmission-intensive environment, we plan to study a batch image transmission scheme based on the IIT approach. For real-life applications, we plan to apply our approach to traffic anomaly detection [14] and energy-efficient multicast routing [13, 15].