1 Introduction

In recent years, advances in wireless communication and imaging hardware have enabled the use of multimedia sensors in Wireless Sensor Networks (WSNs) and introduced a new paradigm called Wireless Multimedia Sensor Networks (WMSNs). WMSNs are composed of interconnected multimedia sensing devices that can retrieve audio, video, still images and scalar information from the environment [1]. Because they provide richer information about the environment, WMSNs are widely applicable in real-time applications such as industrial process control, environmental monitoring, video surveillance, remote health care and traffic enforcement. Moreover, they can be deployed in difficult, under-resourced and unattended areas. However, several limitations must be addressed when developing algorithms for processing the sensed data in WMSNs, mainly limited battery power, limited memory, limited processing capability and narrow bandwidth [3].

In a WMSN, a large number of camera sensor nodes are deployed to monitor the area of interest, with one or more data sinks located at the center of the network or outside it. The camera sensor nodes monitor the area, capture observations and send them to the sink. Compared to traditional WSNs, the design of data processing in WMSNs is very challenging, because the data acquired by the camera sensor nodes is in the form of images, audio and video. Moreover, the camera sensor nodes are resource constrained, while visual information requires more sophisticated processing techniques and much larger bandwidth to deliver. Hence we focus on the data processing and communication aspects of WMSNs.

The processing and transmission of raw images or videos demands more bandwidth than the camera sensor nodes in WMSNs can support. In general, to make multimedia data processing feasible in the network, the redundancy in the data must be reduced. In other words, the unnecessary data present in the acquired images or videos needs to be removed. Multimedia Source Coding [26] is a standard compression approach with high compression efficiency. Typical examples are JPEG and JPEG 2000 [9], MPEGx [27] and H.26x [34]. The video coding methods use motion estimation and motion compensation to exploit the spatiotemporal correlation of video sequences for predictive coding. This kind of methodology occupies a large amount of resources. Moreover, the coding complexity is observed to be 5 to 10 times the decoding complexity. These traditional video coding methods are therefore not applicable to resource-constrained WMSNs in current real-time applications, since their extensive mathematical computations place a huge computational burden on the camera sensor nodes.

On the other hand, Distributed Source Coding (DSC) [12, 30, 37] has emerged as an alternative that allows a simple encoder and a complex decoder. In DSC, the source camera sensor nodes perform simple encoding and leave the computationally intensive decoding task to the sink, which makes DSC a promising solution for WMSNs. Recently, Ning Ma et al. [23] applied DSC based on a Gradient Domain Region of Interest (ROI), which aims to enhance the coding efficiency of regions with severe motion and improve the decoded image while reducing the coding rate and consequently the energy consumption of the camera sensor nodes. In this method, the lower bound of the coding rate is derived with the help of the Slepian-Wolf theorem [31] and depends on the side information obtained through channel information. However, exploiting the correlation among adjacent frames through channel coding is not easy, which limits the coding efficiency of DSC.

In this paper, we propose a simple and effective DSC scheme in which the camera sensor nodes experience a lower computational burden as well as lower energy consumption. We first employ a new clustering strategy to group the camera sensor nodes into several clusters. Once the nodes are clustered, the cluster head (CH) is selected based on the availability of resources. Next, the cluster nodes encode the sensed images and transmit them to the respective cluster head. Because only encoded images are transmitted, the cluster nodes consume fewer resources. The cluster head then finds the correlation between the encoded images obtained from the cluster nodes and extracts only the uncorrelated data to transmit to the sink node. In this phase, the cluster head applies the joint entropy to find the information gained from multiple camera sensor nodes. The major novelty and contributions are as follows:

  • To preserve the energy of sensor nodes, this work proposes histogram coding, in which each image is represented by histogram features instead of raw pixels. The entire image requires a large number of bits to represent, while the histogram requires only a few bits. This phase preserves the energy of the sensor nodes and thereby increases the network lifetime.

  • To remove the redundant information in the images received from sensor nodes, the CH applies joint entropy coding, which determines the joint mutual information between them. The proposed coding is much simpler than the traditional JPEG and MPEG compression standards while remaining effective.

  • To preserve the energy of the CH, we propose a new node selection strategy based on the joint mutual information between the images forwarded by the nodes. The selection is done at the first instant based on the initial set of images. Afterwards, only the selected nodes continue to forward images to the CH.

  • To analyze the performance of the proposed method quantitatively, we propose a new distortion function derived from the entropies of the original and compressed images.

The rest of the paper is organized as follows. Section 2 presents the literature survey. Section 3 presents the complete details of the developed mechanism for WMSNs. Section 4 presents the simulation experiments conducted on the developed mechanism, and the concluding remarks are given in Section 5.

2 Literature survey

With the objective of multimedia data coding, different authors have proposed different methods to obtain a better Quality of Service (QoS) in WMSNs [13, 24, 29]. Shikang Kong et al. [20] proposed an image compression and transmission method based on “Non-negative Matrix Factorization” (NMF). In this approach, the camera sensor nodes capture images and send them to normal nodes for NMF-based compression. The compressed images are then transmitted to the cluster head and on to the sink node. Even though the camera sensor nodes are relieved of the responsibility of image compression, they suffer from a communication burden because they transmit entire images to the normal nodes.

Z. Y. Xiong et al. [38] proposed a low-complexity JPEG-based image compression scheme based on change detection. Change detection is employed to localize the region of interest and to remove the remaining data from transmission. Further, to reduce the computational complexity, a fast Discrete Cosine Transform (DCT) is employed. This method provides a tradeoff between the lifetime of camera sensor nodes and the quality of reconstructed images. However, the DCT introduces a huge computational burden at the node level, which affects the lifetime of the network. R. Banerjee and S. D. Bit [4] proposed an energy-saving image compression method based on curve fitting. After the images are acquired by the camera sensor nodes, the curve fitting coefficients are generated and transmitted to the sink node. The total data size is reduced because only the curve fitting coefficients are transmitted.

Hong Yang et al. [40] proposed a Robust Distributed Video Coding (RDVC) method for the optimization of video quality in WMSNs. An error-resilient key frame coding scheme is introduced based on the protection of Wyner-Ziv coding (WZC) [14], in which the extra WZ bits provide error resilience and improved rate/distortion performance. Following the concept of Distributed Joint Source Channel Coding (DJSCC) [39], a new distributed source channel codec based on the Group Puncture Rate Adaptive IRA code (GPRAIRA) is proposed. However, the estimation of channel state information yields inaccurate information, which leads to distorted video at the sink.

Next, Han C et al. [16] proposed a new distributed image compression and transmission model based on “singular value decomposition” (SVD). In this study, the entire network is divided into camera nodes and common nodes. The complete image compression task is done by the common nodes, and hence the energy consumption pressure on the camera sensor nodes is greatly reduced. However, the quality of the image retrieved at the sink node is limited by the simple SVD.

Considering the co-operation of multiple nodes and “principal component analysis” (PCA), a “Noise-Tolerant Distributed Image Compression (NDIC)” method was proposed by Z. Wei et al. [33] for image compression in WMSNs. In this study, the camera nodes gather the images, and the normal nodes compress the divided images adaptively through NDIC-PCA and send the compressed image to the CH and on to the sink. This approach achieves an effective energy balance at the cost of lower image quality, since PCA extracts only the principal components from an image irrespective of the objects it contains.

Chun-Ming Wu et al. [35] proposed an image compression method that combines JPEG-XR based compression with in-network processing and relies on multi-node cooperation. Initially, the cluster head dynamically partitions the entire network into a dynamic partition non-uniform (DPNU) structure to attain load balance. Then the sensed image is divided into several segments, and each segment is transmitted to a different cooperating neighbor cluster for compression. However, image compression using JPEG-XR with multi-node cooperation introduces a huge resource depletion problem due to its complex compression methodology.

P. Jiang et al. [19] proposed an improved image compression algorithm based on in-cluster distributed processing (ICDP), which is oriented to the distributed incremental image processing algorithm (DICA). Based on an energy priority selection rule, some auxiliary nodes are chosen in every cluster to accomplish the multi-level wavelet transform [2, 8] of JPEG2000. Since the major task of image compression is accomplished by the auxiliary nodes, the resources of the main nodes are not affected, which results in a better network lifetime. However, the computational burden is observed to be very high because the multi-level wavelet transform is performed in the network.

S. Heng et al. [18] proposed a new distributed architecture for multi-hop image compression to enhance the network lifetime of resource-constrained WMSNs. This approach combines a Fuzzy Logic System (FLS) [6] with distributed computation for load balancing. The approach has three main phases: in the first phase, the FLS determines the optimal size of the camera cluster; in the second phase, a distributed image compression method partitions the compression task among several sensor nodes (not camera nodes); and finally, hierarchical multi-hop routing divides the network into layers, with the FLS also used to select the optimum relay node.

He Li et al. [17] proposed an image compression model for mobile WMSNs based on a dynamic alliance and task collaboration algorithm. Initially, the dynamic alliance is established by the camera sensor nodes based on the location, computational capabilities and resource utilization of the normal nodes. Then the location and average moving velocity of the camera nodes and normal nodes are used to measure the stable execution time of a task. Further, the image compression task is partitioned into subtasks, which are scheduled according to this stable execution time.

3 Proposed approach

3.1 Overview

In this section, we discuss the details of the proposed approach, a novel image compression method based on the entropy and correlation properties of the images sensed by camera sensor nodes (CSNs) in WMSNs. Under the developed model, we first group the camera sensor nodes into clusters based on the Euclidean distances between them. After clustering, the node with the largest energy resources in each cluster is selected as the cluster head (CH). An image compression method is then applied. Keeping the resource constraints in mind, we model a simple strategy to represent an image: since transmitting an entire image from each cluster node to the CH causes a huge communication burden in the network, each image is represented by its histograms. Before histogram computation, the images are processed through median filtering [5] for noise removal, because the median filter is a simple and appropriate filter for this purpose. Once the histograms of the images sensed by the camera sensor nodes reach the CH, it computes the joint entropy to compress the image data and sends only the significant information to the sink node.

In this section, we first discuss the details of clustering, then histogram computation, and then the calculation of the joint entropy between two cameras followed by the joint entropy between multiple cameras. Finally, we discuss a new distortion function that quantifies the distortion in the data received at the sink node due to resource constraints. The working flow of the proposed approach is shown in Fig. 1. According to this diagram, the videos of the CSNs are acquired independently at each node, but once they reach the CH, the mutual information between them is measured. Even though the videos are acquired independently, overlapping fields of view mean that they share common information, and that common information needs to be reduced. A sample illustration of overlapping regions is shown in Fig. 2.

Fig. 1 Overall block diagram

Fig. 2 CSNs installed on an area with overlapping regions and the corresponding acquired images

3.2 Clustering

In WMSNs, the camera sensor nodes have limited energy, bandwidth, memory and processing capabilities. If all camera sensor nodes are engaged in executing the processing tasks, the network lifetime is heavily affected. Moreover, since a multimedia sensor node captures images and videos that are large in size, the additional processing tasks cause the nodes to die quickly. To reduce this additional burden, the camera sensor nodes are clustered into groups and the major processing task is assigned to the CH. To execute the major processing task, the CH must have greater resources, so among the nodes present in a cluster, the node with the largest energy is selected as CH. The sensor nodes (cluster nodes) execute the simple task of histogram computation, and the redundancy removal task is executed by the CH. A camera sensor node is only responsible for sending its images after representing them in histogram format. Once the histograms of all images are received at the CH, the CH finds the correlation between them to determine the non-redundant data and sends only the uncorrelated information to the sink node or base station. Consider a WMSN with N camera sensor nodes n1, n2, n3, …, nN. The clustering is done based on the following expression.

$$ d\left({n}_i,{n}_j\right)=\sqrt{{\left({x}_j-{x}_i\right)}^2+{\left({y}_j-{y}_i\right)}^2} $$
(1)

Where d(ni, nj) is the Euclidean distance between nodes ni and nj, (xi, yi) are the location coordinates of ni and (xj, yj) are the location coordinates of nj. In this manner, the Euclidean distance is measured between every pair of nodes and we construct a distance matrix as follows

$$ d=\left[\begin{array}{cccc}{d}_{11}& {d}_{12}& \dots & {d}_{1N}\\ {d}_{21}& {d}_{22}& \dots & {d}_{2N}\\ \vdots & \vdots & \ddots & \vdots \\ {d}_{N1}& {d}_{N2}& \dots & {d}_{NN}\end{array}\right] $$
(2)

Where dij is the Euclidean distance between nodes ni and nj, with i and j varying from 1 to N. After the construction of the distance matrix, we compute the neighbor nodes of every node based on the following expression:

$$ {Ne}_i= find\left({d}_{ij}\le {R}_i\left({n}_i\right)\right) $$
(3)

Where Ri(ni) is the communication range of node ni and Nei is the neighbor node set of ni. Once the neighbor nodes are determined for every node, the node with the largest resource availability is selected as CH. At this stage, we ensure that distinct nodes are selected as CHs. Since a single node may be selected as CH for multiple groups, this problem must be mitigated: if two groups share a common CH, they are merged into a single cluster with one CH, namely the node with the highest resource availability.
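
The clustering steps of Eqs. (1) to (3) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names, the per-node list `comm_range`, and the nomination rule (each node nominates the highest-energy node among itself and its neighbors, and nodes that nominate the same CH form one cluster) are hypothetical choices made for the example.

```python
import math

def distance_matrix(positions):
    """Pairwise Euclidean distances between node coordinates (Eqs. 1 and 2)."""
    return [[math.hypot(xj - xi, yj - yi) for (xj, yj) in positions]
            for (xi, yi) in positions]

def neighbor_sets(dist, comm_range):
    """Neighbor set Ne_i of node i (Eq. 3): every j != i with d_ij <= R_i."""
    n = len(dist)
    return [{j for j in range(n) if j != i and dist[i][j] <= comm_range[i]}
            for i in range(n)]

def form_clusters(positions, comm_range, energy):
    """Return {cluster_head: set_of_member_nodes}.

    Assumption: every node nominates the highest-energy node among itself
    and its neighbors (ties go to the lower index). Groups that nominate the
    same head end up in one cluster. A head that itself nominates a stronger
    neighbor is not resolved further in this sketch.
    """
    dist = distance_matrix(positions)
    clusters = {}
    for i, neighbors in enumerate(neighbor_sets(dist, comm_range)):
        candidates = neighbors | {i}
        head = max(candidates, key=lambda k: (energy[k], -k))
        clusters.setdefault(head, set()).add(i)
    return clusters
```

For example, `form_clusters([(0, 0), (10, 0), (0, 10), (500, 500), (510, 500)], [50.0] * 5, [1.0, 3.0, 2.0, 5.0, 4.0])` groups the first three nodes under node 1 and the last two under node 3.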

3.3 Histogram computation

Here the main intention of computing the histogram of an image is to reduce the communication burden of the camera sensor node. If a camera sensor node transmits the image directly to the CH, it suffers from a huge communication burden and the resulting resource consumption. For example, if an image sensed by a camera sensor node has size 256 × 256 and each pixel is represented by 8 bits, the total number of bits to be transmitted to the CH is 256 × 256 × 8 = 524,288 bits. This is a very large number, and the camera sensor node needs to transmit such images continuously to the CH, which results in a heavy communication burden followed by huge energy consumption. To solve this problem, the image needs to be represented in a manner that consumes fewer resources. At the same time, we also consider the computational burden of the sensor node. Representing an image compactly usually requires a huge number of mathematical operations. For example, the most popular and effective image/video compression techniques, such as JPEG and MPEGx, give a compact representation with few bits and little information loss. However, these methods introduce a huge computational burden at the resource-constrained camera sensor nodes. Hence they are not suitable techniques for image representation in WMSNs.

To represent an image with less computational burden, we suggest a histogram-based representation. This representation needs only counters, which have very low hardware complexity. Hence we apply the histogram-based representation to the sensed image at the camera sensor node. For histogram computation, the image is first divided into 4 blocks of equal size [28]. The blocks form a set S according to Eq. (4)

$$ S=\left\{{B}_n\in I,\kern0.5em n\le 4\right\} $$
(4)

Where Bn is the nth block of the image I sensed by the camera sensor node. The histogram is then computed for every block according to the following expression

$$ {P}_{\left(I,{B}_n\in S\right)}\left({k}_L\right)={N}_L $$
(5)

Where \( {P}_{\left(I,{B}_n\in S\right)}\left({k}_L\right) \) is the histogram value for the intensity kL in the nth block of image I, and NL is the number of pixels with intensity kL in that block. In Eq. (5), the bin size is 1, which gives 256 values for a gray image. To reduce the burden further, the bin size can be increased; the pixels whose intensity falls within the range of a bin are then accumulated by a single counter. An example of histogram computation is depicted in Fig. 3.
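
A minimal Python sketch of the block histogram of Eqs. (4) and (5) is given below. It assumes an 8-bit gray image stored as a list of rows with even dimensions and splits it into four quadrants; the function names and the fixed-width counter encoding used to estimate the payload size are assumptions made for illustration and are not specified in the text.

```python
def split_into_four_blocks(image):
    """Split a 2-D list of gray levels into 4 equal quadrants (the set S of Eq. 4)."""
    rows, cols = len(image), len(image[0])
    hr, hc = rows // 2, cols // 2
    return [
        [row[:hc] for row in image[:hr]],   # top-left block
        [row[hc:] for row in image[:hr]],   # top-right block
        [row[:hc] for row in image[hr:]],   # bottom-left block
        [row[hc:] for row in image[hr:]],   # bottom-right block
    ]

def block_histogram(block, bin_size=1, levels=256):
    """Count pixels per intensity bin (Eq. 5); bin_size=1 gives 256 counters."""
    counts = [0] * -(-levels // bin_size)   # ceiling division
    for row in block:
        for pixel in row:
            counts[pixel // bin_size] += 1
    return counts

def histogram_payload_bits(rows, cols, bin_size=1, levels=256):
    """Bits needed to send the 4 block histograms.

    Assumption: every counter is sent with a fixed width that is just large
    enough to hold the number of pixels in one block.
    """
    block_pixels = (rows // 2) * (cols // 2)
    bits_per_counter = block_pixels.bit_length()
    return 4 * -(-levels // bin_size) * bits_per_counter
```

Under this assumed encoding, `histogram_payload_bits(256, 256)` gives 15,360 bits, roughly 34 times fewer than the 524,288 bits of the raw 256 × 256 image, and a larger `bin_size` reduces the payload further.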

3.4 Joint entropy at CH

After obtaining the histograms of the images from every camera sensor node, the CH computes the joint entropy to find the correlation between camera sensor nodes. The joint entropy is used to study the amount of visual information obtained from multiple cameras in the WMSN and reflects the correlation between the images received from them. If the images obtained by the camera sensor nodes are less correlated, they provide more information to the sink. Hence we compute the joint entropy at the CH to measure the amount of visual information (Fig. 3).

Fig. 3 Histogram computation

3.4.1 Entropy calculation

The entropy quantifies the information delivered by an information source. In real-time data-oriented applications, the entropy is widely used to predict the availability of information. In information theory [10], the entropy is employed to measure the amount of information from a random source. If the image sensed by a camera sensor node is interpreted as a “gray level source”, the probabilities of the source symbols are modeled by the gray level histogram of the sensed image. For an image sensed at any camera sensor node, the entropy is calculated as [15]

$$ \hat{H}=-\sum \limits_{i=1}^LP\left({r}_i\right){\mathit{\log}}_2P\left({r}_i\right) $$
(6)

Where L is the number of possible gray levels and P(ri) is the probability of occurrence of the gray level ri. The entropy denotes the average amount of information per pixel in the image. After capturing the image, the camera sensor node represents it in histogram format. Let the camera sensor node be Ci, the image sensed by it be Ii, and its histogram representation be Pi. The camera sensor node transmits Pi to the CH, and the amount of information gained at the CH is H(Pi). Here we do not consider the information loss caused by the compression scheme or by packet transmission over the channel. For a cluster with Q camera sensor nodes C = {C1, C2, C3, …, CQ} that transmit their histograms {P1, P2, P3, …, PQ} to the CH, the amount of information gained at the CH is measured by the joint entropy H(P1, P2, P3, …, PQ). Hence our main objective is to find the joint entropy of multiple cameras.
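
As an illustration of Eq. (6), the entropy can be computed directly from the bin counts of a histogram by normalizing them to probabilities. The following Python sketch assumes the histogram is a plain list of non-negative counts; the function name is hypothetical.

```python
import math

def entropy_from_histogram(counts):
    """Shannon entropy in bits per pixel (Eq. 6) from a list of bin counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts:
        if c > 0:                 # empty bins contribute nothing (0 * log 0 := 0)
            p = c / total
            h -= p * math.log2(p)
    return h
```

A histogram with four equally filled bins, for instance `[4, 4, 4, 4]`, has an entropy of 2 bits per pixel, while a histogram with a single occupied bin has an entropy of 0.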

3.4.2 Joint entropy of two images

Consider two camera sensor nodes C1 and C2 deployed in a region to sense the Region Of Interest (ROI). Let each camera capture one image, denoted I1 and I2, where I1 is acquired by camera C1 and I2 by camera C2. The joint entropy of I1 and I2 is calculated as [11, 32]

$$ H\left({I}_1,{I}_2\right)=H\left({I}_1\right)+H\left({I}_2\right)-I\left({I}_1;{I}_2\right) $$
(7)

Where H(I1) is the entropy of image I1, H(I2) is the entropy of image I2 and I(I1; I2) is the mutual information of the two images. In other words, the mutual information is the reduction in uncertainty about one source due to knowledge of the other source. Based on this interpretation, the mutual information can be defined as

$$ \begin{array}{c}I\left({I}_1;{I}_2\right)=H\left({I}_1\right)-H\left({I}_1/{I}_2\right)\\ \text{or}\\ I\left({I}_1;{I}_2\right)=H\left({I}_2\right)-H\left({I}_2/{I}_1\right)\end{array} $$
(8)

Where H(I1/I2) and H(I2/I1) denote the conditional entropies, that is, the entropy of one source given knowledge of the other. In terms of probabilities, the standard definition of mutual information is given as [32]

$$ I\left({I}_1;{I}_2\right)=\sum \limits_i\sum \limits_jp\left(i,j\right)\log \left(\frac{p\left(i,j\right)}{p(i)p(j)}\right) $$
(9)

Where p(i) and p(j) are the probability distributions given by the histograms of images I1 and I2 respectively, and p(i, j) is the joint probability distribution of the two sources. The mutual information measures the mutual dependency between two sources. For a larger value of mutual information, the images I1 and I2 are more correlated, while for a smaller value they are less correlated. Images that are less correlated contribute more information to the CH, and images that are more correlated contribute less information to the sink node or base station. Highly correlated images cover almost the same portion of the scene. In such conditions, one image is enough to reveal the information about the region of interest.
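
Eq. (9) can be illustrated with a short Python sketch. It assumes that the two images have the same size and that the joint distribution p(i, j) is estimated from the gray levels of co-located pixels, which requires paired samples of both images; it also uses the base-2 logarithm so that the result is in bits, consistent with Eq. (6). These choices and the function name are assumptions made for the example.

```python
import math
from collections import Counter

def mutual_information(img1, img2):
    """Mutual information in bits (Eq. 9) of two equally sized gray images,
    estimated from the joint histogram of co-located pixels."""
    a = [p for row in img1 for p in row]
    b = [p for row in img2 for p in row]
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    n = len(a)
    count_a, count_b, count_ab = Counter(a), Counter(b), Counter(zip(a, b))
    mi = 0.0
    for (i, j), c in count_ab.items():
        # p(i, j) / (p(i) * p(j)) simplifies to c * n / (count_a[i] * count_b[j])
        mi += (c / n) * math.log2(c * n / (count_a[i] * count_b[j]))
    return mi
```

Two identical images give a mutual information equal to their entropy, whereas images whose gray levels are statistically independent give a value of 0.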

Based on a normalized form of the mutual information, [25] defines a coefficient called the Entropy Correlation Coefficient (ECC). The mathematical expression for the ECC is

$$ ECC=\frac{2\times I\left({I}_1;{I}_2\right)}{H\left({I}_1\right)+H\left({I}_2\right)} $$
(10)

Here the ECC ranges from 0 to 1, where 0 indicates that the source cameras C1 and C2 are independent and 1 indicates that the sources are strongly mutually dependent. An ECC of 1 can also be interpreted as the two images carrying the same information, that is, I1 is effectively equal to I2. When the ECC is close to 0, the CH gains more information about the region of interest and can forward it to the sink. From Eq. (10), I(I1; I2) is obtained as

$$ I\left({I}_1;{I}_2\right)=\frac{1}{2}\times ECC\times \left(H\left({I}_1\right)+H\left({I}_2\right)\right) $$
(11)

Substituting Eq. (11) into Eq. (7), the joint entropy is reformulated as

$$ H\left({I}_1,{I}_2\right)=H\left({I}_1\right)+H\left({I}_2\right)-\frac{1}{2}\times ECC\times \left(H\left({I}_1\right)+H\left({I}_2\right)\right) $$
(12)
$$ H\left({I}_1,{I}_2\right)=H\left({I}_1\right)+H\left({I}_2\right)-\frac{1}{2}\times ECC\times H\left({I}_1\right)-\frac{1}{2}\times ECC\times H\left({I}_2\right) $$
(13)
$$ H\left({I}_1,{I}_2\right)=\left(H\left({I}_1\right)+H\left({I}_2\right)\right)\times \left(1-\frac{1}{2}\times ECC\right) $$
(14)

In Eq. (14), the individual entropies H(I1) and H(I2) could be measured at the individual camera sensor nodes. However, this would place an additional computational burden on them, so this responsibility is also assigned to the CH. The main responsibility of a camera sensor node is the transformation of the sensed image into histograms. Since we choose as CHs the nodes that have a large amount of resources, computationally intensive tasks are assigned to the CH only. Moreover, because transmitting a raw image from the camera sensor nodes consumes heavy resources, we transform the raw image into a simple histogram representation, which needs very few resources both for computation and for forwarding to the CH. Once the histograms of every sensed image are received at the CH, it can compute the ECC easily and then the joint entropy through Eq. (14). As a result, resource consumption at the camera sensor node level is reduced, which enhances the network lifetime.
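
The relation between Eqs. (7), (10) and (14) can be checked numerically with the small Python sketch below. Factoring Eq. (13) gives the joint entropy as (H(I1) + H(I2)) × (1 − ECC/2), which is what `joint_entropy` computes. The function names and the convention of returning an ECC of 0 when both entropies are 0 are assumptions made for the example.

```python
def entropy_correlation_coefficient(h1, h2, mi):
    """ECC of Eq. (10); taken as 0 when both entropies are 0 (assumption)."""
    total = h1 + h2
    return 0.0 if total == 0 else 2.0 * mi / total

def joint_entropy(h1, h2, ecc):
    """Joint entropy obtained by factoring Eq. (13): (H1 + H2) * (1 - ECC / 2)."""
    return (h1 + h2) * (1.0 - 0.5 * ecc)
```

For example, with H(I1) = 3, H(I2) = 1 and I(I1; I2) = 0.5, the ECC is 0.25 and the joint entropy is 3.5, which agrees with Eq. (7): 3 + 1 − 0.5 = 3.5. In the limiting cases, an ECC of 0 gives H(I1) + H(I2) and an ECC of 1 gives half of that sum.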

3.4.3 Joint entropy of multiple images

In the previous subsection, we measured the joint entropy between only two cameras. However, according to the clustering model explained in Section 3.2, a cluster in a WMSN may contain more than two camera sensor nodes, so the CH needs to measure the joint entropy of multiple camera sensor nodes. This computation is done by extending the theory discussed in Section 3.4.2. Consider a cluster with Q camera sensor nodes C = {C1, C2, C3, …, CQ} whose sensed images are {I1, I2, I3, …, IQ}; the joint entropy of all these images is represented as H(I1, I2, I3, …, IQ). Evaluating the joint entropy of multiple cameras from the standard definition requires estimating the joint probability distribution of all Q images, which is very difficult, particularly when Q is large.

To accomplish this task, we extend the joint entropy calculation for two cameras discussed in Section 3.4.2. Since there are Q individual images {I1, I2, I3, …, IQ} in the cluster, we merge two of them so that their joint entropy can be measured with Eq. (14). After two images are merged, Q − 1 images remain in the cluster. By repeating this merging process, all Q individual images are merged into a single joint entropy H(I1, I2, I3, …, IQ). The calculation of the joint entropy for multiple cameras is demonstrated in Table 1 and Fig. 4.

Table 1 Joint entropy of multiple cameras

Fig. 4 Joint entropy calculation
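
The repeated merging can be sketched as a simple fold over the single-image entropies. The sketch below is only one possible reading of the procedure: it merges the images in index order, treats the already merged group as a single source, and applies the pairwise rule (H_a + H_b) × (1 − ECC/2) at each step. How the ECC between a merged group and the next image is obtained is not fixed here, so it is supplied through a hypothetical callback `ecc_with_merged`.

```python
def joint_entropy_of_many(entropies, ecc_with_merged):
    """Fold Q single-image entropies into one joint entropy by repeated merging.

    entropies       -- [H(I_1), ..., H(I_Q)]
    ecc_with_merged -- hypothetical callback (merged_indices, k) -> ECC in [0, 1]
                       between the already merged group and image k
    Assumption: images are merged in index order.
    """
    if not entropies:
        return 0.0
    merged = [0]
    h = entropies[0]
    for k in range(1, len(entropies)):
        ecc = ecc_with_merged(tuple(merged), k)
        h = (h + entropies[k]) * (1.0 - 0.5 * ecc)   # pairwise merge rule
        merged.append(k)
    return h
```

With an ECC of 0 at every step, the result is the sum of the individual entropies; with an ECC of 1 and equal entropies, the result stays at the entropy of a single image.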

3.5 Distortion computation

To monitor the area of interest in WMSNs, a set of camera sensor nodes is deployed. Let N camera sensor nodes be deployed, each of which has sensed an image, and let the images be I1, I2, I3, …, IN. The joint entropy of all these images, H(I1, I2, I3, …, IN), denotes the maximum amount of information that can be gained at the CH about the region of interest. In the earlier subsections, we computed the joint entropies; based on the obtained values, the CH chooses only a small set of camera sensor nodes to transmit their information. The selection is based on the joint entropy: if the joint entropy between two camera sensor nodes is low, they are said to be more correlated. In such conditions, only one of the two camera sensor nodes is allowed to transmit the histogram of its sensed image to the respective CH. In other words, if two camera sensor nodes transmit their histograms to the CH, the amount of information gained at the sink is larger when the two camera sensor nodes are less correlated.

Consider that among the N available camera sensor nodes in the cluster, only a subset {Ci1, Ci2, Ci3, …, CiM} is chosen to report to the CH; the joint entropy at the CH is then H(Ii1, Ii2, Ii3, …, IiM). Based on these values, we define a new distortion function as the ratio of the information lost (the maximum obtainable information minus the information gained at the CH) to the maximum amount of information that can be gained. The mathematical expression for the distortion function is [11]

$$ D=\frac{H\left({I}_1,{I}_2,{I}_3,\dots, {I}_N\right)-H\left({I}_{i1},{I}_{i2},{I}_{i3},\dots, {I}_{iM}\right)\ }{H\left({I}_1,{I}_2,{I}_3,\dots, {I}_N\right)} $$
(15)

As per Eq. (15), the value of D lies between 0 and 1. D represents the fraction of information lost due to the resource constraints of the network. This distortion function is helpful for applications that place different constraints on information loss. For instance, one application may ask the network to transmit the information with an information loss of 5% to 10%. In such conditions, the developed approach quantifies the information loss caused by resource constraints, and the number of cameras that need to report can be chosen based on the distortion value.
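
The following Python sketch illustrates Eq. (15) and how a distortion bound could be turned into a number of reporting cameras. The selection helper `cameras_needed` and its input `h_by_count` (the joint entropy obtained when 1, 2, … cameras report) are hypothetical; how the reporting cameras are ranked is left to the caller.

```python
def distortion(h_all, h_subset):
    """Distortion D of Eq. (15): fraction of the obtainable information that is lost."""
    if h_all <= 0:
        raise ValueError("joint entropy of all cameras must be positive")
    return (h_all - h_subset) / h_all

def cameras_needed(h_all, h_by_count, max_distortion):
    """Smallest number of reporting cameras M with D <= max_distortion.

    h_by_count[M - 1] is the joint entropy when M cameras report
    (hypothetical input). Falls back to all listed cameras if the bound
    is never met.
    """
    for m, h_subset in enumerate(h_by_count, start=1):
        if distortion(h_all, h_subset) <= max_distortion:
            return m
    return len(h_by_count)
```

For example, with a maximum joint entropy of 10 and joint entropies of 6, 8.5, 9.5 and 10 for one to four reporting cameras, a 10% distortion bound is first met with three cameras (D = 0.05).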

4 Simulation experiments

4.1 Simulation setup

In this section, we discuss the details of the simulation experiments conducted on the developed method. To check the performance of the proposed mechanism, we implemented the methodology in MATLAB 2015. For the simulation, a random network is first created with P nodes over an area of X × Y, where X is the length and Y is the width. We varied the node count from 20 to 50 nodes. For every node count, we kept the network size constant at 1000 × 1000 m². After the deployment of nodes in the network, to model the availability of resources, we assigned random energy and memory values to every node. Next, we formed clusters according to the developed clustering mechanism, and the nodes with larger memory and larger energy were chosen as CHs. After that, we transmitted images of different sizes from different sensor nodes to the sink node according to the developed methodology. The simulation time is 200 s for every run. The source and sink nodes are chosen so that they are far from each other, and the source node sends data to the destination node within the specified simulation time. The simulation parameters are listed in Table 2.

Table 2 Simulation parameters
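
The deployment step described in the setup can be sketched in Python as follows. This is not the MATLAB code used for the experiments. The energy and memory ranges are hypothetical, and ranking nodes by a normalised energy-plus-memory score is an assumption that stands in for the developed clustering mechanism.

```python
import random


def deploy_nodes(num_nodes, width=1000.0, height=1000.0, seed=None):
    """Place nodes uniformly at random and assign random resources."""
    rng = random.Random(seed)
    return [
        {
            "id": i,
            "x": rng.uniform(0.0, width),
            "y": rng.uniform(0.0, height),
            "energy": rng.uniform(0.5, 2.0),   # joules, hypothetical range
            "memory": rng.uniform(64, 512),    # kilobytes, hypothetical range
        }
        for i in range(num_nodes)
    ]


def resource_score(node, max_energy, max_memory):
    """Assumed ranking rule: sum of normalised energy and normalised memory."""
    return node["energy"] / max_energy + node["memory"] / max_memory


def pick_cluster_heads(nodes, num_heads):
    """Return the num_heads nodes with the largest resource score."""
    max_energy = max(n["energy"] for n in nodes)
    max_memory = max(n["memory"] for n in nodes)
    ranked = sorted(
        nodes,
        key=lambda n: resource_score(n, max_energy, max_memory),
        reverse=True,
    )
    return ranked[:num_heads]


nodes = deploy_nodes(20, seed=1)
heads = pick_cluster_heads(nodes, num_heads=4)
print([h["id"] for h in heads])
```

Fixing the seed makes a deployment repeatable, so different methods can be compared on the same topology.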

4.2 Performance metrics

To evaluate the performance of the proposed approach, the following performance metrics are used:

  1. Average Energy Consumption (AEC): AEC is defined as the ratio of the total energy consumed to the total number of nodes. Let n be the total number of nodes on the path to the sink node and Etotal(i) the total energy consumed by node i. Then AEC is evaluated as

$$ \mathrm{AEC}=\frac{1}{n}\sum \limits_{i=1}^n{E}_{total}(i) $$
(16)
  2. Peak Signal to Noise Ratio (PSNR): PSNR measures the quality of the visual information transmitted from the source node to the sink node. It is given by Eq. (17).

$$ \mathrm{PSNR}=10\,{\log}_{10}\left(\frac{{\mathrm{Max}}^2}{\mathrm{MSE}}\right) $$
(17)

where Max is the maximum gray-level pixel intensity (generally Max = 255) and MSE is the mean square error between the original and the received image. Mathematically, MSE is expressed as

$$ \mathrm{MSE}=\frac{1}{mn}\sum \limits_{i=1}^m\sum \limits_{j=1}^n{\left(O\left(i,j\right)-R\left(i,j\right)\right)}^2 $$
(18)

where O(i, j) is the original pixel of an m × n frame at the source node and R(i, j) is the corresponding retrieved pixel at the sink node. In this work, the average PSNR is obtained by averaging the PSNRs of the individual frames of a video sequence. A higher PSNR indicates higher quality and vice versa. PSNR is generally expressed in decibels (dB).

  3. Structural Similarity Index Metric (SSIM): SSIM measures the structural similarity between the original frame sent from the source node and the frame received at the sink node. Unlike PSNR, which evaluates the observed errors, SSIM measures the degradation of structural information from the original frame to the received frame. SSIM is mathematically expressed as

$$ \mathrm{SSIM}=\frac{\left(2\overline{x}\,\overline{y}+{C}_1\right)\left(2{\sigma}_{xy}+{C}_2\right)}{\left({\overline{x}}^2+{\overline{y}}^2+{C}_1\right)\left({\sigma}_x^2+{\sigma}_y^2+{C}_2\right)} $$
(19)

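where x̄ and ȳ are the mean intensities of the original and received frames, σx² and σy² are their variances, and σxy is their covariance. The constants C1 = (k1L)² and C2 = (k2L)² keep the fraction from becoming undefined when the denominator approaches zero. L is the dynamic range of the pixel values (typically 2^(bits per pixel) − 1), and k1 = 0.01 and k2 = 0.03 by default. SSIM ranges between −1 and 1.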
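
The three metrics above can be sketched in plain Python as follows. This is a minimal illustration, not the code used in the experiments. Images are assumed to be equally sized lists of rows of gray levels. SSIM is written in its standard form, with C1 paired with the means and C2 with the variances, and is computed once from whole-image statistics, which is a simplification of the usual windowed average. All function names are hypothetical.

```python
import math


def average_energy(energies):
    """AEC: total consumed energy divided by the number of nodes."""
    return sum(energies) / len(energies)


def mse(original, received):
    """Mean square error over an m x n frame."""
    m, n = len(original), len(original[0])
    total = sum(
        (original[i][j] - received[i][j]) ** 2
        for i in range(m)
        for j in range(n)
    )
    return total / (m * n)


def psnr(original, received, max_value=255):
    """PSNR in dB; infinite when the frames are identical."""
    err = mse(original, received)
    if err == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / err)


def ssim_global(x, y, dynamic_range=255, k1=0.01, k2=0.03):
    """SSIM from whole-image statistics (single window, simplification)."""
    xs = [p for row in x for p in row]
    ys = [p for row in y for p in row]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    var_x = sum((p - mean_x) ** 2 for p in xs) / n
    var_y = sum((p - mean_y) ** 2 for p in ys) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys)) / n
    c1 = (k1 * dynamic_range) ** 2
    c2 = (k2 * dynamic_range) ** 2
    numerator = (2 * mean_x * mean_y + c1) * (2 * cov + c2)
    denominator = (mean_x ** 2 + mean_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


# Hypothetical 2 x 2 frames for a quick check.
sent = [[100, 100], [100, 100]]
got = [[110, 90], [100, 100]]
print(psnr(sent, got), ssim_global(sent, got))
```

For a video sequence, the per-frame PSNR values would be averaged, as described for Eq. (17).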

4.3 Results

In this section, we examine the effectiveness of the proposed approach through different performance metrics. First, we compute the joint entropy and the distortion for a varying number of selected cameras. Next, we measure the visual quality through PSNR and SSIM for a varying number of selected cameras. Finally, we present a detailed comparison between the proposed method and several existing methods in terms of PSNR, SSIM and average energy consumption.

Figure 5 shows that the joint entropy increases with the number of selected cameras. Random camera selection is used as the baseline for comparison. Under random selection, the camera nodes that transmit data to the sink are chosen at random, without considering any reference parameter such as entropy or correlation. In our method, by contrast, camera selection strictly follows the entropy-based process. When the information sensed by the camera sensor nodes is highly correlated, the joint entropy is low, which means the nodes contribute very little new information to the sink. Conversely, when the correlation is low, the joint entropy is high and the sink node gains more information. Because random selection ignores the properties of the sensed images, it yields a lower joint entropy, and this remains the case even as the number of selected camera sensor nodes increases. From the results in Table 3, the average joint entropy is 2.53 for random selection and 3.12 for the proposed entropy-based selection.
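
The relation between correlation and joint entropy can be illustrated with a short sketch. It estimates the joint entropy of two aligned, equally sized pixel sequences from their joint histogram. This empirical estimator is an assumption made for illustration and is not necessarily the joint entropy model used by the CH. The synthetic 3-bit "images" and all names are hypothetical.

```python
import math
import random
from collections import Counter


def entropy(symbols):
    """Shannon entropy (bits) of a sequence, estimated from its histogram."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def joint_entropy(pixels_a, pixels_b):
    """Joint entropy (bits) of two aligned pixel sequences via the joint histogram."""
    if len(pixels_a) != len(pixels_b):
        raise ValueError("pixel sequences must have the same length")
    return entropy(list(zip(pixels_a, pixels_b)))


rng = random.Random(0)
cam_a = [rng.randrange(8) for _ in range(4096)]  # 3-bit gray levels
cam_b = list(cam_a)                              # fully correlated view
cam_c = [rng.randrange(8) for _ in range(4096)]  # independent view

# A fully correlated pair adds no information beyond a single camera,
# while an independent pair yields a much larger joint entropy.
print(joint_entropy(cam_a, cam_b), joint_entropy(cam_a, cam_c))
```

A selection rule that prefers camera pairs with high joint entropy therefore favours weakly correlated views, which is the behaviour the entropy-based selection relies on.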

Fig. 5 Joint entropy for random camera selection and proposed entropy-based selection

Table 3 Comparison of joint entropy at different CSNs

Figure 6 shows that the distortion decreases as the number of selected camera nodes increases. In our methodology, the distortion is measured as the deviation between the current and the maximum joint entropy of multiple cameras, so the distortion is lower when more cameras are selected. With few selected camera nodes, the information gained at the sink node is small and hence the distortion in the received multimedia data is high. Conversely, as the number of selected cameras increases, more information reaches the sink and the distortion falls. The process used for camera selection also plays a significant role here. The camera nodes must be selected so that their joint entropy is high, which indirectly indicates low correlation. This cannot be guaranteed if the camera nodes are selected randomly. In our method, the CH computes the joint entropy and, based on the result, decides which nodes and how many of them are selected. Hence the proposed entropy-based method has lower distortion than random selection. From the results in Table 4, the average distortion is 46% for random selection and 34% for the proposed entropy-based selection.

Fig. 6 Distortion for random camera selection and proposed entropy-based selection

Table 4 Comparison of distortion at different CSNs

PSNR is a quality metric that reflects the quality of the data received at the sink or base station. PSNR is inversely related to the mean square error (MSE), or distortion: as the MSE increases, the PSNR decreases, and vice versa. Likewise, a video received at the sink node with high distortion has a low PSNR. We conducted two simulation studies to measure the PSNR: one with respect to the number of camera nodes selected by the different selection methods, and another with respect to bits per pixel (bpp). The plot in Fig. 7 shows the effect of camera selection on the PSNR. The PSNR is lower for the random selection method and higher for the proposed entropy-based selection method. Since the proposed method chooses the camera nodes with minimum correlation, the image data received at the sink node has less distortion and a higher PSNR. From the results in Table 5, the average PSNR is 29 dB for random selection and 35 dB for the proposed entropy-based selection.

Fig. 7 PSNR (dB) for random camera selection and proposed entropy-based selection

Table 5 Comparison of PSNRs at different CSNs

SSIM is another quality metric; it measures the structural similarity between the original image at the source node and the received image at the sink. For SSIM, we again conducted two case studies: one with respect to the camera selection method, and another with respect to varying bpp. With an appropriate selection of camera nodes, the CH and the sink receive a high-quality image, which results in less distortion between the original and received images. When the distortion is low, the SSIM is high, and vice versa. Hence, as Fig. 8 shows, the SSIM of the proposed entropy-based selection method is higher (0.8780) than that of the random selection method (0.8140). The results in Table 6 further show that entropy-based selection better preserves the structure of the images.

Fig. 8 SSIM for random camera selection and proposed entropy-based selection

Table 6 Comparison of SSIM at different CSNs

4.4 Comparison

To assess the effectiveness of the proposed approach, the results are compared with conventional approaches such as NDIC-PCA [33] and DVC-SW [23].

Figure 9 compares the proposed method with several existing methods in terms of PSNR for varying bpp. As the number of bits used to represent an image increases, the quality of the image also increases, as observed in Fig. 9. In the proposed approach, the camera sensor nodes transmit the entire sensed image information to the CH, and the CH identifies only the uncorrelated data and forwards it to the sink. Furthermore, the selected camera nodes are less correlated and therefore contribute more information to the sink. For this reason, the proposed method achieves a higher PSNR than the existing methods DVC-SW and NDIC-PCA. Even though both of these methods employ distributed coding, they do not consider the correlation properties of the images. In NDIC-PCA, PCA is applied to the images to extract only the principal components, which are not sufficient to reconstruct the images at the sink. Hence the received image deviates considerably from the original transmitted image, which results in a larger MSE and a lower PSNR. From the results in Table 7, the proposed approach has an average PSNR of 33 dB, while DVC-SW and NDIC-PCA achieve 30 dB and 29 dB, respectively.

Fig. 9 PSNR (dB) comparison at different pixel rates

Table 7 Comparison of PSNRs at different bpp

Next, the pixel rate also has a significant impact on the structural quality of the image, as can be observed in Fig. 10. The SSIM increases with the pixel rate (bpp). A sufficient number of bits is required to represent the edge features (structures or boundaries) of an image; only then are its structural properties preserved. Hence, the SSIM is high for larger values of bpp. Two images have a higher SSIM when more of the pixels representing their edge features agree, and this is possible only at higher bpp. Since the proposed approach employs joint entropy-based data modeling, the parts of images received from different clusters form a complete image with more uncorrelated data. Conventional methods such as PCA and the SF theorem, however, do not support the extraction of uncorrelated data between images sensed by multiple cameras. Hence the SSIM of the proposed approach is higher than that of the existing methods. From the results in Table 8, the average SSIM is 0.8756 for the proposed approach, compared with 0.8540 and 0.8352 for DVC-SW and NDIC-PCA, respectively.

Fig. 10 SSIM comparison at different pixel rates

Table 8 Comparison of SSIM at different bpp

Figure 11 shows the energy consumption at node level for varying distance between the CH and the sink. In general, as the distance between the CH and the sink increases, the CH needs more energy to forward the data to the sink. Extra energy is also required if the CH performs redundancy removal or data compression. Hence, we measured the energy consumed by each node when different image compression methods are applied. The proposed method has the lowest energy consumption of all the methods compared, since it uses only a counter to count pixels at node level and an entropy calculator at CH level. Although NDIC-PCA [33] employs distributed coding of images, it uses PCA for redundancy removal, which is computationally heavier than the proposed approach. In Dynamic Alliance Collaborative Compression (DACC) [17], the nodes collaborate with each other to carry out two tasks, image transmission and image compression; hence DACC consumes less energy than NDIC-PCA. Its energy consumption is initially larger and decreases as the distance between the CH and the sink increases, because a larger distance makes more neighbor nodes available, each of which contributes to either image compression or transmission. Further, the JPEG-oriented methods, namely JPEG [38], JPEG 2000 [19] and JPEG-XR [35], have higher energy consumption due to their computationally intensive tasks. The figure shows that the distributed coding methods (the proposed method and NDIC-PCA) consume far less energy than the JPEG-oriented methods for image compression followed by transmission. From the results in Table 9, the average energy consumption per node is 0.9 J for the developed method, compared with 1.1023 J, 1.2532 J, 1.9414 J, 2.4232 J and 3.6221 J for DACC, NDIC-PCA, JPEG-XR, JPEG 2000 and JPEG, respectively.

Fig. 11 Energy consumption per node at different distances

Table 9 Comparison of average energy consumption at different distances

For subjective assessment of the proposed method, we evaluate the Mean Opinion Score (MOS) by subjecting the output images to visual quality analysis. Several viewers are chosen and asked to view the original input image and the output image. They are then asked to assign a score between 0 and 5, where 0 is the worst and 5 is excellent. MOS is a subjective test that reflects the real experience of people, so it indicates whether the developed method meets the requirements of actual viewers. The MOS results are given in Table 10. The proposed method achieves a larger MOS value than NDIC-PCA, because PCA removes more information and thereby degrades image quality.

Table 10 Mean opinion score comparison

5 Conclusion

In WMSNs, routing design is a challenging task because of the large size of the data and the resource-constrained camera sensor nodes. To achieve optimal performance in WMSNs, the prime focus must be on removing the redundant data sensed by multiple camera sensor nodes. Even though several standard image and video compression algorithms exist, they are not suitable for resource-limited WMSNs. Hence, we have developed a new and simple image compression and transmission method based on histograms and correlation properties. For efficient image representation at node level, we employ the histogram method. For redundancy removal, we propose a joint entropy-based camera node selection in which only the nodes with low correlation are selected to report. Moreover, to lessen the computational burden and energy consumption, we propose a new clustering mechanism in which the entropy calculation tasks are assigned to the resource-rich CH. Simulation experiments are conducted with several case studies, and the performance is measured through distortion, PSNR, SSIM and energy consumption. The average PSNR obtained with the proposed approach is 33 dB, compared with 30 dB and 29 dB for DVC-SW and NDIC-PCA, respectively, an average improvement of 3 dB and 4 dB over the conventional methods. The average SSIM of the proposed approach, DVC-SW and NDIC-PCA is 0.8756, 0.8540 and 0.8352, respectively, an average improvement of 0.0216 and 0.0404 over the conventional methods. Furthermore, the proposed approach is also effective in reducing energy consumption: its average energy consumption is 0.9 J, compared with 1.1 J, 1.3 J, 1.9 J, 2.4 J and 3.6 J for DACC, NDIC-PCA, JPEG-XR, JPEG 2000 and JPEG, respectively. Based on these results, we conclude that the developed method is effective in both quality preservation and energy consumption reduction.

Even though the proposed approach reduces the energy consumption at node level, it can be reduced further through appropriate selection of intermediate nodes. Since multimedia data is large, a single path cannot support quick delivery to the sink. Thus, the main limitation of our method is a higher end-to-end delay due to the single path between source and sink. In future work, we will therefore focus on developing a new multipath routing mechanism that lowers the delay and speeds up data delivery to the sink. Another possible extension of this work is its application to 3D multimedia data. In the future, we will also extend our delivery approach and computing skill to 3D collaborative multimedia applications [7, 21, 22, 36].