Introduction

With the development of big data and cloud computing, the tension between privacy protection on the one hand and big data and cloud computing on the other has become increasingly prominent. How to share and use massive data effectively [1] has become a hot research topic. Over the past few decades, data hiding has received wide attention from the research community. Data hiding embeds additional data by modifying the content of the cover medium, and its methods fall into three main categories: watermarking [2,3,4], steganography [5] and reversible data hiding (RDH) [6,7,8,9,10].

Reversible data hiding (RDH) embeds additional data in the cover medium and recovers the cover medium losslessly after the embedded data are extracted. Existing RDH methods are mainly based on three strategies: lossless compression [6], difference expansion [7, 8] and histogram shifting [9, 10]. In the lossless compression method, some features of the original image are extracted and losslessly compressed, and additional data are embedded in the room freed by the compression. To improve the embedding capacity, RDH methods based on difference expansion and histogram shifting have been proposed.

With the development of third-party cloud paradigms and privacy protection applications, the demand for privacy protection is growing, and the combination of encryption and RDH plays a vital role in meeting it. To store or share files securely through third-party services, the content owner encrypts the original content into unreadable ciphertext before transmission. The data hider then embeds additional data in the ciphertext. The recipient, in turn, wants to recover the original content losslessly after decryption and data extraction. Such privacy protection scenarios motivate RDH in the encrypted domain (RDH-ED) for managing ciphertext data.

RDH-ED methods are mainly classified into two categories: vacating room after encryption (VRAE) [11,12,13] and reserving room before encryption (RRBE) [14, 15]. Zhang et al. [11] first proposed the RDH-ED method. The data hider divides the encrypted image into non-overlapping blocks and embeds the data by flipping the three least significant bits (LSBs) of half of the pixels in each block. Exploiting the spatial correlation of the image, the recipient uses a smoothness estimation function to estimate the texture complexity of each block for data extraction and image restoration. However, the quality of the recovered images and the accuracy of data extraction are still not satisfactory. To separate data extraction from image recovery, Zhang et al. [12] proposed a separable RDH-ED method based on LSB compression. The recipient can restore the image without extracting the additional data, and the data can be extracted directly from the embedding space. Afterwards, Qian et al. [13] proposed that the data hider reserve room by compressing a series of selected bits obtained from the encrypted image. The legitimate recipient uses distributed source coding to extract the additional data correctly and restore the original image perfectly.

Compared with VRAE methods, RRBE methods have better performance in reducing data extraction errors and restoring original images. Ma et al. [14] first proposed to reserve room for data hiding before encryption. This method ensures that there are no errors in data extraction and image restoration. Recently, Puteaux et al. [15] proposed an RDH-ED method. The data hider uses MSB replacement to embed additional data. The recipient extracts additional data from the MSB plane of the encrypted image. After decryption, the recipient uses the correlation between adjacent pixels to reconstruct the original image through MSB prediction.

According to the state-of-the-art methods introduced above, image-based RDH has been extensively studied for many years, but these methods cannot be directly applied to other cover media, such as text, audio, video and 3D mesh. At present, 3D meshes have been applied in various fields. For example, in the medical field, 3D meshes are used to accurately describe organs. In the film industry, 3D meshes are used to represent characters, objects and scenes. Considering the commercial value, visual value and economic benefits, when the 3D meshes are distributed on the Internet, the producers or copyright owners inevitably face practical problems such as copyright protection and content authentication. Therefore, RDH based on 3D meshes is an important research topic. However, RDH research with 3D meshes as cover media is still in its infancy.

Existing RDH methods for 3D models work in four domains: the spatial domain, the transform domain, the compressed domain and the encrypted domain. Spatial-domain RDH [16,17,18,19,20] embeds additional data into the 3D model by slightly modifying the vertex coordinates rather than the connectivity data, and has low complexity. Transform-domain RDH [21] embeds additional data in the transform coefficients of the model. Compressed-domain RDH [22, 23] uses vector quantization to compress the vertices of the 3D model and then embeds the data in the compressed model stream [24, 25]. In recent years, encrypted-domain RDH [26, 27] has attracted the attention of the research community.

The method of Jiang et al. [26] uses scaling and quantization to map the vertex coordinates of the 3D mesh to integers. The data hider embeds additional data by flipping several LSBs of the encrypted coordinates. The recipient uses a smoothness measure function to realize data extraction and mesh restoration, so data extraction and mesh restoration are inseparable. In [27], a two-layer RDH-ED method using the homomorphic Paillier cryptosystem is proposed for 3D meshes, which is more suitable for cloud data management. Owing to the large ciphertext expansion and high computational complexity of the Paillier cryptosystem, the method in [27] is not efficient in practice. For the sake of fairness, the proposed method is mainly compared with [26], because both [26] and the proposed method are based on symmetric encryption.

In this paper, we propose a 3D mesh-based RDH-ED method based on integer mapping and MSB prediction, which not only improves the embedding capacity but also ensures that the method is separable. The main contributions of this paper are as follows:

(1) The MSB embedding strategy is adopted to achieve higher embedding capacity.

(2) By making full use of the correlation of adjacent vertices in the natural mesh, the recipient can recover the MSB of the “embedded” vertices through ring prediction, so as to achieve mesh lossless recovery.

(3) The proposed method can directly extract additional data from the encrypted mesh and guarantees the data extraction is error-free and separable, which is of great significance for privacy protection.

The rest of this paper is organized as follows: Proposed Method introduces the proposed method. Experimental Results and Analysis presents the analysis of experimental results. Conclusion concludes this paper and describes the future work.

Proposed Method

The proposed method consists of three stages: 1) reserving room and encryption; 2) data hiding; 3) data extraction and mesh recovery. Figure 1 illustrates the framework of the proposed method. Pre-processing, prediction error detection, encryption, data hiding, data extraction and mesh recovery are elaborated in the following sub-sections.

Fig. 1 Framework of the proposed method

Pre-processing

3D mesh models are represented in various file formats such as OFF, PLY and OBJ. A 3D mesh is composed of vertex data and face data. The vertex data contain the vertex coordinates, represented as \(V= \{ v_i \in \mathfrak {R}^3 | 1\leq i \leq N\}\), where each vertex is \(v_{i}=(v_{i,x}, v_{i,y}, v_{i,z})\) and N is the number of vertices. Note that each coordinate satisfies \(v_{i,j} < 1\), where \(j\in\{x, \; y, \; z \}\). The face data are represented by the sequence \(F=(f_1, f_2, \ldots , f_M)\), where \(f_{i}=(v_{i,x}, v_{i,y},v_{i,z})\) and M is the number of faces. Figure 2 shows a local region of a “Cow” mesh, and Table 1 gives the corresponding file format.

Fig. 2 Cow mesh

Table 1 File format for Fig. 2

We can perform lossy compression of vertex coordinates according to the recommendation of [28]. Depending on the precision m, the corresponding integer values lie between \(-10^{m}\) and \(10^{m}\), where \(m \in [1, 33]\). Floating-point coordinates \(v_{i,j}\) are normalized to integer coordinates \({\bar{v}_{i,j}}\) as

$$\begin{aligned} {\bar{v}_{i,j}}=\lfloor {v_{i,j}}\times 10^{{m}}\rfloor , \end{aligned}$$
(1)

where i is the index of the vertex, \(j\in \{x, y, z\}\), \(v_{i, j}\) is an original floating-point coordinate and \({\bar{v}_{i,j}}\) is the corresponding integer coordinate. The recipient can convert the processed integer coordinates back to floating-point coordinates by Eq. (2).

$$\begin{aligned} {\hat{v}_{i,j}}={\bar{v}_{i,j}} / 10^{m}. \end{aligned}$$
(2)

The value of m corresponds to the bit-length l of integer coordinates as

$$\begin{aligned} {l} = \left\{ \begin{array}{ll} 8, \qquad \quad &{}{1\le {m} \le 2}\\ 16,\qquad \quad &{} {3\le {m} \le 4}\\ 32, \qquad \quad &{} {5\le {m} \le 9}\\ 64, \qquad \quad &{} {10\le {m} \le 33}. \end{array} \right. \end{aligned}$$
(3)
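
The mapping in Eqs. (1)-(3) can be sketched in Python as follows. This is only an illustration under stated assumptions, not the implementation used in the experiments (which were run in MATLAB); the function names `to_integer`, `to_float` and `bit_length` are hypothetical.

```python
import math


def to_integer(v, m):
    """Eq. (1): map a floating-point coordinate to an integer with precision m."""
    return math.floor(v * 10 ** m)


def to_float(v_bar, m):
    """Eq. (2): map an integer coordinate back to a floating-point value."""
    return v_bar / 10 ** m


def bit_length(m):
    """Eq. (3): bit-length l of the integer coordinates for precision m."""
    if 1 <= m <= 2:
        return 8
    if 3 <= m <= 4:
        return 16
    if 5 <= m <= 9:
        return 32
    if 10 <= m <= 33:
        return 64
    raise ValueError("m must lie in [1, 33]")


# Digits beyond the m-th decimal place are discarded, which is the lossy step.
v_bar = to_integer(0.12345678, 4)   # 1234
v_hat = to_float(v_bar, 4)          # 0.1234
```

Because binary floating-point products can fall marginally below the intended integer, a practical implementation might prefer to parse the decimal string of each coordinate; the text does not specify this detail.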

Prediction Error Detection

The “embedded” set \({s}_{e}\) is used to embed additional data, and the “reference” set \({s}_{n}\), whose vertices are not modified during the whole process, is used to recover the mesh. We traverse all the vertices contained in the face data in ascending order. Let \(F=(f_1, f_2, \ldots , f_M)\) denote the face data sequence, where \(f_{i}=(v_{i,x}, v_{i,y}, v_{i,z})\) and M is the number of faces. Assume that \(f_{n}=(v_{n,x}, v_{n,y}, v_{n,z})\) is the next face to be traversed, and that both \({s}_{e}\) and \({s}_{n}\) are initially empty. If no vertex of \(f_{n}\) is in \({s}_{e}\) or \({s}_{n}\), we add the first vertex \(v_{n,x}\) of \(f_{n}\) to \(s_e\) and add \(v_{n,y}\) and \(v_{n,z}\) to \(s_n\).

Fig. 3 An example of prediction error detection test on cow mesh

As shown in Fig. 3, the MSB of the x-coordinate of the “embedded” vertex numbered 1 is 0. The sender counts the occurrences of 0 and 1 in the MSBs of the corresponding coordinates of the “reference” vertices numbered 2, 3, 4, 5, 7 and 8. If the number of 0s is greater than or equal to the number of 1s, the MSB of the coordinate of vertex 1 is predicted to be 0; otherwise it is predicted to be 1. If the prediction agrees with the actual MSB, vertex 1 is a vertex of the embedded set without prediction error. Otherwise, its vertex index is recorded as auxiliary information. Finally, the sender sends the auxiliary information together with the original mesh to the data hider.
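
The majority vote used for prediction error detection can be sketched as follows. The sketch assumes that coordinates are stored as non-negative integers of `bitlen` bits and that the caller already knows the reference neighbours of each embedded vertex; the function names are hypothetical and the neighbour values in the example are made up for illustration.

```python
def msb(value, bitlen):
    """Most significant bit of a non-negative integer stored in bitlen bits."""
    return (value >> (bitlen - 1)) & 1


def predict_msb(neighbour_values, bitlen):
    """Majority vote over the reference neighbours; a tie is resolved to 0."""
    ones = sum(msb(v, bitlen) for v in neighbour_values)
    zeros = len(neighbour_values) - ones
    return 0 if zeros >= ones else 1


def has_prediction_error(value, neighbour_values, bitlen):
    """True if the predicted MSB differs from the actual MSB of the coordinate."""
    return predict_msb(neighbour_values, bitlen) != msb(value, bitlen)


# Toy 8-bit example: four neighbours have MSB 0 and two have MSB 1.
neighbours = [10, 20, 200, 30, 40, 250]
error_free = not has_prediction_error(22, neighbours, 8)   # actual MSB is 0
```

A vertex for which `has_prediction_error` returns true for any of its coordinates would have its index recorded as auxiliary information.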

Encryption

After the vertex coordinates are pre-processed, the sender uses Eq. (4) to convert the integer coordinates to binary.

$$\begin{aligned} {{b}_{i,j,u}}=\lfloor {\bar{v}_{i,j}}/ 2^{u}\rfloor \quad {mod}\quad 2,\qquad {u}=0, 1\ldots {bitlen}-1, \end{aligned}$$
(4)

where \(\lfloor \cdot \rfloor\) is the floor function, \(1 \le i \le N\), \(j\in \{x, \; y, \; z\}\), and the bit-length bitlen of the coordinate is obtained by Eq. (3).

The sender uses the stream cipher function to generate pseudo-random bits \({c}_{i, j, u}\) and encrypts the original 3D mesh bit stream \({b}_{i, j, u}\) to obtain the encrypted binary \({e}_{i, j, u}\).

$$\begin{aligned} {{e}_{i,j,u}}={{b}_{i,j,u}} \oplus {{c}_{i,j,u}}, \end{aligned}$$
(5)

where \(\oplus\) stands for exclusive OR.

The sender can get the encrypted mesh using Eq. (6)

$$\begin{aligned} {{E}_{i,j}}=\sum _{u=0}^{bitlen-1}{{e}_{i,j,u}} \times 2^{u}, \end{aligned}$$
(6)

where \({E}_{i,j}\) is the integral value of coordinates.
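
The encryption steps can be illustrated with a short Python sketch. Two assumptions are made here: the keystream comes from Python's `random` module, which merely stands in for a real stream cipher and is not cryptographically secure, and the encrypted bits are recombined with weights \(2^{u}\), i.e. as the inverse of the bit decomposition in Eq. (4). All function names are hypothetical.

```python
import random


def to_bits(value, bitlen):
    """Eq. (4): b_u = floor(value / 2^u) mod 2 for u = 0 .. bitlen-1."""
    return [(value >> u) & 1 for u in range(bitlen)]


def from_bits(bits):
    """Recombine bits into an integer, weighting bit u by 2^u."""
    return sum(b << u for u, b in enumerate(bits))


def keystream(key, n):
    """Stand-in keystream of n pseudo-random bits (illustration only)."""
    rng = random.Random(key)
    return [rng.getrandbits(1) for _ in range(n)]


def encrypt(value, key, bitlen):
    """Eq. (5): XOR the plaintext bits with the keystream bits."""
    bits = to_bits(value, bitlen)
    stream = keystream(key, bitlen)
    return from_bits([b ^ c for b, c in zip(bits, stream)])


# XOR is its own inverse, so applying encrypt twice restores the coordinate.
cipher = encrypt(180757, key=42, bitlen=32)
plain = encrypt(cipher, key=42, bitlen=32)
```

In this sketch every coordinate encrypted with the same key reuses the same keystream bits; an actual scheme would consume a fresh segment of the keystream for each coordinate.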

Data Embedding

To prevent the additional data from being detected, the data hiding key Kw is used to encrypt the data to be embedded. The sender first calculates \({s}_e\) and then embeds the data in those vertices of \({s}_e\) that have no prediction error. The MSB of each of the x, y and z coordinate values of such a vertex is replaced with 1 bit of additional data, so that, with Eq. (7), each vertex is embedded with 3 bits.

$$\begin{aligned} {{v_{i,j}}^{\prime \prime }}={w}\times 2^{bitlen-1}+\left( {{v_{i,j}}^{\prime }} \bmod 2^{{bitlen}-1}\right) , \end{aligned}$$
(7)

where w is a bit of additional data, \({{v_{i,j}}^{\prime }}\in C\) is the vertex coordinate after pre-processing and encryption, and \({{v_{i,j}}^{\prime \prime }}\) is the corresponding coordinate of the marked encrypted mesh.

After the data embedding stage, the sender obtains the encrypted mesh with additional data, namely E(M)w. Figure 4 shows the embedding process on vertex 1 of the “embedded” set. Assume that the x-axis coordinate value is 0.180757; when m = 6, pre-processing maps it to the integer 180757. The data hider then directly replaces the MSB of vertex 1 with the additional bit 1 by the bit replacement strategy, which completes the embedding.

Fig. 4 An example of data embedding on cow mesh
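
MSB replacement and the matching extraction step can be sketched as below. The sketch keeps the lower bitlen − 1 bits of the coordinate unchanged, which is what replacing only the MSB requires, and reads the bit back by integer division. The function names are hypothetical, and the integer 180757 from the example above is reused only as a convenient 32-bit value (in the method itself the replacement acts on the encrypted coordinate).

```python
def embed_bit(v_enc, w, bitlen):
    """Replace the MSB of a bitlen-bit coordinate with the data bit w."""
    top = 2 ** (bitlen - 1)
    return w * top + v_enc % top


def extract_bit(v_marked, bitlen):
    """Read the embedded bit back from the MSB of a marked coordinate."""
    return v_marked // 2 ** (bitlen - 1)


marked = embed_bit(180757, 1, 32)     # MSB set to 1, lower 31 bits untouched
recovered_bit = extract_bit(marked, 32)
```

Extraction needs no access to the plaintext mesh, which is why it can be carried out on the marked encrypted mesh directly.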

Data Extraction and Mesh Recovery

The recipient uses the data hiding key Kw to extract the additional data and the encryption key Ke to restore the original mesh. Since the proposed method is separable, three cases arise, depending on which keys the recipient holds:

Case 1: With only the data hiding key Kw, the recipient can extract the MSB from the vertex coordinates of \({s}_{e}\) without prediction error and then obtain the corresponding plaintext additional data.

$$\begin{aligned} {w}=\left\lfloor {{v_{i,j}}^{\prime \prime }}/2^{bitlen-1}\right\rfloor , \end{aligned}$$
(8)

where \({{v_{i,j}}^{\prime \prime }}\in {C}\) is a vertex coordinate of the marked encrypted mesh.

Case 2: With only the encryption key Ke, the recipient can recover M from E(M)w. M is recovered in two steps: mesh decryption and MSB prediction recovery.

The pseudorandom bits \(c_{i,j,u}\) are generated from the encryption key Ke and XORed with \({{e}_{i,j,u}^{\prime \prime }}\) to decrypt the marked encrypted mesh E(M)w.

$$\begin{aligned} {{b}_{i,j,u}^{\prime \prime }}= {{e}_{i,j,u}^{\prime \prime }}\oplus {{c}_{i,j,u}}, \end{aligned}$$
(9)

where \({{e}_{i,j,u}^{\prime \prime }}\) is the binary stream of the marked encrypted mesh, \({{b}_{i,j,u}^{\prime \prime }}\) is the binary stream of the decrypted mesh with additional data and \(u=0, 1, \ldots , bitlen-1\).

After decryption, the vertex coordinates of the \({{s}_{n}}\) set are restored. In the data embedding stage, the MSBs of the coordinates of the vertices in the embedded set were replaced by additional data. Therefore, the recipient exploits the spatial correlation of the original mesh and predicts the MSBs of the “embedded” set vertices from the MSBs of the surrounding adjacent vertices. This way of using adjacent reference coordinates to predict the embedded vertex coordinates is called ring prediction. With ring prediction, the recipient can obtain a high-quality restored mesh.

For example, the coordinate values of the adjacent vertices 2, 3, 4, 5, 7 and 8 have been restored correctly after decryption. Based on their MSB values, the MSB of each coordinate of vertex 1 is predicted to be 0 or 1. When predicting the MSB of \(v_{1,{x}}\), we count the MSBs of the x-coordinates of vertices 2, 3, 4, 5, 7 and 8. If the number of occurrences of MSB 0 is greater than or equal to the number of occurrences of MSB 1, the MSB of \(v_{1,{x}}\) is predicted to be 0; otherwise it is predicted to be 1.
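
The recovery step of ring prediction can be sketched as follows: after decryption, the MSB of an embedded coordinate is overwritten with the majority MSB of its reference neighbours. As before, the sketch assumes non-negative integers of `bitlen` bits; the function name and the neighbour values are hypothetical.

```python
def recover_coordinate(marked_value, neighbour_values, bitlen):
    """Overwrite the MSB of a decrypted marked coordinate with the
    majority MSB of the reference neighbours (a tie is resolved to 0)."""
    top = 1 << (bitlen - 1)
    ones = sum((v >> (bitlen - 1)) & 1 for v in neighbour_values)
    zeros = len(neighbour_values) - ones
    predicted = 0 if zeros >= ones else 1
    # Keep the lower bitlen-1 bits, which embedding never touched.
    return predicted * top + marked_value % top


# Toy 8-bit example: the original value 22 had its MSB overwritten, giving 150.
neighbours = [10, 20, 200, 30, 40, 250]
restored = recover_coordinate(150, neighbours, 8)
```

Recovery is exact only for vertices without prediction error, which is why embedding is restricted to those vertices.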

Case 3: With both the data hiding key Kw and the encryption key Ke, the recipient can extract the additional data and restore the original 3D mesh perfectly. Note that the data extraction step needs to be performed before mesh restoration.

Fig. 5 Test meshes: (a) Beetle, (b) Mushroom, (c) Mannequin, (d) Elephant

Experimental Results and Analysis

In this section, the reversibility and embedding capacity of the proposed method are analyzed, and the results are compared with the state-of-the-art method [26]. We perform extensive experiments in MATLAB R2018b under Windows 10. As shown in Fig. 5, there are four standard test meshes: Beetle, Mushroom, Mannequin and Elephant. Two datasets are used to test performance: meshes in OFF format from The Princeton Shape Retrieval and Analysis Group and meshes in OBJ format from The Stanford 3D Scanning Repository. The key indicator is the embedding capacity. In Embedding Capacity, we analyze the embedding capacity of the proposed method. In Geometric and Visual Quality, the Hausdorff distance and the signal-to-noise ratio (SNR) are used to evaluate the reversibility, that is, the distortion of the original mesh caused by the data hider. In Performance Comparison, the proposed method is compared with the state-of-the-art method [26]. The additional data embedded is a randomly generated 0/1 sequence.

Embedding Capacity

The embedding rate (ER) is measured by the number of bits per vertex (bpv), which is the ratio of the number of embedded bits to the number of vertices in the mesh.

In the clear domain, MSBs are easier to predict than LSBs. In this paper, the MSB of each coordinate axis is replaced with 1 bit of additional data, so that 3 bits are embedded per embedded vertex. We test the embedding rate of the proposed method on four standard test meshes. The embedding rate of the proposed method on Beetle, Mushroom, Mannequin and Elephant is 0.98 bpv, 1.34 bpv, 0.95 bpv and 1.02 bpv, respectively.
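
Read together, the definition of ER and the 3 bits carried by each embedded vertex suggest the following relation. Here \(N_e\) is a symbol introduced only for this illustration, denoting the number of vertices in \({s}_{e}\) without prediction error, and every embedded bit is assumed to count towards the payload:

$$\begin{aligned} ER=\frac{3N_e}{N}\ \text{bpv}. \end{aligned}$$

Under this reading, the 0.98 bpv measured on Beetle would correspond to roughly one third of the vertices carrying data, since \(0.98/3\approx 0.33\).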

Geometric and Visual Quality

Hausdorff distance and signal-to-noise ratio (SNR) are used to measure the geometric distortion of the mesh. The Hausdorff distance measures the similarity between two sets of points by calculating the distance between them. Given two sets \(A=\{a_1, a_2, \ldots , a_p\}\) and \(B=\{b_1, b_2, \ldots , b_q\}\), the Hausdorff distance between them is defined as:

$$\begin{aligned} H(A,B)=\max ({h}(A,B),{h}(B,A)), \end{aligned}$$
(10)
$$\begin{aligned} {h}(A,B)=\max _{{a}\in A}\min _{{b}\in B} \parallel {a}-{b}\parallel , \end{aligned}$$
(11)
$$\begin{aligned} {h}(B,A)=\max _{{b}\in B}\min _{{a}\in A} \parallel {b}-{a}\parallel , \end{aligned}$$
(12)

where \(\parallel \cdot \parallel\) is a distance between point a of set A and point b of set B (such as the L2 norm), and p and q are the numbers of elements in A and B, respectively.
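
For small point sets, Eqs. (10)-(12) can be evaluated directly with a brute-force Python sketch. It assumes the Euclidean (L2) distance and represents each point as a tuple of coordinates; the function names are hypothetical, and the quadratic cost would be too high for dense meshes without a spatial index.

```python
import math


def directed_hausdorff(A, B):
    """h(A, B): the largest distance from a point of A to its nearest point in B."""
    return max(min(math.dist(a, b) for b in B) for a in A)


def hausdorff(A, B):
    """H(A, B): the larger of the two directed distances."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))


A = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
B = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
d = hausdorff(A, B)   # dominated by the point (4, 0, 0), which is 3 away from A
```

The example also shows why both directions are needed: every point of A has an exact match in B, so h(A, B) is 0, while h(B, A) is 3.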

Signal-to-noise ratio (SNR) is defined as:

$$\begin{aligned} SNR=10 \times \lg \frac{ \sum _{i=1}^{N} [(v_{i,x}-\overline{v}_x)^2+(v_{i,y}-\overline{v}_y)^2+(v_{i,z}-\overline{v}_z)^2]}{ \sum _{i=1}^{N} [(g_{i,x}-v_{i,x})^2+(g_{i,y}-v_{i,y})^2 +(g_{i,z}-v_{i,z})^2 ] }, \end{aligned}$$
(13)

where \(\bar{v}_x\), \(\bar{v}_y\), \(\bar{v}_z\) are the averages of the mesh coordinates, \({v_{i,x}}\), \({v_{i,y}}\), \(v_{i,z}\) are the original coordinates, \(g_{i,x}\), \(g_{i,y}\), \(g_{i,z}\) are the modified mesh coordinates, and N is the number of vertices.
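
An SNR computation along these lines can be sketched in Python. The sketch assumes the usual convention that the signal is the spread of the original coordinates around their mean and the noise is the deviation of the modified coordinates from the originals; the function name is hypothetical, and returning infinity for identical meshes is a choice made here rather than something stated in the text.

```python
import math


def snr(original, modified):
    """SNR in dB between an original and a modified list of (x, y, z) vertices."""
    n = len(original)
    mean = [sum(v[k] for v in original) / n for k in range(3)]
    signal = sum((v[k] - mean[k]) ** 2 for v in original for k in range(3))
    noise = sum((g[k] - v[k]) ** 2
                for v, g in zip(original, modified) for k in range(3))
    if noise == 0:
        return math.inf   # identical meshes: no distortion at all
    return 10 * math.log10(signal / noise)


original = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
modified = [(0.0, 0.0, 0.1), (2.0, 0.0, 0.0)]
value = snr(original, modified)   # about 23 dB for this toy perturbation
```

With this convention a smaller deviation from the original mesh gives a larger SNR, matching the interpretation that higher SNR indicates a better recovered mesh.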

Fig. 6 Results of test meshes on different m: (a) Hausdorff distance, (b) SNR

The value of m is a trade-off between the quality of the recovered mesh and the computational overhead of the process.

As shown in Fig. 6(a), when \(2\le m\le 4\), the Hausdorff distance gradually decreases, and when \(m\ge 4\), it steadily approaches 0. The results show that as the precision m increases, the point set of the recovered mesh becomes more similar to that of the original mesh. As shown in Fig. 6(b), the SNR shows an upward trend as m increases. Thus, as m increases, the Hausdorff distance decreases while the SNR increases, which indicates that the quality of the recovered mesh improves.

Figure 7 shows the appearance of the mesh at different stages of the proposed method when m = 4, including the original mesh, encrypted mesh, marked encrypted mesh and recovered mesh. The difference between the original mesh and the recovered mesh is invisible to the naked eye, which means that the proposed method does not introduce perceptual distortion.

Fig. 7 Illustrative examples showing the appearance of the mesh of different stages when m = 4. From left to right is the original mesh, encrypted mesh, marked encrypted mesh and recovered mesh

Performance Comparison

The data hiding method in [26] flips the LSBs of each vertex to embed 1 bit of data. Owing to the spatial correlation of the mesh, a local region of the original mesh is much smoother than the same region after modification, so the recipient uses a smoothness estimation function to estimate the fluctuation of each local region for data extraction and mesh restoration. As shown in Fig. 8, the embedding rate of Jiang et al.’s method [26] on Beetle, Mushroom, Mannequin and Elephant is 0.35 bpv, 0.45 bpv, 0.34 bpv and 0.34 bpv, respectively. The embedding rate of the proposed method on the same meshes is 0.98 bpv, 1.34 bpv, 0.95 bpv and 1.02 bpv, respectively. The experimental results show that the proposed method improves the embedding capacity compared with the method of Jiang et al. [26].

To reduce the impact of randomly selecting test meshes, we also tested the embedding rate on the Princeton Shape Retrieval and Analysis Group dataset. As shown in Table 2, the average embedding rate of the method of Jiang et al. is 0.35 bpv, while the average embedding rate of the proposed method is 1.02 bpv. Thus, the proposed method has a significant advantage in embedding rate over the method proposed by Jiang et al. [26].

Fig. 8 Comparison of embedding rate between the proposed method and Jiang et al.’s method

Table 2 Average embedding rate comparison with Jiang et al.’s method on datasets

The Hausdorff distance and signal-to-noise ratio (SNR) are used to measure the geometric distortion of the meshes. Lower Hausdorff distance values and higher SNR values indicate that the quality of the recovered mesh is better. Figure 9 shows the experimental results of the Hausdorff distance and SNR on four test meshes. Taking the Beetle as an example, the ER of the proposed method is 0.98 bpv, while that of the method of Jiang et al. [26] is 0.35 bpv. When m = 4, the Hausdorff distance of the proposed method is \(0.008 \times 10^{-3}\), while that of the method of Jiang et al. is \(0.990 \times 10^{-3}\). The SNR of the proposed method is 76.37, while that of the method of Jiang et al. is 43.06. Thus, compared with the method in [26], the proposed method not only obtains a higher embedding rate but also yields recovered meshes of higher quality.

Fig. 9 Comparison results of Hausdorff distance and SNR on test meshes: (a) Beetle, (b) Mushroom, (c) Mannequin, (d) Elephant

Feature Comparison

The method of Jiang et al. [26] uses a smoothness estimation function to calculate the local smoothness of the part carrying embedded data and the local smoothness of the unmodified part. Since data extraction and model restoration are performed at the same time, this method is inseparable. In cloud management, this means that a cloud administrator who wants to extract the additional data embedded in the embedding stage must hold the decryption key and decrypt the marked encrypted mesh first, which may expose sensitive information of the content owner. The method is therefore suitable only when the content owner fully trusts the third-party platform, which limits its applications. The proposed method can extract the additional data embedded in the cloud directly from the ciphertext and does not involve decryption of the mesh. Therefore, as shown in Table 3, the proposed method is separable and more suitable for cloud management.

In fact, the data extraction error rate of the method in [26] on the Beetle mesh is 36.78\(\%\), and the rates on Mushroom, Mannequin and Elephant are 32.08\(\%\), 45.21\(\%\) and 4.94\(\%\), respectively. A larger data extraction error rate means that inaccurate additional data may be transmitted, making the communication invalid. In contrast, our method can directly and correctly extract additional information from the ciphertext domain to achieve effective communication. The method in [26] does not achieve reversibility. The proposed method controls the degree of distortion by adjusting the value of the parameter m and uses ring prediction to restore the mesh perfectly, that is, reversibility is achieved.

Table 3 Feature comparison between the proposed method and Jiang et al. [26]

Performance Analysis on Dense Meshes

In practical applications, models are often converted between different formats. The proposed method is designed for meshes in .OFF format, but after modifying the model reading function it can directly use meshes in .PLY format as carriers. To verify the effectiveness of the method, we performed experiments on The Stanford 3D Scanning Repository dataset. Three dense meshes in .PLY format are randomly selected from the dataset to show performance. The embedding rate and distortion results in Table 4 show that the proposed method also achieves a higher embedding rate on dense meshes, which demonstrates its applicability and effectiveness on such meshes.

Table 4 Performance of reversible data hiding on dense meshes

Conclusion

In this paper, we propose an RRBE separable RDH-ED method for encrypted 3D mesh models based on integer mapping and MSB prediction. The proposed method not only achieves feasibility, but also emphasizes the balance between capacity and distortion. The data hider achieves larger embedding capacity through the MSB embedding strategy. The recipient obtains a higher-quality recovered mesh using ring prediction. At the same time, data extraction and mesh restoration in the proposed method are separable and error-free. Experiments show that our method has larger embedding capacity and a higher-quality recovered mesh compared with the state-of-the-art methods. Since the selection of the “embedded” set is limited by the connectivity of the mesh, the embedding capacity of the proposed method is still limited. Designing a more effective method of selecting the “embedded” set to improve the embedding capacity is left for future work.