
1 Introduction

Machine learning, and deep learning in particular, is widely used in various applications because of its technological breakthroughs. However, recent works indicate that a trained model is vulnerable to various attacks that compromise the security of both the model itself and the application system built on it. The trained model information, such as the architecture and parameters, can be regarded as the intellectual property of the model creators. Moreover, once the model information is revealed, it becomes easy to mount white-box attacks, such as adversarial example attacks [1], which cause misclassification by adding small perturbations to the input data, and model inversion attacks [2], which recover training data from the model. Stealing or illegally copying a model is therefore one of the important issues in the security of machine learning. Machine learning implemented on embedded devices is especially exposed to this issue, because the devices can be reverse engineered once they are in the hands of attackers.

Several methods using trusted execution environments (TEEs) have recently been proposed for model protection [3]. A TEE provides an isolated environment for code execution that cannot be manipulated by malicious software running on the device, including an untrusted operating system (OS). Various current processors provide TEE mechanisms, such as Intel SGX, AMD SEV, and Arm TrustZone. TEEs have mainly been used for cryptographic applications; however, as the memory available to TEEs has grown, they have also been applied to other critical applications. In particular, executing machine learning applications inside TEEs has been proposed for model protection [4,5,6,7,8,9,10].

There are several problems with using TEEs to protect trained models. In general, simply porting the program of a trained model into a TEE does not work well. This paper focuses on the following three main problems:

  • Lack of memory. The memory size of TEEs is generally insufficient for machine learning, especially deep learning. For example, in Arm TrustZone the available memory size of a TEE is only a few MB, depending on the hardware and OS [11, 12], whereas the parameter size of typical deep learning models is on the order of tens to hundreds of MB.

  • Runtime overhead. To cope with the lack of memory, existing works propose an approach that divides parameter loading and prediction into per-layer steps to save memory [6, 10]. This approach switches the execution environment for every layer, and the frequent switching increases the runtime.

  • Parameter manipulation. Many existing works encrypt the trained model information for confidentiality but do not consider the possibility of parameter manipulation, i.e., integrity [4,5,6,7,8,9,10]. An attacker can manipulate the parameters of a trained model, which can cause malfunctions and misrecognition in the application system.

This paper proposes a novel model protection method using a TEE, mainly for deep learning on embedded devices. The proposed method addresses the three problems above: lack of memory, runtime overhead, and parameter manipulation. Following the existing works [6, 10], the method saves memory by dividing parameter loading and prediction into per-layer or per-computation-unit steps instead of loading all parameters into memory at once. To reduce the number of execution environment switches, the method uses the shared memory of the TEE to load the divided parameters, and it executes the decryption of the loaded parameters and the prediction as concurrent computations in the TEE. The method uses GCM (Galois/Counter Mode), the most widely used dedicated authenticated encryption mode, to achieve both confidentiality and integrity of the parameters.

The experiments evaluate the proposed method in terms of memory saving and runtime. They are conducted on a Raspberry Pi 3 Model B with Arm TrustZone (for the Cortex-A architecture) and OP-TEE [13], an open-source TEE implementing the Arm TrustZone technology. The experimental results are compared with existing methods using three trained models (LeNet, VGG-7, and Darknet) implemented with Darknet [14], an open-source neural network framework.

The contributions of this work can be summarized as follows:

  • The proposed method provides memory saving, low runtime overhead, and detection of parameter manipulation for trained model protection using TEEs (Sect. 3). It is mainly intended for deep learning on embedded devices.

  • The experiments show the effectiveness of the proposed method in terms of memory saving and runtime (Sect. 4). The runtime of the proposed method is about one-tenth of that of the existing works, although there is a trade-off between the division size of the loaded parameters and the runtime overhead.

  • The security technologies required to build and support a model protection system applying the method are discussed (Sect. 5).

2 Problems

This section explains the three main problems with using TEEs to protect trained models, based on existing works. This paper focuses on Arm TrustZone for embedded devices. Related works are summarized in Sect. 6.

2.1 Lack of Memory

In Arm TrustZone, the available memory size of a TEE (called a secure world) is only a few MB, depending on the hardware and OS [11, 12]. For example, in the case of Arm TrustZone and OP-TEE on the Raspberry Pi 3 Model B, the available memory size of the secure world is 7 MB [13]. One reason for this is that applications in the secure world need to fit into the on-chip memory. However, the parameter size of typical deep learning models is on the order of tens to hundreds of MB. There are lightweight models with parameter sizes of a few MB; however, when convolutional operations are used, matrix-based implementations such as im2col require memory allocations several times larger than the parameter size. For example, im2col expands the input of a convolutional layer with a K×K kernel into a buffer roughly K² times larger than the input feature map. In addition, there can be requirements to implement the program in the secure world as compactly as possible, either to eliminate potential vulnerabilities or to avoid allocating resources to applications that are executed only for specific purposes in the secure world.

2.2 Runtime Overhead

To cope with the lack of memory, the approach of the existing works increases the runtime of deep learning applications [6, 10]. The approach saves memory by dividing parameter loading into per-layer steps; however, the execution environment is switched for every layer. Frequent switching increases the runtime overhead because the processor state must be stored and restored at every context switch. In addition, the shortcomings of another approach in the existing literature, in which only several layers are implemented in the secure world [7, 9], are discussed in Sect. 6.

2.3 Parameter Manipulation

Many existing works store the encrypted model information in storage for confidentiality but do not consider the possibility of parameter manipulation [4,5,6,7,8,9,10]. In other words, although the existing works keep the parameters secret, an attacker can manipulate them, which can cause malfunctions and misrecognition in the deep learning application system. For example, the application output can be fixed to a specific result by manipulated parameters. Although this threat cannot cause sophisticated malfunctions and misrecognition like adversarial example attacks, it is important to ensure the integrity of trained models.

Fig. 1. Use case of the proposed method

3 Proposed Method

This section explains the proposed method using TEEs to protect trained models, based on the solutions to the problems in Sect. 2 and the use case. The basic idea is to divide the loading of parameters and to execute GCM (decryption and detection of parameter manipulation) and prediction as concurrent computations.

3.1 Use Case

Figure 1 shows a use case of the proposed method. In this use case, an embedded device executes real-time or distributed prediction with deep learning on-site, using sensor data as input. A cloud server connected to the embedded device gathers the sensor data and generates a trained model. The generated model is then fed back from the cloud server to the embedded device to make it smart.

3.2 Threat Model

As shown in Fig. 1, the trained model can be regarded as a critical component and needs to be protected from theft or manipulation. In the use case, the encrypted model delivered from the cloud server is securely decrypted and executed for prediction. This paper aims to protect against an attacker who may steal or manipulate the trained model, e.g., through illegal memory access. The proposed method uses a TEE to securely decrypt trained models and execute prediction. Fault injection and side-channel attacks are out of scope.

Figure 2 shows an overview of trained model protection using a TEE. In this figure, the execution environment on an embedded device is separated into a normal world and a secure world by Arm TrustZone. Normal applications are executed in the normal world, and deep learning applications, which hold the trained model information to be protected, are executed in the secure world. The normal applications can access the deep learning applications only through secure APIs; in other words, malicious applications or operations cannot freely access the memory used by the deep learning applications. The trained model is encrypted and stored in the storage of the normal world and is decrypted in the secure world when it is called by the deep learning applications. The secret key for decryption is stored in a secure element accessible only from the secure world. Therefore, the trained model information cannot be revealed by malicious applications or operations.

Fig. 2. Overview of trained model protection using TEEs

3.3 Solutions

The solutions to the three problems in Sect. 2 are as follows:

Memory Saving. Memory saving is achieved by dividing the task of loading the parameters and executing prediction into steps per layer or per computation unit, such as a matrix multiplication, instead of loading all parameters into memory at once. This follows the existing works [6, 10]. To divide parameter loading, the encrypted parameters must be decryptable according to the division unit (each layer or computation unit). The existing works encrypt the parameters layer by layer because they divide parameter loading into per-layer steps for memory saving.
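
As an illustration, the peak parameter memory of a layer-wise division is bounded by the size of the largest layer rather than the sum over all layers. The following minimal C sketch, with a hypothetical per-layer descriptor, computes the buffer size that the secure world must reserve under this division; it is not taken from the paper's implementation.

```c
#include <stddef.h>

/* Hypothetical per-layer descriptor: number of parameters in the layer. */
struct layer_info {
    size_t n_params;        /* number of weights and biases in this layer */
};

/* With layer-wise division, the secure world only needs a buffer large
 * enough for the biggest layer, not for the whole model. */
static size_t peak_param_bytes(const struct layer_info *layers, size_t n_layers)
{
    size_t max_params = 0;
    for (size_t i = 0; i < n_layers; i++) {
        if (layers[i].n_params > max_params)
            max_params = layers[i].n_params;
    }
    return max_params * sizeof(float);   /* assuming 32-bit parameters */
}
```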

Fig. 3. Comparison of execution flow in the proposed method and the existing works

Low Runtime Overhead. A low runtime overhead is achieved by reducing the number of execution environment switches through the use of the shared memory of the TEE. The shared memory is a memory area accessible from both the secure and normal worlds, allowing data to be communicated quickly and efficiently. Figure 3 shows a comparison of the execution flow in the proposed method and the existing works [4, 6, 10]. In this figure, the bold arrows indicate the execution flow, and the gray squares indicate loading the parameters into memory. Executing all layers in the secure world [4] can run out of memory because a large number of parameters must be loaded into the secure world. Executing layer by layer in the secure world [6, 10] saves memory; however, it increases the runtime because of the switching overhead between the normal and secure worlds at every layer. The proposed method saves memory by dividing the loaded parameters per layer, and it reduces the number of execution environment switches by using the shared memory to load the divided parameters.
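
For reference, the following sketch shows how a normal-world client could place the encrypted parameters in shared memory and trigger the whole prediction with a single world switch, using the standard GlobalPlatform TEE Client API available in OP-TEE. The command identifier and the error handling are hypothetical placeholders, not the paper's actual interface.

```c
#include <tee_client_api.h>
#include <string.h>

#define TA_CMD_PREDICT  0   /* hypothetical command ID of the secure-world TA */

/* enc_params: encrypted model parameters read from normal-world storage */
TEEC_Result run_prediction(TEEC_Context *ctx, TEEC_Session *sess,
                           const void *enc_params, size_t enc_len)
{
    TEEC_SharedMemory shm = { 0 };
    TEEC_Operation op = { 0 };
    TEEC_Result res;

    /* Allocate a buffer accessible from both worlds and copy the
     * encrypted parameters into it. */
    shm.size  = enc_len;
    shm.flags = TEEC_MEM_INPUT | TEEC_MEM_OUTPUT;
    res = TEEC_AllocateSharedMemory(ctx, &shm);
    if (res != TEEC_SUCCESS)
        return res;
    memcpy(shm.buffer, enc_params, enc_len);

    /* Pass only a reference to the shared buffer; the secure world then
     * decrypts and consumes the parameters division by division without
     * further world switches. */
    op.paramTypes = TEEC_PARAM_TYPES(TEEC_MEMREF_WHOLE, TEEC_NONE,
                                     TEEC_NONE, TEEC_NONE);
    op.params[0].memref.parent = &shm;

    res = TEEC_InvokeCommand(sess, TA_CMD_PREDICT, &op, NULL);
    TEEC_ReleaseSharedMemory(&shm);
    return res;
}
```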

Detection of Parameter Manipulation. The detection of parameter manipulation is achieved by using GCM. The parameters are encrypted, and an authentication tag is produced so that manipulation can be detected. To execute the prediction, the parameters are decrypted, and parameter manipulation is then detected by verifying the authentication tag. The proposed method encrypts all parameters with GCM in advance. During prediction, the parameters of each division unit can be obtained by stopping and resuming the decryption at the corresponding counter value, since GCM encrypts in counter mode. The decrypted parameters of a division unit can be deleted after the corresponding divided prediction step, freeing memory for the next step. This reduces the runtime overhead of switching to the secure world because only the first pointer to the parameters stored in the shared memory is sent to the secure world. Note that the parameters must be padded and encrypted according to the block size of the cipher used in GCM and the division unit, so that the encrypted parameters can be decrypted per division unit.
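
To make the per-division decryption and tag check concrete, the following sketch uses the streaming GCM interface of mbed TLS (the library used in Sect. 4). It assumes the mbed TLS 2.x function signatures and that each division unit is padded to a multiple of the 16-byte AES block; it is only an illustration, not the paper's implementation.

```c
#include <mbedtls/gcm.h>
#include <string.h>

/* Decrypt the encrypted parameters division by division and verify the
 * authentication tag at the end. unit_len[i] is the padded byte length
 * (a multiple of 16) of division unit i. Returns 0 on success, -1 if the
 * tag does not match, i.e., the parameters were manipulated. */
int decrypt_and_verify(const unsigned char key[16],
                       const unsigned char iv[12],
                       const unsigned char *ciphertext,
                       const size_t *unit_len, size_t n_units,
                       const unsigned char auth_tag[16],
                       unsigned char *plaintext)
{
    mbedtls_gcm_context gcm;
    unsigned char tag[16];
    size_t off = 0;
    int ret = -1;

    mbedtls_gcm_init(&gcm);
    if (mbedtls_gcm_setkey(&gcm, MBEDTLS_CIPHER_ID_AES, key, 128) != 0)
        goto out;
    if (mbedtls_gcm_starts(&gcm, MBEDTLS_GCM_DECRYPT, iv, 12, NULL, 0) != 0)
        goto out;

    for (size_t i = 0; i < n_units; i++) {
        /* Decrypt one division unit; the counter state is kept inside the
         * GCM context, so decryption simply resumes where it stopped. */
        if (mbedtls_gcm_update(&gcm, unit_len[i],
                               ciphertext + off, plaintext + off) != 0)
            goto out;
        /* ... the divided prediction step would run here, after which the
         *     plaintext of this unit could be wiped to free memory ... */
        off += unit_len[i];
    }

    if (mbedtls_gcm_finish(&gcm, tag, sizeof(tag)) != 0)
        goto out;
    /* Compare the recomputed tag with the tag stored alongside the model
     * (a constant-time comparison should be used in practice). */
    ret = (memcmp(tag, auth_tag, sizeof(tag)) == 0) ? 0 : -1;
out:
    mbedtls_gcm_free(&gcm);
    return ret;
}
```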

Fig. 4. Software architecture of the proposed method

3.4 Software Architecture

Figure 4 shows the software architecture of the proposed method. First, an encrypted model configuration is transferred to the secure world, where it is decrypted and set up. The model configuration includes the model architecture, such as the structure of the trained model and the parameter sizes; for example, it corresponds to a cfg file of the Darknet framework [14]. Next, the pointers to the input data, the parameters, and the authentication tag in the shared memory are transferred to the secure world, and the prediction is started. The proposed method executes GCM and prediction for each division unit as concurrent computations. The parameters of the division unit are decrypted, and a part of the authentication tag is computed. The divided prediction is executed with the decrypted parameters. After the used parameters are removed from the memory in the secure world, the operations for the next division unit are executed in the same way. When the output layer is executed, the computation of the authentication tag, which is performed incrementally for each division unit, is completed, so parameter manipulation can be verified. Finally, the output of the prediction result to the normal world is controlled according to the verification result of the authentication tag. Depending on the use case, the proposed method can also transfer encrypted input data to the secure world, decrypt it, and execute prediction there, or encrypt the prediction result in the secure world before transferring it to the normal world.
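
On the secure-world side, the pointers arrive as memory references of a trusted application command. The following sketch uses the standard GlobalPlatform TEE Internal Core API entry point available in OP-TEE; the command ID and the predict_model() helper are hypothetical placeholders for the flow described above.

```c
#include <tee_internal_api.h>

#define TA_CMD_PREDICT  0   /* hypothetical command ID, must match the client */

/* Hypothetical helper implementing the divided GCM decryption and
 * prediction described in this section. */
TEE_Result predict_model(void *enc_params, size_t enc_len);

TEE_Result TA_InvokeCommandEntryPoint(void *sess_ctx, uint32_t cmd_id,
                                      uint32_t param_types,
                                      TEE_Param params[4])
{
    (void)sess_ctx;

    if (cmd_id != TA_CMD_PREDICT)
        return TEE_ERROR_NOT_SUPPORTED;
    if (TEE_PARAM_TYPE_GET(param_types, 0) != TEE_PARAM_TYPE_MEMREF_INOUT)
        return TEE_ERROR_BAD_PARAMETERS;

    /* The shared buffer holds the encrypted parameters (and, depending on
     * the use case, the input data and authentication tag). Only this
     * reference crosses the world boundary. */
    return predict_model(params[0].memref.buffer, params[0].memref.size);
}
```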

Fig. 5. Concurrent computing of prediction with deep learning and GCM decryption

Figure 5 shows the concurrent computing of prediction with deep learning and GCM decryption. The encrypted parameters (Ciphertext) are decrypted by GCM in the secure world, based on the counter value (Counter) of each division unit. The decrypted parameters (Plaintext) are fed to the deep learning operation (DNN), and a part of the prediction is executed. Moreover, a part of the authentication tag (Tag) is computed. These operations are executed for each division unit as concurrent computations. When the last parameters (Ciphertext n) are decrypted, the computation of Tag is completed. Tag is compared with the authentication tag associated with the ciphertext (Auth tag) to detect parameter manipulation. The prediction result (Output data) is transferred to the normal world only after the verification with the authentication tag. If parameter manipulation is detected, the prediction result is not transferred to the normal world; instead, a message indicating the detection of parameter manipulation is output.

Table 1. Trained model information in the experiments

4 Experiments and Evaluation

This section explains the experiments and evaluation of the proposed method. The method is evaluated in terms of memory saving and runtime overhead.

4.1 Experimental Setup

The experiments use a Raspberry Pi 3 Model B with Arm TrustZone (for the Cortex-A architecture) and OP-TEE [13] to implement the proposed method. In the experimental setup, the available memory size of the secure world is 7 MB for a deep learning application. In practice, the memory left for the loaded parameters alone is less than that.

The experimental results are evaluated using three trained models. The models are implemented in C using the Darknet framework [14]. Table 1 summarizes the trained model information in the experiments. The three models are the LeNet model [15] for recognizing handwritten digit images in the MNIST dataset [16], the VGG-7 model [17] for classifying object images in the Cifar10 dataset [18], and the Darknet (tiny) model [14] for classifying object images in the ImageNet dataset [19]. The parameter sizes of the models are 191.1 kbytes, 274.5 kbytes, and 3.1 MB, respectively.

The experimental results are compared across four execution methods based on the existing works. Figure 6 shows the execution methods compared in the experiments. The four methods are: execution of all layers in the normal world (i.e., no countermeasure), execution of all layers in the secure world [4], layer-by-layer execution in the secure world [6, 10], and the proposed method.

The experiments evaluate the proposed method in terms of the maximum size of loaded parameters, which relates to memory saving, and the runtime, which relates to the overhead. In the three protected methods ((b), (c), and (d) in Fig. 6), the loaded parameters are encrypted with AES-128-GCM of mbed TLS [20]; in other words, these methods include the decryption of the parameters. The layer-by-layer execution in the secure world loads the parameters layer by layer, and the division unit of the proposed method is also set to one layer in the experiments. The runtime is measured separately for model setup and prediction.

Fig. 6. Compared execution methods in the experiments

Table 2. Maximum size of loaded parameters and runtime among the methods

4.2 Experimental Results

Table 2 shows the experimental results for the maximum size of loaded parameters and the runtime of the four execution methods using the three trained models. For the execution of all layers in the normal or secure world, the size of the loaded parameters is equal to the parameter size of the trained model. The proposed method is compared with the other methods in Fig. 7 (maximum size of loaded parameters) and Fig. 8 (runtime). Note that each reported runtime is the average over 10 runs.

Fig. 7. Comparison of maximum size of loaded parameters among the methods

Fig. 8. Comparison of runtime among the methods

Maximum Size of Loaded Parameters. According to Fig. 7, the proposed method yields the same results as the layer-by-layer execution in the secure world. This is because the proposed method divides the loaded parameters layer by layer, as in the existing works [6, 10]. Therefore, the maximum size of loaded parameters equals the parameter size of the largest layer in a trained model. Compared with the execution of all layers in the normal world, the proposed method reduces the size of loaded parameters to about one-fifth to one-half, depending on the model, because the parameter size per layer differs among models. The Darknet model cannot be implemented by executing all layers in the secure world because of the lack of memory in the secure world: its parameter size is 3.1 MB against the 7 MB of available secure-world memory, and the memory left for the loaded parameters alone is less than 7 MB.

Runtime. According to Fig. 8, the runtime of the proposed method is about one-tenth of the runtime of the layer-by-layer execution in the secure world [6, 10]. Compared with the runtime of executing all layers in the normal world, the proposed method has a runtime overhead of about 2 to 10 times, depending on the model. Note that the layer-by-layer execution and the proposed method have a runtime disadvantage because they must load the parameters at every prediction, whereas the execution of all layers in the normal or secure world can predict iteratively after loading the parameters only once.

4.3 Evaluation

The experimental results show that the proposed method reduces the size of the loaded parameters and also reduces the runtime overhead compared with the layer-by-layer execution in the secure world [6, 10]. The results for the three models show that the proposed method involves a trade-off between the loaded parameter size and the runtime, depending on the division unit. It is therefore effective to set the division unit with this trade-off in mind; for example, the division unit can be set to a group of multiple layers, as in the sketch below, depending on the upper limit of available memory.
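
As an illustration of how the division unit could be chosen under a memory cap, the following sketch greedily groups consecutive layers so that each group's parameters fit into a given budget. The layer_bytes() helper and the budget value are hypothetical and not part of the paper's implementation.

```c
#include <stddef.h>

/* Hypothetical: padded parameter size (in bytes) of layer i. */
size_t layer_bytes(size_t i);

/* Greedily group consecutive layers into division units whose total
 * parameter size stays within `budget` bytes; group[i] receives the unit
 * index of layer i. A single layer larger than the budget forms a unit on
 * its own (and would need finer, sub-layer division). Returns the number
 * of division units. */
size_t group_layers(size_t n_layers, size_t budget, size_t *group)
{
    size_t unit = 0, used = 0;

    if (n_layers == 0)
        return 0;
    for (size_t i = 0; i < n_layers; i++) {
        size_t b = layer_bytes(i);
        if (used > 0 && used + b > budget) {  /* start a new division unit */
            unit++;
            used = 0;
        }
        group[i] = unit;
        used += b;
    }
    return unit + 1;
}
```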

5 Discussion

This section discusses the security technologies needed to build and support the proposed method as a model protection system: a trusted computing base, secure computation with hardware accelerators, and side-channel resistance.

5.1 Trusted Computing Base

It is necessary to guarantee the reliability of the platform to which the proposed method is applied, including the TEE and OS, and of the applications running on it. The proposed method protects the trained model information and the execution of deep learning operations; however, protecting the platform and applications stored in storage is beyond its scope. Therefore, the reliability of the platform and applications must be verified by secure boot, trusted boot, or remote attestation. These verifications require security hardware, such as secure memory, as the root of trust (RoT). The proposed method also requires the RoT to protect the key used for GCM.

5.2 Secure Computations with Hardware Accelerators

To accelerate deep learning applications, hardware accelerators such as GPUs, FPGAs, or ASICs are used. This paper assumes that the deep learning computations of the proposed method are executed on a CPU. If the proposed method offloads the deep learning computations to a hardware accelerator, the coordination between the TEE and the accelerator must be considered. For example, the proposed method needs to control access from the normal-world OS to the hardware accelerator when the deep learning computations are executed with the accelerator from the secure OS. On the NVIDIA Jetson embedded development board, the Google Trusty TEE [21] can be implemented as a TEE; however, there are no reports of secure deep learning computations combining this TEE with GPU acceleration [22].

5.3 Side-Channel Resistance

Even if the trained model information is handled inside a TEE, it has been reported that the information can be revealed through side-channel leaks such as memory access patterns, execution time, power consumption, or electromagnetic radiation. Tople et al. have reported that the prediction results of a deep learning application running on Intel SGX are leaked via memory access patterns [23]. Countermeasures against side-channel leaks related to the trained model information in TEEs therefore need to be considered.

6 Related Works

This section introduces related works on trained model protection. This paper divides the related works into five groups: TEE-based schemes [4,5,6,7,8,9,10], homomorphic encryption schemes [24], multi-party computation schemes [25], watermarking schemes [26], and countermeasures against side-channel attacks [27]. This section gives an overview of each group; the TEE-based schemes are described in a separate subsection. Note that model extraction attacks [28], which extract trained model information from input/output data, are not covered in this paper because such attacks cannot steal the model information itself; rather, they try to generate an alternative model with equivalent behavior.

6.1 Model Protection Without TEEs

Homomorphic Encryption Schemes. Homomorphic encryption schemes execute deep learning computations while the input data is encrypted, and the prediction result is also encrypted [24]. Because the computation is executed on encrypted data, the computational cost is very high; these schemes are therefore considered practical mainly for servers with high-performance computing resources. They keep the input/output data secret; however, the trained model information itself, such as the parameters, is basically not protected. In addition, depending on the homomorphic encryption algorithm, there are limitations on the scale of the trained model and the types of activation functions that can be used.

Multi-party Computation Schemes. Multi-party computation schemes divide the input data and the trained model information among multiple parties (clients and servers) and let them communicate confidentially to execute deep learning computations [25]. In other words, the divided input/output data and trained model information are kept secret. These schemes have a lower computational cost than homomorphic encryption schemes; however, the communication cost is high because they require many rounds of communication among the parties. They are therefore considered practical mainly for servers, since they need multiple trusted parties and online communication among them.

Watermark Schemes. Watermarking schemes embed watermark information into a trained model to detect unauthorized use of the model [26]. These schemes are not countermeasures against leaks of the trained model information itself; rather, they are an approach to detect whether a model is used without authorization. To embed the watermark information, several methods have been proposed that train a model to produce a specific output for specific input data [26]. There are attacks against these schemes that modify a stolen model, e.g., by transfer learning or distillation, to remove the watermark, so the schemes must be evaluated for their resistance to such attacks.

Countermeasures Against Side-Channel Attacks. Countermeasures against side-channel attacks have been proposed to prevent side-channel leaks of trained model information [27]. To prevent leaks that depend on the parameters of a trained model, a countermeasure that masks the computations involving the parameters with random numbers has been proposed [27]. Since TEE-based schemes are countermeasures against software attacks within an isolated environment, the two kinds of countermeasures can be combined.

6.2 Model Protection with TEEs

Several technical reports describe the use of TEEs for model protection [29, 30]. Existing TEE-based schemes include several methods designed for cloud computing using Intel SGX [4,5,6,7]. Recently, several methods using Arm TrustZone have also been proposed for embedded devices [8,9,10]. This paper mainly focuses on embedded devices using Arm TrustZone; in Sect. 4, the proposed method is compared with the existing works related to Arm TrustZone.

Intel SGX. Ohrimenko et al. propose a method to protect the execution of machine learning computations, including deep learning, using Intel SGX for cloud computing [4]. The method decrypts encrypted input data in an isolated environment of Intel SGX, called an enclave, to generate a trained model; the trained model is output in encrypted form. The method also uses dummy memory accesses to prevent malicious software on the same cloud server from observing memory access patterns as side-channel leaks revealing the trained model information. However, it cannot be applied to large-scale trained models due to the limited memory size of an enclave (128 MB).

Tramer et al. propose Slalom, a framework in which a trusted device uses Intel SGX to verify the results of deep learning computations entrusted to an untrusted cloud server [5]. Slalom outsources the linear computations, i.e., matrix multiplications with the weights, to the untrusted cloud server and verifies the results in an enclave on the trusted device using Freivalds' algorithm. The input data sent to the untrusted cloud server and the results returned from it are kept secret with a stream cipher. Slalom can be applied to large-scale trained models; however, the trained model information, such as the parameters, is not protected.

Hanzlik et al. propose MLCapsule, which executes the computations of each layer of deep learning in an enclave using Intel SGX [6]. MLCapsule avoids the enclave memory limitation by dividing the deep learning computations layer by layer, so it can be applied to large-scale trained models. However, it incurs a runtime overhead because the execution environment is switched for every layer. In addition, the threat of differential attacks that reveal the parameters has been pointed out, because the intermediate data of each layer is held outside the enclave without encryption [7].

Schlögl et al. propose eNNclave, which executes only the computations of several layers of deep learning in an enclave and the remaining computations outside the enclave using Intel SGX [7]. eNNclave divides the deep learning computations into two parts: a public part, the computationally expensive first half of the layers, and a private part, the second half of the layers, which contains features specific to the trained model. It mainly assumes transfer learning. The public part is executed by hardware accelerators such as GPUs, while the private part is executed in the enclave. It is debatable whether disclosing the parameters of the first half of the layers causes any problems: it is not guaranteed that the first half is completely free of features specific to the trained model, and this depends on the model. In addition, the approach potentially becomes more vulnerable to other attacks such as adversarial example attacks. Model extraction attacks only need to target the latter layers, so the number of parameters to be estimated is reduced and an attacker can more easily extract trained model information close to the original.

Arm TrustZone. Bayerl et al. propose OMG, which protects the execution of deep learning computations using Arm TrustZone on embedded devices [8]. OMG utilizes the TrustZone address space controller (TZASC) to create a temporary isolated environment (SANCTUARY) in the normal world for executing deep learning computations, thereby avoiding the memory limitation of the secure world. However, the Arm Trusted Firmware and the OS need to be customized for OMG.

Mo et al. propose DarkneTZ, which executes the computations of selected layers of deep learning in the secure world [9]. DarkneTZ is similar to eNNclave; however, it selects the layers to execute in the secure world by estimating the privacy information of each layer with respect to model inversion attacks. It has the same issues as eNNclave: it is debatable whether disclosing the parameters of the layers outside the secure world causes any problems.

VanNostrand et al. propose a method that extends DarkneTZ and executes the computations of every layer of deep learning in the secure world [10]. The method is similar to MLCapsule and has the same issues: it incurs a runtime overhead because the execution environment is switched for every layer. Note that the authors have not yet evaluated the method on devices, so the specific overhead is unknown.

Proposed Method. Table 3 summarizes the comparison of the proposed method and the related works on model protection using TEEs in terms of the three problems: lack of memory, runtime overhead, and parameter manipulation. The proposed method solves these problems by dividing the task of loading the parameters and by executing GCM and prediction as concurrent computations. The execution of partial layers in the secure world [7, 9] reduces the runtime overhead because the remaining layers are executed in the normal world; however, it requires attention to threats such as model extraction attacks as well as parameter manipulation. The execution in a temporarily isolated environment [8] requires a specific OS and customized Arm Trusted Firmware.

Table 3. Comparison of the proposed method and related works using TEEs

7 Conclusion and Future Work

This paper proposes a model protection method using TEEs, mainly to protect deep learning models implemented on embedded devices. Simply porting the program of a trained model to a TEE does not work well because of the limited memory size of TEEs, the increased runtime caused by TEEs, and the threat of parameter manipulation. The proposed method solves these problems by providing memory saving, low runtime overhead, and detection of parameter manipulation. The idea is to divide the loading of parameters and to execute GCM and prediction as concurrent computations. In the experiments, the proposed method is compared with existing works using three types of trained models. The experimental results show that the proposed method reduces the memory size of the loaded parameters to about one-fifth to one-half, and reduces the runtime to about one-tenth compared with the layer-by-layer execution in the secure world [6, 10]. The proposed method is effective when the division unit is set with the trade-off between the parameter size and the runtime in mind. This paper also discusses the security technologies needed to build and support the proposed method as a model protection system, such as a trusted computing base, secure computation with hardware accelerators, and side-channel resistance, and it summarizes the related works on model protection and compares them with the proposed method.

In future work, the division unit of the proposed method will be extended to computation units within a layer. In particular, the matrix multiplication of a convolutional layer, e.g., via im2col, consumes a lot of memory, so it is necessary to save memory for these computations as well. The method will also be evaluated with other types of trained models, other authenticated encryption modes such as OCB (Offset Codebook) mode, and on other platforms. In particular, recurrent neural networks, which differ from the feed-forward neural networks considered in this paper, will be applied to the proposed method. Moreover, the use of shared memory on other platforms and cooperation with hardware accelerators will be considered.