1 Introduction

In recent years, business information technology has developed rapidly, and the amount of data in their respective business management systems is getting larger and larger, the content is more and more complex, and the types are increasing. Commercial information also responded to this situation and immediately relied on the commercial data center to successfully establish a business big data information summary and comprehensive analysis service platform [1]. However, how to effectively protect the security of information and data related to trade secrets under the current severe network security situation has gradually become a key issue affecting the healthy and rapid development of China’s business.

With the rapid and in-depth development of intelligent technologies such as the mobile Internet of Things and 5G, the development trend of the mobile Internet of Everything continues to deepen. The Internet of Things smart technology and terminal smart mobile devices are increasingly penetrating into the daily lives of modern people. New mobile business development models such as smart mobile grids, smart cities and autonomous driving continue to emerge. The number of smart mobile device users will continue to show explosive rapid growth. According to IDC’s forecast, by 2020, at least one terminal intelligent mobile device with a capacity of up to 50 billion will be able to access the mobile Internet. The following problem is the “massive” quality data generated by industrial terminal video. Taking an industrial scene video as an example, a single video camera 1080 ps format industrial video will simultaneously produce 330g of industry at a high bit rate of 4 mbps per day. Massive data terminal communication equipment resource interconnection management mode puts forward higher technical requirements for accurate response, duration and data security performance of data resource management requests. Cloud computing’s “pay-as-you-go” interconnection model has completely freed traditional enterprises and network end users from the constraints of many key details in resource ownership and secure management of large amounts of data, such as saving storage resources, computing time constraints and reducing network communication occupation cost. However, in many industrial sites and other high-latency-sensitive application environments, when millions of traditional smart home devices begin to request smart services, the current smart cloud computing infrastructure is difficult to fully meet the requirements of traditional smart home devices for high-speed mobile networks. There is a great demand for support, location information awareness and low latency. As a result, a new model of data processing on the edge side of the object has been spawned, that is, the use of edge computing, and thus has received widespread attention in the information academia and information industry [2, 3]. Edge data computing reduces the load of cloud-based network computing systems by directly empowering massive intelligent computing devices located on the edge side to perform data calculations and network data processing calculations in combination with current intelligent cloud computing and centralized network data processing system models. It alleviates the transmission pressure on network data bandwidth and improves the data collection and processing transmission efficiency for massive intelligent devices [4].

Pasupuleti and other companies have proposed a public key outsourcing cloud platform data privacy security protection solution (esppa) for large mobile terminal devices. The solution mainly uses high-probability public key outsourcing encryption technology (ppke) and Baidu keyword outsourcing ranking. Search algorithm (rks), an outsourced ranking data query, can implement data privacy security protection on large mobile device terminals with limited data resource traffic [5]. First, mobile users need to generate an index of file data and encrypt the file data and other index files before uploading. Second, in order to access the department’s ciphertext data stored in the password in the cloud in real time, the user generates a trapdoor password for each keyword password and publishes it to the user cloud. In the end, the cloud computing server traps according to the search results of the user and sorts all users to return the data that are stored based on the correlation of the data and the sorted results, and then automatically decrypts the original number obtained by the user [6]. Bahrami et al. proposed a lightweight cloud encryption storage method for us in the current environment of mobile computing, multi-cloud and cloud computing, which was originally used to store the encrypted data in the cloud [7]. This encryption method mainly uses a pseudo-random encryption permutation (prpm) algorithm based on the chaotic encryption system to directly implement lightweight replacement encryption. The replacement encryption operation is usually performed directly on a mobile communication device, rather than in the entire cloud.

To effectively protect the privacy of user databases [8, 9], this article first introduces the environmental background of big data and the security requirements of trade secret protection and then discusses the security management of data in a big data environment and the technical protection platform. Based on the edge computing, this paper proposes a method for protecting confidential information of commercial information physical systems. Finally, experiments are performed based on distance to determine abnormal data. The identification of abnormal data is basically accurate, and the task of eliminating abnormal data can be completed [10]. Through data acquisition experiments in human behavior recognition scenarios, it was found that recognition errors mainly occurred in identifying non-anomalous data as abnormal data, and monitoring business information from identifying the abnormal data can provide a good protection for commercial information physical system method.

2 The methods of classroom research

2.1 Edge computing architecture

The “edge” in marginalized cloud computing networks is actually a relatively narrow concept. It refers to any computing network resource and other network computing resources transmitted from one data source to the entire cloud computing data center. Edge terminal computing capabilities allow terminal devices to simultaneously migrate data storage and application computing capabilities tasks to multiple network edge terminal nodes, such as wired base stations (bs), wireless network access points (wap) and edge terminal servers. It meets the requirements of higher computing processing capacity and scalability of network terminal equipment, and at the same time, it can effectively save computing power tasks to complete the data transmission between the network cloud terminal server and the network terminal equipment and the link in the link. The basic architecture of edge core computing is mainly divided into four main functional architecture levels: edge core computing infrastructure, edge computing data center, edge computing network and terminal mobile application terminal [11].

2.2 Data security and privacy protection

The main content of data security and privacy protection refers to the security of outsourced business data, including the security and confidentiality of external data and secure data sharing, integrity data auditing and secure and reliable search data encryption [12, 13]. Data sharing confidentiality and secure data sharing usually need to be implemented using data encryption technology. Usually, the data are encrypted first and then sent back online. The traditional data encryption processing methods mainly include symmetric password encryption and other asymmetric. The password is encrypted, but the encryption process of the user’s subsequent shared data will also provide some difficulties. Currently, the widely used file attribute-based information encryption technology is a threshold access policy based on information attributes. When a user needs to possess certain attribute-based information, it can access and decrypt these information attributes. This kind of access threshold from an original monotonous hierarchical access threshold tree gradually evolved to form a multi-level access feature. The file attribute ciphertext encryption solution based on the file ciphertext protection strategy is widely used in enterprise cloud computing and database storage and information sharing security. The repeated encryption key algorithm of the data proxy is usually a semi-trusted data agent. It can directly convert the ciphertext into the proxy ciphertext for multiple data proxy users at the same time through the repeated encryption key. Encrypted plain text cannot be obtained simultaneously between data agents. Message data proxy reconstruction and encryption are widely used in cloud security system applications for large data message forwarding, file data distribution and multi-user data sharing [14]. The problem that complete information audit needs to solve is how to completely determine the information integrity and data availability of audit data, and the audit functions include dynamic management audit, batch management audit, privacy protection audit and low work complexity. Programmable search data encryption technology refers to the effective search and query of encrypted edge data information in the mobile phone cloud. One of the main problems currently solved for cloud edge data calculation is that the complexity of its processing algorithm is too high. During the calculation, a large amount of calculation power is generated. At present, the support of elastic search code encryption is mainly divided into security and ranking support, elastic search code encryption based on agent attribute, elastic search code encryption supporting agent dynamic code update and agent importance encryption supporting elastic search encryption.

Privacy security protection can be subdivided into user data privacy security protection, location information privacy security protection and user identity privacy security protection. The main applications are concentrated in cloud computing for mobile phones and other cloud computing applications. The protection of location data privacy information is mainly to ensure that host users can flexibly use the existing data while ensuring the security of the data. The location information privacy data protection scheme is based on the anonymous location server. The data use the k-server’s anonymous data algorithm, but because the k-server’s anonymous data algorithm usually consumes a lot of data bandwidth in practice, it is particularly important to design a lightweight and efficient location privacy data protection solution [15]. The protection of identity privacy right now is only explored in various mobile and cloud computing application environments. The edge computing data security and privacy protection research system diagram is shown in Fig. 1.

Fig. 1
figure 1

Research system chart of computing data security and privacy protection

2.3 Authentication

Because the current edge information calculation process is a multi-function entity’s real-time calculation processing paradigm, which itself contains many functional domain-based entities, the functional entities between different trust domains in edge management calculations currently need to switch to each other in real time. Verification, including internal identity information authentication technology in a single domain, is used to mutually resolve the internal identity information allocation authentication problem of each functional entity, and the entity has certain self-privacy information protection features; cross-domain identity information authentication technology is used for mutual authentication. The value of identity authentication between entities in different trust domains is extreme, and the current technology is in the initial stage of development; real-time switching identity authentication technology is used for various mobile terminals and handover devices with high-precision mobile performance and other characteristics in current edge management computing. A tailor-made real-time handover management technology based on user internal identity information authentication ensures the real-time handover accuracy of the device, but switching authentication is important for the privacy protection of user IDs during the handover task.

There are several types of identity authentication:

  1. 1.

    Identity authentication in a single domain

The security identity information authentication in a single authorized trust domain is mainly used to solve the security identity information distribution security problem of each network entity. First, the respective entities must first pass the security identity authentication of a single authorized data center to be able to directly access the information store and then data computing.

  1. 2.

    Cross-domain authentication

At present, the theoretical research on communication authentication mechanisms applicable to different communication domain authentication entities is still in the initial stage of theory, and a relatively complete theoretical context of authentication research and specific theoretical research methods have not been basically formed. In the research of cross domain identity authentication management based on cloud computing, the multi cloud identity management and authentication between different cloud computing service providers can be regarded as or a new cross domain identity authentication management form, which can make some authentication management machines suitable for single multi cloud authentication management standards and multi cloud single sign on (SSO). It is hoped that the system can be widely applied to multi cloud identity management authentication between multiple trusted domains [16].

  1. 3.

    Switch authentication

Due to the high mobility of mobile terminal devices in mobile edge network computing, the geographic location of mobile users often changes, making traditional mobile centralized user identity information authentication network protocols no longer applicable in such situations. Authentication switching identity authentication is an application authentication handover application technology to effectively solve the user’s personal identity information authentication of high-edge mobile performance. Therefore, in-depth research on authentication switching identity authentication handover technology can not only provide a strong technical guarantee for real-time accurate information authentication handover of users using edge mobile devices in edge mobile computing, but also solve the user’s personal identity in the practice of applying authentication handover technology. The issue of authentication privacy protection has also been the focus of academic research [17].

Because edge computing is an open dynamic system in which multiple entities and multiple trust domains coexist, the identity authentication protocol must consider the correspondence between entities and trust domains. Specific research contents include cross-domain authentication and switch authentication of the same entity between different trust domains; identity authentication and mutual authentication of different entities in the same trust domain; and finally, while achieving lightweight identity authentication, taking into account anonymity and integrity. Features such as reliability, traceability and batch certification are also important research points.

2.4 Access control

Remembering the real-time control of the resource access process based on predetermined models and policies is the key technology and important method to ensure system security and protect user privacy in edge computing. Based on the system architecture of edge computing, multiple entities in different trust domains put forward attribute-based access control and role-based access control. Access control is divided into the following types:

  1. 1.

    Attribute-based access control

Because edge computing is a data-dominated computing model, access control for edge computing is usually implemented using cryptographic techniques. Traditional cryptographic techniques are not suitable for distributed parallel computing environments, and attribute encryption (ABE) works well. It is suitable for distributed architectures to achieve fine-grained data sharing and access control [18].

  1. 2.

    Role-based access control

Role-based access control provides flexible control and management through a dual permission mapping mechanism, that is, user-to-role and role-to-data object permission mapping.

2.5 Data preprocessing module

According to the design of the data preprocessing module for edge computing, the data type and local computing power are considered. After selection, this paper uses a distance-based method to determine abnormal data. For the processing of abnormal data, this article will completely delete the found abnormal data and then fill it with the missing data.

Considering the characteristics of the timeliness of environmental data and the limitations of processor performance, this paper will use the nearest neighbor padding of hot card padding when performing data padding on edge devices, combined with Lagrange interpolation. The essence of Lagrangian interpolation is to construct an interpolation function that is simple and has sufficient accuracy based on known node data or data from some known points on the line graph and use this interpolation function to quickly obtain the location data; this method of using several nodes to construct the interpolation function is called Lagrange interpolation [19].

The mathematical expression is that the number of interpolation nodes on the original function f(x) is n + 1, which are (x0, y0), (x1, y1), (x2, y2),…, (xn, yn), substituted into the following formula:

$$y = a_{0} + a_{1} x + a_{2} x^{2} + a_{3} x^{3} + \cdots + a_{n - 1} x^{n - 1}$$
(1)

The expression of the simultaneous available Lagrange interpolation polynomial is:

$$y = p(x) = \sum\limits_{i = 0}^{n} {y_{i} } \prod\limits_{i = 0,i \ne j}^{n} {\frac{{x - x_{i} }}{{x_{i} - x_{j} }}}$$
(2)

After collecting the user’s face information, the authority distribution authority will send the user’s authority vector Li to the edge computing node; Li is an n-dimensional zero-bit string representing the user’s authority information for the building where the n edge computing nodes are:

$$L_{i} [j] = 1\quad (1 \le j \le n)$$
(3)

On behalf of the user has the right to enter the building where the jth edge computing device is located, otherwise if:

$$L_{i} [j] = 0\quad (i \le j \le n)$$
(4)

it means that the user does not have the permission.

After calculating the nearest points from the missing data by the above formula and substituting them into the above expression, a corresponding polynomial can be obtained. Then, the point X corresponding to the exact function value is substituted into the interpolation polynomial to obtain the approximate value of the missing value.

For the extraction of human behavior recognition feature vectors, the edge computing device uses the human behavior feature extraction SDK provided by the cloud service provider to perform feature extraction on the user’s human behavior picture and extracts a 160-dimensional feature vector from the face picture, which is recorded as:

$$f_{i} = \left( {f_{i,1} ,f_{i,2} , \ldots ,f_{i,160} } \right)^{\text{T}}$$
(5)

Make:

$$\hat{f}_{ia} [j] = \hat{f}_{ib} [j] = \hat{f}_{i} [j];\quad S[j] = 1$$
(6)

Calculate the final encryption result:

$$\left( {M_{1}^{T} \;\;\hat{f}_{ia} ,M_{2}^{T} \;\;\hat{f}_{ib} } \right)$$
(7)

First t edge computing devices (1 ≤ j ≤ t) calculate:

$$U_{j}^{i} [k] = Z_{ij} [k]\prod\limits_{l = 1,l \ne j}^{t} {\frac{{ - x_{l} }}{{x_{j} - x_{l} }}} (\bmod \;p)$$
(8)

The transformation of the feature vector rewrites the feature vector as:

$$\hat{f}_{q} = \left( {\frac{{f_{q,1} }}{{\left\| {f_{q} } \right\|}},\frac{{f_{q,2} }}{{\left\| {f_{q} } \right\|}}, \ldots ,\frac{{f_{q,160} }}{{\left\| {f_{q} } \right\|}}} \right)$$
(9)

2.6 Data acquisition for human behavior recognition scenarios

Human behavior recognition is a technology that collects specific information about human behavior and then analyzes and judges human behavior status [20]. Based on this definition, the workflow of human behavior recognition is to obtain data information of human behavior through various channels, analyze the original data, extract features that have the ability to distinguish between different behavior types, model the sample data and finally realize the classification and recognition of human behavior [21].

However, in traditional human behavior recognition systems, the analyzed data are generally existing data that have been collected, so they do not include a data acquisition module for real-time data transfer. On the one hand, this architecture is not conducive to correcting the model after the model is completed due to the lack of real-time and effective feedback information and universality [22]. When edge devices are introduced, on the one hand, edge devices and their derivatives can serve as data collection modules to avoid the problems mentioned above. At the same time, the edge device can be used to test the new model that has been established. By comparing with the data collected by itself, it can timely feedback the prediction results to modify the existing model, so as to build a behavior recognition model for the current individual and improve the accuracy of model prediction degrees [3, 23].

Therefore, the solution determined in this article is to use the relevant sensors connected to the edge device to complete the real-time collection of data required for human body recognition and then preprocess the data accordingly and send the data to the cloud. The cloud computing center uses related machine learning algorithms to complete the modeling work and sends the model back to the edge device. The edge device uses the trained model to predict the pose behavior of the current surveillance object. When the model prediction result does not match the current status, it can also notify the cloud computing center of the data and the correct result to modify the relevant model [24].

3 The process

3.1 Experimental settings

Because the collected human behavior data are physically accelerated and angular velocity transmitted by the data pre-processing module, such stream data have no specific physical meaning when analyzed independently, so model training cannot be performed. Therefore, in order to obtain training samples suitable for machine learning models, the stream data need to be converted into sub-sequences divided by a certain size before the formatted data are analyzed.

The algorithms under this module are implemented using Python programming. The experimental test runs are divided into two parts: the cloud computing center and the edge device. The specific configurations are as follows:

Cloud computing center: Intel (R) Core (TM) i5-4570; CPU dual core@3.20 GHz; 4.00 GB memory; Windows 7 64-bit operating system.

Edge device: BCM2863; ARM Cortex-A7 CPU; 1 GB memory; RASPBIAN JESSIE WITH PIXEL operating system.

In the current scenario, the system’s training for human behavior recognition is done using data stored in the relevant database on the UCI database. The data collection method in this database is similar to the method described earlier in this chapter. All three IMU devices are bound to the subject’s chest, arms and ankles to collect three-dimensional data of the human body. In addition, the database also contains a record of the real-time heartbeat frequency of the detected object. The collection frequency of all data is 100 Hz.

3.2 Experimental data and parameters

During the monitoring period, the behavior of the monitored objects was subdivided into 19 items by the UCI database. However, due to the needs of the test in this experiment, these behaviors are summarized into four main behavioral characteristics: sitting, walking, running and standing. Behaviors that are not monitored temporarily such as cycling and skipping will not be added to training among the data.

After sorting the data according to the above requirements, a total of 34,091 experimental data were obtained, which included relevant data for a total of 8 men and women monitored. As mentioned above, there are a total of 20 variables in this data set as input to the model.

After processing the processed UCI data set, the parameters obtained are shown in Table 1.

Table 1 Comparison of algorithms in UCI data sets

Because Xgboost comes with cross-validation, we can call the corresponding library function to complete the test of the model. We perform a 50-fold cross-validation method on the existing data, as shown in Table 2.

Table 2 Fold cross-validation method data

4 Interpretation of result

4.1 Xgboost model

  1. 1.

    The parameter interface is provided by Xgboost developers under python. Users can modify the relevant parameters of the model according to their own needs, which has achieved the ideal model training effect.

An important parameter that needs to be adjusted in a decision tree is the height of the decision tree. Due to the characteristics of Xgboost’s algorithm, the shape of the basic classifier is required to generate the model, that is, the number of leaves and the vertical depth of the decision tree play a very important role in the process of model building. Therefore, when doing model training, we need to find a suitable value to ensure that the final classifier can meet the needs of users on the one hand, and it will not affect its applicability on other related data.

Therefore, in this experiment, we need to control the maximum depth of the final decision tree by modifying the max_depth parameter under the xgb.train () function. The obtained parameters are shown in Table 3.

Table 3 Xgboost model parameters

For the current model, because the classification result of the data set is four categories, the soft function multi-classifier is used as the loss function. Generally speaking, when the learning rate is small, the generalization ability of the model will be higher than that without the learning rate. The model improves a lot. Generally, the learning rate is inversely proportional to the number of iterations. The lower the learning rate, the more iterations are required. In this experiment, in order to make each learning more accurate, the learning step is set to 0.15, during the iterative process. The deeper the maximum depth of each tree, the more features are learned, but it will also reduce the generalization ability of the model. Therefore, the depth is set to 8; the multi-class error rate is used as the data measurement method. The data graph is shown in Fig. 2.

Fig. 2
figure 2

Xgboost model parameters

  1. 2.

    Cross-validation: The per-judgment accuracy of the existing model for the training data is about 98.3%, and the accuracy will continue to increase as the cycle deepens [25,26,27]. However, because the decision tree itself is a greedy algorithm, theoretically, if the number of iterations is infinite, the accuracy of the prediction will reach 100%. However, in this case, over-fitting generally occurs, so you need to adjust the values of the two parameters min_child_weight and max_depth in Xgboost to prevent over-fitting. By setting the parameter mun_round that controls the number of iterations, the accuracy of classification can also be further improved in the model obtained after the above training, as shown in Fig. 3.

    Fig. 3
    figure 3

    Fold cross-validation method data

4.2 Comparison test

  1. 1.

    Compared with other basic algorithms, Xgboost runs faster and has greater advantages in accuracy and recall. Compared to weak classifiers such as C4.5 decision trees, the advantages of integrated learning algorithms are more obvious. Since Xgboost has more regularization of self-models than other models, this type of model has stronger generalization ability. In actual work, as the amount of data continues to increase, the accuracy advantage of Xgboost will be correspondingly more obvious [28, 29].

However, it should also be pointed out that comparing the model performance during parameter optimization, it will be found that it is impossible to greatly improve the performance of the model only by adjusting the parameters and small optimization of the model. To achieve the overall qualitative leap in classification accuracy, we also need to rely on other means, such as feature engineering and model combination. For the design of the data analysis function in this edge system, in addition to completing the training of the classification model in the cloud computing center, it is also necessary to use the trained model in the edge device to implement the classification test on real-time data.

First, you need to build a software environment in the edge device that can run the classification algorithm. Because Xgboost has a corresponding optimization algorithm for memory, and the calculation amount of the algorithm itself is mainly focused on the construction of the model, the edge device selected in the current environment, the Raspberry Pi, can fully bear the human behavior recognition model based on Xgboost Testing works. The comparison test is shown in Fig. 4.

Fig. 4
figure 4

Comparison of algorithms in UCI data sets

  1. 2.

    Analysis of accuracy and maximum depth: As the depth increases, the classification effect of the Xgboost model becomes better. However, when the depth reaches 8, the increase of this effect becomes less obvious, which indicates that if the depth of the decision tree continues to be increased, the current model cannot effectively improve the classification ability, but will cause the model to over-fit the current data and over-fitting. Therefore, it is finally determined that when the value of the parameter max_depth is set to 8 under this experiment, the best classification effect can be obtained.

Another parameter that needs attention is eta. This parameter is equivalent to another parameter learning_rate under the more popular machine term library learn. This parameter can improve the robustness of the model by reducing the weight of each step, that is, reducing the weight of features to make the promotion calculation process more conservative. In actual use, it will be set between 0 and 1 according to the actual situation; usually, the range is [0.01–0.2], as shown in Fig. 5.

Fig. 5
figure 5

Relationship between accuracy and learning rate

5 Conclusions

This article mainly analyzes the information security requirements of edge computing from the perspective of identity authentication and privacy protection. In edge computing, due to different deployment options, edge computing service providers can be located in different places. In this context of open interconnection. The identity authentication and management functions are spread across all functional levels of the edge computing reference architecture. The user’s identity authentication is the first line of defense for edge computing. Identification is critical to ensuring the security and confidentiality of applications and data. In this paper, human body recognition technology in commercial information protection methods is studied through data collection and preprocessing, mainly to enable data mining. Regardless of whether you use existing models or algorithms designed by yourself, you need a hardware environment that can be implemented. This includes not only the basic computing power required by the algorithm, but also appropriate storage space for data and model storage and scheduling. When multiple sensors send data to the edge device, each sensor uploads its respective data to the edge device through parallel transmission. For network data, the edge device itself needs to access the target web page through the network module, download the required data and complete the data collection task. The data exchange between the edge device connected to the Internet and the cloud computing center is completed by wireless transmission, so no other device is needed for data collection.

This article is aimed at the research of confidential protection of commercial information physical systems, and encryption algorithms can be used. Existing data confidentiality and secure data sharing solutions are usually implemented using encryption technology. The conventional process is that the data owner encrypts and uploads the outsourced data in advance and decrypts the data when needed. Data encryption technology provides effective solutions to ensure data security in various computing modes. In an open edge computing environment, traditional encryption solutions can be combined with features such as parallel distributed architecture in edge computing, limited terminal resources, edge big data processing and highly dynamic environments to achieve lightweight and distributed data security protection system.

Human behavior recognition technology has been applied in various fields of people’s clothing, food and shelter. Compared with traditional password-based authentication, human behavior recognition technology has the advantages of being directly friendly, not easy to be stolen and noninvasive and has become an application in the field of identity authentication mainstream. A typical solution is to use a third-party human behavior recognition technology provider to achieve accurate face recognition. In this paper, xgboost model is used to do data preprocessing module experiments to determine the abnormal data. The identification of abnormal data is basically accurate, and it can complete the task requirements of eliminating abnormal data. It is applied to a popular business information confidentiality protection method human behavior recognition scene data acquisition. The experimental data show that the identification error mainly occurs in identifying non-anomalous data as abnormal data. By identifying the abnormal data, it can play a monitoring role in the security of the commercial information physical system, and it is also a good security protection method for commercial confidential information. The experimental data show that the comparison of data after identifying anomalies can physically monitor the security of business information and has guiding significance for the confidential information protection of business information physical systems.