1 Introduction

Over the past two decades, artificial intelligence researchers have worked to give machines the ability to recognize, interpret, and express emotions. This line of research is called sentiment analysis. Sentiment analysis has become a new trend in social media, effectively helping users understand the opinions expressed on different platforms (Stefano and Gabriele 2020). In recent years, a large number of researchers have studied sentiment through text, and text sentiment analysis has made great progress. With the rise of video social networking apps, however, people increasingly prefer video to text when expressing opinions on products or services. Consumers often record video reviews of products and upload them to social media platforms such as YouTube or Facebook to share their opinions and preferences. For many users, browsing text comments on social media is less convenient and harder to digest than watching videos, and long texts cost considerable reading time. At the same time, authentic and reliable text reviews are difficult to find on social media, whereas reliable and authentic video reviews are easier to identify. The voice data in a video conveys the speaker's tone, while the visual data conveys facial expressions, both of which help in understanding the user's emotional state.

Video data can therefore serve as a useful information source for sentiment analysis, but several major challenges must first be resolved. For example, the expression of opinions varies greatly from person to person: some people express their opinions euphemistically, some express them directly, and some rely entirely on logic. When a person modulates his voice to express an opinion, the audio data usually contains most of the opinion-bearing information; when a person expresses opinions through facial expressions, most of the data required for sentiment analysis can be obtained by analyzing those expressions. These individual differences motivate the search for a general sentiment analysis framework. At present, research on multimodal sentiment analysis remains limited, and most multimodal studies consider only visual and audio information while ignoring text. Moreover, how to effectively fuse features of different modalities is a problem that still needs to be solved. A multimodal framework can therefore handle all three sensing modes of a human-centric environment: text, audio, and video. It allows people to communicate and express their emotions through different channels, and it uses the text, audio, and visual modalities simultaneously to effectively extract the semantic and emotional information conveyed in a dialogue (Arnold et al. 2011).

The features used by traditional emotion recognition methods are hand-crafted (Bourdeau et al. 2019). With the increasing availability of large-scale data sets, deep learning has become a general-purpose machine learning method and has achieved better results on many computer vision and natural language processing tasks. Recently, 3D Convolutional Neural Networks (C3D) have made great progress on various video analysis tasks. C3D models appearance and motion information simultaneously, and C3D features fed to a linear classifier achieve good performance on different video analysis benchmarks (Mary and Arockiam 2017). In existing video-based emotion recognition tasks, few researchers have used the C3D network. Therefore, a novel multimodal fusion framework combined with a C3D network can play an important role in the development of social sentiment analysis.
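To make the C3D idea concrete, the following is a minimal sketch of a C3D-style clip classifier, assuming PyTorch is available; the layer sizes, the two-class output, and the clip shape are illustrative and do not reproduce the exact C3D architecture.

```python
import torch
import torch.nn as nn

class MiniC3D(nn.Module):
    def __init__(self, num_classes: int = 2):
        super().__init__()
        # 3D convolutions slide over (time, height, width), so spatial
        # appearance and temporal motion are modeled jointly.
        self.features = nn.Sequential(
            nn.Conv3d(3, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),
            nn.Conv3d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(kernel_size=2),
            nn.AdaptiveAvgPool3d(1),
        )
        self.classifier = nn.Linear(128, num_classes)

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (batch, channels=3, frames, height, width)
        x = self.features(clip).flatten(1)
        return self.classifier(x)

model = MiniC3D()
logits = model(torch.randn(2, 3, 16, 112, 112))  # two 16-frame RGB clips
print(logits.shape)  # torch.Size([2, 2])
```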

This study uses SVM and KNN to build models for Japanese social sentiment classification, laying the foundation for the further development of the translation system.

2 Related work

Traditional analysis of microblog text treats the topical content and the subjective sentiment of the text independently. For example, by content, Weibo posts can be divided into entertainment, society, education, sports and other categories; by subjective sentiment, they can be divided into two categories, positive and negative. A topic-sentiment mixture model combines the two, analyzing both the objective topics contained in the text and the corresponding subjective sentiments. Rabie et al. (2020) proposed an unsupervised topic-sentiment hybrid algorithm. The algorithm takes the characteristics of Chinese microblogs into account, builds a five-level sentiment lexicon, and recognizes new words through rules and statistical methods. Moreover, it samples a sentiment label for each sentence and a topic label for each word, establishes the correspondence between them, and finally extracts the topic and sentiment elements. However, when sentiment words in the text are not covered by the constructed lexicon, the algorithm's recall suffers. To make use of different annotated treebanks and sentiment lexicons at the same time, Schuelke-Leech et al. (2015) proposed a hybrid syntax processing method. Based on an undirected-graph MST parser, this method finds the dependency structure of the text through a maximum spanning tree and learns the edge weights through a training algorithm, which removes the need to redevelop the parser after fusing multiple corpora. Vimalkumar and Radhika (2017) proposed a new topic sentiment classification method. It first divides an existing sentiment dictionary into core and ordinary levels, then segments the words in Weibo text with the N-gram algorithm and computes feature values to determine the level of each sentiment word and weight it accordingly. Yu et al. (2018) studied topic sentiment in product reviews. Cui et al. (2017) proposed the ILDA model, which extends PLSI and LDA and combines several probabilistic graphical models with product review data to extract review topics and the corresponding ratings. To address the sparseness of short texts, Li et al. (2018) proposed a short-text sentiment-topic model (SSTM). Tailored to the characteristics of short text, this model represents the entire corpus as a set of word pairs, which effectively improves the accuracy of topic sentiment classification. Given that social media data is voluminous, short, and fast-spreading, Mukherjee et al. (2017) proposed a text sentiment analysis technique based on keyword analysis. The technique first extracts key sentences using three features (position, keywords, and word frequency), then defines seven types of part-of-speech collocations that strongly influence sentiment polarity together with the corresponding calculation rules, and finally computes sentiment values from the weights of all keywords. To reduce the reliance on hand-crafted rules, Nasir and El-Ferik (2017) combined machine learning methods with Weibo topic sentiment analysis.
Moreover, they used a variety of machine learning models to quantitatively evaluate the modeling effect and introduced emoticons as word features, bringing the analysis results closer to objective facts. Saafan et al. (2017) extended existing supervised topic sentiment models and proposed a topic model based on multiple labels and implicit sentiment, connecting latent topics with the emotions they induce in users.

Van (2018) implemented a visual query system for e-commerce product data. The system is mainly based on statistical charts and helps users analyze shopping preferences and tap potential value. To help students better understand abstract data structures, Xiu et al. (2017) used the JSAV (JavaScript Algorithm Visualization) library to combine visualization with tutorials, so that students can learn and practice algorithms through visual displays. Yen et al. (2017) improved the LIC algorithm for visualizing vector data such as ocean currents and typhoons and parallelized it on the GPU to achieve real-time interactive visualization of vector fields. Adomavicius and Tuzhilin (2011) improved the data prefetching method, raising the data exchange capacity during the visualization of large-scale flow-field vector lines. Parallel coordinates (Wang and Zheng 2017) is the classic visual representation method for high-dimensional geometry and multivariate data. Each attribute of high-dimensional data is represented by an axis, with data values increasing along the axis. Locating a data point in parallel coordinates is fast: one simply connects the point's values on the corresponding axes. However, classic parallel coordinates suffer from fixed axes; with large data volumes, the display exhibits severe edge overlap and crossing. Celik et al. (2006) proposed a novel radial arrangement based on the Time Wheel, which arranges the attributes around a time axis in a hexagonal layout and represents the mapping between each attribute and the time axis with lines of different colors, improving the efficiency of browsing and analyzing multidimensional data. Das et al. (2014) proposed a visual method based on flexible linked axes, in which each axis defines a related attribute and range; by modifying the parallel axes, users can flexibly define various attributes. Eisenman et al. (2009) used hierarchical clustering to build multi-resolution views of the data, constructing hierarchical cluster trees and displaying the data in parallel coordinates level by level, which scales well to large data sets, represents the data at different abstraction levels, and reduces visual clutter in parallel coordinates. Gao et al. (2009) proposed a new clustering method for parallel-coordinate visualization that replaces the straight polylines with spline curves, minimizing curvature and maximizing the parallelism of adjacent edges to optimize the splines; by adjusting the edge shapes while maintaining their relative order, both the clustering and the visual effect are improved.

3 Linear local tangent space alignment based on orthogonal discrimination

The LLTSA algorithm is a linearization of LTSA. It yields an explicit mapping, so that when new data points arrive, dimensionality reduction can be performed on them quickly. The specific steps of the LLTSA algorithm are as follows:

We select n data points in the high-dimensional space \(R^{D}\), arranged as the data matrix \(X = \left\{ {x_{1} ,x_{2} , \ldots ,x_{n} } \right\}\), \(x_{i} \in R^{D}\), and seek a transformation matrix A such that the projected data become \(Y = \left( {y_{1} ,y_{2} , \ldots ,y_{n} } \right)\), \(y_{i} \in R^{d}\), \(d < D\), satisfying the linear mapping \(Y = A^{T} XH_{n}\), where \(H_{n} = I - ee^{T} /n\) and e is an n-dimensional column vector whose elements are all 1. For each data point \(x_{i}\), let its k nearest neighbors be \(X_{i} = \left( {x_{i1} ,x_{i2} , \ldots ,x_{ik} } \right)\), and let \(S_{i}\) be the 0-1 selection matrix satisfying \(Y_{i} = YS_{i}\). To preserve local linearity and the internal local geometric structure as far as possible, the reconstruction error \(E_{i}\) should be minimized, which yields the objective function of LLTSA:

$$\min \sum\limits_{i} {\left\| {E_{i} } \right\|^{2} } = \min \left\| {YSW} \right\|^{2} = \min \, tr\left( {YSWW^{T} S^{T} Y^{T} } \right)$$
(1)

In the formula,

$$E_{i} = Y_{i} W_{i} = YS_{i} W_{i} ,\quad S = \left[ {S_{1} , \ldots ,S_{n} } \right],\quad W = diag\left( {W_{1} , \ldots ,W_{n} } \right),\quad W_{i} = H_{k} \left( {I - V_{i} V_{i}^{T} } \right)$$

\(V_{i}\) consists of the right singular vectors corresponding to the d largest singular values of \(X_{i} H_{k}\), where \(H_{k} = I - ee^{T} /k\) is the \(k \times k\) centering matrix. To determine Y uniquely, we set \(YY^{T} = I_{d}\), where \(I_{d}\) is the identity matrix. Considering that the mapping is \(Y = A^{T} XH_{n}\), the objective function becomes:

$$\left\{ {\begin{array}{*{20}c} {\mathop {\min }\limits_{Y} tr\left( {A^{T} XH_{n} BH_{n} X^{T} A} \right)} \\ {A^{T} XH_{n} X^{T} A = I_{d} } \\ \end{array} } \right.$$
(2)

From the above formula, we can see that the alignment matrix is:

$$B = SWW^{T} S^{T}$$
(3)
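As a concrete illustration of how the selection matrices and local weights assemble into B, here is a minimal NumPy sketch under the definitions above; including each point among its own k neighbors and using the \(k \times k\) centering matrix \(H_k\) are our reading of the construction, and the function names are illustrative.

```python
import numpy as np

def local_weights(X, k=8, d=2):
    """X: (D, n) data matrix with samples as columns.
    Yields, for each point, its neighbor indices and W_i = H_k(I - V_i V_i^T)."""
    D, n = X.shape
    Hk = np.eye(k) - np.ones((k, k)) / k           # k x k centering matrix
    out = []
    for i in range(n):
        dist = np.linalg.norm(X - X[:, [i]], axis=0)
        idx = np.argsort(dist)[:k]                 # k nearest neighbors of x_i
        Xi = X[:, idx]
        # V_i: right singular vectors of the d largest singular values of X_i H_k
        _, _, Vt = np.linalg.svd(Xi @ Hk, full_matrices=False)
        Vi = Vt[:d].T                              # k x d
        out.append((idx, Hk @ (np.eye(k) - Vi @ Vi.T)))
    return out

def alignment_matrix(X, k=8, d=2):
    """Accumulate B = sum_i S_i W_i W_i^T S_i^T by index scattering."""
    n = X.shape[1]
    B = np.zeros((n, n))
    for idx, Wi in local_weights(X, k=k, d=d):
        B[np.ix_(idx, idx)] += Wi @ Wi.T
    return B
```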

In order to obtain the optimal projection vector, we can introduce the Lagrange multiplier and then solve the following generalized eigenvalue problem:

$$XH_{n} BH_{n} X^{T} a = \lambda XH_{n} X^{T} a$$
(4)

If the eigenvectors corresponding to the eigenvalues \(\lambda_{1} \le \lambda_{2} \le \cdots \le \lambda_{d}\) are \(a_{1} ,a_{2} , \ldots ,a_{d}\), then the transformation matrix of LLTSA is:

$$A = \left[ {a_{1} ,a_{2} , \cdots ,a_{d} } \right]$$
(5)
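Under the definitions above, the generalized eigenvalue problem (4) and the selection of the d smallest eigenpairs can be carried out as in the brief sketch below, reusing `alignment_matrix` from the previous sketch; the small ridge term on the right-hand side is an assumption added for numerical stability.

```python
import numpy as np
from scipy.linalg import eigh

def lltsa(X, k=8, d=2):
    """X: (D, n). Returns the transformation matrix A (D x d) and embedding Y (d x n)."""
    D, n = X.shape
    Hn = np.eye(n) - np.ones((n, n)) / n
    B = alignment_matrix(X, k=k, d=d)
    XH = X @ Hn                                    # H_n is symmetric and idempotent
    lhs = XH @ B @ XH.T                            # X H_n B H_n X^T
    rhs = XH @ XH.T + 1e-9 * np.eye(D)             # X H_n X^T, ridge-regularized
    _, vecs = eigh(lhs, rhs)                       # eigenvalues in ascending order
    A = vecs[:, :d]                                # d smallest eigenpairs, formula (5)
    return A, A.T @ XH                             # Y = A^T X H_n
```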

Matrix A is the mapping matrix. Multiplying the data matrix by A gives the feature vector of each data point, realizing the dimensionality reduction used for subsequent data analysis. Although LLTSA yields an explicit mapping, it only preserves neighborhood relations and does not use the category information in the data to increase the spatial distance between different categories. It therefore needs to be improved.

To address this shortcoming of LLTSA, this paper applies the ODLLTSA algorithm to fault diagnosis in complex chemical processes to extract features from process data. A discriminant linear local tangent space alignment algorithm (DLLTSA) is formed by introducing the between-class scatter matrix to obtain a new objective function, and hence the mapping matrix. The mapping matrix is then orthogonalized to obtain the mapping matrix of ODLLTSA. In this way, category information is fully exploited to increase the distance between data of different classes.

We assume that the data points are divided into c classes \(\left\{ {X_{1} ,X_{2} , \ldots ,X_{c} } \right\}\), with each data point \(x_{i}\) belonging to one of them, and

$$\left\{ {\begin{array}{*{20}l} {u = \frac{1}{n}\sum\limits_{i = 1}^{n} {y_{i} } } \\ {u_{i} = \frac{1}{{m_{i} }}\sum\limits_{{x_{j} \in X_{i} }} {y_{j} } } \\ \end{array} } \right.$$
(6)

Among them, u is the mean vector of all projected data, \(u_{i}\) is the mean vector of the projected data of the i-th class, and \(m_{i}\) is the number of samples in the i-th class. If \(Z = \left( {z_{1} ,z_{2} , \ldots ,z_{n} } \right) = XH_{n}\) and \(Y = \left( {y_{1} ,y_{2} , \ldots ,y_{n} } \right) = A^{T} XH_{n}\), then \(y_{i} = A^{T} z_{i}\). The between-class scatter, expressed through the matrix \(T_{B}\) of the ODLLTSA algorithm, then satisfies:

$$\begin{gathered} tr\left( {\sum\limits_{i = 1}^{c} {m_{i} \left( {u_{i} - u} \right)\left( {u_{i} - u} \right)^{T} } } \right) = \sum\limits_{i = 1}^{c} {m_{i} \left\| {\frac{1}{{m_{i} }}\sum\limits_{{x_{k} \in X_{i} }} {A^{T} z_{k} } - \frac{1}{n}\sum\limits_{j = 1}^{n} {A^{T} z_{j} } } \right\|^{2} } \hfill \\ \quad = tr\left\{ {A^{T} \sum\limits_{i = 1}^{c} {m_{i} \left( {\frac{1}{{m_{i} }}\sum\limits_{{x_{k} \in X_{i} }} {z_{k} } - \frac{1}{n}\sum\limits_{j = 1}^{n} {z_{j} } } \right)\left( {\frac{1}{{m_{i} }}\sum\limits_{{x_{k} \in X_{i} }} {z_{k} } - \frac{1}{n}\sum\limits_{j = 1}^{n} {z_{j} } } \right)^{T} } A} \right\} = tr\left( {A^{T} T_{B} A} \right) \hfill \\ \end{gathered}$$
(7)

To better separate different classes of data, the above quantity needs to be maximized, that is:

$$\max \, tr\left\{ {A^{T} \sum\limits_{i = 1}^{c} {m_{i} \left( {\frac{1}{{m_{i} }}\sum\limits_{{x_{k} \in X_{i} }} {z_{k} } - \frac{1}{n}\sum\limits_{j = 1}^{n} {z_{j} } } \right)\left( {\frac{1}{{m_{i} }}\sum\limits_{{x_{k} \in X_{i} }} {z_{k} } - \frac{1}{n}\sum\limits_{j = 1}^{n} {z_{j} } } \right)^{T} } A} \right\} = \max \, tr\left( {A^{T} T_{B} A} \right)$$
(8)

To keep the local geometric structure unchanged while satisfying the above formula, the optimization problem of DLLTSA is:

$$\left\{ {\begin{array}{*{20}c} {\min tr\left( {A^{T} XH_{n} BH_{n} X^{T} A} \right)} \\ {\max tr\left( {A^{T} T_{B} A} \right)} \\ \end{array} } \right.$$
(9)

The objective function of the DLLTSA algorithm can be obtained from the above formula:

$$J\left( A \right) = \frac{{tr\left( {A^{T} XH_{n} BH_{n} X^{T} A} \right)}}{{tr\left( {A^{T} T_{B} A} \right)}}$$
(10)

The above formula can be transformed into the following generalized eigenvalue problem:

$$\begin{gathered} XH_{n} BH_{n} X^{T} a_{i} = \lambda_{i} T_{B} a_{i} , \hfill \\ \lambda_{1} \le \lambda_{2} \le \cdots \le \lambda_{d} \hfill \\ \end{gathered}$$
(11)

The transformation matrix of DLLTSA is obtained from the above formula:

$$A = \left( {a_{1} ,a_{2} , \ldots ,a_{d} } \right)$$
(12)

The projection vectors of A obtained above are non-orthogonal, so to obtain an orthogonal subspace, the Gram–Schmidt method is used to orthogonalize A. Let \(c_{1} = a_{1}\); then

$$c_{i} = a_{i} - \sum\limits_{j = 1}^{i - 1} {\frac{{c_{j}^{T} a_{i} }}{{c_{j}^{T} c_{j} }}c_{j} } \left( {i = 2,3, \ldots ,d} \right)$$
(13)

If \(h_{j,i} = \frac{{c_{j}^{T} a_{i} }}{{c_{j}^{T} c_{j} }}\), then \(c_{i} = a_{i} - \sum\limits_{j = 1}^{i - 1} {h_{j,i} c_{j} }\).
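Formula (13) translates directly into code; the following is a minimal NumPy transcription (the column-vector convention and the function name are implementation choices, not part of the original algorithm description).

```python
import numpy as np

def gram_schmidt(A):
    """A: (D, d) with columns a_1..a_d; returns C with orthogonal columns c_1..c_d."""
    D, d = A.shape
    C = np.zeros((D, d))
    C[:, 0] = A[:, 0]
    for i in range(1, d):
        c = A[:, i].astype(float)
        for j in range(i):
            h = (C[:, j] @ A[:, i]) / (C[:, j] @ C[:, j])  # h_{j,i} of formula (13)
            c -= h * C[:, j]                               # subtract the projection
        C[:, i] = c
    return C
```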

Let \(C = AL\), where L is an upper triangular matrix. The objective function of the ODLLTSA algorithm is then:

$$J\left( C \right) = J\left( {AL} \right) = \frac{{tr\left( {L^{T} A^{T} XH_{n} BH_{n} X^{T} AL} \right)}}{{tr\left( {L^{T} A^{T} T_{B} AL} \right)}}$$
(14)

The following results can be obtained from the above formula:

$$C = \left( {c_{1} ,c_{2} , \ldots ,c_{d} } \right)$$
(15)

The C obtained above is the mapping matrix of the ODLLTSA algorithm. The reduced-dimension matrix is obtained by multiplying the data matrix by the mapping matrix.

From the above description, the flow of the ODLLTSA algorithm can be obtained as shown in Fig. 1.

Fig. 1 ODLLTSA algorithm flow chart

The specific steps of the ODLLTSA algorithm:

Input: data \(X = \left\{ {x_{1} ,x_{2} , \ldots ,x_{n} } \right\}\), number of nearest neighbors k, and dimension d after dimensionality reduction.

Output: transformation matrix \(C = \left( {c_{1} ,c_{2} , \ldots ,c_{d} } \right)\).

1. According to the given value of k, the k nearest neighbors of each point \(x_{i} \left( {i = 1,2, \ldots ,n} \right)\) are found to obtain the neighborhood matrix \(X_{i}\).

2. \(X_{i} H_{k}\) is subjected to singular value decomposition to obtain \(V_{i}\), and the weight \(W_{i} = H_{k} \left( {I - V_{i} V_{i}^{T} } \right)\) is calculated.

3. According to formula (3), the alignment matrix B is calculated.

4. According to formula (7), the between-class scatter matrix \(T_{B}\) of the ODLLTSA algorithm is calculated.

5. According to formula (11), the generalized eigenvalue problem is solved to obtain matrix A, the non-orthogonal mapping matrix of DLLTSA.

6. According to formulas (13) and (14), matrix A is orthogonalized, and finally the mapping matrix C of ODLLTSA is obtained.

After the mapping matrix C is obtained, the data matrix is multiplied by C to obtain the feature vector of each sample, completing the dimensionality reduction; the reduced data are then fed into chemical process fault monitoring and diagnosis. ODLLTSA handles nonlinear data effectively. Reducing an m-dimensional data set to n dimensions \(\left( {m > n} \right)\) not only retains the local neighborhood relations in the data but also reduces or maintains the distances among normal-condition data while increasing, as far as possible, the distance between fault-condition and normal-condition data, thereby improving the accuracy of fault monitoring and diagnosis.
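A sketch of this pipeline, assuming scikit-learn for the distance-based KNN step; the function name, the row/column conventions, and the choice of k are illustrative rather than the paper's exact configuration.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def odlltsa_knn(C, X_train, y_train, X_test, k=5):
    """C: (D, d) ODLLTSA mapping matrix; X_*: (D, n) column-sample matrices;
    y_train: condition labels (e.g. normal vs. fault classes)."""
    Z_train = (C.T @ X_train).T      # n x d reduced features
    Z_test = (C.T @ X_test).T
    knn = KNeighborsClassifier(n_neighbors=k).fit(Z_train, y_train)
    return knn.predict(Z_test)       # predicted condition labels
```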

4 Chemical process fault monitoring based on KPCA

The kernel principal component analysis (KPCA) method is a nonlinear extension of principal component analysis. It uses a kernel function to map nonlinear data from the original space into a high-dimensional feature space where the data become linearly separable, and then performs PCA in that space. The mapping is illustrated in Figs. 2 and 3.

Fig. 2 Before mapping

Fig. 3 After mapping

The specific analysis of KPCA is as follows:

Let the number of data points be N. A positive definite kernel matrix K is then obtained from the kernel function as follows:

$$K_{ij} = k\left( {x_{i} ,x_{j} } \right),\quad i,j = 1,2, \ldots ,N$$
(16)

To ensure that the mapped data vectors in the feature space have zero mean, the data are standardized so that \(\sum\nolimits_{i = 1}^{N} {\phi \left( {x_{i} } \right)} = 0\). The covariance matrix of the feature space is then:

$$C^{H} = \frac{1}{N}\sum\limits_{i = 1}^{N} {\phi \left( {x_{i} } \right)} \phi \left( {x_{i} } \right)^{T}$$
(17)

Among them, \(\phi \left( {x_{i} } \right)\) is the mapped data point, and eigenvalue decomposition is performed on \(C^{H}\). If we assume that \(\lambda\) is an eigenvalue and \(\nu\) the corresponding eigenvector, then

$$\lambda \nu = C^{H} \nu$$
(18)

Taking the inner product of both sides of formula (18) with a mapped point \(\phi \left( {x_{k} } \right)\) gives:

$$\lambda \left( {\phi \left( {x_{k} } \right) \cdot \nu } \right) = \left( {\phi \left( {x_{k} } \right) \cdot C^{H} \nu } \right)$$
(19)

For the eigenvector \(\nu\) of any \(\lambda \ne 0\), there exist coefficients \(\alpha_{i} \left( {i = 1,2, \ldots ,N} \right)\) such that

$$\nu = \sum\limits_{i = 1}^{N} {\alpha_{i} \phi \left( {x_{i} } \right)}$$
(20)

Substituting formulas (16), (17), and (20) into formula (19), we obtain:

$$\lambda \sum\limits_{i = 1}^{N} {\alpha_{i} \left\langle {\phi \left( {x_{k} } \right),\phi \left( {x_{i} } \right)} \right\rangle } = \frac{1}{N}\sum\limits_{i = 1}^{N} {\alpha_{i} \left\langle {\phi \left( {x_{k} } \right),\sum\limits_{j = 1}^{N} {\phi \left( {x_{j} } \right)\left\langle {\phi \left( {x_{j} } \right),\phi \left( {x_{i} } \right)} \right\rangle } } \right\rangle }$$
(21)

In the formula: \(k = 1,2, \ldots ,N,j = 1,2, \ldots ,N\).

Expressing formula (21) in terms of the kernel matrix of formula (16), we obtain:

$$\lambda \sum\limits_{i = 1}^{N} {\alpha_{i} K_{ki} } = \frac{1}{N}\sum\limits_{i = 1}^{N} {\alpha_{i} } \sum\limits_{j = 1}^{N} {K_{kj} K_{ji} }$$
(22)

Formula (22) is converted into the following characteristic equation:

$$\lambda NK\alpha = K^{2} \alpha$$
(23)

The kernel matrix K obtained from formula (16) needs to be centered in the feature space. The process is as follows:

$$\tilde{K} = K - 1_{N} K - K1_{N} + 1_{N} K1_{N}$$
(24)

Among them, \(1_{N}\) is the \(N \times N\) matrix whose elements are all \(\frac{1}{N}\). Formula (23) can then be expressed as:

$$\lambda N\alpha = \tilde{K}\alpha , \alpha = \left[ {\alpha_{1} , \ldots ,\alpha_{N} } \right]$$
(25)

\(\alpha\) is normalized so that \(\left\| \alpha \right\|^{2} = \frac{1}{N\lambda }\), which corresponds to unit-length eigenvectors \(\nu\) in the feature space. KPCA then obtains the principal components by solving the above eigenvalue problem.
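A compact NumPy sketch of this offline core, formulas (16), (24), and (25), with the training-point scores as a by-product; the Gaussian width gamma and the number of retained components p are assumptions, and the names are illustrative.

```python
import numpy as np

def kpca_fit(X, gamma=0.1, p=2):
    """X: (N, D) row-sample matrix. Returns kernel K, coefficients alpha,
    eigenvalues of the centered kernel, and training scores T."""
    N = X.shape[0]
    sq = np.sum(X**2, axis=1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))   # formula (16)
    one_N = np.ones((N, N)) / N
    K_tilde = K - one_N @ K - K @ one_N + one_N @ K @ one_N          # formula (24)
    eigvals, eigvecs = np.linalg.eigh(K_tilde)                       # formula (25)
    idx = np.argsort(eigvals)[::-1][:p]          # keep the p largest eigenpairs
    lam, alpha = eigvals[idx], eigvecs[:, idx]
    # scale unit eigenvectors so ||alpha_k||^2 = 1/lam_k, i.e. unit length
    # for the corresponding feature-space eigenvectors (lam_k here equals
    # N*lambda in the text's notation; assumes lam_k > 0)
    alpha = alpha / np.sqrt(lam)
    T = K_tilde @ alpha                          # training scores, cf. formula (26)
    return K, alpha, lam, T
```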

When the system generates a new data point x with mapping \(\phi \left( x \right)\), it must be projected onto the eigenvectors \(\nu_{k}\), \(k = 1, \ldots ,p\), where p is the number of principal components of KPCA. The projection \(t_{k}\) is:

$$t_{k} = \sum\limits_{i = 1}^{N} {\alpha_{i}^{k} \left\langle {\overline{\phi }\left( {x_{i} } \right),\overline{\phi }\left( x \right)} \right\rangle } = \sum\limits_{i = 1}^{N} {\alpha_{i}^{k} \overline{k}\left( {x_{i} ,x} \right)}$$
(26)

KPCA includes two parts: offline modeling and online monitoring. The specific steps are as follows:

(a) Offline modeling:

1. The data matrix Z under normal operating conditions is normalized to obtain a mapping matrix and a standardized matrix X.

2. The Gaussian kernel function is selected to compute the kernel matrix K, which is centered according to formula (24).

3. According to formula (25), the eigenvalues and eigenvectors are calculated, and the eigenvectors are normalized.

4. According to formula (26), the principal components \(t_{k}\) are calculated in the feature space.

5. Under normal operating conditions, the \(T^{2}\) and SPE statistics are calculated, defined as follows:

The \(T^{2}\) statistic is:

$$T^{2} = t\Lambda^{ - 1} t^{T} = \left[ {t_{1} ,t_{2} , \ldots ,t_{p} } \right]\Lambda^{ - 1} \left[ {t_{1} ,t_{2} , \ldots ,t_{p} } \right]^{T}$$
(27)

In the formula, \(t_{k} \left( {k = 1,2, \ldots ,p} \right)\) are the principal components of the mapped normal-condition data, and \(\Lambda^{ - 1}\) is the inverse of the diagonal matrix of eigenvalues.

The SPE statistic is:

$$SPE = \left\| {\phi \left( x \right) - \phi_{p} \left( x \right)} \right\|^{2} = \sum\limits_{i = 1}^{N} {t_{i}^{2} - } \sum\limits_{i = 1}^{p} {t_{i}^{2} }$$
(28)

6. The control limits of the \(T^{2}\) and SPE statistics are computed as follows:

The control limit of \(T^{2}\) is:

$$T_{p,N,\alpha }^{2} = \frac{{p\left( {N - 1} \right)}}{{N - p}}F_{\alpha } \left( {p,N - p} \right)$$
(29)

In the formula, N is the number of input data points, and p is the number of principal components of KPCA.

The control limit of SPE is:

$$SPE_{\lim } = g\chi_{h}^{2}$$
(30)

In the formula, g and h are constant coefficients determined by the mean and variance of SPE.
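A sketch of how the four quantities of formulas (27)–(30) could be computed together. It assumes scores over all components are available (e.g. `kpca_fit` above run with p = N), uses the usual moment-matching estimates of g and h from the SPE mean and variance, and fixes a 99% confidence level as an assumption.

```python
import numpy as np
from scipy import stats

def monitoring_stats(T_all, lam, p, conf=0.99):
    """T_all: (N, N) scores over all components, columns sorted by eigenvalue
    (descending); lam: the corresponding eigenvalues; p: retained components."""
    N = T_all.shape[0]
    Tp = T_all[:, :p]
    T2 = np.einsum('ij,j,ij->i', Tp, 1.0 / lam[:p], Tp)           # formula (27)
    T2_lim = p * (N - 1) / (N - p) * stats.f.ppf(conf, p, N - p)  # formula (29)
    SPE = (T_all ** 2).sum(axis=1) - (Tp ** 2).sum(axis=1)        # formula (28)
    a, b = SPE.mean(), SPE.var()                                  # moment matching
    g, h = b / (2 * a), 2 * a ** 2 / b
    SPE_lim = g * stats.chi2.ppf(conf, h)                         # formula (30)
    return T2, T2_lim, SPE, SPE_lim
```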

(b) Online monitoring:

1. The real-time operating data \(x_{i}\) are collected and standardized using the mapping matrix obtained from normalizing the normal-condition data.

2. The new kernel matrix \(K_{i}\) is constructed and centered.

3. The nonlinear component \(t_{i}\) is extracted in the feature space.

4. The \(T^{2}\) and SPE statistics of the real-time data are calculated.

5. The statistics are compared with their control limits: if both statistics exceed their limits, or only the SPE statistic exceeds its limit, the system reports a fault and passes it to fault diagnosis; otherwise, the operator must analyze further to determine whether a fault exists, as in the sketch below.
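The step-5 decision rule, as we read it from the text, fits in a few lines; the return strings are illustrative.

```python
def fault_decision(T2, SPE, T2_lim, SPE_lim):
    """Apply the text's rule: a fault when both statistics exceed their
    limits or SPE alone does; a T^2-only exceedance needs the operator."""
    if SPE > SPE_lim:                      # covers "both exceed" and "SPE alone"
        return "fault detected: enter fault diagnosis"
    if T2 > T2_lim:                        # T^2 alone is ambiguous
        return "possible fault: operator analysis required"
    return "normal operation"
```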

5 Model building

Feature-level fusion extracts features from all modalities, concatenates them into a long feature vector, and feeds this vector to MKL for classification. The multiple kernel model is a kernel-based learning model with strong flexibility. Recent theory and applications have shown that using multiple kernels instead of a single kernel achieves better classification results and better performance than a single-kernel model or a combination of single-kernel machines. The simplest and most common way to construct a multiple kernel model is to combine several basic kernel functions. After extracting features from the visual, text, and audio information, we map the features into kernel spaces, extend MKL with interval constraints, and compute its target cost function. According to the discriminative ability of the different base features, their weights are learned and updated with a gradient weighting method. Moreover, each kernel is dimension-normalized to exploit the maximum discriminative power of its base feature. The multiple kernel learning model based on interval dimension constraints proposed in this paper is shown in Fig. 4.

Fig. 4 Multiple kernel learning model based on interval dimension constraints

Recent developments in multiple kernel learning have made it an attractive technique in kernel machine learning. By merging multiple kernels into a unified optimization framework, MKL jointly learns the optimal kernel combination and the associated predictor in supervised or semi-supervised settings. It avoids single-kernel model selection and can cope with heterogeneous information among sample features, irregular data in the feature space, uneven data distributions, and large problem sizes. Because the kernel parameters are adjusted automatically, the various characteristics of the data representation are captured in detail, so multi-source and heterogeneous data sets are handled flexibly and stably. In addition, using multiple kernels enhances the interpretability of the model and improves the generalization performance of the classifier. Recent research on multiple kernel learning shows that the method can effectively fuse multiple base features in target detection and recognition. However, MKL tends to select only the most discriminative base features and to ignore less discriminative ones that may provide complementary information. In addition, MKL usually uses Gaussian RBF kernels to transform each base feature into a high-dimensional space, and base features from different modalities generally require different kernel parameters for best performance. MKL may therefore fail to exploit the maximum discriminative power of all base features from multiple modalities simultaneously. To solve these problems, we propose MDMKL, a multiple kernel learning method based on interval dimension constraints. The method extends MKL with interval constraints and applies dimension-normalized RBF kernels to multimodal feature fusion. Moreover, MDMKL learns the weights of the different base features according to their discriminative abilities. Unlike traditional MKL, when constructing the optimal combined kernel, MDMKL retains weakly discriminative base features by assigning them smaller weights, so as to take full advantage of the complementary information across modalities.
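A sketch of the fusion idea only, not the interval-constrained solver itself: each modality gets an RBF kernel whose width is normalized by that modality's feature dimension, and the combined kernel is a convex combination of the base kernels. The fixed weights below stand in for what the MDMKL optimization would learn.

```python
import numpy as np

def dim_normalized_rbf(X):
    """X: (N, D) features of one modality; gamma scaled by the dimension D."""
    gamma = 1.0 / X.shape[1]
    sq = np.sum(X**2, axis=1)
    return np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))

def combined_kernel(modalities, weights):
    """modalities: list of (N, D_m) arrays for the same N samples;
    weights: nonnegative base-feature weights, renormalized to sum to 1."""
    weights = np.asarray(weights, dtype=float) / np.sum(weights)
    return sum(w * dim_normalized_rbf(Xm)
               for w, Xm in zip(weights, modalities))

# e.g. visual, text, and audio features for the same N samples:
# K = combined_kernel([X_visual, X_text, X_audio], weights=[0.4, 0.35, 0.25])
```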

6 Model performance analysis

In this study, after features are extracted from the audio, text, and visual information, the proposed MDMKL performs feature-level fusion. Figure 5 shows the results of sentiment classification on the multimodal MOUD data set with different fusion algorithms. As the training data set grows, the post-fusion classification results improve. In the early stage of training-sample growth, accuracy rises rapidly; once the data volume reaches a certain level, the growth slows and occasionally even dips. The proposed MDMKL model achieves the highest classification accuracy, 97.25%, while SVM and MKL reach 88.90% and 94.34%, respectively.

Fig. 5 Experimental comparison of different fusion algorithms

In the experiments on the MOUD data set, we also analyze the convergence of the algorithm, that is, how the objective value f changes as the number of iterations increases. Figure 6 compares the convergence of MDMKL and MKL. The stopping criterion for both methods is that the change in the kernel weight vector d between two consecutive steps falls below the threshold 0.001. Within the given iteration budget, MDMKL converges faster than traditional MKL as the iterations proceed. We can also observe that the objective value of MDMKL converges to a stable value in fewer than 5 iterations.

Fig. 6 Comparison of objective convergence and number of iterations

Table 1 and Fig. 7 show the results of fusing different modalities with MDMKL. Among the single-modality classifiers, the visual modality gives the most accurate classification. For visual-text fusion, the V + T accuracy reaches 96.97%; for visual-audio fusion, the V + A accuracy reaches 91.84%.

Table 1 Comparison of single-modality and multi-modality feature-level fusion

Fig. 7 Comparison of single-modality and multi-modality feature-level fusion

For text-audio fusion, the T + A accuracy reaches 96.56%. When visual sentiment is fused with audio and text sentiment, the visual-text combination outperforms the visual-audio combination. Fusing all three modalities, audio, text, and visual, yields higher accuracy than any pair of modalities: 97.25%, which exceeds the accuracy of existing frameworks.

Finally, the performance of this research model on Japanese social sentiment classification is analyzed. The data come from Twitter, which has many Japanese users; the Japanese portion of the platform is selected, and 60 groups of data are collected for the experiments. Sentiment classification is performed on these data to determine the classification accuracy. The system requires a recognition rate of 95% to qualify, so this study takes 95% as the baseline; the results are shown in Table 2 and Fig. 8.

Table 2 Accuracy statistics of the model's Japanese social sentiment classification

Fig. 8 Accuracy statistics of the model's Japanese social sentiment classification

As Fig. 8 and Table 2 show, the accuracy of this model on Japanese social sentiment classification exceeds 95%, meeting the system's requirements for the algorithm. This result indicates that the proposed model has a certain stability, satisfying the system's stability requirements.

Based on a real Twitter data set and the research needs of this article, a reasonable filtering strategy is designed to select the experimental data set, supplemented with tweet pictures and comment information gathered by web crawling. On this basis, the tweet text is preprocessed to reduce its irregularity. Next, from a psychological point of view, this study extracts low-level visual features related to emotional factors from the tweet pictures and uses a visual bag-of-words model to quantize them into visual words. Finally, the validity of the data set is verified by analyzing its relevant statistical characteristics.

Using the real Twitter data set as the experimental data set, the effectiveness of the proposed model algorithm is verified through experiments. On this basis, a prototype user sentiment analysis system is designed and implemented.

7 Conclusion

To address user sentiment analysis in Japanese social media, this paper proposes a topic model for user sentiment analysis based on SVM and KNN and verifies the effectiveness of the model algorithm through experiments on real social media data sets.

Taking TE (Tennessee Eastman) process data as an example, this paper selects the radial basis kernel function and a grid search method to construct the SVM classifier.

Moreover, this study combines the ODLLTSA algorithm with the KNN algorithm to propose KNN-based feature recognition. ODLLTSA increases the distance between heterogeneous points, reduces or maintains the distance between similar points, and keeps local neighborhoods unchanged. KNN is a distance-based monitoring algorithm that handles nonlinear and non-Gaussian data well. The two therefore complement each other, improving both the speed and the accuracy of fault monitoring.

In addition, this study uses the real Twitter data set as the experimental data set to verify the effectiveness of the proposed user sentiment analysis model. The experiments show that the model has a certain stability and that its sentiment classification accuracy exceeds 95%, meeting the system's requirements for algorithm stability and accuracy.