
1 Introduction

Space telemetry is a set of technologies that allows remote collection of information about on-board spacecraft subsystems. The subsystems are controlled by analyzing the readings of sensors distributed across their submodules. The state of a subsystem at a particular point in time is described by a vector of sensor values, and the time sequence of states is a sequence of such vectors. Hence, space telemetry data are multivariate time series, and one of the analysis tasks is forecasting such series.

The task of forecasting a multivariate time series is generally formulated as follows [1, 2]: from the known current value of the sequence y(k) and the prehistory \( y(k - 1), y(k - 2), \ldots, y(k - m) \), the next value ŷ(k + 1) should be estimated. Each element of the sequence y(k) is a vector of values at time k. The length m of the prehistory is called the time window.

A variety of techniques has been used in short-term forecasting, including regression and time series analysis. Simple and multiple linear regressions are frequently used; their advantage is that they are relatively easy to implement. However, they are somewhat limited in their ability to forecast in certain situations, especially in the presence of nonlinear relationships in highly noisy data. Most time series models also belong to the class of linear forecasting models, because they postulate a linear dependency between a value and its past values. The autoregressive moving average (ARMA) model and its derivatives are often used in the univariate case. However, artificial neural networks (NNs) often outperform these models on complicated tasks [1]. Deep neural networks can also be used [3], but a large training set is needed in that case.

The processing and analysis of telemetry data is accompanied by non-deterministic noise, so NN technology is preferred in this case. The effectiveness of this technology depends on the NN architectures and learning methods [1, 4], which requires multiple experiments.

There are examples of using NNs in on-board intelligent decision support systems for managing complex dynamic objects and diagnosing their condition [5, 6].

In this paper, we investigate the possibility of short-term prediction of telemetry parameters using an ensemble of neural networks (ENN) [7, 8], i.e., a set of NNs that makes decisions by averaging the results of the separate NNs.

Predictive analytics and machine learning often face “concept drift”, meaning that the statistical properties of the target variable the model tries to predict change over time in an unpredictable manner [9], which increases the prediction error. Hence, neural network prediction efficiency can be improved by using iterative learning methods [7, 9, 10]. These methods involve estimating the accuracy of the models and ranking them at each analysis iteration. When the overall accuracy drops, the ensemble detects the concept drift and a new NN, trained on the relevant data, is added to the ensemble. In this approach, the model built during initial training is retained, and the new parameters are introduced without the “forgetting” problem. Thus, additional (incremental) learning of the ENN is realized.

2 Methods and Algorithms

The main objective of this study is to explore the possibility of using ENNs for telemetry data forecasting. Therefore, NNs and ENNs are selected as the test models. A classic feed-forward NN with one hidden layer is selected as the single-NN model.

A comparative analysis of the following approaches to the formation of the output value of the ensemble is performed.

  1. The output value is formed as the average of the individual networks' outputs. It is calculated for the case with a single output neuron by the formula:

    $$ y = \frac{1}{n}\sum\limits_{i = 1}^{n} {y_{i} } , $$
    (1)

    where n is the number of networks in the ensemble and \( y_{i} \) is the output of the i-th network;

  2. The output value is formed as a weighted sum of the individual networks' outputs. It is calculated for the case with a single output neuron by the formula:

    $$ y = \sum\limits_{i = 1}^{n} {y_{i} \cdot w_{i} } , $$
    (2)

    where n is the number of networks in the ensemble, \( y_{i} \) is the output of the i-th network, and \( w_{i} \) is the weight of the i-th network, formed according to the formula:

    $$ w_{i} = \frac{{mse_{i}^{ - 1} }}{{\sum\limits_{j = 1}^{n} {mse_{j}^{ - 1} } }}, $$
    (3)

    where \( mse_{i} \) is the MSE of the i-th network on a validation set; weighting inversely to the validation MSE gives more accurate networks larger weights;

  3. The output value is formed as the weighted sum of the outputs of the individual networks (formulas (2) and (3)), and the weighting is repeated after a certain interval of time samples, with the evaluation performed on this interval (dynamically weighted ensemble).
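A minimal sketch of the three combination schemes for a single output neuron, assuming each member network has already produced its scalar output (all names and numbers are illustrative; the inverse-MSE weighting normalizes so that more accurate networks get larger weights):

```python
import numpy as np

def average_output(outputs):
    """Formula (1): plain mean of the member outputs."""
    return float(np.mean(outputs))

def mse_weights(val_errors):
    """Formula (3): weights inversely proportional to validation MSE,
    normalized so that they sum to one."""
    inv = 1.0 / np.asarray(val_errors, dtype=float)
    return inv / inv.sum()

def weighted_output(outputs, weights):
    """Formula (2): weighted sum of the member outputs."""
    return float(np.dot(outputs, weights))

# Example: three networks; the third has a much larger validation MSE
outs = [0.50, 0.52, 0.70]
w = mse_weights([0.010, 0.012, 0.100])
y_avg = average_output(outs)          # plain averaging
y_wgt = weighted_output(outs, w)      # down-weights the inaccurate network
```

The dynamically weighted scheme (item 3) simply recomputes `mse_weights` on the most recent interval of samples instead of a fixed validation set.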

Also, an iterative method of ensemble learning was studied, using the following algorithm:

  1. Process the current input vector.

  2. Evaluate the accuracy of the ensemble by comparing the error at the current step with the error at the previous step.

  3. If the error has not increased, or has increased within a predetermined range, go to the next input vector.

  4. Otherwise, form a training set that includes all the data accumulated since the last additional training.

  5. Form and train a new neural network.

  6. Add the new network to the ensemble.

  7. Recalculate the weighting coefficients for all neural networks of the ensemble, based on their errors on the latest data.
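The steps above can be sketched as follows. This is a schematic outline under assumed interfaces: `train_network`, the drift `threshold`, and the minimum batch size are placeholders, not values from the paper.

```python
import numpy as np

def process_stream(ensemble, weights, stream, threshold, min_batch, train_network):
    """Schematic iterative (additional) learning of an ENN.
    ensemble: list of callables x -> scalar prediction; weights: np.array;
    stream: iterable of (x, target); train_network: (X, T) -> new net."""
    buffer_x, buffer_t = [], []
    prev_err = None
    for x, t in stream:
        # Step 1: process the current input vector with the weighted ensemble
        y = float(np.dot([net(x) for net in ensemble], weights))
        err = (y - t) ** 2
        buffer_x.append(x); buffer_t.append(t)
        # Steps 2-3: compare current error with the previous one
        drift = prev_err is not None and err > prev_err * (1 + threshold)
        if drift and len(buffer_x) >= min_batch:
            # Steps 4-6: train a new network on the accumulated data, add it
            ensemble.append(train_network(buffer_x, buffer_t))
            # Step 7: re-weight all networks by their errors on the latest data
            errors = [np.mean([(n(xx) - tt) ** 2
                               for xx, tt in zip(buffer_x, buffer_t)])
                      for n in ensemble]
            inv = 1.0 / (np.asarray(errors) + 1e-12)
            weights = inv / inv.sum()
            buffer_x, buffer_t = [], []
        prev_err = err
    return ensemble, weights
```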

3 Experiment

A data set is a finite set of precedents selected in some way from the set of all possible precedents, called the general population.

Parameters of the data sets used in our experiments are presented in Table 1. Each set of telemetry data is obtained from sensors of the correction propulsion system. These include temperature and pressure parameters of the xenon supply unit, electric parameters of the flow control, and electrical parameters of the engine anode and cathode. The values provided by the sensors depend on the mode, which is set by the control commands. The sensor values are correlated with each other, which gives grounds to expect a satisfactory forecast.

Table 1. Telemetry data of the correction propulsion system of the spacecraft

Since learning is supervised, it is necessary to form a learning set of pairs “input vector, output vector”. Each pair of the learning sample is formed by the windowing method [2]: a certain period of the time series is taken, and several consecutive observations from it constitute the input vector. The desired output of the training example is the next value in order. The window is then moved one position in the direction of increasing time, and the next pair of the training sample is formed. Thus, if the dimension of the time series is N and the window size is W, the neural network receives input samples of size \( N \times W \). So, for the window W = 20, the Dt_set_s1 set is converted into an input set of size \( 6589 \times 480 \), and a target set of size \( 28 \times 6589 \).
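The windowing transformation described above can be sketched as follows (illustrative; `series` stands for a T × N multivariate telemetry series, and the toy sizes are not the paper's):

```python
import numpy as np

def make_windows(series, window):
    """Slide a window of `window` consecutive observations over a
    T x N multivariate series; each flattened window becomes an input
    vector and the observation immediately after it is the target."""
    series = np.asarray(series, dtype=float)
    T, N = series.shape
    inputs = np.array([series[i:i + window].ravel()
                       for i in range(T - window)])
    targets = series[window:]
    return inputs, targets

# Toy example: T = 100 samples of an N = 24-dimensional series, W = 20.
# Each input vector then has N * W = 480 components, as in the text.
X, Y = make_windows(np.random.rand(100, 24), window=20)
```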

Resampling and scaling are performed during the preparation of the datasets of telemetric sensor information.

Resampling converts the raw data, represented as a sequence of time stamps of important events, to a form with a fixed sampling time. Scaling brings the data into the valid range [−1, 1]. The outputs of the network are also scaled.
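A minimal per-channel min-max scaling sketch for the [−1, 1] range (the function names are illustrative):

```python
import numpy as np

def scale_to_unit_range(data):
    """Linearly map each column of `data` into [-1, 1]; also return
    the per-column min/max needed to rescale network outputs back."""
    data = np.asarray(data, dtype=float)
    lo, hi = data.min(axis=0), data.max(axis=0)
    scaled = 2.0 * (data - lo) / (hi - lo) - 1.0
    return scaled, lo, hi

def unscale(scaled, lo, hi):
    """Inverse transform, applied to the network outputs."""
    return (scaled + 1.0) / 2.0 * (hi - lo) + lo

# Two channels with different raw ranges
x = np.array([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]])
s, lo, hi = scale_to_unit_range(x)
```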

The aim of the experiment is to determine the influence of the parameters of a single neural network and of ensembles on forecast performance.

A multilayer perceptron with one hidden layer and a hyperbolic tangent activation function is used as the base element of the ensemble. The RPROP algorithm is used to train a single network [11]. The prediction window is chosen to be 20 samples.

To evaluate the quality of trained neural networks as well as to compare different ensembles, the following values are used:

  • mean square error, MSE:

$$ {\text{MSE }} = \frac{1}{m}\sum\limits_{i = 1}^{m} {e_{i}^{ 2} } ; $$
(4)
  • mean absolute error, MAE:

$$ {\text{MAE }} = \frac{1}{m}\sum\limits_{i = 1}^{m} {\left| {e_{i} } \right|} . $$
(5)

where \( e_{i} = y_{i} - t_{i} \), \( y_{i} \) and \( t_{i} \) are the obtained and the desired output signals of the i-th neuron of the output layer, respectively, and m is the size of the output layer of the neural network.
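Formulas (4) and (5) can be expressed directly in code:

```python
import numpy as np

def mse(y, t):
    """Formula (4): mean square error over the output layer."""
    e = np.asarray(y, dtype=float) - np.asarray(t, dtype=float)
    return float(np.mean(e ** 2))

def mae(y, t):
    """Formula (5): mean absolute error over the output layer."""
    e = np.asarray(y, dtype=float) - np.asarray(t, dtype=float)
    return float(np.mean(np.abs(e)))

# For e = (0.1, -0.3): MSE = (0.01 + 0.09) / 2 = 0.05, MAE = 0.2
```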

In this experiment, the input set is divided in a ratio of 9:1 into a general training set and a final test set. The general training set is randomly divided into a training set (70%), a validation set (15%), and a test set (15%), which are used for training, evaluation, and the search for the best architecture, respectively.

The final test set is used to calculate the final estimates obtained by the neural networks.

3.1 Evaluation of the Hidden Layer Size

The suboptimal size of the hidden layer of the neural network was evaluated according to the following algorithm.

  1. Determine the search interval.

  2. Train 10 networks with the current hidden layer size, selected from the search interval.

  3. Form the weighted ensemble.

  4. Evaluate the ensemble accuracy.

  5. Until the end of the search range, go to the next element of the interval (step 2).

  6. Select the ensemble with the lowest MSE over the search range; the hidden layer size of its elements is the best one.
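The search loop above can be sketched as follows. This is schematic: `build_and_train_ensemble` and `ensemble_mse` stand in for training ten networks and evaluating the weighted ensemble, and the toy error curve in the usage example is purely illustrative.

```python
def best_hidden_size(sizes, build_and_train_ensemble, ensemble_mse, data):
    """Grid search over hidden-layer sizes: for each candidate size,
    train an ensemble and keep the size with the lowest ensemble MSE."""
    best_size, best_err = None, float("inf")
    for size in sizes:
        ensemble = build_and_train_ensemble(size, data)   # e.g. 10 networks
        err = ensemble_mse(ensemble, data)                # validation MSE
        if err < best_err:
            best_size, best_err = size, err
    return best_size, best_err

# Toy usage with a known-convex error curve minimized at size 24
size, err = best_hidden_size(
    range(4, 57, 4),
    lambda s, d: s,              # "ensemble" is just the size here
    lambda e, d: (e - 24) ** 2,  # fake MSE curve
    None)
```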

The dependence of the accuracy on the hidden layer size is shown in Fig. 1.

Fig. 1. Evaluation of the hidden layer size for dataset Dt_set_s1

The sizes of the hidden layers for the data sets are shown in Table 2.

Table 2. Evaluation of the hidden layer size

It should be noted that this algorithm significantly increases the time required to form the ensemble.

3.2 Evaluation of Neural Network Training Time

The aim of the experiment is to determine the difference in neural network training speed between a single central processing unit (CPU) and a graphics processing unit (GPU).

Hardware for the experiment (chosen to be close in price):

  • CPU: Intel Core i5 4200H, 2 cores, 4 threads, 2800 MHz (Turbo 3400 MHz);

  • GPU: NVIDIA GeForce GTX 860M, 1029 MHz, 640 CUDA cores.

MATLAB is chosen as the experimental platform. The hidden layer size varied in the range [4, 56] neurons.

The evaluation procedure includes the following steps:

  1. Select the hardware (CPU, GPU).

  2. Select a value from the hidden layer range.

  3. Train 10 single neural networks.

  4. Evaluate the average execution time.

  5. If not all values of the interval are used, go to step 2.

  6. If not all of the hardware is used, go to step 1.

The results are shown in Table 3.

As can be seen from the table, there is a 5-fold reduction in the training time of a neural network when using a GPU with CUDA technology.

3.3 Analysis of Approaches for Formation of the Output Value of an Ensemble

The dependence of MSE on the hidden layer size is shown in Fig. 2.

Fig. 2. MSE of different models depending on the size of the hidden layer

Evaluation of the different models on the test set Dt_set_s1 is given in Table 4.

Table 3. Evaluation of training time

Table 4 shows that the dynamically weighted ensemble has the smallest error, and the difference in the estimated parameters for the weighted ensembles is very small.

3.4 Evaluation of the Dynamically Weighted Ensemble

The weighting interval for the dynamically weighted ensemble is evaluated. The MSE plot for the various parameters is shown in Fig. 3.

Fig. 3. Evaluation of the weighting interval for the dynamically weighted ensemble

The evaluation procedure includes the following steps:

  1. Train the ensemble with the suboptimal size of the hidden layer.

  2. Determine the search interval of weighting steps.

  3. Evaluate the MSE for the different models and for the ensemble with the current step of weighting repetition.

  4. Until the end of the search interval, go to the next element of the interval (step 3).

Figure 3 shows that the dynamically weighted ensemble with a small step of repeated weighting (less than 10 samples) has the smallest error.

3.5 Evaluation in the Case of Concept Drift

Concept drift refers to a change in value over time and, consequently, to a change in the distribution of the value. The environment from which these values are obtained is not stationary.

An iteratively trained (without repeated access to previous data) ensemble of experts, combined with some form of weighted voting for the final decision, is used in drift detection algorithms [9, 10].

For this experiment, modifications were made artificially to the investigated data: a linearly increasing trend was added, and a sine-wave signal was modeled as a periodic component.
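The artificial modification can be reproduced schematically as follows. The slope, amplitude, and period values are illustrative assumptions, not taken from the paper; only the structure (intact prefix, then trend plus sine wave) follows the text.

```python
import numpy as np

def add_drift(series, start, slope=0.001, amp=0.1, period=200.0):
    """Leave the first `start` samples intact, then superimpose a
    linearly increasing trend and a sine-wave periodic component
    on every channel of the multivariate series."""
    series = np.asarray(series, dtype=float).copy()
    t = np.arange(series.shape[0] - start, dtype=float)
    drift = slope * t + amp * np.sin(2.0 * np.pi * t / period)
    series[start:] += drift[:, None]   # same drift added to every channel
    return series

# Toy 3-channel series; the first 1319 samples stay unmodified, as in the text
x = np.zeros((2000, 3))
m = add_drift(x, start=1319)
```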

The modification of the Dt_set_s1 dataset is shown in Fig. 4. The first 1319 samples are used without modification.

Fig. 4. Modification of the Dt_set_s1 data set: (1) linearly increasing trend; (2) periodic component; (3) the modified multivariate signal

The evaluation procedure includes the following steps.

  1. Train the ensemble.

  2. Modify the data set by adding the trend and/or the seasonal component.

  3. Set the error threshold for the additional training algorithm.

  4. Set the minimum amount of data to accumulate.

  5. Evaluate the accuracy of the different models, as well as of the ensemble with additional training (additional training is performed only after the needed amount of data has been accumulated).

Evaluation of the different architectures on the modified dataset is shown in Table 5.

Table 4. Evaluation of NN models for dataset Dt_set_s1

All NN models, including the ensemble with additional training, showed a significant drop in accuracy on the modified set. This is associated with the accumulation interval for the additional training set (Table 5).

Table 5. Evaluation of different architectures on a modified set

4 Conclusion

The developed ENNs significantly reduce the forecasting error in comparison with single NNs.

The dynamically weighted ENN with a small step of repeated weighting (less than 10 samples) has the smallest error, and the difference in the estimated parameters for the weighted ensembles is very small.

The minimum mean square error for short-term forecasting of telemetry data obtained by sensors of the correction propulsion system is \( 2.75 \times 10^{ - 4} \).

All NN models showed a significant drop in accuracy on the modified set, including the ensemble with additional training: the latter also lost precision, because the new network must be trained on a sufficient number of samples of the new data, but its resulting accuracy was higher.

Testing also showed that NN training on a single GPU with CUDA technology gave a 5-fold reduction in training time in comparison with the CPU on the specified equipment.