1 Introduction

The design of efficient combustion systems requires an in-depth understanding of the underlying physics within combustion chambers. The challenge is significantly greater for supersonic combustion due to the presence of shocks and other waves. Because turbulent flows are inherently three-dimensional and shocks occur over very thin regions, on the order of one mean free path, the presence of shocks introduces additional complexity to a flow already characterized by a multitude of scales in time and space. Computational methods typically use artificial diffusion to smear the shocks so that they can be captured on a grid that is coarser than the shock thickness [3]. Even with this approximation, the simulation of supersonic turbulent combustion is not only computationally demanding but also produces large quantities of data.

As discussed in detail in our previous work [35], an important aspect of turbulence modeling and model validation involves analysis of the subgrid-scale stress tensor or the stress tensor. Unfortunately, these datasets are very large, since it is necessary to capture all the scales of turbulence and combustion as well as the discontinuities. For example, Reynolds-averaged Navier-Stokes (RANS) simulations of the scramjet engine have used in excess of 33 million cells [15, 16, 46]. Meanwhile, large eddy simulations (LES) of a simplified scramjet engine model have required upwards of 6 million cells [6], and preliminary LES of a full scramjet at low Reynolds numbers has used 14 million cells [8]. High-fidelity LES of the full scramjet geometry at high Reynolds numbers is expected to exceed 100 million cells. These datasets cannot be output frequently for visualization because of the computational cost of writing the files. When one considers that the computational cost of such a simulation can easily exceed 17,000 core-hours, it becomes necessary to limit disk-intensive operations to preserve core-hours [28].

Feature extraction can be a useful approach to reducing the size of the dataset to be visualized. In this approach, a small subset of points is identified as features of interest and output for visualization. While multiple filtering techniques exist for extracting shock features from a flow, some of these methods incorrectly identify regions of turbulence as shocks, and conversely some turbulence models incorrectly identify shocks as turbulence [17]. Most importantly, filtering techniques generally cannot identify abnormalities in the flow without expert knowledge as input. As a result, it is necessary to investigate more flexible approaches to analyzing discontinuities in the flow that may correspond to shock waves, including approaches that can identify normal shock behavior and abnormalities. Such anomaly detection tasks tend to require a statistical approach.

In this work we investigate the potential and limitations of deep learning [5, 29]—a machine learning technique based on learning representations of the data—with respect to shock feature extraction from strain tensor values. Deep learning has been successfully used in a variety of applications, including image analysis and speech recognition, and, beyond feature extraction, has the potential to capture abnormalities in the data. Here we take an exploratory first step in this direction by investigating the feasibility of using deep learning to retrieve shock locations. Future work will investigate the ability of deep learning with respect to anomaly detection.

2 Background and Related Work

2.1 Stress and Strain Tensors

A tensor is an extension of the concept of a scalar and a vector to higher orders. Scalars and vectors are zeroth- and first-order tensors, respectively. In general, a k-th order tensor can be represented by a k-dimensional array, e.g., a second-order tensor is a two-dimensional array (a matrix). For example, while a stress vector is the force acting on a given unit surface, a stress tensor is defined by the components of the stress vectors acting on each coordinate surface; thus stress can be described by a symmetric second-order tensor.

The velocity stress and strain tensor fields are manifested in the transport of fluid momentum, which is a vector quantity governed by the following conservation equation:

$$\displaystyle{ \frac{\partial \rho u_{i}} {\partial t} + \frac{\partial \rho u_{i}u_{j}} {\partial x_{j}} = -\frac{\partial p} {\partial x_{i}} + \frac{\partial \tau _{ij}} {\partial x_{j}},\qquad \mbox{ for}\;i = 1,2,3 }$$
(1)

where the Cartesian index notation is employed, in which the index i = 1, 2, 3 represents the spatial directions along the x, y, and z Cartesian coordinates, respectively, and the repeated index j implies summation over the coordinates. Here t is time, ρ is the fluid density, u ≡ [u₁, u₂, u₃] is the Eulerian fluid velocity, p is the pressure, and \(\boldsymbol{\tau }\) is the stress tensor defined as:

$$\displaystyle{ \tau _{ij} = 2\mu \left (S_{ij} -\frac{1} {3}\delta _{ij}\frac{\partial u_{k}} {\partial x_{k}}\right ) }$$
(2)

where μ is the dynamic viscosity coefficient (a fluid-dependent parameter) and S is the velocity strain tensor defined as:

$$\displaystyle{ S_{ij} = \frac{1} {2}\left (\frac{\partial u_{i}} {\partial x_{j}} + \frac{\partial u_{j}} {\partial x_{i}} \right ) }$$
(3)

As indicated by the definitions above, both the stress and strain tensors are symmetric second-order tensors, represented as two-dimensional arrays. The primary variables calculated in the code are the density, the three velocity components, and the total energy. All other quantities, including the stress tensor and the velocity strain tensor, are secondary variables calculated from these primary variables.
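
As a concrete illustration, the snippet below is a minimal numpy sketch (not the production code) of how the velocity strain tensor of Eq. (3) can be computed as a secondary variable from the primary velocity field; the array layout and the assumption of a uniform grid are for illustration only.

```python
import numpy as np

def strain_tensor(u, v, w, dx, dy, dz):
    """Six unique components of S_ij = 0.5*(du_i/dx_j + du_j/dx_i) (Eq. 3),
    computed with central differences on a uniform grid. u, v, w are 3D arrays
    of the velocity components; returns an array of shape (6, nx, ny, nz)."""
    dudx, dudy, dudz = np.gradient(u, dx, dy, dz)
    dvdx, dvdy, dvdz = np.gradient(v, dx, dy, dz)
    dwdx, dwdy, dwdz = np.gradient(w, dx, dy, dz)
    return np.stack([
        dudx,                      # S_11
        dvdy,                      # S_22
        dwdz,                      # S_33
        0.5 * (dudy + dvdx),       # S_12
        0.5 * (dudz + dwdx),       # S_13
        0.5 * (dvdz + dwdy),       # S_23
    ])
```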

2.2 Shock Feature Extraction

Most feature extraction techniques fall into one of three basic categories. The most widespread method uses feature attributes such as mass, centroid, volume, texture, or moment of inertia [10, 50–52]. A number of filtering techniques specifically for detecting and visualizing shock waves have been developed, including using the alignment of the shock surface with the pressure gradient vector [32], and using the density gradient in the direction of the velocity [33, 44]. A second approach to feature extraction uses isosurfacing in higher dimensions [24]. A third class of approaches uses various machine learning techniques to aid in feature tracking. Tzeng and Ma [58] utilize neural networks to learn which transfer functions are most appropriate for tracking the features of interest. Ozer et al. [43] use a clustering algorithm to group features based on similarity measures. In our previous work [35], we introduced a large-scale K-Means clustering approach to define and track regions of interest. Our approach is similar to this last category in that we also utilize machine learning.

The approach we use to generate labels for the tensor data is based on the Schlieren filter [34]. The density gradient of combustion datasets relates indirectly to the stress tensor through the conservation equation [34]. Such density-gradient descriptors can be used to generate flow visualizations in the style of Schlieren images [19], and have been shown to accurately reflect shock boundaries [34].

The use of the Schlieren computation is intended as an exploratory first step in investigating the ability of deep learning to identify shock waves and other features in CFD datasets. While the computation of the Schlieren itself is not costly enough to justify a learned alternative, more accurate shock prediction methods are significantly more costly and difficult to pose numerically. The ultimate goal of future work is to employ a deep learning approach to directly pinpoint and differentiate different phenomena at a lower computational cost than the many sensors currently available in the literature.

2.3 Deep Learning

In 1943 McCulloch and Pitts [38] introduced a set of simplified computational models of biological networks. These ideas were soon extended to include models of how such networks might learn, including Hebbian learning [20], multilayer perceptrons [42], and eventually backpropagation [49]. However, the extensive computational complexity of training large networks [40] prohibitively limited their usefulness.

Deep architectures yield greater expressive power than shallow networks, since functions that can be compactly represented by an architecture of depth k may require an exponential number of computational elements to be represented by an architecture of depth k − 1 [5]. This breadth-for-depth trade-off allows deep architectures to represent a wide family of functions with reduced complexity and improved generalization. Each succeeding layer of the network combines the features of the previous layer, forming a higher-level abstraction of those features. This increasing level of abstraction from layer to layer allows deep networks to generalize well for highly varying functions. The difficulty lies in efficiently training the large number of parameters needed to form the network [40].

Convolutional networks are specific types of deep architectures inspired by the structure of the visual cortex [29, 36] which do not suffer from the typical convergence issues of other deep architectures [5]. The defining characteristic of convolutional networks is the use of local receptive fields with shared parameters. These fields are used to scan input features with a two dimensional structure and form feature maps that capture low-level (e.g., edges and corners) representations of the input. This process is then repeated in each succeeding layer allowing for the formation of progressively higher-level abstractions. This two-dimensional representational power of convolutional networks has allowed them to dominate the field of computer vision [27, 55].

Volumetric convolutional networks [37, 39, 47] are an extension of convolutional surface-based architectures to input features with three dimensional structure. They have successfully been applied to the area of object recognition [37] and MRI segmentation [39]. The ability of these architectures to take advantage of three dimensional structure directly translates to the problem of inferring Schlieren features in combustible fluids.

2.4 Application Domain Background

Supersonic combustors, such as scramjet engines, are prime examples of flows where turbulence and combustion interact with shock waves [15, 46, 48]. However, the concurrent presence of shock waves and turbulence in the simulation of supersonic turbulent flow presents additional challenges compared to subsonic flows; the numerical methods designed to treat these features must predict their presence and capture them accurately [17]. The inability to accurately predict these features contributes to the many unresolved fundamental issues that surround supersonic combustors such as the scramjet. Flames in the presence of shocks are known to become distorted, generate vorticity, and break up or stretch [25, 26]. Additionally, when turbulence encounters a shock, velocity fluctuations are amplified [8]. The inability to accurately predict these behaviors affects the remainder of the solution domain, causing poor agreement with experiment.

While numerical simulations can provide an acceptable prediction of turbulence behavior, either through large eddy simulation or direct numerical simulation, the simulation of flows involving discontinuities such as shocks remains a significant challenge. Methods for numerically modeling supersonic turbulent combustion [4, 23] rely on the accurate prediction of the shock location. Current shock-capturing methods may rely on an artificial viscosity to dissipate the shock, smearing it across many solution points so that it can be captured [13]. However, incorrectly predicting the location of the shock may have unintended side effects, such as adding the artificial viscosity in the wrong regions and thereby dissipating other regions of the flow. Research has been performed on developing sensors to accurately predict the shock location [3, 14, 17]. However, some of these methods incorrectly identify regions of turbulence as shocks, and conversely some turbulence models incorrectly identify shocks as turbulence [17]. As a result, it is necessary to investigate more flexible approaches to analyzing discontinuities in the flow that may correspond to shock waves, including approaches that can identify normal shock behavior and abnormalities.

3 Methods

To study the applicability of deep learning to the feature identification problem, we train convolutional neural networks to learn a mapping from strain tensors to Schlieren values [34] for each time-step in a turbulent flow. To accomplish this, we form a regression network similar to an auto-encoder [9, 21], where instead of learning to replicate the strain tensors used as input, we learn to construct the associated Schlieren values for each time step. The strain tensor is calculated with Eq. (3) above [34]. Additionally, the Schlieren value for each pixel is derived with the following equation:

$$\displaystyle{ \text{Schlieren}(x,y,z) =\beta e^{- \frac{k\vert \nabla \rho \vert } {\vert \nabla \rho \vert _{\text{max}}} }, }$$
(4)

where x, y, and z are the position coordinates, β and k are rendering parameters set to 0.8 and 20, respectively, and ∇ρ is the gradient of the density field.
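
For illustration, the following minimal numpy sketch evaluates Eq. (4) on a uniform grid; the grid-spacing arguments are assumptions, and the default values of β and k mirror the parameters quoted above.

```python
import numpy as np

def schlieren(rho, dx, dy, dz, beta=0.8, k=20.0):
    """Numerical Schlieren of Eq. (4): beta * exp(-k * |grad rho| / max |grad rho|)."""
    grads = np.gradient(rho, dx, dy, dz)        # components of the density gradient
    mag = np.sqrt(sum(g * g for g in grads))    # |grad rho|
    return beta * np.exp(-k * mag / mag.max())
```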

We examine this approach on two datasets, a three-dimensional Sod dataset and a two-dimensional Blast dataset. The differing spatial dimensions of the two domains require two different network architectures. In this section we describe the methods used to construct and train both networks.

3.1 Data Processing

3.1.1 Three-Dimensional Sod Dataset

The three-dimensional Sod problem is one form of a shock tube problem, which is frequently considered a benchmark test for shock capturing methods. It is also commonly used for testing compressibility terms in numerical codes due to its inclusion of spatial pressure variation [31]. The initial condition contains a driver and a driven gas separated by a diaphragm in the center. When the diaphragm breaks, at time zero, a discontinuity forms and travels to the end of the tube. The final solution, which is available analytically, consists of rarefaction, contact, and shock waves [57].

For each time-step (of a total of 1,775 steps in our experiments) we discretize the position coordinates of the Sod dataset into an 804 × 4 × 4 volume and for each position calculate the strain tensor S and the Schlieren value using Eqs. (3) and (4).

We then represent each of the six unique values of the strain tensor as a different channel of a three-dimensional image (6 × 804 × 4 × 4) and the Schlieren value as a single-channel image (1 × 804 × 4 × 4). Representing the data in this form allows us to use computer vision techniques such as volumetric convolutional networks to learn a mapping from the strain tensors to the Schlieren values.
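
A minimal sketch of this packing step is shown below; the helper name and the assumption that the six strain components arrive as separate 804 × 4 × 4 arrays are illustrative only.

```python
import numpy as np

def pack_sample(strain_components, schlieren_volume):
    """Stack the six unique strain components into a 6-channel volumetric image
    and the Schlieren into a 1-channel target for one time-step."""
    x = np.stack(strain_components).astype(np.float32)         # (6, 804, 4, 4)
    y = schlieren_volume[np.newaxis, ...].astype(np.float32)   # (1, 804, 4, 4)
    return x, y
```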

With the data in this format, we place every tenth time-step into a test set (177 total) and randomly split the remaining time-steps roughly 90/10 into a training set (1416 total) and an evaluation set (182 total) that is used to determine when the network training has converged.

3.1.2 Two-Dimensional Blast Dataset

The second problem considered is a two-dimensional explosion. The initial condition consists of a high-density, high-pressure region located inside of a circle in the center of the geometry and low-density, low-pressure in the remainder of the computational domain. The two regions are joined by a discontinuity, which travels outwards in time, forming shock, contact, and rarefaction waves. Comparison along the radial directions gives virtually identical results due to the problem’s symmetry, and the resolution of discontinuities that travel in all directions is the same as that in the one-dimensional Sod problem [57]. The solution of this problem requires high resolution throughout the domain due to the sharp discontinuities traveling in all spatial directions.

For each time-step (of a total of 158 steps) we discretize the position coordinates into a 70 × 70 surface and for each position calculate the strain tensor S using Eq. (3) and the Schlieren value using Eq. (4).

We then represent each of the six unique values of the strain tensor as a different channel of an image (6 × 70 × 70) and the Schlieren value as a single-channel image (1 × 70 × 70). As with the three-dimensional dataset, we place every tenth time-step into a test set (15 total) and randomly split the remaining time-steps roughly 90/10 into a training set (133 total) and an evaluation set (ten total) that is used to determine when the network training has converged.
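
The sketch below illustrates this split procedure for either dataset; the random seed and the exact rounding of the 90/10 split are assumptions, so the resulting counts may differ slightly from those quoted above.

```python
import numpy as np

def split_time_steps(n_steps, seed=0):
    """Hold out every tenth time-step as the test set, then randomly split the
    remaining time-steps roughly 90/10 into training and evaluation sets."""
    all_steps = np.arange(n_steps)
    test = all_steps[::10]
    rest = np.setdiff1d(all_steps, test)
    rng = np.random.default_rng(seed)
    rng.shuffle(rest)
    n_train = int(round(0.9 * len(rest)))
    return rest[:n_train], rest[n_train:], test   # train, eval, test indices

train_ids, eval_ids, test_ids = split_time_steps(158)   # two-dimensional Blast dataset
```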

3.2 Network Architecture

For both the three-dimensional and the two-dimensional dataset we constructed an eight-layer all-convolutional network [53] that is divided into two parts: a feature extractor that learns a function for condensing the input features into a low-dimensional feature vector, and an image constructor that learns a function transforming that low-dimensional feature vector into a Schlieren image. We construct the network in this way to improve generalization by forcing the network to learn a sparse feature representation containing useful (general) features of the strain tensors. The final structure is chosen by tuning the hyper-parameters on the evaluation set [7]. The main difference between the two networks is the use of volumetric convolutions for the three-dimensional dataset and regular two-dimensional convolutions for the two-dimensional dataset.

The feature extractor for both networks consists of four strided convolutional layers (volumetric for the three-dimensional dataset) that reduce the input strain tensors to a 64-neuron feature vector. Strided convolutions incorporate regularization into the convolutional layers while improving network efficiency [54] compared to standard max-pooling-based sub-sampling. The image constructor consists of an inverse mapping, sometimes referred to as deconvolutional layers [41], with strides and kernel sizes matching those of the feature extractor. The key difference is that the image constructor outputs a single-channel image, compared to the six-channel input of the feature extractor. Figure 1 details the structure of both networks, with the feature extractor reducing the input strain tensors to a low-dimensional vector of 64 features and the image constructor building the Schlieren from the feature vector. Additionally, we use Exponential Linear Units (ELU) [11] as the activation function for both networks, which improves efficiency and performance while mitigating the vanishing gradient problem in training deep networks.
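
The sketch below outlines the two-dimensional variant of this encoder/decoder structure in PyTorch-style code (the experiments reported here used the Torch/LuaJIT framework). The kernel sizes, strides, channel counts, and dropout rate are assumptions rather than the published hyper-parameters, and the padding/output-padding would need to be tuned so that the constructed image matches the 70 × 70 input resolution; the batch normalization and spatial dropout layers anticipate Sects. 3.3 and 3.4. The three-dimensional variant would swap in Conv3d, ConvTranspose3d, BatchNorm3d, and Dropout3d.

```python
import torch.nn as nn

def down(c_in, c_out):
    # strided convolution + spatial batch norm + ELU + spatial dropout
    return nn.Sequential(nn.Conv2d(c_in, c_out, kernel_size=3, stride=2, padding=1),
                         nn.BatchNorm2d(c_out), nn.ELU(), nn.Dropout2d(0.2))

def up(c_in, c_out):
    # transposed ("deconvolutional") layer mirroring the extractor's stride
    return nn.Sequential(nn.ConvTranspose2d(c_in, c_out, kernel_size=3, stride=2,
                                            padding=1, output_padding=1),
                         nn.BatchNorm2d(c_out), nn.ELU())

class SchlierenNet(nn.Module):
    """Eight-layer all-convolutional regression network: four strided convolutions
    condense the 6-channel strain input, four transposed convolutions rebuild a
    1-channel Schlieren image."""
    def __init__(self):
        super().__init__()
        self.feature_extractor = nn.Sequential(down(6, 16), down(16, 32),
                                               down(32, 64), down(64, 64))
        self.image_constructor = nn.Sequential(up(64, 64), up(64, 32), up(32, 16),
                                               nn.ConvTranspose2d(16, 1, kernel_size=3,
                                                                  stride=2, padding=1,
                                                                  output_padding=1))

    def forward(self, x):
        return self.image_constructor(self.feature_extractor(x))
```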

Fig. 1
figure 1

Networks for the three-dimensional Sod dataset (left) and the two-dimensional Blast dataset (right). Each network comprises two parts: a feature extractor that learns a function for condensing the input features into a low-dimensional feature vector, and an image constructor that learns a function transforming that feature vector into a Schlieren image

3.3 Optimization

We train our network parameters to minimize the mean squared error between the predicted Schlieren values and the ground truth values, and optimize the weights with an adaptive learning rate calculated using the Adadelta optimizer [59]. To avoid the large gradient updates that can arise when learning regression values, we incorporate gradient clipping [45] to constrain the gradient norm to lie within a specific threshold (i.e., [0, 1]).
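
A minimal sketch of one optimization step is given below, assuming a model such as the SchlierenNet sketch above and PyTorch-style APIs; the clipping threshold follows the [0, 1] gradient-norm range described above.

```python
import torch
import torch.nn as nn

model = SchlierenNet()                        # illustrative model from the earlier sketch
criterion = nn.MSELoss()                      # mean squared error against the true Schlieren
optimizer = torch.optim.Adadelta(model.parameters())

def train_step(x, y_true):
    optimizer.zero_grad()
    loss = criterion(model(x), y_true)
    loss.backward()
    # gradient clipping: constrain the gradient norm to at most 1
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    return loss.item()
```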

To improve the efficiency of our training routine we use spatial batch normalization [22] to normalize the feature maps generated after each convolutional layer. This ensures that the input distribution for each layer is consistent (zero mean and unit variance), which greatly improves learning performance.

Convergence of the network training is determined using a five-step average windowed delta loss on the evaluation set. When the average delta loss (mean squared error) on the evaluation set is positive, meaning the network is no longer improving its ability to construct the Schlieren, learning is stopped. This ensures that we do not over-train on the training set and preserves the network's ability to generalize.
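
As a sketch of this stopping rule (the window size follows the five-step average mentioned above; the exact bookkeeping in the original code is not specified):

```python
def should_stop(eval_losses, window=5):
    """Stop when the evaluation loss has, on average over the last `window`
    epochs, stopped decreasing (i.e., the mean epoch-to-epoch change is positive)."""
    if len(eval_losses) < window + 1:
        return False
    deltas = [eval_losses[i] - eval_losses[i - 1] for i in range(-window, 0)]
    return sum(deltas) / window > 0
```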

3.4 Regularization

When training a system on a limited amount of data (or an unbalanced dataset) there is a risk of over-training on the training set and losing the ability to generalize to new instances. For this reason we incorporate spatial dropout [56] after each of the convolutional layers in the network. During training this randomly drops entire feature maps in forward propagation (by setting all of their values to 0). This forces the network to learn a sparsified representation of the feature vector (a condensed low-dimensional feature representation), leading to improved generalization.
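
In PyTorch-style code, spatial dropout corresponds to the Dropout2d/Dropout3d modules already used in the architecture sketch above; the brief example below only illustrates the channel-wise dropping behavior (the drop probability is an assumption).

```python
import torch
import torch.nn as nn

drop = nn.Dropout2d(p=0.5)          # drops whole feature maps, not single activations
drop.train()                        # spatial dropout is active only in training mode
maps = torch.randn(1, 8, 35, 35)    # a batch containing eight feature maps
out = drop(maps)                    # on average, half of the eight channels are zeroed
```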

4 Results

We evaluate our approach on two datasets, a three-dimensional Sod dataset and a two-dimensional Blast dataset as described in Sect. 3.1. In this section we describe our results in training the networks and examine the features the networks learned.

It is important to note that while calculating the Schlieren for the three-dimensional Sod dataset via Eq. (4) takes an average of 550 ms per time step on an Intel Xeon E5-2697 2.6 GHz processor, our trained network generates the Schlieren in less than 107 ms on the same CPU (and in less than 1 ms on a GeForce GTX 1080 GPU) running the Torch framework [12], an open-source machine learning and scientific computing framework that provides a range of algorithms for deep learning. Torch uses the scripting language LuaJIT and an underlying C implementation. This increase in efficiency becomes extremely significant as datasets grow to include hundreds of thousands of time-steps. In total, the three-dimensional network trained for about 35 s on the Sod dataset and the two-dimensional network trained for about 6 s on the Blast dataset before convergence. Nevertheless, the main strength of the deep learning approach lies in its potential for anomaly detection, which the Schlieren filter cannot provide.

4.1 Training

Figure 2 shows the prediction (mean-squared) error on the training, evaluation, and test sets for both datasets after each training epoch. The networks perform very well on the training sets and converge quickly on the evaluation sets. The test error matches the evaluation error closely, indicating that the evaluation error is a strong indicator of the network's ability to generalize to the test set. We achieved an average mean-squared error of 0.14 on the three-dimensional Sod test set and 0.12 on the two-dimensional Blast test set. These results suggest that an adequately trained network could detect anomalies in the tensor field by comparing the network's predictions to the true Schlieren values for each time step: a large error would signal a possible deviation from the regular tensor flow, indicating anomalous behavior.
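
As a sketch of how such error-based anomaly flagging could look (the threshold rule and its parameters are assumptions, not part of the experiments reported here):

```python
import numpy as np

def flag_anomalous_steps(per_step_mse, n_sigma=3.0):
    """Flag time-steps whose reconstruction error is far above the typical error,
    as a simple proxy for deviation from the regular tensor flow."""
    errs = np.asarray(per_step_mse)
    threshold = errs.mean() + n_sigma * errs.std()
    return np.where(errs > threshold)[0]
```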

Fig. 2
figure 2

The average error of the network-generated Schlieren for both the three-dimensional Sod dataset (left) and the two-dimensional Blast dataset (right) after each training epoch. The Y axis represents the mean squared error of the Schlieren value and is plotted in log scale for clarity. A standard Schlieren carries units of density gradient (kg/m⁴); however, the flow simulation code uses a normalized function, so in this case the Schlieren is unitless

4.2 Learned Network Features

In this section we show examples of the features that the two-dimensional network learns for the Blast dataset. We restrict this section to the two-dimensional dataset for the sake of visual clarity, as viewing two-dimensional images of the low-resolution volumetric features of the Sod dataset is not very informative. However, the conclusions from the two-dimensional dataset match those of the three-dimensional dataset in that the network learns strong feature representations of the input strain tensors, allowing for accurate construction of the associated three-dimensional Schlieren images.

As an example from the test set, Fig. 3 displays the six-channel strain tensor (calculated via Eq. (3)) used as input into the network for time step 120, followed by the true Schlieren image (calculated via Eq. (4)) and the Schlieren image generated by the network. The output of the network (bottom right) matches the calculated Schlieren (bottom left) closely, with a clear separation between high- and low-density regions. Obtaining such a close approximation of the desired Schlieren for a time step in the test set is encouraging. Deep networks tend to require large amounts of data to generate strong results, which suggests that increasing the size of our training set would further improve our predictions.

Fig. 3
figure 3

The six channels of the strain tensor used as input for step 120 followed by the ground truth Schlieren (bottom left) and the Schlieren generated by the network (bottom right)

Figures 4 and 5 show examples of feature maps in the first two layers of the feature extractor for time step 120 from the test set (we restrict our view to the first two layers due to the low resolution of feature maps in deeper layers). We observe that the network has learned the informative features of each unique strain tensor component and combined them to form feature maps with the information necessary for constructing the Schlieren image. The specialization of these feature maps allows the network to learn a general representation of the input data, which is very important for accurately predicting the Schlieren values at unseen time steps. Additionally, these feature maps serve as a set of dictionary references with increasing abstraction (by layer) that allow the network to properly condense the input tensors into a general low-dimensional feature vector, as shown in Fig. 1.

Fig. 4
figure 4

Feature maps in the first layer of the feature extractor

Fig. 5
figure 5

Feature maps in the second layer of the feature extractor

4.3 Output Visualization

The network produces a reconstructed volume for every tenth time-step of both datasets (the test cases), where each point corresponds to the floating-point Schlieren value at that location in the discretized grid. In our experiments, we evaluated 177 and 15 output Schlieren volumes for the three-dimensional and two-dimensional datasets, respectively.

To analyze the results, we visualize each volume as a two-dimensional pseudocolor image that encodes the Schlieren value between two colors, red and black. Similar to grayscale photos, pseudocolor images map intensity to a color scale that ranges from the minimum to the maximum value of a data sample [13].
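
A minimal matplotlib sketch of this rendering is shown below; the interpolation of the red-to-black scale and the figure parameters are assumptions, since the exact palette of the original figures is not specified.

```python
import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap

redblack = LinearSegmentedColormap.from_list("redblack", ["red", "black"])

def save_pseudocolor(slice2d, path):
    """Render a 2D slice of Schlieren values between red (minimum) and black (maximum)."""
    plt.imshow(slice2d, cmap=redblack, vmin=slice2d.min(), vmax=slice2d.max())
    plt.colorbar(label="Schlieren (unitless)")
    plt.savefig(path, dpi=150)
    plt.close()
```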

We create three pseudocolor images for each reconstructed Schlieren volume to compare it against the input data and its ground truth. Figure 6 shows the analysis of the output corresponding to three time-steps of the Blast dataset. Note how the distinct dark areas in the reconstructed images (right) match those of the ground truth (left). These areas signify the occurrence of large density gradients across data samples that correlate with potential shock locations. Figure 7 shows a similar comparison of ground truth and reconstruction for the three-dimensional dataset. In this figure, note the three distinct shock zones.

Fig. 6
figure 6

Comparison between the ground truth (left) and the network reconstruction (right) for the 1st, 50th, and 150th time-steps of the two-dimensional Blast dataset. The dark areas in the images indicate large density gradients corresponding to potential shock regions

Fig. 7
figure 7

Comparison between the ground truth (top) and the network reconstruction (bottom) for the 100th time-step of the three-dimensional Sod dataset. The dark narrow bands in the two images indicate large density gradients and correspond to potential shock regions. Note the three distinct shock zones in the images

The noise in the network-generated Schlieren in Fig. 6 is a symptom of the small amount of training data in this particular dataset. We trained on data from a single combustion run for both the three-dimensional and two-dimensional settings. Expanding the training data will likely improve results, as is the case in most deep learning applications. This chapter is meant as a proof of concept that deep learning is a feasible tool for generating the Schlieren from strain tensors and has potential for identifying anomalies in the combustion fields. Additional image processing may create a clearer Schlieren, as would an increase in the amount of training data.

5 Discussion and Conclusion

As previously discussed, the goal of this project was to examine the potential of using deep learning on tensor field data generated by turbulent combustion simulations. First, we were interested in finding out whether a supervised approach can detect structures in the data, and whether these structures correlate with the regions of interest. Second, we wanted to examine whether this problem is computationally feasible. The answer to both questions is affirmative.

In summary, we found that the deep learning approach can effectively capture and construct shock features. The results indicate that deep networks have the ability to identify anomalies in the tensor flow. Furthermore, the efficiency of the machine learning algorithm exceeds that of calculating the Schlieren via Eq. (4), with a greater than 5× speedup for the three-dimensional Sod dataset. The efficiency of the machine learning algorithm leads to a 500× improvement when running on the GPU, while optimized CPU-to-GPU throughput-computing transfers lead to only a 2.5× improvement on average [30]. The small deviation in the predicted values shows that deep learning has the potential to be an effective tool for efficiently generating visualizations of large tensor fields.

It is important to note that this work is exploratory and is meant as a proof of concept for the use of deep networks in visualizing large tensor fields. While the results are strong, we trained and tested the networks on the same time sequences (with separate training and testing time steps). Predicting time steps in sequences that the network was not trained on may be a more difficult task. However, the results in this chapter were generated with a very limited dataset, 1775 and 158 time-steps for the Sod and Blast datasets respectively, and deep networks usually require very large amounts of data to be trained effectively [55]. This implies that while the predictive performance may drop with separate training and testing sequences, it may also improve with the inclusion of more data. Additionally, the datasets used in this chapter are sequential in nature, suggesting that the results may improve with deep architectures that can take advantage of sequential structure in their predictions, such as recurrent neural networks [18]. Further exploration of this area is needed in order to form a decisive conclusion.

In conclusion, we have introduced a supervised machine learning approach for the segmentation of shocks in large scale tensor field datasets generated by computational turbulent combustion simulations. The approach employs a deep learning architecture based on volumetric convolutional networks. Our evaluation on two rich combustion datasets shows this approach can assist in the visual analysis of the combustion tensor field and that it is more effective than direct filtering calculations. Most importantly, the approach has potential for the detection of anomalies in the data.