
1 Introduction

The informationization, digitization, and intelligentization of drilling, together with remote support systems, require high-quality mud logging data. The comprehensive mud logging instrument is the main on-site technical equipment for mud logging and is responsible for data collection, processing, analysis, and transmission. It monitors engineering parameters and drilling fluid parameters in real time during drilling and analyzes the content of various gases in the drilling fluid. The accuracy and reliability of mud logging data directly affect the quality and safety of drilling projects and form the basis for discovering and evaluating oil and gas layers in a timely manner and for optimizing drilling construction safety. At present, several factors affect the quality of mud logging data.

(1) The harsh installation and usage conditions of the mud logging sensors may cause reduced accuracy, malfunctions, and damage (as shown in Fig. 1).

(2) Mechanical vibration and impact caused by frequent lifting and long-distance transportation can damage equipment in the instrument room. Chipsets and electronic components degrade as their service time increases, which can lead to abnormal data channels or reduced conversion accuracy.

(3) The performance of the gas analysis system decreases with production and operation time.

(4) Like drilling operations, mud logging operations take place at remote sites with hard-to-control environmental conditions. Existing indoor testing and assessment devices have low integration and large size and weight, so they cannot meet on-site testing needs, resulting in delayed and incomplete testing and assessment.

(5) Comprehensive mud logging instruments of different brands and periods differ significantly in quality, configuration, and performance, and many instruments have been in service for more than 10 years.

(6) Most mud logging companies focus mainly on testing, calibrating, and verifying individual components, and lack a unified, systematic testing and evaluation device and technical specifications for the comprehensive mud logging instrument.

Therefore, major petroleum companies in China and abroad attach great importance to mud logging, regarding improved mud logging equipment performance and guaranteed mud logging data quality as the basis for raising the quality of mud logging and engineering technical data. To eliminate the impact of the unfavorable factors listed above, research on testing and evaluation technology and equipment for the comprehensive mud logging instrument has been carried out, a complete set of technical specifications for testing and evaluating the comprehensive mud logging instrument has been formed, and a testing and evaluation device suited to fieldwork has been developed.

Fig. 1. Partial sensors of the comprehensive mud logging instrument

Significant progress has been made in recent years with the application of artificial intelligence technology. Many researchers apply AI to sensor-based health and sports biomechanics [1–7], while others utilize it for intelligent industrial manufacturing [8–10]. In the area of using artificial intelligence for sensor calibration, many scholars have also conducted extensive research and achieved significant progress [11–16].

2 Performance Testing, Data Acquisition and Preprocessing of Sensor

2.1 Performance Testing of the Sensors

Sensor performance testing, data acquisition, and preprocessing form the foundation of the artificial-intelligence-based calibration technology for comprehensive logging instrument sensors. A complete set of sensor performance testing equipment was developed around a Siemens 16-bit high-precision PLC, a high-precision pressure source, high- and low-temperature constant-temperature water tanks, a standard current generator, and supporting coils and high-precision resistors, and an accompanying software system was programmed in C#.

Sensor performance testing: standard testing equipment is used to test the performance of the logging sensors and to record the measurement results and error data. Sensor performance testing usually covers the following indicators: sensitivity, resolution, accuracy, and response time. The sensor's performance is determined by applying a known input to it, or by placing it directly in a changing environment, and collecting the feedback signals. After the performance test is completed, the sensor is evaluated against the requirements of its specific application. In short, sensor performance testing is an important step in ensuring data accuracy and reliability.

2.2 Data Collection and Preprocessing

The data collection system is used to collect the data obtained from the above tests and preprocess it for feature extraction by machine learning algorithms. Different signal acquisition methods, either analog or digital, are employed depending on the type of sensor. Then, the collected data must be preprocessed to remove noise, artifacts, and other unwanted signals.

Filtering is a common data preprocessing technique that separates useful signals from noise by applying filters; depending on the method, it can be classified into low-pass, high-pass, band-pass, and other types. Sampling refers to the process of discretizing raw data by converting a continuous analog signal into a discrete digital signal, making it easier to store and process.

Filtering is a process that removes or retains certain components of a signal via filters. Its mathematical principles are based on signal processing theory. Common filtering methods include moving average filtering, median filtering, IIR low-pass filtering, FIR low-pass filtering, and frequency domain filtering. For logging sensor signal filtering, FIR low-pass filtering is used.

The mathematical formula for FIR low-pass filter can be expressed as:

$$y\left(n\right)={\sum }_{k=0}^{M}h\left(k\right)x\left(n-k\right)$$
(1)

where x(n) represents the original signal, y(n) represents the filtered signal, and h(k) represents the coefficients of the filter. The order of the filter is denoted by M.

A linear phase FIR filter is adopted, and the specific formula for calculating its coefficients is as follows:

$$h(k)=\left\{\begin{array}{c}2{f}_{c},k=\frac{M}{2}\\ \frac{{\text{sin}}(2\pi {f}_{c}\left(k-\frac{M}{2}\right))}{\pi \left(k-\frac{M}{2}\right)},k\ne \frac{M}{2}\end{array}\right.$$
(2)

In the above formula, h(k) represents the coefficients of the filter, M is the order of the filter, and fc is the cutoff frequency of the filter. After calculating the filter coefficients using the aforementioned formula, they can be applied to convolution operations to filter the original signal and obtain the filtered results.
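As an illustration, the following sketch computes the coefficients of Eq. (2) and applies Eq. (1) by convolution; the filter order, cutoff frequency, and test signal are placeholder values rather than the settings used on the actual logging instrument.

```python
import numpy as np

def fir_lowpass_coeffs(M, fc):
    """Linear-phase FIR low-pass coefficients, Eq. (2).

    M  : filter order (M + 1 taps)
    fc : cutoff frequency normalized to the sampling rate (0 < fc < 0.5)
    """
    k = np.arange(M + 1)
    center = M / 2.0
    h = np.empty(M + 1)
    mid = np.isclose(k, center)          # the k == M/2 tap is handled separately
    h[mid] = 2.0 * fc
    h[~mid] = np.sin(2.0 * np.pi * fc * (k[~mid] - center)) / (np.pi * (k[~mid] - center))
    return h

def fir_filter(x, h):
    """Apply Eq. (1): y(n) = sum_k h(k) * x(n - k), i.e. convolution."""
    return np.convolve(x, h, mode="same")

# Example with synthetic data (placeholder, not field data):
fs = 100.0                                            # sampling rate, Hz
t = np.arange(0.0, 10.0, 1.0 / fs)
clean = 12.0 + 2.0 * np.sin(2.0 * np.pi * 0.2 * t)    # slowly varying "sensor" trend
noisy = clean + 0.3 * np.random.randn(t.size)         # additive measurement noise
h = fir_lowpass_coeffs(M=50, fc=1.0 / fs)             # ~1 Hz cutoff (normalized)
filtered = fir_filter(noisy, h)
```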

After the standard excitation signal generated by the high-precision standard source is applied to the logging sensor, the sensor produces an analog current signal, generally in the 4–20 mA range. With the developed signal acquisition instrument, the current signal can be read and converted into the corresponding physical quantity to complete the calibration of the sensor. Figure 2 shows the sensor signal acquisition instrument, Fig. 3 shows the electric torque sensor calibration device in operation, and Fig. 4 shows the interface of the calibration system.
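For reference, the conversion from loop current to engineering value is a simple linear scaling; the sketch below is a minimal example using an assumed 0–70 MPa span (the casing pressure sensor discussed in Sect. 4), not the instrument's actual conversion routine.

```python
def current_to_physical(i_mA, span_min=0.0, span_max=70.0, lo=4.0, hi=20.0):
    """Map a 4-20 mA loop current to the sensor's physical range.

    The default span illustrates a 0-70 MPa casing pressure sensor;
    substitute the range of the sensor actually being calibrated.
    """
    if not (lo <= i_mA <= hi):
        raise ValueError("current outside the 4-20 mA range: check sensor and wiring")
    return span_min + (i_mA - lo) / (hi - lo) * (span_max - span_min)

# Example: 12 mA corresponds to mid-scale, i.e. 35 MPa on a 0-70 MPa sensor.
pressure_MPa = current_to_physical(12.0)
```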

Fig. 2. SDA-01 Sensor Data Acquisition Instrument

Fig. 3. Calibration of electric torque sensors

Fig. 4. Interface of the calibration system for comprehensive logging instrument sensors

3 Model Establishment Based on Bayesian Optimization

After the raw data of the mud logging sensors have been obtained, Bayesian optimization is used to search for the optimal model structure and hyperparameter combination, so as to obtain a better theoretical curve.

Bayesian optimization is a black-box function optimization method commonly used in scenarios where a target function needs to be maximized or minimized. When constructing a Bayesian optimization model, we need to define a Gaussian process to describe the overall trend and uncertainty information of the target function. We also need to define a surrogate function to approximate the target function and optimize the surrogate function to find the optimal solution of the target function.

Specifically, the following steps are taken to construct the Bayesian optimization model:

(1) Define the prior distribution of the Gaussian process. In this step, a mean function and a covariance function are defined for the Gaussian process. The mean function describes the average value of the target function at different input values, while the covariance function describes the correlation between different input values. The prior typically chosen is a zero-mean Gaussian process.

(2) Update the posterior distribution of the Gaussian process based on the existing data. In this step, the mean function and covariance function of the Gaussian process are updated from the existing sample data to obtain a more accurate approximation of the target function.

(3) Calculate the next sampling point based on the surrogate function. In this step, the current Gaussian process is used to fit the target function and construct a surrogate function, and the next sampling point is selected by optimizing this surrogate. Common optimization methods include greedy search and coordinate descent.

(4) Update the posterior distribution of the Gaussian process with the new sampling point. After the new sampling point is evaluated, it is added to the existing samples, and these data are used to update the mean function and covariance function of the Gaussian process.

(5) Repeat steps (3) and (4) until the preset stopping conditions are met.

The entire process of Bayesian optimization can be mathematically expressed as follows:

$${x}_{t+1}=\underset{x}{\arg\max }\,EI\left(\left.x\right|{D}_{t}\right)=\underset{x}{\arg\max }\left({\mu }_{t}\left(x\right)-\xi {\sigma }_{t}\left(x\right)\right)$$
(3)

In the equation, \({x}_{t+1}\) is the next sampling point chosen in the iteration; \(EI\left(\left.x\right|{D}_{t}\right)\) is the expected-improvement metric, representing the expected increase in the target function value over the current best known value when x is taken as input under the Gaussian process fit; \({\mu }_{t}\left(x\right)\) and \({\sigma }_{t}(x)\) are the mean and standard deviation of the current Gaussian process at x; and ξ is a hyperparameter that controls the balance between exploration and exploitation, commonly set to 2 or 3.
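As a minimal sketch of steps (1)–(5), the loop below uses a Gaussian-process surrogate from scikit-learn and a confidence-bound acquisition of the same \({\mu }_{t}\left(x\right)-\xi {\sigma }_{t}(x)\) form as Eq. (3), here applied to minimize a placeholder objective (e.g. the validation loss of the calibration model); the objective, search interval, iteration count, and ξ are illustrative assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def objective(x):
    """Placeholder target, e.g. validation loss of the calibration model
    trained with hyperparameter value x."""
    return (x - 0.7) ** 2 + 0.05 * np.sin(20.0 * x)

bounds = (0.0, 1.0)      # search interval for the hyperparameter (illustrative)
xi = 2.0                 # exploration weight, as the xi in Eq. (3)
rng = np.random.default_rng(0)

# Step (1): a few initial samples form the data set D_t.
X = rng.uniform(bounds[0], bounds[1], size=(4, 1))
y = np.array([objective(x.item()) for x in X])

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)

for _ in range(20):
    # Steps (2)/(4): update the Gaussian-process posterior with the samples so far.
    gp.fit(X, y)
    # Step (3): optimize the acquisition (lower confidence bound mu - xi*sigma,
    # the minimization counterpart of Eq. (3)) over a dense candidate grid.
    cand = np.linspace(bounds[0], bounds[1], 500).reshape(-1, 1)
    mu, sigma = gp.predict(cand, return_std=True)
    x_next = cand[np.argmin(mu - xi * sigma)]
    # Evaluate the true objective at the new point and extend D_t.
    X = np.vstack([X, x_next.reshape(1, -1)])
    y = np.append(y, objective(x_next.item()))

best_x = X[np.argmin(y)]     # best hyperparameter found so far
```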

4 Model Training and Experimental Verification

4.1 Model Training

Neural networks are models composed of neurons that use weights, biases, and activation functions to transmit and process information. They possess strong fitting and expressive abilities, making them suitable for a wide range of machine learning and deep learning tasks. In this study, neural networks were therefore used to train on the sensor calibration data. The cross-entropy loss function and the stochastic gradient descent (SGD) optimizer were selected as key components of the neural network and combined to train a more accurate and efficient model.

Model training: a large-scale dataset is used to train the model, and the model parameters are updated continuously to improve prediction accuracy and robustness.

To train the model with a large-scale dataset, the following steps are required:

(1) Data collection and preparation: First, enough data must be obtained to train the model, and the data should be representative and cover the various situations that may occur. The data then needs to be cleaned, transformed, and normalized so that the model can process it effectively.

(2) Model selection and design: Based on the application scenario and the characteristics of the data, select an appropriate model structure and determine the parameters that need to be optimized.

(3) Loss function and optimizer selection: Depending on the task of the model, select an appropriate loss function and optimizer to evaluate the model and adjust its parameters.

(4) Batch training: Since the dataset is too large to be loaded into memory at once, the data is divided into equally sized batches, and stochastic gradient descent (SGD) is used to update the model parameters batch by batch.

(5) Batch normalization and regularization: Normalizing each mini-batch reduces shifts in the distribution of layer inputs, thereby improving the model's prediction accuracy and robustness. In addition, L1 or L2 regularization can constrain the size and number of model parameters, helping to avoid overfitting.

(6) Model evaluation and fine-tuning: Evaluate the model on the training and testing sets to determine its prediction accuracy and robustness. If problems are found, fine-tune the model, for example by changing the model structure or adjusting the loss function or optimizer.

Through these steps, the model can be trained on a large-scale dataset, with the model parameters updated continuously to improve prediction accuracy and robustness.
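To make the steps above concrete, here is a minimal mini-batch training sketch in PyTorch (one of the frameworks mentioned in the next subsection). The synthetic data, network size, learning rate, and the use of a mean-squared-error loss for the regression-style calibration mapping are assumptions for illustration; the text above names cross entropy and SGD as the components actually combined in this work.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Illustrative data: raw sensor readings -> reference values from the standard source.
# Replace with the preprocessed calibration samples collected in Sect. 2.
raw = torch.linspace(4.0, 20.0, steps=512).unsqueeze(1)            # e.g. current, mA
ref = (raw - 4.0) / 16.0 * 70.0 + 0.1 * torch.randn_like(raw)      # e.g. pressure, MPa

loader = DataLoader(TensorDataset(raw, ref), batch_size=32, shuffle=True)  # step (4)

model = nn.Sequential(               # small fully connected network (assumed size)
    nn.Linear(1, 16), nn.ReLU(),
    nn.Linear(16, 16), nn.ReLU(),
    nn.Linear(16, 1),
)

criterion = nn.MSELoss()             # regression-style loss (illustrative choice), step (3)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

for epoch in range(100):             # batch-by-batch SGD updates
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()
```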

When choosing the appropriate network architecture, number of layers, and number of nodes to establish the ANN model and initialize weights, several steps usually need to be performed:

First, clarify whether the problem is a classification or a regression problem; this determines the network structure and activation functions.

Next, specify the inputs and outputs: the number and type of the input feature vectors and of the output predicted values.

Choose an appropriate activation function for the problem type, such as sigmoid or ReLU.

Design the network structure, including the range of the number of nodes in each hidden layer and whether to use dropout.

Initialize the weights with appropriate initial values, for example Xavier initialization.

Train the model on the training dataset and adjust the parameters based on the validation set results.

Finally, evaluate the model's performance on the testing set.

When selecting the network structure and the number of nodes, the principle of Occam's razor should be followed: keep the model structure as simple as possible and the number of nodes as small as practical to prevent overfitting. Common deep learning frameworks such as TensorFlow and PyTorch can also be considered when designing the model; they provide well-tested structures, layer counts, and node configurations, as well as pre-trained weights, which reduces manual parameter tuning.
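As a sketch of these design choices (regression output, ReLU activations, optional dropout, Xavier weight initialization), a hypothetical calibration network could be defined in PyTorch as follows; the layer widths and dropout rate are assumed for illustration only.

```python
import torch
from torch import nn

class CalibrationNet(nn.Module):
    """Hypothetical ANN mapping a raw sensor reading to a calibrated value."""

    def __init__(self, hidden=32, dropout=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(1, hidden),
            nn.ReLU(),                  # ReLU chosen for a regression problem
            nn.Dropout(dropout),        # optional regularization
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),       # single continuous output
        )
        self.apply(self._init_weights)

    @staticmethod
    def _init_weights(module):
        # Xavier (Glorot) initialization of the linear layers.
        if isinstance(module, nn.Linear):
            nn.init.xavier_uniform_(module.weight)
            nn.init.zeros_(module.bias)

    def forward(self, x):
        return self.net(x)

model = CalibrationNet()
```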

Taking the casing pressure sensor as an example, with a measuring range of 0–70 MPa, it is calibrated against a standard pressure pump source calibrated by the Beijing Institute of Metrology and Measurement. The training samples are shown in Table 1.

Table 1. Training sample for calibration of casing pressure sensor.

4.2 Experimental Verification and Data Visualization

The model is tested by comparing the experimental data with the predicted data, its reliability and accuracy are evaluated, and it is adjusted and improved accordingly.

Based on the experimental test results, the data are presented as charts, and the sources and trends of the errors are analyzed to provide visual support for sensor calibration and testing.
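A minimal sketch of such a comparison plot is shown below: it computes the relative error (as a percentage of full scale) between reference and predicted values and draws both with matplotlib; the arrays are placeholders standing in for the actual verification data.

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder arrays; substitute the actual experimental and predicted values.
reference = np.linspace(0.0, 70.0, 15)                  # standard source, MPa
predicted = reference + 0.05 * np.random.randn(15)      # model output, MPa

full_scale = 70.0
rel_error = np.abs(predicted - reference) / full_scale * 100.0   # % of full scale

fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True)
ax1.plot(reference, reference, label="experimental")
ax1.plot(reference, predicted, "o--", label="AI-predicted")
ax1.set_ylabel("Pressure (MPa)")
ax1.legend()
ax2.bar(reference, rel_error, width=2.0)
ax2.set_xlabel("Applied pressure (MPa)")
ax2.set_ylabel("Relative error (% FS)")
plt.tight_layout()
plt.show()
```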

As can be seen from Fig. 5, the data predicted by the AI model agree very well with the experimental data, with a maximum error of only 0.15%, demonstrating the effectiveness of this method.

Fig. 5. Experimental verification and data visualization

With the above technical solution, artificial intelligence technology can be used to calibrate logging instrument sensors, improving testing accuracy and efficiency, reducing human error and testing costs, and providing strong support for drilling engineering in the oil and gas industry.

5 Conclusion

(1) The application of artificial intelligence to mud logging sensors can greatly improve the accuracy and reliability of the measurement results.

(2) Machine learning algorithms such as artificial neural networks (ANNs) can effectively address the nonlinearity and complex interference present in mud logging data.

(3) The calibration technology based on these algorithms has been successfully applied in real drilling engineering with excellent results. The study improves the quality of mud logging data and provides a theoretical basis and practical guidance for the further promotion and development of intelligent mud logging technology.