1 Introduction

With the continuous growth of global energy demand and the depletion of oilfield resources, there is an urgent need to achieve sustained and stable production in oilfields through automation and intelligent production optimization management. The sucker rod well is the most commonly employed artificial lift system in the oil and gas industry, and liquid production serves as a vital metric for evaluating well productivity and determining optimal production parameters. However, automating the measurement of liquid production in sucker rod wells is challenging due to the complexity of the process flow and high maintenance costs of traditional mechanical liquid measurement mode [1], which hinder oilfield automation management.

With the development of electronic measurement technology and the industrial Internet of Things (IoT) [2], virtual flow measurement based on dynamometer cards in sucker rod wells has gained wide acceptance in the oilfield industry since the 1980s [3], owing to its low cost, acceptable error range, and remote operability. This physics-driven model uses a mathematical model of sucker rod vibration to solve for the downhole pump dynamometer card, which indicates the pumping performance of the downhole pump. By analyzing the effective stroke of the pump plunger and quantifying various parameters, the effective liquid production at the wellhead can be calculated. In practical applications, the accuracy and stability of this method can be affected by the simplified theoretical models of the system and the complex working conditions of the downhole pump. To address these problems, various advanced approaches have been proposed. In 2013, Lyu et al. [4] proposed an interactive method for obtaining pump valve points based on prior knowledge of dynamometer cards and manual experience, which reduces the effective stroke error. In 2020, Yin et al. [5] proposed an easily applied analytical solution for predicting the behavior of multi-tapered sucker-rod pumping systems, which provides a more precise description of the motion characteristics of the downhole pump. In 2020, Lv et al. [6] proposed a production measurement method based on quantitative analysis of fault dynamometer cards, which effectively improved the accuracy of liquid production prediction under valve leakage conditions. Nevertheless, a gap that is difficult to quantify remains between physical models based on prior information and the real world. This gap leads to inaccuracies in liquid measurement, and further optimization is necessary to address these issues.

In recent years, artificial intelligence (AI) technology has emerged as the engine driving the “Fourth Industrial Revolution,” and it has played a significant role in the digital transformation and intelligent development of the oil and gas industry. Machine learning methods, with their intelligence, simplicity, and efficiency, are widely utilized to address traditional engineering problems [7]. In 2019, Ruiz et al. [8] employed fuzzy logic (FL) and artificial neural networks (ANN) to interpolate oil well data and select the most effective features for predicting production. In 2021, Pan et al. [9] combined convolutional neural networks (CNN) and long short-term memory neural networks (LSTM) alongside attention mechanisms to forecast production with time series data derived from the oil well.

Dynamometer cards are the most effective indicator data for characterizing the motion of the sucker rod well system, and applying machine learning models to them has significantly improved the accuracy of fault identification and liquid production measurement. In 2020, Peng et al. [10] employed a deep autoencoder to extract high-dimensional features from dynamometer cards, aiming to overcome the limitations of traditional manual feature extraction methods. In 2022, Zhang et al. [11] addressed the need of traditional dynamometer card diagnosis for a large number of samples by proposing a small-sample diagnosis framework based on meta transfer learning. However, whether machine learning or deep learning, these data-driven models work by exploring and exploiting the underlying patterns in the data. Their drawback is that they often lack explanations grounded in real-world physical significance and may suffer from overfitting and limited generalization capabilities. Furthermore, current methods for liquid production prediction generally lack the strong theoretical foundation provided by measurement based on dynamometer cards.

In this work, we present a hybrid-driven prediction model for the liquid production of sucker rod wells that integrates physical and data-driven models using an attention mechanism. A mathematical model is employed to solve for the dynamometer cards of the downhole pump, and quantitative analysis is then conducted on the cards to extract physical features that characterize the pump's operational state and theoretical displacement. This ensures that the hybrid model possesses reliable global characteristics. To address limitations in the physical model and quantitative analysis, ResNet is used to extract high-dimensional features from the surface dynamometer cards. The attention mechanism concentrates on effective features and reduces the impact of low-contributing and ineffective features, which supports the accuracy and robustness of the hybrid model.

The remaining work of the paper is arranged as follows. Section 2 introduces theoretical methods in oil production engineering and machine learning. Section 2.1 discusses the production measurement based on dynamometer cards, while Sects. 2.2 and 2.3 present the fundamental theories of the deep learning network ResNet and attention mechanism. Section 3 introduces the hybrid model for production prediction. Section 3.1 presents the detailed structure of the hybrid model. Section 3.2 elaborates on the establishment of the physical model and the steps for extracting physical features. Section 3.3 describes the modeling approach of the hybrid-driven model based on ResNet and the attention mechanism. Section 4 validates the performance of the model through comparative experiments and ablation study. Section 5 summarizes the main contributions of this paper and provides an outlook for future work.

2 Methodology

2.1 Production Measurement Based on Dynamometer Cards

Calculation of Downhole Pump Dynamometer Card

The surface pumping unit is connected to the downhole pump via sucker rods, enabling reciprocating motion. The displacement and load of the surface pumping unit's hanging point are recorded using a dynamometer card. However, the downhole pump is subject to various disturbances, forces, and torques, resulting in vibration or impact phenomena. Therefore, the surface dynamometer card cannot accurately depict the downhole pump's motion characteristics and operational state. Consequently, it is necessary to establish a model of the sucker rod well motion system to mathematically convert the surface dynamometer card into a downhole pump dynamometer card.

The sucker rod well motion system model is a mathematical model that describes the dynamic characteristics of a pumping unit well system. The Gibbs [12] model utilizes a wave equation with viscous damping as the fundamental differential equation to describe the dynamic behavior of the sucker rod:

$$ \frac{{\partial^{2} U\left( {x,t} \right)}}{{\partial t^{2} }} = a^{2} \frac{{\partial^{2} U\left( {x,t} \right)}}{{\partial x^{2} }} - c\frac{\partial U(x,t)}{{\partial t}} $$
(1)

where, \(U\left(x,t\right)\) is the displacement of any cross-section (\(x\)) of the sucker rod column at any given time (\(t\)), m; \(a\) is the stress wave propagation velocity, m/s; \(c\) is equivalent damping factor, 1/s.

The dynamic load function of the hanging point expressed by the truncated Fourier series and the displacement function of the light rod are used as the boundary conditions, and the motion equation of the cross-section of the sucker rod at any depth can be obtained by the separation variable method:

$$ U(x,t) = \frac{{\sigma_{0} }}{{2EA_{r} }}x + \frac{{\nu_{0} }}{2} + \sum\limits_{n = 1}^{N} {[O_{n} (x)\cos n\omega t + P_{n} (x)\sin n\omega t]} $$
(2)

where, \(E\) is the Young’s modulus of the sucker rod, Pa; \(A_{r}\) is the rod string cross-sectional area, m²; \(\omega\) is the angular frequency of the pumping cycle, rad/s; \(\sigma_{0}\), \(\nu_{0}\), \(O_{n} (x)\) and \(P_{n} (x)\) are all Fourier coefficients.
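The Fourier coefficients that enter the boundary conditions come from truncated series fitted to the sampled surface load and displacement. The following is a minimal sketch of that fitting step only, assuming one pumping cycle sampled at uniformly spaced instants; the function name `fourier_coefficients` and its arguments are hypothetical and are not taken from the original method.

```python
import math


def fourier_coefficients(samples, n_terms):
    """Fit f(t) ~ a0/2 + sum_n [a_n cos(n w t) + b_n sin(n w t)].

    `samples` holds one full period at uniformly spaced instants
    (an assumption of this sketch). Returns (a0, [a_1..a_N], [b_1..b_N]).
    """
    k = len(samples)
    a0 = 2.0 / k * sum(samples)
    a, b = [], []
    for n in range(1, n_terms + 1):
        # Phase of harmonic n at sample j is 2*pi*n*j/k.
        a.append(2.0 / k * sum(s * math.cos(2 * math.pi * n * j / k)
                               for j, s in enumerate(samples)))
        b.append(2.0 / k * sum(s * math.sin(2 * math.pi * n * j / k)
                               for j, s in enumerate(samples)))
    return a0, a, b


# Synthetic signal with known coefficients: 3 + 2 cos(wt) + sin(2wt).
K = 64
signal = [3 + 2 * math.cos(2 * math.pi * j / K) + math.sin(4 * math.pi * j / K)
          for j in range(K)]
a0, a, b = fourier_coefficients(signal, n_terms=2)
```

Applied to the surface load record, `a0` would play the role of \(\sigma_{0}\); applied to the polished rod displacement, it would play the role of \(\nu_{0}\).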

According to Hooke's law, the time-varying dynamic load on that section can be determined:

$$ F(x,t) = EA_{r} \left[ {\frac{{\sigma_{0} }}{{2EA_{r} }} + \sum\limits_{n = 1}^{N} {\left( {\frac{{\partial O_{n} (x)}}{\partial x}\cos n\omega t + \frac{{\partial P_{n} (x)}}{\partial x}\sin n\omega t} \right)} } \right] $$
(3)

where, \(F\left(x,t\right)\) is the dynamic load on any cross-section at a given depth (\(x\)) of the sucker rod, N. At time \(t\), the total load on the cross-section at depth (\(x\)) is equal to the sum of the dynamic load \(F\left(x,t\right)\) and the weight of the sucker rods below the x-section.

A conversion example is shown in Fig. 1. The downhole pump dynamometer card exhibits a smoother and more stable shape by eliminating the deformation of the sucker rod column, rod friction, vibrations, and inertia. This will facilitate quantitative analysis of the pump dynamometer card to determine the effective stroke of the plunger \(S_{p}\).

Fig. 1. Downhole pump dynamometer card conversion

Calculation of Sucker Rod Well Production

The effective plunger stroke \({S}_{p}\) is primarily determined based on the position of the valve opening and closing points on the pump dynamometer card. Typically, the smaller displacement difference between the traveling valve switching point and the standing valve switching point is used as \({S}_{p}\). For example, in Fig. 2(a)(b)(c), the length of segment AD is considered the effective stroke, while during plunger unloading, the \({S}_{p}\) corresponds to the smaller length of segment BC.

Therefore, without considering the conditions of tubing leakage and pump leakage, the actual daily production at the wellhead of a pumping unit well can be calculated using the following equation:

$$ Q = 1440\frac{{\pi D_{p}^{2} }}{4}S_{p} NB_{l} $$
(4)

where, \(Q\) is the daily production rate, m³/d; \(D_{p}\) is the diameter of the pump, m; \(S_{p}\) is the effective plunger stroke, m; \(N\) is the stroke number, min⁻¹; \(B_{l}\) is the volume coefficient of the crude oil with dissolved gas.
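As a quick illustration of Eq. (4), the sketch below evaluates the formula directly. The function name, argument names, and the sample well parameters are hypothetical and chosen only for illustration.

```python
import math


def daily_production(pump_diameter_m, effective_stroke_m, strokes_per_min,
                     volume_coeff=1.0):
    """Daily liquid production in m^3/d following Eq. (4).

    1440 is the number of minutes per day, so the stroke number is in 1/min.
    """
    plunger_area = math.pi * pump_diameter_m ** 2 / 4.0
    return (1440.0 * plunger_area * effective_stroke_m
            * strokes_per_min * volume_coeff)


# Hypothetical well: 44 mm pump, 3 m effective stroke, 6 strokes per minute.
q = daily_production(0.044, 3.0, 6.0)
print(f"{q:.2f} m^3/d")
```

Because \(Q\) scales linearly with \(S_{p}\), any relative error in locating the valve switching points carries over directly to the computed production.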

Fig. 2. Typical plunger effective stroke. a: Normal. b: Liquid pound. c: Gas effect. d: Plunger removal pump

2.2 Residual Neural Networks

ResNet, introduced by He et al. [13] in 2015, is a deep convolutional neural network structure. It was specifically designed to tackle the problems of gradient vanishing and gradient explosion during deep neural network training, enabling more efficient training of deeper networks.

The core concept of ResNet is the incorporation of residual connections, also known as skip connections. These connections enable direct flow of information from shallower layers to deeper layers, preventing the loss or degradation of information within the network. The basic building block of ResNet is the residual block, as depicted in Fig. 3. It consists of two main components: identity mapping and residual mapping.

Fig. 3. Residual learning: a building block

When the number of channels of the identity mapping \(x_{i}\) is the same as the residual mapping \(F(x_{i} )\), the output of the residual block can be obtained using the following equation:

$$ x_{i + 1} = x_{i} + F(x_{i} ,w_{i} ) $$
(5)

When the number of channels is different, dimension matching is required by applying a convolutional kernel \(W_{s}\) to adjust the dimensions:

$$ x_{i + 1} = W_{s} \cdot x_{i} + F(x_{i} ,w_{i} ) $$
(6)
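The two cases in Eqs. (5) and (6) can be sketched on plain vectors. This is a simplified illustration rather than a convolutional implementation: the residual mapping is passed in as an arbitrary function, and the projection \(W_{s}\) is represented by a small matrix. All names are hypothetical.

```python
def residual_block(x, residual_fn, w_s=None):
    """Return shortcut(x) + F(x), following Eq. (5) or Eq. (6).

    x           : list of floats (input features)
    residual_fn : callable implementing the residual mapping F
    w_s         : optional projection matrix (list of rows) used only
                  when F changes the feature dimension, as in Eq. (6)
    """
    fx = residual_fn(x)
    if w_s is None:
        shortcut = x                                   # identity mapping
    else:
        shortcut = [sum(w * xi for w, xi in zip(row, x)) for row in w_s]
    if len(shortcut) != len(fx):
        raise ValueError("shortcut and residual dimensions differ")
    return [s + f for s, f in zip(shortcut, fx)]


# Eq. (5): same dimension, identity shortcut.
same_dim = residual_block([1.0, 2.0], lambda v: [0.5 * a for a in v])

# Eq. (6): F maps 2 features to 3, so the shortcut is projected by W_s.
projected = residual_block(
    [1.0, 2.0],
    lambda v: [0.0, 0.0, 0.0],
    w_s=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
)
```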

2.3 Attention Mechanism

The core idea of the attention mechanism is to simulate the way humans allocate attention when processing information. In traditional deep learning models, each input is assigned the same weight and attention, but this is not always the most effective approach. In contrast, the attention mechanism allows the model to dynamically adjust attention allocation based on the relevance of the inputs. The attention output is computed as:

$$ O = {\text{softmax}}\left( {\frac{{QK^{T} }}{\sqrt L }} \right)V $$
(7)

where \(O\) is the output; \(Q\) is the query formed from the input features; \(K\) and \(V\) are the keys and values, which are derived directly from the input sequence; \(L\) is the input feature length [14].
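A minimal sketch of Eq. (7) in plain Python follows. It assumes that \(L\) is the length of each query vector and that \(Q\), \(K\), and \(V\) are given as lists of rows; the function names are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(L)) V."""
    scale = math.sqrt(len(q[0]))          # L taken as the query length
    out = []
    for q_row in q:
        scores = [sum(a * b for a, b in zip(q_row, k_row)) / scale
                  for k_row in k]
        weights = softmax(scores)          # one weight per key/value pair
        out.append([sum(w * v_row[c] for w, v_row in zip(weights, v))
                    for c in range(len(v[0]))])
    return out


# Identical keys give uniform weights, so the output is the mean of V.
uniform = attention([[1.0, 0.0]], [[1.0, 1.0], [1.0, 1.0]], [[2.0], [4.0]])

# A query aligned with the first key puts almost all weight on the first value.
focused = attention([[10.0, 0.0]], [[10.0, 0.0], [0.0, 10.0]], [[1.0], [0.0]])
```

The second call shows the behaviour the text describes: inputs that are more relevant to the query receive a larger share of the weight.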

3 Hybrid Model for Production Prediction

3.1 Hybrid Model

The hybrid model for production prediction consists of several modules: an input module, a data-driven model, a physics-driven model, and an attention mechanism module. The specific architecture is shown in Fig. 4.

Fig. 4. Overview of the proposed Hybrid Model

As shown in Fig. 4, the hybrid model takes as input parameters both the surface dynamometer cards and daily production parameters, such as stroke count, pump diameter, and water cut. The surface dynamometer cards are processed by the data-driven module to extract deep features and obtain a data feature matrix that represents the high-dimensional features.

Simultaneously, the production parameters, along with the surface dynamometer cards, are analyzed by the physics-driven model. This analysis results in a physical feature matrix, which includes conventional sucker rod well production calculations and other physical characteristics.

These data and physical feature matrices serve as inputs to the attention mechanism module, where attention weights are dynamically assigned to the outputs of the data-driven and physics-driven models. The attention mechanism evaluates the relevance and importance of the predictions generated by each module, considering the specific task and input conditions.

By combining the deep features extracted from the data-driven module and the physical features obtained from the physics-driven module, the hybrid model aims to leverage the complementary strengths of both approaches. This integration enables a more comprehensive representation of the input parameters, leading to enhanced accuracy of production predictions in the context of the sucker rod well system.

3.2 Physics-Driven Model

The physics-driven model is primarily based on the conventional dynamometer card production measurement technique introduced in Sect. 2.1. As shown in Fig. 4, it begins by mathematically modeling and solving the motion system of the sucker rod well to obtain the pump dynamometer card that represents the downhole pump's motion characteristics.

Subsequently, in the feature extraction step, the pump dynamometer card is quantitatively analyzed to identify the switching positions of the traveling valve and the fixed valve. This information is then used in Eq. (4) to calculate the theoretical liquid production rate.

Fig. 5. Feature points extraction. (a) Displacement curve of the data points. (b) Slope curve of the normalized load variation of the data points. (c) Normalized pump dynamometer card.

Additionally, by combining the analysis of the displacement curve and the load slope curve of the pump dynamometer card [15], various physical features are extracted, including geometric slope, average load, valve displacement, and load. The specific steps are as follows:

Step 1: In Fig. 5(a), starting from the first data point, search for the first point with a displacement equal to 0, which corresponds to the bottom dead center (D). Also, search for the point with the maximum displacement, which corresponds to the top dead center (U).

Step 2: In Fig. 5(b), identify the point with the maximum slope as K1 and the point with the minimum slope as K2. K1 is located during the upward stroke loading process, while K2 is located during the downward stroke unloading process.

Step 3: In Fig. 5(b), starting from point K1, search forward in the data points for the first point where the slope of the load curve is approximately 0. This point corresponds to the first local maximum between the upward stroke loading process and the top dead center (U), and it is referred to as the fixed valve opening point (S1).

Step 4: In Fig. 5(b), starting from point K1, search forward in the data points until the last point before the top dead center (U) where the slope of the load curve is approximately 0. This point corresponds to the last local maximum between the upward stroke loading process and the top dead center (U), and it is referred to as the fixed valve closing point (S2).

Step 5: In Fig. 5(b), starting from point K2, search forward in the data points for the first point where the slope of the load curve is approximately 0. This point corresponds to the first local minimum between the downward stroke unloading process and the bottom dead center (D), and it is referred to as the traveling valve opening point (T1).

Step 6: In Fig. 5(b), starting from point K2, search forward in the data points until the last point before the bottom dead center (D) where the slope of the load curve is approximately 0. This point corresponds to the last local minimum between the downward stroke unloading process and the bottom dead center (D), and it is referred to as the traveling valve closing point (T2).

Step 7: In Fig. 5(c), record the load values of each data point between the fixed valve opening point (S1) and the fixed valve closing point (S2), and calculate the average load during the upward stroke.

Step 8: In Fig. 5(c), calculate the difference in displacement between the fixed valve opening point (S1) and the fixed valve closing point (S2), which corresponds to the effective stroke during the upward stroke. Also, calculate the difference in displacement between the traveling valve opening point (T1) and the traveling valve closing point (T2), which corresponds to the effective stroke during the downward stroke.

These features, along with the theoretical liquid production rate, are combined to construct the physical feature matrix.
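Steps 1 to 8 can be sketched as follows. This is an illustrative simplification and not the original implementation: it assumes the record covers exactly one cycle that starts and ends at bottom dead center, uses a forward difference for the load slope, and treats a slope as "approximately 0" when its magnitude is below a tolerance `tol`. The function name and the synthetic card are hypothetical.

```python
def extract_feature_points(disp, load, tol=1e-6):
    """Locate D, U, K1, K2, S1, S2, T1, T2 and derive simple features."""
    n = len(disp)
    lo, hi = min(load), max(load)
    norm = [(v - lo) / (hi - lo) for v in load]          # normalized load
    slope = [norm[i + 1] - norm[i] for i in range(n - 1)]

    d = next(i for i in range(n) if abs(disp[i]) <= tol)  # Step 1: first zero
    u = max(range(n), key=lambda i: disp[i])              # Step 1: max disp
    k1 = max(range(n - 1), key=lambda i: slope[i])        # Step 2: max slope
    k2 = min(range(n - 1), key=lambda i: slope[i])        # Step 2: min slope

    flat = [i for i in range(n - 1) if abs(slope[i]) <= tol]
    s1 = min(i for i in flat if k1 < i < u)               # Step 3
    s2 = max(i for i in flat if k1 < i < u)               # Step 4
    t1 = min(i for i in flat if i > k2)                   # Step 5
    t2 = max(i for i in flat if i > k2)                   # Step 6

    mean_up_load = sum(load[s1:s2 + 1]) / (s2 - s1 + 1)   # Step 7
    up_stroke = abs(disp[s2] - disp[s1])                  # Step 8
    down_stroke = abs(disp[t1] - disp[t2])                # Step 8
    return {"D": d, "U": u, "K1": k1, "K2": k2,
            "S1": s1, "S2": s2, "T1": t1, "T2": t2,
            "mean_up_load": mean_up_load,
            "up_stroke": up_stroke, "down_stroke": down_stroke}


# Synthetic, idealized pump card (16 points, normalized units).
disp = [0.0, 0.05, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0,
        0.95, 0.9, 0.7, 0.5, 0.3, 0.1, 0.05, 0.0]
load = [0.0, 0.5, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0,
        0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
points = extract_feature_points(disp, load)
```

On measured cards the slope rarely reaches exactly zero, so the tolerance and any smoothing applied beforehand would need to be tuned; the sketch raises an error if no flat point is found in a search window.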

3.3 Data-Driven Model

From formulas (1)–(4) and the process of constructing the physical feature matrix, it can be observed that conventional production measurement technique involves numerous assumptions and quantitative analyses. However, during the actual production process of oil wells, various operating conditions and unpredictable dynamometer card deformations can adversely affect the quantitative analysis of the valve switch points, leading to deviations in the calculated effective plunger travel. Therefore, the calculation of production using empirical formulas or mathematical models inevitably introduces certain errors, especially under special operating conditions.

To address this issue, as shown in Fig. 4, this paper adopts a data-driven model to extract deep features from the dynamometer card and uses an attention mechanism to effectively integrate the physical and data-driven models.

During the training process of the data model, the dynamometer card Xn is first passed through an image input module that includes convolutional and pooling layers for initial image feature extraction:

$$ {\mathbf{X}}_{{{\mathbf{conv}}}} = f[conv({\mathbf{X}}_{{\mathbf{n}}} * {\mathbf{W}}_{c} ) + b] $$
(8)
$$ {\mathbf{X}}_{{{\mathbf{map}}}} = [\max \{ {\mathbf{X}}_{{{\mathbf{conv}}}} \} ] $$
(9)

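where \({\mathbf{X}}_{{{\mathbf{conv}}}}\) is the convolutional layer output; \({\mathbf{X}}_{{{\mathbf{map}}}}\) is the pooling layer output; \(\mathit{conv}\left(\cdot \right)\) stands for the convolution operation; \({\mathbf{W}}_{c}\) is the convolution kernel; \(f\) is the activation function; \(b\) is the bias.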
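Equations (8) and (9) can be illustrated on a small single-channel image. The sketch below assumes ReLU for the activation \(f\), a "valid" (unpadded) sliding-window product as used in deep learning frameworks, and non-overlapping 2×2 max pooling; none of these choices is specified in the text, and the function names are hypothetical.

```python
def conv2d_relu(image, kernel, bias=0.0):
    """Eq. (8): sliding-window product with the kernel, plus bias, then ReLU."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            acc = bias
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(max(0.0, acc))      # ReLU (assumed activation)
        out.append(row)
    return out


def max_pool(feature_map, size=2):
    """Eq. (9): maximum over non-overlapping size x size windows."""
    pooled = []
    for r in range(0, len(feature_map) - size + 1, size):
        row = []
        for c in range(0, len(feature_map[0]) - size + 1, size):
            row.append(max(feature_map[r + i][c + j]
                           for i in range(size) for j in range(size)))
        pooled.append(row)
    return pooled


# 5x5 binary image with two crossing diagonals; the kernel responds to "\".
image = [[1, 0, 0, 0, 1],
         [0, 1, 0, 1, 0],
         [0, 0, 1, 0, 0],
         [0, 1, 0, 1, 0],
         [1, 0, 0, 0, 1]]
kernel = [[1, 0],
          [0, 1]]
x_conv = conv2d_relu(image, kernel)
x_map = max_pool(x_conv)
```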

Precise prediction of liquid production from dynamometer card images requires accurate extraction of features, specifically the characteristics embodied in the variations of valve switch points and curves during the loading and unloading processes. To overcome the limitations inherent to multi-layer neural networks, like gradient vanishing, a residual neural network consisting of multiple residual blocks is designed to further extract high-dimensional image features.

$$ {\mathbf{X}}_{{\mathbf{L}}} = {\mathbf{X}}_{{\mathbf{l}}} + \sum\limits_{i = l}^{L - 1} F ({\mathbf{X}}_{{\mathbf{i}}} ,{\mathbf{W}}_{i} ) $$
(10)

where \({\mathbf{X}}_{{\mathbf{L}}}\) is the feature of deep unit \(L\); \({\mathbf{X}}_{{\mathbf{l}}}\) is the feature of shallow unit \(l\); other symbols have the same meaning as in formula (6).
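Equation (10) follows from applying the identity-shortcut form of Eq. (5) repeatedly. A short unrolling, using the same symbols and assuming every block between \(l\) and \(L\) uses an identity shortcut:

$$ {\mathbf{X}}_{l + 1} = {\mathbf{X}}_{l} + F({\mathbf{X}}_{l} ,{\mathbf{W}}_{l} ) $$

$$ {\mathbf{X}}_{l + 2} = {\mathbf{X}}_{l + 1} + F({\mathbf{X}}_{l + 1} ,{\mathbf{W}}_{l + 1} ) = {\mathbf{X}}_{l} + F({\mathbf{X}}_{l} ,{\mathbf{W}}_{l} ) + F({\mathbf{X}}_{l + 1} ,{\mathbf{W}}_{l + 1} ) $$

$$ {\mathbf{X}}_{L} = {\mathbf{X}}_{l} + \sum\limits_{i = l}^{L - 1} F ({\mathbf{X}}_{i} ,{\mathbf{W}}_{i} ) $$

Differentiating a loss \(\varepsilon\) (a symbol introduced here only for illustration) with respect to the shallow feature then gives

$$ \frac{\partial \varepsilon }{{\partial {\mathbf{X}}_{l} }} = \frac{\partial \varepsilon }{{\partial {\mathbf{X}}_{L} }}\left( {1 + \frac{\partial }{{\partial {\mathbf{X}}_{l} }}\sum\limits_{i = l}^{L - 1} F ({\mathbf{X}}_{i} ,{\mathbf{W}}_{i} )} \right) $$

The additive term 1 carries the gradient from the deep unit to the shallow unit directly, which is why stacked residual blocks are less prone to vanishing gradients.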

Then, the feature matrix obtained from the analysis of the physical model is connected to the data model through fully connected layers. Together, these features are fed into the attention mechanism module for the final prediction of oil production.

4 Case Study and Results

4.1 Dataset

In this study, production data from a certain oilfield in China were selected as an example for experimentation. The sample set consists of 6278 dynamometer cards and corresponding production data from 350 sucker rod wells within a period of 30 days. The dataset was subjected to mathematical and statistical analysis based on different operating conditions, as shown in Table 1.

Upon observing the sample quantities, it can be seen that the largest number of samples corresponds to normal operating conditions, followed by insufficient fluid supply situations. Due to the presence of various types of leakage conditions in pumping unit wells, such as fixed valve leakage, traveling valve leakage, and piston leakage, and the relatively low number of samples for each specific condition, the subcategories related to leakage were merged into one category for statistical analysis.

Table 1. Production statistics under different working conditions.
Table 2. Model Evaluation Results.

Based on the distribution of sample production, it can be observed that under normal operating conditions, the average production of wells is the highest, followed by the leakage condition. This indicates that most wells experiencing leakage have relatively mild leakage situations and lower leakage volumes. The condition with the lowest average production is gas influence, as in this oilfield, most wells affected by gas experience gas lock phenomena, resulting in minimal liquid production.

4.2 Evaluation Metrics

When evaluating regression algorithms, their performance is typically assessed by examining the magnitude of the differences between their predicted results and the true values. The most commonly used evaluation metrics for regression models are the Root Mean Square Error (RMSE) and the Mean Absolute Percentage Error (MAPE).

$$ RMSE = \sqrt {\frac{{\sum\limits_{i = 1}^{n} {(y_{i} - \hat{y}_{i} )^{2} } }}{n}} $$
(11)
$$ MAPE = \frac{1}{n}\sum\limits_{i = 1}^{n} {\left| {\frac{{\hat{y}_{i} - y_{i} }}{{y_{i} }}} \right|} \cdot 100\% $$
(12)

where \(y\) is the true value; \(\hat{y}\) is the predicted value from the model; \(n\) is the number of samples. Larger values of RMSE and MAPE indicate a larger difference between the predicted and true results, that is, lower prediction accuracy.
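Both metrics are straightforward to compute. A minimal sketch, with hypothetical function names and made-up production values used only for illustration:

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error, in the same unit as the target."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
                     / len(y_true))


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent. Requires nonzero y_true."""
    return 100.0 * sum(abs((p - t) / t) for t, p in zip(y_true, y_pred)) \
        / len(y_true)


# Made-up daily productions (m^3/d) and predictions.
actual = [100.0, 200.0]
predicted = [110.0, 190.0]
print(rmse(actual, predicted), mape(actual, predicted))
```

RMSE is an absolute measure, while MAPE is relative to each true value, so the two can rank models differently when production levels vary widely between wells.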

4.3 Performance Verification Based on Ablation Study

Performance verification based on ablation study is a crucial step in assessing the effectiveness and contribution of different components or factors within a machine learning or deep learning model. In this section, we will conduct model performance testing and comparisons by employing a hybrid drive model, various ablation models, and conventional benchmark machine learning models.

The ablation models used in the study include the following:

(1) DModel: This model is a data-driven approach that relies solely on ResNet as the primary component for prediction. It uses deep learning to extract features from the dynamometer cards and make predictions.

(2) PModel: In this model, only the physical feature matrix is constructed and fed into the attention mechanism model for predicting production.

(3) PDModel: This model is a modified version of the hybrid drive model, where the attention mechanism is removed.

In addition to the ablation models above, this study includes several benchmark models for comparison, each of which predicts the daily fluid production from the physical feature matrix. The benchmark models are as follows:

(1) Support Vector Machine (SVM): SVM works by finding an optimal hyperplane that separates different classes or predicts continuous values based on the data.

(2) XGBoost: XGBoost is a gradient boosting algorithm which combines the power of decision trees and gradient boosting techniques to create an ensemble of weak models that collectively make accurate predictions.

(3) Multilayer Perceptron (MLP): MLP is a type of artificial neural network with multiple layers of interconnected nodes. It is widely used for various machine learning tasks, including regression.

The experiment details are as follows: The dataset was divided into a training set and a test set in a ratio of approximately 3:1, with 4708 samples in the training set and 1570 samples in the test set. During the training process, each model underwent a random grid search to determine the best-performing model. In the testing phase, the hybrid drive model and the other six comparative models were all evaluated on the test set. The partial fitting performance of the hybrid model on the test set is illustrated in Fig. 6. The comparison of fitting between the training set and the test set is depicted in Fig. 7. The specific results of the ablation study comparison are presented in Table 2.
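The reported sample counts (4708 for training and 1570 for testing, out of 6278) correspond to a training fraction of 0.75. A minimal sketch of such a split is given below; the shuffling strategy, the seed, and the function name are assumptions, since the text does not describe how samples were assigned to each set.

```python
import random


def split_dataset(samples, train_fraction=0.75, seed=0):
    """Shuffle reproducibly, then cut into training and test subsets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)   # seeded for repeatability
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Sample indices stand in for the 6278 dynamometer cards.
train_ids, test_ids = split_dataset(range(6278))
print(len(train_ids), len(test_ids))
```

With several cards per well, splitting by well rather than by card would be a stricter alternative, because it keeps cards from the same well out of both sets at once.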

Fig. 6. Comparison of actual production and predicted production of hybrid drive model.

From Fig. 6 and Fig. 7, it can be observed that the hybrid model demonstrates satisfactory production prediction accuracy in both the training and test sets. However, there are some samples where the model predicts significantly lower production compared to the actual values. Upon further inspection of these wells, it was discovered that besides the model error, some wells were operating in a "gushing with pumping" state, where the surface production rate significantly exceeded the downhole pump's maximum theoretical displacement. This situation deviates from the overall distribution of the oil well sample set and makes it challenging for the model to predict such high production rates accurately. Therefore, the model's prediction accuracy still remains at a high level.

Analyzing the results from Table 2, it can be observed that the hybrid model exhibits the lowest RMSE and MAPE losses, indicating that the proposed model outperforms other conventional production forecasting models. The accuracy of the model on the test set reaches 95.67%, which is at least 2.43% higher than that of the baseline model.

Fig. 7. Training and Test set fit plots.

Specifically, PDModel ranks second, indicating that the attention mechanism enables the hybrid model to better capture the weight relationship between the image feature matrix and the physical feature matrix, focusing on the most influential features for production forecasting.

Moreover, compared to the baseline models that solely use the physical feature matrix and exhibit lower accuracy, PDModel leverages ResNet for deep feature extraction from the dynamometer card images, while PModel incorporates the attention mechanism to adapt the internal weights of the physical feature matrix, resulting in significantly improved production prediction accuracy.

It is worth noting that the DModel, which solely uses ResNet for extracting pump dynamometer card features, performs the poorest. This is because it lacks input of important production features specific to oil wells, such as stroke count and pump diameter.

Fig. 8. Histogram of model scores.

4.4 Performance Verification Based on Different Working Conditions

In this section, the hybrid model and baseline models will be used to predict liquid production in five different operating conditions: Normal, Insufficient supply, Gas influence, Pump Hitting, and Leak. The goal is to analyze and compare the robustness and generalization of the hybrid driving model. Detailed information about the dataset has been presented in Sect. 4.1.

From Table 3, it can be observed that the hybrid driving model exhibits excellent accuracy in the Normal operating condition. Additionally, in the abnormal operating conditions, it maintains an error of less than 10%. Compared to the corresponding optimal baseline models, it achieves an improvement of around 2% in accuracy, demonstrating good generalization and robustness. It is worth noting that although the hybrid model has a relatively high average MAPE in the Pump Hitting condition, its RMSE is only 1.72. The distribution of liquid production in the Pump Hitting samples shows that the overall liquid production of wells under this operating condition is relatively low, with an average value of 12.74 m³/d and a minimum value of only 0.49 m³/d. Therefore, when the sample size is small and the average value is low, the relative accuracy of the model prediction is strongly affected even if the RMSE is only 1.72.
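As a rough illustration of this effect, suppose a single prediction has an absolute error equal to the reported RMSE of 1.72 m³/d (an assumption made only for this example). The relative error then depends strongly on the production level of the well:

$$ \frac{1.72}{12.74} \times 100\% \approx 13.5\% ,\qquad \frac{1.72}{0.49} \times 100\% \approx 351\% $$

The same absolute error is modest for a well at the average production of this condition but exceeds the true value several times over for the lowest-producing well, which inflates the average MAPE.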

Overall, the number of samples for some special operating conditions in the dataset is relatively small, which is consistent with the uneven distribution of oil well operating conditions in actual production. Even under these conditions, the hybrid model can still predict oil well fluid production with high accuracy. If the samples were balanced manually, the accuracy of the hybrid model under special operating conditions could be expected to improve further. Alternatively, future work should consider combining more comprehensive machine learning algorithms and big data processing techniques to reduce the negative impact of sample imbalance on the overall performance of hybrid models.

Table 3. Model characteristics under different operating conditions.

5 Discussion and Conclusion

In this paper, a physical-data hybrid-driven liquid production prediction method based on the attention mechanism is proposed to solve the problem of automatic and accurate measurement of oil well liquid production. This model fuses the weight relationship of the dynamometer card image feature and the physical feature matrix, and realizes the effective combination of features through the attention mechanism. This allows the model to extract key features from multiple perspectives to better understand the relationship between well conditions and fluid production. It provides a powerful tool for oilfield automation management and intelligent development.

In future work, more machine learning algorithms will be explored to reduce the strong dependence of production prediction for sucker rod wells on the type of working condition, so as to achieve the desired accuracy under special working conditions.