1 Introduction

Reliability-centred maintenance (RCM) is a widely used technique within reliability engineering practice, with over 40 years of successful application in various industries [1], including the oil and gas industry. It provides a way of selecting an appropriate maintenance policy and assessing periodicity (scheduling), with particular attention given to safety-critical equipment, especially passive equipment with ‘hidden failure’ potential such as blow-out preventers (BOPs) and downhole safety valves (DHSVs). This paper focuses mainly on the latter.

RCM is a way to identify critical items and to develop appropriate maintenance programs that maintain inherent reliability [2]. Assessment of maintenance intervals (maintenance optimization) is a main step. A description of the steps is given in Rausand et al. [3, p. 392]. The assessment covers attributes such as requirements, consequences, cost, and the probability of failure on demand (PFD).

Functional testing is normally the default for passive safety-critical equipment unless there is failure-alerting condition monitoring. For operations on the Norwegian Continental Shelf (NCS), DHSVs considered as part of the RCM analysis are normally subject to a six-month functional test interval, as recommended in NORSOK D-010 [4]. The first year is a special case, consisting of three one-month intervals followed by three three-month intervals; six-month intervals apply thereafter unless a functional failure occurs. The interval could be adjusted depending on the reliability demonstrated, but also based on the planning of other maintenance activities, i.e., grouping of maintenance tasks. Testing the functionality (proof-testing) makes it possible to detect DHSV failures before a demand, and a potentially dangerous situation, occurs. A shorter time between tests reduces the expected time between failure and detection, the idea being that shorter intervals favour reliability and safety. Cost aspects might pull in the opposite direction. Testing could also cause maintenance-induced failures, which challenges the actual reliability and safety benefits of frequent testing.

When optimising maintenance, the potential for maintenance-induced failures could be included as part of an imperfect repair model, although the influence is often ignored in practice [5]. According to Dekker [6], a main reason is the lack of adequate tools and methods for identifying such failures, in addition to the lack of good data. The lack of sufficient degradation experience leads to extensive use of the exponential failure distribution when estimating the DHSV mean time to failure and PFD. However, for DHSVs such a distribution might not be realistic, as the industry-recommended test schedule indicates a higher failure rate during the first year of operation. A main objective of this paper is to consider the use of an age-based model, comparing the exponential and Weibull distributions as well as different test strategies, for the DHSV maintenance optimisation part of RCM. A Weibull distribution might be more realistic and is often pointed to in qualification, but it is also considerably harder to integrate analytically. The comparison serves as input for the development of an imperfect repair model for scheduling DHSV proof-tests, in which the effect of maintenance-induced failures may be studied.

2 Imperfect Repair Modelling

A typical objective function when optimizing maintenance is cost, with the optimization criterion typically expressed as the expected cost over a time period. Reliability might also serve as a criterion, e.g., the number of failures or the PFD. When the prime focus is safety, a key concern is to ensure acceptable safety integrity performance. In general, the criteria can be expressed through time-based models, which are widely applied in the context of RCM, such as P-F interval models, block-replacement models, and age-replacement models (see model descriptions in, e.g., Lindqvist [7], Moubray [1], and Rausand et al. [3, p. 549]). The influence of maintenance-induced failures for passive safety equipment is discussed in, e.g., Hafver et al. [8], where two BOP test schemes are compared: a scheme with constant testing intervals and a scheme with adaptive scheduling. Adaptive scheduling means that the time to the next test is adjusted to compensate for changes in failure rate or failure frequency during the operational cycle in order to maintain reliability. Such flexibility allows for more efficient test scheduling.

In modelling of repairable systems, such as DHSVs, the impact of imperfect repair is typically reflected by adjusting the failure frequency w(t) immediately after a repair action. The basis is that the failure frequency immediately after the repair should reflect the quality of the maintenance. The w(t) is sometimes referred to as the rate of occurrence of failures (ROCOF).

One of the simplest ways is to assign a repair efficiency function (or index) for the maintenance quality, i.e., the probability of maximum condition improvement, and link this to w(t) or the PFD. This could be modelled as a binary outcome by assigning ρ(t) as the probability of perfect repair (as-good-as-new), with 1−ρ(t) as the probability of the item condition being as it was immediately before the failure occurred (as-bad-as-old). Such a concept could be applied for any maintenance policy. The concept could also be extended to multiple states, as suggested in, e.g., Doyen and Gaudoin [9]. It represents a type of imperfect repair modelling based on w(t) adjustment, where each maintenance event could lead to perfect repair (at best), minimal repair (at worst), or something in between. However, the steepness of the failure rate curve at time t, w’(t), is not influenced by the adjustments. A different way to adjust is to make time the basis for w(t) reduction. This could change the steepness of w(t), where the reduction factor or function moves the failure rate (virtually) back in time (age reduction). However, such an adjustment will not have much effect under the assumption of an exponential failure distribution, where w(t) is constant. The reduction factor gives the percentage of time reduction based on either the time elapsed since the previous repair or the total elapsed time. The adjustment is then limited to the development recorded for the earlier period.
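The two time-based adjustments described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the model used later in the paper: the function names, the reduction value of 0.5, and the Weibull parameters are chosen only for the example. The first variant reduces only the time elapsed since the previous repair, the second reduces the total accumulated virtual age (in the imperfect repair literature these are often called Kijima type I and type II models).

```python
import math

def weibull_hazard(t, lam, k):
    """Failure rate z(t) for F(t) = 1 - exp(-(lam * t) ** k); assumes k >= 1."""
    return k * lam * (lam * t) ** (k - 1)

def virtual_age(prev_age, elapsed, reduction, basis="since_repair"):
    """Virtual age immediately after a repair.

    reduction is the fraction of time removed by the repair, in [0, 1].
    basis="since_repair": only the time since the previous repair is reduced.
    basis="total": the whole accumulated virtual age is reduced.
    """
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must lie in [0, 1]")
    if basis == "since_repair":
        return prev_age + (1.0 - reduction) * elapsed
    if basis == "total":
        return (1.0 - reduction) * (prev_age + elapsed)
    raise ValueError("basis must be 'since_repair' or 'total'")

# Illustrative values only (assumptions for this sketch).
LAM, K, TAU, REDUCTION = 6.68e-6, 1.3, 4320.0, 0.5

for basis in ("since_repair", "total"):
    age = 0.0
    for cycle in range(1, 4):
        age = virtual_age(age, TAU, REDUCTION, basis)
        print(basis, cycle, round(age, 1), weibull_hazard(age, LAM, K))
```

With k = 1 the hazard returned by `weibull_hazard` is the same at every virtual age, which is the exponential case noted in the text.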

Assuming that failures are revealed only by proof-testing at intervals of length τ, and that the item is as-good-as-new after repair, the average PFD can be calculated from:

$$PFD_{AVG} = \frac{1}{\tau }\int_{0}^{\tau } {F\left( t \right)}dt$$
(1)

where F(t) = P(T < t) is the PFD at time t, with T denoting the time to failure. For each cycle, and immediately after the repair, the item is virtually moved back to t = 0. However, if the repair is imperfect, the PFDAVG will increase for each test cycle of length τ unless the failure rate is decreasing. F(t) can be calculated with reference to the failure rate z(t):

$${\text{F}}\left( {\text{t}} \right) = {\text{PFD}} = 1 - {\text{exp}}\left[ { - \int_{0}^{t} {z\left( t \right)dt} } \right]$$
(2)

In (2), if z(t) is constant, z(t) = λ, the integral equals λ · t, giving the exponential distribution. F(t) for the Weibull distribution is presented in Sect. 4. Following each cycle of tests, the underlying failure density distribution and F(t) may then change depending on the repair quality. Let Fi(t) denote the PFD in the interval from immediately after test number i−1 to test i. For constant test intervals with length τ: F1(τ) → F2(2τ) → F3(3τ) → … → Fn-1[(n−1)τ] → Fn(nτ).
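Equations (1) and (2) can be evaluated numerically as in the following sketch. It is an illustration only: the function names and the midpoint rule are choices made for this example, λ and k are the values used in Sect. 4, and τ = 4320 h assumes a six-month interval of 180 days. For the exponential case the integral in (1) has the closed form 1 − (1 − e^{−λτ})/(λτ), which for small λτ is close to the familiar approximation λτ/2.

```python
import math

def cdf_exponential(t, lam):
    """F(t) from (2) with constant failure rate z(t) = lam."""
    return 1.0 - math.exp(-lam * t)

def cdf_weibull(t, lam, k):
    """F(t) = 1 - exp(-(lam * t) ** k), rate parameter lam and shape k."""
    return 1.0 - math.exp(-((lam * t) ** k))

def pfd_avg(cdf, tau, steps=10_000):
    """Midpoint-rule approximation of (1/tau) * integral_0^tau F(t) dt, eq. (1)."""
    h = tau / steps
    return sum(cdf((j + 0.5) * h) for j in range(steps)) * h / tau

def pfd_avg_exponential_closed_form(lam, tau):
    """Exact value of (1) for the exponential distribution."""
    return 1.0 - (1.0 - math.exp(-lam * tau)) / (lam * tau)

LAM = 6.68e-6   # per hour
K = 1.3         # Weibull shape
TAU = 4320.0    # hours; assumes six months of 30 days each

print("exponential, numeric:", pfd_avg(lambda t: cdf_exponential(t, LAM), TAU))
print("exponential, closed :", pfd_avg_exponential_closed_form(LAM, TAU))
print("lambda * tau / 2    :", LAM * TAU / 2)
print("Weibull, numeric    :", pfd_avg(lambda t: cdf_weibull(t, LAM, K), TAU))
```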

A change in Fi would then reflect the change in z(t) induced by the maintenance. In addition, the maintenance effect could be modelled by adjusting the virtual start time for cycle i. Instead of starting at t = 0, the cycle could start at a point in time that reflects the current equipment condition at start-up, so as to match the PFD. Such modelling may allow for adjustment of the initial cycle level and of the development up to the next test. The PFDAVG for cycle n specifically could be expressed as:

$$PFD_{AVG} \left( {\text{n}} \right) = \frac{1}{\tau }\int_{S\left( n \right)}^{S\left( n \right) + \tau } {F_{n} \left( t \right)} dt;\;\;{\text{for}}\;n > 0$$
(3)

where S(n) is a function assigning the virtual start-up time of cycle n. This is used as the basis for modelling the effects of maintenance-induced DHSV failures in Sect. 4. For more details on different imperfect repair models, we refer to the reviews in, e.g., Wang and Pham [10, p. 13] and Pham and Wang [11]. See also Rausand et al. [3, p. 455] and Nakagawa [12, p. 171; 13].

3 The Downhole Safety Valve Situation at NCS

The DHSV is a main barrier element in offshore wells and plays a key role in safety management for oil and gas facilities. Each year the Petroleum Safety Authority Norway (PSA) publishes a risk level report that includes a DHSV reliability status. Figure 1 gives an overview of the development in the fraction of failed tests from 2002 to 2019, based on data reported by the oil and gas companies to the PSA (see [14]).

In total, 89,514 tests are registered over the period, with 2,582 failures; see also Table 1. As indicated by the figure, there is a significant increase in the fraction of failures per test, and the data show that 35 out of 80 facilities have a fraction above the critical level of 0.02 in 2019. For the full period, almost half of the facilities are above this critical level. Assuming a six-month test interval, this gives a constant failure rate λ = \(6.68 \cdot 10^{-6}\,\text{/h}\), meaning an expected time to failure of 17.3 years. Due to confidentiality issues, field-specific data are not presented.
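The failure rate quoted above can be reproduced with a short calculation. The sketch below is illustrative and rests on two assumptions that are not stated in the text: that λ is estimated as the fraction of failed tests divided by the test interval τ (the small-λτ approximation of the probability of failure within an interval), and that months are counted as 30 days, so that six months is 4320 h and a year is 8640 h. Under these assumptions the quoted values of λ and the expected time to failure are recovered after rounding.

```python
TESTS = 89_514      # registered tests, 2002 to 2019
FAILURES = 2_582    # registered failures in the same period

HOURS_PER_MONTH = 30 * 24            # assumption: 30-day months
TAU = 6 * HOURS_PER_MONTH            # six-month test interval, 4320 h
HOURS_PER_YEAR = 12 * HOURS_PER_MONTH

fraction_failed = FAILURES / TESTS   # fraction of failed tests
lam = fraction_failed / TAU          # constant failure rate per hour
mttf_years = 1.0 / lam / HOURS_PER_YEAR

print(f"fraction of failed tests: {fraction_failed:.4f}")
print(f"lambda: {lam:.3e} per hour")
print(f"expected time to failure: {mttf_years:.1f} years")
```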

Fig. 1. Fraction of DHSV failures on number of tests

According to the PSA reporting, previous analysis shows that facilities with more than 20 years of operation are more prone to DHSV failures than younger facilities (for the period 2008 to 2017). The analysis also shows that facilities with 6 to 20 years of operation have a significantly lower fraction of failures, supporting a lower failure frequency in the middle of the lifetime.

Table 1. Number of DHSV tests with failures at the Norwegian Continental Shelf

A main challenge for the failed valves is the failure mode ‘leakage in closed position’. A presentation by Molnes [15] shows that this specific failure mode accounts for around 35% of historical failures, making it the dominant one. It represents a type of failure that could be traced to the maintenance activity and the number of tests performed. The challenge is primarily the robustness of the seal. The number of tests may influence the performance, but so may the time between tests if there are long dormant periods (as with one-year test intervals) in which fluids or sand particles erode the seal, inducing ‘sticking’ and insufficient closing ability. Several publications, e.g., Vick et al. [16] and Vinzant et al. [17], focus on the sealing technology as a performance-limiting factor. We refer to Selvik and Abrahamsen [18] for a review of DHSV reliability.

4 Modelling the Effect of Maintenance-Induced DHSV Failures

The modelling presented in this section is based on (3), where the PFDAVG for each cycle is interpreted as the mean value for the specific period. The overall PFDAVG can then be calculated as the arithmetic mean over the n periods of length τ:

$$Overall\;PFD_{AVG} \left( n \right) = \frac{1}{n}\sum\nolimits_{i = 1}^{n} {\frac{1}{\tau }} \int_{S\left( i \right)}^{S\left( i \right) + \tau } {F_{i} \left( t \right)dt}$$
(4)

For a test cycle i, S(i) is here seen as a function assigning a value in [0, ∞) based on the stresses accumulated by the i tests performed, assuming no use of the valves except the test demands, with S(i−1) ≤ S(i) for 0 < i ≤ n. As such, S(i) captures the tests’ probability of reducing DHSV condition or functionality, with 0 as ‘as-good-as-new’, τ as ‘as-bad-as-old’, and values above τ as a decrease in condition and, at worst, a complete loss of function. It depends on the number of tests, the quality of the tests, and the condition at the test. One way is to specify periods of equal length from τ. We select here m = 45 as the number of cycles assumed before the valve is in an ‘as-bad-as-old’ condition after testing, giving (in hours) when τ is six months:

$${\text{S(i)}} = \frac{\tau }{m}\left( {i - 1} \right) = 96.0\left( {i - 1} \right)$$
(5)
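As a small illustration of (5), the sketch below evaluates the virtual start-up time for a few cycles. The function name is hypothetical, and τ = 4320 h assumes 30-day months, which is the value implied by the factor 96.0 in (5) with m = 45. After m tests, i.e., at the start of cycle m + 1, the virtual start-up time equals τ, which is the ‘as-bad-as-old’ level described above.

```python
TAU_HOURS = 6 * 30 * 24   # six months as 4320 h (assumes 30-day months)
M_CYCLES = 45             # cycles before 'as-bad-as-old', as selected in the text

def virtual_start(i, tau=TAU_HOURS, m=M_CYCLES):
    """Virtual start-up time S(i) of cycle i according to (5), in hours."""
    if i < 1:
        raise ValueError("cycle number i must be at least 1")
    return tau / m * (i - 1)

for i in (1, 2, 10, 46):
    print(f"S({i}) = {virtual_start(i):.1f} h")
```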

We also assume F(t) to be independent of the cycle number, i.e., F0(t) equals Fi(t). As input, we refer to the failure data presented in Sect. 3 and use these as the field-specific results. Regarding F(t), we consider two common distributions: the exponential distribution with parameter λ = \(6.68 \cdot 10^{-6}\,\text{h}^{-1}\) (giving a mean time to failure MTTF = 17.3 years), and the Weibull distribution with rate parameter λ (scale parameter 1/λ) and shape parameter k = 1.3, with F(t) = \(1 - \exp[-(t \cdot \lambda)^{k}]\) (MTTF = 15.8 years). The Weibull k parameter is derived from available qualification testing results. The comparison is then achieved by considering three distinct testing strategies: one with shorter test intervals during the first year, a constant six-month strategy, and a maximum 12-month strategy.

Under the assumption of a constant failure rate λ and fixed test intervals τi = τ, the overall PFDAVG in (4) can be expressed as:

$$Overall\;PFD_{AVG} \left( n \right) = \frac{1}{n}\sum\nolimits_{i = 1}^{n} {\frac{1}{{\tau_{i} }}\int_{S\left( i \right)}^{{S\left( i \right) + \tau_{i} }} {\left( {1 - e^{{ - \lambda \cdot{\text{t}}}} } \right)dt} }$$
(6)
$$= \frac{1}{n\tau }\sum\nolimits_{i = 1}^{n} {\left[ {t + \frac{1}{\lambda }e^{ - \lambda t} } \right]}_{S\left( i \right)}^{S\left( i \right) + \tau } = 1 + \frac{1}{n\tau }\sum\nolimits_{i = 1}^{n} {\frac{1}{\lambda }\left( {e^{ - \lambda \cdot \left( {S\left( i \right) + \tau } \right)} - e^{ - \lambda \cdot S\left( i \right)} } \right)}$$
(7)
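The closed form in (7) can be checked against a direct numerical integration of (6), as in the sketch below. The function names, the midpoint rule, and the choice of 78 six-month cycles (39 years) are assumptions made for the example; S(i) follows (5) with τ = 4320 h.

```python
import math

def overall_pfd_closed_form(lam, tau, starts):
    """Overall PFD_AVG from (7) for an exponential distribution."""
    n = len(starts)
    total = sum(
        (math.exp(-lam * (s + tau)) - math.exp(-lam * s)) / lam for s in starts
    )
    return 1.0 + total / (n * tau)

def overall_pfd_numeric(lam, tau, starts, steps=2000):
    """Overall PFD_AVG from (6) using the midpoint rule in each cycle."""
    h = tau / steps
    acc = 0.0
    for s in starts:
        cycle = sum(1.0 - math.exp(-lam * (s + (j + 0.5) * h)) for j in range(steps))
        acc += cycle * h / tau
    return acc / len(starts)

LAM = 6.68e-6                                     # per hour
TAU = 4320.0                                      # hours, six months of 30 days
STARTS = [96.0 * (i - 1) for i in range(1, 79)]   # S(i) from (5), 78 cycles

print("closed form (7):", overall_pfd_closed_form(LAM, TAU, STARTS))
print("numeric (6)    :", overall_pfd_numeric(LAM, TAU, STARTS))
```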

The overall PFDAVG(n) when applying the Weibull distribution can then be expressed as:

$$Overall\;PFD_{AVG} \left( n \right) = \frac{1}{n}\sum\nolimits_{i = 1}^{n} {\frac{1}{{\tau_{i} }}} \int_{S\left( i \right)}^{{S\left( i \right) + \tau_{i} }} {\left( {1 - e^{{ - ({\uplambda }_{i} {\text{ t}})^{{k_{i} }} }} } \right)dt}$$
(8)
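Unlike the exponential case, the integral in (8) has no elementary closed form, so a numerical evaluation is a natural choice. The sketch below uses a composite Simpson rule; the function names and the 78-cycle horizon are assumptions for the example, and λ and k are held constant over the cycles, in line with the assumption that F(t) is independent of the cycle number.

```python
import math

def weibull_cdf(t, lam, k):
    """F(t) = 1 - exp(-(lam * t) ** k)."""
    return 1.0 - math.exp(-((lam * t) ** k))

def simpson(f, a, b, n=200):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for j in range(1, n):
        total += (4 if j % 2 else 2) * f(a + j * h)
    return total * h / 3.0

def overall_pfd_weibull(lam, k, intervals, starts):
    """Overall PFD_AVG from (8) with constant lam and k across cycles."""
    total = 0.0
    for tau_i, s_i in zip(intervals, starts):
        total += simpson(lambda t: weibull_cdf(t, lam, k), s_i, s_i + tau_i) / tau_i
    return total / len(intervals)

LAM, K, TAU = 6.68e-6, 1.3, 4320.0
N_CYCLES = 78                                      # 39 years of six-month cycles
INTERVALS = [TAU] * N_CYCLES
STARTS = [96.0 * (i - 1) for i in range(1, N_CYCLES + 1)]   # S(i) from (5)

print("Weibull, k = 1.3 :", overall_pfd_weibull(LAM, K, INTERVALS, STARTS))
print("Exponential, k = 1:", overall_pfd_weibull(LAM, 1.0, INTERVALS, STARTS))
```

Setting k = 1 reduces the Weibull distribution to the exponential one, which gives a simple consistency check against (7).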

In practical applications, the failure probability will be higher initially. However, this is not included here. The effect would be a higher probability in the first year, although it is uncertain how large this effect would be.

When assuming an exponential distribution, the probability of failure at each test is as shown in Fig. 2. Longer test intervals have a higher probability of failure from the beginning, but the difference diminishes over the cycles due to maintenance-induced wear. Due to the maintenance-induced failures, the PFD at the start of a cycle will be larger for shorter test intervals. The PFDAVG for a short test interval policy will therefore become greater than the PFDAVG for a longer interval policy earlier than Fig. 2 might indicate. The 6-month interval policy will have a higher PFDAVG than the 12-month interval policy after 21.8 years, and with more frequent initial testing already after 17.8 years.

Fig. 2. Probability of the DHSV being in a failed state for a test with exponential distribution, conditioned on that it was functioning at the start of the interval, when the test is performed either at 6 or 12-month intervals or 6-month intervals with more frequent tests initially.

When considering the Weibull distribution, an effect of wear can be seen in Fig. 3. The more frequent initial test intervals give a lower probability of being in a failed state at a test during the early years, at the cost of a slightly higher probability due to test-induced wear in the later years. There is also a slight curvature to the lines in the plot, caused by the increase in failure rate due to the time-adjusted aging. For the Weibull situation, the 6-month interval policy will have a higher PFDAVG than the 12-month interval policy after 23.0 years, and with more frequent initial testing after 19.2 years; 1.4 years later than for the exponential.

Fig. 3. Probability of the DHSV being in a failed state for a test with Weibull distribution, when the test is performed either at 6 or 12-month intervals or 6-month intervals with more frequent tests initially.

Alternative modelling approaches are shown below. Figure 4 shows a Weibull distribution where the scale parameter is reduced by a fixed amount per interval. Even though the four additional tests performed in the first year change the probability by only a small amount initially, the effect grows larger later in life. If the scale is instead reduced by a fixed percentage per test, a similar increase in the difference is seen, as shown in Fig. 5. Finally, Fig. 6 shows a scenario where the maintenance activity does not modify the distribution; instead, an independent, increasing probability of failure due to the maintenance itself is added to the failures that can occur within the interval. In this case each maintenance activity causes a fixed increase that remains constant over time.
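The three alternatives can be sketched as follows. This is one possible reading of the descriptions above, not the implementation behind Figs. 4 to 6: the function names are hypothetical, the step size, percentage, and induced-failure probability are made-up values (the text does not report the ones used), and the induced failure is combined with the wear-related probability as an independent event.

```python
import math

def weibull_cdf(t, scale, k):
    """F(t) = 1 - exp(-(t / scale) ** k), with scale = 1 / lambda."""
    return 1.0 - math.exp(-((t / scale) ** k))

def scale_fixed_step(i, scale0, step):
    """Scale before test i when each earlier test removes a fixed amount."""
    scale = scale0 - step * (i - 1)
    if scale <= 0.0:
        raise ValueError("scale parameter exhausted; step too large")
    return scale

def scale_percent(i, scale0, fraction):
    """Scale before test i when each earlier test removes a fixed fraction."""
    return scale0 * (1.0 - fraction) ** (i - 1)

def add_induced(p_wear, p_induced):
    """Combine wear-related failure with an independent test-induced failure."""
    return 1.0 - (1.0 - p_wear) * (1.0 - p_induced)

SCALE0 = 1.0 / 6.68e-6   # hours, from the rate parameter in the text
K = 1.3
TAU = 4320.0             # hours, six months of 30 days
STEP = 500.0             # hypothetical: hours of scale lost per test
FRACTION = 0.005         # hypothetical: fraction of scale lost per test
P_INDUCED = 0.001        # hypothetical: induced-failure probability per test

for i in (1, 20, 78):
    p_fixed = weibull_cdf(TAU, scale_fixed_step(i, SCALE0, STEP), K)
    p_pct = weibull_cdf(TAU, scale_percent(i, SCALE0, FRACTION), K)
    p_ind = add_induced(weibull_cdf(TAU, SCALE0, K), P_INDUCED)
    print(i, p_fixed, p_pct, p_ind)
```

In the first two variants the probability at a test grows with the number of tests already performed, whereas in the third the added contribution per test is the same in every cycle.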

Fig. 4. Probability of DHSV being in a failed state for a test using Weibull distribution, where the scale parameter is reduced linearly by each test, when the test is performed either at 6 or 12-month intervals or 6-month intervals with more frequent tests initially.

Fig. 5. Probability of DHSV being in a failed state for a test using Weibull distribution, where the scale parameter is reduced by a percent for each test, when the test is performed either at 6 or 12-month intervals or 6-month intervals with more frequent tests initially.

Fig. 6. Probability of DHSV being in a failed state for a test using Weibull distribution, where the maintenance activity can cause failures independently of wear state, when the test is performed either at 6 or 12-month intervals or 6-month intervals with more frequent tests initially.

The overall PFDAVG for the different cases is shown in Table 2 below.

Table 2. PFDAVG for different maintenance schemes and different maintenance-induced failure models.

Table 2 thus presents the average probability of failure over the 39 years calculated for the different models. The maintenance scheme with 6-month intervals gives the lowest values in most cases, although for failures independent of running time, longer intervals are better with the exponential distribution.

5 Discussion

5.1 The Effect of Maintenance-Induced DHSV Failures

A key aspect is the difference in effect between the test policies. As seen in the above calculations, frequent testing in the initial phase will reduce the maximum probability of being in a failed state at a test, at the cost of slightly higher probabilities in later years. From the overall average probability of failure on demand, it is seen that 12-month intervals are worse off in most cases. However, this depends on the severity of the damage caused by the testing. It is nevertheless clear that maintenance scheduling is a trade-off between reducing the highest probability of being in a failed state (such as during the initial phase) and reducing the overall probability of being in a failed state. This depends on the failure distribution as well.

How maintenance-induced failures occur matters. In most of the examples where increased testing shifts the valve towards a state of increased wear, either through time adjustment or through parameter adjustment, the changes seen are not drastic but accumulate to a greater overall probability of failure over the lifetime. In the examples where failures are actually induced by maintenance, rather than maintenance merely causing an increase in failure rate during the next cycle, the failure probability shifts the curves upwards, which illustrates how frequent initial testing can deteriorate the valves quickly. Similar behaviour is difficult to achieve with models that simply “age” the valves when wear-out failures occur relatively late in the example distribution used. Distributions with finite support (i.e., components with a limited lifespan) would be able to produce much stronger differences.

5.2 RCM Value

Modelling maintenance-induced failures is important for understanding whether frequent initial testing is worth the wear it causes in order to reduce the probability of being in a failed state within a test interval. If there is a fixed cost (in terms of failures) to wear, such as a constant probability of failure due to the maintenance, then the frequency of maintenance can be chosen by focusing on keeping the maximum probability of failure in an interval as low as possible, without considering additional long-term effects. In other cases, the long-term effect of maintenance should be considered, as the total probability of failure during the time the valve is installed can be larger.

When discussing maintenance, other aspects can also be considered, such as the associated costs and the needs. If there is greater uncertainty in subsurface conditions during the early stages, there is also a greater need for a functioning DHSV. If the uncertainty is low, and the probability of a demand for the DHSV is therefore low, the condition of the valve is less critical. In such situations one could consider having more frequent tests when a demand is more likely, and increasing the test intervals when a demand is less likely in order to avoid excessive wear.

Several deterministic models are shown here to illustrate different alternatives for modelling maintenance-induced failures. When making decisions related to maintenance based on reliability modelling, the uncertainty of these models, and of the underlying failure causes they attempt to represent, must be considered.

6 Conclusions

In this paper, DHSV failures and imperfect repair modelling have been presented, including a dataset from the NCS. A Weibull distribution and an exponential distribution based on this dataset were then used to illustrate the effect of different models for maintenance-induced DHSV failures. The results illustrate the cost that frequent testing can have on later reliability. To make sound decisions regarding testing, it is important that the uncertainty around maintenance-induced failures, wear during operation, and their interactions is also included.