Keywords

1 Introduction

Experienced corrosion and material engineers have the task of defining corrosion circuits within a system based on construction materials, operating conditions and active damage mechanisms [1]. This is part of a recommended practice developed by the American Petroleum Institute (API), which outlines the basic elements to maintain a credible risk-based inspection (RBI) programmeFootnote 1. Once the circuits have been defined, the Condition Monitoring Locations (CMLs) and the Thickness Monitoring Locations (TMLs) are installed and documented on a type of engineering drawings known as a Piping and Instrumentation Diagram (P&ID). This process involves a manual mark up which becomes time-consuming and error prone, as shown in Fig. 1.

Fig. 1.
figure 1

Extract of a P&ID representing three corrosion circuits (colour sections), pipe specification (rectangles) and connection points (ellipses) (original resolution: 2048 × 080 pixels).

To define a corrosion circuit, engineers need to identify two key elements within the piping system on the P&IDs. The first one is the pipe specification (pipe spec), which is a character string formed by seven sections divided by hyphens. The second one is the connection point. This is a pair of text lines and arrows pointing towards a division line. In a specific P&ID, both the pipe spec and the connection point can be oriented in different directions (see Fig. 2a and 2b). The more complex a piping system is, the larger the amount of information displayed in the drawing. At first, our aim is to detect the text in both pipe specs and connection points, to then read it and identify the limits of the corrosion sections.

Fig. 2.
figure 2

Different types of shapes and orientations for (a) Pipe specifications and (b) Connection points detections.

In this paper, we propose a framework to develop a semi-automatic novel tool that allows the user to mark up the corrosion circuits in P&IDs based on automatically locating pipe specs and the connection points. The paper is organised as follows: Sect. 2 presents the related work, Sect. 3 the methodology, Sect. 4 the experiments and results, and Sect. 5 concludes and presents future directions.

2 Related Work

Text detection is one of the cornerstones in document image analysis, as it can lead to the location and identification of the depicted shapes. Later on, it can also help on the mapping of the structural representation, as the labels usually contain relevant information such as the direction of flow, sectioning, and other useful data [2].

Many attempts have been presented in literature to digitise P&IDs, mostly following two lines of work. The first one involves the use of heuristics to detect certain well-known shapes, such as geometrical symbols, arrows, connectors, tables and even text [3,4,5,6]. The second and most recent one relies on deep learning techniques in which the algorithms are trained to recognise shapes based on the collection and tagging of numerous samples [7,8,9,10]. Both approaches have advantages and disadvantages depending on the use case. The first family of methods is better suited when the characteristics of the P&ID follow a certain standard; however, if this is not the case, then the latter option can be used.

In the previous edition of this workshop, we presented a paper addressing the challenges and future directions of P&ID digitisation [11]. In that work, we addressed three main challenges: image quality, class imbalance and information contextualisation. Why being able to digitise drawings with reduced quality is still a work in progress, in this new challenge we focus on the two latter. On the one hand, there is a need to locate symbols and text strings which do not appear often in these drawing. Therefore, we must resort to consider heuristic and automatic solutions to properly locate the text strings and symbols which depict pipe specs and connection points respectively. On the other hand, by correctly identifying these pointers, we are able to allow the user to manually mark up the corrosion sections, bringing us one step closer to identifying the structures depicted within the engineering diagram [12].

3 Methodology

The digitisation workflow of our method is comprised of five steps:

  1. I.

    Train two different deep learning models to detect pipe specs and connection points using the YOLO v5 framework and store the detection coordinates.

  2. II.

    Create a binarised image for each detection and look over for connected components based on pixel connectivity to locate the region of interest (ROI) containing the text.

  3. III.

    Get the components statistics from the ROIs and apply a heuristic method to align the detections horizontally.

  4. IV.

    Apply the TesseractFootnote 2 Optical Character Recognition (OCR) engine with a custom configuration.

  5. V.

    Link the codes found in both pipe specs and connection points with their respective locations on the drawing.

3.1 Connection Points and Pipe Specs Detection

A pre-processing step was applied to the dataset to standardise the P&IDs and convert them into grayscale images to reduce noise. Subsequently, two different models were built to detect the pipe specs and connection points separately (see Fig. 3). The convolutional neural network selected was YOLOv5xFootnote 3 with the default configuration. Each model was trained at 3000 epochs with a batch size of 8. The marker tool runs the trained models on PyTorch to generate a cropped image of every single detection and store the positional coordinate relative to the P&ID.

Fig. 3.
figure 3

Detection boxes (a) Connection points and (b) Pipe specs.

3.2 Text Bloc Localisation

The next step is to localise the text block regions. While the text in pipe specs tends to have a consistent shape, the text in the connection points can vary considerably in size and orientation. Depending on the amount of information shown on the P&ID, textual and non-textual elements can overlap and appear on the detection boxes. Thus, to extract the ROI, a noise removal technique was introduced. Firstly, a binarisation process is applied to all the cropped images to reduce noise. Secondly, a connected component labelling (CCL) technique is used to get the shape and size information of the elements in the image. Finally, the components that fulfil a heuristic threshold criterion are kept (see Figs. 4a and 4b for more details).

Fig. 4.
figure 4

Location ROI for (a) Connection points and (b) Pipe specifications ROI.

3.3 Text Alignment

One of the limitations of the Tesseract OCR engine is that it can only interpret the text when horizontally aligned. Although this engine has a built-in configuration to assess the orientation of the text in an image, the number of characters in the detections is not enough to implement this feature. Hence, an additional method is applied to adjust those detections that are misaligned.

This method consists of a two-step process. First, the marker tool identifies the nonaligned detections, and then it determines the direction to rotate them. Given the height and width of the cropped image, we can classify which detections are vertically oriented. While this is a general rule for pipe specs, it does not apply to connection points (see Fig. 2b). Thus, a conditional criterion is added.

Figure 5 shows the rotation conditions for connection points. For each ROI, the x-coordinate of the components are calculated and checked for collinearity among themselves. If the group standard deviation falls behind a heuristically learned threshold, the image is vertically oriented. Subsequently, if the ROI is on the left side of the image, a clockwise rotation is applied. Conversely, if it is located on the right side, the flip direction is counter-clockwise.

Fig. 5.
figure 5

(a) Clock-wise and (b) Counter clockwise rotation criteria.

3.4 Text Recognition

The text strings inside the detected areas contain the code which delimits the corrosion circuits; therefore, the next stage is to read them as accurately as possible. The ROI in the image can contain text in a single or a double line (see Fig. 2b). Connection points are composed of two text blocs with three characters, whereas pipe specs contain a long code string. These aspects can affect accuracy performance. Hence, different experiments were tested on Tesseract OCR to set a custom configuration which delivered the highest accuracy. Given the size of the detections being significantly small compared to the original image, different filters were tested to remove the noise caused by this distortion. The results of these configurations will be discussed in detail in Sect. 4.2.

3.5 Linkage

The last step is to link the codes extracted in the connection points and pipe specs with their respective positional coordinates relative to the P&ID. The pair codes on the connection points are processed and stored as three-string characters (see Fig. 6). For pipe specs, since the code is right next to the fourth hyphen, this is extracted by iterating over the string of characters. Finally, both codes and coordinates are stored in a dictionary.

Fig. 6.
figure 6

The linkage between pipe specs and connection points codes.

4 Experiments and Results

The private dataset consists of 85 P&IDs provided by our industrial partner Archimech Limited. A total of 75 images were labelled, 70% were used for training, 20% for validation, and 10% for the testing, having 1653 and 537 annotations for pipe specs and connection points, respectively. The remaining 10 images from our dataset were used to test the end-to-end performance of the marker tool.

4.1 Detection

Several experiments were deployed modifying the hyperparameters. The optimal results were attained by setting the input image size to 2048 × 1080 pixels, a batch size of 8, and 3000 epochs. The confidence threshold used to run the detection was 0.4 for both models. In the end, we have achieved an accuracy of 96.23% for detecting detect pipe specs and 92.68% for connection points (see Table 1).

Table 1. Stages and their accuracies.

4.2 Text Recognition

The text recognition accuracy for pipe specs is 98.72% and 93.55% for connection points. Due to the variation in shape and the overlapping of text with graphics, it becomes more challenging to recognize the codes on the connection points. Different experiments were tested in order to improve the text recognition accuracy. The optimal performance was achieved by applying a median blur filter to the detections and setting a custom page segmentation model in the Tesseract-OCR engine. In Tables 2 and 3, we show how this method can successfully improve the accuracy text recognition for both connection points and pipe specs, respectively.

Table 2. Connection points text recognition accuracy improvement.
Table 3. Pipe specs text recognition accuracy improvement.

4.3 Final Output

The marker tool allows uploading either a single or a set of multiple P&IDs at once. After running the application, a list of all the codes detected in the drawing is deployed. The user can then select the code to visualize on the drawing and the colour of the marker. Finally, the user can mark up the corrosion circuits and save the document as a jpg file. Figure 7 shows an example of a P&ID section with three different corrosion circuits marked with our novel tool. The company works with many stakeholders which have even more P&IDs to digitise.

Fig. 7.
figure 7

Extract of a P&ID with three corrosion circuits marked using the proposed framework (original resolution: 2048 × 1080 pixels).

5 Conclusion and Future Work

In this paper, we presented an end-to-end framework which allows Oil & Gas engineers to load undigitised P&IDs and locate the corrosion circuits with minimal intervention and error. We used two models trained with a state-of-the-art deep learning technique (i.e. YOLOv5) to find the two shapes of interest, namely the pipe specs and the connection points. Once these are located, there are additional post-processing steps that have to be performed prior to presenting these symbols to the user. In the case of the connection points, the text needs to be found, oriented properly and read with total accuracy. This allows the system to identify which are the symbols of interest which will be shown to the engineer so that the corrosion circuits can be marked up.

In future work, we would like to explore the scalability of our system to perform appropriately in P&IDs generated in other standards and qualities. Moreover, we aim to test more novel deep learning frameworks which allow us to increase the accuracy of the detection and OCR tasks by using techniques that work with limited character sets [13]. Finally, we aim to automate the last step so that the engineer is shown the corrosion circuits automatically.