
1 Introduction

In the context of “Made in China 2025” [16] and industrial upgrading, many fields face the demand for digital, intelligent, and international transformation. In this process, engineering drawings, as information carriers for design and manufacturing, play an important role. For example, in the fields of architecture, manufacturing, and energy, engineering drawings contain important information on design structure and component details.

The title blocks of engineering drawings record important information, including the designer, design date, and product parameters. Such information can serve as a reference for new products or be used to improve the original drawings directly, thus shortening production cycles and reducing research and development costs.

Nowadays, with the development of industrial manufacturing, a huge number of engineering drawings have accumulated, but these drawings are difficult to manage. Although CAD software is widely used in the design process, most drawings in fabrication plants exist as scanned copies, prints, and other images. The title block contains unstructured information presented in various styles and cannot be recognized directly by computer systems; it can only be scanned and queried manually, which greatly reduces the efficiency of engineering drawing utilization and hinders the digital preservation and management of engineering drawings. So far, no solution to this problem is available on the market. Therefore, software that provides a quick way to manage drawing information together with a retrieval function is urgently needed. To build such a solution, we combine table detection, table line extraction, cell extraction, character segmentation, optical character recognition, and key information extraction.

For table recognition, many methods based on deep learning have been proposed. Gilani et al. [2] implemented and improved the Faster R-CNN [11] model and proposed an image transformation method that converts a table image into a natural image. Huang et al. [5] improved the precision of table recognition by introducing adaptive adjustments, an anchor optimization strategy, and a post-processing method into the original YOLOv3 [10] model. Qasim et al. [9] proposed an architecture based on graph networks for table recognition and formulated table recognition as a graph problem.

For table structure recognition, Suzuki et al. [1] proposed a border-following algorithm that detects the borders in binary images and can be used to detect table cells. Siddiqui et al. [13] used FCN-based semantic segmentation to recognize the rows and columns of tables. Based on the transformer architecture, Nassar et al. [8] presented a new end-to-end model with a transformer-based decoder, which can handle complex column/row-header configurations, multiline rows, varied separation lines, missing entries, etc.

For character segmentation, many methods based on traditional image processing can segment images with merged characters, noise interference, and unknown sizes. For instance, the histogram-based character segmentation method can segment characters both horizontally and vertically with high accuracy. Many methods based on connected components can also segment characters well.

For optical character recognition, many models based on recurrent neural networks and the transformer architecture have been proposed to recognize characters in printed and natural images. Shi et al. [12] presented the CRNN model, which handles sequences of arbitrary length without requiring character segmentation or horizontal scale normalization. Li et al. [7] proposed an end-to-end model named TrOCR, which is based on the transformer architecture without using a CNN backbone.

The studies mentioned above provide partial solutions to the problem of identifying and managing information in the tables of engineering drawings and provide algorithmic processes for drawing information extraction. Although the software available on the market can achieve information extraction, content recognition, and text reconstruction of tables, it has some shortcomings. Firstly, it does not provide solutions in an engineering context. Secondly, it has no automatic positioning function to detect the table. Thirdly, it can only recognize tables in certain formats, such as title blocks with less dense lines. In addition, it does not provide functions for data management and key information extraction.

The rest of the paper is organized as follows: Sect. 2 describes the software structure. Section 3 explains the key algorithms used in ED Manager. Section 4 illustrates the improvement of the histogram projection method and an innovative algorithm called the dots-connection method. Section 5 presents the software's recognition results on different drawings and a comparative analysis with other software. Section 6 concludes the paper and provides some directions for future research.

The contributions of this paper are summarized as follows:

(1) We improved the character segmentation algorithm to extract words from a sentence with a traditional image processing method.

(2) We improved the table line extraction algorithm to avoid the problem of text misidentification due to text sticking.

(3) Based on deep learning and traditional image processing techniques, the software extracts both structure information and content information, reconstructs the tables, and builds a data management system supporting keyword search, which provides a solution to the problem of digital preservation and information management of engineering drawings.

Fig. 1. Software framework

2 Overview of ED Manager

2.1 Software Framework

Figure 1 shows the framework of ED Manager, which can be divided into three main parts: the user interface layer, the information processing layer, and the data management layer.

The user interface layer provides basic operating functions and interfaces to the other layers' functions. After reading the instructions, the user can operate the software through this layer, for example importing and editing images, selecting processing regions, and viewing the final result.

The information processing layer reconstructs the table and extracts the key information through four main steps. The input of this layer is a selected image. The output is an HTML and a CSV file transmitted to the user interface layer for instant checking and to the data management layer for long-term storage.

The data management layer defines several operations for users to add, delete, modify, and query the stored data. These functions are shown graphically on the user interface layer. The data management layer contains three kinds of data: images, text, and table files. Images come from user input and are used for long-term storage and result checking; text comes from the key information extraction in the information processing layer; table files come from the table reconstruction step in the information processing layer and include both the text content and the location information.

2.2 Workflow of Information Processing Layer

The workflow of the key algorithms is shown in the red box area of Fig. 2, and the four main steps are image preprocessing, structural information extraction, text information extraction, and table reconstruction.

Fig. 2. Workflow of information processing layer

(1) Image preprocessing: operations at this stage include binarization, resizing, and enhancement of the whole image.

(2) Structural information extraction: this step includes table region location, area preprocessing, table line extraction, and cell extraction. Table region location automatically locates the tables on the image; the final table region can also be chosen and modified manually. Table line extraction extracts the table lines and reconstructs the table. Cell extraction extracts each cell with the findContours algorithm.

(3) Text information extraction: character segmentation and text recognition are applied to each cell to convert the optical characters into a digital format; after that, key information such as the title, date, and designers is identified, extracted, and stored in the database.

(4) Table reconstruction: table reconstruction combines the location information and the text information to reconstruct the comprehensive information of the table.

3 Key Algorithms

3.1 Image Preprocessing

Image preprocessing improves the effectiveness of information extraction and the computational efficiency, and is usually a necessary step before the key image processing algorithms. Specifically, grayscale conversion and binarization reduce the amount of computation; erosion and dilation filter out the text and extract straight lines; the median filter removes noise points; the image pyramid expresses the image at multiple resolutions so that the segmentation algorithm fits images of various sizes. These methods mainly follow the book Digital Image Processing [3].
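As an illustration, the following is a minimal preprocessing sketch in Python with OpenCV; the Otsu threshold, the 3-pixel median kernel, and the pyramid-resizing cutoff of 4000 pixels are our assumptions, not ED Manager's exact parameters.

```python
import cv2

def preprocess(image_path):
    """Minimal sketch: grayscale, binarization, median filter, pyramid resize."""
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)          # reduce computation
    # Invert so that table lines and characters become white on black
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)
    denoised = cv2.medianBlur(binary, 3)                  # remove noise points
    # Image pyramid: halve the resolution of very large scans
    if max(denoised.shape) > 4000:
        denoised = cv2.pyrDown(denoised)
    return denoised
```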

3.2 Structure Information

Table Detection. In ED Manager, the detection of title blocks in the drawings is treated as a table detection task, i.e., locating tables in an image with bounding boxes. We directly apply the X101 model trained on the TableBank dataset [6] for this task. TableBank is an image-based table detection and recognition dataset containing 417K high-quality labeled tables. The X101 model is trained on this dataset with the object detection framework Detectron2 [14], using the Faster R-CNN algorithm with ResNeXt [15] as the backbone network architecture. Other training parameters are specified in [6]. Figure 3 shows the performance of the model on table detection.
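A minimal inference sketch with Detectron2 is shown below; the config and weight file paths are placeholders for the released TableBank X101 files, and the score threshold is an assumption.

```python
import cv2
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
cfg.merge_from_file("X101/config.yaml")      # placeholder path to the TableBank config
cfg.MODEL.WEIGHTS = "X101/model_final.pth"   # placeholder path to the released weights
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.8  # assumed confidence threshold

predictor = DefaultPredictor(cfg)
image = cv2.imread("drawing.png")
outputs = predictor(image)
# Each detected table is an (x1, y1, x2, y2) box in pixel coordinates
table_boxes = outputs["instances"].pred_boxes.tensor.cpu().numpy()
```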

Fig. 3. Results of table detection

Table Line Extraction. In ED Manager, three methods based on traditional image processing are used to implement table line extraction: the Hough function method, the dilating and eroding method, and an innovative dots-connection method.

(1) The Hough transformation method uses the Hough function to transfer a line in the image coordinates (\(x\,-\,y\)) into a point in the parametric coordinates (\(\rho \,-\,\theta \)). The transformation is

$$\begin{aligned} y=kx+b \implies \rho =x\cos \theta +y\sin \theta \end{aligned}$$
(1)

Specifically, the method can be divided into three steps: using the Hough function to transform the discretized points; discretizing the parametric coordinates and generating a Hough matrix; and finding the peak values of the Hough matrix.

The advantages of the Hough transformation method include strong anti-interference ability and high tolerance of gaps. The disadvantages include high time and space complexity and the loss of length information.

(2) The dilating and eroding method uses OpenCV with a specific kernel to extract the horizontal and vertical lines.

In order to extract the horizontal and vertical lines of images of different sizes, an adaptive algorithm is applied to choose an appropriate kernel size; a sketch of this step is given after this list.

(3) The innovative dots-connection method uses the dilating and eroding method to extract the corner dots of the table and uses the connected component method to delete misidentified table lines. The method eliminates the influence of character size, table aspect ratio, and complex structure on table line extraction.
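As referenced in item (2) above, the following is a minimal sketch of the dilating and eroding method with an adaptive kernel; the divisor of 30 is an assumed scaling rule, not the exact one used in ED Manager.

```python
import cv2

def extract_lines(binary):
    """Sketch of line extraction with a kernel size adapted to the image size."""
    h, w = binary.shape
    horiz_size = max(w // 30, 1)   # kernel length scales with image width
    vert_size = max(h // 30, 1)    # kernel length scales with image height
    horiz_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (horiz_size, 1))
    vert_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1, vert_size))
    # Erosion removes strokes shorter than the kernel; dilation restores length
    horizontal = cv2.dilate(cv2.erode(binary, horiz_kernel), horiz_kernel)
    vertical = cv2.dilate(cv2.erode(binary, vert_kernel), vert_kernel)
    return horizontal, vertical
```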

Cell Extraction. In ED Manager, the findContours function in OpenCV is used to extract the topological structure of a binary image and convert the image into a border representation. findContours implements a border-following algorithm that scans a binary image by the raster scan method and can distinguish between outer borders and hole borders. With this algorithm, we can extract the surrounding relationship between the boundaries, which corresponds to the surrounding relationship between the connected components [1].

In ED Manager, the function returns the inflection points of each contour and saves them in a contour vector. By finding the upper-left and lower-right corners from the returned contour information, we can locate the corresponding areas in the original image and crop and save them, thus achieving cell extraction.
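A minimal sketch of cell extraction with findContours is given below; the minimum-area filter is an assumption added to suppress spurious contours.

```python
import cv2

def extract_cells(table_mask, original):
    """Sketch: crop each cell region found by the border-following algorithm."""
    # RETR_CCOMP distinguishes outer borders from hole borders (cell interiors)
    contours, _ = cv2.findContours(table_mask, cv2.RETR_CCOMP,
                                   cv2.CHAIN_APPROX_SIMPLE)
    cells = []
    for contour in contours:
        x, y, w, h = cv2.boundingRect(contour)  # upper-left corner plus size
        if w * h > 100:                          # assumed filter for tiny contours
            cells.append(original[y:y + h, x:x + w])
    return cells
```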

3.3 Content Information

Character Segmentation. The segmentation process consists of two parts: horizontal segmentation and vertical segmentation. The former divides a cell into lines of words; the latter segments a line into words. Both parts apply the histogram projection method, which counts the number of white pixels in each row (or column). If successive rows (or columns) contain a large number of white pixels in the histogram, characters are present in this area and those rows (or columns) should be cut out as a whole. Figure 4 shows the projection histogram and the images after horizontal segmentation and vertical segmentation.
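The following sketch illustrates horizontal segmentation by histogram projection; the noise threshold min_pixels is an assumption, and vertical segmentation works the same way on the column axis.

```python
import numpy as np

def horizontal_segments(binary, min_pixels=2):
    """Sketch: count white pixels per row and cut at empty gaps."""
    row_counts = np.count_nonzero(binary, axis=1)  # white pixels in each row
    segments, start = [], None
    for i, count in enumerate(row_counts):
        if count > min_pixels and start is None:
            start = i                              # a text line begins
        elif count <= min_pixels and start is not None:
            segments.append((start, i))            # the text line ends
            start = None
    if start is not None:
        segments.append((start, len(row_counts)))
    return segments  # (top, bottom) row range of each text line
```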

Fig. 4. Results of character segmentation

Character Recognition. In ED Manager, to recognize a word instead of a single character, a CRNN with LSTM [4] (long short-term memory) is used to recognize words of arbitrary length with high accuracy. The CRNN-LSTM model combines a feature extraction network with a bidirectional LSTM, whose long-term and short-term memory avoids the error back-flow problem. Therefore, the CRNN-LSTM model can efficiently recognize arbitrary words by considering the previous and subsequent input.
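For illustration, a minimal CRNN sketch in PyTorch is given below; the layer sizes and the per-column output (suitable for a CTC loss) are common CRNN choices and are assumptions rather than ED Manager's exact configuration.

```python
import torch.nn as nn

class CRNN(nn.Module):
    """Minimal CRNN sketch: CNN features + bidirectional LSTM + linear head."""
    def __init__(self, num_classes, img_height=32):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
        )
        feat_height = img_height // 4             # height after two poolings
        self.rnn = nn.LSTM(128 * feat_height, 256,
                           bidirectional=True, batch_first=True)
        self.fc = nn.Linear(512, num_classes)     # 512 = 2 directions x 256

    def forward(self, x):              # x: (batch, 1, height, width)
        f = self.cnn(x)                # (batch, 128, height/4, width/4)
        b, c, h, w = f.shape
        f = f.permute(0, 3, 1, 2).reshape(b, w, c * h)  # one step per column
        out, _ = self.rnn(f)           # bidirectional: past and future context
        return self.fc(out)            # per-column class scores
```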

For ED Manager, we adopt three datasets to train the model. Firstly, an OCR dataset containing English words from Kaggle is used. Secondly, we use TRDG (TextRecognitionDataGenerator), a synthetic data generator for text recognition, together with an English word list to generate more data. Thirdly, we use pygame to generate a dataset containing out-of-order characters. By combining the three datasets, the CRNN-LSTM model can recognize various English words accurately. Figure 5 shows the characters that ED Manager is able to recognize.
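A minimal sketch of generating synthetic samples, assuming the trdg package's GeneratorFromStrings interface; the word list and sample count here are placeholders, not the ones used for ED Manager.

```python
from trdg.generators import GeneratorFromStrings

# Placeholder vocabulary; ED Manager uses a full English word list
words = ["DRAWN", "CHECKED", "APPROVED", "SCALE", "SHEET"]
generator = GeneratorFromStrings(words, count=1000, size=32)

for i, (image, label) in enumerate(generator):
    image.save(f"train/{i}_{label}.png")  # PIL image with its ground-truth text
```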

Fig. 5. Supported recognized characters

Key Information Extraction. To construct an effective database, we extract five types of key information from the drawing: designer name, company name, project title, drawing date, and drawing number.

The key information is extracted by searching with proper keywords. For example, by searching for the characters ‘No.’, we can directly find the cell with the drawing number, as shown in Fig. 6.
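The following sketch illustrates this keyword search; the keyword map and the assumption that the value lies in the cell to the right of the keyword cell are illustrative, not ED Manager's exact rules.

```python
# Illustrative keyword map; `cells` is assumed to be a list of recognized
# (text, row, column) tuples produced by the previous steps.
KEYWORDS = {
    "drawing_number": ["No.", "DWG NO"],
    "date": ["Date"],
    "designer": ["Drawn", "Designed"],
}

def extract_key_info(cells):
    info = {}
    for text, row, col in cells:
        for field, keys in KEYWORDS.items():
            if any(k.lower() in text.lower() for k in keys):
                # Assume the value sits in the cell right of the keyword cell
                value = next((t for t, r, c in cells
                              if r == row and c == col + 1), text)
                info[field] = value
    return info
```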

Fig. 6. Examples of drawing number

4 Improvement

4.1 Improvement of the Histogram Projection Method

To achieve a better performance of character segmentation, we improve the histogram projection method in the following aspects:

(1) The top/bottom 10 rows (left/right 10 columns) are not considered in the histogram. The extracted cell often has white table lines around the characters, which should be ignored for the purpose of character recognition; otherwise, the white lines would be taken as characters and could yield errors in the recognition process.

(2) We dilate the characters horizontally before vertical segmentation. The dilation brings the characters of a word closer together and avoids wrongly cutting a word in the middle. A sketch of the improved vertical segmentation follows this list.
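A minimal sketch of the improved vertical segmentation, assuming a 9-pixel-wide dilation kernel and a small noise threshold; both values are assumptions.

```python
import cv2
import numpy as np

def vertical_segments(line_img, margin=10, min_pixels=2):
    """Sketch: trim the 10-column margins, dilate horizontally so the letters
    of a word merge into one block, then cut on empty columns."""
    core = (line_img[:, margin:-margin]
            if line_img.shape[1] > 2 * margin else line_img)
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (9, 1))  # assumed width
    merged = cv2.dilate(core, kernel)
    col_counts = np.count_nonzero(merged, axis=0)
    segments, start = [], None
    for i, count in enumerate(col_counts):
        if count > min_pixels and start is None:
            start = i
        elif count <= min_pixels and start is not None:
            segments.append((start + margin, i + margin))  # original coordinates
            start = None
    if start is not None:
        segments.append((start + margin, len(col_counts) + margin))
    return segments  # (left, right) column range of each word
```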

4.2 Dots-Connection Method

The dots-connection method combines the dilating and eroding method, the connected components algorithm, and prior knowledge of the table to extract table lines more robustly and to avoid extracting misidentified table lines caused by large-size characters.

Figure 7 shows the workflow of the dots-connection method.

Fig. 7. Workflow of dots-connection method

In the workflow, the dots-connection algorithm can be divided into five steps. (1) Use the dilating and eroding method to extract horizontal and vertical lines in advance. (2) Obtain the intersections of the vertical and horizontal lines with a bitwise AND operation. (3) Since the second step produces some misidentified areas, which are the intersections of large-size characters, we calculate the connected component areas and set a threshold to delete the large ones. (4) To delete misidentified corner points, we check whether each point is isolated. For a non-isolated point, there is at least one other point in the same row and one in the same column. To delete an isolated point, we set its coordinate equal to the previous point. (5) Connect the selected corner points to extract the table lines.

Steps 3 and 4 are the major improvements of the dots-connection method. Since the intersections of large-size characters tend to be large adhesion areas, while the corner points' areas are much smaller, the connected component algorithm can be used to calculate the areas of the intersections and delete the large ones. Also, with the prior knowledge that at least two points are needed to determine a table line, we can further delete misidentified corner points.
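The following sketch implements steps 2 to 4; the area threshold and the row/column alignment tolerance are assumptions.

```python
import cv2

def dots_connection(horizontal, vertical, max_dot_area=50):
    """Sketch of the dots-connection method on pre-extracted line images."""
    # Step 2: intersections of the horizontal and vertical line images
    dots = cv2.bitwise_and(horizontal, vertical)
    # Step 3: drop large connected components (intersections of big characters)
    n, labels, stats, _ = cv2.connectedComponentsWithStats(dots)
    points = []
    for i in range(1, n):  # label 0 is the background
        if stats[i, cv2.CC_STAT_AREA] <= max_dot_area:
            points.append((stats[i, cv2.CC_STAT_LEFT],
                           stats[i, cv2.CC_STAT_TOP]))
    # Step 4: keep only non-isolated points -- a real corner shares its row
    # with at least one other point and its column with at least one other
    def aligned(a, b, tol=3):
        return abs(a - b) <= tol
    corners = [p for p in points
               if any(q != p and aligned(p[1], q[1]) for q in points)
               and any(q != p and aligned(p[0], q[0]) for q in points)]
    return corners  # step 5 connects these corners into table lines
```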

Compared with the Hough function method and the dilating and eroding method, our algorithm achieves better performance, especially for tables with complex structures, extreme aspect ratios, and large-size characters. Also, the time and space complexity of our algorithm are lower than those of the Hough function method.

Figure 8 shows the different algorithms' performance for tables with complex structures, extreme aspect ratios, and large-size characters. The experiment illustrates that the dots-connection method performs better, since the former two methods cannot eliminate the misidentified lines caused by large-size characters, which affect character segmentation and character recognition.

Fig. 8. Different algorithms’ performance for tables with complex structures, extreme aspect ratios, and large-size characters

Furthermore, the dots-connection method not only improves the accuracy of extraction but also guarantees its speed. Table 1 compares the three algorithms on eight different tables (H represents the Hough function method, D-E the dilating and eroding method, and D-C the dots-connection method). The error rate is defined as the ratio of misidentified cells to the total number of cells. The experiment shows that the accuracy of the dots-connection method is greater than that of the other two methods in nearly all situations. Since the method relies on the dilating and eroding method to obtain the intersections, it is slower than that method, yet it is faster than the Hough function method.

Table 1. Results of three methods
Table 2. Comparison of different software

5 Experiments and Analysis

To evaluate the performance of ED Manager in table reconstruction, we compare it with some other OCR software, including Tencent OCR, Baimiao OCR, Youdao AI and iFLYTEK AI.

For each piece of software mentioned above, we test 11 images and count the number of cells that are not reconstructed correctly. The total number of cells in each image is also counted for reference. The results are shown in Table 2 (‘–’ means the table cannot be reconstructed at all; ER represents the error rate). As Table 2 illustrates, the error rate of ED Manager is the lowest among all the software, which shows the accuracy of our algorithm.

We notice that the aspect ratio of some images is extremely large, and some OCR software has great difficulty in reconstructing such images. However, the dots-connection method enables ED Manager to extract this kind of table structure information accurately. Figure 9 shows the reconstruction results for one image with an extreme aspect ratio. The results show that, compared with other software, our dots-connection method can extract tables with complex structures, extreme aspect ratios, and large characters well.

Fig. 9. Reconstruction results for image 4

6 Conclusion

In this work, we presented ED Manager, a software tool based on the fusion of deep learning and traditional image processing, which detects, recognizes, and manages the tables of engineering drawings automatically with high speed and accuracy.

ED Manager performs various tasks including table detection, table structure extraction, table content recognition, and table data management. For table structure extraction, an improved method called dots-connection was presented to extract lines from tables with complex structures, extreme aspect ratios, and large characters. Compared with both the existing methods and software, the dots-connection method achieves better performance.

In future work, we plan to enlarge the character recognition dataset to improve the accuracy of the CRNN-LSTM. We also intend to improve the dots-connection method to fit more situations, such as tables without ruling lines. Furthermore, we hope to develop Android and iOS applications and a website to provide a variety of interfaces.