Introduction

3D geological modeling is an interdisciplinary technology that integrates geology, computer science, and other disciplines. The 3D visual representation of geological bodies can intuitively simulate the shape of geological structures, geological phenomena, and the spatial distribution of geophysical and geochemical data (Chen et al. 2020; He et al. 2020). Its application field has expanded from the earliest oil and gas exploration (Li et al. 2016) to engineering geology, hydrogeology, urban geological exploration (Xiong et al. 2018), underground space management, and other fields that play a supporting role in the national economy. In recent years, with the rapid development of urbanization, geological problems have become increasingly prominent. An urban underground 3D geological model can not only accurately describe underground geological information, but also provide a decision-making basis for urban environmental monitoring, underground engineering planning, and disaster prevention and mitigation. It has become an important topic of smart city research, and it is also the foundation and supporting technology of many major scientific research projects (Wu et al. 2021).

To meet the application requirements and support the evaluation and decision-making of geological resources and the geological environment, the 3D geological model should not only accurately describe the relevant geological structures, but also be filled with various types of material components and characteristic attributes, namely the geological attribute model. This model describes the spatial distribution of geological attribute fields, including density, resistivity, salinity, porosity and other geophysical and geochemical parameters. Currently, 3D geological modeling mainly adopts a data-driven modeling method (Wu et al. 2021), and the data mainly come from high-precision original data obtained from direct observation or exploration, such as drilling data, profile data, and geological attribute data. The precision of a 3D geological model depends on the accuracy of the data. Limited by actual engineering projects, the amount of data available for modeling is limited and unevenly distributed. Even a geological model drawn automatically by an interpolation algorithm cannot adequately support the construction of a complex geological model. Therefore, the integration of geological knowledge and the guidance of expert knowledge are indispensable steps in practical production.

In China, many years of geological survey work have accumulated a large amount of multi-modal geological data (Ma et al. 2018; Wu et al. 2017). In addition to the traditional structured geological spatial databases and vector plane geological maps, there is also a large amount of unstructured document data (Wu et al. 2015). The geological survey report contains rich and valuable modeling data. For instance, the text information described in natural language can help researchers understand the regional geological structure framework; the borehole log, as a comprehensive geological map, can be used to describe the stratum name, stratum thickness, and lithology; and the geophysical data in tables can be used for the construction of geological attribute models. However, the multi-modal data in a large number of geological survey reports have not been effectively used (Zhang and Liu 2019), so the service ability and application rate of these rich geological data remain low. The sparsity of modeling data is a bottleneck of 3D geological modeling work. The descriptive information in existing geological reports contains many valuable modeling data, but it cannot be directly used for geological modeling. How to fully mine these data, understand them accurately, and use them to construct a 3D model effectively needs further study. Fully mining geological survey report information for 3D geological model construction can provide a new data source for geological modeling while reducing some manual work.

To improve the utilization efficiency of geological survey reports and make the 3D geological model more reasonable, this study takes the Bridge Group, located in the starting area of Jinan's new and old kinetic energy conversion, as the research area and proposes an information processing workflow to extract text, figure and table data from geological survey reports. The extracted data help to construct the 3D geological structure model and attribute model, intuitively reflect the underground space resources in the study area, and provide geological data support for the new and old kinetic energy conversion and for the construction of major government projects.

The remainder of this paper is structured as follows: Sect. 2 introduces related work on advances in 3D geological modeling and previous research on information extraction in the geological context; Sect. 3 provides details on the workflow for the construction of a 3D geological model based on multimodal data; Sect. 4 shows the experimental results of urban 3D geological modeling using the proposed method; Sect. 5 presents the discussion; and Sect. 6 gives conclusions and suggestions for future improvement.

Related work

Advances in the 3D geological model

Since Houlding (1994) put forward the concept of 3D geoscience modeling, developed countries such as Britain and the United States have formulated and implemented plans related to 3D geological surveys. In a scientific strategy report titled "Gateway to the Earth", the British Geological Survey proposed building a nationwide 3D geological model and expanding it to offshore areas by 2023 to meet the growing demand for Earth exploration (Council 2014). The USGS also plans to develop and apply new 3D geological models with spatiotemporal characteristics in its research plan report "Center of Excellence for Geospatial Information Science Research Plan 2013–18" (Usery 2013). In China, with the gradual completion of the geological cloud platform and the improvement of geological survey informatization, 3D geological modeling has also made progress in some fields. Shandong Province has carried out the "Perspective of Shandong" project to ascertain the urban geological structure and the status of underground space, and to build a high-precision 3D visual geological structure model, which lays the technical foundation for the exploitation of underground space and the siting of construction projects.

Currently, according to the data sources used in modeling, 3D geological modeling methods can be divided into section-based, borehole-based, geophysical-data-based, and multi-source-data-based methods, among others. The section-based approach is relatively mature and can be used for modeling complex geological bodies. The precision of the resulting 3D geological model is constrained by the number of sections: the more sections there are, the more accurate the model and the greater the workload. Hao et al. (2021) constructed a three-dimensional geological model of the Kangyangju area using the plane section modeling method with multiple constraints. Chen et al. (2018) presented a locality-based multiple-point statistics approach to reconstruct 3D geological models using 2D cross sections instead of an entire training image. The borehole-based approach is suitable for modeling layered geological bodies, but it is difficult to apply to complex geological bodies. Jiskani et al. (2018) established a 3D solid seam model to produce spatial distribution maps of coal seam thickness. The modeling method based on multi-source data comprehensively uses plane geological maps, borehole data, profile data, geophysical or geochemical data, attribute data and so on. These data complement and constrain each other, and together they improve the accuracy of the model. Zhang and Zhu (2018) proposed a collaborative analysis method using multi-source geological data and an interpolation algorithm for geological body modeling. In recent years, due to the rapid development of machine learning algorithms, many scholars have begun to explore 3D geological modeling based on machine learning (Guo et al. 2021; Jia et al. 2021; Zhou et al. 2019). Bai and Tahmasebi (2020) proposed a hybrid algorithm that combines the cross-correlation simulation method (CCSIM) with a convolutional neural network (CNN). CCSIM adopts a cross-correlation function to calculate the similarity between the reproduced patterns and the training image to obtain initial predictions. The CNN is then used to refine these initial values to increase the accuracy of hard data reproduction. Gonçalves et al. (2017) proposed a machine learning approach to the potential-field method for implicit modeling of geological structures. More relevant to this study is the work of Zhang et al. (2020), which uses information extraction technology to convert paper borehole log data into structured data that can be directly integrated into the 3D geological modeling process.

Previous information extraction in the geological context

In the field of geosciences, many scholars are exploring the research of geological text mining, which focuses on using NLP (Natural Language Processing), statistical analysis, and data mining techniques to find patterns, structures, or trends in unstructured natural language texts. Information extraction based on original unstructured data is the basis and core of text data mining. Peters et al. (2014) developed a statistical machine learning system called the PaleoDeepDive (PDD), which automatically identifies and extracts information about fossils from the scientific literature. Experiments show that the data quality generated by PDD can be comparable to that generated by humans, even when only a small amount of training data is available. Wang and Stewart (2015) built a disaster ontology and combined it with a gazetteer to process semantic information of disaster events from online news reports, extract spatiotemporal information, and obtain spatiotemporal patterns of disaster-related events to track the occurrence and evolution of natural disasters. Holden et al. (2019) developed a geological document analysis system called GeoDocA, which uses automatic text analysis technology to help geologists browse and search geological content in large document libraries.

Recently, there have been some explorations of Chinese text mining in the field of geosciences. Wang et al. (2015) extracted spatiotemporal information from online news reports with the help of a geological disaster event ontology. Wu et al. (2017) divided each document in the geological archives into content fragments through content splitting, extracted thematic and spatiotemporal features from each fragment, and stored these features in a NoSQL database, which is better suited to unstructured data management and supports content retrieval and data mining. Qiu et al. (2019) developed an unsupervised deep learning model that can automatically extract and identify geological named entities from complex geological report texts. Shi et al. (2018) proposed a text mining method based on CNN, which can learn and classify four types of geoscience text data (geology, geophysics, geochemistry, and remote sensing) and automatically extract prospecting information. Qiu et al. (2020) introduced a workflow for extracting spatiotemporal and semantic information from geological reports.

To effectively extract information from Chinese texts, many scholars have studied word segmentation methods for Chinese geological texts, which segment meaningful words from continuous Chinese text without separators (Qiu et al. 2018a, 2018b). Huang et al. (2015) developed a segmenter named GeoSegmenter, which specializes in the field of geoscience and can effectively identify both geoscience terms and general terms. In addition, research on the construction of professional domain ontologies has made considerable progress (Mantovani et al. 2020; Xu et al. 2014; Zhong et al. 2017). The introduction of a domain ontology can help to improve the accuracy of information extraction. Garcia et al. (2020) proposed a general core ontology model for the geological industry. Li et al. (2019) constructed a geological event ontology, which integrates event information in the geological field to meet the particular needs of that field. Hou et al. (2018) designed the Chinese Geological Time Scale Ontology for the temporal features within datasets, which can better resolve the semantic ambiguity of geological concepts and data.

Methodology

The main data source of this study is the geological survey report in PDF format, which is a summary report compiled by geological professionals who integrate field investigation results and indoor research data. It has the characteristics of both figures and texts. In addition to the text information described in natural language, it also has a large number of geological maps and attribute tables. These figures or tables are usually explicit or implicit representations of geological knowledge (Qiu et al. 2020). For example, the plane profile can fully reflect the stratum and geological structures at a certain depth underground; the borehole log is the basic data source for constructing a 3D geological structure model; and the table data record extremely valuable geophysical or geochemical analysis or experimental data. Therefore, this study explored the use of multimodal data fusion to construct a 3D geological model by extracting information from geological survey reports. The workflow mainly consists of five parts (Fig. 1): 1) the deconstruction of geological reports; 2) extraction and association of text information; 3) information extraction of borehole logs; 4) construction of a 3D geological structure model; 5) construction of 3D geological attribute model. The details of each part are described in the following sections.

Fig. 1 3D geological modeling flow based on multi-modal data

The deconstruction of geological reports

To further extract and analyze the different modal data, this study first deconstructed the geological survey report. Through document preprocessing, the text, figures, and tables in the report were divided, classified, and marked, while irrelevant content such as invalid characters, references, and catalogs was removed. Inspired by Holden et al. (2019), we adopted the open-source software PDF Figures 2.0 to identify and separate text and graphic content in geological reports. For a given geological report, PDF Figures 2.0 extracts the figures in JPG format and stores the document hierarchy in a JSON file, which records the location and content of each chapter title and the location and caption of each figure and table (Clark and Divvala 2016). Second, the Camelot tool was used to extract high-value tabular data and export them to various formats (e.g., JSON, Excel, HTML, or an SQLite database) for the construction of geological attribute models. Information extraction for borehole logs is detailed in Sect. 3.3.
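As an illustration of how the deconstructed output might be consumed downstream, the following minimal sketch groups figure and table records from an extractor's JSON file and picks out candidate borehole logs by caption. The sample records and the field names (`figType`, `name`, `page`, `caption`) are assumptions made for this example and should be checked against the actual output of the extraction tool.

```python
import json

# Hypothetical sample mimicking the JSON written by a figure extractor.
# The field names used here are assumptions for illustration only.
SAMPLE_JSON = """
[
  {"figType": "Figure", "name": "3", "page": 12,
   "caption": "Fig. 3 Examples of borehole logs"},
  {"figType": "Table", "name": "1", "page": 8,
   "caption": "Table 1 Examples of geological named entities"},
  {"figType": "Figure", "name": "6", "page": 20,
   "caption": "Fig. 6 The study area in Jinan City"}
]
"""


def split_by_type(records):
    """Group extracted items into figures and tables, keeping page and caption."""
    groups = {"Figure": [], "Table": []}
    for rec in records:
        kind = rec.get("figType")
        if kind in groups:
            groups[kind].append(
                {"name": rec["name"], "page": rec["page"], "caption": rec["caption"]}
            )
    return groups


def find_borehole_logs(figures, keyword="borehole log"):
    """Select figures whose caption mentions the keyword (case-insensitive)."""
    return [f for f in figures if keyword in f["caption"].lower()]


records = json.loads(SAMPLE_JSON)
groups = split_by_type(records)
borehole_figures = find_borehole_logs(groups["Figure"])
```

Selecting borehole logs by caption keyword is only one possible heuristic; in practice the captions may need the same entity recognition that is applied to the report text.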

Extraction and association of geological text information

In this study, the purpose of geological text information extraction was to identify named entities and spatial constraints of geological entities from geological reports. Here, geological named entities refer to concepts or terms in the geological field, including but not limited to place names, geological stratification, geological structure, rock-soil mass, minerals, and rocks (Table 1). This paper uses the General Architecture for Text Engineering (GATE) tool to perform named entity recognition. GATE is a mature, open-source natural language processing framework that integrates many well-known NLP libraries (e.g., Stanford CoreNLP, LingPipe, OpenNLP, and UIMA) and is widely used by corporations, research labs and universities around the world (Maynard et al. 2020; Song et al. 2021; Van Erp et al. 2021). Figure 2 shows the entire workflow of text information extraction, which is mainly composed of three steps: text preprocessing, named entity recognition, and spatial constraint relation extraction. Text preprocessing covers data cleaning, word segmentation, and part-of-speech tagging. Named entity recognition is the basis and premise of spatial constraint relation extraction because its output is the input of spatial constraint relation extraction.

Table 1 Examples of geological named entities
Fig. 2 Text information extraction workflow based on GATE

The geological report text has obvious domain characteristics and contains a large number of geological domain concepts or terms. Directly using the gazetteer and the Java Annotation Patterns Engine (JAPE) rules (Cunningham et al. 1999) provided with GATE gives unsatisfactory results for two main reasons: 1) the Chinese gazetteer usually contains only general domain terms and does not cover terms in the field of geology; 2) JAPE rules written for the characteristics of English grammar cannot effectively support the recognition and extraction of Chinese entities and constraints. Therefore, to use GATE for named entity recognition and constraint relationship extraction, two tasks are necessary: 1) create a professional gazetteer for the geological domain; 2) write two-stage JAPE extraction rules according to the characteristics of geological report texts.

Gazetteer creation

To effectively identify the domain-specific terms of geology, we create a gazetteer from the terms in the geological domain ontology (Zhuang et al. 2021) constructed in previous research. The geological domain ontology is subdivided into a geological thematic ontology, a geological temporal ontology, and a geological toponym ontology. Specifically, the geological toponym ontology mainly contains traditional Chinese place names and words related to spatial relations. The geological thematic ontology is constructed according to the geological dictionary released by the Geological Cloud Platform of the China Geological Survey (GeoCloud). As of April 2022, a total of 122,923 geological terms had been included, covering multiple disciplines such as geology, geophysics, geochemistry, petrology and mineralogy, hydraulic environment, and paleontology.
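The role of the gazetteer can be illustrated with a small longest-match lookup over tokenized text. This is a sketch of the general dictionary-lookup idea rather than GATE's own gazetteer implementation; the terms and entity type labels below are hypothetical stand-ins for entries derived from the domain ontology, and English tokens are used only for readability.

```python
# Hypothetical miniature gazetteer: tuple of lowercase tokens -> entity type.
# Real entries would come from the geological domain ontology.
GAZETTEER = {
    ("granite", "gneiss"): "Rock",
    ("limestone",): "Rock",
    ("zhangxia", "formation"): "Stratum",
    ("cambrian",): "GeologicalTime",
    ("jinan",): "Toponym",
}
MAX_LEN = max(len(key) for key in GAZETTEER)


def lookup(tokens):
    """Longest-match gazetteer lookup.

    Returns a list of (start, end, text, entity_type) tuples, where
    start/end are token offsets (end exclusive).
    """
    lowered = [t.lower() for t in tokens]
    matches = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first so multi-word terms win.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            key = tuple(lowered[i:i + n])
            if key in GAZETTEER:
                matches.append((i, i + n, " ".join(tokens[i:i + n]), GAZETTEER[key]))
                i += n
                break
        else:
            i += 1  # no term starts at this token
    return matches


tokens = "Granite gneiss is in contact with Cambrian Zhangxia Formation limestone".split()
entities = lookup(tokens)
```

For Chinese text the same idea applies after word segmentation, or at the character level with forward maximum matching.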

Named entity recognition

GATE relies on JAPE rules for named entity recognition, and the result mainly depends on the capability of the JAPE matching patterns. JAPE rules are defined in files with the .jape extension, and each .jape file defines the extraction rules of one entity type. Concretely, to use GATE for named entity recognition, we need to perform the following steps (Susanna et al. 2018):

  (1) Analyze the expression patterns of named entities in geological reports;

  (2) Define corresponding JAPE syntax rules for each pattern;

  (3) Load the .jape files into GATE as a new JAPE transducer.

GATE also provides a visual interface. Once the named entity is matched, the identified named entity will be highlighted in the editor and assigned a related type. Missing or incorrect annotations can also be manually repaired.

Spatial constraint relation extraction

The text of a geological report is usually written according to certain standards. The wording has obvious domain characteristics, and the expression patterns can be easily extracted. From a large actual corpus, many different types of keywords that reflect the constraint relationships of geological entities can be summarized (Table 2). Based on this relatively normative information, combined with the annotated geological named entities, we can define JAPE rules and use the JAPE transducer to extract spatial constraints. A spatial constraint relationship expresses the relative positions of spatial objects. Take the sentence "contact between Granite gneiss and Cambrian Zhangxia Formation limestone" as an example, which expresses the contact relationship between geological entities. There may be many kinds of contact relationship predicates, such as "<geological entity> and <geological entity> [contact type]" and "<geological entity> [contact type] in <geological entity>". These relationship predicates are expressed according to certain rules, which are used to construct the rule base for identifying geological features and constraint relationships. After pattern matching against the rule base, the knowledge is extracted according to the predefined predicate relationship "<geological entity> and <geological entity> [contact type]" to obtain the geological entity constraint relationship "fault contact (Granite gneiss, Cambrian Zhangxia Formation limestone)", which expresses the relationship between the geological entities. Temporal relationship extraction adopts methods and rules similar to those of spatial relationship extraction.

Table 2 Keywords of entity constraint relationship
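The predicate pattern "<geological entity> and <geological entity> [contact type]" can be mimicked with a regular expression over already-recognized entities. The sketch below is a Python analogue for illustration, not a JAPE rule; the entity list, the contact keywords, and the example sentence are hypothetical.

```python
import re

# Hypothetical inputs: entities found by the NER step and a few contact keywords.
ENTITIES = ["Granite gneiss", "Cambrian Zhangxia Formation limestone"]
CONTACT_KEYWORDS = ["fault contact", "conformable contact", "unconformable contact"]


def build_pattern(entities, keywords):
    """Compile '<entity> and <entity> ... [contact type]' as a regular expression."""
    # Longer alternatives first so that the longest term is preferred.
    ent = "|".join(re.escape(e) for e in sorted(entities, key=len, reverse=True))
    key = "|".join(re.escape(k) for k in sorted(keywords, key=len, reverse=True))
    return re.compile(
        rf"(?P<a>{ent})\s+and\s+(?P<b>{ent})\b.*?(?P<rel>{key})",
        re.IGNORECASE,
    )


def extract_constraints(sentence, entities=ENTITIES, keywords=CONTACT_KEYWORDS):
    """Return (relation, entity_a, entity_b) triples found in the sentence."""
    pattern = build_pattern(entities, keywords)
    return [
        (m.group("rel").lower(), m.group("a"), m.group("b"))
        for m in pattern.finditer(sentence)
    ]


sentence = "Granite gneiss and Cambrian Zhangxia Formation limestone are in fault contact."
constraints = extract_constraints(sentence)
```

Each triple corresponds to a relationship of the form "fault contact (Granite gneiss, Cambrian Zhangxia Formation limestone)" described above.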

Information association

Geological reports are divided into knowledge units such as figures, named entities, and constraints through information extraction. These knowledge units present the characteristics of diversification and fragmentation. Due to the lack of data associations, it is difficult to exert the potential value of data. In this study, the named entities extracted from the figure caption and text are used as the feature marks, and the knowledge association system between different knowledge units is constructed through the spatial constraint relationship and similarity relationship between the feature marks, which is conducive to geological modelers integrating multiple knowledge units into the process of 3D modeling of urban geology. The association model can be seen in previous papers (Zhuang et al. 2021).
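A simple way to picture the association step is to link knowledge units whose feature marks overlap. The sketch below uses Jaccard similarity between entity sets as the similarity relationship; this measure, the threshold, and the sample units are assumptions for illustration and are not taken from the association model of Zhuang et al. (2021).

```python
from itertools import combinations

# Hypothetical knowledge units and the named entities used as their feature marks.
UNITS = {
    "text:sec2.1": {"Zhangxia Formation", "limestone", "Granite gneiss"},
    "figure:3": {"borehole", "limestone", "Zhangxia Formation"},
    "table:4": {"porosity", "silty clay"},
}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def associate(units, threshold=0.3):
    """Link pairs of units whose entity sets are similar enough.

    Returns (unit_a, unit_b, score, shared_entities) tuples.
    """
    edges = []
    for (u, eu), (v, ev) in combinations(sorted(units.items()), 2):
        score = jaccard(eu, ev)
        if score >= threshold:
            edges.append((u, v, round(score, 2), sorted(eu & ev)))
    return edges


edges = associate(UNITS)
```

The resulting edges could then be loaded into a graph store so that a modeler can navigate from an entity to the figures and tables that mention it.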

Information extraction of borehole logs

The borehole log is one of the most important diagrams in geological reports. It describes stratigraphic units, stratum thickness, lithological characteristics, geological structure and contact relationships (Zhang et al. 2020), as shown in Fig. 3. As an important reference for the visualization of underground exploration information, the borehole log is the basis for constructing a 3D urban geological model and plays an important role in the analysis and decision-making of various underground projects. In the past, drilling information had to be collected manually from paper records, and urban 3D geological models built in this way suffer from low efficiency and poor accuracy (Zhang et al. 2020). An approach for the automatic and intelligent identification and extraction of borehole logs is therefore needed, and there is a precedent to follow. In this study, we employ the approach of Zhang et al. (2020) to identify and extract the borehole log. The entire process is shown in Fig. 4.

Fig. 3 Examples of borehole logs

Fig. 4 The flow of information extraction of borehole logs

Image preprocessing

After the borehole logs are separated from the geological reports, the color images are converted to grayscale to enhance the image. An appropriate threshold is then applied to binarize the image so that only the useful information (e.g., table lines and text) is retained, which reduces the influence of irrelevant information.
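The two preprocessing steps can be sketched in a few lines of plain Python on a tiny image. The luminance weights are the common ITU-R BT.601 values; the fixed threshold of 128 and the sample pixels are assumptions for illustration, and a production pipeline would normally use an image library and choose the threshold per image.

```python
def to_gray(rgb_image):
    """Convert rows of (R, G, B) pixels to 0-255 gray values (BT.601 weights)."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_image
    ]


def binarize(gray_image, threshold=128):
    """Mark dark pixels (ink: table lines and text) as 1 and background as 0."""
    return [[1 if v < threshold else 0 for v in row] for row in gray_image]


# Hypothetical 3x3 scan: a dark cross on a light, slightly tinted background.
image = [
    [(255, 255, 255), (20, 20, 20), (250, 250, 245)],
    [(10, 10, 10), (15, 15, 15), (12, 12, 12)],
    [(255, 255, 255), (30, 30, 30), (200, 220, 255)],
]
binary = binarize(to_gray(image))
```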

Recognition of table line

An effective method to quickly identify table lines in an image is the Hough transform (Hassanein et al. 2015), a feature extraction algorithm in image processing that is widely used to identify geometric shapes in images. Its line variant, the Hough line transform, identifies the straight lines in the image. The HoughLines function in OpenCV can be called to perform the Hough line transform, and spurious lines can be filtered out according to a threshold to obtain the line frame of the table.
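To make the voting idea behind the Hough line transform concrete, the following sketch implements a minimal accumulator in plain Python. It is a didactic stand-in, not the OpenCV implementation: the default angular step of 90 degrees restricts the search to vertical and horizontal lines, which is an assumption that suits axis-aligned table frames, and the sample image is hypothetical.

```python
import math
from collections import Counter


def hough_lines(binary, threshold, theta_step_deg=90):
    """Vote in (rho, theta) space and keep lines with at least `threshold` votes.

    Each foreground pixel (x, y) votes for every line
    rho = x*cos(theta) + y*sin(theta) passing through it.
    Returns a sorted list of (rho, theta_in_degrees).
    """
    votes = Counter()
    thetas = range(0, 180, theta_step_deg)
    for y, row in enumerate(binary):
        for x, value in enumerate(row):
            if not value:
                continue
            for t in thetas:
                rad = math.radians(t)
                rho = round(x * math.cos(rad) + y * math.sin(rad))
                votes[(rho, t)] += 1
    return sorted(line for line, n in votes.items() if n >= threshold)


# Hypothetical binary image: a vertical line at x=1, a horizontal line at y=2,
# and one stray noise pixel in the bottom-right corner.
binary = [
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 1],
]
lines = hough_lines(binary, threshold=4)
```

With theta = 0 the parameter rho is the x position of a vertical line, and with theta = 90 it is the y position of a horizontal line; the noise pixel never collects enough votes to pass the threshold.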

Cell positioning

On the basis of table line recognition, the four corner points of each table cell are obtained by computing the intersections of the detected lines. However, the table lines of the borehole log are not completely standardized, and some abnormal intersections occur, which need to be adjusted to their normal positions.
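A minimal sketch of cell positioning, assuming a regular grid of axis-aligned lines: nearly coincident line positions (for example, the two edges of one thick line) are first merged, and the corners of each cell are then read off from neighboring lines. The tolerance and the sample positions are hypothetical, and real borehole logs with merged cells would need additional handling.

```python
def merge_close(positions, tol=3):
    """Collapse line positions that lie within `tol` pixels into their mean."""
    merged, group = [], []
    for p in sorted(positions):
        if group and p - group[-1] > tol:
            merged.append(round(sum(group) / len(group)))
            group = []
        group.append(p)
    if group:
        merged.append(round(sum(group) / len(group)))
    return merged


def cell_corners(xs, ys):
    """Return (top-left, top-right, bottom-right, bottom-left) per cell, row by row."""
    cells = []
    for y0, y1 in zip(ys, ys[1:]):
        for x0, x1 in zip(xs, xs[1:]):
            cells.append(((x0, y0), (x1, y0), (x1, y1), (x0, y1)))
    return cells


# Hypothetical detected line positions in pixels, with near-duplicates.
xs = merge_close([10, 12, 120, 121, 122, 300])
ys = merge_close([5, 60, 62, 200])
cells = cell_corners(xs, ys)
```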

Text and symbol recognition

Text and symbol recognition uses optical character recognition (OCR) to convert the words in an image into text, which greatly reduces manual entry. Additionally, for words that are difficult to recognize, the best candidate can be selected as the recognition result with the help of a geological thesaurus, which improves recognition accuracy to a certain extent.
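One way to combine OCR output with a thesaurus is fuzzy string matching. The sketch below uses `difflib.get_close_matches` from the Python standard library to replace a noisy OCR string with the closest thesaurus term; the thesaurus entries, the similarity cutoff, and the OCR strings are hypothetical, and the original study does not state which matching strategy was used.

```python
import difflib

# Hypothetical miniature geological thesaurus.
THESAURUS = ["limestone", "silty clay", "granite gneiss", "dolomite", "sandstone"]


def correct_term(ocr_text, thesaurus=THESAURUS, cutoff=0.75):
    """Return the closest thesaurus term, or the OCR text unchanged if none is close."""
    matches = difflib.get_close_matches(ocr_text.lower(), thesaurus, n=1, cutoff=cutoff)
    return matches[0] if matches else ocr_text


# Typical OCR confusions: the digit 1 read in place of the letter l.
corrected = [correct_term(t) for t in ["1imestone", "si1ty c1ay", "12.5"]]
```

Numeric cell contents such as depths pass through unchanged because they are not close to any term in the thesaurus.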

According to the corresponding national standards and industrial standards, with the help of an information system and combined with the knowledge and experience of professionals, the borehole data will be standardized through classification, data quality inspection, and stratum standardization.

Construction of the 3D geological structure model

The purpose of constructing a 3D geological model is to reflect the real geological situation as accurately as possible while respecting the existing geological data. Due to the sparsity or inaccuracy of input data and the errors of the modeling process, a 3D geological model inevitably has uncertainty (Hou et al. 2021; Olierook et al. 2021). Consequently, to ensure that the constructed model restores the geological structure of the modeling area more accurately, this study established geological profiles based on standardized borehole data to assist in stratigraphic analysis, which can more accurately and intuitively simulate the structure of the underground space. In this process, the 2D section still needs to be corrected to assist the modeling, which mainly includes the following:

Structurization

According to the requirements of 3D geological modeling, the attributes of the profile are structured, and the necessary geological attribute and control point information is added, such as assigning attributes to strata and borehole trajectory lines to form standard 3D section data with colors, textures, and attributes.

Topology reconstruction

Following the requirements of the section of the 3D geological model, according to the drilling information on the section, the horizontal and vertical coordinate values of the inflection point and endpoint of each section are obtained, the drilling trajectory line of the section is drawn, and the geological boundary and topological relationship of the geological area on the section are reconstructed.

Consistency check

In the case of inconsistent layering or dislocation in the section, the underground position of the stratum can be determined by referring to various data source information such as adjacent boreholes, geological sections, ground floor panels, and plane geological maps (most of the data have been extracted and correlated in Sects. 3.1 and 3.2), and the correct layers can be drawn by taking the terrain line as the benchmark and checking from top to bottom until all cross-sections are consistent with the information of the intersection layer.

3D modeling is built on the 2D sections. The model takes the imported intersecting geological sections as the main modeling data, which control the framework of the entire modeling area. Each 2D section is converted to a 3D section after being imported. If the stratigraphic lines cannot be connected, the section must be adjusted. The modeling area is divided into multiple cells through mesh subdivision, and each cell is the smallest unit of modeling. The 3D sections and geological surfaces are imported, and the surface boundary lines are transferred to generate the stratigraphic lines of each cell. Auxiliary lines are drawn where needed, the enclosing lines are used to build surfaces, the surfaces are used to build bodies, and the closure of each geological body is finally checked. The geological surfaces in all cells are merged to automatically construct the geological surface model of each stratum in the entire region. According to the attributes and topological relationships of the strata, all the cell blocks are merged, and the geological model of the whole region is built by combining the 3D surfaces.

Construction of 3D geological attribute model based on stratigraphic constraints

As mentioned in Sect. 3.1, we employed Camelot to extract the test parameters of engineering and hydrological boreholes from the geological report. The attribute data were then optimized mathematically, and noisy data were eliminated in light of the geological characteristics of the stratum. The attribute data are resegmented according to the standardized depth of the strata. If the sampling depth interval of a geotechnical test crosses a stratum boundary, the corresponding geotechnical test data are split into segments at the top or bottom depth of the stratum, as shown in Fig. 5.

Fig. 5 Resampling of attribute data
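The resegmentation rule can be sketched as an interval-splitting routine: each sampled interval is cut wherever it crosses a stratum boundary, and every piece inherits the measured value. Depths are assumed to be positive downward in meters, and the stratum names and attribute values are hypothetical.

```python
def split_by_strata(samples, strata):
    """Cut each sample interval at stratum boundaries.

    samples: list of (top, bottom, value) depth intervals
    strata:  list of (name, top, bottom) depth intervals
    Returns a list of (stratum_name, top, bottom, value) pieces.
    """
    pieces = []
    for s_top, s_bottom, value in samples:
        for name, top, bottom in strata:
            lo, hi = max(s_top, top), min(s_bottom, bottom)
            if lo < hi:  # the sample overlaps this stratum
                pieces.append((name, lo, hi, value))
    return pieces


# Hypothetical standardized strata and geotechnical test intervals for one borehole.
strata = [("fill", 0.0, 3.0), ("silty clay", 3.0, 12.0), ("fine sand", 12.0, 20.0)]
samples = [(1.0, 2.0, 0.41), (2.5, 3.5, 0.38), (11.0, 13.0, 0.33)]
pieces = split_by_strata(samples, strata)
```

The second and third samples each straddle a boundary and are therefore returned as two pieces, one per stratum.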

Taking each stratum of the structural model as the constraint, attribute interpolation is carried out layer by layer to build a 3D geological attribute model, which more finely depicts the distribution form of stratigraphic attributes in this area.

Implementation

Taking the Bridge Group in Jinan City as the modeling area, we adopted the urban 3D geological modeling data processing flow proposed in this study, which is based on geological survey reports, and employed MapGIS 10.2, a mature 3D geological modeling software package, to establish the final urban geological model.

The processing of the geological survey reports was conducted on a PC with a 2.50 GHz Intel Core i9-11900 CPU and 16.0 GB of RAM; the workflow requires at least 2 GB of memory to run properly. The modeling process was conducted on a PC equipped with a discrete graphics card.

Overview of the study area

Jinan is located on the northern edge of the middle mountainous region of Shandong. It is adjacent to Mount Tai in the south and the Yellow River in the north. The terrain is high in the south and low in the north. In this paper, the study area Bridge Group is located in the starting area of new and old kinetic energy conversion in Jinan, with a total area of approximately 51 square kilometers. As shown in Fig. 6, the Bridge Group is located on the North Bank of the Yellow River, northeast of the Tianqiao District, bordering Cuizhai Town, Jiyang County in the east, facing Huashan Town, Licheng District across the Yellow River in the south, adjacent to Sangzidian town in the west and Sungeng Town, Jiyang County in the north. Four national highways 104, 308, 309, and 220, and provincial highway 001 run through the territory, and are the transportation hubs leading to Beijing, Tianjin, and Northern Shandong.

Fig. 6 The study area in Jinan City

Introduction of used data

This study used the PDF format "Jinan Urban Geological Survey Report" as the research data source. The report includes approximately 388,555 words and more than 500 figures and tables. The modeling area is the Bridge Group in the starting area of Jinan's new and old kinetic energy conversion. The stratigraphic structure of this area is relatively simple. There were 12 boreholes in the study area, and the drilling depth was about 100 m. The bottom of the model was a plane, and the modeling depth extended from the ground surface down to -100 m.

The processing flow of the geological survey report

Taking the "Jinan Urban Geological Survey Report" as the input data, we divided the report into three types of data (text fragments, figures, and tables) according to the deconstruction method in Sect. 3.1, as shown in Fig. 7. Geological named entities are indicators of the information and knowledge contained in the report. After extracting the geological text information, we used Neo4j to visualize the diverse relationships between geological named entities, which reflect the basic knowledge in the geological survey report, as shown in Fig. 8. In the figure, orange circles represent geological named entities, blue circles represent figures, and red circles represent tables.

Fig. 7

An illustrative example of a geological report split into figures, tables, and descriptive text

Fig. 8

Entity associations expressed using Neo4j
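As a minimal sketch of how such entity associations might be loaded into Neo4j, the snippet below turns (source, relation, target) triples into Cypher MERGE statements. The node labels (Entity, Figure, Table), the relationship name, and the sample names are hypothetical, since the graph schema used in this study is not given here. In practice, parameterized queries sent through a Neo4j driver would be preferable to statements built as strings.

```python
from typing import Iterable, List, Tuple

# ((label, name), relation, (label, name)); all labels and relation names are hypothetical.
Node = Tuple[str, str]
Triple = Tuple[Node, str, Node]


def cypher_literal(value: str) -> str:
    """Quote a Python string as a single-quoted Cypher string literal."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def to_cypher(triples: Iterable[Triple]) -> List[str]:
    """Build one idempotent MERGE statement per association triple."""
    statements = []
    for (src_label, src_name), relation, (dst_label, dst_name) in triples:
        statements.append(
            f"MERGE (a:{src_label} {{name: {cypher_literal(src_name)}}}) "
            f"MERGE (b:{dst_label} {{name: {cypher_literal(dst_name)}}}) "
            f"MERGE (a)-[:{relation}]->(b)"
        )
    return statements


if __name__ == "__main__":
    # Hypothetical association between a named entity and a table of the report.
    demo = [(("Entity", "Quaternary"), "MENTIONED_IN", ("Table", "Table 3"))]
    for statement in to_cypher(demo):
        print(statement)
```

MERGE is used rather than CREATE so that re-running the statements does not duplicate nodes or relationships.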

In addition, the geological survey report contains abundant borehole logs and tabular data. Following the flow in Sect. 3.3, each image is first converted to grayscale; the Hough transform is then used to identify the table grid, extract the table corners, obtain the individual cells, and recognize the cell contents. The borehole data are standardized according to the "DD2015-04 Urban Geological Survey Database Structure Specification", and the attribute table data are extracted with Camelot and stored in the database for constructing the 3D geological model.
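The grid-detection step can be illustrated with a simplified sketch. It is not the implementation used in this study: it assumes an already binarized image with axis-aligned, one-pixel-thick rulings, and it restricts the Hough transform to the two angles 0° and 90°, so that the accumulator reduces to one vote count per row and per column. The function names and the `min_votes` threshold are hypothetical. A general implementation would typically vote over all angles, for example with OpenCV's `cv2.HoughLinesP`.

```python
from typing import List, Sequence, Tuple


def detect_grid_lines(binary: Sequence[Sequence[int]],
                      min_votes: int) -> Tuple[List[int], List[int]]:
    """Hough voting restricted to horizontal and vertical lines.

    For theta = 90 degrees the line parameter rho is the row index y, and for
    theta = 0 degrees it is the column index x, so each foreground pixel casts
    one vote for its row and one vote for its column.
    """
    height = len(binary)
    width = len(binary[0]) if height else 0
    row_votes = [0] * height
    col_votes = [0] * width
    for y, row in enumerate(binary):
        for x, pixel in enumerate(row):
            if pixel:
                row_votes[y] += 1
                col_votes[x] += 1
    rows = [y for y, votes in enumerate(row_votes) if votes >= min_votes]
    cols = [x for x, votes in enumerate(col_votes) if votes >= min_votes]
    return rows, cols


def cells_from_lines(rows: List[int],
                     cols: List[int]) -> List[Tuple[int, int, int, int]]:
    """Return (top, left, bottom, right) boxes between consecutive rulings."""
    return [(top, left, bottom, right)
            for top, bottom in zip(rows, rows[1:])
            for left, right in zip(cols, cols[1:])]


if __name__ == "__main__":
    # Synthetic 9 x 9 image of a 2 x 2 table with rulings at 0, 4 and 8.
    image = [[1 if (y % 4 == 0 or x % 4 == 0) else 0 for x in range(9)]
             for y in range(9)]
    rows, cols = detect_grid_lines(image, min_votes=6)
    print(cells_from_lines(rows, cols))
```

The intersections of the detected rows and columns are the table corners, and each box can then be cropped and passed to a text recognizer to read the cell contents.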

Modeling process

After the borehole data were loaded from the database, the same stratum in different boreholes was connected to form a complete regional stratum. Guided by the knowledge associations extracted from the geological text, the sections were processed or manually modified, so that a section at any given position could be obtained easily, as shown in Fig. 9. The fine geological structure model was constructed by crossing and splicing sections to connect each geological boundary with the corresponding boundary points on the sections: lines were used to build surfaces, and surfaces were used to build bodies. The model was then checked and corrected. In addition, we combined a DEM with remote sensing imagery to simulate the ground surface, which gives an intuitive and realistic display that is closer to the actual situation (Fig. 10). MapGIS provides a variety of interpolation algorithms; we compared their results and selected the appropriate algorithm based on geological cognition. Finally, taking the strata as constraints, we used inverse distance weighting interpolation to assign the attributes and obtained the three-dimensional geological attribute model. Figure 11 shows the resulting attribute model for saturation.
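Stratum-constrained inverse distance weighting can be sketched as follows. The estimate at a query point $q$ is $\hat{v}(q) = \sum_i w_i v_i / \sum_i w_i$ with $w_i = d(q, p_i)^{-p}$, where only samples $p_i$ from the same stratum as $q$ contribute. This is an illustrative sketch rather than the MapGIS implementation; the power $p = 2$, the function names, and the sample values are assumptions.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def idw(samples: Sequence[Tuple[Point, float]], query: Point,
        power: float = 2.0) -> float:
    """Inverse distance weighted estimate at `query` from (point, value) samples."""
    numerator = 0.0
    denominator = 0.0
    for point, value in samples:
        distance = math.dist(point, query)
        if distance == 0.0:
            return value  # the query coincides with a sample
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    if denominator == 0.0:
        raise ValueError("no samples available for interpolation")
    return numerator / denominator


def idw_within_stratum(samples: Sequence[Tuple[Point, str, float]],
                       query: Point, stratum: str,
                       power: float = 2.0) -> float:
    """Interpolate using only the samples that belong to `stratum`."""
    same_stratum: List[Tuple[Point, float]] = [
        (point, value) for point, layer, value in samples if layer == stratum
    ]
    return idw(same_stratum, query, power)


if __name__ == "__main__":
    # Hypothetical saturation samples: (x, y, z), stratum name, value.
    data = [((0.0, 0.0, -10.0), "clay", 0.8),
            ((10.0, 0.0, -10.0), "clay", 0.6),
            ((5.0, 0.0, -10.0), "sand", 0.1)]
    print(idw_within_stratum(data, (5.0, 0.0, -10.0), "clay"))
```

Filtering by stratum before weighting keeps a nearby sample from a different layer, such as the sand sample above, from influencing the estimate across a geological boundary.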

Fig. 9

Example of a partial section

Fig. 10

The final geological structure model of the study area from five perspectives

Fig. 11

The geological attribute model (saturation)

Discussion

In this study, we introduced a workflow for building a 3D geological model by extracting information from geological survey reports. Automating the information extraction reduces the errors of manual processing and improves efficiency and accuracy.

Information extraction of geological text

General-domain feature extraction algorithms lack geological domain knowledge and are developed for very different target domains, so they recognize geological terms poorly and cannot effectively extract the thematic features of the text. Using the domain-specific terms provided by a geological domain ontology, we created a gazetteer that provides a common understanding of geological knowledge and a clear definition of the relationships between geological concepts at different levels, which improves the recognition of terms specific to the geological domain.
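How a gazetteer supports term recognition can be illustrated with a minimal longest-match tagger. This is a sketch under stated assumptions, not the recognizer used in this study: the gazetteer entries, the category names, and the function name are hypothetical, and matching is a plain case-insensitive substring search without word-boundary checks.

```python
from typing import Dict, List, Tuple

Match = Tuple[int, int, str, str]  # (start, end, matched text, category)


def tag_entities(text: str, gazetteer: Dict[str, str]) -> List[Match]:
    """Scan left to right and keep the longest gazetteer term at each position."""
    # Trying longer terms first makes "silty clay" win over its substring "clay".
    terms = sorted(gazetteer, key=len, reverse=True)
    lowered = text.lower()
    matches: List[Match] = []
    position = 0
    while position < len(lowered):
        for term in terms:
            if lowered.startswith(term.lower(), position):
                end = position + len(term)
                matches.append((position, end, text[position:end], gazetteer[term]))
                position = end
                break
        else:
            position += 1  # no term starts here; move on by one character
    return matches


if __name__ == "__main__":
    # Hypothetical gazetteer entries mapping terms to concept categories.
    gazetteer = {
        "silty clay": "lithology",
        "clay": "lithology",
        "Quaternary": "stratigraphic unit",
    }
    sentence = "The Quaternary cover consists of silty clay and clay."
    for match in tag_entities(sentence, gazetteer):
        print(match)
```

Preferring the longest term matters for nested terms: without it, "silty clay" would be tagged only as "clay" and the more specific concept would be lost.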

Information extraction of borehole logs

Few studies currently address this problem. Previously, data had to be collected manually from geological survey reports in order to structure the drilling data. Our work reduces this manual workload to some extent, and the gain in efficiency is greatest when the amount of data is large. Many problems remain to be solved in this area, such as the recognition and extraction of irregular tables.

Uncertainty

In addition, geological survey reports are written by different authors, each of whom may interpret the same matter differently. Reports may therefore differ from one another, which introduces uncertainty into information extraction. This study does not yet address this uncertainty; it is a direction for future work. For text information extraction, adding synonyms to the gazetteer could reduce the uncertainty of named entities. For borehole logs, existing reports mostly present the data as tables, which is a relatively standardized form, so the main remaining problem is the identification of irregular tables. The uncertainty can be reduced further through standardization.
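The synonym idea can be sketched as a lookup that maps every surface form to one canonical entity name, so that mentions written differently by different authors resolve to the same entity. This is a hypothetical illustration of the proposed future work, not part of the present method; the synonym lists and function names are assumptions.

```python
from typing import Dict, List


def build_synonym_index(synonyms: Dict[str, List[str]]) -> Dict[str, str]:
    """Map each lowercased surface form to its canonical entity name."""
    index: Dict[str, str] = {}
    for canonical, variants in synonyms.items():
        for form in (canonical, *variants):
            key = form.lower()
            # A form claimed by two different entities cannot be resolved safely.
            if key in index and index[key] != canonical:
                raise ValueError(f"ambiguous surface form: {form!r}")
            index[key] = canonical
    return index


def canonicalize(mention: str, index: Dict[str, str]) -> str:
    """Return the canonical name for a mention, or the mention itself if unknown."""
    return index.get(mention.strip().lower(), mention)


if __name__ == "__main__":
    # Hypothetical spelling variants of a single concept.
    index = build_synonym_index({"groundwater": ["ground water", "ground-water"]})
    print(canonicalize("Ground Water", index))
```

Rejecting ambiguous forms when the index is built keeps the uncertainty visible instead of silently assigning a mention to the wrong entity.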

Conclusion and future work

Using text mining and natural language processing technology, this study designs a new method for constructing a three-dimensional geological model through information extraction from a geological survey report. Geological text information provides knowledge guidance for geologists; the three-dimensional geological structure model is constructed from automatically extracted and standardized borehole data, together with a certain degree of profile editing and adjustment; and the three-dimensional geological attribute model is constructed from the extracted geophysical table data. The entire process makes the creation of a 3D geological model more intelligent and automatic, reduces errors caused by manual processing, and makes fuller use of the data in geological survey reports.

However, many details remain to be improved. Borehole data and tabular data can be extracted reliably only when the data are relatively standardized; for irregular data, extraction performs poorly and professional guidance is needed. In addition, the study area in this paper, the Bridge Group in Jinan, does not contain a large number of faults, landslides, or soluble rocks. Its geological structure is relatively simple, so the method still needs to be verified by building a more complex urban geological model.

The current work treats each figure as a whole, without finer-grained analysis. In the future, we will try to use deep learning algorithms to extract finer-grained information, as objects, from planar geological profiles, and to link the text to the geological profiles through their legends.