1 Introduction

Geoanalytical data primarily comprise analytical information on the isotopes, structure and morphology, rare earth elements, and major and trace element characteristics of geological samples, obtained with analytical instruments such as ICP-MS, EPMA, and XRF (Lightfoot 1993). Geoanalytical data reflect the material composition, external characteristics, internal structure, interaction, and evolutionary history of the Earth, and provide essential information from which geological researchers can understand the Earth’s processes and exploit its resources for human survival and development (Potts 2000; Yin 2009). Large amounts of financial, material, and human resources have been invested in geoanalytical research and geological surveys in different domains to acquire more comprehensive and abundant geoanalytical data, and long-term investigations are generating huge volumes of such data (Hall 1996; Wang et al. 2001). Through effective preservation and reutilization, this costly information can contribute significantly to scientific research. For example, reuse can reduce repeated data collection and save national investment in investigations. Accumulated data can also yield new knowledge through statistical analysis or data mining; trace element discrimination diagrams, for instance, were derived from more than 600 existing trace element analyses of granites from known settings (Pearce et al. 1984; Thieblemont et al. 1994). Interdisciplinary studies can also reutilize existing data, e.g. for studying cultural development in archeology and for developing new geoanalytical methods. However, geoanalytical data are usually stored and used by the individual researchers who collected them, and the data are managed in an ad hoc manner, often without the metadata that provide the necessary context for interpretation and data integration.
This has resulted in confusion and in the loss of information essential for effective data interpretation and archiving. Consequently, few geoanalytical data are reutilized. To solve this problem, geoanalytical databases are being widely constructed to help preserve and reutilize geoanalytical information. These databases have been established in different study areas for various applications, and serve as tools for preserving, managing, and sharing geoanalytical data. In particular, different kinds of databases provide diverse data repositories to support scientific research. The use of geoanalytical databases saves researchers’ time and has benefited a wide range of scientific problems and disciplines (Liu et al. 2017; Wang et al. 2017a, b; Zhang et al. 2016). For example, studies citing data from the PetDB database have been published in highly cited journals such as Nature (Brandl et al. 2013; Carbotte et al. 2013; Cheng et al. 2016; Dick and Zhou 2014; Helo et al. 2011; Hoernle et al. 2011; Kamenov et al. 2011; Kelley 2014; Samuel and King 2014; Schlindwein and Schmid 2016; Straub et al. 2009) and Science (Cottrell and Kelley 2013; Greber et al. 2017; Joy et al. 2012; Kelley and Cottrell 2009; Mcnutt et al. 2016). Because geoanalytical databases are dispersed and diverse, searching and using them is difficult and time-consuming for geological researchers. Hence, a range of globally searchable geoanalytical databases is reviewed in this contribution. The content that can be acquired from these databases, the access methods they provide, and the functionalities they have developed are introduced. Moreover, the constraints of these databases in facilitating the reutilization of geoanalytical data, and the creation of more advanced geoanalytical databases, are discussed.

2 Profile of geoanalytical databases

The geoanalytical databases reviewed here can be divided into four categories based on their usage. The first category, “geochemical survey databases”, generally contains data derived from government geochemical surveys. The second category, “rock databases”, is typically used for storing data about specific rock types or rock samples from a specified area or project. The third category, “geochronology and isotope databases”, is typically used to store age determinations and isotopic ratios. The fourth category comprises a small number of other databases that do not fall into a specific category, including reference material geoanalytical databases and laboratory information management systems.

2.1 Geochemical survey databases

Table 1 lists the main geochemical survey databases, and each database is outlined as follows. Geochemical survey databases are usually constructed for a certain country and store data on a national scale (e.g. databases No. 1–No. 8). Database No. 9 (FOREGS) was constructed from a geochemical baseline mapping program in Europe and integrates data archives from 26 countries. The Country attribute in the second column represents the area covered by the database. Geochemical survey databases generally contain data on the concentrations of 50 or more elements. For each element, the geoanalytical data include the minimum, median, mean, and maximum values, as well as the standard deviation and percentiles. The number and type of elements analyzed differ between databases. The Elements attribute in the fifth column lists the elements of each database. The Methods attribute in the sixth column indicates the main analytical instrumentation used for element analysis. Most of the elements are measured by ICP-MS, XRF, and ICP-AES, but some databases adopt other instrumentation; for example, No. 3 uses INAA and No. 2 uses AAS.
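The per-element summary values described above can be illustrated with a minimal sketch. The concentrations, the element, the unit, and the function name `summarize` are hypothetical and are not taken from any of the reviewed databases; the choice of the 90th percentile is likewise only an assumption for illustration.

```python
import statistics

def summarize(values, percentile=90):
    """Return the summary statistics typically reported per element."""
    ordered = sorted(values)
    # With n=100, statistics.quantiles returns the 1st..99th percentile cut points.
    cuts = statistics.quantiles(ordered, n=100, method="inclusive")
    return {
        "min": ordered[0],
        "median": statistics.median(ordered),
        "mean": statistics.fmean(ordered),
        "std": statistics.stdev(ordered),        # sample standard deviation
        f"p{percentile}": cuts[percentile - 1],
        "max": ordered[-1],
    }

# Hypothetical Cu concentrations (mg/kg), for illustration only.
cu_mg_kg = [12.0, 15.5, 9.8, 22.1, 18.4, 30.2, 11.7, 14.9]
summary = summarize(cu_mg_kg)
```

Individual surveys may define percentiles or the standard deviation differently (for example, population rather than sample standard deviation), so the definitions used here should be checked against the documentation of the database in question.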

Table 1 List of geochemical survey databases

2.2 Rock databases

The geoanalytical data in rock databases primarily consist of concentrations of major, trace, and rare earth elements, petrographic information, isotope ratios, and age determinations. Table 2 lists the main rock databases and their core attributes. Rock databases are typically constructed by a university, such as Nos. 4 and 6, by an organization, such as EarthChem databases Nos. 1, 2, and 5, or by a department, such as Nos. 10, 13, and 16. Different types of database have different applications; the core attributes include the area of interest, the sample type, the data contained, and the data sources. Rock databases are separated according to rock type and geological area. Some databases focus on one rock type; for example, Nos. 2, 3, 9, 14, and 15 consist of igneous rock, Nos. 5 and 8 consist of sediment, and Nos. 7 and 10 consist of intrusive rocks. Some databases contain all rock types, such as Nos. 3, 13, and 16. The Type attribute in the fourth column gives the rock type of each database. Different databases have different data sources. Educational databases typically collect data from university departments, such as Nos. 3, 4, 6, and 16; organization databases collect data from internationally published papers, such as Nos. 1, 2, 5, 7, 9, 12, and 14; and department databases collect data from staff in the department, such as Nos. 2, 8, 10, 11, 13, and 15. The Source attribute in the sixth column represents the data sources of each database. Data are commonly divided into sample and analytical fields. The sample fields provide basic sample collection and description information, including field number, collector, date, site description, geographic co-ordinates expressed in latitude–longitude or easting–northing of a specified map projection, and rock descriptions. Rock descriptions include petrographic, mineralogical, and textural descriptions, and metamorphic fields or textural zones.
The geoanalytical data are closely related to sample information. Geoanalytical data are often composed of several items, including major element chemistry, trace element chemistry, isotopic measurements, age and isotopic calculations, volumetric data, petrophysical data, and sample images. The attribute Content in the fifth column represents the geoanalytical data of each database.
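The split between sample fields and analytical fields can be sketched as a simple data structure. This is only an illustrative model under our own assumptions: the class names, field names, and example values are hypothetical and do not reproduce the schema of any database in Table 2.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Analysis:
    """One analytical result attached to a sample."""
    analyte: str                   # e.g. "SiO2", "Sr", "87Sr/86Sr"
    value: float
    unit: str                      # e.g. "wt%", "ppm", "ratio"
    method: Optional[str] = None   # e.g. "XRF", "ICP-MS"

@dataclass
class Sample:
    """Sample collection and description information."""
    field_number: str
    collector: str
    latitude: float
    longitude: float
    rock_description: str
    analyses: List[Analysis] = field(default_factory=list)

    def values_for(self, analyte: str) -> List[float]:
        """Return all reported values for one analyte."""
        return [a.value for a in self.analyses if a.analyte == analyte]

# Hypothetical sample, for illustration only.
sample = Sample("HYP-001", "A. Collector", -21.5, 113.9, "olivine basalt")
sample.analyses.append(Analysis("SiO2", 49.8, "wt%", "XRF"))
sample.analyses.append(Analysis("Sr", 310.0, "ppm", "ICP-MS"))
```

Keeping the analyses in a list attached to the sample reflects the close relationship between geoanalytical data and sample information noted above; a relational database would typically express the same link with a foreign key.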

Table 2 List of rock databases

2.3 Geochronology and isotope databases

Table 3 lists the main geochronology and isotope databases and their core attributes. As with rock databases and geochemical survey databases, sample metadata and geoanalytical data are the major contents of geochronology and isotope databases. The difference is that the analytical data are limited to age determinations and isotope ratios, and the sample types are not limited to one or more sample media. Geochronology and isotope databases are usually categorized by age determination method; for example, Nos. 5–8 focus on data analyzed by Pb isotopes. However, some databases contain data from multiple analytical methods, such as Nos. 1–4, 10–14, and 16–18. The Content attribute in the fourth column represents the analysis methods covered by each database. The other columns in Table 3 have a similar meaning to those in Tables 1 and 2.

Table 3 List of geochronology and isotope databases

2.4 Other related databases

The other four databases in Table 4 are not constructed as data repositories. Nos. 1–3 were constructed as integrated service management platforms for laboratories. These databases are used as a management tool by a laboratory, and data are typically collected for a specific laboratory. No. 4 is a geochemical database for reference materials and isotopic standards. Reference samples include rock powders originating from the USGS, GSJ, and GIT-IWG, synthetic and natural reference glasses originating from NIST, USGS, and MPI-DING, and mineral (e.g. 91500 zircon), isotopic (e.g. La Jolla, E&A, NIST SRM 981), river water, and seawater reference materials.

Table 4 List of other databases

3 Database usage

3.1 Access methods

Four types of access methods are provided by the database maintainers (Table 5). Eighteen databases can be accessed on the internet, and their website addresses are listed. Two databases are constructed as desktop software, which must be downloaded and installed on a local computer; the download address and installation requirements are listed. The INFOREX-3.0 software is not free to download, whereas the other, DataView, is completely free. The third access method is through the references listed in the databases. In ALKEMIA, OZCHEM, and ROCKCHEM, the references for the analytical data are listed in a table; users can identify the sample data they want to use and download the relevant publications. The last method, provided by IGBADAT and SEDBA, is to request data from the authors, as the data are stored on magnetic media (tapes or diskettes). The email address of the author is listed for users.

Table 5 Database access methods

3.2 Functionalities

The common functionalities provided by the databases are data query, data visualization, data download, and data upload. In addition, some databases provide data processing functionalities that help geological researchers process the data using common techniques. The major functionalities are summarized below, and the functionalities of each database are outlined in Table 6.

Table 6 Functionalities of databases

3.2.1 Data query

There are three main query types provided by the databases:

  1. (a)

    The first method involves filling out a query form. The form is composed of multiple specific fields that are filled or selected from the given options. The fields typically consist of a sample description involving rock type, and country or geographical information such as longitude, latitude, and altitude. Some databases include more details such as collector, laboratory, and collection. The fields can be filled selectively. Once the form is completed, the database will provide data according to the contents of the fields. Figure 1 shows a query form of Janus Web Database as an example.

    Fig. 1 Query form of the Janus Web Database

  2. (b)

    The second method is GIS query. This provides a web map with different scales on the page, on which users can draw an area of interest and submit it as a standalone search term. The data within the area will then be presented on the web page (Fig. 2).

    Fig. 2 GIS query of the Geochron database
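The two query styles described above can be sketched in a few lines. This is a simplified illustration under our own assumptions, not the implementation used by the Janus Web Database or Geochron: the record fields, the function names `form_query` and `bbox_query`, and the sample records are hypothetical, and the drawn area is approximated by a rectangle that does not cross the antimeridian.

```python
def form_query(records, **criteria):
    """Form query: keep records matching every filled-in field.

    Fields left empty (None or "") are ignored, mirroring a form whose
    fields can be filled selectively.
    """
    filled = {k: v for k, v in criteria.items() if v not in (None, "")}
    return [r for r in records
            if all(r.get(k) == v for k, v in filled.items())]

def bbox_query(records, south, west, north, east):
    """GIS query: keep records located inside a rectangular area of interest."""
    return [r for r in records
            if south <= r["latitude"] <= north
            and west <= r["longitude"] <= east]

# Hypothetical records, for illustration only.
records = [
    {"id": "S1", "rock_type": "basalt", "country": "Iceland",
     "latitude": 64.1, "longitude": -21.9},
    {"id": "S2", "rock_type": "granite", "country": "China",
     "latitude": 30.6, "longitude": 114.3},
    {"id": "S3", "rock_type": "basalt", "country": "China",
     "latitude": 25.0, "longitude": 102.7},
]

basalts = form_query(records, rock_type="basalt", country="")
in_area = bbox_query(records, south=20.0, west=100.0, north=35.0, east=120.0)
```

Here `basalts` contains samples S1 and S3, because the empty `country` field is ignored, and `in_area` contains S2 and S3. A production system would normally push both filters into the database itself, for example through a spatial index.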

3.2.2 Data download

The download format matters to the geological researchers who use these valuable data, because the formats offered by a database directly determine what users can do with the data and how. For example, SHRIMP zircon U–Th data can be downloaded from Geochron in Excel format, which is the single, unified format that can be processed with the professional software “Squirt”; geological researchers and SHRIMP analysts can therefore reprocess existing data and re-evaluate the age determinations. As another example, data downloaded in KML format can be viewed directly in Google Earth. Spatial location is an important characteristic of all geological data, and geoanalytical data are no exception, so the ability to display and process data in Google Earth is valuable to geological users and encourages use of the database. The queried subset of geoanalytical data and related sample information can be downloaded onto a local computer. For user convenience when reprocessing these data, downloaded data are usually presented in the following formats:

  1. (a)

    The most common format is a Microsoft Excel spreadsheet (.xlsx).

  2. (b)

    The WMS (Web map service) format includes features that make the data viewable in GIS (geographic information systems) software (e.g. ArcGIS).

  3. (c)

    The KML format can be integrated into Google Earth.

  4. (d)

    The .mdb format can be opened directly by Microsoft Access.

  5. (e)

    The CSV (Comma-Separated Values) file format can be easily imported into different databases.

  6. (f)

    The XML (extensible markup language) format is a structured and general markup language.

  7. (g)

    The seventh format is PDF.

  8. (h)

    The .rdata format can be processed by R users directly.

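As an illustration of how a queried subset might be written to two of the formats listed above, the following minimal sketch exports records to CSV and to KML. The record fields and the function names `to_csv` and `to_kml` are hypothetical; only the ordering of KML coordinates (longitude first, then latitude) follows the KML specification.

```python
import csv
import io
from xml.sax.saxutils import escape

def to_csv(records, columns):
    """Serialize records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()

def to_kml(records):
    """Serialize sample locations to a minimal KML document of placemarks."""
    placemarks = []
    for r in records:
        placemarks.append(
            "<Placemark><name>{}</name>"
            "<Point><coordinates>{},{}</coordinates></Point></Placemark>".format(
                escape(str(r["id"])), r["longitude"], r["latitude"]))
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<kml xmlns="http://www.opengis.net/kml/2.2"><Document>'
            + "".join(placemarks) + "</Document></kml>")

# Hypothetical record, for illustration only.
records = [{"id": "S1", "latitude": 64.1, "longitude": -21.9,
            "SiO2_wt_pct": 48.7}]
csv_text = to_csv(records, ["id", "latitude", "longitude", "SiO2_wt_pct"])
kml_text = to_kml(records)
```

The CSV output can be imported into spreadsheets or other databases, while the KML output carries only the sample name and location, which is what a viewer such as Google Earth needs to plot the samples.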

3.2.3 Data visualization

Most of the databases provide data visualization options that give an overview of the samples stored in the database. Users can thereby see the number and appearance of the stored samples. Three methods are provided:

  1. (a)

    GIS map. Through this map, users can acquire general characteristics of the samples, such as sample distribution and sample number.

  2. (b)

    Table view. All data and associated information are presented in a table view. Data in the table can be sorted by column header, and users can acquire sample information by checking the tables page by page.

  3. (c)

    A catalog of data groups. The samples are divided into different groups, and users can check the catalog to obtain information from the database.

3.2.4 Data process

Some databases also provide multiple data processing functionalities according to their applications. These functionalities are listed separately in Table 7.

Table 7 Data process functionalities

4 Geological application

With the support of these databases, geological researchers have conducted a wide variety of studies and made considerable progress. Because the applications are numerous and diverse, this review summarizes and categorizes the application cases of PetDB as an example, and explains who has used the database and what they did with its data. Databases are most helpful when researchers need large volumes of diverse existing data to confirm a conclusion. According to incomplete statistics, PetDB has been cited 774 times. The citations per year are summarized and plotted in Fig. 3, which shows that the number of citations is increasing year by year. Details of the citations are available at http://www.earthchem.org/citations/petdb.

Fig. 3 Number of citations of PetDB

The database provides supporting data for geological researchers studying particular geological topics. Such support is needed when researchers must draw on large volumes of diverse existing data to confirm a conclusion, in which case the database becomes a main data source. For example, Guang-Liang Zhang and Li-Hui Chen reused εHf versus εNd isotope data for East Pacific Rise (EPR) samples and Indian ridge MORBs from PetDB in a study of site U1431; comparing their own data with the PetDB data helped them trace the evolution from carbonated melt to alkali basalt in the South China Sea. The resulting paper was published in Nature Geoscience (Zhang et al. 2017). Huichao Zhang and Yongfeng Zhu reused Th/Yb and Nb/Yb data for MORB and ocean island basalts (OIB) to draw a diagram for the Huilvshan gabbro, compared it with their own data, and described the geochronology and geochemistry of the Huilvshan gabbro in west Junggar (NW China) and its implications for magma process and tectonic regime. This work was published in the Journal of Mineralogy and Petrology (Zhang and Zhu 2017). Monica Wolfson-Schwehr and Margaret S. Boettcher reused geochemical data of rock samples from PetDB to examine the thermal segmentation of mid-ocean ridge-transform faults; this work was published in Geochemistry, Geophysics, Geosystems (Wolfson-Schwehr et al. 2017). The applications of the database are diverse, and each application addresses a single topic. Because the geological topics are so varied, the latest (2018) citations of PetDB are listed in the “Appendix” for reference.

5 Discussion and outlook

For the existing databases, studies of geoanalytical data content, metadata, database structure, access methods, and query methods have laid a solid foundation for the construction of geoanalytical information systems and have preserved many geoanalytical data for Earth science research. Researchers can take advantage of these databases, such as DataView and Geochron, to manage their own data, and can share data with other researchers through databases such as PetDB, GEOROC, NAVDAT, and GEOREM. They can also download data and metadata from the databases to aid their own research. However, the use of databases is not always reliable. Recently, Verma and Quiroz-Ruiz (2016) refrained from using available databases such as GEOROC for the following reasons. Agrawal and Verma (2007) were the first to show that an indiscriminate use of GEOROC can lead to serious problems. In this critique, they pointed out that Vermeesch (2013) used the GEOROC database without ascertaining whether the rock names were based on the major element data (Le Bas et al. 1986). More recently, Verma and Rivera-Gomez (2013) presented an actual case of Tongan arc data and documented the difficulties of using compiled data without critically examining the original papers from which the data were compiled.

In addition, the existing databases still have many constraints. Firstly, each database was constructed for a specific application. Particular data are stored according to the prescribed format of the database, so other geoanalytical information cannot be stored in it, and further databases have been constructed for other applications. Many data are consequently not preserved in databases, and those that have been stored cannot be integrated or exchanged because of their heterogeneous nature. Secondly, data loading is usually managed by the database maintainers, who have to collect the data, check their quality, format them, and enter them into the database. Maintaining databases is therefore time-consuming, and updating them is slow. Thirdly, databases do not provide a platform on the internet for researchers to access data, which impedes the sharing and reutilization of data.

It is crucial to overcome these problems to promote the development of more advanced databases and facilitate wider reutilization of geoanalytical data. Big data and new techniques such as cloud storage, cloud computing, and deep learning provide an opportunity to advance geoanalytical databases. To achieve these goals, four aspects have to be considered in future studies:

  1. (a)

    A universal and efficient geoanalytical data model that can describe all kinds of geoanalytical data and metadata and a database that can accommodate geoanalytical big data have to be constructed.

  2. (b)

    It is important to study the formats of data from different geoanalytical instruments, published literature, and different databases. New methods and software need to be developed to import such data into databases automatically.

  3. (c)

    An advanced platform that provides data visualization needs to be developed to bring about a user-friendly experience and to attract more geological researchers to reutilize the geoanalytical data.

  4. (d)

    Application cases based on the existing databases need to be studied and proposed using new techniques, such as data mining, deep learning, and machine learning, to provide knowledge that cannot be acquired from the raw data alone. This could facilitate the reutilization of geoanalytical data by scientific researchers and promote the study of geological research problems.
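Aspects (a) and (b) above, a common data model and automatic import from heterogeneous sources, can be illustrated with a minimal sketch that maps differently labelled source columns onto one set of field names. Everything in it is an assumption for illustration: the alias table, the common field names, and the function name `normalize` are hypothetical, and a real mapping would have to be curated for each instrument, publication format, or database.

```python
# Hypothetical header aliases mapping source column labels to common field names.
ALIASES = {
    "sample": "sample_id", "sample no.": "sample_id", "sample_id": "sample_id",
    "sio2": "SiO2_wt_pct", "sio2 (wt%)": "SiO2_wt_pct",
    "lat": "latitude", "latitude": "latitude",
    "lon": "longitude", "long": "longitude", "longitude": "longitude",
}
# Common fields whose values should be converted to numbers.
NUMERIC = {"SiO2_wt_pct", "latitude", "longitude"}

def normalize(row):
    """Map one source row onto the common field names.

    Returns the normalized record and a dictionary of columns that could
    not be mapped, so that no source information is silently discarded.
    """
    clean, unmapped = {}, {}
    for header, raw in row.items():
        key = ALIASES.get(header.strip().lower())
        if key is None:
            unmapped[header] = raw
        elif key in NUMERIC:
            clean[key] = float(raw)
        else:
            clean[key] = raw.strip()
    return clean, unmapped

# Hypothetical source row, for illustration only.
clean, unmapped = normalize(
    {"Sample No.": " HYP-7 ", "SiO2 (wt%)": "51.2", "Lat": "30.5", "Operator": "n/a"})
```

Returning the unmapped columns separately is a deliberate choice in this sketch: metadata that does not yet fit the common model is kept for later curation rather than lost during import.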

We hope this review will promote awareness of public database resources, facilitate their use in geological research, and contribute to the construction of more advanced geoanalytical databases.