1 Biogeographic Database

Identifying marine species collected in the field, together with information on their morphology, phylogeny, and habitat, is indispensable for elucidating marine geological and biological environments. Illustrated guidebooks have generally served this purpose; now, however, several well-developed web-based databases collect and store large quantities of information and let users search it. In particular, accumulating genomic data on marine species makes it possible to obtain various types of data, such as the expression of a species’ genes in different tissues and the phylogenetic relationships among species.

Biogeographic databases provide information on the distribution of organisms, as well as on biodiversity at regional to global scales. For marine organisms, the Ocean Biogeographic Information System (OBIS) has collected global data from about 2500 regional/local databases and integrated over 52 million occurrence records of over 120,000 marine species. Each record includes the locality at which a species was collected or observed. OBIS was constructed by a global research/monitoring project (Census of Marine Life [CoML]; Decker and O’Dor 2002; Halpin et al. 2006) and was opened to the public in 2000. It is currently operated as shared property under the International Oceanographic Data and Information Exchange (IODE) of the Intergovernmental Oceanographic Commission of the United Nations Educational, Scientific and Cultural Organization (UNESCO). The Autonomous Reef Monitoring Structures (ARMS) Program aims to collect sessile organisms for CoML projects, and its database is managed by the Smithsonian Institution (Plaisance et al. 2009). Using OBIS, users can freely search records by organism name, geographic region, and other criteria, and can then visualize the resulting distribution on a scalable map. OBIS data can also be downloaded for secondary use or further analyses. The information in OBIS contributes to understanding the distribution and diversity of marine organisms and, in combination with environmental datasets/models, has been used to predict marine species distributions, such as future marine biodiversity patterns under climate change.
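A search by organism name and geographic region, as described above, can also be issued programmatically. The following is a minimal Python sketch and not an official OBIS client. It assumes that the public OBIS API exposes an occurrence endpoint at https://api.obis.org/v3/occurrence that accepts the parameters scientificname, geometry (a WKT polygon), and size, and that responses contain a "results" list using Darwin Core field names (scientificName, decimalLatitude, decimalLongitude). The helper names build_occurrence_query and extract_localities are hypothetical, and all of these assumptions should be checked against the current OBIS documentation.

```python
from urllib.parse import urlencode

# Assumed endpoint of the public OBIS API (version 3); verify before use.
OBIS_OCCURRENCE_ENDPOINT = "https://api.obis.org/v3/occurrence"


def build_occurrence_query(scientific_name, bbox=None, size=100):
    """Build a query URL for occurrence records of one taxon.

    bbox is an optional (min_lon, min_lat, max_lon, max_lat) tuple in
    decimal degrees; it is converted to a closed WKT polygon.
    The parameter names are assumptions about the OBIS API.
    """
    params = {"scientificname": scientific_name, "size": size}
    if bbox is not None:
        min_lon, min_lat, max_lon, max_lat = bbox
        # WKT uses "longitude latitude" order and repeats the first vertex.
        params["geometry"] = (
            f"POLYGON(({min_lon} {min_lat}, {max_lon} {min_lat}, "
            f"{max_lon} {max_lat}, {min_lon} {max_lat}, {min_lon} {min_lat}))"
        )
    return f"{OBIS_OCCURRENCE_ENDPOINT}?{urlencode(params)}"


def extract_localities(response):
    """Return (name, latitude, longitude) tuples from a decoded JSON response.

    Records lacking coordinates are skipped. The field names follow
    Darwin Core and are assumed, not confirmed, for OBIS responses.
    """
    localities = []
    for record in response.get("results", []):
        lat = record.get("decimalLatitude")
        lon = record.get("decimalLongitude")
        if lat is None or lon is None:
            continue
        localities.append((record.get("scientificName"), lat, lon))
    return localities
```

The URL returned by build_occurrence_query could be fetched with any HTTP client, and the decoded JSON passed to extract_localities to obtain coordinates for plotting on a map.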

In Japan, the Biological Information System for Marine Life (BISMaL), constructed by the Japan Agency for Marine-Earth Science and Technology (JAMSTEC), is a useful data source for learning about marine biodiversity around Japan (Fig. 29.1). BISMaL is an integrated database that handles several kinds of information. It provides taxonomic information (scientific names of species and their higher classification) on marine organisms around Japan, as well as their occurrence records from several datasets. An organism can be searched for by name or through a hierarchical taxonomic tree. Descriptions of morphology and ecology, as well as photographs, are available for some organisms, and movies taken during deep-sea surveys by JAMSTEC are also provided for deep-sea species. Users can search and visualize occurrence records on a map, filter them by taxonomic group, dataset, and depth, and download them for further analyses.
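Filtering occurrence records by taxonomic group, dataset, and depth can be sketched on downloaded data as follows. This is a minimal illustration in Python and does not reflect how BISMaL itself is implemented. The Occurrence record layout, the filter_occurrences function, and the placeholder records are all hypothetical and contain no real observations.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Occurrence:
    """One occurrence record (hypothetical layout for illustration)."""
    scientific_name: str
    taxon_path: Tuple[str, ...]  # higher classification, broadest rank first
    dataset: str
    depth_m: float
    latitude: float
    longitude: float


def filter_occurrences(
    records: Iterable[Occurrence],
    taxon: Optional[str] = None,
    dataset: Optional[str] = None,
    depth_range: Optional[Tuple[float, float]] = None,
) -> List[Occurrence]:
    """Keep records matching every criterion that is not None.

    taxon matches either the species name or any rank in taxon_path;
    depth_range is an inclusive (shallowest, deepest) pair in metres.
    """
    selected = []
    for rec in records:
        if taxon is not None and taxon != rec.scientific_name \
                and taxon not in rec.taxon_path:
            continue
        if dataset is not None and rec.dataset != dataset:
            continue
        if depth_range is not None:
            shallowest, deepest = depth_range
            if not (shallowest <= rec.depth_m <= deepest):
                continue
        selected.append(rec)
    return selected


# Placeholder records with invented names and values, for demonstration only.
EXAMPLE_RECORDS = [
    Occurrence("species_x", ("phylum_P", "class_C1"), "dataset_1", 15.0, 35.0, 139.0),
    Occurrence("species_y", ("phylum_P", "class_C2"), "dataset_2", 1200.0, 34.0, 138.0),
    Occurrence("species_z", ("phylum_Q", "class_C3"), "dataset_2", 2500.0, 33.0, 137.0),
]
```

For example, filter_occurrences(EXAMPLE_RECORDS, taxon="phylum_P", depth_range=(1000, 3000)) keeps only the record for species_y, because it is the only member of phylum_P recorded within that depth range.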

Fig. 29.1 An integrated database for marine organisms, the Biological Information System for Marine Life (BISMaL)

BISMaL has collected over 450,000 occurrence records for over 6200 species. These data are still insufficient to cover the marine biodiversity around Japan completely; thus, the Japan Ocean Biogeographic Information System Center is calling on research communities in Japan to provide data to BISMaL in order to establish a robust data source for related fields. Contributing data may also benefit the data providers because BISMaL, as an OBIS node providing regional information, sends data to OBIS, where they are integrated with data from other regions and opened to the public. A regional database such as BISMaL can thus be not only a data source but also a useful tool for publishing data online, and it gives every scientist and scientific community a way to contribute to the global dataset. Besides BISMaL, several excellent public and private databases provide online access to information on marine organisms. For example, the Japanese Association for Marine Biology (JAMBIO) continuously conducts joint surveys and summarizes the marine organisms collected in the database “RINKAI” (Inaba 2015).

2 Genome and Transcriptome Database

For the DNA-based identification of a specific organism or for mass surveys of organisms in marine environments, DNA barcoding and environmental DNA (eDNA) are powerful tools. The former is a method for rapidly identifying known or unknown species by using a short fragment of the mitochondrial cytochrome oxidase I gene (COI or COX1) or of the nuclear ribosomal RNA internal transcribed spacer (ITS). The latter is a mass metagenomic analysis of one or many species from a variety of environmental samples, such as sea water, which contain DNA from diverse sources including tissue debris and feces. It provides information on species distributions, populations, and ecosystems and is useful for environmental biomonitoring.
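The principle of barcode-based identification, comparing a short query fragment with reference sequences from known species, can be illustrated with a small Python sketch. It is a deliberate simplification: it computes ungapped percent identity between sequences that are assumed to be already aligned and of equal length, whereas real analyses rely on alignment or similarity-search tools and curated reference libraries. The function names, the 97% default threshold, and the toy sequences in the usage note are assumptions made for illustration only.

```python
def percent_identity(seq_a, seq_b):
    """Ungapped percent identity between two pre-aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for x, y in zip(seq_a.upper(), seq_b.upper()) if x == y)
    return 100.0 * matches / len(seq_a)


def identify(query, references, threshold=97.0):
    """Return (best_matching_name, identity) for a barcode query.

    references maps a taxon name to its reference barcode fragment.
    If the best identity is below threshold, the name is None, meaning
    the query cannot be assigned to any reference taxon.
    The default threshold is an illustrative assumption, not a standard.
    """
    if not references:
        raise ValueError("at least one reference sequence is required")
    best_name, best_score = None, -1.0
    for name, ref_seq in references.items():
        score = percent_identity(query, ref_seq)
        if score > best_score:
            best_name, best_score = name, score
    if best_score >= threshold:
        return best_name, best_score
    return None, best_score
```

With the toy references {"taxon_A": "ACGTACGTAC", "taxon_B": "ACGTTCGAAC"}, the query "ACGTACGTAC" is assigned to taxon_A, while a query that differs from every reference by more than the threshold allows is left unassigned.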

Since the genome sequences of the pufferfish Takifugu rubripes and the ascidian Ciona intestinalis were completely determined in 2002, genomic information has rapidly accumulated for many other marine organisms, owing to the development of next-generation sequencing. The marine organisms for which complete genome sequences are available include a diatom (Thalassiosira pseudonana), a brown alga (Ectocarpus siliculosus), a sponge (Amphimedon queenslandica), cnidarians (Nematostella vectensis, Clytia hemisphaerica, and Acropora digitifera), ctenophores (Mnemiopsis leidyi and Pleurobrachia bachei), sea urchins (Strongylocentrotus purpuratus and Hemicentrotus pulcherrimus [the latter is a Japanese species]), a lancelet (Branchiostoma floridae), a bivalve (Spisula solidissima), a shrimp (Litopenaeus vannamei), and many others. For some marine organisms without full genome sequences, extensive sequence data are available from RNA-seq analysis of expressed genes (transcriptomes).

Annotated sequences determined by individual researchers, whether from a single gene or from mass sequencing, continue to be submitted to the International Nucleotide Sequence Database Collaboration (INSDC), the fundamental initiative comprising the DNA Data Bank of Japan (DDBJ), the European Nucleotide Archive (ENA) at the European Molecular Biology Laboratory European Bioinformatics Institute (EMBL-EBI), and GenBank at the National Center for Biotechnology Information (NCBI). Genome information can be accessed through their genome browsers, where online search tools, including BLAST searches, can be run against the sequence data of a specific organism or against mass sequence data. It is often necessary to access genome information for multiple marine organisms; in such cases, it is useful to obtain the information through portal sites. For prokaryotes, there are well-organized databases for marine microbial genomes: MarRef, MarDB, and MarCat. MarBEF is one of the two networks for European marine ecosystem research, along with Marine Genomics Europe (MGE) (Klemetsen et al. 2018). The MarRef, MarDB, and MarCat databases are easily accessed through the Marine Metagenomics Portal (MMP) (https://mmp.sfb.uit.no/). Other useful metagenomic databases include the Genomes OnLine Database (GOLD), Viral Informatics Resource for Metagenome Exploration (VIROME), MGnify (formerly EBI Metagenomics), Integrated Microbial Genomes and Microbiomes (IMG/M), and Marine Life Genome Database (MLGD).

Several large-scale sampling campaigns for marine organisms have been carried out, such as the Global Ocean Sampling (GOS) Expedition on Sorcerer II (J. Craig Venter Institute) and the Tara Expedition (Tara Foundation). The latter conducted an expedition named “Tara Pacific” from 2016 to 2018, which included the investigation of Japanese coasts (Carradec et al. 2018). Samples from the previous expedition, “Tara Oceans,” are mapped to the external Tara Oceans Sample Registry at PANGAEA (http://www.pangaea.de/). The sample information and the primary sequence data can be accessed through EMBL-EBI (www.ebi.ac.uk/ena/data/view/PRJEB402; www.ebi.ac.uk/metagenomics/; www.ebi.ac.uk/metagenomics/projects/ERP001736/).