Introduction

Marine Ecosystem

Marine environments cover over 70% of the earth’s surface, ranging from the sub-zero conditions of the Arctic and the glacial Antarctic region to the temperate aquatic ecosystems of the tropics, and represent an untapped source of natural resources and novel drugs [1,2,3]. The marine ecosystem is formed mostly by the five oceans on Earth and their biotic components: plants, mammals, fishes, and extensive microbial consortia of protozoans, phytoplankton, bacteria, fungi, micro- and macroalgae, viruses, etc. Marine microbes can be found inhabiting deep ocean sediments at depths of 11,000 m, pressures exceeding 100 MPa, and temperatures between −10 and 100 °C [4]. Extreme environmental conditions, including varying temperatures, elevated pressure, and limited light, make the oceanic ecosystem an ideal setting for the synthesis of secondary metabolites [5]. The instinct for survival and the competition for limited available energy among marine organisms can also be credited for the diversity of their metabolites. Microorganisms account for about 98% of marine primary productivity, either as free-living organisms or in synergistic relationships with other microbes, and thereby produce a range of bioactive compounds either through their own metabolic systems or in association with others [6, 7].

To date, marine microorganisms have contributed over 23,000 compounds of medicinal importance, including peptides, fatty acids, terpenes, enzymes, alkaloids, polyketides, phenols, etc. [8, 9]. Of these, approximately 70% are extracted from actinomycetes, 20% from fungi, 7% from Bacillus spp., and 1–2% from other microbes [10]. Marine organisms such as sponges, bryozoans, algae (Chlorophyta, Rhodophyta), cyanobacteria (BGA), and soft corals have also contributed to the discovery of drugs such as Pyranonigrin [9], Rubrumazine [11], Echinulin [12], Dehydroechinulin [13], Variecolorin [14], and Cristatumin [15]. The isolated bioactive metabolites show a range of biomedically important activities, including antibacterial [16], antifungal [17], anti-plasmodial [18], anti-protozoal [19], anti-inflammatory [20], anticancer [21], and anti-viral [22].

While virtually all marine microbial species have potential applications, the bacterial communities are the foremost contributors and indeed the most studied. In any given marine ecosystem, the bacterial population is likely to be about 4–6 × 10^30 [23, 24]. Among all taxa, bacteria of the order Actinomycetales (actinomycetes) are active components of marine microbial communities and form a stable and persistent population across different marine ecosystems. They can be found in the intertidal zone, salt marshes, lagoons, estuaries, mangroves, coral reefs, and even the sea floor [25]. For years, actinomycetes were isolated from soil and aquatic sources with limited use. However, the recent introduction of metagenomic techniques has considerably boosted the genomic sequencing of marine actinomycetes and the exploration of their huge biosynthetic potential from marine and other sources [26, 27]. Metagenomics, a culture-independent technology, offers information on the genetic make-up of non-cultivated microorganisms via the detection of biosynthetic genes and their expression. It has thus provided a breakthrough in accessing previously unknown biosynthetic gene clusters of several microbes, integrating synthetic genes into host microorganisms, and growing those hosts under in vitro conditions [4].

Marine Actinomycetes

Actinomycetes are ubiquitous Gram-positive bacteria, largely known for producing a range of antibiotics. They occur in nature mostly as free-living organisms, although some are known to be pathogenic [28]. They form symbiotic associations with different plants and animals, including several marine macrofauna and flora such as insects, invertebrates, marine sponges (Axinella polypoides, Haliclona sp.), and cone snails, and these associations have great ecological significance [29].

The rapid emergence of antibiotic-resistant microbes has been addressed largely through actinomycetes, which remain the most economical and reliable microbial source, producing 80% of the world’s antibiotic stock. Actinomycetes are producers of over 2500 bioactive compounds, with about 50–55% being produced by the genus Streptomyces [30]. Since the discovery of streptomycin in 1943, actinomycetes have rapidly been put to use in producing antibiotics such as streptomycin, erythromycin, amphotericin, and vancomycin, which inhibit pathogens of various origins (Table 1). Metabolites obtained from actinomycetes are unique, unprecedented, and occasionally complex, with excellent antibacterial potency and usually low toxicity [31]. Although secondary metabolites secreted by actinomycetes are often considered safe, the pathogenicity of the organisms cannot be completely ruled out. Most infections by actinomycetes are polymicrobial, involving other aerobic and anaerobic bacteria [32, 33]. Unfortunately, unlike model microbes such as E. coli and S. cerevisiae, only a few genetic manipulation tools are available for actinomycete taxa. In addition to their high GC content (sometimes > 72%), the low acceptance rate of manipulated actinomycete DNA introduced into other hosts has impeded its successful expression by the host organism. However, recently developed genetic techniques can now be expected to address and overcome these challenges for novel actinomycete genes.

Table 1 Bioactive compounds/drugs isolated from various marine actinomycetes

Traditional Technology for Drug Discovery

The ever-increasing world population and its demand for new drugs against various health ailments have burdened the existing natural resources. For over three decades, traditional microbiological approaches were used to culture microorganisms from the ocean’s surface. Commonly used culturing techniques for characterizing microbial ecology include serial dilution and plating on different selective culture media (NA, TSA, and LB media) [34]. This was followed by phenotypic screening, compound isolation and characterization, determination of the mode of action (antimicrobial assays), preclinical development and, if successful, clinical development and eventual commercialization [35]. Classical approaches select microbial strains solely on the basis of taxonomic or antimicrobial information, which limits their impact on further applications. Although a variety of culture methods and media are available for growing marine microbes, a large section of the group remains unexplored and undefined owing to the limitations of in vitro culturing and the lack of detailed taxonomic and physiological characterization [36]. Reports suggest that only 30 of the 61 known phyla have cultivable representatives, which presents a major challenge for isolating pure cultures and for obtaining an adequate overview of the microbes capable of growing under in vitro conditions [37].

Marine Metagenomics

Approximately 3.67 × 10^30 microorganisms inhabit the ocean surface [38]. Despite this microbial richness, so far only a fraction of them (0.001–1%) has been successfully grown under in vitro conditions, and even these take months or years to yield sufficient biomass [34]. Culture-based methods identify only the microbial metabolites and are therefore likely to overlook the vast majority of other essential chemical entities or enzymes. Molecular-based culture modules with added nutrient supplements that mimic marine conditions (nutrient content, oxygen gradient, pH, etc.) have recently proved highly efficient in overcoming technical bias and maximizing cultivation efficiency [39]. Hence, they are convenient for large-scale production of microbial biomass.

Among such modern techniques, metagenomics offers immense opportunities and serves as a leading tool for examining, sequencing, replicating, and identifying potential marine microbial biota. Metagenomics provides an in-depth characterization of the whole DNA/genome collected directly from any mixed population of microbes. It gives direct access to the bioactive potential of microbial consortia without the need for a pure culture [40]. In addition, metagenomic sequencing helps in interpreting the metabolic and cellular pathways of the species responsible for the synthesis of novel bioactive metabolites. The technique saves time and reduces dependence on culturing microbes in the laboratory, making it possible to explore bioactive compounds from species that are recalcitrant to culturing [39]. Metagenomics can aid the discovery of novel natural products by increasing recovery rates up to 40% compared with traditional methods, which places it at the frontier of modern drug discovery [41].

Metagenomics as an Approach for Discovery of Bioactive Metabolites

The introduction of metagenomics into biological science has revolutionized the process of drug discovery. It has contributed greatly to uncovering the potential of diverse environmental niches and marine sources to yield natural products of pharmaceutical importance. Screening, replication, and transcription of environmental DNA and an array of other genomic clusters have led to the detection of genes of interest such as polyketide synthases (PKSs; types I, II, and III), non-ribosomal peptide synthetases (NRPSs), and ribosomally synthesized and post-translationally modified peptides (RiPPs) [42]. Two main approaches are used in metagenomics to explore, probe, and retrieve genes or gene clusters: random sequencing (shotgun analysis, next-generation sequencing) and targeted metagenomic sequencing.

Random Sequencing

In random (shotgun) sequencing, long segments of DNA are fragmented into smaller segments and sequenced randomly over several rounds. The sequenced fragments are then re-assembled into the whole sequence using the overlaps between them [42]. Shotgun sequencing proceeds through the following steps: (i) isolation of high-quality genomic DNA, (ii) random fragmentation of genomic DNA (ultra-sonication), (iii) size fractionation (electrophoresis), (iv) construction of a genomic library, (v) paired-end sequencing, and (vi) assembly of the sequenced segments. In metagenomics, random sequencing is used to characterize the genomes of bacteria, archaea, and viruses for their gene content, metabolic processes, and secondary metabolites [43]. The technique has enabled the successful recovery of total genomic DNA from marine sources without in vitro culturing. Random sequencing offers advantages over other techniques because it dispenses with the construction of a gene map and so requires less time for mapping. However, the process lacks positive selection and identification of the active groups responsible for biological functions. In addition, it has limitations when applied to eukaryotic genomes, which contain numerous repetitive DNA sequences that lead to inaccurate data for the sequenced genome [44].
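The re-assembly step described above can be illustrated with a toy greedy overlap assembler. This is only a minimal sketch under stated assumptions: the sequence, the reads, and the function names are hypothetical, reads are error-free and come from one strand, and real assemblers use far more elaborate graph-based methods.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if no overlap of at least `min_len` bases exists.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads that are fully contained in another read.
    reads = [r for r in reads if not any(r != s and r in s for s in reads)]
    while len(reads) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # no overlaps left: the remaining contigs stay separate
        merged = reads[best_i] + reads[best_j][best_n:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Hypothetical 32-base "genome" and five overlapping 12-base reads
# (illustrative values only, not real sequence data).
genome = "ATGGCGTACGTTAGCCTAGGCTTAACGGATCC"
reads = [genome[start:start + 12] for start in (20, 0, 10, 5, 15)]
contigs = greedy_assemble(reads)
```

Because the greedy strategy always trusts the longest overlap, a repeated stretch of sequence can cause two unrelated reads to be merged, which is one way to picture the repeat-related inaccuracy noted above for eukaryotic genomes.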

Targeted Metagenomics

Targeted metagenomic sequencing collects ample data on genetic structure and composition, which are needed to determine the metabolic adaptive features of any gene or gene cluster suspected of bioactive potential. It is an ideal platform for constructing gene libraries of specific groups with novel and functional bioactive compounds [45]. The targeted approach deliberately sequences the environmental DNA pool with high-throughput sequencing technology in a way that reduces genetic complexity. The process is governed by factors such as reliable data on the sequence match between the target gene and reference genes, the purpose of sequencing, sequence coverage, and the rapid description and identification of transformants within a metagenome [46].
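Sequence coverage, listed above as one of the governing factors, is commonly estimated with the idealized relation c = N × L / G (number of reads times read length, divided by target size). The sketch below is an assumption-laden illustration rather than a method taken from the cited works: it assumes reads fall uniformly at random on the target, and the 50 kb target size, the read length, and the function names are hypothetical.

```python
import math


def expected_coverage(n_reads, read_length, target_size):
    """Mean sequencing depth c = N * L / G under uniform random sampling."""
    return n_reads * read_length / target_size


def fraction_uncovered(coverage):
    """Poisson approximation of the chance that a base is hit by no read."""
    return math.exp(-coverage)


def reads_needed(depth, read_length, target_size):
    """Smallest number of reads that reaches the requested mean depth."""
    return math.ceil(depth * target_size / read_length)


# Hypothetical example: a 50 kb gene cluster sequenced with 150-base reads.
depth = expected_coverage(n_reads=10_000, read_length=150, target_size=50_000)
missing = fraction_uncovered(depth)
n_for_30x = reads_needed(depth=30, read_length=150, target_size=50_000)
```

In a real metagenome only a small share of the reads derives from the target organism, so the effective N is the total read count scaled by the relative abundance of that organism.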

Further advances in molecular biology have led to metagenomic techniques being broadly categorized as (i) sequence- or homology-based screening and (ii) function- or activity-based screening (Fig. 1).

Fig. 1 Metagenomic approaches for drug discovery

Sequence- or Homology-Based Screening

Homology-based modeling is a well-established, high-throughput technique based on the concept that certain tertiary protein structures are better conserved than the amino acid sequence. Short protein repeats of 20–40 residues represent a significant fraction of known proteins and have diverged appreciably in sequence, which affects the probability of detecting them in a single sequence [46]. The homology-based approach employs two different strategies to sequence the target genes encoding proteins: PCR-based sequencing and hybridization-based sequencing. PCR sequencing and the detection of short conserved areas in flanking regions yield information on the whole genome and help to reconstruct the evolutionary route of the desired bioactive compound in response to changes in the ecosystem [47]. In hybridization-based techniques such as DNA microarrays, mRNA extracts from biological samples are hybridized simultaneously to a pre-selected mRNA library containing a range of mRNA transcripts. The expression levels of the desired transcripts with known functions can then be obtained by reading the intensities of the different hybridization signals [48]. Hybridization-based and sequencing-based technologies are complementary rather than competitive for gene coding, and both are currently trusted means of transcriptome profiling and expression analysis.
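The PCR-based strategy relies on primers that bind short conserved regions flanking a variable stretch. A minimal in silico sketch of that idea follows; the template and both degenerate primers are invented for illustration (they are not published primers for any PKS or NRPS domain), and the helper names are hypothetical. Degenerate positions use the standard IUPAC nucleotide codes.

```python
import re

# Standard IUPAC nucleotide codes expressed as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def primer_to_regex(primer):
    """Translate a (possibly degenerate) primer into a regex pattern."""
    return "".join(IUPAC[base] for base in primer.upper())


def reverse_complement(seq):
    """Reverse complement that also handles IUPAC degenerate codes."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def in_silico_pcr(template, fwd, rev, max_len=3000):
    """Return amplicons bounded by fwd and the nearest downstream rev site."""
    template = template.upper()
    fwd_re = re.compile(primer_to_regex(fwd))
    # The reverse primer anneals to the opposite strand, so search the top
    # strand for its reverse complement.
    rev_re = re.compile(primer_to_regex(reverse_complement(rev)))
    amplicons = []
    for f in fwd_re.finditer(template):
        for r in rev_re.finditer(template, f.end()):
            if r.end() - f.start() <= max_len:
                amplicons.append(template[f.start():r.end()])
            break  # only the nearest downstream site is considered
    return amplicons


# Hypothetical template and primers, for illustration only.
template = "TTTTGCAGCTGGACGTACGTACCCATGGTCAAAA"
amplicons = in_silico_pcr(template, fwd="GCWGCTGG", rev="GACCRTGG")
```

Exact matching is a simplification: real primers tolerate some mismatches, so a practical screen would normally score approximate matches rather than require a perfect regex hit.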

However, for drug discovery, next-generation sequencing (NGS) technologies, including Massively Parallel Signature Sequencing (MPSS), are preferred because they make it easier to find clusters of genes of interest (GOIs), such as type I, II, and III polyketide synthases (PKSs), non-ribosomal peptide synthetases (NRPSs), or hybrid PKS-NRPS synthetases, which have been identified as key genes for the synthesis of many bioactive compounds in marine actinomycetes [49]. Sequencing through the homology-based screening approach depends mainly on (i) high-fidelity amplification of relatively large DNA fragments, (ii) employment of appropriate DNA assembly methods, and (iii) selection of host.

(i) High-Fidelity Amplification of Relatively Large DNA Fragments For a long time, amplification and profiling of the total microbial genome were restricted by the large amount of genetic material (50–200 μg total RNA) required by the techniques then in use. The use of cDNA and poly(A) RNA libraries moderately improves genetic profiling by inserting appropriate base sequences and amplifying the gene of interest. It also avoids mutations or minimizes the polymerase’s error rate by intensifying the fluorescence signal [50]. Advances in the engineering of high-fidelity DNA polymerases such as Phusion and Q5 help to reduce the mutation rate during PCR, resulting in more accurate replication of GOIs (genes of interest). However, error-free amplification of GC-rich DNA segments over 3 kb from actinomycete genomes remains a challenge, as it depends on enhancer mixtures of chemicals such as DMSO and betaine, which reduce the fidelity of DNA synthesis [51].

(ii) Employment of Appropriate DNA Assembly Methods DNA assembly is the physical linkage of multiple fragments of a DNA sequence in end-to-end order to obtain a desired longer sequence prior to insertion into a host cell. It is a crucial stage in synthetic biology and cloning, as the total genome cannot be read at once with current sequencing technology. Instead, small sections of the genome of up to 30,000 nucleotide bases are read at a time and then assembled to reconstruct the entire genomic DNA [52]. In recent times, the process of DNA assembly has undergone several modifications, extending from sequencing to coding of DNA from environmental sources. Based on the assembly mechanism, DNA assembly methods can be broadly categorized as restriction enzyme-based methods, in vivo and in vitro sequence homology-based methods, and bridging oligo-based methods. Nevertheless, complex gene re-arrangements through de novo DNA synthesis remain labor-intensive and inefficient when assembly methods are applied to GC-rich DNA, and this needs to be addressed in future work [53].

(iii) Selection of Host Heterologous gene expression is key to providing a convenient alternative for the large-scale production of encoded bioactive compounds from marine actinomycetes. Selection of an appropriate host or carrier for heterologous gene expression is hence a vital component in the expression and synthesis of potential novel compounds through metagenomics [54]. Gram-positive bacteria such as Bacillus subtilis and Streptomyces sp. [55] and Gram-negative bacteria such as Escherichia coli and Pseudomonas putida [56] have been used effectively for the insertion and expression of suitable GOIs representing bioactive complexes of bacterial origin. Owing to its rapid growth and facile genetic manipulation, E. coli is the most commonly used host for heterologous expression and protein production [57]. However, Streptomyces species, being closely related and similar in metabolism, are the optimal choice for expressing genes from marine actinomycetes [58].

Through heterologous production, different type II PKS products, macrolactone [51], granaticin [53], medermycin [59], epothilone [60], novobiocin [61], oxytetracycline [62], meroterpene [63], and merochlorins [64] were obtained from Streptomyces coelicolor. Another compound, 6-methylsalicylic acid, was obtained through Streptomyces lividans [65]. Heterologous production of NRPs and hybrid NRP-PKS products (daptomycin and capreomycin) was reported from Streptomyces sp. [66]. Other engineered microbial strains considered for the expression and extraction of bioactive metabolites include S. avermitilis [67], S. venezuelae [68], S. sanyensis [69], S. albus [70], S. ambofaciens [71], and S. griseofuscus [72]. Despite the availability of such varied hosts, successful cloning and expression of latent and uncharacterized gene clusters from a metagenome is by no means guaranteed, and the best approach is to optimize the strategy across several hosts. Success is governed by factors such as reliability and consistency, 3-D protein folding, post-translational modifications, expression efficiency, and cost-efficiency [73].
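GC content recurs in this section as an obstacle to both high-fidelity amplification and expression in a foreign host. As a minimal sketch, the windows of a sequence that exceed a GC threshold can be flagged before primers or hosts are chosen. The 72% threshold echoes the figure quoted earlier for some actinomycete genomes; the window size, step, example sequence, and function names are hypothetical.

```python
def gc_content(seq):
    """Fraction of G and C bases in `seq` (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_rich_windows(seq, window=20, step=10, threshold=0.72):
    """Return (start, gc_fraction) for each window above `threshold`."""
    flagged = []
    for start in range(0, len(seq) - window + 1, step):
        gc = gc_content(seq[start:start + window])
        if gc > threshold:
            flagged.append((start, gc))
    return flagged


# Hypothetical 40-base sequence with a GC-rich core (illustrative only).
example = "ATATATATAT" + "GC" * 10 + "ATATATATAT"
flagged = gc_rich_windows(example, window=20, step=10)
```

For real gene clusters the window would more plausibly span hundreds of bases, and flagged regions might prompt a change of polymerase, additive, or expression host.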

Function- or Activity-Based Screening

Function- or activity-based screening is a classic detection method for bioactive metabolites. It involves generating genomic libraries from environmental samples and then screening them for direct detection of the phenotype associated with the desired bioactive compound. Functional screening allows the identification and discovery of new classes of bioactive compounds and biosynthetic pathways with functions such as antibacterial, anti-viral, anti-plasmodial, antifungal, and anti-tumor activity, which were previously undetectable [74].

Functional screening generally relies on the following modes of screening: (i) direct detection of gene phenotypes, (ii) heterologous compatibility, and (iii) substrate-induced gene expression.

(i) Direct Detection of Gene Phenotypes Direct detection is mostly applied to detect the enzymatic activities of positive clones or phenotypes using chemical dyes and enzyme substrates that are often linked to chromophores, so that activity can be detected visually or by spectrophotometry. Such techniques are quite useful for screening large numbers of industrially applicable enzymes, such as lipases (lipolytic enzymes) and phosphatases, from marine sources [75].

(ii) Heterologous Compatibility Compatibility between the engineered or cloned sequence and the host cells is of high priority for the success of any heterologous expression. Factors such as sequence composition (GC content, codon usage, and mRNA folding energy), phylogenetic origin, host physiology, promoters in the gene library, toxicity of gene products, and resistance mechanisms are known to be the major determinants of heterologous compatibility [76]. Genes commonly identified through heterologous compatibility include genes encoding phosphatase activity [77], DNA polymerase-encoding genes [71], antimicrobial resistance genes [66, 72], lipid substrate hydrolysis enzymes [73], Na+/H+ antiporters [74], and lysine racemases [75].

(iii) Substrate-Induced Gene Expression Substrate-induced gene expression (SIGEX) is a relatively new approach for cloning and expressing novel catabolic genes. In SIGEX, a library is created in a culture medium inoculated with restriction-digested metagenomic clones, supported by an operon-trap expression vector and a suitable cloning host [76]. When the substrate induces expression of a target gene, the GFP gene is co-expressed, so positive clones can be rapidly separated from other clones by fluorescence-activated cell sorting (FACS). SIGEX has an advantage over other screening techniques in that its ultrahigh-throughput FACS-based screening aids the rapid screening of variant libraries [78].
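The sorting logic behind SIGEX can be pictured as a two-step gate on GFP fluorescence. The sketch below is an assumption about how such a sort is commonly arranged (constitutively fluorescent clones are removed first, then substrate-responsive clones are kept); the text above states only that positive clones are separated by FACS. The clone names, fluorescence values, gate value, and function names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clone:
    name: str
    gfp_without_substrate: float  # fluorescence with no inducer (arbitrary units)
    gfp_with_substrate: float     # fluorescence after adding the substrate


def sigex_gate(clones, gate=100.0):
    """Keep clones that are dark without substrate and bright with it."""
    # Step 1 (assumed): discard clones that fluoresce without the substrate,
    # since their GFP signal is not substrate-dependent.
    quiet = [c for c in clones if c.gfp_without_substrate < gate]
    # Step 2: among the remaining clones, keep those that cross the gate
    # once the substrate is present.
    return [c for c in quiet if c.gfp_with_substrate >= gate]


# Hypothetical library of three clones (illustrative values only).
library = [
    Clone("clone_A", gfp_without_substrate=10.0, gfp_with_substrate=500.0),
    Clone("clone_B", gfp_without_substrate=400.0, gfp_with_substrate=450.0),
    Clone("clone_C", gfp_without_substrate=12.0, gfp_with_substrate=15.0),
]
positives = sigex_gate(library)
```

Here only clone_A would pass: clone_B is bright regardless of substrate and clone_C never responds.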

Although functional screening is quite successful, aspects such as expression of the target sequence in the host, host selectivity, and activation of multiple transcriptional units are key challenges that must be addressed for metagenomics to be precise and to gain full access to the metabolic potential of marine environmental sources [79].

Recent trends in HTS technologies can be useful for the extraction of bioactive compounds from actinomycetes. Metagenomic approaches targeting specific enzymes or metabolic pathways have provided information on the functional aspects, gene cluster structure, and sequences of uncultivated actinomycetes, enabling the isolation of new classes of genes with both known and novel functions [80].

Advantages and Limitations of Metagenomic Techniques

The key advantage of the metagenomic technique is that it allows culture-independent study of the microbial community without obtaining pure cultures or prior knowledge of the trait. Metagenomic techniques also have other benefits over conventional sequencing techniques in terms of sample requirements, data comparison and evaluation, and analytical power when probing for novel bioactive compounds, new biocatalysts, antibiotics, or other molecules with possible uses in pharmaceuticals and biological science [54].

While the advantages of metagenomic research are obvious, the process also has notable limitations. These include low resolution, biased classification of short target segments, and false functional confirmation [65, 66]. The sample itself imposes limitations, since collecting high-quality environmental DNA is always complex. Other limitations include the lack of proper taxonomic context, sequencing errors, the lack of efficient algorithms, and poor DNA extraction and recovery efficiency. Matters are further complicated when DNA is extracted from extremophiles, owing to the difficulty of lysing their cells. Cloning and heterologous expression of metagenomic genes in host cells such as E. coli, P. putida, B. subtilis, Streptomyces sp., and other well-described model hosts may sometimes yield a product that deviates from the intended one [81].

Gene expression across two different taxonomic classes may often disrupt the genetic machinery of the host, which is unable to recognize the sequence information and fails to express the programmed bioactive metabolite. Sequencing of conserved regions is limited to the target clones, thereby overlooking numerous other potential genes that may serve as important sources of bioactive metabolites [82]. The literature on metagenomic techniques suggests that < 1% of the bacterial species in ecological communities are cultivable under in vitro conditions using traditional culture-based methodologies. To increase the culture efficiency of marine microbes (bacteria, fungi, eukaryotes, and other environmental samples) and their product yield, an advanced metagenomic approach that enhances genetic diversity and metabolic potential is required as an alternative to regular microbial screening methods [83].

Summary and Future Prospects

The discovery of streptomycin and other antibiotics has established the Actinomycetales as a prolific source of natural bioactive compounds. The decline in the discovery of novel metabolites appears to reflect declining or ineffective screening efforts rather than the exhaustion of compounds, so whole-genome sequence mining can serve better in uncovering the cryptic biosynthetic pathways of previously undetected metabolites [84]. Metagenomic approaches have helped in uncovering novel genes and provide in-depth insight into, as well as access to, the microbial genetics of different habitats and assemblages. Marine metagenomics has shown promising results in elucidating new bioactivities and metabolic pathways that were previously inaccessible. Despite being a relatively successful technique, metagenomics faces hindrances and impediments in its discoveries. The current synthesis and development of drugs from marine actinomycetes are hampered primarily by improper heterologous gene expression, restrictive host selection, the lack of a robust and universal metagenome database, etc.

The combination of more advanced amplification and screening methods such as denaturing gradient gel electrophoresis (DGGE), multiple displacement amplification (MDA), whole-genome amplification (WGA), φ29 DNA polymerase, and random exonuclease-resistant primers has yielded large quantities of high-quality sequenced DNA even from small amounts of environmental sample or single-stranded microbes [85]. Sequencing and detection of functional genes are also better achieved through the combination of PCR mutagenesis and chimeragenesis. The development of amplicon, fluorescence-activated cell sorting (FACS), phenotypic micro-array (PM), and community isotope array (CI Array) techniques has led to the retrieval of novel genes and the recovery of mRNA from metagenomes [57, 61]. Functional screening methods such as SIGEX (substrate-induced gene expression) [2] and METREX (metabolite regulated expression), together with next-generation sequencing (NGS) and high-throughput screening (HTS), have further enabled the exploitation of hidden microbial communities in metagenomic studies [11, 26]. Detection techniques such as stable isotope probing (SIP), 5-bromo-2-deoxyuridine (BrdU) labeling, suppressive subtractive hybridization (SSH), differential expression analysis (DEA), phage display, and affinity capture have been developed to increase the screening hit ratio for the target gene and to generate highly diverse protein molecules [83]. Enrichment of clones in meta-transcriptomic libraries before insertion into host vectors is also suggested as a substantial step towards obtaining the desired functional protein.

Heterologous gene expression has been a major challenge in the metagenomic approach. The difficulty of selecting competent DNA fragments of suitable size and subsequently expressing them in a suitable host has been overcome to some extent by phage display expression metagenomic libraries, cosmids, and bacterial artificial chromosomes (BACs), which enrich even rare DNA present in the environmental metagenome [17]. Bioinformatics tools such as automated genome mining, antiSMASH [18], and ClusterMine360 [29] have proven quite useful in handling multiple large data sets. Developing pattern recognition-based algorithms and information extraction (IE) systems will be important for extracting environmental and geographical data from the biological literature [13]. Only a fraction (< 1%) of the available marine resources has been explored to date as a potential source of marine drugs, and intensive research is therefore necessary to bring marine actinomycetes into focus and establish their potential. Functional genomics and bioinformatics, along with synthetic biology, will play a decisive role in the future of the metagenomic approach and will pave the way to accessing the full genetic potential of untamed and uncultivable marine microbial consortia.