Introduction

Secondary metabolite production in microbes is strongly influenced by nutritional factors and growth conditions. The repression of penicillin biosynthesis in the presence of lysine is one of the earliest examples of this phenomenon [2]. Similarly, the production of pneumocandins, the natural precursors of the antifungal agent Cancidas, was increased 10- to 20-fold by changing the nitrogen source [6].

However, this concept is difficult to apply a priori, without previous knowledge of which metabolites will be detected. A common strategy, widely applied in industrial screening programs, consists of applying several growth conditions (including variables such as production media, incubation time and other factors) to each strain studied [13]. Although there is strong anecdotal evidence that this approach improves the chances of finding the desired metabolite [1, 14], there are practical limitations to the number of conditions used for each microorganism.

Considering that any screening program of microbial natural products can handle a limited number of fermentations, applying many growth conditions to each strain may critically restrict the number of strains and therefore the diversity of biosynthetic pathways included in the screening. As a compromise between those two requirements, a limited number of conditions (typically between three and five) are usually applied to each strain [14, 15].

Furthermore, without prior knowledge about the preferred growth conditions for a given microorganism, random assignment of media to strains may generate an inefficient redundancy of metabolites or extracts lacking relevant levels of secondary metabolites. These problems may be exacerbated if a library of extracts is built for long-term testing in diverse screens. Potentially, this could result in the wasteful screening of poor-quality extracts. Previous chemometric studies for strain recognition and de-replication of extract libraries focused on quantitative pair-wise comparisons of a single HPLC–mass spectrometry chromatogram against a library of extracts, offering a single similarity coefficient for each pair [5]. Other studies were based on HPLC–diode-array detection (DAD) chromatograms, using matrices for confidently identifying strains that produce similar extracts [7].

In this paper, we present a strategy for improving the quality of the extracts included in a collection of microbial natural products. Our approach is based on growing each strain in a panel of ten media in small-volume format, analyzing the metabolite profiles of each extract by HPLC-DAD and selecting the three media with the most diverse and least overlapping chemical profiles. The three media are then scaled-up to the volumes required to build a collection of natural-product extracts. The process is performed in a semi-automatic fashion by means of an in-house software application (HPLC Studio ver. 1.2.0) that quickly compares HPLC chromatograms and prioritizes the most interesting fermented media according to user-selected parameters. The application of this approach to a group of taxonomically and geographically diverse Actinomycetes is described herein.

Materials and methods

Overall strategy

The analysis of actinomycete metabolite profiles was carried out in four steps: (1) fermentation of each microorganism in a battery of production media, (2) extraction of fermentation broths, (3) chemical analysis and (4) final treatment of the data to select the most appropriate media for each case. After analysis, each microorganism was grown on a larger scale for the collection of extracts for drug screening.

Fermentation of microbial strains

Different groups of Actinomycetes were chosen for the study because of their well-known ability to produce secondary metabolites of therapeutic interest. A total of 250 isolates of taxonomically diverse Actinomycetes were obtained from soil (233) and marine (17) samples from different countries: Costa Rica (60), French Guiana (2), Mexico (29), Panama (14), South Africa (112) and Spain (33). A subset of 50 strains from South Africa was used for some of the validation experiments. Inocula for the production cultures were obtained following standard procedures [9] and were seeded (0.5 ml) into glass tubes (25×150 mm) containing 10 ml of the production medium. Each strain was grown in ten different production media. Fermentation tubes were incubated under the same conditions as the inocula [9], in racks with an inclination of 35° for 7 days before harvesting.

Initially, we decided to prepare a reasonable number of small-volume fermentations based on different nutritional sources. Each microorganism was fermented in ten of the 11 media selected for study (Table 1). MH medium, designed specifically for members of the Pseudonocardiaceae (G. Platas, unpublished data), was used mostly for strains of this family instead of GPA medium. However, MH medium was also evaluated for some sets of strains from other taxa. The production media were selected on the basis of our experience in the screening of bioactive natural products. The media included many different carbon and nitrogen sources, to maximize the chances of inducing the production of secondary metabolites (Table 1 [2, 6, 10, 11, 13]).

Table 1. Composition of the different media used for secondary metabolite production by Actinomycetes

The process was automated to accommodate the injection of ten samples per microorganism. The HPLC vials were labeled with bar codes generated by querying a database that identified the microorganism, fermentation medium and extraction procedure. Because the media contained UV-detectable chemical compounds, each sequence of vials included unfermented media extracts for later data-treatment. Blank methanol samples were also inserted between different sets as a quality control feature. Later, the HPLC Studio ver. 1.2.0 application identified all the samples injected and linked the fermentation extracts with their corresponding unfermented blank media by database query [3].

HPLC analysis of microbial extracts

Fermentation broths were extracted by liquid/liquid partition with methyl-ethyl-ketone, stirring at room temperature for 1 h. Half of the extract was used, to avoid interface interference. Once dried under a nitrogen atmosphere, the residue was dissolved in HPLC-grade methanol and filtered at a 0.2-μm pore size. Diode-array HPLC gradient characterization was performed with a ZORBAX Rx-C8 column (4.6×250.0 mm) at 210 nm of UV detection. A 10–100% gradient of acetonitrile in water with a flow rate of 0.9–1.2 ml/min was programmed into the 1100 HP Agilent ChemStation, using a constant temperature of 20 °C during the 22 min of each analysis. Samples were buffered with 0.01% trifluoroacetic acid (TFA).

HPLC reverse-phase gradient chromatography provided sufficient reliable information for comparing the chemical diversity of organic extracts. The detector system used was an HPLC-DAD characterizing the 210-nm chromatogram traces [1]. It was assumed that each of the peaks detected at 210 nm represented at least one secondary metabolite. The Agilent ChemStation software determined peak presence depending on standard parameters, e.g. the line slope, as described in detail in the reference manual for the HP 1100 series of HPLC spectrometers [4]. The absorption at 280 nm from each peak was also recorded to detect possible phenolic compounds, with the goal of adding more data for a more accurate comparison.

Data processing

Newly developed application software, HPLC Studio ver. 1.2.0, was used for data processing [3]. Thresholds were applied in the comparison to minimize the influence of peaks with small areas. Only peaks with areas above 2% of the total and a retention time between 3 min and 19 min of the HPLC gradient zone were included. Some degree of variability occurs in HPLC gradient analysis of complex extracts due to several factors, including the composition of the mixtures, the compound concentrations, the existence of molecules with very close retention times and the level of detection-resolution of the HPLC unit. The HPLC Studio ver. 1.2.0 resolution time parameter used for equating two peaks was set at 0.065 min, which corresponded to ±1.5× the resolution of the HPLC-DAD.
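The thresholds described above can be illustrated with a minimal Python sketch. It is not the HPLC Studio implementation, which is not reproduced here; the `Peak` class, the function names and the choice to compute the 2% threshold against the summed area of all integrated peaks are assumptions made for illustration only.

```python
from dataclasses import dataclass

# Values taken from the text; the names themselves are hypothetical.
MIN_AREA_FRACTION = 0.02   # peaks must exceed 2% of the total area
RT_WINDOW = (3.0, 19.0)    # accepted retention-time window, in minutes
RESOLUTION_TIME = 0.065    # two peaks closer than this are equated, in minutes


@dataclass(frozen=True)
class Peak:
    rt: float    # retention time, min
    area: float  # integrated area at 210 nm, arbitrary units


def filter_peaks(peaks):
    """Keep peaks above the area threshold and inside the gradient window.

    Assumption: the 'total' is the summed area of all peaks in the
    chromatogram, computed before the retention-time window is applied.
    """
    total = sum(p.area for p in peaks)
    if total == 0:
        return []
    return [
        p for p in peaks
        if p.area / total > MIN_AREA_FRACTION
        and RT_WINDOW[0] <= p.rt <= RT_WINDOW[1]
    ]


def same_peak(a, b, tol=RESOLUTION_TIME):
    """Treat two peaks as the same when their retention times differ by less than tol."""
    return abs(a.rt - b.rt) < tol
```

In this sketch, a peak eluting at 2.0 min is discarded regardless of its area, and a peak contributing 1% of the total area is discarded regardless of its retention time.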

The core of the data treatment eliminated the peaks present in uninoculated media and created a virtual chromatogram that combined the peaks of all individual extracts. This “chromatogram template” (CT; Table 2) was created by a sequential mixing of the individual chromatograms and removal of the peaks with identical 280 nm UV absorption and retention-time differences lower than the user-defined resolution time. The CT contains all the data needed for characterizing the area and diversity of all extracts from one microorganism [3].
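The construction of the CT can be sketched as follows. This is an illustrative reading of the description above, not the published algorithm: the 280 nm absorption is reduced to a boolean flag (`abs280`), and all names are hypothetical.

```python
from typing import NamedTuple


class Peak(NamedTuple):
    rt: float      # retention time, min
    area: float    # area at 210 nm, arbitrary units
    abs280: bool   # assumption: 280 nm absorption reduced to present/absent


def equivalent(a, b, tol=0.065):
    """Two peaks are equated if their 280 nm flag matches and their RTs differ by < tol."""
    return a.abs280 == b.abs280 and abs(a.rt - b.rt) < tol


def subtract_blank(extract, blank, tol=0.065):
    """Drop extract peaks that also occur in the uninoculated medium."""
    return [p for p in extract if not any(equivalent(p, b, tol) for b in blank)]


def build_template(extracts_by_medium, blanks_by_medium, tol=0.065):
    """Merge blank-subtracted chromatograms sequentially into one template.

    Returns the template (list of peaks) and, for each medium, the set of
    template indices that the medium covers.
    """
    template = []
    coverage = {}
    for medium, peaks in extracts_by_medium.items():
        cleaned = subtract_blank(peaks, blanks_by_medium.get(medium, []), tol)
        covered = set()
        for p in cleaned:
            for i, t in enumerate(template):
                if equivalent(p, t, tol):
                    covered.add(i)       # peak already present in the template
                    break
            else:
                template.append(p)       # new peak: extend the template
                covered.add(len(template) - 1)
        coverage[medium] = covered
    return template, coverage
```

The per-medium coverage sets returned here are the natural input for ranking media by their contribution to the CT.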

Table 2. Glossary of terms

Media prioritization

The characterization of the chemical diversity of the extracts from a given isolate started with the selection of the medium yielding the highest number of compounds. The number of peaks of this medium, relative to the number of peaks of the CT, indicated the chemical diversity achieved by this medium (expressed as a percentage; Table 2). The other media were ranked based on their overall contribution to the remaining CT peaks. The production medium covering most of the remaining peaks was assigned as the next one in the prioritization process, and its percentage of diversity was determined as the number of newly covered peaks with respect to the number of peaks of the CT. This process of counting the yet unassigned peaks, ranking and prioritizing the media that covered most of the CT was repeated until all compounds were accounted for by the fermentation conditions.
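The iterative procedure described above amounts to a greedy coverage of the CT peaks. A minimal sketch, assuming each medium is represented by the set of CT peak indices it contains (the function name and data layout are hypothetical, and ties are broken by input order, which the text does not specify):

```python
def prioritize_media(coverage):
    """Greedy ranking of media by newly covered CT peaks.

    coverage: dict mapping medium name -> set of CT peak indices.
    Returns a list of (medium, percent_of_CT_newly_covered) in rank order.
    """
    template = set().union(*coverage.values())
    remaining = set(template)        # CT peaks not yet assigned to a medium
    unranked = dict(coverage)
    ranking = []
    while remaining and unranked:
        # Pick the medium covering most of the still-unassigned peaks.
        best = max(unranked, key=lambda m: len(unranked[m] & remaining))
        new_peaks = unranked.pop(best) & remaining
        ranking.append((best, 100.0 * len(new_peaks) / len(template)))
        remaining -= new_peaks
    return ranking
```

For example, with `{"GOT": {0, 1, 2, 3}, "FR23": {0, 1, 4}, "MPG": {4, 5}, "PV8": {0}}`, GOT is ranked first with four of the six CT peaks and MPG second with the remaining two; FR23 and PV8 contribute no new peaks and are left unranked.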

In addition to the number of peaks, the quantity of detectable material is also relevant in drug discovery screening. Therefore, the areas under the chromatogram were also used for the final ranking of the fermentation media. The influence of assigning different weights to the “diversity” (number of peaks) and “quantity” (area under the curve) factors was assessed. Depending on its contribution to the CT, each extract obtained a final prioritization ranking and the order-dependent diversity characterization percentage was recalculated.
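One way to combine the two factors is a weighted score, sketched below. The exact scoring formula used by HPLC Studio is not given in the text, so the linear combination, the normalization of quantity by the largest total area among the media and all names are assumptions made purely for illustration.

```python
def rank_media(areas, diversity_weight=0.75, n_select=3):
    """Select media by a weighted diversity/quantity score (hypothetical formula).

    areas: dict mapping medium -> {CT peak index: peak area}.
    diversity_weight: weight of the diversity term; quantity gets the rest.
    """
    template = set().union(*(a.keys() for a in areas.values()))
    max_area = max((sum(a.values()) for a in areas.values()), default=0)
    remaining = set(template)
    unranked = dict(areas)
    selected = []

    def score(medium):
        peaks = unranked[medium]
        # Diversity: newly covered CT peaks as a fraction of the whole CT.
        d = len(remaining.intersection(peaks)) / len(template) if template else 0.0
        # Quantity: total area relative to the richest medium (assumption).
        q = sum(peaks.values()) / max_area if max_area else 0.0
        return diversity_weight * d + (1.0 - diversity_weight) * q

    while unranked and len(selected) < n_select:
        best = max(unranked, key=score)
        remaining.difference_update(unranked.pop(best))
        selected.append(best)
    return selected
```

Setting `diversity_weight` to 1.0 or 0.0 gives the pure-diversity and pure-quantity extremes, and intermediate values such as 0.75 correspond to a D/Q setting of 75/25 under this assumed formula.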

Reagents

Dextrin, β-cyclodextrin, distillers solubles and MOPS were purchased from Sigma-Aldrich Chemical Co. (St. Louis, Mo.). Salts, solvents and maltose were from Merck (Darmstadt, Germany). Other media components were purchased as follows: fructose, glucose, glycerol, lactose and soluble starch from Panreac (Barcelona, Spain), Bactopeptone, meat extract and yeast extract from Difco (USA), tomato paste from Heinz Co. (USA), V8 juice from Campbell (USA), cane molasses from Cadbury Beverages (USA), corn molasses from Quaker Oats Co. (USA), wheat flour, millet meal and oat meal from Arrowhead Mills (USA) and N-Z soy BL from Quest International (USA). Pharmamedia was purchased as 8708700 Reg from Trader Protein (USA), peptonized milk was from Oxoid (USA) and primary yeast was purchased from Champlain Industries (USA). The HPLC 1100-DAD HP ChemStation equipment (software ver. A.06.03, columns, vials, filters) was purchased from Agilent Technologies (USA).

Results and discussion

Evaluation of prioritization parameters

Automated data processing allowed us to rank the relative diversity and quantity for each microorganism from its respective media panels. These data gave us a rational basis for deciding which extracts could be included in an extract collection for drug discovery screening.

To address the basic question of how the use of multiple fermentation media increases the amounts and kinds of metabolites, we pooled the accumulated metabolite diversity indices (DI) and quantity indices (QI; Table 2) for a set of 50 representative strains and plotted the accumulations as a function of the number of media used (Figs. 1, 2). Accumulated DI and QI resulted in a non-linear correlation. The range of variation in the DI or QI reached a maximum of 18%. Six fermentation conditions were needed to ensure 75% of relative diversity when the QI was emphasized. The exclusive use of the DI decreased the number of fermentations needed to reach 75% of total diversity to four. In this case, the area of the chromatogram peaks indicated that three to four media covered half of the area of the rest of the fermentation conditions. Differences between three and four media were not significant enough to justify four large-scale fermentations for each strain, especially considering that fewer media permitted the inclusion of additional new strains. We concluded that three growth conditions would be a good compromise between the goals of maximizing the chemical diversity produced by each strain and including as many strains as possible in the screening.

Fig. 1. Accumulated values of diversity for a representative subset of 50 strains vs number of media selected with different prioritization parameter conditions. The diversity/quantity parameter (D/Q), in terms of diversity percentage, is: black circles 0/100, black squares 25/75, black triangles 50/50, black diamonds 75/25 and white circles 100/0

Fig. 2. Accumulated values of quantity for a representative subset of 50 strains vs number of media selected with different prioritization parameter conditions. D/Q as in Fig. 1

Depending on the diversity/quantity ratio (D/Q), the accumulated diversity obtained with three media decreased from 68% to 53% as the accumulated quantity increased from 33% to 50%. A D/Q ratio of 75/25 was finally selected, because it ensured an almost maximal accumulated diversity (66%) when three media were selected, while maintaining an intermediate level of accumulated quantity (39%) with the same media.

HPLC-guided prioritization vs other selection mechanisms

To evaluate the effectiveness of the media selection procedure described above, both the accumulated chemical diversity and quantity were compared with other models for media selection. HPLC-guided selection was compared with a completely random selection of media and with another strategy commonly used in screening programs based on microbial metabolites, by which the media with the most different compositions were deliberately chosen. Table 3 shows the accumulated chemical diversity and quantity of three fermentation media selected by these three approaches. The HPLC-guided system dramatically increased the accumulated chemical diversity (mean DI=66%) when compared with the random selection (mean DI=32%). The QI of secondary metabolites was increased by an average of 5% using the HPLC-guided method compared with the random selection. The increment in DI was less remarkable when compared with media selection based on maximizing the differences in composition (Table 3). The media determined as the most “source deficient” for the comparison were GOT, MPG and KHC, which are the media with extreme carbon/nitrogen, carbon/phosphorus and nitrogen/phosphorus ratios (Table 1). MH medium was not included in the analysis due to its specific application to Pseudonocardiaceae strains.

Table 3. Comparison of three methods for the selection of fermentation conditions for a representative subset of 50 strains of Actinomycetes. The data presented are the average and standard deviation of the 50 strains. At random: media were selected in alphabetical order (CLA, DNPM, FR23). Source deficient: the three media most deficient in nutrients (GOT, MPG, KHC) were selected. HPLC Studio-guided: media were selected according to the procedure described in the Materials and methods

Comparison of production media

To determine what might be the best media for the screening program, the selection frequencies (SF) observed for each of the 11 fermentation media were studied in the representative set of 50 strains (Fig. 3). In the graph, displacement of a medium from the 1:1 linear correlation indicates its deviation from the average. The greater the displacement towards the upper part of the graph, the better the medium performs for secondary metabolite production. Displacement towards the lower half of the graph means less metabolite production. Setting the D/Q prioritization parameter to 75/25 completely discriminated the effectiveness of each of the 11 media (Fig. 3). The media were ranked with respect to metabolite production, from best to worst, in the following order: GOT, FR23, MPG, GPA, CLA, DNPM, MH, KR, KHC, RAM2 and PV8. In terms of diversity only (DI data not shown), we observed that, for this set of strains, the best media were again GOT, FR23 and MPG, whereas in terms of quantity only (QI data not shown) the best media were MH, MPG and FR23. The poorest media for metabolite diversity (DI) were PV8 and RAM2, and PV8 was specifically poor in terms of metabolite quantity (QI).
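The SF calculation can be sketched in a few lines of Python, assuming that SF is the percentage of strains for which a given medium was among those selected (the formal definition is in Table 2, which is not reproduced here). The function name and the strain identifiers in the test data are hypothetical.

```python
from collections import Counter


def selection_frequencies(selections):
    """Percentage of strains for which each medium was selected.

    selections: dict mapping strain id -> iterable of selected media.
    Assumption: each medium counts at most once per strain.
    """
    n_strains = len(selections)
    if n_strains == 0:
        return {}
    counts = Counter(
        medium for media in selections.values() for medium in set(media)
    )
    return {medium: 100.0 * c / n_strains for medium, c in counts.items()}
```

Media that are never selected do not appear in the result; a caller that needs an explicit 0% entry for every tested medium would have to add those keys itself.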

Fig. 3. Selection frequencies for each fermentation medium as a function of the number of media selected for 50 representative strains with a software prioritization parameter balance of diversity/quantity = 75/25. See Table 1 for details of media listed at right

Taxonomic groups and media selection

Once the prioritization parameter of diversity/quantity was set at 75/25 and scaling-up just three fermentation media for each strain was judged suitable, the HPLC-guided media selection was performed for a larger set of 250 actinomycete strains. No single medium was preferred by all strains, and the fermentation conditions maximizing the chemical diversity and quantity for each strain did not overlap in most cases.

Due to the phylogenetic and physiological diversity that has been described among members of the Actinomycetes [8, 12], we narrowed the search for preferred media by grouping the strains by genus. For example, Table 4 shows the three media prioritized for all the strains tested of the genus Saccharopolyspora, together with their relative SF. The selection of media was spread among all the media included in the analysis. Four of the fermentation media presented the highest SF: DNPM, FR23, GOT and MH. Two source-deficient media were again included among them (GOT, MH), MH medium being specifically designed for the Pseudonocardiaceae. Only GPA medium was not selected for any of these strains.

Table 4. Fermentation media prioritized for a set of Saccharopolyspora strains. Not tested

Similar trends were observed by grouping the 250 strains in family subsets (Table 5). Only the Micromonosporaceae, Nocardiaceae, Pseudonocardiaceae and Streptomycetaceae families were represented by enough strains to make the results statistically relevant. None of the SF values observed for these four larger families reached 75%, except FR23 for the Nocardiaceae (77.1%). The differences observed in frequency were statistically significant, according to the Chi-squared test. For each family, five media were selected for more than 30% of the strains. MPG, GOT and FR23 were above that level for the statistically representative families. CLA was above 30% for the Micromonosporaceae, Streptomycetaceae and Nocardiaceae, whereas DNPM was above 30% for the Pseudonocardiaceae, Micromonosporaceae and Streptomycetaceae. KR was also preferred for the Streptomycetaceae and MH for the Pseudonocardiaceae.

Table 5. SF values of fermentation media, using all the strains screened, grouped by families. The number of strains tested for all media is indicated at the left, except for GPA and MH (where the number of strains is indicated in parentheses). The three or four best media selected for each family set are given in italics

These results support the common practice of diversifying the media in screening programs based on microbial natural products. The fact that five media presented SF values greater than 30% confirmed the necessity of individual prioritization for each microorganism when an extensive exploitation of the metabolic capabilities of the organisms is desired.

In conclusion, the approach described in this work could improve the quality of the extracts included in drug discovery screening by decreasing the number of redundant fermentation conditions selected for each isolate while maximizing the chemical diversity contributed by each microorganism. However, the value of this new strategy would need to be confirmed by long-term follow-up, measuring whether extracts selected by this method produced higher hit-rates or an increased chemical novelty relative to randomly prepared extracts [14].

It is relevant to note that the method does not discriminate between primary and secondary metabolites. However, it is commonly accepted that most natural secondary metabolites absorb at 210 nm and that many primary metabolites would be subtracted as components of the uninoculated complex media. Therefore, most of the diversity determined by the method would come from the secondary metabolites.

The method chemically fingerprints each extract and can therefore also serve as a quality control procedure. For example, the extracts obtained from the large-scale fermentation were analyzed by HPLC and compared with the small-volume culture. For a small series of strains, at least 95% of the peaks had identical retention times when large-scale fermentations were compared with their corresponding small-scale fermentations (data not shown). Likewise, regrowths (if needed) can be compared with the original.

Several improvements are currently under evaluation for implementation in the process, such as decreasing the fermentation volumes, automating the sample preparation with liquid-handlers and increasing the number of chemical parameters by introducing other chemical detectors.

The innovation of this analytical treatment of chemical data opens additional possibilities for using HPLC chemometrics in other areas, such as strain comparison, improvements in media and fermentation conditions and others.