Introduction

The characterization of whole microbial communities via molecular methods has profoundly changed the way we conduct microbial ecology studies [1, 2]. The development of high-throughput sequencing technologies is allowing comparative analyses of the diversity, abundance, and important ecosystem functional genes of whole microbial communities at far greater depths than ever before [3, 4]. However, the increasing amount of genomic information being produced currently exceeds the analytical capacity of many laboratories. Creative solutions are required to empower researchers who have no access to a team of bioinformaticians or high-end computational resources [5].

Useful solutions to these problems can be applied in two ways: first, computational tasks can be specified in terms of human-readable equations that are independent of the programming platform, and second, improved performance in the execution of clearly defined tasks can be achieved [6]. However, understanding the code can impose a steep learning curve, and biologists and programmers spend precious time installing, configuring, and maintaining the software and dependencies needed to create and execute pipelines. Moreover, the development and operation of tailored tools require dedicated staff, high expertise, and even the acquisition of powerful computational infrastructure. These requirements become a barrier for several research groups, hindering scientific development, especially in resource-limited regions.

To facilitate the dissemination and use of bioinformatics tools, several Linux distributions, such as BioLinux [7], Scibuntu [8], PhyLIS [9], LXtoo [10], and Mypro [11], have been created and represent an interesting and time-saving option. All of the cited operating systems (OS) built for bioinformatics fall into two categories: broad, such as BioLinux, Scibuntu, and LXtoo, or specific, like PhyLIS and Mypro (aimed at phylogenetic analysis and prokaryotic genome assembly/annotation, respectively). None of the available OS addresses the need for bioinformatics tools that support automated and user-friendly microbiome data analyses.

The Brazilian Microbiome Project (BMP) [12] addresses the importance of studying the huge biological diversity of Brazil in a resource-limited setting for the analysis of microbiome data. One of the main challenges of the BMP lies in testing and/or creating bioinformatics pipelines to help researchers handle next generation sequencing (NGS) data [13]. Metagenomic analyses have been described as one of the least reproducible NGS applications, mainly due to the lack of integrated and standardized solutions for performing these kinds of studies [14]. Here, we describe the Brazilian Microbiome Project Operating System (BMPOS), a flexible and user-friendly Linux distribution that helps researchers handle the most frequently used bioinformatics packages dedicated to the study of microbial ecology. The BMPOS is valuable as a tool for end-to-end analysis and training, and it is an excellent starting point for anyone interested in performing microbiome studies based on NGS data.

Implementation

The BMPOS (Fig. 1) is based on the free GNU/Linux distribution Ubuntu 14.04 LTS. It allows users to reproduce all bioinformatics pipelines recommended by the BMP advisory board, which are available at [15]. The BMPOS contains the packages most widely used among microbial ecologists to analyze NGS data. The packages are pre-compiled, installed, configured, and already available on the system’s path. An updated list of packages is maintained at [16] and currently comprises software for sequence filtering and trimming, sequence clustering, sequence alignment, phylogenetic tree reconstruction, statistical analysis, data visualization, and database searching, as well as all BMP scripts created to make data compatible among different packages (Table 1).

Fig. 1 The screenshot of the BMPOS, highlighting the BMP desktop application

Table 1 Packages, scripts, and databases available in the BMPOS

Among all packages applied in the BMP pipelines, only USEARCH [18] has a restriction of use: it is freely available only in its 32-bit version, which is not capable of handling files bigger than 4 GB. Beyond that limit, it is necessary to acquire a license for the 64-bit version. The USEARCH package provides database search tools that are nominally hundreds of times faster than BLAST [36], as well as algorithms for processing NGS reads, such as quality filtering, chimera detection, and dereplication; it is implemented in the BMP pipelines because of its accuracy and speed. As an alternative to USEARCH, the BMPOS also provides VSEARCH [17], which supports most of the USEARCH functions as an open and free 64-bit multithreaded tool. Another open source bioinformatics package used in the initial steps of the BMP pipeline is QIIME (Quantitative Insights Into Microbial Ecology) [37]. QIIME has the advantage of containing a comprehensive suite of functions and procedures. It is easy to use, implement, and combine with other packages, and it has extensive documentation [38]. Packages like ITSx [20] and HMMER [21] are used in combination to extract non-internal transcribed spacer (ITS) sequences and deliver a better taxonomic assignment for Fungi. For better handling and storage of contingency tables containing metadata, the BMPOS uses the BIOM package [23]. Moreover, our operating system also contains all databases needed for chimera detection and taxonomic assignment of Fungi, Bacteria, and Archaea (Table 1).
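Of the read-processing steps named above, dereplication is the simplest to illustrate. The following is a minimal Python sketch of the general idea only: identical reads are collapsed into unique sequences with abundance counts. It is not the USEARCH or VSEARCH implementation, and the function name `dereplicate` and the toy reads are hypothetical, introduced purely for illustration.

```python
from collections import Counter


def dereplicate(reads):
    """Collapse identical sequences into (sequence, count) pairs.

    Illustrative only: real tools also handle FASTA/FASTQ parsing,
    quality scores, and much larger inputs.
    """
    # Compare sequences case-insensitively by normalizing to upper case.
    counts = Counter(seq.upper() for seq in reads)
    # Sort by decreasing abundance, then alphabetically for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy reads, not taken from the BMP example datasets.
reads = ["ACGT", "acgt", "TTGA", "ACGT", "TTGA", "GGCC"]

for sequence, abundance in dereplicate(reads):
    print(f"{sequence}\t{abundance}")
```

Sorting by abundance reflects the common practice of processing the most abundant unique sequences first in downstream clustering, although the exact ordering rules depend on the tool used.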

The BMPOS may be used directly as a bootable live USB stick plugged into any computer with at least a 1 GHz CPU and 512 MB of RAM, without the need for package installation or any previous configuration. It can also be installed on the user’s machine, independently of the operating system. This is the fastest way to spread bioinformatics packages among groups of collaborators.

Results and Discussion

The BMPOS is very effective for classes or courses and for research groups with limited human and computational resources. The Brazilian Microbiome Project has conducted successful courses, teaching nearly 100 students using the USB stick strategy. The courses included students who had never had any contact with the command line and who successfully concluded all analyses, resulting in increased confidence and motivation. The live USB strategy allowed the students to save the results obtained from the pipeline on the USB stick or on their own computer, so that they could analyze the final data matrix in the software of their preference. Moreover, because they were given the USB stick, the students are able to further improve their own skills.

The BMP pipeline developed for metagenomic analyses, which uses the 16S ribosomal RNA (rRNA) gene for Bacteria and Archaea and the ITS region for Fungi, offers a very simple approach for teaching how to conduct and understand each step of a pipeline that can itself be fully automated. The strategy adopted to automate these pipelines in the BMPOS relies on the so-called BMP desktop application, a bash script coupled to a Java™ (SE Runtime Environment, build 1.8.0) interface that provides a more user-friendly experience and saves the effort of retyping that particular sequence of commands. For now, the Java application only runs a default workflow, but in future versions, improvements will allow users to adjust parameters, making it more flexible. On a personal computer with an Intel® Core™2 Duo CPU P8600 2.40 GHz (x2) and 4 GB of RAM, the BMPOS takes ∼4 min to run the BMP recommended pipeline on a dataset of 166,931 16S rDNA paired-end Illumina reads (151 bp; 120 MB). The same computer takes ∼5 min to run the BMP recommended pipeline on a dataset of 120,171 16S rDNA single-end Ion Torrent reads (300 bp; 60 MB), and ∼9 min to run the BMP recommended pipeline on a dataset of 166,931 ITS single-end Illumina reads (251 bp; 93.4 MB). These datasets are distributed alongside the BMPOS, in a folder located at “usr/bmp/data_example”, which allows users to run a preliminary test and evaluate their own machine. All the BMP recommended pipelines are available in the section “Standards and Protocols” of the BMP website (http://brmicrobiome.org).
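The run times reported above can be converted into rough throughput figures, which may help users compare a preliminary test on their own machine against the reference computer. The short Python sketch below only performs this arithmetic on the numbers given in the text; because the run times are approximate, the results should be read as order-of-magnitude estimates. The names `benchmarks` and `reads_per_minute` are hypothetical and chosen for illustration.

```python
# Read counts and approximate run times (minutes) as reported in the text.
benchmarks = {
    "16S paired-end Illumina": (166_931, 4),
    "16S single-end Ion Torrent": (120_171, 5),
    "ITS single-end Illumina": (166_931, 9),
}


def reads_per_minute(n_reads, minutes):
    """Return the average number of reads processed per minute."""
    return n_reads / minutes


for dataset, (n_reads, minutes) in benchmarks.items():
    rate = reads_per_minute(n_reads, minutes)
    print(f"{dataset}: ~{rate:,.0f} reads/min")
```

On the reference machine this gives roughly 41,700, 24,000, and 18,500 reads per minute for the three datasets, respectively.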

Conclusions

The BMPOS is a useful and user-friendly starting point for anyone interested in metagenomic analyses of microbial communities. This strategy has proved to be an effective way of setting up an environment for bioinformatics training and routine analyses. We are open to suggestions regarding bug fixes, the addition of new packages, or updates of the currently installed software and packages. Updates to the BMPOS will be made yearly or as soon as new analysis pipelines are developed. The BMPOS is available for download at http://brmicrobiome.org.