1 Introduction

Gramene (http://www.gramene.org; Fig. 1) is a curated online resource for comparative functional genomics in socioeconomically important crops and research model plant species, currently hosting over 30 completely sequenced plant reference genomes (Table 1; [1–31]). Each plant genome features community-based gene annotations provided by primary sources and enriched with supplemental annotations from cross-referenced sources, functional classification, and comparative phylogenomics analysis performed in-house. Increasing amounts of genetic and structural variation data, derived both from data repositories and from collaborations with large-scale resequencing and genotyping initiatives, are also available for visualization and analysis (Table 2; [32–40]). Furthermore, plant pathway databases generated by both manual curation and automated methods complement the available sequence-based gene annotations. Most advantageous to plant researchers and bioinformaticians, Gramene applies a core set of consistent protocols across species, making it a reference resource for basic and translational research in plants.

Fig. 1
Gramene’s homepage

Table 1 Plant reference genome sequences in Gramene build 41 (May 2014)
Table 2 Genetic and structural variation data in Gramene build 41 (May 2014)

Gramene is driven by several platform infrastructures or modules that are linked to provide a unified user experience. Its Genome Browser (http://ensembl.gramene.org; Fig. 2) takes advantage of the Ensembl project’s infrastructure [41] to provide an interface for exploring genome features, functional ontologies, variation data, and comparative phylogenomics. Since 2009 Gramene has partnered with the Plants division of Ensembl Genomes [42] to jointly produce this resource, each benefiting from the other’s proximity to research communities in the USA and Europe, respectively. This collaboration has also facilitated timely adoption of innovative tools and software updates that accompany frequent version releases by the Ensembl project [41].

Fig. 2
Gramene Ensembl Genome Browser homepage (Ensembl software v75)

Since the last edition of this volume in 2007 [43], Gramene has also become a portal for pathway databases developed and curated internally or mirrored from external sources. Two pathway platforms are currently supported: (1) Gramene’s Pathway Tools (http://pathway.gramene.org; Fig. 3) to emphasize the annotation of metabolic and transport pathways [44–46], and (2) Plant Reactome (http://plantreactome.gramene.org; Fig. 4) to facilitate the annotation of metabolic and regulatory pathways. The Pathway Tools platform [47] supports the implementation of pathway databases in the BioCyc collection [48] to which Gramene has contributed MaizeCyc [45], RiceCyc [44], SorghumCyc [46], and BrachyCyc [46]. In addition, this resource mirrors six databases for Arabidopsis (AraCyc [49]), medicago (MedicCyc [50]), poplar (PoplarCyc [51]), potato (PotatoCyc [52]), coffee (CoffeaCyc [52]), and tomato (LycoCyc [52]), as well as the MetaCyc [48] and PlantCyc [51] reference databases (Fig. 3). The Plant Reactome is based on the Reactome data model and visualization platform [53]. It currently hosts manually curated rice and Arabidopsis pathways, and gene homology-driven inferred pathway projections for the maize and Arabidopsis thaliana reference genomes. It will continue to grow with the addition of data for new species and broader coverage of molecular interactions.

Fig. 3
Gramene’s BioCyc pathways homepage. http://pathway.gramene.org

Fig. 4
Gramene’s Plant Reactome homepage. http://plantreactome.gramene.org

The Genomes and Pathways modules enable species-specific and cross-species data downloads for discrete regions, genes, or gene features via the Genome Browser, and pathway-centered downloads via the Pathways portal and Plant Reactome. In addition, project data is available for customizable download from the GrameneMart utility (http://ensembl.gramene.org/biomart/martview/ [54]), for nucleotide and protein sequence alignment via BLAST (http://ensembl.gramene.org/Tools/Blast?db=core), for bulk download via File Transfer Protocol (FTP) at Gramene (ftp://ftp.gramene.org/pub/gramene) and Ensembl Genomes (http://ensembl.gramene.org/info/website/ftp/index.html), and for programmatic access via Ensembl’s REST application programming interface (API) and a public MySQL server (http://www.gramene.org/web-services [55]). Since March 2013, the website, database, and their contents have been updated quarterly; updates can be followed on the Gramene news portal (http://www.gramene.org/blog) and in the site’s release notes (http://www.gramene.org/release-notes).
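As an illustration of the programmatic route, the Python sketch below builds and sends requests to an Ensembl-style REST server using only the standard library. The base URL is an assumption (the public Ensembl REST server); the endpoint that serves a given Gramene release should be confirmed on the web-services page above, and endpoint paths can change between Ensembl releases.

```python
import json
import urllib.parse
import urllib.request

# Assumption: public Ensembl REST server; confirm the server for your Gramene
# release at http://www.gramene.org/web-services before relying on it.
REST_BASE = "https://rest.ensembl.org"


def build_rest_url(endpoint: str, **params: str) -> str:
    """Join an endpoint path to the base URL and request JSON output."""
    query = {"content-type": "application/json", **params}
    return f"{REST_BASE}/{endpoint.lstrip('/')}?{urllib.parse.urlencode(query)}"


def fetch_json(url: str, timeout: float = 30.0) -> dict:
    """Perform the GET request and decode the JSON body (needs network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Look up the maize tb1 gene by symbol, as in Exercise 1 below.
    url = build_rest_url("lookup/symbol/zea_mays/tb1")
    print(url)
    # gene = fetch_json(url)  # uncomment when online
```

Keeping URL construction separate from the network call makes it easy to inspect or log the exact request before sending it.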

This chapter summarizes updates to the Gramene website and database since reported in the last edition of this volume [43].

2 Materials

2.1 Hardware and Software Requirements for Users

A computer with internet access and a standard web browser such as Mozilla Firefox, Internet Explorer, Chrome, or Safari.

2.2 Gramene System Components

Gramene is a web-based application that allows users to search and view biological data, using graphics viewers such as the Ensembl genome browser or the Pathway Tools Omics viewer where appropriate. Data is maintained in distinct relational databases (MySQL), and users connect to the site using a standard web browser. User queries for static (HTML) and dynamic content are handled by the Apache web server via the Solr search platform and a middleware layer written in Perl. Bulk downloads of data are provided through FTP sites at Gramene and Ensembl Genomes.

2.3 Local Installation of Gramene

The minimum hardware configuration required for a local installation of Gramene consists of a desktop or server with a multicore CPU, 4 GB of memory, and 500 GB of disk space. Installation inside a virtual machine is possible. A recent Linux distribution is required, such as Red Hat/CentOS 6 or Ubuntu 12.x. Required software packages include the Apache web server (see Note 1), Perl, PHP, MySQL, OpenJDK (Java), Drupal, and Apache Solr. Many of these can be installed via the distribution’s package management system (yum, apt-get). Solr can be downloaded from the Apache Solr website. For specific installation instructions, contact the Gramene developers at feedback@gramene.org.
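Before installing, it can help to confirm which command-line prerequisites are already available. The following minimal Python sketch is not part of Gramene; the executable names (e.g. httpd on Red Hat/CentOS versus apache2 on Ubuntu) are assumptions, and Drupal and Solr are not checked because they are not normally installed as commands on the PATH.

```python
import shutil

# Hypothetical mapping from each required package to candidate executable
# names; names differ between Red Hat/CentOS (httpd) and Ubuntu (apache2).
REQUIRED_TOOLS = {
    "Apache web server": ["httpd", "apache2"],
    "Perl": ["perl"],
    "PHP": ["php"],
    "MySQL client": ["mysql"],
    "Java runtime": ["java"],
}


def check_prerequisites(tools=REQUIRED_TOOLS):
    """Return {package: path of first executable found on PATH, or None}."""
    found = {}
    for package, candidates in tools.items():
        found[package] = next(
            (path for path in map(shutil.which, candidates) if path), None
        )
    return found


if __name__ == "__main__":
    for package, path in check_prerequisites().items():
        print(f"{package:20s} {path or 'MISSING'}")
```

Note that server binaries often live in /usr/sbin, which may not be on an unprivileged user’s PATH, so a "MISSING" result for Apache should be double-checked with the package manager.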

3 Methods

3.1 Basic Navigation of the Gramene Website

Gramene is powered by multiple modular platforms. The main entry point to the system is the homepage (http://www.gramene.org; Fig. 1). Every Gramene page contains the main navigation bar and, where applicable, module-specific navigation bars; a simple search form that can be refined to interrogate the different data modules; a link to the homepage; and a link to the Contact page. The main navigation bars are found at the top and left side of each Gramene content page and provide entry points to the search module (Search), genome browser (Genomes), pathway databases (Pathways), bulk data sets (Download), information about the project (About) and its collaborators (Collaborators), outreach events and educational materials (Outreach), and legacy data and resources (Archive). The Contact link opens a feedback page whose message automatically includes the URL of the page the user was viewing. The interfaces within Gramene are interactive, providing links to external reference databases as well as to internal modules within Gramene.

3.2 Example Uses of Gramene

Within the constraints of this chapter, it is not possible to cover all of the Gramene interfaces. Instead, the following examples provide sample queries and walk through using Gramene to obtain information and facilitate genomic research. These and additional examples focused on comparative analysis of plant metabolic and regulatory pathways, as well as plant gene expression analysis, are available from Gramene’s Outreach page (http://www.gramene.org/outreach). Each example demonstrates only one of many possible ways to explore the Gramene website for a given query, and we encourage users to discover other ways to solve them.

Exercise 1. View a phylogenetic tree for a family of transcription factors

In this exercise, we will navigate a phylogenetic tree for plant genes in the TCP family of transcription factors centered on the maize gene tb1 (teosinte branched 1 [56]). We will then generate a list of homologues (i.e., orthologues and paralogues) for this gene, highlight species-specific homologues with particular Gene Ontology (GO) annotations in the tree, and download images and tables with the results. On the Gramene homepage, type “tb1” in the search box at the top of the page. Once the results appear, narrow your target by specifying “Zea mays” under Species. The resulting link will take you to the maize tb1 gene page of the Ensembl genome browser. Note the four distinct tabs at the top of this page: Species (Fig. 5a), Location (Fig. 5b), Gene (Fig. 6a), and Transcript (Fig. 6b); an additional Variation page (Fig. 6c) may be accessed for species with variation data in Gramene (see Table 2). Each of these views is discussed in more detail below. Common to the Location, Gene, Transcript, and Variation pages, as well as the views available therein, are customizable tracks, links to internal pages, and contextual links to data sources outside of Gramene. Actions enabled for each of these pages and their embedded views include (1) configuring and resizing, (2) uploading and managing user-provided data for graphic display, (3) exporting or downloading data, and (4) sharing pages and images. For example, you may customize the tracks on display by selecting “Configure this page” in the left side navigation bar or by clicking the “Configure this image” icon in the top left corner of an image. A new browser window will pop up listing all available data tracks for the browser view that you wish to customize.
Data tracks are grouped by category; click on a category to see the complete list of available tracks for that category (e.g., “mRNA and protein alignments” may include tracks for EST clusters, cDNAs, and protein features from various species, sources, or methods). A track gets activated for display on the browser by clicking on the square preceding its name and selecting a desired “track style”. Favorite tracks may be set and the order of tracks may be changed. Save your selections and close the pop-up window by clicking on the check mark on the top right corner. The browser will automatically refresh itself and your selected tracks should now be visible.

Fig. 5
Gramene Ensembl genome browser pages: (a) Species and (b) Location (e.g., maize tb1 gene)

Fig. 6
Gene-centric Gramene Ensembl genome browser pages: (a) Gene, (b) Transcript, and (c) Variation pages (e.g., maize tb1 gene; SNP variant PZE01264848659)

Gramene Ensembl Genome Browser pages:

1. The Species page (Zea mays for this example; Fig. 5a) contains detailed information about the reference genome assembly and gene annotation; comparative genomics data including phylogenetic gene trees, whole-genome alignments, and synteny views; gene regulation (microarray) data; genetic and structural variation; and links to download data sets in bulk.

2. The Location page (Chr 1: 265,811,311–265,813,044 in the B73 maize AGPv3 assembly; Fig. 5b) offers several scalable views on the left side navigation bar, e.g., karyotype or whole-genome view, chromosome summary, region overview, region in detail (expanded red box from the region overview), as well as comparative genomics views, which include multi-species alignments, region comparisons, and synteny views. Semantic zooming is available for each “region” view.

3. The Gene page (TB1; Fig. 6a) provides a summary of data available for a given gene, as well as an extensive list of features including splice variants (see also Transcript page), exon/intron marked-up sequence, associated ontology terms and literature references, external references, comparative genomic alignments, expandable gene trees, orthologues and paralogues, and genetic/structural variation. The Plant Compara Gene Trees are derived from a pre-computed phylogenetic analysis of protein-coding genes from all Gramene species, plus several representative animal genomes used as outgroups. The Pan-taxonomic Compara Gene Trees sample species more broadly across taxa represented by the Ensembl Genomes project, including bacteria, fungi, protists, and metazoa, and include only a subset of representative plant species held within the Gramene database.

4. The Transcript page (TB1-201; Fig. 6b) includes sequence data, external cross-references (including oligo probe sources), supporting protein/EST evidence, GO associations, variation, and protein domains and features. If variation data is available for a given gene, each variant will have its own Variation page (see below). The Variations table under the Protein Information category provides a complete list of the transcript’s variants with alleles, functional consequence, relative position in the protein’s amino acid sequence, ambiguity code, and actual affected codons/amino acids, if any. Moreover, for species like Arabidopsis in which the same set of variants has been genotyped in different populations, tabular and graphic “population comparisons” are available from the Transcript page.

5. The Variation page (e.g., PZE01264848659 for tb1; Fig. 6c) includes the variant’s genomic context, functional consequences in all transcripts, individual genotype data, as well as allele/genotype frequency by population tables. Note that if several transcripts are available for a given gene, the same variant may have different functional consequences in each transcript as per its relative location in the corresponding protein product.
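The ambiguity code listed in the Variations table follows the standard IUPAC nucleotide convention, in which each combination of bases has its own letter. A small Python sketch of that mapping, limited to single-nucleotide variants (the function name is illustrative, not part of Gramene):

```python
# IUPAC nucleotide ambiguity codes keyed by the set of bases they represent.
IUPAC_CODES = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def ambiguity_code(alleles: str) -> str:
    """Map an allele string such as 'G/T' to its IUPAC code (here 'K')."""
    bases = frozenset(a.strip().upper() for a in alleles.split("/"))
    if not bases <= set("ACGT"):
        # Indels (e.g. 'A/-') and multi-base alleles have no IUPAC letter.
        raise ValueError(f"only single-nucleotide alleles are supported: {alleles!r}")
    return IUPAC_CODES[bases]
```

For example, a G/T SNP corresponds to the code K, and an A/G SNP to R.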

To view the phylogenetic tree for the TCP family of transcription factors, go to the tb1 Gene page and click on the “Gene tree (image)” view. In the “Highlight annotations” table, both InterPro and GO terms are enabled by default; uncheck the box for GO terms to show only InterPro domains. From the list, select IPR005333, which is the InterPro ID for the complete TCP domain (see Note 2 for visualizing the complete protein domain structure of the maize tb1 gene). Figure 7a displays the collapsed view of the tree with all the clades highlighted. Click on “View fully expanded tree” under “View options” at the bottom of the page. Except for a handful of genes, all the genes in the tree image will light up because of the prevalence of the TCP domain. We may also highlight orthologues and paralogues between two species. For example, let’s find the sorghum orthologue with the highest similarity to maize tb1. From the maize gene’s page, select “Plant Compara Orthologues” and enter “sorghum” in the “Filter” box at the top right corner of the orthologues table (Fig. 7b). In the “Compare” column, click on the “Gene tree (image)” link for the sorghum orthologous gene (Sb01g010690). Upon full expansion of the tree, you will see TB1 and SB01G010690 highlighted in different shades of red, maize within-species paralogues in different shades of blue, and sorghum paralogues highlighted in black (Fig. 7c shows the collapsed view of the highlighted tree). Note that by clicking on any speciation tree node, a pop-up inset will appear with various parameters describing the tree, as well as options to selectively collapse nodes and view a sub-tree in other formats like FASTA.
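The same orthologue lookup can be scripted. The sketch below assumes the JSON layout returned by the Ensembl REST homology endpoint: a "data" list whose entries carry "homologies", each with a "type" such as "ortholog_one2one" and a "target" holding "species", "id", and "perc_id". These field names should be verified against the REST documentation for the release you query, and ranking by the target’s percent identity is one reasonable reading of "highest similarity", not necessarily the criterion the browser uses.

```python
def best_orthologue(homology_json: dict, target_species: str):
    """Return (gene_id, perc_id) for the target-species orthologue with the
    highest percent identity, or None if there is no such orthologue."""
    candidates = []
    for entry in homology_json.get("data", []):
        for hom in entry.get("homologies", []):
            target = hom.get("target", {})
            # Keep orthologues only; paralogues have types like 'within_species_paralog'.
            if hom.get("type", "").startswith("ortholog") and target.get("species") == target_species:
                candidates.append((target["id"], float(target.get("perc_id", 0.0))))
    return max(candidates, key=lambda c: c[1], default=None)
```

Species names are assumed to follow the Ensembl production style (e.g. "sorghum_bicolor"), which is what the filter compares against.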

Fig. 7
Exercise 1: visualization of a phylogenetic tree for the TCP family of transcription factors centered on maize TB1. (a) Collapsed view of the tree highlighting all gene products that include the TCP InterPro domain (IPR005333). (b) Filtered view of sorghum orthologues of TB1. (c) Gene tree image for TB1 highlighting its sorghum orthologues (e.g., SB01G010690) and within-species paralogues (maize and sorghum, respectively); speciation nodes shown in black, duplication nodes shown in red. Also shown is inset that pops up upon clicking on any node

Exercise 2. Explore genetic variation in the rice orthologues of a maize gene with a known trait association

We will now explore genetic variation in the rice orthologues of the maize lycopene epsilon gene (lcyE). Specifically, we will determine whether the non-synonymous substitution mapping to nucleotide 210 relative to the start codon of the transcript with the longest genomic span (LCYE-201 or GRMZM2G012966_T03), which was found to be associated with provitamin A accumulation in the maize kernel [57], is also present in its rice orthologues.
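Because the Variation table reports the affected amino acid position, it helps to convert a coding-sequence coordinate into a codon number. A minimal sketch, assuming CDS positions are 1-based and counted from the A of the ATG start codon; under that convention, nucleotide 210 falls at the third position of codon 70. The function names are illustrative.

```python
def cds_to_codon(cds_pos: int) -> tuple:
    """Convert a 1-based CDS position (1 = A of ATG) to
    (codon number, position within codon), both 1-based."""
    if cds_pos < 1:
        raise ValueError("CDS positions are 1-based")
    return (cds_pos - 1) // 3 + 1, (cds_pos - 1) % 3 + 1


def codon_at(cds: str, cds_pos: int) -> str:
    """Return the codon of a CDS string that contains the given position."""
    codon_no, _ = cds_to_codon(cds_pos)
    start = (codon_no - 1) * 3
    return cds[start:start + 3]
```

For example, cds_to_codon(210) returns (70, 3).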

Go to the maize lcyE gene page as done for the tb1 gene. From the Gene and Transcript pages, you may visualize all its genetic variants in tabular form (“Variation table” option in the left side navigation bar; Fig. 8a) or in graphic form in their genomic context (“Variation image” option; Fig. 8b). The Variation table groups variants by functional consequence; by clicking on the “Show” option for a given category (e.g., missense variant), you will get a list of the variants with other data such as genomic position, alleles, and relative amino acid position (if affected). The Variation image displays the same information as the table in graphic form, plus the relative location of known protein domains. However, if you know the SNP identifier, the simplest way to find all the available information for a given variant is to select the “Variations” option under “Protein Information”, as it lists all variants by identifier. The SNP variant associated with provitamin A accumulation identified by Harjes et al. [57] is PZE08137569063. As shown in the “Genotype frequency” view (Fig. 9a) available from the Variation page, this variant has alleles G and T with variable genotype frequencies in 13 maize or teosinte populations, including HapMap2 (“Zmays”).
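The allele and genotype frequency tables on the Variation page summarize individual genotype calls per population. The sketch below shows that calculation for diploid calls written as "G/T" or "G|T", skipping missing calls ("./."); the (population, genotype) input format is a hypothetical simplification, not a Gramene export format.

```python
from collections import Counter, defaultdict


def frequencies_by_population(calls):
    """calls: iterable of (population, genotype) pairs, e.g. ('pop1', 'G/T').
    Returns {population: (genotype_frequencies, allele_frequencies)}."""
    genotypes = defaultdict(Counter)
    alleles = defaultdict(Counter)
    for population, genotype in calls:
        called = genotype.replace("|", "/").split("/")
        if "." in called:
            continue  # skip missing calls
        pair = tuple(sorted(called))  # 'T|G' and 'G/T' count as the same genotype
        genotypes[population]["/".join(pair)] += 1
        alleles[population].update(pair)  # homozygotes contribute two copies
    result = {}
    for population, counts in genotypes.items():
        n_individuals = sum(counts.values())
        n_alleles = sum(alleles[population].values())
        result[population] = (
            {g: c / n_individuals for g, c in counts.items()},
            {a: c / n_alleles for a, c in alleles[population].items()},
        )
    return result
```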

Fig. 8
Exercise 2: exploring genetic variation in a rice gene while looking for conservation of a maize SNP variant (PZE08137569063) associated with provitamin A accumulation in the kernel [57]. (a) Genetic variation in the maize lcyE gene in tabular form, and (b) graphic form

Fig. 9
(a) Genotype frequency data for PZE08137569063 in 13 maize or teosinte populations from HapMap2 and the Panzea 2.7 GBS data release. (b) Gene sequences, orthologous/paralogous gene lists, and gene variants available for customized download via GrameneMart

Now, let’s identify the closest rice orthologues of the lcyE gene by proceeding as described for tb1 in Exercise 1 (orthologues table and “Gene tree (image)” view). The “Orthologues” view allows users to download all or a selected set of orthologous genes (by using the filter box at the top right corner of the table), as well as to view and download the corresponding protein sequences and/or pairwise alignments. To download nucleotide sequences or all the genetic variants for the orthologues, users can go to each individual gene’s page and proceed as described above (i.e., go to the Variation table/image and download the data directly from the table/image, or click on the “Export data” option in the left sidebar menu). Alternatively, users may download the same DNA/protein sequence and variation data for each species using, respectively, the “Plant Genes” and “Plant Variation” databases in the GrameneMart utility (http://ensembl.gramene.org/biomart/martview/; Fig. 9b). Users may visually compare all the species in the gene tree by selecting “Gene tree (alignment)” in the left sidebar menu, or view pairwise genomic alignments with the “Genomic alignments (text)” option. Alternatively, users may use a multiple alignment program such as ClustalW to visually compare the rice orthologous gene sequences, which shows that (1) this site has not been found to be polymorphic in O. sativa Japonica and Indica, (2) there is a sequence gap around this position in O. glaberrima, and (3) the ancestral G allele is the one present at this position in the O. nivara, O. glumaepatula, and O. punctata orthologous genes.
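When reading such a multiple alignment, the key step is locating the alignment column that corresponds to a given position in the maize reference, since gaps shift columns relative to sequence coordinates. A minimal sketch, assuming the alignment has already been read into a dictionary of gapped sequences (sequence names are hypothetical) and that the maize sequence is the CDS, so that its ungapped positions match CDS coordinates:

```python
def column_for_position(aligned_ref: str, ref_pos: int) -> int:
    """Return the 0-based alignment column of the ref_pos-th (1-based)
    ungapped base of the reference sequence."""
    seen = 0
    for column, base in enumerate(aligned_ref):
        if base != "-":
            seen += 1
            if seen == ref_pos:
                return column
    raise ValueError(f"position {ref_pos} lies beyond the reference sequence")


def bases_at(alignment: dict, ref_name: str, ref_pos: int) -> dict:
    """Report the base (or '-' for a gap) each aligned sequence carries
    at position ref_pos of the reference sequence ref_name."""
    column = column_for_position(alignment[ref_name], ref_pos)
    return {name: seq[column] for name, seq in alignment.items()}
```

A '-' in the result marks an orthologue with a gap at that site, as described above for O. glaberrima.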

Further genomic analysis may be performed with the Ensembl “Tools” available at http://ensembl.gramene.org/tools.html and other links from Gramene’s archival Diversity pages at http://archive.gramene.org/diversity/tools.html.

Exercise 3. Upload, visualize, and share your own data into a new genome browser track

The Ensembl genome browser allows users to upload their own data and visualize it on a custom track. Data may be formatted in various file formats including GFF, GTF, BED, BAM, VCF, bedGraph, gbrowse, PSL, WIG, BigBed, BigWig, and TrackHub. Some data, like GFF annotations, may be uploaded directly from a local machine. Large data files, such as BED/BAM alignments or BigWig signal files, must instead be placed on a server that the browser can access via a URL. Another way to share third-party data is via a DAS (Distributed Annotation System) registry, which would need to be set up by a software engineer.

Test data sets consisting of BAM alignments and CpG methylation for B73 and Mo17 maize lines used in the study by Regulski et al. [58] are available from the Gramene outreach pages to upload and visualize for this exercise. To upload the data simply click on the “Add/Manage your data” option on the left bar menu of any genome browser page (Fig. 10a). This action will take you to the upload page (Fig. 10b) where you need to specify the format of the file you intend to upload (formats and test sets are also described in the “Help on supported formats, display types, etc.” link therein). To visualize custom data in a new browser track, make sure that your track is turned on in the configuration menu and you are looking at a region that includes the new data you have just uploaded. The BAM alignments and CpG methylation ratios are shown in Fig. 10c.
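If your own methylation calls are not yet in a browser-ready format, bedGraph (one of the formats listed above) is a simple target: a track line followed by tab-separated chromosome, 0-based start, end, and value. The sketch below converts per-site read counts into methylation ratios; the input tuple layout and function name are hypothetical, and chromosome names must match those used by the genome browser.

```python
def cpg_to_bedgraph(sites, track_name="CpG methylation"):
    """sites: iterable of (chrom, 1-based position, methylated reads, total reads).
    Returns bedGraph text with one 1-bp interval per covered site."""
    lines = [f'track type=bedGraph name="{track_name}"']
    for chrom, pos, methylated, total in sites:
        if total == 0:
            continue  # no coverage, so the ratio is undefined
        ratio = methylated / total
        # bedGraph intervals are 0-based and half-open: [pos - 1, pos).
        lines.append(f"{chrom}\t{pos - 1}\t{pos}\t{ratio:.3f}")
    return "\n".join(lines) + "\n"
```

The resulting text can be saved to a file and uploaded through “Add/Manage your data” with bedGraph selected as the format.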

Fig. 10
Uploading and managing user-provided data to display as a new genome browser track. (a) “Add/Upload Data” link on the Gene page. (b) Pop-up window in which the user selects species and file format for the data to be uploaded. (c) Preloaded BAM alignments and CpG methylation ratios; the new track shows CpG methylation data in the selected gene region

4 Notes

1. Apache 2.x is not supported yet due to significant differences in the persistent Perl interpreter module (mod_perl).

2. For the complete protein domain structure of a gene, go to the Transcript page and select “Domains & features”. By clicking on a particular “Display all genes with this domain” link, you will get a list of all genes in the same species that contain any given InterPro domain. To download the list, click on the “Download” icon to the right of the Filter box.