1 Introduction

Rice is a major staple food crop for almost half of the world's population. Among agricultural crops, rice is considered one of the most important for bioinformatics and computational biology research: it has become the model monocot plant, owing to a number of favorable biological characteristics and to recent research advances in genetics, breeding, genomics, germplasm collection and maintenance, systems biology, and functional genomics. During the last three decades, advances in biotechnology have accelerated many rice research programs, particularly in breeding, selection of superior genotypes, large-scale cDNA analysis, genetic mapping, and genome sequencing (Khush and Brar 1998; Sasaki and Burr 2000). At the same time, this progress paved the way for a new era in rice research, rice informatics, and opened new opportunities and directions for rice improvement that can help address global food security. The genome sequencing project for the japonica rice cultivar Nipponbare was completed in 2005 by a research consortium of 10 countries, and the Rice Annotation Project Database (RAP-DB) was developed to provide accurate annotation of the rice genome through HTTP access (IRGSP 2005; Itoh et al. 2007). In parallel with the rice genome sequence and its related genomics resources, advances in rice breeding research and the development of molecular marker resources have helped researchers accelerate the identification, isolation, and incorporation of agronomically important genes and QTLs (Ashikari et al. 2005; Konishi et al. 2006; Ma et al. 2006, 2007; Kurakawa et al. 2007).

Recent advances in rice research are associated with the emergence of high-throughput data from large-scale sequencing, expression profiling of thousands of genes, phenotyping, and strategies in transcriptomics, proteomics, and metabolomics (Nagamura and Antonio 2010). In addition, large-scale collections of bioresources, such as mass-produced mutant lines and clones of full-length cDNAs, together with their integrative databases, are now available (Brady and Provart 2009; Kuromori et al. 2009; Seki and Shinozaki 2009). The vast accumulation of genomics data from these strategies has heightened the need to transform the data into a form that researchers can easily access and understand, so that it can ultimately be studied and interpreted as useful biological information (Lewis et al. 2000). To this end, a robust infrastructure for organizing data, computational methods for analysis, and interfaces for integrating and retrieving various types of data through user-friendly databases have been developed (Nagamura and Antonio 2010).

The application of bioinformatics has greatly accelerated research in the rice sciences. It has made data handling and data analysis easier, more convenient, and much faster than traditional approaches. The potential of the Internet for accessing up-to-date scholarly content, communicating with colleagues, supporting two-way communication between researchers, and publishing materials more easily has been realized in the advancement of rice-related information and technologies (Ram and Rao 2012). Many bioinformatics resources are now available to researchers around the world through the World Wide Web. Researchers can easily post their findings on the Web or compare their discoveries with previous results. Easy access to data and data sharing between institutions have increased the opportunity for collaboration and thus dramatically accelerated research and development in rice science. These developments are reflected in the availability of databases, web servers, articles, and research organizations working in this area.

In the field of agriculture, rice research is a main focus, and the last 20 years have been marked by advances in sequencing technology. At every stage of rice research, the global problem of food security has been the burning issue, and every research activity, from morphological and physiological studies to the application of biotechnology, marker-assisted rice improvement, and the development of bioinformatics technologies, has been directed at addressing it. Advances in rice bioinformatics have played a pivotal role in this. Large-scale DNA analysis, genetic mapping, and genome sequencing have resulted in a tremendous increase in computer-generated information on the rice genome (Sasaki and Burr 2000). During the same period, sequencing approaches such as Expressed Sequence Tag (EST) research changed the course of gene expression studies in rice. Rice EST projects collectively represent about 1,251,304 GenBank entries available in NCBI dbEST (http://ncbi.nlm.nih.gov/dbEST/dbEST_summary.html) [GenBank Release 1.3.2011] (Ram and Rao 2012).

In recent years, with the development of new software and statistical analyses, physiology experiments on rice (Oryza sativa L.) have been analyzed by ANOVA and Tukey's HSD mean comparison using R v. 2.8.0 (Swamy et al. 2013). R is free software for statistical computing and graphics. It compiles and runs on a wide variety of UNIX platforms, Windows, and MacOS. MATLAB is a data analysis and visualization software package. Agricultural scientists use it for climate change analysis and for designing probabilistic models, as it can cope with large gridded datasets quite easily. Statistical Analysis Software (SAS) is also one of the most widely used statistical packages for analyzing data in crop science.

Rice plants respond to different stresses via a number of mechanisms. The availability of rice genome sequences, the large amount of information generated from genomics and proteomics studies, and in silico bioinformatics tools have set a new platform for the management of environmental stresses in rice. For example, in silico docking showed a significant protein–protein interaction between rice EDS1 and PAD4, suggesting that they form a dimeric protein complex which, as in Arabidopsis, is perhaps also important for triggering the salicylic acid signaling pathway in plants (Singh and Shah 2012).

2 Rice (Oryza sativa): Model Species for Monocot Plants

The Poaceae family is one of the most important among the monocots and includes agricultural crop species such as maize, wheat, barley, sugarcane, sorghum, and rice. These species share extensive synteny across their genomes, allowing one of the species to serve as the base for comparative genomics and bioinformatics studies within the family (Moore et al. 1995). Rice is one of the major monocot plants among them and is used as a model species for various biotechnological and bioinformatics research. The main reason for selecting rice as a model species for genomics and bioinformatics research is its small genome size (~431 Mb). The second reason is the availability of genetic and molecular resources. Since the complete genome sequencing of the japonica rice variety Nipponbare, research in rice has been revolutionized.

3 Rice Information System: The Rice Informatics

Computer and information technology has revolutionized research in biotechnology and bioinformatics. Recent technologies in molecular biology and germplasm conservation have sped up the analysis of sequence, genetic, and phenotypic information (Ram and Rao 2012). The IRRI (International Rice Research Institute), in collaboration with CIMMYT (International Maize and Wheat Improvement Center), established the International Crop Information System (ICIS) project. In this project, scientists work on varietal improvement using new and advanced bioinformatics software, and also develop software to facilitate and speed up research and to establish links between information from different crops such as rice and maize. In this way, ICIS maintains huge rice datasets that are publicly available and easily accessible to scientists.

4 Web Tools and Resources on Rice

The huge volume of data and information generated by genomics research for rice improvement poses a problem for agricultural scientists: how to maintain the data for future use and how to manipulate it. This need has led to data being maintained in the form of databases. Rice researchers and organizations around the world have developed and maintain such databases. Databases, software tools, web servers, etc. are available for data management and are used for solving problems related to rice, whether the handling of molecular-level activities or the production of a disease-resistant variety (Ram and Rao 2012). The World Wide Web (WWW) provides a mechanism for extraordinary information sharing among researchers, as many bioinformatics resources are now available all over the world through it.

4.1 The International Rice Information System (IRIS)

The International Rice Information System (IRIS) is the rice implementation of the International Crop Information System (ICIS), a database system for the management and integration of global information on genetic resources and germplasm improvement (Bruskiewich et al. 2003). In 1995, the international agricultural research centers CIMMYT and IRRI partnered with other CGIAR centers to establish a project to develop an International Crop Information System (ICIS; Fox and Skovmand 1996) to overcome deficiencies in crop data management for a wide range of crops. Several CGIAR centers, national agricultural research systems, and advanced research institutes are collaborating to develop ICIS as a generic system that will accommodate all data sources for any crop and breeding system. ICIS has two basic objectives: first, to integrate different data types in both private and public datasets into a single information system and, second, to provide specialist views and applications that operate on this integrated platform. Once complete, ICIS will support a range of activities including germplasm conservation, evaluation, functional genomics, allele mining, breeding, testing, and release. The Genealogy Management System (GMS) of ICIS is the core database; it ensures unique identification of germplasm, management of nomenclature (including homonyms and synonyms), and retention of all germplasm development information. The ICIS system is fast, user-friendly, and PC based, and is available both on CD-ROM and online (http://www.iris.irri.org/). ICIS (IRIS) is designed mainly to allow biologists to manage local data and to query and view their own data fully integrated with global public information. One of its innovative features is that it permits independent users to integrate their own local data with public central data. IRIS is being developed under the open-source ICIS project. Code is freely available to anyone, and the latest information about the ICIS project can be accessed at http://www.icis.cgiar.org.

4.2 Database Development

Computer-based databases are a recent innovation in molecular biology, biotechnology, and bioinformatics, valued for their usability and for online access to information. The three major database sources, GenBank at the National Center for Biotechnology Information (NCBI) (http://ncbi.nlm.nih.gov), the European Molecular Biology Laboratory (EMBL) database at the European Bioinformatics Institute (EBI) (http://www.ebi.ac.uk/), and the DNA Data Bank of Japan (DDBJ) (http://ddbj.nig.ac.jp), play major roles in the management of biological information. Basic sequence data submitted to these institutions can be mirrored to the other institutions automatically on a routine basis. Beyond sequence data, pertinent functional genomics, proteomics, structured experimental data, and associated data are being used by various organizations to develop different kinds of databases in various categories. A list of such databases in rice is shown in Table 1.

Table 1 Major rice structural and functional genomics databases available

4.3 Structural and Functional Genomics Databases

Rice genomics databases are major sources of information for understanding the genetic and molecular basis of biological processes, including many economically important traits that are the main concern of breeders. The basic information that can be deciphered from the genome sequence is indispensable in the development of new cultivars with target traits such as high yield, biotic/abiotic stress resistance, and good eating quality. The availability of a wide range of genetic and molecular markers has made rice an important species for genomics analysis. With the completion of the rice genome sequence in 2004, a standard annotation became necessary so that the information from the genome sequence could be fully utilized and understood. This led to the establishment of a platform for structural and functional characterization of the rice genome. The Rice Annotation Project Database (RAP-DB) provides sequence and annotation data for the rice genome and is a hub for Oryza sativa ssp. japonica genome information. This web-based tool provides the genome sequence of the japonica cultivar Nipponbare and the annotation of the 12 rice chromosomes. RAP-DB contains the IRGSP genome sequence (build 3 assembly) (IRGSP 2005) and the RAP loci, with corresponding locus IDs representing the annotated genes. The primary concept of RAP-DB is to provide simple access to the IRGSP genome sequence and the RAP annotation. RAP-DB has two different types of annotation viewers, BLAST and BLAT search, and other useful features. The Institute for Genomic Research (TIGR) also works on the rice genome sequence, incorporating new data to improve the quality of the annotation. The TIGR Rice Genome Project BLAST server has a collection of databases that can be searched with the BLAST programs blastn, blastx, tblastn, and tblastx. The TIGR Rice Pseudomolecules database allows users to search against the latest version of the 12 TIGR rice pseudomolecules. These databases have been conceived with the aim of providing a comprehensive analysis of the rice genome and include both structural annotations to identify genomic elements and functional annotations to attach biological meaning to the sequence data.

The International Rice Functional Genomics Consortium (IRFGC) has re-sequenced 100 Mb of gene-rich genomic sequence to determine SNP variation in 20 diverse rice varieties and landraces that are commonly used in breeding programs internationally and that capture the impressive genotypic and phenotypic diversity of domesticated rice (McNally et al. 2009). Zhao et al. (2011) reported the detailed results of a genome-wide association study (GWAS) based on 44,100 SNP variants across 413 diverse accessions of O. sativa collected from 82 countries, which have been systematically phenotyped for 34 traits. SNP variation has also been analyzed in 177 Japanese rice accessions, categorized into three groups: landraces, improved cultivars developed from 1931 to 1974 (the early breeding phase), and improved cultivars developed from 1975 to 2005 (the late breeding phase) (Yonemaru et al. 2012). The Oryza Map Alignment Project (OMAP, Wing et al. 2005) was initiated to characterize the genomes of wild rice species and has already released the genome sequence of Oryza glaberrima, an African species of domesticated rice. Owing to advances in genome sequencing in recent years, researchers can access a huge amount of information on the genetic variation present within the genus Oryza, within the two major subspecies, and among diverse rice cultivars and landraces that have great potential for the improvement of cultivated rice (Nagamura and Antonio 2010). Along with complete genome sequences, short ESTs (Expressed Sequence Tags) in dbEST and full-length cDNA sequences in KOME are also useful for evaluating gene expression and variation. Furthermore, short read sequences including microRNAs (miRNAs), small interfering RNAs (siRNAs), trans-acting siRNAs, and heterochromatic siRNAs can be accessed via the rice databases at the University of Delaware (Simon et al. 2008). In particular, the Rice MPSS database is a repository of small RNA sequences with detailed information on sense and antisense expression of annotated rice genes (Kan et al. 2007). The genome sequences and all short transcript resources immensely augment the information on rice and will help in understanding the genetic control of agronomically important traits.

The MIPS Rice (Oryza sativa) database (MOsDB) provides a comprehensive data collection dedicated to the genome information of rice. MOsDB integrates data from two publicly available rice genomic sequences: O. sativa L. ssp. indica and O. sativa L. ssp. japonica. It provides an integrated resource for associated data analysis, such as internal and external annotation information, as well as a detailed characterization of all annotated rice genes. It includes up-to-date access to publicly available rice genomic sequences and various search options. MOsDB is continuously expanding to include an increasing range of data types and the growing amount of information on the rice genome. RiceGAAS (Rice Genome Automated Annotation System) is also used extensively to identify various structural and functional components (Sakata et al. 2002). It was developed to execute a reliable and up-to-date analysis of the genome sequence as well as to store and retrieve the results of annotation. RiceGAAS performs its functional analysis by collecting rice genome sequences from GenBank and then executing gene prediction and the analysis of exons, splice sites, repeats, and transfer RNAs, based on an algorithm that combines multiple gene prediction programs with homology search results. The system consists of 14 analysis programs. These include BLAST for homology search against a protein database and a rice EST database, GENSCAN and RiceHMM for gene domain prediction, MZEF for exon prediction, SplicePredictor for splice site prediction, and more (Sakata et al. 2002). Thus, RiceGAAS provides a systematic and comprehensive annotation of accumulated sequence data.

OryGenDB is another database for rice functional genomics. The database is an interactive tool for rice reverse genetics. Insertion mutants of rice genes are cataloged by flanking sequence tag (FST) information that can be readily accessed by this database. Oryzabase is an integrated database of rice genome resources. Oryzabase has been created for a comprehensive view of rice (Oryza sativa) as a model monocot plant by integrating biological data with molecular genomics information (http://www.shigen.nig.ac.jp/rice/oryzabase/top/top.jsp). The database contains information about rice development and anatomy, rice mutants, and genetic resources, especially for wild species of rice. Several genetic, physical, and expression maps with full genome and cDNA sequences are also combined with biological data in Oryzabase. This provides a useful tool for gaining greater knowledge about the life cycle of rice, the relationship between phenotype and gene function, and rice genetic diversity (Kurata and Yamazaki 2006).

Databases have also been developed to screen and phenotype mutant lines in rice, which can be used for selecting lines with specific morphological and physiological features. These mutant databases include the Rice Tos17 Insertion Mutant Database with about 50,000 Tos17 insertion mutant lines from the japonica rice cultivar Nipponbare (Miyao et al. 2007), the Rice Mutant Database (RMD) containing 134,346 rice T-DNA insertion lines (Zhang et al. 2006), Taiwan Rice Insertional Mutants (TRIM) with 55,000 T-DNA insertion lines (Chern et al. 2007), the Shanghai T-DNA Insertion Population Database (SHIP) containing 65,000 T-DNA insertion lines, Oryza Tag Line with 46,000 T-DNA and Ds insertion lines (Larmande et al. 2008), and the IR64 Rice Mutant Database with phenotype information for irradiation and chemical mutants of the IR64 cultivar (Wu et al. 2005). These databases mostly provide information on the flanking sequences of the disrupted genes and thus allow users to screen the available mutants.

Another major part of bioinformatics is the analysis of the huge amount of genome expression data generated by microarrays and SAGE (Gerstein and Jansen 2000). To analyze such vast amounts of data, many analysis tools and databases have been developed, such as Rice Expression Databases (RED), Rice Microarray Opening Site (RMOS), Rice Array Database (RAD), Rice Atlas, OryzaExpress, Collection of Rice Expression Profiles (CREP), RiceArrayNet (RAN), RiceChip.Org, Rice MPSS, RicePLEX, the MGOS (Magnaporthe grisea, Oryza sativa) database, and the Rice Gene Expression Profile database (RiceXPro). These databases provide analysis tools that allow comparison of expression profiles from different samples based on specific criteria. Moreover, they provide gene expression networks and various kinds of omics information, including genome annotation, metabolic pathways, and gene expression.

For better analysis of genome assemblies and annotation, the Rice Genome Knowledgebase (RGKbase), an annotation database for rice comparative genomics and evolutionary biology, has been introduced (Wang et al. 2012). RGKbase has three major components: (1) integrated data curation for rice genomics and molecular biology, (2) user-friendly viewers, and (3) bioinformatics tools for compositional and synteny analyses. RGKbase currently includes data from five rice cultivars and species, and new datasets are continuously being added. Another very important genomics database is the Rice TOGO Browser, a component database of AgriTOGO and an integrated database on rice functional and applied genomics. The Rice TOGO Browser can be accessed through a user-friendly web interface that provides three search options, namely, keyword search, region search, and trait search, to retrieve information on specific genes, sequences, genetic markers, and phenotypes associated with a specific region of the genome (Nagamura et al. 2010).

4.4 Rice Proteomics Databases

As genome sequencing of rice has been completed, proteome analysis, the detailed investigation of the functions, functional networks, and 3-D structures of proteins, has gained increasing attention. Many rice proteomics databases are also available and are important resources for a better understanding of protein functions in the cellular system, protein–protein interactions, and downstream protein functions. Many proteome-related databases and tools are available today for rice proteomics analysis. These include the ExPASy Proteomics tools, the Compute pI/Mw tool in ExPASy, ProteinProspector (UCSF), the pI and MW calculation service of aBi, SWISS-PROT and TrEMBL, Rockefeller University PROWL (search engine), Mascot, EMBL, PeptideSearch, Swiss-Prot (ExPASy), YPD (Proteome Inc.), and Sherpa. OryzaPG-DB, a rice proteome database based on shotgun proteogenomics, incorporates the genomic features of experimental shotgun proteomics data. The described version of the database was created from the results of 27 nanoLC-MS/MS runs on a hybrid ion trap-orbitrap mass spectrometer, which offers high accuracy, analyzing tryptic digests from undifferentiated cultured rice cells. Approximately 3,200 genes were covered by the identified peptides, and 40 of them contained novel genomic features. OryzaPG-DB is the first proteogenomics-based database of the rice proteome, providing peptide-based expression profiles together with the corresponding genomic origin, including the annotation of novelty for each peptide (Helmy et al. 2011).

The Rice Proteome Database is the first detailed database to describe the proteome of rice. The Rice Proteome Database contains 23 reference maps based on 2D-PAGE of proteins from various rice tissues and subcellular compartments. These reference maps comprise 13,129 identified proteins, and the amino acid sequences of 5,092 proteins are entered in the database. Major proteins involved in growth or stress responses were identified using the proteome approach. Rice Proteome Database contains the calculated properties of each protein such as molecular weight, isoelectric point, and expression, experimentally determined properties such as amino acid sequences obtained using protein sequencers and mass spectrometry, and the results of database searches such as sequence homologies. The database is searchable by keyword, accession number, protein name, isoelectric point, molecular weight, and amino acid sequence or by selection of a spot on one of the 2D-PAGE reference maps (Komatsu et al. 2004). The information obtained from the Rice Proteome Database will aid in cloning the genes for and predicting the function of unknown proteins.
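As an illustration of how a calculated property such as molecular weight can be derived from an amino acid sequence, the following is a minimal R sketch. It is not the implementation used by the Rice Proteome Database or by the ExPASy Compute pI/Mw tool; the function name `peptide_mw` is hypothetical, and the calculation simply sums standard average residue masses and adds one water molecule for the free termini.

```r
# Standard average residue masses (Da) of the 20 common amino acids.
residue_mass <- c(
  G = 57.0519,  A = 71.0788,  S = 87.0782,  P = 97.1167,  V = 99.1326,
  T = 101.1051, C = 103.1388, L = 113.1594, I = 113.1594, N = 114.1038,
  D = 115.0886, Q = 128.1307, K = 128.1741, E = 129.1155, M = 131.1926,
  H = 137.1411, F = 147.1766, R = 156.1875, Y = 163.1760, W = 186.2132
)
water_mass <- 18.01528  # one H2O accounts for the free N- and C-termini

# Hypothetical helper: average molecular weight of an unmodified peptide.
peptide_mw <- function(sequence) {
  residues <- strsplit(toupper(sequence), "")[[1]]
  unknown <- setdiff(residues, names(residue_mass))
  if (length(unknown) > 0) {
    stop("Unrecognized residue(s): ", paste(unknown, collapse = ", "))
  }
  sum(residue_mass[residues]) + water_mass
}

# Example: the dipeptide Gly-Ala
print(peptide_mw("GA"))
```

Post-translational modifications and isotopic (monoisotopic) masses are ignored in this sketch; a real tool would need to handle both.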

The important proteome databases and web resources popular in rice research are listed in Table 2, where they are grouped into categories to allow comparison among rice proteomics databases.

Table 2 Major rice proteomics databases available worldwide

5 Statistical Rice Informatics Using R-Software

The new science of statistical rice bioinformatics aims to modify available classical and nonclassical statistical methods, develop new methodologies, and analyze databases to improve the understanding of complex biological phenomena. This interdisciplinary science requires an understanding of mathematics and statistics and of their application in the biological sciences (Mathur 2010). The statistical programming language R (R Development Core Team 2009) is well suited to these purposes; its startup information is shown in Fig. 1. Working through analyses in R helps resolve practical issues and makes the stream of reasoning easier to follow. R can be used as a simple calculator and also for complex statistical analysis.

Fig. 1 R version 2.15.1 (2012-06-22), “Roasted Marshmallows.” Copyright (C) 2012 The R Foundation for Statistical Computing, ISBN 3-900051-07-0. Platform: x86_64-unknown-linux-gnu (64-bit)

R is free software and comes with absolutely no warranty. Researchers in crop science actively use R code to evaluate their experimental data. For example, a rice strip-plot experiment with three replications, variety as the horizontal strip, and nitrogen fertilizer as the vertical strip was analyzed using R, as shown in Fig. 2 (Gomez and Gomez 1984).

Fig. 2 Rice strip-plot experiment (Gomez and Gomez 1984)

# Load the example dataset (requires the agridat package)
library(agridat)
dat <- gomez.stripplot

# Gomez figure 3.7: field layout of the strip-plot experiment
# (desplot() shipped with agridat when this was run; newer releases
#  provide it in the separate desplot package)
desplot(gen~x*y, data=dat, out1=rep, num=nitro, cex=1)

# Gomez table 3.12: yield totals by replication, variety, and nitrogen level
tapply(dat$yield, dat$rep, sum)
tapply(dat$yield, dat$gen, sum)
tapply(dat$yield, dat$nitro, sum)

# Gomez table 3.15: ANOVA table for the strip-plot design
dat <- transform(dat, nf=factor(nitro))  # treat nitrogen level as a factor
m1 <- aov(yield ~ gen * nf + Error(rep + rep:gen + rep:nf), data=dat)
summary(m1)

> library(agridat)
> png(filename="gomez.stripplot_%03d_large.png", width=1000, height=800)
> ### Name: gomez.stripplot
> ### Title: Rice strip-plot experiment
> ### Aliases: gomez.stripplot
>
> ### ** Examples
>
> dat <- gomez.stripplot
>
> # Gomez figure 3.7
> desplot(gen~x*y, data=dat, out1=rep, num=nitro, cex=1)
>
> # Gomez table 3.12
> tapply(dat$yield, dat$rep, sum)
    R1     R2     R3
 84700 100438 100519
> tapply(dat$yield, dat$gen, sum)
   G1    G2    G3    G4    G5    G6
48755 56578 54721 50121 47241 28241
> tapply(dat$yield, dat$nitro, sum)
     0     60    120
 72371  98608 114678
>
> # Gomez table 3.15. Anova table for strip-plot
> dat <- transform(dat, nf=factor(nitro))
> m1 <- aov(yield ~ gen * nf + Error(rep + rep:gen + rep:nf), data=dat)
> summary(m1)

Error: rep
          Df  Sum Sq Mean Sq F value Pr(>F)
Residuals  2 9220962 4610481

Error: rep:gen
          Df   Sum Sq  Mean Sq F value  Pr(>F)
gen        5 57100201 11420040   7.653 0.00337 **
Residuals 10 14922619  1492262
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Error: rep:nf
          Df   Sum Sq  Mean Sq F value  Pr(>F)
nf         2 50676061 25338031   34.07 0.00307 **
Residuals  4  2974908   743727
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Error: Within
          Df   Sum Sq Mean Sq F value   Pr(>F)
gen:nf    10 23877979 2387798   5.801 0.000427 ***
Residuals 20  8232917  411646
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> dev.off()
null device
          1
>

Result
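The F values in the ANOVA output above can be checked by hand: in a strip-plot analysis each effect is tested against the residual mean square of its own error stratum. The short R sketch below recomputes the F values and p-values from the mean squares and degrees of freedom printed in that output; the data frame name `anova_terms` and its column names are introduced here for illustration only.

```r
# Mean squares and degrees of freedom copied from the summary(m1) output.
anova_terms <- data.frame(
  term     = c("gen", "nf", "gen:nf"),
  ms       = c(11420040, 25338031, 2387798),  # mean square of the effect
  df       = c(5, 2, 10),                     # degrees of freedom of the effect
  ms_error = c(1492262, 743727, 411646),      # residual mean square, same stratum
  df_error = c(10, 4, 20)                     # residual degrees of freedom
)

# F = effect mean square / residual mean square of the matching stratum
anova_terms$f_value <- anova_terms$ms / anova_terms$ms_error

# Upper-tail probability of the F distribution gives Pr(>F)
anova_terms$p_value <- pf(anova_terms$f_value, anova_terms$df,
                          anova_terms$df_error, lower.tail = FALSE)

print(anova_terms)
```

Testing variety (gen) against rep:gen and nitrogen (nf) against rep:nf, rather than against a single pooled residual, is what distinguishes the strip-plot analysis from an ordinary two-factor ANOVA.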

Thiel et al. (2009) used the R statistical package to normalize the observed map length differences of up to 24 % among chromosomes of both datasets, using previously defined anchor points. They tested three regression methods: (i) linear least squares regression, (ii) local polynomial regression fitting using loess with degree = 1 (the degree of the polynomials) and span = 0.7 (the smoothing parameter), and (iii) local polynomial regression fitting with the same set of parameters as (ii). The normalization supported the evidence for, and the evolutionary analysis of, an ancient whole-genome duplication in barley predating its divergence from rice, as shown in Fig. 3.

Fig. 3 Local polynomial regression fitting of anchor markers of chromosome 1 using the R statistical package (Thiel et al. 2009)
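The two kinds of fit described above, linear least squares and loess with degree = 1 and span = 0.7, can be sketched in base R as follows. The anchor marker positions are simulated and purely hypothetical (they are not the data of Thiel et al. 2009), and the variable names `map_a` and `map_b` are assumptions made for this example.

```r
set.seed(1)

# Hypothetical anchor markers: positions (cM) of the same markers on two maps.
# The second map is simulated as roughly 24 % longer, with a mild nonlinearity.
map_a <- sort(runif(40, min = 0, max = 150))
map_b <- 1.24 * map_a + 5 * sin(map_a / 25) + rnorm(40, sd = 2)
anchors <- data.frame(map_a = map_a, map_b = map_b)

# (i) linear least squares regression
fit_lm <- lm(map_b ~ map_a, data = anchors)

# (ii) local polynomial regression with degree = 1 and span = 0.7
fit_loess <- loess(map_b ~ map_a, data = anchors, degree = 1, span = 0.7)

# Project positions from the first map onto the second map's scale
new_pos <- data.frame(map_a = c(25, 75, 125))
pred_lm <- predict(fit_lm, newdata = new_pos)
pred_loess <- predict(fit_loess, newdata = new_pos)

print(cbind(new_pos, linear = pred_lm, loess = pred_loess))
```

By default, loess does not extrapolate, so positions outside the range of the anchor markers return NA; the linear fit has no such restriction but cannot follow local differences in map expansion.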

6 MATLAB Computing Tool in Rice Science

MATLAB stands for MATrix LABoratory. It is a state-of-the-art mathematical software package that integrates computation, visualization, and programming in an easy-to-use environment where problems and solutions are expressed in familiar mathematical notation. Unlike other mathematical packages, such as MAPLE or MATHEMATICA, MATLAB cannot perform symbolic manipulations without the use of additional toolboxes. It remains, however, one of the leading software packages for numerical computation. MATLAB is widely used in the rice sciences for designing probabilistic models that predict the time window of various rice diseases. Generalized Regression Neural Network (GRNN) models were developed using MATLAB (The MathWorks Inc., Natick, MA) as a second type of neural network; unlike the feed-forward Back Propagation Neural Network (BPNN), a GRNN does not require the optimization of multiple parameters. These models helped in designing a probabilistic-model web server for forecasting rice blast disease (Kaundal et al. 2006), as shown in Fig. 4.

Fig. 4 An overview of the submission form for online prediction of rice blast severity with the “RB-Pred” web server (Kaundal et al. 2006)
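To show why a GRNN needs so little tuning, the following is a minimal R sketch of the standard GRNN prediction rule: a Gaussian-kernel weighted average of the training responses, with the kernel width sigma as the only parameter. It is written in R rather than MATLAB for consistency with the earlier examples and is not the RB-Pred implementation; the function name `grnn_predict`, the predictor names, and all data values are hypothetical.

```r
# Hypothetical helper: GRNN prediction for each row of x_new.
grnn_predict <- function(x_train, y_train, x_new, sigma) {
  x_train <- as.matrix(x_train)
  x_new <- as.matrix(x_new)
  apply(x_new, 1, function(x) {
    d2 <- rowSums(sweep(x_train, 2, x)^2)  # squared distance to each training case
    w <- exp(-d2 / (2 * sigma^2))          # Gaussian kernel weights
    sum(w * y_train) / sum(w)              # weighted average of training responses
  })
}

# Hypothetical training data: two weather predictors and disease severity (%).
weather <- data.frame(temp_max     = c(28, 30, 32, 26, 29, 31),
                      rel_humidity = c(92, 85, 70, 95, 88, 75))
severity <- c(60, 40, 10, 75, 50, 20)

# Standardize predictors so that both contribute comparably to the distance.
centers <- colMeans(weather)
scales <- apply(weather, 2, sd)
x_train <- scale(weather, center = centers, scale = scales)
x_query <- scale(data.frame(temp_max = 27, rel_humidity = 93),
                 center = centers, scale = scales)

predicted <- grnn_predict(x_train, severity, x_query, sigma = 0.5)
print(predicted)
```

A small sigma makes the prediction follow the nearest training cases, while a large sigma pulls it toward the overall mean of the training responses; in practice sigma would be chosen by cross-validation.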

7 SAS (Statistical Analysis Software) Application in Rice

SAS is an integrated system of software solutions for tasks such as data entry, retrieval, and management; report writing; graphics design; statistical and mathematical analysis; business forecasting; decision support; operations research; project management; and applications development. SAS runs on IBM mainframes, Unix, Linux, OpenVMS Alpha, and Microsoft Windows, and code can be moved almost transparently between these environments. Older versions supported PC DOS, the Apple Macintosh, VMS, VM/CMS, PrimeOS, Data General AOS, and OS/2. The SAS windowing environment is shown in Fig. 5.

Fig. 5 SAS Windowing Environment (Cary 2001)

SAS is also one of the most widely used statistical software packages for analyzing data in the crop sciences. In one study, the statistical analysis was carried out using SAS Version 8.2; in the three-month maturity group, Bg2845, Bg2834, and Bg300 obtained the first three places, respectively, in the 2001/02 wet season, showing superiority along with a high stability level (Samita et al. 2010). In another study, varieties and their interactions were tested using SAS software, and the SAS ANOVA procedure was used to display the results. Table 3 shows the results of the ANOVA of a split plot design for grain yield (kg/plot) in the AVT-2 (BT) trial, in which significant variation in grain yield was observed (Cyprien and Kumar 2012).

Table 3 ANOVA of split plot design for AVT-2 (BT) trial (Cyprien and Kumar 2012)
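
To make the logic of an ANOVA table concrete, the following minimal sketch computes the F statistic for a one-way layout. It is written in Python rather than SAS, and it is a deliberately simpler design than the split-plot analysis of Cyprien and Kumar (2012), which partitions variation into main-plot and sub-plot strata. The function name one_way_anova and the yield values are hypothetical.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of observation groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical grain yields (kg/plot) for three varieties, three plots each.
yields = [[4.0, 5.0, 6.0], [6.0, 7.0, 8.0], [8.0, 9.0, 10.0]]
f_stat, df_b, df_w = one_way_anova(yields)
print(f_stat, df_b, df_w)
```

A large F value relative to the F distribution with the reported degrees of freedom indicates that the variation among varieties exceeds what plot-to-plot noise would explain.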

8 Computational Modeling in Rice Science

Studying protein–protein interactions is one of the crucial ways to decipher the functions of proteins and to understand their roles in complex pathways at the cellular level. The most accurate structural characterization of proteins is provided by X-ray crystallography and NMR spectroscopy. Because these methods are technically difficult and labor intensive, the number of protein structures solved experimentally lags far behind the accumulation of protein sequences. By the end of 2007, there were 44,272 protein structures deposited in the Protein Data Bank (PDB) (www.rcsb.org) (Berman et al. 2000), accounting for just one percent of the sequences in the UniProtKB database (http://www.ebi.ac.uk/swissprot). Advances in bioinformatics have provided efficient tools for understanding biological processes at the molecular level. In recent times, significant progress has been made in computational modeling of protein structures and in molecular docking, which holds great promise for the prediction of protein–protein interactions. Docking is a computational scheme that attempts to find the best match between two molecules, a receptor and a ligand (Halperin et al. 2002). Protein–protein docking is one of the potential means of studying the structure of protein–protein complexes such as antibody–antigen complexes (Gray 2006; Sivasubramanian et al. 2006; Sharma 2008). Several docking programs are now available, such as GOLD, AutoDock, and Hex. Online servers such as SWISS-MODEL (http://swissmodel.expasy.org/) and ModBase (http://modbase.compbio.ucsf.edu/modbase-cgi/index.cgi) are also used for homology modeling, and structure validation programs such as PROCHECK, WHAT IF, VERIFY3D, and ERRAT are used to validate the resulting models.

In the absence of crystallographic structures for rice MAPKs and MAPKKs, homology modeling was used to derive reasonable 3-D structures of these proteins from the known structures of template proteins. The 3-D structures were then used as input for protein–protein docking with the ZDOCK and RDOCK programs to predict MAPKK–MAPK interactions. In parallel, yeast two-hybrid (Y2H) analyses were used to study rice MAPKK–MAPK protein–protein interaction networks. The computational predictions were compared directly with the Y2H results to assess the reliability of computational docking for predicting protein–protein interactions (Wankhede et al. 2013). All the 3-D models were refined using loop refinement (based on the MODELER and LOOPER algorithms) and side-chain refinement protocols. The modeled 3-D structures of the eleven MAPKs and five MAPKKs are shown in Fig. 6.

Fig. 6 Theoretical 3-D models of rice MAPKKs and MAPKs built by homology modeling. Structures of 11 rice MAP kinases (OsMPK3, OsMPK4, OsMPK6, OsMPK7, OsMPK14, OsMPK16-1, OsMPK17-1, OsMPK20-2, OsMPK20-3, OsMPK20-5, and OsMPK21-2) and 5 MAP kinase kinases (OsMKK3, OsMKK4, OsMKK5, OsMKK6, OsMKK10-2) are shown. The red regions represent the alpha helices, the sky blue regions the beta sheets, the green regions the turns, and the gray regions the loops (Wankhede et al. 2013)
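
The comparison between docking predictions and Y2H results described above amounts to tallying, for every MAPKK–MAPK pair, whether the two methods agree. A minimal sketch of such a tally follows. It is an illustration only, not the procedure of Wankhede et al. (2013), and the function name compare_calls, the protein labels, and the interaction sets are hypothetical placeholders rather than reported results.

```python
from itertools import product


def compare_calls(predicted, observed, upstream, downstream):
    """Tally agreement between predicted and experimentally observed pairs."""
    counts = {"both": 0, "predicted_only": 0, "observed_only": 0, "neither": 0}
    for pair in product(upstream, downstream):
        in_pred = pair in predicted
        in_obs = pair in observed
        if in_pred and in_obs:
            counts["both"] += 1
        elif in_pred:
            counts["predicted_only"] += 1
        elif in_obs:
            counts["observed_only"] += 1
        else:
            counts["neither"] += 1
    # Fraction of all candidate pairs on which the two methods agree.
    agreement = (counts["both"] + counts["neither"]) / sum(counts.values())
    return counts, agreement


# Placeholder labels, not real rice proteins or real results.
mapkks = ["MKK_a", "MKK_b"]
mapks = ["MPK_1", "MPK_2"]
docking_hits = {("MKK_a", "MPK_1"), ("MKK_a", "MPK_2")}
y2h_hits = {("MKK_a", "MPK_1"), ("MKK_b", "MPK_1")}
counts, agreement = compare_calls(docking_hits, y2h_hits, mapkks, mapks)
print(counts, agreement)
```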

The overall stereochemical quality of the modeled 3-D structures was evaluated using the Ramachandran plot, which is based on the backbone dihedral angles psi (ψ, rotation about the Cα–C bond) and phi (φ, rotation about the N–Cα bond) and shows how many amino acid residues fall in allowed and disallowed regions. In all the modeled proteins, most residues were in the most favored regions, followed by the additionally allowed regions, with the fewest in the generously allowed regions (Fig. 7).

Fig. 7 Ramachandran plot analysis of theoretical 3-D structures of rice MAPKKs and MAPKs. The 3-D structures of 11 rice MAP kinases (OsMPK3, OsMPK4, OsMPK6, OsMPK7, OsMPK14, OsMPK16-1, OsMPK17-1, OsMPK20-2, OsMPK20-3, OsMPK20-5, and OsMPK21-2) and 5 MAP kinase kinases (OsMKK3, OsMKK4, OsMKK5, OsMKK6, OsMKK10-2) were validated using the Ramachandran plot. The green and yellow dots show the amino acids in the most favored and additionally allowed regions, while the red dots show the amino acids in the generously allowed or disallowed regions. The regions enclosed by the sky blue line are the most favored regions, and the regions enclosed by the pink line are the additionally allowed regions. The other regions of the plot are the generously allowed or disallowed regions (Wankhede et al. 2013)
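
Each point in a Ramachandran plot is a pair of backbone dihedral angles. For residue i, phi is the dihedral defined by the atoms C(i-1), N(i), Cα(i), C(i), and psi by N(i), Cα(i), C(i), N(i+1). The sketch below shows how one such angle can be computed from four atomic coordinates. It is a generic geometric calculation, not the validation software used in the cited study, and the function name dihedral_deg and the coordinates are hypothetical.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    # Unit vector along the central bond.
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Components of the outer bonds perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Hypothetical coordinates for four consecutive backbone atoms.
angle = dihedral_deg((1.0, 0.0, 0.0), (0.0, 0.0, 0.0),
                     (0.0, 0.0, 1.0), (0.0, 1.0, 1.0))
print(angle)
```

Applying this function to every residue of a model gives the (phi, psi) pairs that a validation program then assigns to favored, allowed, or disallowed regions.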

9 Conclusion

Rice is one of the most economically important monocot crops in the world. Bioinformatics provides researchers in all fields of rice research with information on demand. It is applicable to all branches of agriculture and is a boon for varietal improvement, helping to explain many agronomic traits involved in crop productivity. As rice databases for structural and functional genomics and proteomics advance, research at both the molecular and phenotypic levels will advance further. Integrated Bioinformatics Information Resource Access (iBIRA) is an initiative to bring bioinformatics researchers and bioinformatics resources together on a single platform. New databases, web servers, and software tools appear every day to meet the needs of researchers. The challenge for rice informatics is how to translate this growing volume of information into a logical end.