Abstract
Modern biological and chemical studies rely on life science databases as well as sophisticated software tools (e.g., homology search tools, modeling and visualization tools). These tools often have to be combined and integrated in order to support a given study. SIBIOS (System for the Integration of Bioinformatics Services) serves this purpose. The services it integrates include both life science database search services and software tools. The task engine, the core component of SIBIOS, supports the execution of dynamic workflows that incorporate multiple bioinformatics services. This paper presents the architecture of SIBIOS and its approaches to addressing the heterogeneity and interoperability of bioinformatics services, including data integration.
Cite this article
Mahoui, M., Lu, L., Gao, N. et al. A Dynamic Workflow Approach for the Integration of Bioinformatics Services. Cluster Comput 8, 279–291 (2005). https://doi.org/10.1007/s10586-005-4095-1
DOI: https://doi.org/10.1007/s10586-005-4095-1