Abstract
Communication with XML often involves pre-agreed document types. In this paper, we propose an offline parser generation approach to enhance online processing performance for documents conforming to a given DTD. Our examination of DTDs and the languages they define demonstrates the existence of ambiguities. We present an algorithm that maps DTDs to deterministic context-free grammars defining the same languages. We prove the grammars to be LL(1) and LALR(1), making them suitable for standard parser generators. Our experiments show the superior performance of generated optimized parsers. Our results generalize from DTDs to XML schema specifications with certain restrictions, most notably the absence of namespaces, which exceed the scope of context-free grammars.
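To make the idea of mapping a DTD to a context-free grammar concrete, here is a minimal sketch. It is not the algorithm of the paper: it only assumes a simplified DTD in which every element declares a plain sequence of children, each with an optional occurrence indicator (`?`, `*`, `+`). Choice groups, attributes, and mixed content are ignored. The function name `dtd_to_productions`, the `TEXT` terminal, and the helper nonterminals such as `body_star` are hypothetical names chosen for illustration.

```python
def dtd_to_productions(elements):
    """Map a simplified DTD to context-free productions.

    elements: {name: [(child, occurrence), ...]} where occurrence is
    '', '?', '*' or '+'. Returns a list of (lhs, rhs) pairs; an empty
    rhs list stands for the empty word.
    """
    productions = []
    helpers = set()

    def helper_for(symbol, kind):
        # Introduce one helper nonterminal per (symbol, kind) pair.
        name = f"{symbol}_{kind}"
        if name not in helpers:
            helpers.add(name)
            productions.append((name, []))  # empty alternative
            # Right recursion for repetition, single symbol for optional.
            tail = [symbol, name] if kind == "star" else [symbol]
            productions.append((name, tail))
        return name

    for name, children in elements.items():
        rhs = [f"<{name}>"]  # start tag as a terminal
        for child, occ in children:
            symbol = "TEXT" if child == "#PCDATA" else child
            if occ == "":
                rhs.append(symbol)
            elif occ == "?":
                rhs.append(helper_for(symbol, "opt"))
            elif occ == "*":
                rhs.append(helper_for(symbol, "star"))
            elif occ == "+":
                # x+ is rewritten as x followed by x*
                rhs.extend([symbol, helper_for(symbol, "star")])
            else:
                raise ValueError(f"unknown occurrence indicator: {occ!r}")
        rhs.append(f"</{name}>")  # end tag as a terminal
        productions.append((name, rhs))
    return productions


# Hypothetical DTD: <!ELEMENT note (to, body*)>, with to and body holding text.
example_dtd = {
    "note": [("to", ""), ("body", "*")],
    "to": [("#PCDATA", "")],
    "body": [("#PCDATA", "")],
}

for lhs, rhs in dtd_to_productions(example_dtd):
    print(lhs, "->", " ".join(rhs) if rhs else "(empty)")
```

Productions of this shape can be written out in the input syntax of a standard parser generator. Whether the result is LL(1) or LALR(1) depends on the content models involved, which is the question the paper addresses for its own construction.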
Cite this article
Löwe, W.M., Noga, M.L. & Gaul, T.S. Foundations of Fast Communication via XML. Annals of Software Engineering 13, 357–379 (2002). https://doi.org/10.1023/A:1016566031114
DOI: https://doi.org/10.1023/A:1016566031114