Abstract
The PHP string analyzer developed by the second author approximates the string output of a program with a context-free grammar. By developing a procedure to decide inclusion between context-free and regular hedge languages, Minamide and Tozawa applied this analysis to checking the validity of dynamically generated XHTML documents. In this paper, we consider the problem of checking the validity of dynamically generated HTML documents instead of XHTML documents.
HTML is not specified by an XML schema language, but by an SGML DTD, and we can omit several kinds of tags in HTML documents. We formalize a subclass of SGML DTDs and develop a translation into regular hedge grammars. Thus we can validate dynamically generated HTML documents. We have implemented this translation and incorporated it in the PHP string analyzer. The experimental results show that the validation through this translation works well in practice.
Access provided by Autonomous University of Puebla. Download to read the full chapter text
Chapter PDF
Similar content being viewed by others
References
Berstel, J., Boasson, L.: Formal properties of XML grammars and languages. Acta Informatica 38(9), 649–671 (2002)
Goldfarb, C.F.: The SGML Handbook. Oxford University Press, Oxford (1990)
Minamide, Y.: Static approximation of dynamically generated Web pages. In: Proceedings of the 14th International World Wide Web Conference, pp. 432–441. ACM Press, New York (2005)
Minamide, Y., Tozawa, A.: XML validation for context-free grammars. In: Kobayashi, N. (ed.) APLAS 2006. LNCS, vol. 4279, pp. 357–373. Springer, Heidelberg (2006)
Murata, M.: Hedge automata: a formal model for XML schemata (1999), http://www.xml.gr.jp/relax/hedge_nice.html
Pair, C., Quere, A.: Définition et étude des bilangages réguliers. Information and Control 13(6), 565–593 (1968)
World Wide Web Consortium. HTML 4.01 Transitional DTD (1999), http://www.w3.org/TR/html401/loose.dtd
Warmer, J., van Egmond, S.: The implementation of the Amsterdam SGML Parser. Electronic Publishing 2(2), 65–90 (1989)
Author information
Authors and Affiliations
Editor information
Rights and permissions
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Nishiyama, T., Minamide, Y. (2008). A Translation from the HTML DTD into a Regular Hedge Grammar. In: Ibarra, O.H., Ravikumar, B. (eds) Implementation and Applications of Automata. CIAA 2008. Lecture Notes in Computer Science, vol 5148. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70844-5_13
Download citation
DOI: https://doi.org/10.1007/978-3-540-70844-5_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-70843-8
Online ISBN: 978-3-540-70844-5
eBook Packages: Computer ScienceComputer Science (R0)