Abstract
There has been significant interest in static analysis of programs that manipulate strings, in particular in the context of web security. Many types of security vulnerabilities are exposed through flaws in programs such as string encoders, decoders, and sanitizers. Recent work has focused on combining automata and satisfiability modulo theories techniques to address security issues in those programs. These techniques scale to larger alphabets such as Unicode, that is a de facto character encoding standard used in web software.
One approach has been to use character predicates to generalize finite state transducers. This technique has made it possible to perform precise analysis of a large class of typical sanitization routines. However, it has not been able to cope well with decoders, that often require to read more than one character at a time. In order to overcome this limitation we introduce a conservative generalization of Symbolic Finite Transducers (SFTs) called Extended Symbolic Finite Transducers (ESFTs) that incorporates the notion of a bounded lookahead. We demonstrate the advantage ESFTs on analyzing programs for which previous approaches did not scale.
In our evaluation we use a UTF-16 to UTF-8 translator (utf8encoder) and a UTF-8 to UTF-16 translator (utf8decoder). We show, among other properties, that utf8encoder and utf8decoder are functionally correct.
This work was done during an internship at Microsoft Research and this research was partially supported by NSF Expeditions in Computing award CCF 1138996.
Access provided by Autonomous University of Puebla. Download to read the full chapter text
Chapter PDF
Similar content being viewed by others
Keywords
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
References
Alur, R., Cerný, P.: Streaming transducers for algorithmic verification of single-pass list-processing programs. In: POPL 2011, pp. 599–610. ACM (2011)
Bjørner, N., Ganesh, V., Michel, R., Veanes, M.: An SMT-LIB format for sequences and regular expressions. In: Fontaine, P., Goel, A. (eds.) SMT 2012, pp. 76–86 (2012)
Bjørner, N., Tillmann, N., Voronkov, A.: Path Feasibility Analysis for String-Manipulating Programs. In: Kowalewski, S., Philippou, A. (eds.) TACAS 2009. LNCS, vol. 5505, pp. 307–321. Springer, Heidelberg (2009)
Christensen, A.S., Møller, A., Schwartzbach, M.I.: Precise Analysis of String Expressions. In: Cousot, R. (ed.) SAS 2003. LNCS, vol. 2694, pp. 1–18. Springer, Heidelberg (2003)
de Moura, L., Bjørner, N.S.: Z3: An Efficient SMT Solver. In: Ramakrishnan, C.R., Rehof, J. (eds.) TACAS 2008. LNCS, vol. 4963, pp. 337–340. Springer, Heidelberg (2008)
Fülöp, Z., Vogler, H.: Syntax-Directed Semantics: Formal Models Based on Tree Transducers. EATCS. Springer (1998)
Godefroid, P.: Compositional dynamic test generation. In: POPL 2007, pp. 47–54 (2007)
Hooimeijer, P., Livshits, B., Molnar, D., Saxena, P., Veanes, M.: Fast and precise sanitizer analysis with Bek. In: Proceedings of the USENIX Security Symposium (August 2011)
Hooimeijer, P., Veanes, M.: An Evaluation of Automata Algorithms for String Analysis. In: Jhala, R., Schmidt, D. (eds.) VMCAI 2011. LNCS, vol. 6538, pp. 248–262. Springer, Heidelberg (2011)
Kaminski, M., Francez, N.: Finite-memory automata. TCS 134(2), 329–363 (1994)
Livshits, B., Nori, A.V., Rajamani, S.K., Banerjee, A.: Merlin: specification inference for explicit information flow problems. In: PLDI 2009, pp. 75–86. ACM (2009)
Minamide, Y.: Static approximation of dynamically generated web pages. In: WWW 2005: Proceedings of the 14th International Conference on the World Wide Web, pp. 432–441 (2005)
NVD, http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2008-2938
OWASP. Double encoding, https://www.owasp.org/index.php/Double_Encoding
SANS. Malware faq, http://www.sans.org/security-resources/malwarefaq/w-nt-unicode.php
Saxena, P., Akhawe, D., Hanna, S., Mao, F., McCamant, S., Song, D.: A symbolic execution framework for javascript. Technical Report UCB/EECS-2010-26 (March 2010)
Segoufin, L.: Automata and Logics for Words and Trees over an Infinite Alphabet. In: Ésik, Z. (ed.) CSL 2006. LNCS, vol. 4207, pp. 41–57. Springer, Heidelberg (2006)
Veanes, M., de Halleux, P., Tillmann, N.: Rex: Symbolic Regular Expression Explorer. In: ICST 2010, pp. 498–507. IEEE (2010)
Veanes, M., Hooimeijer, P., Livshits, B., Molnar, D., Bjorner, N.: Symbolic finite state transducers: Algorithms and applications. In: POPL 2012, pp. 137–150 (2012)
Veanes, M., Molnar, D., Mytkowicz, T., Livshits, B.: Data-parallel string-manipulating programs. Technical Report MSR-TR-2012-72, Microsoft Research (2012)
Yu, S.: Regular languages. In: Rozenberg, G., Salomaa, A. (eds.) Handbook of Formal Languages, vol. 1, pp. 41–110. Springer (1997)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
D’Antoni, L., Veanes, M. (2013). Static Analysis of String Encoders and Decoders. In: Giacobazzi, R., Berdine, J., Mastroeni, I. (eds) Verification, Model Checking, and Abstract Interpretation. VMCAI 2013. Lecture Notes in Computer Science, vol 7737. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35873-9_14
Download citation
DOI: https://doi.org/10.1007/978-3-642-35873-9_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35872-2
Online ISBN: 978-3-642-35873-9
eBook Packages: Computer ScienceComputer Science (R0)