1 Introduction

Lua is a lightweight imperative scripting language, featuring dynamic typing, automatic memory management, data description facilities, and meta-programming mechanisms to adapt the language to specific domains [11]. The typical use case of a Lua application is as an extension library embedded in a host application, commonly written in C or C++. In that setting, Lua makes it possible to add scripting facilities to the host application, combining the flexibility and rapid prototyping of a dynamic language with the static guarantees and optimizations of stricter programming languages.

Lua is extensively used in many diverse applications, ranging from game development (most notably in “AAA” games [12], but also in mobile games and game frameworks) to plug-in development (for example, in the photo-editing software Adobe Photoshop Lightroom [1] and the type-setting system LuaTeX [28]), web application firewalls [7], and embedded systems [27].

Lua is informally specified by both its reference manual and its reference interpreter, developed and maintained by the core Lua authors. Thanks to Lua’s success, several alternative implementations as well as code linters and static analyzers can be found in the wild [24, 25]. However, the informal nature of the specification means that developers of these tools must resort to their intuition, formed by study of the reference manual, inspection of the source code of the interpreter, and experimentation.

Even though Lua is considered a small language, it is not unusual to stumble across puzzling behaviors that cannot be easily inferred from the official documentation. As a simple example, consider the following program:

Example 1

(To return or not to return.)

figure a

In this example, we have two functions: one returning Lua’s null value (nil), and one not returning anything. Then, when printing three identical calls to each of them, we see a difference: in the former case three nils are printed, while in the latter only two are (we write --> to indicate the output of the interpreter; -- is Lua’s token for comments). The different outputs can be explained by two related aspects of Lua: tuples and vararg arguments. Briefly, some Lua functions can receive a list of actual parameters of arbitrary length. Such functions are known as vararg functions. The print service is an example of a vararg function. Internally, the list of zero or more actual parameters is manipulated through what we will call a tuple. Also, a function can return a list of zero or more values, which will also be modeled as tuples. A set of rules governs how a tuple’s values are extracted and used in the context where the tuple appears: if it appears within a list of values (for example, the list of actual parameters in a function call, as in lines 5–6 of the code shown), we need to append the tuple’s values to the list; if it appears where a single value is required, we need to truncate the tuple and extract just its first value. As the example shows, said rules may yield unexpected results, mostly when we deal with empty tuples.
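The following sketch matches the described behavior; the function names f and g are ours, as the original listing appears only in the figure:

    local function f () return nil end    -- returns Lua's null value
    local function g () end               -- returns nothing

    print(f(), f(), f())   --> nil   nil   nil
    print(g(), g(), g())   --> nil   nil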

In this work, we present an extensive formalization—mechanized in Redex [5]—of a large subset of Lua 5.2, explaining in detail its different mechanisms, like its particular take on tuples, and providing sound grounds for developing alternative implementations, extensions, and analysis tools for the language. In fact, the present work refines, extends, and greatly improves [37], which was the basis for formalizing Lua’s garbage collector [38], including the development of a small tool to verify simple programs making use of Lua’s weak tables (akin to weak references). As a side note, Lua 5.2 includes interfaces with the garbage collector, namely weak tables and finalizers, which have an observable impact on the behavior of user programs. In order to account for this, we include on top of our semantics a simple syntactic, non-deterministic garbage collector with which it is possible to communicate through said interfaces, and we show how the impact on user programs’ semantics manifests in that setting.

In order to gain confidence in our model, we took three measures: first, we developed a semantics that (mostly) contains each programming concept from the reference manual, in a one-to-one correspondence, instead of focusing on an idealized core of the language. Second, we successfully ran the test suite of the reference interpreter of the language [21] directly within Redex, checking that our semantics is correct to the extent of what is tested within the suite. That is, unlike many other works in the literature (e.g., [8, 35]), we did not pay the cost—confidence-wise—of building a new interpreter. Last, we used Redex’s support for formal systems and random testing [15] to randomly test soundness properties of our semantics. This allowed us to detect a plethora of errors and omissions that were not caught during the development of the semantics, even after successfully passing the tests of the test suite.

Overall, we obtain evidence that our semantics is sound and corresponds to that of the language for the selected subset of its features, including

  • Every type of Lua value, except coroutines and userdata (see below);

  • Tuples;

  • Metatables (Lua’s configurable dynamic dispatch);

  • Identity of closures;

  • Dynamic execution of source code;

  • Error handling;

  • A large collection of services from the standard library;

  • Garbage collection (presented in [38]).

For reasons of simplicity, we purposely excluded the following features, leaving them for future work:

  • Coroutines, in essence single-shot delimited continuations;

  • Userdata, opaque handles to data from the host application and native libraries;

  • The goto statement;

  • Services from the standard library that interface with the operating system, such as file manipulation, or have large complex C implementations, such as string pattern matching.

We elaborate on the main reasons behind the exclusion of some of these features in Sect. 9.

1.1 Contributions

In summary, we present

  1. A formalization of a large portion of Lua, in its version 5.2, including several semantically relevant details of the official interpreter not covered in the reference manual.

  2. A mechanization of said formalization in Redex.

  3. An interpreter based on the mechanization that successfully passes all relevant test cases from Lua’s interpreter test suite.

  4. A formalization and a randomly tested mechanization of the soundness property of the semantics.

  5. A brief discussion of applications of the semantic model presented.

The mechanization can be downloaded from

https://github.com/Mallku2/lua-gc-redex-model

1.2 Quick Tour

  • Section 2 presents a brief description of Lua, with emphasis on particular features that we formalize in later sections. Its reading is non-essential for the Lua connoisseur.

  • Section 3 presents the basic concepts that our model uses, via a formalization of a very small subset of Lua. Its reading is important to understand the following sections.

Example 2

(Memoization in Lua.)

figure k

  • Section 4 expands Sect. 3 with the formalization of the most interesting parts of the semantics. Of special interest are the formalizations of metatables (Sect. 4.4) and tuples (Sect. 4.2.3).

  • Section 5 presents some relevant properties of the language.

  • Section 6 presents the software artifact (the mechanization in Redex of the language) and the results of the two types of testing performed. It is an essential part of the work, and its reading is highly recommended.

  • Section 7 introduces two related applications of this work: modeling garbage collection and developing static analysis tools.

  • Section 8 discusses related work.

  • Section 9 reflects on the experience of formalizing and mechanizing Lua, and discusses future avenues of research.

2 Lua: An Extensible Scripting Language

In this section, we present a quick tour of some of Lua’s most salient characteristics and the features captured in our model. The Lua expert can safely skip this section and come back to it only if necessary.

2.1 First-Class Closures and Tables

We start with a simple memoization function, listed in Example 2, which takes a function fn as argument and returns its memoized version. The values of fn already computed will be stored in a table (t in line 2). Tables are, in essence, associative arrays indexed by any Lua value except the null value (nil). They also come with syntax sugar and metaprogramming facilities that can greatly extend their functionality beyond simple associative arrays, as we will see in Sect. 2.3 below.

Example 3

(The environment _ENV)

figure l

The memoized version of fn is returned as an anonymous function in line 3. This function takes x as argument and, before computing fn(x), performs a look-up in the table for value x (line 4). If the result of the look-up is nil, it means no result was found, so it proceeds to compute fn(x) and store it in the table (line 5). The resulting value, either computed or retrieved from the table, is returned in line 6. Note that the memoization only works for values fn(x) different from nil. The function memoize is used in lines 10–14 to improve the performance of a function that performs a sum from 1 to x.
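A sketch of Example 2, reconstructed from the description above (the line numbers in the comments refer to the original listing, and details may differ):

    local memoize = function (fn)
      local t = {}                     -- line 2: cache of computed values
      return function (x)              -- line 3: the memoized version
        local r = t[x]                 -- line 4: look x up in the cache
        if r == nil then               -- a miss: compute and store
          r = fn(x); t[x] = r          -- line 5
        end
        return r                       -- line 6
      end
    end

    local sum = function (x)           -- a sum from 1 to x
      local s = 0
      for i = 1, x do s = s + i end
      return s
    end
    local memsum = memoize(sum)        -- lines 10-14: memoized version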

It is important to note that all procedures in Lua, anonymous or named, are first-class values, and form lexically scoped closures. The anonymous function that memoize returns will effectively capture into its scope the table t, as expected.

2.2 Local Definitions and the Environment _ENV

Note that the definitions of memoize, t, and memsum are prefixed by the keyword local. As its name suggests, local creates a local variable, initialized with the rvalue (what comes after the ‘=’ operator). If we omit this keyword, a declaration is considered an assignment; if the variable was declared local before, it performs an update of its value, as usual. More interestingly, if there is no such variable in scope, then Lua looks up the variable in a table called the environment, which is bound to the identifier _ENV, using the variable’s identifier as the key. Similarly, reading the value of a variable that is not within scope triggers the search for the value associated with that name in the environment. Abusing notation, this means that any occurrence of a variable x that is not in scope is just syntax sugar for _ENV["x"]. Variables in the environment are called global variables. Services from the standard library (like print) are accessible through global variables and are, therefore, bound by this environment.

Since _ENV is in itself a variable, the programmer can, at any time, change the environment in which a program is executed by simply assigning another table to _ENV. Example 3 shows some simple interactions with the environment to illustrate this concept.
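A minimal sketch of such an interaction, assuming the reference interpreter (the original listing is in the figure):

    x = 10                 -- sugar for _ENV["x"] = 10
    print(x)               --> 10
    local print = print    -- keep print reachable after the switch
    _ENV = {}              -- replace the environment with an empty table
    print(x)               --> nil (no "x" in the new environment)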

2.3 Metatables

The most notable feature of Lua is its meta-programming mechanism, metatables, which lets the programmer adapt the language to specific domains. Thanks to metatables, Lua can maintain its original design decision to keep the language simple and small [12], while still being able to cope with a variety of programming concepts (Section “Code Structure/Programming Paradigms” of [26]).

Example 4

(OOP based on Lua’s metatable mechanism.)

figure m

As an example, we show the implementation of some basic concepts of object-oriented programming. The example is listed in Example 4, and it models classes and objects by combining tables, first-class functions—which we already saw in Sect. 2.1—and the metatable mechanism. It also introduces some syntax sugar provided by Lua to better support OOP.

In Lua, a class is essentially implemented as a dictionary (i.e., a table), in which the method names form the keys of the dictionary, and the method implementations are the associated values. Objects are also modeled with tables, containing their attributes in their fields.

In the example, we have a class MyClass with its corresponding constructor (line 4) and only one field, value, with its setter (line 10) and getter (line 14). The function declarations in these lines are actually syntax sugar for assignments, where the left-hand sides are, respectively, MyClass["new"], MyClass["set_value"], and MyClass["get_value"]. For the two methods on lines 10 and 14, the use of : instead of . also adds an implicit first parameter for these functions, named self.

In the last lines of Example 4, we show how to create an instance of MyClass (line 18), and how to invoke its methods. In line 20, we can observe the invocation of set_value with yet another piece of syntax sugar: mc:set_value(6) is equivalent to mc["set_value"](mc, 6) (remember that the first, implicit, parameter of set_value is self).

Example 5

(Taking the maximum element of an array, together with its index.)

figure n

If classes contain methods, and objects contain fields, how does mc["set_value"] look up the set_value method? The answer is the metatable mechanism, used in lines 2 and 5. In line 5, the call to setmetatable assigns MyClass as the metatable of the table provided as argument (the empty table {}). The (empty) table returned is our self object.

A metatable can modify the behavior of a table with regard to most of Lua’s operations. For this example, the behavior we are modifying is the look-up of non-existing keys. Each behavior can be modified through a metamethod that, despite its name, is a value associated with a specific key in the metatable. In this case, the key is __index (line 2), and the value is the table MyClass. The overall effect is that a miss when looking up a key in the table (mc) triggers a look-up for the same key in the metatable (MyClass). This is how mc["set_value"] resolves to the method set_value from MyClass.
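A reconstruction of Example 4 consistent with the line references in the text (the exact listing is in the figure):

    local MyClass = {}
    MyClass.__index = MyClass                  -- line 2: misses go to MyClass

    function MyClass.new (init)                -- line 4: the constructor
      local self = setmetatable({}, MyClass)   -- line 5
      self.value = init
      return self
    end

    function MyClass:set_value (v)             -- line 10: ':' adds 'self'
      self.value = v
    end

    function MyClass:get_value ()              -- line 14
      return self.value
    end

    local mc = MyClass.new(5)                  -- line 18: a new instance
    mc:set_value(6)                            -- line 20
    print(mc:get_value())                      --> 6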

Lua also provides metamethods for indexing with a non-existing key, for a function call over a value that is not a closure, for unexpected circumstances involving binary and unary operators, for setting finalizers, and even for some functions in the standard library. Lua programmers typically use metatables for object-oriented programming (including more elaborate object models than class-based single inheritance), for operator overloading, and for proxies.

2.4 Tuples

A Lua function may return several values in what we call a tuple. Tuples are particular in the sense that several Lua constructs treat them specially. Take for instance Example 5 ([10], Chp. 5.1). The function maximum takes an array a and returns its maximum element, together with its index. It uses an iterator, obtained from the function ipairs of the standard library (line 4), which, in a first call, returns a tuple with three values: an iterator function, the table to be iterated, and 0. By repeatedly calling this iterator, supplying the table to be iterated and a position, we can traverse the table following its numeric integer indexes, beginning at 1. Accordingly, the for-in construct knows how to use the iterator to obtain, at each iteration, an index into the table being iterated and the value at that position. When the for-loop ends, we have in m the maximum value, and in mi, its index in the array. These are returned by maximum in line 10.
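The function, as given in ([10], Chp. 5.1), is essentially the following:

    function maximum (a)
      local mi = 1                  -- index of the maximum value
      local m = a[mi]               -- maximum value so far
      for i, val in ipairs(a) do    -- line 4: numeric indexes, from 1
        if val > m then
          mi = i; m = val
        end
      end
      return m, mi                  -- line 10: a tuple of two values
    end

    print(maximum({8, 10, 23, 12, 5}))   --> 23   3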

Example 6

(Converting an array of elements into a tuple.)

figure o

The size of a tuple does not need to be statically determined. An example of dynamically constructed tuples is presented in Example 6, where the function unpack takes an array of elements and produces a tuple containing such elements. There are a couple of niceties in the code that need some explanation. To begin with, in Lua, a function can be called with fewer or extra arguments. Every argument that is not passed by the caller of the function has value nil, and every extra argument is simply discarded. This feature allows us, for example, to avoid having an auxiliary function performing the recursion, with the main function calling it with the base case: instead, we can combine the two. In unpack, the index i used to index the array can be omitted, in which case we ensure it starts at 1 (arrays are 1-based) thanks to the or operator (line 2). If i is nil, then or returns its second argument, 1.

Another aspect of tuples worth discussing is that we can choose to ignore elements from a tuple, or even take more elements than actually present in the tuple. The first case is illustrated in line 8, where we assign only the first two elements from the tuple returned by unpack, discarding the third. The second is illustrated in line 10, where the first three variables get the three elements from the tuple, while the fourth gets value nil.
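A sketch of Example 6, reconstructed from the description (line numbers refer to the original listing):

    function unpack (t, i)
      i = i or 1                         -- line 2: default the index to 1
      if t[i] ~= nil then
        return t[i], unpack(t, i + 1)    -- grow the tuple recursively
      end
    end

    local a, b = unpack({10, 20, 30})        -- line 8: 30 is discarded
    local w, x, y, z = unpack({10, 20, 30})  -- line 10: z gets nil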

2.5 Vararg Arguments

It is possible to define a variadic function, i.e., a function that can receive an arbitrary quantity of parameters, by appending the vararg parameter to the list of formal parameters of the function. Written ..., this parameter collects all the extra arguments that are provided to a function. In Example 7, we see a function dual to unpack: it receives a list of arguments of arbitrary length and returns a vector containing those arguments: a table whose elements are indexed by natural numbers.

Example 7

(Converting a list of arguments into a table.)

figure q
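A sketch in the spirit of Example 7 (the name pack is ours):

    function pack (...)      -- collects all the arguments provided
      return {...}           -- a table indexed by 1, 2, ..., n
    end

    local t = pack("a", "b", "c")
    print(t[1], t[3])        --> a   c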

3 Basics of the Formalization

In this section, we gently introduce the semantic framework used throughout the paper by providing semantics to Lua\(_0\), a toy subset of Lua. We write the semantics as a small-step operational semantics with evaluation contexts. Evaluation contexts, taken from Felleisen–Hieb’s Reduction Semantics (RS) [5], are used for the specific purposes of modularizing the semantics, providing a concise description of its context-sensitive aspects, and defining the execution order. In this, we follow the path taken by [8, 34, 35], where RS is similarly applied to successfully formalize real programming languages. However, we choose a different path with regard to the presentation of our formal semantics and its relation with the mechanization:

  1. We emphasize the distinction between what constitutes the language we want to model and what are run-time constructs: concepts proper to the dynamic semantics, required to provide an operational semantics.

  2. We reduce the complexity of the desugaring process by staying as close as possible to the source language.

  3. We maintain a short-yet-noticeable distance between the formal model presented in the paper and its corresponding Redex mechanization.

The first two respond to the need to make the model as comprehensible as possible, mainly for the developers of the language and tools: Point 1 helps distinguish the syntax developers care about from the constructs that are not visible to them when executing programs, and Point 2 reduces the gap between the code shown in the rules and actual Lua code, which helps build trust in the model (also noted in [2, 29]).

The motivation behind Point 3—discussed in depth in Sect. 6—is to present the rules of the language in a more natural form (as seen in many books on semantics, e.g., [33]), instead of presenting Redex code directly.

For the small subset of the Lua language presented in this section, we have two fragments: pure statements and expressions (following Lua’s distinction between statements and expressions), and stateful (i.e., memory-changing) statements. We describe each of them in isolation, each with its own relation. Then, we compose the two using a third relation, which also deals with execution order, providing semantics to entire programs.

Some comments on notation: in the following, we will use nt to refer either to the non-terminal nt or to the set of terms generated from the productions of said non-terminal (resorting to context for disambiguation); and an nt, possibly with a numeric sub-index, to refer to a term that belongs to the set of terms generated by nt.

3.1 The Pure Fragment of Lua\(_0\)

We show the grammar and semantics for stateless programs in Fig. 1. As statements (s), we only include conditional branching and skip. The condition is an expression e, which can be a value (v), a fully applied infix binary operator (binop), or a fully applied prefix unary operator (unop). Values are nil or boolean literals (true or false). Operators are the logical connectives and, or, and not. Of course, with this language, we are not able to write any useful program, but in the coming sections, we will grow the language until we reach Lua.

Fig. 1
figure 1

Pure fragment of Lua\(_0\)

The operational semantics of the pure fragment is modeled with the \(\rightarrow ^{s/e}\) relation between statements and expressions. Rule If-T states that if the condition of the if is any value different from nil and false, then it is considered true, and therefore, the then branch is taken. Note that we write above the line the conditions under which the rule applies. We omit the line in case no condition is required. Rule If-F states that, for false or nil, the else branch is taken.

The reader may wonder why we have not put true instead of v in the first rule, since there are no other values. The reason is that we will grow the language to have more values, like numbers, and we will keep this rule as is. Indeed, in Lua, a value like the number 0 is considered true, unlike in languages like C.
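For instance, in the full language:

    if 0 then print("0 is true") end     --> 0 is true
    if "" then print("so is ''") end     --> so is ''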

As for the semantics of expressions, instead of providing operational rules for each operator, we follow [8] and use a function, called the interpretation function \(\delta \), which provides meaning to operators in a declarative style of semantics. That is, we put the emphasis on the result obtained from using an operator, rather than describing how each operator produces a result step by step. The benefits of using this style are seen more clearly in coming sections, when providing semantics to operations more complex than arithmetic operators. To maintain cohesion in our semantics, we will use \(\delta \) to provide semantics to every primitive operator and service.

In this section, \(\delta \) provides meaning to the short-circuiting boolean operators, in which the right operand of a binary operator is not expected to be a value. As with the conditional, a value distinct from false and nil is considered true.
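A few interactions, again in the full language, illustrate both points:

    print(nil or "default")              --> default
    print(1 and 2)                       --> 2
    print(false and error("skipped"))    --> false (the right operand
                                         --  is never evaluated)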

A final note regarding the model introduced so far: to present a familiar grammar to the Lua programmer, we maintain the syntactic distinction between statements and expressions. However, we do not let it scale up to the level of the semantic relations. First, this distinction is merely a concrete grammar issue [23]. Second, previous experience with the formulation of a formal semantics for Lua [37] shows that defining semantic relations following that distinction does not result in an improved understanding of the concepts, but it does result in an unnecessary proliferation of relations.

Fig. 2
figure 2

Stateful fragment of Lua\(_0\)

3.2 The Stateful Fragment

We extend the language presented so far with imperative features, namely imperative variables (Fig. 2). Statements are extended with local variable definitions and assignment. There are two things to note about a variable declaration: first, as mentioned in Sect. 2.2, it must be preceded by local; otherwise, in Lua, the variable is global. Second, we make here the first of a few modifications to Lua’s syntax: we explicitly declare the scope of a variable with the in keyword, absent in real Lua. This minor change in the grammar allows us to have a very simple representation of state that, as will become clear, brings several benefits.

The operational semantics of imperative features are commonly understood in terms of state and state change; therefore, we enrich our semantics with a model of state: a partial function from a set of references to values, denoted as \(\sigma \). We refer to \(\sigma \) as the “values’ store,” or simply store.

As for the domain of \(\sigma \), referred to as \(\mathsf {dom}(\sigma )\), we do not force any specific representation and just require it to be a finite set, with elements that must be syntactically represented and different from any other syntactic object in the language. We further assume it is always possible to obtain a fresh reference from the store. We will write \((\textsf {r} , \textsf {v} ), \sigma \) to mean the extension of store \(\sigma \) with reference r pointing to value v, and we will assume that r is fresh, i.e., not in the domain of \(\sigma \).

For the semantics of the new constructs, we use a new relation \(\rightarrow ^{\mathsf{\sigma }}\), which maps a pair of a store \(\sigma \) and a term \({\mathsf {t}}\) (either a statement or an expression) to another pair of a new store \(\sigma '\) and the resulting term \({\mathsf {t}}'\). The first rule (Local-Decl) models the declaration of a local variable. When the definition is a value \(\textsf {v}\), we put it in the store mapped with a fresh reference \(\textsf {r}\). Then, we replace each occurrence of variable \(\textsf {x}\) in the program s with \(\textsf {r}\), denoted as \(\textsf {s}[\textsf {x} \backslash \textsf {r}]\).
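In the notation just introduced, Local-Decl reads, roughly, as follows (with r a fresh reference):

\(\sigma \, , \; {\mathbf {\mathsf{{{local}}}}}\;\textsf {x}\;\textsf {{\textbf {=}}}\;\textsf {v}\;{\mathbf {\mathsf{{{in}}}}}\;\textsf {s} \;\;\rightarrow ^{\mathsf {\sigma }}\;\; (\textsf {r} , \textsf {v} ), \sigma \, , \; \textsf {s}[\textsf {x} \backslash \textsf {r}]\)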

By substituting every occurrence of the variable with a fresh reference (e.g., [4]), we obtain two benefits: we avoid having to carry around the environment (a mapping between variables and references), and we obtain a simple representation for closures (cf. Sect. 4.2.3). As a small downside, it forces us to add a semantic component to the language, making references part of the grammar of expressions. We call such an extension to the language, needed to express its semantics, a run-time term. In more advanced sections, run-time terms will also serve to cleanly modularize the semantics.

As for variable assignment (Local-Assgn), we update the store using the notation \(\sigma [\textsf {r} := \textsf {v}]\) to mean a new store in which the value of r is replaced with v. This is only defined if \(\textsf {r} \in \mathsf {dom}(\sigma )\), which is an implicit side condition of Local-Assgn that must hold in order to be able to apply it. Formally, \(\sigma [\textsf {r} := \textsf {v}]\) denotes a new store \(\sigma ^{\prime }\), such that \(\mathsf {dom}(\sigma ^{\prime }) = \mathsf {dom}(\sigma )\), where \(\sigma ^{\prime }(\textsf {r}) = \textsf {v}\) and \(\forall \textsf {r'} \in \mathsf {dom}(\sigma ^{\prime }), \textsf {r'} \ne \textsf {r} \Rightarrow \sigma ^{\prime }(\textsf {r'}) = \sigma (\textsf {r'})\). Note that the assignment reduces to a skip, indicating that there is nothing else to do for this particular statement. Finally, rule Local-Deref shows that references are implicitly dereferenced.

3.3 Executing Entire Programs

Fig. 3
figure 3

Semantics of Lua\(_0\) programs

We have already defined two different relations, each of them computing a bit of a program: \(\rightarrow ^{s/e}\) relates stateless statements or expressions, and \(\rightarrow ^{\mathsf{\sigma }}\) stateful ones. Now we are ready to combine the two relations to perform the execution of a complete program. To that effect, we define the \(\mapsto \) relation. This relation selects the next term to be executed within a program, called the redex, and forwards it to the corresponding relation. We are aiming at a deterministic semantics, so there should be only one redex in every program.

We capture the notion of redex syntactically with an evaluation context (Fig. 3): a program with a hole \(\llbracket \; \rrbracket \) specifying where the redex is to be found. For every constructor of a term of our language, and for a given position within the term where the next redex must be found, there will be one evaluation-context constructor labeling that position with \(\llbracket \; \rrbracket \). The determinism of the semantics emanates, then, from the fact that every program either is a final computation, or can be decomposed in only one way into an evaluation context and a term, and that term can be executed using only one rule (see Sect. 5).

We can observe from the definition of E the expected evaluation order: in an if statement, the guard must be evaluated first. For local variable definitions, we evaluate the rvalues first. In a binary operation, we evaluate the left operand first. Note that, unlike with the grammar presented so far, for simplicity, we are using a single category of evaluation contexts for statements and expressions.

As stated above, the semantics of programs is given with the \(\mapsto \) relation. \(\textsf {E} [\![ \textsf {t} ]\!]\) denotes an evaluation context where its hole has been filled with the term t, yielding a program. This relation simply forwards execution to the relations defined in previous sections.

3.4 Example

Fig. 4
figure 4

Trace of execution of a simple program

Figure 4 presents the reduction of a simple program that flips a boolean variable b. In the right-most column, we write the evaluation context considered for the step, leaving to the reader the other two components of a reduction step: the fragment of the program filling the hole and the rule being used. Note how, in the first step, the variable b is replaced with reference r1.
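For a program of that kind, say local b = true in b = not b, the reduction proceeds roughly as follows (store on the left; this rendering is ours):

    { }                local b = true in b = not b
    {(r1, true)}       r1 = not r1       (Local-Decl: b replaced by r1)
    {(r1, true)}       r1 = not true     (Local-Deref)
    {(r1, true)}       r1 = false        (δ(not), via →s/e)
    {(r1, false)}      skip              (Local-Assgn)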

With all of the main ingredients in place, we are now ready to provide semantics to Lua.

4 A Formal Description of Lua

In this section, we describe the highlights of our formalization of the semantics of Lua, the main contribution of this work. Section 4.1 covers the stateless subset of the language, Sect. 4.2 covers the imperative subset, Sect. 4.3 describes the concepts added to support standard library services, Sect. 4.4 covers the semantics of metatables, and Sect. 4.5 completes the semantics with execution order, to be able to explain the execution of complete programs and error handling.

4.1 Stateless Lua

We extend the stateless subset of Lua\(_0\) presented in Sect. 3 with new statements (while loops, breaks, composition of statements) and new values and expressions (numbers and strings, with their corresponding operations). To ease the presentation, we present each addition separately, starting with the new statements.

4.1.1 Statements

Fig. 5
figure 5

The stateless subset of Lua (statements)

Figure 5 shows the new statements. As mentioned, these are composition of statements (essentially, one statement after the other) and while loops (with their break operation). We purposely do not include for loops and instead translate them into while loops, following the definition in Lua’s reference manual (Sect. 3.3.5 of [22]).
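For instance, a numeric for loop is described in the manual, roughly, by the following translation (our paraphrase):

    -- for v = e1, e2, e3 do body end   becomes:
    do
      local var, limit, step = tonumber(e1), tonumber(e2), tonumber(e3)
      if not (var and limit and step) then error() end
      while (step > 0 and var <= limit) or
            (step <= 0 and var >= limit) do
        local v = var
        body
        var = var + step
      end
    end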

As for the semantics, we start with the composition of statements (Seq): once the left-most statement has been executed to an end (;), the execution continues with what follows (s).

More interesting is the semantics of while loops. First, a while loop is labeled with a Break label (While-Start), renaming while to $iter. The labeling and $iter are new run-time statements created for the purpose of executing whiles: the label marks the point from which a break should continue the execution, and $iter is necessary to avoid repeatedly unfolding a while, stacking up labels.

A loop marked with $iter is then unfolded as usual, using an if-then-else to check the guard and perform a new iteration (While-Iter). We took from [8] the idea of labeling the while loop to mark the exit point of a break; however, our label is inserted at run time, while in the aforementioned work, it is inserted when desugaring the code. Our take reduces the complexity of the desugared code, at the expense of requiring the extra run-time construct $iter and the labeling step. While in this particular case the gain is little, in the case of function calls (Sect. 4.2.3) it is more evident. Thus, for consistency, we do the same for while loops.
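Schematically, and omitting the details of the actual rules:

    while e do s end   →   Break⟦ $iter e do s end ⟧          (While-Start)
    $iter e do s end   →   if e then s ; $iter e do s end
                           else skip end                      (While-Iter)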

When the execution finds a break inside a labeled block (While-Break), the whole program contained in the inner-most Break label is discarded, effectively modeling the jump out of the body of the loop. To model the program within the label, we use a new category of evaluation context, \(\textit{E}_\textsf {lf}\) (label-free contexts), which represents a program in which no other labeled term or while loop occurs. In order to avoid repetition, we decided to split the evaluation context in two: first, \(\textit{E}_\textsf {lf}\) is like the evaluation context from Sect. 3.3, also adding the case for composition of statements; second, we define the new E context as an extension of \(\textit{E}_\textsf {lf}\). Extending an evaluation context like \(\textit{E}_\textsf {lf}\) to create a new evaluation context E has the effect of renaming each occurrence of \(\textit{E}_\textsf {lf}\) with E.

When the execution of the loop ends normally (While-End), we just discard the label.

The presented features cover the main aspects of the semantics and control flow of the while statement. Note that Lua 5.2 does not have a continue statement, yet some of the proposed ways of emulating it involve concepts already present in our model: using conditionals, Lua’s error mechanism (to be introduced in Sect. 4.5), or a break.

4.1.2 Expressions

Fig. 6
figure 6

The stateless subset of Lua (binary operators)

We turn our attention to the evaluation of expressions, which we break in two: the evaluation of equality and the application of the remaining binary operators. We omit the evaluation of boolean expressions and of the unary minus operator, since they follow a similar pattern.

Binary operators, except equality, are treated in Fig. 6. As for the grammar, we define the class strictbinop to differentiate binary operators that require the two operands to be evaluated prior to execution, unlike the boolean operators. Note how we extend the evaluation context \(\textit{E}_\textsf {lf}\), and assume that E is extended likewise.

Turning to their semantics, Binop states that the actual semantics of the application of op over operands \(\textsf {v}_1\) and \(\textsf {v}_2\) is captured by the function \(\delta (\textsf {op})\). The semantics of the binary operators shown is mostly standard, the only exception being an internal coercion step between numbers and strings: in every Lua primitive operator and library service, whenever the semantics dictates that a number is expected, it is possible to provide a string representation of a number, and internally, Lua tries to coerce it to a number, following certain lexical conventions. The same happens when a string is expected and a number is provided. These details, together with the remaining aspects of the standard semantics of binary operations, are all captured in \(\delta \). The only role of the semantic relations is to interface with \(\delta \), checking that it is actually possible to apply op over operands \(\textsf {v}_1\) and \(\textsf {v}_2\), i.e., that the application returns a value v, and not, for example, an error object.
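For instance, in the reference interpreter:

    print("10" + 5)    --> 15    (the string is coerced to a number)
    print(10 .. 20)    --> 1020  (the numbers are coerced to strings)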

On the other hand, rule Binop-Wo shows the case when the binary operator cannot be applied successfully (\(\delta (\textsf {op}, \textsf {v}_1, \textsf {v}_2)\;\notin \;\textit{v}\)). This could mean that we tried to perform arithmetic over values that cannot be interpreted as numbers, or tried to compare values for which there is no order relation built into Lua. In this case, we label the expression with information about what has gone wrong (BinopWO, where WO stands for Wrong Operands). We, therefore, extend the class of expressions to include labeled expressions, and extend the set of labels with the new labels, in Fig. 6.

At this point, execution is stuck. If we were not to extend the semantics any further, these error labels would only serve the purpose of informing the user of what caused her program to fail. Later on, we will see how metatables try to resolve the situation, either by providing an alternative value, or by throwing an error (Sect. 4.4). Such errors can be caught with the primitive xpcall (Sect. 4.5).

Fig. 7
figure 7

The stateless subset of Lua (equality)

In the case of equality, the behavior is a bit more intricate (Fig. 7). As with the previous operators, we abstract into \(\delta ({\mathbf {\mathsf{{{==}}}}})\) the corresponding details of equality comparison, but we perform some extra processing of the result. Note that the equality relation is not trivial: e.g., in order to obtain a semantics that can be tested against the official test suite, it must define proper comparison of IEEE 754 floating-point numbers (as in a standard binary of the official interpreter).

In essence, \(\delta ({\mathbf {\mathsf{{{==}}}}})\) compares numbers and strings by value, and tables and closures by reference—with values of different types always being distinct. That is, unlike in other dynamic languages like JavaScript, a comparison between values of different types, such as 1 == "1", evaluates to false.

When the objects are equal (Eq-True), we simply return true. But if the objects are distinct (for instance, they have distinct types), we have two options: if the operands involved are not tables (Eq-False), equality comparison reduces to false. Otherwise, if they are tables (Eq-Fail), instead of returning false, we label the expression and defer its result to the metatable mechanism (Sect. 4.4). The metatable mechanism will then look up user-defined functions (metamethods) implementing a comparison method for non-equal table values. In that way, Lua allows for the definition of arbitrary (reflexive) relations over table values.
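A few interactions summarize this behavior (assuming no __eq metamethod is set):

    print(1 == "1")     --> false (no coercion in equality comparison)
    local t1, t2 = {}, {}
    print(t1 == t2)     --> false (distinct tables, compared by reference)
    print(t1 == t1)     --> true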

4.2 Imperative Lua

In this section, we grow the imperative fragment presented in Sect. 3 to include tables (Sect. 4.2.1), multiple-variable definition and assignment (Sect. 4.2.2), and functions (Sect. 4.2.3). The latter are presented here for reasons that will become clear in time.

4.2.1 Tables

Fig. 8
figure 8

Formalization of tables (excerpt)

The syntax of table constructors, table field indexing, and table field assignment is presented in Fig. 8, together with the associated semantic rules. Essentially, tables are mutable associative arrays: data structures where we can store key/value pairs, where keys cannot be nil or nan (a value in the set of numbers used for undefined or non-representable numeric results), and values cannot be nil.

Tables are manipulated through references, so, to that end, we create a new mapping, called \(\theta \), between table identities (tid) and tables (with associated meta-data). The elements of the domain of \(\theta \) are considered values and satisfy the same properties as the references in \(\sigma \), with the reasonable addition that the former should be syntactically distinguishable from the latter. The image of \(\theta \) only contains tables and table-related meta-data. For the present model, this meta-data reduces to information about the associated metatable, introduced later. Formally, \(\theta : \textit{tid} \rightarrow \textit{tables} \times \{ {\mathbf {\mathsf{{{nil}}}}}, \textit{tid} \}\), where tables denotes the set of tables as described by the grammar in Fig. 8, and nil or tid are the possible metatable-related values associated with the given table. We capture a new store \(\theta \), resulting from either extending a previous store with a new tid/table pair or changing the content of a previously stored table, using the same devices and notation as with \(\sigma \) (Sect. 3.2).

The construction of a table is formalized in rule Table-Constr (Fig. 8), which acts when all of its fields have been evaluated. Fields are evaluated from left to right, with each key being evaluated before its value. Note the new syntax non-terminal efield, denoting evaluated fields, which can be just a value or a key/value pair (denoted \( \textsf {{\textbf {[}}}\textit{v}\textsf {{\textbf {]}}}\;\textsf {{\textbf {=}}}\;\textit{v}\)). The meta-function addkeys adds absent keys—namely, consecutive natural numbers—to fields of the form v. It also discards fields already present in the constructor that contain numeric keys in the interval \([1, {\mathsf {n}}]\), where n is the quantity of fields without keys in the constructor (hence, the fields for which addkeys adds the corresponding numeric keys). We show an excerpt of its formal definition in Fig. 8. Note that Lua’s reference manual is not precise on this point, and different definitions of addkeys can have an observable impact on the semantics of operations like computing the length of a table (through the operator #), or iterating a table through library services like ipairs. Our mechanization of addkeys successfully passes the official interpreter’s tests and, as such, can serve as a reference for future implementations.
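For instance, under the behavior just described:

    {"a", "b"}             -- addkeys yields {[1] = "a", [2] = "b"}
    {"a", "b", [2] = "x"}  -- here n = 2, and the explicit field
                           -- [2] = "x" is discarded, since key 2
                           -- lies in the interval [1, n]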

Finally, returning to Table-Constr (Fig. 8), note that we are actually mapping a fresh table id tid to an ordered pair: the first component contains the actual table, and the second one the id of the metatable, or nil if absent, as in this case. Metatables are discussed in depth in Sect. 4.4.

Fig. 9
figure 9

Formalization of tables (primitives for table assignment and indexing)

Assignment to table fields has three cases: the successful update of a field (Table-Update), and two special scenarios. The first comes from trying to update a non-existing key (Table-Update-WK), while the second comes from trying to update an object which is not a table (Table-Update-NT). The semantics of these special scenarios can be meta-programmed, so, as we did in similar situations, we just label the term with information about what has occurred, to delegate execution to the metatable mechanism.

To know whether a key exists in a table, and to update a field, we use two services of Lua, rawget and rawset, formalized in Fig. 9. \(\delta (\textsf {rawget},\textsf {tid},\textsf {v},\theta )\) yields either the value associated with \(\textsf {v}\) in \(\theta (\textsf {tid})\) or nil, if there is no associated value. Note that we use the equality comparison function (\(\delta (\textsf {{\textbf {==}}})\)) to determine which field is being indexed with the given key. Recall from Sect. 4.1.2 that \(\delta (\textsf {{\textbf {==}}})\) should define the details of a proper comparison between Lua values, and, also, should not resort to the metatable mechanism when comparing different table values (unlike the language’s == comparison operator).

\(\delta (\mathsf {rawset},\textsf {tid},\textsf {v}_1,\textsf {v}_2,\theta )\) yields a new \(\theta \) where the table referenced by \(\textsf {tid}\) associates key \(\textsf {v}_1\) with value \(\textsf {v}_2\). In the first equation, we consider an existing key, in which case we remove or update the field, depending on the value \(\textsf {v}_2\). Note how nil and nan cannot be keys of a table field. As with rawget, the keys are compared using \(\delta (\textsf {{\textbf {==}}})\), and the same considerations apply with regard to the semantic consequences of using user-defined equality comparison relations. The second equation of rawset considers the case when the key does not exist in the table. In this case, the pair is added, only if the value is not nil. The last equation considers the error cases.
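The contrast with the metatable-aware operations can be seen in the following interaction:

    local t = setmetatable({}, { __index = function () return 0 end })
    print(t.x)              --> 0   (the __index metamethod fires)
    print(rawget(t, "x"))   --> nil (rawget bypasses metatables)
    rawset(t, "x", 1)       -- raw update: no metamethods involved
    print(rawget(t, "x"))   --> 1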

Accessing a table field follows a similar pattern, so we omit the rules for this operation.

4.2.2 Local Variables

Fig. 10
figure 10

Multiple variables declaration and assignment (excerpt)

We introduce a small change in the language in order to support multiple-variable definition and assignment (Fig. 10). Essentially, we can have arbitrary lists of variables and terms as the lvalues and rvalues of a local declaration, respectively. Similarly, we can have arbitrary lists in an assignment, with the addition that we can update both variables and table fields.

For multiple-variable assignment, the following steps are performed. First, the execution evaluates all lvalues before the rvalues, as indicated by the evaluation contexts. Second, the quantity of rvalues is adapted to match the lvalues, either adding extra nil rvalues when there are fewer rvalues than lvalues (Assgn-Fewer), or discarding the extra rvalues when there are more rvalues than lvalues (Assgn-More). Once the number of lvalues matches that of rvalues, a multiple assignment is decomposed into a sequence of single assignments. This is so since lvalues can be local variables and/or table fields, and we need to distinguish each case in order to properly explain their semantics. Note that we need not take any further steps: each assignment will be handled by the corresponding rules (Local-Assgn, from Sect. 3.2, and Table-Update, from Sect. 4.2.1). Finally, by splitting a multiple assignment into several single assignments, we also model what happens in Lua when an lvalue is repeated: the assignments are effectively repeated over the local variable or table field.

The introduction of local variables follows the same steps, which we omit for brevity. The creation of new references is modeled with a simultaneous substitution operation (Local-Decl, which subsumes the previously defined rule of the same name in Sect. 3.2). Repeated occurrences of variable identifiers (as lvalues) are allowed, and their corresponding rvalues are evaluated. Then, when the variables are introduced in the corresponding scope (through the substitution function), only the last definition of a repeated variable is taken into account.
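A few interactions illustrate these rules:

    local a, b, c = 1, 2    -- fewer rvalues: c gets nil (Assgn-Fewer)
    a, b = b, a             -- both rvalues are evaluated before any
                            -- assignment happens: a and b are swapped
    local x, x = 1, 2       -- repeated lvalue: the last one prevails
    print(a, b, c, x)       --> 2   1   nil   2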

4.2.3 Functions, Function Applications, and Tuples

Fig. 11
figure 11

Functions and function calls (grammar)

It might at first look suspicious to consider Lua’s functions and application as belonging to the imperative subset. However, Lua’s functions are first-class values internally represented with closures, closing over the references of their external variables; also, the parameters of a function are modeled as imperative variables. Therefore, we need the state to define functions and function applications. Tuples are also considered here because they are required to model functions.

Figure 11 shows the grammar involved in the formalization of function definitions and calls. The first thing to note is that functions are values, and that each function is labeled with an element from a new countable set \(\ell \), which should not be confused with a function’s name: the label \(\ell \)—inserted at desugaring—is used for closure comparison (Sect. 4.2.4), and in practice, it is akin to a line number in the source code. For example, a definition like function () return 1 end is desugared into function \({\ell }\) () return 1 end, with \({\ell }\) a fresh label.

Functions have two constructors, the difference being the vararg arguments \(\centerdot \centerdot \centerdot \) (we use this special notation to avoid confusion with the ellipses also used in the grammar; when showing Lua code, we will use the conventional syntax ...). This special token refers to the extra arguments passed to a function beyond those explicitly named in its definition. Within the function body, \(\centerdot \centerdot \centerdot \) is treated as a tuple. Tuples are run-time expressions, noted as < e , ... >, and cannot be constructed by the user. They are not values and, hence, cannot be stored.

Then, we have the return statement, which allows returning multiple values; function calls, with a special syntax for calling an object’s method; and parenthesized expressions ( \( \textit{e} \) ), which take the first value of a list of values.

As for function calls, we need to distinguish their occurrences: they can appear in a statement, as in f(); print(’hi’), or in an expression, as in print(f()). A function call used as an expression can return none, one, or more values, while a function call used as a statement is useful just for its side effects, with any value returned from such a function being discarded. Our parser is in charge of distinguishing and compiling each case into syntactically different objects, tagging the latter case with $statFcall.

Fig. 12
figure 12

Functions and function calls (semantics, 2nd step)

Function calls are not plain \(\beta \)-contractions: they allow us to call a given function with an arbitrary quantity of parameters, regardless of what is specified in its signature, discarding arguments or completing with extra nil values as required, as the following simple example shows:

Example 8

(Calling a function with different numbers of parameters.)

figure v
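A sketch in the spirit of the example (the original listing is in the figure):

    local function f (a, b) print(a, b) end
    f(1)          --> 1    nil  (the missing argument is completed with nil)
    f(1, 2, 3)    --> 1    2    (the extra argument is silently discarded)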

Additionally, the result of a call might be in a position where the returned values must be appended to existing ones or trimmed. For instance, consider the following short example, in which maximum is the function from Example 5:

Example 9

(Returned values are appended to the argument list or trimmed.)

figure x

Remember that maximum returns the greatest number of a list, together with its position. To understand the result of this example, first note that the inner call to maximum returns a two-value tuple: the maximum and its index. These values are then appended to the outer list. Therefore, the second call to maximum again yields two values, the second one being trimmed out and the first one being assigned to the variable.
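With hypothetical values (the concrete listing is in the figure), the example is along these lines:

    local m = maximum({maximum({8, 10, 23, 12, 5})})
    -- the inner call returns the tuple 23, 3; appended to the outer
    -- constructor, it yields the array {23, 3}; the outer call then
    -- returns 23, 1, and the assignment trims the tuple to 23
    print(m)   --> 23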

A last aspect of function calls that can be captured in the model presented so far is tail calls: a function call can reuse the stack frame of the current function, if the call is its last instruction. Specifically, in Lua 5.2, tail calls are calls of the form \({\mathbf {\mathsf{{{return}}}}} \; ({\mathbf {\mathsf{{{function}}}}} \;{\ell }{}\;{\mathbf {\mathsf{{{(}}}}}\textsf {x}_1,\ldots {\mathbf {\mathsf{{{)}}}}}\; \textsf {s})\; (\textsf {v}_1,\ldots )\). In that case, the called function will reuse the stack frame of the function from which it is being called. In our model, we include just one piece of information from a regular stack frame: the position to which the function must return, denoted with the labels RetStat and RetExp. By manipulating this representation, we can model tail calls, as we explain below.
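As a concrete illustration of tail calls, the following recursion runs in constant stack space:

    local function loop (n)
      if n == 0 then return "done" end
      return loop(n - 1)      -- a tail call: the stack frame is reused
    end
    print(loop(1000000))      --> done (no stack overflow)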

In order to specify these behaviors, we take the following actions:

  1. Evaluate the function position and then the arguments, from left to right.

  2. Recognize function calls in tail position and act accordingly.

  3. Adjust the list of actual parameters and perform the function call itself.

  4. According to the place of the call (statement or expression), discard the returned values or put them in a tuple.

  5. In the case of an expression, determine whether the tuple must be appended to other values or trimmed.

The first step is specified with the help of the evaluation contexts, as can be seen in Fig. 11. As a shorthand, we introduce a new non-terminal E \(_{\textsf {el}}\) to mean a list of arguments whose elements to the left of the evaluation point are fully evaluated.

The second step is partially described by rule E-PopSF in Fig. 12 (last rule), only for the case of a nested function call made from another function call embedded in an expression. In order to better understand it, we first explain the next step.

The third step is shown in Fig. 12, only for the case of function calls in expressions, as the other case is analogous. Rule E-Call models a simple call, with the parameters of the function treated as mutable variables, so a fresh reference is used to model each parameter. If there are fewer arguments than parameters, the value nil is assigned to the remaining parameters of the function, and if there are more, they are simply silently ignored. The resulting term—the body of the function with the fresh references replacing the parameters—is labeled with a new run-time expression, RetExp. This labeling serves two purposes: first, it allows us to recognize the function call as, in this case, an expression, something needed when reasoning about tuples (see below); and second, it marks the place where a return statement must jump to. In the case of a function call in a statement, the label is RetStat.

Rule E-CallVArg shows the case of a vararg function call, with the difference residing in what is done with surplus arguments: instead of being ignored, they are put into a tuple, which replaces the vararg expression ( \(\centerdot \centerdot \centerdot \) ) in the body of the function. Note that they are not stored, as occurs with the remaining actual parameters. Formally, we treat the vararg mark occurrence in the signature of a function as a binding occurrence, with special scoping rules in order to account for a particularity discussed in ([30], Issue #1): the binding occurrence binds any vararg mark that occurs in the body of the function, except if it appears in the body of a nested function definition, regardless of whether that nested function is a vararg function. The following interaction with the official interpreter (taken from [30], Issue #1) illustrates this scoping rule:

figure ag
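The gist of the interaction is the following (a sketch of ours, not the verbatim exchange): using the outer vararg inside a nested, non-vararg function is rejected:

    local function f (...)
      local function g ()    -- g is not a vararg function
        return ...           -- error: cannot use '...' outside
      end                    -- a vararg function
    end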

Rule E-CallWrong describes the exceptional situation that might be handled by the metatable mechanism: a function call over a non-function value. As with previous special situations, we just label the whole expression with a tag documenting what happened.

Finally, E-MCall describes method invocation: it is simply translated into a table look-up, with the object being injected as the first argument of the function.

Now, prior to the function call itself, we try to recognize whether it is in tail position. Remember that, in Lua 5.2, tail calls are calls of the form:

\({\mathbf {\mathsf{{{return}}}}} \; ({\mathbf {\mathsf{{{function}}}}} \;{\ell }{}\;{\mathbf {\mathsf{{{(}}}}}\textsf {x}_1,\ldots {\mathbf {\mathsf{{{)}}}}}\;\textsf {s})\; (\textsf {v}_1,\ldots )\)

This step is the responsibility of the top-level relation \(\mapsto \): it is in charge of guaranteeing that a tail call is correctly processed before actually executing the function call itself. Once a tail call is recognized, we manipulate the stack of calls: in our model, we just look for the inner-most RetStat or RetExp label around the function call, and remove it together with the rest of the computation between the label and the tail call itself. That is, we remove the computations that would be discarded when performing the return of the function being called in tail position. More formally, for the case of a tail call nested into another function call in the position of an expression, we look for a term of, roughly, the form \({\mathbf {\mathsf{{{RetExp}}}}}[\![\; \textit{E}_\textsf {lf}[\![\; {\mathbf {\mathsf{{{return}}}}} \; ({\mathbf {\mathsf{{{function}}}}} \;{\ell }{}\;{\mathbf {\mathsf{{{(}}}}}\textsf {x}_1,\ldots {\mathbf {\mathsf{{{)}}}}}\; \textsf {s})\; (\textsf {v}_1,\ldots ) \;]\!]\;]\!]\).

In that case, and in terms closer to an actual implementation, we could say that we pop the current stack frame from the call stack, obtaining

\(({\mathbf {\mathsf{{{function}}}}} \;{\ell }{}\;{\mathbf {\mathsf{{{(}}}}}\textsf {x}_1,\ldots {\mathbf {\mathsf{{{)}}}}}\;\textsf {s})\; (\textsf {v}_1,\ldots )\)

This is performed by rule E-PopSF in Fig. 12. Then, the actual execution of the function call, as described by the previous semantic rules, will install a new stack frame for the current tail call. Together, all these steps model the phenomenon of a tail call.

A last remark: in the context of a programming language with interfaces to the garbage collector, like finalizers, these implementation details have an observable effect on the semantics of programs. As an example, consider the following program:

figure ah

In the previous program, given that the recursive call is in tail position, at each recursive call the whole content of the stack frame of the previous call is discarded. This implies that, at each recursive call, the reference to the parameter from the previous call is lost (except for the first call, made in the last line). On the other hand, if the recursive call were written so that it is no longer in tail position (say, return (f(x)) instead of return f(x): the parentheses truncate the call’s results and disqualify it as a tail call), then every stack frame of every call would be preserved, including the reference to the parameter. As we explain in [38] and Sect. 7, Lua 5.2 has a reachability-based garbage collector that, in each recursive tail call, would consider the parameter of the previous call unreachable. Given that we have interfaces with the garbage collector, this difference in the internal treatment of the stack frames can be observed at the semantic level.

Fig. 13
figure 13

Functions and function calls (semantics, 3rd step)

The fourth step is split in two. First, the body of the function is executed to an end, either a skip or a return. Second, as listed in Fig. 13, the values are processed as follows: a function call as a statement simply returns skip (S-RetSkip), discarding the values if needed (S-Return). Instead, in an expression, it returns a tuple, either empty (E-RetSkip) or containing the values (E-Return). A return might occur inside the body of a while (S-ReturnBreak), in which case we must jump out of the loop.

The fifth step kicks in once we have a tuple. As noted in Example 9, we need to know whether the values are appended to some existing list of values or trimmed to just the first value. The process requires some machinery but is simple in essence. If the context in which the tuple is immersed requires a list of values, the tuple’s values are appended to the values existing in the context; otherwise, the first value is taken. To know the context, as the name suggests, we use the same type of contexts used for evaluation.

Fig. 14
figure 14

Functions and function calls (semantics, 4th step)

Figure 14 shows the details of the fifth step. Rule Tuple-One replaces the tuple with its first value. For this rule to be executed, the tuple must be in a context \(\textit{E}_\mathsf{{t}} {}\) (the t stands for truncate), which for brevity is shown partially. This means that the tuple is, for instance, in the condition of an if-then-else, or in a list of values (but not at the end), or in a parenthesized expression, etc. If the tuple is empty, under the same context, it is replaced with nil (Tuple-Zero).

Rule Tuple-Append shows the case when the tuple must be appended to a list. This rule uses a new context, \(\textit{E}_{\textsf {a}}{}\) (a for append), which only applies if the tuple is at the end of a list of values. Then, the result of the tuple is computed with the meta-function append, which is also shown in abridged form. This function takes the context and the tuple and simply inserts the values of the tuple at the end of the list, or does nothing if the tuple is empty.
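As a concrete illustration of truncation versus appending in standard Lua (the helper two is our own):

    local function two() return 1, 2 end
    print(two(), 3)  --> 1  3     (truncated: the call is not at the end of the list)
    print(3, two())  --> 3  1  2  (appended: the call closes the list of values)
    print((two()))   --> 1        (parentheses force truncation)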

With the semantics shown in this section, we can understand the example from the introduction (Example 1). We explain it in two parts. First, we have the function returning nil:

figure aq

The execution of the statement occurs after each call to is resolved. Since they are in expression positions, after E-Call, we get . With an empty \(\textit{E}_\textsf {lf}\) context, we perform one E-Return step and get a tuple with a nil inside. In the first two calls to , the tuple is in a position where a single value is expected; therefore, after executing Tuple-One, we simply get nil. In the third call, instead, we append the tuple to the values computed so far (Tuple-Append). Therefore, we end with , which in turn prints the three nils.

Different is the case when we do not return anything:

figure av

Note that the body of is an implicit skip. Therefore, in the first two calls to we obtain a skip that is converted to an empty tuple by E-RetSkip, and then to nil by Tuple-Zero. In the last call, as in the previous case, Tuple-Append applies and append discards the empty tuple. Therefore, we now end with .

4.2.4 Function Comparison and Caching

Lua 5.2 admits an unspecified caching of closures whose definitions have no observable difference ([13], Chapter 8.1). We implement this caching in a way that is compatible with the official interpreter, correctly mimicking the following interactions:Footnote 7

Example 10

(Caching of functions.)

figure az

The first two functions are textually equal; however, Lua creates a closure for each of them, as line 3 shows. But when the function is created from the same function definition, Lua does cache the closure (lines 4–5).
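The following is a hypothetical rendering consistent with this description (the name h anticipates the unsugared code below; the outputs are those of the official Lua 5.2 interpreter):

    local function h() return function() end end
    print(function() end == function() end)  --> false: two distinct definitions
    print(h() == h())                        --> true: one definition, cached closure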

As mentioned at the beginning of Sect. 4.2.3, our function’s definitions include a label \(\ell \), which is what distinguishes different occurrences of definitions in the program. Observe the partially unsugared code obtained from Example 10:

figure ba

Each function created from calling h shares its label ( ), and therefore, we can treat every instance as the same closure. But this is not enough to identify a function, as the following example shows:

Example 11

figure bc

Note how at each iteration it prints a new address: the function is not cached. The reason is that its body refers to a variable, i, which is different at each iteration (for loops make the iteration variable immutable by creating a fresh variable at each iteration). The environment of a closure, including every local variable whose scope reaches the given closure, is part of what the official interpreter recognizes as an “observable difference” (besides the function definition itself); accordingly, differences in scope are what trigger the creation of new closures from the same function definition. In our formalization, we consider, as a representation of a closure, the whole function definition with its embedded environment, as it is when the focus of reduction reaches the function definition (see Corollary 2, Sect. 5). Therefore, returning to the previous example, the representation of each closure created would include the fresh instance of the variable i from the corresponding iteration. At each iteration, we would end up with a different closure and, hence, no closure would be reused.
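A minimal sketch of the kind of program in Example 11 (our own rendering):

    for i = 1, 3 do
      print(function() return i end)  -- prints a fresh address at each iteration,
    end                               -- since each closure captures a fresh i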

4.3 Built-in Services

As we saw throughout this work, the interpretation function \(\delta \) defines the semantics of built-in services ( , , ...) and operators (+, .., ...). Originally introduced in ISWIM ([5]) for the purpose of abstracting the semantics over a set of primitive operators, we use it here for a different purpose. As mentioned in Sect. 3, instead of describing the semantics of built-in services and primitive operators operationally, we use a declarative style; i.e., we specify just the result of the call to a service or the application of an operator, instead of describing how we obtain such results. This approach reflects the style of the reference manual: there is no operational specification from which the results can be obtained, just a description of what is expected from calling a given service or applying a given operator. While there may be some services or operators for which it could be possible to extract a simple operational semantics, for the sake of cohesion we apply the same approach to the semantics of all of them.

4.3.1 Calling a Service

Fig. 15
figure 15

Interface with the \(\delta \) function

There are two different types of calls to services: those that are internal to other services and operators (e.g., the iterator function next calls the type service), and those that are invoked by the user. This distinction is important because it is possible to change the binding of a service, but only in the latter situation:

Example 12

(Overriding a built-in service.)

figure bf

As we can see from this example, the next function was unaffected by the change to the type function, while the user is now unable to call Lua’s built-in service.
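A hypothetical rendering in the spirit of Example 12 (the table t and the replacement body are our own):

    local t = {10}
    type = function() return "hacked" end  -- rebind the global name
    print(type(42))  --> hacked: user calls now reach the new binding
    print(next(t))   --> 1  10: next's internal use of type is unaffected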

From the point of view of the formalization, we introduce a new expression $builtIn, which simply acts as a dispatcher to the \(\delta \) function (Fig. 15). Note that we cannot simply let a program refer directly to the \(\delta \) function: \(\delta \) is a meta-function, and it does not belong to the programming language. Therefore, we need an interface within the language. Then, prior to the execution of a program, the \(\mathsf {\_ENV}\) table is populated with functions from the standard library as wrappers around the $builtIn constructor.

The semantics of the new expression is simple: it just reduces to an application of \(\delta \), providing the necessary arguments and processing its output when needed. We distinguish three cases, according to the semantics of the service: whether it does not use the store (Svc-NoState), inspects the object store (Svc-Const), or updates the object store (Svc-Update).Footnote 8 Each rule lists some of the services that fit into each category. The actual list of services includes almost all the built-in basic functions of the Lua language, together with several services from the libraries math, string, and table.

4.4 Metatables

We have seen on repeated occasions how several operations require the intervention of metatables, usually in special situations, like trying to add two values that are neither numbers nor strings. Metatables are plain tables that let programmers specify metamethods for resolving such situations, associating specific string keys with handlers, typically functions. For instance, "__add" is the key for associating a handler to the addition operator.
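For instance, the following standard Lua snippet (values chosen for illustration; setmetatable is the primitive discussed next) installs an "__add" handler that resolves the otherwise erroneous addition of two tables:

    local mt = { __add = function(a, b) return a.v + b.v end }
    local x = setmetatable({ v = 1 }, mt)
    local y = setmetatable({ v = 2 }, mt)
    print(x + y)  --> 3: the handler receives both operands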

We first explain how to obtain and set metatables. Then, we show with an example how the metatable mechanism triggers once a labeled term is found.

4.4.1 Setting and Getting Metatables

Fig. 16
figure 16

Primitives for setting and getting metatables

For each Lua type except tables (and userdata, not included in our model), it is only possible to define one single metatable. That is, for example, the type of numbers has one metatable ruling every possible interaction with numbers. For tables, instead, it is possible to define a metatable per value. To this effect, the basic library provides two primitives: setmetatable, to set the metatable of a given table, and getmetatable, to obtain the metatable associated with a given value (of any type). Being services of the basic library, we provide a declarative description of their behavior using \(\delta \). Their semantics is shown in Fig. 16.

The first equation of setmetatable shows the normal behavior: we are setting v as the metatable of tid, v is itself a table or nil, and the metatable of tid is not protected. If these conditions apply, the output of setmetatable is the identifier of the table together with a new object store, resulting from updating \(\mathsf \theta _1\) to include v as the second component of the value stored for tid. If v is nil, this has the effect of removing any previous association between the table and a metatable.

If a metatable has a field with key "__metatable", it is said to be protected. The concept is specified by the predicate \(\mathsf {prot?}\). Semantically, a protected metatable cannot be overridden.

The second equation of setmetatable deals with different erroneous situations: trying to set a metatable for a value which is not a table; trying to set, as a metatable, a value which is not a table or nil; or trying to modify a protected metatable. In any of these cases, an error is issued. We write ... to abbreviate the message, which depends on the condition that produced the failure.

The first equation for getmetatable shows its normal behavior: when applied to a table value tid, it either returns the value set in its "__metatable" field, if present, or the second component of the value associated with tid in the \(\mathsf \theta \) store.

More interesting is the situation when we want to obtain a metatable for a non-table value (the second equation of getmetatable). As it turns out, for values such as numbers or strings, it is possible to set the metatable by using the C API or, in the official interpreter, by calling the function. In our model, we assume that, if set, the metatables for these types are identified by fixed table ids: for a given value v, we assume that its metatable must be identified with \(\textsf {tid}_\mathsf{\delta (type, \textsf {v})}\). Then, determining if such a metatable is set reduces to asking whether \(\textsf {tid}_\mathsf{\delta (type, \textsf {v})} \in \theta \). Note that such metatables might be protected too.
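The protection mechanism can be observed in standard Lua (the string stored under "__metatable" is our own choice; pcall is the protected-call service discussed in Sect. 4.5):

    local t = setmetatable({}, { __metatable = "protected" })
    print(getmetatable(t))             --> protected
    print(pcall(setmetatable, t, {}))  --> false  cannot change a protected metatable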

4.4.2 Resolving Labeled Terms (Excerpt)

In this section, we introduce a new relation \(\rightarrow ^{\textsf {meta}}\) that takes a labeled term and acts according to the case that originated the exceptional situation. We do not show all the rules that apply to each different label; instead, we focus on one example rich enough to show how this mechanism works.

Fig. 17
figure 17

Metatable mechanism for field update

Figure 17 describes the \(\rightarrow ^{\textsf {meta}}\) relation for field updates. The inner workings of the metatable mechanism are handled by the meta-function new_index, while \(\rightarrow ^{\textsf {meta}}\) serves as an interface with it (M-Upd). Recall from Sect. 4.2.1 that we have two situations, each marked by its distinctive label: either the update is attempted on a value that is not a table (NonTable), or the value is a table, yet the key does not belong to it (WrongKey). In either case, if there is a function associated with the key "__newindex" in the metatable of the object (first equation of new_index), the function is called with the arguments of the update. If, instead, the handler is neither a function nor nil, the update is performed on the handler (second equation).

If the handler is nil, we need to distinguish according to the label. For WrongKey, we try creating the field with . If it succeeds (third equation), we return the modified \(\theta \) store and the skip statement. If it fails (fourth equation), because the key is nil or nan, we propagate the error. Note that we need to inspect the returned error object $err, the result of \(\delta (\mathsf error)\), which is introduced later in Sect. 4.5. Finally, if the label is NonTable, we just produce an error (last equation of new_index).
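As an illustration of the WrongKey case in standard Lua (the names are ours):

    local log = {}
    local t = setmetatable({}, {
      __newindex = function(t, k, v)
        log[#log + 1] = k  -- observe the missing key
        rawset(t, k, v)    -- perform the raw update, bypassing metatables
      end
    })
    t.x = 1                -- x is absent from t, so the handler runs
    print(t.x, log[1])     --> 1  x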

4.5 Semantics of Programs and Error Handling

We have concluded the presentation of the different relations used to provide semantics to Lua. Following Sect. 3, we must now provide meaning to entire programs, defining a new relation \(\mapsto \) that encodes execution order and forwards to the specific relation the next piece of the program to be executed. Since its definition is mostly an extension of the relation shown in Fig. 3, now also adding the \(\theta \) context, we omit it for brevity. More interestingly, we develop in this section the treatment of errors in Lua, whose propagation, for soundness reasons, must be formulated at the level of the \(\mapsto \) relation.

4.5.1 Errors and Protected Mode

Lua provides mechanisms to generate and catch errors. Errors are represented with error objects or error messages. They can be explicitly generated by user code, using the function , and it is possible to attach information about the error using any Lua value. The semantics of error objects consists, essentially, of aborting the execution of the program where they are generated.

In order to catch an error, the user is provided with two functions, and , the former being a specialization of the latter. They allow executing a function in protected mode: any error thrown by the function is caught, avoiding error propagation. For an erroneous function call, returns false and the error object. Otherwise, it returns true together with the values returned by the function call. behaves as , but instead of returning the error object in case of error, it forwards it to a user-provided handler function to continue the execution.
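The services in question are Lua's standard pcall and xpcall; in the official interpreter they behave as follows (position information in the error message is abbreviated here as <chunk>):

    print(pcall(function() return 1, 2 end))
    --> true  1  2
    print(pcall(function() error("boom") end))
    --> false  <chunk>: boom
    print(xpcall(function() error("boom") end,
                 function(e) return "handled: " .. e end))
    --> false  handled: <chunk>: boom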

4.5.2 Formalization

In formalizing error propagation, we need to take into account that it is a context-sensitive computation, requiring us to look at the whole remaining program in order to decide about program termination. Therefore, we need to be cautious if we want to obtain a deterministic model. To understand this, let us consider a program of the form \(\textsf {E} \,\llbracket \, {\mathbf {\mathsf{{{\$err}}}}}\;\textsf {v}\, \rrbracket \), where E does not contain a protected mode. What we want for such a program is to completely halt in one step, ending as $err v. This suggests including a step:

But, since \(\mapsto \) closes over \(\rightarrow ^{s/e}\) as follows:

we have several possible ways of propagating an error, depending on the decomposition of context E between contexts E' and E'', breaking the intended determinism of the semantics.

The common workaround to this problem is to propagate errors directly from \(\mapsto \). In the absence of a protected mode to catch an error, it will make the whole program halt.

Fig. 18
figure 18

Error object and protected mode

Figure 18 shows the new constructions added to the model for formalizing errors and protected mode. Error objects are denoted with the construction $err v, which allows for the inclusion of any Lua value as information attached to the error. represent an expression or a context that must run in protected mode, where v is the error handler passed to . Treating function calls in protected mode just as expressions will simplify the semantics, as shown below. We also include the same protected-mode labels but without the handlers, which have a particular meaning, shown below. Error propagation is decided in terms of the presence or absence of a protected mode. To define the absence, we use a new context \(\textit{E}_{\textit{np}}{}\) that is a subset of E but without protected modes (omitted for brevity).

With the given constructions, we can formalize what it means for an error to propagate or to be caught in protected mode. The first rule (E-Termination) aborts the whole program if there is no protected mode around the error object. This is the only rule from error handling that is defined in the \(\mapsto \) relation.

The remaining rules are defined in the \(\rightarrow ^{s/e}\) relation and concern the execution of code inside a protected mode. If the code executed in protected mode runs until completion without an error (E-ProtTrue), then its values are appended to a tuple starting with true. Instead, if there is an error, we have two situations: if the handler is a function, then it is executed within a protected mode (without a new handler, E-ProtHandler). Note how the call is performed within parentheses, to take just the first value. Also, note how it is not possible to call the handler if it is not a function (E-ProtHandlerErr); that is, this scenario cannot be meta-programmed through metatables. In this case, we return a tuple starting with false and a descriptive error message.

Once the code of a handler is executed in protected mode (denoted by a ProtMd context without a handler), we find two options: in a successful execution of the handler, its only returned value is appended to a tuple starting with false (to signal that a handled error occurred, E-ProtFalse), and, in an erroneous execution of the handler, an error is returned (E-ProtErr), exactly as in E-ProtHandlerErr. Finally, note that, by considering function calls in protected mode just as expressions, we simplify the last step of the mechanism: we do not need to reason about whether a given tuple must be discarded or not. We are in an expression, so tuples are always returned.

Here concludes our tour of the semantics of the language; we proceed to state its properties and discuss its mechanization in the coming sections.

5 Properties of the Semantics

In this section, we state two fundamental properties of our model: soundness (as the traditional combination of progress and preservation) and determinism. Their statements were tested using a lightweight approach enabled by Redex [17], in which we specify a property and use random testing to try to falsify it. Together with the successful execution of the code from the test suite of Lua, they help build trust in the model. The details of the experience with the mechanization are presented in Sect. 6, but it is worth mentioning that we were able to catch, with little effort, several ambiguities and ill-defined cases.

Fig. 19
figure 19

Well-formedness of configurations (excerpt)

By preservation, we mean that a well-formed configuration that can perform a step results in another well-formed configuration. As for progress, we mean that a well-formed configuration either represents the final computation of a program or can perform a step of computation. In essence, together they allow us to exclude stuck terms: computations that are incorrect from the point of view of the semantics, like a program with references not allocated in the store. We make all these notions precise next.

The most important ingredient is the predicate that captures well-formedness of configurations (bottom of Fig. 19), which states the conditions that a given configuration \(\sigma :\theta :\textsf {t}\) must satisfy to be well formed, for the given stores \(\sigma \) and \(\theta \), and term t (a statement or expression). This predicate is parametrized by a new context C, an excerpt of which can be found at the top of Fig. 19, and which captures the contextual information required to check the well-formedness of t. In essence, C keeps track of local variables, while loops, labeled terms, and function definitions. Using just term contexts to keep track of contextual information allows us to gather all the information needed to reason about well-formedness in a single object that is itself just a kind of term context. That is, we do not need to define new constructs with the corresponding operations to manipulate them. We just reuse the language constructions, and the operations for matching and manipulating contexts. For example, we can maintain not only information about variables in scope, but also occurrences of while loops or labeled terms, with the same kind of construction, and query that information by means of plain pattern matching.

A configuration \(\sigma :\theta :\textsf {t}\) is well formed for a context C, written \(\textsf {C} \vdash _{ wf {\mathsf {c}}}\sigma :\theta :\textsf {t}\), if every term of \(\sigma \) and every table in \(\theta \) is well formed, and also t is. When the context is \(\llbracket \; \rrbracket \), we omit it and simply write \(\vdash _{ wf {\mathsf {c}}}\sigma :\theta :\textsf {t}\). In order to check when a given term is well formed (w.r.t. the well-formed stores), we use a new relation \(\vdash _{ wf {\mathsf {t}}}\) and show only a representative set of rules.

\(\vdash _{ wf {\mathsf {t}}}\) is defined recursively over the structure of the term t, as can be seen, for example, in WF-If. Some rules require checking side conditions, like WF-Ref, in which the reference must exist in the corresponding store, in order to check for the absence of free occurrences of references that would render the corresponding term stuck. Additionally, some rules require specific contextual information: for a variable x, we need to check that the context contains a local or function signature binding x (WF-VarLocal and WF-VarFun, respectively). Similarly, a break can only occur if the context contains a yet-to-be-executed while (WF-BreakWhile) or an already-executed one (WF-BreakUnfolded). In both cases, we also need to check that the break statement is actually immediately inside the loop or labeled term; that is, the statement is not inside another construction surrounding it, like a function definition. The contextual information is extended each time we check certain constructs. For instance, WF-Local shows how local variables are inserted in the context C prior to checking the inner statement s. Similarly, WF-While shows how the while construct is inserted in the context C in order to check the body of the loop. Finally, for labeled expressions or statements representing situations that must be handled by metatables, we need to make sure that the conditions that created the labeling hold. For instance, for WrongKey, we need to check that the key indeed does not exist (WF-WrongKey).

Now that we have a precise meaning of what constitutes a well-formed configuration, and prior to stating the main property of the formalization, we define what final computations are:

Definition 1

We say that a statement s is final if it is one of the following, for some value v:

  (a) \({\mathbf {\mathsf{{{\$err}}}}} \; \textsf {v}\)

  (b) \({\mathbf {\mathsf{{{return}}}}} \; \textsf {v},\ldots \)

  (c) \({\mathbf {\mathsf{{{skip}}}}}\)

Lemma 1

(Soundness of \(\vdash _{ wf {\mathsf {c}}}\)) For given stores \(\sigma \), \(\theta \), statement s, if \(\vdash _{ wf {\mathsf {c}}}\sigma : \theta : \textsf {s}\), then one of the following situations holds:

  (a) s is final.

  (b) There exists a configuration \(\sigma ' :\theta ' :\textsf {s'}\) such that \(\sigma :\theta :\textsf {s} \mapsto \sigma ' :\theta ' :\textsf {s'}\) and \(\vdash _{ wf {\mathsf {c}}}\sigma ' :\theta ' :\textsf {s'}\).

Note that in order to prove this and the following statements, we require a clear formalization of the operations of matching and decomposition. Then, we could prove the soundness of \(\vdash _{ wf {\mathsf {c}}}\) by induction on the structure of the proof of \(\vdash _{ wf {\mathsf {c}}}\sigma : \theta : \textsf {s}\).

The previous property does not state that the step is unique. In order to have a deterministic semantics, there should always be at most one redex pointed to by an evaluation context within a given term:

Lemma 2

(Unique decomposition) Given stores \(\sigma \), \(\theta \), and statement s, such that \(\vdash _{ wf {\mathsf {c}}}\sigma : \theta : \textsf {s}\), then one of the following situations holds:

  (a) s is final.

  (b) There exist a unique evaluation context E and term t such that s matches the pattern \(\textsf {E} \,\llbracket \, \textsf {t}\, \rrbracket \), in such a way that exactly one of the following is a redex:

    (1) t

    (2) \(\sigma :\textsf {t}\)

    (3) \(\theta :\textsf {t}\)

    (4) \(\sigma :\theta :\textsf {t}\)

In order to prove it, given that we are looking for the next redex within a given term, we could rely on the information contained in the proof of \(\vdash _{ wf {\mathsf {c}}}\sigma : \theta : \textsf {s}\): it checks the side conditions of every semantic rule, in order to guarantee that a given term is not stuck; that is, if it is not a result, then it is a redex. Then, we could prove this statement by induction on the structure of the proof of \(\vdash _{ wf {\mathsf {c}}}\sigma : \theta : \textsf {s}\).

Additionally, the proof will depend on a correct correspondence between evaluation contexts and the semantic relations. That is, each evaluation context should point to just one term that is the next redex, according to the semantic relations.

From these properties, we infer two relevant corollaries. The first one is a direct consequence of the previous lemmas: there are no stuck terms, and the semantics is deterministic. Hence:

Corollary 1

A well-formed configuration runs until completion, or loops forever. In other words, every erroneous situation is handled properly by the error handling mechanism:

For every configuration \(\sigma :\theta :s\), such that \(\vdash _{ wf {\mathsf {c}}}\sigma :\theta :s\), just one of the following situations holds:

  (a) The execution diverges.

  (b) The execution ends with a configuration \(\sigma ' :\theta ' :\textsf {s'}\), such that \(\vdash _{ wf {\mathsf {c}}}\sigma ' :\theta ' :\textsf {s'}\) and s' is final.

Finally, the following corollary of soundness of \(\vdash _{ wf {\mathsf {c}}}\) justifies the representation of closures explained in Sect. 4.2.3:

Corollary 2

(No free variables introduced) For a given closed term s (i.e., without unbound occurrences of variable identifiers) and well-formed configuration \(\sigma : \theta : \textsf {s}\), if there exist \(\sigma '\), \(\theta '\), and s' such that \(\sigma : \theta : \textsf {s} \mapsto \sigma ' : \theta ' : \textsf {s'}\), then s' is a closed term.

Note that, otherwise, the resulting configuration \(\sigma ' : \theta ' : \textsf {s'}\) could not be considered well formed, as the soundness of \(\vdash _{ wf {\mathsf {c}}}\) guarantees. This means that, once the focus of execution has reached a closure, every external local variable in its body has been replaced by the corresponding reference, i.e., the environment of the closure is completely embedded into its body, avoiding stuck terms.

6 Mechanization

In this section, we discuss the mechanization in Redex of the model presented in this paper. We begin by explaining the differences between the formal model and its mechanization and then discuss the results of executing the tests from the test suite and of randomly testing fundamental properties.

6.1 Correspondence with the Formal Model

During the process of formalizing the language, we went back and forth between our pen-and-paper formalization and its mechanization in Redex. This tool was of great help, as the resulting mechanized formalization can be executed and traced; therefore, we could recognize problems in our first attempts at formalizing the language, experiment with new ideas before landing them in the formalization, and test the mechanization against the test suite of the official interpreter, prior to verifying the desired properties.

Given that we produced two documents (the formalization on paper and its mechanization), we did not follow closely the original philosophy of Redex, which consists in treating semantic models as software artifacts [17]. Although it is indeed an interesting approach, with real potential to optimize the time spent in a formalization effort, we must still observe that reading the mechanization alone is at times a difficult task: some concepts from our model cannot be expressed directly in Redex, forcing us to accept a mechanization that loses some abstraction and clarity. This occurs, for example, when dealing with concepts specified in terms of a quantification over an arbitrary predicate: Lua function calls, operations that filter fields of a given table, etc.

This said, the formalization presented in this paper and its mechanization are remarkably similar. There are, however, four groups of differences that are worth mentioning.

6.1.1 Particularities of the Tool

Redex fully supports the set of concepts of reduction semantics with evaluation contexts; namely, the definition of languages, evaluation contexts, meta-functions, and reduction rules. In particular, reduction rules are described as relations over terms of a defined language. In practice, this means that we must represent some semantic elements of our model as phrases of a language. For instance, we presented the stores as partial functions in Sect. 3, but they must be defined as terms in the mechanization. This is observable, for instance, in the rule for updating a variable:

The same rule in Redex is written as follows:

figure bp

where the \(\sigma \) store is now described in terms of a phrase, and vsp is a new syntactic class describing a pair containing a reference and a value, something that we did not consider part of the language in the pen-and-paper formalization.Footnote 9 Also, the substitution of the value associated with the reference r by the value v_2 is now performed using pattern matching and a term template [16], which builds a new \(\sigma \) store that differs from the first one just in the value associated with the reference r: now, v_1 is replaced by v_2 (a pair of a reference and a value is written (r v)).

While in this simple example the distance is short, in more complex rules the difference becomes more noticeable.

6.1.2 Particularities of the Use of the Semantics

When using our model to build a tool to check misuses of weak tables [38] (briefly introduced in Sect. 7), we had to introduce two changes in order to better emulate the behavior of a garbage collector. First, we change the introduction of local variables (e.g., Local-Decl in Sect. 4.2.2) and the instantiation of formal parameters of functions (e.g., E-Call in Sect. 4.2.3). We do it in such a way that the references of variables that are still in scope are maintained (in some structure) through the whole execution of the code in their scope.

In a common implementation of a compiler for a programming language with reachability-based garbage collection, a variable that is in scope is always considered reachable, regardless of its actual reachability from what is left to be computed. This is so because of the way in which the root set of references is calculated (for example, just by looking at the stack of a function call).Footnote 10 On the other hand, in the model presented so far, once a variable is not reachable from what is left to be computed in its scope, we would consider it unreachable, even if the code that belongs to its scope is still under execution. This difference with real implementations naturally introduces some observable effects in the context of a programming language with interfaces to its garbage collector, as is the case for Lua 5.2.

In order to better approximate the behavior of a real implementation, we keep the references that correspond to the variables introduced by a local statement or function call annexed to the code in which they are replaced. This allows for a simple representation of the set of references of variables that are in scope, regardless of their presence in what remains to be computed. Second, and for similar reasons, closures are manipulated through references, analogous to the manipulation of tables. In fact, this is closer to how the official implementation of Lua works, and allows us to have a model that better represents the common phenomena associated with garbage collection.

6.1.3 Particularities of the Official Interpreter

In order to stay close to the official interpreter and its corresponding test suite, we had to enrich our specification with implementation details that might not strictly be considered as belonging to the semantics of the language. In many cases, these were just corner cases of the services of the language and its standard library (discussed in Sect. 6.3). But a more interesting example is how to deal with mutually recursive references between metatables and handlers, as the following program illustrates:

Example 13

(Mutually recursive references in metatables.)

figure bq

From the point of view of the semantics of metatables (Sect. 4.4), the previous program is an infinite loop: the instruction in line 13 triggers the metatable mechanism, looking for the metatable of to index it with the key "__newindex" (rule M-UpdNotFun in Fig. 17), obtaining the handler. But this is itself, as specified in line 13, repeating the assignment in an endless loop.

While the reference manual does not specify a particular way to handle these situations, the official interpreter implements some intelligence that is capable of recognizing these loops. In particular, the execution of the previous program results in an error with the message “loop in settable”.
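A minimal self-referential variant (not necessarily the exact program of Example 13) exhibits the same behavior in the official interpreter:

    local t = setmetatable({}, {})
    getmetatable(t).__newindex = t  -- the handler for t is t itself
    t.x = 1                         -- error: loop in settable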

Given that the previous behavior is tested in the official test suite, we include it in our semantics by maintaining, in the instruction term, the tids of the metatables used while trying to solve a labeled term. In this way, we can recognize when the same metatable is being indexed repeatedly and raise the appropriate error.

6.1.4 Particularities of the Implementation: The Service

Like many scripting languages, Lua includes a service to execute an arbitrary program coded in a string. This service, called , has little interest from the point of view of its semantics: given a string , it returns a function that, when called, executes the code written in . The service returns if there are syntax errors or unbound identifiers in the resulting program.

From the point of view of its mechanization, however, it is interesting to mention that its inclusion was rather trivial: our parser and the desugaring process are both written in Racket, the underlying language of Redex, and in Redex itself.

As in [34], we also split desugaring into two pieces: a compiler from Lua source code to our language that produces (possibly) open terms, and a definition of an evaluation context into which the previously compiled term is plugged. This context contains the standard implementation of the execution environment: a table containing the set of bindings to library services, which, in our model, are just invocations to the services through the construction. This is our definition of the original execution environment _ENV (see Sect. 2.2). Under normal circumstances, load will assume that the global variables in the interpreted code refer to said environment: they are just fields of the table bound to _ENV. However, the user can pass, as an argument to load, a table that will be used as a replacement for _ENV. In that case, global variables will be interpreted as fields of the given table.
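In standard Lua 5.2, the replacement environment is passed as the fourth argument of load (the values are chosen for illustration):

    local env = { x = 41 }
    local f = load("return x + 1", "chunk", "t", env)
    print(f())  --> 42: the global x resolves in the supplied table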

6.2 Tests Coverage

Fig. 20
figure 20

Lua 5.2’s test suite coverage

Of the 25 files present in Lua's test suite [21], we successfully ran 12 (including garbage collection; see [38]), with varying degrees of coverage (Fig. 20). Each file from the test suite is a sequence of assertions about the expected outcome of the code. The files and lines not tested are due to the following reasons:

  • Language features not covered by our formalization: coroutines, the goto statement, some standard library functions (mostly related to file handling), and other standard libraries implemented in C (bit32, coroutine, debug, io, etc.).

  • Tests covering implementation details of the interpreter and not relevant to the language’s semantics: tail call implementation, manipulation of large tables (not tested for performance reasons), generation of bytecode, and the like.Footnote 11

In pursuit of reasonable running times for each module of tests, we divided each .lua file into smaller modules. Also, when necessary, we reduced the iterations performed by loops and added explicit calls to the garbage collector (16 LOCs, affecting results in a few other lines). Additionally, while looking for more coverage, we changed a handful of lines in modules errors.lua and strings.lua to remove services that are not yet included. In the case of errors.lua, the lines were in functions used throughout the code to test the error messages. By modifying them, we obtained weaker predicates for testing our formalization of the error mechanism, without testing details of the produced error messages.

With the addition of parallel threads of execution, enabled by modern-day processors, the whole selected fraction of Lua's test suite takes, roughly, 26 min to complete without testing garbage collection, and 1 h 22 min including it.Footnote 12 Therefore, it is possible to run these tests in CI to ensure that newly added features and fixes do not break existing ones.

The accompanying material includes the modified files with relevant information to inspect the changes and run the tests.

6.2.1 Improving Performance Thanks to Semantics-Preserving Garbage Collection

As mentioned, in order to reduce the running times of the tests, we introduced explicit calls to our mechanization of Lua's garbage collector. To understand the reasons, consider the execution of the following Lua program:

figure by

Without the intervention of the garbage collector, each iteration of both loops generates garbage that will persist in the configurations until the end of the execution.Footnote 13 Given the computational model of reduction semantics, in which at each step the whole configuration is matched against the (possibly) several patterns of the left-hand side of each reduction rule, increasing the size of the store impacts negatively on the performance of this task. Hence, if we reduce the size of the store during run time, in a way that preserves the semantics of the program, we obtain some performance benefits without compromising soundness.

For instance, the execution of the previous program takes roughly 2 minutes on the same hardware. By adding a single call to the garbage collector, right after the first loop, the execution time is shortened by more than half. This said, in smaller configurations the cost of executing the garbage collector can also impact performance negatively. For example, if we consider a modification of the previous program where both loops iterate 10 times, the program that performs garbage collection after the first loop is slower (by a couple of tenths of a second) than the one that runs until the end carrying all the garbage generated by both loops. We understand this as the cost of having to execute our garbage collector, which inherits some of the complexity of Lua's collector, yet not its efficiency.
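For concreteness, the kind of modification we applied looks as follows (a hypothetical rendering; the loop bodies of the actual listing are elided above):

    for i = 1, 1000 do local t = {} end
    collectgarbage()  -- a full cycle here shrinks the store before the second loop
    for i = 1, 1000 do local t = {} end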

Together with the modularization of the code into smaller files, adding explicit calls to the garbage collector allowed us to mitigate the known performance problems that come with mechanizing an interpreter in Redex [35], and in a semantics-preserving way [38].

6.3 Random Testing of Properties

Redex's features for implementing formal systems and random testing (redex-check [15]) allowed us to mechanize the desired check for well-formedness of programs and to stress-test its soundness property, together with the expected determinism of the semantics.

Interestingly, the randomly generated terms exposed unexpected errors and omissions in our mechanization that were not caught during testing against the test suite. As it turns out, redex-check proceeds completely uninhibited, unaware of the semantics [17], providing intricate, hard-to-decipher terms that proved useful for uncovering:

  1. Missing border cases and implicit coercions of the parameters of some library services.

  2. Erroneously handled cases of the table constructor, mostly related to the generation of numeric keys.

  3. Misinterpretations on our part of the reference manual.

  4. Errors in our first attempts at defining a suitable notion of well-formed configuration for our semantics.

With respect to 1, these omissions can be understood as behavior that is assumed or that does not belong to the essentials of the semantics. Naturally, the latter cases cannot be excluded from our semantics, since we aim at an executable model. As an example, the results of the following invocations of library services cannot be inferred from the reference manual:

Example 14

(Special cases in library services)

figure ca

In the call to , the third argument is negative, a situation that, while mentioned in the reference manual, is not actually given a semantics. In the call to , the parameter is not an integer. In this case, the official implementation takes the floor, although this is not mentioned in the manual.Footnote 14

Even though our first attempt at mechanizing the semantics contained several of these errors, it still managed to pass the tests. We remark that this is not necessarily a problem of the tests of the interpreter, since several services are just wrappers around already-tested C libraries.

In this scenario, using redex-check to test easily verifiable properties like progress served as a lightweight approach to stress-testing our implementation in search of omissions and errors, with a reasonable time overhead.

6.3.1 Preparation of Test Cases

Naturally, if run uninhibited, most of the terms generated by redex-check are ill-formed, since the grammar does not enforce well-formedness. To alleviate this, it is possible to provide redex-check with a preparation function that manipulates the generated term and turns it into a well-formed one, when possible.

Additionally, to obtain better coverage of the rules, redex-check offers the possibility of generating terms following the pattern of the left-hand side of each of the rules. However, while we obtain full coverage, this selection of patterns is too restrictive: the properties that we are interested in random testing are predicates quantified over a well-formed configuration \(\sigma :\theta :\textsf {s}\), i.e., not just terms that match the left-hand side of the semantic rules. Therefore, to increase the confidence in the testing, we generate just a fraction of the total cases following the semantic rules, and ask redex-check to generate the remaining terms by just following the pattern \(\sigma :\theta :\textsf {s}\).

6.3.2 Properties Tested and Results

For every well-formed term generated, we tested progress and preservation of well-formedness, together with the determinism of our semantics. For a run of 50,000 attempts, we obtain, on average over ten runs, close to 43,000 well-formed configurations that successfully pass the tests, with complete coverage of the rules.

To better understand the high percentage of well-formed configurations, note that our preparation function takes ill-formed terms and tries to turn them into well-formed ones, also attempting to directly replace conflicting sub-terms by (simpler) well-formed ones.

During random testing, the terms produced by redex-check may grow in size and complexity after each failed attempt at generating a term. It is possible to specify the rate at which the terms grow in size, by providing a bound on it. According to the documentation of redex-check, said bound “bounds the height of the generated term (measured as the height of its parse tree)”. We used the default bound growth rate, which grows as the base-5 logarithm of the number of previous attempts at generating terms. To get a general idea of the complexity of the terms generated: on a run with 50,000 attempts, the maximum bound used is 6, which may generate configurations with programs consisting of, roughly, 250 statements (each with its expressions) and stores with 30 elements (without counting those added by the preparation function).Footnote 15

7 A Use Case: Modeling Garbage Collection

In this section, we present a brief tour of an existing application of the dynamic semantics presented so far. The content is a distilled version of [38].

7.1 Garbage Collection

As we already mentioned, we used our dynamic semantics to model garbage collection (GC) in Lua. In essence, the task reduces to including a non-deterministic semantic rule that performs a GC cycle. Then, in order to reason about its behavior, we need to study it in the context of a complete dynamic semantics and see how it interacts with the rest of the semantic model. For plain GC, i.e., without considering the interfaces with the garbage collector, we expect each collection step to preserve the semantics of the program. This amounts to observing the same behavior of the program before and after the GC step.

A more interesting situation arises when we include the interfaces with the garbage collector: weak tables and finalizers. Weak tables are tables whose keys and/or values are referred to by weak references. These references are not taken into account by the garbage collector when determining reachability, and the values they refer to can, therefore, be collected at any time. Finalizers are similar, but not equal, to destructors in languages with explicitly managed memory. Such interfaces might introduce observably diverging results. Our work enables the development and certification of static analyzers for ruling out potentially dangerous non-determinism from a program.
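For instance, in standard Lua a table with weak values can lose entries after a collection cycle (a minimal sketch; the key and the stored value are our own):

    local cache = setmetatable({}, { __mode = "v" })  -- values are weak
    cache[1] = {}     -- the stored table is only weakly referenced
    collectgarbage()  -- force a full collection cycle
    print(cache[1])   --> nil: the entry was cleared by the collector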

In this section, we first present LuaSafe (Sect. 7.2), a tool that uses the modeled semantics of weak tables to ensure deterministic execution of programs, and then we take a sneak peek at how we model (simple) garbage collection (Sect. 7.3).

7.2 LuaSafe in a Nutshell

LuaSafe is a prototype static analyzer that aims at the detection of misuses of weak tables that could lead to non-deterministic behavior. While the general problem is known to be undecidable [3], we propose an approximation to the solution by combining techniques from static semantics (type inference, type checking, and data-flow analysis) together with the semantics of weak tables.

For a given Lua program p, we say that p is gc-safe if it exhibits deterministic behavior under \(\mapsto \) extended with GC and its interfaces. Then, we denote with \(P_{ safe }\) the set of gc-safe programs. In our approach, we take a user program and try our best to guess if it belongs to \(P_{ safe }\), without asking the user to modify the program or to use weak tables according to some particular idioms, as proposed in [3].

In order to achieve this, LuaSafe proceeds in a series of steps, including a type system that makes heavy use of the formal tools required to understand Lua's GC. In the coming section, we take a brief look at the underlying formalism.

7.3 Modeling Garbage Collection

The purpose of GC is to remove from memory (the store) information that will not be used by the remaining computations of the program. One of the simplest and most commonly used approaches to find such information is based on the notion of reachability (e.g., [6]). The idea is simple: given the set of references that literally occur in the program (the root set), any information (e.g., a value in a store) that may be used by the program must be reachable from that set. Conversely, any binding (a reference with its value) in the store that cannot be reached from the root set will not be accessible from the program and, therefore, can be safely removed, as it will not be needed in the remaining computations of the program.

Lua implements two reachability-based GC strategies: a mark-and-sweep collector (the default) and a generational collector. In this section, we will provide a specification for the behavior of a typical reachability-based GC that encompasses the essential details of the behavior of the two algorithms included in Lua and any other based on reachability.

In the context of this work, those values which are not reachable will be called garbage. This notion, sufficient to model Lua's GC, is purely syntactic: to determine if a given value is garbage or not, it takes into account just the literal occurrence of references in the program, or their reachability from this set of references that occur literally. In contrast, there are approaches to identifying garbage where the semantics of the program is also taken into account (e.g., [3]).

To formally capture the notion of garbage, it is easier to begin with the definition of reachable references. The only difference worth mentioning, in comparison with common definitions found in the literature [6, 19], is the inclusion of metatables: a metatable of a reachable table is considered reachable, so a reachability path, that is, a path between a reference and the root set, might also go through a metatable.

Informally, a location (a value reference or a table identifier) is reachable with respect to a given term t, and corresponding stores, if one of the following conditions holds:

  • The location occurs literally in t.

  • The location is reachable from the information associated with a reachable location. This includes

    • The location is reachable from the closure associated with a reachable location.

    • The location is reachable from the table associated with a reachable location.

    • The location is reachable from a metatable of a reachable table identifier.

This is formalized in the following definition:

Definition 2

(Reachability for Simple GC) We say that a location \(\textsf {l} \in \textsf {r} \cup \textsf {tid}\) is reachable in term t, given stores \(\sigma \) and \(\theta \), iff:

$$\begin{aligned} \mathsf {reach}{(\textsf {l}, \textsf {t}, \sigma , \theta )}= & {} {\textsf {l}} \in \textsf {t} \; \vee \\&(\exists \textsf {r} \in \textsf {t}, \mathsf {reach}(\textsf {l}, \sigma (\textsf {r}), \sigma \setminus \textsf {r} , \theta )) \; \vee \\&\exists \textsf {tid} \in \textsf {t}, (\mathsf {reach}{(\textsf {l}, {\pi _{1}}(\theta (\textsf {tid})), \sigma , \theta \setminus \textsf {tid}) \; \vee }\\&\qquad \qquad \mathsf {reach}(\textsf {l}, {\pi _{2}}(\theta (\textsf {tid})), \sigma , \theta \setminus \textsf {tid})) \end{aligned}$$

The model proposed in [38] manages closures through references, in a similar fashion as with tables. Here, we avoid that aspect in order to keep the presentation simpler and closer to the model presented in this work, keeping in mind that it is only an approximation of the actual notion of reachability used in the cited work.

We write \(l \in t\) to indicate that l occurs literally in term t and write \(\gamma \setminus l\) as the store obtained by removing the binding of l in \(\gamma \). Informally, this predicate states that either l occurs in t, or there is a reference in t such that l is reachable from it.

To avoid cycles, generated by mutually recursive definitions in the stores, that would render the preceding predicate undefined, we remove from the stores the bindings already considered. We assume the predicate is false if a given location occurs in t but does not belong to the domain of either of the stores.

Note that for a table tid, we not only check its content (\(\pi _1(\theta (\textsf {tid}))\)) but also its metatable (\(\pi _2(\theta (\textsf {tid}))\)). That is, a table's metatable is considered reachable when the table itself is reachable. Observe that, metatables being ordinary tables, they can contain other tables' ids or even closures, which in turn may have other locations embedded into them. Naturally, if metatables were not taken into account for reachability, we could run straight into the problem of dangling references any time a metamethod is recovered from a metatable. Also, note that during the recursive call \(\mathsf {reach}(\textsf {l}, \pi _2(\theta (\textsf {tid})), \sigma , \theta \setminus \textsf {tid})\), it will first determine if l is exactly \({\pi _2(\theta (\textsf {tid}))}\) (because it asks for \(\textsf {l} \in \pi _2(\theta (\textsf {tid}))\), with \(\pi _2(\theta (\textsf {tid}))\) being either nil or a table identifier) and, if not, it will continue with the inspection of the content of the metatable, by dereferencing its id, provided it is not nil. Hence, we do not remove \(\pi _2(\theta (\textsf {tid}))\) from \(\theta \) in the mentioned recursive call.

7.3.1 Specification of a Garbage Collection Cycle

We keep the specification of a GC cycle abstract in order to accommodate any implementation of GC:

Definition 3

(Simple GC cycle) \(\mathsf {gc}(\textsf {s}, \sigma , \theta ) = (\sigma _1, \theta _1)\), where:

  • \({\sigma = \sigma _1 \uplus \sigma _2}\)

  • \({\theta = \theta _1 \uplus \theta _2}\)

  • \(\forall \textsf {l} \in \mathsf {dom}(\sigma _2) \cup \mathsf {dom}(\theta _2), \lnot \mathsf {reach}(\textsf {l}, \textsf {s}, \sigma , \theta )\)

We use \(\gamma _1 \uplus \gamma _2\) to denote the union of stores with disjoint domains. This specification states that \({{\mathsf {g}}}{{\mathsf {c}}}(\textsf {s}, \sigma , \theta )\) returns two stores, \({\sigma _1}\) and \({\theta _1}\), which are subsets of the stores provided as arguments, \({\sigma }\) and \({\theta }\). We do not specify how these subsets are determined; we just require that the remaining parts of the stores (\({\sigma _2}\) and \({\theta _2}\)) do not contain references that are reachable from the program s. Once this condition is satisfied, it is safe to run the code s in the new stores \({\sigma _1}\) and \({\theta _1}\), as no dereferencing of a dangling pointer may occur.

Observe that the previous specification does not require the collection to be maximal: \(\sigma _1\) and \(\theta _1\) might still contain references that are not reachable from s.

Using the previous specification of GC, we can extend our model of Lua with a non-deterministic step of GC:

figure ce

We require it to actually perform some changes to the stores, to ensure progress. This obviously introduces non-determinism: at any time, as long as there is some garbage left, we can choose to collect the garbage or to continue with the execution of the program. But, for the definition provided so far, this non-determinism does not change the behavior of the program: every execution path will eventually lead to the same result. This property no longer holds when extending GC with the interfaces to the garbage collector present in Lua 5.2. We develop the model further in [38].

8 Related Work

As mentioned throughout the text, the major source of inspiration is the line of work of [8, 34, 35]. The first two of these works formalize JavaScript, and the third one formalizes Python. At a broad level, we share with these works several of the characteristics of the semantic models and their corresponding mechanization in Redex, although there are several differences worth noting. First, our model includes particularly interesting features, like Lua's mechanism for meta-programming. While similar to Python's metaclass construction, this feature is not covered in [35]. Second, we purposely do not focus on identifying a core set of features, but instead model the language as a whole (an approach enabled by Lua's relatively small size). Third, as mentioned in Sect. 6, we have a very thin desugaring process, staying very close to the surface language. Fourth, and related to the previous point, we tested our mechanization using the language's own test suite directly within Redex, instead of writing another interpreter, as done for performance reasons in [8, 35]. This increases the confidence in our mechanization, at the expense of more testing time. Lastly, we pay special attention to the specification and random testing of the model's properties.

By not following the core-language approach, we avoided known complexities in the resulting model [2, 29]: verbose desugared code and reduced confidence in the compliance of the given semantics with respect to the original language's specification. Additionally, maintaining proximity with the original language paves the way to a mechanization that Lua developers could also use and verify, as their intuition about the language's semantics is better expressed. This goal is shared with WebAssembly, whose semantics is formally specified in its core documentation [36], and introduced in [9].

Another major step in formal semantics for JavaScript is JSCert [2]: a formalization of ES5 in the Coq proof assistant, together with an interpreter (JSRef) extracted from the formalization. It presents a big-step semantics for the specification of ES5. To gain confidence in the compliance of their formalization with respect to the specification of ES5, the authors stress the importance of having their Coq model reviewed by people from different areas, ranging from developers of analysis tools for JavaScript and of JavaScript VMs to ECMA authors. In our project, we hope to involve Lua developers. This may be difficult to achieve over a Coq model because, as the authors of [2] acknowledge, using proof assistants requires considerably more learning than using, for example, tools specifically designed for mechanizing language specifications. While we intend to eventually mechanize proofs of properties of our model, perhaps in Coq, for now we want to take advantage of the ease of use of Redex.

Besides the aforementioned [8, 34], [29] introduces a small-step operational semantics for the full language specified in the ECMA-262 Standard, 3rd Edition, including proofs of several properties of the model. The authors note that defining the semantics of each construction as described by the ECMA specification gives them the greatest likelihood that the model is correct. While this approach resulted in a model that inherits the size and complexities of the language being defined, the experience is valuable for our investigation of Lua, as we are dealing with a smaller, simpler language.

Specific to Lua’s semantics, to the best of our knowledge there is just one other effort for Lua 5.2: [20] presents an operational semantics for Lua in the style of Featherweight Java [14], i.e., with a strong focus on identifying a core language. It covers a subset of the features presented here and provides a reference implementation in Haskell. Lua 5.1’s dynamic semantics has been studied in [31], together with type inference, to obtain an optimizing compiler.

9 Final Thoughts and Future Work

This project aims at understanding Lua to the point of being able to construct certified tools, with [38] being our first step in this direction. Currently, we are at the point where it is possible to analyze realistic programs, with coroutines and goto perhaps being the most salient missing features.

With respect to the mechanization, Redex proved to be a great tool for the semanticist: it is flexible and comes equipped with a well-suited collection of tools. However, as might be expected with a dynamic language like Redex, this flexibility is at times a double-edged sword, letting in simple bugs that a type system would easily catch. That said, we ended up eliminating a large number of bugs by randomly testing the semantics.

The interpreter resulting from the mechanization is significantly slow, which is no surprise considering that the whole mechanization comprises about 8k LOC (not counting comments and tests). Some researchers [8, 35] decided to construct a second interpreter by hand, testing that interpreter instead of the mechanization in Redex. In our case, we were still able to execute files with hundreds of lines of code within Redex, by splitting the files and/or adding specific calls to the garbage collector.
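As a hypothetical illustration of the latter workaround (the file names are ours), a long test can be split into chunks with explicit collections in between:

    dofile("part1.lua")        -- run the first half of the original test file
    collectgarbage("collect")  -- force a collection, shrinking the stores
    dofile("part2.lua")        -- run the second half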

An interesting possibility to explore is to port the model to a proof assistant and formally prove its properties there. One could then extract a verified interpreter based on abstract machines, which should, in principle, run faster than executing the model within Redex, which follows the computation model of reduction semantics. Moreover, this automated step should help build trust in the correspondence between the obtained interpreter and the original formal semantics. We are currently interested in that avenue of research, and leave extending and updating the model to current versions of Lua for future work.

Among the missing aspects of the semantics, the most interesting ones from a semantic point of view are goto and coroutines. Regarding the former, it is worth mentioning that goto is not a widely used keyword: a code search (Footnote 16) shows only 37k occurrences of “goto” in about two million files, including a significant fraction of spurious occurrences in comments. As such, the effort required to extend the semantics seems unjustified: program configurations would have to be enriched with information about possible jumps, and the already-executed code would always have to be kept. This is studied in detail in [18] for a C-like language.
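For reference, a minimal example of goto in Lua 5.2 (our own), using the common “continue” idiom:

    for i = 1, 3 do
      if i == 2 then goto continue end  -- jump over the rest of the body
      print(i)                          -- > 1
                                        -- > 3
      ::continue::                      -- label closing the loop body
    end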

Coroutines, on the other hand, are used significantly more (a similar search returns about 164k hits). Fortunately, they are already well understood from a formal point of view: [32] presents a reduction semantics specifically tailored to expressing Lua’s coroutines, and it seems it could be added easily on top of the concepts already present in our model.
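For completeness, a minimal example of the construct (our own, independent of [32]):

    local co = coroutine.create(function (a)
      local b = coroutine.yield(a + 1)  -- suspend, passing a+1 back to resume
      return b * 2                      -- returned by the final resume
    end)
    print(coroutine.resume(co, 1))      -- > true  2
    print(coroutine.resume(co, 10))     -- > true  20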