1 Introduction

Program sketching [1] is a program synthesis approach [2,3,4,5] where developers write partial programs that have “holes” and let the synthesizer fill in the holes based on the given specification, which is usually provided in the format of test harnesses or reference implementations. Existing sketching approaches [1, 6] translate the partial program to propositional satisfiability formulas and leverage SAT solvers to generate a program that satisfies all constraints. While these translation-based approaches have shown their effectiveness on a range of small well-defined domains, they have a key limitation: when applying to real applications with libraries or programs with complex constructs like reflection, these translation-based approaches require either translating all libraries that are invoked directly or indirectly by the given sketch or creating models of all those libraries and complex constructs, which can lead to impractical SAT problems.

To tackle this limitation, we introduce EdSketch, a novel approach that performs execution-driven sketching to synthesize Java programs using backtracking search. The key novelty of our work is to introduce an on-demand candidate generation technique that leverages runtime behavior to substantially prune a large amount of search space and efficiently explore the actual program behaviors in the presence of libraries. To illustrate, consider trying to sketch a while condition as well as the body of the while loop, if a test execution raises an exception upon evaluating a specific candidate for the while loop condition, all candidates of the while loop body are pruned from search for that choice of the candidate condition expression. If the while loop body is not executed, our approach for lazy candidate generation will not create any candidates for the while loop body, which may contain thousands of candidates. When a test fails due to either a runtime exception or a test assertion failure, the parts of the candidate program that were executed directly determine the generation of the future candidates.

As inputs, EdSketch takes a sketch (partial program) with holes written in Java syntax and a test suite that characterizes the correctness specification. EdSketch executes the test suite against the sketch and backtracks the search when it encounters a failure (runtime failure or test assertion failure) and tries the next candidate. EdSketch terminates when the space of candidate programs is exhausted or a complete program that satisfies all tests is found.

EdSketch supports four kinds of holes: expressions (e.g., field dereferences), arithmetic operators ({+, −, \(\times \), /}), Boolean conditions (e.g., while loop), and blocks of assignment statements. To initialize the search, EdSketch generates candidate expressions for the holes based on the target type and variables given by the user. For instance, using up to two field dereferences, the expressions of the type Entry derived from a variable e that represents an entry in a singly linked list should be {e, e.next, e.next.next}, where the field next represents the next entry in the linked list.

EdSketch introduces pruning strategies to expedite the sketching process. These strategies prune redundant candidates for assignment statement blocks and condition expressions. If the program violates a pre-defined pruning rule, EdSketch backtracks immediately and tries the next alternative candidate. For assignment blocks, we define a set of pruning rules based on the program isomorphism and Java syntax. For conditions, we introduce a value grouping strategy that splits all condition candidates into two sets based on their values (true and false) of the current iteration, and continue splitting the two sets based on the evaluated values for each iteration.

We embody our approach in two forms of prototypes: EdSketch-JPF, a stateful search based on the Java PathFinder model checker [7]; and EdSketch-JVM, a stateless search based on re-execution inspired by the VeriSoft model checker [8].

Experimental results show that EdSketch ’s performance compares well with the SAT-based Sketch synthesizer [1] based on a dataset of small yet complex data structures. Out of 43 sketching tasks, EdSketch outperforms the Sketch synthesizer on 40 tasks. Using three recursive data structures, EdSketch successfully completes all of them in a second, whereas Sketch synthesizer fails in one given the same setting as EdSketch. The experiments also show that our pruning strategies are able to prune an average of 35% of candidates before evaluating them against the tests. Moreover, EdSketch completes some sketches that require handling reflection, I/O, native calls, and external libraries.

This paper makes the following contributions:

  •  Execution-driven sketching We introduce an execution-driven approach for program sketching that lazily generates candidates on-demand by leveraging the runtime behavior;

  •  Pruning strategies We introduce pruning strategies derived from the Java syntax to reduce the search space of candidates that must otherwise be explored;

  •  Embodiment We embody EdSketch into two prototypes: one based on the stateful model checker Java PathFinder [7], and the other based on a dedicated stateless backtracking search using re-execution in the spirit of the VeriSoft model checker [8];

  •  Evaluation We compare EdSketch with the SAT-based Sketch synthesizer based on a set of small but complex data structure subjects. We illustrate EdSketch ’s ability to sketch partial programs in the presence of libraries, recursive procedures and advanced features like reflection.

2 Motivating example

Fig. 1
figure 1

A sketch example for singly linked list reversal

We use a sketching task of reversing a singly linked list to illustrate our approach. Assume that the users want to implement a reverse() method for the singly linked list. Each list has a head entry, and each entry has a next entry and a value integer as fields. To reverse the singly linked list, the users have the notion that they need a while loop to traverse the list, and some local variables are required to record the current entry, previous entry and the next entry during the list traversal. They are also able to provide unit tests that specify the desired behavior as shown in Fig. 1c. However, it is hard for the users to complete the detailed implementation for the condition and the body of the while loop.

EdSketch enables users to provide high-level insights and leaves the implementation details to the synthesizer. Based on the high-level understanding of the problem, the users provide three local variables with the type Entry and a while loop to traverse the list, and leave the condition of the while loop as well as its body to be synthesized by EdSketch. Figure 1a shows a code skeleton written by the users that contains unknown while condition at line 7 and an unknown block of assignments at line 8. To specify the correctness criteria of the program sketch, the users write a JUnit test case that contains three entries of the linked list, as shown in Fig. 1b.

This sketching task is not trivial: Consider four visible variables (head, ln1, ln2, ln3), their field dereferences (given a minimum number of field dereferences required for this sketch, the candidate field dereferences are head.next, ln1.next, ln2.next, ln3.next) and a default value null for non-primitive types, the candidate for each assignment can be \(8\times 9\) given that null can only be the right-hand-side expression, and thus the search space for four consecutive assignment block is \(72^4\) (The sketch requires at least 4 assignments). Moreover, considering a condition expression as a combination of left-hand-side and right-hand-side expression together with a relational operator (either != or == for non-primitive types), the total search space for this sketch is 4.4 billion (\(72^4 \times (9\times 9\times 2)\)).

EdSketch dynamically selects candidates for the sketch invocations (line 7 and line 8) when it executes the given test cases. When EdSketch first reaches the while condition hole, EdSketch groups all condition candidates based on their evaluated value (true and false), and non-deterministically considers two Boolean possibilities for the execution. When the value false is considered, the unknown assignment block in the body of the while loop will not be executed, and thus all candidates of the assignment block hole are ignored and EdSketch will not create any additional search space for the assignment block hole.

When EdSketch first reaches the assignment sketch at line 8, it non-deterministically selects expressions for the right-hand-side and the left-hand-side of the assignment statement. A simple search may explore many candidates that are subsumed by other candidates which are already being explored. EdSketch prunes a number of such candidates based on the program isomorphism (Sect. 3.5). EdSketch backtracks its search if the current choice fails due to a runtime exception or a test failure. By default, EdSketch incrementally adds one assignment at a time until it finds the first solution or reaches the pre-defined upper bound on the number of assignments.

Figure 1c presents a solution generated by EdSketch based on a single test case shown in Fig. 1b. In this example, EdSketch finds the first solution in 9 s after exploring over 490 thousand candidates.

Fig. 2
figure 2

EdSketch architecture diagram

3 Approach

We describe our execution-driven sketching in this section. We first define the syntax of unknown holes h in Sect. 3.1. As shown in Fig. 2, EdSketch constructs all candidates based on the visible variables exprList specified in the sketch P and the bound of field dereference b (Sect. 3.2). If the sketch contains assignment block holes, EdSketch instruments the program so that it can dynamically select candidates for the holes (Sect. 3.3). Section 3.4 discusses our backtrack engines for stateful prototype (EdSketch-JPF) and stateless prototype (EdSketch-JVM). Section 3.5 describes the pruning strategies we apply to sketch assignments and conditions when evaluating candidates via the test execution (T).

Fig. 3
figure 3

Syntax of partial expressions

3.1 Partial expression syntax

Figure 3 denotes the syntax of the holes. We define two basic types of partial expressions for the sketches: expression holes and operator holes. The atomic expression holes (EdSketch.EXP) represent visible variables, constant values, and field dereferences. As to the operator holes, we define arithmetic operators {\(+, -, \times , /, \%\)} (EdSketch.AOP) and relational operators {\(==, !=, >, <, \leqslant , \geqslant \)} (EdSketch.ROP). EdSketch generates composite expressions by combining expression holes with operator holes, including arithmetic operators, relational operators and logical operators (|| and &&). Composite expressions can further combine together to generate complex expressions. For instance, we define a hole for conditions (EdSketch.COND) as two expression holes at left- and right-hand side combined with a relational operator. Both sides of expression holes in the condition can be replaced by infix expressions (e.g., a+b) with arithmetic operators, and the condition holes can further be combined together with logical operators to support multiple clauses. Moreover, EdSketch provides the assignment block holes (EdSketch.BLOCK) to specify a list of consecutive assignments.

To specify these holes in Java syntax, EdSketch provides a list of method invocations that take three parameters: an object list that contains all visible variables and default values (null or \(0, 1, -\,1\)), a hole id to distinguish different holes for the same type, and the target type of the generated candidates as an optional parameter. The hole id is used to distinguish different holes, it must be unique within a type of the holes, i.e., two condition holes cannot have the same id whereas a condition hole can have the same id as a block hole. In our motivating example of Fig. 4, the user assigns an identifier 0 to the while condition and an identifier 0 to the assignment statement block sketch. If user does not specify the target type of the hole, EdSketch takes the first two parameters and treats the target type as another hole of object type. EdSketch enumerates all types that can be derived from the given object list within the bound of the field dereferences to fill in the hole of the object type. Note that SAT-based Sketch synthesizer [9] also requires users to provide candidates for the “holes” (Fig. 6).

EdSketch provides a set of Java method invocations to specify different types of unknown holes. These APIs are treated as normal Java methods, thus our sketches can be directly compiled and executed using the given test suite. We highlight two APIs provided by EdSketch as below using the example shown in Fig. 1:

Condition Hole The complete invocation of the while condition synthesis at line 8 shown in Fig. 1 is: EdSketch.COND(new Object[] {head, ln1, ln2, ln3}, 0, Entry.class), which represents all conditional clauses whose left-hand-side and right-hand-side expressions are either the given variables or their field dereferences, combined with the relational operator holes. As the type Entry is non-primitive type, the relational operator hole consists of candidates {==, !=}.

Block Hole The complete invocation of the unknown assignment block at line 9 shown in Fig. 1 is: EdSketch. BLOCK(new Object[] {head, ln1, ln2, ln3}, 0, Entry.class), which represents a block of assignments that consist of either the given variables or their field dereferences. The number of assignments in the block is up to the pre-defined bound.

The purpose of program sketching is to bridge the gap between users’ high-level insights of the expected program and the low-level implementation. Following the same spirit, EdSketch enables users to provide more insights by invoking a set of supportive methods.

Maximum number of field dereferencesEdSketch generates up to one field dereference by default and makes this bound configurable. For example, the user can specify EdSketch.BLOCK(...).setFieldDeref(2) in Fig. 1 to allow up to two field dereferences. With this setting, EdSketch will generate field dereferences {ln1, ln1.next, ln1.next.next} derived from the ln1 in Fig. 1.

Enable default values or not We define the default value for integer and double as {0, 1, -1}, boolean as {true, false}, non-primitive types as null, and the type String as empty string. The user can exclude the default values as EdSketch.BLOCK(...).enableDefault(false) in Fig. 1.

Maximum number of assignments By default, EdSketch generates no more than four statements for the block holes. EdSketch also allows users to specify the number of assignment statements using the method set- Length(), e.g., EdSketch.BLOCK(...).setLength(5).

Note that the purpose of these bounds is to leverage users’ insights to explore the search space in a more effective manner, yet there is no bound on how many holes users can introduce to the sketch. As long as the holes are covered by the test execution, EdSketch is able to generate candidate and validate whether there exists a solution that satisfies all tests.

figure a

3.2 Expression candidate generation

This section describes how we generate expression candidates. Shown as Algorithm 1, EdSketch leverages a breadth-first iteration to generate field dereferences for all provided variables within a pre-defined bound of b (line 3-12). EdSketch creates all field dereferences using reflection and iteratively adds generated field dereferences to the list of candidates. After generating all candidates, EdSketch selects expression candidates based on the target type of the “hole” specified in the method invocation. In particular, the object this is also regarded as a variable yet we do not generate its field references because these fields are already included as heap-allocated variables. The implicit length field for the array type is not reflected by the getFields() method [10], and thus we manually insert this field if there exist candidates with the array type.

For each non-deterministic “hole”, we put all its expression candidates in a vector called candidate vector. We assign each expression candidate a unique identifier, which is its index in the candidate vector. When EdSketch performs sketching, it dynamically selects a candidate identifier for each “hole” using non-deterministic choose() operator and executes the program based on the candidate it selects. All candidate identifiers for the holes are initialized as \(-1\) (Algorithm 1 line 14), indicating that EdSketch has not selected a candidate for this “hole”. Once EdSketch selects a candidate identifier for this hole, this candidate will be used consistently across all test cases. When EdSketch backtracks, it increments the candidate identifier and re-executes the program with the next candidate that corresponds to the new identifier of the hole.

3.3 Program instrumentation

To introduce non-determinism to the program and allow a backtracking search to explore the space of candidate programs, we need to instrument the sketches in the following procedures and transform them to executable programs that can be validated by the test execution.

Fig. 4
figure 4

The instrumented program for the example of singly linked list reversal

Loops and Recursive Procedures To get rid of infinite loops during the synthesis of while loop, following the same spirit of SAT-based Sketch synthesizer, we set up a bound for the while loop iteration and backtrack whenever the execution has exceeded the pre-defined bound, shown in Fig. 4 line 10–11. By default we set the bound EdSketch.LOOP_BOUND as 16 (the default loop bound for the SAT-based Sketch synthesizer) and make it configurable to the end users. For recursive procedures, we use a heuristic to check whether the method itself is directly called in its body, and introduce a configurable bound EdSketch.RECUR_BOUND as 16 at the beginning of the recursive procedures, shown in Fig. 11 line 2–3. We leave the checking of indirectly recursive procedures as future work.

Assignment Block Figure 4 presents the instrumented program for the motivating example, and highlights the newly instrumented code using the “+” signal. EdSketch generates a for loop to sketch the assignment statement block. This for loop enumerates each statement in the list, which is the return value of the assignment statement block hole EdSketch.BLOCK (...). In this for loop, EdSketch assigns values to the left-hand-side expressions for each assignment using a switch statement to select the left-hand-side expression based on the candidate identifier for the “hole”, and assigns the right-hand-side expression to the selected left-hand-side expression. We only generate case statements for variables and field accesses that can be assigned, and will not generate case statements for this object, unmodifiable fields like array.length, and default candidates such as null, \(0, 1, -1\). If the user does not provide the bound for the number of sketching assignments, EdSketch will add one assignment at a time until it finds the first solution or reaches the default bound for the number of sketching assignments.

figure b

3.4 Execution-driven sketching

As shown in Algorithm 2, EdSketch starts sketching the partial program by directly executing the test cases (line 913). Whenever it encounters a runtime exception or a test failure, it backtracks and fetches for the next choice until it has explored the entire search space or finds a solution that meets the correctness criteria. Shown as Function sketch() in Algorithm 2, whenever the test execution encounters an assertion failure or an exception, EdSketch throws a BacktrackException, increments the candidate identifier (incrementCounter()) and re-executes the test cases based on the new candidate identifier which maps to the next unexplored candidate. EdSketch prints a complete program if this program satisfies all test assertions.

We build two prototypes based on two different backtrack engines: a stateful prototype based on Java Path Finder [7] and a stateless prototype based on re-execution.

Stateful Prototype with Java PathFinderJavaPathFinder (JPF) [7] is a mature model checker that implements a customized JVM to efficiently store and restore program states. While JPF cannot handle Java native calls and can hardly scale up to large applications with libraries and millions lines of code, using JPF as the backend for sketching opens the future work possibility to sketch multithreading programs [11] with systematic path exploration.

Stateless Prototype Using Re-Execution Our second prototype is based on a dedicated stateless search [8] using re-execution [12]. Since this prototype executes on the standard JVM, it allows synthesis in the context of open-source projects with advanced features, such as reflection, I/O, and native calls.

3.5 Pruning strategies

Brute force search yields a huge search space of sketch candidates, whereas some candidates can be isomorphic with each other and some candidates can be detected as wrong solution based on pre-defined roles. With the purpose of pruning the search space of sketch candidates, we discuss our pruning strategies for sketching assignments and conditions in this section.

3.5.1 Assignment pruning

We define four pruning rules based on the Java syntax and program isomorphism analysis. We omit the proof of these rules as they can be easily derived from the Java syntax. These rules may prune the program based on one assignment (rule 1) or two consecutive assignments in the sketching block (rule 2-4). For the rules below, we use \(e_1\) to represent an expression which can be either variables or field dereferences, and use \(v_1, v_2, v_3\) to represent variables. The method id() returns the candidate identifier of this candidate.

(1) “\(e_1 = e_1\)”. If the left-hand-side expression is equal to the right-hand-side expression, the candidate is ignored as the assignment has no effect on the current program state. For example, any candidates that have the assignment ln1.next = ln1.next will be pruned in the motivating example as shown in Fig. 1.

(2) “\(v_1 = v_2; v_2 = v_1\)”. If the left-hand-side variable of the first assignment is the same as the right-hand-side variable of the second assignment, and the right-hand-side variable of the first assignment is the same as the left-hand-side variable of the second assignment, this candidate is omitted because the two assignments are subsumed by the assignment \(v_1 = v_2\). For example, in Fig. 1, the candidate will be pruned if it has two consecutive statements ln1= ln2; ln2 = ln1.

figure c

(3) “\(v_1 = v_2; v_1 = v_3\)”. If both left-hand-side variables in two consecutive assignments are the same and both right-hand-sides are variables, we do not need to execute this program because it is subsumed by the assignment \(v_1 = v_3\). For instance, the candidate in Fig. 1 with consecutive assignments ln2 = ln4; ln2 = ln3 will be ignored, because the two assignments are equivalent to a single assignment ln2 = ln3, which has been covered by the current search.

(4) “\(v_3 = v_1; v_2 = v_1\)” while \(id(v_3) > id(v_2)\). If two consecutive assignments have the same variable at the right-hand-side, we only execute the program if the identifier of the first assignment’s left-hand-side variable is smaller than that of the second assignment. We consider the consecutive assignments “\(v_2 = v_1; v_3 = v_1\)” and “\(v_3 = v_1; v_2 = v_1\)” as isomorphic solutions and we only evaluate isomorphic solutions once. For example, in Fig. 1, we will not execute the program with consecutive assignments “ln4 = ln2; ln3 = ln2”, as the candidate identifier of ln4 is bigger than that of ln3 and an isomorphic program with “ln3 = ln2; ln4 = ln2” has already been explored.

We also try to apply existing synthesizing optimization strategies to EdSketch. Counter-example-guided inductive synthesis technique (CEGIS) [4] has shown its effectiveness on a number of SAT-based synthesizers to add the validated counter-example to the propositional satisfiability formulas and further expedite the synthesizing process. It does not directly apply to EdSketch as EdSketch is purely based on execution and no SAT translation is involved. The candidate that has been validated against the test execution will never be generated again by EdSketch. Yet we borrow the idea of memorizing information from previous validation results to prune the candidates before validating them with the test execution. We implement a pruning strategy to reuse the validation result of the NullPointerException, i.e., if the first several consecutive assignments lead to a NullPointerException, we will not generate any assignment blocks staring with these assignments. However, our preliminary evaluation result based on the dataset specified in Sect. 4.1 indicates that the pure execution via JVM is often much faster than memorizing and further checking the validation conditions. Therefore, we keep the simplicity of the pruning strategies and leave the improvement in these strategies as future work.

Fig. 5
figure 5

EdSketch pruning example

3.5.2 Condition pruning

To sketch condition expressions including if conditions and while loop conditions, EdSketch first generates all condition candidates and splits these candidates into two groups based on their evaluated values (true and false). Algorithm 3 outlines how EdSketch sketches a condition expression.

Condition Candidate Generation During the first access of the condition sketch, EdSketch generates all condition candidates by combining expression candidates with relational operators rop. We define two relational operators {\(==, !=\)} for non-primitive types and six condition operators for primitive types {\(==\), \(!=\), >, <, \(>=\), \(<=\)}. The primitive type operators are also applied to corresponding wrapper classes such as Integer. We only need to consider the combination \(e_1\)rop\(e_2\) where the candidate identifier of \(e_1\) is smaller than that of \(e_2\) (\(id(e_1) < id(e_2)\)) based on the program symmetry. Assume that \(e_1\) and \(e_2\) are two non-primitive expression candidates, and the \(e_1\)’s candidate identifier is smaller than \(e_2\), we only need to consider the condition candidates \(e_1 == e_2\) and \(e_1 != e_2\) because \(e_2 == e_1\) and \(e_2 != e_1\) are equivalent to the previous two candidates. If we have five expression candidates with the primitive type int, EdSketch will generate 60 (\(6\times (4+3+2+1)\)) condition candidates by combining each candidate with the ones that have bigger identifiers using six condition operators. We also include two constant Boolean value true and false for completeness (\(e_1 == e_1\) and \(e_1 != e_1\)).

Condition Value Grouping The generated condition candidates are further split to two sets based on their evaluated values, shown as line 9 to 20 in Algorithm 3. If it is not the first access, EdSketch will re-evaluate each candidate in the set and split the candidate set based on the evaluated value of each condition candidate in the new execution. If EdSketch selects boolean value true in the previous execution, the trueSet will be split in the current execution, and vice versa. For example, the condition candidate ln1 != null in the motivating example may be true in an iteration, and its value may change to false in the next iteration. Therefore, the condition candidate ln1 != null will be put in trueSet in the first iteration and will be moved to falseSet in the next iteration.

EdSketch chooses a Boolean value at the end of each invocation based on the size of two candidate sets, shown as line 21 to 27 in Algorithm 3. If there is no candidate that is evaluated to be true, EdSketch will select false represented as 1 at line 22 in Algorithm 3. And if the set of candidates which are evaluated to be false is empty, EdSketch will select true represented as 0. The selected Boolean value is returned from the getCondition method together with two candidate sets. If the chosen value does not satisfy test assertions, EdSketch will backtrack to the previous choice, and select a different value based on the non-deterministic choose() operator.

To illustrate the efficacy of our condition value grouping strategy, we present three cases with and without value grouping and compare the number of transitions in each case. As shown in Fig. 5, assume the while condition has eight candidates (\( wc \in \{w_1, w_2,\ldots , w_8\}\)) and the if condition also has eight candidates (\( ic \in \{i_1, i_2,\ldots , i_8\}\)):

(1) “Best Case” In this case, we assume that there is only one candidate that is evaluated to be true for the while loop (\(wc^1\)) and false for the if condition (\(ic^1\)). \(wc^1\) indicates the while loop condition in the first iteration. In this case, no further choices will be created because the program terminates after executing this candidate; we term this case as the “best case” as shown in Fig. 5a. Shown as edges from one node to another, the “best case” requires 4 transitions with value grouping and 16 transitions without value grouping.

Table 1 Evaluation subjects

(2) “Worst Case” We assume that there is only one candidate for which the condition evaluates to false for the while loop (\(wc^1\)) and true for the if condition (\(ic^1\)) when it is executed the first time. EdSketch will keep creating non-deterministic choices until it exhausts all choices; we term this case as the “worst case”. As shown in Fig. 5b, EdSketch has 28 transitions with value grouping strategy, while the traditional approach without value grouping requires 64 transitions.

(3) “Equal Splitting” Lastly, we consider a case where candidates are equally split in each iteration. As shown in Fig. 5c, EdSketch with value grouping strategy requires 12 transitions while traditional approach requires 40 transitions.

Our example illustrates that our value grouping strategy effectively reduces the number of transitions in three cases.

4 Evaluation

To investigate EdSketch ’s efficacy of sketching partial programs, we curate two evaluation datasets: the first dataset consists of small but complex data structures including recursive procedures, these subjects have been used to evaluate SAT-based sketch synthesizers in prior works [1, 4, 13, 14], the second dataset contains partial programs in the presence of libraries and advanced features like reflection.

We address the following research questions in the evaluation:

  • How effective is EdSketch to sketch small but complex subjects compared to the SAT-based synthesizer?

  • How do the pruning strategies affect the search space of sketching?

  • Can EdSketch complete synthesis tasks in the presence of libraries in open-source projects and advanced language features such as reflections?

4.1 Sketching tasks with data structures

To study EdSketch ’s efficacy of sketching small but complex data structures, we select 10 subjects from java.util source code and algorithm book [15] as reference implementation. Based on the reference implementation, we transform all if and while conditions to condition holes, and all expression assignments as assignment holes [16]. We do not evaluate expression holes separately because they are part of the condition and assignment holes.

As shown in Table 1, the 10 subjects are: Binary Search Tree Insertion (BSTAS and BSTCD), Finding Median (MEDAS and MEDCD), Red-Black Tree Insertion (RBTAS and RBTCD), Singly Linked List Reversal (LLREV), Doubly Linked List Add First (DLLAF), Doubly LinkedList Add Last (DLLAL), Red-Black Tree Removal (RBTRM), Fibonacci numbers using recursion (FIB), Singly Linked List Insertion using recursion (LLINS) and Binary Search Tree Insertion using recursion (BSTIS). We evaluate the performance and pruning strategies of sketching assignments and conditions separately considering different pruning strategies to these two types of sketching. We use the first 10 subjects to compare the performance of EdSketch-JPF, EdSketch-JVM and SAT-based Sketch synthesizer and use the last 3 subjects to illustrate EdSketch ’s effectiveness to sketch recursive procedures.

To reach full branch coverage, we use 7 test cases from [17] for Find Median subjects and 4 test cases for the first 4 Fibonacci numbers in the subject of FIB. For the rest of subjects, we use Korat [18] to generate bounded exhaustive test suites. Korat is a test generation tool that uses given constraints to guide the generation of bounded suites. We use bounded exhaustive test suite up to three nodes for binary search tree, singly linked list, and doubly linked list, and test suite up to four nodes for red–black tree. We sort the test cases based on the number of nodes and execute EdSketch with test cases in ascending order.

Table 1 lists the average search space for the first five assignments or conditions. The subjects can have more than five partial expressions (RBTCD has 7 condition holes) or only 4 non-deterministic holes, whose search space of the fifth expression is marked as N/A. For example, the reference implementation of the Singly Linked List Reversal (LLREV) has 4 assignments (Fig. 1), and the search space of candidates for 3 assignments (262.4 K in Table 1) is calculated as the average search space of candidates for all combinations of 3 out of 4 assignments (i.e., the search space of candidates for the assignments \(\{1,2,3\}\), \(\{1,2,4\}\), \(\{2,3,4\}\)). Given 4 variables (head, ln1, ln2, ln3 in Fig. 1), one field dereference (head.next, ln1.next, ln2.next, ln3.next) without considering the default value null, the number of candidates for one assignment are \(8\times 8\). The number of candidates for three assignments (say assignment \(\{1,2,3\}\)) are \(64^3 = 262.4K\), and the average candidate search space for 3 assignments are also 262.4K. Similarly, the search space of candidates for condition holes in the findMedian method will be 96 considering six operators (\(==, !=,>,<, >=, <=\)) and 4 candidates at both sides of the condition, thus synthesizing four conditions can have \(96^4\) candidates. Using these subjects, we first compare the sketching performance of EdSketch with SAT-based Sketch synthesizer, and then illustrate the efficacy of our pruning strategies.

Comparison with Sketch Synthesizer We compare EdSketch with Sketch synthesizer [1], a state-of-the-art SAT-based synthesizer that has been evaluated as a benchmark for other synthesizers [4, 19]. We choose Sketch synthesizer because it can sketch assignments, conditions and expressions similar to EdSketch, whereas other synthesizers focus on API sequences and cannot sketch assignments blocks (e.g., CodeHint [20], SyPet [21] and EdSynth [22].

Fig. 6
figure 6

Singly linked list reversal written in Sketch syntax [1]

We manually transform the Java subjects and test suites to Sketch language, which is a type-based language similar to Java. Figure 6 illustrates the Singly linked list reversal example written in Sketch language. To make sure the subjects written in Sketch language are semantically equivalent to the subjects for EdSketch (Fig. 1), we provide the same variables, field dereferences and relational operators for the subjects written the Sketch language. Note that the keyword minrepeat in Sketch language shares the same logic as EdSketch: adding one statement at a time until a solution is found.

We execute EdSketch-JVM, EdSketch-JPF, and Sketch synthesizer on the subjects using the same test suites and report the time when they find the first solution that satisfies all test cases. All performance experiments are conducted on a MacBook Pro with 2.2 GHz Intel Core i7 processor and 16 GB of 1600 MHz DDR3 memory running OS X version 10.12.1.

Figure 7 represents the sketching performance time of EdSketch-JVM, EdSketch-JPF, and Sketch synthesizer for sketching different numbers of assignments. The x-axis shows the number of statements under sketching, and the y-axis represents the average performance time for sketching tasks with specific number of assignments. The y-axis in for the last 4 subjects in Fig. 7 is transformed with log2 scale for better display. The green line with triangle represents the performance of EdSketch-JVM, the red line with circle represents EdSketch-JPF, and the blue line with square represents Sketch synthesizer.

Figure 7 indicates that EdSketch is able to sketch small but complex data structures with a better performance compared to SAT-based inductive synthesizer in 20 out of 22 sketch tasks. For example, for the subject LLREV, EdSketch-JVM sketches the first correct solution with 5 assignments in 3.1 s while Sketch synthesizer takes 11.7 s for the same task. EdSketch-JVM is slower than Sketch synthesizer in 2 experiments: DLLAF with four assignments (22.4 vs. 1.9 s) and the subject DLLAL with 4 assignments (25.8 vs. 2.8 s). The average performance time for EdSketch-JVM is 16.2 s while the Sketch synthesizer is 25.1 s.

Figure 8 presents the performance of three tools for sketching conditions, including if conditions and while conditions. For instance, for the subject RBTCD, EdSketch-JVM spends an average of 0.06 s sketching 6 conditions while Sketch synthesizer spends 58.9 s. It might because the transformation of red-black tree implementation to boolean formulas is not trivial. and it takes a long time for the SAT solver to find a solution. EdSketch-JVM is faster than Sketch synthesizer in 19 out of 21 experiments. EdSketch-JVM is slower than Sketch synthesizer in the subject BSTCD with three and four conditions (1.7 vs. 0.8 s and 10.5 vs. 1.7 s).

Fig. 7
figure 7

Compare the performance of sketching assignments

Fig. 8
figure 8

Compare the performance of sketching conditions

Previous work [19] conjectured that the backtracking solver purely based on concrete execution will be much slower than SAT-based solver in exploring large state space due to the highly optimized heuristics used by modern SAT engines. However, our experiments show that EdSketch with our pruning strategies for sketching assignments and conditions is comparable or even sometimes faster than the SAT-based synthesizer.

Comparison of two prototypes In our experiments, stateful search using Java PathFinder (JPF) is always slower than the dedicated stateless search. JPF is a general purpose model checker that implements a custom JVM to handle all of Java bytecode, including multithreading programs. JPF provides an off-the-shelf backtrack engine, which is a very convenient way to implement a solution for the sketching problem, albeit with sub-optimal performance, it suffers from the performance cost on saving and restoring previous program state. Yet using JPF as the backend for sketching opens the future work possibility to sketch concurrent programs [11] with systematic path exploration. We leave the sketching for multithreading programs as future work.

Fig. 9
figure 9

Compare the pruning efficacy for sketching assignments

Fig. 10
figure 10

Compare the pruning efficacy for sketching conditions

Efficacy of Pruning Strategies To evaluate whether our pruning strategies can effectively reduce the number of candidates before executing given test suites, we report the number of executed programs with and without pruning strategies when EdSketch finds the first solution that satisfies all test cases.

Figure 9 presents the number of executed programs with and without pruning rules for sketching assignment subjects. The x-axis represents the number of assignments and the y-axis represents the average number of executed programs for sketching a certain number of assignments. The last 4 subjects in Fig. 9 are transformed to square-foot scale for better display. The black bars represent the number of executed programs with pruning rules and the gray ones represent the number of executed programs without pruning rules. As shown in Fig. 9, our pruning rules can effectively prune 7% to 70% with an average of 21% candidates before executing the test suite. For instance, the pruning rules discard 68.4% of candidates for sketching four assignments in the subject LLREV, that is, only around one third of the candidates are actually executed with the test suite and more than two thirds of them are pruned before being executed. The subjects BSTAS and MEDAS have a lower pruning rate (the percentage of candidates being pruned out of all candidates) compared to other subjects, because the sketched assignments in these two subjects are scattered in different if–else conditions and only the rule 1 of the pruning rules can apply to these two subjects.

Figure 10 presents the number of executed programs with and without condition pruning strategy for sketching condition subjects. The y-axis for all subjects in Fig. 10 are transformed to log2 scale for better display. Our value grouping strategy can effectively prune an average of 56% of candidates before executing test suites. Note that for sketching one condition in the subject RBTRM, 35 programs are executed on average with value grouping strategy, while only 15 programs are evaluated without value grouping strategy. This result indicates that the value grouping strategy might not always bring in saving when the search space is small, which is different from the pruning rules for assignments.

In summary, EdSketch effectively prunes an average of 21% of candidates for assignment sketching and more than half candidates for condition sketching.

Fig. 11
figure 11

A sketching example of Fibonacci numbers using recursion

Sketching Recursive Procedures Sketching recursive procedures is known to be a challenging problem in program synthesis [13, 14, 23]. A handful of previous works utilize heuristic search [13], reusable templates [14] and formal specifications [23] to synthesize recursive programs. Sketch synthesizer has been used as a performance benchmark for these works. We follow the same spirit [13] to compare EdSketch with Sketch synthesizer using recursive tasks borrowed from prior works [13, 14, 23]: Fibonacci number (integer program), link list insertion (list) and binary search tree insertion (tree).

Figure 11 presents the subjects of generating Fibonacci numbers using recursion. Similar to prior works on synthesizing recursive procedures [13, 14, 23], we create a sketch with a condition hole and a hole for the arithmetic operators ({+, -, \(\times \), /}). We use four JUnit test cases to check the first four Fibonacci numbers starting from 0. As described in Sect. 3.3, EdSketch instruments the recursive procedure with a bound of the recursive execution to avoid infinite recursion (Fig. 11a). EdSketch completes the sketching task in 1 s. Yet using semantic-equivalent sketches and test harnesses, Sketch synthesizers throws an out-of-memory exception with 2GB memory limit with the same default setting.

For the other two recursive examples for linked list insertion and binary search tree insertion, we introduce an average of one condition holes and four expression holes based on the reference implementation [13, 14, 23]. Leveraging the test generation tool Korat [18], we generate a bounded exhaustive test suite up to 3 nodes and convert the generated tests to JUnit format. Both EdSketch and Sketch synthesizer successfully complete the recursive synthesis task within 1 s. Our results demonstrate the EdSketch ’s efficiency at synthesizing a broad range of programs including recursive procedures.

4.2 Sketching with libraries and advanced features

EdSketch is able to explore the actual runtime behaviors in the presence of libraries and advanced features such as reflections, File I/O and native calls. We only use EdSketch-JVM to complete the sketch tasks in this section. The SAT-based Sketch synthesizer and other translation-based synthesizer can hardly tackle these tasks because it’s not trivial to translate third-party libraries, reflections and native calls to propositional satisfiability formulas and leverage SAT/SMT solvers to complete the sketches.

Fig. 12
figure 12

Sketching tasks with advanced features

Sketching Tasks with Advanced FeaturesEdSketch evaluates code with concrete program execution and hence can sketch real-world Java code with advanced features. We illustrate EdSketch ’s ability to sketch program with reflection, I/O and native call using the prototype EdSketch-JVM.

Figure 12a presents a program sketch of the repOK() method for Binary Search Tree derived from Korat’s evaluation dataset [18]. At line 11, we try to sketch the object whose left field will be used in the if condition. We provide a bounded exhaustive test suite for up to 3 nodes and execute EdSketch on the given test suite from Korat’s evaluation dataset. EdSketch completes this task with the variable expression cur. This example also indicates that EdSketch can be used in a wider scope of sketching tasks, such as sketching a formal specification written in Java.

Figure 12b shows a getSum() method that reads two variables from a file and outputs their sum. The task sketches the infix expression \(a+b\) for the return statement, and EdSketch completes this task with three test cases (0, 0), (0, 1), and (2, 1).

Figure 12c presents a sketch task with native calls. EdSketch sketches an incomplete swap() method for two integers and returns the string concatenation for the swapped integers in a native method nativeToString() using three test cases (0, 0), (0, 1), and (2, 1).

Table 2 Sketching tasks in presence of libraries from open-source projects

Sketching Tasks in Presence of Libraries We evaluate EdSketch ’s efficacy of sketching tasks in the presence of libraries on 10 tasks derived from human-written patches and use the original test cases from the open-source projects as the synthesis criteria. These patches are randomly sampled from five open-source projects: JFreeChart (Chart [24]), Closure compiler (Clo- sure [25]), Apache Commons-Lang (Lang [26]), Apache Commons-Math (Math [27]) and JodaTime (Time [28]), which have been used to evaluate a number of prior works [29,30,31].

Table 2 presents these 10 sketching tasks. Column Commit represents the commit number we refer to when generating the sketches. Part of the human-written patches are shown in the next column. The last column represents the sketches written in EdSketch syntax to simulate the original patches written by the developers.

For instance, the No. 7 task is derived from the commit c9d786 of the open-source project Apache com- mons-lang (Lang [26]). The version we use has 234 files, 55 thousand lines of Java code, and more than 10 external libraries including maven plug-ins. Derived from the RandomStringUtils class, the sketch task involves an if condition and an assignment. Each invocation of this sketch involves more than 20 methods and 10 JUnit test cases. EdSketch generates a complete program that is identical to the original human-written patch in 8.8 s.

4.3 Discussion and threats to validity

In this section, we discuss some alternative setting for EdSketch which may affect its sketching efficacy and discuss the threats to validity of our experiments.

We first investigate the order of the test cases and its influence on EdSketch ’s sketching efficacy. We sort the bounded exhaustive test cases based on the number of nodes, execute EdSketch with test cases in ascending order and descendent order, and compare the performance time of finding the first solution for all subjects. The result illustrates that the performance is almost the same with test cases reordered (5.1 vs. 5.7 s, Wilcoxon test \(p>0.05\)). It might be because the subjects we select are relatively small with a small number of test cases and the time to evaluate test cases is negligible, and thus the prioritization of test cases has little influence on the total performance.

We then investigate the order of selecting left-hand-side and right-hand side expressions for sketching assignments. We have two options to sketch an assignment: select left-hand-side expression first and then select right-hand side expression, or vice versa. We compare the sketching performance with these two options on sketching assignment subjects. Based on our subjects, we find that sketching right-hand side first performs faster than the other option, especially for the experiments with large search space. For instance, EdSketch spends 22.4 s sketching four assignments for the subject DLLAF by selecting right-hand side first, whereas it takes 112.5 s with the other option. Yet this difference is not significant for the experiments with small search space (sketching less than 4 assignments) based on Wilcoxon tests (\(p>0.05\)). We report the performance time by selecting right-hand side first in our performance evaluation.

To further investigate the efficacy of our pruning strategies, we execute EdSketch to find all solutions that satisfy the test suite and compare the pruning rate, i.e., the percentage of candidates that have been pruned out of all candidates. The result shows that the pruning rate for the first solution is similar to that of all solutions (35% vs 37%), indicating that the efficacy of our pruning strategies is consistent in finding the first solution and all solutions based on our subjects.

External Validity We only compare EdSketch with a SAT-based counter-example-guided inductive synthesizer with a small but complex dataset. Our performance comparison results may not extend to other SAT-based synthesizers in a different dataset. Yet Sketch synthesizer has been used as a benchmark for a number of counter-example-guided inductive (CEGIS) synthesizers [4] with reasonable performance. EdSketch sketches expressions, conditions and assignments, our idea of execution-driven sketching can be extended to sketch method invocations [22], which has been shown as comparable with the state-of-the-art synthesizers for API sequences, such as SyPet and CodeHint.

Construct Validity In our performance comparison experiment, we manually transform the Java program to Sketch language, which may have inadvertently introduced behavioral differences. We list all subjects and tests at [16] for cross-validation.

Internal Validity To sketch partial programs, we use bounded exhaustive test suites generated by Korat [18] based on formal specification, yet these test cases might still lead to plausible sketching results, i.e., passing all test cases but is incorrect from the perspective of users. In our experiment, we manually inspect the first generated solution for each subject to validate their correctness.

5 Related work

Sketch-based synthesis Program synthesis has had numerous successes on synthesizing code in small well-defined domains such as bit-vector logic [32] and data structures [33]. They transform partial programs [1], input-output examples [5] or oracles [32] to decision procedures and leverage SAT/SMT solvers to complete the procedures based on the given specification. These techniques are very efficient in certain domains that have been fully modeled [34]. Sketch-based synthesis [1] is a sub-problem of program synthesis that asks programmers to write a draft program containing missing expressions, and uses counter-example-guided inductive synthesis to complete the holes. Jsketch brings sketch-based synthesis to Java [6]. Given a partial Java program written in the sketch syntax, Jsketch translates the Java program to Sketch synthesizer and transfers the synthesizer’s result back to executable Java code. Yet it only supports a limited subset of Java libraries and does not support access control and exceptions. Jsketch is confined to the limitations of translation-based sketching approaches. EdSketch does not translate program sketches and test suites to SAT formulas. It directly executes test cases thus can sketch real-world Java applications that involve advanced language features like reflection. While a primary focus of sketching has been on imperative and functional languages, ASketch [35] introduces a test-driven approach for sketching declarative models in Alloy [36]—a relational first-order logic with transitive closure.

API sequence synthesis Another group of synthesis tools [20, 21, 37] try to synthesize a composition of API calls, where each method takes some arguments and returns a non-void value. Prospector [37] and CodeHint [20] synthesize Java APIs and evaluate code at runtime, thus in theory they could also apply to programs with libraries and advanced features such as reflection. EdSketch focuses on synthesizing conditions and blocks of assignments, whereas Prospector [37], CodeHint [20] and SyPet [21] focuses on API chain completion. Our idea of execution-driven sketching can be extended to synthesize sequences of API method invocations [22], which has been shown as comparable to SyPet [21] and CodeHint [20].

Angelic Programming Similar to our approach, angelic programming [19] leverages the non-deterministic backtrack algorithm [38]. Barman et al. [19] embed the angelic choice construct into the Scala programming language and build a parallel backtracking solver to explore the scalability of their backtracking solver. Without any pruning strategies, this approach scales up much faster compared to the SAT-based Sketch synthesizer. We illustrate that our pruning strategy is efficient and the performance of EdSketch on our dataset is comparable or even faster than a SAT-based synthesizer. We also demonstrate that our approach can be extended to real-world Java code that involves reflection and external libraries.

Code completion Code completion refers to the generation of small code snippets. Perelman et al. [39] infer partial expressions using type-directed completion and InSynth [34] handles high-order functions and polymorphism using theorem proving. Yet both are confined to single-statement synthesis. Slang [40] predicts probabilities of API calls using statistical models based on machine learning. Strathcona [41] assists developers for relevant API invocations from similar program contexts. Different from code completion tools based on probabilistic models, EdSketch ensures that the generated solution satisfies all test assertions.

Program Repair Program repair [31, 42, 43] for automated debugging addresses a similar technical problem as program synthesis. Our program isomorphism analysis is similar to AE [44], which determines semantic equivalence with respect to test cases. AE considers duplicate statements and dead code elimination in terms of variable dependency. Our condition value grouping is similar to SPR’s notion of abstract condition values [45], which only considers Boolean value true and false as abstract symbols instead of considering all condition candidates. Angelix [46] leverages symbolic analysis and constraint solving to generate repairs for real-world C programs. The idea of finding an angelic value is similar to our approach, but it cannot generate new statements while EdSketch is able to sketch multiple assignments. Hua et.al [47] reduce the problem of program repair to program sketching by translating the faulty program into a sketch of a correct program, and leverages SAT solvers to generate a complete program that satisfies all test assertions. SketchFix [31] applies execution-driven sketching technique to program repair and automatically generates sketches to fix problematic conditions and expressions. Instead of fixing existing programs, EdSketch is designed in the purpose of sketching partial expressions, conditions, and a block of assignment statements.

6 Conclusion

This paper presents a novel execution-driven sketching approach that synthesizes Java programs using backtracking search. Our key insight is to introduce effective pruning strategies to reduce the search space for solutions and explore the actual program behaviors by executing the given test suite. EdSketch can synthesize Java code that may use complex constructs of imperative languages, such as reflection and native calls, and can be applied to recursive procedures and large-scale projects in the presence of libraries. Our experiments show that our approach is comparable or even sometimes faster than a SAT-based synthesizer on our dataset. We believe our execution-driven approach holds a key to practical and scalable solutions to a wide-class of synthesis problems.