
1 Introduction

Geometric Semantic Genetic Programming (GSGP) [16] exploits the spatial properties of program semantics in order to improve the effectiveness of program synthesis. The operators proposed within this branch of genetic programming (GP) have well-understood effects on program behavior on tests, and some of them even guarantee producing programs whose semantics remain in a certain geometric relationship with the parent(s). As a result, the dynamics of a GSGP search process is in general more predictable than that of conventional GP; GSGP methods are often superior in terms of performance [3, 4, 16, 20, 23] and lend themselves conveniently to theoretical analysis [17–20, 22].

The majority of research effort in GSGP focuses on search operators, which is not surprising given that successful program synthesis is directly contingent on them. However, as in other evolutionary computation algorithms, the performance of GSGP also depends on the starting point of the search process, i.e., on the contents of the initial population. It is often assumed that the exploratory capabilities of evolutionary search weaken that dependency. This claim can, however, be questioned in GSGP because of the more predictable, more ‘directional’ and targeted behavior of its search operators mentioned above.

Our case in point in this study is the convergent character of semantic geometric crossover (Sgx). The exact variant of this search operator [16] is guaranteed to produce an offspring with semantics located in the segment connecting the parents’ semantics. As we showed in [20], this implies that for a GP run equipped with Sgx only, the set of all semantics that can be reached in a search process is determined by the initial population. A scenario in which the initial population precludes arriving at the program with the desired semantics (the target) is plausible, and the odds of it grow with the dimensionality of semantics (the number of tests (fitness cases)). This risk was largely ignored in past studies on GSGP, in part due to the widespread use of mutation along with crossover. Nevertheless, this effect deserves better understanding. Also, as we will argue further, it may be worth addressing this issue even if mutation accompanies crossover as a search operator.

The main contribution of this study is the observation that an alternative (to mutation) remedy for Sgx’s high susceptibility to initial conditions is to construct the starting population more carefully. We propose Semantic Geometric Initialization (Sgi), a semantically aware initialization method that designs the initial population with the search target in mind. A population initialized with Sgi is guaranteed to make the target semantics achievable with Sgx. Due to its stochastic nature and obliviousness to the target, Sgx is still not guaranteed to synthesize the correct program when initialized with Sgi; however, such a success becomes much more likely, as we demonstrate in the experimental part of this study.

The following Sect. 2 briefly introduces the necessary formalisms. Section 3 uses that formal framework to identify the problem signaled above, i.e., that the initial population imposes strict constraints on the set of semantics that can be reached with Sgx. Section 4 presents the Sgi algorithm and justifies its design. Section 5 argues that Sgi is fundamentally different from population initialization methods proposed in the past (including the semantically aware ones), and Sect. 6 demonstrates Sgi’s usefulness empirically on a suite of well-known GP problems. Section 7 discusses the main results, and Sect. 8 summarizes this study and outlines potential follow-up directions.

2 Background

We define a program \(p\in \mathcal {P}\) in a programming language \(\mathcal {P}\) as a function that maps a set of inputs \(\mathcal {I}\) into a set of outputs \(\mathcal {O}\), which we denote by \(o=p(in)\), where \(in\in \mathcal {I}\) and \(o\in \mathcal {O}\). We consider only deterministic programs that feature no side effects and no memory persistent across executions. A semantics \(s\in \mathcal {S}\) is a vector \(s=[o_{1},o_{2},\ldots ,o_{n}]\in \mathcal {O}^{n}\), where we refer to \(\mathcal {O}^{n}\) as the semantic space (a vector space), and \(o_{i}\) is the output corresponding to the ith element of the n-tuple of program inputs from \(\mathcal {I}^{n}\) that defines the considered program synthesis task. The semantics s(p) of a program p is the vector of p’s outputs when executed on a fixed set of inputs \(I\subset \mathcal {I}\), i.e., \(s(p)=[p(in_{1}),p(in_{2}),\ldots ,p(in_{n})]\), \(in_{i}\in I\).
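For concreteness, the following minimal Python sketch (ours, not part of the formalism; the example program and inputs are hypothetical) computes the semantics of a program as the vector of its outputs on a fixed list of inputs:

```python
# Illustrative only: semantics of a program as the vector of its outputs
# on a fixed list of inputs (the program and inputs below are hypothetical).

def semantics(program, inputs):
    """Return s(program) = [program(in_1), ..., program(in_n)]."""
    return [program(x) for x in inputs]

p = lambda x: x * x + 1          # a candidate program (symbolic regression)
I = [0.0, 0.5, 1.0, 1.5]         # the fixed inputs defining the synthesis task
print(semantics(p, I))           # [1.0, 1.25, 2.0, 3.25]
```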

The concept of semantics allows reasoning about program behavior in terms of n-dimensional spaces. Each program has a well-defined semantics, i.e., a point in the semantic space \(\mathcal {O}^{n}\). The desired outputs given by a specific synthesis problem uniquely determine a point t in that space, called the target (or target semantics). If the fitness function happens to be a metric (which is almost always the case in GP), the fitness landscape defined over \(\mathcal {O}^{n}\) is a unimodal cone with the apex in t [22]. These properties open the door to defining spatial relationships between program semantics, and to investigating whether particular search operators obey them and what impact those properties have on the efficiency of search. The concepts of particular importance here are geometric mutation and geometric crossover.

Definition 1

Given a parent program p, an r-geometric mutation is an operator that produces an offspring \(p'\) with semantics \(s(p')\) in a ball of radius r centered in s(p), i.e., \(\left\| s(p),s(p')\right\| \le r\).

Definition 2

Given parent programs \(p_{1},p_{2}\), a geometric crossover is an operator that produces an offspring \(p'\) with semantics \(s(p')\) in a segment between \(s(p_{1})\) and \(s(p_{2})\), i.e., \(\left\| s(p_{1}),s(p_{2})\right\| =\left\| s(p_{1}),s(p')\right\| +\left\| s(p'),s(p_{2})\right\| \).

A crossover operator with the above property has ideal ‘mixing’ characteristics: the semantics of the offspring is located ‘in between’ the parents’ semantics. This is in stark contrast to the highly unpredictable semantics of programs produced by conventional search operators (see, e.g., the arguments in [11]).

The quest for practical algorithms that implement geometric search operators lasted for several years. Among others, multiple approximately geometric crossovers have been proposed [10–12, 20, 23, 24]. The breakthrough came with the publication of [16], where exact versions of geometric crossover and mutation were proposed, defined as follows for particular domains.

Definition 3

(Algorithms for geometric mutation, Sgm) Symbolic regression: Given a parent arithmetic program p, an offspring is a program \(p'=p+r(m_{1}-m_{2})\), where \(m_{1}\) and \(m_{2}\) are random arithmetic programs that output values in the range [0, 1]. Boolean domain: Given a Boolean parent program p, an offspring is a program \(p'=m\vee p\) with probability 0.5, and \(p'=\overline{m}\wedge p\) otherwise, where m is a random minterm.

Definition 4

(Algorithms for geometric crossover, Sgx) Symbolic regression: Given parent arithmetic programs \(p_{1},p_{2}\), an offspring is a program \(p'=mp_{1}+(1-m)p_{2}\), where m is a random arithmetic program that returns values in the range [0, 1]. Boolean domain: Given parent Boolean programs \(p_{1},p_{2}\), an offspring is a program \(p'=(p_{1}\wedge m)\vee (\overline{m}\wedge p_{2})\), where m is a random Boolean program.
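To illustrate the semantic effect of these operators in the symbolic regression domain, consider the following sketch (our own simplification, not the implementation of [16]; in particular, the random program m is replaced by a random constant in [0, 1], which suffices to show the geometric property):

```python
import random

def sgx(p1, p2):
    """Geometric crossover (Def. 4, symbolic regression): p' = m*p1 + (1-m)*p2.
    Simplification: m is a random constant in [0, 1] instead of a random program."""
    m = random.random()
    return lambda x: m * p1(x) + (1 - m) * p2(x)

def sgm(p, r=0.1):
    """Geometric mutation (Def. 3, symbolic regression): p' = p + r*(m1 - m2).
    Simplification: m1, m2 are random constants in [0, 1]."""
    m1, m2 = random.random(), random.random()
    return lambda x: p(x) + r * (m1 - m2)

p1, p2 = (lambda x: x), (lambda x: x + 2)
child = sgx(p1, p2)
# For every input, the child's output lies between the parents' outputs,
# so its semantics lies on the segment connecting the parents' semantics.
print([child(x) for x in (0.0, 1.0, 2.0)])
```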

3 The Problem

The Sgm and Sgx operators are geometric by construction: the offspring is guaranteed to be geometric with respect to its parents in the sense of Definitions 1 and 2. Expectedly, they deliver superior search performance in all domains, as shown experimentally in [16] and further studies. However, Sgm is the key to that success: it is indispensable for good performance, while Sgx used alone sometimes fails to converge to the sought program [16, 23].

The reason for this state of affairs is the ‘centripetal’ character of Sgx, which can produce only offspring with semantics in the segment connecting the parents’ semantics. On the one hand, this is highly desirable given that the fitness landscape in semantic space is unimodal: an application of Sgx to any pair of solutions in that space is guaranteed to produce an offspring with attractive properties – for instance, when both the fitness and the operator’s metric are the Euclidean distance, an offspring that is not worse than the worse of the parents [22]. On the other hand, however, if Sgx is the only search operator used for program synthesis, the set of semantics achievable from a given initial population \(P\subset \mathcal {P}\) is limited to the convex hull spanned by the semantics of programs in P, since that convex hull incorporates all segments between the semantics of all pairs of programs in P. Formally (cf. [20]):

Theorem 1

Consider a population \(P_{1}\) of programs and a search process that starts from \(P_{1}\) and uses Sgx to generate subsequent generations of programs. A program having semantics t can be achieved in that search process iff the convex hull of the semantics of programs in \(P_{1}\) includes t.

Proof

Let \(P_{g}\) be the population in generation \(g\ge 1\), and let \(C_{g}\) be the convex hull of the semantics of programs in \(P_{g}\). For any pair of parent programs \(p_{1},p_{2}\in P_{g}\), the semantics \(s(p')\) of an offspring \(p'\) is included in the segment between \(s(p_{1})\) and \(s(p_{2})\), which in turn is included in \(C_{g}\). The set of all such offspring constitutes the population \(P_{g+1}\) of generation \(g+1\), and since their semantics all lie in the convex set \(C_{g}\), we have \(C_{g+1}\subseteq C_{g}\). By induction, \(C_{g+1}\subseteq C_{g}\subseteq \ldots \subseteq C_{1}\). Hence, semantics t can be achieved in generation g iff \(t\in C_{g}\subseteq C_{1}\).

The choice of ‘t’ as the symbol denoting the semantics in question is not incidental: the above theorem applies in particular to the target, with profound consequences. If t happens to be located outside the convex hull, applications of Sgx to P, no matter how many, cannot lead to a program with semantics t. Unfortunately, this is relatively likely for populations initialized in conventional, semantics-unaware ways. For instance, in symbolic regression, the popular Ramped Half-and-Half (Rhh) method [9] tends to produce programs with relatively simple semantics, typically clustered around the origin of the coordinate system of the semantic space [13]. A target located far away from that origin is likely to lie outside the convex hull and thus be unreachable by the actions of Sgx.
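The reachability condition of Theorem 1 can be verified directly for a concrete initial population: under the standard (Euclidean) notion of segment and convex hull, t lies in the convex hull of the initial semantics iff it is a convex combination of them, which reduces to a small feasibility linear program. A sketch of such a check (ours, using scipy; the paper does not describe this procedure):

```python
import numpy as np
from scipy.optimize import linprog

def in_convex_hull(t, semantics):
    """Return True iff target t is a convex combination of the rows of
    `semantics` (the semantics of the initial population)."""
    S = np.asarray(semantics, dtype=float)   # shape (popsize, n)
    t = np.asarray(t, dtype=float)           # shape (n,)
    k = S.shape[0]
    # Feasibility LP: find weights w >= 0 with sum(w) = 1 and S^T w = t.
    A_eq = np.vstack([S.T, np.ones((1, k))])
    b_eq = np.concatenate([t, [1.0]])
    res = linprog(c=np.zeros(k), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * k, method="highs")
    return res.success

# A target far from the near-origin semantics typical of Rhh is unreachable:
print(in_convex_hull([5.0, 5.0], [[0, 0], [1, 0], [0, 1]]))  # False
print(in_convex_hull([0.3, 0.3], [[0, 0], [1, 0], [0, 1]]))  # True
```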

This observation, though rarely formalized in the above way in past literature, was one of the reasons for using mutation operators alongside Sgx (the other reason being Sgx’s propensity to produce large programs). The natural operator of choice in this context is Sgm, as it is provably capable of reaching the target from an arbitrary starting location in semantic space (even when used alone; see the semantic stochastic hill climber in [16]). However, it may require many iterations and by the same token produce large programs. In the following, we propose an alternative way of making the target achievable for Sgx.

4 Semantic Geometric Initialization

In the light of Sect. 3, it is evident that the target becomes reachable for Sgx once it belongs to the convex hull spanned by the semantics of programs in the initial population. In this section, we propose Semantic Geometric Initialization (Sgi), a method that achieves this goal by means of semantics- and geometry-aware population initialization.

Algorithm 1. Wrap: construction of a set of semantics whose convex hull encloses the target t.

The input to the method is the target t. The algorithm proceeds in two steps:

  1. Use the function Wrap (Algorithm 1) to generate a set of semantics \(S\subset \mathcal {O}^{n}\) such that the convex hull of S encloses the target t, i.e., \(t\in C(S)\),

  2. For each semantics \(s_{i}\in S\), synthesize a program p such that \(s(p)=s_{i}\).

The realization of each step is domain-dependent. For the first step we provide an abstract Algorithm 1, which is specialized below for the symbolic regression and Boolean function synthesis domains. It calculates a set of semantics that wrap t in their convex hull in the semantic space \(\mathcal {O}^{n}\). The algorithm iteratively constructs new semantics by modifying k components (dimensions) of the target t. For each subset of k components of t, the algorithm attempts to construct \(2^{k}\) semantics by applying to these components all combinations of two domain-dependent functions: AddOne and SubtractOne. We gather the resulting semantics in the output set S while discarding duplicates. We start with \(k=1\) and increment it each time all k-component subsets have been considered.
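The following Python sketch is our reconstruction of Wrap from the description above (the pseudocode of Algorithm 1 is not reproduced here, so details such as the enumeration order of subsets and the exact stopping condition are assumptions); it is instantiated with the arithmetic AddOne/SubtractOne defined in the next paragraph:

```python
from itertools import combinations, product

def wrap(t, popsize, add_one, subtract_one):
    """Generate a set S of semantics whose convex hull encloses t.
    Reconstructed from the textual description of Algorithm 1."""
    S, n, k = [], len(t), 1
    while len(S) < popsize and k <= n:
        for dims in combinations(range(n), k):            # k components of t
            for ops in product((add_one, subtract_one), repeat=k):
                s = list(t)
                for d, op in zip(dims, ops):
                    s[d] = op(s[d])                        # modify chosen components
                if s != list(t) and s not in S:            # discard duplicates
                    S.append(s)
                if len(S) >= popsize:
                    return S
        k += 1                                             # all k-subsets considered
    return S

# Symbolic-regression instantiation (AddOne/SubtractOne as defined below):
S = wrap([2.0, 3.0], popsize=4,
         add_one=lambda v: v + 1, subtract_one=lambda v: v - 1)
print(S)  # [[3.0, 3.0], [1.0, 3.0], [2.0, 4.0], [2.0, 2.0]]
```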

For the symbolic regression domain, we define AddOne and SubtractOne functions using arithmetic operations:

$$\begin{aligned} \textsc {AddOne}(t_{a_{i}}) \equiv t_{a_{i}}+1 \qquad \quad \textsc {SubtractOne}(t_{a_{i}}) \equiv t_{a_{i}}-1 \end{aligned}$$

The Wrap algorithm, when instantiated with these functions and invoked with \(popsize\ge 2\), is guaranteed to construct a set of semantics S such that \(t\in C(S)\). This is because there exist a dimension i of t and semantics \(s^{1}\) and \(s^{2}\) such that \(s_{i}^{1}<t_{i}<s_{i}^{2}\) and \(\forall _{j\ne i}s_{j}^{1}=t_{j}=s_{j}^{2}\). Thus, under any metric, t is included in the segment between \(s^{1}\) and \(s^{2}\) – a degenerate case of a convex hull.

Concerning program synthesis, for each semantics \(s\in S\) calculated by Algorithm 1, Sgi constructs a program using multivariate polynomial interpolation, as described in [26]. The set of points used in the interpolation combines the program inputs \(in\in I\) on which the program’s semantics is to be calculated with the corresponding components of s, i.e., \((in_{i},s_{i}),\,i=1,\ldots ,n\).
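As a rough illustration of this synthesis step, the sketch below performs exact polynomial interpolation for the univariate case only (the paper relies on the multivariate scheme of [26], which is not reproduced here):

```python
import numpy as np

def synthesize_regression_program(inputs, s):
    """Return a polynomial p with p(inputs[i]) == s[i] for all i
    (univariate case; the paper uses multivariate interpolation [26])."""
    coeffs = np.polyfit(inputs, s, deg=len(inputs) - 1)   # exact fit
    return lambda x: float(np.polyval(coeffs, x))

inputs = [0.0, 1.0, 2.0]
s = [1.0, 0.0, 5.0]                       # one of the semantics produced by Wrap
p = synthesize_regression_program(inputs, s)
print([round(p(x), 6) for x in inputs])   # [1.0, 0.0, 5.0]
```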

For Boolean function synthesis, the definitions of AddOne and SubtractOne are Boolean counterparts of the above arithmetic operations, limited to the corners of the unit hypercube \(\{0,1\}^{n}\). Since there are only two values in the Boolean domain, 0 (false) and 1 (true), the Boolean addition of 1 results in 1 (i.e., \(q\vee 1\equiv 1\)) and the Boolean subtraction of 1 results in 0 (i.e., \(q\wedge 0\equiv 0\)) for any term, so the functions reduce to constants:

$$\begin{aligned} \textsc {AddOne}(t_{a_{i}}) \equiv 1 \qquad \quad \textsc {SubtractOne}(t_{a_{i}}) \equiv 0 \end{aligned}$$

For \(popsize\ge 2\), Wrap is guaranteed to include the target t in C(S), because there exist a dimension i of t and semantics \(s^{1}\) and \(s^{2}\) such that \(s_{i}^{1}=t_{i}\ne s_{i}^{2}\) or \(s_{i}^{1}\ne t_{i}=s_{i}^{2}\), and \(\forall _{j\ne i}s_{j}^{1}=t_{j}=s_{j}^{2}\). Sgi synthesizes the Boolean program \(p_{i}\) for semantics \(s_{i}\in S\) using the following formula:

$$\begin{aligned} p_{i}=\bigvee _{j:\,s_{ij}=1}y_{j},\qquad y_{j}=\bigwedge _{k}{\left\{ \begin{array}{ll} x_{k} &{} \text {if }in_{jk}=1\\ \overline{x_{k}} &{} \text {otherwise} \end{array}\right. } \end{aligned}$$
(1)

where \(in_{jk}\) is the value of the kth variable of the jth input used to calculate the semantics, \(y_{j}\) is the minterm that is 1 for the jth input, and \(x_{k}\) is the kth program argument.
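The following sketch illustrates this DNF construction as we read it (a disjunction of the minterms of those inputs on which the desired semantics equals 1); it is our illustration, not the authors’ code:

```python
def synthesize_boolean_program(inputs, s):
    """Return a Boolean program with semantics s on `inputs`: the disjunction
    of the minterms y_j of all inputs j for which s_j = 1."""
    def minterm(in_j):
        # y_j is true exactly for the input in_j
        return lambda x: all(xk if bit else not xk for xk, bit in zip(x, in_j))
    active = [minterm(in_j) for in_j, sj in zip(inputs, s) if sj]
    return lambda x: any(y(x) for y in active)

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]   # all inputs of a 2-argument problem
s = [1, 0, 0, 1]                            # desired semantics (here: XNOR)
p = synthesize_boolean_program(inputs, s)
print([int(p(x)) for x in inputs])          # [1, 0, 0, 1]
```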

5 Related Work

To our knowledge, Sgi is the first semantically and geometrically aware population initialization method in GP; it is, however, not the first semantically aware one.

Looks [13] proposed the Semantic Sampling (Ss) heuristic that produces a population of semantically unique Boolean programs with uniformly distributed program sizes. Ss partitions a population into bins by program size and fills them up to an assumed capacity with semantically unique programs. Sgi differs from Ss in its awareness of the geometry of semantic space. Also, Sgi can operate in any domain for which the AddOne and SubtractOne functions can be defined (and efficiently computed).

Table 1. Parameters of evolutionary algorithms.
Table 2. Benchmark problems.

Beadle and Johnson [1] proposed Semantically Driven Initialization (Sdi), which fills the population with semantically unique programs. Sdi was designed for Boolean and artificial ant problems and uses a reduced ordered binary decision diagram or a sequence of ant moves, respectively, as the representation of a program’s semantics. Like Ss, Sdi does not engage in geometric considerations.

An approach called Behavioral Initialization (Bi) was proposed by Jackson [7]. Bi is a wrapper around Rhh that accepts a program created by Rhh only if it is semantically unique in the population being initialized. Although domain-independent, Bi is oblivious to the geometry of the semantic space.

Pawlak proposed Competent Initialization (Ci) [20] for the symbolic regression and Boolean domains. Ci repetitively invokes Sdi and accepts (i.e., adds to the population) the created program if its semantics expands the convex hull of the semantics of programs already present in the population. Ci is geometric in the limit of population size approaching infinity, but in contrast to Sgi it does not guarantee including the target in the convex hull.

6 Experimental Verification

We compare Sgi to Ramped Half-and-Half (Rhh) [9], arguably the most common population initialization method in GSGP and in GP in general. We run two groups of GSGP setups, with Sgi and with Rhh as the initialization operator, to determine the advantages and disadvantages of using Sgi. In both groups, we consider a setup with Sgx [16] as the only search operator to verify whether the inclusion of the target in the convex hull of the initial population increases Sgx’s ability to reach it (cf. Sect. 3). The second setup in each group uses both Sgx and Sgm [16], in the configuration most commonly practiced in GSGP. In addition, we run a canonical control setup with Rhh and traditional non-semantic search operators. Overall, there are thus five setups:

  • SgiX – Sgi accompanied with Sgx only,

  • SgiXM – Sgi with Sgx and Sgm in proportions 50:50,

  • RhhX – Rhh with Sgx only,

  • RhhXM – Rhh with Sgx and Sgm in proportions 50:50,

  • RhhTxTm – Rhh with tree crossover and tree mutation [9] in proportions 90:10.

Table 1 sums up the parameter settings; other parameters are set to ECJ [14] defaults.

Fig. 1. Average and .95-confidence interval of the best-of-generation fitness.

We compare the setups on nine uni- and bivariate symbolic regression problems and nine Boolean function synthesis problems. The problems come from [15, 23] and are summarized in Table 2. In the univariate symbolic regression problems, 20 Chebyshev nodes [2] are used for training and 20 uniformly sampled points for testing. For the bivariate problems, 10 values are picked in an analogous way for each input variable, and their Cartesian product constitutes the data set. Points are selected from the ranges shown in the table. For the Boolean benchmarks, the training sets enumerate all combinations of inputs and there are no testing sets.
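For reference, Chebyshev nodes on an interval [a, b] can be generated with the standard formula \(x_{k}=\tfrac{a+b}{2}+\tfrac{b-a}{2}\cos \tfrac{(2k-1)\pi }{2n}\); the exact variant used in [2] may differ slightly, so the sketch below is only indicative:

```python
import math

def chebyshev_nodes(a, b, n):
    """n Chebyshev nodes in [a, b]."""
    return [(a + b) / 2 + (b - a) / 2 * math.cos((2 * k - 1) * math.pi / (2 * n))
            for k in range(1, n + 1)]

print([round(x, 3) for x in chebyshev_nodes(-1.0, 1.0, 5)])
# [0.951, 0.588, 0.0, -0.588, -0.951]
```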

Training Set Performance. Figure 1 and Table 3 present the average and .95-confidence interval of the fitness of the best-of-generation and the best-of-run program, respectively. Both Sgi* setups begin from a relatively low fitness of about 1 in all problems in both domains. This phenomenon originates in the construction of the AddOne and SubtractOne formulas, which cause the initial population to consist of multiple programs at distance 1 from the target. These superior starting conditions are especially evident in the Boolean domain, where the programs produced by Rhh in the initial generation are 1–2 orders of magnitude worse than those produced by Sgi.

Table 3. Average and .95-confidence interval of the best-of-run fitness. Last row presents the averaged ranks of setups.
Table 4. Post-hoc analysis of Friedman’s test on Table 3: p-values of incorrectly judging a setup in a row as achieving better best-of-run fitness than a setup in a column. Significant values (\(p<0.05\)) are visualized as outranking graph.

In the symbolic regression problems, GSGP clearly benefits from Sgi. We observe a steep improvement of fitness in the first ten generations that gradually slows down and finally stops after about 20 generations. The initial rate of improvement is slower for RhhX and RhhXM, which improve gradually in the first 5–8 generations and then saturate. In the Boolean domain, both Sgi* setups find the optimum in 1–2 generations, while the Rhh* GSGP setups need 10–25 generations to solve all problems, except for the multiplexers.

There is no noticeable difference in generational characteristics between the two Sgi* setups. In contrast, the two Rhh* GSGP setups differ noticeably: RhhXM fares better in most symbolic regression problems. In the Boolean domain, there are no significant differences. The best-of-generation fitness of RhhTxTm lies in between those of the GSGP setups for the greater part of the runs on the symbolic regression problems, and is worse than that of all GSGP setups on the Boolean problems.

Friedman’s test [8] on the best-of-run fitness signals significant differences between the setups (\(p=9.68\times 10^{-6}\)), so we conduct a post-hoc analysis with the symmetry test [6]. Table 4 presents the p-values for the hypothesis that a setup in a row is better than a setup in a column, together with the graph of significant outrankings. The setups initialized with Sgi are significantly better than all other setups.

Table 5. Probability and .95-confidence interval of success over problems. Problems that were not solved at least once are not shown. Last row presents the averaged ranks of setups.
Table 6. Post-hoc analysis of Friedman’s test on Table 5: p-values of incorrectly judging a setup in a row as having higher probability of success than a setup in a column. Significant values (\(p<0.05\)) are visualized as outranking graph.
Fig. 2. Median and .95-confidence interval of test-set fitness of the best-of-generation program on the training set.

Table 7. Median and .95-confidence interval of test set fitness of the best-of-run program on training set. Last row presents the averaged ranks of setups.
Table 8. Post-hoc analysis of Friedman’s test on Table 7: p-values of incorrectly judging a setup in a row as achieving better test set fitness than a setup in a column. Significant values (\(\alpha =0.05\)) visualized as outranking graph.
Table 9. Average and .95-confidence interval of the number of nodes in the best-of-run program. Values \(\ge 10^{4}\) are rounded to an order of magnitude. Last row presents the averaged ranks of setups.
Table 10. Post-hoc analysis of Friedman’s test on Table 9: p-values of incorrectly judging a setup in a row as producing smaller programs than a setup in a column. Significant values (\(\alpha =0.05\)) visualized as outranking graph.

Table 5 presents the empirical probability of solving the problems by the particular setups, where we define success as achieving best-of-run fitness lower than \(2^{-23}\) (the difference between 1.0 and the next larger IEEE 754 single-precision number). Both Sgi* setups solve all Boolean problems, while the setups using Rhh for population initialization do not solve Mux11 in any run and solve 3 others of the 9 Boolean problems only in some of the runs. The probabilities are slightly higher for the setup that uses mutation. On the other hand, in symbolic regression, none of the GSGP setups solved any of the problems, and only the Ng9 problem is solved by RhhTxTm. Friedman’s test (\(p=6.73\times 10^{-4}\)) and the post-hoc analysis in Table 6 show that both Sgi* setups and RhhXM are better than RhhTxTm; however, the evidence is too weak to draw conclusions about the differences between Sgi and Rhh in GSGP.

Test-Set Performance. Figure 2 and Table 7 present the median and .95-confidence interval of the test-set fitness of the best-of-generation and the best-of-run program on the training set, respectively. All GSGP setups significantly overfit to the training data: the test-set fitness quickly increases in the early generations and remains high for the rest of the run. The values are slightly worse for the Sgi* setups. This observation is consistent with previous studies [20, 23] and can be attributed to the well-known bias-variance dilemma [5] in machine learning. The use of Sgi leads to better adaptation to the training data and thus reduces bias. That in turn increases the variance of performance on the unseen test data and makes GP more prone to overfitting. The conventional RhhTxTm setup is the only one that generalizes well to the test set. Friedman’s test (\(p=2.18\times 10^{-5}\)) and the post-hoc analysis in Table 8 confirm this: RhhTxTm and RhhX are better than both Sgi* setups.

Program Size. Table 9 presents the average and .95-confidence interval of the number of nodes in the best-of-run programs. The exponential growth of Sgx’s offspring is clearly visible in the data. For the setups that involve that operator, we report the total number of nodes/instructions in ‘unrolled’ trees. The actual number of unique program nodes held in memory is many orders of magnitude lower, because a given program may refer to its ancestor programs multiple times, due to the ‘aggregative’ nature of the exact semantic operators (see Definitions 3 and 4).

In contrast, RhhTxTm produces programs smaller than 500 nodes. For the setups initialized with Sgi, it is worth noting that they produce large programs only for the problems that they failed to solve in some of the runs (cf. Table 5). For the remaining problems, the average number of nodes in a program is smaller, yet still bigger than that of RhhTxTm. An exception is the Par5 problem, for which both Sgi* setups produce the smallest programs. Fortunately, the use of Sgi increases the probability of success and thus raises the likelihood of producing small programs. Friedman’s test (\(p=2.74\times 10^{-6}\)) and the post-hoc analysis in Table 10 confirm that RhhTxTm produces the smallest programs, but there is insufficient evidence to reveal differences between the remaining setups.

7 Discussion

The results presented above clearly corroborate the importance of population initialization for GSGP. In particular, the geometric and semantically aware initialization offered by Sgi brings to zero the differences between the setups that use and do not use Sgm (i.e., SgiX and SgiXM) on virtually all performance indicators (fitness, test-set performance, program size). In other words, the convex hull of semantics built around the target by Sgi makes the use of Sgm optional. We hypothesize that this characteristic may be convenient in scenarios where Sgm is slow at traversing the search space and where Sgx may be better in that respect. Concerning the low generalization capabilities, they are not due to Sgi, but are common to all setups that involve Sgm and Sgx, and reveal a more fundamental problem of semantic geometric GP, i.e., its inherent lack of bias resulting in high variance [5] (Sect. 6).

A vigilant reader might have noticed a seemingly paradoxical feature of the proposed initialization technique. Sgi employs exact techniques to synthesize the programs that support the convex hull around the target (multivariate polynomial interpolation for the symbolic regression domain and disjunctive normal forms for the Boolean domain). It then relies on heuristic GP search to synthesize the program that solves the original task, i.e., has semantics equal to the target. It does not take long to realize that the above exact techniques could be used directly to synthesize the sought program, without using GP at all.

Note, however, that the above paradox applies to the entire domain of GP, not only to Sgi or this particular study. Our goal was to verify the relevance of geometric and semantically aware initialization for search conducted by means of GSGP, and the empirical evidence gathered here confirms the theoretical suppositions. We explore the possibility of one-step construction of a perfect program from a population in another study published in this very volume [21].

Sgi offers certain advantages for program size in the Boolean domain. As follows from Table 9, GSGP starting from traditionally initialized populations (with Rhh) may grow monstrously large programs before reaching the target (even when using Sgm, which is known to increase program size only by a fixed factor per application, compared to the exponential growth caused by Sgx). When an initial population forms a convex hull around the target, a few moves of Sgm and/or Sgx may be sufficient to solve a synthesis task. We thus hypothesize that the positive impact of Sgi is due not only to its convex hull property, but also to the proximity of the initial population to the target.

8 Conclusions

We have provided theoretical evidence that the possibility of finding a program with the optimal semantics by GSGP running solely geometric crossover depends on whether the convex hull spanned by the semantics of programs in the initial population includes the search target. Experimental verification has shown that the above is true also for GSGP equipped with both crossover and mutation. The commonly used Rhh initialization does not provide this guarantee. To overcome this problem, we provided the Sgi algorithm, which seeds the initial population with programs that form an appropriate convex hull.

Further work is needed to develop Sgi for domains other than those considered in this paper, e.g., the categorical one. Another interesting research topic is the influence of the initial distribution of programs’ semantics on the analogous distributions in subsequent populations and on search performance. Last but not least, the convex hull property is only a necessary condition to reach the target. It remains an open question how to efficiently prevent GSGP operators from losing the target from the population’s convex hull.