1 Introduction

The use of Evolutionary Computation in the field of computer-assisted composition has been widely addressed through a large variety of evolutionary techniques [12, 17]. In this paper we present a novel fitness function that can be employed in evolutionary algorithms that need to evaluate the dissimilarity between the genotypes of individuals with different numbers of genes. We focus on the specific problem of Thematic Bridging [8], understood as automatically obtaining a set of smooth transitions from any given initial melody to any given goal melody. To achieve this, several evolution strategies [6] are implemented and then tested in two experiments, measuring the performance indicators for several settings. The evolutionary search, by means of mutation and crossover operators, progressively minimizes the dissimilarity of every offspring generation until an individual reaches the objective genotype of the goal melody. The best individuals of every generation are stored and together form the desired bridging, which can be written to a MusicXML interchange file.

2 Related Work

The use of evolutionary algorithms in the field of Computer-Assisted Composition is said to have started at the beginning of the 1990s. In 1991, Horner and Goldberg [8] implemented one of the first applications of evolutionary algorithms to computer composition, describing an evolutionary technique called Thematic Bridging that produces melodic material as the result of iterative transitions between two short melodies. The composition is created by human selection and organization of the algorithmically created melodic material, in the form of an imitative five-voice canon.

Three years later, Biles [3] presented GenJam, a noteworthy application of genetic computation that generates improvisations in jazz style, keeping the hierarchical relations between different melodic ideas suggested by the chord progression being played. At the same time, the system receives real-time feedback from the human player. Other interesting evolutionary designs were proposed by Hartman [7] and Mcintyre [13].

De la Puente et al. [4] introduced GEMUSIC, a tool that algorithmically creates melodic lines similar to human compositions through the implementation of Evolutionary Grammars. Weinberg et al. [21] described an interactive evolutionary robotic system that collaborates with human players and improvises while playing a xylophone. The system detects the musical material played by the human and evolves it using several fitness functions. Tzimeas et al. [19] developed the software Jazz Sebastian Bach, a system that evolves melodies originally composed by J. S. Bach and turns them into a jazzy style. The authors propose a fitness function called Critical Damped Oscillator that overcomes several algorithmic difficulties related to Automatic Fitness Assessment (AFA) or Interactive Genetic Algorithm (IGA).

De León et al. [16] proposed the characterization of a melody as the result of a set of rules obtained from a fuzzy genetic algorithm, aimed at determining whether a given MIDI file contains a melody or not. The knowledge of the human expert is replaced by a fuzzy genetic system. Sánchez et al. developed the MELOMICS project [15], a sophisticated evolutionary system able to compose and orchestrate whole musical pieces.

Scirea et al. presented in [18] MetaCompose, a framework for music composition that includes a chord sequence and accompaniment generator, and a melody generator that uses a novel evolutionary technique combining FI-2POP and multi-objective optimization. In 2019, Nam YW. and Kim YH. [14] automated the production of good-quality jazzy melodies by means of a genetic algorithm, using a variable-length chromosome and geometric crossover.

Trump proposed in [20] an evolutionary framework for improvisation in which the improvisation is created by successive sound cells containing musical content transformed by a creative selection. Other interesting approaches were proposed by Almada [1], Arutyunov [2] and Donelly [5].

3 Theoretical Background

3.1 Mathematical Definition of a Melody

As we presented in [10], a melody can be understood as a sequence of points, \(\mathscr {M} = \{ \mathbf {x}_i\}^n_{i=1},\) where each point \(\mathbf {x}_i\in \mathbb {R}^q\) is a musical note. The simplest way to represent a note uses three musical characteristics: the duration, the frequency, and the time distance to the next note (which captures a possible silence between this note and the next one). Thus, a musical note is expressed as a point in a three-dimensional metric space \(\mathbf {x}=(x_1,x_2,x_3) \in \mathbb {R}^3\), where the feature \(x_1\) expresses the duration of the note, \(x_2\) expresses the frequency, and \(x_3\) indicates the duration of an optional silence before the next note. To calculate the time features \(x_1\) and \(x_3\) of each note, we use the relative-duration coefficient \(\delta \) proposed in [10]. For the symbolic representation of the frequency in the feature \(x_2\) we use the MIDI pitch number associated with the musical note.

3.2 Neighbourhood Functions

Neighbourhood functions, introduced in [10, 11], are the key component of the fitness function presented in the following section. When comparing two sequences, the first with n elements and the second with m elements, the aim of the neighbourhood function is to calculate the degree of similarity between any element i of the first sequence and any element j of the second one.

In this way, when comparing two sequences A and B with very different numbers of elements, a suitably defined function correlates the first elements of sequence A strongly with the first elements of sequence B, but only very weakly with the final elements of B. Likewise, the final elements of sequence A are weakly correlated with the first elements of sequence B, but strongly correlated with the final elements of B. Equation 1 shows the expression of the Gaussian neighbourhood function used in this paper.

$$\begin{aligned} f(i,j) = \frac{1}{\sqrt{2\pi \sigma ^2}}e^{-\big [\frac{1}{2\sigma ^2}\big (i-\frac{(n-1)}{(m-1)}j\big )^2\big ]}. \end{aligned}$$
(1)
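As an illustration, Eq. (1) can be written as a small Python function. This is only a sketch: the default `sigma = 1.0` is a placeholder assumption, since no value for the width of the Gaussian is fixed in the text, and the function name is hypothetical.

```python
import math


def gaussian_neighbourhood(i: int, j: int, n: int, m: int, sigma: float = 1.0) -> float:
    """Gaussian neighbourhood weight f(i, j) of Eq. (1).

    i indexes the first sequence (n elements), j the second one (m elements).
    sigma is an assumed width; the text does not fix its value.
    """
    if m < 2:
        # Eq. (1) divides by (m - 1), so the second sequence needs two elements.
        raise ValueError("Eq. (1) requires m >= 2")
    scale = (n - 1) / (m - 1)  # rescales j to the index range of the first sequence
    norm = 1.0 / math.sqrt(2.0 * math.pi * sigma ** 2)
    return norm * math.exp(-((i - scale * j) ** 2) / (2.0 * sigma ** 2))
```

For two sequences of equal length the weight peaks at `i == j` and decays as the two positions move apart, which is the behaviour described above.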

3.3 Fitness Function

We propose a new fitness function for evolutionary music composition based on the definition of Melodic Dissimilarity proposed in [10]. Let \(\mathscr {M}^A=\{\mathbf {x_1}, \dots ,\mathbf {x_n}\}\in \mathbb {R}^q\) and \(\mathscr {M}^B=\{\mathbf {y_1}, \dots ,\mathbf {y_m}\}\in \mathbb {R}^q\) be two different melodies, possibly with different numbers of notes. Let \(d:\mathbb {R}^q\times \mathbb {R}^q\rightarrow \mathbb {R}\) be any distance function on the metric space. Let \(f(i,j)\) be any neighbourhood function. The Neighbourhood Average Dissimilarity \(\mathscr {D}\) from melody \(\mathscr {M}^A\) to melody \(\mathscr {M}^B\) is defined as

$$\begin{aligned} \mathscr {D}(\mathscr {M}^A, \mathscr {M}^B) = \frac{1}{n \cdot m} \sum \limits _{i=1}^n \sum \limits _{j=1}^m f(i,j) \cdot d( \mathbf {x_i}, \mathbf {y_j}). \end{aligned}$$
(2)

The proposed fitness function of each individual is the absolute difference between the dissimilarity of the individual A with respect to the goal melody B and the dissimilarity of the goal melody with respect to itself. Consequently, the expression for the fitness function is

$$\begin{aligned} F_{fitness}(\mathscr {M}^A) = |\mathscr {D}(\mathscr {M}^A, \mathscr {M}^B) - \mathscr {D}(\mathscr {M}^B, \mathscr {M}^B)| \end{aligned}$$
(3)

Observe that expression (2) does not fulfil the requirements of a distance function. In particular, \(\mathscr {D}(\mathscr {M}^B, \mathscr {M}^B)\ne 0\) in most cases, which is why the self-dissimilarity of the goal melody is subtracted in (3).
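Equations (2) and (3) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the Euclidean distance (`math.dist`) stands in for the generic distance d, the Gaussian of Eq. (1) is used as neighbourhood function, and `sigma = 1.0` is an arbitrary placeholder. All function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Note = Tuple[float, float, float]  # (duration, MIDI pitch, gap to the next note)


def neighbourhood(i: int, j: int, n: int, m: int, sigma: float = 1.0) -> float:
    """Gaussian neighbourhood of Eq. (1); sigma is an assumed value."""
    scale = (n - 1) / (m - 1)
    norm = 1.0 / math.sqrt(2.0 * math.pi * sigma ** 2)
    return norm * math.exp(-((i - scale * j) ** 2) / (2.0 * sigma ** 2))


def dissimilarity(a: Sequence[Note], b: Sequence[Note], sigma: float = 1.0) -> float:
    """Neighbourhood Average Dissimilarity of Eq. (2); assumes len(b) >= 2."""
    n, m = len(a), len(b)
    total = 0.0
    for i, x in enumerate(a, start=1):      # indices start at 1, as in Eq. (2)
        for j, y in enumerate(b, start=1):
            # Euclidean distance is an assumption; the text allows any distance d.
            total += neighbourhood(i, j, n, m, sigma) * math.dist(x, y)
    return total / (n * m)


def fitness(candidate: Sequence[Note], goal: Sequence[Note], sigma: float = 1.0) -> float:
    """Fitness of Eq. (3): zero when the candidate is the goal melody itself."""
    return abs(dissimilarity(candidate, goal, sigma) - dissimilarity(goal, goal, sigma))
```

Under these assumptions, `dissimilarity(goal, goal)` is generally positive because off-diagonal note pairs still contribute, while `fitness(goal, goal)` is zero by construction.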

4 Genetic Algorithm

4.1 Genotype Representation

Each melody is represented by one individual. In the genotype, the sequence of notes is encoded as a sequence of genes, and each gene contains the minimum information of a note. For the following experiments, three different genotype representations are used, each containing the information required by one of the evolutionary strategies [3] tested: simple mutation, uncorrelated mutation with one step size, and uncorrelated mutation with n step sizes.

Simple Mutation. In this representation, the genotype of every individual is an ordered array of float numbers in which the three features \(x_1\), \(x_2\) and \(x_3\) of every note (gene) are stored in order. The length of the array is \(3\times n\), where n is the number of notes of the melody encoded by the individual. The representation is as follows:

$$\begin{aligned} (x_1^1, x_2^1, x_3^1,x_1^2, x_2^2, x_3^2, \dots , x_1^n, x_2^n, x_3^n) \end{aligned}$$
(4)

Uncorrelated Mutation with One Step Size. In this representation, the genotype of each individual is again an ordered array storing the three features of every note, followed by three values \(\sigma \) associated with the features of duration, pitch and time distance. The length of the genotype is \(3\times n + 3\), where n is the number of notes of the melody, and its structure is the following:

$$\begin{aligned} (x_1^1, x_2^1, x_3^1,x_1^2, x_2^2, x_3^2, \dots , x_1^n, x_2^n, x_3^n; \sigma _1, \sigma _2, \sigma _3) \end{aligned}$$
(5)

Uncorrelated Mutation with n Step Sizes. In addition to the duration, pitch and time distance of each of the n notes, this representation includes a value \(\sigma \) for each one of these features. In this case, the length of the genotype is \(6\times n\) and its structure is:

$$\begin{aligned} (x_1^1, x_2^1, x_3^1, \dots , x_1^n, x_2^n, x_3^n; \sigma _1^1, \sigma _2^1, \sigma _3^1, \dots , \sigma _1^n, \sigma _2^n, \sigma _3^n) \end{aligned}$$
(6)
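The three layouts of Eqs. (4) to (6) can be illustrated with flat Python lists. The helper names below are hypothetical and the sketch only shows the ordering and length of each genotype.

```python
from typing import List, Sequence, Tuple

Note = Tuple[float, float, float]    # (x1 duration, x2 MIDI pitch, x3 time distance)
Sigmas = Tuple[float, float, float]  # one step size per feature


def flatten(triples: Sequence[Tuple[float, float, float]]) -> List[float]:
    """Concatenate triples in order: (a1, a2, a3, b1, b2, b3, ...)."""
    return [value for triple in triples for value in triple]


def genotype_simple(melody: Sequence[Note]) -> List[float]:
    """Eq. (4): 3 * n floats."""
    return flatten(melody)


def genotype_one_step(melody: Sequence[Note], sigmas: Sigmas) -> List[float]:
    """Eq. (5): 3 * n floats followed by (sigma_1, sigma_2, sigma_3)."""
    return flatten(melody) + list(sigmas)


def genotype_n_steps(melody: Sequence[Note], sigmas: Sequence[Sigmas]) -> List[float]:
    """Eq. (6): 3 * n note features followed by 3 * n step sizes."""
    if len(sigmas) != len(melody):
        raise ValueError("one sigma triple per note is required")
    return flatten(melody) + flatten(sigmas)
```

For a two-note melody these functions return arrays of length 6, 9 and 12 respectively, matching \(3\times n\), \(3\times n + 3\) and \(6\times n\).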

4.2 Restrictions on the Evolution Strategy

Some constraints are built into the evolution strategy in order to reduce the search space. The first restriction applies to the mutation of the MIDI pitch value. The mutated pitch is forced to be an integer, because the MIDI mapping of musical notes covers the integers from 0 to 127; the algorithm therefore does not consider intervals smaller than a semitone.

The second constraint applies to the variation of the relative-duration coefficient \(\delta \). This feature is mutated by adding or subtracting a multiple of a minimum-duration figure. The arbitrarily chosen minimum duration is a demisemiquaver (thirty-second note), with coefficient \(\delta _{min}=0.03125\).
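A possible way to enforce both restrictions is sketched below. Clamping the pitch to the MIDI range and snapping durations to the \(\delta _{min}\) grid are assumptions made for illustration; the text only states that pitches are integers and that durations change by multiples of \(\delta _{min}\). The `min_steps` parameter is likewise hypothetical.

```python
DELTA_MIN = 0.03125  # relative-duration coefficient of a thirty-second note


def constrain_pitch(pitch: float) -> int:
    """Round to an integer MIDI note and clamp it to the range 0..127 (assumed)."""
    return max(0, min(127, round(pitch)))


def constrain_duration(delta: float, min_steps: int = 1) -> float:
    """Snap a relative duration to a multiple of DELTA_MIN.

    min_steps=1 keeps a note at least one thirty-second long (an assumption);
    use min_steps=0 for the time-distance feature, which may be zero.
    """
    steps = max(min_steps, round(delta / DELTA_MIN))
    return steps * DELTA_MIN
```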

4.3 Mutation

The mutation of a specific gene is done using random resetting, with a uniform mutation probability for every gene. When a gene is randomly chosen for mutation, the feature it stores as a float number is modified by adding or subtracting a certain amount, calculated according to the case under consideration.

Simple Mutation. In the case of simple mutation, the features of duration \(x_1\), pitch \(x_2\) and time distance \(x_3\) of a selected note i are modified using these expressions, based on the equations given in [6]:

$$\begin{aligned} \begin{aligned} {x'}_1^i=x_1^i + \delta _{min} \cdot \lceil \sigma _1\cdot N(0, 1)\rceil \\ {x'}_2^i=x_2^i + \lceil \sigma _2\cdot N(0, 1)\rceil \\ {x'}_3^i=x_3^i + \delta _{min} \cdot \lceil \sigma _3\cdot N(0, 1)\rceil \end{aligned} \end{aligned}$$
(7)

where N(0, 1) is a generator of Gaussian distributed random numbers, centered in the zero value (mean equal to zero), and with standard deviation equal to 1. In this case, the values of \(\sigma \) stay constant.
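Equation (7) translates almost literally into Python. The sketch below applies the three update rules to one note; the function name and the use of `random.Random` for reproducibility are illustrative choices. Note that, as written, Eq. (7) does not prevent negative durations or pitches outside the MIDI range, and the sketch does not add such a guard either.

```python
import math
import random
from typing import Tuple

DELTA_MIN = 0.03125  # thirty-second note

Note = Tuple[float, float, float]


def mutate_note_simple(note: Note, sigmas: Tuple[float, float, float],
                       rng: random.Random) -> Note:
    """Apply Eq. (7) to one note with constant step sizes (sigma_1, sigma_2, sigma_3)."""
    x1, x2, x3 = note
    s1, s2, s3 = sigmas
    # Each feature receives its own N(0, 1) draw; the ceiling yields a whole
    # number of thirty-second notes (x1, x3) or semitones (x2).
    new_x1 = x1 + DELTA_MIN * math.ceil(s1 * rng.gauss(0.0, 1.0))
    new_x2 = x2 + math.ceil(s2 * rng.gauss(0.0, 1.0))
    new_x3 = x3 + DELTA_MIN * math.ceil(s3 * rng.gauss(0.0, 1.0))
    return (new_x1, new_x2, new_x3)
```

A call such as `mutate_note_simple((0.25, 60.0, 0.0), (2.0, 3.0, 1.0), random.Random(1))` returns a note whose duration still lies on the \(\delta _{min}\) grid and whose pitch is still a whole number.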

Uncorrelated Mutation with One Step Size. In this kind of mutation, the values \(\sigma _1\), \(\sigma _2\) and \(\sigma _3\) used to calculate the changes in the features of duration, pitch, and time distance, are assumed to change randomly for each individual. Therefore, the mutations on the standard deviations and features \(x_i\) will be done by means of the following expressions:

(8)

After every mutation of a deviation \(\sigma _j\), we must check that the new value is not too small. To this end, we establish a threshold value \(\epsilon _j\) below which \(\sigma _j\) cannot decrease, so:

$$\begin{aligned} \begin{aligned} {\sigma '}_1< \epsilon _1 \Rightarrow {\sigma '}_1 = \epsilon _1 \\ {\sigma '}_2< \epsilon _2 \Rightarrow {\sigma '}_2 = \epsilon _2 \\ {\sigma '}_3 < \epsilon _3 \Rightarrow {\sigma '}_3 = \epsilon _3 \end{aligned} \end{aligned}$$
(9)
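The self-adaptive step can be sketched as follows. Two points are assumptions and not taken from this paper: the step sizes are updated with the log-normal rule \(\sigma ' = \sigma \cdot e^{\tau \cdot N(0,1)}\) that is standard for uncorrelated mutation in evolution strategies, and each \(\sigma _j\) receives an independent random draw. The learning rate `tau` is a free parameter of the sketch. The threshold rule of Eq. (9) and the rounding of Eq. (7) are applied as stated in the text.

```python
import math
import random
from typing import Tuple

DELTA_MIN = 0.03125  # thirty-second note

Triple = Tuple[float, float, float]


def mutate_sigmas(sigmas: Triple, epsilons: Triple, tau: float,
                  rng: random.Random) -> Triple:
    """Log-normal update of each sigma_j (assumed rule), then the floor of Eq. (9)."""
    updated = []
    for sigma, eps in zip(sigmas, epsilons):
        new_sigma = sigma * math.exp(tau * rng.gauss(0.0, 1.0))
        updated.append(max(new_sigma, eps))  # Eq. (9): sigma'_j < eps_j => sigma'_j = eps_j
    return (updated[0], updated[1], updated[2])


def mutate_note_one_step(note: Triple, sigmas: Triple, epsilons: Triple, tau: float,
                         rng: random.Random) -> Tuple[Triple, Triple]:
    """Mutate the step sizes first, then the note features with the new step sizes."""
    s1, s2, s3 = mutate_sigmas(sigmas, epsilons, tau, rng)
    x1, x2, x3 = note
    mutated = (
        x1 + DELTA_MIN * math.ceil(s1 * rng.gauss(0.0, 1.0)),
        x2 + math.ceil(s2 * rng.gauss(0.0, 1.0)),
        x3 + DELTA_MIN * math.ceil(s3 * rng.gauss(0.0, 1.0)),
    )
    return mutated, (s1, s2, s3)
```

Mutating the step sizes before the features follows the usual convention in evolution strategies, so that a new step size is evaluated through the offspring it produces.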

Uncorrelated Mutation with n Step Sizes. In this case, each one of the \(x_j^i\) features of an individual will mutate with a specific deviation \(\sigma _j^i\). The mutations of the deviations and features of a chosen note i will be carried out with these expressions:

(10)

Once again, it is necessary to check if the mutated value of every deviation \(\sigma _j^i\) is not smaller than a given threshold \(\epsilon _0\).

Mutation Operation Concerning the Number of Notes of a Melody. Besides mutating the duration, frequency and time distance of a random note of the melody encoded in the genotype of each individual, a mutation operation is needed to change the number of notes of the melody, since we want to achieve an evolutionary transition from an initial melody to a goal melody that may have a different number of notes.

Two arbitrary probabilities for insertion and suppression of a note will be implemented in order to insert a new note in a random position p inside the genotype, or remove the note located on the position p, respectively.

When inserting a brand new note in a random position of the genotype, there exist three different possibilities:

  • Inserting the note at the beginning of the melody (\(p=0\)): In this case, three new positions are inserted at the very beginning of the genotype array. Their values duplicate the three features of the former first note, so the newly inserted note is an exact copy of it.

  • Inserting the note at the end of the melody (\(p=3\cdot n\)): In this case, three new positions are added at the end of the genotype array. The values of these positions duplicate the former last note of the melody.

  • Inserting the note in an intermediate position within the melody (p = k, \(0<k<3\cdot n\)): In this case, a new note is inserted between the notes placed in the positions \(k-1\) and k. Each of the three features of the new note is calculated as the average of the corresponding feature of the two adjacent notes, subject to the previously specified constraints on duration and pitch.

For the genotypes of uncorrelated mutation with one step size and uncorrelated mutation with n step sizes, it is also necessary to add to the genotype array one extra position in the case of one step size, or three extra positions in the case of n step sizes, in order to store the new \(\sigma \) values corresponding to the new note.
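The three insertion cases and the removal of a note can be sketched for the simple-mutation genotype of length \(3\times n\). For readability the sketch indexes by note (`p` from 0 to n) instead of by array position, which is a simplification of the description above; snapping the averaged durations to the \(\delta _{min}\) grid and rounding the averaged pitch are one possible reading of the constraints. The extra \(\sigma \) positions of the other two representations are not handled.

```python
from typing import List

DELTA_MIN = 0.03125  # thirty-second note


def snap(value: float) -> float:
    """Snap a time value to the nearest multiple of DELTA_MIN (assumed rounding)."""
    return round(value / DELTA_MIN) * DELTA_MIN


def insert_note(genotype: List[float], p: int) -> List[float]:
    """Return a copy of the genotype with a new note before note index p (0..n)."""
    n = len(genotype) // 3
    if n == 0 or not 0 <= p <= n:
        raise ValueError("p must satisfy 0 <= p <= n for a non-empty melody")
    if p == 0:
        new_note = genotype[:3]           # duplicate the former first note
    elif p == n:
        new_note = genotype[-3:]          # duplicate the former last note
    else:
        left = genotype[3 * (p - 1):3 * p]
        right = genotype[3 * p:3 * p + 3]
        mean = [(a + b) / 2.0 for a, b in zip(left, right)]
        # Average of the neighbours, forced back onto the duration grid
        # and onto an integer MIDI pitch.
        new_note = [snap(mean[0]), float(round(mean[1])), snap(mean[2])]
    return genotype[:3 * p] + list(new_note) + genotype[3 * p:]


def remove_note(genotype: List[float], p: int) -> List[float]:
    """Return a copy of the genotype without the note at index p (0..n-1)."""
    n = len(genotype) // 3
    if not 0 <= p < n:
        raise ValueError("p must satisfy 0 <= p < n")
    return genotype[:3 * p] + genotype[3 * p + 3:]
```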

4.4 Initialization of the Population, Parents Selection and Crossover

The population is initialized by creating \(\mu \) individuals whose genotypes are first cloned from the starting melody and then subjected to a random mutation process.

A number of \(\lambda \) couples of parents will be randomly chosen to generate a new child from every couple. The recombination operation for the new genotype will be the uniform crossover, so each gene will be randomly inherited from any of the parents.

The selection of survivors for the next generation follows the \(\mu +\lambda \) method, which involves merging the populations of parents and offspring [3], sorting the individuals by fitness and keeping the best \(\mu \) individuals.
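Uniform crossover and \(\mu +\lambda \) survivor selection can be sketched as below. The sketch treats each note (three consecutive floats) as one gene and assumes both parents encode the same number of notes; how parents of different lengths are recombined is not specified in the text, so that case is rejected here. Function names are hypothetical.

```python
import random
from typing import Callable, List, Sequence

Genotype = List[float]


def uniform_crossover(parent_a: Genotype, parent_b: Genotype,
                      rng: random.Random) -> Genotype:
    """Build a child by inheriting each note from either parent with probability 0.5."""
    if len(parent_a) != len(parent_b):
        raise ValueError("this sketch assumes parents with the same number of notes")
    child: Genotype = []
    for start in range(0, len(parent_a), 3):
        source = parent_a if rng.random() < 0.5 else parent_b
        child.extend(source[start:start + 3])
    return child


def select_survivors(parents: Sequence[Genotype], offspring: Sequence[Genotype],
                     fitness: Callable[[Genotype], float], mu: int) -> List[Genotype]:
    """(mu + lambda) selection: pool parents and offspring, keep the mu fittest.

    Lower fitness is better, since the fitness function is minimized.
    """
    pool = list(parents) + list(offspring)
    return sorted(pool, key=fitness)[:mu]
```

Because parents compete with their offspring, the best fitness in the population can never get worse from one generation to the next.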

4.5 Performance Indicators

Each execution runs for a maximum of 200 generations. Each experiment is executed 1000 times for each of the pre-established setups. The algorithm stores the following performance indicators [3]:

  • SR (Success Rate): Percentage of executions that finish successfully over the total number of executions.

  • MB (Mean Best Fitness): Average of the best fitness value of the population when execution finishes, successfully or not.

  • MBFS (Mean Best Fitness Success): Average of the best fitness value of the population taking into account only the successful executions.

  • AES (Average number of Evaluations to a Solution): Average number of generations needed to reach a successful execution.

  • MST (Mean Success Time): Mean time needed to find a successful execution.
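The indicators listed above can be computed from a log of executions as in the following sketch. The `Run` record and its field names are hypothetical; AES is computed as the mean number of generations of the successful executions, following the definition given above.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, Optional, Sequence


@dataclass
class Run:
    success: bool        # True if the goal melody was reached
    best_fitness: float  # best fitness in the population when the run stopped
    generations: int     # generations executed
    seconds: float       # wall-clock time of the run


def summarize(runs: Sequence[Run]) -> Dict[str, Optional[float]]:
    """Compute SR, MB, MBFS, AES and MST for a non-empty list of runs."""
    wins = [r for r in runs if r.success]
    return {
        "SR": 100.0 * len(wins) / len(runs),
        "MB": mean(r.best_fitness for r in runs),
        # The success-only indicators are undefined when no run succeeded.
        "MBFS": mean(r.best_fitness for r in wins) if wins else None,
        "AES": mean(r.generations for r in wins) if wins else None,
        "MST": mean(r.seconds for r in wins) if wins else None,
    }
```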

5 Experiments

5.1 First Experiment

Given initial melody A and goal melody B, use the evolutionary strategies with fitness function (3) to generate a melodic transition from A to B (Fig. 1).

Fig. 1. Initial and final melodies of the first experiment.

Settings of the algorithm: \(\mu =20\), \(\lambda =200\), mutation prob. \(=\) 0.15 and note insertion prob. \(=\) 0. The algorithm was run 1000 times. Results are shown in Table 1 and Fig. 2.

Fig. 2. Some intermediate melodies generated in one successful execution.

Table 1. Benchmarking indicators for experiment one.

5.2 Second Experiment

Given initial melody A and goal melody B, use the evolutionary strategies with fitness function (3) to generate a melodic transition from A to B (Fig. 3).

Fig. 3. Initial and final melodies of the second experiment.

Fig. 4. Some intermediate melodies generated in one successful execution.

Settings of the experiment: \(\mu =20\), \(\lambda =500\), mutation prob. \(=\) 0.15 and note insertion prob. \(=\) 0.05. The algorithm was run 1000 times. Results are shown in Table 2 and Fig. 4.

Table 2. Benchmarking indicators for experiment two.

Fig. 5. Performance charts for the first and second experiments.

5.3 Results

Fig. 5 compares the performance curves of experiments one and two for simple mutation, uncorrelated mutation with one step size and uncorrelated mutation with n step sizes. The final values of the performance indicators SR, AES and MST are summarized in Table 3.

Table 3. Final performance indicators for the experiments.

6 Discussion

We ran 1000 executions for each of experiments one and two, with a maximum of 200 generations per execution. The success rate of experiment one was 100%, because the two melodies are not very distant in terms of evolutionary search. The success rate of experiment two was 40.7%, as the initial and final melodies were very distant. In experiment one, the most efficient mutation was the simple mutation, whereas uncorrelated mutation with n step sizes was the most efficient in experiment two. All the representations achieved low values of mean success time (MST) and a low average number of evaluations to a solution (AES).

7 Conclusions

The evolutionary algorithm implemented and tested in the two experiments has been shown to find solutions to the problem of thematic bridging between two melodies by minimizing the novel fitness function proposed in this paper (3), which is based on the Neighbourhood Average Dissimilarity (2).

The transitions between the two melodies are obtained quickly, which makes the approach useful for simple evolutionary composition purposes. Future work will involve the implementation of more sophisticated evolutionary techniques, relaxing the rhythmical restrictions of the evolutionary algorithm and incorporating the generation of harmonic chord sequences.