
1 Introduction

1.1 Use of Design of Experiment in Research

The design of experiment (DoE), also known as experimental design, was developed by Ronald Fisher in the early 1920s. It is a statistical method for planning and analyzing experiments so as to extract the maximum amount of information from the fewest possible runs. It also allows building regression models and optimizing the output by choosing proper variable settings. The traditional way of conducting experiments is to change One Variable At a Time (OVAT), which risks becoming trapped in a local optimum and missing the global one. DoE instead varies all the variables simultaneously, helping to find their best combination [6, 12, 14]. Moreover, it provides a regression model that, within the range of variables used to build it, can predict the response for settings different from those used in the study (see: [1, 18]).

During the last decades, DoE has been successfully applied to optimize processes in chemistry and engineering [13], as well as in the pharmaceutical and biopharmaceutical industries, both in development and in production [2, 7, 9–11, 14, 22, 25, 26]. New applications are emerging in biomedical research, specifically in medium- to high-throughput assays and in the optimization of laboratory protocols [3, 4]. In particular, DoE has recently been used in drug screening, where progress in molecular biology and advanced technologies has made it possible to test large chemical libraries against biological targets. However, the introduction of combinatorial chemistry and high-throughput screening has not met expectations; rather, it has been accompanied by a decline in productivity [20]. This can be ascribed to a number of reasons, including the fact that the selection process leaves behind many potentially interesting molecules [16, 23]. This has drawn attention to cell-based assays and to more robust screening approaches that increase R&D efficiency and efficacy, and thus productivity. In this respect, attention has also grown toward methods for efficiently developing and setting up assays. Over the last decades, there have been several applications of DoE in drug screening [5, 11, 15, 17, 24], mainly related to the optimization of biological and biochemical process conditions. Other examples concern the optimization of data processing in metabolomics [8, 27]. By contrast, the use of experimental design to optimize software parameters is still poorly explored (e.g. [21]).

In this paper, we report the use of DoE to fine-tune the analytical processing in a newly developed drug screening approach.

1.2 Dedicated Image Analysis Software for a New Drug Screening Approach

An innovative optical platform for ion channel drug screening, based on a proprietary approach, has been developed by a multidisciplinary team. Briefly, cells expressing the channel of interest are loaded with a fluorescent voltage-sensitive dye, and the effect of a drug is revealed by the fluorescence values recorded before and during exposure to electrical stimulation (see patent EP2457088 for more details). Images acquired under these conditions are processed by a MATLAB-based Image Analysis program (MaLIA), developed in the laboratory, which offers a choice of filters, parameters and types of analysis. Data representing the cellular response to the electric pulses are used to derive changes in resistance/conductance; these values are related to increasing concentrations of the molecule of interest, yielding a typical sigmoidal Concentration-Response (CR) curve defined by a set of qualitative and quantitative parameters. In the course of the project, MaLIA progressively evolved, gaining flexibility to explore multiple analytical options. During this development phase, different analysis configurations were tested, and the parameter space was narrowed to five parameters, four of which take only two values. The final goal of the project was to define the optimal values of these parameters, so that a standard analysis could be performed in full automation, without external, arbitrary interventions. To this end, we employed DoE to evaluate the effects of the different parameters/filters implemented in MaLIA, as well as their interactions.

2 The Design of Experiment (DoE) Method

The different values assumed by each factor (experimental variable) are called levels (typically only two, coded as −1 and +1), and factors can be either qualitative or quantitative. DoE allows evaluating both the influence of single variables (main effects) and the interplay among factors (interactions), i.e. cases in which the effect of a factor depends on the level of one or more other factors. A specific combination of levels is called a treatment (or run). Each treatment is evaluated in terms of outputs, or responses, which are representative of the behaviour of the system. The magnitude of the change in response when a factor is varied is called its effect.
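As a minimal sketch of these definitions, the Python snippet below computes the two main effects and the interaction effect of a 2^2 full factorial design in coded units. The response values are hypothetical; each effect is taken as the mean response at the +1 level minus the mean response at the −1 level, and the interaction column is the product of the two coded columns.

```python
from itertools import product

# Full 2^2 design in coded units: each run is an (A, B) pair of -1/+1 levels.
runs = list(product([-1, 1], repeat=2))  # [(-1, -1), (-1, 1), (1, -1), (1, 1)]

# Hypothetical responses, one per run, in the same order as `runs`.
y = [10.0, 14.0, 12.0, 24.0]

def effect(column, response):
    """Mean response at the +1 level minus mean response at the -1 level."""
    high = [r for c, r in zip(column, response) if c == 1]
    low = [r for c, r in zip(column, response) if c == -1]
    return sum(high) / len(high) - sum(low) / len(low)

col_a = [a for a, _ in runs]
col_b = [b for _, b in runs]
col_ab = [a * b for a, b in runs]  # interaction column = product of coded levels

effects = {"A": effect(col_a, y), "B": effect(col_b, y), "AB": effect(col_ab, y)}
print(effects)  # {'A': 6.0, 'B': 8.0, 'AB': 4.0}
```

Here A raises the response by 2 units when B = −1 but by 10 units when B = +1; the AB effect of 4 is half of that difference, which is exactly the dependence of one factor's effect on the level of another.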

To draw statistically sound conclusions from experiments, three statistical principles are adopted: randomization (i.e. scrambling the running order of treatments), replication (i.e. repeating each treatment twice or more) and blocking (i.e. modelling extraneous sources of variation as special variables). These principles minimize experimental bias that may mask the responses to the significant factors (see [18]).
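The three principles can be combined when building a run list. The sketch below, with hypothetical factor names A, B and C, replicates a 2^3 full factorial once per block and shuffles the run order independently within each block; the fixed seed only makes the plan reproducible.

```python
import random
from itertools import product

def blocked_randomized_plan(factors, n_blocks, seed=None):
    """Replicate a two-level full factorial once per block and shuffle
    the run order independently within each block."""
    rng = random.Random(seed)
    treatments = list(product([-1, 1], repeat=len(factors)))
    plan = []
    for block in range(1, n_blocks + 1):
        order = treatments[:]   # every block contains every treatment once
        rng.shuffle(order)      # randomization within the block
        for levels in order:
            plan.append({"block": block, **dict(zip(factors, levels))})
    return plan

plan = blocked_randomized_plan(["A", "B", "C"], n_blocks=3, seed=1)
print(len(plan))  # 24 (8 treatments x 3 blocks)
```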

The term factorial design identifies the most widely used set of treatments for investigating the effects of factors on a response. The simplest is the two-level factorial design, which is used in two forms: full factorial and fractional factorial. A full factorial design requires 2^K runs, where K is the number of factors, so the number of runs grows rapidly (and can become unmanageable) as factors are added. With many factors (e.g. >5), the number of required runs can be reduced on the basis of specific assumptions, such as ignoring interactions among three or more factors: this yields a fractional factorial design. For five factors, the half fraction with 2^(5−1) = 16 runs is a Resolution V design: no main effect or two-factor interaction is aliased with any other main effect or two-factor interaction, but two-factor interactions are aliased with three-factor interactions. It is also possible to consider more than two levels for each factor, which gives a general factorial design [1].
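To illustrate this aliasing structure, the sketch below builds a 2^(5−1) half fraction by generating the fifth factor as E = ABCD (defining relation I = ABCDE), a common textbook choice; the generator used by the DoE software in this study is not stated, so it is an assumption here. The code then checks numerically that main effects and two-factor interactions are mutually orthogonal, while each two-factor interaction shares its column with a three-factor interaction.

```python
from itertools import combinations, product

# Base 2^4 full factorial in A, B, C, D; the fifth factor is generated as E = ABCD
# (defining relation I = ABCDE, which gives a Resolution V half fraction).
base = list(product([-1, 1], repeat=4))
design = [(a, b, c, d, a * b * c * d) for a, b, c, d in base]
names = "ABCDE"

def column(term):
    """Coded column of a main effect or interaction, e.g. 'A' or 'BD'."""
    idx = [names.index(ch) for ch in term]
    col = []
    for run in design:
        value = 1
        for i in idx:
            value *= run[i]
        col.append(value)
    return col

mains = list(names)
two_fi = ["".join(p) for p in combinations(names, 2)]
terms = mains + two_fi  # 5 main effects + 10 two-factor interactions

# Resolution V check: all 15 columns are mutually orthogonal, so no main effect
# or two-factor interaction is aliased with another one.
orthogonal = all(
    sum(x * y for x, y in zip(column(s), column(t))) == 0
    for s, t in combinations(terms, 2)
)
# ...but each two-factor interaction is aliased with a three-factor one, e.g. AB = CDE.
ab_equals_cde = column("AB") == column("CDE")
print(len(design), orthogonal, ab_equals_cde)  # 16 True True
```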

After the experiments have been performed according to the planned design, results are analyzed through graphical interpretation (Factorial Plots and Statistical Plots) and a set of statistical parameters. The Factorial Plots include the Main Effect Plot and the Interaction Plot. In the former, each factor is represented by a straight line: the sign of the slope indicates the direction of the effect and its magnitude the strength. The Interaction Plot, on the other hand, shows how different combinations of factor settings affect the response: non-parallel lines indicate an interaction between a pair of factors. The Normal Probability Plot (one of the Statistical Plots) is an alternative representation of a distribution, with the cumulative percentage on a normal-probability-scaled Y-axis and the ordered values of the observations on the X-axis. In this representation, a Gaussian distribution appears as a straight line. The plot is used both to check the normality of the data and to identify the most significant effects: non-significant effects lie close to a straight line, whereas significant effects fall away from it. In experimental design, the Normal Probability Plot is used to evaluate the significance and normality of both main and interaction effects. The Pareto Plot (another Statistical Plot) displays the absolute values of the main and interaction effects, with a reference line marking the threshold of statistical significance (P < 0.05). The Normal Plot for Residuals is conceptually the same as the one used for effects and interactions, but it plots the differences (residuals) between actual and predicted values (calculated by the regression model obtained from the DoE analysis) to verify whether they follow a Gaussian distribution.
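The coordinates behind a Normal Probability Plot and a Pareto Plot of effects can be sketched as follows, assuming hypothetical effect estimates and the common plotting positions (i − 0.5)/n. The sketch omits the significance reference lines, which require an estimate of experimental error and are computed by the statistics package.

```python
from statistics import NormalDist

def normal_plot_points(effects):
    """Pair each effect (sorted ascending) with its expected normal score,
    using the plotting positions p_i = (i - 0.5) / n."""
    items = sorted(effects.items(), key=lambda kv: kv[1])
    n = len(items)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [
        (name, value, std_normal.inv_cdf((i - 0.5) / n))
        for i, (name, value) in enumerate(items, start=1)
    ]

def pareto_order(effects):
    """Effects sorted by decreasing absolute value, as drawn in a Pareto chart."""
    return sorted(effects.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Hypothetical effect estimates from a two-level design.
effects = {"A": 0.4, "B": -6.1, "C": 0.2, "AB": -0.3, "AC": 5.2, "BC": 0.1, "ABC": -0.5}
for name, value, z in normal_plot_points(effects):
    print(f"{name:>3} {value:6.2f} {z:6.2f}")
print([name for name, _ in pareto_order(effects)])
```

In this example, B and AC would fall far from the line traced by the small effects and also head the Pareto ordering, while the remaining effects cluster around zero.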

This analysis can be complemented by a number of statistical parameters, including a regression model describing each response as a function of the selected factors and the results of the analysis of variance (ANOVA) (see: [18]).

As a general approach, a screening analysis is first performed with less stringent conditions to identify the most significant factors. Subsequently, an optimization analysis is applied to a narrower set of factors to find the best condition that optimizes the output(s).

Besides factorial designs, DoE offers a rich set of other designs to suit most requirements. A few examples are:

  1. Plackett–Burman design, which evaluates the effects of main factors only, with a small set of runs. It is mainly used in the screening phase;

  2. Response surface designs (e.g. Central Composite, Box–Behnken), which are used to locate optimal points (e.g. a maximum of the response) and to highlight possible nonlinearities (for quantitative factors only). They are mainly used in the optimization phase;

  3. Mixture design, which is used when factors are components of a blend and the output depends on their relative proportions.

3 Experimental Setup

Experiments were performed on a Chinese hamster ovary (CHO) cell line expressing the human transient receptor potential vanilloid 1 (TRPV1) channel (kindly provided by Axxam S.p.A), using capsaicin as the reference agonist. CHO-TRPV1 cells were stained with a voltage-sensitive dye (VSD; di-4-ANEPPS) and exposed to a square electric pulse. Local fluorescence values were measured before and during the pulse (Fig. 1a and b, respectively), both in the absence and in the presence of capsaicin. Given the low sensitivity of the VSD (~8 % fluorescence variation/100 mV), changes are hardly visible at first sight, and a sophisticated analysis is necessary to automatically isolate and evaluate responsive subcellular areas. Further details are available in patent EP2457088 and will be reported in a full paper on this new approach (Menegon et al. in preparation).

Fig. 1

Images of CHO-TRPV1 cells before (a) and during (b) exposure to an electrical square pulse. The signal (differences in fluorescence intensity in specific subcellular regions) is not easily appreciable without proper data processing.

Among the different types of DoE designs, we chose factorial designs for two reasons. First, we needed to evaluate second-order (two-factor) interactions, which Plackett–Burman designs do not estimate. Second, our factors were mostly at two levels, which made other approaches, such as response surface designs, inappropriate.

The very same image stack was processed many times with MaLIA to cover all the combinations of parameters indicated by the experimental design. Randomization was not required, because no external source of bias could affect the software analysis. For the DoE analysis, we selected, among the parameters implemented in MaLIA, the following five factors (variables) that appeared to influence the output data:

  a. Binning: to reduce image noise by combining clusters of pixels into single pixels;

  b. Shape-mask (ShapeM): to select the responsive membrane areas;

  c. Minimum response filtering (MinRespFilt): to discard signal values lying inside the noise range;

  d. Response calculation (RespCalc): Fold Change (FC) or Normalized Fold Change (NFC);

  e. Output data filtering (OutputDataFilt): purely statistical or functional (to exclude variations not consistent with the expected biological response).

We also defined two outputs to evaluate the influence of these parameters on CR curves:

  1. R-squared (Rsq), as a measure of the goodness of fit of the sigmoidal curve;

  2. Top minus bottom (T-B), the difference between the highest and lowest values of the sigmoidal curve (a measure of the efficacy of the tested drug).
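Both outputs can be illustrated on a sigmoidal model. The sketch below assumes a four-parameter logistic (4PL) curve, a common choice for CR data; the source does not state which sigmoidal model MaLIA fits, and the parameters and responses are hypothetical. Rsq is computed as 1 − SS_res/SS_tot, and T-B is taken as the difference between the fitted upper and lower asymptotes.

```python
def four_pl(log_conc, bottom, top, log_ec50, hill):
    """Four-parameter logistic on a log10 concentration axis (assumed model)."""
    return bottom + (top - bottom) / (1.0 + 10 ** ((log_ec50 - log_conc) * hill))

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical fitted parameters and observed responses.
params = {"bottom": 0.02, "top": 0.50, "log_ec50": -6.5, "hill": 1.0}
log_conc = [-9, -8, -7, -6.5, -6, -5, -4]
observed = [0.03, 0.02, 0.12, 0.27, 0.40, 0.49, 0.51]
predicted = [four_pl(x, **params) for x in log_conc]

rsq = r_squared(observed, predicted)
t_minus_b = params["top"] - params["bottom"]  # upper minus lower asymptote
print(round(rsq, 3), round(t_minus_b, 2))
```

In practice the curve parameters would come from a nonlinear least-squares fit; they are fixed here to keep the example self-contained.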

Finally, to account for possible inter-day variations (due to biological variability and/or changes in the process), we repeated the same set of treatments on image stacks obtained on three different experimental days and modelled each of these replicates as a block.

The MaLIA parameters are qualitative and have only two levels, with the sole exception of Binning, which has three: a full evaluation would therefore require a general factorial design with five factors. According to Anderson and Whitcomb (see Chap. 7, pp. 133–134), (1) a general factorial design should be avoided when the number of factors increases (typically beyond 3), and (2) reducing a general factorial design requires ad hoc elaboration. The same authors suggest running preliminary tests in an attempt to reduce the analysis to a two-level factorial. In our case, a complete general factorial design (5 factors, one of them at 3 levels) would require 2^(5−1) × 3 = 48 runs per replicate, which, multiplied by the 3 planned replicates, gives a total of 144 runs. Consistent with point (2), the DoE software we used does not allow general factorial designs to be reduced. Following the suggestions of Anderson and Whitcomb, we therefore evaluated the possibility of reducing the number of levels for Binning. We first set up an unconventional screening analysis considering only the two factors directly involved in the extraction of data from images, Binning (three levels) and Shape Mask (two levels), with a standard setting for the other parameters. Having reduced Binning to two levels, in the second analysis we were able to evaluate all factors at two levels with a fractional factorial design. In this way, the analysis required only 6 + 16 = 22 runs per replicate. Finally, we performed a validation to verify (1) that there is no significant interaction between Binning and the other factors, and (2) that the discarded Binning level was indeed less suitable for optimal results.
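The run counts quoted above can be checked with a few lines of arithmetic. This sketch simply multiplies the numbers of levels for the general factorial and adds the screening and half-fraction runs for the two-stage approach.

```python
from math import prod

def general_factorial_runs(levels):
    """Runs per replicate of a full (general) factorial: product of level counts."""
    return prod(levels)

replicates = 3
# Five MaLIA factors: four at two levels, Binning at three levels.
full_general = general_factorial_runs([2, 2, 2, 2, 3])  # 48 per replicate
screening = general_factorial_runs([3, 2])              # Binning x ShapeM = 6
fractional = 2 ** (5 - 1)                               # 2^(5-1) half fraction = 16
two_stage = screening + fractional                      # 22 per replicate

print(full_general, full_general * replicates)  # 48 144
print(two_stage, two_stage * replicates)        # 22 66
```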

We generated Normal and Pareto Plots to identify statistically significant factors and interactions, as well as Main Effect and Interaction Plots to evaluate the influence of the factors on each output. Goodness of fit was judged from the Residuals Plots and other statistical parameters. Blocks provided information about the influence of different experimental days (inter-day variations). All DoE analyses were performed with Minitab, a statistics package developed at the Pennsylvania State University (Minitab Inc., State College, PA, USA).

4 Results of the First Analysis (Screening)

The first analysis aimed to identify the two most suitable of the three possible levels of image Binning, and was performed considering Shape-mask as the only factor able to interact significantly with Binning, since only these two variables are directly related to the image pixels. Factors and their levels were as follows:

  1. Binning (1 × 1, i.e. no binning; 2 × 2; 4 × 4; referred to as 1, 2 and 4, respectively);

  2. Shape-mask (yes; no).

Because Binning has three levels, a general full factorial design was used (6 runs per replicate, 18 runs in total over 3 replicates). The screening analysis clearly shows an interaction between Binning and Shape Mask on the Rsq output (Fig. 2) but not on the T-B output (Fig. 3).

Fig. 2

Minitab graphs of the screening analysis for Rsq: Normal Probability Plot for Residuals (a), Main Effect Plot (b) and Interaction Plot (c). The table in (d) shows P values for the chosen factors and their interactions.

Fig. 3

Minitab graphs of the screening analysis for T-B: Normal Probability Plot for Residuals (a), Main Effect Plot (b) and Interaction Plot (c). The table in (d) shows P values for the chosen factors and their interactions.

Figure 2a indicates that the residuals for Rsq are normally distributed (i.e. very close to the line representing the normal distribution), a necessary condition for proceeding with a standard analysis without a variable transformation (see: [1]). The analysis shows a significantly lower Rsq for Binning 1 than for Binning 2, and even more so than for Binning 4 (P = 0.04, Fig. 2d). Rsq improves when Shape Mask is applied (Fig. 2b). The Interaction Plot (Fig. 2c) confirms that Binning 1 gives a lower Rsq, while Binning 2 and 4 give the best results. The influence of Shape Mask is maximal with Binning 1, moderate with 2 and negligible with 4. A P value of 0.248 for the Blocks variable indicates no influence of inter-day conditions on Rsq.
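The Main Effect and Interaction Plots of a 3 × 2 general factorial are built from cell means. The sketch below uses hypothetical Rsq values (one per experimental day, i.e. per block) for each Binning × ShapeM combination; they are invented to resemble the qualitative pattern described above, not taken from Fig. 2. The gap between the two ShapeM lines at each Binning level measures how far the lines depart from parallel.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical Rsq values: (Binning, ShapeM) -> one value per experimental day (block).
rsq = {
    (1, "no"): [0.70, 0.74, 0.72],
    (1, "yes"): [0.82, 0.85, 0.83],
    (2, "no"): [0.90, 0.92, 0.91],
    (2, "yes"): [0.93, 0.95, 0.94],
    (4, "no"): [0.95, 0.96, 0.95],
    (4, "yes"): [0.95, 0.97, 0.96],
}

# Interaction plot: one line per ShapeM level, one point per Binning level.
lines = defaultdict(dict)
for (binning, shape_m), values in rsq.items():
    lines[shape_m][binning] = mean(values)

# Main effect plot for Binning: average over ShapeM levels and blocks.
binning_means = {
    b: mean(v for (bb, _), vals in rsq.items() if bb == b for v in vals)
    for b in (1, 2, 4)
}

# Gap between the ShapeM lines at each Binning level; a gap that changes
# with Binning (non-parallel lines) is the signature of an interaction.
gaps = {b: lines["yes"][b] - lines["no"][b] for b in (1, 2, 4)}
print({b: round(g, 3) for b, g in gaps.items()})
```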

Figure 3a indicates that residuals are also normally distributed for T-B. The effect of Binning on T-B confirms Binning 1 as the worst condition, but also shows a trend, with Binning 4 better than 2 (Fig. 3b); interestingly, Shape Mask has no influence on T-B, either alone or in combination with Binning, as shown by the Interaction Plot (Fig. 3c), where the lines are almost parallel. A P value < 0.001 for the Blocks variable indicates a significant influence of inter-day conditions on T-B. Overall, Binning was the only significant factor (P < 0.01, Fig. 3d).

5 Results of the Second Analysis (Optimization)

The aim of the second analysis was to define an optimal parameter configuration, by considering the following factors/levels:

  1. Binning (2; 4);

  2. Shape-mask (yes; no);

  3. Minimum response filtering (yes; no);

  4. Output data filtering (Stat; Funct);

  5. Response calculation (NFC; FC).

Under these conditions, a full factorial design would have required 32 runs per replicate, i.e. the same number of runs needed for an OVAT approach, but with the advantage of providing information about interactions. Since we were also interested in the influence of inter-day variability, a minimum of 3 replicates (performed on image stacks produced on different days) was needed, for a total of 96 runs. To reduce this number, we assumed that second-order interactions (i.e. interactions of two factors at a time) were sufficient for a correct approximation in our analysis, also considering that higher-order interactions (three or more factors at a time) are expected to be negligible in most cases (see [19]). On this basis, we reduced the number of trials by employing a Resolution V fractional factorial design, which required 48 runs for 3 replicates, at the expense of assessing third-order interactions. Figure 4 shows the results of the DoE analysis for the Rsq output. The Normal Probability Plot for Residuals (Fig. 4a) shows a good fit. The Pareto Chart of the Standardized Effects (Fig. 4b) indicates that the only statistically significant factor is Shape Mask (P = 0.001), while the only significant interaction is Shape Mask with Response Calculation (P = 0.029).

Fig. 4

Minitab graphs for the Rsq optimization analysis: the Normal Probability Plot for Residuals (a) indicates a suitable distribution of residuals; the Pareto Chart of the Standardized Effects (b) indicates that there are only two significant effects (i.e. lying beyond the vertical line that marks the threshold for Alpha = 0.05): Shape Mask and the interaction between Shape Mask and Response Calculation.

Taking into consideration the results shown in Fig. 5a, b, we can infer that, as far as Rsq is concerned, the best results are obtained with Shape-mask, Binning 4, no Minimum response filtering, Statistical Output data filtering and NFC Response calculation.

Fig. 5

Minitab graphs for the Rsq optimization analysis (factorial plots): the Main Effects Plot (a) confirms that Shape Mask has the largest effect among single factors and that the best results are obtained when the mask is applied; the Interaction Plot (b) shows the best combination of Shape Mask and Response Calculation (if ShapeM = yes, both values of RespCalc are suitable).

A similar analysis was performed with T-B as the output. The Normal Probability Plot for Residuals (Fig. 6a) shows a good fit. The Pareto Chart of the Standardized Effects indicates that all main factors except Minimum response filtering are statistically significant (Fig. 6b): Output data filtering (P < 0.001), Binning (P < 0.001), Shape-mask (P = 0.002) and Response calculation (P = 0.004). Minimum response filtering has a significant interaction with Output data filtering (P = 0.021).

Fig. 6

Minitab graphs for the T-B optimization analysis: the Normal Probability Plot for Residuals (a) indicates a suitable distribution of residuals; the Pareto Chart of the Standardized Effects (b) shows that all single factors except the Minimum Response Filter (MinRespFilt) are significant, while only one interaction, between MinRespFilt and OutputDataFilt, lies beyond the vertical line (threshold for Alpha = 0.05).

Considering the Main Effects Plot (Fig. 7a) and the Interaction Plot (Fig. 7b) for T-B, we can infer that the best results are obtained with Shape-mask, Binning 4, Statistical Output data filtering, no Minimum response filtering and NFC Response calculation.

Fig. 7

Minitab graphs for the T-B optimization analysis (factorial plots): the Main Effects Plot (a) indicates the best values for the significant factors: Binning = 4, ShapeM = yes, OutputDataFilt = Statist and RespCalc = NFC. In the Interaction Plot (b), MinRespFilt = no together with OutputDataFilt = Statist is the combination of the significant interacting factors that optimizes the T-B output.

Based on the above results, we were able to define an optimized configuration (Table 1) and a suboptimal one (Table 2).

Table 1 Optimized parameter set for Rsq and T-B
Table 2 Suboptimal parameter set for Rsq and T-B

C-R curves were then calculated with both the optimized and the suboptimal sets on the same data used for the DoE analysis. Figure 8 illustrates an example in which the C-R curve obtained with the optimized set shows Rsq improved from 0.91 to 0.99 and T-B from 0.28 to 0.49, corresponding to a percent improvement (defined as 100 × (PSopt − PSsubopt)/PSsubopt, where PS = parameter set) of +8.8 % (Rsq) and +75 % (T-B), respectively.
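The percent improvement defined above can be reproduced directly from the Rsq and T-B values quoted for Fig. 8.

```python
def percent_improvement(opt, subopt):
    """100 * (PS_opt - PS_subopt) / PS_subopt."""
    return 100.0 * (opt - subopt) / subopt

print(round(percent_improvement(0.99, 0.91), 1))  # 8.8  (Rsq)
print(round(percent_improvement(0.49, 0.28), 1))  # 75.0 (T-B)
```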

Fig. 8

CR curves obtained with optimized and suboptimal parameter sets: the CR curve shows the fractional changes in membrane resistance at different drug concentrations (log scale). The CR curves obtained on the same image stack with the suboptimal (a) and the optimized (b) parameter sets are compared to highlight the marked improvement: Rsq from 0.91 to 0.99, T-B from 0.28 to 0.49.

As a final consideration, the P value for Blocks indicated an influence of inter-day conditions that is significant for T-B (P < 0.001) but not for Rsq (P = 0.147).

6 Validation of Obtained Optimized Configuration

The optimized parameter configuration we obtained with the previous analysis was then validated.

As a first step, we verified the initial hypothesis that Binning has no significant interactions with factors other than ShapeM. Indeed, the Pareto charts in Figs. 4b and 6b show that the interactions between Binning and the other factors do not reach statistical significance. Of note, Fig. 4b also shows that ShapeM has a significant interaction with RespCalc, clearly indicating that the pixel-related factors cannot be separated from the others.

Next, to validate the rejection of Binning = 1 in the first analysis, we ran MaLIA on 8 different image stacks with the same parameters as in the optimized (Table 1) and suboptimal (Table 2) configurations, except that Binning was set to 1. Setting Binning = 1 worsened Rsq and T-B in both the optimized (−6 % and −30 %, respectively) and the suboptimal configuration (−17 % and −54 %, respectively). We can conclude that, as suggested by experience during the development of MaLIA and assumed in the design of the first analysis, Binning = 1 gave the poorest overall performance. This ex post validation also confirms the assumptions we made in the first analysis of this unconventional DoE design.

Finally, we produced C-R curves with both the optimized and the suboptimal sets on data from different experiments, in order to validate the results over a wide range of cell and drug types (see Table 3).

Table 3 Pharmacological targets used to validate the optimized parameter set

Experimental data were randomly selected within a time interval of 2 years, covering five cell lines exposed to their reference drugs. Two experiments were considered for each cell line. Such a wide time interval was chosen to account also for changes due to the evolution of both biological protocols and screening processes.

The Rsq and T-B values of the C-R curves obtained with the optimized and suboptimal parameter sets are compared in Fig. 9a, b, and shown as percent variation ((PSopt − PSsubopt)/PSsubopt) in Fig. 9c. The charts clearly indicate that the optimized set consistently produces better CR curves: Rsq shows a slight improvement (up to 4.3 %), while T-B benefits much more (up to 90.4 %). The only exception is one experiment (HEK-293 GABA-A exp. 1), in which Rsq is lower (−1.2 %) with the optimized set, although T-B still improves by 9.3 %.

Fig. 9

Validation of the optimized set on different cells/drugs (see Table 3): Rsq (a) and T-B (b) values are higher with the optimized parameter set (black columns) than with the suboptimal one (gray columns), with a single exception (HEK-293 GABAAR exp. 1, Rsq(opt) < Rsq(subopt)). The percent variations are shown in (c), where Rsq variations refer to the right y-axis and T-B variations to the left y-axis.

7 Discussion and Conclusions

DoE was used to optimize the set of analytical parameters in a new drug screening procedure. This is a simple application of the method, which provided useful results with good efficiency (time and resources vs results).

We adopted an unconventional DoE approach: the screening design, instead of being employed to reduce the number of factors, was used to reduce the number of levels of one factor, taking advantage of specific knowledge of the image analysis process. This simplification made possible, in the second analysis, a direct and simpler comparison among all main parameters, avoiding the more complex general factorial design. This second design, which we called optimization in this study, was a Resolution V fractional factorial design. A standard response surface design could not be employed, since all factors are qualitative and intermediate values could not be envisaged (see: [18]). The choice of a fractional factorial design, which ignores third-order interactions, is supported by the finding that the results were only slightly influenced by pairs of factors (second-order interactions), which in turn supports the initial assumption of a negligible contribution from higher-order interactions.

The unconventional choice of a general full factorial followed by a fractional factorial design allowed us to reduce the number of runs. A traditional general factorial with 4 factors at 2 levels and 1 at 3 levels would have required 48 (2^4 × 3) runs per replicate. With our approach, we made 6 (3 × 2) runs for the first analysis and 16 (2^(5−1)) for the second, for a total of 22, i.e. 26 runs fewer than a single general factorial design. Over 3 replicates, we saved 78 runs. Each MaLIA run (inputting data, setting parameters, waiting for the analysis and collecting results) takes a skilled operator at least 7–10 min. Accordingly, we saved up to 13 h out of a total forecast effort of 24 h, i.e. a saving of ~54 %. Overall, this DoE approach saved time while gaining more information.
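The savings can be reproduced as follows. The sketch assumes that the 24 h forecast corresponds to the 144 runs of the general factorial at the upper bound of 10 min per run, which is consistent with the figures given in the text.

```python
runs_general = 48 * 3      # general factorial, 3 replicates
runs_two_stage = 22 * 3    # screening (6) + half fraction (16), 3 replicates
runs_saved = runs_general - runs_two_stage             # 78

minutes_per_run = (7, 10)  # operator time per MaLIA run, lower and upper bound
saved_hours = [runs_saved * m / 60 for m in minutes_per_run]
total_hours = runs_general * max(minutes_per_run) / 60  # assumed basis of the 24 h forecast
share = max(saved_hours) / total_hours

print(runs_saved, [round(h, 1) for h in saved_hours], total_hours, round(100 * share))
# 78 [9.1, 13.0] 24.0 54
```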

Finally, blocking revealed that the impact of the experimental day could not be neglected in this study, which spans 2 years of development of the drug screening platform. Interestingly, only T-B, and not Rsq, was subject to inter-day variation. This result leads to two considerations. First, this influence may deserve further analysis after final validation of the screening platform. Second, the experimental data could have been transformed to correct for blocks, thus obtaining a result independent of day-to-day variability (see [1]; Chap. 2). However, the excellent validation of the optimized parameter set demonstrates that the result is robust enough to make a more sophisticated analysis unnecessary. Further investigation might involve refining the quantitative thresholds used for some of the parameters (e.g. the threshold for Minimum response filtering). In terms of the final application, the observed inter-day variation appears to reflect the progressive optimization of biological and biochemical conditions during the development of the drug screening platform. At the same time, these data provide direct evidence that, even under suboptimal experimental conditions, the selected parameter set guarantees the best possible result. This is further confirmed by the consistent improvement observed independently of cell line and drug type. Overall, this is an important prerequisite for applying this new approach to the study of different pharmacological targets, in an unbiased way and in an industrial context.

In conclusion, our work demonstrates that applying DoE to the selection of software parameters, although still poorly exploited, can provide very useful results while reducing the number of trials compared with a complete OVAT approach. In this respect, it is worth noting how a deliberate introduction of constraints to limit the order of interactions considered, together with a two-stage design, can greatly simplify the modelling and thus the attainment of the result.