1 Introduction

Human motion capture using markers (active or passive) requires cameras to be positioned around the volume of interest so that at least two cameras can view each marker. Three-dimensional marker positions are then calculated by triangulation, and subject movements can thus be defined. When the movement is complex or is produced in the presence of obstacles, it is more difficult to capture markers. Consequently, a human operator tries to find camera configurations that minimize marker occlusion. This task remains difficult and time-consuming, requiring many validation tests even for an experienced operator.

Camera configuration has a critical impact on the overall motion capture performance. The quality of 3D marker reconstruction is strongly affected by two sources of error: marker visibility and triangulation accuracy. In order to address these error sources, the following points should be taken into consideration:

  1. A marker should not be occluded by any obstacle.

  2. A marker should be within the camera’s field of view.

  3. A marker must be visible given the camera’s resolution, the marker’s size, and its distance from the camera.

  4. At least two cameras are required to reconstruct a marker.

  5. Cameras should be arranged sufficiently non-parallel so that triangulation calculations are well conditioned.

  6. Incorporating additional cameras further over-determines the triangulation; however, it involves a higher cost.

To assist human operators in configuring camera networks and improving human motion capture, we developed a computer tool that uses a guided genetic algorithm (GGA) to simulate the best camera network configuration. Our approach captures tags (or markers) attached to a virtual humanoid performing movements similar to those in the real world. The goal is then to simulate the best camera network for a specified virtual scene.

2 Related work

The problem of finding optimal camera placement has been studied for a long time. The earliest research on the subject is based on the famous art gallery problem, whose main purpose is to find the minimum number of cameras that can monitor a fixed number of paintings in a gallery (Chvatal 1975). Initial formulations lacked realistic models for cameras and the environment (Lee and Lin 1986; O’rourke 1987): cameras were assumed to have a 360-degree field of view, with no resolution degradation with viewing distance, and both the target space and the camera locations were restricted to 2D. Mittal and Davis (2008) and Piciarelli et al. (2010) incorporated more realistic models of cameras and the environment and expanded the search space to 3D.

Unlike the previous techniques, which formulated the camera placement problem in 2D or 3D continuous spaces, most recent approaches consider the problem entirely in the discrete domain (Hörster and Lienhart 2006). Instead of optimizing a continuous function using a variational calculation, approaches in the discrete domain discretize the search space into a finite number of candidate positions and look for the configurations that optimize a given cost function. This strategy naturally leads to combinatorial problems. Efforts have been made to formulate discrete camera placement problems using linear binary programming (Gonzalez-Barbosa et al. 2009) and quadratic programming (Ercan et al. 2006).

Moreover, as the majority of these formulations lead to NP-hard problems, a multitude of practical solutions have been proposed: binary integer programming (Zhao et al. 2008), greedy approach (Al Hasan et al. 2008), Monte Carlo simulations (Yao et al. 2010), genetic algorithms (Olague and Mohr 2002) and particle swarm optimization (Morsly et al. 2012).

The high dimensionality of the search space and nonlinearity of the camera placement problem for motion capture led us to promote the use of genetic algorithms. In the following, we present research based on the use of genetic algorithms (GA) to solve the camera placement problem.

Olague and Mohr (2002) used a genetic algorithm to evaluate a set of camera network configurations based on horizontal and vertical camera orientation (\(\alpha\), \(\beta\)). They proposed a 3D reconstruction uncertainty measurement using a scalar function of the covariance matrix. This method considers the resolution constraint as the only cause of 3D uncertainty. Chen and Davis (2008) therefore built a quality metric that takes both resolution and occlusion into consideration. Rahimian and Kearney (2015) showed that their optimization algorithm achieves more significant performance gains than the method of Chen and Davis (2008), both in terms of the visibility metric and in the robustness of full-body motion capture. However, their method does not estimate the minimum number of cameras for a given accuracy threshold and workspace.

Indu et al. (2009) proposed an optimization of the camera network placement using a genetic algorithm based on the camera position parameters and the horizontal and vertical camera orientation (\(x_{{\mathrm{c}}}\), \(y_{{\mathrm{c}}}\), \(z_{{\mathrm{c}}}\), \(\alpha\), \(\beta\)). This study defines a coverage matrix by dividing the area to be photographed into a uniform grid, and each grid cell is classified as a priority area, a non-priority area, or an obstacle. This method becomes ineffective when the grid contains a large number of cells.

Katz et al. (2010) adopted the method of Olague and Mohr (2002), using the same 3D reconstruction uncertainty measurement but with all the camera position and orientation parameters (\(x_{{\mathrm{c}}}\), \(y_{{\mathrm{c}}}\), \(z_{{\mathrm{c}}}\), \(\alpha\), \(\beta\), \(\gamma\)), to find an optimal network placement. This method seems ineffective for the motion capture problem because it is limited to finding local minima and therefore does not always provide the right camera placement. In addition, the population size and the crossover and mutation rates must be adjusted manually by the user, which limits the effectiveness of the method.

Fig. 1 Proposed methodology to find preferential camera network placement for motion capture

The main elements that differentiate previous GA-based approaches are the number of camera parameters and the way these parameters are used to evaluate placement performance. Researchers have gradually increased the number of camera parameters in order to find adequate solutions that do not require manual intervention. However, these algorithms suffer from excessive computation time, since they handle several solutions simultaneously. Moreover, an incorrect choice of parameters such as the population size or the mutation rate may limit the effectiveness of the algorithm.

3 Proposed approach

Our camera placement assistance methodology is divided into two modules: graphical modelling and camera placement optimization (Fig. 1).

Graphical modelling includes scene creation, camera placement, and virtual humanoid animation. Animation involves enabling the virtual humanoid to produce movement similar to that in the real world in order to develop a specific scenario. This scenario can be recorded, modified, or rebuilt.

The camera placement optimization module applies the GGA to the created scenario in order to establish the best camera network placement with an appropriate number of cameras. This module is composed of three blocks: the distribution and estimation technique (DET) block restricts the search space, the error metric block evaluates the performance of a configured camera network, and the adaptive genetic algorithm block starts with the volume estimated by the DET and uses the error metric to optimize the initial camera networks.

3.1 Module 1: Graphical modelling

In this section, we define the graphical modelling that allows us to create a virtual scenario. This step is the starting point of the optimization process. It includes scene modelling, camera modelling, and human modelling.

3.1.1 Scene modelling

The scene represents the volume in which the cameras and the animated humanoid can be placed. The user specifies minimum and maximum values for the dimensions of the scene along the x, y, and z axes (Fig. 2).

3.1.2 Camera modelling

Cameras are modelled in 3D (Fig. 2). The location and view direction of the cameras in the scene are described by extrinsic parameters, including:

  • Three position coordinates: \((x_{{\mathrm{c}}}, y_{{\mathrm{c}}}, z_{{\mathrm{c}}})\).

  • Rotation matrix R with three angles: (\(\alpha\), \(\beta\), \(\gamma\)).

    $$\begin{aligned} R=R_x(\alpha )\times R_y(\beta )\times R_z(\gamma ) \end{aligned}$$
    (1)

    with

    $$\begin{aligned} R_x(\alpha )= \begin{pmatrix} 1 &\quad 0 &\quad 0\\ 0 &\quad \cos (\alpha ) &\quad -\sin (\alpha )\\ 0 &\quad \sin (\alpha ) &\quad \cos (\alpha ) \end{pmatrix} \end{aligned}$$
    (2)
    $$\begin{aligned} R_y(\beta )= \begin{pmatrix} \cos (\beta ) &\quad 0 &\quad \sin (\beta )\\ 0 &\quad 1 &\quad 0\\ -\sin (\beta ) &\quad 0 &\quad \cos (\beta ) \end{pmatrix} \end{aligned}$$
    (3)
    $$\begin{aligned} R_z(\gamma )= \begin{pmatrix} \cos (\gamma ) &\quad -\sin (\gamma ) &\quad 0\\ \sin (\gamma ) &\quad \cos (\gamma ) &\quad 0\\ 0 &\quad 0 &\quad 1 \end{pmatrix} \end{aligned}$$
    (4)
  • \(R_x\) rotates the y axis towards the z axis.

  • \(R_y\) rotates the z axis towards the x axis.

  • \(R_z\) rotates the x axis towards the y axis.
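
To make the camera model concrete, the following Python sketch composes the rotation matrix of Eqs. 1–4 from the three angles; the function name and the use of NumPy are illustrative assumptions, not part of the original tool.

```python
import numpy as np

def rotation_matrix(alpha, beta, gamma):
    """Compose R = Rx(alpha) * Ry(beta) * Rz(gamma) as in Eqs. (1)-(4).

    Angles are given in radians.
    """
    ca, sa = np.cos(alpha), np.sin(alpha)
    cb, sb = np.cos(beta), np.sin(beta)
    cg, sg = np.cos(gamma), np.sin(gamma)

    Rx = np.array([[1, 0, 0],
                   [0, ca, -sa],
                   [0, sa, ca]])
    Ry = np.array([[cb, 0, sb],
                   [0, 1, 0],
                   [-sb, 0, cb]])
    Rz = np.array([[cg, -sg, 0],
                   [sg, cg, 0],
                   [0, 0, 1]])
    return Rx @ Ry @ Rz
```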

3.1.3 Human modelling

The human figure is represented as an H-Anim (LOA 1) model (Lee et al. 2015), in which each model node corresponds to a body part (e.g. legs, torso, arms). Tags are fixed to specific locations on the virtual humanoid according to their placement on the real actor (Fig. 2b, c). The virtual humanoid is animated by a sequence of frames. Each frame is displayed on the screen until the next frame overwrites it. This process creates a specific movement (e.g. walking, running, or jumping) that can be recorded, modified, or rebuilt.

Fig. 2 a Modelling a motion capture scene in 3D with a virtual humanoid and camera. The camera is placed according to the parameters \(x_{{\mathrm{c}}}\), \(y_{{\mathrm{c}}}\), \(z_{{\mathrm{c}}}\), \(\alpha\), \(\beta\) and \(\gamma\) in relation to scene landmarks. b, c Example of the positioning of the tags on the virtual humanoid body segments in accordance with what is planned in the real world: a graphical model, b front view of the humanoid, c back view of the humanoid

3.2 Module 2: Camera placement optimization

In the field of motion capture, the GGA improves the classical genetic algorithm at the initialization, evaluation, and reproduction levels to address the problems identified in previous studies.

The initialization step uses the DET to choose the initial population of camera networks.

In the evaluation step, our error metric estimates the visibility of the tags for each camera network and provides a fitness score. This fitness score is then used to assess the performance of the camera networks.

In the reproduction step, adaptive strategies are used to obtain the best crossover and mutation results.

We divided this camera placement optimization module into three blocks (Fig. 1): distribution and estimation technique, error metric, and adaptive genetic algorithm described in detail below.

3.2.1 Distribution and estimation technique

The DET computes the distribution of the markers during the movement to be analysed and proposes the best volume in which to place the initial camera networks (Fig. 3).

We define a \({{ tag}}_j(x_j, y_j, z_j)\) as a given 3D point in the scene’s frame, and at each instant t the 3D position of a set of tags is given by Eq. 5:

$$\begin{aligned} L(t)= \begin{bmatrix} {{ tag}}_1(t)\\ \vdots \\ {{ tag}}_j(t) \end{bmatrix} = \begin{bmatrix} x_1(t)\\ y_1(t)\\ z_1(t)\\ \vdots \\ x_j(t)\\ y_j(t)\\ z_j(t) \end{bmatrix} \end{aligned}$$
(5)

where L(t) is the position of all the tags at instant t.

\(x_j\), \(y_j\) and \(z_j\) are the coordinates of \({{ tag}}_j\forall j \in [1\ldots N]\), where N is the number of tags in the scene. \({{ tag}}_j(t)\) is the \(j{\hbox {th}}\) tag at instant t.

From Eq. 5, we define a movement as a position matrix:

$$\begin{aligned} \overbrace{ \begin{bmatrix} {{ tag}}_1(t_0)&\cdots&{{ tag}}_1(t_{{{ Nf}}})\\ {{ tag}}_2(t_0)&\cdots&{{ tag}}_2(t_{{{ Nf}}})\\ \vdots&\ddots&\vdots \\ {{ tag}}_j(t_0)&\cdots&{{ tag}}_j(t_{{{ Nf}}}) \end{bmatrix}}^{{{ movement}}} = \begin{bmatrix} x_1(t_0)&\cdots&x_1(t_{{{ Nf}}})\\ y_1(t_0)&\cdots&y_1(t_{{{ Nf}}})\\ z_1(t_0)&\cdots&z_1(t_{{{ Nf}}})\\ x_2(t_0)&\cdots&x_2(t_{{{ Nf}}})\\ y_2(t_0)&\cdots&y_2(t_{{{ Nf}}})\\ z_2(t_0)&\cdots&z_2(t_{{{ Nf}}})\\ \vdots&\ddots&\vdots \\ x_j(t_0)&\cdots&x_j(t_{{{ Nf}}})\\ y_j(t_0)&\cdots&y_j(t_{{{ Nf}}})\\ z_j(t_0)&\cdots&z_j(t_{{{ Nf}}}) \end{bmatrix} \end{aligned}$$
(6)

where \(t\in [0\ldots {{ Nf}}]\) and Nf is the number of frames.
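
For illustration, the position matrix of Eq. 6 can be assembled from per-frame tag coordinates as in the following Python sketch; the array layout chosen here is an assumption.

```python
import numpy as np

def movement_matrix(tag_positions):
    """Stack per-frame tag coordinates into the position matrix of Eq. (6).

    tag_positions: array of shape (Nf, N, 3) holding the (x, y, z)
    coordinates of the N tags over Nf frames.
    Returns an array of shape (3 * N, Nf): one column L(t) per frame (Eq. 5).
    """
    nf, n, _ = tag_positions.shape
    # For each frame, flatten (x1, y1, z1, ..., xN, yN, zN) into one column.
    return tag_positions.reshape(nf, 3 * n).T
```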

Based on Eq. 6, initial camera locations are generated using the following two steps:

Determining initial camera positioning zone The purpose of this step is to position the cameras in the proposed solution zone (Fig. 3). To define this zone, we first determine, from the point clouds, the overall volume of the tags’ movement. Then, we select the space in which the tags are visible given the camera’s resolution, the marker’s size, and its distance from the camera.

Determining camera orientation from an interest point In this step, we calculate the orientation parameters (\(\alpha\), \(\beta\), \(\gamma\)) according to the camera position (\(x_{{\mathrm{c}}}\), \(y_{{\mathrm{c}}}\), \(z_{{\mathrm{c}}}\)) and an interest point (IP), the target at which the camera lens is pointed. This interest point is located in the overall volume of the tags’ movement at (\(x_{{{ IP}}}\), \(y_{{{ IP}}}\), \(z_{{{ IP}}}\)) from the centre of the scene \(O_{{\mathrm{s}}}\), according to Eq. 7.

$$\begin{aligned} IP= \frac{1}{N} \sum \limits _{i=1}^{N} [[x_i, y_i, z_i]^{{\mathrm{T}}}]o_{{\mathrm{s}}}\Leftrightarrow \left\{ \begin{array}{ll} x_{{{ IP}}}= \frac{1}{N} \sum \limits _{i=1}^{N}x_i \\ y_{{{ IP}}}= \frac{1}{N} \sum \limits _{i=1}^{N}y_i \\ z_{{{ IP}}}= \frac{1}{N} \sum \limits _{i=1}^{N}z_i \end{array} \right. \end{aligned}$$
(7)

Then, we calculate the distances xside, yside, and zside between IP and the camera position along each axis as follows:

$$\begin{aligned}\displaystyle {xside}&=|x_{{\mathrm{c}}}-x_{{{ IP}}} | \\ \displaystyle {{ yside}}&= | y_{{\mathrm{c}}}-y_{{{ IP}}} | \\ \displaystyle {{ zside}}&= | z_{{\mathrm{c}}}-z_{{{ IP}}} |\end{aligned}$$
(8)

Using Eq. 8, we define d and r as the distances in 2D and 3D Euclidean space as follows:

$$\begin{aligned} d= & {} \sqrt{({ xside})^2 + ({{{ yside}}})^2} \end{aligned}$$
(9)
$$\begin{aligned} r= & {} \sqrt{d^2 + ({{{ zside}}})^2} \end{aligned}$$
(10)

Thus, according to the centre of the scene \(O_{{\mathrm{s}}}\), the camera directions \(\alpha\), \(\beta\) and \(\gamma\) can be calculated as follows:

$$\begin{aligned} \displaystyle \alpha= & {} \arcsin ({{{ yside}}}/r) \nonumber \\ \displaystyle \beta= & {} \arccos ({{{ zside}}}/d) \nonumber \\ \displaystyle \gamma= & {} \arctan ({ xside}/d) \end{aligned}$$
(11)
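
A minimal Python transcription of this step, following Eqs. 7–11 as written, might look as follows; the function and variable names are illustrative assumptions.

```python
import numpy as np

def camera_orientation(cam_pos, tags):
    """Orient a camera towards the interest point IP (Eqs. 7-11).

    cam_pos: (x_c, y_c, z_c) camera position.
    tags: array of shape (N, 3) with tag coordinates in the scene frame.
    Returns the angles (alpha, beta, gamma) in radians.
    """
    ip = tags.mean(axis=0)                                   # Eq. (7): centroid of the tags
    xside, yside, zside = np.abs(np.asarray(cam_pos) - ip)   # Eq. (8)
    d = np.hypot(xside, yside)                               # Eq. (9): 2D distance
    r = np.hypot(d, zside)                                   # Eq. (10): 3D distance
    alpha = np.arcsin(yside / r)                             # Eq. (11), as given in the text
    beta = np.arccos(zside / d)
    gamma = np.arctan(xside / d)
    return alpha, beta, gamma
```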
Fig. 3 a Search zone of initial camera locations (capture zone) for a motion capture session according to minimum and maximum resolution. b Camera location and orientation according to the interest point IP. c Calculation of camera orientation

3.2.2 Error metric

The goal is to define an optimization function f that measures the ability of a camera network to capture tags in a 3D environment. It is then possible to select, out of all possible camera network configurations, the one that maximizes this function while using a minimum number of cameras.

For each tag, the number of cameras for which the tag is visible is determined in order to estimate the quality Q of the camera locations using the reconstruction error \(E_{{\mathrm{r}}}\). There are two possible situations for \(E_{{\mathrm{r}}}\): the critical case, when no camera or only one camera captures the tag (\(E_{{\mathrm{r}}} = 1\)), and the favourable case, when two or more cameras capture the tag (\(E_{{\mathrm{r}}} = 0\)).

This means that:

$$\begin{aligned}&\forall C_i; \forall {{ tag}}_j{:} \left\{ \begin{array}{ll} E_{{\mathrm{r}}}=1 \Leftrightarrow \left\{ \begin{array}{ll} \forall f_k; \sum \limits _{i, j=1}^{Nc, Nf}P_i {{ tag}}_j=0\\ \qquad \text{ or } \\ \forall f_k; \sum \limits _{i, j=1}^{Nc, Nf}P_i {{ tag}}_j=1 \end{array} \right. \\ E_{{\mathrm{r}}}=0 \Leftrightarrow \forall f_k; \sum \limits _{i, j=1}^{Nc, Nf}P_i {{ tag}}_j \ge 2\end{array} \right. \nonumber \\&k \in [1\ldots Nf], Nf=t_{{{ mov}}}/f_e, P=K.R.t, Nc\ge Nc_{{{ min}}} \end{aligned}$$
(12)

where

  • \({{ Nf}}\) is the number of frames.

  • \(t_{{{ mov}}}\) is the movement duration.

  • \(f_e\) is the acquisition frequency of the movement (60 Hz).

  • \({ P}\) is the projection matrix.

  • \({ K}\) is the intrinsic parameters matrix.

  • \({ R}\) and t are the rotation and translation parameters.

  • \({{ Nc}}_{{{ min}}}\) is the minimum number of cameras to be used, with \({{ Nc}}_{{{ min}}}\ge 2\).

We define a binary vector re indicating the reconstructed tags and a matrix \({ A}\) containing the binary values \({ a}_{ ij}\), with:

$$\begin{aligned} {{ re}}_j&= {\left\{ \begin{array}{ll} 1 &{} {\text {if}} \quad E_{{\mathrm{r}}}=0 \\ 0 &{} {\text {if}} \quad E_{{\mathrm{r}}}=1 \end{array}\right. }\end{aligned}$$
(13)
$$\begin{aligned} a_{ij}&= {\left\{ \begin{array}{ll} 1 & {\text {if\,tag}}\,j\,{\text {is\,captured\,by\,camera}}\,i\\ 0 & {\text {else}} \end{array}\right. } \end{aligned}$$
(14)
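
The following sketch shows how the binary matrix A and the vector re of Eqs. 13 and 14 could be obtained from per-camera visibility tests; the visibility test itself is assumed to be supplied by the constraints described below.

```python
import numpy as np

def reconstruction_vector(visible):
    """Build the binary matrix A (Eq. 14) and vector re (Eq. 13).

    visible: boolean array of shape (Nc, Nt); visible[i, j] is True when
    tag j satisfies the occlusion, field-of-view, and resolution
    constraints for camera i.
    Returns (A, re), where re[j] = 1 if at least two cameras see tag j
    (favourable case, E_r = 0) and 0 otherwise (critical case, E_r = 1).
    """
    a = visible.astype(int)                 # a_ij, Eq. (14)
    re = (a.sum(axis=0) >= 2).astype(int)   # Eq. (13) via E_r
    return a, re
```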

To obtain optimal camera locations, three constraints should be considered: occlusion, camera field of view, and camera resolution.

Occlusion A marker is completely visible if nothing blocks the camera’s view of it. All entities (static, dynamic, or the subject itself) can be considered obstacles \({{ Obs}}_{k}\) (Fig. 4). This constraint is expressed by Eq. 15 (Malik and Bajcsy 2008):

$$\begin{aligned} Occ= {\left\{ \begin{array}{ll} {\text {Visible}}{:} &{} \left( {{ SL}}_{CM}\cap \left( {\mathop {\bigcup }\limits _{k=1}^{n}} {{ Obs}}_{k}\right) \right) ={\varnothing } \\ {\text {Occluded}}{:} &{} {\text {else}} \end{array}\right. } \end{aligned}$$
(15)

where \({{ SL}}_{CM}\) is the straight line connecting the camera centre C and marker M.
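
As an illustration of Eq. 15, the sketch below tests whether the straight line segment from the camera centre C to the marker M intersects any obstacle; approximating each obstacle by a bounding sphere is an assumption made here for brevity.

```python
import numpy as np

def is_visible(cam, marker, obstacles):
    """Occlusion test of Eq. (15) with obstacles approximated by spheres.

    cam, marker: 3D points (camera centre C and marker M).
    obstacles: list of (centre, radius) bounding spheres.
    Returns True if the segment C-M intersects no obstacle.
    """
    cam, marker = np.asarray(cam, float), np.asarray(marker, float)
    seg = marker - cam
    seg_len2 = seg @ seg
    for centre, radius in obstacles:
        # Closest point of the segment C-M to the obstacle centre.
        t = np.clip((np.asarray(centre) - cam) @ seg / seg_len2, 0.0, 1.0)
        closest = cam + t * seg
        if np.linalg.norm(closest - np.asarray(centre)) <= radius:
            return False      # segment blocked: marker occluded
    return True               # SL_CM intersects no Obs_k: marker visible
```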

Fig. 4 Marker visibility constraints

Resolution It represents the minimum and maximum distances from which the camera can accurately view markers (Fig. 4), and is represented by the following formula (Zhao et al. 2009):

$$\begin{aligned} D_{{{ min}}} \le \Vert \left[ (X_{{\mathrm{c}}}, Y_{{\mathrm{c}}}, Z_{{\mathrm{c}}})-(X_m, Y_m, Z_m)\right] \cdot v_i\Vert \le D_{{{ max}}} \end{aligned}$$
(16)

where \(v_i\) is the unit vector along the optical axis of the camera’s view trajectory.

Field of view The camera’s field of view determines whether tags can be observed by the camera (Fig. 4). It is calculated as follows:

$$\begin{aligned} \sigma {\text {(in\, degrees)}} = 2\arctan \left( \frac{h}{2 D_{{{{ min}}}}}\right) \end{aligned}$$
(17)
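
A brief sketch of the resolution and field-of-view checks (Eqs. 16 and 17) is given below; the optical-axis unit vector \(v_i\) and the parameter h are taken here as given inputs, since their computation depends on the camera model.

```python
import numpy as np

def resolution_ok(cam_pos, marker, v_axis, d_min, d_max):
    """Eq. (16): depth of the marker along the optical axis lies in [D_min, D_max]."""
    depth = abs((np.asarray(cam_pos) - np.asarray(marker)) @ np.asarray(v_axis))
    return d_min <= depth <= d_max

def field_of_view_deg(h, d_min):
    """Eq. (17): field-of-view angle sigma in degrees."""
    return np.degrees(2.0 * np.arctan(h / (2.0 * d_min)))
```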

Thus, we seek to maximize the following optimization function:

$$\begin{aligned} f=w_1(\sigma )+w_2(Res)+w_3(Occ) \end{aligned}$$
(18)

where \({\textit{w}}_1\), \({\textit{w}}_2,\) and \({\textit{w}}_3\) are the weights of importance predefined by the designer. This gives:

$$\begin{aligned} { Max} \sum \limits _{i}\, f \,{\text { subject\, to}}\, Q=\frac{E_{{\mathrm{r}}}}{Nf\times Nt} \end{aligned}$$
(19)

For a minimum number of cameras:

$$\begin{aligned} { Min} \sum \limits _{i=1}^{Nc}C_i\,{\text {given}}\sum \limits _{i, j, k=1}^{Nc, Nt, Nf}(a_{ij})_k\ge {{ re}}_{{{ min}}} \end{aligned}$$
(20)

Based on these equations, the objective is to evaluate the quality of the camera network to maximize the visibility of the tags present in the 3D scene.
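
As an illustration of Eq. 18, the per-constraint weighting could be implemented as in the following sketch; the weight values used here are arbitrary placeholders, since the text only states that the weights are predefined by the designer.

```python
def tag_score(in_fov, in_resolution, not_occluded, w1=0.3, w2=0.3, w3=0.4):
    """Weighted score in the spirit of Eq. (18) for a single camera/tag pair.

    in_fov, in_resolution, not_occluded: booleans from the field-of-view,
    resolution, and occlusion checks.
    The weights w1, w2, w3 are design choices (arbitrary values summing to 1 here).
    """
    return w1 * in_fov + w2 * in_resolution + w3 * not_occluded
```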

3.2.3 Adaptive genetic algorithm

Our adaptive genetic algorithm proceeds in four steps: initialization, evaluation, selection, and reproduction as follows:

Initialization A population of camera configurations is created as an initial solution using the DET block. An individual is a vector of real numbers containing the translation and orientation parameters of each camera. We use 4–10 cameras, which results in vectors of 24 up to 60 dimensions. The population consists of 300 such individuals (Fig. 5).
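
A sketch of this initialization, reusing the camera_orientation function from the earlier DET sketch, is given below; the bounds of the DET capture zone are assumed inputs.

```python
import numpy as np

def init_population(n_cameras, pop_size, zone_min, zone_max, tags, rng):
    """Create an initial population of camera networks from the DET zone.

    Each individual is a flat vector of 6 * n_cameras real numbers:
    (x_c, y_c, z_c, alpha, beta, gamma) per camera. Positions are drawn
    inside the DET capture zone and orientations point at the interest point.
    """
    population = []
    for _ in range(pop_size):
        individual = []
        for _ in range(n_cameras):
            pos = rng.uniform(zone_min, zone_max)                 # position in DET zone
            individual.extend(pos)
            individual.extend(camera_orientation(pos, tags))      # Eqs. (7)-(11)
        population.append(np.array(individual))
    return population

# Example: 300 individuals of 4 cameras each (24-dimensional vectors).
# population = init_population(4, 300, zone_min, zone_max, tags, np.random.default_rng(0))
```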

Fig. 5 Population of a camera network (initial solution). Each individual contains n cameras and the population consists of m such individuals

Evaluation Each individual in the population is evaluated according to the error metric described in Sect. 3.2.2 and assigned a corresponding fitness score. Our error metric is the 3D point reconstruction error applied to the distribution of tags. In our algorithm, the reconstruction error yields a single score per individual, which allows its fitness to be evaluated. The evaluation is stopped after a predetermined number of iterations (e.g. 60 iterations) or when the fitness score reaches a given threshold (e.g. \(f=0.9\)).

Selection The population is sorted according to the fitness of the evaluated camera networks. Elitism (Goldberg and Holland 1988) is used as the selection method for camera networks exhibiting a fitness score \({\ge }0.5\) (Fig. 6).
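
A minimal sketch of this elitist selection, assuming fitness scores have already been computed for each individual, is shown below.

```python
def select_elite(population, fitness, threshold=0.5):
    """Elitism: keep camera networks whose fitness score is >= threshold,
    sorted from best to worst."""
    ranked = sorted(zip(fitness, population), key=lambda pair: pair[0], reverse=True)
    return [individual for score, individual in ranked if score >= threshold]
```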

Fig. 6 Selection of solutions where at least half of the markers are seen (fitness \({\ge }0.5\))

Reproduction The camera networks selected in the previous step are used to create a new generation of camera networks based on crossover and mutation operators (Fig. 7). The choice of crossover probability \(P_{{\mathrm{c}}}\) and mutation probability \(P_m\) has a direct impact on the convergence of the algorithm (Mitchell 1998; Szeliski 2010). Moreover, it is difficult to find the best values of \(P_{{\mathrm{c}}}\) and \(P_m\) that adapt to all problems. Because of this, we adopted adaptive crossover and mutation strategies.

Commonly used adaptive operators are detailed in Angeline and Angeline (1995), Kita (2001) and Xu et al. (2003). We used the crossover and mutation operators from Kita (2001) to reduce the population size and converge rapidly towards the optimal solution. These operators have been shown to be efficient (Aissaoui et al. 2014).

The crossover probability is expressed:

$$\begin{aligned} P_{{\mathrm{c}}} = \left\{ \begin{array}{ll} k_1\left( f_{{{ max}}}-f\right) /\left( f_{{{ max}}}-\bar{f}\right) , &{} f \ge \bar{f} \\ k_2, &{} f <\bar{f} \end{array} \right. \end{aligned}$$
(21)

The mutation probability is expressed:

$$\begin{aligned} P_{m} = \left\{ \begin{array}{ll} k_3\left( f_{{{ max}}}-f\right) /\left( f_{{{ max}}}-\bar{f}\right) , &{} f \ge \bar{f} \\ k_4, &{}f <\bar{f} \end{array} \right. \end{aligned}$$
(22)

where

  • \(k_1\), \(k_2\), \(k_3\), \(k_4\): real numbers \({\le } 1.0\).

  • \(f_{{{ max}}}\): maximum fitness value of population.

  • \(\bar{f}\): average fitness value for each generation.

  • f: fitness value of the individual undergoing crossover or mutation.

It is well established in the genetic algorithm literature (Srinivas and Patnaik 1994; Bao et al. 2011) that \((P_{{\mathrm{c}}} \ge 0.5)\) and \((P_m \ge 0.01)\) produce good algorithm convergence. We therefore fixed the values of \(k_1\) and \(k_2\) at 0.5 and those of \(k_3\) and \(k_4\) at 0.01.
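
Under these settings, Eqs. 21 and 22 can be transcribed as in the following sketch; the guard against a zero denominator is an added assumption.

```python
def adaptive_rates(f, f_max, f_avg, k1=0.5, k2=0.5, k3=0.01, k4=0.01):
    """Adaptive crossover and mutation probabilities (Eqs. 21 and 22).

    f: fitness of the individual to be recombined/mutated,
    f_max: maximum fitness in the population, f_avg: average fitness.
    Returns the pair (P_c, P_m).
    """
    if f >= f_avg:
        scale = (f_max - f) / (f_max - f_avg) if f_max > f_avg else 0.0
        return k1 * scale, k3 * scale
    return k2, k4
```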

Fig. 7 Reproduction operators

Finally, the solution space represents an n-dimensional polyhedron that changes in size and shape whenever a new generation is created. The crossover and mutation operators act by visiting corner points of this polyhedron. Thus, by visiting these corner points, an optimal placement solution is proposed when the objective function is maximized.

4 Simulation and performance evaluation

4.1 Simulation

In order to compare the performance of our proposed GGA with that of GA, simulations were conducted on MATLAB 8.5 (MathWorks 1994) using three scenarios: walking, running, and jumping. The necessary simulation parameters are shown in Table 1.

Table 1 Simulation parameters

4.2 Performance evaluation

In order to evaluate the performance of the GGA from the simulation results, we considered the following three performance measures: optimization quality, optimization time, and optimization cost.

4.2.1 Optimization quality

This measure is related to the fitness value of each generated camera network. The fitness value defines convergence to the optimal camera network. The closer the value is to “1”, the more efficient the camera network.

Figure 8 shows the progression of the average fitness value over 60 iterations for both the GGA and the GA. For the three scenarios, the GA started with a low fitness value, whereas the GGA started with a significantly higher one in the initial iterations, and the fitness performance is clearly better for the GGA throughout.

Fig. 8 Optimization quality of GGA and GA with respect to average fitness value for evaluated camera networks

4.2.2 Optimization time

This represents the maximum time allowed to provide the best camera network for a given scenario. If the algorithm does not provide a response within this limit for a given test, it is considered a failure, even though the algorithm could provide a fair answer with more time.

As shown in Fig. 9, the GGA has a significant advantage over the GA in terms of optimization time. The GGA successfully selects the most suitable camera locations for the studied scenario to recover tag coordinates in a reasonable time. In contrast, the optimization time of the GA varies depending on the movement studied: as the scenario becomes more complex, the optimization time clearly increases.

Fig. 9 Optimization time of camera locations according to the number of reconstructed tags over the number of iterations for GGA and GA

4.2.3 Optimization cost

This measure is essential for motion capture applications. The goal is to minimize the number of cameras used for motion capture, which in turn minimizes the necessary cost for the motion capture setup.

Concerning the optimization cost, Fig. 10 demonstrates that GGA reaches a maximum rate of recovered tags (100%) by using only 4 or 6 cameras with a runtime of 15 min. However, GA uses 10 cameras to reach 100% with a runtime of 60 min.

Fig. 10 Rate of reconstructed tags with 4, 6, and 10 cameras, which reflects the optimization cost for both GGA and GA

Figure 11 shows the best camera network configurations obtained with the GGA and the GA for the three scenarios. Regardless of the complexity of the scenario, the best configuration with the GGA is obtained using fewer cameras than with the GA.

For the walking, running, and jumping scenarios, the GGA finds the best configuration using 4, 4, and 6 cameras, respectively, whereas the best GA configuration requires 8, 10, and 10 cameras, respectively. The GGA is therefore more efficient than the GA at minimizing the number of cameras used and is consequently less costly.

Fig. 11 Best camera network configurations obtained with GGA and GA algorithms for the 3 scenarios

5 Conclusion

This paper presented a method to optimize camera network configuration for motion capture. The proposed method improves the performance of classical genetic algorithms, addressing the problem of divergence of solutions and reducing the computation time of the optimization. To achieve this goal, an error metric that evaluates the quality of capture and a distribution and estimation technique that restricts the search space were developed.

The work proposed in this paper opens new challenges for improving the robustness of the GGA. We will explore the possibility of combining the GGA with other evolutionary techniques to speed up convergence and better satisfy the requirements of global optimization of camera network placement for motion capture.

Finally, this work opens new perspectives for the acquisition of human movement in complex environments (e.g. with obstacles). We plan to develop a digital application, based on this work, to help position the cameras in an experimental scene, which will save time and also limit the loss of markers affixed to the subject being tested.