1 Acoustic Field Reconstruction

Ultrasound haptics is a technology that uses an array of ultrasound transducers to present tactile sensations remotely. The tactile sensation is caused by a nonlinear phenomenon called acoustic radiation pressure. Assuming that the particle velocity of ultrasound is sufficiently small, the acoustic radiation pressure can be approximated as a quantity proportional to the square of the sound pressure p calculated in the range of linear acoustics (Hasegawa et al. 2000). Therefore, controlling p at a set of control points allows us to design a spatial tactile pattern that spreads over them. In this section, we first describe the forward problem of determining the sound pressure field p created by an ultrasound transducer array and then outline the inverse problem of determining the gains of the transducer array that generate the desired sound pressure distribution.

1.1 Forward Problem of the Acoustic Field

The first step is to formulate the sound field produced by the ultrasound phased array. The ultrasound transducers that make up the array emit ultrasound waves through the piston motion of their internal diaphragms. The sound field produced by an individual transducer is, in fact, affected by various factors, but let us first assume that it is a point source at the origin emitting a spherical wave with frequency \(f_0\). In this case, the sound pressure field created by the transducer is as follows:

$$\begin{aligned} p(t,x,y,z)= & {} \frac{q^\textrm{amp}}{\sqrt{x^2+y^2+z^2}}e^{j(-k\sqrt{x^2+y^2+z^2} + 2\pi f_0 t + \theta ) }, \end{aligned}$$
(1)

where \(q^\textrm{amp}, \theta \) are the intensity and phase of the point source, c is the speed of sound, and \(k = 2\pi f_0/c\) is the wave number. This equation can be transformed into the product of the transfer function g, complex gain of the transducer q, and time-dependent term.

$$\begin{aligned} p(t,x,y,z) = g(x,y,z) q(q^\textrm{amp}, \theta ) e^{j2\pi f_0t}, \end{aligned}$$
(2)

where

$$\begin{aligned} g(x,y,z)= & {} \frac{1}{\sqrt{x^2+y^2+z^2}}e^{-jk\sqrt{x^2+y^2+z^2}} \end{aligned}$$
(3)
$$\begin{aligned} q(q^\textrm{amp}, \theta )= & {} q^\textrm{amp}\ e^{j\theta }. \end{aligned}$$
(4)

Focusing on the \(f_0\) frequency component of the sound field \(\hat{p}(x,y,z)\), the relation between the complex gain and the sound field is expressed as follows:

$$\begin{aligned} \hat{p}(x,y,z) = g(x,y,z) \cdot q(q^\textrm{amp}, \theta ). \end{aligned}$$
(5)

To extend the transfer function to a general position, let \(\textbf{x}\) be the position of the sound source and \(\mathbf {x'}\) be the observed position. In this case, the transfer function \(g(\textbf{x}, \mathbf {x'})\) is as follows:

$$\begin{aligned} g(\textbf{x},\mathbf {x'}) = \frac{1}{|\textbf{x}-\mathbf {x'}|}e^{-jk|\textbf{x}-\mathbf {x'}|} . \end{aligned}$$
(6)

Let us consider a case where there are multiple transducers. Suppose that there are N transducers, each with a gain of \(q_1, q_2, \dots q_N\), at \(\textbf{x}_1, \textbf{x}_2, \dots \textbf{x}_N\). The sound pressure at point \(\textbf{y}\) can be expressed using vectors:

$$\begin{aligned} \hat{p}(\textbf{y}) = \sum _i g(\textbf{x}_i, \textbf{y}) q_i = \textbf{g}^\top (\textbf{y})\textbf{q}. \end{aligned}$$
(7)

where

$$\begin{aligned} \textbf{g}(\textbf{y}) = \begin{pmatrix} g(\textbf{x}_1, \textbf{y})\\ g(\textbf{x}_2, \textbf{y})\\ \vdots \\ g(\textbf{x}_N, \textbf{y}) \end{pmatrix}, \end{aligned}$$
(8)
$$\begin{aligned} \textbf{q} = \begin{pmatrix} q_1\\ q_2\\ \vdots \\ q_N \end{pmatrix}. \end{aligned}$$
(9)

This equation represents the continuous sound pressure created when the gain of the phased array transducer is given.

In practical situations pertaining to mid-air haptic rendering, we often consider controlling the sound pressure at discrete points in the observation space \(\textbf{y}\). When the discretized control points are represented by \(\textbf{y}_1, \textbf{y}_2, \dots , \textbf{y}_M\), the sound pressure at each point can be expressed as follows:

$$\begin{aligned} \hat{p}(\textbf{y}_j) = \sum _i g(\textbf{x}_i, \textbf{y}_j) q_i = \textbf{g}^\top (\textbf{y}_j)\textbf{q}. \end{aligned}$$
(10)

Therefore, the sound pressure vector at control points \(\textbf{y}\) follows the matrix equation:

$$\begin{aligned} \hat{\textbf{p}} = G\textbf{q} \end{aligned}$$
(11)

where

$$\begin{aligned} G = (\textbf{g}(\textbf{y}_1), \textbf{g}(\textbf{y}_2), \dots , \textbf{g}(\textbf{y}_M))^\textrm{T}. \end{aligned}$$
(12)

This is the basic equation describing the forward problem of the phased array sound field. The vector of the transducer’s gain \(\textbf{q}\) and the sound pressure on the control points \(\hat{\textbf{p}}\) are related by the transfer function matrix G.
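The forward problem of Eqs. (6), (11), and (12) can be sketched numerically as follows. This is a minimal illustration under the point-source assumption, not a reference implementation; the function name `transfer_matrix`, the 4 x 4 array geometry, the speed of sound, and the control-point positions are all assumptions chosen for the example.

```python
import numpy as np

def transfer_matrix(sources, points, k):
    """Point-source transfer matrix: G[j, i] = exp(-jk r_ji) / r_ji (Eqs. 6 and 12)."""
    # r[j, i] is the distance between control point j and transducer i.
    r = np.linalg.norm(points[:, None, :] - sources[None, :, :], axis=-1)
    return np.exp(-1j * k * r) / r

# Assumed example values: 40 kHz transducers in air, 4 x 4 grid with 10 mm pitch.
c, f0 = 340.0, 40e3
k = 2 * np.pi * f0 / c                                   # wave number
gx, gy = np.meshgrid(np.arange(4) * 0.01, np.arange(4) * 0.01)
sources = np.column_stack([gx.ravel(), gy.ravel(), np.zeros(gx.size)])  # N = 16
points = np.array([[0.015, 0.015, 0.10], [0.0, 0.03, 0.12]])            # M = 2

G = transfer_matrix(sources, points, k)    # shape (M, N)
q = np.ones(len(sources), dtype=complex)   # full amplitude, zero phase
p_hat = G @ q                              # Eq. (11)
print(np.abs(p_hat))                       # pressure amplitude at each control point
```

Each row of `G` is \(\textbf{g}^\top (\textbf{y}_j)\), so replacing the body of `transfer_matrix` with another transducer model changes the physics without changing the matrix equation.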

As mentioned previously, matrix G represents the transfer function for the case where each transducer can be approximated as a point source. However, similar matrix equations can be obtained for other practical situations pertaining to mid-air haptic rendering. For example, if we consider the transducer as a piston disk of radius a attached to a baffle plate, the transfer function at the far field is as follows:

$$\begin{aligned} g(x,y,z) = \frac{j\omega \rho u a^2 J_1(ka\sin \theta )}{ka\sin \theta \sqrt{x^2+y^2+z^2}}e^{-jk\sqrt{x^2+y^2+z^2}}, \end{aligned}$$
(13)

where \(J_1\) is the Bessel function of the first kind of order one, \(\omega = 2\pi f_0\) is the angular frequency, \(\rho \) is the density of the medium, u is the velocity of the disk, and \(\theta \) represents the angle between the direction of (x, y, z) and the normal of the disk (not to be confused with the transducer phase in Eq. (1)). This approximation has been adopted in several mid-air haptic rendering studies using common cylindrical transducers.

Another example is to consider the effects of scattering. Thus far, we have considered the forward problem in a free-sound field. When considering scattering on the surface of an object such as a hand, according to the style of the boundary element method, the sound field can be represented as follows (Inoue et al. 2016):

$$\begin{aligned} B\hat{\textbf{p}} = G\textbf{q}, \end{aligned}$$
(14)

where matrix B represents the scattering effect. Even in this case, the same form of matrix equation can be obtained by defining \(\bar{G} = B^{-1}G\):

$$\begin{aligned} \hat{\textbf{p}} = \bar{G}\textbf{q}. \end{aligned}$$
(15)

Mid-air haptic rendering based on this scattered-field equation can increase the pressure at a single point on the fingertip (Inoue et al. 2016) or create an accurate pressure distribution on the hand surface (Matsubayashi et al. 2020).

1.2 Inverse Problem

The problem we face in controlling ultrasound-phased arrays is the reverse of this forward problem. We must first determine the acoustic radiation pressure at the control points that we want to present to the user and then find the transducer complex gain to output the pressure distribution.

Before discussing the sound pressure distribution control, let us first consider a simple case in which we want to maximize the sound pressure amplitude at a single point. The creation of this single focus is the most fundamental control and is practically significant. In this case, using Eq. (7), the problem can be written as

$$\begin{aligned} \text{ argmax}_{\textbf{q}} |\hat{p}(\textbf{y})| = \text{ argmax}_{\textbf{q}} |\textbf{g}^\top (\textbf{y})\textbf{q}|. \end{aligned}$$
(16)

If we increase the energy of \(\textbf{q}\), \(\hat{p}\) may become arbitrarily large; however, this is not realistic because a physical transducer has a finite rated power. That is, there is an upper limit on the absolute value of each component of \(\textbf{q}\). As units are meaningless in this discussion, we set the upper limit to 1, that is, \(0 \le q^\textrm{amp}\ \le 1\). The solution is then obtained when each element is driven at its maximum amplitude and the phases are chosen so that the waves from all transducers arrive in phase at the focal point. One set of complex gains that achieves this is as follows:

$$\begin{aligned} \mathbf {{q}} = \begin{pmatrix} \frac{ g(\textbf{x}_1, \textbf{y})^* }{ |g(\textbf{x}_1, \textbf{y})|}\\ \vdots \\ \frac{ g(\textbf{x}_N, \textbf{y})^* }{ |g(\textbf{x}_N, \textbf{y})|} \end{pmatrix}. \end{aligned}$$
(17)

where \(*\) denotes the complex conjugate.
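As a small numerical check of Eq. (17), the sketch below compares the focusing gains with a uniform drive in which all transducers share the same phase. The random planar array, the focal position, and the variable names are assumptions made for illustration only.

```python
import numpy as np

# Assumed example values: 40 kHz in air, 64 transducers scattered on the z = 0 plane.
c, f0 = 340.0, 40e3
k = 2 * np.pi * f0 / c
rng = np.random.default_rng(0)
sources = np.column_stack([rng.uniform(-0.05, 0.05, (64, 2)), np.zeros(64)])
focus = np.array([0.0, 0.0, 0.15])

r = np.linalg.norm(sources - focus, axis=1)
g = np.exp(-1j * k * r) / r            # g(x_i, y) for every transducer, Eq. (6)

q_focus = np.conj(g) / np.abs(g)       # Eq. (17): unit amplitude, conjugate phase
p_focus = g @ q_focus                  # g^T q (no conjugation), Eq. (7)

q_flat = np.ones_like(g)               # same amplitude, all phases zero
p_flat = g @ q_flat

print(abs(p_focus), abs(p_flat))
```

With the gains of Eq. (17), every term \(g(\textbf{x}_i, \textbf{y})q_i\) equals \(|g(\textbf{x}_i, \textbf{y})|\), so the contributions add without cancellation and the focal pressure reaches \(\sum _i 1/|\textbf{x}_i - \textbf{y}|\), which by the triangle inequality is the largest amplitude attainable under \(|q_i| \le 1\).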

Next, we consider the case of pressure distribution control. If the complex sound pressure \(\textbf{p}\) at the control points is given, it is reasonable to assume that \(\textbf{q}\) can be obtained using the generalized inverse matrix \(G^{\dagger }\):

$$\begin{aligned} \textbf{q} = G^{\dagger } \textbf{p} \end{aligned}$$
(18)

However, two problems specific to ultrasound mid-air haptic rendering arise here. One is the limitation of the transducer amplitude. As in the case of a single focus, each transducer must be driven at an amplitude that does not exceed its maximum output. The other is the determination of the phase distribution of the sound field. Humans cannot perceive the phase of ultrasonic waves. We feel tactile sensations as vibrations below 1 kHz, which is considerably lower than the ultrasound frequency, and the phase differences we can discriminate are limited to even lower frequencies (Kuroki et al. 2016). Therefore, we are not interested in the phase of the generated sound pressure distribution, and only the sound pressure amplitude is our target of control. In other words, given the desired amplitude of the sound pressure \(\textbf{p}^\textrm{amp}\), we need to find \(\textbf{q}\) such that

$$\begin{aligned}&\forall j\in \{1,\cdots , M\} :\,\,\, \{\textbf{p}^\textrm{amp}\}_j = |\{G\textbf{q}\}_j| \nonumber \\&\forall i\in \{1,\cdots , N\} :\,\,\, 0 \le |\{\textbf{q}\}_i| \le 1, \end{aligned}$$
(19)

where \(\{\textbf{x}\}_i\) denotes the i-th component of \(\textbf{x}\).

To date, various approaches have been attempted to solve this complex problem. Long et al. were the first to solve this problem and render haptic shapes using ultrasound (Long et al. 2014). They proposed a method that determines q after solving an eigenvalue problem to find the sound pressure phases for which the focal points at the control points strengthen each other. A little later, Inoue et al. (2015) proposed a method that, similar to Long et al., determines the sound pressure phase first, by relaxing the phase optimization problem to a semidefinite program. In a different approach, a fast method for finding q has been proposed by applying the Gerchberg–Saxton algorithm used in optics (Hertzberg et al. 2010; Inoue et al. 2015; Marzo and Drinkwater 2019). This algorithm has been improved to a faster and more accurate version, GS-PAT, by Plasencia et al. (2020). Matsubayashi et al. (2020) solved the least-squares problem using the Levenberg–Marquardt method, which is slower than the other methods but achieves a more accurate reproduction of the sound pressure field. All of the above methods treat q as a continuous quantity. However, in practical applications, the amplitude and phase of the transducers are input as discrete quantities. Suzuki et al. (2021) focused on this and proposed a very fast method to determine q by combinatorial optimization. In the next section, we review these methods.

2 Overview of Various Algorithms

This section outlines the algorithms that have been proposed for generating ultrasonic amplitude distributions and their respective applications.

2.1 Eigenmethod

As described in the previous section, we are only interested in the amplitude of sound pressure on the control points. The eigenmethod proposed by Long et al. in 2014 determines a good candidate for phase distribution that can achieve the target amplitude distribution (Long et al. 2014). This method first considers the transducer gain vector \(\bar{\textbf{q}}_j\), which creates a focus of sound pressure \(p_j^\textrm{amp}\) on the control point j:

$$\begin{aligned} \bar{\textbf{q}}_j = p_j^\textrm{amp}\left( \begin{array}{c} \frac{G_{j,1}^*}{\sum _{i=1}^N|G_{j,i}|^2} \\ \vdots \\ \frac{G_{j,N}^*}{\sum _{i=1}^N|G_{j,i}|^2} \end{array}\right) \end{aligned}$$
(20)

This vector is a minimum-norm solution. \(G\bar{\textbf{q}}_j\) represents the pressure distribution at the control points when a focus is generated at the control point j. Therefore, the following matrix R represents the interaction between focal points:

$$\begin{aligned} R&= G\left( \bar{\textbf{q}}_1, \cdots , \bar{\textbf{q}}_M \right) . \end{aligned}$$
(21)

For a phase vector of the focal points \(\textbf{t}\, (|t_i| \le 1)\) such that \(\Vert R\textbf{t}\Vert \) is large, the focal points interfere constructively with each other; thus, the energy efficiency of the transducers is high. This implies that it is easier to achieve the target amplitude distribution under the upper limit of the transducer array. Finding such a \(\textbf{t}\) is equivalent to solving the following eigenvalue problem:

$$\begin{aligned} R\textbf{x}&= \lambda \textbf{x} \end{aligned}$$
(22)

The eigenmethod seeks the eigenvector corresponding to the largest eigenvalue of R and uses its phase for the inverse problem. When phase \(\textbf{t}\) is obtained, the eigenmethod solves the matrix equation with weighted Tikhonov regularization to obtain the transducer gain \(\textbf{q}\).

$$\begin{aligned}&\left( \begin{array}{ccc} &{}G&{} \\ \sigma _1^\gamma &{} \cdots &{} 0 \\ \vdots &{} \ddots &{} \vdots \\ 0 &{} \cdots &{} \sigma _N^\gamma \end{array}\right) \textbf{q} = \left( \begin{array}{c} \textrm{diag}(\textbf{p}^{amp})\textbf{t} \\ 0 \\ \vdots \\ 0 \end{array}\right) , \end{aligned}$$
(23)
$$\begin{aligned}&\sigma _i = \sqrt{ \left| \sum _{j = 1}^M \frac{G_{j,i}p^\textrm{amp}_j}{M} \right| } \end{aligned}$$
(24)

where \(\textrm{diag}(\textbf{p}^\textrm{amp})\) is a diagonal matrix with the elements of \(\textbf{p}^\textrm{amp}\) on its diagonal, and \(\gamma \) is a regularization parameter. The solution \(\textbf{q}\) is then truncated so that it does not exceed the maximum output of the transducers (\(|q_i| \in [0, 1]\)). The computational complexity of solving this linear equation is \(O(N^3)\), which is the bottleneck of this method. Therefore, it should be noted that the computation time of this method becomes long when the number of transducers is large. The entire process is presented in Algorithm 1.
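The steps of Eqs. (20) to (24) can be sketched as follows. This is only an illustrative reading of the equations above, not the implementation of Long et al.: the function names, the array geometry, the default \(\gamma = 1\), the use of a dense least-squares solver for Eq. (23), and the way amplitudes are truncated (clipping the modulus while keeping the phase) are assumptions.

```python
import numpy as np

def transfer_matrix(sources, points, k):
    """Point-source model: G[j, i] = exp(-jk r) / r."""
    r = np.linalg.norm(points[:, None, :] - sources[None, :, :], axis=-1)
    return np.exp(-1j * k * r) / r

def eigenmethod(G, p_amp, gamma=1.0):
    M, N = G.shape
    # Eq. (20): column j holds the minimum-norm gains focusing p_amp[j] on point j.
    Q_bar = (np.conj(G) / np.sum(np.abs(G) ** 2, axis=1, keepdims=True)).T * p_amp
    R = G @ Q_bar                                   # Eq. (21), shape (M, M)
    w, V = np.linalg.eig(R)                         # Eq. (22)
    v = V[:, np.argmax(w.real)]                     # eigenvector of the largest eigenvalue
    t = np.exp(1j * np.angle(v))                    # keep only its phase
    # Eqs. (23)-(24): stacked system with weighted Tikhonov regularization.
    sigma = np.sqrt(np.abs(G.T @ p_amp) / M)
    A = np.vstack([G, np.diag(sigma ** gamma)])
    b = np.concatenate([p_amp * t, np.zeros(N)])
    q, *_ = np.linalg.lstsq(A, b, rcond=None)
    # Truncate amplitudes above 1 (assumed clipping rule), preserving phase.
    q = q / np.maximum(np.abs(q), 1.0)
    return q, t

# Assumed example: 8 x 8 array with 10 mm pitch, three control points at z = 12 cm.
k = 2 * np.pi * 40e3 / 340.0
gx, gy = np.meshgrid(np.arange(8) * 0.01, np.arange(8) * 0.01)
sources = np.column_stack([gx.ravel(), gy.ravel(), np.zeros(gx.size)])
points = np.array([[0.02, 0.02, 0.12], [0.05, 0.03, 0.12], [0.035, 0.06, 0.12]])
G = transfer_matrix(sources, points, k)
p_amp = np.array([100.0, 100.0, 50.0])

q, t = eigenmethod(G, p_amp)
print(np.abs(G @ q))   # achieved amplitudes, to compare with p_amp
```

Because of the regularization and the truncation, the printed amplitudes generally differ from `p_amp`; the sketch is meant to show the data flow rather than the accuracy of the method.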

Using this algorithm, Long et al. configured a system to present the cross-sectional shape of an object. The object shape is rendered by generating many focal points in real time at the intersection of the hand and the object, as shown in Fig. 1. They demonstrated that this system enabled users to recognize the shape of objects only from haptic sensations. This system is the first reported example of generating a sound pressure distribution with a specific shape for mid-air haptic rendering by determining the optimal sound pressure phase. However, it has been shown that the eigenmethod produces a distribution with slightly less pressure than the given sound pressure amplitude \(\textbf{ p}^\textrm{amp}\). Plasencia et al. modified the eigenmethod by multiplying the obtained transducer amplitudes by a correction factor (Plasencia et al. 2020) (shown in Algorithm 2). Comparisons between certain algorithms in their paper are described in Sect. 2.4.

Fig. 1 When a tracked hand touches a virtual object, multiple focal points are generated at the intersection (Long et al. 2014)

Algorithm 1 Eigenmethod
Algorithm 2 Corrected eigenmethod

2.2 Semidefinite Relaxation

In 2015, Inoue et al. proposed a shape rendering method based on the generation of sound pressure distribution. Similar to the eigenmethod, this method first obtains the distribution of the optimal sound pressure phase. This method considers the following minimization problem.

$$\begin{aligned} \mathrm{minimize\;\;\;}&\;&\Vert G\textbf{q} - \textrm{diag}(\textbf{p}^\textrm{amp})\textbf{t} \Vert _2^2\nonumber \\ \mathrm{subject\; to\;\;\;}&\;&|t_i| = 1, \;\; i \in \{1,\cdots , M\} \end{aligned}$$
(25)

Then, it assumes that, once the phase distribution \(\textbf{t}\) is obtained, the transducer gain is determined by \(\textbf{q} = G^-\textrm{diag}(\textbf{p}^\textrm{amp})\textbf{t}\), where \(G^{-}\) is the Tikhonov-regularized inverse matrix \(G^{-} = (G^{*}G + \lambda I)^{-1}G^{*}\). In this case, the objective function can be transformed as follows:

$$\begin{aligned}&\Vert G\textbf{q} - \textrm{diag}(\textbf{p}^\textrm{amp})\textbf{t} \Vert _2^2 \end{aligned}$$
(26)
$$\begin{aligned}= & {} \Vert (GG^- - I)\textrm{diag}(\textbf{p}^\textrm{amp})\textbf{t} \Vert _2^2 \end{aligned}$$
(27)
$$\begin{aligned}= & {} \textbf{t}^*\textrm{diag}(\textbf{p}^\textrm{amp})(GG^- - I)^*(GG^- - I)\textrm{diag}(\textbf{p}^\textrm{amp})\textbf{t}. \end{aligned}$$
(28)

Denoting \(M = \textrm{diag}(\textbf{p}^\textrm{amp})(GG^- - I)^*(GG^- - I)\textrm{diag}(\textbf{p}^\textrm{amp})\) (in Eqs. (29) and (30), M denotes this matrix rather than the number of control points), the minimization problem can be expressed as follows:

$$\begin{aligned} \mathrm{minimize\;\;\;}\;&\textbf{t}^*M\textbf{t} \nonumber \\ \mathrm{subject\; to\;\;\;}&|t_i| = 1 \;\; \text{for every component } t_i \text{ of } \textbf{t} \end{aligned}$$
(29)

To make the problem easier to solve, it can be rewritten as an equivalent problem.

$$\begin{aligned} & \mathrm{minimize\;\;\;}\rm{Tr}(TM) \nonumber \\ & \mathrm{subject\; to\;\;\;}\rm{diag}(T) = 1, \; T \succeq 0, \textrm{rank}(T) = 1, \end{aligned}$$
(30)

where \(T = \textbf{t}\textbf{t}^*\). If the rank constraint is removed, we obtain a semidefinite programming (SDP) problem, and the global optimal solution can be easily obtained. To solve this SDP problem efficiently, Inoue et al. employed the block-coordinate descent method. The approximate solution of the original problem \(\textbf{t}\) was obtained as the phase of the eigenvector corresponding to the largest eigenvalue of the solution of the relaxed problem T. After phase determination, the transducer amplitude was obtained by solving the Tikhonov regularized linear equation. Similar to the eigenmethod, the computational complexity of this part is \(O(N^3)\), which is the bottleneck of this method. The details are shown in Algorithm 3.
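The chain of identities behind Eqs. (26) to (30) can be checked numerically. The sketch below does not solve the SDP; it only verifies, for one arbitrary unit-modulus phase vector, that the residual norm, the quadratic form \(\textbf{t}^*M\textbf{t}\), and the trace \(\textrm{Tr}(TM)\) with \(T = \textbf{t}\textbf{t}^*\) coincide. It assumes that \(G^-\) takes the standard Tikhonov form \((G^*G + \lambda I)^{-1}G^*\); the geometry, \(\lambda = 1\), and all variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n_points, n_trans = 4, 32                      # assumed problem size

# Assumed point-source geometry: transducers on z = 0, control points near z = 12 cm.
k = 2 * np.pi * 40e3 / 340.0
sources = np.column_stack([rng.uniform(-0.04, 0.04, (n_trans, 2)), np.zeros(n_trans)])
points = np.column_stack([rng.uniform(-0.03, 0.03, (n_points, 2)), np.full(n_points, 0.12)])
r = np.linalg.norm(points[:, None, :] - sources[None, :, :], axis=-1)
G = np.exp(-1j * k * r) / r                    # shape (n_points, n_trans)

p_amp = np.array([80.0, 60.0, 40.0, 20.0])
lam = 1.0

# Tikhonov-regularized inverse (assumed standard form), shape (n_trans, n_points).
G_reg = np.linalg.solve(G.conj().T @ G + lam * np.eye(n_trans), G.conj().T)
P = np.diag(p_amp)
A = G @ G_reg - np.eye(n_points)
M_mat = P @ A.conj().T @ A @ P                 # the matrix called M in Eq. (29)

t = np.exp(1j * rng.uniform(0, 2 * np.pi, n_points))   # any unit-modulus phases
q = G_reg @ P @ t

residual_norm = np.linalg.norm(G @ q - P @ t) ** 2     # Eq. (26)
quad_form = np.real(t.conj() @ M_mat @ t)              # Eq. (28)
T = np.outer(t, t.conj())                              # rank-one lifting
trace_form = np.real(np.trace(T @ M_mat))              # objective of Eq. (30)
print(residual_norm, quad_form, trace_form)
```

The lifted matrix `T` has unit diagonal and rank one by construction, which are exactly the constraints of Eq. (30); dropping the rank constraint is what turns the problem into an SDP.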

Using this algorithm, Inoue et al. constructed a system called HORN, which presents volumetric tactile objects in air (Inoue et al. 2015). This system uses a large number of ultrasonic transducers to present a sound pressure pattern of sufficient intensity with some shape. As shown in Fig. 2, this allows the user to interact with a virtual object without any delays and time resolution losses caused by hand tracking.

Fig. 2 HORN generates a static volumetric haptic image using a large number of transducers surrounding the workspace

Algorithm 3 Semidefinite relaxation

2.3 Gerchberg–Saxton Algorithm

The methods of first finding the optimum sound pressure phase and then solving the linear equation, as described in the previous sections, are computationally expensive when a large number of transducers are used. In contrast, methods that obtain a solution iteratively at high speed have been studied. One of them is the Gerchberg–Saxton algorithm (GSA) (Gerchberg 1972). The GSA is known as an effective tool to solve phase recovery problems and has attracted attention in a wide range of fields, such as electron microscopy, computer holography, and astronomy. Phase recovery is the problem of estimating the phase of a physical quantity when only the intensity of the quantity is observed. Our problem can be considered to be a type of phase recovery problem. In terms of acoustic pressure amplitude control using an ultrasound phased array, which is the subject of this chapter, GSA-based methods have been proposed in the context of hyperthermia (Hertzberg et al. 2010), mid-air haptic rendering (Inoue et al. 2015), and acoustic manipulation (Marzo and Drinkwater 2019). The details of the algorithm differ between these methods, and the method proposed by Marzo and Drinkwater (2019) is presented here.

In this method, each ultrasound transducer is assumed to change only its phase. In one iteration, constraints are placed on each of the two vectors \(\textbf{q},\textbf{p}\) while alternately propagating between them as follows:

  1. Propagate forward: \(\textbf{p} \Leftarrow G\textbf{q}\)

  2. Impose constraint: \(p_i \Leftarrow p^\textrm{amp}_i \frac{p_i}{|p_i|}\)

  3. Propagate backward: \(\textbf{q} \Leftarrow G^*\textbf{p}\)

  4. Impose constraint: \(q_i \Leftarrow \frac{q_i}{|q_i|}\)

The computational complexity of one iteration is O(NM). Marzo and Drinkwater state that approximately 100 iterations are sufficient. Because \(N \gg M\) in many applications, this method is fast; it was originally proposed to exploit this speed for the acoustic manipulation of multiple micro-objects in three dimensions. Because acoustic manipulation and mid-air haptic rendering place similar requirements on the target sound field, this algorithm is also useful for mid-air haptic rendering.
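The four steps above translate almost line by line into code. The following sketch assumes a point-source transfer matrix, an all-ones initial guess for \(\textbf{q}\), and a fixed iteration count; the function names and the geometry are illustrative, and \(p_i/|p_i|\) is computed as \(e^{j\arg p_i}\), which is identical whenever \(p_i \ne 0\).

```python
import numpy as np

def transfer_matrix(sources, points, k):
    """Point-source model: G[j, i] = exp(-jk r) / r."""
    r = np.linalg.norm(points[:, None, :] - sources[None, :, :], axis=-1)
    return np.exp(-1j * k * r) / r

def gerchberg_saxton(G, p_amp, n_iter=100):
    q = np.ones(G.shape[1], dtype=complex)       # assumed initial guess
    for _ in range(n_iter):
        p = G @ q                                # 1. propagate forward
        p = p_amp * np.exp(1j * np.angle(p))     # 2. impose the target amplitudes
        q = G.conj().T @ p                       # 3. propagate backward with G*
        q = np.exp(1j * np.angle(q))             # 4. phase-only transducers
    return q

# Assumed example: 8 x 8 array with 10 mm pitch, two control points.
k = 2 * np.pi * 40e3 / 340.0
gx, gy = np.meshgrid(np.arange(8) * 0.01, np.arange(8) * 0.01)
sources = np.column_stack([gx.ravel(), gy.ravel(), np.zeros(gx.size)])
points = np.array([[0.02, 0.03, 0.12], [0.05, 0.04, 0.12]])
G = transfer_matrix(sources, points, k)
p_amp = np.array([1.0, 0.5])

q = gerchberg_saxton(G, p_amp)
print(np.abs(G @ q))   # resulting amplitudes at the control points
```

Each iteration consists of two matrix-vector products with an \(M \times N\) matrix, which is where the O(NM) cost per iteration comes from. Since every transducer is driven at unit amplitude, the overall scale of the result is not controlled by `p_amp` in this sketch, only the relative weighting of the control points.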

Algorithm 4 Gerchberg–Saxton algorithm

2.4 GS-PAT

In 2020, Plasencia et al. proposed GS-PAT, an algorithm that is more accurate and faster than the GSA. The GSA uses \(G^{*}\) as the back-propagation matrix, which is inaccurate in terms of the amplitudes of the transducers. In contrast, GS-PAT uses the normalized matrix F and updates \(\textbf{p}\) in the following manner:

$$\begin{aligned}&1.\,\, \hat{\textbf{p}} \Leftarrow G F\textbf{p} = G \left( \begin{array}{ccc} \frac{G_{1,1}^*}{\sum _{i=1}^N|G_{1,i}|^2} &{} \cdots &{} \frac{G_{M,1}^*}{\sum _{i=1}^N|G_{M,i}|^2} \\ \vdots &{} \ddots &{} \vdots \\ \frac{G_{1,N}^*}{\sum _{i=1}^N|G_{1,i}|^2} &{} \cdots &{} \frac{G_{M,N}^*}{\sum _{i=1}^N|G_{M,i}|^2} \end{array} \right) \textbf{p},\\&2.\,\, p_i \Leftarrow p_i^\textrm{amp} \frac{\hat{p}_i}{|\hat{p}_i|},\;\;\;\;\;\; i \in \{1,\cdots , M\}. \end{aligned}$$

The matrix F is composed of the vectors that are the minimum-norm solutions for generating each control point independently, which are identical to those used in the eigenmethod described in Sect. 2.1. In GS-PAT, the matrix \(R=GF\) is calculated in advance to reduce the computational complexity of the matrix-vector product in each iteration. When the number of iterations of the algorithm is K, the computational complexity of the GSA is O(KNM), whereas that of GS-PAT is \(O(KM^2+NM^2)\). Considering \(M<K<N\) in various practical situations pertaining to multi-point mid-air haptic rendering and multi-object manipulation, GS-PAT has an advantage over the GSA in terms of computation time. Using a mid-range GPU (NVIDIA GTX 1660) with \(N=512, M=32, K=100\), GS-PAT was experimentally shown to be capable of 17,000 optimizations per second.
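The update rule above can be sketched as follows. Only the two-step iteration on \(\textbf{p}\) is taken from the text; the zero-phase initialization, the final recovery of the gains as \(\textbf{q} = F\textbf{p}\), the clipping of amplitudes to 1, and the names `backprop_matrix` and `gs_pat` are assumptions made for this illustration and may differ from the published implementation.

```python
import numpy as np

def transfer_matrix(sources, points, k):
    """Point-source model: G[j, i] = exp(-jk r) / r."""
    r = np.linalg.norm(points[:, None, :] - sources[None, :, :], axis=-1)
    return np.exp(-1j * k * r) / r

def backprop_matrix(G):
    """Normalized back-propagation matrix F (N x M): column j is conj(G[j]) / sum_i |G[j, i]|^2."""
    return (np.conj(G) / np.sum(np.abs(G) ** 2, axis=1, keepdims=True)).T

def gs_pat(G, p_amp, n_iter=100):
    F = backprop_matrix(G)
    R = G @ F                                   # M x M, computed once: O(N M^2)
    p = p_amp.astype(complex)                   # assumed start: all phases zero
    for _ in range(n_iter):                     # each iteration costs O(M^2)
        p_hat = R @ p                           # 1. propagate through R = G F
        p = p_amp * np.exp(1j * np.angle(p_hat))  # 2. restore the target amplitudes
    q = F @ p                                   # assumed final step: gains from p
    q = q / np.maximum(np.abs(q), 1.0)          # assumed clipping to |q_i| <= 1
    return q, p

# Assumed example: 8 x 8 array with 10 mm pitch, three control points.
k = 2 * np.pi * 40e3 / 340.0
gx, gy = np.meshgrid(np.arange(8) * 0.01, np.arange(8) * 0.01)
sources = np.column_stack([gx.ravel(), gy.ravel(), np.zeros(gx.size)])
points = np.array([[0.02, 0.02, 0.12], [0.05, 0.03, 0.12], [0.035, 0.06, 0.12]])
G = transfer_matrix(sources, points, k)
p_amp = np.array([100.0, 100.0, 50.0])

q, p = gs_pat(G, p_amp)
print(np.abs(G @ q))
```

The diagonal of `R` is exactly 1 because column j of `F` reproduces unit pressure at control point j; the off-diagonal entries describe how each focus leaks into the others. The loop never touches an array of size N, which is the source of the \(O(KM^2)\) term.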

Plasencia et al. also conducted simulations to compare the accuracy of the eigenmethod, the GSA, and GS-PAT for multi-point amplitude control. The results showed that GS-PAT performed as well as the other algorithms and was considerably faster. However, when the number of focal points was large (\(M \ge 16\)), GS-PAT was found to be less accurate than the eigenmethod.

The fast multi-point pressure rendering achieved by this algorithm has the potential to enable more diverse mid-air haptic sensations. In recent years, it has been reported that, by moving a single focal point at a high speed, it is possible to provide more intense tactile stimulation and to make the shape of the trajectory recognizable (Takahashi et al. 2018; Frier et al. 2018) (see Chapter “Modulation Methods for Ultrasound Midair Haptics,” for details). However, how the speed and frequency of the tactile stimuli affect human perception remains to be clarified. Fast multi-focus generation enables independent control of the focus movement speed and the refresh rate of the haptic stimulus in these modulation methods. For example, rotating three focal points on a circle can present haptic sensations at three times the refresh rate of a single point rotated at the same speed (see Fig. 3). This algorithm is expected to help reveal a variety of human perceptual characteristics.

Fig. 3 a Spatio-temporal modulation, b independent control of focal movement speed and refresh rate of haptic stimuli by multi-focus generation (Plasencia et al. 2020)

Algorithm 5 GS-PAT

2.5 Levenberg–Marquardt Algorithm

Basically, there is a tradeoff between accuracy and speed in sound pressure amplitude control. In 2019, Matsubayashi et al. proposed a method to control the sound pressure amplitude accurately, although it is slower than previous methods. They used the Levenberg–Marquardt algorithm (LMA), which is known as an effective solution method for unconstrained nonlinear least-squares problems, to control the sound pressure amplitude in a scattered sound field. The scattering of ultrasonic waves on the surface of the hand is not a problem for macroscopic tactile rendering (e.g., different intensities of tactile rendering for each of the five fingers). However, when more detailed pressure reproduction is required, such as when we want to control the shape of the pressure generated on the fingertips, we need to consider its effect. They described the optimization for controlling the scattering sound field as the following least-squares problem.

$$\begin{aligned}&\textrm{minimize} \;\;\;\; \Vert \textrm{diag}(\textbf{ p}^\textrm{amp}) \textbf{ t} - B^{-1}G\textbf{ q}\Vert _2^2 \nonumber \\&\mathrm{subject\;to} \;\;\;\; |t_i|=1,|q_i| \le 1, \end{aligned}$$
(31)

If we fix the transducer amplitude to the maximum value and omit the computationally expensive calculation of \(B^{-1}\), the following problem is obtained:

$$\begin{aligned}&\textrm{minimize} \;\;\;\; \Vert B\textrm{diag}(\textbf{ p}^\textrm{amp}) \textbf{ t} - G\textbf{ q}\Vert _2^2 \nonumber \\&\mathrm{subject\;to} \;\;\;\; |t_i|=1,|q_i| = 1 \end{aligned}$$
(32)

Furthermore, by setting \(\textbf{ q} = [e^{j\theta _1}, \cdots , e^{j\theta _N}]^T\) and \(\textbf{ t} = [e^{j\theta _{N+1}}, \cdots , e^{j\theta _{N+M}}]^T\), the problem is simplified to an unconstrained least-squares problem for the phases of the transducers and the sound pressure, \(\mathbf { \theta } = [\theta _1, \cdots , \theta _{M+N}]^T\).

Without going into detail, the update step of \(\mathbf { \theta }\) in the LMA is calculated as follows:

$$\begin{aligned} \textbf{h} = - [J(\mathbf {\theta })^T J(\mathbf {\theta }) + \lambda I]^{-1}J(\mathbf {\theta })^T\textbf{ f}(\mathbf {\theta }), \end{aligned}$$
(33)

where

$$\begin{aligned} \textbf{ f}(\mathbf {\theta }) = \left( \begin{array}{c} \textrm{Re} [B\textrm{diag}(\textbf{ p}^\textrm{amp}) \textbf{ t} - G\textbf{ q}]\\ \textrm{Im} [B\textrm{diag}(\textbf{ p}^\textrm{amp}) \textbf{ t} - G\textbf{ q}] \end{array} \right) , \end{aligned}$$
(34)

and \(J(\mathbf {\theta })\) is the Jacobian matrix of \(\textbf{ f}(\mathbf {\theta })\). The phases are updated, \(\mathbf {\theta } \Leftarrow \mathbf {\theta } + \textbf{h}\), only if the value of the objective function after the step, \(\Vert \textbf{ f}(\mathbf {\theta } + \textbf{h})\Vert _2^2\), decreases. In addition, \(\lambda \) is a damping factor that contributes to the convergence stability of the algorithm and is updated according to the behavior of the objective function after the step. When \(\lambda \) is large, the behavior of the LMA is similar to the steepest descent method, and when \(\lambda \) is small, the LMA converges in a manner similar to the Gauss–Newton method. Algorithm 6 describes the details of this process, in which the update method for \(\lambda \) follows Madsen et al. (2004).
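A compact sketch of the update of Eq. (33) with the residual of Eq. (34) is given below for the free-field case, where B is the identity matrix. The analytic Jacobian follows from differentiating \(e^{j\theta }\); the halving and doubling rule for \(\lambda \) is a deliberately simplified stand-in for the update of Madsen et al. (2004), and the function names, geometry, and iteration count are assumptions.

```python
import numpy as np

def residual(theta, G, B, p_amp):
    """f(theta) of Eq. (34): stacked real and imaginary parts of B diag(p_amp) t - G q."""
    N = G.shape[1]
    q, t = np.exp(1j * theta[:N]), np.exp(1j * theta[N:])
    r = B @ (p_amp * t) - G @ q
    return np.concatenate([r.real, r.imag])

def jacobian(theta, G, B, p_amp):
    """Analytic Jacobian of f with respect to the N + M phases."""
    N = G.shape[1]
    q, t = np.exp(1j * theta[:N]), np.exp(1j * theta[N:])
    # d/d(theta_i) of -G q is -j G[:, i] q_i; d/d(theta_{N+m}) of B diag(p) t is j B[:, m] p_m t_m.
    Jc = np.hstack([-1j * G * q, 1j * B * (p_amp * t)])
    return np.vstack([Jc.real, Jc.imag])

def levenberg_marquardt(G, B, p_amp, n_iter=50, lam=1.0):
    M, N = G.shape
    theta = np.zeros(N + M)                      # assumed initial phases
    f = residual(theta, G, B, p_amp)
    for _ in range(n_iter):
        J = jacobian(theta, G, B, p_amp)
        h = -np.linalg.solve(J.T @ J + lam * np.eye(N + M), J.T @ f)   # Eq. (33)
        f_new = residual(theta + h, G, B, p_amp)
        if f_new @ f_new < f @ f:                # accept only if the objective decreases
            theta, f = theta + h, f_new
            lam *= 0.5                           # simplified damping update (assumption)
        else:
            lam *= 2.0
    return theta, f @ f

# Assumed example: 4 x 4 array with 10 mm pitch, two control points, free field (B = I).
k = 2 * np.pi * 40e3 / 340.0
gx, gy = np.meshgrid(np.arange(4) * 0.01, np.arange(4) * 0.01)
sources = np.column_stack([gx.ravel(), gy.ravel(), np.zeros(gx.size)])
points = np.array([[0.01, 0.01, 0.10], [0.03, 0.02, 0.10]])
r = np.linalg.norm(points[:, None, :] - sources[None, :, :], axis=-1)
G = np.exp(-1j * k * r) / r
B = np.eye(len(points))
p_amp = np.array([60.0, 60.0])

cost0 = np.sum(residual(np.zeros(G.shape[1] + G.shape[0]), G, B, p_amp) ** 2)
theta, cost = levenberg_marquardt(G, B, p_amp)
print(cost0, cost)
```

The linear solve involves an \((N+M) \times (N+M)\) matrix, which is the cubic-cost step discussed in the text. Because steps are only accepted when the objective decreases, the final cost can never exceed the initial one.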

Matsubayashi et al. proposed a method to dynamically generate a mesh model of the hand surface and render the sound pressure amplitude on the surface in real time using the LMA described above. They generated pressure amplitude distributions with different widths at the fingertips (see Fig. 4) and verified that these distributions were discriminable via a user study. This is the first reported study of accurately controlling the sound pressure amplitude distribution on the skin surface by considering the scattering on the hand surface.

LMA can be applied not only to the scattered sound field but also to the free field by replacing matrix B with the identity matrix. Sakiyama et al. used LMA in a free-field condition to reproduce the pressure distribution measured by a microphone array to render textures such as fingers, brushes, and towels.

LMA can generate an accurate amplitude distribution, but it is computationally expensive. The bottleneck of this method is that the linear system in Eq. (33) needs to be solved in each step calculation, whose computational complexity is \(O((N+M)^3)\). However, if there is a control point with zero target amplitude, the phase at that point is not considered, and the computational complexity is reduced. The number of zero-amplitude control points can be quite large in situations where we want to prevent scattering from producing sound pressure at positions other than the tactile point. If the number of control points with nonzero amplitude is \(M'\), the computational complexity is \(O((N+M')^3)\). Using a high-end GPU (NVIDIA GeForce RTX 2080 Ti), it has been shown experimentally that each iteration takes approximately 10 ms when \(N + M' = 1500, M = 10,000 \). In this case, five iterations for one optimization would result in a haptic refresh rate of 20 Hz.

Fig. 4 Pressure amplitude control of the finger surface: (a1-a3) Target amplitude distribution. (b1-b3) Simulation results of the distribution reproduced by LMA (Matsubayashi et al. 2020)

Algorithm 6 Levenberg–Marquardt algorithm

2.6 Combinatorial Optimization

In the methods described so far, the amplitude and phase of the transducers were taken as continuous quantities. However, in practical use, they are input as discrete quantities. Furthermore, it has been found that even small amounts of gain information input to each transducer can produce accurate distributions. For example, it was reported that about eight values (3-bit) in phase are sufficient to reproduce some sound field patterns (Morales et al. 2021). Therefore, the inverse problem can be formulated as combinatorial optimization. This is much simpler than the problems we have considered so far, and allows for fast optimization.

In 2021, Suzuki et al. proposed a method to solve the inverse problem by discretizing the gain and applying a greedy algorithm (Suzuki et al. 2021). In this method, the amplitude \(q^\textrm{amp}\) and phase \(\theta \) of the transducer are discretized into I and J levels, respectively:

$$\begin{aligned} q^\textrm{amp}&\in \left\{ \frac{1}{I}, \frac{2}{I}, \ldots , 1 \right\} ,\end{aligned}$$
(35)
$$\begin{aligned} \theta&\in \left\{ 0, 2\pi \frac{1}{J}, \ldots , 2\pi \frac{J-1}{J} \right\} . \end{aligned}$$
(36)

In this method, the gain \(q_i = q^\textrm{amp}_i\textrm{e}^{j\theta _i}\) is determined one transducer at a time by searching all combinations of these discretized amplitude and phase values. Algorithm 7 describes this method. The gain of the i-th transducer is chosen to minimize the difference between the target sound pressure and the sound pressure produced when transducers 1 through i are driven. The objective function \(E_i\) used to decide the gain of the i-th transducer is therefore

$$\begin{aligned} E_i(q^\textrm{amp}_i, \theta _i) = \sum _{j=1}^{M}\left| p^\textrm{amp}_j - \left| \sum _{k=1}^{i} g(\textbf{x}_k, \textbf{y}_j) q^\textrm{amp}_k\textrm{e}^{j\theta _k} \right| \right| ^2. \end{aligned}$$
(37)

Here the summation index j labels the control points, whereas the j in the exponent is the imaginary unit. To minimize the objective function \(E_i(q^\textrm{amp}_i,\theta _i)\), a brute-force search is used: the objective function is evaluated for all IJ candidate gains, and the candidate giving the smallest value is selected. The computational complexity of the whole process is O(IJMN) (N transducers, IJ candidates per transducer, and M control points per evaluation), but because this method can generate distributions with sufficient accuracy using small values of I and J, the computation time is short in many practical situations.
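The greedy search can be sketched in a few lines of Python. This is a minimal illustration of Eqs. 35 to 37 under stated assumptions, not the implementation of Suzuki et al.: the function name `greedy_gains`, the point-source transfer function, the 8 x 8 array geometry, and the target amplitudes are all hypothetical. The sketch caches the pressure of the transducers already decided, so each candidate evaluation costs O(M) and the whole search costs O(IJMN).

```python
import cmath
import math


def greedy_gains(G, target_amp, I, J):
    """Choose discretized gains one transducer at a time.

    G[m][i] is the transfer function from transducer i to control point m,
    and target_amp[m] is the target pressure amplitude at control point m.
    """
    M, N = len(G), len(G[0])
    # Candidate gains: amplitudes 1/I, ..., 1 and phases 0, ..., 2*pi*(J-1)/J.
    candidates = [(a / I) * cmath.exp(2j * math.pi * b / J)
                  for a in range(1, I + 1) for b in range(J)]
    field = [0j] * M  # pressure produced by the transducers decided so far
    q = []
    for i in range(N):
        best_gain, best_err = None, math.inf
        for c in candidates:  # brute-force search over all I*J candidates
            err = sum((target_amp[m] - abs(field[m] + G[m][i] * c)) ** 2
                      for m in range(M))
            if err < best_err:
                best_gain, best_err = c, err
        q.append(best_gain)
        for m in range(M):
            field[m] += G[m][i] * best_gain
    return q


# Hypothetical setup: 8 x 8 array, 10 mm pitch, 40 kHz, c = 346 m/s.
K = 2 * math.pi * 40e3 / 346.0


def transfer(src, pt):
    """Point-source transfer function g = exp(-jkr) / r (Sect. 1.1)."""
    r = math.dist(src, pt)
    return cmath.exp(-1j * K * r) / r


sources = [((ix - 3.5) * 0.01, (iy - 3.5) * 0.01, 0.0)
           for ix in range(8) for iy in range(8)]
points = [(-0.03, 0.0, 0.2), (0.03, 0.0, 0.2)]  # two foci, 200 mm above
G = [[transfer(s, p) for s in sources] for p in points]
target = [100.0, 100.0]  # arbitrary units, chosen for illustration

q = greedy_gains(G, target, I=1, J=16)
achieved = [abs(sum(G[m][i] * q[i] for i in range(len(q))))
            for m in range(len(points))]
print("target:  ", target)
print("achieved:", [round(a, 1) for a in achieved])
```

With I = 1 every transducer is driven at full amplitude and only the phase is searched, which is the setting reported as fastest in the comparison by Suzuki et al.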

Suzuki et al. implemented the previously described methods (the eigenmethod, semidefinite relaxation, GS-PAT, and LMA) together with this method on a CPU and compared them experimentally. The results show that this method has a shorter computation time than the other methods when \(I = 1, J = 16\). They also performed simulations generating multiple foci with identical amplitudes. Their results showed that the method could reconstruct the sound pressure specified at the control points with at least 80% accuracy, which is better than the eigenmethod and semidefinite relaxation. LMA gave the most accurate results but had the longest computation time. GS-PAT (Matsubayashi et al. 2020) showed intermediate performance between this method and LMA in terms of both accuracy and time.

Algorithm 7 (combinatorial optimization): takes an input, runs a for loop, and returns the gain q.

3 Summary

In this chapter, we introduced the basic equations for controlling the ultrasonic sound field and various algorithms for solving them. The requirements for sound field control in mid-air haptic rendering are accuracy and speed. GS-PAT and combinatorial optimization are suitable when stimulus points move quickly, such as when generating a focus that closely follows a fast-moving finger. Note, however, that in such cases, even if the algorithm is fast enough, hardware and sound propagation delays still need to be compensated for. On the other hand, when an accurate sound pressure distribution is required, LMA is superior. It is especially suitable when a tactile sensation is to be generated on a specific part of the hand while accounting for scattering at the hand surface, because its computation time is largely unaffected by the number of zero-pressure points. This chapter has given only an overview of the ultrasonic sound field control algorithms proposed so far and the applications they enable; please refer to the respective literature for details.

One of the major advantages of ultrasonic haptic technology is the ability to freely control the pressure applied to the skin surface. This capability is also important for investigating the human sense of touch. Investigating how the spatial patterns of pressure and the waveforms of the vibrations applied to the skin affect human perception forms the basis for future mid-air haptic rendering technology. Such studies have the potential to identify the elements necessary for haptic design and, furthermore, to create new tactile sensations that do not exist in the real world. For this purpose, further development of algorithms for controlling the ultrasonic field is expected.