1 Introduction

According to the literature, about two-thirds of passengers suffer from carsickness [7]. When autonomous vehicles (AVs) become commercially available, everyone in an AV becomes a passenger, so the probability that a passenger susceptible to carsickness is in the AV increases. Carsickness makes passengers uncomfortable and erodes the benefits of AVs; it is therefore an important problem to be solved.

In previous studies, there was an attempt to install a webcam on the dashboard of a vehicle and use the captured scene as the background of a mobile device (e.g., smartphone or tablet) [5]. Another attempt conveyed the rotation direction of the vehicle by vibrating haptic devices, each composed of 7 mini vibration motors, worn on both of the passenger's forearms [3]. A further attempt indicated the vehicle's rotation direction using 32 light-emitting diodes (LEDs) installed around a visual display device [2]. Finally, after designating the border of a smartphone screen as a visualization area, an attempt was made to present a bubble that moves within this area according to the direction and magnitude of the vehicle's acceleration [4]. However, some of these methods had the limitation of requiring an additional device (e.g., webcam, haptic device, or LEDs) [2, 3, 5], and some were effective only while a mobile device was in use [2, 4]. Therefore, in this paper, a method of canceling out the acceleration generated by the AV using a power seat is developed. This method requires no additional device because it uses the power seat already present in the vehicle, and it is applicable not only when using a mobile device but also, for example, when reading a book. In the proposed system, carsickness increases or decreases depending on which signal is applied to the power seat. In this paper, therefore, the actuation signal applied to the power seat was determined through reinforcement learning (RL), which finds the best choice by trial and error [8].

To validate the proposed method, a simulation was performed to compare the otolith response in two cases: (i) when the vehicle acceleration was not canceled out, and (ii) when the vehicle acceleration was canceled out by applying the actuation signal generated by RL to the power seat. As a result, the feasibility of RL-based power seat actuation for the mitigation of carsickness was verified.

This paper is organized as follows: Sect. 2 introduces RL and its application to power seat actuation. Section 3 presents the simulation conditions, measurements, learning environment and hyperparameters for RL, and the simulation results. Finally, Sect. 4 closes this paper with conclusions and future work.

2 Reinforcement Learning

This section briefly introduces RL and the configuration for applying RL to power seat actuation.

2.1 Reinforcement Learning

RL is a machine learning method that achieves performance improvement through trial and error. As shown in Fig. 1, RL consists of two components, the agent and the environment, and three types of information (action, state, and reward) are transmitted between them. During training, the agent performs various actions in various states and receives various rewards; as a result, it learns which action yields a higher reward in a given state.

Fig. 1. Structure of reinforcement learning
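The interaction in Fig. 1 can be summarized with a minimal sketch; the environment dynamics, policy, and learning rule below are toy placeholders for illustration only, not the system used in this paper.

```python
# Minimal sketch of the agent-environment loop in Fig. 1.
# The dynamics, policy, and learning rule here are toy placeholders.
class ToyEnvironment:
    def reset(self):
        return 0.0                         # initial state

    def step(self, action):
        state = 0.5 * action               # toy state transition
        reward = -abs(state)               # toy reward
        return state, reward

class ToyAgent:
    def act(self, state):
        return -state                      # toy policy

    def learn(self, state, action, reward, next_state):
        pass                               # policy update would happen here

env, agent = ToyEnvironment(), ToyAgent()
state = env.reset()
for _ in range(100):
    action = agent.act(state)              # agent -> environment: action
    next_state, reward = env.step(action)  # environment -> agent: state, reward
    agent.learn(state, action, reward, next_state)
    state = next_state
```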

2.2 Reinforcement Learning for Power Seat Actuation

The objective of this paper is to cancel out the acceleration generated by the AV through actuation of the power seat. In this system, the actuation signal of the power seat is manipulated while the vehicle state, the passenger state, and the power seat state are observed. Therefore, the agent of RL is the power seat controller that generates the actuation signal, and the environment of RL consists of the vehicle, the passenger, and the power seat, which are the observation targets.

Firstly, the agent performs an action that generates a power seat actuation signal within a specific range (between −1 m/s² and 1 m/s²). Next, the velocity of the power seat is limited to between −0.5 m/s and 0.5 m/s. Finally, the workspace of the power seat is limited to 1 m because of the space limits of the AV.

Secondly, the power seat receiving the actuation signal changes the passenger acceleration (vehicle acceleration minus power seat acceleration), the otolith response (perceived vestibular acceleration), and the power seat position. Among these, the otolith response can be obtained from a mathematical model [10].

Thirdly, the vehicle acceleration, passenger acceleration, otolith response of the passenger, and normalized power seat position are provided to the agent as the state.

Finally, the environment generates the reward from the otolith response of the passenger as follows:

$$r = \begin{cases} -|\hat{f}_{i}| + 2, & \text{if } |\hat{f}_{i}| \le 2 \\ -|\hat{f}_{i}| \times 5, & \text{otherwise} \end{cases}$$
(1)

where r and \(\hat{f}_{i}\) are the reward and the currently perceived force, respectively. If the passenger senses a greater vestibular acceleration, greater motion sickness occurs [6]. Therefore, to train an agent that produces a smaller vestibular acceleration, the reward was generated by multiplying the otolith response by \((-1)\). Meanwhile, a special phenomenon arises from the workspace limitation of the power seat: if the power seat reaches the workspace limit during actuation, it stops with an impact (a large acceleration). Therefore, the reward must be computed with this impact taken into account. If there is no impact (\(|\hat{f}_{i}|\) is smaller than 2), 2 is added to keep the reward positive. On the other hand, when there is an impact, the perceived vestibular acceleration is multiplied by 5 to impose a large penalty.
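The four steps above can be illustrated with a short sketch. The control period `DT`, the first-order low-pass filter standing in for the otolith model of [10], and the reading of the 1 m workspace as ±0.5 m around the center are all assumptions made only for illustration, not the paper's actual implementation.

```python
import numpy as np

DT = 0.02                              # assumed control period [s]
A_MAX, V_MAX, X_MAX = 1.0, 0.5, 0.5    # action [m/s^2], velocity [m/s], and
                                       # workspace read as +/-0.5 m (assumption)

def seat_step(x, v, action, a_vehicle):
    """One environment step given the seat position x, velocity v,
    the agent's actuation signal, and the vehicle acceleration."""
    a_seat = float(np.clip(action, -A_MAX, A_MAX))
    v_new = float(np.clip(v + a_seat * DT, -V_MAX, V_MAX))
    x_new = x + v_new * DT
    if abs(x_new) > X_MAX:             # seat reaches the workspace limit
        x_new = float(np.clip(x_new, -X_MAX, X_MAX))
        v_new = 0.0                    # seat stops with impact (large acceleration)
    a_passenger = a_vehicle - a_seat   # acceleration transmitted to the passenger
    return x_new, v_new, a_passenger

def otolith(f_prev, a_passenger, tau=1.0):
    """First-order low-pass stand-in for the otolith model of [10] (assumption)."""
    return f_prev + (DT / tau) * (a_passenger - f_prev)

def reward(f_hat):
    """Reward of Eq. (1)."""
    return -abs(f_hat) + 2 if abs(f_hat) <= 2 else -abs(f_hat) * 5
```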

3 Simulation

This section presents the simulation conditions, measurements, learning environment and hyperparameters of RL, simulation results, and a discussion of whether the RL-based power seat actuation method reduces the sensory conflict of the AV passenger.

3.1 Simulation Conditions

Several values must be selected to perform the simulation, including the otolith-response-related values and the acceleration/velocity/position ranges of the AV. These values were selected so that the simulation environment closely matches the AV driving environment. Firstly, the vehicle acceleration provided by the environment to the agent is a value randomly selected between −3 m/s² and 3 m/s²; thus, the simulation was performed in an environment in which the AV accelerates and decelerates at random. Secondly, there is no restriction on the position of the vehicle.

To check the feasibility of RL-based power seat actuation, the performance in the baseline situation, in which the power seat does not move, was compared with that in the proposed situation, in which the power seat is driven using RL. This comparison was made for an AV driven for 60 s.
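As a sketch of this condition, a random acceleration profile for a 60 s drive might be generated as follows; how often the acceleration is resampled is not stated in this paper, so redrawing it at every control step is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
DT, T = 0.02, 60.0                     # assumed step size; 60 s drive
steps = int(T / DT)

# vehicle accelerates/decelerates at random between -3 and 3 m/s^2
a_vehicle = rng.uniform(-3.0, 3.0, size=steps)
```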

3.2 Measurements

As shown in Fig. 2, if the passenger reads a book or uses a smartphone in the AV, the passenger receives fixed visual feedback; that is, the perceived visual acceleration is zero. On the other hand, the remainder obtained by subtracting the power seat acceleration from the vehicle acceleration is transmitted to the vestibular system. If this remaining acceleration is not zero, the perceived vestibular acceleration of the passenger becomes nonzero. Carsickness arises from the difference between these two perceived accelerations [6], and it is intuitively predictable that carsickness will increase as this difference increases [9]. Since the perceived visual acceleration is assumed to be zero, the larger the perceived vestibular acceleration, the greater the carsickness; the magnitude of the perceived vestibular acceleration was therefore used as the measurement for performance comparison.
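Under the zero-visual-acceleration assumption, this measurement reduces to the mean magnitude of the perceived vestibular acceleration over a run, e.g.:

```python
import numpy as np

def mean_otolith_response(f_hat_trace):
    """Mean magnitude of the perceived vestibular acceleration over a run;
    smaller values indicate less sensory conflict and hence less carsickness."""
    return float(np.mean(np.abs(f_hat_trace)))
```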

Fig. 2. Carsickness caused by a sensory conflict

3.3 Learning Environment and Hyper Parameters

The learning environment was configured using the Unity ML-Agents Toolkit. For training, the proximal policy optimization (PPO) algorithm [8], which performs better than other algorithms and is the most commonly used [1], was employed. The hyperparameters used for learning are shown in Table 1. After 4 million training steps, the reward no longer increased and the loss no longer decreased. After training was completed, the actuation signal of the power seat was generated using the learned model.
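A hedged sketch of this inference stage is shown below; `policy` stands for the trained PPO network and is hypothetical, and `seat_step`/`otolith` are the illustrative helpers sketched in Sect. 2.2, not the actual implementation.

```python
def run_trained_policy(policy, a_vehicle_profile, x_max=0.5):
    """Roll out the learned model over a vehicle acceleration profile and
    return the otolith-response trace. `policy` is a hypothetical callable
    wrapping the trained network; seat_step/otolith are the sketches above."""
    x, v, a_p, f_hat = 0.0, 0.0, 0.0, 0.0
    f_trace = []
    for a_v in a_vehicle_profile:
        state = (a_v, a_p, f_hat, x / x_max)   # the states listed in Sect. 2.2
        action = policy(state)                 # actuation signal in [-1, 1] m/s^2
        x, v, a_p = seat_step(x, v, action, a_v)
        f_hat = otolith(f_hat, a_p)
        f_trace.append(f_hat)
    return f_trace
```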

Fig. 3. Mean otolith response for the two conditions

Table 1. Hyperparameters

3.4 Simulation Results

Figure 3 shows the mean otolith response when the RL-based power seat actuation is applied and when the power seat remains stationary. As seen in the figure, applying the RL-based power seat actuation reduced the mean otolith response by about 38.44%. A statistical analysis was performed to check whether the difference in the mean otolith response between the two conditions was statistically significant. Consequently, there was a statistically significant difference between the mean otolith responses of the two conditions (F(1, 98) = 481.039, p < 0.001).
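The paper does not state which tool performed this analysis; below is a sketch using SciPy's one-way ANOVA, with placeholder samples sized to match the reported F(1, 98) degrees of freedom (50 runs per condition).

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(0)
# Placeholder samples: 50 mean otolith responses per condition, as implied
# by F(1, 98); the actual values come from the simulation runs.
f_static = rng.normal(1.00, 0.05, 50)   # power seat stationary
f_rl = rng.normal(0.62, 0.05, 50)       # RL-based actuation (~38% lower)

F, p = f_oneway(f_static, f_rl)
print(f"F(1, 98) = {F:.3f}, p = {p:.3g}")
```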

4 Conclusions and Future Work

This paper proposed an RL-based power seat actuation method to alleviate the carsickness that AV passengers may experience, and a simulation was performed to verify the proposed methodology. As a result, it was confirmed that the otolith response decreased by about 38% when the proposed method was applied. In the future, the authors will conduct a study to find the optimal reward that minimizes the otolith response, and will verify whether this methodology is actually effective through human studies.