Abstract
To improve the agility and applicability of trajectory planning algorithms for autonomous vehicles, this paper proposes a novel actor-critic based learning method for decision-making and planning in complex multi-vehicle traffic. The method plans the vehicle's path and speed jointly (coupled planning), which makes the trajectory more flexible. First, the generation of the planned trajectory from the decided action is described in terms of the trajectory's end-point. Then, an actor-critic based learning method is built to learn an optimal policy for the decision process; it updates the policy along the gradient of the current policy's advantage. In this process, features of real traffic are extracted using time headway (TH) and speed distribution, and the reward function is built from safety, efficiency, and driving comfort. Furthermore, to improve convergence, the policy network is modularized into two parts, a lane-changing network and a lane-keeping network, which decide the optimal end-point of the path and the speed candidates, respectively. Finally, a curved overtaking scenario and an interaction process with a human driver are used to illustrate the feasibility and superiority of the method. The results show that the proposed method has better real-time performance and produces a more continuous and smoother coupled trajectory than the existing rule-based method.
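The two ingredients named in the abstract, the time headway feature and a policy update driven by the advantage, can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a softmax policy over three hypothetical discrete actions (change left, keep lane, change right), uses the one-step temporal-difference error as the advantage estimate, and replaces the actor and critic networks with plain numbers. All function names, the learning rates, and the discount factor are illustrative assumptions.

```python
import math


def time_headway(gap_m, ego_speed_mps):
    """Time headway (TH): gap to the lead vehicle divided by ego speed, in seconds."""
    if ego_speed_mps <= 0.0:
        return math.inf  # a stationary ego vehicle never closes the gap
    return gap_m / ego_speed_mps


def softmax(logits):
    """Numerically stable softmax over a list of action preferences."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def actor_critic_step(logits, value_s, value_next, action, reward,
                      gamma=0.99, lr_actor=0.1, lr_critic=0.1):
    """One advantage-based update for a single state (assumed toy setting).

    The advantage is estimated by the TD error r + gamma * V(s') - V(s).
    The actor moves its logits along advantage * grad log pi(a|s), where
    grad log pi(a|s) with respect to the logits is onehot(a) - pi(.|s).
    The critic moves V(s) toward the TD target.
    """
    advantage = reward + gamma * value_next - value_s
    probs = softmax(logits)
    new_logits = [
        logit + lr_actor * advantage * ((1.0 if i == action else 0.0) - p)
        for i, (logit, p) in enumerate(zip(logits, probs))
    ]
    new_value = value_s + lr_critic * advantage
    return new_logits, new_value, advantage


if __name__ == "__main__":
    # Hypothetical numbers: 30 m gap at 15 m/s.
    print("TH [s]:", time_headway(30.0, 15.0))
    # Action 1 ("keep lane") received a positive reward, so its logit should rise.
    logits, value, adv = actor_critic_step([0.0, 0.0, 0.0], 0.0, 0.0, 1, 1.0)
    print("logits:", logits, "value:", value, "advantage:", adv)
```

A positive advantage raises the probability of the action taken and lowers the others, while a negative advantage does the reverse. In the method described above, the same signal would instead be propagated through the lane-changing and lane-keeping policy networks.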
Additional information
This work was supported by the Jiangsu Key R&D Plan (Grant No. BE2018124), the National Natural Science Foundation of China (Grant Nos. 51775007 and 51875279), and the Postgraduate Research and Practice Innovation Program of Jiangsu Province (Grant No. KYCX19_0157).
About this article
Cite this article
Xu, C., Zhao, W., Chen, Q. et al. An actor-critic based learning method for decision-making and planning of autonomous vehicles. Sci. China Technol. Sci. 64, 984–994 (2021). https://doi.org/10.1007/s11431-020-1729-2
DOI: https://doi.org/10.1007/s11431-020-1729-2