Loitering Munition Penetration Control Decision Based on Deep Reinforcement Learning

GAO Ang;DONG Zhiming;YE Hongbing;SONG Jinghua;GUO Qisheng

Acta Armamentarii ›› 2021, Vol. 42 ›› Issue (5) : 1101-1110. DOI: 10.3969/j.issn.1000-1093.2021.05.023

  • GAO Ang1, DONG Zhiming1, YE Hongbing2, SONG Jinghua1, GUO Qisheng1

Abstract

Loitering munition penetration control decision (LMPCD) is an important research direction under the concept of "multi-domain warfare", and real-time route planning for loitering munition penetration has significant military value. Traditional knowledge-based, reasoning-based, and planning-based methods cannot explore or discover new knowledge outside their predefined frameworks. Bionic optimization methods are suited to path planning in static environments, such as the traveling salesman problem, but are difficult to apply to the penetration problem of a loitering munition, which places high demands on environmental dynamics and real-time decision-making. To address the limitations of these two classes of methods, the applicability of deep reinforcement learning is analyzed, and loitering munition domain knowledge is incorporated into each element of the deep reinforcement learning algorithm. The flight motion model of the loitering munition is analyzed; the state space, action space, and reward function are designed; the algorithm framework for penetration control decision is analyzed; and the training process of the penetration control decision algorithm is designed. In a simulation test of 1 000 penetration sorties, the loitering munition achieved a penetration success rate of 82.1% with an average decision time of 1.48 ms, verifying the effectiveness of the algorithm training process and the control decision model.
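The abstract's formulation (state space, action space, reward function over a flight motion model) can be illustrated with a toy Markov decision process. The sketch below is a minimal 2-D environment in the gym style; all geometry, speeds, and reward weights are illustrative assumptions, not values from the paper, and a real agent would be trained on this interface with an actor-critic method such as DDPG.

```python
import math
import random

class PenetrationEnv:
    """Toy 2-D MDP sketch of the LMPCD setup (assumed parameters).

    State  : relative vectors to target and defended zone, plus heading.
    Action : discrete heading change {0: left, 1: straight, 2: right}.
    Reward : shaped by distance to target; large penalty if the munition
             enters the defended zone, large bonus on reaching the target.
    """

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.speed = 1.0                  # distance units per step (assumed)
        self.turn = math.radians(15.0)    # max heading change per step (assumed)
        self.reset()

    def reset(self):
        self.pos = [0.0, 0.0]
        self.heading = 0.0
        self.target = (20.0, 0.0)
        # Defended-zone centre is randomized between munition and target.
        self.threat = (10.0, self.rng.uniform(-2.0, 2.0))
        self.threat_r = 3.0               # interception radius (assumed)
        return self._state()

    def _state(self):
        dx, dy = self.target[0] - self.pos[0], self.target[1] - self.pos[1]
        tx, ty = self.threat[0] - self.pos[0], self.threat[1] - self.pos[1]
        return (dx, dy, tx, ty, self.heading)

    def step(self, action):
        self.heading += (action - 1) * self.turn
        self.pos[0] += self.speed * math.cos(self.heading)
        self.pos[1] += self.speed * math.sin(self.heading)
        d_target = math.dist(self.pos, self.target)
        d_threat = math.dist(self.pos, self.threat)
        reward, done = -0.01 * d_target, False   # shaping: closer is better
        if d_threat < self.threat_r:              # intercepted by defences
            reward, done = -10.0, True
        elif d_target < 1.0:                      # successful penetration
            reward, done = +10.0, True
        return self._state(), reward, done
```

A policy network would map the 5-dimensional state to an action at each step; the terminal rewards (+10 / -10) encode penetration success and interception, matching the success-rate metric the abstract reports.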

Key words

loitering munition / deep reinforcement learning / Markov decision process / penetration / control decision

Cite this article

Download Citations
GAO Ang, DONG Zhiming, YE Hongbing, SONG Jinghua, GUO Qisheng. Loitering Munition Penetration Control Decision Based on Deep Reinforcement Learning. Acta Armamentarii. 2021, 42(5): 1101-1110 https://doi.org/10.3969/j.issn.1000-1093.2021.05.023

References


[1]庞艳珂,韩磊,张民权,等. 攻击型巡飞弹技术现状及发展趋势[J]. 兵工学报, 2010, 31(增刊2): 149-152.
PANG Y K, HAN L, ZHANG M Q, et al. Status and development trends of loitering attack missiles [J]. Acta Armamentarii, 2010, 31(S2): 149-152.(in Chinese)
[2]郭美芳,范宁军,袁志华. 巡飞弹战场运用策略[J]. 兵工学报, 2006, 27(5): 944-947.
GUO M F, FAN N J, YUAN Z H. Battlefield operational strategy of loitering munition [J]. Acta Armamentarii, 2006, 27(5): 944-947. (in Chinese)
[3]刘杨,王华,王昊宇. 巡飞弹发展背后的作战理论与概念支撑[J]. 飞航导弹, 2018 (10): 51-55.
LIU Y, WANG H, WANG H Y. Operational theory and conceptual support behind the development of loitering munition [J]. Aerodynamic Missile Journal, 2018(10): 51-55. (in Chinese)
[4]郝峰,张栋,唐硕,等. 基于改进RRT算法的巡飞弹快速航迹规划方法[J]. 飞行力学, 2019, 37(3): 58-63.
HAO F, ZHANG D, TANG S, et al. A rapid route planning method of loitering munitions based on improved RRT algorithm [J]. Flight Mechanics, 2019, 37(3): 58-63. (in Chinese)
[5]欧继洲,黄波. 巡飞弹在陆上无人作战体系中的应用初探[J]. 飞航导弹, 2019(5): 20-24.
OU J Z, HUANG B. Application of loitering munition in land unmanned combat system [J]. Aerodynamic Missile Journal, 2019(5): 20-24. (in Chinese)
[6]王琼,刘美万,任伟建,等. 无人机航迹规划常用算法综述[J]. 吉林大学学报(信息科学版), 2019, 37(1): 58-67.
WANG Q, LIU M W, REN W J, et al. Overview of common algorithms for UAV path planning [J]. Journal of Jilin University (Information Science Edition), 2019, 37(1): 58-67. (in Chinese)
[7]张堃,李珂,时昊天,等.基于深度强化学习的UAV航路自主引导机动控制决策算法[J].系统工程与电子技术,2020,42(7): 1567-1574.
ZHANG K, LI K, SHI H T, et al. UAV route autonomous guidance maneuver control and decision-making algorithm based on deep reinforcement learning [J]. Systems Engineering and Electronics, 2020, 42(7): 1567-1574. (in Chinese)
[8]BOUHAMED O, GHAZZAI H, BESBES H, et al. Autonomous UAV navigation: a DDPG-based deep reinforcement learning approach[EB/OL]. [2020-07-11]. http:∥arxiv.org/pdf/1509.02971.pdf.
[9]张建生. 国外巡飞弹发展概述[J]. 飞航导弹, 2015(6): 19-26.
ZHANG J S. Overview of foreign cruise missile development [J]. Aerodynamic Missile Journal, 2015 (6): 19-26. (in Chinese)
[10]李增彦,李小民,刘秋生. 风场环境下的巡飞弹航迹跟踪运动补偿算法[J]. 兵工学报, 2016, 37(12): 2377-2384.
LI Z Y, LI X M, LIU Q S. Trajectory tracking algorithm for motion compensation of loitering munition under wind environment [J]. Acta Armamentarii, 2016, 37(12): 2377-2384. (in Chinese)
[11]黎珍惜,黎家勋. 基于经纬度快速计算两点间距离及测量误差[J]. 测绘与空间地理信息, 2013, 36(11): 235-237.
LI Z X, LI J X. Quickly calculate the distance between two points and measurement error based on latitude and longitude [J]. Geomatics & Spatial Information Technology, 2013, 36(11): 235-237. (in Chinese)
[12]刘建伟,高峰,罗雄麟. 基于值函数和策略梯度的深度强化学习综述[J]. 计算机学报, 2019, 42(6): 1406-1438.
LIU J W, GAO F, LUO X L. A review of deep reinforcement learning based on value function and strategy gradient [J]. Chinese Journal of Computers, 2019, 42(6): 1406-1438. (in Chinese)
[13]刘全,翟建伟,章宗长. 深度强化学习综述[J]. 计算机学报, 2018, 41(1): 1-27.
LIU Q, ZHAI J W, ZHANG Z C. A survey on deep reinforcement learning [J]. Chinese Journal of Computers, 2018, 41(1):1-27. (in Chinese)
[14]KONDA V R, TSITSIKLIS J N. Actor-critic algorithms[C]∥Proceedings of Advances in Neural Information Processing Systems. Denver, CO, US: NIPS Foundation, 2000: 1008-1014.
[15]LILLICRAP T P, HUNT J J, PRITZEL A, et al. Continuous control with deep reinforcement learning[EB/OL]. [2020-07-11]. http:∥arxiv.org/pdf/1509.02971.pdf.



