Safety Optimal Tracking Control Algorithm and Robot Arm Simulation

CHEN Wenjie, CUI Xiaohong*, WANG Binrui

Acta Armamentarii ›› 2024, Vol. 45 ›› Issue (8): 2688-2697. DOI: 10.12382/bgxb.2023.0576




Abstract

A safety optimal tracking control algorithm based on reinforcement learning is proposed to ensure that safety-critical systems operate within a safe region while maintaining optimal performance. Safety and optimality are considered jointly by adding a control barrier function to the value function, and a damping coefficient in the control barrier function specifies its relative dominance over the value function. Reinforcement learning is introduced to achieve safety optimal tracking control for systems with unknown dynamics, and the tracking control system is proven to achieve optimality and stability within the safe region. The effectiveness of the proposed algorithm is verified through simulation of a two-link planar manipulator. The results show that the end position of the manipulator is kept within the safe range while optimal performance is achieved under stability, demonstrating that the proposed algorithm realizes optimal tracking control subject to state safety.
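The central construction described in the abstract, a tracking cost augmented with a control barrier function weighted by a damping coefficient, can be sketched in a few lines. Everything below is an illustrative assumption rather than the paper's actual formulation: the disc-shaped safe set, the reciprocal barrier, the gains `Q`, `R`, and `gamma`, and the sample states are all hypothetical.

```python
import numpy as np

# Hypothetical safe set for illustration: the state must stay inside a
# disc of radius d, i.e. h(x) > 0 defines the safe region.
d = 1.0
def h(x):
    return d**2 - float(np.dot(x, x))

# Reciprocal-style control barrier function: finite inside the safe set,
# growing without bound as the state approaches the boundary h(x) = 0.
def barrier(x):
    return 1.0 / h(x)

# Augmented running cost: a standard quadratic tracking cost plus the
# barrier term scaled by a damping coefficient `gamma`, which sets how
# strongly the safety term dominates the performance term.
Q = np.eye(2)
R = np.eye(1)
def running_cost(e, u, x, gamma=0.1):
    return float(e @ Q @ e + u @ R @ u) + gamma * barrier(x)

e = np.array([0.2, -0.1])   # tracking error (illustrative)
u = np.array([0.05])        # control input (illustrative)
inner = running_cost(e, u, np.array([0.30, 0.0]))  # state well inside the safe set
near  = running_cost(e, u, np.array([0.99, 0.0]))  # state near the safe boundary
print(inner, near)  # the barrier makes the cost grow sharply near the boundary
```

Minimizing such a cost therefore trades tracking performance against distance to the unsafe boundary, with `gamma` tuning that trade-off; the paper's learning scheme additionally removes the need for a known dynamics model.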


Key words

reinforcement learning / safety optimal tracking control / control barrier function / damping coefficient

Cite This Article

CHEN Wenjie, CUI Xiaohong, WANG Binrui. Safety Optimal Tracking Control Algorithm and Robot Arm Simulation. Acta Armamentarii, 2024, 45(8): 2688-2697. https://doi.org/10.12382/bgxb.2023.0576
CLC number: O232

References

[1] MANNUCCI T, VAN KAMPEN E J, DE VISSER C, et al. Safe exploration algorithms for reinforcement learning controllers[J]. IEEE Transactions on Neural Networks and Learning Systems, 2017, 29(4): 1069-1081.
[2] NGUYEN Q, SREENATH K. Robust safety-critical control for dynamic robotics[J]. IEEE Transactions on Automatic Control, 2021, 67(3): 1073-1088.
[3] HU Y F, LIU K X, FU J J, et al. Model reference tracking control based on safe reinforcement learning[J]. Control Engineering, 2024, 31(1): 80-87. (in Chinese)
[4] ZHANG C H, LI H J, GONG X F, et al. Design and verification of polymorphic safety logic control method for cruise ammunition fuze based on electronic safety system[J]. Acta Armamentarii, 2023, 44(10): 3079-3090. (in Chinese)
[5] YIN Y H. Design of deep learning based autonomous driving control algorithm[C]∥Proceedings of the 2022 2nd International Conference on Consumer Electronics and Computer Engineering. Guangzhou, China: IEEE, 2022: 423-426.
[6] WANG H J, PENG J Z, ZHANG F F, et al. High-order control barrier functions-based impedance control of a robotic manipulator with time-varying output constraints[J]. ISA Transactions, 2022, 129(Part B): 361-369.
[7] AMES A D, XU X, GRIZZLE J W, et al. Control barrier function based quadratic programs for safety critical systems[J]. IEEE Transactions on Automatic Control, 2016, 62(8): 3861-3876.
[8] WANG L, HAN D K, EGERSTEDT M. Permissive barrier certificates for safe stabilization using sum-of-squares[C]∥Proceedings of the 2018 Annual American Control Conference. Milwaukee, WI, US: IEEE, 2018: 585-590.
[9] COHEN M H, BELTA C. Approximate optimal control for safety-critical systems with control barrier functions[C]∥Proceedings of the 2020 59th IEEE Conference on Decision and Control. Jeju, Korea: IEEE, 2020: 2062-2067.
[10] MARVI Z, KIUMARSI B. Safe reinforcement learning: a control barrier function optimization approach[J]. International Journal of Robust and Nonlinear Control, 2021, 31(6): 1923-1940.
[11] PANAGOU D, STIPANOVIĆ D M, VOULGARIS P G. Distributed coordination control for multi-robot networks using Lyapunov-like barrier functions[J]. IEEE Transactions on Automatic Control, 2015, 61(3): 617-632.
[12] WANG L, AMES A D, EGERSTEDT M. Safety barrier certificates for collisions-free multirobot systems[J]. IEEE Transactions on Robotics, 2017, 33(3): 661-674.
[13] HOU Y K, WANG H, WEI Y H, et al. Robust adaptive finite-time tracking control for Intervention-AUV with input saturation and output constraints using high-order control barrier function[J]. Ocean Engineering, 2023, 268: 113-119.
[14] LIU X, ZHANG M J, YAO F, et al. Barrier Lyapunov function based adaptive region tracking control for underwater vehicles with thruster saturation and dead zone[J]. Journal of the Franklin Institute, 2021, 358(11): 5820-5844.
[15] HAN Z Y, YUAN S H, PEI W Y, et al. Research on optimal control of motion attitude adjustment for rocker suspension mobile platform[J]. Acta Armamentarii, 2019, 40(11): 2184-2194. (in Chinese)
[16] HUANG Y B, LYU Y F, ZHAO G, et al. Adaptive optimal control of nonlinear active suspension systems[J]. Control and Decision, 2022, 37(12): 3197-3206. (in Chinese)
[17] LUO A, XIAO W B, ZHOU Q, et al. Optimal control for a class of nonlinear systems with input constraints based on reinforcement learning[J]. Control Theory & Applications, 2022, 39(1): 154-164. (in Chinese)
[18] LIU D R, XUE S, ZHAO B, et al. Adaptive dynamic programming for control: a survey and recent advances[J]. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2021(1): 142-160.
[19] LV Y F, NA J, YANG Q M, et al. Online adaptive optimal control for continuous-time nonlinear systems with completely unknown dynamics[J]. International Journal of Control, 2016, 89(1): 99-112.
[20] LI Y L, LIU D R, WANG D. Integral reinforcement learning for linear continuous-time zero-sum games with completely unknown dynamics[J]. IEEE Transactions on Automation Science and Engineering, 2014, 11(3): 706-714.
[21] MODARES H, LEWIS F L, JIANG Z P. Tracking control of completely unknown continuous-time systems via off-policy reinforcement learning[J]. IEEE Transactions on Neural Networks and Learning Systems, 2015, 26(10): 2550-2562.
[22] ZHANG H G, CUI X H, LUO Y H, et al. Finite-horizon tracking control for unknown nonlinear systems with saturating actuators[J]. IEEE Transactions on Neural Networks and Learning Systems, 2018, 29(4): 1200-1212.
[23] WANG J G, ZHANG D H, ZHANG J S, et al. Safe adaptive dynamic programming method for nonlinear safety-critical systems with disturbance[C]∥Proceedings of the 2021 6th International Conference on Robotics and Automation Engineering. Guangzhou, China: IEEE, 2021: 97-101.
[24] QIN C B, WANG J G, ZHU H Y, et al. Safe adaptive learning algorithm with neural network implementation for H∞ control of nonlinear safety-critical system[J]. International Journal of Robust and Nonlinear Control, 2023, 33(1): 372-391.
[25] LEWIS F L, VRABIE D, SYRMOS V L. Optimal control[M]. Hoboken, NJ, US: John Wiley & Sons, 2012.
[26] SARIDIS G N, LEE C S G. An approximation theory of optimal control for trainable manipulators[J]. IEEE Transactions on Systems, Man, and Cybernetics, 1979, 9(3): 152-159.
[27] VIDYASAGAR M. Nonlinear systems analysis[M]. Philadelphia, PA, US: Society for Industrial and Applied Mathematics, 2002.
[28] LIU J K. MATLAB simulation of sliding mode variable structure control[M]. Beijing: Tsinghua University Press, 2005. (in Chinese)