A Siamese network with a temporal-spatial attention mechanism is proposed to address the poor accuracy of current algorithms under fast motion and background clutter. A temporal attention module is designed to fuse multi-frame features according to the contribution of each reference frame, with weights assigned adaptively to construct an effective temporally fused template. A spatial attention module based on a non-local operation perceives the whole tracking image, which improves the discriminative ability of the network. During training, the focal loss is used to balance the proportion of positive and negative samples and to improve the model's ability to distinguish hard samples. Simulation experiments were conducted on the OTB2015 and VOT2016 benchmarks to evaluate the proposed algorithm against state-of-the-art trackers, i.e., the ECO, DSST, HDT, CFNet, KCF, SRDCF, SiamFC, DCFNet, MEEM, SiamVGG, BACF and ANT algorithms. The experimental results demonstrate that the proposed Siamese tracking model with temporal-spatial attention handles fast motion and background clutter well and effectively improves the performance of the baseline algorithm.
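The two training-time ideas in the abstract, adaptive multi-frame template fusion and the focal loss of Lin et al. [16], can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function names are illustrative, features are assumed to be flattened vectors, and cosine similarity is used as a stand-in for the learned contribution scores.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax
    e = np.exp(x - x.max())
    return e / e.sum()

def fuse_temporal_template(ref_feats, cur_feat):
    """Fuse T reference-frame features into one template.

    ref_feats: (T, C) flattened features of the reference frames
    cur_feat:  (C,)   flattened feature of the current frame
    Each frame's contribution is scored by cosine similarity to the
    current frame, then normalized into adaptive weights.
    """
    sims = ref_feats @ cur_feat / (
        np.linalg.norm(ref_feats, axis=1) * np.linalg.norm(cur_feat) + 1e-8)
    weights = softmax(sims)      # adaptive weights, sum to 1
    return weights @ ref_feats   # (C,) fused template

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss: down-weights easy samples so training
    focuses on hard ones and rebalances positives vs. negatives."""
    p = np.clip(p, 1e-7, 1 - 1e-7)
    pt = np.where(y == 1, p, 1 - p)          # prob of the true class
    at = np.where(y == 1, alpha, 1 - alpha)  # class-balance factor
    return float(np.mean(-at * (1 - pt) ** gamma * np.log(pt)))
```

With gamma = 2, a well-classified sample (pt near 1) is suppressed by the (1 - pt)^2 factor, so hard samples dominate the gradient, which is the balancing effect the abstract refers to.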
JIANG Shan, DI Xiaoqiang, HAN Cheng.
Siamese Network for Visual Tracking with Temporal-spatial Property. Acta Armamentarii. 2021, 42(9): 1940-1950. https://doi.org/10.3969/j.issn.1000-1093.2021.09.015
Funding
National Natural Science Foundation of China Youth Program (61702051, 61602058)
References
[1] LIANG J, LI L, REN J, et al. Infrared image occlusion interference detection method based on deep learning[J]. Acta Armamentarii, 2019, 40(7): 1401-1410. (in Chinese)
[2] GAO P, ZHANG Q Q, WANG F, et al. Learning reinforced attentional representation for end-to-end visual tracking[J]. Information Sciences, 2020, 517: 52-67.
[3] WANG N Y, YEUNG D Y. Learning a deep compact image representation for visual tracking[C]∥Proceedings of the 26th International Conference on Neural Information Processing Systems. Red Hook, NY, US: Curran Associates Inc., 2013: 809-817.
[4] WANG N Y, LI S Y, GUPTA A, et al. Transferring rich feature hierarchies for robust visual tracking[EB/OL]. (2015-04-23)[2020-05-24]. https://arxiv.org/abs/1501.04587.
[5] MA C, HUANG J B, YANG X K, et al. Hierarchical convolutional features for visual tracking[C]∥Proceedings of the IEEE International Conference on Computer Vision. Santiago, Chile: IEEE, 2015: 3074-3082.
[6] NAM H, HAN B. Learning multi-domain convolutional neural networks for visual tracking[C]∥Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, US: IEEE, 2016: 4293-4302.
[7] BERTINETTO L, VALMADRE J, HENRIQUES J F, et al. Fully-convolutional siamese networks for object tracking[C]∥Proceedings of European Conference on Computer Vision. Cham, Switzerland: Springer, 2016: 850-865.
[8] VALMADRE J, BERTINETTO L, HENRIQUES J, et al. End-to-end representation learning for correlation filter based tracking[C]∥Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, US: IEEE, 2017: 5000-5008.
[9] LI B, YAN J J, WU W, et al. High performance visual tracking with siamese region proposal network[C]∥Proceedings of 2018 IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT, US: IEEE, 2018: 8971-8980.
[10] ZHU Z, WU W, ZOU W, et al. End-to-end flow correlation tracking with spatial-temporal attention[C]∥Proceedings of 2018 IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT, US: IEEE, 2018: 548-557.
[11] ZHANG Z P, PENG H W. Deeper and wider siamese networks for real-time visual tracking[C]∥Proceedings of 2019 IEEE Conference on Computer Vision and Pattern Recognition. Long Beach, CA, US: IEEE, 2019: 4591-4600.
[12] KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[C]∥Proceedings of the 25th International Conference on Neural Information Processing Systems. Red Hook, NY, US: Curran Associates Inc., 2012: 1097-1105.
[13] WU Y, LIM J W, YANG M H. Object tracking benchmark[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(9): 1834-1848.
[14] HADFIELD S J F, BOWDEN R, LEBEDA K. The visual object tracking VOT2016 challenge results[J]. Lecture Notes in Computer Science, 2016, 9914: 777-823.
[15] ZENG Y Z, WANG H Y, LU T, et al. Learning spatial-channel attention for visual tracking[C]∥Proceedings of 2019 IEEE/CIC International Conference on Communications in China. Changchun, China: IEEE, 2019: 277-282.
[16] LIN T Y, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection[C]∥Proceedings of the IEEE International Conference on Computer Vision. Venice, Italy: IEEE, 2017: 2980-2988.
[17] HUANG L H, ZHAO X, HUANG K Q, et al. GOT-10k: a large high-diversity benchmark for generic object tracking in the wild[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(5): 1562-1577.
[18] DANELLJAN M, BHAT G, KHAN F S, et al. ECO: efficient convolution operators for tracking[C]∥Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, US: IEEE, 2017: 6931-6939.
[19] DANELLJAN M, HAGER G, KHAN F, et al. Accurate scale estimation for robust visual tracking[C]∥Proceedings of British Machine Vision Conference. Nottingham, UK: BMVA Press, 2014.
[20] HENRIQUES J F, CASEIRO R, MARTINS P, et al. High-speed tracking with kernelized correlation filters[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(3): 583-596.
[21] DANELLJAN M, HAGER G, KHAN F S, et al. Learning spatially regularized correlation filters for visual tracking[C]∥Proceedings of IEEE International Conference on Computer Vision. Santiago, Chile: IEEE, 2015: 4310-4318.
[22] WANG Q, GAO J, XING J L, et al. DCFNet: discriminant correlation filters network for visual tracking[EB/OL]. (2017-04-13)[2020-05-20]. https://arxiv.org/abs/1704.04057.
[23] ZHANG J M, MA S G, SCLAROFF S, et al. MEEM: robust tracking via multiple experts using entropy minimization[C]∥Proceedings of European Conference on Computer Vision. Cham, Switzerland: Springer, 2014: 188-203.
[24] QI Y K, ZHANG S P, ZHANG W G, et al. Learning attribute-specific representations for visual tracking[C]∥Proceedings of the AAAI Conference on Artificial Intelligence. Honolulu, HI, US: AAAI, 2019: 8835-8842.
[25] GALOOGAHI H K, FAGG A, LUCEY S, et al. Learning background-aware correlation filters for visual tracking[C]∥Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE, 2017: 1144-1152.
[26] LI Y H, ZHANG X F. SiamVGG: visual tracking using deeper siamese networks[EB/OL]. (2019-03-03)[2020-05-28]. https://arxiv.org/abs/1902.02804.