Siamese Network for Visual Tracking with Temporal-spatial Property

JIANG Shan, DI Xiaoqiang, HAN Cheng

Acta Armamentarii (兵工学报) ›› 2021, Vol. 42 ›› Issue (9): 1940-1950. DOI: 10.3969/j.issn.1000-1093.2021.09.015

Article

Siamese Network for Visual Tracking with Temporal-spatial Property

  • JIANG Shan, DI Xiaoqiang, HAN Cheng

Abstract

A Siamese network with a temporal-spatial attention mechanism is proposed to tackle the poor accuracy of existing algorithms in dealing with fast motion and similar-target (background clutter) interference. A temporal attention module is designed: with the initial video frame as reference, the features of multiple historical reference frames are adaptively weighted according to each frame's contribution and fused to construct an effective multi-frame template. A spatial attention module perceives the whole tracking image through a non-local operation, which improves the discriminative ability of the network. During training, Focal Loss is used to balance the proportion of positive and negative samples and to improve the model's ability to distinguish hard samples. Simulation experiments were conducted on the OTB2015 and VOT2016 benchmarks to evaluate the proposed algorithm against 12 state-of-the-art algorithms: ECO, DSST, HDT, CFNet, KCF, SRDCF, SiamFC, DCFNet, MEEM, SiamVGG, BACF and ANT. The results demonstrate that the proposed Siamese tracking network with temporal-spatial attention handles fast motion and background clutter well and effectively improves the performance of the baseline algorithm.
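The three components summarized in the abstract (adaptive multi-frame template fusion, a non-local spatial attention operation, and Focal Loss training) can be sketched roughly as follows. This is an illustrative NumPy sketch, not the authors' implementation: the cosine-similarity "contribution" measure, the tensor shapes, and all hyperparameter values are assumptions made for demonstration.

```python
# Illustrative sketch only -- NOT the paper's code. The contribution
# measure, shapes and hyperparameters below are assumptions.
import numpy as np

def _softmax(x, axis=None):
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def temporal_fuse(ref_feats, init_feat):
    """Temporal attention: weight each historical reference-frame feature
    by its (assumed) cosine similarity to the initial frame, then fuse."""
    sims = np.array([
        float(np.dot(f.ravel(), init_feat.ravel()) /
              (np.linalg.norm(f) * np.linalg.norm(init_feat) + 1e-8))
        for f in ref_feats
    ])
    w = _softmax(sims)                      # adaptive per-frame weights
    return np.tensordot(w, np.stack(ref_feats), axes=1)  # weighted fusion

def non_local(x):
    """Spatial attention: a simplified non-local block over (N, C)
    features; every position aggregates all others (residual form)."""
    attn = _softmax(x @ x.T, axis=1)        # pairwise affinities
    return x + attn @ x

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Focal Loss (Lin et al.): down-weights easy examples so training
    concentrates on hard positives and negatives."""
    p = np.clip(p, 1e-7, 1 - 1e-7)
    pt = np.where(y == 1, p, 1 - p)
    a = np.where(y == 1, alpha, 1 - alpha)
    return float(np.mean(-a * (1.0 - pt) ** gamma * np.log(pt)))
```

In the actual network these operations would act on deep convolutional feature maps inside the Siamese architecture; the NumPy arrays above merely stand in for those tensors.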

Key words

object tracking / siamese network / temporal attention / spatial attention

Cite this article

JIANG Shan, DI Xiaoqiang, HAN Cheng. Siamese Network for Visual Tracking with Temporal-spatial Property. Acta Armamentarii, 2021, 42(9): 1940-1950. https://doi.org/10.3969/j.issn.1000-1093.2021.09.015

Funding

National Natural Science Foundation of China Youth Fund Program (61702051, 61602058)

References


[1] 梁杰, 李磊, 任君, 等. 基于深度学习的红外图像遮挡干扰检测方法[J]. 兵工学报, 2019, 40(7): 1401-1410.
LIANG J, LI L, REN J, et al. Infrared image occlusion interference detection method based on deep learning[J]. Acta Armamentarii, 2019, 40(7): 1401-1410. (in Chinese)
[2] GAO P, ZHANG Q Q, WANG F, et al. Learning reinforced attentional representation for end-to-end visual tracking[J]. Information Sciences, 2020, 517: 52-67.
[3] WANG N Y, YEUNG D Y. Learning a deep compact image representation for visual tracking[C]//Proceedings of the 26th International Conference on Neural Information Processing Systems. Red Hook, NY, US: Curran Associates Inc., 2013: 809-817.
[4] WANG N Y, LI S Y, GUPTA A, et al. Transferring rich feature hierarchies for robust visual tracking[EB/OL]. (2015-04-23)[2020-05-24]. https://arxiv.org/abs/1501.04587.
[5] MA C, HUANG J B, YANG X K, et al. Hierarchical convolutional features for visual tracking[C]//Proceedings of the IEEE International Conference on Computer Vision. Santiago, Chile: IEEE, 2015: 3074-3082.
[6] NAM H, HAN B. Learning multi-domain convolutional neural networks for visual tracking[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, US: IEEE, 2016: 4293-4302.
[7] BERTINETTO L, VALMADRE J, HENRIQUES J F, et al. Fully-convolutional siamese networks for object tracking[C]//Proceedings of European Conference on Computer Vision. Cham, Switzerland: Springer, 2016: 850-865.
[8] VALMADRE J, BERTINETTO L, HENRIQUES J, et al. End-to-end representation learning for correlation filter based tracking[C]//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, US: IEEE, 2017: 5000-5008.
[9] LI B, YAN J J, WU W, et al. High performance visual tracking with siamese region proposal network[C]//Proceedings of 2018 IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT, US: IEEE, 2018: 8971-8980.
[10] ZHU Z, WU W, ZOU W, et al. End-to-end flow correlation tracking with spatial-temporal attention[C]//Proceedings of 2018 IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT, US: IEEE, 2018: 548-557.
[11] ZHANG Z P, PENG H W. Deeper and wider siamese networks for real-time visual tracking[C]//Proceedings of 2019 IEEE Conference on Computer Vision and Pattern Recognition. Long Beach, CA, US: IEEE, 2019: 4591-4600.
[12] KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[C]//Proceedings of the 25th International Conference on Neural Information Processing Systems. Red Hook, NY, US: Curran Associates Inc., 2012: 1097-1105.
[13] WU Y, LIM J W, YANG M H. Object tracking benchmark[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(9): 1834-1848.
[14] HADFIELD S, BOWDEN R, LEBEDA K. The visual object tracking VOT2016 challenge results[J]. Lecture Notes in Computer Science, 2016, 9914: 777-823.
[15] ZENG Y Z, WANG H Y, LU T, et al. Learning spatial-channel attention for visual tracking[C]//Proceedings of 2019 IEEE/CIC International Conference on Communications in China. Changchun, China: IEEE, 2019: 277-282.
[16] LIN T Y, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection[C]//Proceedings of the IEEE International Conference on Computer Vision. Venice, Italy: IEEE, 2017: 2980-2988.
[17] HUANG L H, ZHAO X, HUANG K Q, et al. GOT-10k: a large high-diversity benchmark for generic object tracking in the wild[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(5): 1562-1577.
[18] DANELLJAN M, BHAT G, KHAN F S, et al. ECO: efficient convolution operators for tracking[C]//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, US: IEEE, 2017: 6931-6939.
[19] DANELLJAN M, HAGER G, KHAN F, et al. Accurate scale estimation for robust visual tracking[C]//Proceedings of British Machine Vision Conference. Nottingham, UK: BMVA Press, 2014.
[20] HENRIQUES J F, CASEIRO R, MARTINS P, et al. High-speed tracking with kernelized correlation filters[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(3): 583-596.
[21] DANELLJAN M, HAGER G, KHAN F S, et al. Learning spatially regularized correlation filters for visual tracking[C]//Proceedings of IEEE International Conference on Computer Vision. Santiago, Chile: IEEE, 2015: 4310-4318.
[22] WANG Q, GAO J, XING J L, et al. DCFNet: discriminant correlation filters network for visual tracking[EB/OL]. (2017-04-13)[2020-05-20]. https://arxiv.org/abs/1704.04057.
[23] ZHANG J M, MA S G, SCLAROFF S, et al. MEEM: robust tracking via multiple experts using entropy minimization[C]//Proceedings of European Conference on Computer Vision. Cham, Switzerland: Springer, 2014: 188-203.
[24] QI Y K, ZHANG S P, ZHANG W G, et al. Learning attribute-specific representations for visual tracking[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Honolulu, HI, US: AAAI, 2019: 8835-8842.
[25] GALOOGAHI H K, FAGG A, LUCEY S, et al. Learning background-aware correlation filters for visual tracking[C]//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE, 2017: 1144-1152.
[26] LI Y H, ZHANG X F. SiamVGG: visual tracking using deeper siamese networks[EB/OL]. (2019-03-03)[2020-05-28]. https://arxiv.org/abs/1902.02804.
