To address the susceptibility of vision-only tracking algorithms to occlusion, an object detection and tracking algorithm based on audio-video information fusion is proposed. The framework comprises three modules: video detection and tracking, acoustic source localization, and audio-video information fusion tracking. The video detection and tracking module uses YOLOv5m as its visual detection framework, with an unscented Kalman filter and the Hungarian algorithm performing multi-object tracking and matching. The acoustic source localization module captures audio with a cross-shaped microphone array and calculates the source orientation from the time delays between the signals received at the individual microphones. The audio-video information fusion tracking module constructs an audio-video likelihood function and an audio-video importance sampling function, and uses an importance particle filter as the fusion tracking algorithm. The algorithm was tested in a complex indoor environment. The experimental results show that the proposed algorithm reaches a tracking accuracy of 90.68%, outperforming single-modality algorithms.
Key words
audio-visual fusion /
object tracking algorithm /
object detection /
acoustic source localization
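The time-delay localization step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a far-field source, four microphones placed symmetrically on two orthogonal axes, delay estimation by plain cross-correlation, and a synthetic white-noise signal. All function names, the 0.2 m spacing, and the 48 kHz sampling rate are hypothetical choices made for illustration.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def estimate_delay(sig_a, sig_b, fs):
    """Return how many seconds sig_a lags sig_b, from the cross-correlation peak."""
    corr = np.correlate(sig_a, sig_b, mode="full")
    # In "full" mode, zero lag sits at index len(sig_b) - 1.
    lag = int(np.argmax(corr)) - (len(sig_b) - 1)
    return lag / fs


def azimuth_from_cross_array(x_plus, x_minus, y_plus, y_minus, fs):
    """Far-field azimuth in degrees from the delays along the two array axes.

    For a plane wave, the delay along each axis is proportional to the cosine
    (x axis) or sine (y axis) of the azimuth, so atan2 of the two delays
    recovers the angle without needing the spacing or the speed of sound.
    """
    tau_x = estimate_delay(x_minus, x_plus, fs)
    tau_y = estimate_delay(y_minus, y_plus, fs)
    return float(np.degrees(np.arctan2(tau_y, tau_x)))


def simulate_cross_array(azimuth_deg, fs=48000, spacing=0.2, n=4096, seed=0):
    """Synthetic test signals: white noise shifted by whole-sample delays."""
    rng = np.random.default_rng(seed)
    source = rng.standard_normal(n)
    theta = np.radians(azimuth_deg)
    direction = np.array([np.cos(theta), np.sin(theta)])
    half = spacing / 2
    # Microphone order: x+, x-, y+, y-.
    positions = [(half, 0.0), (-half, 0.0), (0.0, half), (0.0, -half)]
    signals = []
    for pos in positions:
        # Microphones closer to the source receive the wavefront earlier.
        delay = -np.dot(pos, direction) / SPEED_OF_SOUND
        signals.append(np.roll(source, int(round(delay * fs))))
    return signals


if __name__ == "__main__":
    mics = simulate_cross_array(60.0)
    print(azimuth_from_cross_array(*mics, fs=48000))
```

Because the delays are rounded to whole samples, the estimate is only as fine as the sampling rate and microphone spacing allow. A practical system would likely use sub-sample interpolation or a weighted correlation, neither of which is shown here.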