Abstract: Video event detection has attracted increasing attention in computer vision in recent years. Uncertainty estimation can alert decision-making systems or personnel when detection outputs are unreliable, thereby reducing decision errors. In this paper, we propose an end-to-end video event detection algorithm that integrates uncertainty estimation into the event detection task. The algorithm estimates the localization and classification uncertainty of events in the input video and optimizes network performance by reducing the predicted uncertainty. In addition, uncertainty estimation is combined with the Non-Maximum Suppression (NMS) strategy to further select high-quality prediction boxes. Experimental results show that adding an uncertainty branch improves the algorithm's performance. On the J-HMDB-21 dataset, the proposed algorithm improves mAP50 by 0.8% compared with advanced algorithms. On the atomic visual actions (AVA) dataset, it improves mAP50 by 1.3% compared with other end-to-end networks.
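The abstract does not state how the predicted uncertainty enters Non-Maximum Suppression. The following is a minimal sketch of one possible reading, assuming each prediction box carries a classification score and a predicted localization standard deviation, and that boxes are ranked by a score discounted by that standard deviation before standard greedy suppression. The names Box, loc_std, alpha, and uncertainty_nms, as well as the discount rule itself, are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    x1: float
    y1: float
    x2: float
    y2: float
    score: float    # classification confidence in [0, 1]
    loc_std: float  # predicted localization standard deviation (hypothetical field)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    ih = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = iw * ih
    union = (a.x2 - a.x1) * (a.y2 - a.y1) + (b.x2 - b.x1) * (b.y2 - b.y1) - inter
    return inter / union if union > 0 else 0.0


def uncertainty_nms(boxes: List[Box], iou_thresh: float = 0.5,
                    alpha: float = 1.0) -> List[Box]:
    """Greedy NMS that ranks boxes by an uncertainty-discounted score.

    Assumed ranking rule (not from the paper): score / (1 + alpha * loc_std),
    so a confident but poorly localized box ranks below a slightly less
    confident box with low localization uncertainty.
    """
    def rank(b: Box) -> float:
        return b.score / (1.0 + alpha * b.loc_std)

    remaining = sorted(boxes, key=rank, reverse=True)
    kept: List[Box] = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        # Drop every remaining box that overlaps the kept box too strongly.
        remaining = [b for b in remaining if iou(best, b) < iou_thresh]
    return kept
```

With alpha set to 0 the ranking reduces to ordinary score-based NMS, which makes it easy to compare the two behaviours on the same set of boxes.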
Table 1. Influence of different initialization variances on algorithm performance
Table 2. Influence of different video frame parameters on algorithm performance
Algorithm                         Algorithm type         Data type   mAP50/%
I3D[10]                           Two-stage              V+F         15.6
ACRN, S3D[30]                     Two-stage              V+F         17.4
STEP, I3D[31]                     Two-stage              V+F         18.6
RTPR[29]                          Two-stage              V+F         22.3
LFB, R101+NL[32]                  Two-stage (offline)    V           27.4
ACAR, R50, 8x8, (64-f)[30]        Two-stage (offline)    V           28.3
SlowFast, R50, 8x8, (64-f)[33]    Two-stage (offline)    V           24.8
YOWO(32-f)[12]                    Single-stage           V           18.3
UC-YOWO(32-f)                     Single-stage           V           18.5
UC-YOWO+StdNMS(32-f)              Single-stage           V           19.6
