An underwater coral reef fish detection approach based on aggregation of spatio-temporal features
-
Abstract: Detecting coral reef fish in underwater surveillance videos is challenging due to poor imaging quality, the complex underwater environment, and the high visual diversity of coral reef fish. Extracting discriminative features that characterize the fish is therefore the key to improving detection accuracy. This paper proposes an underwater coral reef fish detection method based on the aggregation of spatio-temporal features, realized by two modules that aggregate visual and temporal features and fuse information from multiple dimensions. The visual feature aggregation module combines a top-down partition with a bottom-up merging scheme to effectively aggregate multi-layer convolutional feature maps of different resolutions. The temporal feature aggregation module applies a frame-difference-guided fusion of feature maps from adjacent frames, which strengthens the feature representation of moving objects and their surrounding regions. Experiments on a public dataset show that the spatio-temporal feature aggregation network built on these two modules detects coral reef fish effectively in challenging underwater scenes and achieves higher detection accuracy than several mainstream methods and detection models.
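To make the temporal aggregation idea concrete, the following is a minimal PyTorch-style sketch of frame-difference-guided fusion for a single pair of adjacent frames. It reflects our own assumptions (the function name, the use of a normalized difference map as a soft spatial weight, and element-wise addition as the fusion operator) rather than the authors' implementation.

```python
import torch
import torch.nn.functional as F

def frame_difference_guided_fusion(feat_cur, feat_adj, img_cur, img_adj):
    """Hypothetical sketch: fuse an adjacent frame's feature map into the
    current frame's feature map, weighted by the inter-frame pixel difference.

    img_cur, img_adj:   RGB frames, shape (B, 3, H, W)
    feat_cur, feat_adj: feature maps of the same layer, shape (B, C, h, w)
    """
    # Absolute pixel difference averaged over colour channels -> (B, 1, H, W).
    diff = (img_cur - img_adj).abs().mean(dim=1, keepdim=True)
    # Downsample the motion cue to the feature-map resolution.
    diff = F.interpolate(diff, size=feat_cur.shape[-2:], mode="bilinear",
                         align_corners=False)
    # Normalize to [0, 1] so it can act as a soft spatial fusion weight.
    weight = (diff - diff.amin()) / (diff.amax() - diff.amin() + 1e-6)
    # Regions containing moving fish receive a stronger contribution from
    # the adjacent frame's features (element-wise addition, cf. Table 2).
    return feat_cur + weight * feat_adj
```

In the full network, this kind of weighted fusion would presumably be repeated over all sampled neighboring frames (see Tables 3 and 4) before the detection head is applied.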
-
Table 1. Numbers of different fish species in the SeaCLEF dataset
No.  Coral reef fish species  Training samples  Test samples
1    五带豆娘鱼      132    93
2    褐斑刺尾鲷      294    129
3    克氏双锯鱼      363    516
4    月斑蝴蝶鱼      1217   1896
5    川纹蝴蝶鱼      335    1317
6    短身光腮雀鲷    275    24
7    宅泥鱼          894    1985
8    网纹宅泥鱼      3165   5046
9    康德锯鳞鱼      242    118
10   黄新雀鲷        85     1593
11   迪克氏固齿鲷    737    700
12   宝石高鳍刺尾鱼  72     187
Table 2. Different fusion methods and their performance (mAP)
Fusion method           mAP (visual feature aggregation module)  mAP (temporal feature aggregation module)
Element-wise addition   0.6345   0.6002
Element-wise maximum    0.6328   0.6012
Element-wise average    0.6296   0.5986
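The three fusion variants compared in Table 2 are simple element-wise operations on two equally shaped feature maps. The helper below is an illustrative sketch (hypothetical name, PyTorch), not the paper's code, spelling out what each row of the table refers to.

```python
import torch

def fuse(feat_a: torch.Tensor, feat_b: torch.Tensor, mode: str = "add") -> torch.Tensor:
    """Element-wise fusion of two feature maps of identical shape."""
    if mode == "add":    # "element-wise addition" row
        return feat_a + feat_b
    if mode == "max":    # "element-wise maximum" row
        return torch.maximum(feat_a, feat_b)
    if mode == "mean":   # "element-wise average" row
        return (feat_a + feat_b) / 2
    raise ValueError(f"unknown fusion mode: {mode}")
```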
Table 3. Network performance under different parameters with three input frames
Sampling neighborhood and interval   2      4      6      8
mAP                                  0.599  0.601  0.602  0.602
Table 4. Network performance under different parameters with five input frames
Sampling neighborhood and interval   2, 4   2, 6   2, 8   4, 6   4, 8   6, 8
mAP                                  0.612  0.614  0.617  0.618  0.622  0.621
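One plausible reading of the "sampling neighborhood and interval" parameter in Tables 3 and 4 is the set of temporal offsets at which neighboring frames are sampled around the current frame. The helper below is a hypothetical sketch of that reading, not the authors' definition.

```python
def sampled_frame_indices(t: int, intervals: tuple[int, ...]) -> list[int]:
    """Hypothetical: sample one frame at each interval on both sides of frame t.

    intervals=(4,)   -> [t-4, t, t+4]            (3-frame input, Table 3)
    intervals=(4, 8) -> [t-8, t-4, t, t+4, t+8]  (5-frame input, Table 4)
    """
    offsets = sorted(intervals)
    return [t - d for d in reversed(offsets)] + [t] + [t + d for d in offsets]
```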
Table 5. Detection performance of different methods
-