Deep separable convolutional neural networks based on structural reparameterization

CHEN Hong; YAN Jianguo; YANG Hua; ZHANG Jing; LI Wei; YANG Jing

doi:10.13700/j.bh.1001-5965.2024.0287

CLC number: V557⁺.1; TP751
Publication history
- Received: 2024-05-07
- Accepted: 2024-06-25
- Published online: 2024-07-03
- Issue publication date: 2026-06-30

Deep separable convolutional neural networks based on structural reparameterization

CHEN Hong¹, YAN Jianguo², YANG Hua¹, ZHANG Jing¹, LI Wei³, YANG Jing¹,⁴

1. Electrical Engineering College, Guizhou University, Guiyang 550025, China
2. China Power Construction Group Guizhou Engineering Co., Ltd, Guiyang 550025, China
3. Engineering Research Center of Guizhou Higher Education Institutions, Guiyang 550025, China
4. Guizhou Provincial Key Laboratory of Internet + Intelligent Manufacturing, Guizhou University, Guiyang 550025, China

Funds:

National Natural Science Foundation of China (61640014, 61963009); Innovation Group of Guizhou Education Department (Qianjiaohe KY[2021]012); Guizhou Provincial Science and Technology Projects (Qiankehe Zhicheng [2022]Yiban017, Qiankehe Zhicheng [2023]Yiban411, Qiankehe Zhicheng [2024]Yiban051); Engineering Research Center of Guizhou Education Department (Qianjiaoji[2022]040, Qianjiaoji[2022]043); Science and Technology Project of Power Construction Corporation of China, Ltd (DJ-ZDXM-2020-19, DJ-ZDXM-2022-44); Guizhou Dual Carbon Research Institute Open Subjects (DCRE-2023-13); China Aluminum International Engineering Corporation Limited Science and Technology Major Project (CJ2022JS-ZD02)

Corresponding author: E-mail: jyang7@gzu.edu.cn

Abstract:
Most current convolutional neural network (CNN) models rely on single-branch depthwise convolution, which limits the expressive ability of the model and consumes a large number of parameters and floating-point operations (FLOPs). To address this, a new lightweight CNN model is proposed: the depthwise separable convolutional neural network (DSCNN) based on structural reparameterization. First, the feature extraction module of the model, the structurally reparameterized mixed depthwise convolution (RepMiX), fuses information across channels and spatial locations to realize multi-scale feature fusion. Second, the reparameterized asymmetric spatial operator (RepASO) in the DSCNN learns different channel feature information through a multi-branch structure whose branches have different functions, which improves the feature learning ability of the model. Both RepMiX and RepASO combine structural reparameterization with the idea of depthwise separable convolution (DS-Conv) to decouple the training-time and inference-time structures, which accelerates inference while reducing parameters and FLOPs. Finally, comparative experiments are carried out on the Tiny-ImageNet-200, CIFAR-10, and CIFAR-100 datasets and on a self-constructed aluminum ingot surface defect classification dataset. The results show that the DSCNN achieves higher floating-point operations per second (FLOPS) while maintaining competitive accuracy.
Key words:
- convolutional neural networks
- structural reparameterization
- depthwise separable convolution
- asymmetric convolution
- structural decoupling
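The abstract credits depthwise separable convolution with reducing parameters and FLOPs. The sketch below works through the standard textbook weight count for a dense convolution versus a depthwise convolution followed by a 1×1 pointwise convolution. It is an illustration only: the function names and the channel and feature-map sizes are assumptions chosen for the example, and biases and batch normalization are ignored.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a dense k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a depthwise k x k conv (one filter per input channel)
    followed by a 1 x 1 pointwise conv that mixes channels (bias ignored)."""
    return k * k * c_in + c_in * c_out


def conv_macs(params: int, h_out: int, w_out: int) -> int:
    """Multiply-accumulates when every weight is used once per output position
    (stride 1, 'same' padding)."""
    return params * h_out * w_out


# Illustrative sizes (assumed, not taken from the paper's measurements).
c_in, c_out, k, h, w = 80, 80, 3, 56, 56
dense = standard_conv_params(c_in, c_out, k)
separable = depthwise_separable_params(c_in, c_out, k)
ratio = separable / dense  # equals 1 / c_out + 1 / k**2

print("dense weights:    ", dense)
print("separable weights:", separable)
print("ratio:            ", round(ratio, 4))
print("dense MACs:       ", conv_macs(dense, h, w))
print("separable MACs:   ", conv_macs(separable, h, w))
```

With a 3×3 kernel the ratio approaches 1/9 as the number of output channels grows, which is why the saving is largest in wide layers.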

Figure 1. Overall architecture of DSCNN based on structural reparameterization

Figure 2. Reparameterized asymmetric spatial operator

Figure 3. Principle of asymmetric convolution fusion
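As a minimal sketch of the general idea behind asymmetric convolution fusion (not the authors' implementation): because convolution is linear, parallel branches with 3×3, 3×1, 1×3 and 1×1 kernels plus an identity mapping can be collapsed into a single 3×3 kernel by zero-padding each kernel to 3×3 around its centre and summing. The sketch assumes a single channel (as in a depthwise convolution), stride 1, zero padding and no batch normalization; in practice the batch-normalization statistics of each branch would first be folded into its kernel and bias. All function names and numeric values are illustrative.

```python
from typing import List

Kernel = List[List[float]]


def pad_to_3x3(kernel: Kernel) -> Kernel:
    """Zero-pad a 3x3, 3x1, 1x3 or 1x1 kernel so it sits centred in a 3x3 grid."""
    rows, cols = len(kernel), len(kernel[0])
    top, left = (3 - rows) // 2, (3 - cols) // 2
    out = [[0.0] * 3 for _ in range(3)]
    for i in range(rows):
        for j in range(cols):
            out[top + i][left + j] = kernel[i][j]
    return out


def fuse(kernels: List[Kernel]) -> Kernel:
    """Sum the zero-padded branch kernels into one equivalent 3x3 kernel."""
    fused = [[0.0] * 3 for _ in range(3)]
    for kernel in kernels:
        padded = pad_to_3x3(kernel)
        for i in range(3):
            for j in range(3):
                fused[i][j] += padded[i][j]
    return fused


def conv2d_same(image: Kernel, kernel: Kernel) -> Kernel:
    """Single-channel cross-correlation, stride 1, zero padding ('same' size)."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    yy, xx = y + i - ph, x + j - pw
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += image[yy][xx] * kernel[i][j]
            out[y][x] = acc
    return out


def multi_branch(image: Kernel, kernels: List[Kernel]) -> Kernel:
    """Training-time view: run every branch separately and add the outputs."""
    h, w = len(image), len(image[0])
    total = [[0.0] * w for _ in range(h)]
    for kernel in kernels:
        branch_out = conv2d_same(image, kernel)
        for y in range(h):
            for x in range(w):
                total[y][x] += branch_out[y][x]
    return total


# Illustrative branches: 3x3, 3x1, 1x3, 1x1 and identity (a 1x1 kernel of 1).
branches = [
    [[0.5, -1.0, 0.25], [1.0, 2.0, -0.5], [0.0, 0.75, 1.5]],
    [[1.0], [-2.0], [0.5]],
    [[0.25, 0.5, -1.0]],
    [[3.0]],
    [[1.0]],
]
image = [[1.0, 2.0, 0.0, 1.0], [0.0, 1.0, 3.0, 2.0], [2.0, 0.0, 1.0, 1.0]]

train_out = multi_branch(image, branches)
infer_out = conv2d_same(image, fuse(branches))
max_diff = max(
    abs(a - b) for ra, rb in zip(train_out, infer_out) for a, b in zip(ra, rb)
)
print("max difference between multi-branch and fused outputs:", max_diff)
```

The fused kernel gives the same output as the multi-branch form with a single convolution, which is the sense in which the training and inference structures can be decoupled.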

Figure 4. Performance comparison between DSCNN and other lightweight models

Figure 5. Defective samples

Figure 6. Qualified samples

Figure 7. Performance comparison of DSCNN based on structural reparameterization and multiple lightweight CNN models on the aluminum ingot surface defect classification dataset. Circle size represents FLOPs.

Figure 8. Comparison of heat maps of different models

Table 1. Architecture of DSCNN based on structural reparameterization

Name	Output size	Block	Number
Stalk	$ \dfrac{h}{4}\times \dfrac{w}{4} $	$ \left.\begin{matrix}3 \times 3 & 40 \\\text {RepMixer} & 40 \\\text {FFN} & 40\end{matrix}\right. $	1
Stage 1	$ \dfrac{h}{4}\times \dfrac{w}{4} $	$ \left.\begin{matrix}1 \times 1 & 80 \\\text {RepASO} & 80 \\1 \times 1 & 40\end{matrix}\right. $	1
Merge layer	$ \dfrac{h}{8}\times \dfrac{w}{8} $	$ \left.\begin{matrix}2\times 2\;\text{Conv} & \text{stride 2}\\\text{BN} & \text{ReLU}\end{matrix}\right. $	1
Stage 2	$ \dfrac{h}{8}\times \dfrac{w}{8} $	$ \left.\begin{matrix}1 \times 1 & 160 \\\text {RepASO} & 160 \\1 \times 1 & 80\end{matrix}\right. $	2
Merge layer	$ \dfrac{h}{16}\times \dfrac{w}{16} $	$ \left.\begin{matrix}2\times 2\;\text{Conv} & \text{stride 2}\\\text{BN} & \text{ReLU}\end{matrix}\right. $	1
Stage 3	$ \dfrac{h}{16}\times \dfrac{w}{16} $	$ \left.\begin{matrix}1 \times 1 & 320 \\\text {RepASO} & 320 \\1 \times 1 & 160\end{matrix}\right. $	8
Merge layer	$ \dfrac{h}{32}\times \dfrac{w}{32} $	$ \left.\begin{matrix}2\times 2\;\text{Conv} & \text{stride 2}\\\text{BN} & \text{ReLU}\end{matrix}\right. $	1
Stage 4	$ 7\times 7 $	$ \left.\begin{matrix}1 \times 1 & 640 \\\text {RepASO} & 640 \\1 \times 1 & 320\end{matrix}\right. $	2
Head	$ 1\times 1 $	$ \left.\begin{matrix}7 \times 7 \;\text {AvgPool2d} & 320 \\1 \times 1 & 1\;280 \\\text {FC} & 1\;000\end{matrix}\right. $	1
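To make the output-size column of Table 1 concrete, the following sketch traces the feature-map resolution through the stages. It assumes a 224×224 input, which is not stated in the table but is consistent with the 7×7 output of Stage 4 and the 7×7 average pooling in the head. It also assumes floor division at each downsampling step. The `LAYOUT` list and the function name are hypothetical names introduced for this example.

```python
from typing import List, Tuple

# (name, downsampling factor) for each row of Table 1 before the head.
# The stem reduces resolution by 4; each merge layer is a stride-2 convolution.
LAYOUT: List[Tuple[str, int]] = [
    ("Stalk", 4),
    ("Stage 1", 1),
    ("Merge layer", 2),
    ("Stage 2", 1),
    ("Merge layer", 2),
    ("Stage 3", 1),
    ("Merge layer", 2),
    ("Stage 4", 1),
]


def feature_map_sizes(h: int, w: int) -> List[Tuple[str, int, int]]:
    """Return (name, height, width) after each row, using floor division."""
    sizes = []
    for name, factor in LAYOUT:
        h, w = h // factor, w // factor
        sizes.append((name, h, w))
    return sizes


# Assumed input resolution of 224 x 224.
for name, fh, fw in feature_map_sizes(224, 224):
    print(f"{name:<12} {fh} x {fw}")
```

Under these assumptions the resolutions run 56, 28, 14 and 7 across the four stages, matching h/4, h/8, h/16 and h/32 in the table.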

Table 2. Experimental environment

No.	Environment	Version
1	Operating system	Ubuntu 22.04 LTS
2	PyTorch	1.13.1
3	CUDA	11.6
4	GPU	NVIDIA RTX3090Ti
5	CPU	i9-12900KF
6	PyCharm	2022 Community
7	Python	3.9
8	RAM	32 GB
9	SSD	500 GB

Table 3. Performance comparison of DSCNN based on structural reparameterization and other lightweight CNN models on the Tiny-ImageNet-200 dataset

Model	Parameters	FLOPs	GPU throughput/(frames·s⁻¹)	FLOPS/(10⁹ s⁻¹)	Top-1 accuracy/%
EdgeNeXt_XXS	1.3×10⁶	0.20×10⁹	5 268	29.62	63.26
MobileViT_XXS	1.3×10⁶	0.37×10⁹	3 472	27.21	63.81
FasterNet_T0	3.9×10⁶	0.34×10⁹	8 609	50.21	58.30
DSCNN_T0	3.7×10⁶	0.32×10⁹	6 912	44.02	60.52
EdgeNeXt_XS	2.34×10⁶	0.41×10⁹	3 894	32.16	66.56
MobileViT_XS	2.39×10⁶	0.92×10⁹	1 964	33.37	67.55
GhostNetV2	6.13×10⁶	0.17×10⁹	4 228	14.84	66.89
FasterNet_T1	7.61×10⁶	0.85×10⁹	5 364	56.17	64.35
DSCNN_T1	7.38×10⁶	0.89×10⁹	4 782	50.57	65.48
ConvNeXt_Tiny	28.59×10⁶	3.47×10⁹	1 143	43.15	71.96
ResNet-50	25.55×10⁶	3.33×10⁹	1 466	42.56	72.03
MobileNeXt	3.51×10⁶	0.31×10⁹	4 128	32.04	68.90
EdgeNeXt_S	5.59×10⁶	0.97×10⁹	2 598	41.93	71.61
MobileViT_S	5.64×10⁶	1.79×10⁹	1 535	41.07	70.30
FasterNet_T2	14.98×10⁶	1.91×10⁹	3 541	61.19	68.52
DSCNN_T2	14.19×10⁶	1.90×10⁹	2 981	55.15	69.60
Swin_T	28.3×10⁶	4.51×10⁹	600	36.49	78.11
DSCNN_S	29.1×10⁶	4.38×10⁹	1 435	63.25	78.90

Table 4. Comparison of DSCNN based on structural reparameterization with other lightweight CNN models on the CIFAR-10 and CIFAR-100 datasets

Model	Parameters	FLOPs	CIFAR-10 accuracy/%	CIFAR-100 accuracy/%
EdgeNeXt_XS	1.83×10⁶	0.41×10⁹	86.97	62.83
ShuffleNetV1	2.28×10⁶	0.16×10⁹	85.98	61.35
MobileNeXt	2.51×10⁶	0.31×10⁹	86.77	54.30
GhostNetV2	6.13×10⁶	0.17×10⁹	86.17	62.43
MobileViT_XXS	1.30×10⁶	0.37×10⁹	84.31	56.53
DSCNN_T0	3.45×10⁶	0.32×10⁹	90.42	65.99
FasterNet_T0	3.66×10⁶	0.34×10⁹	89.41	64.85
ConvNeXt_Tiny	27.97×10⁶	3.47×10⁹	89.07	64.76
ResNet-50	25.55×10⁶	3.33×10⁹	90.57	65.76
MobileViT_XS	5.12×10⁶	0.92×10⁹	90.79	65.17

Table 5. Comparison of experimental results on various datasets with different configurations of RepASO

Configuration	7×7 convs	5×5 convs	3×3 convs	3×1 convs	1×3 convs	1×1 convs	Identity mapping	CIFAR-10 accuracy/%	CIFAR-100 accuracy/%	Tiny-ImageNet Top-1 accuracy/%	Aluminum ingot surface defect classification dataset accuracy/%
A	1	0	0	0	0	0	0	88.18	61.75	58.92	92.89
B	0	1	0	0	0	0	0	88.24	62.80	59.07	92.95
C	0	0	1	0	0	0	0	88.44	63.24	59.20	93.15
D	0	0	3	0	0	0	0	88.85	63.24	59.32	93.44
E	0	0	3	0	0	1	0	88.87	63.34	59.37	93.54
F	0	0	3	0	0	0	1	88.57	63.13	59.42	93.47
G	0	0	3	1	1	0	0	88.88	63.60	59.29	93.34
H	0	0	3	1	1	1	0	88.89	63.68	60.31	94.14
I	0	0	3	1	1	0	1	88.78	63.51	59.32	94.38
J	0	0	3	1	1	1	1	90.42	65.99	60.52	96.50

Table 6. Quantitative experimental results of the RepMiX with different r values in RepMixer on CIFAR-10 and CIFAR-100 datasets

r	Parameters (original model)	CIFAR-10 accuracy/%	CIFAR-100 accuracy/%
1	3.46×10⁶	89.03	63.52
2	3.52×10⁶	89.33	63.83
3	3.58×10⁶	90.42	65.99
4	3.64×10⁶	88.17	63.55
