Multi-scale depthwise separable convolution facial expression recognition embedded in attention mechanism

SONG Yuqin; GAO Shijie; ZENG Hedong; XIONG Gaoqiang

doi:10.13700/j.bh.1001-5965.2021.0114

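School of Electronics and Information, Xi'an Polytechnic University, Xi'an 710600, China

Funds: Science and Technology Guidance Project of the China National Textile and Apparel Council (2019062)

Corresponding author: SONG Yuqin, E-mail: 81308995@qq.com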

CLC number: TP391
Publication history
- Received: 2021-03-10
- Accepted: 2021-06-13
- Published online: 2021-07-13
- Issue published: 2022-12-20

Abstract:
In facial expression recognition, traditional machine learning methods rely on complex feature extraction, shallow convolutional neural networks achieve low recognition rates, and deep convolutional neural networks are prone to exploding or vanishing gradients. To address these problems, this paper constructs a multi-scale depthwise separable expression recognition network in which an attention mechanism is embedded in a residual network. Stacked multi-layer, multi-scale depthwise separable residual units extract expression features at different scales, and the CBAM attention mechanism screens these features, increasing the weight of effective expression features and weakening the influence of noise in the training data. The proposed network achieves accuracies of 73.89% and 97.47% on the Fer-2013 and CK+ expression datasets, respectively, indicating that it generalizes well.
Keywords:
- expression recognition
- attention mechanism
- multi-scale feature extraction
- depthwise separable convolution
- residual network
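The abstract builds on depthwise separable convolution, which factors a standard convolution into a per-channel (depthwise) filter followed by a 1×1 (pointwise) channel mix. The following is a minimal sketch of the resulting weight counts, not the authors' implementation. The kernel sizes 3, 5 and 7 and the 64-channel width are illustrative assumptions chosen to mimic a multi-scale unit; the abstract does not state the actual kernel sizes.

```python
def standard_conv_params(c_in, c_out, k):
    """Weight count of a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weight count of a depthwise k x k conv plus a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


# Assumed (hypothetical) multi-scale branch: kernels 3, 5, 7 at 64 channels.
for k in (3, 5, 7):
    std = standard_conv_params(64, 64, k)
    sep = depthwise_separable_params(64, 64, k)
    print(f"k={k}: standard={std}, separable={sep}, ratio={sep / std:.4f}")
```

Under these assumptions a 3×3 layer with 64 input and 64 output channels drops from 36,864 to 4,672 weights, and the ratio equals 1/c_out + 1/k², so the saving grows with kernel size. This is one reason larger kernels become affordable in a multi-scale design.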

Figure 1. Channel attention module

Figure 2. Spatial attention module

Figure 3. Multi-scale depthwise separable convolution residual block with embedded CBAM (Basic Block)

Figure 4. Structure of the residual-network-based multi-scale depthwise separable facial expression recognition network with embedded CBAM

Figure 5. Sample expressions from the Fer-2013 dataset

Figure 6. Sample expressions from the CK+ dataset

Figure 7. Examples of data augmentation for the Fer-2013 training and test sets

Figure 8. Confusion matrix of expression classification on the Fer-2013 private validation set

Figure 9. Confusion matrix of expression classification on the CK+ test set

Table 1. Feature parameters of the expression recognition network

Layer	Input size	Output size
Conv2d	44×44×3	44×44×64
Basic Block-1	44×44×64	44×44×64
Basic Block-2	22×22×128	22×22×128
Basic Block-3	11×11×256	11×11×256
Basic Block-4	6×6×512	6×6×512
Global Average Pooling	6×6×512	1×1×512
FC+Softmax	1×1×512	1×1×7
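The spatial sizes in Table 1 (44, 22, 11, 6) are consistent with the usual convolution output-size formula if each of Basic Block-2 to Basic Block-4 halves the resolution once. The sketch below reproduces the output column under that reading. The 3×3 kernel, padding of 1, and the placement of a stride-2 step in blocks 2 to 4 are assumptions borrowed from common ResNet practice; the table itself does not specify them.

```python
def conv_out_size(size, kernel=3, stride=1, padding=1):
    """Spatial output size of a convolution: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_shapes(size=44, channels=(64, 128, 256, 512), num_classes=7):
    """Return (layer name, spatial size, channels) for each row of Table 1."""
    shapes = []
    size = conv_out_size(size)  # stem convolution, stride 1
    shapes.append(("Conv2d", size, channels[0]))
    for i, ch in enumerate(channels, start=1):
        stride = 1 if i == 1 else 2  # assumed: blocks 2-4 downsample by 2
        size = conv_out_size(size, stride=stride)
        shapes.append((f"Basic Block-{i}", size, ch))
    shapes.append(("Global Average Pooling", 1, channels[-1]))
    shapes.append(("FC+Softmax", 1, num_classes))
    return shapes


for name, s, c in trace_shapes():
    print(f"{name}: {s}x{s}x{c}")
```

Note that 11 maps to 6 rather than 5 because the padding of 1 makes the stride-2 step round up, which matches the 6×6×512 entry for Basic Block-4.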

Table 2. Accuracy comparison between the proposed algorithm and other expression recognition algorithms (all values in %)

Dataset	Method	Anger	Disgust	Fear	Happiness	Sadness	Surprise	Neutral	Contempt	Overall
Fer-2013	Ref. [15]	65	71	50	90	60	79	75	-	71.52
	Ref. [16]	65	65	54	91	62	82	73	-	72.67
	Ref. [17]	-	-	-	-	-	-	-	-	73.00
	Proposed	64	69	58	91	64	84	74	-	73.89
CK+	Ref. [18]	79	86	92	99	95	100	99	-	95.67
	Ref. [19]	88	97	94	99	88	99	-	78	94.90
	Ref. [20]	90	100	85	100	95	98	-	88	96.28
	Proposed	97	98	95	100	92	99	-	93	97.47

Table 3. Ablation experiments

A	B	C	D	A-CBAM	B-CBAM	C-CBAM	D-CBAM	Accuracy on Fer-2013/%	Accuracy on CK+/%
√								71.91	92.12
√	√							72.53	94.74
√	√	√						72.72	95.17
√	√	√	√					72.94	95.49
√	√	√	√	√				73.06	95.96
√	√	√	√	√	√			73.42	96.67
√	√	√	√	√	√	√		73.63	97.09
√	√	√	√	√	√	√	√	73.89	97.47
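The contribution of each stage in Table 3 can be read off as differences between rows. The short sketch below does that arithmetic on the accuracy columns as printed. The column labels A to D and A-CBAM to D-CBAM are taken as given, and grouping the rows into "components A to D" (rows 1 to 4) and "CBAM added" (rows 4 to 8) is an interpretation of the table layout, not something the table states.

```python
# Accuracy values (%) copied from Table 3, one entry per row, top to bottom.
FER2013 = [71.91, 72.53, 72.72, 72.94, 73.06, 73.42, 73.63, 73.89]
CKPLUS = [92.12, 94.74, 95.17, 95.49, 95.96, 96.67, 97.09, 97.47]


def gain(acc, start, end):
    """Accuracy change in percentage points between two rows (0-based)."""
    return round(acc[end] - acc[start], 2)


for name, acc in (("Fer-2013", FER2013), ("CK+", CKPLUS)):
    print(
        f"{name}: rows 1-4 {gain(acc, 0, 3):+.2f}, "
        f"rows 4-8 {gain(acc, 3, 7):+.2f}, "
        f"total {gain(acc, 0, 7):+.2f} percentage points"
    )
```

Read this way, both stages add accuracy on both datasets, with a larger total gain on CK+ (5.35 points) than on Fer-2013 (1.98 points).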

References (20)

[1]	SHAN C, GONG S, MCOWAN P W. Facial expression recognition based on local binary patterns: A comprehensive study[J]. Image and Vision Computing, 2009, 27(6): 803-816. doi: 10.1016/j.imavis.2008.08.005
[2]	LUO Y, ZHANG T, ZHANG Y. A novel fusion method of PCA and LDP for facial expression feature extraction[J]. Optik, 2016, 127(2): 718-721. doi: 10.1016/j.ijleo.2015.10.147
[3]	LIU S S, TIAN Y T, WAN C. Facial expression recognition method based on Gabor multiorientation features fusion and block histogram[J]. Acta Automatica Sinica, 2011, 37(12): 1455-1463(in Chinese). https://www.cnki.com.cn/Article/CJFDTOTAL-MOTO201112006.htm
[4]	KUMAR V D A, KUMAR V D A, MALATHI S, et al. Facial recognition system for suspect identification using a surveillance camera[J]. Pattern Recognition and Image Analysis, 2018, 28(3): 410-420. doi: 10.1134/S1054661818030136
[5]	ZHOU J, ZHANG S, MEI H, et al. A method of facial expression recognition based on Gabor and NMF[J]. Pattern Recognition and Image Analysis, 2016, 26(1): 119-124. doi: 10.1134/S1054661815040070
[6]	HSIEH C C, HSIH M H, JIANG M K, et al. Effective semantic features for facial expressions recognition using SVM[J]. Multimedia Tools and Applications, 2016, 75(11): 6663-6682. doi: 10.1007/s11042-015-2598-1
[7]	SUN K, KANG H, PARK H H. Tagging and classifying facial images in cloud environments based on KNN using MapReduce[J]. Optik, 2015, 126(21): 3227-3233. doi: 10.1016/j.ijleo.2015.07.080
[8]	LI Y, LIN X Z, JIANG M Y. Facial expression recognition based on cross-connected LeNet-5 network[J]. Acta Automatica Sinica, 2018, 44(1): 176-182(in Chinese). https://www.cnki.com.cn/Article/CJFDTOTAL-MOTO201801015.htm
[9]	MOLLAHOSSEINI A, CHAN D, MAHOOR M H. Going deeper in facial expression recognition using deep neural networks[C]//2016 IEEE Winter Conference on Applications of Computer Vision (WACV). Piscataway: IEEE Press, 2016: 1-10.
[10]	JUNG H, LEE S, YIM J, et al. Joint fine-tuning in deep neural networks for facial expression recognition[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2015: 2983-2991.
[11]	HE K M, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 770-778.
[12]	WOO S, PARK J, LEE J Y, et al. CBAM: Convolutional block attention module[C]//Proceedings of the European Conference on Computer Vision(ECCV). Berlin: Springer, 2018: 3-19.
[13]	GOODFELLOW I J, ERHAN D, CARRIER P L, et al. Challenges in representation learning: A report on three machine learning contests[C]//International Conference on Neural Information Processing. Berlin: Springer, 2013: 117-124.
[14]	LUCEY P, COHN J F, KANADE T, et al. The extended Cohn-Kanade dataset (CK+): A complete dataset for action unit and emotion-specified expression[C]//2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops. Piscataway: IEEE Press, 2010: 94-101.
[15]	SHI C, TAN C, WANG L. A facial expression recognition method based on a multibranch cross-connection convolutional neural network[J]. IEEE Access, 2021, 9: 39255-39274. doi: 10.1109/ACCESS.2021.3063493
[16]	XIE W C, SHEN L L, DUAN J M. Adaptive weighting of handcrafted feature losses for facial expression recognition[J]. IEEE Transactions on Cybernetics, 2021, 51(5): 2787-2800. doi: 10.1109/TCYB.2019.2925095
[17]	HAYALE W, NEGI P S, MAHOOR M. Deep Siamese neural networks for facial expression recognition in the wild[J/OL]. IEEE Transactions on Affective Computing, 2021(2021-03-04)[2021-03-05]. https://ieeexplore.ieee.org/document/9423550.
[18]	SUN X, XIA P P, ZHANG L, et al. A ROI-guided deep architecture for robust facial expressions recognition[J]. Information Sciences, 2020, 522: 35-48. doi: 10.1016/j.ins.2020.02.047
[19]	LAN L Q, LI X, LIU Q Y, et al. Facial expression recognition method based on a joint normalization strategy[J]. Journal of Beijing University of Aeronautics and Astronautics, 2020, 46(9): 1797-1806(in Chinese). doi: 10.13700/j.bh.1001-5965.2020.0073
[20]	GAN Y, CHEN J, YANG Z, et al. Multiple attention network for facial expression recognition[J]. IEEE Access, 2020, 8: 7383-7393. doi: 10.1109/ACCESS.2020.2963913