Facial expression recognition based on attention mechanism and feature correlation

LAN Lingqiang; LIU Qiyuan; LU Shuhua

doi: 10.13700/j.bh.1001-5965.2020.0507

College of Information and Cyber Security, People's Public Security University of China, Beijing 102600, China

Funds:

National Key R&D Program of China 2016YFC0801005

Fundamental Research Funds for the Central Universities 2019JKF225

Open Research Fund of the Public Security Behavioral Science Laboratory 2020SYS16

More Information

Corresponding author: LU Shuhua, E-mail: lushuhua@ppsuc.edu.cn

CLC number: TP391

Publication history
- Received: 2020-09-09
- Accepted: 2020-12-14
- Published online: 2022-01-20

Abstract:
Facial expression recognition under natural conditions faces challenges such as occlusion, illumination changes, and pose variation, which lead to low recognition accuracy. To address this problem, this paper proposes a new deep learning network model for facial expression recognition. The model uses ResNet as the backbone and integrates a bottleneck attention module and a global second-order pooling layer. The bottleneck attention module focuses on extracting the important expression features, and the global second-order pooling layer measures the correlation among expression features. On this basis, joint normalization strategies are used to balance and improve the distribution of the feature data, which improves the accuracy of expression recognition. The proposed method was tested and validated on two public datasets, FER2013 and CK+, reaching highest accuracies of 74.227% and 95.8%, respectively. This performance is better than that of many existing mainstream methods, indicating that the proposed model has good accuracy and robustness.

Keywords:
- expression recognition
- deep learning
- bottleneck attention module
- global second-order pooling layer
- joint normalization strategies
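
The abstract says the global second-order pooling layer measures the correlation among expression features. As a rough illustration of that idea only, the sketch below computes the channel-by-channel covariance matrix of a single feature map with NumPy. It is not the authors' implementation: the function name `covariance_pooling`, the random input, and the 8-channel 6x6 shape are all assumptions made for the example, and the full layer described in [11] does more than compute this statistic.

```python
import numpy as np


def covariance_pooling(features: np.ndarray) -> np.ndarray:
    """Return the C x C channel covariance matrix of a (C, H, W) feature map."""
    c, h, w = features.shape
    x = features.reshape(c, h * w)           # one row per channel, one column per position
    x = x - x.mean(axis=1, keepdims=True)    # center every channel
    return x @ x.T / (h * w)                 # population covariance (divide by N)


# Hypothetical input: a random 8-channel, 6x6 feature map.
rng = np.random.default_rng(0)
fmap = rng.standard_normal((8, 6, 6))
cov = covariance_pooling(fmap)
print(cov.shape)
```

Entry `cov[i, j]` describes how channels `i` and `j` vary together across spatial positions, which is the second-order information that first-order pooling such as a global average discards.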

Figure 1. Network architecture

Figure 2. Example images from the datasets

Figure 3. Confusion matrices of the proposed models on the CK+ dataset

Figure 4. Confusion matrices of the proposed models on the FER2013 dataset
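
The figures are not reproduced on this page. To give a rough idea of the bottleneck attention module [10] that the network architecture incorporates, the following NumPy sketch combines a channel branch and a spatial branch into one attention map and applies it as a residual refinement. It is a deliberate simplification under stated assumptions, not the architecture of Figure 1: the spatial branch is reduced to a single 1x1 projection, batch normalization is omitted, and the function name `bam_sketch`, the reduction ratio `r`, and the random weights are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def bam_sketch(f, w1, w2, w_sp):
    """Simplified bottleneck attention on a (C, H, W) feature map.

    w1 with shape (C // r, C) and w2 with shape (C, C // r) form the
    channel-branch MLP; w_sp with shape (C,) is a 1x1 projection that
    stands in for the spatial branch (an assumption of this sketch).
    """
    pooled = f.mean(axis=(1, 2))                   # global average pool -> (C,)
    m_c = w2 @ np.maximum(w1 @ pooled, 0.0)        # channel logits -> (C,)
    m_s = np.tensordot(w_sp, f, axes=(0, 0))       # spatial logits -> (H, W)
    m = sigmoid(m_c[:, None, None] + m_s[None])    # broadcast to (C, H, W)
    return f + f * m                               # residual refinement


# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
c, h, w, r = 8, 5, 5, 4
f = rng.standard_normal((c, h, w))
out = bam_sketch(
    f,
    rng.standard_normal((c // r, c)),
    rng.standard_normal((c, c // r)),
    rng.standard_normal(c),
)
print(out.shape)
```

Because the attention values lie between 0 and 1, each output element is the input scaled by a factor between 1 and 2, so the module can emphasize features without suppressing the original signal.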

Table 1. Accuracy (%) of different models on the FER2013 and CK+ datasets

Model	ResNet18 FER2013	ResNet18 CK+	ResNet34 FER2013	ResNet34 CK+	ResNet50 FER2013	ResNet50 CK+
Baseline	71.190	89.3	72.304	92.8	72.109	92.0
Cov	72.834	93.5	72.973	93.1	72.527	92.5
Cov-Bam	73.057	94.4	73.224	93.6	73.001	93.0
Cov-Bam-FBN	73.614	94.9	73.671	95.1	73.447	93.5
Cov-Bam-IGN	73.419	95.8	73.502	95.5	73.224	93.1
Cov-Bam-BGN	73.670	94.9	74.227	95.1	73.279	93.1
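
The headline accuracies in the abstract can be cross-checked against Table 1 with a few lines of Python. The sketch below copies the table values, finds the best configuration per dataset, and computes its gain over the Baseline row for the same backbone. The variable and function names (`columns`, `rows`, `best_per_dataset`) are illustrative choices, not part of the paper.

```python
# Accuracy (%) copied from Table 1; each list follows the order of `columns`.
columns = [
    ("ResNet18", "FER2013"), ("ResNet18", "CK+"),
    ("ResNet34", "FER2013"), ("ResNet34", "CK+"),
    ("ResNet50", "FER2013"), ("ResNet50", "CK+"),
]
rows = {
    "Baseline":    [71.190, 89.3, 72.304, 92.8, 72.109, 92.0],
    "Cov":         [72.834, 93.5, 72.973, 93.1, 72.527, 92.5],
    "Cov-Bam":     [73.057, 94.4, 73.224, 93.6, 73.001, 93.0],
    "Cov-Bam-FBN": [73.614, 94.9, 73.671, 95.1, 73.447, 93.5],
    "Cov-Bam-IGN": [73.419, 95.8, 73.502, 95.5, 73.224, 93.1],
    "Cov-Bam-BGN": [73.670, 94.9, 74.227, 95.1, 73.279, 93.1],
}


def best_per_dataset(rows, columns):
    """Map each dataset to its (model, backbone, accuracy) with the highest accuracy."""
    best = {}
    for model, values in rows.items():
        for (backbone, dataset), acc in zip(columns, values):
            if dataset not in best or acc > best[dataset][2]:
                best[dataset] = (model, backbone, acc)
    return best


best = best_per_dataset(rows, columns)

# Gain of each best configuration over the Baseline row with the same backbone.
gains = {
    dataset: round(acc - rows["Baseline"][columns.index((backbone, dataset))], 3)
    for dataset, (model, backbone, acc) in best.items()
}

for dataset, (model, backbone, acc) in best.items():
    print(f"{dataset}: {model} on {backbone}, {acc}% (+{gains[dataset]} over Baseline)")
```

The best entries are Cov-Bam-BGN on ResNet34 for FER2013 (74.227%) and Cov-Bam-IGN on ResNet18 for CK+ (95.8%), matching the figures quoted in the abstract.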

Table 2. Comparison of accuracy between the proposed models and some current methods on the CK+ dataset

Model	Network architecture	Accuracy/%
Fei^[30]	ResNet50	93.5
GPS^[31]	Gabor filter	95.1
ROI^[32]	AlexNet and GoogleNet	94.7
Cov-Bam-FBN	ResNet34	95.1
Cov-Bam-IGN	ResNet18	95.8
Cov-Bam-BGN	ResNet34	95.1

Table 3. Comparison of accuracy between the proposed models and some mainstream methods on the FER2013 dataset

Model	Network architecture	Accuracy/%
DAM-CNN^[33]	VGG-Face	66.200
BReG-NeXt^[34]	BReG-NeXt	71.530
Shao^[35]	ResNet101	71.140
ALAW^[36]	ResNet	72.670
Cov-Bam-FBN	ResNet34	73.671
Cov-Bam-IGN	ResNet34	73.502
Cov-Bam-BGN	ResNet34	74.227

Table 4. Comparison of accuracy between the proposed models and the joint optimization strategies

Model	CK+ accuracy/%	FER2013 accuracy/%
FBN	91.9	72.973
IGN	92.3	72.834
BGN	91.7	72.889
Cov-Bam-FBN	95.1	73.671
Cov-Bam-IGN	95.5	73.502
Cov-Bam-BGN	95.1	74.227

References (36)

[1]	TAHA B, HATZINAKOS D. Emotion recognition from 2D facial expressions[C]//2019 IEEE Canadian Conference of Electrical and Computer Engineering (CCECE). Piscataway: IEEE Press, 2019: 1-4.
[2]	LIU K C, HSU C C, WANG W Y, et al. Real-time facial expression recognition based on CNN[C]//2019 International Conference on System Science and Engineering (ICSSE). Piscataway: IEEE Press, 2019: 120-123.
[3]	CHA H S, CHOI S J, IM C H. Real-time recognition of facial expressions using facial electromyograms recorded around the eyes for social virtual reality applications[J]. IEEE Access, 2020, 8: 62065-62075. doi: 10.1109/ACCESS.2020.2983608
[4]	PANG L, LI N Q, ZHAO L, et al. Facial expression recognition based on Gabor feature and neural network[C]//2018 International Conference on Security, Pattern Analysis and Cybernetics (SPAC). Piscataway: IEEE Press, 2018: 489-493.
[5]	KUSHWAH K, SHARMA V, SINGH U. Neural network method through facial expression recognition[C]//2017 International Conference of Electronics, Communication and Aerospace Technology (ICECA). Piscataway: IEEE Press, 2017: 532-537.
[6]	LI J, JIN K, ZHOU D L, et al. Attention mechanism-based CNN for facial expression recognition[J]. Neurocomputing, 2020, 411: 340-350. doi: 10.1016/j.neucom.2020.06.014
[7]	XIANG J, ZHU G M. Joint face detection and facial expression recognition with MTCNN[C]//2017 4th International Conference on Information Science and Control Engineering (ICISCE). Piscataway: IEEE Press, 2017: 424-427.
[8]	ZHOU Y, FENG Y Y, ZENG S Y, et al. Facial expression recognition based on convolutional neural network[C]//2019 IEEE 10th International Conference on Software Engineering and Service Science (ICSESS). Piscataway: IEEE Press, 2019: 410-413.
[9]	ZENG G H, ZHOU J C, JIA X, et al. Hand-crafted feature guided deep learning for facial expression recognition[C]//2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018). Piscataway: IEEE Press, 2018: 423-430.
[10]	PARK J C, WOO S, LEE J Y, et al. BAM: Bottleneck attention module[EB/OL]. (2018-07-18)[2020-09-02]. https://arxiv.org/pdf/1807.06514.
[11]	GAO Z L, XIE J T, WANG Q L, et al. Global second-order pooling convolutional networks[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE Press, 2019: 3019-3028.
[12]	LAN L Q, LI X, LIU Q Y, et al. Facial expression recognition method based on a joint normalization strategy[J]. Journal of Beijing University of Aeronautics and Astronautics, 2020, 46(9): 1797-1806 (in Chinese). doi: 10.13700/j.bh.1001-5965.2020.0073
[13]	GOODFELLOW I J, ERHAN D, CARRIER P L, et al. Challenges in representation learning: A report on three machine learning contests[C]//International Conference on Neural Information Processing. Berlin: Springer, 2013: 117-124.
[14]	LUCEY P, COHN J F, KANADE T, et al. The extended Cohn-Kanade dataset (CK+): A complete dataset for action unit and emotion-specified expression[C]//2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition-workshops. Piscataway: IEEE Press, 2010: 94-101.
[15]	SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[EB/OL]. (2015-10-04)[2020-09-02]. https://arxiv.org/abs/1409.1556.
[16]	KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[J]. Communications of the ACM, 2017, 60(6): 84-90. doi: 10.1145/3065386
[17]	HE K M, ZHANG X Y, REN S Q, et al. Identity mappings in deep residual networks[M]//Computer Vision ECCV 2016. Berlin: Springer, 2016: 630-645.
[18]	CHENG S, ZHOU G H. Facial expression recognition method based on improved VGG convolutional neural network[J]. International Journal of Pattern Recognition and Artificial Intelligence, 2020, 34(7): 2056003. doi: 10.1142/S0218001420560030
[19]	ZHONG Y X, QIU S H, LUO X S, et al. Facial expression recognition based on optimized ResNet[C]//2020 2nd World Symposium on Artificial Intelligence (WSAI). Piscataway: IEEE Press, 2020: 84-91.
[20]	HU J, SHEN L, ALBANIE S, et al. Squeeze-and-excitation networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 42(8): 2011-2023. doi: 10.1109/TPAMI.2019.2913372
[21]	CHEN Y Z, HU H F. Facial expression recognition by inter-class relational learning[J]. IEEE Access, 2019, 7: 94106-94117. doi: 10.1109/ACCESS.2019.2928983
[22]	WANG F J, SHEN L P. Expression recognition using region features and facial action units[C]//2019 15th International Conference on Intelligent Environments. Piscataway: IEEE Press, 2019: 9-15.
[23]	XU Q T, ZHAO N J. A facial expression recognition algorithm based on CNN and LBP feature[C]//2020 IEEE 4th Information Technology, Networking, Electronic and Automation Control Conference (ITNEC). Piscataway: IEEE Press, 2020: 2304-2308.
[24]	WANG K, PENG X J, YANG J F, et al. Region attention networks for pose and occlusion robust facial expression recognition[J]. IEEE Transactions on Image Processing, 2020, 29: 4057-4069. doi: 10.1109/TIP.2019.2956143
[25]	LI Y, ZENG J B, SHAN S G, et al. Occlusion aware facial expression recognition using CNN with attention mechanism[J]. IEEE Transactions on Image Processing, 2019, 28(5): 2439-2450. doi: 10.1109/TIP.2018.2886767
[26]	GAN Y L, CHEN J Y, YANG Z K, et al. Multiple attention network for facial expression recognition[J]. IEEE Access, 2020, 8: 7383-7393. doi: 10.1109/ACCESS.2020.2963913
[27]	YU K C, SALZMANN M. Second-order convolutional neural networks[EB/OL]. (2017-03-20)[2020-09-02]. http://export.arxiv.org/abs/1703.06817.
[28]	LI P H, XIE J T, WANG Q L, et al. Is second-order information helpful for large-scale visual recognition?[C]//2017 IEEE International Conference on Computer Vision (ICCV). Piscataway: IEEE Press, 2017: 2089-2097.
[29]	LI P H, XIE J T, WANG Q L, et al. Towards faster training of global covariance pooling networks by iterative matrix square root normalization[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 947-955.
[30]	FEI Z X, YANG E F, LI D, et al. Combining deep neural network with traditional classifier to recognize facial expressions[C]//2019 25th International Conference on Automation and Computing (ICAC). Piscataway: IEEE Press, 2019: 1-6.
[31]	ADIL B, NADJIB K M, YACINE L. A novel approach for facial expression recognition[C]//2019 International Conference on Networking and Advanced Systems (ICNAS). Piscataway: IEEE Press, 2019: 1-5.
[32]	SUN X, XIA P P, ZHANG L M, et al. A ROI-guided deep architecture for robust facial expressions recognition[J]. Information Sciences, 2020, 522: 35-48. doi: 10.1016/j.ins.2020.02.047
[33]	XIE S Y, HU H F, WU Y B. Deep multi-path convolutional neural network joint with salient region attention for facial expression recognition[J]. Pattern Recognition, 2019, 92: 177-191. doi: 10.1016/j.patcog.2019.03.019
[34]	HASANI B, NEGI P S, MAHOOR M. BReG-NeXt: Facial affect computing using adaptive residual networks with bounded gradient[J]. IEEE Transactions on Affective Computing, 2020, 99: 1.
[35]	SHAO J, QIAN Y S. Three convolutional neural network models for facial expression recognition in the wild[J]. Neurocomputing, 2019, 355: 82-92. doi: 10.1016/j.neucom.2019.05.005
[36]	XIE W C, SHEN L L, DUAN J M. Adaptive weighting of handcrafted feature losses for facial expression recognition[J]. IEEE Transactions on Cybernetics, 2021, 51(5): 2787-2800. doi: 10.1109/TCYB.2019.2925095