-
Abstract: Most current facial expression recognition methods rely on end-to-end feature extraction and classification based on deep learning. In response, this paper proposes a new optimization method for deep models. Building on the ResNet18 residual network architecture and the idea of normalization, it introduces a joint normalization strategy: filter response normalization with batch normalization, instance normalization with group normalization, and group normalization with batch normalization are each embedded in the network to balance and improve the distribution of feature data, compensate for the shortcomings of any single normalization, and improve model performance. Validation and testing on two public datasets, FER2013 and CK+, yielded highest accuracies of 73.558% and 94.9%, respectively. The experimental results indicate that the joint normalization strategy improves the performance of the base network and outperforms many recent facial expression recognition methods.
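The four normalizations named in the abstract differ mainly in which values share one set of statistics. The sketch below illustrates that difference on a toy batch in plain Python. It is only an illustration under stated assumptions: the function names, the `EPS` value, and the nested-list layout `x[sample][channel][position]` are hypothetical, the learned scale, shift, and (for FRN) thresholded activation are omitted, and it does not show how the paper pairs the normalizations inside ResNet18, which this excerpt does not specify.

```python
import math

EPS = 1e-5  # stability constant; this value is an assumption


def standardize(values):
    """Scale a flat list to zero mean and (near) unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + EPS) for v in values]


def batch_norm(x):
    """BN: statistics per channel, pooled over every sample in the batch."""
    size = len(x[0][0])
    out = [[None] * len(x[0]) for _ in x]
    for c in range(len(x[0])):
        pooled = standardize([v for sample in x for v in sample[c]])
        for n in range(len(x)):
            out[n][c] = pooled[n * size:(n + 1) * size]
    return out


def instance_norm(x):
    """IN: statistics per sample and per channel."""
    return [[standardize(ch) for ch in sample] for sample in x]


def group_norm(x, groups):
    """GN: statistics per sample over each group of adjacent channels."""
    size, per_group = len(x[0][0]), len(x[0]) // groups
    out = []
    for sample in x:
        row = []
        for g in range(groups):
            chans = sample[g * per_group:(g + 1) * per_group]
            flat = standardize([v for ch in chans for v in ch])
            row += [flat[i * size:(i + 1) * size] for i in range(per_group)]
        out.append(row)
    return out


def filter_response_norm(x):
    """FRN: divide by the root mean square per sample and channel (no mean subtraction)."""
    out = []
    for sample in x:
        row = []
        for ch in sample:
            rms = math.sqrt(sum(v * v for v in ch) / len(ch) + EPS)
            row.append([v / rms for v in ch])
        out.append(row)
    return out


# Toy batch: 2 samples, 2 channels, 4 spatial positions per channel.
x = [
    [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]],
    [[5.0, 6.0, 7.0, 8.0], [50.0, 60.0, 70.0, 80.0]],
]

for name, y in [("BN", batch_norm(x)), ("IN", instance_norm(x)),
                ("GN", group_norm(x, groups=1)), ("FRN", filter_response_norm(x))]:
    print(name, [round(v, 3) for v in y[0][0]])
```

Because BN pools statistics across the batch, its output for one sample depends on the other samples, whereas IN, GN, and FRN depend only on the sample itself. That difference is one plausible reason for pairing normalizations with complementary behaviour.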
-
Table 1. Experimental results of the base framework and after adding the joint normalization strategies

| Model | Accuracy/% |
| --- | --- |
| Ref. [38] | 71.190 |
| Model1 (this paper) | 73.558 |
| Model2 (this paper) | 73.534 |
| Model3 (this paper) | 73.031 |
Table 2. Effect of the number of joint normalizations added to the residual network

| Number | Model1 accuracy/% | Model2 accuracy/% | Model3 accuracy/% |
| --- | --- | --- | --- |
| 0 | 71.190 | 71.190 | 71.190 |
| 1 | 72.555 | 72.722 | 72.499 |
| 1-2 | 72.053 | 72.417 | 71.691 |
| 1-3 | 73.558 | 73.530 | 73.031 |
| 1-4 | 72.778 | 72.416 | 72.723 |
Table 3. Comparison between single normalization and joint normalization (used in the first three residual blocks)

| Optimization strategy | Accuracy/% |
| --- | --- |
| BN | 71.190 |
| IN | 73.168 |
| GN | 73.029 |
| FRN | 72.276 |
| Model1 | 73.558 |
| Model2 | 73.530 |
| Model3 | 73.031 |
Table 4. Accuracy comparison between the proposed method and recent methods on the FER2013 dataset
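The accuracies reported in Table 2 can be checked with a few lines of arithmetic. The sketch below copies the Table 2 values into a dictionary, finds the best-performing setting for each model, and computes its gain over the 71.190% baseline (row 0). The names `TABLE2`, `best_setting`, and `gain_over_baseline` are illustrative and not from the paper.

```python
BASELINE = 71.190  # accuracy (%) with no joint normalization (row 0 of Table 2)

# Table 2: setting in the "Number" column -> accuracy (%) for each model
TABLE2 = {
    "1":   {"Model1": 72.555, "Model2": 72.722, "Model3": 72.499},
    "1-2": {"Model1": 72.053, "Model2": 72.417, "Model3": 71.691},
    "1-3": {"Model1": 73.558, "Model2": 73.530, "Model3": 73.031},
    "1-4": {"Model1": 72.778, "Model2": 72.416, "Model3": 72.723},
}


def best_setting(model):
    """Return the Table 2 setting with the highest accuracy for a model."""
    return max(TABLE2, key=lambda setting: TABLE2[setting][model])


def gain_over_baseline(model, setting):
    """Accuracy gain in percentage points relative to the baseline."""
    return round(TABLE2[setting][model] - BASELINE, 3)


for model in ("Model1", "Model2", "Model3"):
    setting = best_setting(model)
    print(model, setting, gain_over_baseline(model, setting))
```

For all three models the best setting is 1-3, and accuracy drops again at 1-4, so adding joint normalization more widely does not improve accuracy monotonically.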
-
[1] JAIN D K, SHAMSOLMOALI P, SEHDEV P. Extended deep neural network for facial emotion recognition[J]. Pattern Recognition Letters, 2019, 120: 69-74. doi: 10.1016/j.patrec.2019.01.008
[2] HU S H, HU Y M, LI J Q, et al. Natural scene facial expression recognition based on differential features[C]//2019 Chinese Automation Congress (CAC). Piscataway: IEEE Press, 2019: 2840-2844. http://ieeexplore.ieee.org/document/8997280
[3] LI Y, CAO G T, CAO W M. Stacking-based deep neural network for facial expression recognition[C]//2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). Piscataway: IEEE Press, 2019: 1338-1342.
[4] HE J, CAI J F, FANG L Z, et al. A method of facial expression recognition based on LBP fusion of key expressions areas[C]//The 27th Chinese Control and Decision Conference (2015 CCDC). Piscataway: IEEE Press, 2015: 4200-4204. 300141861_A_method_of_facial_expression_recognition_based_on_LBP_fusion_of_key_expressions_areas
[5] OJALA T, PIETIKÄINEN M, MÄENPÄÄ T. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002, 24(7): 971-987. doi: 10.1109/TPAMI.2002.1017623
[6] IOFFE S, SZEGEDY C. Batch normalization: Accelerating deep network training by reducing internal covariate shift[C]//Proceedings of the 32nd International Conference on Machine Learning (ICML). New York: ACM, 2015: 448-456. https://www.researchgate.net/publication/272194743_Batch_Normalization_Accelerating_Deep_Network_Training_by_Reducing_Internal_Covariate_Shift
[7] ULYANOV D, VEDALDI A, LEMPITSKY V. Instance normalization: The missing ingredient for fast stylization[EB/OL]. (2017-11-06)[2020-02-20]. https://arxiv.org/abs/1607.08022.
[8] WU Y X, HE K M. Group normalization[C]//The European Conference on Computer Vision (ECCV). Berlin: Springer, 2018: 3-19. https://www.researchgate.net/publication/323957064_Group_Normalization
[9] SINGH S, KRISHNAN S. Filter response normalization layer: Eliminating batch dependence in the training of deep neural networks[EB/OL]. (2019-11-21)[2020-02-20]. https://arxiv.org/abs/1911.09737.
[10] GOODFELLOW I J, ERHAN D, CARRIER P L, et al. Challenges in representation learning: A report on three machine learning contests[C]//International Conference on Neural Information Processing. Berlin: Springer, 2013: 117-124.
[11] LUCEY P, COHN J F, KANADE T, et al. The extended Cohn-Kanade dataset (CK+): A complete dataset for action unit and emotion-specified expression[C]//2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE Press, 2010: 94-101. https://www.researchgate.net/publication/224165246_The_Extended_Cohn-Kanade_Dataset_CK_A_complete_dataset_for_action_unit_and_emotion-specified_expression
[12] DALAL N, TRIGGS B. Histograms of oriented gradients for human detection[C]//2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE Press, 2005, 1: 886-893.
[13] LOWE D G. Object recognition from local scale-invariant features[C]//Proceedings of the 7th IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 1999, 2: 1150-1157. https://www.researchgate.net/publication/2373439_Object_Recognition_from_Local_Scale-Invariant_Features
[14] LYONS M J, AKAMATSU S, KAMACHI M, et al. The Japanese female facial expression (JAFFE) database[C]//Proceedings of 3rd International Conference on Automatic Face and Gesture Recognition. Piscataway: IEEE Press, 1998: 14-16. http://www.researchgate.net/publication/318710640_The_Japanese_Female_Facial_Expression_JAFFE_Database
[15] ZHAO G, HUANG X, TAINI M, et al. Facial expression recognition from near-infrared videos[J]. Image and Vision Computing, 2011, 29(9): 607-619. doi: 10.1016/j.imavis.2011.07.002
[16] ELSAYED A, MAHMOOD A, SOBH T. Effect of super resolution on high dimensional features for unsupervised face recognition in the wild[C]//2017 IEEE Applied Imagery Pattern Recognition Workshop (AIPR). Piscataway: IEEE Press, 2017: 1-5. https://www.researchgate.net/publication/315796189_Effect_of_Super_Resolution_on_High_Dimensional_Features_for_Unsupervised_Face_Recognition_in_the_Wild
[17] WANG P Y, SU F, ZHAO Z C. Joint multi-feature fusion and attribute relationships for facial attribute prediction[C]//2017 IEEE Visual Communications and Image Processing (VCIP). Piscataway: IEEE Press, 2017: 1-4. https://www.researchgate.net/publication/323502250_Joint_multi-feature_fusion_and_attribute_relationships_for_facial_attribute_prediction
[18] TAHERKHANI F, NASRABADI N M, DAWSON J. A deep face identification network enhanced by facial attributes prediction[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE Press, 2018: 553-560. https://www.researchgate.net/publication/324886978_A_Deep_Face_Identification_Network_Enhanced_by_Facial_Attributes_Prediction
[19] KUO C M, LAI S H, MICHEL S. A compact deep learning model for robust facial expression recognition[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE Press, 2018: 2121-2129. https://www.researchgate.net/publication/329748607_A_Compact_Deep_Learning_Model_for_Robust_Facial_Expression_Recognition
[20] YANG J, ZHANG F, CHEN B, et al. Facial expression recognition based on facial action unit[C]//2019 10th International Green and Sustainable Computing Conference (IGSC). Piscataway: IEEE Press, 2019: 1-6. https://www.researchgate.net/publication/338599937_Facial_Expression_Recognition_Based_on_Facial_Action_Unit
[21] YU K, SALZMANN M. Second-order convolution neural networks[J]. Clinical Immunology & Immunopathology, 2017, 66(3): 230-238. http://www.wanfangdata.com.cn/details/detail.do?_type=perio&id=e7ccbccdd2fc1b8f8812294f939d8fed
[22] GAO Z, XIE J, WANG Q, et al. Global second-order pooling convolutional networks[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE Press, 2019: 3024-3033. http://ieeexplore.ieee.org/document/8954152/
[23] HUANG Z, LUC V G. A Riemannian network for SPD matrix learning[C]//31st AAAI Conference on Artificial Intelligence. San Francisco: AAAI Press, 2017: 2036-2042.
[24] ACHARYA D, HUANG Z, PAUDEL D P, et al. Covariance pooling for facial expression recognition[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE Press, 2018: 367-374. https://www.researchgate.net/publication/325143336_Covariance_Pooling_For_Facial_Expression_Recognition
[25] HAMESTER D, BARROS P, WERMTER S. Face expression recognition with a 2-channel convolutional neural network[C]//2015 International Joint Conference on Neural Networks (IJCNN). Piscataway: IEEE Press, 2015: 1-8. https://www.researchgate.net/publication/303022259_Face_expression_recognition_with_a_2-channel_Convolutional_Neural_Network
[26] MOLLAHOSSEINI A, CHAN D, MAHOOR M H. Going deeper in facial expression recognition using deep neural networks[C]//2016 IEEE Winter Conference on Applications of Computer Vision (WACV). Piscataway: IEEE Press, 2016: 1-10. http://www.researchgate.net/publication/283986729_Going_Deeper_in_Facial_Expression_Recognition_using_Deep_Neural_Networks
[27] CHEN Y, HU H. Facial expression recognition by inter-class relational learning[J]. IEEE Access, 2019, 7: 94106-94117. doi: 10.1109/ACCESS.2019.2928983
[28] LIU M, LI S, SHAN S, et al. Deeply learning deformable facial action parts model for dynamic expression analysis[C]//Asian Conference on Computer Vision (ACCV). Berlin: Springer, 2014: 143-157.
[29] NGUYEN D H, KIM S H, LEE G S, et al. Facial expression recognition using a temporal ensemble of multi-level convolutional neural networks[EB/OL]. (2019-10-10)[2020-02-20]. https://ieeexplore.ieee.org/document/8863974.
[30] MONTAVON G, ORR G B, MULLER K R. Neural networks: Tricks of the trade[M]. Berlin: Springer, 1998: 9-50.
[31] XIE S Y, GIRSHICK R, DOLLAR P, et al. Aggregated residual transformations for deep neural networks[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE Press, 2017: 1492-1500. https://www.researchgate.net/publication/320971540_Aggregated_Residual_Transformations_for_Deep_Neural_Networks
[32] SZEGEDY C, LIU W, JIA Y, et al. Going deeper with convolutions[C]//2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE Press, 2015: 1-9. https://www.researchgate.net/publication/265787949_Going_Deeper_with_Convolutions
[33] HUANG G, LIU Z, WEINBERGER K Q, et al. Densely connected convolutional networks[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE Press, 2017: 4700-4708. https://www.researchgate.net/publication/319770123_Densely_Connected_Convolutional_Networks
[34] ULYANOV D, VEDALDI A, LEMPITSKY V. Improved texture networks: Maximizing quality and diversity in feed-forward stylization and texture synthesis[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE Press, 2017: 6924-6932. https://www.researchgate.net/publication/312170783_Improved_Texture_Networks_Maximizing_Quality_and_Diversity_in_Feed-forward_Stylization_and_Texture_Synthesis
[35] PAN X, LUO P, SHI J, et al. Two at once: Enhancing learning and generalization capacities via IBN-net[C]//The European Conference on Computer Vision (ECCV). Berlin: Springer, 2018: 464-479.
[36] EKMAN P, FRIESEN W V. Facial action coding system: A technique for the measurement of facial movement[M]. Palo Alto: Consulting Psychologists Press, 1978: 32-96.
[37] HE K, ZHANG X, REN S, et al. Identity mappings in deep residual networks[C]//The European Conference on Computer Vision. Berlin: Springer, 2016: 630-645. https://www.researchgate.net/publication/319770414_Identity_Mappings_in_Deep_Residual_Networks
[38] QIN Z Y, WU J. Visual saliency maps can apply to facial expression recognition[EB/OL]. (2018-11-12)[2020-02-20]. https://arxiv.org/abs/1811.04544.
[39] MIAO S, XU H Y, HAN Z Q, et al. Recognizing facial expressions using a shallow convolutional neural network[J]. IEEE Access, 2019, 7: 78000-78011. doi: 10.1109/ACCESS.2019.2921220
[40] ZHOU J C, JIA X, SHEN L L, et al. Improved softmax loss for deep learning-based face and expression recognition[J]. Cognitive Computation and Systems, 2019, 1(4): 97-102. doi: 10.1049/ccs.2019.0010
[41] TIAN Y, WEN Z W, XIE W C, et al. Outlier-suppressed triplet loss with adaptive class-aware margins for facial expression recognition[C]//2019 IEEE International Conference on Image Processing (ICIP). Piscataway: IEEE Press, 2019: 46-50. https://www.researchgate.net/publication/335538473_Outlier-Suppressed_Triplet_Loss_with_Adaptive_Class-Aware_Margins_for_Facial_Expression_Recognition
[42] XIE S Y, HU H F, WU Y B. Deep multi-path convolutional neural network joint with salient region attention for facial expression recognition[J]. Pattern Recognition, 2019, 92: 177-191. doi: 10.1016/j.patcog.2019.03.019