Image super-resolution reconstruction network based on expectation maximization self-attention residual
-
Keywords:
- super-resolution reconstruction /
- attention mechanism /
- expectation maximization /
- feature-enhanced residual block /
- EM self-attention residual block
Abstract: Most deep learning-based image super-resolution (SR) reconstruction methods improve reconstruction quality mainly by increasing model depth, which also raises the computational cost of the model. Many networks adopt attention mechanisms to strengthen feature extraction, yet they still struggle to fully learn the features of different image regions. To address these problems, this paper proposes an SR reconstruction network based on expectation-maximization (EM) self-attention residuals. The network improves the basic residual block into a feature-enhanced residual block that better reuses the features extracted within it. To increase the spatial correlation of the feature information, the EM self-attention mechanism is introduced to construct an EM self-attention residual block, which strengthens the feature extraction capability of each module in the deep model; the feature extraction structure of the entire model is then built by cascading EM self-attention residual blocks. Finally, a reconstructed high-resolution image is obtained through an up-sampling image reconstruction module. Comparison experiments with mainstream methods show that the proposed method achieves better subjective visual quality and better objective evaluation metrics on five widely used SR test datasets.
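The EM self-attention described above follows the expectation-maximization attention idea of reference [31]: a small set of bases is alternately re-estimated (M-step) from responsibility maps computed by a softmax over feature-basis similarities (E-step), and the feature map is then reconstructed from the compact bases, giving a low-rank form of self-attention. The following is a minimal NumPy sketch of that iteration only; the function name, the fixed base count K, the iteration count, and the L2 normalization of the bases are illustrative assumptions, not details taken from the paper:

```python
import numpy as np

def em_attention(X, K=8, iters=3, rng=None):
    """EM attention over N pixel features X of shape (N, C).

    Alternates an E-step (softmax responsibilities of each pixel to K bases)
    with an M-step (bases updated as responsibility-weighted feature means),
    then reconstructs the features from the compact bases.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    N, C = X.shape
    mu = rng.standard_normal((K, C))               # initial bases (assumption: random init)
    mu /= np.linalg.norm(mu, axis=1, keepdims=True)
    for _ in range(iters):
        # E-step: responsibilities z[n, k] via a numerically stable softmax
        logits = X @ mu.T                          # (N, K) similarities
        logits -= logits.max(axis=1, keepdims=True)
        z = np.exp(logits)
        z /= z.sum(axis=1, keepdims=True)
        # M-step: bases become responsibility-weighted means of the features
        mu = (z.T @ X) / (z.sum(axis=0)[:, None] + 1e-6)
        mu /= np.linalg.norm(mu, axis=1, keepdims=True) + 1e-6
    # Reconstruct the feature map from the K bases (low-rank attention output)
    return z @ mu                                  # (N, C)
```

In the full network this output would be added back to the block input through the residual connection; here only the attention computation itself is sketched.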
-
Table 1. Average PSNR and SSIM results of SR reconstructed images obtained by different methods at magnification factors of 2, 3, and 4 on each dataset
Magnification factor 2 (each cell: average PSNR/dB / average SSIM):
Method          Set5          Set14         BSD100        Urban100      Manga109
Bicubic         33.66/0.9299  30.24/0.8688  29.56/0.8431  26.88/0.8403  31.05/0.935
SRCNN[12]       36.66/0.9542  32.45/0.9067  31.36/0.8879  29.50/0.8946  35.72/0.968
VDSR[14]        37.53/0.9587  33.03/0.9124  31.90/0.8960  30.76/0.9140  37.16/0.974
DRCN[15]        37.63/0.9588  33.04/0.9118  31.85/0.8942  30.75/0.9133  37.57/0.973
MS-LapSRN[33]   37.78/0.9600  33.28/0.9150  32.05/0.8980  31.15/0.9190  37.78/0.976
SRMDNF[34]      37.79/0.9601  33.32/0.9159  32.05/0.8985  31.33/0.9204  38.07/0.976
LESRCNN[35]     37.65/0.9586  33.32/0.9148  31.95/0.8964  31.45/0.9206  38.49/0.9777
PAN[36]         38.00/0.9605  33.59/0.9181  32.18/0.8997  32.01/0.9273  38.70/0.9773
SMSR[37]        38.00/0.9601  33.64/0.9179  32.17/0.8990  32.19/0.9284  38.76/0.9771
Ours (proposed) 38.00/0.9606  33.69/0.9202  32.18/0.8997  32.18/0.9293  38.57/0.9774

Magnification factor 3 (each cell: average PSNR/dB / average SSIM):
Method          Set5          Set14         BSD100        Urban100      Manga109
Bicubic         30.39/0.8682  27.55/0.7742  27.21/0.7385  24.46/0.7349  26.95/0.856
SRCNN[12]       32.75/0.9090  29.30/0.8215  28.41/0.7863  26.24/0.7989  30.48/0.912
VDSR[14]        33.66/0.9213  29.77/0.8314  28.82/0.7976  27.14/0.8279  32.01/0.934
DRCN[15]        33.82/0.9226  29.76/0.8311  28.80/0.7963  27.15/0.8276  32.31/0.936
MS-LapSRN[33]   34.06/0.9240  29.97/0.8360  28.93/0.8020  27.47/0.8370  32.68/0.939
SRMDNF[34]      34.12/0.9254  30.04/0.8382  28.97/0.8025  27.57/0.8398  33.00/0.9403
LESRCNN[35]     33.93/0.9231  30.12/0.8380  28.91/0.8005  27.70/0.8415  33.15/0.9433
PAN[36]         34.40/0.9271  30.36/0.8423  29.11/0.8050  28.11/0.8511  33.61/0.9448
SMSR[37]        34.40/0.9270  30.33/0.8412  29.10/0.8050  28.25/0.8536  33.68/0.9445
Ours (proposed) 34.45/0.9278  30.41/0.8442  29.12/0.8060  28.31/0.8554  33.62/0.9450

Magnification factor 4 (each cell: average PSNR/dB / average SSIM):
Method          Set5          Set14         BSD100        Urban100      Manga109
Bicubic         28.42/0.8104  26.00/0.7027  25.96/0.6675  23.14/0.6577  25.15/0.789
SRCNN[12]       30.48/0.8628  27.50/0.7513  26.90/0.7101  24.52/0.7221  27.66/0.858
VDSR[14]        31.35/0.8838  28.01/0.7674  27.29/0.7251  25.18/0.7524  28.82/0.886
DRCN[15]        31.53/0.8854  28.02/0.7670  27.23/0.7233  25.14/0.7510  28.97/0.886
MS-LapSRN[33]   31.74/0.8890  28.26/0.7740  27.43/0.7310  25.51/0.7680  29.54/0.897
SRMDNF[34]      31.96/0.8925  28.35/0.7787  27.49/0.7337  25.68/0.7731  30.09/0.902
LESRCNN[35]     31.88/0.8903  28.44/0.7772  27.45/0.7313  25.77/0.7732  30.49/0.9091
PAN[36]         32.13/0.8948  28.61/0.7822  27.59/0.7363  26.11/0.7854  30.51/0.9095
SMSR[37]        32.12/0.8932  28.55/0.7808  27.55/0.7351  26.11/0.7868  30.54/0.9085
Ours (proposed) 32.25/0.8953  28.69/0.7841  27.59/0.7370  26.20/0.7891  30.50/0.9092
Table 2. Average running time of different methods on Set5 dataset under a magnification factor of 2
Table 3. LPIPS results of different methods on the Set5 dataset under a magnification factor of 4
Table 4. Comparison results of different blocks on the Set5 dataset under a magnification factor of 4
Model      PSNR/dB  SSIM    Params/10^3  FLOPs/10^9
DARN_res   32.20    0.8951  1649         1952
DARN_dr    32.08    0.8940  1436         1756
DARN_ema   31.79    0.8930  467          863
DARN_ca    32.11    0.8942  1445         1757
DARN       32.25    0.8953  1568         1878
Note: FLOPs denotes the number of floating-point operations, measured on a 1280×720 image.
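The PSNR values reported in Tables 1 and 4 (in dB) follow the standard mean-squared-error definition PSNR = 10·log10(peak²/MSE). As a reference, a minimal sketch, assuming 8-bit images with a peak value of 255 (the function name is illustrative):

```python
import numpy as np

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two same-shaped images."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```

Note that SR papers in this line of work conventionally compute PSNR and SSIM on the luminance (Y) channel after cropping a border equal to the scale factor; that preprocessing is omitted here.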
[1] TAKEDA H, MILANFAR P, PROTTER M, et al. Super-resolution without explicit subpixel motion estimation[J]. IEEE Transactions on Image Processing, 2009, 18(9): 1958-1975. doi: 10.1109/TIP.2009.2023703
[2] ZOU W W W, YUEN P C. Very low resolution face recognition problem[J]. IEEE Transactions on Image Processing, 2012, 21(1): 327-340.
[3] SHI W, CABALLERO J, LEDIG C, et al. Cardiac image super-resolution with global correspondence using multi-atlas PatchMatch[C]//Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention. Berlin: Springer, 2013, 8151: 9-16.
[4] ARUN P V, BUDDHIRAJU K M, PORWAL A, et al. CNN based spectral super-resolution of remote sensing images[J]. Signal Processing, 2020, 169: 107394. doi: 10.1016/j.sigpro.2019.107394
[5] KEYS R. Cubic convolution interpolation for digital image processing[J]. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1981, 29(6): 1153-1160. doi: 10.1109/TASSP.1981.1163711
[6] MARQUINA A, OSHER S J. Image super-resolution by TV-regularization and Bregman iteration[J]. Journal of Scientific Computing, 2008, 37(3): 367-382. doi: 10.1007/s10915-008-9214-8
[7] DONG W, ZHANG L, SHI G, et al. Image deblurring and super-resolution by adaptive sparse domain selection and adaptive regularization[J]. IEEE Transactions on Image Processing, 2011, 20(7): 1838-1857. doi: 10.1109/TIP.2011.2108306
[8] YANG J, WRIGHT J, HUANG T S, et al. Image super-resolution via sparse representation[J]. IEEE Transactions on Image Processing, 2010, 19(11): 2861-2873. doi: 10.1109/TIP.2010.2050625
[9] ZEYDE R, ELAD M, PROTTER M. On single image scale-up using sparse-representations[C]//Proceedings of the International Conference on Curves and Surfaces. Berlin: Springer, 2010: 711-730.
[10] HU Y, WANG N, TAO D, et al. SERF: A simple, effective, robust, and fast image super-resolver from cascaded linear regression[J]. IEEE Transactions on Image Processing, 2016, 25(9): 4091-4102. doi: 10.1109/TIP.2016.2580942
[11] HUANG Y, LI J, GAO X, et al. Single image super-resolution via multiple mixture prior models[J]. IEEE Transactions on Image Processing, 2018, 27(12): 5904-5917. doi: 10.1109/TIP.2018.2860685
[12] DONG C, LOY C C, HE K, et al. Image super-resolution using deep convolutional networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 38(2): 295-307. doi: 10.1109/TPAMI.2015.2439281
[13] HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 770-778.
[14] KIM J, LEE J K, LEE K M. Accurate image super-resolution using very deep convolutional networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 1646-1654.
[15] KIM J, LEE J K, LEE K M. Deeply-recursive convolutional network for image super-resolution[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 1637-1645.
[16] LIM B, SON S, KIM H, et al. Enhanced deep residual networks for single image super-resolution[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 1132-1140.
[17] ZHANG Y L, LI K P, LI K, et al. Image super-resolution using very deep residual channel attention networks[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2018, 11211: 286-310.
[18] DAI T, CAI J, ZHANG Y, et al. Second-order attention network for single image super-resolution[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 11057-11066.
[19] HUI Z, WANG X, GAO X. Fast and accurate single image super-resolution via information distillation network[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 723-731.
[20] LI J, FANG F, MEI K, et al. Multi-scale residual network for image super-resolution[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2018, 11212: 527-542.
[21] AHN N, KANG B, SOHN K A. Fast, accurate, and lightweight super-resolution with cascading residual network[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2018, 11214: 256-272.
[22] LEDIG C, THEIS L, HUSZAR F, et al. Photo-realistic single image super-resolution using a generative adversarial network[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 105-114.
[23] WANG X, YU K, WU S, et al. ESRGAN: Enhanced super-resolution generative adversarial networks[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2018, 11133: 63-79.
[24] YADAV S, CHEN C, ROSS A. Relativistic discriminator: A one-class classifier for generalized iris presentation attack detection[C]//Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. Piscataway: IEEE Press, 2020: 2624-2633.
[25] LEI J, LIU C, JIANG D. Fault diagnosis of wind turbine based on long short-term memory networks[J]. Renewable Energy, 2019, 133: 422-432. doi: 10.1016/j.renene.2018.10.031
[26] TAN Z, WANG M, XIE J, et al. Deep semantic role labeling with self-attention[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto: AAAI, 2018.
[27] WANG X, GIRSHICK R, GUPTA A, et al. Non-local neural networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 7794-7803.
[28] WOO S, PARK J, LEE J Y, et al. CBAM: Convolutional block attention module[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2018, 11211: 3-19.
[29] SANG H, ZHOU Q, ZHAO Y. PCANet: Pyramid convolutional attention network for semantic segmentation[J]. Image and Vision Computing, 2020, 103: 103997. doi: 10.1016/j.imavis.2020.103997
[30] ZHAO Y L, DUAN D D, HU R M, et al. On the number of components in mixture model based on EM algorithm[J]. Journal of Applied Statistics and Management, 2020, 39(1): 35-50 (in Chinese). doi: 10.13860/j.cnki.sltj.20190808-002
[31] LI X, ZHONG Z, WU J, et al. Expectation-maximization attention networks for semantic segmentation[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2019: 9166-9175.
[32] AGUSTSSON E, TIMOFTE R. NTIRE 2017 challenge on single image super-resolution: Dataset and study[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 1122-1131.
[33] LAI W S, HUANG J B, AHUJA N, et al. Deep Laplacian pyramid networks for fast and accurate super-resolution[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 5835-5843.
[34] ZHANG K, ZOU W, ZHANG L. Learning a single convolutional super-resolution network for multiple degradations[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 3262-3271.
[35] TIAN C, ZHUGE R, WU Z, et al. Lightweight image super-resolution with enhanced CNN[J]. Knowledge-Based Systems, 2020, 205: 106235. doi: 10.1016/j.knosys.2020.106235
[36] ZHAO H, KONG X, HE J, et al. Efficient image super-resolution using pixel attention[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2020, 12537: 56-72.
[37] WANG L G, DONG X Y, WANG Y Q, et al. Exploring sparsity in image super-resolution for efficient inference[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2021: 4915-4924.
[38] BEVILACQUA M, ROUMY A, GUILLEMOT C, et al. Low-complexity single-image super-resolution based on nonnegative neighbor embedding[C]//Proceedings of the British Machine Vision Conference. Berlin: Springer, 2012: 135.1-135.10.
[39] MARTIN D, FOWLKES C, TAL D, et al. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics[C]//Proceedings of the 8th IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2001, 2: 416-423.
[40] HUANG J B, SINGH A, AHUJA N. Single image super-resolution from transformed self-exemplars[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2015: 5197-5206.
[41] MATSUI Y, ITO K, ARAMAKI Y, et al. Sketch-based manga retrieval using Manga109 dataset[J]. Multimedia Tools and Applications, 2017, 76(20): 21811-21838. doi: 10.1007/s11042-016-4020-z