Journal of University of Science and Technology of China ›› 2020, Vol. 50 ›› Issue (8): 1170-1180. DOI: 10.3969/j.issn.0253-2778.2020.08.018

• Research Articles •

MAEA-DeepLab: A semantic segmentation network with multi-feature attention effective aggregation module

ZHAO Liu, LU Jun, LIU Yang

  1. College of Computer Science and Technology, Heilongjiang University, Harbin 150080, China; 2. Key Laboratory of Database and Parallel Computing of Heilongjiang Province (Heilongjiang University), Harbin 150080, China
  • Received: 2020-07-11  Revised: 2020-08-04  Accepted: 2020-08-04  Online: 2020-08-31  Published: 2020-08-04
  • Corresponding author: LU Jun
  • About the first author: ZHAO Liu, male, born in 1995, master's student. Research interests: deep learning and computer vision. E-mail: 2181411@s.hlju.edu.cn



Abstract: To keep the cost of network training low by greatly reducing computational complexity while maintaining high accuracy, a semantic segmentation network with a multi-feature attention effective aggregation (MAEA) module, MAEA-DeepLab, is proposed. The encoder backbone adopts low-resolution feature maps at a downsampling stride of 16 to obtain high-level features. Through the MAEA module, the decoder makes full use of a spatial attention mechanism over these features to effectively aggregate multiple features and obtain high-resolution features with strong semantic representation, which improves the decoder's ability to recover important detail information and achieves high-precision segmentation. The multiply-adds of MAEA-DeepLab amount to 943.02 B, only 30.9% of the DeepLabV3+ architecture, greatly reducing the computational complexity. Without pre-training on the COCO dataset and using only two RTX 2080 Ti GPUs, the architecture was benchmarked on the test sets of the PASCAL VOC 2012 and Cityscapes datasets, reaching mIoU scores of 87.5% and 79.9%, respectively. The experimental results show that MAEA-DeepLab achieves good semantic segmentation accuracy at low computational cost.
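No code accompanies this page; purely as an illustration of the decoder behaviour the abstract describes (a spatial attention map used to aggregate high-resolution low-level features with upsampled stride-16 high-level features), the following minimal PyTorch sketch shows one way such an aggregation module could look. The class name, channel widths, layer choices, and the attention-weighted fusion rule are assumptions for illustration, not the authors' MAEA implementation.

```python
# Minimal sketch (not the authors' code) of spatial-attention-based
# aggregation of two feature maps in an encoder-decoder segmentation network.
# Channel sizes, layers, and the fusion rule are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialAttentionAggregation(nn.Module):
    def __init__(self, low_ch, high_ch, out_ch):
        super().__init__()
        # project both feature maps to a common channel width
        self.low_proj = nn.Conv2d(low_ch, out_ch, kernel_size=1, bias=False)
        self.high_proj = nn.Conv2d(high_ch, out_ch, kernel_size=1, bias=False)
        # predict a per-pixel attention map from the concatenated features
        self.attn = nn.Sequential(
            nn.Conv2d(2 * out_ch, out_ch, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, low_feat, high_feat):
        # upsample the stride-16 high-level features to the low-level resolution
        high_feat = F.interpolate(high_feat, size=low_feat.shape[-2:],
                                  mode="bilinear", align_corners=False)
        low = self.low_proj(low_feat)
        high = self.high_proj(high_feat)
        a = self.attn(torch.cat([low, high], dim=1))  # (N, 1, H, W) weights
        # spatially weighted fusion: keep low-level detail where attention is high
        return a * low + (1.0 - a) * high

# Toy usage: 1/4-resolution low-level features, 1/16-resolution high-level features.
low = torch.randn(1, 256, 128, 128)
high = torch.randn(1, 2048, 32, 32)
fused = SpatialAttentionAggregation(256, 2048, 256)(low, high)
print(fused.shape)  # torch.Size([1, 256, 128, 128])
```

The per-pixel weighting is one plausible reading of how spatial attention can let a decoder recover fine details at low cost, since only lightweight convolutions are added on top of the existing feature maps.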

Key words: semantic segmentation, encoder-decoder, MAEA-DeepLab, spatial attention
