Abstract

Objective Existing methods for extracting roads from remote sensing images suffer from sample imbalance, which leads to low automation, limited extraction accuracy, and unstable model training. To address these problems, we propose a deep convolutional neural aggregation network that integrates an attention mechanism and dilated convolution (A&D-UNet). Method To reduce the difficulty of training a deep network, the A&D-UNet model builds on the classical U-Net structure and uses residual learning units (RLU) in the encoder. To highlight road features, the convolutional block attention module (CBAM) assigns weights along both the channel and spatial dimensions, and to obtain a larger receptive field, a dilated convolutional unit (DCU) extracts the subsequent road feature information. The model thus uses residual learning, dilated convolution, and the attention mechanism to simplify training, capture more global information, and improve the utilization of shallow features, respectively. First, the RLU, as a component of the backbone feature extraction network, uses identity mapping to avoid the training difficulty and model degradation caused by deep stacks of consecutive convolutional layers. Second, the DCU operates on the road feature map produced by the fourth down-sampling stage and integrates the contextual information of road features through successive dilated convolutions with different dilation rates. Finally, CBAM reweights road features sequentially along the channel dimension and then the spatial dimension, which increases the attention paid to shallow features and reduces interference from background noise. The binary cross-entropy (BCE) loss function is commonly used to train models for image segmentation tasks.
However, BCE alone often traps the model in local minima when road and background samples in remote sensing images are imbalanced. To improve road segmentation performance, the BCE and Dice loss functions are combined to train the A&D-UNet model. To validate the model, experiments are conducted on the publicly available Massachusetts road dataset (MRDS) and the DeepGlobe road dataset. Because the MRDS contains many blank areas and computing resources are limited, its remote sensing images are cropped to 256 × 256 pixels, and tiles containing blank areas are removed. This processing yields 2,230 training images and 161 test images. To compare performance on the road extraction task, we run the same road extraction experiments with three other network models, the classical U-Net, LinkNet, and D-LinkNet, and visually analyze their results. In addition, five evaluation metrics, overall accuracy (OA), precision (P), recall (R), F1-score (F1), and intersection over union (IoU), are used to quantitatively assess the extraction performance of the four models. Result Comparing the road extraction maps and quantitatively analyzing the evaluation metrics gives the following results: 1) The proposed model recognizes roads better in three cases: obvious road-line characteristics (ORLC), incomplete road label data (IRLD), and roads blocked by trees (RBBT). The roads extracted by the A&D-UNet model are close to the ground-truth road labels and show a clear linear road structure. By learning road features from a large training set of remote sensing images, the model avoids erroneous road extraction in the IRLD case.
In the RBBT case, the DCU and CBAM help the model extract road information better, which improves the accuracy of its classification predictions. 2) The A&D-UNet model outperforms the compared algorithms on OA, F1, and IoU, reaching 95.27%, 77.96%, and 79.89%, respectively, on the Massachusetts road test set. Compared with the classical U-Net, the A&D-UNet model uses RLU in the encoder, which alleviates to some extent the degradation caused by adding more convolutional layers, and its OA, F1, and IoU improve by 0.99%, 6.40%, and 4.08%, respectively. Likewise, through the DCU and CBAM, the A&D-UNet model improves OA, F1, and IoU on the test set by 1.21%, 5.12%, and 3.93%, respectively, over LinkNet. 3) Training with the compound loss function improves the F1 score and IoU of the A&D-UNet model by 0.26% and 0.18%, respectively. This indicates that the combined BCE and Dice loss can handle the imbalance between positive and negative samples, thereby improving the accuracy of the model's classification predictions. These comparisons across models and loss functions show that the A&D-UNet road extraction model has better extraction capability. 4) On the DeepGlobe road dataset, the A&D-UNet model achieves an OA, F1 score, and IoU of 94.01%, 77.06%, and 78.44%, respectively, which shows that it performs well on main roads with obvious road-line characteristics, narrow roads not marked in the label data, and roads obscured by shadows. Conclusion We present the A&D-UNet aggregation network model, which is based on RLU combined with DCU and CBAM. It is trained on the MRDS with a combination of the BCE and Dice loss functions and shows better extraction results. The road extraction model integrates residual learning, the attention mechanism, and dilated convolution.
This aggregation network model offers high automation, high extraction accuracy, and good extraction quality. Compared with current classical algorithms, it uses RLU to alleviate the training difficulties caused by deep convolutional networks, uses DCU to integrate detailed road feature information, and uses CBAM to make better use of shallow information. In addition, the combined BCE and Dice loss function mitigates the imbalance between road-region and background-region samples.
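
The combined BCE and Dice loss mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the two terms are weighted, so the unweighted sum, the smoothing constant, and the function names (`bce_loss`, `dice_loss`, `bce_dice_loss`) are assumptions made for illustration, not the authors' implementation. Inputs are flattened per-pixel road probabilities and binary labels.

```python
import math


def bce_loss(probs, targets, eps=1e-7):
    """Mean binary cross-entropy over flattened pixel probabilities."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(probs)


def dice_loss(probs, targets, smooth=1.0):
    """Soft Dice loss: 1 - (2*sum(p*t) + s) / (sum(p) + sum(t) + s).

    The smoothing term `smooth` is an assumed, commonly used default.
    """
    intersection = sum(p * t for p, t in zip(probs, targets))
    denom = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + smooth) / (denom + smooth)


def bce_dice_loss(probs, targets, bce_weight=1.0, dice_weight=1.0):
    """Weighted sum of BCE and Dice; equal weights are an assumption."""
    return (bce_weight * bce_loss(probs, targets)
            + dice_weight * dice_loss(probs, targets))


if __name__ == "__main__":
    # Hypothetical imbalanced tile: 5 road pixels out of 100, and a model
    # that predicts "almost certainly background" everywhere.
    targets = [1] * 5 + [0] * 95
    probs = [0.05] * 100
    print("BCE :", bce_loss(probs, targets))
    print("Dice:", dice_loss(probs, targets))
    print("Sum :", bce_dice_loss(probs, targets))
```

In this toy case the BCE term stays small because the many background pixels dominate the average, while the Dice term stays large because almost no road area is recovered. This is one way to see why adding a Dice term can help when road pixels are rare.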

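The five evaluation metrics named in the abstract (OA, P, R, F1, IoU) can be computed from pixel-level confusion counts, as in the sketch below. The abstract does not state whether IoU is computed for the road class only or averaged over classes, so the road-class-only definition used here is an assumption, and its values are not expected to reproduce the reported figures. The function name `road_metrics` is hypothetical.

```python
def road_metrics(pred, truth):
    """Compute OA, precision, recall, F1, and road-class IoU.

    `pred` and `truth` are flattened binary masks (1 = road, 0 = background).
    """
    tp = fp = fn = tn = 0
    for p, t in zip(pred, truth):
        if p and t:
            tp += 1
        elif p and not t:
            fp += 1
        elif not p and t:
            fn += 1
        else:
            tn += 1

    total = tp + fp + fn + tn
    oa = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # Road-class IoU (assumed definition): TP / (TP + FP + FN)
    iou = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
    return {"OA": oa, "P": precision, "R": recall, "F1": f1, "IoU": iou}


if __name__ == "__main__":
    # Tiny hypothetical example with 8 pixels.
    pred = [1, 1, 0, 0, 1, 0, 0, 0]
    truth = [1, 0, 0, 0, 1, 1, 0, 0]
    for name, value in road_metrics(pred, truth).items():
        print(f"{name}: {value:.4f}")
```
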
Full text