Abstract

Objective: With the rapid development of remote sensing technology, a large number of high-resolution remote sensing images have become available, and retrieving them effectively has become a challenging research topic. Feature extraction largely determines the performance of high-resolution remote sensing image retrieval. Traditional feature extraction methods rely mainly on handcrafted features, and such shallow features are strongly influenced by human design choices. Convolutional neural networks (CNNs) learn feature representations automatically and are therefore well suited to high-resolution remote sensing images with complex content. However, because the currently available public remote sensing datasets are small, the parameters of a CNN are difficult to train fully on them. Transfer learning of CNNs has therefore attracted much attention: CNNs pretrained on large-scale datasets generalize well, and their parameters can be transferred effectively to small-scale data. Extracting CNN features through transfer learning has thus become an effective approach to remote sensing image retrieval. Because the visual content of high-resolution remote sensing images is abundant and complex, a single feature can hardly describe it accurately, so feature fusion is a useful way to improve the feature representation of these images. To make full use of the parameters learned by different CNNs in representing image content, a method based on discriminant correlation analysis (DCA) is proposed to fuse the high-level features of different CNNs. Method: First, the parameters of VGGM (visual geometry group medium), VGG16 (visual geometry group 16), GoogLeNet, and ResNet50 are transferred to high-resolution remote sensing images, and their high-level features are taken as special convolutional features. 
To preserve the original spatial information of the image, the high-level features are extracted at the original input image size and kept in the form of three-dimensional tensors. Max pooling is then applied to the high-level features to extract salient features. Second, DCA is used to enhance the feature representation. DCA was the first method to incorporate class structure into feature-level fusion, and it has low computational complexity. It maximizes the correlation between corresponding features across the two feature sets while decorrelating features that belong to different classes within each feature set. To this end, the between-class scatter matrices of the two sets of high-level features are calculated, and matrix diagonalization and singular value decomposition are used to transform the features. The transformation matrix consists of the most significant eigenvectors of the between-class scatter matrix, so the dimension of the transformed features is reduced accordingly. As a result, the transformed feature vectors are both highly discriminative and low-dimensional. Lastly, concatenation and summation are used to fuse the transformed feature vectors, and the fused features are normalized with Gaussian normalization. The similarity between the query feature and each dataset feature is measured by the Euclidean distance, and the retrieval results are returned ranked by similarity. Result: Experimental results on the UC-Merced, RSSCN7, and WHU-RS19 datasets show that most fused features effectively improve retrieval accuracy and shorten retrieval time compared with a single high-level feature; the mean average precision (mAP) of the fused features improves by 10.4% to 14.1%, 5.7% to 9.9%, and 5.9% to 17.6% on the three datasets, respectively. Fused features obtained by concatenation give better retrieval results than those obtained by summation. 
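The DCA fusion and retrieval pipeline described above can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: it follows the general DCA formulation (whiten each set's between-class scatter by diagonalizing the small c × c matrix Φᵀ Φ, then unitize the between-set covariance with an SVD), stores one sample per column, assumes global max pooling over the spatial axes of an H × W × C tensor, and assumes that "Gaussian normalization" is a per-dimension z-score. All function names and the random stand-in features are hypothetical.

```python
import numpy as np


def global_max_pool(feature_map):
    """Collapse an (H, W, C) convolutional tensor into a C-dim vector by
    keeping the maximum activation of each channel (assumed pooling form)."""
    return feature_map.max(axis=(0, 1))


def between_class_whitening(X, labels, tol=1e-6):
    """Return W (p x r0) such that W.T @ Sb @ W = I, where Sb is the
    between-class scatter of X (p x n, one sample per column)."""
    mean_all = X.mean(axis=1)
    # Phi_b = [sqrt(n_k) * (mean_k - mean_all)] for each class k, shape p x c
    phi = np.column_stack([
        np.sqrt(np.sum(labels == k)) * (X[:, labels == k].mean(axis=1) - mean_all)
        for k in np.unique(labels)
    ])
    # Diagonalize the small c x c matrix Phi^T Phi instead of the p x p Sb
    eig_vals, eig_vecs = np.linalg.eigh(phi.T @ phi)
    keep = eig_vals > tol * eig_vals.max()          # drop (near-)zero eigenvalues
    eig_vals, eig_vecs = eig_vals[keep], eig_vecs[:, keep]
    order = np.argsort(eig_vals)[::-1]              # largest eigenvalues first
    eig_vals, eig_vecs = eig_vals[order], eig_vecs[:, order]
    # phi @ q has norm sqrt(lambda): dividing by lambda both normalizes the
    # eigenvectors of Sb and scales them by lambda^(-1/2)
    return (phi @ eig_vecs) / eig_vals


def dca_fit(X, Y, labels):
    """Fit DCA on X (p x n) and Y (q x n); return Ax (r x p) and Ay (r x q)."""
    Wbx = between_class_whitening(X, labels)
    Wby = between_class_whitening(Y, labels)
    r = min(Wbx.shape[1], Wby.shape[1])             # r <= c - 1
    Wbx, Wby = Wbx[:, :r], Wby[:, :r]
    Xp, Yp = Wbx.T @ X, Wby.T @ Y
    # SVD of the between-set covariance, then unitize it so Xs @ Ys.T = I
    U, s, Vt = np.linalg.svd(Xp @ Yp.T)
    Wcx, Wcy = U / np.sqrt(s), Vt.T / np.sqrt(s)
    return Wcx.T @ Wbx.T, Wcy.T @ Wby.T


def fuse(Ax, Ay, X, Y, mode="concat"):
    """Project both feature sets and fuse them by concatenation or summation."""
    Xs, Ys = Ax @ X, Ay @ Y
    return np.vstack([Xs, Ys]) if mode == "concat" else Xs + Ys


def gaussian_normalize(F):
    """Per-dimension z-score (assumed form of Gaussian normalization)."""
    return (F - F.mean(axis=1, keepdims=True)) / (F.std(axis=1, keepdims=True) + 1e-12)


def retrieve(query, database):
    """Indices of database columns sorted by Euclidean distance to the query."""
    dists = np.linalg.norm(database - query[:, None], axis=0)
    return np.argsort(dists)


# Toy usage with random stand-ins for two CNNs' high-level feature tensors
rng = np.random.default_rng(0)
n, c = 60, 4
labels = np.repeat(np.arange(c), n // c)
X = np.column_stack([global_max_pool(rng.normal(size=(7, 7, 512)) + l) for l in labels])
Y = np.column_stack([global_max_pool(rng.normal(size=(7, 7, 256)) - l) for l in labels])
Ax, Ay = dca_fit(X, Y, labels)
F = gaussian_normalize(fuse(Ax, Ay, X, Y))          # shape (2 * (c - 1), n)
ranking = retrieve(F[:, 0], F)
```

Because each between-class scatter matrix has rank at most c - 1, the fused vectors have at most 2(c - 1) dimensions with concatenation and c - 1 with summation, which is where the dimension reduction comes from. Query features would be projected with the same `Ax`/`Ay` learned on the database.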
Multifeature fusion experiments show that the best result on the UC-Merced dataset is obtained by fusing four features, whereas the best results on the RSSCN7 and WHU-RS19 datasets are obtained by fusing three features. This indicates that fusing more features does not necessarily yield better performance; selecting appropriate features is crucial for feature fusion. In particular, when the features to be fused each have good representational power and similar retrieval capabilities, fusing them achieves good retrieval performance. On the UC-Merced dataset, the proposed fused feature reaches an average normalized modified retrieval rank (ANMRR) of 0.1321 and an mAP of 84.06%. Experimental results demonstrate that our method outperforms state-of-the-art approaches. Conclusion: The feature fusion method based on discriminant correlation analysis combines the salient information of different high-level features, reducing feature redundancy while improving feature discrimination. The proposed method fuses features with comparable retrieval capabilities well, thereby effectively improving the retrieval performance for high-resolution remote sensing images.
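The two evaluation metrics quoted above, mAP and ANMRR, can be computed per query from a ranked list of relevance flags. The following is a minimal Python sketch that assumes the MPEG-7 definition of ANMRR, in which K(q) = min(4·NG(q), 2·GTM) and any ground-truth image ranked beyond K(q) is assigned rank 1.25·K(q), and that computes average precision over the entire ranked list. The paper's exact protocol (for example, whether the query image itself is counted) is not stated in the abstract, and the toy ranked lists are hypothetical.

```python
import numpy as np


def average_precision(relevant):
    """AP of one ranked list; relevant[i] is True if the item at rank i + 1
    shares the query's class."""
    relevant = np.asarray(relevant, dtype=bool)
    hits = np.cumsum(relevant)
    if hits[-1] == 0:
        return 0.0
    precision_at_k = hits / np.arange(1, len(relevant) + 1)
    return float(precision_at_k[relevant].sum() / hits[-1])


def nmrr(relevant, ng, gtm):
    """NMRR of one query under the MPEG-7 definition.
    ng: ground-truth size of this query; gtm: largest ng over all queries."""
    relevant = np.asarray(relevant, dtype=bool)
    k = min(4 * ng, 2 * gtm)
    found = np.flatnonzero(relevant[:k]) + 1      # 1-based ranks within the top K
    missed = ng - len(found)                       # each missed item counts as rank 1.25 K
    avr = (found.sum() + missed * 1.25 * k) / ng
    mrr = avr - 0.5 - ng / 2
    return float(mrr / (1.25 * k - 0.5 - ng / 2))


# Toy ranked lists for two queries, each with 2 relevant images
queries = [
    [True, False, True, False, False, False, False, False],
    [True, True, False, False, False, False, False, False],
]
mean_ap = np.mean([average_precision(r) for r in queries])
anmrr = np.mean([nmrr(r, ng=2, gtm=2) for r in queries])
print(round(mean_ap, 4), round(anmrr, 4))  # prints 0.9167 0.0714
```

ANMRR lies in [0, 1], where 0 means every ground-truth image is ranked ahead of all others, so lower ANMRR and higher mAP both indicate better retrieval.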
