Abstract
Objective: Aircraft type recognition, which aims to identify the type of aircraft in an image, is a fundamental problem in remote sensing image interpretation. Aircraft type recognition algorithms have been widely studied and continually improved. Traditional recognition algorithms are efficient, but their accuracy is limited by small model capacity and poor robustness. Deep-learning-based methods have been widely adopted because of their good robustness and generalization, especially in object recognition tasks. In remote sensing scenes, objects are sparsely distributed; hence, the available samples are few. In addition, labeling is time consuming, resulting in a modest number of labeled samples. In general, deep-learning-based models rely on a large amount of labeled data because of the large number of weights to be learned. Consequently, these models suffer from scarce data that cannot meet the demand for large-scale datasets, especially in remote sensing scenes. Generative adversarial networks (GANs) can produce realistic synthetic data and enlarge the scale of a real dataset. However, these models usually take random noise as input; therefore, they cannot control the position, angle, size, or category of objects in synthetic images. Conditional GANs (CGANs) have been proposed by previous researchers to generate synthetic images with designated content in a controlled manner. CGANs take pixel-wise labeled images as input and output generated images that satisfy the constraints imposed by the corresponding input images. However, these generative adversarial models have mainly been studied on natural scenes and are not directly suitable for remote sensing imagery because of its complex scenes and low resolution. Hence, such GANs perform poorly when adopted to generate remote sensing images.
An aircraft recognition framework for remote sensing images based on sample generation, consisting of an improved CGAN and a recognition model, is proposed to alleviate the lack of real samples and address the problems mentioned above. Method: In this framework, the masks of real aircraft images are labeled pixel by pixel. These masks serve as the conditions of the CGAN, which is trained on pairs of real aircraft images and their corresponding masks. In this manner, the location, scale, and type of aircraft in the synthetic images can be controlled. Perceptual loss is introduced to improve the ability of the CGAN to model remote sensing scenes; it is measured as the L2 distance between the features of real and synthetic images extracted by the VGG-16 (visual geometry group 16-layer net) network. Masked structural similarity (SSIM) loss is proposed, which forces the CGAN to focus on the masked region and improves the quality of the aircraft region in the synthetic images. SSIM is a measure of image quality based on structure and texture; the masked SSIM loss is the pixel-wise sum of the product of the mask and the SSIM map. The overall loss function of the CGAN thus consists of the perceptual loss, the masked SSIM loss, and the original CGAN loss. The recognition model in this framework is ResNet-50, which outputs the type and recognition score of an aircraft. In this paper, the recognition model trained on synthetic images is compared with the model trained on real images. Remote sensing images from QuickBird are cropped to build the real dataset, in which 800 images per type are used for training and 1 000 images are used for testing. After data augmentation, the training dataset contains 40 000 images, and the synthetic dataset consists of images generated by the generation module with flipped, rotated, and scaled masks.
Generators are selected from different training stages to generate 2 000 synthetic images per type, so as to determine the best stopping point in the training procedure. The chosen generator is then used to produce different numbers of images for 10 aircraft types to find an acceptable number of synthetic images. These synthetic images serve as the training set for the recognition model, and the resulting performances are compared. All experiments are carried out on a single NVIDIA K80 GPU with the PyTorch framework, and the Adam optimizer is used to train the CGAN and ResNet-50 for 100 epochs. Result: The quality of the synthetic images from the generator trained with and without our proposed loss functions is compared. The quantitative evaluation metrics are peak signal-to-noise ratio (PSNR) and SSIM. Results show that PSNR and SSIM increase by 0.88 and 0.346, respectively, with our method. In addition, recognition accuracy increases with the training epoch of the generator and the number of synthetic images. Finally, the accuracy of the recognition model trained on the synthetic dataset is only 0.33% lower than that of the model trained on the real dataset. Conclusion: An aircraft recognition framework for remote sensing images based on sample generation is proposed. The experimental results show that our method effectively improves the ability of the CGAN to model remote sensing scenes and alleviates the shortage of data.
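As a concrete reference for the PSNR metric used in the evaluation, a minimal definition (assuming image tensors normalized to [0, 1]; the paper does not specify its normalization) is:

```python
# Minimal PSNR in dB; max_val is the peak pixel value (1.0 for [0, 1] images).
import torch

def psnr(fake: torch.Tensor, real: torch.Tensor, max_val: float = 1.0) -> torch.Tensor:
    """Peak signal-to-noise ratio between two images of the same shape."""
    mse = torch.mean((fake - real) ** 2)
    return 10.0 * torch.log10(torch.tensor(max_val) ** 2 / mse)
```

For example, two images whose pixels differ uniformly by 0.1 have an MSE of 0.01 and hence a PSNR of 20 dB.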