Infrared-Visible Cross-Modal Pedestrian Detection: A Survey

Authors: Bie Qian; Wang Xiao*; Xu Xin; Zhao Qijun; Wang Zheng; Chen Jun; Hu Ruimin
Source: Journal of Image and Graphics, 2023, 28(5): 1287-1307.
DOI: 10.11834/jig.220670

Abstract

Pedestrian detection aims to locate pedestrian instances in a given input image. However, because visible images are sensitive to illumination changes, they degrade severely under low-visibility conditions such as nighttime and extreme weather, so pedestrian detection based on visible images alone is ill-suited to round-the-clock applications such as autonomous driving and video surveillance. Infrared images, by contrast, exploit the temperature difference between the human body and the environment to provide clear pedestrian silhouettes in such low-visibility scenes. Under sufficient lighting, visible images in turn supply details that infrared images lack, such as hair, faces, and other appearance features. The two modalities therefore offer complementary visual information, and the key challenge is how to exploit this complementarity while suppressing the noise specific to each modality. The modalities differ substantially: a visible image carries color information in red, green, and blue (RGB) channels, whereas an infrared image has a single channel encoding temperature information, and the imaging mechanisms of the two cover different wavelength ranges.

With the rise of deep learning, cross-modal pedestrian detection methods have developed rapidly. This survey reviews and analyzes representative research on cross-modal pedestrian detection from recent years, organized into two categories: 1) handling the differences between the two modalities and 2) applying cross-modal detectors to real scenes. Work on real-scene application falls into three topics: the cost of data annotation, real-time detection, and application cost. Work on the differences between the two modalities addresses two problems: modality misalignment and insufficient fusion. The misalignment problem arises because visible-infrared image pairs are expected to be strictly aligned, so that features from the two modalities match at corresponding positions. The insufficient-fusion problem concerns how to maximize the mutual benefit between the two modalities: early research studied the fusion stage (when to fuse), which can be divided into three levels (image, feature, and decision), while later research studied the fusion method (how to fuse), which can likewise be grouped into image-based, feature-based, and detection-based approaches. We then introduce commonly used cross-modal pedestrian detection datasets, including the Korea Advanced Institute of Science and Technology (KAIST) dataset, the forward-looking infrared radiometer (FLIR) dataset, the Computer Vision Center-14 (CVC-14) dataset, and the low-light visible-infrared paired (LLVIP) dataset, as well as common evaluation metrics for cross-modal pedestrian detectors, including the miss rate (MR), the mean average precision (mAP), and speed, i.e., the inference time per visible-thermal image pair.

Finally, we summarize the open challenges in cross-modal pedestrian detection and analyze future research directions.
1) In the real world, the parallax and fields of view of the two sensors differ, so the misalignment of visible-infrared features is drawing increasing attention. Unaligned modality features can degrade detector performance, prevent the use of unaligned data in existing datasets, and, to some extent, hinder the deployment of dual sensors in practice. Resolving the positional relationship between the two modalities is therefore a key research direction.
2) Current cross-modal pedestrian detection datasets are all captured in clear weather, and current state-of-the-art methods only address all-day detection on clear days. Realizing a cross-modal pedestrian detection system that works all day and in all weather requires going beyond clear-weather day and night data and paying attention to data captured under extreme weather conditions.
3) Recent studies focus on datasets captured by vehicle-mounted cameras. Compared with datasets captured from a surveillance viewpoint, the scenes in vehicle-mounted datasets vary more, which effectively suppresses overfitting. However, nighttime images in vehicle-mounted datasets may be brighter than those in surveillance datasets because of vehicle headlights. We therefore expect that training cross-modal pedestrian detectors jointly on datasets from multiple viewpoints can both increase robustness in darker scenes and suppress overfitting to any single scene.
4) Autonomous driving and robotic systems require fast responses to detection results. Although many models infer quickly on a graphics processing unit (GPU), inference speed on real devices still needs optimization, so real-time detection will remain a continuing research direction for cross-modal pedestrian detection.
5) A large performance gap remains for small-scale and partially or severely occluded pedestrians. In driver-assistance systems, distant pedestrians appear as small targets and occlusion is very common, yet detecting them is necessary to alert drivers to slow down in advance. Cross-modal pedestrian detection for small-scale and occluded targets is thus a promising direction for future research.
© 2023 Editorial and Publishing Board of JIG
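To make the "when to fuse" taxonomy above concrete, the following is a minimal Python (PyTorch) sketch of the three fusion stages. All module names, layer sizes, and the per-pixel score heads are illustrative assumptions for exposition only, not the architecture of any detector reviewed in the survey.

# Illustrative sketch of image-, feature-, and decision-level fusion.
# Shapes and modules are assumptions, not a detector from the survey.
import torch
import torch.nn as nn

class TinyBackbone(nn.Module):
    """Stand-in feature extractor (assumption: any CNN backbone fits here)."""
    def __init__(self, in_channels):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
    def forward(self, x):
        return self.conv(x)

class ImageLevelFusion(nn.Module):
    """Early fusion: stack RGB (3ch) and infrared (1ch) pixels, share one backbone."""
    def __init__(self):
        super().__init__()
        self.backbone = TinyBackbone(in_channels=4)
    def forward(self, rgb, ir):
        return self.backbone(torch.cat([rgb, ir], dim=1))

class FeatureLevelFusion(nn.Module):
    """Halfway fusion: modality-specific backbones, features merged mid-network."""
    def __init__(self):
        super().__init__()
        self.rgb_backbone = TinyBackbone(3)
        self.ir_backbone = TinyBackbone(1)
        self.merge = nn.Conv2d(128, 64, 1)  # 1x1 conv fuses concatenated features
    def forward(self, rgb, ir):
        f = torch.cat([self.rgb_backbone(rgb), self.ir_backbone(ir)], dim=1)
        return self.merge(f)

class DecisionLevelFusion(nn.Module):
    """Late fusion: two full streams, per-pixel scores averaged at the end."""
    def __init__(self):
        super().__init__()
        self.rgb_backbone, self.ir_backbone = TinyBackbone(3), TinyBackbone(1)
        self.rgb_head = nn.Conv2d(64, 1, 1)
        self.ir_head = nn.Conv2d(64, 1, 1)
    def forward(self, rgb, ir):
        s_rgb = self.rgb_head(self.rgb_backbone(rgb))
        s_ir = self.ir_head(self.ir_backbone(ir))
        return (s_rgb + s_ir) / 2

if __name__ == "__main__":
    rgb = torch.randn(1, 3, 128, 160)  # aligned visible frame
    ir = torch.randn(1, 1, 128, 160)   # aligned infrared frame
    for m in (ImageLevelFusion(), FeatureLevelFusion(), DecisionLevelFusion()):
        print(type(m).__name__, tuple(m(rgb, ir).shape))

The design difference is where modality-specific noise can still be suppressed: early fusion commits to a joint representation immediately, while feature- and decision-level fusion keep separate streams that can be weighted or corrected before merging.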
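On the evaluation side, the miss rate (MR) named above is conventionally reported as a log-average miss rate sampled against false positives per image (FPPI), following common pedestrian-detection benchmark practice (nine log-spaced reference points in [10^-2, 10^0]). The helper below is a hedged sketch of that protocol; the function name and the precomputed-curve inputs are our own assumptions, not code from the survey.

# Hypothetical helper: log-average miss rate from a miss-rate-vs-FPPI curve.
import numpy as np

def log_average_miss_rate(fppi, miss_rate):
    """fppi, miss_rate: one detector's curve, assumed sorted by ascending fppi."""
    ref_points = np.logspace(-2.0, 0.0, num=9)  # 9 FPPI references in [1e-2, 1e0]
    sampled = []
    for ref in ref_points:
        # take the miss rate at the largest FPPI not exceeding the reference;
        # if the curve never reaches that FPPI, count the worst case (1.0)
        idx = np.where(fppi <= ref)[0]
        sampled.append(miss_rate[idx[-1]] if idx.size else 1.0)
    # geometric mean of the sampled miss rates
    return float(np.exp(np.mean(np.log(np.maximum(sampled, 1e-10)))))

# Toy curve: miss rate falls as FPPI grows; lower MR is better.
fppi = np.array([0.005, 0.02, 0.1, 0.5, 1.0])
mr = np.array([0.90, 0.70, 0.45, 0.30, 0.25])
print(log_average_miss_rate(fppi, mr))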

Full Text