Abstract

Objective Visual localization aims to locate moving objects and estimate their motion from easy-to-acquire RGB images. Traditional computer vision methods rely on hand-crafted feature extraction algorithms whose descriptive power often fails to meet task requirements, whereas the feature abstraction and representation ability of deep learning has made learning-based pose estimation an emerging research topic in computer vision. The development of depth cameras and laser-based sensors also offers additional means of addressing this problem, but such sensors impose constraints on the entity and shape of the object and usually have to be used in structured environments, and multi-camera systems suffer from installation and calibration difficulties. In contrast, purely visual approaches are low cost and less restricted, and they are easy to deploy and extend to diverse unstructured scenarios. However, indoor scenes contain interference such as object occlusion and weak-texture regions, which can easily cause erroneous estimation of target points and severely degrade localization accuracy. Existing visual object pose estimation methods can be divided into two categories according to how the cameras are deployed. 1) Monocular object pose estimation under a fixed view: cameras are fixed in the scene, and the target position is obtained by detecting the target and the relevant information in the images. Its positioning results are stable, but it is easily affected by illumination changes and image blur, and the limited observation angle prevents it from handling object occlusion in the scene. 2) Scene-reconstruction-based object pose estimation: a camera mounted on the target itself obtains the pose of the target by detecting feature points of the scene and matching them against a 3D scene model constructed in advance. This scheme depends on the availability of texture features: in scenes with rich texture and clear features, accurate positioning results can be obtained, whereas in texture-less or weak-texture regions such as walls the results become unstable, and additional sensors such as an inertial measurement unit (IMU) are needed to aid positioning. To obtain more precise positions of moving objects in indoor scenes, we propose an active and passive perception-based visual localization system that combines the advantages of the fixed and the moving perspectives. Method First, a plane-prior object pose estimation method is proposed. Built on a monocular localization framework with keypoint detection, it uses a plane constraint to optimize the 3-DoF (degree of freedom) pose of the object and improve localization stability under the fixed view. Second, we design a data fusion framework based on the unscented Kalman filter, which fuses the passive perception output from the fixed view with the active perception output from the moving view to improve the reliability of the pose estimation of the moving target. The resulting active and passive integrated indoor visual positioning system consists of three modules: 1) a passive positioning module, 2) an active positioning module, and 3) an active and passive fusion module.
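The abstract does not give the concrete filter formulation, but the fusion step can be illustrated with a minimal sketch: an unscented Kalman filter that tracks a planar constant-velocity state and, at each frame, fuses the 2D position reported by the passive (fixed-view) module with the one reported by the active (moving-view) module. The state layout, noise values, frame rate, and the use of the filterpy library are illustrative assumptions, not the paper's implementation.

```python
# Minimal UKF fusion sketch (assumptions: planar constant-velocity model,
# 2D position measurements from both views, filterpy as the UKF backend).
import numpy as np
from filterpy.kalman import UnscentedKalmanFilter, MerweScaledSigmaPoints

DT = 1.0 / 30.0  # assumed camera frame interval (30 fps)

def fx(x, dt):
    # Constant-velocity transition; state is [px, py, vx, vy].
    F = np.array([[1.0, 0.0, dt, 0.0],
                  [0.0, 1.0, 0.0, dt],
                  [0.0, 0.0, 1.0, 0.0],
                  [0.0, 0.0, 0.0, 1.0]])
    return F @ x

def hx(x):
    # Both views observe the same planar position [px, py].
    return np.array([x[0], x[1], x[0], x[1]])

points = MerweScaledSigmaPoints(n=4, alpha=0.1, beta=2.0, kappa=0.0)
ukf = UnscentedKalmanFilter(dim_x=4, dim_z=4, dt=DT, fx=fx, hx=hx, points=points)
ukf.x = np.zeros(4)                      # initial position/velocity guess
ukf.P = np.eye(4) * 0.5                  # initial state uncertainty
ukf.Q = np.eye(4) * 1e-3                 # process noise (assumed)
# Measurement noise: trust the fixed camera (2 cm) more than the onboard one (5 cm).
ukf.R = np.diag([0.02**2, 0.02**2, 0.05**2, 0.05**2])

def fuse(passive_xy, active_xy):
    # One predict/update cycle fusing both per-frame position estimates.
    ukf.predict()
    ukf.update(np.concatenate([passive_xy, active_xy]))
    return ukf.x[:2]                     # fused planar position

# Example: both modules report a position near (1.0, 2.0) m for this frame.
print(fuse(np.array([1.01, 2.02]), np.array([0.97, 1.98])))
```

When one source drops out, for example under occlusion in the fixed view or in a weak-texture region for the onboard view, the corresponding measurement rows can be omitted or given a very large noise so that the filter temporarily relies on the remaining source and the motion model.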
The input of the passive positioning module is the RGB image captured by the fixed indoor camera, and its output is the pose of the target contained in the image. The input of the active positioning module is the RGB image shot from the perspective of the target to be located, and its output is the position and pose of the target in the 3D scene. The active and passive fusion module integrates the positioning results of the two modules and outputs a more accurate estimate of the target pose in the indoor scene. Result The average localization error of the proposed indoor visual localization system reaches 2-3 cm on the iGibson simulation dataset, and the proportion of frames with a localization error within 10 cm reaches 99%. In real scenes, the average localization error reaches 3-4 cm, and the proportion of frames with a localization error within 10 cm is above 90%. The experimental results show that the proposed system achieves centimeter-level positioning accuracy. The real-scene experiments further illustrate that the active and passive fusion system effectively reduces the external interference suffered by the fixed-view passive positioning algorithm, such as the limited viewing angle, object occlusion, and other disturbances, and also compensates for the insufficient stability and large random error of single-frame positioning. Conclusion The proposed visual localization system integrates the advantages of passive and active methods and achieves high-precision positioning in indoor scenes at low cost, while remaining robust under complex interference such as occlusion and target loss. We develop an unscented-Kalman-filter-based active and passive fusion indoor visual positioning system for indoor mobile robot operation. Compared with existing visual positioning algorithms, it achieves high-precision target positioning in indoor scenes with lower equipment cost, maintains robust positioning performance under occlusion and target loss caused by complex environmental factors, and delivers centimeter-level accuracy in indoor scenes. The performance is tested and validated in both simulated and physical environments, and the results demonstrate the high positioning accuracy and robustness of the system across multiple scenarios.
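For concreteness, the module inputs and outputs described above can be summarized as a small interface sketch. The data types, the field choices (planar position plus yaw for the 3-DoF pose), and the function names are hypothetical placeholders for the three modules, not the paper's actual API; the bodies are left as stubs.

```python
# Interface sketch of the three modules (placeholder names and types).
from dataclasses import dataclass
from typing import Optional
import numpy as np

@dataclass
class PoseEstimate:
    position: np.ndarray    # planar position (x, y) in metres
    yaw: float              # heading angle in radians (together a 3-DoF pose)
    covariance: np.ndarray  # 3 x 3 uncertainty of the estimate

def passive_module(fixed_view_rgb: np.ndarray) -> Optional[PoseEstimate]:
    """Keypoint detection plus plane-prior pose estimation on the image from
    the fixed indoor camera; returns None when the target is occluded."""
    raise NotImplementedError  # placeholder for the detection pipeline

def active_module(onboard_rgb: np.ndarray, scene_model) -> Optional[PoseEstimate]:
    """Feature matching between the target's own camera image and the 3D scene
    model built in advance; returns None in weak-texture regions."""
    raise NotImplementedError  # placeholder for the reconstruction pipeline

def fusion_module(passive: Optional[PoseEstimate],
                  active: Optional[PoseEstimate]) -> PoseEstimate:
    """Unscented-Kalman-filter fusion of whichever estimates are available
    (see the sketch after the Method paragraph)."""
    raise NotImplementedError  # placeholder for the UKF predict/update step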

  • Affiliation
    State Key Laboratory of CAD & CG, Zhejiang University

Full text