Issue 
Wuhan Univ. J. Nat. Sci.
Volume 28, Number 1, February 2023



Page(s)  20  28  
DOI  https://doi.org/10.1051/wujns/2023281020  
Published online  17 March 2023 
Computer Science
CLC number: TP 391
RRVPE: A Robust and RealTime VisualInertialGNSS Pose Estimator for Aerial Robot Navigation
^{1}
College of Automation Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, Jiangsu, China
^{2}
School of Mathematics & Physics, Anhui University of Technology, Maanshan 243000, Anhui, China
^{3}
Electric Power Research Institute of Guizhou Power Grid Co., Ltd., Guiyang 550002, Guizhou, China
^{†} To whom correspondence should be addressed. Email: YangZhong@nuaa.edu.cn
Received:
25
July
2022
Selflocalization and orientation estimation are the essential capabilities for mobile robot navigation. In this article, a robust and realtime visualinertialGNSS(Global Navigation Satellite System) tightly coupled pose estimation (RRVPE) method for aerial robot navigation is presented. The aerial robot carries a frontfacing stereo camera for selflocalization and an RGBD camera to generate 3D voxel map. Ulteriorly, a GNSS receiver is used to continuously provide pseudorange, Doppler frequency shift and universal time coordinated (UTC) pulse signals to the pose estimator. The proposed system leverages the Kanade Lucas algorithm to track ShiTomasi features in each video frame, and the local factor graph solution process is bounded in a circumscribed container, which can immensely abandon the computational complexity in nonlinear optimization procedure. The proposed robot pose estimator can achieve camerarate (30 Hz) performance on the aerial robot companion computer. We thoroughly experimented the RRVPE system in both simulated and practical circumstances, and the results demonstrate dramatic advantages over the stateoftheart robot pose estimators.
Key words: computer vision / visualinertialGNSS(Global Navigation Satellite System) pose estimation / realtime autonomous navigation / sensor fusion / robotics
Biography: ZHANG Chi, male, Ph.D. candidate, research direction: robot navigation. Email: laozhang@nuaa.edu.cn
Fundation item: Supported by the Guizhou Provincial Science and Technology Projects ([2020]2Y044), the Science and Technology Projects of China Southern Power Grid Co. Ltd. (066600KK52170074), and the National Natural Science Foundation of China (61473144)
© Wuhan University 2023
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
0 Introduction
Aerial robots will soon play a significant role in industrial inspection, accident warning, and national defense^{[13]}. For such operations, flight mode dependent on the human remote control can no longer meet the mission requirements under complex conditions. At present, the autonomous navigation ability has become an important indicator to measure robot intelligent level. It is difficult to obtain the aerial robot pose in real time and solve the problem of autonomous robot navigation. Fully autonomous navigation requires aerial robots to have accurate pose estimation and robust environmental awareness. Due to the aerial robot's swing during flying, the state estimation results are difficult to converge during movement, which leads to the instability of the existing pose estimation algorithms.
Compared with the aerial robot pose estimator based on a single sensor, the multisensor fusion pose estimation technologies^{[48]} can make full use of different kinds of sensors to obtain more accurate and robust robot pose estimation results. Stereo camera and inertial measurement unit (IMU) can output the robot position with centimeterlevel precision in the local coordinate system^{[9, 10]}, but the pose in the local frame will drift with the aerial robot motion. Global navigation satellite system (GNSS) has been widely used in various mobile robot navigation tasks to provide driftfree position information for agents^{[11,12]}. However, the GNSSbased agent navigation method is not suitable to use indoors, and is vulnerable to noise and multipath effects, resulting in only meterlevel positioning accuracy. In order to leverage the complementary characteristics of different sensors, the pose estimation algorithm based on visionIMUGNSS information fusion can make full use of the respective advantages of stereo cameras, accelerometers, gyroscopes and navigation satellites to obtain accurate and drift free aerial robot pose estimation. Unfortunately, the pose estimation strategy based on visionIMUGNSS sensor fusion will face the following problems: Firstly, the output frequencies from each sensor are different (the frequencies of camera, IMU and GNSS receiver are 30, 200, and 10 Hz, respectively). How to fuse these raw measurements from different sensors will be an intractable problem^{[13,14]}; Secondly, how can the pose estimator quickly restore to normal state when one of the sensors suddenly fails?
To deal with the aforementioned problems, we propose a robust and realtime visualinertialGNSS tightly coupled pose estimation (RRVPE) method, which is a probabilistic factor graph optimizationbased pose estimation strategy, for aerial robot navigation. The RRVPE system can achieve realtime robot state estimation after compute unified device architecture (CUDA) acceleration on an airborne computer. The main novelties of RRVPE are exhibited as below:
1) The RRVPE system leverages the KanadeLucas^{[15]} algorithm to track ShiTomasi^{[16]} features in each video frame, and the image corners describing and matching between adjacent frames are not required in corner tracking procedures. After NVIDIA CUDA acceleration, the robot pose estimator can achieve camerarate (30 Hz) performance on a small embedded platform.
2) The local bundle adjustment (BA) and factor graph solution process are bounded in a circumscribed container, which can dramatically abandon the number of variables in nonlinear optimization procedure. Furthermore, the computation complexity of the RRVPE system is discharged by the additional marginalization strategy.
3) By making full use of the GNSS raw measurements, the intrinsic drift from the visionIMU odometry will be abandoned, and the yaw angle residual between the odometry frame and the world frame will be updated without any offline calibration. The aerial robot pose estimator is able to rapidly execute in unpredictable environments and achieves local smoothness and global consistency without visual closed loop detection.
1 System Overview
The aerial robot equipped with the RRVPE navigation system is shown in Fig. 1, which tightly fuses sparse optical flow tracking and inertial measurements with GNSS raw data for precise and driftless aerial robot pose estimation.
Fig. 1 The aerial robot equipped with the RRVPE navigation system 1. Intel RealSense D435i camera: responsible for building realworld 3D voxel map; 2. Intel RealSense T265 camera: responsible for providing binocular video stream and inertial measurement information; 3. UBlox ZEDF9P receiver: responsible for receiving GNSS pseudorange, Doppler, ephemeris and time pulse information 
The structure of the RRVPE system overview is illustrated in Fig. 2. First, the raw sensor data from the aerial robot are preprocessed, including visual feature extraction and tracking, IMU preintegration, and GNSS signal filtering. Then, vision, IMU and GNSS cost functions are formulated respectively, and visionIMUGNSS information is jointly initialized to obtain all the initial values of the aerial robot pose estimator. Finally, the aerial robot pose solving process is constructed as a state estimation problem. In the meantime, the corresponding probability factor graph model and marginalization strategy are designed. The aerial robot pose is solved by nonlinear optimization, and subsequently the accurate, robust and drift free robot pose can be achieved.
Fig. 2 Main parallel threads of RRVPE system 
In the system initialization stage, the camera trajectory is solved by the structure from motion (SfM) algorithm^{[1719]}, and the IMU raw measurement data is preintegrated^{[20, 21]} to initialize the visionIMU tightly coupled robot odometry. Then, the rough robot position in the world coordinate frame is solved by the single point positioning (SPP) algorithm. Under the condition that visualIMU odometry is used as prior information, the transformation matrix between the odometry coordinate frame and the world coordinate frame is solved in nonlinear optimization. Finally, the precise pose of the aerial robot in the global coordinate system is modified by probability factor graph optimization.
After the estimator initialization, constraints from all sensor measurements are tightly coupled to calculate aerial robot states within a circumscribed container. If the GNSS broadcast is not available or cannot be entirely initialized, the RRVPE system will naturally degrade to visualIMU odometry. In order to maintain the realtime performance of the estimation system, the additional marginalization scheme^{[6]} is also applied after each optimization.
We define (ˑ)^{r} as the robot coordinate system, (ˑ)^{c} as the camera coordinate system and (ˑ)^{o} as the odometry frame, where the direction of the gravity is aligned with the Z axis. World coordinate system (ˑ)^{w} is a semiglobal frame, where the X and Y axes direct to the east and north direction respectively, and the Z axis is also gravity aligned. The earthcentered, earthfixed (ECEF) frame (ˑ)^{e} and the earthcentered inertial (ECI) frame (ˑ)^{E} are global coordinate system that is fixed with respect to the center of the earth. The difference between the ECEF and the ECI frame is that the latter's coordinate axis does not change with the rotation of the earth.
2 Aerial Robot Pose Estimator
2.1 Formulation
In this section, the aerial robot pose estimation is formulated as a probabilistic factor graph optimization procedure, and sensor measurement information constitutes a composite of multifarious factors in the graph, which constrains the aerial robot states. The factors in the probabilistic graph are composed of visual factor, inertia factor and GNSS factor. All of the factors in the factor graph will be formulated in detail through this section.
We can take advantage of a sliding windowbased tightly coupled visualinertialGNSS pose estimator for exceedingly robust and realtime aerial robot state estimation. The whole states inside the sliding window can be summarized as:
where x_{k} is the aerial robot state at the time t_{k} that the kth video frame is captured. It contains orientation , velocity , position , gyroscope bias and acceleration bias of the aerial robot in the odometry frame. and correspond to the clock biases and bias drifting rate of the GNSS receiver, respectively. n is the window size and m is the total number of visual features in the sliding window. is the inverse depth of the lth visual feature. is the yaw bias between the odometry and the world frame.
2.2 Visual Constraint
The visual factor constraint in the probabilistic graph is constructed from a sequence of sparse corner points. Considering the unstable vibration of the aerial robot, we separate the ShiTomasi^{[16]} sparse feature points for the KanadeLucasTomasi (KLT) optical flow tracking^{[15]}.
For each input video frame, when the number of feature points is less than 120, new corner points are extracted to maintain a sufficient number of tracking features. Meanwhile, a uniform feature point distribution is carried out by setting a minimum pixel space between neighboring corners. It is worth noting that the corner extraction and KLT optical flow tracking procedures can achieve camerarate performance on the NVIDIA Jetson Xavier NX board after being accelerated by CUDA. Assume the homogeneous coordinates of the image feature point l in the world coordinate frame are:
Then the homogeneous coordinates of feature point l in the pixel plane of video frame i can be expressed as:
where u and v are coordinate values on the pixel plane. The projection model of the airborne camera can be expressed as:
where T is the transformation matrix, n_{c} is the camera imaging noise, and K is the camera internal parameter matrix:
where f and c represent scaling and translation during camera projection, respectively. The elements of the internal parameter matrix can be obtained by the camera calibration process, and the reprojection model of the feature point l from the video frame i to the video frame j can be formulated as:
with
where represents the inverse depth of feature point l relative to the airborne camera c_{i}.
Then the visual factor constraint can be expressed as the deviation between the actual position of the image feature point l in the video frame j and the measurement position :
where represents the subvector related to visual information in the aerial robot state vector.
2.3 Inertial Measurements Constraint
In the world coordinate frame, the aerial robot's pose and velocity can be obtained by the raw data of the inertial measurement unit that are measured in the aerial robot body frame. The IMU raw data includes two parts: gyroscope measurement and accelerometer measurement , both of which are affected by the gyroscope bias b_{ω} and the acceleration bias b_{a}, respectively. The raw measurement values of the gyroscope and accelerometer can be constructed by the following formulas:
where, symbols and represent the measured values of the gyroscope and accelerometer at time t with the current body coordinate system as the reference system respectively; b_{ω} and b_{a} are the gyroscope bias and accelerometer bias; n_{ω} and n_{a} are gyroscope noise and accelerometer noise; g^{w} is the gravitational acceleration. The gyroscope and accelerometer noises are Gaussian white noise; the gyroscope biases and the accelerometer biases obey Brownian motion, and their derivatives obey Gaussian distribution.
Assuming that the motion time of the aerial robot in two consecutive video frames is t_{k} and t_{k+}_{1}, then the orientation (o), velocity (v) and position (p) of the aerial robot at time t+1 in the local world coordinate system can be expressed by the following formula:
where
In the above formula, symbols and are the measured values from gyroscope and accelerometer, and the symbols represent quaternion multiplications.
If the reference coordinate system is converted from the local world coordinate system (w) to the robot coordinate system (r), the above formula can be rewritten as:
where the IMU preintegration term can be expressed as:
Then the firstorder Jacobian approximation of the IMU preintegration term can be expressed by the following formula:
This formula represents a submatrix of the Jacobian matrix. When the gyroscope or accelerometer bias changes, the above firstorder Jacobian approximation can be used to replace the IMU preintegration without reintegration.
The gyroscope factor constraint term is constructed as a rotation residual based on quaternion outer product. Simultaneously, the accelerometer factor constraint term is constructed as velocity preintegration residual and translation preintegration residual respectively. The gyroscope bias and accelerometer bias factor terms are obtained from the bias difference between two consecutive video frames. Then the IMU factor constraint can be constructed as follows:
where represents the subvector related to IMU in the aerial robot state vector.
2.4 GNSS Constraint
Currently, there are 4 complete and independently operated GNSS constellations, namely, BeiDou, GPS, Galileo, GLONASS. The navigation satellites in each GNSS constellation ceaselessly broadcast modulated carrier signals, and consequently the ground receiver can distinguish the navigation satellites and demodulate the original messages. The GNSS factor constraint in the probability factor graph model is composed of pseudorange factor, Doppler frequency shift factor and receiver clock offset factor. The pseudorange measurement model between the receiver and the navigation satellite can be expressed as:
with
Here, and are the positions of the ground receiver and navigation satellite in the Earthcentered inertial (ECI) coordinate system respectively. is the measured value of GNSS pseudorange, c is the propagation speed of light in vacuum, and are the clock offset of the receiver and satellite, respectively, and are the delay of troposphere and ionosphere in the atmosphere, respectively, is the delay caused by multipath effect, is the noise of pseudo range signal, is the earth's rotation speed, represents the signal propagation time from the satellite to the receiver.
Then the GNSS pseudorange factor constraint can be constructed as the residual between the true pseudorange and the receiver measured pseudorange:
where represents the subvector related to the GNSS pseudorange in the aerial robot state vector.
Besides pseudorange, Doppler frequency shift is also an important navigation information in GNSS modulated signal. The Doppler frequency shift measurement of GNSS receiver and navigation satellite can be modeled as:
with
where is the carrier wavelength, is the direction vector between the satellite and the receiver, and are the speed of the receiver and the satellite respectively, and and are the clock offset drifting rate of the receiver and the satellite respectively.
Then the GNSS Doppler shift factor constraint can be constructed as the residual between the true carrier Doppler shift and the Doppler shift measurement:
where, represents the subvector related to GNSS Doppler frequency shift in the agent state vector , and is the Doppler frequency shift measurement value.
Now, the GNSS receiver clock offset error from t_{k} to t_{k+}_{1} is constructed as follows:
By combining the pseudorange factor , the Doppler frequency shift factor and the receiver clock offset factor , the GNSS factor constraint item in the aerial robot probability factor graph model can be formed.
2.5 Tightly Coupled Pose Estimation
Considering the aerial robot pose solving process as a state estimation problem, the optimal state of the aerial robot is the maximum a posteriori estimation of the robot state vector. Assuming that the measurement signals of the aerial robot's airborne camera, IMU, and GNSS receiver are independent of each other, and the measurement noise conforms to a Gaussian distribution with zero mean, the maximum a posteriori estimation problem can be equivalent to minimizing the sum of errors, then the solution process of the aerial robot's state vector can be expressed as:
where, z is the aerial robot linear observation model, represents the prior error, H_{P} matrix is the prior pose information obtained by the airborne camera, n is the number of robot state vectors in the sliding window, and E(·) represents the sum of all sensor measurement error factors.
Finally, by solving the aerial robot state vector by means of probability factor graph optimization, the complete robot pose information can be obtained.
3 Experiments
3.1 Implementation for Aerial Robot Navigation
We chose the NVIDIA jetson Xavier NX board as the companion computer for aerial robot autonomous navigation. The Intel RealSense T265 binocular camera is employed to provide visual and inertial raw measurement information for the aerial robot pose estimation. Simultaneously, the Intel Realsense D435i RGBD camera can provide 3D point cloud map. The UBlox ZEDF9P is used for GNSS receiver module, which can continuously provide GNSS pseudorange, Doppler frequency shift and universal time coordinated (UTC) pulse signals to aerial robot pose estimator. A carbon fiber quadrotor unmanned aerial vehicle (UAV) with 410 mm wheelbase can be used as the carrier of companion computer and sensors. Pixhawk4 is chosen as the flight control automatic pilot, and the PX4 is employed as the flight control firmware. Both onboard cameras and GNSS receiver are connected to the companion computer via USB. The ground station is connected with the Pixhawk4 automatic pilot and companion computer through WiFi 2.4G and Ethernet, respectively. The detailed description is shown in Fig. 3.
Fig. 3 The aerial robot implementation scheme 
3.2 Simulation in Public Dataset
The EuRoC datasets^{[22]} are collected from a binocular fisheye camera (Aptina MT9V034) and synchronized inertial measurement unit (Analog Devices ADIS 16448) carried by a micro aerial robot. The resolution of the binocular camera is 752×480, and the exposure mode of this camera is a global shutter with 20 Hz output frequency. The EuRoC datasets^{[22]} contain 11 sequences, which includes different lighting conditions and different environments. We compare the proposed RRVPE with OKVIS^{[4]} and VINSFusion^{[16]} in EuRoC datasets. OKVIS is another nonlinear optimizationbased visualinertial odometry, and VINSFusion is the stateoftheart KLT sparse optical flow trackingbased tightly coupled agent state estimator.
All methods are compared in a NVIDIA Jetson Xavier NX embedded device, as shown in Fig. 4. The NVIDIA Jetson series devices are slightly different from other onboard computers on the score of its GPU module with 384 CUDA cores, which allows the RRVPE system to execute in real time with CUDA parallel acceleration. The comparison of experimental results on rootmeansquare error (RMSE) are shown in Table 1, which is verified by an absolute trajectory error (ATE). Figure 5 shows the system consistency on absolute pose error (APE) as time goes on in the sequence MH01. RRVPE will inevitably generate some accumulation errors over time, which is an inherent characteristic of all visual based robot state estimators. Fortunately, due to the local bundle adjustment, the accumulation error of the RRVPE system is always within a reasonable range. The experimental results show that, on the NVIDIA Jetson Xavier NX embedded companion computer, RRVPE system shows a favorable accuracy comparing with other stateoftheart agent sate estimators, and achieves realtime performance.
Fig. 4 NVIDIA Jetson Xavier NX 
Fig. 5 The change of APE as time goes on in sequence MH01 
Performance comparison in the EuRoC datasets on RMSE (unit:m)
3.3 Simulation for Aerial Robot Navigation
Due to the instability of the open source flight control algorithm, a simulation test is needed before realworld flight, which can effectively avoid the aerial robot crash caused by a program error. We carried out the virtual experiment for aerial robot autonomous navigation in the Gazebo simulator, as shown in Fig. 6. After taking off, the aerial robot leverages a virtual plugin stereo camera and GNSS raw signals to obtain the spatial position. Meanwhile, a 3D voxel map calculated by a virtual plugin RGBD camera is structured to further capture the transformation matrix between the aerial robot and neighbouring obstruction. When the flight destination is entered manually, the trajectory planner generates a path for the aerial robot motion and sends the desired speed to the flight controller, then gradually approaches the destination and keeps a fixed distance from the neighbouring obstruction.
Fig. 6 Aerial robot navigation test carried out in the Gazebo simulator 
3.4 RealWorld Aerial Robot Navigation
In order to verify the robustness and practicability of the proposed aerial robot navigation system, we conduct both simulation and realworld physical verification similar to the Gazebo test. The visualinertial sensor used in our realworld test is an Intel RealSense T265 binocular camera. In the meantime, an Intel RealSense D435i RGBD camera is used to capture the 3D environmental map. In addition, the UBlox ZEDF9P is employed as GNSS receiver that is a highprecision multiband receiver with multiconstellation support. The realworld experiment was conducted on a campus tennis court, where the sky is open and most of the navigation satellites are well tracked. The terrain crossed by the aerial robot is an artificial arbitrary obstruction, as shown in Fig. 7.
Fig. 7 The realworld navigation environment conducted on a campus tennis court 
During flight, the aerial robot can change its route when approaching an obstacle and always keep a reasonable distance from the neighbouring obstruction. It is worth noting that all flight control commands are generated by the NVIDIA Jetson Xavier NX board, and the flight autopilot does not receive any external control signal generated by external environment.
4 Conclusion
In this paper, we proposed RRVPE: a robust and realtime visualinertialGNSS tightly coupled pose estimator for aerial robot navigation, which combines KLT sparse optical flow, inertial measurements and GNSS raw signal to estimate aerial robot state between consecutive images. In the nonlinear optimization phase, visualinertialGNSS raw measurements were formulated by the probabilistic factor graph in a small sliding container. The RRVPE system can achieve realtime robot state estimation with CUDA acceleration on an airborne computer. The proposed system is evaluated using both simulated and realworld physical experiments, demonstrating clear advantages over stateoftheart approaches.
References
 Lei L, Li Z H, Yang H, et al. Extraction of the leaf area density of maize using UAVLiDAR data [J]. Geomatics and Information Science of Wuhan University, 2021, 46(11): 17371745(Ch). [Google Scholar]
 Chen J J, Li S, Liu D H, et al. AiRobSim: Simulating a multisensor aerial robot for urban search and rescue operation and training [J]. Sensors (Basel, Switzerland), 2020, 20(18): 5223. [NASA ADS] [CrossRef] [PubMed] [Google Scholar]
 Tabib W, Goel K, Yao J, et al. Autonomous cave surveying with an aerial robot [EB/OL].[20220625]. https://arxiv.org/abs/2003.13883. [Google Scholar]
 Geneva P, Eckenhoff K, Lee W, et al. OpenVINS: A research platform for visualinertial estimation [C]// 2020 IEEE International Conference on Robotics and Automation (ICRA). New York: IEEE, 2020: 46664672. [Google Scholar]
 Paul M K, Roumeliotis S I. Alternatingstereo VINS: Observability analysis and performance evaluation [C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2018: 47294737. [Google Scholar]
 Qin T, Li P L, Shen S J. VINSmono: A robust and versatile monocular visualinertial state estimator [J]. IEEE Transactions on Robotics, 2018, 34(4): 10041020. [CrossRef] [Google Scholar]
 Rosinol A, Abate M, Chang Y, et al. Kimera: An opensource library for realtime metricsemantic localization and mapping [C]// 2020 IEEE International Conference on Robotics and Automation (ICRA). New York: IEEE, 2020: 16891696. [Google Scholar]
 Campos C, Elvira R, Rodríguez J J G, et al. ORBSLAM3: An accurate opensource library for visual, visualinertial, and multimap SLAM [J]. IEEE Transactions on Robotics, 2021, 37(6): 18741890. [CrossRef] [Google Scholar]
 MurArtal R, Montiel J M M, Tardós J D. ORBSLAM: A versatile and accurate monocular SLAM system [J]. IEEE Transactions on Robotics, 2015, 31(5): 11471163. [CrossRef] [Google Scholar]
 MurArtal R, Tardós J D. ORBSLAM2: An opensource SLAM system for monocular, stereo, and RGBD cameras [J]. IEEE Transactions on Robotics, 2017, 33(5): 12551262. [CrossRef] [Google Scholar]
 Li T, Zhang H P, Gao Z Z, et al. Tight fusion of a monocular camera, MEMSIMU, and singlefrequency multiGNSS RTK for precise navigation in GNSSchallenged environments [J]. Remote Sensing, 2019, 11(6): 610. [NASA ADS] [CrossRef] [Google Scholar]
 Cao S Z, Lu X Y, Shen S J. GVINS: Tightly coupled GNSSvisualinertial fusion for smooth and consistent state estimation [J]. IEEE Transactions on Robotics, 2022, 38(4): 20042021. [CrossRef] [Google Scholar]
 Zhang C, Yang Z, Fang Q H, et al. FRLSLAM: A fast, robust and lightweight SLAM system for quadruped robot navigation [C]//2021 IEEE International Conference on Robotics and Biomimetics (ROBIO). New York: IEEE, 2022: 11651170. [Google Scholar]
 Zhang C, Yang Z, Liao L W, et al. RPEOD: A realtime pose estimation and object detection system for aerial robot target tracking [J]. Machines, 2022, 10(3): 181. [CrossRef] [MathSciNet] [Google Scholar]
 Lucas B D, Kanade T. An iterative image registration technique with an application to stereo vision [C]// Proceedings of the 7th International Joint Conference on Artificial Intelligence. New York: ACM, 1981: 674679. [Google Scholar]
 Shi J B, Tomasi. Good features to track [C]//1994 Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2002: 593600. [Google Scholar]
 Leutenegger S, Lynen S, Bosse M, et al. Keyframebased visualinertial odometry using nonlinear optimization [J]. The International Journal of Robotics Research, 2015, 34(3): 314334. [CrossRef] [Google Scholar]
 Forster C, Pizzoli M, Scaramuzza D. SVO: Fast semidirect monocular visual odometry [C]// 2014 IEEE International Conference on Robotics and Automation (ICRA). New York: IEEE, 2014: 1522. [Google Scholar]
 Engel J, Schöps T, Cremers D. LSDSLAM: Largescale direct monocular SLAM [C]// Proceedings of the European Conference on Computer Vision (ECCV). Berlin: Springer, 2014: 834849. [Google Scholar]
 Qin T, Li P L, Shen S J. Relocalization, global optimization and map merging for monocular visualinertial SLAM [C]// 2018 IEEE International Conference on Robotics and Automation (ICRA). New York: IEEE, 2018: 11971204. [Google Scholar]
 Qin T, Shen S J. Robust initialization of monocular visualinertial estimation on aerial robots [C]//2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). New York: IEEE, 2017: 42254232. [Google Scholar]
 Burri M, Nikolic J, Gohl P, et al. The EuRoC micro aerial vehicle datasets [J]. The International Journal of RoboticsResearch, 2016, 35(10): 11571163. □ [CrossRef] [Google Scholar]
All Tables
All Figures
Fig. 1 The aerial robot equipped with the RRVPE navigation system 1. Intel RealSense D435i camera: responsible for building realworld 3D voxel map; 2. Intel RealSense T265 camera: responsible for providing binocular video stream and inertial measurement information; 3. UBlox ZEDF9P receiver: responsible for receiving GNSS pseudorange, Doppler, ephemeris and time pulse information 

In the text 
Fig. 2 Main parallel threads of RRVPE system  
In the text 
Fig. 3 The aerial robot implementation scheme  
In the text 
Fig. 4 NVIDIA Jetson Xavier NX  
In the text 
Fig. 5 The change of APE as time goes on in sequence MH01  
In the text 
Fig. 6 Aerial robot navigation test carried out in the Gazebo simulator  
In the text 
Fig. 7 The realworld navigation environment conducted on a campus tennis court  
In the text 
Current usage metrics show cumulative count of Article Views (fulltext article views including HTML views, PDF and ePub downloads, according to the available data) and Abstracts Views on Vision4Press platform.
Data correspond to usage on the plateform after 2015. The current usage metrics is available 4896 hours after online publication and is updated daily on week days.
Initial download of the metrics may take a while.