ORB-SLAM2 (build + experiment) · Udacity

Built and ran the canonical ORB-SLAM2 stack (Mur-Artal and Tardós) on a custom monocular dataset during a Robot Perception course, working from a fork of the upstream repository.

What it was

A fork of raulmur/ORB_SLAM2, the reference implementation accompanying the ORB-SLAM2 paper by Mur-Artal and Tardós (2017). The project was less “build a SLAM system” and more “stand the canonical one up, run it on your own video, learn what each component does by watching it work and fail.”

What ORB-SLAM2 does

Tracking thread. ORB keypoint detection + descriptor matching against the local map; the resulting 2D-3D correspondences give the current frame’s pose (a PnP-style problem, refined by minimizing reprojection error).
Local mapping thread. Triangulate new map points; local bundle adjustment over a window of recent keyframes.
Loop closing thread. A DBoW2 bag-of-visual-words database matches the current keyframe against historical keyframes; if a loop is detected, a pose-graph optimization corrects the accumulated drift.
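
To make the "descriptor matching" step in the tracking thread concrete, here is a minimal sketch of brute-force matching on binary descriptors. ORB descriptors are 256-bit strings compared by Hamming distance; the toy example uses 8-bit values so the arithmetic can be checked by hand. The function names, the `max_distance` threshold, and the `ratio` value are illustrative assumptions, and ORB-SLAM2 itself narrows the search rather than comparing every pair.

```python
def hamming(a: int, b: int) -> int:
    """Count the differing bits between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")


def match_descriptors(query, train, max_distance=50, ratio=0.8):
    """Return (query_idx, train_idx) pairs that pass a distance and ratio test.

    max_distance and ratio are illustrative values, not tuned constants.
    """
    matches = []
    for qi, q in enumerate(query):
        # Rank every candidate by Hamming distance to this query descriptor.
        ranked = sorted((hamming(q, t), ti) for ti, t in enumerate(train))
        if not ranked:
            continue
        best_dist, best_idx = ranked[0]
        # Reject matches that are simply too far away.
        if best_dist > max_distance:
            continue
        # Ratio test: reject if the runner-up is almost as close as the best.
        if len(ranked) > 1 and best_dist >= ratio * ranked[1][0]:
            continue
        matches.append((qi, best_idx))
    return matches


# Toy 8-bit "descriptors" (real ORB descriptors are 256 bits).
frame_descriptors = [0b10110010, 0b00001111]
map_descriptors = [0b10110011, 0b11110000, 0b00001111]
print(match_descriptors(frame_descriptors, map_descriptors, max_distance=2))
```

The ratio test is what keeps repetitive texture from producing confident but wrong matches: a descriptor that is nearly as close to two different map points is dropped rather than guessed.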

The whole thing runs at 25-30 Hz on a laptop CPU. No GPU. No deep learning.

What was actually tricky

Building the dependency stack: a compatible OpenCV 3.x build (upstream ships its own ORB extractor, so the contrib xfeatures2d module is not required), Eigen pinned to a specific version, Pangolin built from source, and the DBoW2 and g2o copies vendored under Thirdparty/. The CMake config was where most of the time went.
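Scale ambiguity. Monocular SLAM is inherently up-to-scale: the shape of the trajectory and map is recovered only up to an unknown global scale factor, so distances come out in arbitrary units, and that scale can itself drift over a long run. RGB-D or stereo gives metric scale; mono does not.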
Initialization is brittle. ORB-SLAM2 needs enough parallax between the first few frames to triangulate an initial map. Move the camera too slowly at startup, or only rotate it in place, and it sits in “INITIALIZING” forever.
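
The last two points can be checked numerically. The sketch below assumes an idealized pinhole camera with no rotation and a made-up focal length; `project`, `scaled`, and `parallax_deg` are hypothetical helpers, not ORB-SLAM2 functions. It shows that scaling the whole scene and the camera position by the same factor leaves the image unchanged, and that a tiny baseline yields almost no parallax angle to triangulate from.

```python
import math


def project(point, cam_center, focal=500.0):
    """Pinhole projection for an unrotated camera at cam_center looking down +Z."""
    x = point[0] - cam_center[0]
    y = point[1] - cam_center[1]
    z = point[2] - cam_center[2]
    return (focal * x / z, focal * y / z)


def scaled(vec, s):
    """Multiply every coordinate of a 3D vector by s."""
    return tuple(s * c for c in vec)


def parallax_deg(point, center_a, center_b):
    """Angle in degrees at `point` between the rays to two camera centres."""
    ray_a = [p - c for p, c in zip(point, center_a)]
    ray_b = [p - c for p, c in zip(point, center_b)]
    dot = sum(a * b for a, b in zip(ray_a, ray_b))
    norm = math.sqrt(sum(a * a for a in ray_a)) * math.sqrt(sum(b * b for b in ray_b))
    # Clamp to guard against rounding just outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))


landmark = (0.5, 0.2, 4.0)
first_cam = (0.0, 0.0, 0.0)
second_cam = (0.3, 0.0, 0.0)

# Scale ambiguity: a world 7x larger produces the same pixels.
print(project(landmark, second_cam))
print(project(scaled(landmark, 7.0), scaled(second_cam, 7.0)))

# Parallax: a healthy baseline versus a nearly stationary camera.
print(parallax_deg(landmark, first_cam, second_cam))
print(parallax_deg(landmark, first_cam, (0.003, 0.0, 0.0)))
```

Because the pixels are identical in the scaled world, no amount of image data alone can recover the factor, which is why a second sensor (stereo baseline, depth, IMU) is needed for metric units.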

What I’d do differently with hindsight

Use ORB-SLAM3 (released 2020). It adds visual-inertial modes, multi-map support, and far better robustness, while still handling the monocular, stereo, and RGB-D inputs that ORB-SLAM2 supports.
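Pair with an IMU for monocular cases. Inertial measurements make metric scale observable and carry tracking through fast motion or texture-poor frames, so visual-inertial SLAM is far more robust than pure monocular SLAM.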
For learning, also build a toy SLAM from scratch. ORB-SLAM2 is too big to internalize, but coding up “PnP + g2o pose-graph” in half a page makes the math click.
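
As a taste of what such a toy could look like, here is a minimal sketch of the pose-graph half, reduced to one dimension. It swaps g2o for plain gradient descent so that it needs only the standard library; the function name, step size, and simulated measurements are all assumptions made for illustration. Odometry edges over-report each step, and a single loop-closure edge pulls the trajectory back.

```python
def optimize_pose_graph(num_poses, edges, iterations=2000, step=0.05):
    """Least-squares 1D pose graph solved by plain gradient descent.

    Each edge is (i, j, measured), meaning x[j] - x[i] should equal `measured`.
    Pose 0 is held at the origin to remove the free global offset.
    """
    x = [0.0] * num_poses
    for _ in range(iterations):
        grad = [0.0] * num_poses
        for i, j, measured in edges:
            residual = (x[j] - x[i]) - measured
            # Gradient of residual**2 with respect to both endpoints.
            grad[j] += 2.0 * residual
            grad[i] -= 2.0 * residual
        grad[0] = 0.0  # keep the first pose anchored
        x = [xi - step * gi for xi, gi in zip(x, grad)]
    return x


# Simulated drift: every true 1.0-unit step is measured as 1.1.
odometry = [(k, k + 1, 1.1) for k in range(4)]
# One loop-closure edge: pose 4 is observed to be 4.0 units from pose 0.
loop_closure = [(0, 4, 4.0)]

dead_reckoning = optimize_pose_graph(5, odometry)
corrected = optimize_pose_graph(5, odometry + loop_closure)
print("without loop closure:", dead_reckoning)
print("with loop closure:   ", corrected)
```

With every edge weighted equally, the optimizer compromises between odometry and the loop closure instead of trusting either fully, and it spreads the correction evenly along the chain. Real systems weight each edge by its measurement uncertainty and work on full 3D poses.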

What it taught me

SLAM is one of the hardest problems in mobile robotics, and it has a generation of mature open-source implementations. Most application code should consume those, not reinvent them. The valuable thing to know is “what each thread does and why” (which lets you debug when it fails), not “how to write a bundle adjustment from scratch.”