What it did
Used the AprilTag fiducial system as the localization input for an indoor robotics class: pin a tag of known size to a wall, point a calibrated camera at it, and recover the camera’s pose relative to the tag in real time. This was an upstream wrapper repo I forked to make a small build/calibration tweak, not original work.
How it’s used
- Calibrate the camera once (cv2.calibrateCamera on a chessboard pattern); this produces the intrinsics K and the distortion coefficients.
- The AprilTag detector returns 4 corner pixels per tag.
- solvePnP recovers the 6-DOF tag→camera transform from the corners and the known tag size.
- Invert that transform and compose it with a known tag→world transform to get camera→world.
What was actually tricky
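- Tag-family selection matters. 36h11 uses a 6×6 data grid: 587 unique IDs and a minimum Hamming distance of 11, so false positives are rare, but each cell is small and the tag needs more pixels to decode. 16h5 uses a 4×4 grid: bigger cells that decode from farther away at the same printed size, but only 30 IDs and many more false detections. Pick based on environment, not preference.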
- Tag size measurement is the dominant source of pose error. Measure with calipers, not “I printed it letter-paper-size.”
- Rolling-shutter cameras introduce skew when the tag moves quickly: the corners get distorted into a non-rectangle and solvePnP lies. Global-shutter is what you actually want.
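
To see why the tag-size measurement matters so much, here is a small sketch using only the standard library and an idealized pinhole camera (no distortion, principal point at the origin). All numbers are made up for illustration. It shows that if the assumed tag size is off by some ratio, a pose with the same rotation and a translation scaled by that ratio explains the observed corner pixels exactly, so the solver has no way to notice the mistake.

```python
import math


def project(point_cam, f_px):
    """Ideal pinhole projection of a camera-frame point to pixel offsets."""
    x, y, z = point_cam
    return (f_px * x / z, f_px * y / z)


def tag_corners_in_camera(size_m, yaw_rad, t):
    """Corners of a square tag, rotated by yaw about the y-axis, then shifted by t."""
    h = size_m / 2.0
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    corners = [(-h, h, 0.0), (h, h, 0.0), (h, -h, 0.0), (-h, -h, 0.0)]
    return [(c * x + s * z + t[0], y + t[1], -s * x + c * z + t[2])
            for x, y, z in corners]


F_PX = 600.0                # hypothetical focal length in pixels
TRUE_SIZE = 0.160           # what calipers would report, in meters
ASSUMED_SIZE = 0.165        # what was typed into the config
TRUE_T = (0.10, -0.05, 2.00)
YAW = math.radians(20.0)

# What the camera actually sees.
observed = [project(p, F_PX)
            for p in tag_corners_in_camera(TRUE_SIZE, YAW, TRUE_T)]

# Same rotation, translation scaled by the size ratio, wrong tag size.
ratio = ASSUMED_SIZE / TRUE_SIZE
scaled_t = tuple(ratio * v for v in TRUE_T)
explained = [project(p, F_PX)
             for p in tag_corners_in_camera(ASSUMED_SIZE, YAW, scaled_t)]

max_px_diff = max(abs(a - b)
                  for o, e in zip(observed, explained)
                  for a, b in zip(o, e))
range_error_m = scaled_t[2] - TRUE_T[2]

print(f"max pixel difference: {max_px_diff:.2e}")
print(f"range error: {range_error_m * 100:.2f} cm")
```

With these numbers, a 5 mm overestimate on a 160 mm tag places the camera about 6 cm too far away at 2 m, with zero reprojection error to warn you.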
What I’d do differently with hindsight
- Use ArUco markers in the OpenCV mainline (cv2.aruco) — same approach, no third-party wrapper needed, well-maintained.
- Use ChArUco for camera calibration too, not chessboard. It detects partial views, so the board can reach the image edges where distortion is strongest, and that gives much better intrinsics.
- Fuse with IMU when used on a moving platform. Tag detection drops to 0 Hz when the camera blurs; IMU prediction fills the gap.
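
The IMU gap-filling idea can be illustrated with a deliberately simplified 1-D sketch: dead-reckon from acceleration every step, and pull the estimate back toward the tag measurement whenever one is available. Everything here (the State class, the fixed blend gain, the sample format) is a hypothetical illustration; a real system would more likely use an EKF that also tracks orientation and sensor biases.

```python
from dataclasses import dataclass


@dataclass
class State:
    pos: float  # meters
    vel: float  # meters per second


def predict(state, accel, dt):
    """Dead-reckon one step forward from IMU acceleration."""
    return State(
        pos=state.pos + state.vel * dt + 0.5 * accel * dt * dt,
        vel=state.vel + accel * dt,
    )


def correct(state, tag_pos, gain=0.8):
    """Blend the predicted position toward a tag-derived position."""
    return State(pos=state.pos + gain * (tag_pos - state.pos), vel=state.vel)


def run(samples, dt, state):
    """samples: iterable of (accel, tag_pos), with tag_pos None during dropouts."""
    track = []
    for accel, tag_pos in samples:
        state = predict(state, accel, dt)
        if tag_pos is not None:
            state = correct(state, tag_pos)
        track.append(state.pos)
    return track


# Constant 1 m/s motion; the tag is lost for two frames in the middle.
samples = [(0.0, 0.1), (0.0, None), (0.0, None), (0.0, 0.4)]
track = run(samples, dt=0.1, state=State(pos=0.0, vel=1.0))
print(track)
```

During the two dropout frames the estimate keeps advancing on prediction alone, which is the behavior you want while the image is too blurred to detect anything. The prediction drifts if the gap is long, so the tag correction still has to come back eventually.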
What it taught me
Fiducials are the cheap-and-reliable answer when the environment permits, and far more robust than feature-based localization for indoor robots. The lesson generalizes: when you can engineer the world (tags, light, color), don’t over-engineer the perception. Save the CV firepower for the cases you can’t control.