ml cv other · deep learning · November 2019

Caffe PoseNet (fork)

Forked the Caffe implementation of PoseNet, which regresses 6-DOF camera pose directly from a single RGB image. The original code comes from the pre-PyTorch era of CV research.

What it was

Fork of alexgkendall/caffe-posenet — the original Caffe implementation of Kendall et al.’s PoseNet (ICCV 2015), one of the first end-to-end neural approaches to camera relocalization. Forked during a Robot Perception course to run it on a custom dataset.

What PoseNet does

Single RGB image in, 6-DOF camera pose (3D position + quaternion orientation) out. The architecture is a modified GoogLeNet with the classification head swapped for two regression heads (position + quaternion). Trained per-scene on labeled image→pose pairs.
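Here is a minimal sketch of the output side: two affine regression heads reading one backbone feature vector. It is written in pure Python, and random weights stand in for trained ones. `PoseHeads`, `feat_dim`, and the initialization scheme are hypothetical and are not taken from the Caffe prototxt. The quaternion output is normalized to unit length so that it describes a valid rotation.

```python
import math
import random

def linear(weights, bias, x):
    """Affine map y = W x + b, with W given as a list of rows."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(weights, bias)]

def init_layer(n_out, n_in, rng):
    """Small random weights as a stand-in for trained parameters (hypothetical)."""
    weights = [[rng.gauss(0.0, 0.01) for _ in range(n_in)] for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias

def normalize(q):
    """Scale a 4-vector to unit length so it is a valid rotation quaternion."""
    norm = math.sqrt(sum(c * c for c in q))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero quaternion")
    return [c / norm for c in q]

class PoseHeads:
    """Hypothetical pair of regression heads on top of a shared feature vector."""

    def __init__(self, feat_dim, seed=0):
        rng = random.Random(seed)
        self.xyz = init_layer(3, feat_dim, rng)   # position head: 3 outputs
        self.quat = init_layer(4, feat_dim, rng)  # orientation head: 4 outputs

    def __call__(self, features):
        position = linear(*self.xyz, features)
        quaternion = normalize(linear(*self.quat, features))
        return position, quaternion

# Usage: a constant vector stands in for real backbone features.
heads = PoseHeads(feat_dim=2048)
position, quaternion = heads([1.0] * 2048)
```

Keeping position and orientation as separate heads lets the loss weight the two errors differently.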

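The loss is a weighted sum of the Euclidean (L2) position error and the Euclidean distance between the predicted and ground-truth quaternions, with the ground-truth quaternion normalized to unit length:

L = ||x̂ - x||₂ + β · ||q̂ - q/||q|| ||₂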
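Below is a minimal per-image sketch of that loss in plain Python. It assumes a 3-vector position and a 4-vector quaternion for each image. `posenet_loss` and `l2` are illustrative names and are not functions from the fork. The ground-truth quaternion is normalized before the comparison, which is a no-op when the labels are already unit quaternions.

```python
import math

def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def posenet_loss(x_hat, x, q_hat, q, beta):
    """||x_hat - x||_2 + beta * ||q_hat - q/||q|| ||_2 for a single image.

    x_hat, x: predicted / ground-truth position (3 values).
    q_hat, q: predicted / ground-truth quaternion (4 values).
    Only the ground-truth quaternion is normalized here.
    """
    q_norm = math.sqrt(sum(c * c for c in q))
    q_unit = [c / q_norm for c in q]
    return l2(x_hat, x) + beta * l2(q_hat, q_unit)

# Usage: a 3-unit position error and a perfect orientation (after normalizing q).
loss = posenet_loss([1.0, 2.0, 2.0], [0.0, 0.0, 0.0],
                    [1.0, 0.0, 0.0, 0.0], [2.0, 0.0, 0.0, 0.0], beta=500.0)
```

Note that the orientation term is a straight Euclidean distance between 4-vectors, not the rotation angle between the two orientations.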

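Tuning β is the fiddly, per-scene part. β trades position error against orientation error, and the position term is measured in scene units. A good value therefore depends on the scene's spatial scale.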
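A small worked example shows why β is scale-sensitive. Take two unit quaternions in the same hemisphere (dot product ≥ 0) whose rotations differ by an angle θ. Their Euclidean distance is 2·sin(θ/4), because ||q₁ - q₂||² = 2 - 2·cos(θ/2). This distance is bounded and roughly θ/2 for small errors, while the position term grows with the size of the scene. The helper names below are hypothetical, and the meter units are an assumption.

```python
import math

def quat_distance_for_angle(angle_deg):
    """Euclidean distance between two same-hemisphere unit quaternions whose
    rotations differ by angle_deg: 2 * sin(theta / 4)."""
    theta = math.radians(angle_deg)
    return 2.0 * math.sin(theta / 4.0)

def loss_terms(position_err, angle_err_deg, beta):
    """Return (position term, beta-weighted orientation term) of the loss."""
    return position_err, beta * quat_distance_for_angle(angle_err_deg)

# Same errors (1 m of position, 5 degrees of rotation) under three betas.
for beta in (1, 100, 1000):
    pos_term, rot_term = loss_terms(1.0, 5.0, beta)
    print(beta, pos_term, round(rot_term, 3))
```

With β = 1 a few degrees of rotation barely registers next to a meter of translation. Because the position term scales with the scene while the orientation term does not, a β that balances one scene will not necessarily suit another.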

What was actually tricky

What I’d do differently with hindsight

What it taught me

The framework matters less than the math. Caffe was a hassle, but the PoseNet idea (regress pose end-to-end from raw pixels) was a clean formulation that turned out to be a research dead-end, and that is a useful lesson too. Many of the “first” approaches to a problem age poorly; the abstractions that age well are the loss functions and the datasets, not the architectures.


Source archive: Shivam-Bhardwaj/caffe-posenet (archived)
Writeup last touched: 2026-05-22