What it did
Process raw LIDAR .pcd frames into bounding boxes around discrete
obstacles (cars, pedestrians, cyclists) on a moving ego vehicle, then
render the result. The whole pipeline was hand-written in C++ on top
of PCL — no ML, no deep learning, no
black boxes.
The pipeline
- Voxel-grid downsample. Raw LIDAR returns ~100k points per frame; most are spatial noise or redundant samples of the same surface. A voxel grid reduces this to ~5–10k while preserving structure.
- Region-of-interest crop. Drop everything outside an axis-aligned box centered on the ego vehicle — no need to cluster the sky or the side of the road 50 m away.
- Custom RANSAC ground-plane filter. Hypothesize a plane from 3 random points, count inliers, repeat N times, keep the best plane. This separates the road surface from “everything else.”
- k-d tree construction on remaining points. Built from scratch (not the PCL one) — recursive 3-D median splits, point-to-point nearest-neighbor query.
- Euclidean clustering. For each unvisited point, BFS-flood
through neighbors within distance
d; emit each connected component as a cluster. - Bounding boxes + render. Axis-aligned boxes around each cluster, drawn with PCL’s viewer.
What I’d do differently with hindsight
- Skip the custom RANSAC + k-d tree. The “from scratch” requirement
was good pedagogically, but PCL’s
SACSegmentationandKdTreeFLANNare within rounding-error of hand-rolled versions on a single thread, and parallelize properly under the hood. Outside of coursework the re-implementation is dead weight. - Oriented bounding boxes, not axis-aligned. AABBs over-estimate the footprint of cars turning across the field of view. PCA on the cluster points gives a tight oriented box for almost no extra cost.
- Track between frames. A single-frame clusterer wastes the temporal signal. Hungarian-matching cluster centroids across consecutive frames + a constant-velocity Kalman per track is the natural next step — and connects directly to the Extended Kalman Filter project on the same nanodegree.
- Embed-then-cluster. Modern (2024+) approaches run the clouds through a learned encoder (e.g. PointNet++ or a sparse-conv backbone) and cluster in the embedding space. Geometry-first is still a defensible baseline; learned-first is where the field went.
What it taught me
Classical 3D geometry has a craft to it: knowing when a problem factors cleanly into voxelize → fit plane → tree → cluster, and how each step’s hyperparameter (voxel size, RANSAC iterations, cluster tolerance) trades precision for runtime. The lesson generalized: don’t reach for the deep model before you’ve spent an hour with a notebook and the data.