LIDAR Object Clustering · Udacity

Filter ground, build a k-d tree, Euclidean-cluster obstacles. C++ from scratch on PCL — no ML, all classical geometry.

What it did

Process raw LIDAR .pcd frames into bounding boxes around discrete obstacles (cars, pedestrians, cyclists) on a moving ego vehicle, then render the result. The whole pipeline was hand-written in C++ on top of PCL — no ML, no deep learning, no black boxes.

The pipeline

Voxel-grid downsample. Raw LIDAR returns ~100k points per frame; most are spatial noise or redundant samples of the same surface. A voxel grid reduces this to ~5–10k while preserving structure.
Region-of-interest crop. Drop everything outside an axis-aligned box centered on the ego vehicle — no need to cluster the sky or the side of the road 50 m away.
Custom RANSAC ground-plane filter. Hypothesize a plane from 3 random points, count inliers, repeat N times, keep the best plane. This separates the road surface from "everything else."
k-d tree construction on remaining points. Built from scratch (not the PCL one) — recursive 3-D median splits, point-to-point nearest-neighbor query.
Euclidean clustering. For each unvisited point, BFS-flood through neighbors within distance d; emit each connected component as a cluster.
Bounding boxes + render. Axis-aligned boxes around each cluster, drawn with PCL's viewer.

What I'd do differently with hindsight

Skip the custom RANSAC + k-d tree. The "from scratch" requirement was good pedagogically, but PCL's SACSegmentation and KdTreeFLANN are within rounding-error of hand-rolled versions on a single thread, and parallelize properly under the hood. Outside of coursework the re-implementation is dead weight.
Oriented bounding boxes, not axis-aligned. AABBs over-estimate the footprint of cars turning across the field of view. PCA on the cluster points gives a tight oriented box for almost no extra cost.
Track between frames. A single-frame clusterer wastes the temporal signal. Hungarian-matching cluster centroids across consecutive frames + a constant-velocity Kalman per track is the natural next step — and connects directly to the Extended Kalman Filter project on the same nanodegree.
Embed-then-cluster. Modern (2024+) approaches run the clouds through a learned encoder (e.g. PointNet++ or a sparse-conv backbone) and cluster in the embedding space. Geometry-first is still a defensible baseline; learned-first is where the field went.

What it taught me

Classical 3D geometry has a craft to it: knowing when a problem factors cleanly into voxelize → fit plane → tree → cluster, and how each step's hyperparameter (voxel size, RANSAC iterations, cluster tolerance) trades precision for runtime. The lesson generalized: don't reach for the deep model before you've spent an hour with a notebook and the data.