Traffic Sign Detector (TensorFlow) · Udacity

CNN classifier on the German Traffic Sign Recognition Benchmark — the Udacity SDC project that introduced TensorFlow before Keras was the default.

What it did

Train a CNN to classify 32×32 images of German traffic signs into 43 classes. The GTSRB images are real photos with mixed lighting and partial occlusions, and the class distribution is imbalanced. The project rubric required ≥93% validation accuracy.

The architecture

LeNet-ish, deliberately small:

Input (32×32×3)
  → Conv5×5 (6 filters) + ReLU + MaxPool 2×2
  → Conv5×5 (16 filters) + ReLU + MaxPool 2×2
  → Flatten → FC(120) → ReLU
  → FC(84) → ReLU
  → FC(43)

Cross-entropy loss, Adam optimizer, batch size 128, ~20 epochs to saturate.
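
To make the layer stack above concrete, here is a small bookkeeping sketch that tracks output shapes and parameter counts layer by layer. The write-up does not state padding or stride, so the sketch assumes stride-1 convolutions with 'valid' padding and non-overlapping 2×2 pooling (the classic LeNet setup). The function names are illustrative only.

```python
# Shape and parameter bookkeeping for the LeNet-style stack above.
# Assumptions (not stated in the write-up): stride-1 convolutions with
# 'valid' padding, and non-overlapping 2x2 max pooling.

def conv_out(size, kernel, stride=1):
    """Spatial size after a 'valid' convolution."""
    return (size - kernel) // stride + 1


def pool_out(size, window=2):
    """Spatial size after non-overlapping max pooling."""
    return size // window


def lenet_summary(input_size=32, in_channels=3, n_classes=43):
    """Return (layer name, output shape, parameter count) per layer."""
    rows = []
    size, channels = input_size, in_channels
    for name, filters in (("conv1", 6), ("conv2", 16)):
        params = 5 * 5 * channels * filters + filters  # weights + biases
        size = pool_out(conv_out(size, 5))
        channels = filters
        rows.append((name, (size, size, channels), params))
    features = size * size * channels  # length of the flattened vector
    for name, units in (("fc1", 120), ("fc2", 84), ("logits", n_classes)):
        params = features * units + units  # weights + biases
        rows.append((name, (units,), params))
        features = units
    return rows


if __name__ == "__main__":
    summary = lenet_summary()
    for name, shape, params in summary:
        print(f"{name:7s} out={str(shape):12s} params={params}")
    print("total params:", sum(p for _, _, p in summary))
```

Under these assumptions the flattened vector has 5×5×16 = 400 entries and the whole network has 64,811 trainable parameters, most of them in the first fully connected layer.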

What was actually tricky

Class imbalance. GTSRB has 43 classes but the distribution is long-tailed — some classes have 2,000+ samples, others have under 200. Without weighted loss or augmentation, the model learns the head and ignores the tail.
TensorFlow's session API. Before Keras became the default, you had to wire up the graph, the session, and the feed dict by hand. A single typo in a placeholder name surfaces as a shape-mismatch error buried deep in the stack trace.
Validation accuracy ≠ real accuracy. The validation split was held out from the same recording sessions as the training data, so the model overfit to the lighting more than to the signs. Performance on phone-camera photos was much worse.
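
For the class-imbalance point above, one common remedy is to weight the loss by inverse class frequency. The sketch below uses the "balanced" heuristic, n_samples / (n_classes × class_count). It is not the project's code, and the toy label counts are made up to mimic a head class and a tail class.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


def weighted_mean_loss(per_example_losses, labels, class_weights):
    """Average per-example losses, scaling each one by its class weight."""
    total = sum(class_weights[y] * loss
                for loss, y in zip(per_example_losses, labels))
    norm = sum(class_weights[y] for y in labels)
    return total / norm


if __name__ == "__main__":
    # Toy labels (hypothetical counts): class 0 is the head, class 1 the tail.
    labels = [0] * 2000 + [1] * 200
    weights = balanced_class_weights(labels)
    # Pretend the model is worse on the rare class.
    losses = [1.0] * 2000 + [3.0] * 200
    print("weights:", weights)
    print("unweighted mean:", sum(losses) / len(losses))
    print("weighted mean:", weighted_mean_loss(losses, labels, weights))
```

With these weights each class contributes equally to the averaged loss, so errors on the tail class are no longer drowned out by the head.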

What I'd do differently with hindsight

Start with a small pretrained network such as ResNet18. On a dataset this small, fine-tuning ImageNet features for ~1 hour beats training from scratch for ~3 hours.
Augment aggressively. Random brightness, contrast, perspective, and occlusion all help. Horizontal flip is the exception: many signs are direction-dependent (keep left vs. keep right), so flipping an image can change its label.
Use a proper cross-validation strategy. A k-fold split by recording session (not by frame) prevents the same sign appearing in train and val.
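
Splitting by recording session rather than by frame can be sketched in plain Python as follows. The `groups` list, which maps each sample to a session or track id, is a hypothetical input: the write-up does not say how session ids were stored. The greedy fold balancing is one simple choice among several.

```python
from collections import defaultdict


def group_k_fold(groups, k=5):
    """Yield (train_idx, val_idx) pairs where no group spans both sides.

    groups[i] is the recording-session (or track) id of sample i.
    """
    by_group = defaultdict(list)
    for idx, g in enumerate(groups):
        by_group[g].append(idx)
    if len(by_group) < k:
        raise ValueError("need at least k distinct groups")

    # Greedy balancing: place the largest groups first, each into the
    # fold that currently holds the fewest samples.
    folds = [[] for _ in range(k)]
    ordered = sorted(by_group, key=lambda g: (-len(by_group[g]), str(g)))
    for g in ordered:
        smallest = min(range(k), key=lambda f: len(folds[f]))
        folds[smallest].extend(by_group[g])

    for f in range(k):
        val = sorted(folds[f])
        train = sorted(i for j in range(k) if j != f for i in folds[j])
        yield train, val


if __name__ == "__main__":
    # Toy data: 10 sessions of 30 consecutive frames each.
    groups = [i // 30 for i in range(300)]
    for train, val in group_k_fold(groups, k=5):
        shared = {groups[i] for i in train} & {groups[i] for i in val}
        print(len(train), len(val), "shared sessions:", len(shared))
```

If scikit-learn is available, `sklearn.model_selection.GroupKFold` provides the same kind of group-aware split and is probably the better choice in practice.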

What it taught me

This was the first time I trained a CNN end-to-end. The shock wasn't that it worked; it was how aggressively the model memorized the training distribution and failed on anything just outside it. That experience informed every later ML project: spend the first hour finding the weird examples in your data, because the model will find them too.