A production‑style platform that turns raw mobility signals (transit, traffic, micromobility) into live insights. This page explains the architecture, tools, and choices behind the pipeline before you jump into the code.
An end‑to‑end urban mobility analytics workflow covering streaming ingestion, validation, geospatial enrichment, feature engineering, forecasting, CI/CD, APIs, dashboards, and monitoring.
Streaming Ingestion & Quality: Kafka topics for traffic/transit feeds; Spark Structured Streaming consumers; Great Expectations for schema and drift checks (see the ingestion sketch after this list).
Geospatial Enrichment: GTFS & OpenStreetMap joins; grid/hex (H3) aggregation; GeoParquet/Delta storage (hex-aggregation sketch after this list).
Features & Forecasting: Reproducible transformations; demand & ETA models (XGBoost/Prophet); identical feature logic across training and serving.
Packaging & CI/CD: Docker images; GitHub Actions for tests, linting, and pipeline runs.
APIs & Dashboards: FastAPI endpoints for mobility queries; Superset/Grafana for city‑wide KPIs & maps (endpoint sketch after this list).
Monitoring: Data freshness SLAs; model & service metrics; alert hooks (freshness-check sketch after this list).
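To make the ingestion stage concrete, here is a minimal sketch of the Kafka → Spark Structured Streaming path with an inline quality gate. The broker address, topic name, event schema, and output paths are illustrative assumptions rather than the repo's actual config; the job also needs the Spark Kafka connector package on the classpath, and in the real pipeline a Great Expectations suite would replace the hand-rolled filter.

```python
# Sketch: consume traffic events from Kafka, parse JSON, gate on quality,
# and land clean records as Parquet. All names/paths here are illustrative.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import (
    DoubleType, StringType, StructField, StructType, TimestampType,
)

spark = SparkSession.builder.appName("traffic-ingest").getOrCreate()

event_schema = StructType([
    StructField("sensor_id", StringType()),
    StructField("lat", DoubleType()),
    StructField("lon", DoubleType()),
    StructField("speed_kmh", DoubleType()),
    StructField("observed_at", TimestampType()),
])

raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")  # assumed local broker
    .option("subscribe", "traffic_events")                # assumed topic name
    .load()
)

events = raw.select(
    F.from_json(F.col("value").cast("string"), event_schema).alias("e")
).select("e.*")

# Inline quality gate: non-null key, plausible coordinates and speeds.
# The production pipeline would run a Great Expectations suite here instead.
clean = events.filter(
    F.col("sensor_id").isNotNull()
    & F.col("lat").between(-90, 90)
    & F.col("lon").between(-180, 180)
    & F.col("speed_kmh").between(0, 250)
)

(
    clean.writeStream.format("parquet")
    .option("path", "data/clean/traffic")
    .option("checkpointLocation", "data/checkpoints/traffic")
    .start()
    .awaitTermination()
)
```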
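For the enrichment stage, a small sketch of hex aggregation with h3-py (the v4 API is shown; v3 named the same call geo_to_h3). Resolution 8 and the column names are illustrative:

```python
# Sketch: bucket point events into H3 hex cells and compute per-cell stats.
# Uses the h3-py v4 API (latlng_to_cell); resolution 8 is ~0.7 km^2 per cell.
import h3
import pandas as pd

events = pd.DataFrame({
    "lat": [47.6097, 47.6102, 47.6205],
    "lon": [-122.3331, -122.3340, -122.3493],
    "speed_kmh": [32.0, 28.5, 41.0],
})

events["hex"] = [
    h3.latlng_to_cell(lat, lon, 8)
    for lat, lon in zip(events["lat"], events["lon"])
]

hex_stats = events.groupby("hex").agg(
    n_events=("speed_kmh", "size"),
    avg_speed_kmh=("speed_kmh", "mean"),
)
print(hex_stats)
```

Fixing the grid up front keeps downstream joins and map layers cheap regardless of raw sensor density.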
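On the serving side, the query layer can be as small as one FastAPI route. The path, response shape, and in-memory store below are assumptions for illustration, not the repo's actual API:

```python
# Sketch: a single read endpoint over precomputed per-hex metrics.
# In the real service the dict would be a Delta/GeoParquet-backed store.
from fastapi import FastAPI, HTTPException

app = FastAPI(title="mobility-api")

HEX_METRICS = {  # stand-in data; keys are H3 cell ids
    "8828308281fffff": {"avg_speed_kmh": 31.4, "n_events": 120},
}

@app.get("/hex/{hex_id}/metrics")
def hex_metrics(hex_id: str):
    metrics = HEX_METRICS.get(hex_id)
    if metrics is None:
        raise HTTPException(status_code=404, detail="unknown hex cell")
    return {"hex": hex_id, **metrics}
```

Run it with `uvicorn app:app` and hit `GET /hex/<cell>/metrics`.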
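And for monitoring, a sketch of a data-freshness check against an SLA window. The table path, timestamp column, and 15-minute threshold are assumptions, and the print stands in for the pipeline's real alert hooks:

```python
# Sketch: flag a table as stale when its newest record exceeds the SLA window.
from datetime import datetime, timedelta, timezone

import pandas as pd

FRESHNESS_SLA = timedelta(minutes=15)  # assumed SLA window

def check_freshness(path: str) -> bool:
    df = pd.read_parquet(path, columns=["observed_at"])
    # Assumes timestamps are stored as UTC-naive values.
    newest = df["observed_at"].max().to_pydatetime().replace(tzinfo=timezone.utc)
    lag = datetime.now(timezone.utc) - newest
    if lag > FRESHNESS_SLA:
        print(f"STALE: {path} lags {lag} (SLA {FRESHNESS_SLA})")  # alert hook here
        return False
    return True

check_freshness("data/clean/traffic")  # example call against the assumed path
```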
High‑level flow: Streams → Validation → Geo/Features → Forecasting/Tracking → Packaging/CI → APIs/Dashboards → Monitoring.
Tools chosen for reliability, reproducibility, and smooth hand‑off from experimentation to production.
Live congestion & demand metrics with geospatial layers.
Kafka → Spark Structured Streaming with quality gates.
MLflow‑tracked models for demand and ETAs (training sketch below).
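For a feel of the tracking setup, here is a minimal sketch of an MLflow-tracked training run for a demand model. The synthetic features, hyperparameters, and experiment name are illustrative, not the repo's actual models:

```python
# Sketch: train an XGBoost demand model and record params/metrics in MLflow.
import mlflow
import mlflow.xgboost
import numpy as np
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

rng = np.random.default_rng(42)
X = rng.random((1000, 4))  # stand-ins for hour, day-of-week, lag features
y = 50 * X[:, 0] + 10 * X[:, 1] + rng.normal(0, 2, 1000)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=42)

mlflow.set_experiment("demand-forecast")  # assumed experiment name
with mlflow.start_run():
    params = {"n_estimators": 200, "max_depth": 4, "learning_rate": 0.1}
    model = XGBRegressor(**params)
    model.fit(X_train, y_train)
    mlflow.log_params(params)
    mlflow.log_metric("val_mae", mean_absolute_error(y_val, model.predict(X_val)))
    mlflow.xgboost.log_model(model, "model")  # versioned artifact for serving
```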
Prereqs: Docker, Make, Python 3.10+
git clone https://github.com/airdmhund1/city-mobility-pulse
cd city-mobility-pulse
make setup # install deps / pre-commit
make stream # run Kafka + Spark streaming locally
make api # start FastAPI for mobility queries
make dash # launch Superset/Grafana
make test # unit tests & data checks
Environment variables and config files are documented in the repo’s README.