A production‑style project that turns raw NYC taxi trips into predictions and insights. This page explains the architecture, tools, and choices behind the pipeline before you jump into the code.
An end‑to‑end MLOps workflow for NYC taxi trip data covering ingestion, validation, feature engineering, model training, experiment tracking, CI/CD, deployment, and monitoring.
Data Ingestion & Quality: Batch ingestion with Spark; data validation with Great Expectations; schema & drift checks (see the ingestion and validation sketches after this list).
Feature Engineering: Reproducible transformations; partitioning; feature parity across train/serve.
Model Training & Tracking: Scikit‑learn/XGBoost with MLflow tracking, metrics, and artifacts.
Packaging & CI/CD: Docker images; GitHub Actions for tests, linting, and pipeline runs.
Deployment & Serving: FastAPI service for prediction; environment‑based configs; IaC‑ready layout.
Monitoring: Data freshness & quality checks; model performance tracking; alert hooks.
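To make the ingestion and feature-engineering items concrete, here is a minimal PySpark sketch. The input path and the TLC yellow-taxi column names (tpep_pickup_datetime, tpep_dropoff_datetime, trip_distance) are assumptions; the repo's actual job layout may differ.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("nyc-taxi-ingest").getOrCreate()

# Batch-read one month of raw trips (path is illustrative).
raw = spark.read.parquet("data/raw/yellow_tripdata_2024-01.parquet")

features = (
    raw
    # Trip duration in minutes, derived once here so train and serve agree.
    .withColumn(
        "trip_duration_min",
        (F.col("tpep_dropoff_datetime").cast("long")
         - F.col("tpep_pickup_datetime").cast("long")) / 60.0,
    )
    # Simple calendar features.
    .withColumn("pickup_hour", F.hour("tpep_pickup_datetime"))
    .withColumn("pickup_date", F.to_date("tpep_pickup_datetime"))
    # Drop rows that are obviously unusable (zero distance, non-positive duration).
    .filter((F.col("trip_distance") > 0) & (F.col("trip_duration_min") > 0))
)

# Partitioned write keeps downstream reads cheap and backfills idempotent.
features.write.mode("overwrite").partitionBy("pickup_date").parquet("data/features/")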
High‑level flow: Ingestion → Validation → Feature Engineering → Training/Tracking → Packaging/CI → Serving → Monitoring.
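Validation sits between ingestion and feature engineering in that flow. Below is a minimal sketch using Great Expectations' classic pandas-style API; the column names and bounds are assumptions, and the repo's suite (especially on a newer GE release) will organize this around expectation suites and checkpoints instead.

import pandas as pd
import great_expectations as ge

# Wrap the raw trips so expectations can be declared directly on the DataFrame.
trips = ge.from_pandas(pd.read_parquet("data/raw/yellow_tripdata_2024-01.parquet"))

trips.expect_column_values_to_not_be_null("tpep_pickup_datetime")
trips.expect_column_values_to_be_between("trip_distance", min_value=0, max_value=100)
trips.expect_column_values_to_be_between("fare_amount", min_value=0, max_value=500)

results = trips.validate()
if not results.success:
    raise ValueError("Data validation failed; inspect the expectation results")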
Tools chosen for reliability, reproducibility, and smooth hand‑off from experimentation to production.
MLflow: tracked runs, parameters, metrics, and artifacts for full reproducibility (a training sketch follows below).
Great Expectations: validation suite catching schema issues and outliers early.
GitHub Actions: tests, linting, image build, and deploy steps.
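As a sketch of what a tracked training run looks like, the snippet below fits a scikit-learn regressor and logs parameters, metrics, and the model to MLflow. The target (trip duration), feature columns, and hyperparameters are assumptions, not the repo's exact configuration.

import mlflow
import mlflow.sklearn
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

df = pd.read_parquet("data/features/")
X = df[["trip_distance", "passenger_count", "pickup_hour"]]
y = df["trip_duration_min"]
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

mlflow.set_experiment("nyc-taxi-trip-duration")

with mlflow.start_run():
    params = {"n_estimators": 200, "max_depth": 5, "learning_rate": 0.1}
    model = GradientBoostingRegressor(**params).fit(X_train, y_train)

    mae = mean_absolute_error(y_val, model.predict(X_val))

    # Parameters, metrics, and the fitted model are all attached to the run.
    mlflow.log_params(params)
    mlflow.log_metric("val_mae", mae)
    mlflow.sklearn.log_model(model, artifact_path="model")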
Prereqs: Docker, Make, Python 3.10+
git clone https://github.com/airdmhund1/nyc-taxi-mlops
cd nyc-taxi-mlops
make setup # install deps / pre-commit
make train # run training with MLflow tracking
make serve # start FastAPI for predictions
make test # unit tests & data checks
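make test combines unit tests with lightweight data checks. A hypothetical pytest-style check over the feature table might look like the following (the file path and column names are assumptions):

# tests/test_features.py (hypothetical path)
import pandas as pd

def _features():
    return pd.read_parquet("data/features/")

def test_required_columns_present():
    required = {"trip_distance", "passenger_count", "pickup_hour", "trip_duration_min"}
    assert required.issubset(_features().columns)

def test_no_nonpositive_durations():
    assert (_features()["trip_duration_min"] > 0).all()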
Environment variables and config files are documented in the repo’s README.
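Once the service is up (make serve), predictions are served over HTTP by a FastAPI app. A minimal sketch of such an endpoint follows; the route, request fields, and the way the model is loaded are assumptions rather than the repo's exact implementation.

import mlflow.sklearn
import pandas as pd
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="nyc-taxi-mlops")

# Load the trained model once at startup; a registry URI such as
# "models:/nyc-taxi/Production" could be used instead of a local path.
model = mlflow.sklearn.load_model("models/latest")

class TripRequest(BaseModel):
    trip_distance: float
    passenger_count: int
    pickup_hour: int

@app.post("/predict")
def predict(trip: TripRequest) -> dict:
    # model_dump() is pydantic v2; use trip.dict() on v1.
    features = pd.DataFrame([trip.model_dump()])
    return {"predicted_trip_duration_min": float(model.predict(features)[0])}

With the app running, FastAPI also exposes interactive documentation at /docs, which is a convenient way to send a test request.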