About
I'm a data engineer with 4+ years of experience building production batch and analytical systems on AWS and, more recently, end-to-end streaming lakehouses (Kafka, Spark Structured Streaming, Delta Lake) for market data. My work centers on idempotent ingestion, orchestration, and data-quality reconciliation at scale, spanning real-time process simulation, large-scale deployment orchestration, and warehouse pipelines built on dbt, Prefect, BigQuery, and Snowflake.
I studied Chemical Engineering at The Cooper Union (BE + ME) in New York, and that systems background shapes how I think about correctness and operational constraints.
Selected Work
- Real-Time Market Data Platform (GitHub, Dashboard)
Streaming market-data lakehouse ingesting ~1,580 events/sec sustained (~37M trades, quotes, and aggregates per 6.5-hour session) for 104 equities via Polygon WebSockets → Kafka → Spark Structured Streaming → Delta Lake on Databricks → Snowflake. End-to-end idempotent delivery (Kafka offsets coordinated with atomic Delta checkpoints, plus event-specific MERGE keys) achieves 100% Delta-to-Snowflake row parity and ~50× faster Snowflake loads (12 min down to ~15 s on 7.4M rows). dbt reconciliation marts validate streaming OHLCV against batch BigQuery ground truth.
Real-Time Market Data Platform: interactive Streamlit dashboard (live reconciliation & OHLCV metrics).
Hosted on Streamlit Community Cloud. May take a few seconds to wake.
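As a rough illustration of why event-specific MERGE keys make redelivery harmless, here is a minimal in-memory sketch. It is not the platform's code: the key columns in `MERGE_KEYS`, the event field names, and the `upsert` helper are all hypothetical, and a plain dict stands in for the Delta table.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical key columns per event type (assumed for illustration only).
MERGE_KEYS = {
    "trade": ("symbol", "trade_id"),
    "quote": ("symbol", "sequence"),
    "aggregate": ("symbol", "window_start"),
}


def merge_key(event: dict) -> Tuple:
    """Build the natural key that identifies one logical event."""
    cols = MERGE_KEYS[event["type"]]
    return (event["type"],) + tuple(event[c] for c in cols)


def upsert(table: Dict[Tuple, dict], batch: Iterable[dict]) -> Dict[Tuple, dict]:
    """Keyed upsert: a replayed event overwrites its own row instead of adding one."""
    for event in batch:
        table[merge_key(event)] = event
    return table


batch = [
    {"type": "trade", "symbol": "AAPL", "trade_id": 1, "price": 190.1},
    {"type": "trade", "symbol": "AAPL", "trade_id": 2, "price": 190.2},
    {"type": "aggregate", "symbol": "AAPL", "window_start": 0, "close": 190.2},
]

table: Dict[Tuple, dict] = {}
upsert(table, batch)
rows_after_first = len(table)
upsert(table, batch)  # simulate the same micro-batch being redelivered
rows_after_replay = len(table)
```

Because the write is keyed, replaying the batch leaves the row count unchanged, which is the property that makes at-least-once delivery safe downstream.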
- Equity Data Warehouse (GitHub)
Batch equity platform ingesting 500K+ daily bars, financials, dividends, and corporate actions for 104 equities over 20 years from Polygon into BigQuery via idempotent Parquet-first ingestion and Prefect orchestration. SCD2 security master with point-in-time identity resolution (FB→META, MWD→MS) and analytical dbt marts for TTM aggregation, split-adjusted pricing, and factor ranking, enforced by 43 dbt tests.
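Point-in-time identity resolution on an SCD2 table can be sketched as a lookup over validity intervals. This is a simplified illustration rather than the warehouse's actual model: the `TickerVersion` fields, the `resolve` helper, the `security_id` value, and the dates are assumptions chosen only to show the FB→META case.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class TickerVersion:
    security_id: int
    ticker: str
    valid_from: date
    valid_to: Optional[date]  # None marks the current row; interval is [valid_from, valid_to)


# Illustrative rows only: one security whose ticker changed from FB to META.
MASTER: List[TickerVersion] = [
    TickerVersion(1, "FB", date(2012, 5, 18), date(2022, 6, 9)),
    TickerVersion(1, "META", date(2022, 6, 9), None),
]


def resolve(rows: List[TickerVersion], ticker: str, as_of: date) -> Optional[int]:
    """Return the security_id that `ticker` referred to on `as_of`, if any."""
    for r in rows:
        if (
            r.ticker == ticker
            and r.valid_from <= as_of
            and (r.valid_to is None or as_of < r.valid_to)
        ):
            return r.security_id
    return None


old_id = resolve(MASTER, "FB", date(2020, 1, 2))
new_id = resolve(MASTER, "META", date(2023, 1, 3))
stale = resolve(MASTER, "FB", date(2023, 1, 3))
```

Both tickers resolve to the same `security_id` within their own validity windows, so historical bars recorded under the old symbol join to the same security as current ones, while a lookup of the old symbol after the change returns nothing.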
- Real-Time Analytical Platform
Built a real-time analytical platform on AWS Lambda processing operational time-series data across 4 industrial facilities (50+ input time series per facility), with configurable numerical workflows, drift-adjusted forecasting, and optimization-based parameter inference.
- Scaled Model Deployment Orchestration
Engineered a staged orchestration system to deploy 10k asset-monitoring models in ~2 months (vs. ~8–10 months manually), using dependency-aware sequencing and telemetry-driven throughput control to eliminate deployment instability.
Tech Stack
Resume
Links