Armaan Thapar

Data Engineer | AWS · Python · SQL | Kafka · Spark · Delta Lake

Email · GitHub · LinkedIn

About

Data Engineer at Georgia-Pacific with 4+ years of experience building production batch and analytical systems on AWS. More recently, I have built end-to-end streaming lakehouses (Kafka, Spark Structured Streaming, Delta Lake) for market data. My work centers on idempotent ingestion, orchestration, and data-quality reconciliation at scale, spanning real-time process simulation, large-scale deployment orchestration, and warehouse pipelines on dbt, Prefect, BigQuery, and Snowflake.

I studied Chemical Engineering at The Cooper Union (BEng + MEng) in New York, and that systems background shapes how I think about correctness and operational constraints.

Selected Work

Real-Time Market Data Platform: interactive Streamlit dashboard (live streaming vs. batch reconciliation and OHLCV metrics).

Hosted on Streamlit Community Cloud; the app may take a few seconds to wake.

Tech Stack

Python · SQL · Kafka · Spark Structured Streaming · Delta Lake · dbt · Snowflake · BigQuery · Databricks · Redshift · AWS Lambda · Prefect · Airflow · Docker

Resume