Armaan Thapar

Data Engineer | Streaming Lakehouses · Kafka · Spark · Delta Lake

About

I'm a data engineer with 4+ years of experience building production batch and analytical systems on AWS and, more recently, end-to-end streaming lakehouses (Kafka, Spark Structured Streaming, Delta Lake) for market data. I focus on idempotent ingestion, orchestration, and data-quality reconciliation at scale. My work has ranged from real-time process simulation and large-scale deployment orchestration to warehouse pipelines on dbt, Prefect, BigQuery, and Snowflake.

I studied Chemical Engineering at The Cooper Union (BE + ME) in New York, and that systems background shapes how I think about correctness and operational constraints.

Selected Work

Real-Time Market Data Platform: interactive Streamlit dashboard (live reconciliation & OHLCV metrics).

Hosted on Streamlit Community Cloud; the app may take a few seconds to wake.

Tech Stack

Python · SQL · Kafka · Spark Structured Streaming · Delta Lake · dbt · Snowflake · BigQuery · Databricks · Redshift · AWS Lambda · Prefect · Airflow · Docker

Resume