Senior Data/Pipeline Engineer with 9+ years of experience designing and building production-grade Python data pipelines that power AI-driven analytics platforms across SaaS, fintech, healthcare, and public-sector domains. Expert in Python 3.11, Apache Arrow (PyArrow), Parquet schema design, REST API integration at scale, async processing with asyncio, JSON/JSONL transformation, AWS-based data infrastructure (EC2, S3), and Docker.
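A minimal sketch of the async JSON/JSONL transformation work described above, using only asyncio and the standard library. The record fields, the normalize logic, and the concurrency limit are hypothetical, chosen purely for illustration.

```python
import asyncio
import json

async def normalize(line: str) -> dict:
    # Parse one JSONL record and coerce it to a stable, typed shape.
    # The asyncio.sleep(0) stands in for real async I/O (API call, lookup).
    await asyncio.sleep(0)
    rec = json.loads(line)
    return {
        "id": str(rec["id"]),
        "amount": float(rec.get("amount", 0.0)),
        "region": str(rec.get("region", "unknown")).lower(),
    }

async def transform(lines: list[str]) -> list[dict]:
    # Bounded concurrency keeps memory use and downstream load predictable.
    sem = asyncio.Semaphore(8)

    async def guarded(line: str) -> dict:
        async with sem:
            return await normalize(line)

    return await asyncio.gather(*(guarded(l) for l in lines))

if __name__ == "__main__":
    sample = ['{"id": 1, "amount": "9.5", "region": "US-East"}', '{"id": 2}']
    print(asyncio.run(transform(sample)))
```

The semaphore-guarded gather is one common way to cap fan-out; in production the same shape extends naturally to batching and backpressure.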
Extensive experience architecting ingestion pipelines from scratch: pagination, rate limiting, retries, incremental processing, schema normalization, deduplication, entity extraction, and LLM API integration. Strong background in structured and semi-structured data transformation, geographic aggregation, distributed processing, and production pipeline reliability.
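The pagination, retry, and deduplication patterns listed above can be sketched as follows. This is an illustrative, self-contained example: fetch_page, its cursor-based response shape, and the simulated transient failure are all invented stand-ins for a real REST API.

```python
import time

def fetch_page(cursor: int, flaky: dict) -> dict:
    # Simulated REST endpoint: 7 items total, 3 per page, cursor-based
    # pagination; fails once on its first call to exercise the retry path.
    if flaky.get("armed"):
        flaky["armed"] = False
        raise ConnectionError("transient upstream error")
    items = list(range(cursor, min(cursor + 3, 7)))
    next_cursor = cursor + 3 if cursor + 3 < 7 else None
    return {"items": items, "next": next_cursor}

def ingest(max_retries: int = 3, base_delay: float = 0.01) -> list[int]:
    seen: set[int] = set()   # dedup guard across pages
    out: list[int] = []
    cursor: int | None = 0
    flaky = {"armed": True}
    while cursor is not None:
        for attempt in range(max_retries):
            try:
                page = fetch_page(cursor, flaky)
                break
            except ConnectionError:
                if attempt == max_retries - 1:
                    raise
                time.sleep(base_delay * 2 ** attempt)  # exponential backoff
        for item in page["items"]:
            if item not in seen:
                seen.add(item)
                out.append(item)
        cursor = page["next"]
    return out

if __name__ == "__main__":
    print(ingest())  # → [0, 1, 2, 3, 4, 5, 6]
```

Swapping the simulated fetch for a real HTTP client (and the sleep for a token-bucket rate limiter) turns this skeleton into a production ingestion loop.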
Proven ability to work autonomously in startup-style, async-first environments: high ownership, direct founder collaboration, minimal process overhead, and high-impact architectural decisions.
Fluent in English, with daily working overlap with US Eastern time.