Data Engineer
Personal Projects
Jan 2023 - Current (3 years 4 months)
• Built real-time and batch data pipelines using Python, Apache Airflow, and PostgreSQL for end-to-end ingestion, transformation, and enrichment.
• Developed modular web scrapers and automated Gmail-like inbox monitors using Selenium, Playwright, and proxy rotation.
• Structured clean JSON payloads from enriched data sources (APIs, web, emails) for downstream analytics and ML workflows.
• Containerized pipelines with Docker and deployed to AWS, following CI/CD best practices and lightweight monitoring.
• Applied asynchronous techniques and caching (Redis, in-memory) to reduce processing times and ensure job durability.