Data Engineer (Remote)
Capital One
Apr 2020 - Current (5 years 2 months)
Engineered and maintained scalable data pipelines using AWS Glue, Apache Spark, and Azure Data Factory for efficient
data extraction, transformation, and loading, processing over 5 TB of data daily.
Led real-time data processing with Apache Kafka, Spark Streaming, PySpark, and Apache Flink, deriving insights from
streaming data sources that improved decision-making.
Collaborated with data science teams on Databricks and Azure Databricks platforms for data transformation, analysis, and
machine learning model deployment.
Managed cloud-based data warehouses including Amazon Redshift, Azure SQL DB, and Snowflake, implementing indexing,
partitioning, and optimization techniques that improved data access speed by 30%.
Designed