Python Data Engineer | AWS & Spark | Remote
EvolutionCode
Mar 2025 - Current (1 year)
Worked with international, cross-functional teams across the United States, China, and Japan, delivering data solutions primarily for KDDI, with English as the main working language.
Optimized large-scale data processing pipelines by migrating from Pandas to Apache Spark, reducing the runtime of an AWS Glue job processing 30M+ records from 1 hour 10 minutes to 2 minutes while cutting the number of workers required by 50%.
Designed and built scalable ETL systems using AWS Glue and PySpark, processing and transforming millions of records between Amazon S3 data layers.
Used Pandas for data analysis, validation, and preprocessing in auxiliary, non-production stages of ETL workflows.
Automated and orchestrated ETL workflows using AWS Step Fu