End-to-End AWS ETL Pipeline with Apache Airflow
Oct 2025 - Current (10 months)
⚙️ Workflow Sequence:
🔹 The project began with data cleaning and validation of raw CSV files in an AWS Glue ETL job notebook, ensuring the datasets were consistent and analysis-ready.
🔹 The cleaned files were then stored securely in Amazon S3, which served as the central data lake for the pipeline.
🔹 Next, I created an AWS Glue database and configured crawlers to infer the schema automatically and register the cleaned datasets as tables in the AWS Glue Data Catalog, making them queryable in Amazon Athena.
🔹 In Amazon Athena, I wrote multiple SQL queries to extract insights and performed analytical operations such as filtering, aggregation, and optimization directly on the S3 data.
🔹 For workflow automation, I integrated Apache Airflow locally through Docker
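The cleaning and validation step can be illustrated with a minimal sketch in plain Python. A Glue notebook would normally do this with PySpark or Glue DynamicFrames; this stand-in only shows the kind of rules involved (required columns, whitespace trimming, de-duplication, type checks). The column names `order_id`, `order_date`, and `amount` are hypothetical and not taken from the project.

```python
import csv
import io
from datetime import datetime

# Hypothetical schema: column name -> converter that raises ValueError on bad input.
SCHEMA = {
    "order_id": int,
    "order_date": lambda s: datetime.strptime(s, "%Y-%m-%d").date(),
    "amount": float,
}


def clean_rows(raw_csv: str):
    """Trim whitespace, drop exact duplicates, and split rows into valid and rejected."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    missing = set(SCHEMA) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")

    valid, rejected, seen = [], [], set()
    for row in reader:
        # Normalise whitespace in every expected field before validating.
        trimmed = {col: (row.get(col) or "").strip() for col in SCHEMA}
        key = tuple(trimmed.values())
        if key in seen:
            continue  # exact duplicate of a row already processed
        seen.add(key)
        try:
            valid.append({col: SCHEMA[col](val) for col, val in trimmed.items()})
        except ValueError:
            rejected.append(trimmed)  # empty or malformed value in some column
    return valid, rejected
```

Keeping rejected rows instead of silently dropping them makes it easy to write them to a separate S3 prefix for later inspection.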
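The Glue database and crawler setup can also be scripted rather than clicked through in the console. The sketch below assumes `glue` is a boto3 Glue client (`boto3.client("glue")`) and uses the real `create_database`, `create_crawler`, and `start_crawler` calls; the database name, crawler name, IAM role ARN, and S3 path are hypothetical parameters supplied by the caller.

```python
from typing import Any


def register_cleaned_data(glue: Any, database: str, crawler: str,
                          role_arn: str, s3_path: str) -> None:
    """Create a Glue database and a crawler over the cleaned S3 prefix, then run it.

    `glue` is expected to be a boto3 Glue client, e.g. boto3.client("glue").
    All names passed in are hypothetical placeholders chosen by the caller.
    """
    glue.create_database(DatabaseInput={"Name": database})
    glue.create_crawler(
        Name=crawler,
        Role=role_arn,            # IAM role the crawler assumes to read the S3 data
        DatabaseName=database,    # inferred tables are written into this database
        Targets={"S3Targets": [{"Path": s3_path}]},
    )
    glue.start_crawler(Name=crawler)
```

Against a real account, `create_database` and `create_crawler` raise `AlreadyExistsException` on a second run, so a rerunnable script would catch that and only call `start_crawler`.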
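Athena queries run asynchronously: a query is started, then polled until it finishes. The sketch below assumes `athena` is a boto3 Athena client (`boto3.client("athena")`) and uses the real `start_query_execution` and `get_query_execution` calls. A helper like this is one possible way to call Athena from an automated task; it is not necessarily how the project's Airflow setup does it. The table `orders_cleaned` in the example SQL and the S3 results path are hypothetical.

```python
import time
from typing import Any

TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}

# Hypothetical aggregation over a crawled table; not a query from the project.
EXAMPLE_SQL = """
SELECT order_date, SUM(amount) AS revenue
FROM orders_cleaned
WHERE amount > 0
GROUP BY order_date
"""


def run_athena_query(athena: Any, sql: str, database: str, output_location: str,
                     poll_seconds: float = 1.0, timeout_seconds: float = 300.0) -> str:
    """Start an Athena query and block until it reaches a terminal state.

    Returns the QueryExecutionId on success; raises RuntimeError if the query
    fails or is cancelled, and TimeoutError if it runs past the deadline.
    """
    qid = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},  # S3 prefix for results
    )["QueryExecutionId"]

    deadline = time.monotonic() + timeout_seconds
    while True:
        status = athena.get_query_execution(QueryExecutionId=qid)["QueryExecution"]["Status"]
        state = status["State"]
        if state == "SUCCEEDED":
            return qid
        if state in TERMINAL_STATES:
            reason = status.get("StateChangeReason", "")
            raise RuntimeError(f"Athena query {qid} {state}: {reason}")
        if time.monotonic() > deadline:
            raise TimeoutError(f"Athena query {qid} still {state} after {timeout_seconds}s")
        time.sleep(poll_seconds)  # QUEUED or RUNNING: wait and poll again
```

Raising on `FAILED` or `CANCELLED` means an orchestrator such as Airflow would mark the calling task as failed instead of continuing with missing results.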