Roshan Sah

About

Data Engineer
Haslet, Texas, United States

Contact Roshan regarding:
  • Full-time jobs, starting at USD 98k/year

Résumé


Jobs
  • AT&T
    Data Engineer
    Sep 2022 - Current (3 years 9 months)
    • Design and execute robust, scalable data pipelines using AWS Glue, Lambda, and Kinesis, along with Apache Kafka for real-time data streaming; ensure efficient management and storage of expansive data sets.
    • Optimize and expand AWS S3 data lakes and Amazon Redshift data warehouses, applying data partitioning and compression techniques to enhance query performance on voluminous data.
    • Integrate Apache Spark and the Hadoop ecosystem within AWS EMR for intricate batch and stream processing, ensuring optimal resource utilization and heightened cluster performance.
    • Automate and monitor ETL/ELT workflows, leveraging AWS Lambda, Step Functions, and Apache Kafka to fac
  • Mastercard
    Data Engineer
    Sep 2021 - Aug 2022 (1 year)
    • Designed and implemented data pipelines using Azure Data Factory (ADF) to efficiently ingest and transform financial transaction data from various sources into Azure Data Lake Storage.
    • Leveraged Apache Kafka for real-time data streaming, ensuring timely ingestion of high-velocity data into the Azure environment.
    • Developed PySpark scripts on Azure Databricks to process and transform large datasets, optimizing for performance and scalability.
    • Integrated Snowflake as the primary cloud data warehouse solution on Azure, focusing on scalable data storage and efficient querying capabilities.
    • Utilized Snowflake's SnowSQL and native connectors to move data seamlessly between Azure Data Lake and Snowflake, streamlining ETL processes.
  • Fusemachines
    Data Engineer / Analyst
    May 2017 - Jul 2021 (4 years 3 months)
    • Designed and built a data platform on AWS handling over 5 TB of streaming data daily from IoT devices.
    • Developed PySpark ETL pipelines to extract and enrich 130 million rows of sensor data from Kafka and load them into Redshift.
    • Migrated the analytics database from on-prem SQL Server to Redshift using AWS DMS, improving performance by 35%.
    • Built a Delta Lake architecture on Databricks for a customer 360 machine learning initiative involving billions of data points.
    • Led the migration of legacy data workflows from Oozie to Airflow, improving system stability by 85%.
    • Automated CI/CD for big data ETL jobs using Jenkins and Kubernetes, leading to 62% quicker deployments.
    • Combined datasets from Oracle, MySQL, MongoDB, and CSVs into a
Education
  • University of North Texas
    Master of Science
    Jan 2021 - Jan 2023 (2 years 1 month)
  • Hajee Mohammad Danesh Science and Technology University
    Bachelor of Science
    Jan 2013 - Jan 2017 (4 years 1 month)