ybalasai reddy

About

Detail

Data Engineer
United States

Contact ybalasai regarding: 
Full-time jobs

Timeline


Job

Résumé


Jobs (0% verified)
  • MetLife
    DATA ENGINEER
    MetLife
    Jun 2023 - Current (3 years)
    • Imported data from diverse sources and designed and developed batch data pipelines on Amazon EMR using Apache Spark and Python, processing terabytes of data cost-effectively and at scale.
    • Engineered efficient ETL pipelines with AWS services, reducing data ingestion time by 20% and improving data availability.
    • Developed AWS Lambda functions in Python to perform transformations and analytics on datasets in EMR clusters, improving processing efficiency.
    • Built and maintained a scalable data infrastructure on AWS cloud technologies, leading to a 50% increase in data processing capacity and a reduction in infrastructure costs.
    • Designed and implemented bronze, silver, an
  • Cognizant
    DATA ENGINEER
    Cognizant
    Nov 2019 - Jul 2021 (1 year 9 months)
    • Utilized PyCharm and Jupyter Notebook on Azure for streamlined code development, data exploration, and analysis, leveraging Python libraries such as Matplotlib, SciPy, Scikit-learn, Seaborn, and TensorFlow for model creation, data manipulation, and visualization.
    • Designed and executed data warehousing solutions on Azure utilizing Python alongside relevant technologies to ensure optimized data storage and retrieval for sophisticated big data analytics.
    • Architected and deployed scalable data pipelines on Azure Data Factory and Databricks, utilizing PySpark for enhanced performance and efficiency, handling large data volumes (petabytes) and facilitating comprehensive data analytics at reduced costs.
    • Managed Spark, Hive, and Sqoop w
  • Birlasoft
    Jr. DATA ENGINEER
    Birlasoft
    Jan 2018 - Oct 2019 (1 year 10 months)
    • Conducted analysis of SQL scripts and devised solutions for implementation using PySpark.
    • Developed ETL jobs using AWS Glue Studio to extract data from Workday via APIs, transform it using Python and PySpark scripts, and load the transformed data into S3 buckets for reporting.
    • Employed HiveQL to process and analyze 2TB of data, leading to a 15% enhancement in sales/performance trend prediction accuracy.
    • Orchestrated an automated custom workflow for streamlining the ETL process using Apache Airflow, reducing manual labor costs by 30%, and designed efficient data models in DBT.
    • Optimized AWS EMR cluster configurations to reduce processing time for a 100TB dataset and achieve significant savings.
    • Performed data cleansing and
Education (0% verified)
  • Certifications
    AWS Certified Cloud Practitioner, Google Data Analytics, Python for Data Science
  • The University of Texas
    Master of Science
    The University of Texas
  • Madanapalli Institute of Technology and Science
    Bachelor of Science
    Madanapalli Institute of Technology and Science