Ahmad Khan, Senior Data Engineer at Northwell Health

Senior Data Engineer

Northwell Health

Apr 2022 - Current (4 years 3 months)

● Implemented Hadoop jobs on an EMR cluster, running Spark, Hive, and MapReduce jobs to process data for recommendation engines, transactional fraud analytics, and behavioral insights.
● Moved processed data from different sources to AWS Redshift using Spark on EMR and Python.
● Worked on ETL jobs through Spark with SQL, Hive, Streaming, and Kudu contexts.
● Working knowledge of Spark RDD, DataFrame API, Dataset API, Data Source API, Spark SQL, and Spark Streaming.
● Validated data transformations and performed end-to-end data validation for ETL and BI systems.
● Worked on the data pipeline used for all transformation SQL and Python script loads into Redshift for the incremental load process and ETL-Ta

Senior Azure Data Engineer

Grange Insurance

Nov 2019 - Apr 2022 (2 years 6 months)

● Developed Spark applications using Scala and Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
● Integrated HBase with Spark to import data into HBase and performed CRUD operations on HBase.
● Developed Python scripts for file validation in Databricks and automated the process using ADF.
● Created pipelines in ADF using Linked Services, Datasets, and Pipelines to extract, transform, and load data from sources such as Azure SQL, Blob Storage, Azure Synapse, Azure SQL Data Warehouse, and the write-back tool, and back again.
● Migrating on-prem ETLs from MS SQL Server to Azure Cloud

Data Engineer

Cargill

Jun 2018 - Oct 2019 (1 year 5 months)

● Performed data analysis, data migration, data cleansing, transformation, integration, data import, and data export through Python.
● Implemented Hadoop jobs on an EMR cluster, running Spark, Hive, and MapReduce jobs to process data for recommendation engines, transactional fraud analytics, and behavioral insights.
● Analyzed and developed complex SQL queries, stored procedures, and ETL mappings for application development.
● Developed code for importing and exporting data into HDFS and Hive using Sqoop.
● Worked on AWS and big data technologies such as HDFS, Hive, Sqoop, EMR, Spark, Redshift, EC2, and Data Pipeline.
● Developed a Python script to integrate DDL changes between on-prem Talend warehouse and sn

Data Engineer

Magellan Health

Jan 2017 - May 2018 (1 year 5 months)

● Worked on delivering major Hadoop ecosystem components such as Pig, Hive, and Spark.
● Managed and reviewed Hadoop log files as part of administration for troubleshooting.
● Defined job flows using Oozie to schedule and manage Apache Hadoop jobs as directed acyclic graphs (DAGs).
● Developed MapReduce programs to parse raw data, populate staging tables, and store the refined data in partitioned tables in the EDW.
● Created ETL mappings and enhanced existing mappings to facilitate data loads into the system.
● Loaded disparate data sets from different sources into the BDpaas (Hadoop) environment using Sqoop.
● Moved files between HDFS and AWS S3 and worked extensively with S3 buckets in AWS.
● Importing da

Hadoop Developer

Sonata Software

Aug 2016 - Nov 2016 (4 months)

● Configured Hadoop components (MapReduce, HDFS, CDH3, Pig, Hive, HBase, ZooKeeper) and developed Java MapReduce jobs for data processing.
● Imported/exported data between HDFS and Hive using Sqoop, managed NoSQL databases, and optimized MapReduce programs on the cluster.
● Loaded data from the UNIX file system to HDFS, installed Hive, created tables, and executed queries for data analysis.
● Implemented CDH3 Hadoop cluster, administered it, and designed HBase tables to handle diverse data formats, including PII.
● Utilized Pig scripts for complex data processing, coordinated cluster services using ZooKeeper, and exported analyzed data with Sqoop.
● Assisted in setting up the QA environment, configured Pig and Sqoop scripts, and ensured
