Data Engineer
Larsen & Toubro Infotech Ltd (LTI)
Apr 2021 - Dec 2021 (9 months)
Engineered and maintained components for HDFS, Hive, Spark, and Kafka, handling an average of 1TB of data daily and improving data throughput by 25%.
Implemented parallelization strategies in Hive, optimizing over 500 searches and reducing query times by up to 40%.
Spearheaded a proof-of-concept cluster implementation for HBase, improving its performance and reducing its drawbacks by 30%.
Built and managed 100+ Hive target tables using HQL, facilitating the analysis of over 500GB of semi-structured data through Pig Latin scripts.