Data Engineer
Tata Consultancy Services Ltd
Sep 2018 - Jun 2022 (3 years 10 months)
end-to-end data analytics lifecycle, employing Python libraries (pandas, NumPy, scikit-learn, Matplotlib) for data collection, cleaning, mining, model development, validation, and visualization. Improved data processing speed by 40% by implementing PySpark on large streaming datasets. Implemented advanced transformation logic using big data technologies (Hadoop, MapReduce, HBase, Hive), including table partitioning in Hive and ORC file formats for optimized storage in Hadoop. Automated ETL tasks, reducing manual intervention by 30%. Used Airflow to schedule tasks for timely execution. Applied EMR for large-sca