Sai Achyuth Dasari
Data Engineer
Albany, New York, United States

About
• 4 years of professional IT experience as a Data Engineer in Big Data environments and the Hadoop ecosystem, including Apache Spark, Hive, Sqoop, and Python.
• Sound experience with AWS services such as Amazon EC2, S3, EMR, RDS, VPC, Elastic Load Balancing, IAM, Auto Scaling, CloudFront, CloudWatch, and Lambda (to trigger resources).
• Experience building data pipelines with Azure Data Factory and Azure Databricks, loading data into Azure Data Lake, Azure SQL Database, and Azure SQL Data Warehouse, and controlling and granting database access.
• Good experience with Azure services such as HDInsight, Stream Analytics, Active Directory, Blob Storage, Cosmos DB, and Storage Explorer.
• Strong Hadoop and platform support experience with the entire suite of tools and services in the major Hadoop distributions: Cloudera, Amazon EMR, Azure HDInsight, and Hortonworks.
• Proficient in handling and ingesting terabytes of streaming data (Kafka, Spark Streaming, Storm) and batch data, with automation and scheduling using Oozie and Airflow.
• Profound knowledge of developing production-ready Spark applications using Spark components such as Spark SQL, DataFrames, Datasets, Spark ML, and Spark Streaming.
• Expertise in developing multiple Confluent Kafka producers and consumers to meet business requirements, storing the stream data in HDFS and processing it with Spark.
• Strong working experience with SQL and NoSQL databases (Cosmos DB, MongoDB, HBase, Cassandra), including data modeling, tuning, disaster recovery, backup, and building data pipelines.
• Experienced in scripting with Python (PySpark), Scala, and Spark SQL for development and for aggregating data from various file formats such as XML, JSON, CSV, and Parquet.
• Extensive experience in data analysis using HiveQL, Hive ACID tables, Pig Latin queries, and custom MapReduce programs, achieving improved performance.
• Extensive knowledge of all phases of Data Acquisition, Data Warehousing (gathering requirements, design, development, implementation, testing, and documentation), Data Modeling (Star and Snowflake schemas for fact and dimension tables), Data Processing, and Data Transformations (mapping, cleansing, monitoring, debugging, performance tuning, and troubleshooting Hadoop clusters).
• Experience monitoring document growth and estimating storage size for large MongoDB clusters as part of data lifecycle management.
• Expertise in creating Kubernetes clusters with CloudFormation templates and PowerShell scripts to automate deployment in cloud environments.
• Sound knowledge of developing highly scalable and resilient RESTful APIs, ETL solutions, and third-party integrations as part of the Enterprise Site platform using Informatica.
• Experience using bug-tracking and ticketing systems such as Jira and Remedy, and Git and SVN for version control.
• Highly involved in all facets of the SDLC using Waterfall and Agile Scrum methodologies.
• Involved in migrating legacy applications to cloud platforms using DevOps tools such as GitHub, Jenkins, Jira, Docker, and Slack.
• Experience maintaining the entire Hadoop cluster on AWS EMR.