Henry Orlando Clavo, MBA

About

Data Engineer at Consumer Financial Protection Bureau
Charlottesville, Virginia, United States

Contact Henry regarding: Full-time jobs

Résumé


Jobs
  • Course Developer / Author
    Educative Inc
    Jun 2023 - Current (2 years 2 months)
    Coming soon: a Chatbot Engineering course in development. Build your own chatbot and utilize large language models for your projects, websites, and cloud services.
  • Data Engineer
    Consumer Financial Protection Bureau
    May 2020 - Current (5 years 3 months)
    Supporting the agency's modernization efforts and technology implementation, including a new technology that added features released in 2021 as a joint effort between Consumer Response and Technology & Innovation. Daily use of Postgres, Docker, SQL, Python, AWS (EMR, S3, EC2), and Elasticsearch. 2021 projects:
    - Provide support to Complaint Analytics.
    - Develop Docker environments and settings for testing pipelines.
    - AWS development POC (stream and batch processing) using EMR and open-source D.Q. tools.
    - Provide support to cross-functional teams.
    - Create an ETL process to manage metadata for all data pipelines in use and create the data model.
    - Experience with AWS EMR, Kinesis, API Gateway, S3, Glue, CloudWatch, and Lambda.
    - Building doc
  • Data Scientist | Applied Intelligence
    Accenture Federal Services
    Jan 2020 - May 2020 (5 months)
    Discovery Lab product development projects:
    - Built a data collection automation process for management.
    - Used natural language processing to extract and analyze text data.
    - Used the spaCy package to extract text data from sources for a machine learning model.
    - Worked with the Program Manager to test use cases and scenarios for the product.
    - Developed automation processes using Python and SQL tools.
    - Worked with team members to polish and optimize pipeline code.
    - Developed data pipelines in Python from different data collection sources.
    - Web scraped and developed data collection scripts for automation using Beautiful Soup and Selenium.
    Operations Lead: promoted to lead AWS project efforts and manage two Data Engineers.
  • Operations Lead (Remote) | Applied Intelligence
    Accenture Federal Services
    Jan 2020 - May 2020 (5 months)
    Applied Intelligence: powered by Management Consulting and Technology, AFS Applied Intelligence (AIX) is a full-service, data-native practice group that helps federal departments and agencies unlock business value from their data, improve visibility and performance across the enterprise, enhance decision support and mission outcomes, and accelerate digital capabilities at scale. We are at the forefront of helping federal agencies apply more innovative approaches to achieve their mission objectives and improve operations. As leaders in advanced analytics, artificial intelligence (AI), and automation, our cross-disciplinary team of data science, data engineering and data visualization strategists and practitioners provide deep technical, fu
  • Business Analyst
    Verisk
    Nov 2017 - Nov 2019 (2 years 1 month)
    Earned the Verisk Way to Go Award for excellence in data analysis and research: an issue in which 1,300 GEICO policies were misreported from a data load was resolved through data analysis and collaboration with development to resolve reporting issues with the state DMV.
    Product Owner, Guidewire Accelerator: released a product that supports clients' data implementation process.
    Coverage Verifier Conversion Project: released the first iteration of a program to clean up the data preprocessing step.
    Data Quality Team: analyzed the current format of the data ingestion process; communicated with management on ways to improve data processing; captured missing data elements for each customer in the QA process; trained team members to report on data quality processes, business logic, and imple
  • Data Analyst
    MedAssets
    Oct 2016 - Nov 2017 (1 year 2 months)
    Client: RWJ Barnabas Health. Identified a data quality issue within the client's application and used SQL functions to update database records. Leveraged the test environment to quickly create and test a new table and data set, then pushed the update to production, quickly resolving the issue for one of Robert Wood Johnson's many New Jersey facilities that was experiencing data quality inefficiencies.
  • Consultant
    New Jersey Citizen Action
    Aug 2016 - Nov 2016 (4 months)
    Data Analyst: provided industry and technical knowledge on customer relationship management for a new startup and QA for a new data collection system. Jersey City project: managed the organization's project to provide financial services to communities in Jersey City; consulted on best practices for financial services and products for consumers in debt; collected data during interviews and provided an outline for clients looking to improve their finances and reduce spending.
  • Lead Data Scientist
    WeLIFTco
    Jan 2016 - Feb 2018 (2 years 2 months)
    As a data engineer, my role was to migrate processes from the old system to a new platform. Before analysis could begin, I built the underlying structures and pipelines; once that was complete, I generated reports and collected and managed data flow for the organization.
  • Statistical Analyst
    Ipsos
    May 2015 - May 2016 (1 year 1 month)
    Mastercard 2015 project: analyzed, cleaned, and standardized Mastercard credit card data for the North American market. Worked with the Ipsos Loyalty director, a Mastercard consultant, and an intern to prepare the data for analysis using outlier detection, statistical inference, and comparison of means to understand consumers' spending habits and the top ten credit cards in their wallets. The success of Mastercard's 2015 consumer and small business study led to its expansion in 2016 to cover the South American and Canadian markets. Personal accomplishment: taught my intern to use SPSS syntax to automate the data cleaning process, and worked together to prepare the data for our client.
  • Research Associate
    Kids Corporation II
    May 2014 - Jul 2015 (1 year 3 months)
    • Administered surveys to one hundred students for the summer program evaluation
    • Created surveys to collect data on the impact of afterschool programs in Newark Public Schools
    • Used Microsoft Excel and IBM SPSS to create charts and tables for statistical analysis
    • Created a final report for partners to continue funding Kids Corporation II for Summer 2015
  • Financial Fellow
    The Financial Clinic
    Jul 2013 - Aug 2014 (1 year 2 months)
    - Managed two sites in New Jersey for direct services and achieved contract obligations
    - Documented and tracked customer data for the data manager
    - Provided workshops and presentations for NY/NJ customers
    - Formed partnerships with non-profit organizations throughout New Jersey
    - Invited speaker for the Lift Up project in Savannah, Georgia, on services and issues that come with financial hardship
Education
  • Certification, Applied Python for Data Science
    Trivera Technologies
    Jan 2020
  • Executive MBA, Master of Business Administration
    Quantic School of Business and Technology
    Jan 2020 - Nov 2023 (3 years 11 months)
    Awarded a scholarship to attend Quantic School of Business and Technology, a highly selective, rigorous MBA program with a 7% acceptance rate. Courses in 9 subject areas: Accounting; Markets and Economies; Data and Decisions; Operations Management; Leading Organizations; Marketing and Pricing; Strategy; Entrepreneurship; Finance.
  • Certification, Data Collection and Management
    Montclair State University
    Jan 2016 - Jan 2017 (1 year 1 month)
  • Certification, Introduction to Python
    edX
    Jan 2016 - Current (9 years 7 months)
  • Bachelor's degree, Economics major with a minor in Political Science
    Rutgers University
    Jan 2008 - Jan 2013 (5 years 1 month)
Projects (professional or personal)
  • OpenCoreGPT: AI for Data Insights and ETL Workflows
    Dec 2024
    - AI-driven data profiling and ETL automation: develop an AI agent capable of profiling data files (CSV, Parquet, Excel) and generating customizable ETL workflows, leveraging LangChain for logic orchestration and local LLMs for privacy.
    - Secure and portable solution: the system is designed to be modular and extensible, allowing seamless integration into any data engineering project with minimal configuration, ensuring time efficiency and reusability.
    - Deployment and scalability: portability through containerization (Docker), deployable in local, on-premise, or cloud environments to suit diverse client requirements. (A sketch of the profiling step appears below.)
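    For illustration, a minimal non-LLM sketch of the profiling step, assuming pandas is installed (and pyarrow for Parquet); the profile_file function and example file name are hypothetical and not part of the actual OpenCoreGPT code.

      # Hypothetical sketch: lightweight file profiling of the kind the AI agent
      # would orchestrate; the real project layers LangChain and a local LLM on top.
      from pathlib import Path

      import pandas as pd


      def profile_file(path: str, sample_rows: int = 10_000) -> dict:
          """Return a lightweight profile (schema, null counts, numeric stats) for a data file."""
          suffix = Path(path).suffix.lower()
          if suffix == ".csv":
              df = pd.read_csv(path, nrows=sample_rows)
          elif suffix == ".parquet":
              df = pd.read_parquet(path)  # needs pyarrow or fastparquet
          elif suffix in (".xls", ".xlsx"):
              df = pd.read_excel(path, nrows=sample_rows)
          else:
              raise ValueError(f"Unsupported file type: {suffix}")

          numeric = df.select_dtypes(include="number")
          return {
              "rows_sampled": len(df),
              "columns": {col: str(dtype) for col, dtype in df.dtypes.items()},
              "null_counts": df.isna().sum().to_dict(),
              "numeric_summary": numeric.describe().to_dict() if not numeric.empty else {},
          }


      if __name__ == "__main__":
          print(profile_file("example.csv"))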
  • Federal Reserve Bank of Richmond: Cloud Implementation
    Aug 2024 - Jan 2025 (6 months)
    Provided consulting services in cloud implementation and the data extraction and ingestion process.
    - Automation & scalability: deployed automation tools and frameworks to enhance efficiency, reduce operational costs, and enable scalable solutions for future growth.
    - Stakeholder collaboration: worked closely with stakeholders to align cloud solutions with organizational objectives, ensuring a smooth transition and maximum ROI.
    - Cloud migration strategy: designed and executed a robust cloud migration strategy to transition legacy systems to a modern, scalable cloud infrastructure.
  • Northeastern University: Big Data Automation Pipeline
    Jul 2024 - Aug 2024 (2 months)
    Created and automated a data pipeline with CI/CD tests.
  • pigybak: Automated Testing and Risk Management Project
    May 2024
    Provided an automated testing framework and risk management services for a private startup. Work and recommendations will be published as a case study with the founder.
  • Data as a Service (DaaS): A Modern Approach to Data Delivery
    Jan 2024 - Oct 2024 (10 months)
    In the era of digital transformation, organizations increasingly rely on data to drive decision-making, optimize operations, and create new opportunities for growth. One emerging trend is the shift from static, in-house data management to Data as a Service (DaaS): a model that lets businesses access data on demand without managing complex infrastructure or dedicating resources to maintaining vast data sets.
  • Automated SFTP Azure Blob Storage
    Nov 2023
    Set up an Azure account and developed an automation process to copy large files to the network; a sketch of the upload step appears below.
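    A minimal sketch of the upload step, assuming the azure-storage-blob package and a connection string in an environment variable; the container and file names are placeholders, not the actual project configuration.

      # Hypothetical sketch: copy a local file into Azure Blob Storage.
      import os

      from azure.storage.blob import BlobServiceClient

      service = BlobServiceClient.from_connection_string(
          os.environ["AZURE_STORAGE_CONNECTION_STRING"]
      )
      container = service.get_container_client("incoming-sftp-files")  # placeholder container


      def upload_file(local_path: str, blob_name: str) -> None:
          """Stream a local file into Blob Storage, overwriting any existing blob."""
          with open(local_path, "rb") as fh:
              container.upload_blob(name=blob_name, data=fh, overwrite=True)


      if __name__ == "__main__":
          upload_file("exports/large_file.csv", "exports/large_file.csv")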
  • Adobe SSH Tunnel ETL Data Sync Pipeline
    Oct 2023 - Nov 2023 (2 months)
    Developed an idempotent data pipeline that connects to different database warehouses and transfers their data to multiple data lakes in the cloud.
  • SSH Tunnel ETL Data Sync Pipeline
    Oct 2023 - Nov 2023 (2 months)
    Developed an idempotent data pipeline that connects to different database warehouses and transfers their data to multiple data lakes in the cloud; a sketch of the tunnel-and-extract step appears below.
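    A minimal sketch of the tunnel-and-extract step, assuming the sshtunnel, SQLAlchemy, and pandas packages and a Postgres warehouse; the hosts, credentials, query, and output path are placeholders.

      # Hypothetical sketch: open an SSH tunnel to a warehouse, pull an incremental
      # extract, and stage it as Parquet for loading into a cloud data lake.
      import pandas as pd
      from sqlalchemy import create_engine
      from sshtunnel import SSHTunnelForwarder

      with SSHTunnelForwarder(
          ("bastion.example.com", 22),
          ssh_username="etl_user",
          ssh_pkey="/home/etl/.ssh/id_rsa",
          remote_bind_address=("warehouse.internal", 5432),
      ) as tunnel:
          engine = create_engine(
              f"postgresql+psycopg2://reader:secret@127.0.0.1:{tunnel.local_bind_port}/analytics"
          )
          df = pd.read_sql("SELECT * FROM public.orders WHERE updated_at >= CURRENT_DATE", engine)

      # Re-running the job for the same day overwrites the same file, keeping the step idempotent.
      df.to_parquet("orders_incremental.parquet", index=False)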
  • Arvest Banklift Project Initiative
    Oct 2023 - Dec 2023 (3 months)
    Responsible for setting up the data strategy to migrate data from on-prem to Google Cloud services. Provided recommendations and consultation on data assets using Collibra, Splunk, and Google Cloud.
  • Adobe Commerce Database Project
    Sep 2023 - Jan 2024 (5 months)
    Built and developed data pipelines for users adopting Adobe Commerce and migrated three different data sources for management. Outcome: the data is leveraged in Power BI to gather user insights and determine next steps for user adoption.
  • AWS S3 Bucket Data Processing
    Sep 2023
    Developed a data ingestion process in AWS to extract data and load multiple data files to different storage paths. It flexibly processes multiple file paths using Databricks clusters for performance. The process is idempotent: new data files are downloaded daily, then processed and loaded into Azure Data Lake Storage Gen2.
  • Adobe AWS S3 Bucket Data Processing
    Sep 2023
    Developed a data ingestion process in AWS to extract data and load multiple data files to different storage paths. It flexibly processes multiple file paths using Databricks clusters for performance. The process is idempotent: new data files are downloaded daily, then processed and loaded into Azure Data Lake Storage Gen2. A sketch of the idempotent download step appears below.
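    A minimal sketch of the idempotent daily download step, assuming boto3 and a local manifest of already-processed keys; the bucket, prefix, and paths are placeholders (the actual project runs equivalent logic on Databricks).

      # Hypothetical sketch: download only S3 objects that have not been processed yet.
      import json
      from pathlib import Path

      import boto3

      BUCKET = "adobe-daily-exports"      # placeholder bucket
      PREFIX = "landing/"                 # placeholder prefix
      MANIFEST = Path("processed_keys.json")

      s3 = boto3.client("s3")
      processed = set(json.loads(MANIFEST.read_text())) if MANIFEST.exists() else set()

      paginator = s3.get_paginator("list_objects_v2")
      for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
          for obj in page.get("Contents", []):
              key = obj["Key"]
              if key in processed or key.endswith("/"):
                  continue  # skip already-ingested files and folder markers
              target = Path("downloads") / Path(key).name
              target.parent.mkdir(parents=True, exist_ok=True)
              s3.download_file(BUCKET, key, str(target))
              processed.add(key)

      # Persist the manifest so the next daily run only picks up new files.
      MANIFEST.write_text(json.dumps(sorted(processed)))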
  • watsonx.data UI
    Aug 2023
    Loaded data into a data store using IBM's Presto engine and storage buckets to collect insights.
  • File Extraction Processor
    Jul 2023
    Developed a file extractor tool in Python to extract files from the cloud and load them to Databricks Azure storage. Outcome: extracted 500 files and processed 10,000,000 records.
  • Cloud Sync Job
    Jul 2023
    Developed a sync job in Databricks using Python to process 500 daily files, syncing them from an S3 bucket into Databricks.
  • Development Chatbot Course
    May 2023
    Developing a chatbot on AWS using Python and Docker. Utilizing natural language processing to extract text and regular expressions to manage files and perform data manipulation. Storing all extracted data in a data lake within the data architecture.
  • Splunk Cloud Integration
    Apr 2023 - May 2023 (2 months)
    Collected and transferred Okta data to Splunk. Created a location in Splunk to house Okta logs for review in dashboards.
  • Databricks ETL Migration Development
    Feb 2023
    - Migrated data from an S3 bucket to Azure Databricks through an API using Python.
    - Migrated Salesforce data through Airflow and Mage using Python and Databricks REST APIs.
    - Developed a Python script to extract 500 compressed files.
  • Salesforce Use Case
    Nov 2022 - Mar 2023 (5 months)
    Created a data pipeline using Salesforce's REST API to collect data and publish it to a database. Used Python's Simple-Salesforce and pandas DataFrames to build a batch process that can serve any Salesforce development. Outcome: a data pipeline with two stages (see the sketch below).
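    A minimal sketch of the two-stage batch process, assuming the Simple-Salesforce client and a SQLAlchemy-compatible target database; the credentials, query, and table name are placeholders.

      # Hypothetical sketch: stage 1 extracts via the Salesforce REST API,
      # stage 2 loads the batch into the target database.
      import os

      import pandas as pd
      from simple_salesforce import Salesforce
      from sqlalchemy import create_engine

      # Stage 1: extract.
      sf = Salesforce(
          username=os.environ["SF_USERNAME"],
          password=os.environ["SF_PASSWORD"],
          security_token=os.environ["SF_TOKEN"],
      )
      records = sf.query_all("SELECT Id, Name, CreatedDate FROM Account")["records"]
      df = pd.DataFrame(records).drop(columns="attributes")

      # Stage 2: load.
      engine = create_engine(os.environ["TARGET_DB_URL"])
      df.to_sql("salesforce_accounts", engine, if_exists="append", index=False)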
  • Financial Crimes Fraud Use Case
    Jul 2022 - Feb 2023 (8 months)
    AWS platform development and support. Used AWS DataSync, Athena, CloudWatch, Glue Data Catalog, and S3 to process JSON files from an on-premises location to the cloud, using serverless AWS tools to process files daily. The project was successfully implemented in production on February 9, 2023.
  • Pipeline Development and Data Quality
    Dec 2021 - May 2022 (6 months)
    Personal project to develop data quality rules for data assets. The Great Expectations module was used for the proof-of-concept design, and custom business-logic rules were created from templates to integrate with the QC processor technology. Outcome: generated a report on quarterly data and published established rules for three different data assets; implemented the rules on the QC processor to automate reports and push them to S3 buckets for the client. A sketch of the rule definitions appears below.
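    A minimal sketch of the proof-of-concept rule definitions, using the older Great Expectations Pandas API that was current at the time (recent releases use a different, suite-based API); the file, column names, and thresholds are placeholders.

      # Hypothetical sketch: declare a few business-logic expectations against a
      # quarterly extract and run the validation.
      import great_expectations as ge

      df = ge.read_csv("quarterly_extract.csv")

      df.expect_column_values_to_not_be_null("complaint_id")
      df.expect_column_values_to_be_between("amount", min_value=0, max_value=1_000_000)
      df.expect_column_values_to_be_in_set("status", ["OPEN", "CLOSED", "IN_PROGRESS"])

      # Validate and report; in the project, results fed the QC processor and the S3 reports.
      results = df.validate()
      print(results)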
  • CFPB Architecture Review Process
    Nov 2021 - Dec 2021 (2 months)
    Provided resolution for data assets related to consumer complaints. Created a proposal on the current architecture, provided an analysis of data growth, and proposed a solution to mitigate risk to data assets.
  • Elasticsearch Development
    Nov 2021 - Jan 2022 (3 months)
    Developed a process to index multiple CSV files into Elasticsearch 7 and a Python script to query multiple indexes for business cases. Developed a Docker environment to replicate the process for production, and refactored and developed an incremental process for the production pipeline. Implemented the Elasticsearch process to index millions of records; a sketch of the indexing step appears below.
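    A minimal sketch of the CSV-to-Elasticsearch indexing step, assuming the Elasticsearch 7 Python client; the host, index name, and file locations are placeholders.

      # Hypothetical sketch: bulk-index every row of a set of CSV files.
      import csv
      from pathlib import Path

      from elasticsearch import Elasticsearch
      from elasticsearch.helpers import bulk

      es = Elasticsearch("http://localhost:9200")  # placeholder host


      def csv_actions(paths, index_name):
          """Yield one bulk action per CSV row across all input files."""
          for path in paths:
              with open(path, newline="", encoding="utf-8") as fh:
                  for row in csv.DictReader(fh):
                      yield {"_index": index_name, "_source": row}


      files = sorted(Path("exports").glob("*.csv"))
      success, errors = bulk(es, csv_actions(files, "complaints-v1"), raise_on_error=False)
      print(f"indexed {success} documents, {len(errors)} errors")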
  • Data Quality & Governance
    Jul 2021 - Oct 2021 (4 months)
    Personal project to develop a taxonomy and present findings to improve the process, promoting good practices through the development of key metrics in a heavily regulated industry. A challenge to make things better through research.
  • Elasticsearch
    Apr 2021 - Apr 2022 (1 year 1 month)
    Implemented Python data processing from a database to the Elasticsearch API for a CFPB internal tool. Worked with a developer to provide business requirements to correct 200+ data points affecting the review process. Improved the product by refactoring code and testing it through the staging pipeline. Assisted project members with areas for improvement and reduced development time through research. Worked with the technical lead on the Consumer Products Team to solve data processing issues and present results to management. Project members: Irina Muchkin and Richard Dinh.
  • Metadata Extraction
    Jan 2021 - Jul 2021 (7 months)
    Consumer Response and Technology & Innovation project to extract and organize data using OOP and Python, creating visualizations through ETL (Extract, Transform, Load) into a Postgres database. Contributors and authors: Abbie Olson and Daniel Van Balen.
  • CFPB Concurrency Project
    Nov 2020 - Mar 2021 (5 months)
    Completed a script with contributors from the Technology and Innovation group to improve the processing of data points for consumer complaints. Data point processing efficiency was achieved, freeing up processing time and resources for OCR. Contributor and support: Christian Decker.
  • CFPB Technology Implementation
    Aug 2020 - Dec 2020 (5 months)
    Process selected for review by the CFPB Director as one of the key initiatives and accomplishments started in 2020. Joint effort with Technology and Innovation.
  • AWS ETL Automation Covid-19
    Mar 2020 - May 2020 (3 months)
    - Operational lead handling requests from data scientists.
    - Processed data to AWS S3 buckets and generated database tables for the project.
    - Worked with the Program Manager to build a scalable API architecture.
    - Initialized tables for the data ingestion process for Tableau.
    - Trained team members on Python processing and QA scripts.
    - Worked with the group to develop database functionality in AWS and the ETL process.
    Outcome: the product has been supported through funding and sold to government agencies, and the process has been documented for the Discovery Lab to use in future projects.
Awards
  • Verisk Way to Go
    Verisk
    Feb 2019
    Resolved an issue in which 1,300 GEICO policies were misreported from a data load, through data analysis and collaboration with development to resolve issues with the state DMV.
  • Honor Society
    Rutgers
    May 2013
Publications
  • Guidewire Accelerator Guide
    Verisk Analytics
    Jan 2018