Henry Orlando Clavo, MBA

About

Data Engineer at Consumer Financial Protection Bureau
Charlottesville, Virginia, United States

Contact Henry regarding: Full-time jobs

Résumé


Jobs
  • Course Developer / Author
    Educative Inc
    Jun 2023 - Current (2 years 2 months)
    Coming soon: a Chatbot Engineering course in development. Build your own chatbot and utilize large language models for your projects, websites, and cloud services.
  • Data Engineer
    Consumer Financial Protection Bureau
    May 2020 - Current (5 years 3 months)
    Supporting the agency's modernization efforts and technology implementation, including a new technology that added features released in 2021 as a joint effort between Consumer Response and Technology & Innovation. Daily use of Postgres, Docker, SQL, Python, AWS (EMR, S3, EC2), and Elasticsearch. 2021 projects:
    - Provide support to Complaint Analytics.
    - Develop Docker environments and settings for testing pipelines.
    - AWS development POC (stream and batch processing) using EMR and open-source D.Q. tools.
    - Provide support to cross-functional teams.
    - Create an ETL process to manage metadata for all data pipelines in use and create the data model.
    - Experience with AWS EMR, Kinesis, API Gateway, S3, Glue, CloudWatch, and Lambda.
    - Building doc
  • Data Scientist | Applied Intelligence
    Accenture Federal Services
    Jan 2020 - May 2020 (5 months)
    Discovery Lab product development projects:
    - Built a data collection automation process for management.
    - Used natural language processing to extract and analyze text data.
    - Used the spaCy package to extract text data from sources for a machine learning model.
    - Worked with the Program Manager to test use cases and scenarios for the product.
    - Developed automation processes using Python and SQL tools.
    - Worked with team members to polish and optimize pipeline code.
    - Developed data pipelines in Python from different data collection sources.
    - Web scraped and developed data collection scripts for automation using Beautiful Soup and Selenium.
    Operations Lead: promoted to lead AWS project efforts and manage two Data Engineers.
  • Operations Lead (Remote) | Applied Intelligence
    Accenture Federal Services
    Jan 2020 - May 2020 (5 months)
    Applied Intelligence: powered by Management Consulting and Technology, AFS Applied Intelligence (AIX) is a full-service, data-native practice group that helps federal departments and agencies unlock business value from their data, improve visibility and performance across the enterprise, enhance decision support and mission outcomes, and accelerate digital capabilities at scale. We are at the forefront of helping federal agencies apply more innovative approaches to achieve their mission objectives and improve operations. As leaders in advanced analytics, artificial intelligence (AI), and automation, our cross-disciplinary team of data science, data engineering and data visualization strategists and practitioners provide deep technical, fu
  • Business Analyst
    Verisk
    Nov 2017 - Nov 2019 (2 years 1 month)
    Earned the Verisk Way to Go Award for excellence in data analysis and research: an issue in which 1,300 GEICO policies were misreported from a data load was resolved through data analysis and collaboration with development to resolve reporting issues with the state DMV.
    Product Owner, Guidewire Accelerator: released a product that supports clients' data implementation process.
    Coverage Verifier Conversion Project: released the first iteration of a program to clean up the data preprocessing step.
    Data Quality Team: analyzed the current format of the data ingestion process; communicated with management on ways to improve data processing; captured missing data elements for each customer in the QA process; trained team members to report on data quality processes, business logic, and imple
  • Data Analyst
    MedAssets
    Oct 2016 - Nov 2017 (1 year 2 months)
    Client: RWJ Barnabas Health. Identified a data quality issue within the client's application and used SQL functions to update database records. Leveraged the test environment to quickly create and test a new table and data set, then pushed the update to production, quickly resolving the issue for one of Robert Wood Johnson's many New Jersey facilities that was experiencing data quality inefficiencies.
  • Consultant
    New Jersey Citizen Action
    Aug 2016 - Nov 2016 (4 months)
    Data Analyst: provided industry and technical knowledge on customer relationship management for a new startup and QA for a new data collection system. Jersey City project: managed the organization's project to provide financial services to communities in Jersey City; consulted on best practices for financial services and products for consumers in debt; collected data during interviews and provided an outline for clients looking to improve their finances and reduce spending.
  • Lead Data Scientist
    WeLIFTco
    Jan 2016 - Feb 2018 (2 years 2 months)
    As a data engineer, my role was to migrate processes from the old system to a new platform. Before analysis could begin, I built the underlying structures and pipelines; once that was complete, I generated reports and collected and managed data flow for the organization.
  • Statistical Analyst
    Ipsos
    May 2015 - May 2016 (1 year 1 month)
    Mastercard 2015 project: analyzed, cleaned, and standardized Mastercard credit card data for the North American market. Worked with the Ipsos Loyalty director, a Mastercard consultant, and an intern to prepare the data for analysis using outlier detection, statistical inference, and comparison of means to understand consumers' spending habits and the top ten credit cards in their wallets. The success of Mastercard's 2015 consumer and small business study led to its expansion in 2016 to cover the South American and Canadian markets. Personal accomplishment: taught my intern to use SPSS syntax to automate the data cleaning process, and worked together to prepare the data for our client.
  • Research Associate
    Kids Corporation II
    May 2014 - Jul 2015 (1 year 3 months)
    • Administered surveys to one hundred students for the summer program evaluation
    • Created surveys to collect data on the impact of afterschool programs in Newark Public Schools
    • Used Microsoft Excel and IBM SPSS to create charts and tables for statistical analysis
    • Created a final report for partners to continue funding Kids Corporation II for Summer 2015
  • Financial Fellow
    The Financial Clinic
    Jul 2013 - Aug 2014 (1 year 2 months)
    - Managed two sites in New Jersey for direct services and achieved contract obligations
    - Documented and tracked customer data for the data manager
    - Provided workshops and presentations for NY/NJ customers
    - Formed partnerships with non-profit organizations throughout New Jersey
    - Invited speaker for the Lift Up project in Savannah, Georgia, on services and issues that come with financial hardship
Education
  • Certification, Applied Python for Data Science
    Trivera Technologies
    Jan 2020
  • Executive MBA, Master of Business Administration
    Quantic School of Business and Technology
    Jan 2020 - Nov 2023 (3 years 11 months)
    Awarded a scholarship to attend Quantic School of Business and Technology, a highly selective, rigorous MBA program with a 7% acceptance rate. Courses in 9 subject areas: Accounting; Markets and Economies; Data and Decisions; Operations Management; Leading Organizations; Marketing and Pricing; Strategy; Entrepreneurship; Finance.
  • Certification, Data Collection and Management
    Montclair State University
    Jan 2016 - Jan 2017 (1 year 1 month)
  • Certification, Introduction to Python
    edX
    Jan 2016 - Current (9 years 7 months)
  • Bachelor's degree, Economics major with a minor in Political Science
    Rutgers University
    Jan 2008 - Jan 2013 (5 years 1 month)
Projects (professional or personal)
  • OpenCoreGPT: AI for Data Insights and ETL Workflows
    Dec 2024
    - AI-driven data profiling and ETL automation: develop an AI agent capable of profiling data files (CSV, Parquet, Excel) and generating customizable ETL workflows, leveraging LangChain for logic orchestration and local LLMs for privacy.
    - Secure and portable solution: the system is designed to be modular and extensible, allowing seamless integration into any data engineering project with minimal configuration, ensuring time efficiency and reusability.
    - Deployment and scalability: portability through containerization (Docker), deployable in local, on-premise, or cloud environments to suit diverse client requirements. (A sketch of the profiling step appears below.)
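    For illustration, a minimal non-LLM sketch of the profiling step, assuming pandas is installed (and pyarrow for Parquet); the profile_file function and example file name are hypothetical and not part of the actual OpenCoreGPT code.

      # Hypothetical sketch: lightweight file profiling of the kind the AI agent
      # would orchestrate; the real project layers LangChain and a local LLM on top.
      from pathlib import Path

      import pandas as pd


      def profile_file(path: str, sample_rows: int = 10_000) -> dict:
          """Return a lightweight profile (schema, null counts, numeric stats) for a data file."""
          suffix = Path(path).suffix.lower()
          if suffix == ".csv":
              df = pd.read_csv(path, nrows=sample_rows)
          elif suffix == ".parquet":
              df = pd.read_parquet(path)  # needs pyarrow or fastparquet
          elif suffix in (".xls", ".xlsx"):
              df = pd.read_excel(path, nrows=sample_rows)
          else:
              raise ValueError(f"Unsupported file type: {suffix}")

          numeric = df.select_dtypes(include="number")
          return {
              "rows_sampled": len(df),
              "columns": {col: str(dtype) for col, dtype in df.dtypes.items()},
              "null_counts": df.isna().sum().to_dict(),
              "numeric_summary": numeric.describe().to_dict() if not numeric.empty else {},
          }


      if __name__ == "__main__":
          print(profile_file("example.csv"))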
  • Federal Reserve Bank of Richmond: Cloud Implementation
    Aug 2024 - Jan 2025 (6 months)
    Provided consulting services in cloud implementation and the data extraction and ingestion process.
    - Automation & scalability: deployed automation tools and frameworks to enhance efficiency, reduce operational costs, and enable scalable solutions for future growth.
    - Stakeholder collaboration: worked closely with stakeholders to align cloud solutions with organizational objectives, ensuring a smooth transition and maximum ROI.
    - Cloud migration strategy: designed and executed a robust cloud migration strategy to transition legacy systems to a modern, scalable cloud infrastructure.
  • Northeastern University: Big Data Automation Pipeline
    Jul 2024 - Aug 2024 (2 months)
    Created and automated a data pipeline with CI/CD tests.
  • pigybak: Automated Testing and Risk Management Project
    May 2024
    Provided an automated testing framework and risk management services for a private startup. Work and recommendations will be published as a case study with the founder.
  • Data as a Service (DaaS): A Modern Approach to Data Delivery
    Jan 2024 - Oct 2024 (10 months)
    In the era of digital transformation, organizations increasingly rely on data to drive decision-making, optimize operations, and create new opportunities for growth. One emerging trend is the shift from static, in-house data management to Data as a Service (DaaS): a model that lets businesses access data on demand without managing complex infrastructure or dedicating resources to maintaining vast data sets.
  • Automated SFTP Azure Blob Storage
    Nov 2023
    Set up an Azure account and developed an automation process to copy large files to the network; a sketch of the upload step appears below.
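    A minimal sketch of the upload step, assuming the azure-storage-blob package and a connection string in an environment variable; the container and file names are placeholders, not the actual project configuration.

      # Hypothetical sketch: copy a local file into Azure Blob Storage.
      import os

      from azure.storage.blob import BlobServiceClient

      service = BlobServiceClient.from_connection_string(
          os.environ["AZURE_STORAGE_CONNECTION_STRING"]
      )
      container = service.get_container_client("incoming-sftp-files")  # placeholder container


      def upload_file(local_path: str, blob_name: str) -> None:
          """Stream a local file into Blob Storage, overwriting any existing blob."""
          with open(local_path, "rb") as fh:
              container.upload_blob(name=blob_name, data=fh, overwrite=True)


      if __name__ == "__main__":
          upload_file("exports/large_file.csv", "exports/large_file.csv")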
  • Adobe SSH Tunnel ETL Data Sync Pipeline
    Oct 2023 - Nov 2023 (2 months)
    Developed an idempotent data pipeline that connects to different database warehouses and transfers their data to multiple data lakes in the cloud.
  • SSH Tunnel ETL Data Sync Pipeline
    Oct 2023 - Nov 2023 (2 months)
    Developed an idempotent data pipeline that connects to different database warehouses and transfers their data to multiple data lakes in the cloud; a sketch of the tunnel-and-extract step appears below.
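    A minimal sketch of the tunnel-and-extract step, assuming the sshtunnel, SQLAlchemy, and pandas packages and a Postgres warehouse; the hosts, credentials, query, and output path are placeholders.

      # Hypothetical sketch: open an SSH tunnel to a warehouse, pull an incremental
      # extract, and stage it as Parquet for loading into a cloud data lake.
      import pandas as pd
      from sqlalchemy import create_engine
      from sshtunnel import SSHTunnelForwarder

      with SSHTunnelForwarder(
          ("bastion.example.com", 22),
          ssh_username="etl_user",
          ssh_pkey="/home/etl/.ssh/id_rsa",
          remote_bind_address=("warehouse.internal", 5432),
      ) as tunnel:
          engine = create_engine(
              f"postgresql+psycopg2://reader:secret@127.0.0.1:{tunnel.local_bind_port}/analytics"
          )
          df = pd.read_sql("SELECT * FROM public.orders WHERE updated_at >= CURRENT_DATE", engine)

      # Re-running the job for the same day overwrites the same file, keeping the step idempotent.
      df.to_parquet("orders_incremental.parquet", index=False)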
  • Arvest Banklift Project Initiative
    Oct 2023 - Dec 2023 (3 months)
    Responsible for setting up the data strategy to migrate data from on-prem to Google Cloud services. Provided recommendations and consultation on data assets using Collibra, Splunk, and Google Cloud.
  • Adobe Commerce Database Project
    Sep 2023 - Jan 2024 (5 months)
    Built and developed data pipelines for users adopting Adobe Commerce and migrated three different data sources for management. Outcome: the data is leveraged in Power BI to gather user insights and determine next steps for user adoption.
  • AWS S3 Bucket Data Processing
    Sep 2023
    Developed a data ingestion process in AWS to extract data and load multiple data files to different storage paths. It flexibly processes multiple file paths using Databricks clusters for performance. The process is idempotent: new data files are downloaded daily, then processed and loaded into Azure Data Lake Storage Gen2.
  • Adobe AWS S3 Bucket Data Processing
    Sep 2023
    Developed a data ingestion process in AWS to extract data and load multiple data files to different storage paths. It flexibly processes multiple file paths using Databricks clusters for performance. The process is idempotent: new data files are downloaded daily, then processed and loaded into Azure Data Lake Storage Gen2. A sketch of the idempotent download step appears below.
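    A minimal sketch of the idempotent daily download step, assuming boto3 and a local manifest of already-processed keys; the bucket, prefix, and paths are placeholders (the actual project runs equivalent logic on Databricks).

      # Hypothetical sketch: download only S3 objects that have not been processed yet.
      import json
      from pathlib import Path

      import boto3

      BUCKET = "adobe-daily-exports"      # placeholder bucket
      PREFIX = "landing/"                 # placeholder prefix
      MANIFEST = Path("processed_keys.json")

      s3 = boto3.client("s3")
      processed = set(json.loads(MANIFEST.read_text())) if MANIFEST.exists() else set()

      paginator = s3.get_paginator("list_objects_v2")
      for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
          for obj in page.get("Contents", []):
              key = obj["Key"]
              if key in processed or key.endswith("/"):
                  continue  # skip already-ingested files and folder markers
              target = Path("downloads") / Path(key).name
              target.parent.mkdir(parents=True, exist_ok=True)
              s3.download_file(BUCKET, key, str(target))
              processed.add(key)

      # Persist the manifest so the next daily run only picks up new files.
      MANIFEST.write_text(json.dumps(sorted(processed)))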
  • watsonx.data UI
    Aug 2023
    Loaded data into a data store using IBM's Presto engine and storage buckets to collect insights.
  • File Extraction Processor
    Jul 2023
    Developed a file extractor tool in Python to extract files from the cloud and load them to Databricks Azure storage. Outcome: extracted 500 files and processed 10,000,000 records.
  • Cloud Sync Job
    Jul 2023
    Developed a sync job in Databricks using Python to process 500 daily files, syncing them from an S3 bucket into Databricks.
  • Development Chatbot Course
    May 2023
    Developing a chatbot on AWS using Python and Docker. Utilizing natural language processing to extract text and regular expressions to manage files and perform data manipulation. Storing all extracted data in a data lake within the data architecture.
  • Splunk Cloud Integration
    Apr 2023 - May 2023 (2 months)
    Collected and transferred Okta data to Splunk. Created a location in Splunk to house Okta logs for review in dashboards.
  • Databricks ETL Migration Development
    Feb 2023
    - Migrated data from an S3 bucket to Azure Databricks through an API using Python.
    - Migrated Salesforce data through Airflow and Mage using Python and Databricks REST APIs.
    - Developed a Python script to extract 500 compressed files.
  • Salesforce Use Case
    Nov 2022 - Mar 2023 (5 months)
    Created a data pipeline using Salesforce's REST API to collect data and publish it to a database. Used Python's Simple-Salesforce and pandas DataFrames to build a batch process that can serve any Salesforce development. Outcome: a data pipeline with two stages (see the sketch below).
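    A minimal sketch of the two-stage batch process, assuming the Simple-Salesforce client and a SQLAlchemy-compatible target database; the credentials, query, and table name are placeholders.

      # Hypothetical sketch: stage 1 extracts via the Salesforce REST API,
      # stage 2 loads the batch into the target database.
      import os

      import pandas as pd
      from simple_salesforce import Salesforce
      from sqlalchemy import create_engine

      # Stage 1: extract.
      sf = Salesforce(
          username=os.environ["SF_USERNAME"],
          password=os.environ["SF_PASSWORD"],
          security_token=os.environ["SF_TOKEN"],
      )
      records = sf.query_all("SELECT Id, Name, CreatedDate FROM Account")["records"]
      df = pd.DataFrame(records).drop(columns="attributes")

      # Stage 2: load.
      engine = create_engine(os.environ["TARGET_DB_URL"])
      df.to_sql("salesforce_accounts", engine, if_exists="append", index=False)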
  • Financial Crimes Fraud Use Case
    Jul 2022 - Feb 2023 (8 months)
    AWS platform development and support. Used AWS DataSync, Athena, CloudWatch, Glue Data Catalog, and S3 to process JSON files from an on-premises location to the cloud, using serverless AWS tools to process files daily. The project was successfully implemented in production on February 9, 2023.
  • Pipeline Development and Data Quality
    Dec 2021 - May 2022 (6 months)
    Personal project to develop data quality rules for data assets. The Great Expectations module was used for the proof-of-concept design, and custom business-logic rules were created from templates to integrate with the QC processor technology. Outcome: generated a report on quarterly data and published established rules for three different data assets; implemented the rules on the QC processor to automate reports and push them to S3 buckets for the client. A sketch of the rule definitions appears below.
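    A minimal sketch of the proof-of-concept rule definitions, using the older Great Expectations Pandas API that was current at the time (recent releases use a different, suite-based API); the file, column names, and thresholds are placeholders.

      # Hypothetical sketch: declare a few business-logic expectations against a
      # quarterly extract and run the validation.
      import great_expectations as ge

      df = ge.read_csv("quarterly_extract.csv")

      df.expect_column_values_to_not_be_null("complaint_id")
      df.expect_column_values_to_be_between("amount", min_value=0, max_value=1_000_000)
      df.expect_column_values_to_be_in_set("status", ["OPEN", "CLOSED", "IN_PROGRESS"])

      # Validate and report; in the project, results fed the QC processor and the S3 reports.
      results = df.validate()
      print(results)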
  • CFPB Architecture Review Process
    Nov 2021 - Dec 2021 (2 months)
    Provided resolution for data assets related to consumer complaints. Created a proposal on the current architecture, provided an analysis of data growth, and proposed a solution to mitigate risk to data assets.
  • Elasticsearch Development
    Nov 2021 - Jan 2022 (3 months)
    Developed a process to index multiple CSV files into Elasticsearch 7 and a Python script to query multiple indexes for business cases. Developed a Docker environment to replicate the process for production, and refactored and developed an incremental process for the production pipeline. Implemented the Elasticsearch process to index millions of records; a sketch of the indexing step appears below.
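    A minimal sketch of the CSV-to-Elasticsearch indexing step, assuming the Elasticsearch 7 Python client; the host, index name, and file locations are placeholders.

      # Hypothetical sketch: bulk-index every row of a set of CSV files.
      import csv
      from pathlib import Path

      from elasticsearch import Elasticsearch
      from elasticsearch.helpers import bulk

      es = Elasticsearch("http://localhost:9200")  # placeholder host


      def csv_actions(paths, index_name):
          """Yield one bulk action per CSV row across all input files."""
          for path in paths:
              with open(path, newline="", encoding="utf-8") as fh:
                  for row in csv.DictReader(fh):
                      yield {"_index": index_name, "_source": row}


      files = sorted(Path("exports").glob("*.csv"))
      success, errors = bulk(es, csv_actions(files, "complaints-v1"), raise_on_error=False)
      print(f"indexed {success} documents, {len(errors)} errors")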
  • Data Quality & Governance
    Jul 2021 - Oct 2021 (4 months)
    Personal project to develop a taxonomy and present findings to improve the process, promoting good practices through the development of key metrics in a heavily regulated industry. A challenge to make things better through research.
  • Elasticsearch
    Apr 2021 - Apr 2022 (1 year 1 month)
    Implemented Python data processing from a database to the Elasticsearch API for a CFPB internal tool. Worked with a developer to provide business requirements to correct 200+ data points affecting the review process. Improved the product by refactoring code and testing it through the staging pipeline. Assisted project members with areas for improvement and reduced development time through research. Worked with the technical lead on the Consumer Products Team to solve data processing issues and present results to management. Project members: Irina Muchkin and Richard Dinh.
  • Metadata Extraction
    Jan 2021 - Jul 2021 (7 months)
    Consumer Response and Technology & Innovation project to extract and organize data using OOP and Python, creating visualizations through ETL (Extract, Transform, Load) into a Postgres database. Contributors and authors: Abbie Olson and Daniel Van Balen.
  • CFPB Concurrency Project
    Nov 2020 - Mar 2021 (5 months)
    Completed a script with contributors from the Technology and Innovation group to improve the processing of data points for consumer complaints. Data point processing efficiency was achieved, freeing up processing time and resources for OCR. Contributor and support: Christian Decker.
  • CFPB Technology Implementation
    Aug 2020 - Dec 2020 (5 months)
    Process selected for review by the CFPB Director as one of the key initiatives and accomplishments started in 2020. Joint effort with Technology and Innovation.
  • AWS ETL Automation Covid-19
    Mar 2020 - May 2020 (3 months)
    - Operational lead handling requests from data scientists.
    - Processed data to AWS S3 buckets and generated database tables for the project.
    - Worked with the Program Manager to build a scalable API architecture.
    - Initialized tables for the data ingestion process for Tableau.
    - Trained team members on Python processing and QA scripts.
    - Worked with the group to develop database functionality in AWS and the ETL process.
    Outcome: the product has been supported through funding and sold to government agencies, and the process has been documented for the Discovery Lab to use in future projects.
Awards
  • Verisk Way to Go
    Verisk
    Feb 2019
    Resolved an issue in which 1,300 GEICO policies were misreported from a data load, through data analysis and collaboration with development to resolve issues with the state DMV.
  • Honor Society
    Rutgers
    May 2013
Publications
  • Guidewire Accelerator Guide
    Verisk Analytics
    Jan 2018