Arslan Aslam

Islamabad, Pakistan

Résumé


Jobs
  • Data Engineer Intern
    BALANX-Bio
    Dec 2025 - Current (8 months)
    • Built and maintained end-to-end RAG (Retrieval-Augmented Generation) pipelines using LangChain, enabling semantic retrieval over 50K+ enterprise documents and reducing manual data lookup time by approximately 40%
    • Developed and tested scalable AI/data pipelines with structured unit testing and validation checks, improving workflow reliability and reducing pipeline failures by 30%
    • Optimized ETL processes across 5+ data sources, improving data freshness from daily to near real-time and ensuring consistent, accurate downstream reporting
    • Integrated vector embeddings and AI-driven workflows into existing infrastructure by collaborating with cross-functional teams, laying the groundwork for intelligent search capabilities
Education
  • Bachelor of Science in Computer Science
    Institute of Space Technology
    Sep 2024 - Current (1 year 11 months)
Projects (professional or personal)
  • AI-Powered E-Commerce Analytics Engine with RAG and Agentic Workflows
    Independent project
    Apr 2026 - May 2026 (2 months)
    • Engineered a Medallion Architecture pipeline (Bronze → Silver → Gold) orchestrated by Apache Airflow, transforming 100K+ raw Olist e-commerce records into a star schema with fact and dimension tables in PostgreSQL
    • Built a RAG pipeline using Voyage AI embeddings and ChromaDB, enabling semantic search over 40K+ customer reviews for qualitative business insights with sub-second query response times
    • Designed an agentic LangGraph workflow with intelligent query routing that automatically classifies user questions and directs them to SQL, RAG, or hybrid retrieval paths for comprehensive answers
    • Containerized the full stack (FastAPI, Streamlit, Airflow, PostgreSQL, ChromaDB) using Docker Compose, enabling one-command deployment and full re
  • Real-Time Smart City Data Streaming Pipeline on AWS
    Independent project
    Jan 2026 - Feb 2026 (2 months)
    • Architected a real-time streaming pipeline using Apache Kafka and Apache Airflow to ingest and process high-volume sensor data from 5+ smart city data streams
    • Leveraged AWS services (S3, Glue, Redshift) to build a scalable lakehouse-style storage layer supporting transformation and analytical querying
    • Containerized all pipeline components with Docker and implemented automated monitoring and error-handling logic, achieving 99%+ pipeline uptime
  • End-to-End AWS ETL Pipeline with Apache Airflow
    Independent project
    Oct 2025 - Nov 2025 (2 months)
    Workflow sequence:
    • I cleaned and validated the raw CSV files in an AWS Glue ETL Job Notebook so that the datasets were consistent and analysis-ready.
    • I stored the cleaned files securely in Amazon S3, which serves as the central data lake for the pipeline.
    • I created an AWS Glue Database and configured Crawlers to automatically infer the schema and build data catalogs for the cleaned datasets, making them queryable in Amazon Athena.
    • Using Amazon Athena, I wrote multiple SQL queries to extract insights and perform analytical operations such as data filtering, aggregation, and optimization directly on S3 data.
    • For workflow automation, I integrated Apache Airflow locally through Do
  • Sales ETL Pipeline
    Independent project
    Oct 2025 - Nov 2025 (2 months)
    Objective: To design a modular ETL (Extract, Transform, Load) process for sales data using modern data engineering tools and practices.
    Tools and technologies used:
    • Docker: to containerize Apache Airflow and PostgreSQL for smooth local orchestration.
    • Apache Airflow: to schedule and automate ETL workflows through DAGs with task dependencies and custom logging.
    • PostgreSQL (via DBeaver): to perform data cleaning, filtering, and transformations using modular SQL scripts.
    • Python (with psycopg2, Pandas, Jupyter Notebook): for connecting to the database, validating data, and handling transformations programmatically.
    Workflow sequence:
    • Extracted raw sales data and loaded it into PostgreSQL.
    • Performed data cleaning, filtering
  • ETL Pipeline
    Independent project
    Sep 2025
    What this project does:
    • Extracts raw transaction data into Databricks
    • Transforms the data with PySpark and SQL (cleaning, aggregation, filtering)
    • Loads structured results into tables for business insights
    Skills applied: Databricks | Apache Spark | PySpark | SQL | Data Engineering Concepts
  • Online Voting System
    Independent project
    Dec 2024 - Jan 2025 (2 months)
    A secure, province-based voting simulation implementing OOP and data management.
    Key features:
    • User Authentication: Login/registration system with password strength validation and unique voter ID checks.
    • Province-Based Voting: Voters can only vote for candidates from their registered province (Punjab, Sindh, Balochistan, KPK).
    • Admin Controls: View voter records, candidate vote counts, and election results.
    • Data Integrity: Input validation to prevent invalid votes or duplicate registrations.
    • Vote Tracking: Real-time vote counting and result declaration per province.
    Technical highlights:
    • OOP Concepts: Used struct to manage voter data (ID, province, votes) and modular functions for each operation.
    • Memory Management: Employed pointers to
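The province and registration rules listed for the Online Voting System can be illustrated with a short sketch. This is not the project's code: the original implementation language is not stated here (the description mentions struct and pointers), so Python is used for illustration only. The names `Voter`, `Election`, `register`, `add_candidate`, `vote`, and `results` are hypothetical, and the one-vote-per-voter rule is an assumption rather than something the description states. Password strength validation is left out because its rules are not given.

```python
from dataclasses import dataclass, field

# Provinces named in the project description.
PROVINCES = ("Punjab", "Sindh", "Balochistan", "KPK")


@dataclass
class Voter:
    voter_id: str
    province: str
    has_voted: bool = False  # assumption: each voter may vote once


@dataclass
class Election:
    voters: dict = field(default_factory=dict)      # voter_id -> Voter
    candidates: dict = field(default_factory=dict)  # candidate name -> province
    tally: dict = field(default_factory=dict)       # candidate name -> vote count

    def register(self, voter_id: str, province: str) -> None:
        """Register a voter, rejecting unknown provinces and duplicate IDs."""
        if province not in PROVINCES:
            raise ValueError(f"unknown province: {province}")
        if voter_id in self.voters:
            raise ValueError(f"voter ID already registered: {voter_id}")
        self.voters[voter_id] = Voter(voter_id, province)

    def add_candidate(self, name: str, province: str) -> None:
        """Add a candidate with a zero vote count."""
        if province not in PROVINCES:
            raise ValueError(f"unknown province: {province}")
        self.candidates[name] = province
        self.tally[name] = 0

    def vote(self, voter_id: str, candidate: str) -> None:
        """Count a vote only if the candidate is from the voter's province."""
        voter = self.voters.get(voter_id)
        if voter is None:
            raise ValueError("voter is not registered")
        if voter.has_voted:
            raise ValueError("voter has already voted")
        if self.candidates.get(candidate) != voter.province:
            raise ValueError("candidate is not from the voter's province")
        self.tally[candidate] += 1
        voter.has_voted = True

    def results(self, province: str) -> dict:
        """Return the current vote counts for one province."""
        return {name: self.tally[name]
                for name, prov in self.candidates.items()
                if prov == province}
```

Raising `ValueError` on every rejected operation keeps the validation in one place, so an admin view could call `results` per province without re-checking the data.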
Awards
  • Python (Basic)
    HackerRank
    May 2026 - Current (3 months)
  • Cloud Computing Fundamentals
    IBM SkillsBuild
    May 2026 - Current (3 months)
  • Data Fundamentals
    IBM SkillsBuild
    May 2026 - Current (3 months)
  • Problem Solving (Basic)
    HackerRank
    May 2026 - Current (3 months)
  • Retrieval-Augmented Generation for Enhanced AI Outputs
    IBM SkillsBuild
    May 2026 - Current (3 months)
  • SQL (Basic) Certificate
    HackerRank
    Aug 2025 - Current (1 year)
  • SQL (Intermediate) Certificate
    HackerRank
    Aug 2025 - Current (1 year)