Varun Sharma

About

Detail

Software Developer | AI/ML Systems | Distributed Data Pipelines | Python, SQL, APIs | MS in Data Science
California, United States

Contact Varun regarding:
  • Full-time jobs (starting at USD 75k/year)
  • Internships (starting at USD 3.2k/month)
  • Finding co-founders
  • Networking

Résumé


Jobs
  • Software Developer
    Modern Streaming
    Dec 2025 - Current (7 months)
    Built a multimodal processing pipeline to analyze and index 500+ hours of unstructured telemetry data using YOLOv8, OpenCV, and LLM-based components, enabling searchable retrieval for auditing use cases. Reduced data processing overhead by ~40% by implementing a multi-threaded extraction pipeline with frame-skipping logic, improving throughput for high-resolution video streams. Developed validation workflows combining deep learning models with classical computer vision techniques (e.g., HSV thresholding) to support compliance-related checks. Improved inference efficiency by applying model optimization techniques such as KV caching and quantization, reducing latency for multimodal query workloads.
  • Applied Data Scientist
    Extern
    Oct 2025 - Jan 2026 (4 months)
    Built modular, AI-powered pipelines to process 200+ page mortgage blob files—combining OCR (Tesseract, PaddleOCR), PDF parsing (PyMuPDF), and RAG techniques for intelligent data extraction, classification, and search. Developed a document retrieval system using LlamaIndex and Retrieval-Augmented Generation (RAG), optimized for multi-document mortgage blobs. Enhanced precision through chunk tuning, metadata filtering, and evaluation of open-source LLMs like Mistral and Phi-2. Conducted end-to-end evaluation of the document intelligence system on 200+ page mortgage blobs—benchmarking OCR accuracy, RAG retrieval quality, and routing performance. Delivered a technical report outlining model trade-offs, optimization strategies, final deployment
  • AI Engineer Extern
    Extern
    Oct 2025 - Jan 2026 (4 months)
  • Global Markets Sales and Trading Analyst
    Forage
    Oct 2025
    Completed a job simulation focused on analyzing market trends and delivering client-centric solutions within the sales and trading division. Conducted in-depth data analysis using tools like Excel and Bloomberg to identify key financial trends, assess market dynamics, and align insights with client objectives. Researched and proposed strategic recommendations for optimizing trade execution processes and enhancing workflow efficiency using automation and process analysis. Developed a client proposal outlining tailored investment strategies leveraging data-driven insights to address client goals such as portfolio diversification, sustainability, and moderate growth.
  • AI & ML Engineer
    Smart Rewards Inc
    Aug 2025 - Current (11 months)
    Automating HR & Marketing Workflows: Build end-to-end pipelines on n8n to post job openings on CJNNow, YouTube, and LinkedIn automatically, reducing manual effort by ~60%.
    Email Automation: Design and manage automated HR email campaigns, including candidate notifications and interview reminders, saving ~10–15 hours/week.
    Social Media Content Management: Create, schedule, and publish posts on LinkedIn using ML-driven content optimization for higher engagement and relevance.
    Workflow Reliability & Monitoring: Implement logging, error handling, and performance checks within n8n to ensure stable and repeatable operations.
    Collaboration & Iteration: Partner with HR, marketing, and IT stakeholders to define requirements, validate workflow outputs
  • Software Developer
    Triveni IT
    Aug 2025 - Apr 2026 (9 months)
    Architected driver-based analytical pipelines integrating Oracle Fusion and Microsoft Dataverse to analyze financial market trends and optimize enterprise asset lifecycle management. Developed executive-facing Power BI and Tableau dashboards delivering real-time visibility into procurement KPIs, stock turnover, and profitability forecasting. Designed and deployed large-scale Robotic Process Automation (RPA) workflows to standardize global financial data ingestion, reducing manual processing risks and improving regulatory compliance.
  • AI Intern
    Smart Rewards Inc
    Jun 2025 - Aug 2025 (3 months)
    Delivered a production RAG assistant that materially improved support KPIs: ~25% ticket deflection, ~70% reduction in manual search, and hours saved per agent per week in resolution time. Drove adoption from a 20-person pilot to 4 departments (120+ DAUs; ~500 MAUs), handling 10,000+ queries/month with p95 <500 ms and 150+ concurrent users. Full-stack ownership: corpus curation of ~5M internal docs, 768-d embeddings, ~50 GB Pinecone index; LangChain-based orchestration and evaluation harness for ongoing quality. Productionized on GCP (Cloud Run + autoscaling), Dockerized services, and GitHub Actions CI/CD; implemented OpenTelemetry tracing and Grafana observability. Reduced per-query costs by ~35% and nearly halved latency (p95 ~900 ms → ~50
  • Data Science Intern
    Smart Rewards Inc
    Jan 2025 - Jun 2025 (6 months)
    Scaled clinical analytics across ~6 Phase II/III trials (~2,500–3,000 patients, ~40 sites) by integrating ~20M+ records from EDC, CTMS, LIMS, and ePRO; daily/weekly data refreshes for operational currency. Designed the analytics “product”: reusable Power BI templates and standardized ETL in Python/SQL Server that harmonized KPIs and accelerated new-study rollout. Sole owner of enrollment and deviation monitoring dashboards with RLS for PHI segmentation; automated lab/ePRO pipelines with Git versioning and Airflow scheduling. Embedded robust data governance: HIPAA and 21 CFR Part 11 compliance, de-identification, automated QC/anomaly detection, and cross-source reconciliation. Delivered measurable operating leverage: ~60% reduction in manual
  • Data Analyst
    Forage
    Nov 2024
    Analyzed 100K–150K records from 3–5 structured and semi-structured sources to surface operational bottlenecks; recommendations enabled 10% cost reduction ($20K–$25K) via resource reallocation and process simplification. Developed 3–4 Tableau/Power BI dashboards for operations leaders (≈20–25 active users) with week-1 MVP and full delivery in 3 weeks, improving time-to-insight and self-serve decision-making. Built Python-based ETL pipelines (pandas, NumPy) with SQL integration and validation checks to consolidate ERP data, CSV exports, and operational logs; standardized pipelines improved data reliability and reduced manual effort by 50%. Established and tuned KPI framework (turnaround time, completion rates, utilization) and performed anoma
  • Research Assistant
    Drexel University College of Computing & Informatics
    Oct 2024 - Aug 2025 (11 months)
    Directed dataset curation and model development for a Drexel–Thomas Jefferson Hospital collaboration, training CLIP- and ViT-family models on ~10k+ mammograms from CBIS-DDSM and VinDr-Mammo under HIPAA-compliant governance (de-identification, DUAs, audit logs). Engineered a dual-view fusion architecture tailored to mammography (CC/MLO) and tuned ViT/SWIN-T/CLIP backbones; achieved >90% internal accuracy with a 97% best-run on patient-level test splits; evaluated cross-site robustness. Established rigorous ML practices (patient-level train/val/test, augmentation, hyperparameter sweeps) and MLOps (GitHub versioning, CI/CD, reproducibility); mentored a junior researcher on preprocessing and training standards. Positioned findings for clinical
  • Research Intern
    Drexel University College of Computing & Informatics
    Sep 2023 - Dec 2023 (4 months)
    Owned end-to-end modeling for CHD risk stratification on longitudinal EHR/claims data (~250K patients, 8 years, 120 features). Advanced AUC from ~0.74 (logistic) to ~0.89 with MAF/RealNVP; improved sensitivity by ~15% at >85% specificity, surfacing ~120 additional high-risk patients/month for earlier intervention. Designed reproducible data/feature pipelines in Python/SQL on a secure Linux research cluster; harmonized Epic extracts; enforced IRB/HIPAA controls in a de-identified enclave; instituted run logging and validation checks for internal reproducibility. Engineered domain signals (rolling labs, comorbidity index, refill gaps) and benchmarked classical ML (LogReg, RF, XGBoost) vs deep generative models; handled class imbalance with we
  • Team Member
    Chick-fil-A Restaurants
    Oct 2022 - Jun 2024 (1 year 9 months)
    Coordinated peak-hour operations for a 10–12 person, campus-adjacent diner, handling 70–80 orders during lunch/dinner rushes and 30–50 orders per shift overall; regularly scheduled for the busiest blocks based on reliability and throughput. Elevated guest satisfaction by 18% in 3–4 months by standardizing greetings, order read-backs, and handoff quality checks; served as point for peak-hour workflow coordination. Cut order errors by 15% through a repeatable cashier→prep→bagging process, including read-backs, QC at handoff, and prep checklists—improving first-time-right accuracy. Improved speed of service by ~10–15 seconds per order and increased throughput during peak windows by proactively reallocating staff and prioritizing high-velocity
  • Junior Software Engineer
    Cognizant
    Jul 2020 - Jul 2022 (2 years 1 month)
    Full-stack .NET IC owning feature delivery for three regulated pharma systems (SFC, JPUBS, ReCAP3), integrating .NET Framework, C#, SQL Server, JavaScript, and Azure DevOps to meet uptime and compliance expectations. Scaled field-rep workflows in SFC to support ~200–300 active reps and ~1,500–2,000 daily call logs; upgraded validation and exception handling to reduce support tickets ~20% and elevate user satisfaction by 96% (internal survey). Modernized JPUBS approval flows and resolved 9 high-severity issues to accelerate publication submissions for 50–100 users; improved reliability across 500–700 publications/quarter. Delivered reusable components for ReCAP3 to automate routing and checks for ~50–80 proposals/week; upheld ~99% uptime dur
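The frame-skipping and HSV-thresholding steps named in the Modern Streaming entry above can be illustrated with a minimal sketch. It uses only the Python standard library rather than OpenCV or YOLOv8, and the function names (`sample_frames`, `hsv_mask_fraction`), the stride of 5, and the threshold values are hypothetical choices for illustration, not details taken from the actual pipeline.

```python
import colorsys
from typing import Iterable, Iterator, List, Sequence, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B), each channel 0-255
Frame = List[Pixel]            # a frame flattened to a list of pixels (assumption)


def sample_frames(frames: Iterable[Frame], stride: int) -> Iterator[Tuple[int, Frame]]:
    """Yield (index, frame) for every `stride`-th frame and skip the rest."""
    if stride < 1:
        raise ValueError("stride must be >= 1")
    for index, frame in enumerate(frames):
        if index % stride == 0:
            yield index, frame


def hsv_mask_fraction(
    frame: Sequence[Pixel],
    h_range: Tuple[float, float],
    s_min: float,
    v_min: float,
) -> float:
    """Return the fraction of pixels whose HSV value passes the threshold.

    Hue, saturation, and value are all on a 0.0-1.0 scale, as returned by
    colorsys.rgb_to_hsv.
    """
    if not frame:
        return 0.0
    hits = 0
    for r, g, b in frame:
        h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        if h_range[0] <= h <= h_range[1] and s >= s_min and v >= v_min:
            hits += 1
    return hits / len(frame)


if __name__ == "__main__":
    # Ten synthetic frames: even-indexed frames are red, odd-indexed are blue.
    frames = [[(255, 0, 0)] * 4 if i % 2 == 0 else [(0, 0, 255)] * 4 for i in range(10)]
    for index, frame in sample_frames(frames, stride=5):
        red_share = hsv_mask_fraction(frame, h_range=(0.0, 0.05), s_min=0.5, v_min=0.5)
        print(index, red_share)
```

Skipping frames trades temporal resolution for throughput, so the stride would need to be chosen against how quickly the content of interest changes.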
Education
  • Master of Science - MS, Data Science
    Drexel University College of Computing & Informatics
    Sep 2022 - Jun 2024 (1 year 10 months)
    Graduate Cooperative Education Program - Combined academic excellence with industry experience in AI/ML and full-stack development.
    Key Achievements:
      • Data Science Capstone I & II (A+) - End-to-end ML project development
      • Applied Machine Learning (A+) - Predictive modeling & statistical analysis
      • Applied Cloud Computing - AWS/Azure implementations
      • Information Retrieval Systems (A+) - Search algorithms & data mining using Informatica & Elasticsearch platforms
      • Healthcare Informatics (A+) - Domain-specific analytics
    Technical Skills Developed:
      ✓ Machine Learning & AI (RAG systems, deep learning)
      ✓ Data Engineering (ETL pipelines, preprocessing)
      ✓ Cloud Computing (AWS/Azure architecture)
      ✓ Full-Stack Development (Python, Java, JavaScript)
      ✓ Data V
  • Bachelor of Technology, Computer Science
    SRM IST Chennai
    Jul 2016 - May 2020 (3 years 11 months)
    IET-Accredited Program - Comprehensive foundation in computer science fundamentals, software engineering, and emerging technologies.
    Key Coursework:
      • Software Engineering Principles & SDLC
      • Data Structures & Algorithms
      • Database Management Systems (MySQL, SQL)
      • Computer Networks & System Architecture
      • Artificial Intelligence & Data Mining
      • Python Programming & Object-Oriented Programming
      • Web Technologies (HTML/CSS, JavaScript)
      • Professional Ethics & Software Quality Assurance
    Technical Skills Developed:
      ✓ Programming: Python, Java, C++, C#, JavaScript
      ✓ Frameworks: .NET Framework, Spring Framework
      ✓ Databases: MySQL, SQL Server
      ✓ Development: SDLC, Git version control, Web development
      ✓ Core CS: Data structures, algorithms, computer networks
    Lea
Projects (professional or personal)
  • UEFA Champions League Match Analysis – 2020 to 2023
    Nov 2025
    Overview: Analyzed match data from the UEFA Champions League across three seasons (2020-21, 2021-22, 2022-23) to uncover trends in team performance, home scoring patterns, possession dominance, and duel outcomes. The analysis was carried out in SQL on Snowflake and provides insight into team strategies and game dynamics.
    Key Contributions:
      • Identified the three teams that scored the most home goals during the 2020-21 season: PSG (5 goals), Manchester United (5 goals), and Barcelona (5 goals).
      • Determined that Liverpool had the most games with majority possession in the 2021-22 season (9 matches).
      • Examined matches from the 2022-23 season to find teams that won duels but still lost, highlighting tactical dynamics across stages: Group stage,
  • Public Transport Journey Analysis – Transport for London (TfL)
    Nov 2025
    Overview: Analyzed a dataset containing millions of journeys across various transport types in London, spanning buses, Underground & DLR, Overground, trams, TfL Rail, and the Emirates Airline cable car. The goal was to understand usage patterns, identify the most popular transport modes, and uncover temporal trends.
    Key Contributions:
      • Calculated total journeys by transport type, identifying buses (24,905M journeys) and Underground & DLR (15,020M journeys) as the most heavily used modes.
      • Analyzed month-by-month trends for Emirates Airline, highlighting peak travel months in May 2012 (0.53M journeys), June 2012 (0.38M), and April 2012 (0.24M).
      • Examined journey volume trends for Underground & DLR, identifying the five years with the lowest
  • Video Game Sales and Ratings Analysis
    Nov 2025
    Video games are a multi-billion-dollar industry, and understanding both critical and user reception can reveal trends about the quality of games over time. In this project, I analyzed top-selling games, yearly critic scores, and user feedback to identify trends and “golden years” of gaming.
    Key Contributions & Insights:
      • Best-Selling Games: Identified the top 10 highest-selling games, including Wii Sports (82.9M copies) and Super Mario Bros. (40.24M copies).
      • Critics’ Favorite Years: Determined the top ten years by average critic score for years with at least four releases. 1998 (avg 9.32) and 2004 (avg 9.03) were among the highest-rated years.
      • Golden Years (Critics & Users Agreement): Highlighted years where critics and users broadly agre
  • EV Charging Station Usage Analysis
    Nov 2025
    With the rise of electric vehicles, understanding charging station usage in apartment garages is crucial for efficient resource management. In this project, I analyzed EV charging session data to uncover insights about tenant charging habits.
    Key Contributions & Insights:
      • Unique Users per Garage: Identified garages with the highest number of shared users (e.g., Bl2 with 18 users, AsO2 with 17).
      • Peak Charging Times: Determined the top 10 most popular charging start times, including Sunday at 17:00 (30 sessions) and Friday at 15:00 (28 sessions).
      • Long-Duration Users: Found users whose average session lasts over 10 hours, including Share-9 (16.85 hrs) and Share-17 (12.89 hrs).
    Impact: Provided actionable insights for apartment managers to
  • Case Study: Netflix Emerging Markets Subscriber Growth Playbook
    Oct 2025
    Situation: Netflix needed a recommendation playbook to optimize marketing spend across emerging markets for subscriber growth. Task: Identify which markets should receive increased or decreased investment based on efficiency and growth potential. Action: Gathered internal marketing and external market data, defined KPIs (ROI, CAC, CLV, retention, growth), performed exploratory analysis, and scored markets using composite metrics and 2x2 efficiency vs. growth frameworks. Validated recommendations via geo-split/A-B tests. Result: Delivered a data-driven framework guiding smarter marketing investment, improving ROI, lowering CAC, and prioritizing growth opportunities across emerging markets. Skills & Tools: Python, Pandas, NumPy, SQL, Power B
  • National Economic Intelligence Platform - Transforming Census Data into Strategic Policy Insights
    Aug 2025
    Situation: Federal agencies faced critical data quality issues in US household income datasets across 32,000+ geographic locations, hindering evidence-based economic policy development and strategic resource allocation for national economic programs. Task: Spearhead development of enterprise-grade national economic intelligence platform with comprehensive data governance frameworks for predictive economic modeling and policy recommendations. Action: Architected sophisticated data engineering ecosystem using advanced MySQL techniques for comprehensive data cleaning, validation, and exploratory analysis across multi-dimensional socioeconomic datasets. Implemented automated duplicate detection, statistical analysis across geographic hierarchie
  • Public Transportation Performance Intelligence - Optimizing Urban Mobility Through Data-Driven Insights
    Aug 2025
    Situation: NYC's bus transportation system experienced over 18,000 recorded operational disruptions affecting millions of daily commuters, creating substantial public service delivery gaps and straining municipal resources across multiple boroughs and contractors. Task: Spearhead comprehensive transportation analytics initiative to establish predictive operational intelligence for proactive maintenance strategies and evidence-based resource allocation. Action: Architected enterprise-grade transportation analytics platform integrating multi-year operational data (2019-2023) across 20+ bus companies and multiple service categories. Implemented advanced root cause analysis, contractor performance benchmarking, and predictive modeling for mecha
  • Customer Analytics & Segmentation Strategy | UFood Brazil Market Analysis
    Aug 2025
    Situation: UFood, Brazil's leading food delivery platform serving 1M+ consumers across 1,000+ cities, experienced slowing profit growth and needed data-driven insights to optimize marketing performance and customer acquisition in a competitive market. Task: Analyze 1,843 customer records with 39 demographic and purchasing features to identify high-value segments, understand campaign acceptance drivers, and develop actionable segmentation strategies to improve marketing ROI. Action: Performed comprehensive EDA using Python/pandas on demographics, purchase channels, and campaign response data. Conducted correlation analysis and advanced visualizations to uncover relationships between customer traits and spending patterns. Segmented customers
  • Smart Insights for Insurance Success: Customer Conversion Prediction
    Aug 2025
    Situation: Insurance companies faced significant challenges in customer acquisition efficiency, with conversion rates below 15%, marketing costs consuming 40% of revenue, and inability to identify high-potential prospects from diverse demographic segments, resulting in wasted resources and suboptimal ROI on annual marketing investments. Task: Develop comprehensive customer conversion prediction system to optimize insurance acquisition strategies, improve profitability, and enable data-driven prospect targeting across multiple customer segments. Action: Built advanced machine learning pipeline using Python with comprehensive data preprocessing including duplicate removal, categorical encoding via LabelEncoder, and class imbalance correction
  • Global Health Intelligence Platform - Transforming WHO Data into Strategic Health Policy Insights
    Aug 2025
    Situation: International health organizations faced fragmented data quality issues across 195+ countries spanning decades, hindering evidence-based policy development and strategic resource allocation for global health initiatives. Task: Spearhead development of enterprise-grade global health intelligence platform with comprehensive data governance frameworks and advanced analytical capabilities for predictive health modeling. Action: Architected sophisticated data engineering ecosystem using advanced SQL techniques for comprehensive data cleaning, validation, and exploratory analysis across multi-dimensional health datasets. Implemented automated data quality assessments, statistical correlation analysis between health indicators, and adva
  • Forecasting Avocado Prices Using ARIMA: Agricultural Market Intelligence
    Aug 2025
    Situation: Agricultural stakeholders faced significant uncertainty in avocado pricing decisions with price volatility reaching 40-60% seasonally, lacking reliable forecasting tools that resulted in suboptimal inventory management, production planning inefficiencies, and revenue losses due to unpredictable market fluctuations affecting $2.8B+ annual US avocado industry. Task: Develop sophisticated price forecasting system using advanced time series analysis to enable data-driven agricultural decision-making and optimize market strategies across supply chain stakeholders. Action: Built comprehensive ARIMA forecasting pipeline using Python, Pandas, Matplotlib, and Statsmodels for historical price analysis spanning 2004-2020. Implemented system
  • Exploring AutoScout24 Car Offers: German Automotive Market Intelligence
    Aug 2025
    Situation: German automotive market stakeholders faced information gaps in pricing strategies and inventory management, lacking comprehensive analysis of 100,000+ car listings on AutoScout24 platform. Dealerships struggled with optimal pricing decisions, while manufacturers needed competitive intelligence to guide market positioning and product development strategies. Task: Conduct comprehensive market intelligence analysis of German car listings to extract actionable insights for pricing optimization, demand forecasting, and competitive positioning across automotive ecosystem. Action: Built comprehensive data analysis pipeline using Python, Pandas, Seaborn, and Matplotlib to process extensive AutoScout24 dataset. Implemented advanced data
  • Bank Customer Churn Prediction & Segmentation
    Aug 2025
    Situation: A European bank was losing customers at an unsustainable rate, with limited visibility into churn patterns and no ability to implement proactive retention strategies, resulting in millions in lost revenue and increased acquisition costs. Task: Lead data science initiative to analyze 10,000 customer records, build predictive models, and create actionable customer segments to improve retention strategies. Action: Executed end-to-end data pipeline including dataset joins, missing value imputation, duplicate removal, and inconsistent labeling corrections. Conducted comprehensive EDA using box plots, histograms, and bar charts to uncover churn patterns by demographics and financial behavior. Engineered critical features including bala
  • Federal Debt Intelligence Platform - Transforming Economic Data into Strategic Policy Insights
    Aug 2025
    Situation: With US federal debt reaching $31.45 trillion and complex intergovernmental holdings structures, policymakers and economic research institutions needed sophisticated analytical capabilities to understand debt trajectory patterns and fiscal implications for strategic planning. Task: Spearhead development of comprehensive federal debt intelligence platform to establish predictive economic modeling capabilities and evidence-based policy recommendations. Action: Architected enterprise-grade economic analytics ecosystem integrating multi-year Treasury data (2015-2023) across three debt categories: public holdings ($24.6T), intragovernmental holdings ($6.8T), and total outstanding debt. Implemented advanced time-series analysis, season
  • Customer Service Performance Analytics Dashboard - Driving Operational Excellence Through Data
    Aug 2025
    Situation: Customer service team of 8 agents lacked real-time performance visibility, operating reactively rather than strategically with no insights into call patterns, agent efficiency, or resolution metrics, hindering optimal customer experience delivery. Task: Spearhead development of enterprise-grade analytics solution to establish data-driven performance management, enabling proactive decision-making and scalable operational improvements. Action: Architected comprehensive business intelligence platform integrating multiple data streams into unified analytics ecosystem. Built real-time KPI monitoring, predictive analytics for call volume forecasting, and automated performance benchmarking across team members. Established governance fra
  • Customer Retention Analytics Platform - Transforming Churn Intelligence into Revenue Protection
    Aug 2025
    Situation: Rising customer acquisition costs and revenue leakage from 2,000+ subscription customers across multiple payment methods highlighted critical need for sophisticated churn prediction and proactive retention strategies. Task: Lead development of enterprise-grade customer intelligence platform for predictive churn analytics, enabling proactive retention and lifecycle optimization. Action: Built unified churn analytics ecosystem integrating transaction data, contract demographics, and behavioral patterns. Implemented cohort analysis by tenure, payment method segmentation across 4 types (Bank transfer: 258, Credit card: 232, Electronic check: 1,071, Mailed check: 308), and predictive modeling for contract-type churn. Collaborated with
  • Bank Customer Churn Prediction
    Aug 2025
    Situation: Working with a European bank customer dataset of 10,000 records, I wanted to identify patterns that distinguish customers who leave from those who stay, and understand which factors drive churn decisions. Task: Build predictive models to identify at-risk customers and analyze which demographic and behavioral factors most strongly correlate with churn. Action: Performed data cleaning including joining datasets, handling missing values, removing duplicates, and fixing inconsistent labels. Conducted exploratory analysis using box plots, histograms, and bar charts to uncover churn patterns across demographics (age, geography, gender) and financial behavior (balance, products, tenure). Engineered features including balance-to-income r
  • Book Recommendation Engine (Goodreads Data)
    Jul 2025
    Situation: With thousands of books available, readers struggle to find their next good read. I wanted to build a content-based recommendation system using actual reader reviews and book metadata. Task: Create a recommendation engine that suggests books based on content similarity, genre patterns, and sentiment analysis of reader reviews. Action: Processed 13,000+ books and 1 million+ reviews from Goodreads dataset. Engineered 10+ features including sentiment scores from reviews using TextBlob, genre frequency analysis, description polarity, and reader volume signals. Implemented content-based filtering with text preprocessing (cleaning, tokenization, TF-IDF vectorization). Built Random Forest model to predict book ratings based on extracted
  • AI-Powered Book Recommendation Engine Using Goodreads Data
    Jul 2025
    Situation: Readers struggled to discover relevant books from massive catalogs, with traditional recommendation systems failing to capture personal preferences, sentiment trends, and hidden gems, leading to poor reading experience and reduced engagement. Task: Build AI-powered personalized book recommender using comprehensive Goodreads data to enable dynamic reading list generation tailored to individual preferences and discovery patterns. Action: Processed 13K+ books and 1M+ reviews, engineering 10+ features including sentiment scores, genre frequency, description polarity, and reader volume signals. Implemented content-based filtering with automated text cleaning and sentiment analysis using TextBlob. Built Random Forest model with compreh
  • Social Media Sentiment Analysis for Brand Monitoring | Python, spaCy, Machine Learning
    Jun 2025
    Situation: Brand management teams lacked automated capabilities to monitor social media sentiment at scale, relying on manual processes that couldn't keep pace with real-time brand mentions and potential reputation threats across digital platforms. Task: Develop end-to-end automated sentiment classification system to transform manual social media monitoring into intelligent, data-driven brand management. Action: Processed 9,896 tweets using advanced spaCy NLP pipeline with custom preprocessing for social media text. Engineered 1,466 TF-IDF features combined with linguistic analysis including POS tagging and named entity recognition. Trained and compared 4 ML algorithms with 5-fold cross-validation, selecting Logistic Regression as optimal p
  • AI Portfolio Risk Management | S&P 500 Deep Learning Strategy
    Jun 2025
    Situation: Investment firms faced significant losses from delayed market responses and human bias in decision-making, with manual analysis of 500+ S&P stocks creating bottlenecks where profitable opportunities disappeared within hours. Task: Develop enterprise-grade AI trading system integrating deep learning and quantitative finance for real-time portfolio decision-making and risk management in volatile markets. Action: Built neural network architecture using RNN (LSTM) for price forecasting and ANN for trend classification. Engineered 24 advanced features from 7 market indicators including moving averages, RSI, and volatility metrics. Processed 497K+ time-series records across 503 S&P stocks (2014-2017) using PyTorch implementation with G
  • S&P 500 Stock Price Prediction
    Jun 2025
    Situation: I wanted to understand if machine learning could predict stock price movements and whether deep learning approaches outperform traditional methods for financial time series data. Task: Build and compare neural network models to forecast S&P 500 stock prices using historical market data and technical indicators, then evaluate their real-world applicability. Action: Processed 497,000+ time-series records across 503 S&P stocks from 2014-2017. Engineered 24 features from 7 market indicators including moving averages (SMA, EMA), RSI, Bollinger Bands, and volatility metrics. Built two neural network architectures: LSTM (RNN) for price forecasting and ANN for trend classification. Implemented using PyTorch with GPU acceleration, dropout
  • Social Media Sentiment Analysis (Twitter Brand Monitoring)
    Jun 2025
    Situation: Brand mentions on social media generate massive volumes of text data. I wanted to build an automated sentiment classifier that could process tweets and identify positive, negative, or neutral brand sentiment at scale. Task: Develop NLP pipeline to classify tweet sentiment and extract insights about temporal patterns and key sentiment drivers. Action: Processed 9,896 tweets using spaCy NLP pipeline with custom preprocessing for social media text (handling hashtags, mentions, URLs). Engineered 1,466 TF-IDF features and combined with linguistic features including POS tagging and named entity recognition. Trained and compared 4 machine learning models (Naive Bayes, SVM, Random Forest, Logistic Regression) using 5-fold cross-validatio
  • Concrete Strength Prediction
    Mar 2025
    Situation: Construction projects wait 28 days to test concrete strength, creating delays and potential waste if batches fail quality standards. I wanted to predict strength earlier in the process using mix composition data. Task: Build machine learning model to predict concrete compressive strength based on ingredient proportions, reducing reliance on time-consuming physical testing. Action: Worked with dataset containing concrete mix compositions (cement, water, aggregates, age). Handled messy data with mixed formats—built custom parsing functions to extract numerical values from strings with multiple delimiters. Applied data cleaning pipeline including outlier treatment, missing value handling, and StandardScaler normalization. Split data
  • I
    Intelligent RAG Chatbot for Automated Document Analysis
    Jan 2025
    Situation: I noticed that people in document-heavy workflows (legal teams, researchers, technical professionals) spend hours manually searching through PDF repositories. I wanted to build a solution that could answer questions across multiple documents with source citations.
    Task: Build an intelligent document analysis system that could extract information from PDFs, answer natural language questions, and provide transparent source attribution, all while being deployable rather than just a Jupyter notebook.
    Action: Built an end-to-end RAG pipeline using PyPDFLoader for PDF extraction and RecursiveCharacterTextSplitter for text chunking (1000 characters, 10-character overlap). Integrated HuggingFace all-MiniLM-L12-v2 embeddings with FAISS vector store for s
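To make the chunking parameters concrete, the sketch below splits text into 1000-character windows that share 10 characters with their neighbor. It is a simplified fixed-window stand-in, not the RecursiveCharacterTextSplitter algorithm (which also tries to break on separators such as paragraphs and sentences); the function name `chunk_text` is hypothetical.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 10) -> List[str]:
    """Split text into fixed-size chunks where consecutive chunks share `overlap` characters."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each new chunk advances
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start : start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current chunk already reaches the end of the text
    return chunks
```

The overlap exists so that a sentence cut at a chunk boundary still appears, at least partly, in both neighboring chunks. A 10-character overlap preserves very little context, so retrieval quality then depends mostly on where the boundaries fall.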
  • S
    Swin Transformer Image Classification
    Sep 2024 - Oct 2024 (2 months)
    Situation: I wanted to learn modern computer vision architectures beyond CNNs. Swin Transformers were getting attention for image classification tasks, and I wanted hands-on experience implementing one for binary classification.
    Task: Implement and fine-tune a Swin Transformer model on a binary image classification dataset to understand its performance characteristics and training requirements compared to traditional architectures.
    Action: Downloaded a Kaggle binary classification dataset. Implemented a Swin Transformer using PyTorch/Hugging Face with data preprocessing and augmentation (random flips, rotations, color jittering). Configured training with the Adam optimizer (learning rate: 0.003, weight decay: 0.3) for 10 epochs. Expe
  • S
    Subscription Service Churn Prediction
    Jun 2024
    Situation: Working with a subscription service dataset (likely a tutorial/Kaggle dataset), I wanted to predict customer churn and understand which factors most strongly correlate with customers leaving.
    Task: Build classification models to identify at-risk customers and analyze which features (tenure, contract type, monthly charges, services used) best predict churn behavior.
    Action: Performed data cleaning: handled missing values in the TotalCharges column, encoded categorical features (gender, contract type, internet service) using label encoding, and scaled numerical features using StandardScaler. Conducted EDA with correlation heatmaps and distribution plots. Built three models: Logistic Regression for interpretability, Decision Tree for featur
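The cleaning steps above can be sketched with the standard library as follows. Filling missing TotalCharges values with the median is an assumption (the description does not say which strategy was used), and the helper names are hypothetical. The alphabetical label ordering matches how scikit-learn's LabelEncoder assigns codes.

```python
from statistics import median
from typing import Dict, List, Optional, Tuple


def to_float_or_none(raw: str) -> Optional[float]:
    """Convert a raw cell to float; blank or malformed cells become None."""
    try:
        return float(raw.strip())
    except ValueError:
        return None


def impute_median(values: List[Optional[float]]) -> List[float]:
    """Replace missing entries with the median of the known ones (assumed strategy)."""
    fill = median(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def label_encode(values: List[str]) -> Tuple[List[int], Dict[str, int]]:
    """Map each distinct label to an integer, assigned in sorted label order."""
    mapping = {label: idx for idx, label in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping
```

Label encoding imposes an arbitrary order on categories, which tree models tolerate well but which a linear model such as Logistic Regression may misread as a ranking.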
  • I
    IMDb Movie Sentiment Analysis
    Jun 2024
    Situation: I wanted to practice NLP techniques on a real-world text dataset. Movie reviews provide rich sentiment data with varied language, making them good for sentiment classification practice.
    Task: Build a sentiment classifier for movie reviews and compare multiple algorithms to understand which performs best on this type of text data.
    Action: Processed an IMDb dataset with 14 features including review text, ratings, and movie metadata. Implemented a text preprocessing pipeline: lowercasing, tokenization, stop-word removal using NLTK, and lemmatization. Generated sentiment scores using NLTK's VADER analyzer as a baseline. Extracted TF-IDF features from review text. Trained four models: Naive Bayes, Logistic Regression, SVM, and Random Forest.
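For readers unfamiliar with TF-IDF, the sketch below computes the textbook variant (term frequency times log of inverse document frequency) after lowercasing and stop-word removal. The tiny stop-word set is illustrative and is not NLTK's list, lemmatization is omitted, and library implementations such as scikit-learn's add smoothing and normalization, so their numbers will differ.

```python
import math
import re
from collections import Counter
from typing import Dict, List

# Illustrative stop words only; the project used NLTK's list.
STOP_WORDS = {"the", "a", "an", "is", "was", "and", "of", "it", "this"}


def tokenize(text: str) -> List[str]:
    """Lowercase, split into word tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOP_WORDS]


def tfidf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Textbook TF-IDF: (count / doc length) * ln(N / document frequency)."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights: List[Dict[str, float]] = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid division by zero for empty documents
        weights.append(
            {t: (c / total) * math.log(n_docs / doc_freq[t]) for t, c in counts.items()}
        )
    return weights
```

A term that appears in every review, such as "movie" in a film corpus, gets weight zero, which is why TF-IDF tends to surface the sentiment-bearing words that distinguish one review from another.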
  • P
    Parkinson's Disease Prediction
    Jul 2023 - Sep 2023 (3 months)
    Situation: Parkinson's disease affects millions globally, and early detection improves treatment outcomes. I worked with a medical dataset containing vocal features that could potentially indicate the presence of Parkinson's.
    Task: Build classification models to distinguish Parkinson's patients from healthy individuals using vocal biomarkers, and identify which features are most predictive.
    Action: Analyzed a dataset with 22 features including vocal measurements (jitter, shimmer, harmonic-to-noise ratio) from voice recordings. Applied Principal Component Analysis (PCA) to reduce dimensionality and identify key feature patterns. Implemented three models: Multi-Layer Perceptron (neural network with 2 hidden layers), Support Vector Machine with RBF ker
  • W
    Wildfire Detection Using Deep Learning
    Apr 2023 - Sep 2023 (6 months)
    Situation: Wildfires cause massive destruction, and early detection can save lives and property. I wanted to explore whether deep learning could analyze satellite imagery or sensor data to identify wildfires earlier than traditional methods.
    Task: Build a computer vision model to detect wildfire presence and predict fire perimeter expansion using available wildfire datasets.
    Action: Worked with a wildfire dataset containing spatial features (fireline length, perimeter, size, duration, spread speed). Applied data preprocessing: normalized features, handled missing values, and split temporal sequences for time-series prediction. Implemented two approaches: an MLP for tabular feature analysis and a Random Forest for comparison. Trained models to predict
Awards 0% verified
  • D
    Member of Upsilon Pi Epsilon (UPE), International Honor Society for Computing and Information Disciplines
    Drexel University Upsilon Pi Epsilon
    Apr 2024
    Selected for lifetime membership based on outstanding academic performance (3.91 GPA) in computing and information disciplines. UPE recognizes academic excellence and professional achievement in computer science, data science, and information technology fields.
    Membership Benefits & Access:
      • ACM (Association for Computing Machinery) student membership
      • Communications of the ACM digital publications
      • Exclusive access to computing research libraries and professional resources
      • Network of distinguished computing professionals and researchers
    Recognition Criteria: Invitation extended to top-performing graduate students demonstrating exceptional academic achievement, technical competency, and leadership potential in data science and computing discip
Publications 0% verified
  • I
    Classification of Malignant Melanoma and Benign Skin Lesion with the Aid of Using Back Propagation Neural Network and AB
    International Journal of Psychosocial Rehabilitation
    Mar 2020
    🎯 Research Overview: Developed an automated skin cancer detection system addressing the healthcare challenge of early melanoma diagnosis, where delayed detection significantly impacts patient survival rates and treatment costs.
    💼 Healthcare Problem: Melanoma skin cancer cases have risen dramatically over three decades, with early detection being crucial for successful treatment. Manual dermoscopy analysis is time-intensive and prone to human error, while late-stage detection reduces survival rates and substantially increases healthcare costs.
    🔧 Technical Solution: Implemented a comprehensive image processing pipeline using dermoscopy images with RGB-to-HSV conversion, Local Binary Pattern (LBP) preprocessing, and advanced feature extraction
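Two of the preprocessing steps named above, RGB-to-HSV conversion and Local Binary Patterns, can be illustrated per pixel with the standard library. This is a sketch under stated assumptions, not the paper's pipeline: it uses the basic 3x3 LBP operator, a clockwise neighbor order starting at the top-left, and a "neighbor >= center" rule, all of which are common conventions but may differ from what the publication used.

```python
import colorsys
from typing import List, Tuple

# Clockwise neighbor offsets (row, col) starting at the top-left pixel (assumed order).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def rgb_to_hsv_pixel(r: int, g: int, b: int) -> Tuple[float, float, float]:
    """Convert one 8-bit RGB pixel to HSV with every component in [0, 1]."""
    return colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)


def lbp_code(gray: List[List[int]], row: int, col: int) -> int:
    """Basic 3x3 LBP code for an interior pixel of a grayscale image.

    Each neighbor that is >= the center sets one bit, giving a value in 0..255.
    """
    center = gray[row][col]
    code = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        if gray[row + dr][col + dc] >= center:
            code |= 1 << bit
    return code
```

In a texture-based classifier, the LBP codes of all interior pixels are usually summarized as a histogram, and that histogram becomes the feature vector passed to the classifier.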
  • I
    A Survey on Classification of Malignant Melanoma and Benign Skin Lesion by Using Machine Learning Techniques
    International Journal of Psychosocial Rehabilitation
    Mar 2020
    🎯 Research Overview: Conducted a comprehensive survey analyzing machine learning approaches for automated skin cancer detection, addressing the need for standardized diagnostic tools in dermatology, where early detection saves lives and reduces healthcare burden.
    💼 Healthcare Challenge: Digital skin cancer diagnosis faced significant barriers including inconsistent diagnostic accuracy, lack of standardized datasets, and computational limitations preventing widespread adoption of automated screening systems. Manual diagnosis methods showed variability and required specialized expertise not available in all healthcare settings.
    🔧 Technical Analysis: Surveyed advanced machine learning methodologies including image preprocessing technique