S Vikas

AI Lead Engineer

Artera

Jan 2023 - Jan 2025 (2 years 1 month)

• Led the end-to-end development of a multi-agent conversational AI pipeline for Artera, a healthcare platform that streamlines patient scheduling through automated calling tools.
• Architected a low-latency orchestration layer that manages specialized AI agents (Patient Search, Scheduling, Cancellations) by accurately identifying and routing user intent in real time.
• Reduced system latency by over 60% (from 1800 ms to under 700 ms) by implementing streaming, chunking, and asynchronous calling strategies.
• Engineered an in-house speech-to-text (STT) solution integrated with Twilio and WebSockets to replace high-latency third-party tools, enabling seamless real-time communication.
• Developed an "LLM-as-a-Judge" evaluation framework to automate quality as

Senior Software Engineer (AI/ML)

Meta

Jan 2021 - Jan 2023 (2 years 1 month)

• Designed and scaled LLM inference services supporting multi-million daily request volumes, enabling AI features across Meta's core consumer products with high availability and low latency.
• Developed model-agnostic AI orchestration infrastructure that enabled rapid experimentation and safe rollout of GenAI features, speeding up iteration cycles and enforcing consistent production standards.
• Introduced cost-aware inference routing and fallback strategies, optimizing latency-quality trade-offs and reducing overall compute spend by double-digit percentages.
• Collaborated with product and ML teams to ship AI-powered experiences globally, balancing experimentation velocity with reliability, safety, and platform constraints.
• Optimized inference pi

Senior Software Developer

TestGorilla

Jan 2019 - Jan 2021 (2 years 1 month)

• Designed and scaled machine learning inference services supporting multi-million daily request volumes for LSTM, CNN, and regression-based models, with high availability and low latency.
• Developed model-agnostic infrastructure to support training, testing, and deployment of deep learning and statistical models, enabling rapid experimentation and safe production rollouts.
• Implemented cost-aware inference routing and fallback strategies for multiple model types, optimizing latency-accuracy trade-offs and reducing overall compute costs by double-digit percentages.
• Collaborated with product and ML teams to deploy time-series (LSTM), computer vision (CNN), and predictive regression models globally, balancing experimentation speed wi


Full Stack Developer

CARMA

Jan 2017 - Jan 2019 (2 years 1 month)

• Developed end-to-end web applications, including frontend UI and backend services.
• Built responsive user interfaces and robust server-side logic.
• Integrated third-party APIs and managed databases.
• Collaborated with designers and stakeholders to translate requirements into working features.
• Ensured application security, performance, and maintainability.
