Large Language Model (LLM) Engineer – ML & AI
Location: Gurugram / Bengaluru (full-time, in office).
About Us:
We are an early-stage startup disrupting the Observability domain with Generative AI and Machine Learning. Our team includes experienced entrepreneurs and engineers who have built multiple billion-dollar products from scratch. We are a well-funded, US-based company backed by top-tier VCs, with offices in the US, India, and Europe. Join our fast-paced team and help shape the future of AI-driven Observability solutions.
What You’ll Work On:
* Train and fine-tune Large Language Models (LLMs) for tasks related to reasoning, diagnostics, and observability.
* Build efficient LLM distillation and quantization pipelines to optimize large models for real-time performance.
* Design LLM evaluation frameworks to benchmark model accuracy, reasoning capabilities, and production-readiness.
* Develop prompt-engineering strategies and instruction-tuning datasets tailored to observability and monitoring use cases.
* Create LLM Ops workflows to manage model lifecycle, including versioning, deployment, and monitoring.
* Integrate LLMs into an intelligent root-cause analysis system powered by causal reasoning, time series data, and anomaly detection.
* Collaborate with ML engineers and product teams to translate research into production-grade features.
* Build tools to simulate, test, and evaluate LLM agents for diagnostic and monitoring applications.
* Handle large-scale datasets using Python and its ML ecosystem (NumPy, pandas, Hugging Face Transformers, PyTorch, LangChain).
What We’re Looking For:
* 3–5 years of experience in AI, Machine Learning, or NLP.
* 2+ years of hands-on experience training models from scratch or fine-tuning large language models at multi-billion-parameter scale.
* Deep expertise in LLM fine-tuning, distillation, model compression, and efficient inference techniques.
* Bachelor’s degree in Computer Science, AI/ML, or a related field; Master’s or PhD preferred.
* Proficiency in Python and libraries such as Hugging Face Transformers, PyTorch, and TensorFlow.
* Experience building LLM training datasets, synthetic data generators, and prompt tuning frameworks.
* Familiarity with LLM evaluation, including factuality, hallucination detection, and functional correctness.
* Strong understanding of LLM Ops principles (model tracking, deployment pipelines, performance monitoring).
* Prior experience with time series analysis, anomaly detection, or causal inference is a plus.
* Background in LLM agent development, multi-agent coordination, or autonomous reasoning systems is a strong plus.
* Experience in observability, monitoring, or DevOps tooling is a big bonus.
Our Values:
* Loyalty and long-term commitment.
* Opinionated yet open-minded.
* Passion for craft and innovation.
* Humility and integrity.
* Adaptability and self-sufficiency.
* Rapid iteration: build fast and fail fast.
What You’ll Build:
You will be at the forefront of creating a next-generation Observability platform that uses advanced LLMs to reason over complex system signals. You will fine-tune large models, optimize them for production, and develop frameworks to evaluate and deploy intelligent AI agents that assist with diagnostics and monitoring. This is a rare opportunity to work with a veteran founding team and shape the future of AI-driven infrastructure.