Lead the development of advanced artificial intelligence solutions, focusing on large language models (LLMs) and other generative techniques. You will research, design, deploy, and manage LLM-based systems to enhance user experiences and solve complex problems. This role involves hands-on engineering and collaboration with cross-functional teams.
**Key Responsibilities**
- Adapt LLMs and generative AI models to address business needs, such as conversational interfaces, question-answering systems, and content generation.
- Deploy models as scalable services (APIs, microservices), ensuring performance, reliability, and security.
- Enhance model capabilities with retrieval-augmented generation (RAG), connecting external data sources through vector databases or orchestration frameworks (e.g., LangChain, LlamaIndex).
- Develop advanced prompt strategies and employ agent-based reasoning for complex tasks, leveraging techniques such as chain-of-thought prompting and supporting agent frameworks.
- Explore methods like quantization, distillation, and hardware acceleration to improve efficiency, reduce latency, and ensure cost-effectiveness.
- Collaborate closely with engineering, data science, and product teams. Document architectures, experiments, and best practices.
- Stay updated on the latest AI and LLM research, evaluating new models, algorithms, or frameworks, and integrating them when beneficial.
**Required Qualifications**
- Bachelor's degree in Computer Science or Software Engineering.
- 6+ years of experience in software engineering, with at least 2 in AI-related development focused on NLP and generative AI.
- Proficiency with systems programming and full-stack development.
- Solid foundation in machine learning concepts and transformer-based models.
- Hands-on experience designing, deploying, and maintaining LLM-based systems for real-world use cases.
- Experience with vector databases (e.g., Chroma, Weaviate, Pinecone).
- Proficiency in Python.
- Experience using LLM APIs and orchestration frameworks (e.g., LangChain, LlamaIndex).
- Skilled in cloud platforms (AWS, GCP, Azure) and containerization and orchestration tools (Docker, Kubernetes).
**Preferred Qualifications**
- Experience with performance tuning and optimized model serving.
- Familiarity with hardware acceleration (GPUs/TPUs) and NVIDIA CUDA.