QA Engineer, AI Products at MDCalc | Torre

QA Engineer, AI Products

You'll ensure AI quality and clinical trustworthiness, impacting millions of patients globally.
Full-time

Legal agreement: Employment

Provide your expected compensation while applying
Remote (for United States residents)
Shared by
Emma of Torre.ai
22 days ago

Requirements and responsibilities


The Opportunity

Since 2005, MDCalc has been an essential part of the clinician's workflow to help achieve better patient outcomes. Actively used by more than 65% of physicians worldwide, MDCalc is the most broadly used point-of-care medical reference for clinical decision tools and content, and one of only four references used by more than 50% of US HCPs. These evidence-based tools and content are used by millions of medical professionals globally, support 50+ specialties, and cover 200+ patient conditions.

To accelerate and steward this growth, we are expanding the AI product team with a QA Engineer. This role will be critical to MDCalc's continued success in supporting our millions of clinical users worldwide as they care for hundreds of millions of patients.

The Role

As a QA Engineer in the AI Products group at MDCalc, you will play a key role in ensuring the quality, reliability, and clinical trustworthiness of MDCalc's AI-powered features. You'll focus on the unique challenges of testing LLM-based systems, where outputs are non-deterministic, correctness is often a spectrum rather than a binary, and regressions can be subtle.

You'll be part of a collaborative, fast-moving team that takes pride in delivering software that clinicians trust to care for millions of patients worldwide.

The responsibilities of this individual include, but are not limited to, the following:

- Design and execute test strategies for LLM-powered features, including prompt regression testing, output evaluation, and hallucination detection
- Build and maintain automated evaluation pipelines (eval sets, golden datasets, LLM-as-judge frameworks) to catch quality regressions in non-deterministic outputs
- Perform black-box and exploratory testing of MDCalc's AI features across web and mobile, with particular attention to clinical accuracy, safety, and edge cases
- Define quality metrics for AI outputs (accuracy, faithfulness, relevance, safety, latency, cost) and establish thresholds for release readiness
- Collaborate cross-functionally with engineers, product managers, ML/AI engineers, and clinical reviewers to define what "good" looks like for AI responses
- Investigate and triage AI failure modes, distinguishing model issues, prompt issues, retrieval issues, and integration bugs
- Participate in team discussions, offering feedback on testability, risks, prompt design, and guardrails
- Help develop QA strategies to expand future testing capacity, automation, and evaluation coverage as the AI product surface grows

Your Background

- 5+ years of experience in software QA, with at least 1 year of hands-on testing of LLM-based or AI/ML-powered features
- Strong understanding of QA principles, test case creation/documentation, and best practices for both deterministic and non-deterministic systems
- Hands-on experience with LLM tooling and concepts: prompt engineering, RAG systems, evaluation frameworks (e.g., Promptfoo, Braintrust, LangSmith, DeepEval, Ragas, OpenAI Evals), and LLM APIs (OpenAI, Anthropic, etc.)
- Experience designing automated qualitative evaluation approaches, including LLM-as-judge, rubric-based scoring, semantic similarity checks, and golden dataset regression testing
- Proficiency with test automation tools, with a focus on Playwright
- Strong SQL skills for data validation, test data creation, and verifying data integrity across systems
- Familiarity with token usage, latency profiling, and cost monitoring as quality signals
- Eagerness to learn quickly and a positive, solutions-oriented attitude
- Clear and concise communicator, able to surface issues, blockers, and risks effectively, including when failures are ambiguous or probabilistic
- Self-motivated, proactive, and able to manage time and priorities independently

What MDCalc offers:

- Ability to make a true difference in medicine: MDCalc is the most broadly used medical reference by physicians, used by over 65% of US attending doctors weekly
- Medical, Dental, & Vision Coverage, with option to extend to your dependents
- Company-sponsored short-term insurance
- Fully paid 8-week parental leave, after 6 months of employment
- Company-sponsored 401k, after 3 months of employment
- Unlimited vacation for salaried roles: we trust you to take the time you need
- Bi-annual company offsites to connect, reflect, and plan together
- Monthly work-from-home stipend
- A culture of fun and motivated team members who believe in a greater mission here at MDCalc