Field Hardware Engineer, HPC at Mistral AI | Torre

Field Hardware Engineer, HPC

You'll power breakthrough AI by scaling France's largest GPU clusters.
Full-time

Legal agreement: Employment

Provide your expected compensation while applying
Hybrid (Bruyères-le-Châtel, France)
Posted 6 months ago

Requirements and responsibilities


About Mistral

At Mistral AI, we believe in the power of AI to simplify tasks, save time, and enhance learning and creativity. Our technology is designed to integrate seamlessly into daily working life.

We democratize AI through high-performance, optimized, open-source and cutting-edge models, products and solutions. Our comprehensive AI platform is designed to meet enterprise needs, whether on-premises or in cloud environments. Our offerings include le Chat, the AI assistant for life and work.

We are a dynamic, collaborative team passionate about AI and its potential to transform society. Our diverse workforce thrives in competitive environments and is committed to driving innovation. Our teams are distributed between France, USA, UK, Germany and Singapore. We are creative, low-ego and team-spirited.

Join us to be part of a pioneering company shaping the future of AI. Together, we can make a meaningful impact. See more about our culture on https://mistral.ai/careers.

Role summary

Our compute footprint is growing fast to support our science and engineering teams. We’re hiring a Field HW Engineer to understand end-to-end systems, execute complex/vendor-level interventions, and guide L1 engineers on site, without direct line management. You’ll work hands-on across compute, storage, interconnect and cooling to keep one of France’s largest GPU/CPU clusters healthy and scalable.

Location: Bruyères-le-Châtel; on-site, field role (multi-site mobility: Paris area and nearby)
Reporting line: Hardware Ops

Impact

Compute is a key lever for Mistral’s success and our largest spend item.
Direct impact on scale: you’ll restore service on complex incidents and raise the bar on reliability as we grow.
Enable breakthrough AI: your work lets our science & engineering teams deliver state-of-the-art AI.

What you will do

Lead complex interventions: plan and execute vendor-level or multi-node operations (e.g., full rack work, intricate recabling, post-restart diagnosis), own risk assessment/rollback, and coordinate with vendors (RMA/escalations).
Advanced diagnostics: correlate symptoms across compute, storage, interconnect, cooling; read system indicators (LED/POST/beep), BMC/IPMI consoles, and logs to identify root causes.
Guide and uplift L1s: coach on safe practices (ESD/LOTO), first-line triage, rack craftsmanship, documentation quality; pair on tricky procedures. (No people management.)
Process & automation: improve SOPs/checklists; propose/build small automation (Python/Bash) for photo/serial capture, inventory sync, dashboards/alerts; shorten MTTR.
Safety & compliance: enforce lockout/tagout, ESD, PPE; ensure audit-ready tickets, evidence and change traces.
Parts & logistics (advanced): plan spares strategy, track failure trends, and drive proactive vendor actions.

About you

5+ years in data center/server hardware or L2/L3 hardware support, with proven complex hands-on work in production (HPC/AI/Cloud at scale).
End-to-end hardware expertise: comfortable across CPU/memory/PCIe cards (incl. accelerators), NICs, PSUs, drives, network, power and cooling (including DLC); strong judgment on when/how to escalate.
Diagnostics depth: confident in analyzing BMC/IPMI logs and Linux software logs and crashes, and in running simple CLI checks; methodical root cause analysis.
Safety & discipline: impeccable ESD/LOTO/PPE habits; zero rough handling; clean, labeled, auditable work.
Communication & mentoring: crisp status/handovers; able to coach L1s during live operations and provide technical documentation to L1s or other teams.
Mobility: willing to travel between sites (Paris area or nearby regions, occasionally in Europe or the US).

Nice to have

Vendor tools (iDRAC/iLO/IPMI), RAID/storage basics (NVMe/SAS/SATA), high-speed interconnect (Ethernet/InfiniBand).
Coding/automation (Python/Bash) for small ops tools and reporting.
Experience with ticketing (Jira/ServiceNow), inventory/RMA flows, vendor coordination.

Location & Remote

The position is based in our Paris HQ offices and we encourage going to the office as much as we can (at least 3 days per week) to create bonds and smooth communication. Our remote policy aims to provide flexibility, improve work-life balance and increase productivity. Each manager can decide the number of days worked remotely based on autonomy and a specific context (e.g. more flexibility can occur during summer). In any case, employees are expected to maintain regular communication with their teams and be available during core working hours.

What we offer

💰 Competitive salary and equity package
🧑‍⚕️ Health insurance
🚴 Transportation allowance
🥎 Sport allowance
🥕 Meal vouchers
💰 Private pension plan
🍼 Generous parental leave policy

AI disclaimer

We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please contact us.