Machine Learning Engineer - Pre Training at Mindbeam | Torre

Machine Learning Engineer - Pre Training

You'll optimize large-scale AI pre-training systems, empowering groundbreaking generative AI models.
Full-time

Legal agreement: Employment

Compensation
USD 150k - 190k/year
Remote (for United States residents)
Shared by
Emma of Torre.ai
about 1 month ago

Requirements and responsibilities


About Mindbeam
We are building the next-generation AI infrastructure for open source and enterprise. Our work is deeply research-oriented, and we are passionate about developing ground-breaking innovations that take state-of-the-art AI applications to the next level.
What drives us is not only advancing technology, but empowering the people behind it. We are a community of researchers, engineers, and visionaries who believe that collaboration, curiosity, and openness fuel progress. If you’re motivated by impact and inspired to build tools that others can build upon, you’ll be in the right place.

Mission
Design and optimize large-scale pre-training systems that power Mindbeam’s generative AI models.

Role Expectations
Build scalable pre-training pipelines for foundation models, optimizing throughput and efficiency.
Implement distributed training strategies across GPUs/TPUs and high-performance clusters.
Collaborate with researchers to translate experimental setups into production-ready workflows.
Develop monitoring and fault-tolerance systems to ensure reliable large-scale training.
Continuously benchmark and tune performance across hardware and software stacks.

Background
Bachelor’s, Master’s, or PhD in Computer Science, Engineering, or a related field, or equivalent experience.
2+ years of experience with large-scale model training and distributed systems.
Strong coding skills in Python and familiarity with ML frameworks (PyTorch, TensorFlow, JAX).
Experience with GPU scheduling, memory optimization, and parallelism strategies.
Comfort with containerized and orchestrated environments (Docker/Kubernetes).
Understanding of high-performance computing and networking bottlenecks.

About You
You thrive on scale and complexity. You enjoy solving system-level bottlenecks, pushing hardware and software to their limits, and working closely with researchers to accelerate cutting-edge AI development.