Machine Learning Engineer — Multilingual Data at Featherless AI | Torre

Machine Learning Engineer — Multilingual Data

You'll own and scale multilingual data pipelines, ensuring global model generalization and linguistic diversity.
Full-time

Legal agreement: Employment

Compensation is to be agreed upon.
Location: Remote (for World residents)
Shared by
Emma of Torre.ai
about 12 hours ago

Requirements and responsibilities


We’re looking for a Machine Learning Engineer to own and scale our multilingual data pipeline, from sourcing and curation to evaluation and continuous improvement. You’ll work closely with researchers and infra engineers to ensure our models perform robustly across languages, scripts, and cultural contexts.

This role sits at the intersection of data, research, and production ML and is ideal for someone who cares deeply about data quality, linguistic diversity, and model generalization beyond English.

What You’ll Do

- Design, build, and maintain large-scale multilingual datasets across high- and low-resource languages
- Develop data pipelines for collection, cleaning, normalization, deduplication, and labeling
- Implement quality filters using statistical, heuristic, and model-based methods
- Work with researchers to define language coverage, benchmarks, and evaluation metrics
- Analyze dataset bias, coverage gaps, and failure modes across regions and scripts
- Support training, fine-tuning, and distillation workflows with high-quality multilingual data
- Continuously iterate on datasets based on model performance and real-world usage

What We’re Looking For

- 3+ years of experience as an ML Engineer, Applied Scientist, or similar role
- Strong experience working with multilingual or non-English datasets
- Solid understanding of NLP fundamentals (tokenization, embeddings, language modeling)
- Experience building scalable data pipelines (Python, Spark, Ray, or similar)
- Familiarity with Unicode, scripts, tokenization challenges, and language-specific quirks
- Comfort collaborating with researchers and translating research needs into production systems

Nice to Have

- Experience with low-resource languages or multilingual benchmarks (e.g. FLORES, XTREME)
- Exposure to LLM training, fine-tuning, or distillation
- Linguistics background or experience working with native language experts
- Contributions to open-source datasets or ML tooling
- Experience with data quality evaluation at scale

Why Join

- Real ownership over a core differentiator of the product
- Work on models used globally, not just in English-speaking markets
- Small, high-caliber team with deep ML and systems experience
- Competitive compensation + meaningful equity at Series A stage