AI Researcher – Multilingual Data at Featherless AI | Torre

AI Researcher – Multilingual Data

You'll pioneer multilingual AI research, publishing high-impact work and shaping next-gen language models.
Full-time

Legal agreement: Employment

Provide your expected compensation while applying
Remote (anywhere)
Shared by
Emma of Torre.ai
20 days ago

Requirements and responsibilities


About the Role

We’re looking for an AI Researcher focused on multilingual data to help us build and scale next-generation language models across diverse languages and domains. You’ll own research and execution around data sourcing, curation, evaluation, and training strategies for multilingual and low-resource languages, with a strong emphasis on publishing high-quality research and translating it into production systems.

This role is ideal for someone who enjoys working close to the frontier: balancing papers, prototypes, and real-world impact in a fast-moving startup environment.

What You’ll Do

- Design and execute research on multilingual datasets, including data collection, filtering, deduplication, and quality measurement
- Develop strategies for low-resource and long-tail languages (sampling, augmentation, curriculum design)
- Research and improve cross-lingual transfer, alignment, and robustness in large language models
- Build and maintain evaluation benchmarks for multilingual performance
- Collaborate with engineers and researchers on training pipelines and model architecture decisions
- Publish research at top venues (e.g., ACL, EMNLP, NeurIPS, ICML, ICLR) and contribute to open-source when appropriate
- Translate research insights into practical improvements in production models

What We’re Looking For

- Strong background in NLP / ML research, with a focus on multilingual or cross-lingual modeling
- Publication record at respected conferences or journals (ACL, EMNLP, NeurIPS, ICML, ICLR, etc.)
- Experience working with large-scale text datasets across multiple languages
- Solid understanding of:
  - Tokenization and vocabulary design for multilingual models
  - Data quality metrics, filtering, and dataset bias
  - Transfer learning and multilingual representation learning
- Comfortable prototyping in Python with modern ML frameworks (PyTorch, JAX, etc.)
- Ability to operate independently and ship research at a startup pace

Nice to Have

- Experience with low-resource languages or non-Latin scripts
- Open-source contributions in NLP or data tooling
- Experience training or evaluating large language models
- Familiarity with multilingual benchmarks (e.g., XTREME, FLORES, TyDi QA)

Why Join Us

- Real ownership over research direction and impact
- A team that values papers and production
- Access to meaningful scale: large datasets, modern infrastructure, and fast iteration
- Competitive compensation and meaningful equity at an early stage