Member of Technical Staff - Data Intelligence at Reka

You'll define and build petabyte-scale data pipelines to train fundamental World Models.
Full-time

Legal agreement: Employment

Provide your expected compensation while applying
Location:
Remote (for United States residents)
Remote (for United Kingdom residents)
Remote (for Singapore residents)
Shared by Emma of Torre.ai about 2 months ago

Requirements and responsibilities


In this role, you’ll work closely with model researchers, data infrastructure engineers, and cross-functional partners to make sure our data is high quality and can be produced reliably and efficiently at petabyte scale. From understanding how data choices show up in model behavior, to building processing pipelines and running the compute behind them, you’ll help ensure our models are trained on the best data we can get.

What you’ll do

- Work with model researchers to define what “good data” means for our models, including quality metrics, validation checks, and acceptance thresholds
- Explore open-source datasets and create internal datasets best suited to building fundamental World Models
- Build algorithms for automated data quality assessment, data domain mixtures, and domain adaptation from synthetic to real data
- Track datasets, metadata, provenance, and versions so that experiments are reproducible and it’s clear what data went into each training and evaluation run
- Own CI/CD and development tooling for the data stack (GitHub, Python, PyTorch), and automate repetitive workflows to reduce friction
- Track and optimize throughput, storage, and compute utilization across pipelines and related assets

What we’re looking for

- Strong ML and deep learning fundamentals, with experience building and operating large-scale data and/or compute systems
- Comfort moving between research questions and production engineering: you can dig into data, run analyses, and also ship reliable systems
- Demonstrated research experience with data composition, data quality, and dataset releases
- Ability to design and execute experiments that produce convincing, unbiased outcomes
- Practical experience with distributed processing and orchestration (Spark, Ray, Airflow, or equivalents)
- Solid Python skills, and familiarity with the tooling around modern model training workflows (datasets, checkpoints, experiment tracking)
- Strong instincts around data quality: how to measure it, how to monitor it, and how to prevent regressions as things scale
- Ability to work in a fast-moving environment, prioritize what matters, and communicate clearly with both researchers and engineers
- Bonus: experience with large video datasets, dataset curation for training, or building internal tooling for evaluation/analysis in ML environments