Job Description

In this role, you’ll work closely with model researchers, data infrastructure engineers, and cross-functional partners to make sure our data is high quality and can be produced at petabyte scale in a reliable, efficient way. From understanding how data choices show up in model behavior, to building processing pipelines and running the compute behind them, you’ll help ensure our models are trained on the best data we can get.

What you’ll do

  • Work with model researchers to define what “good data” means for our models, including quality metrics, validation checks, and acceptance thresholds

  • Explore open-source datasets and create internal ones best suited to building fundamental World Models

  • Build algorithms for automated data quality assessment, data domain mixtures, and domain adaptation from synthetic to real data

  • Track datasets, metadata, provenance, and versions so experiments are reproducible and it’s clear what data went into which training and evaluation runs

  • Own CI/CD and development tooling for the data stack (GitHub, Python, PyTorch), and automate repetitive workflows to reduce friction

  • Track and optimize throughput, storage, and compute utilization across pipelines and related assets

What we’re looking for

  • Strong ML and deep learning fundamentals with experience building and operating large-scale data and/or compute systems

  • Comfortable moving between research questions and production engineering: you can dig into data, run analyses, and also ship reliable systems

  • Demonstrated research experience with data composition, data quality, and dataset releases

  • Ability to design and execute experiments that produce convincing, unbiased results

  • Practical experience with distributed processing and orchestration (Spark, Ray, Airflow, or equivalents)

  • Solid Python skills, and familiarity with the tooling around modern model training workflows (datasets, checkpoints, experiment tracking)

  • Strong instincts around data quality: how to measure it, how to monitor it, and how to prevent regressions as things scale

  • Able to work in a fast-moving environment, prioritize what matters, and communicate clearly with both researchers and engineers

  • Bonus: experience with large video datasets, dataset curation for training, or building internal tooling for evaluation/analysis in ML environments

Please let Reka AI know you found this job on Remote First Jobs 🙏
