Job Description

About Turing

Based in San Francisco, California, Turing is the world’s leading research accelerator for frontier AI labs and a trusted partner for global enterprises looking to deploy advanced AI systems. Turing accelerates frontier research with high-quality data, specialized talent, and training pipelines that advance thinking, reasoning, coding, multimodality, and STEM. For enterprises, Turing builds proprietary intelligence systems that integrate AI into mission-critical workflows, unlock transformative outcomes, and drive lasting competitive advantage.

Recognized by Forbes, The Information, and Fast Company among the world’s top innovators, Turing’s leadership team includes AI technologists from Meta, Google, Microsoft, Apple, Amazon, McKinsey, Bain, Stanford, Caltech, and MIT. Learn more at www.turing.com.

This is a remote role and can be performed from anywhere in Colombia.

The Role

We are looking for a Research Engineer to help deliver frontier-quality datasets, RL environments, and evaluations that improve state-of-the-art models for leading AI labs and enterprise clients.

This is a hands-on, research-facing technical leadership role. You will work directly with customer researchers/engineers to translate their model and post-training goals into concrete data and environment specifications, and drive the production of data that meets extremely high standards for correctness, realism, diversity, difficulty, and measurable model lift.

This role is designed for candidates with roughly 4 to 5 years of experience building and improving deep learning systems, especially where strong results depend on data quality, data curation, denoising, synthetic data generation, and rigorous evaluation. You’ll operate in one or more of the following capability areas:

  • Coding and software engineering agents (repositories, unit tests, debugging, tool use, code reviews, long-horizon workflows)
  • RL environments and verifier-based training (tasks, rewards/verifiers, trajectories, evaluation harnesses)
  • Multimodal data and reasoning (text + images + documents + tables/charts; optional audio/video)
  • STEM reasoning (math, physics, chemistry, bio, engineering – solution verification and error analysis)
  • Modern embodied AI / VLM-driven agents (vision-language(-action) models, embodied task suites, tool/sensor/action abstractions, long-horizon interaction data)

What You’ll Do

1) Own data and environment quality from an AI researcher perspective

  • Translate ambiguous research goals into clear data requirements: target skills, failure modes, difficulty calibration, coverage, and success metrics.
  • Define what “good” looks like by creating detailed rubrics, counterexamples, and boundary cases (what to include vs. exclude).
  • Perform deep, detail-oriented audits of produced data: spot subtle errors, reward hacking opportunities, leakage, ambiguity, inconsistent assumptions, and distribution shifts.
  • Drive iterative improvements using evidence: error taxonomies, slice-based quality metrics, and model-behavior-informed refinements.

2) Design and build datasets and RL environments for your capability area(s)

  • Contribute to or lead the design of:

    • Task suites (single-step and long-horizon workflows)
    • Ground-truth signals (verifiers, unit tests, structured checks, reward functions, automatic validators)
    • Environment interfaces (APIs, tool schemas, state abstractions, database schemas, simulator-like dynamics)
  • Depending on your mapped capability area(s), you may focus on:

    • Coding / SWE agents: data reflecting real development work (codebase navigation, bug localization, patching, tests, code reviews, CI-like constraints, refactors, security fixes).
    • Multimodality: tasks that test true multimodal reasoning (chart reading, document QA, UI understanding, diagram-based STEM reasoning, OCR-aware tasks).
    • STEM: tasks with verifiable solutions (symbolic checks, reference solvers, numerical validation, step consistency, unit sanity).
    • Modern embodied AI / VLM-driven agents: interaction data and environments for vision-language(-action) models (long-horizon tasks, instruction following grounded in visual context, robust action selection, safety/constraint adherence, adversarial state coverage).

3) Build robust validation, denoising, and synthetic data systems

  • Implement automated validation and filtering to achieve frontier-grade signal-to-noise:

    • Deduplication, decontamination, leakage checks
    • Consistency checks (format, schema, invariants)
    • Difficulty and diversity controls (coverage, novelty, long-tail)
  • Develop synthetic data generation and augmentation pipelines where appropriate:

    • Programmatic task generators
    • Controlled perturbations to create hard negatives
    • Scenario templating with diversity constraints
    • Simulator-/tool-driven rollouts for trajectory data
  • Create documentation and data cards: dataset intent, known limitations, recommended use, and evaluation linkage.

4) Use evaluations and training runs to prove impact

  • Design and run evals that reflect the customer’s intended usage.

  • Produce analysis that connects data to outcomes:

    • Pre/post comparisons on targeted capability slices
    • Error breakdowns and “why the model failed” narratives
    • Ablations to identify which data attributes drive lift
  • When needed, run in-house fine-tuning or RL-style experiments (or partner with research) to demonstrate that the data/environment improves model behavior in measurable ways.

5) Collaborate effectively with large production teams without being ops-heavy

  • Work with cross-functional teams (engineers, researchers, QAs, domain SMEs, and large-scale data production groups) by providing:

    • Clear specs, examples, and edge cases
    • Fast feedback loops based on audits and quantitative signals
    • Structured review processes focused on quality, not throughput alone
  • You are expected to be highly engaged in reviewing and improving outputs from large annotation/creation efforts, but not primarily responsible for hiring, staffing, or people operations.

Who We’re Looking For

  • 4–5 years of experience building or improving deep learning systems where data quality mattered materially (training, post-training, evals, or agentic systems).

  • Strong intuition for the “data ingredients” that drive model improvements: what to collect, what to filter, what to synthesize, and how to measure.

  • Ability to communicate clearly with researchers and engineers: turning research objectives into concrete specs, and turning messy outputs into actionable insights.

  • Demonstrated ability to be extremely detail-oriented in diagnosing subtle data quality issues and failure modes.

  • Solid programming ability with a bias for shipping:

    • Python proficiency required
    • Comfort with SQL/structured data workflows strongly preferred
    • For coding-focused work: proficiency in one or more major languages (e.g., C++, Java, Go, Rust, JS/TS) is a plus
  • Comfort designing quality systems:

    • Rubrics, validation scripts, gold sets, sampling strategies
    • Statistical checks and slice-based evaluation
    • Human-in-the-loop review loops grounded in measurable criteria

Strong pluses

  • RL or post-training experience (any of: RLHF/RLAIF, verifier training, reward modeling, RL fine-tuning, environment design).
  • Experience with agentic evaluation (tool use, multi-step workflows, long-horizon tasks, trajectory analysis).
  • Multimodal expertise (document understanding, charts, diagrams, OCR, UI/vision grounding; audio/video optional).
  • STEM depth (math/physics/engineering) with an eye for verifiability and rigorous correctness.
  • Modern embodied AI / VLM-driven agent experience (vision-language(-action) models, interaction datasets, embodied evals, long-horizon grounding, tool/sensor/action interfaces).
  • Systems thinking: ability to “simulate” an application’s API/data schema and design tasks that realistically reflect real-world constraints and workflows.

Why Turing

  • Work directly with the world’s leading AI labs and enterprises at the cutting edge of post-training and RL environment design.
  • Real impact (path to AGI): your datasets and environments will directly influence the trajectory toward Artificial General Intelligence and, ultimately, Superintelligence.
  • Real impact (GDP): the systems you help build and evaluate target high-value workflows across industries, where even incremental improvements translate to significant productivity gains.
  • Talent-dense team, where you’ll find high autonomy, rapid iteration, and an exceptional learning curve.

Values:

  • We are client first: We put our clients at the center of everything we do, because their success is the ultimate measure of our value.
  • We work at Start-Up Speed: We move fast, stay agile, and favor action because momentum is the foundation of perfection.
  • We are AI forward: We help our clients build the future of AI and implement it in our own roles and workflows to amplify productivity.

Advantages of joining Turing:

  • Amazing work culture (Super collaborative & supportive work environment; 5 days a week)
  • Awesome colleagues (Surround yourself with top talent from Meta, Google, LinkedIn, etc., as well as people with deep startup experience)
  • Competitive compensation
  • Flexible working hours

Don’t meet every single requirement? Studies have shown that women and people of color are less likely to apply to jobs unless they meet every single qualification. Turing is proud to be an equal opportunity employer. We do not discriminate on the basis of race, religion, color, national origin, gender, gender identity, sexual orientation, age, marital status, disability, protected veteran status, or any other legally protected characteristic. At Turing, we are dedicated to building a diverse, inclusive, and authentic workplace, so if you’re excited about this role but your past experience doesn’t align perfectly with every qualification in the job description, we encourage you to apply anyway. You may be just the right candidate for this or other roles.

For applicants from the European Union, please review Turing’s GDPR notice here.
