Research Engineer - Distributed Training (Closed)

Job Description

About CloudWalk:

CloudWalk is building the intelligent infrastructure for the future of financial services. Powered by AI, blockchain, and thoughtful design, our systems serve millions of entrepreneurs across Brazil and the US every day.

Our AI team trains large-scale language models that power real products - from payment intelligence and credit scoring to on-device assistants for merchants.

About the Role:

We’re looking for a Research Engineer to design, scale, and evolve CloudWalk’s distributed training stack for large language models. You’ll work at the intersection of research and infrastructure - running experiments across DeepSpeed, FSDP, Hugging Face Accelerate, and emerging frameworks like Unsloth, TorchTitan, and Axolotl.

You’ll own the full training lifecycle: from cluster orchestration and data streaming to throughput optimization and checkpointing at scale. If you enjoy pushing the limits of GPUs, distributed systems, and next-generation training frameworks, this role is for you.

Responsibilities:

  • Design, implement, and maintain CloudWalk’s distributed LLM training pipeline.
  • Orchestrate multi-node, multi-GPU runs across Kubernetes and internal clusters.
  • Optimize performance, memory, and cost across large training workloads.
  • Integrate cutting-edge frameworks (Unsloth, TorchTitan, Axolotl) into production workflows.
  • Build internal tools and templates that accelerate research-to-production transitions.
  • Collaborate with infra, research, and MLOps teams to ensure reliability and reproducibility.

Requirements:

  • Strong background in PyTorch and distributed training (DeepSpeed, FSDP, Accelerate).
  • Hands-on experience with large-scale multi-GPU or multi-node training.
  • Familiarity with Transformers, Datasets, and mixed-precision techniques.
  • Understanding of GPUs, containers, and schedulers (Kubernetes, Slurm).
  • Mindset for reliability, performance, and clean engineering.

Bonus:

  • Experience with Ray, MLflow, or W&B.
  • Knowledge of ZeRO, model parallelism, or pipeline parallelism.
  • Curiosity about emerging open-source stacks like Unsloth, TorchTitan, and Axolotl.

Our process is simple: a deep conversation on distributed systems and LLM training, and a cultural interview.

Competitive salary, equity, and the opportunity to shape the next generation of large-scale AI infrastructure at CloudWalk.

We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are made by humans. If you would like more information about how your data is processed, please contact us.

