Staff Software Engineer, ML Training

at Stack AV
  • Remote - United States, Worldwide

Remote

Software Development

Senior

Job description

About Stack:

Stack is developing revolutionary AI and advanced autonomous systems designed to enhance the safety, reliability, and efficiency of modern operations. Stack’s autonomous technology incorporates cutting-edge advancements in artificial intelligence, robotics, machine learning, and cloud technologies, empowering us to create innovative solutions that address the needs and challenges of the dynamic trucking industry. With decades of experience creating and deploying real-world systems for demanding environments, the Stack team is dedicated to developing an autonomous solution ecosystem tailored to the trucking industry’s unique demands.

About the Role:

The ML Training Team’s core mandate is training models as fast as possible for the company. The team’s main focus is ensuring our models reach 100% GPU utilization and scale linearly from 8 GPUs to 256 GPUs. We also invest in tooling to empower our MLEs by building profiling/debugging tools, setting up efficiency monitoring, and integrating our trainer into our experiment management system.

Responsibilities:

  • Set up efficiency monitoring for all our training jobs to identify models that need improvement
  • Work with customer teams to benchmark/profile their jobs and make improvements
  • Create standardized APIs for Stack-wide abstractions such as training datasets, bulk inference jobs, and evaluation metrics
  • Optimize dataloaders / training data formats to ensure high GPU utilization
  • Optimize distributed training configurations (network topologies, sharding strategies, pipelines, etc).

Qualifications:

  • Experience: 5+ years as a SWE, ideally building infrastructure or customer-facing products; experience in AV or robotics is also a plus.

  • Ideal Skills:

    • Experience with both ML platforms and building ML-based applications (bonus points if you have modeling experience).
    • Experience building scalable, reliable infra in a fast-paced environment.
    • Experience building or using ML infra built for a large number of customer teams.
    • A deep understanding of design tradeoffs and ability to articulate those tradeoffs and work with others on getting alignment.
    • Experience with building ML models or ML infra in the domains of autonomous vehicles, perception, and decision making (desirable but not required).
    • Experience with model training, model optimization, or large data processing pipelines.
    • Machine learning expertise is preferred but not necessary.
    • Knows how to push the GPU to its limit from Python to CUDA kernel level.
    • Built the inference or training loop for a large model (ideally with LLM flavor).
    • Shipped ML products (NLP, computer vision, recommender systems, etc.) at scale to make business impact.
    • Knows how to build low latency / high throughput batch or stream processing pipelines.
    • Knows how to write (readable) high performance C++.
    • Prior AV experience.
  • Desired Attributes:

    • High customer empathy, able to communicate with customers well
    • Comfortable reading papers / keeping up with SOTA ML literature

We are proud to be an equal opportunity workplace. We believe that diverse teams produce the best ideas and outcomes. We are committed to building a culture of inclusion, entrepreneurship, and innovation across gender, race, age, sexual orientation, religion, disability, and identity.

Check out our Privacy Policy.

Please Note: Pursuant to its business activities and use of technology, Stack AV complies with all applicable U.S. national security laws, regulations, and administrative requirements, which can restrict Stack AV’s ability to employ certain persons in certain positions pursuant to a range of national security-related requirements. As such, this position may be contingent upon Stack AV verifying a candidate’s residence, U.S. person status, and/or citizenship status. This position may also involve working with software and technologies subject to U.S. export control regulations. Under these regulations, it may be necessary for Stack AV to obtain a U.S. government export license prior to releasing its technologies to certain persons. If Stack AV determines that a candidate’s residence, U.S. person status, and/or citizenship status will require a license, prohibit the candidate from working in this position, or otherwise be subject to national security-related restrictions, Stack AV expressly reserves the right to either consider the candidate for a different position that is not subject to such restrictions, on whatever terms and conditions Stack AV shall establish in its sole discretion, or, in the alternative, decline to move forward with the candidate’s application.