Senior Software Engineer

at CentML
  • Remote - Worldwide

Software Development

Senior

Summary

This role is for a senior infrastructure engineer at CentML, a company focused on reducing the cost of developing and deploying ML models. It involves designing, developing, and maintaining the CentML platform, which offers cost-effective infrastructure for serving and training large-scale machine learning models across multiple cloud service providers.

Requirements

  • 4+ years of experience working with containerized deployment systems (e.g., Kubernetes, OpenShift, Terraform, etc.)
  • Experience with deploying and managing cloud infrastructure on AWS, GCP, Azure
  • Knowledge of GPU architecture and NVIDIA GPU virtualization technologies is highly desirable
  • Strong coding skills in languages like Python, Java, Go, and/or C/C++

Responsibilities

  • Design and lead the development of the deployment infrastructure of the CentML platform
  • Implement GPU cluster scheduling solutions for large-scale ML training and inference workloads
  • Communicate with product teams to define new features and goals for improving the CentML platform

Preferred Qualifications

  • Contributions to Kubernetes and expertise in container runtime technologies such as Docker Engine, containerd, or CRI-O
  • Experience building GPU clusters for large-scale ML training and inference

Benefits

  • An open and inclusive culture and work environment
  • Fully stocked kitchen at the office
  • Full health and dental benefits
  • Parental leave top-up for 6 months
  • Continuous education budget
  • Generous vacation - we're not saying unlimited, but if you need extra time to recharge, just ask