Staff Software Engineer AI/ML Infrastructure

💰 $270k-$371k
🇺🇸 United States - Remote
💻 Software Development 🟣 Senior

Job description

The Chan Zuckerberg Initiative was founded by Priscilla Chan and Mark Zuckerberg in 2015 to help solve some of society’s toughest challenges — from eradicating disease and improving education to addressing the needs of our local communities. Our mission is to build a more inclusive, just, and healthy future for everyone.

The Team

Our Central Tech team provides technology and security support for CZI, the Biohub Network, and our grantees. We believe that Engineering and Security are most effective when in sync and learning from each other on a daily basis. Our AI Infrastructure Engineering team enables our AI Research teams to achieve their goals faster and more securely. We leverage technology to automate manual processes, constantly innovate to optimize operations, provide first-class support, and build solutions to enable the scale and execution of our business partners’ strategies and initiatives.

The Opportunity

The AI/ML and Data Engineering Infrastructure organization builds shared tools and platforms used across the Chan Zuckerberg Initiative and CZ Biohub. It partners with and supports a wide range of Research Scientists, Data Scientists, and AI Research Scientists, as well as Engineers focused on Education and Science domain problems. Members of Central Tech’s infrastructure engineering team have an impact on all of CZI’s initiatives by enabling the technology solutions used by other engineering teams at CZI to scale. A person in this role will build these technology solutions and help cultivate a culture of shared best practices and knowledge around AI/ML infrastructure.

What You’ll Do

  • Lead the design and delivery of secure, scalable, and high-performance AI/ML compute infrastructure.
  • Architect and implement containerized AI/ML platforms using Kubernetes for heterogeneous, distributed environments.
  • Integrate on-prem (High Performance Compute) and cloud-based AI platforms with GPU clusters to support pre-training, training, fine-tuning, and inference workflows.
  • Define and execute systems integration strategies to maximize performance, scalability, and security for AI workloads.
  • Enable research teams to effectively use AI platforms through best practices in lifecycle management and deployment.
  • Solve complex challenges in scaling AI workflows and optimizing model training and inference pipelines.

What You’ll Bring

  • BS/MS in Computer Science or related field, or equivalent experience, with 8+ years in coding and systems architecture/design across AI/ML and core infrastructure.
  • Proven proficiency in a systems language (C, C++, C#, Go, Rust, Java, Scala) and a scripting language (Python, PHP, Ruby).
  • Expertise in cloud platforms (AWS, GCP, Azure) and hybrid environments, including on-premises and colocation hosting.
  • Strong experience in AI/ML platform operation technologies (e.g., Slurm, SUNK, Run:ai, Kubeflow).
  • Advanced skills in scaling and securing containerized applications on Kubernetes, including custom container development and CI/CD integration.
  • Working knowledge of Nvidia CUDA, AI/ML custom libraries, and Linux systems optimization/administration.

Compensation

The Redwood City, CA and New York City, NY base pay range for this role is $270,000.00 - $371,800.00.

The Chicago, IL base pay range for this role is $230,000.00 - $315,700.00.

New hires are typically hired into the lower portion of the range, enabling employee growth in the range over time. Actual placement in range is based on job-related skills and experience, as evaluated throughout the interview process.

Work Mode

As we grow, we’re excited to strengthen in-person connections and cultivate a collaborative, team-oriented environment. This role is a hybrid position requiring you to be onsite for at least 60% of the working month, approximately 3 days a week, with specific in-office days determined by the team’s manager. The exact schedule will be at the hiring manager’s discretion and communicated during the interview process.

Benefits for the Whole You

We’re thankful to have an incredible team behind our work. To honor their commitment, we offer a wide range of benefits to support the people who make all we do possible.

  • CZI provides a generous employer match on employee 401(k) contributions to support planning for the future.
  • An annual benefit that employees can use in the ways most meaningful to them and their families, such as housing, student loan repayment, childcare, commuter costs, or other life needs.
  • CZI Life of Service Gifts are awarded to employees to “live the mission” and support the causes closest to them.
  • Paid time off to volunteer at an organization of your choice.
  • Funding for select family-forming benefits.
  • Relocation support for employees who need assistance moving to the Bay Area.
  • And more!

If you’re interested in a role but your previous experience doesn’t perfectly align with each qualification in the job description, we still encourage you to apply as you may be the perfect fit for this or another role.

Explore our work modes, benefits, and interview process at www.chanzuckerberg.com/careers.

#LI-Hybrid
