Senior Site Reliability Engineer

💰 $167k-$216k

Job description

Virta Health is on a mission to transform diabetes care and reverse the type 2 diabetes epidemic. Current treatment approaches aren’t working: over half of US adults have either type 2 diabetes or prediabetes. Virta is changing this by helping people reverse type 2 diabetes through innovations in technology, personalized nutrition, and virtual care delivery reinvented from the ground up. We have raised over $350 million from top-tier investors, and partner with the largest health plans, employers, and government organizations to help their employees and members restore their health and live diabetes-free. Join us on our mission to reverse diabetes in 100M people.

As an SRE on the Infrastructure team at Virta, you will be building the foundation that will help our company move as fast as possible while meeting security and compliance requirements. Key projects for the team over the next two quarters include:

  • Implementing an AI-driven observability and metrics platform that automatically detects anomalies and highlights SLO risks, enabling product teams to make data-driven decisions.

  • Enhancing system observability, reliability, and efficiency using off-the-shelf technology combined with internal tools developed in Python and Go, to increase transparency and visibility into our systems and to centralize data.

  • Building out more products for our Product Development teams, such as observability modules (SLOs, alerting, dashboards), so they can spin up an MVP out of the box.

  • Improving incident readiness with better tooling and the right hygiene practices, such as game days.

  • Engaging with feature development teams on toil-reduction exercises, capacity planning, load testing, the SLO process, and other best practices, including partnering with product teams to replace manual capacity planning with predictive/AI-driven scaling models and to codify self-healing runbooks that minimize toil.

  • Improving the velocity and quality of our developer platform and tooling.

  • General AI fluency desired: comfortable with concepts like prompt engineering, operational chatbots, and AI-assisted workflows to accelerate incident response and reliability improvements

We are in the midst of redefining our incident response tooling and strategy, improving test tooling, and developing a strategy to ensure all applications are performant and available. Joining Virta would make you one of the key people defining and driving the future vision of what reliability and observability should look like.

Responsibilities

  • Ship automation and tooling that reduces toil, with high-quality, well-structured code.

  • Design and codify self-healing workflows and guardrails to minimize toil and improve reliability.

  • Steward SLO dashboards enhanced with AI/ML-assisted insights, leveraging AIOps-style observability to surface anomalies, predict error-budget burn, and improve signal quality across golden signals.

  • Integrate load-testing into reliability engineering efforts, ensuring outcomes directly inform SLOs, scaling strategies, and capacity planning.

  • Partner with product teams to replace manual capacity planning with predictive/AI-driven scaling models and implement burn-rate based alerting.

  • Coach and mentor engineers; champion best practices and pragmatic reliability trade-offs.

90 Day Plan

Within your first 90 days at Virta, we expect you will do the following:

  • Teach and inspire other engineering team members through knowledge sharing, pair programming, and giving feedback during code reviews

  • Propose and implement one or more process improvements related to reliability and observability to make our engineering team even better

  • Deliver a proof-of-concept for an AIOps initiative, demonstrating how a manual reliability or observability process can be transformed into automation to reduce toil and improve insight

Must-Haves

  • Highly proficient in shipping backend code in high-quality production environments, with strong hands-on coding and automation expertise, and a deep understanding of reliability and production readiness practices

  • Hands-on expertise with automation and infrastructure-as-code (Terraform modules preferred), ideally with experience in observability

  • Experience designing and implementing highly observable, scalable systems — with a proven track record configuring AIOps / ML-based monitoring platforms — that support large numbers of users while reducing operational burden

  • Applied and general AI fluency: ability to leverage AI/ML-assisted observability (e.g., anomaly detection, error-budget burn prediction) while also being comfortable with concepts like prompt engineering, operational chatbots, and AI-assisted workflows to accelerate incident response and reliability improvements

  • Growth mindset and craftsmanship: ability to coach, mentor, and evangelize AI-first insights while continually improving engineering practices and following best practices

Values-driven culture

Virta’s company values drive our culture, so you’ll do well if:

  • You put people first and take care of yourself, your peers, and our patients equally

  • You have a strong sense of ownership and take initiative while empowering others to do the same

  • You prioritize positive impact over busy work

  • You have no ego and understand that everyone has something to bring to the table regardless of experience

  • You appreciate transparency and promote trust and empowerment through open access of information

  • You are evidence-based and prioritize data and science over seniority or dogma

  • You take risks and rapidly iterate

Is this role not quite what you’re looking for? Join our Talent Community and follow us on LinkedIn to stay connected!

As part of your duties at Virta, you may come in contact with sensitive patient information that is governed by HIPAA. Throughout your career at Virta, you will be expected to follow Virta’s security and privacy procedures to ensure our patients’ information remains strictly confidential. Security and privacy training will be provided.

Virta has a location-based compensation structure. Starting pay will be based on a number of factors and commensurate with qualifications and experience. For this role, the compensation range is $167,249 - $216,000. Information about Virta’s benefits is on our Careers page at: https://www.virtahealth.com/careers

As a remote-first company, our team is spread across various locations with office hubs in Denver and San Francisco.

Clinical roles: We currently do not hire in the following states: AK, HI, RI.

Corporate roles: We currently do not hire in the following states: AK, AR, DE, HI, ME, MS, NM, OK, SD, VT, WI.
