Job description

Company Overview

At Zuora, we do Modern Business. We’re helping people subscribe to new ways of

doing business that are better for people, companies and ultimately the planet. It’s an

approach resulting from the shift to the Subscription Economy that puts customers first

by building recurring relationships instead of one-time product sales and focuses on

sustainable growth. Through our leading expertise and multi-product suite, we are

transforming all industries and working with the world’s most innovative companies to

monetize new business models, nurture subscriber relationships and optimize their

digital experiences.

The Team & Role

Join Zuora’s high-impact Operations team, where you’ll be instrumental in maintaining

the reliability, scalability, and performance of our SaaS platform. This role involves

proactive service monitoring, incident response, infrastructure service management,

and ownership of internal and external shared services to ensure optimal system

availability and performance.

You will work alongside a team of skilled engineers dedicated to operational excellence

through automation, observability, and continuous improvement. In this cross-functional

role, you’ll collaborate daily with Product Engineering & Management, Customer

Support, Deal Desk, Global Services, and Sales teams to ensure a seamless and

customer-centric service delivery model.

As a core member of the team, you’ll have the opportunity to design and implement

operational best practices, contribute to service provisioning strategies, and drive

innovations that enhance the overall platform experience. If you’re driven by solving

complex problems in a fast-paced environment and are passionate about operational

resilience and service reliability, we’d love to hear from you.

Our Tech Stack: Linux Administration, Python, Docker, Kubernetes, MySQL, Kafka,

ActiveMQ, Tomcat App & Web, Oracle, Load Balancers, Redis Cache, Debezium,

AWS, WAF, LBs, Jenkins, GitOps, Terraform, Ansible, Puppet, Prometheus, Grafana,

OpenTelemetry

In this role you’ll get to

Architect and implement intelligent automation workflows for infrastructure

lifecycle management, including self-healing systems, automated incident

remediation, and configuration anomaly detection using Infrastructure as Code

(IaC) and AI-driven tooling.

Leverage predictive monitoring and anomaly detection techniques powered by

AI/ML to proactively assess system health, optimize performance, and preempt service degradation or outages.

Lead complex incident response efforts, applying deep root cause analysis

(RCA) and postmortem practices to drive long-term stability, while integrating

automated detection and remediation capabilities.

Partner with development and platform engineering teams to build resilient CI/CD

pipelines, enforce infrastructure standards, and embed observability and

reliability into application deployments.

Identify and eliminate reliability bottlenecks through automated performance

tuning, dynamic scaling policies, and advanced telemetry instrumentation.

Maintain and continuously evolve operational runbooks by incorporating machine

learning insights, updating playbooks with AI-suggested resolutions, and

identifying automation opportunities for manual steps.

Stay abreast of emerging trends in AI for IT operations (AIOps), distributed

systems, and cloud-native technologies to influence strategic reliability

engineering decisions and tool adoption.

Who we’re looking for

Hands-on experience with Linux server administration and Python

programming.

Deep experience with containerization and orchestration using Docker and

Kubernetes, managing highly available services at scale.

Experience working with messaging systems like Kafka and ActiveMQ, databases like

MySQL and Oracle, and caching solutions like Redis.

Understands and applies AI/ML techniques in operations, including anomaly

detection, predictive monitoring, and self-healing systems.

Has a solid track record in incident management, root cause analysis, and

building systems that prevent recurrence through automation.

Is proficient in developing and maintaining CI/CD pipelines with a strong

emphasis on observability, performance, and reliability.

Experience with monitoring and observability using Prometheus, Grafana, and OpenTelemetry,

with a focus on real-time anomaly detection and proactive alerting.

Is comfortable writing and maintaining runbooks and enjoys enhancing them with

automation and machine learning insights.

Keeps up-to-date with industry trends such as AIOps, distributed systems, SRE

best practices, and emerging cloud technologies.

Brings a collaborative mindset, working cross-functionally with engineering,

product, and operations teams to align system design with business objectives.

1+ years of experience working in a SaaS environment.

Nice to Have:

Red Hat Certified System Administrator (RHCSA) – Red Hat

AWS Certification

Certified Associate in Python Programming (PCAP) – Python Institute

Docker Certified Associate (DCA) or Certified Kubernetes Administrator (CKA)

Good knowledge of Jenkins

Advanced certifications in SRE or related fields

#ZEOLife at Zuora

As an industry pioneer, our work is constantly evolving and challenging us in new ways

that require us to think differently, iterate often and learn constantly—it’s exciting. Our

people, whom we refer to as “ZEOs,” are empowered to take on a mindset of ownership

and make a bigger impact here. Our teams collaborate deeply, exchange different ideas

openly and together we’re making what’s next possible for our customers, community

and the world.

As part of our commitment to building an inclusive, high-performance culture where

ZEOs feel inspired, connected and valued, we support ZEOs with:

Competitive compensation, corporate bonus program, performance rewards and retirement programs

Medical insurance

Generous, flexible time off

Paid holidays, “wellness” days and company-wide end-of-year break

6 months fully paid parental leave

Learning & Development stipend

Opportunities to volunteer and give back, including charitable donation match

Free resources and support for your mental wellbeing

Specific benefits offerings may vary by country and can be viewed in more detail during

your interview process.

Location & Work Arrangements

Organizations and teams at Zuora are empowered to design efficient and flexible ways

of working, being intentional about scheduling, communication, and collaboration

strategies that help us achieve our best results. In our dynamic, globally distributed

company, this means balancing flexibility and responsibility — flexibility to live our lives

to the fullest, and responsibility to each other, to our customers, and to our

shareholders. For most roles, we offer the flexibility to work both remotely and at Zuora offices.

Our Commitment to an Inclusive Workplace

Think, be and do you! At Zuora, different perspectives, experiences and contributions

matter. Everyone counts. Zuora is proud to be an Equal Opportunity Employer

committed to creating an inclusive environment for all.

Zuora does not discriminate on the basis of, and considers individuals seeking

employment with Zuora without regard to, race, religion, color, national origin, sex

(including pregnancy, childbirth, reproductive health decisions, or related medical

conditions), sexual orientation, gender identity, gender expression, age, status as a

protected veteran, status as an individual with a disability, genetic information, political views or activity, or other applicable legally protected characteristics.

We encourage candidates from all backgrounds to apply. Applicants in need of special

assistance or accommodation during the interview process or in accessing our website

may contact us by sending an email to assistance(at)zuora.com.
