Global Incident Management

  • Remote - United States

Remote

Cybersecurity

Director

Job description

Who we are

About Stripe

Stripe is a financial infrastructure platform for businesses. Millions of companies—from the world’s largest enterprises to the most ambitious startups—use Stripe to accept payments, grow their revenue, and accelerate new business opportunities. Our mission is to increase the GDP of the internet, and we have a staggering amount of work ahead. That means you have an unprecedented opportunity to put the global economy within everyone’s reach while doing the most important work of your career.

About the team

The Incident Ops team is a global 24/7 team responsible for driving incident response and management from detection to resolution. Stripe is proud of its five-nines reliability, and this team is at the forefront of keeping it that way, working hand-in-hand with Reliability Engineering and across the Tech Org. This team of incident response managers (IRMs) is defined by its sense of ownership and by how it drives incidents to resolution: marshaling the cross-functional resources needed to respond to and resolve service outages, critical bugs, security attacks, and anything else that significantly impacts the users of our products. The team is user-first and ensures that Stripe and its senior management communicate appropriately with users, keeping them informed of any disruption to their experience of Stripe. Because incidents can arise anywhere and cut across products and orgs at Stripe, the team combines strong communication skills, incident-handling expertise, and technical depth.

What you’ll do

This position leads and optimizes Stripe’s incident management processes and automation, ensuring efficiency and adherence to stringent incident response metrics. As head of the incident response team, you will establish and maintain a best-in-class incident response framework that upholds the reliability standards expected of Stripe. Responsibilities include, but are not limited to, incident classification, escalation, and notification management, along with accountability for key time-to-X (TTx) incident response metrics. You will generate actionable insights that drive continuous improvement, collaborating with engineering leadership to refine incident detection, response, user communication, and tooling. You will also lead and develop a highly effective 24/7 global incident response management team characterized by urgency, programmatic ownership of incidents and communications, and the ability to engage engineering teams. Additionally, you will manage incident communications across multiple channels for executive and end-user audiences, and identify automation opportunities that streamline incident response workflows, safeguarding users and minimizing disruption to their operations.

Responsibilities

  • Lead the global 24/7 team of regional managers and incident response managers, with the ability to be hands-on and support frontline on-call through speed, cross-functional collaboration, and escalation.
  • Develop and own Stripe’s incident response and management strategy and cross-functional roadmap, ensuring it aligns with the company’s reputation for reliability.
  • Spearhead and manage Stripe’s AI-First strategy for automation of incident response workflows, partnering with the engineering team to implement required tooling enhancements.
  • Enhance Stripe’s incident response by leading and implementing improvements derived from analyzing user-facing incidents and extracting actionable insights and learnings.
  • Collaborate closely with executive leadership, engineering, and operations teams to lead significant programs and reshape workflows and metrics concerning reliability and incident operations.
  • Manage relevant TTx metrics, particularly those related to communication and escalation. Collaborate with engineering leadership to implement necessary improvements for each metric.
  • Develop user-focused metrics and data to guide Stripe’s incident response, reliability strategy, and user communications (including RCAs), ensuring impactful decision-making.

Who you are

We’re looking for someone who meets the minimum requirements to be considered for the role. If you meet these requirements, you are encouraged to apply. The preferred qualifications are a bonus, not a requirement.

Minimum requirements

  • 10+ years of management experience, including 4+ years managing managers, with a proven record of building, growing, and transforming teams.
  • Extensive experience (8+ years) leading incident response for complex, large-scale distributed services with high SLOs/SLAs, coupled with deep expertise in crisis management.
  • Demonstrated ability to lead, influence other leaders, and deliver complex strategic projects involving multiple stakeholders.
  • Strong analytical skills and the ability to use data to drive business decisions.
  • Proficiency in basic incident troubleshooting and a working understanding of system architecture. Fluent in SQL, Splunk, or similar query languages.
  • Exceptional communication abilities, including adapting incident updates for diverse audiences (executives, external users, internal teams).
  • Affinity for a fast-paced work environment, crafting strategic and rapid fixes to high-intensity problems with a keen eye for detail and a high bar for quality.
  • Comfort navigating ambiguity while identifying areas for process improvement and establishing best practices.

Preferred qualifications

  • Experience managing geographically dispersed teams.
  • Experience using infrastructure and application monitoring tools such as Prometheus, Sentry, and others.
  • Experience in incident response at a high-growth technology company, preferably within the payments or e-commerce sectors.
  • Proven ability to apply agentic and generative AI to transform incident response, coupled with a strong grasp of current industry trends in the incident response domain.
  • Demonstrated history of driving engineering and process enhancements to improve incident response efficiency within a rapidly expanding technology organization.