qode.world

SRE Manager / SRE Architect

Job Description – SRE Manager / SRE Architect (Hands-on)

Location: New York City, NY / Fort Mill, SC (Hybrid)

Employment Type: Full-Time / Contract

Industry: Financial Services

Position Overview

We are seeking a highly experienced and hands-on Site Reliability Engineering (SRE) Manager / SRE Architect to lead reliability, availability, performance, and release management initiatives across enterprise-scale applications and platforms. This role requires a strong blend of SRE, DevOps, Release Management, Cloud Engineering, Automation, and Production Operations expertise.

The ideal candidate will be deeply involved in designing and implementing reliability strategies, driving release governance, improving deployment processes, and ensuring operational excellence across cloud-native environments.

LaunchDarkly experience is highly preferred but not mandatory.

Key Responsibilities

Site Reliability Engineering (SRE)

  • Design and implement SRE best practices focused on reliability, scalability, performance, and availability.
  • Define and monitor SLIs, SLOs, and error budgets across critical applications and services.
  • Drive proactive monitoring, alerting, observability, and incident management processes.
  • Lead root cause analysis (RCA) efforts and implement preventive measures.
  • Improve system resiliency through automation, self-healing capabilities, and operational excellence.
  • Establish reliability standards across distributed systems and cloud platforms.

Release Management

  • Own and drive end-to-end release management processes across multiple environments.
  • Coordinate application releases across development, QA, UAT, staging, and production environments.
  • Develop release governance, release calendars, deployment strategies, rollback procedures, and change management processes.
  • Partner with development, QA, infrastructure, and business teams to ensure smooth production deployments.
  • Identify and mitigate release risks while minimizing downtime and business impact.
  • Implement deployment automation and continuous delivery best practices.

DevOps & Automation

  • Design and maintain CI/CD pipelines using modern DevOps tools.
  • Automate infrastructure provisioning, deployment, monitoring, and operational workflows.
  • Drive Infrastructure as Code (IaC) adoption using Terraform or similar technologies.
  • Support cloud-native architectures and containerized application deployments.
  • Partner with engineering teams to improve developer productivity and deployment velocity.

Cloud & Platform Engineering

  • Manage and optimize cloud infrastructure on AWS and/or Azure.
  • Support Kubernetes, container orchestration, and cloud-native application platforms.
  • Ensure platform scalability, security, compliance, and operational readiness.
  • Drive platform modernization initiatives and operational transformation efforts.

Required Skills & Experience

Core SRE Skills

  • 15+ years of IT experience with a strong focus on SRE, DevOps, Platform Engineering, or Production Support.
  • Extensive hands-on experience implementing SRE practices in enterprise environments.
  • Strong understanding of:
      • SLI/SLO/Error Budgets
      • Incident Management
      • Problem Management
      • Capacity Planning
      • Reliability Engineering
      • Observability & Monitoring

Release Management

  • Proven experience managing large-scale production releases.
  • Strong expertise in:
      • Release Planning
      • Release Governance
      • Change Management
      • Deployment Automation
      • Rollback Strategies
      • Production Readiness Reviews

DevOps & Cloud

  • Hands-on experience with:
      • AWS and/or Azure
      • Kubernetes (EKS, AKS, OpenShift preferred)
      • Docker
      • Terraform
      • GitHub Actions, Jenkins, Azure DevOps, GitLab CI/CD
  • Experience building and maintaining CI/CD pipelines.

Monitoring & Observability

  • Strong experience with:
      • Dynatrace
      • Datadog
      • Splunk
      • Prometheus
      • Grafana
      • ELK Stack
      • CloudWatch

Scripting & Automation

  • Experience with Python, Bash, PowerShell, or similar scripting languages.
  • Strong automation mindset with a focus on operational efficiency.

Nice to Have

  • LaunchDarkly end-to-end implementation experience.
  • Feature flag management and progressive delivery strategies.
  • Financial Services, Banking, or Wealth Management domain experience.
  • Experience leading SRE or DevOps transformation initiatives.
  • Cloud certifications (AWS, Azure, Kubernetes).

Preferred Candidate Profile

  • Strong hands-on SRE leader, not just a people manager.
  • Deep expertise in Release Management and Production Support.
  • Proven background in DevOps, Cloud Engineering, and Platform Reliability.
  • Ability to work with development, infrastructure, security, and business teams.

Keywords

SRE, Site Reliability Engineering, Release Management, DevOps, Terraform, AWS, Azure, Kubernetes, Dynatrace, CI/CD, LaunchDarkly, Production Support, Incident Management, Reliability Engineering, Observability, Platform Engineering, Infrastructure Automation.
