qode.world

Senior Site Reliability Engineer

Job Description

Role: Sr. Site Reliability Engineer (SRE) – Unified Observability & AIOps

Location: Austin, TX / Fort Mill, SC (Hybrid)

Job Type: Full Time

Role Summary

We are seeking a Senior SRE with strong expertise in Unified Observability, proactive detection, AIOps, and GenAI-driven operations to support complex, distributed financial services platforms. The role requires hands-on experience designing SLI/SLO-driven monitoring, dynamic thresholds, intelligent alerting, and AI/ML-based anomaly detection across multi-stream architectures.

Key Responsibilities

Observability & Reliability Engineering

  • Design and implement unified observability dashboards across metrics, logs, traces, events, and topology
  • Define and manage SLIs, SLOs, and error budgets aligned to business outcomes
  • Build actionable dashboards for operations, engineering, and leadership
  • Implement alerting strategies using static and dynamic thresholds

Proactive Detection & AIOps

  • Leverage AI/ML/AIOps to detect anomalies, predict incidents, and reduce MTTR
  • Transition monitoring from reactive alerts to proactive insights
  • Implement noise reduction, alert correlation, and root cause analysis
  • Apply baseline modeling, seasonality detection, and anomaly scoring

Distributed Systems & Dependency Analysis

  • Monitor and troubleshoot multi-service architectures involving:
      • Microservices
      • Downstream APIs
      • Kafka / streaming platforms
      • Cloud infrastructure (Terraform, IaC)
  • Identify whether issues originate from:
      • Upstream/downstream dependencies
      • Streaming platform
      • Infrastructure
      • Application code

Tooling & Platforms

  • Deep hands-on experience with Dynatrace (mandatory)
  • Experience with:
      • OpenTelemetry
      • Prometheus / Grafana
      • ELK / EFK
      • Cloud-native monitoring (AWS/Azure/GCP)
  • Strong JSON-based telemetry manipulation and enrichment

GenAI & LLM Enablement

  • Apply GenAI / LLMs for:
      • Incident summarization
      • Root cause explanation
      • Runbook recommendations
      • Auto-remediation suggestions
  • Collaborate with platform teams to operationalize GenAI safely

Required Skills & Experience

✅ 15+ years in SRE / Production Engineering

✅ Strong Unified Observability background (not infra-only)

✅ Hands-on Dynatrace experience (metrics, traces, logs, Davis AI)

✅ SLI/SLO engineering experience in production systems

✅ Experience implementing dynamic thresholds and anomaly detection

✅ Knowledge of AI/ML concepts applied to Ops (AIOps)

✅ Distributed systems troubleshooting expertise

✅ Experience with Kafka or streaming data platforms

Differentiators (Highly Valued)

  • Experience in financial services or regulated environments
  • Proven reduction of alert noise and MTTR using AIOps
  • GenAI / LLM integration into operations workflows