Principal Site Reliability Engineer

Job description

Upwork ($UPWK) is the world’s work marketplace. We serve everyone from one-person startups to over 30% of the Fortune 100 with a powerful, trust-driven platform that enables companies and talent to work together in new ways that unlock their potential. Last year, more than $3.8 billion of work was done through Upwork by skilled professionals who are gaining more control by finding work they are passionate about and innovating their careers.

This is an engagement through Upwork’s Hybrid Workforce Solutions (HWS) Team, a global group of professionals who support Upwork’s business. Our HWS team members are located all over the world. This is an opportunity to work on a major revenue-producing website with millions of users. In addition to making sure everything works, you are expected to contribute to the continuous improvement of our environment. This is a full-time position (~40 hours per week, Monday-Friday). The role participates in our production on-call rotation during your daytime hours and on some weekends (once every 2-3 weeks).


Work/Project Scope:

  • Serve as a technical leader in modern SRE practices with a focus on zero-trust infrastructure, platform observability, and cloud-native scalability.
  • Guide the architectural evolution of reliability systems, including multi-cluster Kubernetes environments, GitOps workflows, and service mesh integration.
  • Champion SLO-driven engineering across teams and establish frameworks for defining, tracking, and enforcing reliability standards.
  • Partner with platform and security teams to enable service-to-service authentication, policy enforcement, and resilient control planes.
  • Develop AI-assisted tools and workflows (e.g., for incident triage, RCA generation, auto-remediation) to reduce operational burden and accelerate resolution.
  • Define and maintain end-to-end observability strategies including distributed tracing, metrics pipelines, and log enrichment.
  • Drive infrastructure automation efforts using IaC best practices, with an emphasis on policy-as-code, workload identity, and platform governance.
  • Lead post-incident reviews and reliability audits to surface systemic gaps and drive continuous improvement.
  • Mentor engineers across infrastructure and application teams on designing and operating reliable, scalable systems.

Must Haves (Required Skills):

  • 10+ years in SRE, DevOps, or production engineering roles, including experience operating large-scale distributed systems in production
  • Deep expertise in Kubernetes operations, including multi-cluster orchestration, service mesh (Istio or equivalent), and workload policy management (e.g., OPA, Kyverno)
  • Proven experience building and maintaining GitOps pipelines using tools like ArgoCD or Flux
  • Strong fluency in observability tooling (e.g., Prometheus, OpenTelemetry, Grafana, or Datadog), with a focus on SLO-based alerting and incident detection
  • Familiarity with reliability-as-code practices and automation using scripting languages (Python, Go, or Bash) and AI-enhanced workflows (e.g., Cursor, incident bots, PR-generating agents)
  • Experience designing and enforcing zero-trust service-to-service authentication, workload identity, and mTLS policies
  • Track record of leading incident review programs, standardizing postmortems, and driving systemic reliability improvements
  • Ability to work cross-functionally with platform, security, and developer enablement teams to embed resilience across the SDLC

Upwork is proudly committed to fostering a diverse and inclusive workforce. We never discriminate based on race, religion, color, national origin, gender (including pregnancy, childbirth, or related medical condition), sexual orientation, gender identity, gender expression, age, status as a protected veteran, status as an individual with a disability, or other applicable legally protected characteristics.

To learn more about how Upwork processes and protects your personal information as part of the application process, please review our Global Job Applicant Privacy Notice.
