Lead Cloud Engineering and Production Operations Engineer

Job Description

About Incedo:

Incedo is a global AI and data transformation specialist empowering companies to realize sustainable business impact from their digital investments by delivering ROI from AI@Scale. As a long-term partner for strategy to execution, we operate at the intersection of business and technology. Our integrated services and platforms are built on the foundation of AI & Data, digital engineering, and operations transformation, bringing deep domain expertise and full stack capabilities together. With over 4,000 people in the US, Canada, Latin America and India and a large, diverse portfolio of Fortune 500 enterprises and fast-growing clients worldwide, we work across banking & payments, wealth management, telecom, hi-tech and life sciences.

Please visit the link to learn more about Incedo: https://www.incedoinc.com/

Location: San Jose, CA

Title: Lead Cloud Engineering and Production Operations Engineer

Job Description:

This role is a hands-on technical lead who drives cloud engineering initiatives, automates infrastructure, and ensures high availability and performance across customer-facing systems. The Lead Engineer will collaborate with IT, DevOps, and Software Engineering teams to build secure, scalable environments that support continuous delivery and rapid innovation.

Reporting to the Associate Director of IT and Infrastructure, this position combines deep technical execution with mentoring responsibilities—balancing architectural vision with day-to-day operational excellence.

Key Responsibilities:

Cloud Infrastructure and Engineering

  • Design, deploy, and manage hybrid and cloud infrastructures (OCI, AWS, Azure, on-prem) to support production and enterprise systems
  • Implement infrastructure-as-code (IaC) using Terraform or CloudFormation to ensure repeatable, secure, and automated deployments
  • Develop and maintain CI/CD-ready environments that support rapid build, test, and release cycles for engineering teams
  • Partner with network and security teams to implement resilient, compliant architectures

Production Operations and Reliability

  • Serve as technical lead for production systems, ensuring stability, performance, and scalability
  • Establish monitoring, logging, and alerting frameworks to improve visibility and reduce mean time to detection (MTTD) and mean time to resolution (MTTR)
  • Participate in incident response, root cause analysis, and reliability improvement efforts
  • Collaborate with Engineering and SRE teams to define SLIs, SLOs, and performance metrics for critical services

Automation and CI/CD Enablement

  • Develop and enhance deployment pipelines (e.g., Jenkins, GitLab, ArgoCD) to automate software delivery and environment provisioning
  • Embed security, compliance, and testing gates into CI/CD workflows
  • Implement configuration management and orchestration tools such as Ansible, Chef, or Puppet to manage infrastructure at scale
  • Drive efficiency through self-healing systems, auto-scaling, and infrastructure automation

Operational Leadership and Collaboration

  • Lead day-to-day production operations activities, mentoring junior engineers on cloud and reliability best practices
  • Act as a technical bridge between Infrastructure, Security, and Application Engineering teams
  • Contribute to capacity planning, cost optimization, and production readiness reviews
  • Maintain documentation, runbooks, and standard operating procedures for production systems

Qualifications:

  • Bachelor’s degree in Computer Science, Information Systems, or equivalent experience
  • 7+ years of experience in cloud and infrastructure engineering, with at least 2–3 years in a lead or senior engineer capacity
  • Deep expertise in OCI (preferred), AWS, or Azure, covering networking, compute, storage, IAM, and monitoring
  • Proven experience with production-scale operations and hybrid cloud deployments
  • Proficiency in:
      • Infrastructure-as-code (Terraform, CloudFormation)
      • CI/CD and DevOps pipelines (Jenkins, GitLab, ArgoCD)
      • Containers and orchestration (Kubernetes, Docker)
      • Observability tools (Datadog, Prometheus, Grafana, ELK)
      • Scripting languages (Python, Bash, PowerShell)
  • Strong troubleshooting skills and the ability to lead through high-impact incidents
  • Excellent communication and collaboration skills across cross-functional teams

Preferred Experience:

  • Experience supporting high-availability SaaS or production environments
  • Knowledge of FinOps, cloud governance, and cost optimization practices
  • Familiarity with DevSecOps principles, Zero Trust, and automated compliance frameworks
  • Exposure to AI/ML pipeline infrastructure or high-throughput data systems

AI Use Guidelines for Interviews: Our interviews are designed to reflect your own skills and thinking. The use of AI or recording tools during live interviews is not permitted unless explicitly invited by the interviewer or approved in advance as part of a reasonable accommodation. If these tools are used inappropriately or in a way that misrepresents your work, your application may not move forward in the process.

Work model: Hybrid

Targeted compensation guideline: Compensation will vary based on a number of factors, including market demand for specific skills, role type, job level, and individual qualifications. Final salary offers are determined by considerations including, but not limited to, subject matter expertise, demonstrated skill level, relevant experience, geographic location, education, certifications, and training.
