SRE & MLOps Engineer - Platform Reliability & AI Operations

🇮🇳 India - Remote
🔧 DevOps 🔵 Mid-level

Job description

About Blue Machines

Blue Machines powers large-scale, real-time Voice AI and Agentic Workflows across BFSI, Healthcare, HRTech, and Global Enterprises.

Role: SRE & MLOps Engineer (3–6 Years Experience)

Location: Bangalore (Hybrid)

What You Will Own

1. Platform Uptime & Reliability

- Maintain 99.9%+ uptime.

- Monitor and optimize latency for voice agents.

2. Observability, Monitoring & Incident Response

- Build and maintain monitoring dashboards.

- Configure alerts and act as first responder for incidents.

3. MLOps & Model Provider Reliability

- Monitor STT/TTS/LLM providers.

- Manage failovers and latency SLAs.

4. Kubernetes & Infrastructure

- Manage GKE clusters, autoscaling, deployments.

5. Internal Platform Tooling

- Build automation around scaling, canaries, logs.

6. Security & Compliance

- Enforce encryption and network policies; support audits.

You Are a Great Fit If You…

- Have 2–5 years of SRE/DevOps/MLOps experience.

- Are strong with Kubernetes, Prometheus, ELK, Redis, Pub/Sub.

- Understand streaming, SIP, WebSockets.

- Communicate well and take ownership of incidents.

Preferred Skills

- Experience with LLM pipelines, telephony, GPU, GCP.

Why Blue Machines

- Build India’s most advanced Voice AI platform.

- High-scale, low-latency engineering.

- Work with the CTO’s office on reliability.
