Job description
Senior Manager, Platform Engineering
SMG is a leading experience management (XM) provider, serving restaurants, retailers, and other multi-location consumer businesses by changing how brands act on customer and employee insights. With a rich 30-year history, SMG uniquely pairs an enterprise software platform with professional services to help brands generate new revenue, grow existing revenue, reduce detractors, and drive operational efficiencies. With our 2024 acquisition of Bulbshare, we also help the world’s leading organizations grow through real-time customer collaboration, building mobile-first customer communities in over 30 markets worldwide so that clients can collaborate with consumers quickly and effectively for insights, ideation, and advocacy.
We offer our talent:
- Work hard, have fun environment - We work hard to deliver a fulfilling, exciting workplace environment for each SMG employee. Our teams are composed of smart, talented, curious people who love a good challenge.
- Values-driven culture where we connect, collaborate, and co-create
- Remote-first company (fully remote)
- Unlimited PTO
- Tech provided
- Diverse, experienced, friendly team that will welcome you, support you, and challenge you
We are proud to be an equal opportunity employer. We celebrate diversity and create an inclusive work environment in which all our colleagues experience belonging, have their unique needs respected and met, have equal access to opportunities and resources, and feel fully engaged to contribute to the company’s success.
As a Senior Manager, Platform Engineering at SMG, you will:
- Lead, mentor, and develop a multidisciplinary Platform Engineering team, fostering a culture of collaboration, ownership, and continuous improvement.
- Define and continuously improve CI/CD workflows and pipelines that enable rapid, safe, and repeatable delivery of software.
- Establish Infrastructure‑as‑Code (IaC) standards, reusable module libraries, and governance checks to ensure consistency across environments.
- Provide and support robust local development environments that mirror production, boosting developer productivity.
- Deliver tooling and governance enablement (including guardrails, automated policy enforcement, and self‑service platforms) to accelerate development velocity while maintaining compliance.
- Own vendor evaluations for platform tooling, frameworks, and managed services; negotiate contracts and manage vendor relationships.
- Define, document, and enforce organization‑wide Engineering, Troubleshooting, and Runbook standards.
- Implement and operate code vulnerability management processes and tooling, ensuring remediation SLAs are met.
- Establish and maintain an authoritative component inventory / software bill of materials (SBOM).
- Design and maintain an Engineering Documentation Framework to ensure knowledge is current, discoverable, and actionable.
- Manage the lifecycle of internal engineering frameworks—from selection through deprecation—ensuring version currency and support.
- Partner with FinOps to define and operationalize a cost tagging framework; champion rightsizing, cost analysis, and patch management of shared platform services.
- Design and lead Incident and Problem Management architecture and processes, including on‑call escalation and rotation management.
- Own the Postmortem framework, ensuring blameless retrospectives, action item tracking, and learning dissemination.
- Define and maintain the strategy for Observability and Monitoring (metrics, logs, traces, dashboards, alerts).
- Develop and routinely test Disaster Recovery plans, achieving or surpassing agreed RTO/RPO targets.
- Drive Site and Service Reliability through SLO/SLA definition, error budget policies, and proactive reliability engineering.
- Measure and optimize capacity and performance of platform services through data‑driven analysis and forecasting.
- Define policies for Scheduled Changes and Change Management to minimize risk and downtime.
- Perform cost analysis and reporting, surfacing insights to engineering and finance stakeholders.
You are a perfect match for the role if you have:
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
- 10+ years of progressive experience in software, platform, or reliability engineering, with at least 3 years in a people‑management role.
- Hands‑on experience designing and operating cloud‑native platforms on AWS, Azure, or GCP.
- Expertise with CI/CD tooling (e.g., GitHub Actions, Jenkins, ArgoCD), Infrastructure‑as‑Code frameworks (e.g., Terraform, CloudFormation), and container orchestration (e.g., Kubernetes).
- Deep understanding of SRE principles, reliability metrics (SLO/SLA, error budgets), observability stacks (e.g., Prometheus, Grafana, ELK), and incident management best practices.
- Demonstrated success implementing security and vulnerability management programs across the SDLC.
- Proven track record of driving cost optimization, resource rightsizing, and FinOps initiatives.
- Exceptional communication, stakeholder management, and leadership skills.
- Ability to thrive in a remote‑first, fast‑paced environment and influence across functional boundaries.
About SMG:
To learn more about our customer, employee, and brand experience management (XM) solutions, visit www.smg.com.