Senior Site Reliability Engineer

🇨🇭 Switzerland - Remote
🔧 DevOps 🟣 Senior

Job description

Mission – why we exist, what we do, and why we need you

SpotMe is a leading B2B event platform that helps enterprises increase the impact of their events by delivering CRM-connected, high-quality experiences across in-person, virtual, hybrid events, and webinars. With a strong focus on life sciences, SpotMe powers Onomi: an HCP engagement product that enables medical and commercial teams to run impactful congresses, symposia, advisory boards, and webinars. Together, SpotMe and Onomi turn events into a company’s most effective engagement channel.

This role is for a hands-on technical expert who thrives on maintaining high-performance, scalable systems and has the passion and knowledge to ensure that a 24/7 SaaS platform operates smoothly under all conditions. If you have a strong background in cloud infrastructure, automation, and incident response, this is your opportunity to take full ownership of our platform’s reliability and scalability in an environment where these are critical to our business success. As the SRE, you won’t just monitor; you will actively drive improvements, challenge the status quo, and raise the bar for system uptime and performance.

You will report to the Infrastructure Lead and collaborate closely with both engineering and product teams. Your role will involve not only maintaining and optimizing the platform’s infrastructure but also developing key solutions that improve the reliability and scalability of the platform. You will be responsible for ensuring that our platform can scale seamlessly to handle peak traffic loads while staying resilient during high-stakes live events. Your time will be spent on:

  • [40%] Infrastructure development
    • Develop and deploy scalable infrastructure solutions using Terraform and cloud-native services.
    • Contribute to critical full-stack features that involve backend and infrastructure/cloud development.
    • Build automation solutions to streamline infrastructure provisioning and CI/CD pipelines.
  • [40%] Infrastructure maintenance
    • Maintain and optimize the platform’s cloud infrastructure to ensure it’s highly available and cost-efficient.
    • Monitor and update infrastructure to adhere to security best practices, applying necessary patches and upgrades.
    • Ensure the infrastructure can handle peak loads, scaling seamlessly during high-traffic events.
  • [20%] Support, maintenance and observability
    • Participate in the on-call infrastructure support team, responding to incidents and providing solutions in a timely manner.
    • Enhance the platform’s monitoring and observability to identify and resolve potential issues before they affect end-users.
    • Handle infrastructure-related support requests and drive continuous improvement in incident resolution processes.

Objectives – the problems you will solve

In Your First Month:

  • Understand the current platform architecture and perform 3 IaC change request reviews.
  • Understand monthly patching procedures and deploy critical security update patches.
  • Get hands-on with our load testing framework and perform one release validation load test.
  • Participate in weekly risk analysis management meetings and perform a scheduled infrastructure DB upscaling and downscaling.
  • Handle and resolve at least 3 infrastructure-related support requests.

After 3 Months:

  • Perform at least 2 firewall rule updates.
  • Develop and deliver one IaC-related project using Terraform.
  • Develop and deliver one Python-based AWS Lambda.
  • Fully integrate into the on-call infrastructure support team.

After 6 Months:

  • Lead the resolution of a critical infrastructure-related incident and drive improvements in incident response and recovery times.
  • Lead a major tech debt refactoring project involving CI/CD (e.g., move an on-premise complex build system to the cloud).
  • Contribute to and deliver a major full-stack feature that involves front-end, back-end, and infrastructure/cloud development.

What you need to be great at

  • Large-scale SaaS reliability engineering expertise – you bring deep practical knowledge in site reliability engineering, complemented by a strong foundation in system administration or software development, applying best practices to design, build, and maintain resilient, business-critical SaaS platforms that operate 24/7.
  • Hands-on cloud-native & distributed systems – you understand and work with complex cloud-native and distributed architectures, actively designing, deploying, and managing high-availability systems. You have a flair for diagnosing and resolving complex system issues under pressure.
  • Infrastructure automation and cloud development – you are highly proficient in automating and managing infrastructure using Terraform and developing CI/CD pipelines with Jenkins. You bring strong experience with AWS in production environments (Azure knowledge is an asset) and have hands-on experience with both document-oriented and relational databases.
  • Technical expertise – you are fluent in Python and comfortable leveraging additional languages like JavaScript/Node.js or Go. You are comfortable with observability tools such as Datadog, Pingdom, Grafana, and Elasticsearch, with hands-on experience considered a strong plus.

SpotMe recruits, compensates, and promotes regardless of race, color, religion, gender, gender identity or expression, sexual orientation, national origin, genetics, disability, age, parental status, or veteran status.
