Site Reliability Engineer

at Float.com
💰 $133k
🇺🇸 United States - Remote
🔧 DevOps 🔵 Mid-level

Job description

Who We Are

Float is the leading resource management software for professional services teams. Since 2012, we’ve grown every year—independently, self-funded, and profitably. We’re rated #1 for resource management on G2 and trusted by 4,500+ customers worldwide.

As a certified B Corporation, we’re committed to making a positive impact on our team, customers, the environment, and the remote community. Our 50+ person team works 100% remotely across the globe, with perks and benefits designed to support us in living our Best Work Life. You’ll collaborate with teammates across Australia, Mexico, the UK, Nigeria, Canada, and the US. Learn more about our data security practices for employment or service contracts here. Browse our blog to get a glimpse of life at Float and check out our Glassdoor employer reviews. See why our customers love Float on G2.

We’re on a scale-up journey, and we’re seeking people who thrive in this stage. We want Float to be the place where you have the autonomy and opportunity to do the best work of your career.

Why We’re Hiring For This Role

Float’s infrastructure has grown rapidly, meaning more customers, more complex systems, and more opportunities to build for scale. As the scale of our systems increases, we’re growing our SRE team to match. You’ll be the third site reliability engineer, and will be working alongside our QA team. This role is about stepping into a high-impact space: helping us automate smarter, improve visibility across engineering, and ensure reliability as we scale. You’ll join a team that’s laying the groundwork for stronger SLAs and an even better experience for our customers.

This role reports to Chris, our Team Lead for SRE & QA. Check out this video, where he explains the important role you will play within our SRE team.

You’ll be working asynchronously with a bright, dedicated team from across the globe, with a strong focus on taking complex problems and creating solutions that feel simple and intuitive for our customers.

What You’ll Be Responsible For

Early on, you’ll jump right into:

  • Upgrade paths: Maintain and validate the processes that keep our Kubernetes infrastructure up-to-date, ensuring upgrades happen smoothly, safely, and regularly.
  • Service hygiene: Remove noisy, unused, or misfiring boot alerts and improve the team’s ability to trust alerts as meaningful signals.
  • Service integration: Partner with engineers to configure services within our clusters and support service migrations where possible.
  • Kubernetes optimisation: Review and optimise usage across Kubernetes services, including right-sizing scale node specifications.

Once you are a bit more settled, we expect that you will jump into the following projects:

  • Service mesh & ingress security: Lead our exploration and implementation of service mesh options and harden ingress layers to defend against spam and abuse.
  • Incident response playbooks: Define and roll out standardised playbooks to improve clarity and speed during production incidents.
  • CDC layer support: Build deep familiarity with our next-gen data layer (CDC) to support new teams building on top of it.
  • SLO coaching & support: Help teams define, measure, and meet reliability goals—enabling engineering to own quality into production and drive better outcomes for customers.

What You’ll Need To Be Successful

We want you to love your work, and we believe the following skills will allow you to succeed in the role:

  • Bash + programming language: Confident writing scripts in Bash and proficient in at least one go-to language (ideally PHP, NodeJS, or Python).
  • Kubernetes: Strong production experience managing and optimising Kubernetes clusters.
  • Terraform: Solid understanding of infrastructure as code using Terraform.
  • GCP: Familiarity with Google Cloud Platform, or eagerness to get up to speed quickly.
  • Iteration mindset: You believe in shipping value early and improving over time, not chasing one-shot perfection.
  • Written communication: You write clearly and concisely, whether it’s documenting infrastructure, proposing changes, or sharing learnings across teams.

Our SRE growth framework details the key competencies and expectations needed for this role. Take a look at the Level 2 column to learn more about what you’ll need to be successful in the role, in addition to the technical skills outlined above.

As a fully remote team, we’re looking for someone comfortable with asynchronous communication as the default, which means you have previous remote experience and are comfortable using tools like Slack, Loom, and Linear to communicate as needed. Don’t worry—you will have significant deep work time since we have very few meetings.

Why Join Us

Pay for this role is US $133,000 (Level 2). Here’s a blog post with more information on how we determine our salaries.

We’re a global async remote company with a diverse team of people from all over the world who share a common belief in living our best work life. We believe deeply in transparency and share our Float Handbook publicly so potential new team members can see firsthand our perks & benefits as well as our ways of working. If you feel you can thrive at Float and do your best work, we would love to hear from you.

Hiring Process For This Role

You’ll find a lot of useful information about our interview process and what it’s like to join our global team on the Float careers page. By the way, we made a blog post on 10 tips for applying to a role at Float - we highly recommend you check it out prior to applying!

The hiring process for this role looks like this:

Initial First Meet (20 min): You’ll meet with Julia, our Talent Manager, to discuss your interest in the role and review your questions about working at Float.

Manager Interview (45 min): You’ll meet with Chris, our SRE Team Lead, to discuss how your background and experience make you a great fit for this role.

Co-Worker Interview (30 min): You’ll meet with Bogdan, our Site Reliability Engineer, to dive deeper into your goals and to learn more about your alignment with our values and ways of working.

Take-home assignment (2 hours, paid): You’ll complete a take-home technical assignment that the hiring team will review. You will be paid an honorarium after completion of your take-home assignment, and will receive feedback on your assignment regardless of the outcome.

Founder Interview (30 min): You’ll meet with Lars, our CTO and Co-Founder, who will get to know you and see whether you have the potential to be a great addition to the team.

Note: Industry research shows that women and those in traditionally underrepresented groups generally don’t apply to jobs unless they check all the boxes for the role. If you feel strongly that you have what it takes for this role but don’t check 100% of the boxes—that’s okay—we encourage you to apply anyway and highlight what you can bring to the table.
