Staff Engineer (SRE)

at NextHire
🇮🇳 India - Remote
🔧 DevOps 🟣 Senior

Job description

Job Title: Staff Engineer (SRE)

Location: Remote/Hybrid

Experience: 9 to 15 years

Industry Preference: Candidates from product-based companies only

Responsibilities:

● The Site Reliability Engineering (SRE) team is responsible for the reliability, scalability, stability and performance of systems and services.

● They work with cross-functional teams to design, build and maintain systems, and troubleshoot issues as they arise. They bridge the gap between development and operations teams.

● They work closely with business teams to define Service Level Objectives (SLOs) and Service Level Agreements (SLAs) for critical systems. They also monitor and maintain the uptime of these systems in line with the defined SLOs and SLAs.

● They deploy and manage monitoring tools to gain insights on system health and performance.

● They analyze performance, identify bottlenecks and implement solutions to improve a system’s scalability and latency.

● They develop scripts and implement tools and automation frameworks to reduce manual intervention in deployment, monitoring and scaling.

● They work with development teams to design and develop observability practices such as logging, metrics and tracing. They aim to diagnose and troubleshoot issues proactively.

● They create actionable alerts on monitoring systems to ensure rapid response to potential production incidents.

● They forecast resource needs and provision adequately for current and future demand.

● They design and execute “chaos experiments” to test a system’s resilience to failure.

● They own, define and implement the Disaster Recovery (DR) processes for systems.

● They also conduct planned and unplanned mock DR drills to test for response preparedness during production incidents.

● They ensure that security best practices are followed and implemented during the design and operation of systems.

● They also own and maintain documentation of processes, playbooks, and systems.

● They publish KPI reports and other system health updates on a regular basis to the business.

Requirements:

● Must-have - Bachelor’s degree, preferably in CS or a related field, or equivalent experience

● Must-have - 12+ years of overall IT experience

● Must-have - 7+ years of proven work experience as a Senior Site Reliability Engineer or in a similar position.

● Must-have - 5+ years of AWS Cloud experience with an AWS certification (e.g. AWS Certified DevOps Engineer, SysOps or Security).

● Must-have - AWS experience - 3+ years of experience using a broad range of AWS technologies (e.g. EC2, RDS, ELB, S3, VPC, CloudWatch & monitoring tools) to develop and maintain an AWS-based cloud solution, with an emphasis on best-practice cloud security.

● Must-have - 2+ years of experience with CDN and/or cache systems like Fastly, Akamai, CloudFront, etc.

● Proven understanding of and strong experience with cloud deployments (AWS/Docker/Kubernetes)

● Knowledge of IaC provisioning tools and scripting languages such as Terraform, Chef, Ansible, Shell, Groovy and Python.

● Experience with monitoring systems such as CloudWatch, New Relic, Datadog/Splunk and the ELK stack.

● Experience managing cloud network resources (AWS preferred) such as CloudWatch, VPC, URL proxies, PrivateLink, DNS, ACLs, firewalls, and C2S access points.

● Platform or application engineering and operational knowledge of CI/CD tooling such as GitHub Actions or Jenkins.

● Experience with other tooling such as JIRA, Bitbucket, Jenkins, Fortify, SonarQube, Nexus and Nexus IQ

● Experience with configuration automation tools like Puppet/Ansible/Chef/Salt

● Scripting Skills: Strong scripting (e.g. Bash & Python) and automation skills.

● Operating Systems: Windows and Linux system administration.

● Problem Solving: Ability to analyze and resolve complex infrastructure, resource and application deployment issues

● Strong attention to detail. Excellent verbal and written communication skills. Strong documentation skills.

Good To Have:

● Experience with Terraform/Ansible/Chef/Puppet

● Experience with GitHub Actions

● Experience with CloudFront, Fastly

● Oversees team members performing these functions

● Anticipates problems and future technical needs and takes the necessary steps to address issues.

● Works primarily in server-side technologies and is comfortable with client-side work whenever required

● Enthusiastically follows technology trends, software engineering best practices and technologies

Perks:

● Day off on the 3rd Friday of every month (one long weekend each month)

● Monthly Wellness Reimbursement Program to promote health and well-being

● Paid paternity and maternity leave

About Forbes Advisor:

Forbes Advisor is a global platform dedicated to helping consumers make the best financial choices for their individual lives. We support your pursuit of success by making smart financial decisions simple, to help you get back to doing the things you care about most. We do this by helping turn your aspirations into reality. By arming you with trusted advice and guidance, you can make informed financial decisions you feel confident in and achieve your financial goals. Visit Forbes Advisor for unbiased personal finance advice, news and reviews, plus a comparison marketplace that helps you find the financial products that best fit your life and goals.

Websites:

https://www.forbes.com/advisor.

https://www.linkedin.com/compa…
