Principal Site Reliability Engineer

Job description

ConnectWise is a global, industry-leading software company with over 3,000 colleagues in North America, EMEA, and APAC. As a community-driven software company dedicated to the success of technology solution providers, we offer a suite that helps more than 45,000 partners manage their businesses better, sell more efficiently, automate service delivery, and remotely control technology so they can consistently deliver amazing customer experiences.

Our company is powered by our connections, our colleagues, and our community. And, we accept all kinds.

Game-changers, innovators, culture-lovers—and humankind.

We invite discovery and debate. We recognize key moments as milestones.

We see you and value you for your unique contributions. Our inclusive, positive culture lays the foundation to ensure every colleague is valued for their perspectives and skills, giving you the choice of how YOU make a difference.

Curious? Read on to learn how YOU can make a difference at ConnectWise!

General Summary:

As a Site Reliability Engineer, you will work as an integral member of product teams, helping to build, deploy, and monitor cloud services reliably. You will contribute to complex software development projects to maintain essential, revenue-critical services. You will also develop code and build frameworks to monitor services deployed in production, driving reliability and performance at scale, and you will be responsible for the reliability, availability, and performance of our Elasticsearch infrastructure. We’re seeking a talented Site Reliability Engineer who can work with minimal supervision, define test procedures, and collaborate effectively with Developers, Designers, Customer Support, and Engineering Leadership.

Essential Duties and Responsibilities:

· Build systems and infrastructure to monitor complex, large-scale distributed systems.

· Identify stability/performance issues and collaborate with developers to triage critical issues in production systems.

· Represent the SRE organization in design reviews and operational readiness exercises for new and existing services.

· Devise ways to actively monitor system throughput, capacity, and reliability.

· Debug complex systems and evolve a running environment without causing downtime.

· Engage in service capacity planning and demand forecasting, as well as software performance analysis and system tuning.

· Drive standardization efforts across multiple disciplines and services in conjunction with embedded SREs throughout the organization.

· Monitor and troubleshoot Elasticsearch performance issues and outages.

Who You Are

· Bachelor’s degree in Computer Science or equivalent work experience as a System Administrator with programming skills.

· Fundamental knowledge of technologies across a broad range of disciplines, including virtualization, storage, networking, servers, and security.

· Understanding of systems and application design, including the operational trade-offs of various designs.

· Experience with monitoring and logging solutions such as Prometheus, Grafana, and ELK stack.

· Proficiency in scripting languages such as Python.

· Experience with infrastructure-as-code tools such as Terraform or CloudFormation.

· Strong understanding of Linux system administration and networking concepts.

· Excellent troubleshooting and problem-solving skills.

· Ability to work independently and collaboratively in a fast-paced environment.

· Strong communication and interpersonal skills.

· Demonstrable knowledge of Unix, TCP/IP, HTTP, web application security, and experience supporting multi-tier web application architectures.

· Experience in analyzing logs and troubleshooting large-scale distributed systems.

· Excellent organizational and time management skills.

Nice to Have

· Experience instrumenting and monitoring production systems using tools such as the ELK stack, Zabbix, Nagios, StatsD/Graphite, and APM.

· Experience with AWS infrastructure (including EC2, S3, VPC, Security Groups, and RDS) and related services.

· A working understanding of Docker, Vagrant, and configuration management tools like Ansible, Chef, or Puppet.

· Experience with one or more general-purpose programming/scripting languages, including but not limited to Python, Bash, Perl, or Go.

Benefits include:

· Medical Insurance

· Flexible PTO

· Flex Friday

· Hybrid Work Option Available

· Tuition Reimbursement

· And more!

ConnectWise is an Equal Opportunity Employer, dedicated to building a diverse and inclusive workforce and providing a workplace free from discrimination and harassment. ConnectWise provides equal employment opportunities to all employees and applicants without regard to race, ethnicity, color, religion, age, sex (including pregnancy), sexual orientation, gender, gender identity or expression, ancestry, national origin, citizenship status, physical or mental disability, genetic information, military/veteran status, marital status, familial or parental status, or any other characteristic or status protected by applicable federal, state and local laws.

The statements above are intended to describe the general nature and level of work being performed by individuals assigned to this job. Other duties may be assigned as needed. Reasonable accommodations may be made to enable qualified individuals with disabilities to perform the essential functions of the job and/or to receive other benefits and privileges of employment. If you need a reasonable accommodation for any part of the application and hiring process, please contact us at [email protected] or 1-800-671-6898.
