Job Description

Tower Research Capital is a leading quantitative trading firm founded in 1998. Tower has built its business on a high-performance platform and independent trading teams. We have a 25+ year track record of innovation and a reputation for discovering unique market opportunities.

Tower is home to some of the world’s best systematic trading and engineering talent. We empower portfolio managers to build their teams and strategies independently while providing the economies of scale that come from a large, global organization.

Engineers thrive at Tower while developing electronic trading infrastructure at a world-class level. Our engineers solve challenging problems in the realms of low-latency programming, FPGA technology, hardware acceleration, and machine learning. Our ongoing investment in top engineering talent and technology ensures our platform remains unmatched in terms of functionality, scalability, and performance.

At Tower, every employee plays a role in our success. Our Business Support teams are essential to building and maintaining the platform that powers everything we do — combining market access, data, compute, and research infrastructure with risk management, compliance, and a full suite of business services. Our Business Support teams enable our trading and engineering teams to perform at their best.

At Tower, employees will find a stimulating, results-oriented environment where highly intelligent and motivated colleagues inspire each other to reach their greatest potential.

Responsibilities

Lead transformational projects with a focus on Application Reliability Engineering (AppRE), ensuring that long-term rollouts are completed, incidents are fully resolved, and global initiatives are coordinated.
Strengthen AppRE by driving productivity, automation, and monitoring improvements while representing AppRE in cross-team forums.
Maintain a clear roadmap of AppRE initiatives and ensure long-term projects (automation, monitoring, resilience) are tracked to completion or clean handoff.
Execute technical POCs for new initiatives, including hands-on configuration and initial deployment, creating comprehensive ‘how-to’ documentation and runbooks for clean handover to the broader AppRE team.
Guarantee zero orphan incidents: ensure all tickets either drive real development work or are closed, partnering with the AI team to classify/prioritize work and highlight the right items for developers.
Contribute to the redesign of Monitoring & Alerting, including real-time alerting, and work with AI to reduce noise, detect anomalies, and highlight meaningful patterns.
Partner with the AI team to centralize knowledge via AI-assisted runbooks/playbooks, tagging of incidents, and surfacing recurring patterns, ensuring AppRE documentation, monitoring standards, and onboarding guides are searchable and consistent.
Reduce silos by connecting AppRE initiatives with other teams’ work.

Qualifications

At least 5 years of relevant experience in a leadership or senior role within Application Reliability Engineering, or a related Operations/Infrastructure capacity.
Proven experience in implementing technical POCs for transformational projects.
Highly analytical mindset, with a strong ability to influence senior stakeholders and drive complex technical decisions.
Highly organized, with excellent time management skills and experience managing project roadmaps.
Excellent written and verbal communication skills in English, with experience presenting technical concepts to cross-functional groups.
Strong knowledge of application monitoring, alerting design, and incident management best practices.
Strong networking skills within the industry and ability to build relationships with key contacts.
Experience in the financial IT sector or high-frequency trading industry preferred.

Preferred Qualifications (not required)

Knowledge of Microsoft Project, Visio, Jira and Confluence.
Hands-on experience with scripting languages (e.g., Python, Bash/Shell) for automation and operational tasks.
Familiarity with enterprise-level scheduling tools.
Experience leveraging AI/ML agents or tooling to enhance operational efficiency (e.g., automated root cause analysis, predictive alerting).
Experience leading a regional or global function for a technology team.
Experience working with AI/ML teams to integrate advanced data analysis into operations or reliability practices (e.g., anomaly detection, predictive maintenance).
Knowledge of real-time data streaming technologies (e.g., Kafka, Redpanda).

Anticipated annual base salary range: $140,000 - $240,000, plus eligibility for a discretionary bonus.

Tower’s headquarters are in the historic Equitable Building, right in the heart of NYC’s Financial District, and our impact is global, with over a dozen offices around the world.

At Tower, we believe work should be both challenging and enjoyable. That is why we foster a culture where smart, driven people thrive – without the egos. Our open concept workplace, casual dress code, and well-stocked kitchens reflect the value we place on a friendly, collaborative environment where everyone is respected, and great ideas win.

Our benefits include:

Generous paid time off policies
Savings plans and other financial wellness tools available in each region
Hybrid working opportunities
Free breakfast, lunch, and snacks daily
In-office wellness experiences and reimbursement for select wellness expenses (e.g., gym, personal training and more)
Company-sponsored sports teams and fitness events (JPM Corporate Challenge, Cycle for Survival, Wall Street Rides FAR and more)
Volunteer opportunities and charitable giving
Social events, happy hours, treats, and celebrations throughout the year
Workshops and continuous learning opportunities

At Tower, you’ll find a collaborative and welcoming culture, a diverse team and a workplace that values both performance and enjoyment. No unnecessary hierarchy. No ego. Just great people doing great work – together.

Tower Research Capital is an equal opportunity employer.

Technical Operations Lead
