Senior Infrastructure Engineer

💰 $145k-$190k

Job Description

Company: Ferra

Location: Remote or Hybrid

Experience: 8+ years

Reports to: CTO / Founding Engineer

About Ferra

Ferra is building AI infrastructure for structural steel estimation. We process large-scale construction drawing PDFs, run computer vision + LLM pipelines, and generate structured steel graphs, takeoffs, and export-ready models.

Our system includes:

  • Multi-stage ML pipelines (CV + LLM)
  • Asynchronous job processing (SQS-driven workflows)
  • Large PDF ingestion and document graph processing
  • Vector-native parsing and algorithmic geometry systems
  • Graph storage + export services

Role Overview:

We are hiring a Senior Infrastructure Engineer to own uptime, reliability, latency, and scalability across our entire AWS environment.

You will ensure our AI/ML pipelines run reliably at scale — without cloud outages, timeouts, networking bottlenecks, or production instability slowing down our algorithm team.

You will build and maintain production-grade AWS architecture that supports:

  • Large PDF ingestion (100–500+ sheets)
  • Computer vision pipelines
  • LLM inference workflows
  • Distributed job queues
  • High-volume asynchronous processing

Your mission is to enable the algorithm team to move fast without worrying about infrastructure.

What You Will Own:

Keep things running. You own uptime (99.9%+), observability, incident response, and root cause analysis. When something breaks, you fix it — and make sure it doesn’t break the same way twice.

Own the AWS architecture. Deep AWS stack: EC2 (including GPU), ECS/Fargate, SQS, Lambda, S3, CloudFront, API Gateway, RDS/DynamoDB — plus VPC design, IAM, autoscaling, and monitoring. You’ll make the architectural calls, not just maintain what’s there.

Make ML pipelines reliable. The core workloads are CV, LLM inference, and long-running batch jobs. You’ll build the plumbing: retry logic, idempotency, checkpointing, parallel orchestration. Experience with event-driven or DAG-based pipelines is a plus.

Chase down performance problems. Queue bottlenecks, cold starts, LLM latency, runaway costs: you will find and fix them. Comfortable debugging at the TCP, TLS, ECS, and IAM level.

Help the team ship faster. CI/CD, infrastructure-as-code (Terraform/CDK/Pulumi), clean containerization, and proper staging environments. The goal: deployments are boring and “works on my machine” stops being an excuse.

About You:

  • 8+ years in infrastructure / DevOps / production engineering
  • Deep AWS expertise (not just “used it” — architected at scale)
  • Experience running production ML or AI systems
  • Experience with asynchronous distributed systems
  • Strong knowledge of: ECS / Fargate, EC2 (including GPU instances), SQS, S3, VPC networking, and IAM best practices
  • Strong understanding of: containerization (Docker), CI/CD pipelines, infrastructure as code, and observability systems
  • Experience debugging production incidents and designing fault-tolerant systems

Nice to have: Prior exposure to GPU workloads at scale, event-driven architectures, or PDF/document-heavy pipelines. Bonus if you’ve done this in a startup environment where the infrastructure and the product were both still being figured out.

Why Ferra:

You’ll be building infrastructure for real agentic AI, not wrappers around someone else’s API. The team is small and technical, which means high ownership, fast decisions, and direct impact on the core product. Competitive comp, meaningful equity, and a genuine shot at defining how AI agents operate in production.

How to apply: Apply via the breezy application here. Applications will be accepted on a rolling basis.

Target Annual Base Salary Range: $145,000–$190,000

Final salary will be determined based on the candidate’s experience, knowledge, and skills. The salary reflected does not include an annual discretionary bonus, equity, or other benefits offered by the Company, as applicable.
