Location: San Francisco, California
The client is an AI-powered revenue architecture platform. It enables companies to simplify complex pricing, automate monetization, and gain full control over how they generate and scale revenue.
About the Role:
The infrastructure is growing quickly. We are looking for a leader who has already operated an Infrastructure org at the scale we are growing into: someone who can run the team today and build it out over the next 12–18 months. This is a very hands-on role, with regular involvement in pipelines, IaC, post-mortems, and security reviews alongside your team. You will own five pillars: Cloud Infrastructure & SRE, CI/CD & Developer Experience, Observability & Cloud Cost, Vendor & Tooling Ownership, and Application & Cloud Security. You will shape the strategy and roadmap as we scale.
What You'll Do:
Lead Cloud Infrastructure & SRE: AWS (ECS, EKS, networking, VPCs, IAM), data stores and streaming (PostgreSQL, DynamoDB, Kafka), IaC standards, SLOs, error budgets, DR/BCP, and a sustainable on-call program
Own CI/CD & Developer Experience: GitLab CI across ~150 repos, covering build speed, reliability, test signal, release safety, environments, and paved roads for new services
Drive Observability & Cloud Cost: a high-signal logging/metrics/tracing/alerting stack, SLO instrumentation, cloud cost attribution by team/service, and a FinOps practice
Lead Application & Cloud Security end-to-end: secure SDLC, SAST/DAST/SCA, vuln triage and SLAs, secrets hygiene, IAM, CSPM
Partner with IT to support the compliance program, customer security questionnaires, and pen-test remediation
Own Vendor & Tooling Portfolio: build-vs-buy decisions, vendor selection, POCs and lifecycle across observability, CI/CD, security, and incident management
Grow and Develop the Team:
Hire, onboard, and mentor engineers. Actively participate in recruiting: interview candidates, provide calibrated feedback, and raise the bar.
Conduct regular 1:1s, provide actionable feedback, and support career development for each team member.
Build a culture of urgency, ownership, customer empathy, and continuous improvement.
Special Perks:
Competitive compensation and benefits that reward your talent and impact.
Comprehensive health, vision, dental, and life insurance.
A front-row seat in the Silicon Valley tech ecosystem, where you’ll work on cutting-edge challenges shaping the future of SaaS, finance, and payments.
The opportunity to build truly groundbreaking products — your work won’t just support the business; it will influence how companies around the world monetize and grow.
A high-energy, collaborative culture where smart, supportive teammates push each other to learn fast, think boldly, and do the best work of their careers.
Room to grow, lead, and make your mark in a fast-scaling company that values creativity, ownership, and ambition.
Must Have Skills:
- 7+ years of infrastructure or SRE engineering experience, with at least 3 years managing platform, infra, or SRE teams in a product-focused SaaS environment.
- Production AWS ownership - you've run production workloads on AWS at scale, covering high availability, governance, compliance, and a 24/7 on-call program.
- Infrastructure as code - you've built and scaled IaC practices using Terraform, and know enough about packaging tooling (Helm, Kustomize, or similar) to make good architectural decisions and hold the team to a high bar.
- CI/CD fluency - you've built or owned pipelines in GitLab CI, GitHub Actions, Jenkins, or similar, and know where they break under scale.
- Observability ownership - you've built or evolved an observability stack and made it materially better: alerting, SLOs, and distributed tracing for a real production system.
- Incident command - you've run customer-facing major incidents both as incident commander and as the manager supporting the team. You know the difference between those roles.
- Application security - you've embedded security into the SDLC: SAST/DAST/SCA, vuln management with SLAs, and threat modeling as a team practice, not a checkbox.
- Cloud security posture - you own IAM, secrets management, network segmentation, and CSPM. You've hardened an AWS environment and can explain the tradeoffs you made.
- Compliance partnership - you've worked alongside IT on compliance programs (SOC 2, ISO 27001, or similar) and know how to translate requirements into engineering work.
- Deeply hands-on - you've kept your technical edge as your teams have grown, and you're still close enough to the work to spot problems early and earn credibility with your engineers.
- Bias for action - you default to doing, not delegating. When something is broken, you dig in. When a process isn't working, you fix it. You don't wait to be told.
- Active AI adoption - you use AI-assisted tools (Claude Code, Copilot, Cursor, Windsurf, or similar) as part of your daily workflow for incident diagnosis, IaC authoring, runbook generation, or log and metric analysis. You can describe specifically how, not just that you do.
Preferred:
- B2B SaaS experience with Salesforce coupling: managed packages, org provisioning, integration patterns
- CPQ, billing, or revenue domain: high-stakes correctness, audit-relevant data
- Experience standing up a Platform Engineering practice or internal developer platform (Team Topologies, paved roads)
Tech Stack:
- Salesforce Platform: Apex, Lightning Web Components (LWC), SOQL, Flows, managed packages
- Backend: Java (Spring Boot), Node.js, REST APIs, GraphQL
- Frontend: React 16, TypeScript, Material-UI, Webpack
- Database: PostgreSQL (AWS RDS), Salesforce SOQL/SOSL
- Infrastructure: AWS (Lambda, SQS, S3), Kafka, Docker, GitLab CI/CD
- AI Tooling: Claude Code, Windsurf, Copilot (used daily across engineering)
- Collaboration: Jira, Slack, Gem, Fellow, Confluence