Location: REMOTE / New York, New York
This job allows you to work remotely.
Our client is looking for a Site Reliability Engineer (SRE) to join their team and support the reliability, security, and scalability of their infrastructure. The ideal candidate will be responsible for managing user access, service accounts, and permissions across various platforms while leveraging Terraform and Terragrunt for infrastructure management. You will also help maintain and deploy services across Kubernetes, Docker, AWS ECS, and LXC.
Key Responsibilities
-Manage and configure infrastructure using Terraform and Terragrunt when appropriate.
-Support and maintain deployment of services on Kubernetes, Docker, AWS ECS, and LXC.
-Configure user accounts and permissions across the organization using Okta, Keycloak, AWS, and other applications as required.
-Ensure the reliability, scalability, and security of infrastructure and services.
-Work collaboratively with the Site Reliability Team to optimize system performance.
-Automate and streamline operational processes through scripting and infrastructure as code (IaC).
Special Perks:
-100% remote, and always will be.
-Flexibility in how you work and plan your day.
-Competitive compensation and benefits package.
Please note: this is a full-time position, but because the client does not have an Employer of Record (EOR) in Canada, they bring people on as contractors. You will be paid like an employee (salaried, not hourly), but until an EOR is established in Canada, this is how they have to hire here. Most of their team has been with them for 5+ years as contractors.
Must Have Skills:
Qualifications
-2+ years of hands-on experience with Python.
-Strong familiarity with Linux operating systems.
-Good general knowledge of AWS, with at least 2 years of hands-on experience.
-Experience with Terraform and Terragrunt for infrastructure as code.
-Understanding of Kubernetes, Docker, AWS ECS, and LXC for service deployment and management.
-Experience configuring identity and access management using Okta and Keycloak.
Nice to Have Skills:
-Experience with Node.js and C++.
-Familiarity with modern CI/CD pipelines and DevOps best practices.
-Experience with monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack).