Build and operate platform deployments in classified environments where clarity, reliability, and speed directly impact operational readiness.
Coordinate and support the technical delivery of new deployments.
Manage large-scale systems, automate infrastructure, and ensure seamless service reliability.
Design and implement AI-powered solutions to drive observability, automate incident response, and optimize cloud-native platforms.
Own and scale the platform that makes AI engineering work for everyone.
Hands-on maintenance and automation of GPU infrastructure across its lifecycle.
Hands-on maintenance and troubleshooting of high-performance GPU infrastructure.
Develop and enforce SRE best practices across the organization.
Work on web applications, data mining, machine learning/data science, data transformation/ETL, and more.
Work on web applications, data mining, machine learning/data science, and more.
Maintain and harden AWS infrastructure, operate and evolve EKS clusters, and more.
Build and lead SRE and Security Engineering functions from our new strategic hub in Dublin.
Design and build secure-by-default infrastructure for Harvey's AI capabilities.
Design, develop, and deploy new infrastructure services.
Design, implement, and manage monitoring, alerting, and infrastructure resources across 50+ global regions.
Join our Infrastructure Engineering team and help ensure the reliability, scalability, and performance of Replit's infrastructure.
Co-create the future with us as we build technology that transforms how the world develops software.
Scale and optimize database infrastructure for performance and reliability.
Strengthen alert management and incident response capabilities.