Maintain and scale Kubernetes clusters, automate platform quality, and support customer workloads.
Keep user-facing services and production systems running smoothly.
Implement and enhance system reliability, availability, scalability, performance, and efficiency.
Manage observability platforms for critical security and business operations.
Define Fleet Health metrics and indicators to objectively measure and improve system availability.
Operate and maintain bare-metal Kubernetes clusters, scaling up to thousands of nodes.
Ensure the reliability, availability, and performance of cloud-based systems and infrastructure.
Maintain systems for observability, adjust and maintain SLOs, participate in incident resolution, and work on proactive improvements to increase the reliability of managed platforms.
Design and build systems to ensure reliability of large-scale distributed systems handling petabytes of data.
Write clean, well-tested, and reviewable code, leveraging technologies such as Java, Python, MySQL, NSQ, HBase, AWS, and Kubernetes.
Collect requirements, then design and implement highly available systems and solutions.
Play a crucial role in ensuring the smooth operation of user-facing services and Anyscale production systems.
Develop and optimize software to provision and manage xAI’s infrastructure across on-premise, virtual machine, and classified cloud environments.
Design and maintain high-quality data pipelines, ensuring reliability and accuracy for trading and research.
Help operationalize new features, maintain the stability of the application, and improve how we develop and deploy it.
Design and implement comprehensive capacity planning models and forecasting systems.
Ensure the reliability, availability, and performance of OpenFX’s systems.