Build and orchestrate large, distributed infrastructure with a focus on automation.
Build and orchestrate a modern OTEL-based observability platform.
Drive incident response best practices, lead postmortems, and define SLAs/SLOs across platform services.
Create a hybrid infrastructure integrating edge devices, on-premises, and cloud resources.
Operate distributed LLM inference and large GPU clusters worldwide.
Build and operate platform deployments in classified environments where clarity, reliability, and speed directly impact operational readiness.
Lead full-cycle recruitment efforts for key infrastructure, security, and engineering roles.
Lead and support the deployment of an operational planning platform across secure U.S. military environments.
Manage large-scale systems, automate infrastructure, and ensure seamless service reliability.
Manage and maintain databases for accuracy and efficiency.
Lead a program to modernize Medicare Fee-for-Service shared systems and ensure their secure, reliable operation.
Help us operationalize new features, maintain the stability of the application, and improve how we develop and deploy it.
Take ownership of building and evolving internal monitoring and alerting systems.
Continuously hunt for vulnerabilities in interactions between applications, infrastructure, and models.
Build secure, scalable systems for security observability.
Design and develop scalable software systems for security observability.
Scale and optimize database infrastructure for performance and reliability.
Deploy and automate modern SaaS platforms at scale.
Strengthen alert management and incident response capabilities.
Implement reliability improvements across infrastructure and application layers.