We are looking for a Senior SRE to join a team that works on a distributed architecture spanning physical machines, virtualized on-prem hosts, and cloud computing. Engineers will support the centralization of DevOps and help existing teams adopt best practices within our environment. Candidates should be able to manage complex tasks that span multiple stack layers, including working with cloud and on-prem servers and automating system management (Python automation and scripting), to bring quicker and more efficient ways of interacting with our infrastructure.

Project description:
The customer develops and deploys systematic financial strategies across a variety of asset classes and global markets. We seek to produce high-quality predictive signals (alphas) through our proprietary research platform to employ financial strategies focused on exploiting market inefficiencies. Our teams work collaboratively to drive the production of alphas and financial strategies, the foundation of a sustainable, global investment platform.

Responsibilities:
● Design and deploy Solutions-as-a-Service using open-source technologies to automate system management, scaling, and monitoring.
● Develop tools that improve and streamline current processes such as deployment, monitoring, and incident management in a distributed environment.
● Work closely with development and operations teams to design software solutions that enhance service reliability.
● Set up, configure, and maintain monitoring and alerting systems that provide real-time visibility into our systems.
● Participate in on-call rotations.
● Contribute to the ongoing DevOps/Agile transformation.
● Leverage container orchestration tools (Kubernetes).
● Use cloud infrastructure (AWS, GCP, Azure, etc.) and IaC tools (Helm, Ansible, Terraform) to ensure fast, safe, and reliable deployments.

Requirements:
● Deep expertise and hands-on experience working with Linux systems. Strong focus on system optimization and troubleshooting.
● Strong OOP and Python knowledge with hands-on experience in automation, scripting, and system management.
● In-depth knowledge of container orchestration technologies such as Kubernetes (K8s). Experience with other cluster management tools like Slurm is a plus.
● Hands-on experience with IaC tools like Helm, Terraform, and Ansible.
● Strong knowledge of containerization technologies (e.g. Docker and Podman) to ensure reliable and consistent deployments.
● Experience working with CI/CD tools, especially GitLab (preferred), GitHub, or Git, to ensure smooth and rapid delivery cycles.
● Experience with monitoring and logging solutions such as Prometheus, Grafana, and the ELK stack to provide comprehensive insights into system performance and health.
● Understanding of relational and NoSQL databases, their performance tuning, and their management in distributed systems (e.g. PostgreSQL, DynamoDB, Cassandra).
● Familiarity with Agile development methodologies, with a focus on continuous improvement and collaboration.
● Exposure to cloud technologies such as AWS or Google Cloud (GCP) is a strong plus.
● A team-first attitude with excellent verbal and written communication skills in English, and the ability to work collaboratively with peers across the organization.

Perks:
● Referral bonus.
● Tuition reimbursement.
● English lessons with a native teacher.
● Home office + optional coworking space.
● 2 days off per year.
● 3 sabbatical weeks for every 3 years with the company.
● 3 weeks of vacation.
● Birthday gift.
● Computer and USD home office budget per year.

System Reliability Engineer

