Site Reliability Engineer

We are looking for a talented Site Reliability Engineer (SRE) with a deep interest in distributed systems, cloud computing, and the architecture of large-scale systems. The Senior SRE will ensure our InsightIDR services have the ultra-high reliability and uptime necessary to meet our customers’ needs.

About the Team

Our InsightIDR product helps identify and address key cybersecurity risks to our customers. We apply AI, ML, threat intelligence, and BI to event sources, including desktops, servers, network switches, firewalls, cloud services, directory servers, DHCP servers, and SIEMs, in order to distill hundreds or thousands of daily events per customer into the few real, high-priority threats that need attention. Our systems ingest large amounts of data that need to be highly available and performant at all times.

Some of the technologies we use include: Java, Python, Cassandra, MySQL/RDS, Redis, ElasticSearch, Kafka, AWS (EC2, S3, CloudFormation, etc.), Zookeeper, Terraform, Jenkins, Artifactory, Chef, Puppet, Ansible, Kubernetes, ...

About the Role

As SRE, you will work closely with our engineering team and partner teams throughout Rapid7 to help solve extremely challenging problems at a massive scale.

In this role, you will:

- Support services before they go live through activities such as design, deployment, migration strategy, monitoring, and playbook reviews
- Maintain services once they are live by measuring and monitoring availability, latency, and overall system health
- Scale systems through automation, service and infrastructure improvements, and other means
- Troubleshoot production issues and liaise with relevant Engineering or Infrastructure teams to find a resolution
- Participate in on-call support and incident response follow-ups such as post-mortems
- Work closely with Engineering, Architecture, Infrastructure, and Product teams to improve the lifecycle of the InsightIDR services, from inception and design through deployment, operations, monitoring, security, upgrades, and maintenance
- Mentor and coach team members
- Continuously develop and refine your own skill set

The skills you’ll bring include:

- Bachelor’s degree in Computer Science or a STEM-related field, or 3+ years of industry experience
- 3+ years of experience in Unix/Linux systems, IP networking, performance and application issues, RESTful architectures, and database operation and optimization
- 3+ years of experience programming in one or more of the following languages: Java, Python, C, C++, Go, Rust, Ruby
- Knowledge of Public Cloud Providers (AWS, Azure, GCP)
- Strong written and verbal communication skills

Nice-to-have:

- 3+ years of experience in SRE or DevOps
- Knowledge of AWS services, including EC2, RDS, VPC, networking, S3, MSK, etc.

We know that the best ideas and solutions come from multi-dimensional teams. That’s because these teams reflect a variety of backgrounds and professional experiences. If you are excited about this role and feel your experience can make an impact, please don’t be shy - apply today.
