Lead Site Reliability Engineer with Java

Role: Lead Site Reliability Engineer with Java

Location: San Antonio, Texas

Duration: 18 Months

Relevant Experience: 14+ Years 

Job Description:

As a Lead Site Reliability Engineer (SRE), you will apply your extensive SRE experience to maintain and improve the reliability, performance, and scalability of mission-critical systems. You will play a central role in keeping our services continuously available and performing well.

Key Responsibilities:

Senior-Level SRE Expertise: Apply your deep understanding of SRE principles to lead efforts in improving system reliability and operational efficiency.

Incident Management: Provide expert-level support during incidents, ensuring swift resolution with minimal service disruption. Lead post-incident reviews to drive continuous improvement.

Monitoring & Alerting: Design, implement, and optimize monitoring, alerting, and incident response processes. Ensure these systems are effective enough to detect and address potential issues proactively.

Automation: Drive the automation of manual processes to enhance operational efficiency, reduce human error, and increase overall system resilience.

CI/CD Pipeline Management: Develop, maintain, and improve automated CI/CD pipelines using tools such as GitLab CI/CD and Jenkins, ensuring seamless and reliable deployment processes.

Cross-Functional Collaboration: Work closely with cross-functional teams to ensure the reliability, performance, and scalability of our infrastructure. Foster a culture of collaboration and knowledge sharing.

Support Across Time Zones: Provide support across all U.S. time zones, with the flexibility to work weekends, rotational shifts, and overtime as required to maintain service continuity.

Required Skills & Qualifications:

Java Programming: Advanced proficiency in Java, with a deep understanding of contemporary software development practices.

Kubernetes & Containerization: Extensive hands-on experience with Kubernetes, containerization technologies such as Docker, and Kubernetes storage solutions such as Portworx.

Linux/Unix Systems: Strong command of Linux/Unix operating systems and shell scripting (Bash), with a focus on system reliability and automation.

Functional Programming: Proficiency in functional programming languages such as Haskell and OCaml, as well as logic programming languages such as Prolog.

Additional Information

All your information will be kept confidential according to EEO guidelines.

Full time

Published on 03/17/2025