Lead Site Reliability Engineer with Java
Job Description
Role: Lead Site Reliability Engineer with Java
Location: San Antonio, Texas
Duration: 18 Months
Relevant Experience: 14+ Years
Job Description:
As a Lead Site Reliability Engineer (SRE), you will leverage your extensive experience in SRE practices to maintain and enhance the reliability, performance, and scalability of mission-critical systems. You will play a crucial role in ensuring the continuous availability and optimal functioning of our services.
Key Responsibilities:
Senior-Level SRE Expertise: Apply your deep understanding of SRE principles to lead efforts in improving system reliability and operational efficiency.
Incident Management: Provide expert-level support during incidents, ensuring swift resolution with minimal service disruption. Lead post-incident reviews to drive continuous improvement.
Monitoring & Alerting: Design, implement, and optimize monitoring, alerting, and incident response processes. Ensure the effectiveness of these systems to proactively address potential issues.
Automation: Drive the automation of manual processes to enhance operational efficiency, reduce human error, and increase overall system resilience.
CI/CD Pipeline Management: Develop, maintain, and improve automated CI/CD pipelines using tools such as GitLab CI/CD and Jenkins, ensuring seamless and reliable deployment processes.
Cross-Functional Collaboration: Work closely with cross-functional teams to ensure the reliability, performance, and scalability of our infrastructure. Foster a culture of collaboration and knowledge sharing.
Support Across Time Zones: Provide support across all U.S. time zones, with the flexibility to work weekends, rotational shifts, and overtime as required to maintain service continuity.
Required Skills & Qualifications:
Java Programming: Advanced proficiency in Java, with a deep understanding of contemporary software development practices.
Kubernetes & Containerization: Extensive hands-on experience with Kubernetes, with containerization technologies such as Docker, and with Kubernetes storage solutions such as Portworx.
Linux/Unix Systems: Strong command of Linux/Unix operating systems and Shell Scripting (BASH), with a focus on system reliability and automation.
Functional Programming: Proficiency in functional programming languages such as Haskell and OCaml, as well as logic programming in Prolog.
Additional Information
All your information will be kept confidential according to EEO guidelines.