Site Reliability Engineer
As a Site Reliability Engineer (SRE) you will help build a meaningful engineering discipline, combining software and systems to develop creative engineering solutions to operational problems.
Much of your support and software development focuses on optimizing new and existing systems, building infrastructure, and reducing work through automation.
You’ll join a team of curious problem solvers with a diverse set of perspectives who are thinking big and taking risks.
In this environment you’ll take the lead on relevant projects, backed by an organization that provides the mentorship and resources you need to learn and grow.
As an SRE, you’ll focus on running better production applications and systems in the DOX Trading Portfolio for both Discretionary and non-Discretionary product teams.

Responsibilities:
* Develop, test, and debug automated tasks (Apps, Systems, Infrastructure).
* Troubleshoot priority incidents and facilitate blameless post-mortems.
* Work with development teams throughout the software lifecycle to ensure sustainable software releases.
* Analyze previous incidents and usage patterns to better predict issues and take proactive action.
* Build and drive adoption of self-healing and resiliency patterns.
* Lead and participate in performance tests; identify bottlenecks, opportunities for optimization, and capacity demands.

Position Requirements
This role requires a wide variety of strengths and capabilities, including:
* Bachelor’s degree or equivalent experience in a software engineering discipline.
* Proficiency in Java, Kubernetes, SQL (DB2, Oracle), NoSQL (MongoDB), data streaming (Kafka), and DevOps principles.
* Perform in-depth research to identify the cause of production issues, and apply workarounds for mitigation.
* Design and code software solutions that help automate manual processes.
* Identify gaps and weaknesses in existing applications, then design and implement remediation of these gaps.
* Work to improve the stability of applications in the production environment.
* Understand and contribute to the software delivery lifecycle.
* Ensure changes to the applications follow a disciplined change management process, including all documentation, reviews, and approval steps, using standard tool sets for automated testing and deployment.
* Expertise in application, data, and infrastructure architecture disciplines.
* Proficient knowledge of one or more infrastructure components such as orchestration tools, containerization, cloud services, and compute and storage systems.
* Capable of managing service-level changes to a system or service.
* Hands-on experience with cloud deployment, monitoring, and ops analysis tools such as Kubernetes, Prometheus, Elasticsearch, Grafana, Kibana, Splunk, and Dynatrace.
* Experience with cloud-hosted applications is a plus.