Job Description

Position : Service Reliability Engineer / Sr. DevOps Engineer

Location : Santa Clara, CA

Duration : 1 Year +

OK with any visa. No OPT, please.

Local consultants only

The customer will not provide a letter for H1B candidates. Please check with the candidate and employer before submitting the resume. Face-to-face is mandatory, so please submit local candidates only.

Responsibilities:

Development and Operations (DevOps) subject matter expert for 24x7 SaaS operation

Work hand-in-hand with microservice software developers, architects, and field integration resources to architect and deliver Ericsson's next TV platforms.

Contribute to the development of new tools and automation that ensure the service can be optimized and tuned with minimal human intervention.

Accountable for working upstream with microservice developers on monitoring, tools, and architecture to deliver security, reliability, manageability, and availability at scale

Point of escalation/decision maker on response level of incidents

Participate in the Core SRE on-call roster and respond with command-and-control incident management during high-priority events while maintaining internal and external SLAs

Act as Technical Duty Officer, leading the resolution effort for the most complex service problems, from the network layer to the application, at scale

Drive Problem Management/Retrospectives ("post mortems")

Contribute strongly to, and maintain, our knowledge base

Analyze trends and make recommendations in the areas of monitoring, incident and change management, cloud orchestration, and support.

Contribute to the future growth of the team by conducting candidate screenings and assessments

Accountable for deploying services to production environments

Technologies:

Experience with Docker, SaltStack, Kubernetes, and similar orchestration tools

Knowledge of MongoDB and Cassandra databases, Kafka, and IIS servers on Azure/AWS/OpenStack

Azure, OpenStack, and AWS concepts and APIs

Experience designing, setting up, maintaining, and refining (noise reduction, auditing) monitoring tools such as Prometheus, Prometheus exporters, Kibana, Grafana, and Alertmanager

Demonstrable experience in one or more of: PowerShell, Python, Bash, C#, .NET

Strong knowledge of TCP/IP networking, DNS, VPNs, HTTP, load balancers (such as NGINX), highly available microservice architecture, and CDNs

Team Foundation Server/Visual Studio, Atlassian suite (Jira, Confluence), Git

Analysis of network, performance, and application issues using tcpdump, Fiddler, and Wireshark.

Qualifications:

Bachelor's Degree in CS, MIS, or equivalent experience

5+ years of relevant experience with Windows/Unix systems fundamentals, monitoring, cloud services, networking, storage, databases, and applications

Solid communication skills, both written and verbal. Able to tailor messaging effectively to different audiences: external customers, leadership, technical SMEs, or Tier-1

Previous experience in customer-facing roles during high-stress situations

Demonstrated skills as an influencer within a previous organization

In-depth knowledge of IT concepts, strategies, and methodologies; Agile knowledge a plus

In-depth knowledge of business operations, objectives, and strategies.

Familiarity with containers (e.g. Docker, rkt) and IaaS (e.g. AWS, Azure, OpenStack).
