Site Reliability Engineering (SRE)

Onsite role - Mechanicsburg (Pennsylvania), Houston (Texas), and Corvallis (Oregon)

Site Reliability Engineering (SRE) is the discipline that incorporates aspects of software engineering and applies them to infrastructure and operations problems. The main goals are to create, deploy, and manage scalable and highly reliable software systems. SRE ensures that all services - both our internally critical and our externally visible systems - have reliability and uptime appropriate to users' needs while keeping a watchful eye on capacity and performance.

We are looking for a Site Reliability Engineer to help us grow our Centre of Enablement team. The successful candidate will work in an agile team environment on cloud-based enterprise applications deployed to both internal and external clients. The ideal site reliability engineer candidate is either a software engineer with a strong system administration background or a highly skilled system administrator with knowledge of coding and automation.

Please check our website to get to know more about our teams in Digital Solutions.

Responsibilities

Deployment of software to multiple operating environments; installation and configuration of demo/production/training environments
Day-to-day administration of multiple operating environments including joint responsibility for our production environments
Troubleshoot issues that arise in web-based operating environments, document defects in Azure DevOps, and work with colleagues to resolve issues
Support DevOps efforts through daily interaction with Development and Architecture team members
Support development of processes and procedures for deployment, operations, and maintenance
Support web operations workflow automation using configuration management and continuous deployment frameworks
Apply service delivery best practices
Conduct post-mortems: investigate, analyze, and document unplanned downtime events
Automate existing manual processes
Work with a global team spread across tech hubs in multiple geographies and time zones
Troubleshoot, stabilize, and resolve incidents when they occur
Perform incident analysis to gather findings and identify follow-up actions that lead to more reliable products

Requirements

Position Qualifications

Systems engineering or DevOps experience
Experience developing and maintaining continuous deployment pipelines (e.g., Azure DevOps)
Good knowledge of a scripting language such as PowerShell, Bash, or Python
Experience working on cloud-based infrastructure (e.g., Azure)
Solid foundation in both software and systems engineering
A strong desire to stay current on technical trends in order to suggest innovative tools and approaches to interesting problems

Preferred Skills

Experience securing cloud/web applications strongly desired
Experience with version control systems (Git preferred)
Azure Resource Manager (ARM) templates
Terraform
Grafana, Kiali
Experience with site reliability engineering, specifically on Azure cloud

Nice to Have

Experience with containers and container orchestration technology (e.g., Docker, Kubernetes)
GitOps tools (e.g., Argo CD, Flux)

Merican Inc
