Site Reliability Engineer (SRE) - Moogsoft

The ideal candidate will collaborate with the core teams, combining software and engineering practices to strengthen application and system reliability while providing operational support. Advanced knowledge of system architecture, networking, application development, testing, and operational stability will help transform the way the teams operate today. The candidate will have advanced scripting and coding skills to develop artifacts for correlating alerts and events ingested from diverse monitoring sources, and will leverage AI/ML to automate recovery actions.

- Five or more years of experience as a Site Reliability Engineer
- Architect a new framework to establish an SRE model across multiple teams
- Develop new processes to prevent problem recurrence and automate recoveries
- Enhance SLO trending and centralized reporting (e.g., Grafana dashboard integration)
- Identify opportunities to improve architecture and engineering practices
- Mentor staff to replace manual processes with automation
- Collaborate across all levels of the organization to drive the SRE model
- Advanced experience supporting enterprise container-based platforms
- Strong systems and network architecture experience
- Experience with cloud technologies, such as architecting, developing, or maintaining cloud solutions in public cloud environments (AWS/OCI/GCP)
- Data ingestion and enrichment: webhooks, REST API design, JSON, XML, SMTP
- CI/CD: deployment pipeline experience (Jenkins, Ansible)
- DevOps container/orchestration tools (Kubernetes, Docker, Puppet, etc.)
- Good knowledge of Python, Bash, or similar scripting languages
- Experience with configuration management systems
- Knowledge of Unix/Linux-based systems, and experience troubleshooting applications running on them
- Experience with the software lifecycle, including design, implementation, and delivery
- Expertise in designing, analyzing, and troubleshooting large-scale distributed systems
- Ability to apply a systematic approach to solving problems, with a sense of ownership and focus
- Effective communication skills, with the ability to articulate technical details to different audiences

Requirements (emphasis on Moogsoft)

Installation, Infra & Config:
- Linux systems administration and operations
- Network administration experience
- JavaScript experience
- Familiarity with the Moogsoft installation procedures

Integrations & Dev:
- Familiarity with webhooks, REST APIs, JSON, XML, and SMTP
- Development experience with a popular scripting language (Python) and Unix shell scripting
- Familiarity with SQL queries
- Proficient in Jenkins and Ansible
- Proficient in Grafana reporting tools

Clustering & Workflows:
- Familiarity with operations (SRE) workflows, responsibilities, and organizational structures
- Familiarity with predetermined and dynamic correlation, entropy, and anomaly detection concepts
- Strong SQL/Percona DB experience
- Experienced communicator and collaborator

Platform Monitoring:
- Systems administration and operations
- Network administration experience
- Development experience with a popular scripting language (Python, Go, Ruby), JavaScript, and Unix shell scripting
- Familiarity with Moogsoft components and data flows
- Understanding of monitoring and metrics concepts (volume, performance, capacity)

Pyramid Consulting, Inc
