Jobvertise

Site Reliability/DevOps Engineer Technical Onboarding and Appli
Location: US-VA-Arlington

DocMe360 is seeking a skilled and motivated Site Reliability/DevOps Engineer to join our team and play a critical role in onboarding and supporting new applications within the Office of Information Technology (OIT) and Veterans Health Administration (VHA) organizations. The primary focus of this role is to ensure the technical feasibility, successful technical onboarding, maintenance, continuous monitoring, performance optimization, and reliability of the Clinical Decision Support Collaborative (CDSC) platform and its production applications. The ideal candidate is a proactive problem-solver with a strong background in infrastructure, cloud technologies, application deployment, and performance tuning.

Responsibilities:

  • Collaborate with cross-functional teams from OIT and VHA to understand the technical requirements and feasibility of onboarding new applications onto the CDSC platform.
  • Assess and validate the technical requirements, scalability, and compatibility of new applications with the existing infrastructure and ecosystem.
  • Work closely with development teams to provide technical guidance and assistance during the onboarding process, ensuring adherence to best practices and reliability standards.
  • Build and maintain monitoring solutions that provide real-time insights into the performance, availability, and reliability of CDSC and associated applications.
  • Identify performance bottlenecks, troubleshoot issues, and implement performance optimization strategies to enhance the user experience and system efficiency.
  • Maintain stable production applications throughout the operational phase, working closely with and advising application teams on new releases, features, and patches.
  • Collaborate with DevOps teams to automate deployment processes, implement continuous integration and delivery pipelines, and enhance the overall release management process.
  • Design and implement reliability practices such as chaos engineering, fault injection, and disaster recovery simulations to ensure the platform's resilience in the face of failures.
  • Participate in incident response activities, conduct post-incident reviews, and contribute to the development of preventive measures to avoid similar incidents in the future.
  • Stay up to date with industry trends, emerging technologies, and best practices in SRE, DevOps, cloud computing, and application deployment.
  • Foster a culture of collaboration, innovation, and continuous improvement within the SRE, DevOps, and cross-functional teams.

Requirements:

  • Proven experience (3-5 years) working as a Site Reliability Engineer or DevOps Engineer, or in a similar role, in a complex and dynamic environment.
  • Strong expertise in cloud technologies (e.g., AWS, Azure, GCP) and container orchestration platforms (e.g., Kubernetes).
  • Proficiency in scripting and programming languages such as Python, Bash, Go, or similar.
  • Experience with monitoring and observability tools (e.g., Datadog, Dynatrace, Prometheus, Grafana, ELK stack) to ensure proactive issue detection and resolution.
  • Solid understanding of networking, security principles, and best practices in cloud environments.
  • Demonstrated ability to collaborate effectively with cross-functional teams, including development, operations, and security.
  • Excellent problem-solving skills, with keen attention to detail and a proactive attitude toward identifying and resolving issues.
  • Strong communication skills, both written and verbal, with the ability to convey complex technical concepts to non-technical stakeholders.

    Preferred:
    • Relevant certifications in cloud platforms, DevOps, or SRE practices.
    • Experience in healthcare or government IT environments.
    • Familiarity with compliance standards such as HIPAA, HITRUST, and FedRAMP.

Benefits:

  • Health Care Plan (Medical, Dental & Vision)
  • Retirement Plan (401(k), IRA)
  • Paid Time Off (Vacation, Sick & Public Holidays)
  • Family Leave (Maternity, Paternity)
  • Training & Development
  • Work From Home

DocMe360
