Site Reliability Engineer (SRE) Job at DRC Systems, Washington DC

  • DRC Systems
  • Washington DC

Job Description

Site Reliability Engineer (SRE): Vulnerability Management, Observability & Server Patching

Seattle WA

Role Overview

This role is responsible for ensuring the security, reliability, and operational excellence of server infrastructure through proactive vulnerability management, effective server patching, and robust observability practices. The SRE will leverage platforms such as Brinqa for vulnerability aggregation and prioritization, and Datadog for monitoring, alerting, and service observability.

The ideal candidate will work closely with engineering, security, and application teams to identify and remediate risks, execute patching strategies, and continuously improve system visibility, reliability, and compliance.

Key Responsibilities

Vulnerability Management

  • Manage and continuously improve the enterprise vulnerability management program using Brinqa for aggregation, prioritization, and reporting.
  • Identify, analyze, and assess vulnerabilities across server infrastructure, including operating systems, applications, and supporting components.
  • Partner with security, infrastructure, and application teams to prioritize remediation efforts based on risk and business impact.
  • Ensure adherence to corporate security policies, regulatory requirements, and industry best practices.

Server Patching & Remediation

  • Plan, schedule, and execute server patching activities for operating systems and third-party software.
  • Track patch compliance and remediation metrics, including mean time to patch (MTTP).
  • Develop and maintain automation scripts and tooling to streamline patching workflows and improve efficiency.
  • Reduce operational risk by standardizing patching processes and minimizing service disruption.
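
As an illustration of the MTTP metric mentioned above, here is a minimal Python sketch. It assumes MTTP is measured as the average number of days between a patch's release and its application on a server, and that unpatched entries are excluded from the average; the record layout and the function name `mean_time_to_patch_days` are hypothetical and not taken from any specific tool.

```python
from datetime import datetime
from statistics import mean


def mean_time_to_patch_days(records):
    """Average days from patch release to patch application.

    `records` is an iterable of (released, applied) datetime pairs.
    Entries with applied=None are still open and are skipped here
    (an assumption; some teams count them up to the current date).
    """
    durations = [
        (applied - released).total_seconds() / 86400
        for released, applied in records
        if applied is not None
    ]
    return mean(durations) if durations else None


# Hypothetical sample data: two patched servers and one still open.
records = [
    (datetime(2024, 1, 1), datetime(2024, 1, 11)),  # patched after 10 days
    (datetime(2024, 1, 5), datetime(2024, 1, 25)),  # patched after 20 days
    (datetime(2024, 2, 1), None),                   # not yet patched
]
mttp = mean_time_to_patch_days(records)
print(mttp)
```

Excluding open items keeps the figure simple but can understate exposure, so a compliance report would typically show the count of unpatched systems alongside it.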

Observability & Reliability

  • Maintain and enhance observability of supported services using Datadog.
  • Design and implement effective monitoring, alerting, and dashboards to improve service reliability and operational awareness.
  • Define and measure service-level indicators (SLIs), service-level objectives (SLOs), and success metrics.
  • Analyze incidents and trends to drive continuous improvement in system reliability and performance.
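
To make the SLI/SLO terms above concrete, the following Python sketch shows one common way to relate them. It assumes a request-based availability SLI (good requests divided by total requests) and an error budget derived from the SLO target; the function names and sample numbers are hypothetical, and this is plain arithmetic, not a Datadog API.

```python
def availability_sli(good, total):
    """Fraction of requests that succeeded; 1.0 when there was no traffic."""
    return good / total if total else 1.0


def error_budget_remaining(good, total, slo_target):
    """Share of the error budget still unspent (negative if overspent).

    The budget is the number of failed requests the SLO tolerates:
    (1 - slo_target) * total.
    """
    allowed_bad = (1 - slo_target) * total
    actual_bad = total - good
    if allowed_bad == 0:
        # A 100% target or zero traffic leaves no budget to spend.
        return 0.0 if actual_bad else 1.0
    return 1 - actual_bad / allowed_bad


# Hypothetical window: 10,000 requests, 5 failures, 99.9% SLO target.
sli = availability_sli(9995, 10000)
budget = error_budget_remaining(9995, 10000, 0.999)
print(f"SLI={sli:.4f}, error budget remaining={budget:.0%}")
```

With a 99.9% target over 10,000 requests, about 10 failures are tolerated, so 5 failures leave roughly half of the budget.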

Collaboration & Operations

  • Collaborate with application owners, platform teams, and other stakeholders to support core SRE and operational objectives.
  • Provide guidance and best practices related to reliability, security, and operational resilience.
  • Support incident response, root cause analysis, and post-incident reviews where applicable.

Skills & Qualifications

  • Strong hands-on experience with server operating systems (Windows Server, Linux) and patching methodologies.
  • Solid understanding of vulnerability management frameworks, risk-based prioritization, and remediation practices.
  • Experience with vulnerability management tools such as Brinqa, Qualys, or similar platforms.
  • Proven experience implementing observability solutions using Datadog.
  • Experience working in on-premises and Microsoft Azure environments.
  • Hands-on experience with containerized applications using Docker and Kubernetes (K8s).
  • Experience with CI/CD pipelines, including GitOps-based deployments using ArgoCD.
  • Proficiency in automation and scripting (e.g., Python, PowerShell, Bash).
  • Experience supporting on-call rotations, incident response, and production issue resolution.
  • Good knowledge of networking concepts, including TCP/IP, DNS, load balancing, firewall rules, and troubleshooting connectivity issues.
  • Familiarity with ITIL concepts and operational best practices.
  • Strong communication and cross-team collaboration skills.
  • Ability to work independently, manage multiple priorities, and operate effectively in a fast-paced environment.
