Senior Site Reliability Engineer, Arlington

Onebrief
Chantilly, Loudoun County, VA

About Onebrief

Onebrief is collaboration and AI-powered workflow software designed specifically for military staffs. By transforming this work, Onebrief makes the staff as a whole superhuman, meaning faster, smarter, and more efficient.

We take ownership, seek excellence, and play to win with the seriousness and camaraderie of an Olympic team. Onebrief operates as an all-remote company, though many of our employees work alongside our customers at military commands around the world.

Founded in 2019 by a group of experienced planners, today Onebrief's team spans veterans from all forces and global organizations, and technologists from leading-edge software companies. We've raised $123m from top-tier investors, including Battery Ventures, General Catalyst, Insight Partners, and Human Capital, and today Onebrief is valued at $1.1B. With this continued growth, Onebrief is able to make an impact where it matters most.

Security Clearance, Location, and Onsite Notice:

This role requires regularly working on-site at customer locations in Arlington, VA.

If you are not currently within commuting distance, you must be willing to relocate (note that Onebrief will provide relocation assistance).

Active Top Secret Clearance required, with the ability to obtain SCI eligibility.

About The Role

We are hiring a Site Reliability Engineer to join our Infrastructure & Security team. You'll work closely with fellow SREs, security, and customer success.

You will be the first line of support for our mission-critical deployments and responsible for ensuring best-in-class service quality and issue resolution. You will work in both on-premise DoD environments and AWS cloud environments. Your lessons from the field will shape how our team works, from policy to implementation.

In addition to working on-site with the customer, you will contribute directly to solutions that increase the stability, performance, and security of our deployments and improve the overall experience of deploying and managing Onebrief on-premise.

About You

You are a force multiplier who views reliability as the most critical feature of any application and/or platform and believes that reliability beats novelty. You see infrastructure and operability as a product to be automated, documented, and continuously improved, always leaving systems easier to operate than you found them.

You are equally comfortable leading a post-incident review, designing SLOs in a system design session, or diving into a kubectl shell to triage a complex production issue. You don't just fix problems; you translate constraints and failure modes into clear, automated guardrails and scalable, resilient architecture. For you, robust monitoring, actionable alerting, and insightful runbooks are core parts of the engineering process, not afterthoughts.

You mentor others, fostering a culture of blameless postmortems and proactive reliability. You collaborate naturally with application and platform teams, helping them move quickly but safely by building the tools, processes, and observability that make fast recovery a reality.

What You'll Do

You'll own the reliability, scalability, and security of the production application and/or platform. You will do this by:

  • Building a World-Class Observability Platform: Design, implement, and manage our monitoring, logging, and alerting stack (e.g., Prometheus, Loki, Alloy, and Grafana). You won't just track metrics; you'll create the actionable insights and automated alerting that allow teams to identify and resolve issues before they impact users.

  • Defining and Upholding Reliability: Define, measure, and own alerting that feeds into our Service Level Objectives (SLOs) and increases trust internally and externally. You will be the organization's expert on what it means for our systems to be reliable and how to measure it.

  • Leading Incident Response: Act as the incident responder, and potentially incident commander, during critical incidents. You will lead blameless post-mortems / After Action Reviews (AARs) that identify true root causes and drive automated, long-term solutions to prevent recurrence.

  • Automating for Scale and Security: Partner with platform engineers to design, build, and manage secure, resilient Kubernetes clusters and cloud/on-prem environments using Infrastructure-as-Code (Terraform, Ansible). You will embed security and compliance controls (RMF, STIGs) directly into this automation.

  • Eliminating Toil and Scaling the Team: Proactively identify and eliminate operational toil by building automation. You will act as a force multiplier by advising other teams on best practices in air-gapped environments and production readiness.

What We Look For

  • 3 years of experience in Site Reliability Engineering or a related field, with firsthand experience managing mission-critical systems within DoD air-gapped environments.

  • An active Top Secret security clearance. U.S. citizenship required.

  • Experience automating software delivery and deployment, and providing documentation and self-service tools for engineering teams and customers.

  • A strong understanding of Linux, containerization and orchestration, and virtual machines.

  • Experience with centralized logging, metrics, and observability using tools such as Prometheus, Loki, Grafana, the ELK stack, or Datadog.

  • Networking fundamentals: core protocols and secure configurations.

  • A deep understanding of incident response processes, with experience conducting thorough root cause analyses and driving continuous improvement.

  • Clear concise writing; strong documentation habits and async communication.

  • Core skills and technologies: VMware, Kubernetes, Docker, Helm, Ansible, Terraform, Linux, AWS, DoD compliance, and monitoring and observability tools.

Bonus points (nice to have)

  • Experience with compliance frameworks (RMF, STIGs/SRGs, ICD 503).

  • Security-minded design for air-gapped environments.

  • Active Security+ or another DoD 8570.01-approved security credential, or the ability to obtain valid credentials within 3 months of employment.


Notice to Third Party Recruitment Agencies

Please note that Onebrief does not accept unsolicited resumes from recruiters or employment agencies. In the absence of an executed Recruitment Services Agreement, there will be no obligation to any referral compensation or recruiter fee. In the event a recruiter or agency submits a resume or candidate without an agreement, Onebrief explicitly reserves the right to pursue and hire those candidate(s) without any financial obligation to the recruiter or agency. Any unsolicited resumes, including those submitted to hiring managers, shall be deemed the property of Onebrief.

Required Experience:

Senior IC

Posted 2025-11-23
