Site Reliability Engineer (SRE)

HG Solutions
Reston, VA

We are looking for a Reliability Engineer who is based out of Reston, VA.

This is a hybrid role with 3 days a week in the Reston office.

Contract

6 months, extendable

Role: Reliability Engineer

Description:

We are seeking a highly skilled and experienced Site Reliability Engineer (SRE). The ideal candidate will have a strong background in cloud platforms, DevOps practices, and modern software development frameworks. The SRE will play a critical role in designing, building, and maintaining highly scalable, fault-tolerant, and secure cloud infrastructure while ensuring operational excellence, high availability, and reliability.

Key Responsibilities

  1. Cloud Infrastructure & Automation:
    Design, implement, and manage cloud-based infrastructure using platforms like AWS, Azure, or GCP.
    Utilize Infrastructure-as-Code (IaC) tools such as Terraform, CloudFormation, and Ansible to automate deployments and configurations.
    Create robust automation for anomaly detection, toil reduction, recovery processes, and self-healing mechanisms, and optimize cloud costs.
  2. DevSecOps & CI/CD:
    Apply a deep understanding of DevSecOps principles to build and maintain CI/CD pipelines using tools like GitLab, Jenkins, SonarQube, Nexus/Artifactory, and Docker.
    Implement security best practices, including IAM roles, RBAC, vulnerability remediation, and SAST/DAST/SCA tools.
  3. Observability & Incident Management:
    Design and implement monitoring, logging, and distributed tracing solutions using tools like AWS CloudWatch, Splunk/SignalFx, Dynatrace, and OpenTelemetry.
    Lead root cause analysis, blameless postmortems, and proactive incident management to minimize MTTR and MTTD.
    Define and monitor SLOs, SLIs, and error budgets to ensure system reliability.
  4. Microservices & API Management:
    Architect and manage microservices, serverless computing, and RESTful APIs.
    Ensure fault tolerance and resilience using design patterns like Circuit Breaker, Retry, Timeout, and Bulkhead.
  5. Chaos Engineering & Resiliency:
    Conduct chaos engineering experiments using tools like AWS FIS and Chaos Toolkit.
    Perform resiliency assessments using AWS Resilience Hub and implement self-healing solutions.
  6. Database & Application Support:
    Manage and optimize database technologies such as PostgreSQL, MongoDB, DynamoDB, Oracle, and Redshift.
    Provide production support, including incident response, problem management, and runbook creation. Participate in on-call rotations.
  7. Collaboration & Communication:
    Collaborate with cross-functional teams to implement shift-left testing practices (BDD, TDD, Unit, Regression).
    Create and maintain architecture diagrams, knowledge articles, and disaster recovery plans.
    Communicate effectively with stakeholders and demonstrate strong relationship management skills.

Required Qualifications

1. Well versed in AWS (ECS, EC2, RDS, Redshift, EMR, Lambda, Route 53, Step Functions). Hands-on experience is a must.
2. DevOps: Infrastructure as Code and CI/CD (Jenkins, GitLab, Terraform).
3. Well versed in SRE concepts such as SLOs, error budgets, alarms, and monitoring. Must have implemented these concepts hands-on.
4. Programming in Python/Java.
5. Experience in APM and observability using Splunk and Dynatrace.



Nice to Have

1. Release engineering experience
2. Production support experience
3. Performance testing experience

Preferred Qualifications:
Experience with AI/ML libraries (e.g., NLTK, Transformers, spaCy, SciPy), Amazon SageMaker, and GenAI tools.
Familiarity with project management tools like Jira, Confluence, and ServiceNow.
Knowledge of utilities like AWS CLI, Postman, and curl.

Required Skills

Expertise in cloud platforms (AWS, Azure, or GCP) and container orchestration.
Proficiency in programming/scripting languages such as Python, Java, Node.js, Bash, and PowerShell.
Strong knowledge of database technologies (e.g., PostgreSQL, MongoDB, DynamoDB, Oracle, Redshift).
Experience with DevOps tools (Jenkins, Docker, Nexus/Artifactory) and build tools (Maven, Gradle).
Familiarity with AI/ML integrations, event-driven architectures, and distributed systems.
Expertise in observability, logging, and monitoring tools (AWS CloudWatch, Splunk, Dynatrace, OpenTelemetry).
Strong understanding of security practices, including IAM, RBAC, and vulnerability management.
Experience with chaos engineering, resiliency assessments, and disaster recovery planning.
Proficiency in performance testing tools (JMeter, LoadRunner) and capacity planning.
Excellent verbal and written communication skills, with the ability to collaborate across teams.
8+ years of related experience, including experience leading teams on projects of similar scope and complexity.
Bachelor's or master's degree in computer science or equivalent.
Certifications: AWS Solutions Architect, Agile Certified Practitioner (ACP), or relevant cloud certifications.

Posted 2025-07-30
