Site Reliability Engineer
Job Description
As a Site Reliability Engineer (SRE), you will be an integral part of the team at LightEdge Solutions. This position reports to the DevOps Manager and is responsible for the reliable operation of the organization’s systems and services. You will play a key role in defining our monitoring strategy and vision across multiple products, and will work with a variety of teams to improve the accuracy of our monitoring systems.
Responsibilities
- Monitoring and Observability: Design and implement monitoring solutions to track the performance, availability, and health of various systems and services. Establish robust monitoring frameworks, set up alerts, and analyze system metrics to identify and resolve issues proactively.
- Establish and align metrics, including SLAs, SLOs, and SLIs, to closely tie system performance to business objectives, ensuring that the site reliability engineering efforts support the overall goals and customer satisfaction.
- Use AIOps techniques to automate incident management and response. Develop and maintain automated incident response systems that can detect and mitigate issues automatically. This includes automated incident triaging, remediation, and escalation workflows to minimize manual intervention and improve response times.
- Leverage the IT service management platform’s capabilities to integrate monitoring into incident management, change management, and other operational processes, enhancing the efficiency and effectiveness of site reliability engineering practices.
- Work closely with IT functional owners and SMEs.
- Perform complex systems design, proof of concept, implementation and integration functions.
- Develop detailed designs, and execute and troubleshoot strategic solutions in support of effective monitoring, alerting, escalation, automation, reporting, and event correlation.
Qualifications
- 5 years of hands-on experience with enterprise monitoring solutions.
- Knowledge of network switches, server hardware, storage, and virtualization technologies.
- Understanding of VMware infrastructure.
- Experience working with a variety of monitoring systems such as Zabbix, vRealize Operations Manager, Nagios, and ScienceLogic.
- Experience and proficiency in integrating with ServiceNow or similar IT service management platforms.
- Experience with managing automations within a monitoring environment.
- Ability to provide guidance with design, maintenance, and improvements to enterprise level monitoring solutions.
- Excellent verbal and written communication skills, with the ability to present complex ideas and designs to both technical and non-technical stakeholders.
- Experience with design, implementation, and support of monitoring tools in a complex, multi-platform environment.
- Strong understanding of monitoring requirements for storage, network, and compute servers.
We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please contact us.