Systems Engineering Lead (Lead Ops Engineer)
RELX
Date: 3 weeks ago
City: Quezon City
Contract type: Full time
Responsibilities
Operational Support Leadership
Provide leadership and direction for operational support teams, ensuring timely resolution of incidents and effective communication with stakeholders.
Establish and maintain service level agreements (SLAs) for operational support activities, and continuously monitor and improve support processes.
Implement and optimize monitoring, logging, and alerting systems to facilitate proactive issue detection and resolution.
Lead and inspire the infrastructure team, providing guidance, support, and mentorship to ensure the successful execution of projects and initiatives.
Incident Management
Establish incident management processes and procedures to ensure timely response and resolution of incidents impacting cloud services.
Define incident severity levels, escalation paths, and communication protocols to effectively manage incidents of varying impact and urgency.
Lead incident response efforts during major incidents, coordinating cross-functional teams and stakeholders to mitigate impact and restore service.
Major Incident Response
Develop and maintain a major incident response plan outlining roles, responsibilities, and procedures for responding to and resolving major incidents.
Conduct regular tabletop exercises and simulations to test the effectiveness of the major incident response plan and identify areas for improvement.
Lead post-incident reviews and root cause analyses to identify systemic issues and implement corrective actions to prevent recurrence.
Monitoring and Observability
Implement comprehensive monitoring and observability solutions to gain insights into the health, performance, and availability of cloud infrastructure and services.
Utilize monitoring tools and platforms (New Relic) to collect, analyze, and visualize metrics, logs, and traces.
Establish and maintain robust monitoring and alerting mechanisms to ensure the timely detection and resolution of issues in our hosting environment.
Performance Optimization
Monitor and analyze cloud infrastructure performance metrics to identify bottlenecks and areas for optimization.
Implement performance tuning strategies to improve the efficiency and responsiveness of cloud-based applications and services.
Work closely with development teams to optimize application performance and resource utilization in the cloud environment.
Trend Analysis and Reporting
Conduct trend analysis on system performance, incidents, and operational metrics to identify patterns, anomalies, and areas for improvement.
Develop and maintain reports and dashboards to communicate key performance indicators (KPIs) and metrics related to processes and systems.
Collaborate with stakeholders to derive insights from data and support data-driven decisions that optimize processes and enhance system reliability.
Documentation
Maintain comprehensive documentation of infrastructure configurations and procedures. Provide training and knowledge-sharing sessions for team members to ensure proficiency in AWS technologies and best practices.
Team Management and Development
Lead and mentor a team of cloud infrastructure engineers, providing guidance, support, and opportunities for professional growth.
Foster a culture of collaboration, innovation, and continuous learning within the team.
Develop and execute training programs to enhance the technical skills and expertise of team members.
Non-Negotiable
- Bachelor’s degree in Computer Science, Engineering, or a related field.
- Proven experience (3-5 years) in a leadership role, with a focus on infrastructure and hosting operations.
- Strong technical expertise in cloud technologies and services (AWS), with 2-3 years of hands-on experience managing cloud infrastructure.
- Service Delivery Management Experience (service level management driving problem management and operational efficiencies)
- Excellent leadership and communication skills, with the ability to collaborate effectively with cross-functional teams and manage enterprise stakeholders.
- Strong problem-solving skills and a proactive approach to identifying and addressing technical challenges.
- Experience working in agile development environments and familiarity with agile methodologies (e.g., Scrum, Kanban)
- Automation Background
- Any AWS Certification
- ITIL certification
- Security and Compliance: Knowledge of security best practices and experience in implementing and maintaining security controls, including vulnerability and patch management, to ensure compliance with relevant standards and regulations.