Site Reliability Engineer
Johannesburg, ZA
Requisition Details & Talent Acquisition Consultant
REQ 138974 - Keabetswe Modise
Closing Date: 13 May 2025
Job Family
Information Technology
Career Stream
Application Development
Leadership Pipeline
Manage Self: Professional
Job Purpose
To serve as an IT professional specialising in Site Reliability Engineering (SRE) at Nedbank, contributing to the strategic capability of the organisation as part of a dynamic team. The role focuses on advancing the SRE discipline and working with other domains to influence its adoption. It is a strategic, consultancy-based role that involves enabling and contributing to solutions aligned with the principles of reliability, availability, and resilience, while also promoting frequent and efficient delivery from development teams.
Job Responsibilities
- Collaborate with stakeholders, engineers, and operational SMEs to keep all relevant parties up to date on current priorities within the reliability service offerings.
- Evolve production services based on customer needs and technology to ensure we remain competitive in the financial services industry/market.
- Influence squads during service or platform design to prevent system failures and improve performance.
- Engage with leadership and teams to adopt SRE practices, with a core focus on contributing to incident management and advocating blameless postmortems.
- Engage and influence all teams involved in the software development life cycle on observability and high availability, utilising new or existing technology, and improve disaster recovery plans.
- Implement automation-based solutions to achieve high availability and efficiency, reduce cost, and improve system performance.
- Coach teams on best practices within the organisation via internal forums to establish fundamental SRE knowledge and promote enterprise-wide knowledge sharing.
- Assist with creating and maintaining system health and performance metrics reflecting real-time data, enabling proactive resolution and faster troubleshooting.
- Collaborate and partner with the DevOps engineer/coach to ensure efficient continuous integration/continuous deployment pipelines, resolve any failures, and improve the flow.
- Take charge of technical leadership, engage with teams to identify the best solutions, and mentor junior Site Reliability Engineers to resolve technical challenges.
- Assist in defining and implementing metrics such as SLIs and SLOs to gain insight into user experience and application performance.
- Define and deliver technical standards in partnership with all disciplines of software engineering for adoption of site reliability engineering.
- Participate in and work closely with relevant COEs to improve the release of new features and reduce time to market.
- Build and maintain strategic relationships with the business units and vendors to stay aligned on current ways of working and the business decisions being adopted.
- Conduct maturity assessments within teams to measure the level of SRE adoption, and use the results to outline a plan that helps teams reach the next level of maturity.
- Utilise application monitoring tools to generate reports for informed decision making and to drive visibility of Site Reliability Engineering.
- Adhere to and comply with Nedbank group information management, data integrity and security policies and best practices to protect client data.
- Manage concurrent objectives, projects, groups, activities and time allocation based on prioritisation for effective delivery.
- Stay abreast of the most recent industry trends and practices and apply learnings in the business to ensure alignment with the industry.
- Take responsibility for the success of the team and projects by taking ownership of issues and ensuring their resolution.
- Articulate technical concepts to diverse audiences through proficient written and verbal communication to ease the understanding of the SRE discipline.
- Contribute to the successful implementation of the business strategy in an innovative, fast-paced environment.
Essential Qualifications - NQF Level
- Matric / Grade 12 / National Senior Certificate
- Advanced Diplomas/National 1st Degrees
Preferred Qualification
- B-Tech Computer Systems, BSc Information Systems/Computer Systems, or related qualification
Preferred Certifications
- Associate or professional (Amazon Web Services/Azure Solutions), ITIL, DevOps
Minimum Experience Level
- Minimum 8 years' IT experience, with 5 years in relevant technologies or domains
Business Drivers
- Technical Expert
- Analyst
- Consultant
- Problem Solver
Technical / Professional Knowledge
- Microservices and containerization (K8s or Docker)
- Troubleshooting and root cause analysis
- Site Reliability Engineering best practices
- DevOps framework
- Relevant programming/scripting languages
- Infrastructure and application monitoring
- Incident management and post-incident analysis
Behavioural Competencies
- Tech Savvy
- Decision Making
- Building Networks
- Influencing
- Communication
- Troubleshooter
- Emotional intelligence Essentials
---------------------------------------------------------------------------------------
Please contact the Nedbank Recruiting Team at +27 860 555 566