Position

Site Reliability Engineer

Details

Location:  Johannesburg, ZA

Date:  24 Apr 2025
Reference:  138974

Requisition Details & Talent Acquisition Consultant

REQ 138974 - Keabetswe Modise

Closing Date: 13 May 2025

Job Family

Information Technology

Career Stream

Application Development

Leadership Pipeline

Manage Self: Professional

Job Purpose

To serve as an IT professional specialising in Site Reliability Engineering (SRE) at Nedbank, contributing to the strategic capability of the organisation as part of a dynamic team. The role is focused on advancing the SRE discipline and working with other domains to influence its adoption. It is a strategic, consultancy-based role that involves enabling and contributing to solutions aligned with the principles of reliability, availability, and resilience, while also promoting frequent and efficient delivery from development teams.

Job Responsibilities

  • Collaborate with stakeholders, engineers, and operational SMEs to ensure all relevant parties are up to date with what is top of mind within the reliability service offerings.
  • Evolve production services based on customer needs and technology to ensure we remain competitive in the financial services industry/market.
  • Influence squads during service or platform design to prevent system failures and improve performance.  
  • Engage with leadership and teams to adopt SRE practices, with a core focus on contributing to incident management and advocating for blameless postmortems.
  • Engage and influence all teams involved in the software development life cycle with regard to observability and high availability, utilising new or existing technology, and improve disaster recovery plans.
  • Implement automation-based solutions to achieve high availability and efficiency, reduce cost, and improve system performance.
  • Coach teams on best practices within the organisation via internal forums to position SRE fundamental knowledge and promote enterprise-wide knowledge sharing.
  • Assist with creating and maintaining system health and performance metrics reflecting real-time data, enabling proactive resolution and faster troubleshooting.
  • Collaborate and partner with the DevOps engineer/coach to ensure efficient continuous integration/continuous deployment pipelines, resolve any failures, and improve the flow.
  • Take charge of technical leadership, engage with teams to identify the best solutions, and mentor junior Site Reliability Engineers to resolve technical challenges.
  • Assist in defining and implementing metrics such as SLIs and SLOs to gain insight into user experience and application performance.
  • Define and deliver technical standards in partnership with all disciplines of software engineering for adoption of site reliability engineering. 
  • Participate in and work closely with relevant COEs to improve the release of new features and shorten time to market.
  • Build and maintain strategic relationships with the business units and vendors to be in sync on current ways of work and business decisions that are being embraced.
  • Conduct maturity assessments within teams to measure the level of SRE adoption, and use the results to outline a plan that helps teams reach the next level of maturity.
  • Utilise application monitoring tools to generate reports for informed decision-making and to drive visibility of Site Reliability Engineering.
  • Adhere to and comply with Nedbank group information management, data integrity and security policies and best practices to protect client data.
  • Manage concurrent objectives, projects, groups, activities and time allocation based on prioritisation for effective delivery.
  • Stay abreast of the most recent industry trends and practices and implement learnings back into the business to ensure alignment across industry.
  • Take responsibility for the success of the team and projects by taking ownership of issues and ensuring their resolution.
  • Articulate technical concepts to diverse audiences through proficient written and verbal communication to ease the understanding of the SRE discipline. 
  • Contribute to the successful implementation of the business strategy in an innovative, fast-paced environment.

Essential Qualifications - NQF Level

  • Matric / Grade 12 / National Senior Certificate
  • Advanced Diplomas/National 1st Degrees

Preferred Qualification

  • BTech in Computer Systems, BSc in Information Systems/Computer Systems, or related qualification

Preferred Certifications

  • Associate- or professional-level certification (Amazon Web Services/Azure Solutions), ITIL, DevOps

Minimum Experience Level

  • Minimum 8 years' IT experience, with 5 years in relevant technologies or domains

Business Drivers

  • Technical Expert
  • Analyst 
  • Consultant
  • Problem solver

Technical / Professional Knowledge

  • Microservices and containerisation (Kubernetes or Docker)
  • Troubleshooting and root cause analysis
  • Site Reliability Engineering best practices
  • DevOps framework
  • Relevant programming/scripting languages
  • Infrastructure and application monitoring 
  • Incident management and post incident analysis

Behavioural Competencies

  • Tech Savvy
  • Decision Making 
  • Building Networks
  • Influencing 
  • Communication
  • Troubleshooter
  • Emotional Intelligence Essentials

---------------------------------------------------------------------------------------

Please contact the Nedbank Recruiting Team at +27 860 555 566 

 
