Senior Observability Specialist

Location: Johannesburg, ZA

Date: 1 Apr 2026

Reference: 144605

Why This Role Matters

In a world where milliseconds matter and reliability is non-negotiable, observability is the superpower that keeps enterprise systems alive, agile, and trusted.

As our Senior Observability Specialist, you will architect the nervous system of our digital estate, bringing clarity to complexity across hybrid cloud, distributed platforms, and mainframe environments. Your work will ensure incidents are prevented where possible, detected faster, and resolved with confidence and speed.

You won’t just monitor systems; you will illuminate them, championing Site Reliability Engineering (SRE) principles, automation-first thinking, and a measure-first culture.

Your Mission

To deliver enterprise-grade observability solutions that provide real-time insights into system health, performance, and resilience, using best-in-class platforms like Dynatrace and ServiceNow.

You’ll partner with technology and business leaders to design scalable, integrated monitoring ecosystems that empower teams, reduce noise, and turn data into decisive action.

What You’ll Be Doing

Design & Build World-Class Observability

Architect and implement end-to-end observability across applications, infrastructure, and services
Define enterprise standards for monitoring and event management
Instrument services using logs, metrics, traces, telemetry, APM, RUM, and synthetics
Design synthetic user journeys for proactive, early-warning detection
Build high-signal alerts and visually compelling dashboards that matter
Ensure scalability, availability, and recoverability across hybrid cloud environments

Integrate, Automate, Accelerate

Integrate observability platforms with ServiceNow Event Management
Design event normalization, enrichment, CI binding, and noise reduction strategies

Drive Operational Excellence

Strengthen operational readiness with clear procedures and response plans
Partner with operations teams to close monitoring gaps and refine incident response

Champion Continuous Improvement

Research emerging observability tools, trends, and practices
Design advanced, cross-technology observability solutions at enterprise scale

Lead, Influence & Enable

Serve as an observability and SRE consultant across enterprise initiatives
Support SRE practices including SLIs, SLOs, error budgets, and post-incident reviews

Experience & Expertise

8–10 years of IT experience, with 7–10+ years in monitoring, APM, observability, or SRE at enterprise scale
Deep, hands-on expertise with Dynatrace, ServiceNow APM/Event Management, or equivalent platforms
Strong background in event management design, ITSM integration, and automation
Proven experience designing SLIs, SLOs, alert strategies, and driving MTTR reduction
Production experience in AWS, Azure, or GCP, including Kubernetes and containerized environments

Technical Strengths

Strong programming logic and scripting expertise
Ability to design solutions that span complex, interdependent technologies
Solid understanding of hybrid cloud and distributed system architectures

Qualifications

IT Diploma / Degree or equivalent practical experience

The Way You Work

Strategic thinker who sees the big picture while mastering the details
Calm under pressure, leading incidents with clarity and confidence
Able to influence without authority, coaching teams to shift left and own their SLOs
Collaborative across all organizational levels
Emotionally intelligent, decisive, resilient, and quality-driven

The Impact You’ll Make

You’ll redefine how reliability is engineered, how incidents are experienced, and how visibility empowers teams. Your influence will be felt across platforms, portfolios, and people, turning observability into a competitive advantage.

---------------------------------------------------------------------------------------

Please contact the Nedbank Recruiting Team at +27 860 555 566