Head of Platform Stability

Purple Group Limited | Posted 2 months ago

Skills and experience

Role: Site reliability engineer (SRE)
Experience in role: 10+ years
Language proficiency: English
Must-have skills:
    Elastic Stack
    Prometheus
    Grafana
Nice-to-have skills:
    Datadog
    Splunk Cloud

Location and salary

Remote policy: Remote
Location of job: Anywhere
Visa requirements: Authorised to work in South Africa with the status of citizen/passport holder or permanent resident
Visa sponsorship: Unable to sponsor visa
Employment type: Permanent

Role description

Description of role:

We are seeking an experienced and innovative Head of Platform Stability to lead our efforts in ensuring the health, reliability, and performance of our platforms. This role is critical to maintaining high uptime, implementing real-time monitoring systems, providing application support, and driving proactive maintenance initiatives. The ideal candidate will have a proven track record of leading similar functions and be adept at issue management and resolving client issues.

The Head of Platform Stability will play a crucial role in ensuring the reliability and performance of our platform. This position offers an exciting opportunity to lead innovation in platform stability and shape the future of our technology infrastructure.

In this role you will be required to:

• Develop and implement strategies to monitor and maximize platform uptime and reliability

• Establish and monitor key performance indicators (KPIs) for platform stability

• Lead incident response efforts and conduct post-mortem analyses, implementing and refining incident management frameworks to prevent future issues

• Design and oversee the implementation of comprehensive real-time observability and monitoring systems

• Develop proactive maintenance protocols to identify and address potential issues before they impact users

• Leverage AI and machine learning technologies to enhance predictive maintenance capabilities

• Manage the Business as Usual (BAU) application support team

• Establish efficient processes for issue triage, escalation, and resolution using industry-proven tooling

• Collaborate with development teams to implement long-term solutions for recurring issues

• Build and lead a proactive and high-performing team of platform stability engineers

• Define the scope and structure of the platform stability function

• Develop and implement best practices for platform stability across the organization

• Serve as the primary point of contact and incident manager for high-priority platform stability issues

• Develop and maintain strong relationships with key stakeholders and clients, and ensure that platform stability goals are aligned with business priorities

• Provide regular updates and reports on platform performance and stability initiatives

You are likely to be a good fit for this role if you have:

• Strategic thinking and ability to translate technical concepts into business value

• Strong leadership and team management skills

• Excellent analytical and problem-solving abilities

• Proactive approach to identifying and mitigating potential issues

• Ability to work effectively in a fast-paced, dynamic environment

• Strong customer focus and commitment to service excellence

• Experience with advanced monitoring and observability tools such as Elastic Stack Observability, Prometheus, Grafana, Datadog or Splunk

Desired Experience and Qualifications:

• Bachelor's degree in Computer Science, Information Technology, or a related field; Master's degree preferred

• 10+ years of experience in platform engineering, with at least 5 years in a leadership role

• Proven track record in managing large-scale, complex platforms with high uptime requirements

• Strong knowledge of cloud technologies, infrastructure as code, and DevOps practices

• Experience with AI and machine learning applications in system monitoring and maintenance

• Excellent problem-solving skills and ability to manage high-pressure situations

• Strong communication and interpersonal skills, with the ability to interact effectively with technical and non-technical stakeholders

• Frameworks and Standards:

  • Incident Management: ITIL, SRE, Incident Command System (ICS)
  • Post-Mortem Analysis: Blameless Post-Mortems, 5 Whys, Fishbone Diagrams
  • Monitoring and Observability: tools such as Elastic Stack Observability, Prometheus, Grafana, Datadog and Splunk

About Purple Group Limited

201-500 employees

At Purple Group, a JSE-listed company, we democratise investing, making it easy for everyone to grow and protect wealth. Our dream is rooted in technical excellence and security, beautiful design and captivating storytelling. Our brands include:

  • EasyEquities, which aims to disrupt and remove the barriers to entry in local and international stock markets, making share ownership easy, cheap and fun, and ensuring that anyone can own shares in the companies they love.
  • EasyProperties, where our goal is to simplify property investing through fractional property investments, offering investors the chance to start small while thinking big.
  • EasyAssetMgmt, where we construct our portfolios from four main building blocks (Value, Quality, Stability and Momentum) to match your defined risk-return objectives.
  • GT247.com, South Africa’s first CFD provider, remains one of South Africa’s leading trading platforms.
  • RISE, an affordable and practical one-stop solution for the needs of retirement funds and their members.
  • EasyCrypto, which is built to deliver the safest, easiest and most trusted platform to invest in and store all your crypto assets. Built on blockchain rails, the platform enables the digitization of any asset class, giving the business the capability to capitalize on this megatrend into the future.

If you would like to become part of our dynamic team that, together, is shaping the world of finance, the following hiring process applies:

  1. Technical and culture fit interviews
  2. Technical assessments (for certain roles)
  3. Background checks

Perks at Purple Group Limited

Remote Working
Medical Aid
Learning and Development Support
Retirement Savings
Risk Benefits (Life, Disability, Funeral)
Mental Health Support
Great Leave Policy
Long Term Incentives
No strict dress code

Tech Stack

Utilities

Elasticsearch
Power BI

Business tools

Slack
JIRA
Microsoft SharePoint
Microsoft Teams
HubSpot

Application and data

C#
MS SQL
Kafka

DevOps

Prometheus
Datadog
