Job Description:
Skills: Digital: Python; Digital: Node.js; Analytics Platform System (APS)
Experience Required: 8-10
Role Descriptions:
SRE Lead
• Deep application- and system-level knowledge across complex, end-to-end environments, including tightly integrated on-prem and cloud-native services supporting large-scale, multi-tier transaction flows
• Prior hands-on experience with APM and observability platforms (Dynatrace or comparable enterprise tools), with the ability to instrument, analyze, and troubleshoot complex distributed applications
• Proven expertise in deep troubleshooting across multi-layer, end-to-end (E2E) environments, including application, infrastructure, network, and platform layers (on-prem and cloud)
• Drive and execute the SRE / WCCS roadmap for BMO
• Hands-on role from Day 1
• Strong observability experience (refer to Observability SME expectations below)
• Deep knowledge and experience implementing SRE practices and guiding complex SRE transformations across the industry
Key Contributions:
• Assess current SRE capabilities, identify gaps, and contribute to the SRE & WCCS roadmap
• Navigate and collaborate across multi-team SRE and IT Operations environments to drive results
• Deliver creative workarounds and practical solutions to complex problems
________________________________________
SRE – Observability SME
• Hands-on role from Day 1
• Strong Day 1 Dynatrace expertise, including:
  o DQL
  o Gen3 Dashboards
  o Traces / Grail
  o ActiveGate and Plugins
  o SRG / Workflow development
  o BizEvents
• Prior hands-on experience with APM and observability platforms (Dynatrace or equivalent), with the ability to instrument, analyze, and troubleshoot distributed applications
• Deep troubleshooting expertise using observability signals (Metrics, Events, Logs, Traces) to identify root causes across complex, multi-layer E2E environments
• Strong foundation in Observability fundamentals (MELT)
• Expert-level dashboard design, including UI/UX best practices
• Extensive experience troubleshooting performance and non-functional issues
• Familiarity with SRE concepts as outlined in the Google SRE book/workbook
• Strong expertise in AWS Observability, including:
  o CloudWatch
  o Application Signals
  o Metrics, Logs, and Traces
  o Lambda and API Gateway
• Ability to design creative monitoring solutions for platforms with limited observability (e.g., IBM DataPower)
• Development experience with Python, AWS Lambda, ECS, and Azure Functions
• Understanding of AI-based system fundamentals, including how such systems are built and monitored
• Background or working knowledge of OpenTelemetry (OTEL)
• Experience in Financial Services or equivalent highly complex environments (e.g., 50+ systems collaborating to fulfill a single customer transaction)