Job Opportunities | USA | Careers | APEX-2000 Inc - A leading Global Consulting and IT services company

Requirement ID:	87345
Job Title:	Site Reliability Engineer
Job Type:	Contract
Rate:	CAD 90/hr
Duration:	6 - 9 months
Location:	Toronto
Job Description:	Role Descriptions: Site Reliability Engineer (SRE) with expertise in Dynatrace monitoring, log investigation, and observability practices. The ideal candidate will have a deep understanding of business processes, upstream-downstream dependencies, and the ability to design and implement dashboards with SLOs and SLAs that align with business objectives.

Key Responsibilities

Monitoring & Observability
- Configure and maintain Dynatrace for application and infrastructure monitoring.
- Develop custom dashboards, alerts, and reports to track system health and performance.
- Define and implement Service Level Objectives (SLOs) and Service Level Agreements (SLAs).

Log Analysis & Troubleshooting
- Perform log investigation using tools like Splunk, ELK, or similar platforms.
- Identify root causes of incidents and provide actionable insights for resolution.

Business Understanding
- Analyze business models, workflows, and critical application flows.
- Map upstream and downstream dependencies to ensure end-to-end reliability.

Incident Management
- Participate in on-call rotations and respond to production incidents.
- Drive post-incident reviews and implement preventive measures.

Automation & Optimization
- Automate monitoring and alerting processes to reduce manual intervention.
- Collaborate with development teams to improve system reliability and performance.

Required Skills & Qualifications

Technical Expertise
- Strong experience with Dynatrace (configuration, dashboards, problem detection).
- Proficiency in log analysis tools (Splunk, ELK, or equivalent).
- Solid understanding of SRE principles, observability, and incident management.

Business & Analytical Skills
- Ability to understand business processes and translate them into technical monitoring solutions.
- Experience in mapping application dependencies and creating impact analysis.

Soft Skills
- Excellent communication and collaboration skills.
- Strong problem-solving and analytical mindset.

Preferred
- Experience with cloud platforms (AWS, Azure, GCP).
- Familiarity with CI/CD pipelines and automation scripting.

Performance Metrics
- Uptime and reliability improvements.
- Reduction in Mean Time to Detect (MTTD) and Mean Time to Resolve (MTTR).
- Accuracy and relevance of dashboards and alerts.
- Compliance with defined SLOs and SLAs.

Essential Skills:	Site Reliability Engineer (SRE)
Desirable Skills:	
Keyword:	
Skills:	Digital: Amazon Web Service (AWS) Cloud Computing; Digital: Site Reliability Engineering (SRE); Dynatrace; GitHub Enterprise
Experience Required:	8-10