Site Reliability Engineer (Sre)
ITRiders • toronto, on • Posted June 01, 2026
About the Role
Observability, SRE, DevOps roles with proven expertise across infrastructure and application-level reliability. Dynatrace, ELK, Splunk, and PagerDuty; SLI/SLO frameworks. Azure Kubernetes Service, Terraform,
What will you do Design and implement observability-as-code solutions using Terraform to deploy monitoring pipelines, dashboards, and alerting strategies across distributed systems.
Drive observability improvements leveraging industry-leading tools (Dynatrace, ELK, Splunk, PagerDuty) to achieve real-time performance insights and comprehensive system visibility.
Instrument applications for end-to-end observability implementing distributed tracing, metrics collection, and log aggregation across Node.js and .NET microservices and event-driven architectures.
Troubleshoot complex incidents in production environments, diagnosing root causes across multiple service layers, databases, caches, and APIs under load using SLISLO frameworks.
Investig...
What will you do Design and implement observability-as-code solutions using Terraform to deploy monitoring pipelines, dashboards, and alerting strategies across distributed systems.
Drive observability improvements leveraging industry-leading tools (Dynatrace, ELK, Splunk, PagerDuty) to achieve real-time performance insights and comprehensive system visibility.
Instrument applications for end-to-end observability implementing distributed tracing, metrics collection, and log aggregation across Node.js and .NET microservices and event-driven architectures.
Troubleshoot complex incidents in production environments, diagnosing root causes across multiple service layers, databases, caches, and APIs under load using SLISLO frameworks.
Investig...