Site Reliability Engineer (SRE)

ITRiders • Toronto, ON • Posted June 01, 2026

About the Role

We are looking for someone with a background in observability, SRE, or DevOps roles and proven expertise in both infrastructure and application-level reliability. Relevant experience includes Dynatrace, ELK, Splunk, and PagerDuty; SLI/SLO frameworks; Azure Kubernetes Service; and Terraform.

What You Will Do

Design and implement observability-as-code solutions using Terraform to deploy monitoring pipelines, dashboards, and alerting strategies across distributed systems.

Drive observability improvements leveraging industry-leading tools (Dynatrace, ELK, Splunk, PagerDuty) to achieve real-time performance insights and comprehensive system visibility.

Instrument applications for end-to-end observability by implementing distributed tracing, metrics collection, and log aggregation across Node.js and .NET microservices and event-driven architectures.

Troubleshoot complex production incidents, using SLI/SLO frameworks to diagnose root causes across multiple service layers, databases, caches, and APIs under load.

Investig...