AIOps Service Health

Unified health monitoring

Problem: Azure incident managers (AIMs) and Directly Responsible Individuals (DRIs) struggle to assess service health because they lack access to service dashboards. They must rely on others for information, which slows incident response.

Solution: The AIOps Service Health dashboard uses service level indicators (SLIs) and anomaly detection to assess service health. It gives AIMs and DRIs fast, consistent insights without having to rely on each service's DRIs.

Deliverable: Led the architecture of a unified monitoring UX framework that helps AIMs and DRIs quickly detect outages, anomalies, and system issues across all data layers for real-time incident response. Conducted user research with 15+ stakeholders through interviews, live incident observations, and iterative usability testing to validate design decisions.

Impact: Telemetry shows a significant increase in usage after the March 2025 release. In April, Daily Active Usage (DAU) rose from 51 to 139 (roughly 2.7×), and Monthly Active Usage (MAU) doubled from 700 to 1,400.
