AIOps Service Health
Unified health monitoring
Problem: Azure incident managers (AIMs) and Directly Responsible Individuals (DRIs) often lack access to service dashboards, so they depend on others to assess service health, which slows incident response.
Solution: The AIOps Service Health dashboard uses service level indicators (SLIs) and anomaly detection to assess service health, giving fast, consistent insights without relying on service DRIs.
Deliverable: Led architecture of a unified monitoring UX framework to help AIMs and DRIs quickly detect outages, anomalies, and system issues across all data layers for real-time incident response. Conducted user research with 15+ stakeholders through interviews, live incident observations, and iterative usability testing to validate design decisions.
Impact: Telemetry shows a significant increase in usage after the March 2025 release. In April, Daily Active Usage (DAU) rose from 51 to 139 (about 2.7x), and Monthly Active Usage (MAU) doubled from 700 to 1,400.
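As a rough illustration of the idea in the Solution (SLIs combined with anomaly detection), the sketch below computes an availability SLI and flags it when it drops well below recent history. It is not the dashboard's actual implementation: the z-score method, the 3.0 threshold, and the names `availability_sli`, `is_anomalous`, and `assess_health` are all assumptions made for this example.

```python
from statistics import mean, stdev


def availability_sli(successes: int, total: int) -> float:
    """Hypothetical SLI: fraction of successful requests (1.0 if no traffic)."""
    return successes / total if total else 1.0


def is_anomalous(history: list[float], current: float, threshold: float = 3.0) -> bool:
    """Flag `current` when it lies more than `threshold` sample standard
    deviations from the mean of `history` (a simple z-score check, assumed
    here for illustration only)."""
    if len(history) < 2:
        return False  # not enough data to estimate spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return current != mu
    return abs(current - mu) / sigma > threshold


def assess_health(history: list[float], current: float) -> str:
    """Report 'unhealthy' only for anomalous drops below the recent mean."""
    if current < mean(history) and is_anomalous(history, current):
        return "unhealthy"
    return "healthy"


# Example values are made up for the sketch, not real telemetry.
recent = [0.999, 0.998, 0.999, 0.997, 0.999]
print(assess_health(recent, availability_sli(940, 1000)))
print(assess_health(recent, availability_sli(998, 1000)))
```

With the made-up values above, the first call represents a sharp availability drop and the second a reading in line with recent history. A production system would likely use per-service baselines and seasonality-aware detection in place of a single global threshold.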





