Rules

keycloak_audit_alerts
Last Evaluation: 22.066s ago
Evaluation Time: 1.004ms

Rule:
  alert: KeycloakAuditEventsDropped
  expr: keycloak_events_dropped_total > 0
  for: 1m
  labels:
    invariant: INV-IAM-09
    pd_ref: PD-235
    severity: critical
  annotations:
    description: '{{ $value }} audit events have been dropped. This violates INV-IAM-09.'
    summary: Keycloak audit events dropped
State: ok
Error: none
Last Evaluation: 22.069s ago
Evaluation Time: 510.4us

Rule:
  alert: KeycloakAuditQueueHigh
  expr: keycloak_events_queue_depth > 1000
  for: 5m
  labels:
    pd_ref: PD-235
    severity: warning
  annotations:
    description: Audit event queue depth is {{ $value }}. Events may be delayed.
    summary: Keycloak audit queue depth high
State: ok
Error: none
Last Evaluation: 22.069s ago
Evaluation Time: 114.8us

Rule:
  alert: KeycloakAuditForwardingFailed
  expr: keycloak_audit_forward_failures_total > 0
  for: 5m
  labels:
    invariant: INV-IAM-09
    pd_ref: PD-235
    severity: warning
  annotations:
    description: Log forwarding has failed {{ $value }} times. Local buffer in use.
    summary: Keycloak audit log forwarding failed
State: ok
Error: none
Last Evaluation: 22.069s ago
Evaluation Time: 175.6us

Rule:
  alert: KeycloakAuditReconciliationDelta
  expr: keycloak_audit_reconciliation_delta > 0
  for: 1m
  labels:
    invariant: INV-IAM-09
    pd_ref: PD-235
    severity: critical
  annotations:
    description: '{{ $value }} events missing between source and collector.'
    summary: Keycloak audit event count mismatch
State: ok
Error: none
Last Evaluation: 22.069s ago
Evaluation Time: 159us
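
KeycloakAuditEventsDropped compares a metric with a `_total` suffix against zero. If `keycloak_events_dropped_total` is a cumulative counter (an assumption; the listing does not state the metric type), the condition would stay true after the first dropped event until the counter resets, so the alert could not resolve on its own. A minimal sketch of a windowed variant follows. The 5m window and the reworded description are illustrative choices, not taken from the listing.

```yaml
# Sketch only: assumes keycloak_events_dropped_total is a monotonically increasing counter.
- alert: KeycloakAuditEventsDropped
  # increase() measures growth over the window, so the condition clears once drops stop.
  expr: increase(keycloak_events_dropped_total[5m]) > 0
  for: 1m
  labels:
    invariant: INV-IAM-09
    pd_ref: PD-235
    severity: critical
  annotations:
    # increase() extrapolates, so {{ $value }} may be a non-integer estimate.
    description: '{{ $value }} audit events were dropped in the last 5 minutes. This violates INV-IAM-09.'
    summary: Keycloak audit events dropped
```

The same consideration would apply to KeycloakAuditForwardingFailed, which also compares a `_total` metric against zero.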

infrastructure
Last Evaluation: 7.005s ago
Evaluation Time: 2.104ms

Rule:
  alert: NodeDown
  expr: up{job="node"} == 0
  for: 2m
  labels:
    severity: critical
  annotations:
    description: '{{ $labels.instance }} is down'
    summary: Node exporter down
State: ok
Error: none
Last Evaluation: 7.005s ago
Evaluation Time: 454.7us

Rule:
  alert: HighCPUUsage
  expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
  for: 10m
  labels:
    severity: warning
  annotations:
    description: CPU usage > 80% on {{ $labels.instance }}
    summary: High CPU usage
State: ok
Error: none
Last Evaluation: 7.004s ago
Evaluation Time: 579.7us

Rule:
  alert: HighMemoryUsage
  expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 85
  for: 5m
  labels:
    severity: warning
  annotations:
    description: Memory usage > 85% on {{ $labels.instance }}
    summary: High memory usage
State: ok
Error: none
Last Evaluation: 7.004s ago
Evaluation Time: 394.2us

Rule:
  alert: DiskSpaceLow
  expr: (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100 < 15
  for: 5m
  labels:
    severity: warning
  annotations:
    description: Disk space < 15% on {{ $labels.instance }}
    summary: Low disk space
State: ok
Error: none
Last Evaluation: 7.004s ago
Evaluation Time: 394.1us

Rule:
  alert: DiskSpaceCritical
  expr: (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100 < 5
  for: 2m
  labels:
    severity: critical
  annotations:
    description: Disk space < 5% on {{ $labels.instance }}
    summary: Critical disk space
State: ok
Error: none
Last Evaluation: 7.003s ago
Evaluation Time: 242.1us

postgresql
Last Evaluation: 15.629s ago
Evaluation Time: 1.418ms

Rule:
  alert: PostgreSQLDown
  expr: pg_up == 0
  for: 1m
  labels:
    severity: critical
  annotations:
    description: PostgreSQL on {{ $labels.instance }} is down
    summary: PostgreSQL is down
State: ok
Error: none
Last Evaluation: 15.629s ago
Evaluation Time: 806us

Rule:
  alert: PostgreSQLTooManyConnections
  expr: pg_stat_activity_count > 80
  for: 5m
  labels:
    severity: warning
  annotations:
    description: '{{ $value }} connections (>80)'
    summary: PostgreSQL connections high
State: ok
Error: none
Last Evaluation: 15.629s ago
Evaluation Time: 171.9us

Rule:
  alert: PostgreSQLDeadlocks
  expr: increase(pg_stat_database_deadlocks[5m]) > 0
  for: 1m
  labels:
    severity: warning
  annotations:
    description: Deadlocks detected on {{ $labels.datname }}
    summary: PostgreSQL deadlocks
State: ok
Error: none
Last Evaluation: 15.629s ago
Evaluation Time: 240.5us

Rule:
  alert: PostgreSQLSlowQueries
  expr: rate(pg_stat_statements_seconds_total[5m]) > 1
  for: 10m
  labels:
    severity: warning
  annotations:
    description: High query time on {{ $labels.instance }}
    summary: Slow queries detected
State: ok
Error: none
Last Evaluation: 15.628s ago
Evaluation Time: 171.6us

services
Last Evaluation: 6.194s ago
Evaluation Time: 1.869ms

Rule:
  alert: ServiceDown
  expr: probe_success == 0
  for: 3m
  labels:
    severity: critical
  annotations:
    description: '{{ $labels.instance }} is not responding'
    summary: Service unreachable
State: ok
Error: none
Last Evaluation: 6.194s ago
Evaluation Time: 496.8us

Rule:
  alert: APIDown
  expr: up{job="api"} == 0
  for: 2m
  labels:
    severity: critical
  annotations:
    description: ProbatioVault API is not responding
    summary: API Backend down
State: ok
Error: none
Last Evaluation: 6.194s ago
Evaluation Time: 582.5us

Rule:
  alert: GrafanaDown
  expr: probe_success{instance=~".*grafana.*"} == 0
  for: 3m
  labels:
    severity: warning
  annotations:
    description: Grafana dashboard is not responding
    summary: Grafana unreachable
State: ok
Error: none
Last Evaluation: 6.193s ago
Evaluation Time: 465.8us

Rule:
  alert: PrometheusDown
  expr: up{job="prometheus"} == 0
  for: 1m
  labels:
    severity: critical
  annotations:
    description: Prometheus monitoring is down
    summary: Prometheus down
State: ok
Error: none
Last Evaluation: 6.193s ago
Evaluation Time: 141.2us

Rule:
  alert: SonarQubeDown
  expr: probe_success{instance=~".*sonar.*"} == 0
  for: 5m
  labels:
    severity: warning
  annotations:
    description: SonarQube is not responding
    summary: SonarQube unreachable
State: ok
Error: none
Last Evaluation: 6.193s ago
Evaluation Time: 145.4us

ssl
Last Evaluation: 14.582s ago
Evaluation Time: 874.8us

Rule:
  alert: SSLCertExpiringSoon
  expr: probe_ssl_earliest_cert_expiry - time() < 86400 * 14
  for: 1h
  labels:
    severity: warning
  annotations:
    description: Certificate for {{ $labels.instance }} expires in < 14 days
    summary: SSL certificate expiring soon
State: ok
Error: none
Last Evaluation: 14.582s ago
Evaluation Time: 570.4us

Rule:
  alert: SSLCertExpired
  expr: probe_ssl_earliest_cert_expiry - time() < 0
  for: 1m
  labels:
    severity: critical
  annotations:
    description: Certificate for {{ $labels.instance }} has expired
    summary: SSL certificate expired
State: ok
Error: none
Last Evaluation: 14.582s ago
Evaluation Time: 245.1us
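
The listing shows rules as Prometheus has loaded them. On disk, a group such as ssl would sit in a rule file under a top-level `groups` key. A minimal sketch of such a file for the two ssl rules above follows. The file name `ssl.rules.yml` is hypothetical, and the group's evaluation interval is left out because the listing does not show it.

```yaml
# ssl.rules.yml (hypothetical file name)
groups:
  - name: ssl
    rules:
      - alert: SSLCertExpiringSoon
        # 86400 seconds per day * 14 days = 1209600 seconds remaining
        expr: probe_ssl_earliest_cert_expiry - time() < 86400 * 14
        for: 1h
        labels:
          severity: warning
        annotations:
          description: Certificate for {{ $labels.instance }} expires in < 14 days
          summary: SSL certificate expiring soon
      - alert: SSLCertExpired
        # A negative difference means the expiry timestamp is already in the past.
        expr: probe_ssl_earliest_cert_expiry - time() < 0
        for: 1m
        labels:
          severity: critical
        annotations:
          description: Certificate for {{ $labels.instance }} has expired
          summary: SSL certificate expired
```

A file in this shape can be validated with `promtool check rules ssl.rules.yml` before Prometheus loads it.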