**keycloak_audit_alerts**
Last evaluation: 22.066s ago · Evaluation time: 1.004ms

| Rule | State | Error | Last Evaluation | Evaluation Time |
| --- | --- | --- | --- | --- |
| **KeycloakAuditEventsDropped**: `keycloak_events_dropped_total > 0` for 1m · labels: severity: critical, invariant: INV-IAM-09, pd_ref: PD-235 · summary: Keycloak audit events dropped · description: "{{ $value }} audit events have been dropped. This violates INV-IAM-09." | ok | | 22.069s ago | 510.4µs |
| **KeycloakAuditQueueHigh**: `keycloak_events_queue_depth > 1000` for 5m · labels: severity: warning, pd_ref: PD-235 · summary: Keycloak audit queue depth high · description: "Audit event queue depth is {{ $value }}. Events may be delayed." | ok | | 22.069s ago | 114.8µs |
| **KeycloakAuditForwardingFailed**: `keycloak_audit_forward_failures_total > 0` for 5m · labels: severity: warning, invariant: INV-IAM-09, pd_ref: PD-235 · summary: Keycloak audit log forwarding failed · description: "Log forwarding has failed {{ $value }} times. Local buffer in use." | ok | | 22.069s ago | 175.6µs |
| **KeycloakAuditReconciliationDelta**: `keycloak_audit_reconciliation_delta > 0` for 1m · labels: severity: critical, invariant: INV-IAM-09, pd_ref: PD-235 · summary: Keycloak audit event count mismatch · description: "{{ $value }} events missing between source and collector." | ok | | 22.069s ago | 159µs |
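The flattened rule text in the table maps directly back onto standard Prometheus rule-file YAML. A sketch of the first rule as it would appear in the group's source file (the group's evaluation interval and the file name are not shown on this page, so neither appears here):

```yaml
# Sketch reconstructed from the table above; file layout is an assumption.
groups:
  - name: keycloak_audit_alerts
    rules:
      - alert: KeycloakAuditEventsDropped
        expr: keycloak_events_dropped_total > 0
        for: 1m
        labels:
          severity: critical
          invariant: INV-IAM-09
          pd_ref: PD-235
        annotations:
          summary: Keycloak audit events dropped
          description: '{{ $value }} audit events have been dropped. This violates INV-IAM-09.'
```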
**infrastructure**
Last evaluation: 7.005s ago · Evaluation time: 2.104ms

| Rule | State | Error | Last Evaluation | Evaluation Time |
| --- | --- | --- | --- | --- |
| **NodeDown**: `up{job="node"} == 0` for 2m · labels: severity: critical · summary: Node exporter down · description: "{{ $labels.instance }} is down" | ok | | 7.005s ago | 454.7µs |
| **HighCPUUsage**: `100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80` for 10m · labels: severity: warning · summary: High CPU usage · description: "CPU usage > 80% on {{ $labels.instance }}" | ok | | 7.004s ago | 579.7µs |
| **HighMemoryUsage**: `(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 85` for 5m · labels: severity: warning · summary: High memory usage · description: "Memory usage > 85% on {{ $labels.instance }}" | ok | | 7.004s ago | 394.2µs |
| **DiskSpaceLow**: `(node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100 < 15` for 5m · labels: severity: warning · summary: Low disk space · description: "Disk space < 15% on {{ $labels.instance }}" | ok | | 7.004s ago | 394.1µs |
| **DiskSpaceCritical**: `(node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100 < 5` for 2m · labels: severity: critical · summary: Critical disk space · description: "Disk space < 5% on {{ $labels.instance }}" | ok | | 7.003s ago | 242.1µs |
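DiskSpaceLow and DiskSpaceCritical are a warning/critical tier on the same expression, so both fire once free space drops below 5%. If these alerts are routed through Alertmanager (not shown on this page), an inhibition rule can mute the warning while the critical alert is firing on the same instance; a sketch:

```yaml
# alertmanager.yml fragment (sketch; Alertmanager config is an assumption,
# not part of the rules page above).
inhibit_rules:
  - source_matchers:
      - alertname = DiskSpaceCritical
    target_matchers:
      - alertname = DiskSpaceLow
    # Only inhibit when both alerts refer to the same host.
    equal: [instance]
```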
**postgresql**
Last evaluation: 15.629s ago · Evaluation time: 1.418ms

| Rule | State | Error | Last Evaluation | Evaluation Time |
| --- | --- | --- | --- | --- |
| **PostgreSQLDown**: `pg_up == 0` for 1m · labels: severity: critical · summary: PostgreSQL is down · description: "PostgreSQL on {{ $labels.instance }} is down" | ok | | 15.629s ago | 806µs |
| **PostgreSQLTooManyConnections**: `pg_stat_activity_count > 80` for 5m · labels: severity: warning · summary: PostgreSQL connections high · description: "{{ $value }} connections (>80)" | ok | | 15.629s ago | 171.9µs |
| **PostgreSQLDeadlocks**: `increase(pg_stat_database_deadlocks[5m]) > 0` for 1m · labels: severity: warning · summary: PostgreSQL deadlocks · description: "Deadlocks detected on {{ $labels.datname }}" | ok | | 15.629s ago | 240.5µs |
| **PostgreSQLSlowQueries**: `rate(pg_stat_statements_seconds_total[5m]) > 1` for 10m · labels: severity: warning · summary: Slow queries detected · description: "High query time on {{ $labels.instance }}" | ok | | 15.628s ago | 171.6µs |
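Rules like PostgreSQLDeadlocks can be verified offline with `promtool test rules`. A sketch of a unit test that feeds the rule a deadlock counter jumping from 0 to 1 and expects the alert to be firing two minutes later (file names are assumptions):

```yaml
# Run with: promtool test rules postgresql_test.yml
rule_files:
  - postgresql.rules.yml        # assumed name of the rules file above
evaluation_interval: 1m
tests:
  - interval: 1m
    input_series:
      # Deadlock counter increments once at t=2m.
      - series: 'pg_stat_database_deadlocks{datname="app"}'
        values: '0 0 1 1 1 1'
    alert_rule_test:
      # expr becomes true at t=2m; with "for: 1m" the alert fires by t=4m.
      - eval_time: 4m
        alertname: PostgreSQLDeadlocks
        exp_alerts:
          - exp_labels:
              severity: warning
              datname: app
            exp_annotations:
              summary: PostgreSQL deadlocks
              description: Deadlocks detected on app
```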
**services**
Last evaluation: 6.194s ago · Evaluation time: 1.869ms

| Rule | State | Error | Last Evaluation | Evaluation Time |
| --- | --- | --- | --- | --- |
| **ServiceDown**: `probe_success == 0` for 3m · labels: severity: critical · summary: Service unreachable · description: "{{ $labels.instance }} is not responding" | ok | | 6.194s ago | 496.8µs |
| **APIDown**: `up{job="api"} == 0` for 2m · labels: severity: critical · summary: API Backend down · description: "ProbatioVault API is not responding" | ok | | 6.194s ago | 582.5µs |
| **GrafanaDown**: `probe_success{instance=~".*grafana.*"} == 0` for 3m · labels: severity: warning · summary: Grafana unreachable · description: "Grafana dashboard is not responding" | ok | | 6.193s ago | 465.8µs |
| **PrometheusDown**: `up{job="prometheus"} == 0` for 1m · labels: severity: critical · summary: Prometheus down · description: "Prometheus monitoring is down" | ok | | 6.193s ago | 141.2µs |
| **SonarQubeDown**: `probe_success{instance=~".*sonar.*"} == 0` for 5m · labels: severity: warning · summary: SonarQube unreachable · description: "SonarQube is not responding" | ok | | 6.193s ago | 145.4µs |
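The `probe_success` series used by this group (and the SSL group below) typically come from a blackbox_exporter HTTP probe job. A sketch of the scrape config that would produce them; the target URLs, module name, and exporter address are assumptions, not taken from this page:

```yaml
# prometheus.yml fragment (sketch; targets and exporter address assumed).
scrape_configs:
  - job_name: blackbox
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
          - https://grafana.example.internal
          - https://sonar.example.internal
    relabel_configs:
      # Pass the probed URL to the exporter as ?target=...
      - source_labels: [__address__]
        target_label: __param_target
      # Keep the probed URL as the instance label (matched by the
      # ".*grafana.*" / ".*sonar.*" regexes in the rules above).
      - source_labels: [__param_target]
        target_label: instance
      # Scrape the exporter itself, not the target.
      - target_label: __address__
        replacement: blackbox-exporter:9115
```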
**ssl**
Last evaluation: 14.582s ago · Evaluation time: 874.8µs

| Rule | State | Error | Last Evaluation | Evaluation Time |
| --- | --- | --- | --- | --- |
| **SSLCertExpiringSoon**: `probe_ssl_earliest_cert_expiry - time() < 86400 * 14` for 1h · labels: severity: warning · summary: SSL certificate expiring soon · description: "Certificate for {{ $labels.instance }} expires in < 14 days" | ok | | 14.582s ago | 570.4µs |
| **SSLCertExpired**: `probe_ssl_earliest_cert_expiry - time() < 0` for 1m · labels: severity: critical · summary: SSL certificate expired · description: "Certificate for {{ $labels.instance }} has expired" | ok | | 14.582s ago | 245.1µs |
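The 14-day threshold is expressed in seconds: 86400 s/day × 14 = 1,209,600 s. The `time()`-based rules can also be unit-tested: in `promtool test rules` the clock starts at Unix time 0, so `time()` equals the `eval_time` in seconds, and a cert-expiry sample of 100 counts as expired from t=100s onward. A sketch (file name and instance URL are assumptions):

```yaml
# Run with: promtool test rules ssl_test.yml
rule_files:
  - ssl.rules.yml               # assumed name of the rules file above
evaluation_interval: 1m
tests:
  - interval: 1m
    input_series:
      # Constant expiry timestamp of 100; "expired" once time() > 100.
      - series: 'probe_ssl_earliest_cert_expiry{instance="https://app.example.internal"}'
        values: '100x5'
    alert_rule_test:
      # expr first true at t=2m (100 - 120 < 0); with "for: 1m" it fires by t=3m.
      - eval_time: 3m
        alertname: SSLCertExpired
        exp_alerts:
          - exp_labels:
              severity: critical
              instance: https://app.example.internal
            exp_annotations:
              summary: SSL certificate expired
              description: Certificate for https://app.example.internal has expired
```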