Alerting

Introduction

The following alerts are coming out-of-the-box with Axual 2020.3

  • Connect Metrics Unreachable - When connect metrics unreachable for more than 5 minutes

  • Connector Failing - When connect task failing for more than 5 minutes

To read more about alerting see Alerts
platform-deploy/include/prometheus/alert-rules/connect.yml
groups:
  - name: ConnectRules
    rules:
      - alert: ConnectMetricsUnreachable
        expr: up{is_axual_connect="true"} == 0
        for: 5m
        labels:
          severity: high
          callout: true
        annotations:
          summary: "Instance '{{ $labels.axual_instance }}' machine '{{ $labels.instance }}' connect metrics unreachable for more than 5 minutes."
      - alert: ConnectorFailing
        expr: kafka_connect_connector_metrics_status{is_axual_connect="true"} == 4
        for: 5m
        labels:
          severity: high
          callout: true
        annotations:
          summary: "Instance '{{ $labels.axual_instance }}' machine '{{ $labels.instance }}' connect task failing for more than 5 minutes."