Alerts

Compliant Kubernetes (CK8S) includes alerts via Alertmanager.

Important

By default, you will get some platform alerts. These may benefit you by giving you improved "situational awareness". Decide whether these alerts are of interest to you; feel free to silence them, as the Compliant Kubernetes administrator takes responsibility for them.

Your focus should be on user alerts, i.e., application-level alerts under the control and responsibility of the Compliant Kubernetes user. These are the focus of this document.

Compliance needs

Many regulations require you to have an incident management process. Alerts help you discover abnormal application behavior that needs attention. This maps to ISO 27001 – Annex A.16: Information Security Incident Management.

Enabling user alerts

User alerts are handled by Alertmanager, which needs to be enabled by the administrator. Get in touch with the administrator and they will be happy to help.

Configuring user alerts

User alerts are configured via the Secret alertmanager-alertmanager located in the alertmanager namespace. The format of this configuration file is described in the upstream Alertmanager documentation.

# retrieve the old configuration
kubectl get -n alertmanager secret alertmanager-alertmanager -o jsonpath='{.data.alertmanager\.yaml}' | base64 -d > alertmanager.yaml

# edit alertmanager.yaml as needed

# patch the new configuration
kubectl patch -n alertmanager secret alertmanager-alertmanager -p "{\"data\":{\"alertmanager.yaml\":\"$(base64 -w 0 < alertmanager.yaml)\"}}"

Make sure to configure and test a receiver for your alerts, e.g., Slack or OpsGenie.
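For illustration, a minimal sketch of an alertmanager.yaml with a Slack receiver; the webhook URL and channel name are placeholders that you must replace with your own:

route:
  receiver: slack-notifications
  group_by: ['alertname']
receivers:
- name: slack-notifications
  slack_configs:
  - api_url: https://hooks.slack.com/services/REPLACE/WITH/YOURS  # placeholder webhook URL
    channel: '#alerts'  # placeholder channel name
    send_resolved: true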

Note

If you get an access denied error, check with your Compliant Kubernetes administrator.

Accessing user Alertmanager

If you want to access Alertmanager, for example to confirm that its configuration was picked up correctly, or to configure silences, proceed as follows:

  1. Run kubectl proxy.
  2. Open Alertmanager in your browser via the proxy, typically at http://localhost:8001/api/v1/namespaces/alertmanager/services/alertmanager-operated:9093/proxy/ (assuming the default alertmanager-operated Service).

Configuring alerts

Before setting up an alert, you must first collect metrics from your application by setting up either ServiceMonitors or PodMonitors. In general, ServiceMonitors are recommended over PodMonitors, and they are the most common way to configure metrics collection.
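As a minimal sketch, a ServiceMonitor could look as follows; the name, labels, and port name are assumptions that must match your application's Service:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app  # hypothetical name
spec:
  selector:
    matchLabels:
      app: my-app  # must match the labels on your application's Service
  endpoints:
  - port: metrics  # the named Service port that exposes /metrics
    interval: 30s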

Then create a PrometheusRule following the examples below or the upstream documentation, with an expression that evaluates to the condition to alert on. Prometheus will pick the rules up, evaluate them, and send notifications to Alertmanager.

The API reference for Prometheus Operator describes how the Kubernetes resource is configured and the configuration reference for Prometheus describes the rules themselves.

In Compliant Kubernetes, the Prometheus Operator in the workload cluster is configured to pick up all PrometheusRules, regardless of which namespace they are in or which labels they have.

Running Example

The user demo already includes a PrometheusRule to configure an alert:

{{- if .Values.prometheusRule.enabled -}}
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: {{ include "ck8s-user-demo.fullname" . }}
  labels:
    {{- include "ck8s-user-demo.labels" . | nindent 4 }}
spec:
  groups:
  - name: ./example.rules
    rules:
    - alert: ApplicationIsActuallyUsed
      expr: rate(http_request_duration_seconds_count[1m])>1
{{- end }}

The screenshot below gives an example of the application alert, as seen in Alertmanager.

Example of User Demo Alerts

Detailed example

PrometheusRules support two kinds of rules: alerting rules (alert), which fire alerts based on an expression, and recording rules (record), which precompute complex queries and store the result as a separate metric:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    prometheus: example
    role: alert-rules
  name: prometheus-example-rules
spec:
  groups:
  - name: ./example.rules
    # interval: 30s # optional parameter to configure how often groups of rules are evaluated
    rules:
    - alert: ExampleAlert
      expr: vector(1)
      # for: 1m # optional parameter to configure how long an alert must be triggered to be fired
      labels:
        severity: high
      annotations:
        summary: "Example Alert has been fired!"
        description: "The Example Alert has been fired! It shows the value {{ $value }}."
    - record: example_record_metric
      expr: vector(1)
      labels:
        record: example

For alerting rules, labels and annotations can be added or overridden, and will be present in the resulting alert notifications. In addition, the annotations support Go templating, allowing access to the evaluated value via the $value variable and to all labels from the expression via the $labels variable.
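For instance, a rule could surface which pod triggered the alert. A minimal sketch, assuming the expression's result carries a pod label (as it typically does for metrics collected via a ServiceMonitor):

- alert: HighRequestRate  # hypothetical rule; goes under spec.groups[].rules in a PrometheusRule
  expr: rate(http_request_duration_seconds_count[1m]) > 1
  labels:
    severity: high
  annotations:
    summary: "High request rate on pod {{ $labels.pod }}"
    description: "Pod {{ $labels.pod }} serves {{ $value }} requests per second."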

For recording rules, labels can be added or overridden, and will be present in the resulting metric.
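The recorded metric can then be used like any other metric, for example in an alerting rule. A minimal sketch, reusing the example_record_metric recorded above; the rule and resource names are hypothetical:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: prometheus-example-record-alert  # hypothetical name
spec:
  groups:
  - name: ./example-record.rules
    rules:
    - alert: ExampleRecordAlert
      expr: example_record_metric > 0  # alert on the precomputed metric
      labels:
        severity: low
      annotations:
        summary: "The recorded metric example_record_metric is above zero."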