Effective Kubernetes Monitoring Strategies

Apr 6, 2025

SRE

#kubernetes #monitoring #prometheus #grafana #observability

Effective Kubernetes Monitoring Strategies

Monitoring Kubernetes clusters effectively is crucial for maintaining the health and performance of your applications. In this post, I’ll share some practical strategies for setting up comprehensive monitoring solutions.

The Three Pillars of Observability

Modern observability relies on three key components:

Metrics - Numerical data about system performance
Logs - Detailed event records from applications and systems
Traces - Records of requests as they flow through distributed systems

Setting Up Prometheus for Metrics Collection

Prometheus is a powerful tool for collecting and querying metrics. Here’s a simple configuration to get started:

apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: monitoring
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
    
    scrape_configs:
      - job_name: 'kubernetes-nodes'
        kubernetes_sd_configs:
        - role: node
        relabel_configs:
        - source_labels: [__address__]
          regex: '(.*):10250'
          replacement: '${1}:9100'
          target_label: __address__
          action: replace

Custom Metrics with Prometheus Operators

You can define custom metrics using PromQL (Prometheus Query Language):

// Calculate request error rate
sum(rate(http_requests_total{status=~"5.."}[5m])) by (service) 
  / 
sum(rate(http_requests_total[5m])) by (service)

Setting Up Alerts

Alert configuration is essential for proactive monitoring:

groups:
- name: kubernetes-alerts
  rules:
  - alert: HighCPUUsage
    expr: (100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)) > 80
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "High CPU usage detected"
      description: "CPU usage is above 80% for 5 minutes on {{ $labels.instance }}"

Visualizing with Grafana

Here’s a simple bash script to install Grafana in your Kubernetes cluster:

#!/bin/bash
# Install Grafana in K8s cluster

# Add Helm repository
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

# Install Grafana
helm install grafana grafana/grafana \
  --namespace monitoring \
  --set persistence.enabled=true \
  --set persistence.size=10Gi \
  --set service.type=LoadBalancer

# Get admin password
kubectl get secret --namespace monitoring grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo

Useful Dashboards

Once you have metrics flowing, you can create custom dashboards or import existing ones. Here’s a snippet that demonstrates how to query node memory information:

// Example Grafana dashboard panel query for memory usage
100 * (
  1 - (
    node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes
  )
)

Conclusion

Implementing robust monitoring for Kubernetes requires a combination of the right tools and proper configuration. By following these strategies, you can build an effective observability stack that helps maintain the reliability of your applications.

In future posts, I’ll dive deeper into advanced alerting strategies and explore how to implement distributed tracing with Jaeger or Zipkin.

← Back to all posts