Monitor and Optimize Multi-Cluster AKS Costs 💰

Hamdi KHELIL - Sep 12 - Dev Community

As businesses scale their Kubernetes workloads across multiple Azure Kubernetes Service (AKS) clusters, managing and optimizing cloud costs becomes critical. Deploying and managing observability tools such as KubeCost and OpenTelemetry (OTel) across multiple clusters can be simplified using AKS Fleet Manager, Microsoft Managed Prometheus, and Grafana.

This guide will explain how to deploy KubeCost and OpenTelemetry to multiple AKS clusters using AKS Fleet Manager, expose metrics through OpenTelemetry, and centralize monitoring via Managed Prometheus and Grafana. This setup provides a single view into your multi-cluster environment, allowing for more efficient resource utilization and cost management.

What is KubeCost? 🧮

KubeCost is an open-source cost management tool designed to give you real-time visibility into cloud expenses in Kubernetes environments. It helps identify costs at granular levels—such as namespaces, deployments, services, and pods—allowing organizations to optimize resource usage and reduce expenses.
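If you want a quick look at the data KubeCost collects before wiring up any dashboards, you can port-forward to the cost-analyzer service and query its Allocation API. This is a minimal sketch; the service name and the kubecost namespace assume the default Helm installation used later in this guide.

# Port-forward the KubeCost API locally (assumes the default install in the "kubecost" namespace)
kubectl port-forward --namespace kubecost service/kubecost-cost-analyzer 9090:9090

# In another terminal: costs for the last 7 days, aggregated by namespace
curl -s "http://localhost:9090/model/allocation?window=7d&aggregate=namespace"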

Why Use AKS Fleet Manager? 🌐

AKS Fleet Manager simplifies managing multiple AKS clusters by centralizing governance, policies, and monitoring across your fleet. Instead of manually managing each cluster, AKS Fleet Manager allows you to orchestrate deployments (like KubeCost and OpenTelemetry) across multiple clusters simultaneously.

Why Use OpenTelemetry, Managed Prometheus, and Grafana? 📊

  • OpenTelemetry (OTel): Provides standardized observability, collecting metrics, logs, and traces from Kubernetes workloads and exposing them to monitoring systems like Prometheus.
  • Microsoft Managed Prometheus: Fully managed Prometheus service that removes the need to handle Prometheus infrastructure, making it easy to monitor metrics across your clusters.
  • Grafana: A powerful visualization tool that integrates with Prometheus to present monitoring metrics in flexible, customizable dashboards.

Deploying these tools across multiple clusters using AKS Fleet Manager allows you to centralize your monitoring and cost optimization across all AKS environments.


Step-by-Step: Deploy KubeCost and OpenTelemetry with AKS Fleet Manager 🔧

Step 1: Create and Register AKS Clusters

First, create the AKS clusters and register them with AKS Fleet Manager. This will allow you to manage multiple clusters as part of a fleet.

Create AKS Clusters

You can create your AKS clusters using the Azure CLI:

# Create a resource group for AKS Fleet Manager
az group create --name myFleetResourceGroup --location eastus

# Create resource groups for the member clusters (in two different regions)
az group create --name myResourceGroup1 --location eastus
az group create --name myResourceGroup2 --location westeurope

# Create the two AKS clusters (each inherits its resource group's region)
az aks create --resource-group myResourceGroup1 --name myAKSCluster1 --node-count 3 --enable-managed-identity --generate-ssh-keys
az aks create --resource-group myResourceGroup2 --name myAKSCluster2 --node-count 3 --enable-managed-identity --generate-ssh-keys
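As a quick sanity check, you can confirm both clusters are provisioned before registering them with the fleet:

# List the clusters and their provisioning state (names match the commands above)
az aks list --query "[].{name:name, state:provisioningState, location:location}" --output table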

Register Clusters with AKS Fleet Manager

Once your clusters are ready, create the fleet resource and join each cluster to it as a member:

# Install the Azure CLI fleet extension if you don't already have it
az extension add --name fleet

# Create the Fleet Manager resource
az fleet create --resource-group myFleetResourceGroup --name myFleetManager --location eastus

# Register the first AKS cluster as a fleet member
az fleet member create --resource-group myFleetResourceGroup --fleet-name myFleetManager --name member1 \
  --member-cluster-id /subscriptions/{subscription-id}/resourceGroups/myResourceGroup1/providers/Microsoft.ContainerService/managedClusters/myAKSCluster1

# Register the second AKS cluster as a fleet member
az fleet member create --resource-group myFleetResourceGroup --fleet-name myFleetManager --name member2 \
  --member-cluster-id /subscriptions/{subscription-id}/resourceGroups/myResourceGroup2/providers/Microsoft.ContainerService/managedClusters/myAKSCluster2

Note: Replace {subscription-id} with your actual subscription ID.
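Once both clusters have joined, you can list the fleet members to confirm the registration succeeded:

# List all member clusters of the fleet
az fleet member list --resource-group myFleetResourceGroup --fleet-name myFleetManager --output table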

Step 2: Create AKS Fleet Manager Workload Template for KubeCost and OpenTelemetry

AKS Fleet Manager allows you to define templates to deploy workloads to multiple clusters simultaneously. Here, we'll create workload templates for KubeCost and OpenTelemetry.

Define Workload Template for KubeCost

Create a YAML template to deploy KubeCost via Helm:

# kubecost-template.yaml
apiVersion: fleet.azure.com/v1
kind: WorkloadTemplate
metadata:
  name: kubecost-template
spec:
  release:
    chart: kubecost/cost-analyzer
    namespace: kubecost
    version: 1.90.1
    values:
      kubecostProductConfigs:
        prometheus: true
      networkPolicy:
        enabled: true

This template deploys KubeCost into the kubecost namespace on all clusters managed by AKS Fleet Manager.
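If you want to validate the chart and values on a single cluster before rolling them out fleet-wide, you can install the same release directly with Helm. The repository URL below is KubeCost's public chart repo, and the flags mirror the template values above:

# Add the KubeCost Helm repository and install the chart on one cluster for testing
helm repo add kubecost https://kubecost.github.io/cost-analyzer/
helm repo update
helm upgrade --install kubecost kubecost/cost-analyzer \
  --namespace kubecost --create-namespace \
  --set networkPolicy.enabled=true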

Define Workload Template for OpenTelemetry Collector

Similarly, define a YAML template for deploying the OpenTelemetry collector in each AKS cluster:

# otel-collector-template.yaml
apiVersion: fleet.azure.com/v1
kind: WorkloadTemplate
metadata:
  name: otel-collector-template
spec:
  release:
    chart: open-telemetry/opentelemetry-collector
    namespace: otel-collector
    version: 0.31.0
    values:
      config:
        receivers:
          prometheus:
            config:
              scrape_configs:
                - job_name: 'kubecost'
                  metrics_path: /metrics
                  static_configs:
                    - targets: ['kubecost-cost-analyzer.kubecost.svc.cluster.local:9090']
        exporters:
          otlp:
            endpoint: "managed-prometheus-endpoint:4317"
            tls:
              insecure: true
        service:
          pipelines:
            metrics:
              receivers: [prometheus]
              exporters: [otlp]

This OpenTelemetry configuration scrapes metrics from KubeCost and exports them to Managed Prometheus.
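As with KubeCost, it can help to inspect the collector chart locally before the fleet-wide rollout. The chart lives in the public OpenTelemetry Helm repository, and its default values show where the receiver and exporter configuration above gets merged:

# Add the OpenTelemetry Helm repository used by the template above
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm repo update

# Inspect the chart's default values (the "config" section is what the template overrides)
helm show values open-telemetry/opentelemetry-collector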

Step 3: Deploy Workload Templates via AKS Fleet Manager

With both workload templates defined, you can deploy them across your clusters using AKS Fleet Manager.

Deploy KubeCost Template

To deploy the KubeCost workload template to all clusters in your fleet:

az fleet workload create --resource-group myFleetResourceGroup --fleet-name myFleetManager \
  --template kubecost-template.yaml

This command deploys KubeCost to all AKS clusters managed by the AKS Fleet Manager.
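To spot-check the rollout on an individual member cluster, pull its credentials and verify the KubeCost pods are running (cluster and resource group names match Step 1):

# Fetch credentials for one member cluster and check the KubeCost pods
az aks get-credentials --resource-group myResourceGroup1 --name myAKSCluster1
kubectl get pods --namespace kubecost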

Deploy OpenTelemetry Template

Next, deploy the OpenTelemetry workload template to all clusters:

az fleet workload create --resource-group myFleetResourceGroup --fleet-name myFleetManager \
  --template otel-collector-template.yaml

This command deploys the OpenTelemetry collector to all clusters, configuring it to collect KubeCost metrics and forward them to Managed Prometheus.
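You can verify the collector the same way, and tail its logs to confirm it is scraping KubeCost and exporting over OTLP. The label selector below assumes the chart's default labels:

# Check the collector pods and recent logs on one member cluster
kubectl get pods --namespace otel-collector
kubectl logs --namespace otel-collector -l app.kubernetes.io/name=opentelemetry-collector --tail=50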

Step 4: Configure Managed Prometheus

With OpenTelemetry collectors set up in each cluster, Managed Prometheus will now receive metrics from all the clusters.

Enable Microsoft Managed Prometheus

In the Azure portal:

  1. Navigate to Monitoring > Metrics in each AKS cluster.
  2. Enable Managed Prometheus.

This allows Managed Prometheus to begin receiving metrics via the OpenTelemetry OTLP exporter from each cluster.
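If you prefer the CLI over the portal, the metrics add-on can also be enabled per cluster. The Azure Monitor workspace resource ID below is a placeholder for your own workspace:

# Enable managed Prometheus (Azure Monitor metrics) on a member cluster; the workspace ID is illustrative
az aks update --resource-group myResourceGroup1 --name myAKSCluster1 \
  --enable-azure-monitor-metrics \
  --azure-monitor-workspace-resource-id /subscriptions/{subscription-id}/resourceGroups/myMonitorRG/providers/Microsoft.Monitor/accounts/myMonitorWorkspace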

Step 5: Set Up Centralized Visualization in Managed Grafana

To visualize the metrics and costs across your AKS fleet, configure Managed Grafana to connect to Managed Prometheus.

Create a Managed Grafana Workspace

In the Azure portal:

  1. Create a Managed Grafana workspace by navigating to Azure Managed Grafana.
  2. Follow the prompts to set up the workspace.
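The workspace can also be created from the CLI; the az grafana commands come from the Azure Managed Grafana (amg) extension, and the workspace name below is just an example:

# Create an Azure Managed Grafana workspace in the fleet resource group
az grafana create --name myFleetGrafana --resource-group myFleetResourceGroup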

Add Prometheus Data Source to Grafana

Once the Managed Grafana workspace is created:

  1. Open the Grafana instance.
  2. Go to Configuration > Data Sources.
  3. Add Prometheus as a data source and provide the Managed Prometheus endpoint URL.

Import KubeCost Dashboards

Import the pre-built KubeCost dashboards into Grafana for cost visibility:

  1. Go to Dashboards > Manage > Import.
  2. Use KubeCost’s provided dashboard IDs or JSON files to import the dashboards.

This setup allows you to monitor KubeCost metrics (such as cost allocation by namespace, deployment, or pod) across all clusters from a single Grafana instance.

Step 6: Monitor and Analyze Costs Across Clusters 📈

Once everything is configured, you can start monitoring and analyzing your Kubernetes costs across multiple clusters using Grafana.

  • Cost Allocation by Namespace: Create a dashboard panel that shows the cost breakdown by namespace across clusters using the following query:

  sum(kubecost_allocation{label_namespace!=""}) by (label_namespace)

  • Cross-Cluster Cost Efficiency by Deployment: Create a dashboard panel that tracks cost efficiency by deployment across your clusters with this query:

  sum(rate(kubecost_allocation{label_deployment!=""}[5m])) by (label_deployment, cluster) / sum(rate(container_cpu_usage_seconds_total{job="kubelet", container!="POD"}[5m])) by (pod, namespace, cluster)

This centralized view allows you to optimize resource usage and identify potential inefficiencies across multiple AKS clusters.

Step 7: Implement Best Practices for Security and Performance 🔒

  • Secure OpenTelemetry Communication: Use TLS encryption for communication between OpenTelemetry collectors and Managed Prometheus.
  • Limit Network Access: Ensure Managed Prometheus and Grafana endpoints are only accessible to authorized users and systems.
  • Set Resource Limits: Define resource limits for OpenTelemetry collectors to prevent them from consuming excessive resources on your AKS clusters.
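For the last point, the OpenTelemetry collector Helm chart exposes a standard resources block in its values. Below is a minimal sketch you could merge into the collector values above; the numbers are illustrative starting points, not sizing recommendations:

# Example resource requests/limits for the OpenTelemetry collector (illustrative values)
resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 512Mi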

Conclusion 🎯

By deploying KubeCost and OpenTelemetry via AKS Fleet Manager and centralizing monitoring with Managed Prometheus and Grafana, you can streamline cost management and observability across multiple AKS clusters. This setup provides a unified view of costs, enabling you to make data-driven decisions for optimizing resource usage and reducing cloud spend.

With AKS Fleet Manager handling the deployment and orchestration of workloads across multiple clusters, this approach simplifies management and ensures consistency across environments. Implement this multi-cluster monitoring solution today and gain complete control over your Kubernetes spending! 🌟

Happy clustering! 📉
