
Kubernetes Integration

Infrastructure Integration

Instructions

Follow the installation guide below for your given collector environment.

During installation, use the configuration section below as reference.

After installation, the infrastructure datasources in the table below will be available in the AOC.

Installation Guide

Installing this integration consists of creating a YAML file on the filesystem of your collectors. Select your collector environment below for instructions on how to do so.

Docker

Kubernetes

Mesos-Marathon

Debian

Ubuntu

RHEL/CentOS

SUSE

Configuration

  1. Ensure that the environment variable KUBERNETES=yes is set for your collectors. For instance, if you deploy the collectors as a Docker container, pass the parameter -e KUBERNETES=yes to your docker run command.

  2. Edit kubernetes.yaml to configure the agent. Please refer to kubernetes.yaml for all available configuration options.

    init_config:
      #    tags:
      #      - optional_tag1
      #      - optional_tag2
    instances:
      - port: 4194
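
When the collectors run as a Kubernetes DaemonSet rather than through docker run, the same variable can be set in the pod template. The excerpt below is only a sketch: the container name and image are placeholders and do not come from this guide.

```yaml
# Hypothetical excerpt from a collector DaemonSet manifest.
# Only the env entry reflects the requirement in step 1;
# the container name and image are placeholders.
spec:
  template:
    spec:
      containers:
        - name: collectors
          image: example.com/collectors:latest
          env:
            - name: KUBERNETES
              value: "yes"   # quoted so YAML does not read it as a boolean
```

The value is quoted because an unquoted yes is parsed as a boolean by many YAML parsers, while environment variable values must be strings.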
    

If you are running Kubernetes >= 1.2.0, you can use the kube-state-metrics project to provide additional metrics to Epoch. To do so, create a kube-state-metrics service from the manifest below and deploy it with kubectl:

    apiVersion: extensions/v1beta1
    kind: Deployment
    metadata:
      name: kube-state-metrics
    spec:
      replicas: 1
      template:
        metadata:
          labels:
            app: kube-state-metrics
        spec:
          containers:
          - name: kube-state-metrics
            image: gcr.io/google_containers/kube-state-metrics:v0.4.1
            ports:
            - name: metrics
              containerPort: 8080
            resources:
              requests:
                memory: 30Mi
                cpu: 100m
              limits:
                memory: 50Mi
                cpu: 200m
    ---
    apiVersion: v1
    kind: Service
    metadata:
      annotations:
        prometheus.io/scrape: 'true'
      labels:
        app: kube-state-metrics
      name: kube-state-metrics
    spec:
      ports:
      - name: metrics
        port: 8080
        targetPort: metrics
        protocol: TCP
      selector:
        app: kube-state-metrics

Then, create and edit kubernetes_state.yaml. Ensure that the kube_state_url parameter is set to the endpoint of the kube-state-metrics service. For example:

    init_config:
      #    tags:
      #      - optional_tag1
      #      - optional_tag2
    instances:
      - kube_state_url: http://example.com:8080/metrics
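
If the collectors run inside the same cluster, the endpoint can usually be expressed with the service's cluster DNS name instead of an external host. The sketch below assumes the Service above was created in the default namespace and that the cluster uses the default cluster.local DNS domain; adjust both if your setup differs.

```yaml
init_config:

instances:
  # <service>.<namespace>.svc.<cluster domain>:<service port>
  # Assumes namespace "default" and cluster domain "cluster.local".
  - kube_state_url: http://kube-state-metrics.default.svc.cluster.local:8080/metrics
```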

NOTE: If the collectors are deployed as a DaemonSet, Autoconf should pick up this integration automatically. It looks for a container image named kube-state-metrics; refer to the default kubernetes_state.yaml Autoconf file.
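
For orientation, an Autoconf template of this kind pairs an image name with a templated instance. The sketch below assumes the dd-agent style docker_images key and the %%host%% and %%port%% template variables; it is illustrative only, and the shipped kubernetes_state.yaml Autoconf file remains the authoritative version.

```yaml
# Illustrative Autoconf template (assumed layout, not copied from the product).
docker_images:
  - kube-state-metrics        # image name the collector matches on

init_config:

instances:
  # %%host%% and %%port%% are filled in from the discovered container.
  - kube_state_url: http://%%host%%:%%port%%/metrics
```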

Infrastructure Datasources

| Datasource | Available Aggregations | Unit | Description |
|---|---|---|---|
| kubernetes.cpu.capacity | AVG MAX MIN SUM | | The number of cores in this machine. |
| kubernetes.cpu.usage.total | AVG MAX MIN SUM | percent_nano | The percentage of CPU time used |
| kubernetes.cpu.limits | AVG MAX MIN SUM | cpu | The limit of cpu cores set |
| kubernetes.cpu.requests | AVG MAX MIN SUM | cpu | The requested cpu cores |
| kubernetes.filesystem.usage | AVG MAX MIN SUM | byte | The amount of disk used |
| kubernetes.filesystem.usage_pct | AVG MAX MIN SUM | fraction | The percentage of disk used |
| kubernetes.memory.capacity | AVG MAX MIN SUM | byte | The amount of memory (in bytes) in this machine |
| kubernetes.memory.limits | AVG MAX MIN SUM | byte | The limit of memory set |
| kubernetes.memory.requests | AVG MAX MIN SUM | byte | The requested memory |
| kubernetes.memory.usage | AVG MAX MIN SUM | byte | The amount of memory used |
| kubernetes.network.rx_bytes | AVG MAX MIN SUM | byte/second | The amount of bytes per second received |
| kubernetes.network.tx_bytes | AVG MAX MIN SUM | byte/second | The amount of bytes per second transmitted |
| kubernetes.network_errors | AVG MAX MIN SUM | error/second | The amount of network errors per second |
| kubernetes_state.node.cpu_capacity | AVG MAX MIN SUM | cpu | The total CPU resources of the node |
| kubernetes_state.node.memory_capacity | AVG MAX MIN SUM | byte | The total memory resources of the node |
| kubernetes_state.node.pods_capacity | AVG MAX MIN SUM | | The total pod resources of the node |
| kubernetes_state.node.cpu_allocatable | AVG MAX MIN SUM | cpu | The CPU resources of a node that are available for scheduling |
| kubernetes_state.node.memory_allocatable | AVG MAX MIN SUM | byte | The memory resources of a node that are available for scheduling |
| kubernetes_state.node.pods_allocatable | AVG MAX MIN SUM | | The pod resources of a node that are available for scheduling |
| kubernetes_state.deployment.replicas | AVG MAX MIN SUM | | The number of replicas per deployment |
| kubernetes_state.deployment.replicas_available | AVG MAX MIN SUM | | The number of available replicas per deployment |
| kubernetes_state.deployment.replicas_unavailable | AVG MAX MIN SUM | | The number of unavailable replicas per deployment |
| kubernetes_state.deployment.replicas_updated | AVG MAX MIN SUM | | The number of updated replicas per deployment |
| kubernetes_state.deployment.replicas_desired | AVG MAX MIN SUM | | The number of desired replicas per deployment |
| kubernetes_state.deployment.paused | AVG MAX MIN SUM | | Whether a deployment is paused |
| kubernetes_state.deployment.rollingupdate.max_unavailable | AVG MAX MIN SUM | | Maximum number of unavailable replicas during a rolling update |
| kubernetes_state.daemonset.scheduled | AVG MAX MIN SUM | | The number of nodes running at least one daemon pod and that are supposed to |
| kubernetes_state.daemonset.misscheduled | AVG MAX MIN SUM | | The number of nodes running a daemon pod but are not supposed to |
| kubernetes_state.daemonset.desired | AVG MAX MIN SUM | | The number of nodes that should be running the daemon pod |
| kubernetes_state.pod.ready | AVG MAX MIN SUM | | Whether the pod is ready to serve requests |
| kubernetes_state.pod.scheduled | AVG MAX MIN SUM | | Reports the status of the scheduling process for the pod with its tags |
| kubernetes_state.container.ready | AVG MAX MIN SUM | | Whether the containers readiness check succeeded |
| kubernetes_state.container.running | AVG MAX MIN SUM | | Whether the container is currently in running state |
| kubernetes_state.container.terminated | AVG MAX MIN SUM | | Whether the container is currently in terminated state |
| kubernetes_state.container.waiting | AVG MAX MIN SUM | | Whether the container is currently in waiting state |
| kubernetes_state.container.restarts | AVG MAX MIN SUM | | The number of restarts per container |
| kubernetes_state.container.cpu_requested | AVG MAX MIN SUM | cpu | The number of requested cpu cores by a container |
| kubernetes_state.container.memory_requested | AVG MAX MIN SUM | byte | The number of requested memory bytes by a container |
| kubernetes_state.container.cpu_limit | AVG MAX MIN SUM | cpu | The limit on cpu cores to be used by a container |
| kubernetes_state.container.memory_limit | AVG MAX MIN SUM | byte | The limit on memory to be used by a container |
| kubernetes_state.replicaset.replicas | AVG MAX MIN SUM | | The number of replicas per ReplicaSet |
| kubernetes_state.replicaset.fully_labeled_replicas | AVG MAX MIN SUM | | The number of fully labeled replicas per ReplicaSet |
| kubernetes_state.replicaset.replicas_ready | AVG MAX MIN SUM | | The number of ready replicas per ReplicaSet |
| kubernetes_state.replicaset.replicas_desired | AVG MAX MIN SUM | | Number of desired pods for a ReplicaSet |
| kubernetes_state.resourcequota.pods.used | AVG MAX MIN SUM | | Observed number of pods used for a resource quota |
| kubernetes_state.resourcequota.services.used | AVG MAX MIN SUM | | Observed number of services used for a resource quota |
| kubernetes_state.resourcequota.persistentvolumeclaims.used | AVG MAX MIN SUM | | Observed number of persistent volume claims used for a resource quota |
| kubernetes_state.resourcequota.services.nodeports.used | AVG MAX MIN SUM | | Observed number of node ports used for a resource quota |
| kubernetes_state.resourcequota.services.loadbalancers.used | AVG MAX MIN SUM | | Observed number of loadbalancers used for a resource quota |
| kubernetes_state.resourcequota.requests.cpu.used | AVG MAX MIN SUM | cpu | Observed sum of CPU cores requested for a resource quota |
| kubernetes_state.resourcequota.requests.memory.used | AVG MAX MIN SUM | byte | Observed sum of memory bytes requested for a resource quota |
| kubernetes_state.resourcequota.requests.storage.used | AVG MAX MIN SUM | byte | Observed sum of storage bytes requested for a resource quota |
| kubernetes_state.resourcequota.limits.cpu.used | AVG MAX MIN SUM | cpu | Observed sum of limits for CPU cores for a resource quota |
| kubernetes_state.resourcequota.limits.memory.used | AVG MAX MIN SUM | byte | Observed sum of limits for memory bytes for a resource quota |
| kubernetes_state.resourcequota.pods.limit | AVG MAX MIN SUM | | Hard limit of the number of pods for a resource quota |
| kubernetes_state.resourcequota.services.limit | AVG MAX MIN SUM | | Hard limit of the number of services for a resource quota |
| kubernetes_state.resourcequota.persistentvolumeclaims.limit | AVG MAX MIN SUM | | Hard limit of the number of PVC for a resource quota |
| kubernetes_state.resourcequota.services.nodeports.limit | AVG MAX MIN SUM | | Hard limit of the number of node ports for a resource quota |
| kubernetes_state.resourcequota.services.loadbalancers.limit | AVG MAX MIN SUM | | Hard limit of the number of loadbalancers for a resource quota |
| kubernetes_state.resourcequota.requests.cpu.limit | AVG MAX MIN SUM | cpu | Hard limit on the total of CPU core requested for a resource quota |
| kubernetes_state.resourcequota.requests.memory.limit | AVG MAX MIN SUM | byte | Hard limit on the total of memory bytes requested for a resource quota |
| kubernetes_state.resourcequota.requests.storage.limit | AVG MAX MIN SUM | byte | Hard limit on the total of storage bytes requested for a resource quota |
| kubernetes_state.resourcequota.limits.cpu.limit | AVG MAX MIN SUM | cpu | Hard limit on the sum of CPU core limits for a resource quota |
| kubernetes_state.resourcequota.limits.memory.limit | AVG MAX MIN SUM | byte | Hard limit on the sum of memory bytes limits for a resource quota |

Events

dd-agent supports built-in leader election for the Kubernetes event collector. Agents coordinate by performing a leader election among members of the collector DaemonSet through Kubernetes to ensure that only one leader agent instance gathers events at a given time. If the leader agent instance fails, a re-election occurs and another cluster agent takes over collection.

In your kubernetes.yaml file you will see the leader_lease_duration parameter. It is the duration for which a leader stays elected, and it should be greater than 30 seconds. A longer lease duration results in fewer agent requests to the apiserver, but if the leader dies under certain conditions, an event blackout may occur until the lease expires and a new leader is elected.

This feature relies on ConfigMaps, so you will need to grant the dd-agent get, list, delete and create access to the ConfigMap resource.
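
The ConfigMap permissions listed above map onto a single RBAC rule. The sketch below shows only that rule; the ClusterRole name is hypothetical, and a working role for the agent will likely need additional rules that are not covered here. Older clusters may require rbac.authorization.k8s.io/v1beta1 instead of v1.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: dd-agent-leader-election   # hypothetical name
rules:
  # ConfigMaps live in the core API group, written as "".
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get", "list", "delete", "create"]
```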

Gathering Kubernetes events through leader election is disabled by default. To enable leader election you need to:

  1. Set the variable leader_candidate to true in your kubernetes.yaml file.

  2. Use these Kubernetes RBAC entities for your dd-agent to properly configure the previous permissions.

  3. Create the ClusterRole, ServiceAccount, and ClusterRoleBinding:

    kubectl create -f epoch-dd-agent-serviceaccount.yaml

    Also, you will need to give your cloud user account explicit permission to create Roles and RoleBindings by granting yourself the cluster-admin role. For example, on GCP you can run:

    $ ACCOUNT=$(gcloud info --format='value(config.account)')
    $ kubectl create clusterrolebinding owner-cluster-admin-binding --clusterrole cluster-admin --user $ACCOUNT

    Without this, creating Roles/ClusterRoles/RoleBindings/ClusterRoleBindings may give you errors.

  4. Update the Epoch collector daemonset config with the following settings and reload the daemonset.

          env:
            - name: KUBERNETES_LEADER_CANDIDATE
              value: "true"
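
Put together, the leader election settings in kubernetes.yaml might look like the sketch below. Placing both options in the instance entry is an assumption, and the lease value of 60 is only an example above the 30-second minimum; check the kubernetes.yaml shipped with your collectors for the exact location and default.

```yaml
init_config:

instances:
  - port: 4194
    # Assumed placement: alongside the other instance options.
    leader_candidate: true
    # Example value only; must be greater than 30 (seconds).
    leader_lease_duration: 60
```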
    

At this time, we can collect the following Kubernetes events:

Backoff
Conflict
Delete
DeletingAllPods
Didn’t have enough resource
Error
Failed
FailedCreate
FailedDelete
FailedMount
FailedSync
Failedvalidation
FreeDiskSpaceFailed
HostPortConflict
InsufficientFreeCPU
InsufficientFreeMemory
InvalidDiskCapacity
Killing
KubeletsetupFailed
NodeNotReady
NodeoutofDisk
OutofDisk
Rebooted
TerminatedAllPods
Unable
Unhealthy