Kubernetes Integration¶
Infrastructure Integration¶
Instructions¶
Follow the installation guide below for your given collector environment.
During installation, use the configuration section below as reference.
After installation, the infrastructure datasources in the table below will be available in the AOC.
Installation Guide¶
Installing this integration consists of creating a YAML file in the filesystem of your collectors.
Click below for instructions on how to do so for your collector environment.
Configuration¶
- Ensure that the variable KUBERNETES=yes is in the environment of your collectors. For instance, if you are deploying the collectors as a Docker container, pass the parameter -e KUBERNETES=yes to your docker run command.
- Edit kubernetes.yaml to configure the agent. Refer to kubernetes.yaml for all available configuration options.

      init_config:
      # tags:
      # - optional_tag1
      # - optional_tag2
      instances:
        - port: 4194
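As a small illustration of the first step, the sketch below assembles a `docker run` command line that passes `KUBERNETES=yes` into the container environment. The helper name `docker_run_command` and the image name `example/collector:latest` are placeholders invented for this example; substitute the image and any other flags that your collector deployment actually uses.

```python
import shlex


def docker_run_command(image, env=None):
    """Build a `docker run` command line that sets the given environment variables."""
    args = ["docker", "run"]
    for key, value in sorted((env or {}).items()):
        # Each variable is passed to the container with its own -e flag.
        args += ["-e", f"{key}={value}"]
    args.append(image)
    return shlex.join(args)


# "example/collector:latest" is a hypothetical image name, not the real collector image.
print(docker_run_command("example/collector:latest", {"KUBERNETES": "yes"}))
```

The printed command can be pasted into a shell once the placeholder image has been replaced.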
If you are running Kubernetes >= 1.2.0, you can use the kube-state-metrics project to provide additional metrics to Epoch. To do so, create a kube-state-metrics service from the manifest below and deploy it with kubectl:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: kube-state-metrics
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: kube-state-metrics
    spec:
      containers:
      - name: kube-state-metrics
        image: gcr.io/google_containers/kube-state-metrics:v0.4.1
        ports:
        - name: metrics
          containerPort: 8080
        resources:
          requests:
            memory: 30Mi
            cpu: 100m
          limits:
            memory: 50Mi
            cpu: 200m
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scrape: 'true'
  labels:
    app: kube-state-metrics
  name: kube-state-metrics
spec:
  ports:
  - name: metrics
    port: 8080
    targetPort: metrics
    protocol: TCP
  selector:
    app: kube-state-metrics
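The Service in this manifest reaches the kube-state-metrics pods through two links: its selector must match the pod labels, and its named targetPort must match a named container port. The sketch below checks both links on plain dictionaries that mirror the relevant manifest fields. The helper names are invented for this illustration, and the check is a simplified model of how Kubernetes resolves these references, not a replacement for validating the manifest with kubectl.

```python
def selector_matches(selector, pod_labels):
    """A Service selects a pod when every selector key/value appears in the pod's labels."""
    return all(pod_labels.get(key) == value for key, value in selector.items())


def named_port_resolves(target_port, containers):
    """A named targetPort must equal the name of a port declared by some container."""
    return any(
        port.get("name") == target_port
        for container in containers
        for port in container.get("ports", [])
    )


# Values copied from the Deployment and Service in the manifest.
pod_labels = {"app": "kube-state-metrics"}
containers = [
    {"name": "kube-state-metrics", "ports": [{"name": "metrics", "containerPort": 8080}]}
]
service_selector = {"app": "kube-state-metrics"}
service_target_port = "metrics"

print(selector_matches(service_selector, pod_labels))
print(named_port_resolves(service_target_port, containers))
```

If either check prints False after you customize the manifest, the Service would have no endpoints to forward traffic to.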
Then, create and edit kubernetes_state.yaml. Ensure that the kube_state_url parameter is set to the endpoint of the kube-state-metrics service. For example:
init_config:
# tags:
# - optional_tag1
# - optional_tag2
instances:
- kube_state_url: http://example.com:8080/metrics
NOTE: If the collectors are deployed as a DaemonSet, Autoconf should pick up this integration automatically. It looks for a container image named kube-state-metrics; refer to the default kubernetes_state.yaml Autoconf file.
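When the collectors run inside the same cluster, the kube_state_url endpoint is commonly the Service's in-cluster DNS name. The sketch below builds such a URL and renders a minimal kubernetes_state.yaml body from it. It assumes the Service was created in the `default` namespace and that the cluster uses the default `cluster.local` DNS domain; both are assumptions about your cluster, and the helper names are invented for this example.

```python
def kube_state_url(service="kube-state-metrics", namespace="default",
                   port=8080, cluster_domain="cluster.local"):
    """Build the in-cluster URL of the metrics endpoint (namespace and domain are assumptions)."""
    return f"http://{service}.{namespace}.svc.{cluster_domain}:{port}/metrics"


def render_kubernetes_state_yaml(url):
    """Render a minimal kubernetes_state.yaml body with a single instance."""
    return "init_config:\n\ninstances:\n  - kube_state_url: " + url + "\n"


url = kube_state_url()
print(render_kubernetes_state_yaml(url))
```

Adjust the namespace argument if you deployed kube-state-metrics elsewhere, for example `kube_state_url(namespace="kube-system")`.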
Infrastructure Datasources¶
Datasource | Available Aggregations | Unit | Description |
---|---|---|---|
kubernetes.cpu.capacity | AVG MAX MIN SUM | | The number of cores in this machine. |
kubernetes.cpu.usage.total | AVG MAX MIN SUM | percent_nano | The percentage of CPU time used |
kubernetes.cpu.limits | AVG MAX MIN SUM | cpu | The limit of cpu cores set |
kubernetes.cpu.requests | AVG MAX MIN SUM | cpu | The requested cpu cores |
kubernetes.filesystem.usage | AVG MAX MIN SUM | byte | The amount of disk used |
kubernetes.filesystem.usage_pct | AVG MAX MIN SUM | fraction | The percentage of disk used |
kubernetes.memory.capacity | AVG MAX MIN SUM | byte | The amount of memory (in bytes) in this machine |
kubernetes.memory.limits | AVG MAX MIN SUM | byte | The limit of memory set |
kubernetes.memory.requests | AVG MAX MIN SUM | byte | The requested memory |
kubernetes.memory.usage | AVG MAX MIN SUM | byte | The amount of memory used |
kubernetes.network.rx_bytes | AVG MAX MIN SUM | byte/second | The amount of bytes per second received |
kubernetes.network.tx_bytes | AVG MAX MIN SUM | byte/second | The amount of bytes per second transmitted |
kubernetes.network_errors | AVG MAX MIN SUM | error/second | The amount of network errors per second |
kubernetes_state.node.cpu_capacity | AVG MAX MIN SUM | cpu | The total CPU resources of the node |
kubernetes_state.node.memory_capacity | AVG MAX MIN SUM | byte | The total memory resources of the node |
kubernetes_state.node.pods_capacity | AVG MAX MIN SUM | | The total pod resources of the node |
kubernetes_state.node.cpu_allocatable | AVG MAX MIN SUM | cpu | The CPU resources of a node that are available for scheduling |
kubernetes_state.node.memory_allocatable | AVG MAX MIN SUM | byte | The memory resources of a node that are available for scheduling |
kubernetes_state.node.pods_allocatable | AVG MAX MIN SUM | | The pod resources of a node that are available for scheduling |
kubernetes_state.deployment.replicas | AVG MAX MIN SUM | | The number of replicas per deployment |
kubernetes_state.deployment.replicas_available | AVG MAX MIN SUM | | The number of available replicas per deployment |
kubernetes_state.deployment.replicas_unavailable | AVG MAX MIN SUM | | The number of unavailable replicas per deployment |
kubernetes_state.deployment.replicas_updated | AVG MAX MIN SUM | | The number of updated replicas per deployment |
kubernetes_state.deployment.replicas_desired | AVG MAX MIN SUM | | The number of desired replicas per deployment |
kubernetes_state.deployment.paused | AVG MAX MIN SUM | | Whether a deployment is paused |
kubernetes_state.deployment.rollingupdate.max_unavailable | AVG MAX MIN SUM | | Maximum number of unavailable replicas during a rolling update |
kubernetes_state.daemonset.scheduled | AVG MAX MIN SUM | | The number of nodes running at least one daemon pod and that are supposed to |
kubernetes_state.daemonset.misscheduled | AVG MAX MIN SUM | | The number of nodes running a daemon pod but are not supposed to |
kubernetes_state.daemonset.desired | AVG MAX MIN SUM | | The number of nodes that should be running the daemon pod |
kubernetes_state.pod.ready | AVG MAX MIN SUM | | Whether the pod is ready to serve requests |
kubernetes_state.pod.scheduled | AVG MAX MIN SUM | | Reports the status of the scheduling process for the pod with its tags |
kubernetes_state.container.ready | AVG MAX MIN SUM | | Whether the container's readiness check succeeded |
kubernetes_state.container.running | AVG MAX MIN SUM | | Whether the container is currently in running state |
kubernetes_state.container.terminated | AVG MAX MIN SUM | | Whether the container is currently in terminated state |
kubernetes_state.container.waiting | AVG MAX MIN SUM | | Whether the container is currently in waiting state |
kubernetes_state.container.restarts | AVG MAX MIN SUM | | The number of restarts per container |
kubernetes_state.container.cpu_requested | AVG MAX MIN SUM | cpu | The number of requested cpu cores by a container |
kubernetes_state.container.memory_requested | AVG MAX MIN SUM | byte | The number of requested memory bytes by a container |
kubernetes_state.container.cpu_limit | AVG MAX MIN SUM | cpu | The limit on cpu cores to be used by a container |
kubernetes_state.container.memory_limit | AVG MAX MIN SUM | byte | The limit on memory to be used by a container |
kubernetes_state.replicaset.replicas | AVG MAX MIN SUM | | The number of replicas per ReplicaSet |
kubernetes_state.replicaset.fully_labeled_replicas | AVG MAX MIN SUM | | The number of fully labeled replicas per ReplicaSet |
kubernetes_state.replicaset.replicas_ready | AVG MAX MIN SUM | | The number of ready replicas per ReplicaSet |
kubernetes_state.replicaset.replicas_desired | AVG MAX MIN SUM | | Number of desired pods for a ReplicaSet |
kubernetes_state.resourcequota.pods.used | AVG MAX MIN SUM | | Observed number of pods used for a resource quota |
kubernetes_state.resourcequota.services.used | AVG MAX MIN SUM | | Observed number of services used for a resource quota |
kubernetes_state.resourcequota.persistentvolumeclaims.used | AVG MAX MIN SUM | | Observed number of persistent volume claims used for a resource quota |
kubernetes_state.resourcequota.services.nodeports.used | AVG MAX MIN SUM | | Observed number of node ports used for a resource quota |
kubernetes_state.resourcequota.services.loadbalancers.used | AVG MAX MIN SUM | | Observed number of loadbalancers used for a resource quota |
kubernetes_state.resourcequota.requests.cpu.used | AVG MAX MIN SUM | cpu | Observed sum of CPU cores requested for a resource quota |
kubernetes_state.resourcequota.requests.memory.used | AVG MAX MIN SUM | byte | Observed sum of memory bytes requested for a resource quota |
kubernetes_state.resourcequota.requests.storage.used | AVG MAX MIN SUM | byte | Observed sum of storage bytes requested for a resource quota |
kubernetes_state.resourcequota.limits.cpu.used | AVG MAX MIN SUM | cpu | Observed sum of limits for CPU cores for a resource quota |
kubernetes_state.resourcequota.limits.memory.used | AVG MAX MIN SUM | byte | Observed sum of limits for memory bytes for a resource quota |
kubernetes_state.resourcequota.pods.limit | AVG MAX MIN SUM | | Hard limit of the number of pods for a resource quota |
kubernetes_state.resourcequota.services.limit | AVG MAX MIN SUM | | Hard limit of the number of services for a resource quota |
kubernetes_state.resourcequota.persistentvolumeclaims.limit | AVG MAX MIN SUM | | Hard limit of the number of PVC for a resource quota |
kubernetes_state.resourcequota.services.nodeports.limit | AVG MAX MIN SUM | | Hard limit of the number of node ports for a resource quota |
kubernetes_state.resourcequota.services.loadbalancers.limit | AVG MAX MIN SUM | | Hard limit of the number of loadbalancers for a resource quota |
kubernetes_state.resourcequota.requests.cpu.limit | AVG MAX MIN SUM | cpu | Hard limit on the total of CPU core requested for a resource quota |
kubernetes_state.resourcequota.requests.memory.limit | AVG MAX MIN SUM | byte | Hard limit on the total of memory bytes requested for a resource quota |
kubernetes_state.resourcequota.requests.storage.limit | AVG MAX MIN SUM | byte | Hard limit on the total of storage bytes requested for a resource quota |
kubernetes_state.resourcequota.limits.cpu.limit | AVG MAX MIN SUM | cpu | Hard limit on the sum of CPU core limits for a resource quota |
kubernetes_state.resourcequota.limits.memory.limit | AVG MAX MIN SUM | byte | Hard limit on the sum of memory bytes limits for a resource quota |
Events¶
dd-agent supports built-in leader election for the Kubernetes event collector. Agents coordinate by performing a leader election among members of the collector DaemonSet through Kubernetes to ensure that only one leader agent instance is gathering events at a given time. If the leader agent instance fails, a re-election occurs and another cluster agent takes over collection.
In your kubernetes.yaml file you will see the leader_lease_duration parameter. It is the duration for which a leader stays elected, and it should be greater than 30 seconds. A longer lease duration results in fewer agent requests to the apiserver, but if the leader dies under certain conditions, an event blackout may occur until the lease expires and a new leader is elected.
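The trade-off around the lease duration can be illustrated with a toy model of a lease. This is a deliberately simplified sketch, not dd-agent's actual election code: the `Lease` class, the `try_acquire` function, and the agent names are all invented for the example, and time is passed in explicitly so the behavior is easy to follow.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    holder: Optional[str] = None
    expires_at: float = 0.0


def try_acquire(lease, candidate, now, duration):
    """Grant or renew the lease if it is free, already held by the candidate, or expired."""
    if lease.holder in (None, candidate) or now >= lease.expires_at:
        lease.holder = candidate
        lease.expires_at = now + duration
        return True
    return False


lease = Lease()
print(try_acquire(lease, "agent-a", now=0.0, duration=60.0))   # agent-a becomes leader
# Suppose agent-a crashes at t=10 without releasing the lease.
print(try_acquire(lease, "agent-b", now=30.0, duration=60.0))  # lease still held, no takeover
print(try_acquire(lease, "agent-b", now=60.0, duration=60.0))  # lease expired, agent-b takes over
```

In this model no agent collects events between the crash at t=10 and the takeover at t=60, which is the kind of blackout a long lease can cause; a shorter lease narrows that window at the cost of more frequent renewals.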
This feature relies on ConfigMaps, so you will need to grant the dd-agent get, list, delete, and create access to the ConfigMap resource.
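To make the required permissions concrete, the sketch below builds a ClusterRole object that grants exactly those four verbs on ConfigMaps and prints it as JSON, which kubectl accepts in place of YAML. The role name `dd-agent-leader-election` is hypothetical, and the RBAC entities shipped for the Epoch collectors may use a different name, API version, or additional rules, so treat this only as an illustration of the rule's shape.

```python
import json


def configmap_cluster_role(name):
    """Return a ClusterRole dict granting get, list, delete, and create on ConfigMaps."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRole",
        "metadata": {"name": name},
        "rules": [
            {
                "apiGroups": [""],  # "" is the core API group, where ConfigMaps live
                "resources": ["configmaps"],
                "verbs": ["get", "list", "delete", "create"],
            }
        ],
    }


# "dd-agent-leader-election" is a hypothetical role name chosen for this example.
role = configmap_cluster_role("dd-agent-leader-election")
print(json.dumps(role, indent=2))
```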
Gathering Kubernetes events through leader election is disabled by default. To enable leader election you need to:
- Set the variable leader_candidate to true in your kubernetes.yaml file.
- Use these Kubernetes RBAC entities for your dd-agent to properly configure the previous permissions.
- Create the ClusterRole, ServiceAccount, and ClusterRoleBinding:

      kubectl create -f epoch-dd-agent-serviceaccount.yaml

  You will also need to give your cloud user account explicit permission to create Roles and RoleBindings by granting yourself the cluster-admin role. For example, in GCP:

      $ ACCOUNT=$(gcloud info --format='value(config.account)')
      $ kubectl create clusterrolebinding owner-cluster-admin-binding --clusterrole cluster-admin --user $ACCOUNT

  Without this, creating Roles/ClusterRoles/RoleBindings/ClusterRoleBindings may give you errors.
- Update the Epoch collector daemonset config with the following settings and reload the daemonset.

      env:
        - name: KUBERNETES_LEADER_CANDIDATE
          value: "true"
At this time, we can collect the following Kubernetes events:
Backoff
Conflict
Delete
DeletingAllPods
Didn’t have enough resource
Error
Failed
FailedCreate
FailedDelete
FailedMount
FailedSync
Failedvalidation
FreeDiskSpaceFailed
HostPortConflict
InsufficientFreeCPU
InsufficientFreeMemory
InvalidDiskCapacity
Killing
KubeletsetupFailed
NodeNotReady
NodeoutofDisk
OutofDisk
Rebooted
TerminatedAllPods
Unable
Unhealthy