Master agent deployment helps collect the OKD-apiserver, OKD-controller, OKD-scheduler, OKD-kube-state, OKD-metrics-server, and OKD-coreDNS/kubeDNS metrics required to monitor Kubernetes.
Docker metrics
Metrics | Display Name | Description | Units |
---|---|---|---|
docker.containers.running_total | Docker Container Running Total | The total number of containers running on the host machine | - |
docker.containers.stopped_total | Total Containers Stopped | The total number of containers stopped (not running) on the host machine | - |
docker.container.states | Docker Container states | The state of Container | - |
docker.containers.running | Containers Running by Image | The number of containers running on host plotted with image as instance | - |
docker.containers.stopped | Containers Stopped by Image | The number of containers stopped on host plotted with image as instance | - |
docker.image.size | Image Size | The amount of data (on disk) that is used for the writable layer of each container | megabytes |
docker.image.virtual_size | Image Virtual Size | The total amount of disk-space used for the read-only image data used (shared) by each container and the writable layer of each container | megabytes |
docker.images.available | Images Available | The number of top-level images | - |
docker.images.intermediate | Images intermediate | The number of intermediate images, which are intermediate layers that make up other images | - |
docker.container.size_rootfs | Root Filesystem Size | The total size of all the files in the container | megabytes |
docker.container.size_rw | Total Files Size | The total size of the files (plotted as megabytes) that were changed or newly created compared to the container's base image. Just after container creation the size should be zero; it increases as you modify or create files | megabytes |
docker.cpu.usage | CPU Usage | The percentage of CPU time obtained by the container with regard to all CPUs | percent |
docker.cpu.usage.overlimit | CPU Usage Over Limit | The percentage of CPU time obtained by the container out of its CPU limit (if no limit is set, this metric is not monitored and no graph is plotted) | percent |
docker.cpu.usage.percpu | CPU Usage per CPU | The percentage of CPU time obtained by the container with regard to each CPU | percent |
docker.cpu.shares | Shares of CPU | Shares of CPU usage allocated to the container | - |
docker.cpu.system | CPU System | The percentage of time the CPU is executing system calls on behalf of processes of this container, unnormalized | percent |
docker.cpu.throttled | CPU Throttled | Number of times the cgroup is throttled | - |
docker.cpu.user | CPU User | The percentage of time the CPU is under direct control of processes of this container, unnormalized | percent |
docker.mem.usage | Memory Usage | The percentage of used memory out of total node memory | percent |
docker.mem.usage.overlimit | Memory Usage Over Limit | The percentage of used memory out of the memory limit (if no limit is set, this metric is not monitored and no graph is plotted) | percent |
docker.mem.in_use | Memory In Use | The fraction of used memory to available memory limit if the limit is set. Otherwise, it is against the node memory | - |
docker.mem.limit | Memory Limit | The memory limit for the container, if set | megabytes |
docker.io.read_bytes | IO Read Bytes | Bytes read per second from disk by the processes of the container | bytes/second |
docker.io.write_bytes | IO Write Bytes | Bytes written per second to disk by the processes of the container | bytes/second |
docker.mem.active_anon | Active RSS Memory | The amount of active RSS memory. Active memory is not swapped to disk | megabytes |
docker.mem.active_file | Active Cache Memory | The amount of active cache memory. Active memory is reclaimed by the system only after inactive is reclaimed | megabytes |
docker.mem.cache | Cache Size | The amount of memory that is being used to cache data from disk (For example, memory content that can be associated precisely with a block on a block device) | megabytes |
docker.mem.inactive_anon | Inactive RSS Memory | The amount of inactive RSS memory. Inactive memory is swapped to disk when necessary | megabytes |
docker.mem.inactive_file | Inactive Cache Memory | The amount of inactive cache memory. Inactive memory may be reclaimed first when the system needs memory | megabytes |
docker.mem.mapped_file | Memory Mapped by Process | The amount of memory mapped by the processes in the control group | megabytes |
docker.mem.pgfault | Memory Page Faults | The rate that processes in the container trigger page faults by accessing a non-existent or protected part of its virtual address space. Usually a page fault of this type results in a segmentation fault | per second |
docker.mem.pgmajfault | Memory Page Faults Virtual | The rate that processes in the container trigger page faults by accessing a part of their virtual address space that was swapped out or corresponded to a mapped file. Usually, a page fault of this type results in fetching the data from disk instead of memory | per second |
docker.mem.pgpgin | Pages Charged Rate | The rate at which pages are charged (added to the accounting) of a cgroup | per second |
docker.mem.pgpgout | Pages Uncharged Rate | The rate at which pages are uncharged (removed from the accounting) of a cgroup | per second |
docker.mem.rss | RSS Memory | The amount of non-cache memory that belongs to the container's processes. For example, used for stacks and heaps | megabytes |
docker.mem.soft_limit | Memory Reservation Limit | The memory reservation limit for the container, when set | megabytes |
docker.mem.sw_in_use | Swap Memory In Use | The fraction of used swap + memory to available swap + memory if the limit is set | - |
docker.mem.sw_limit | Swap Memory Limit | The swap + memory limit for the container, when set | megabytes |
docker.container.interface.traffic.in | Network Rx Bytes per Sec | Network Rx Bytes per Second | bytes/second |
docker.container.interface.traffic.out | Network Tx Bytes per Sec | Network Tx Bytes per Second | bytes/second |
docker.container.interface.packets.in | Network Rx Packets per Sec | Network Rx Packets per Second | per second |
docker.container.interface.packets.out | Network Tx Packets per Sec | Network Tx Packets per Second | per second |
docker.container.interface.errors.in | Network Rx Errors per Sec | Network Rx Errors per Second | per second |
docker.container.interface.errors.out | Network Tx Errors per Sec | Network Tx Errors per Second | per second |
docker.container.interface.discards.in | Network Rx Drops per Sec | Network Rx Drops per Second | per second |
docker.container.interface.discards.out | Network Tx Drops per Sec | Network Tx Drops per Second | per second |
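
The descriptions of docker.mem.in_use and docker.mem.usage.overlimit above imply a choice of denominator that depends on whether a memory limit is set. The following is a minimal sketch of that logic, not the agent's implementation; the function names and the megabyte inputs are assumptions made for illustration.

```python
from typing import Optional


def mem_in_use(used_mb: float, node_total_mb: float,
               limit_mb: Optional[float] = None) -> float:
    """Fraction of memory in use: relative to the limit when one is set,
    otherwise relative to total node memory (hypothetical helper)."""
    # A missing or zero limit is treated as "no limit set".
    denominator = limit_mb if limit_mb else node_total_mb
    return used_mb / denominator


def mem_usage_overlimit(used_mb: float,
                        limit_mb: Optional[float] = None) -> Optional[float]:
    """Percentage of used memory out of the limit. Returns None when no
    limit is set, mirroring the note that the metric is then not monitored."""
    if not limit_mb:
        return None
    return 100.0 * used_mb / limit_mb


print(mem_in_use(256, 4096, limit_mb=512))  # limit set: 256 / 512
print(mem_in_use(256, 4096))                # no limit: 256 / 4096
print(mem_usage_overlimit(256))             # no limit: None
```
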
ContainerD metrics
Metrics | Display Name | Description | Units |
---|---|---|---|
containerd_hugetlb_failcnt | ContainerD HugeTLB fail Rate | Rate of allocation failure due to HugeTLB limit | - |
containerd_hugetlb_max | ContainerD HugeTLB max usage | max hugepagesize hugetlb usage recorded | bytes |
containerd_hugetlb_usage | ContainerD HugeTLB usage | Current usage for hugepagesize hugetlb | bytes |
containerd_memory_usage | ContainerD Memory Usage | Memory Usage | bytes |
containerd_memory_usage_failcnt | ContainerD Memory Usage fail Rate | Rate of number of times the cgroup limit exceeded | - |
containerd_memory_usage_limit | ContainerD Memory Usage Limit | limit of memory usage | bytes |
containerd_memory_usage_max | ContainerD Memory Usage Max | show maximum memory usage recorded | bytes |
containerd_memory_cache | ContainerD Memory Cache | bytes of page cache memory | bytes |
containerd_memory_rss | ContainerD Memory RSS | bytes of anonymous and swap cache memory (includes transparent huge pages) | bytes |
containerd_memory_rss_huge | ContainerD Memory RSS Huge | bytes of anonymous transparent huge pages | bytes |
containerd_memory_dirty | ContainerD Memory Dirty | bytes that are waiting to get written back to the disk | bytes |
containerd_memory_swap_usage | ContainerD Swap Usage | Swap usage | bytes |
containerd_memory_swap_failcnt | ContainerD Swap Usage fail Rate | Rate of number of times the cgroup swap limit exceeded | - |
containerd_memory_swap_limit | ContainerD Swap Usage Limit | limit of swap usage | bytes |
containerd_memory_swap_max | ContainerD Swap Usage Max | show maximum swap usage recorded | bytes |
containerd_memory_kernel_usage | ContainerD Kernel Usage Name | current kernel memory allocation | bytes |
containerd_memory_kernel_failcnt | ContainerD Kernel fail count | rate of the number of kernel memory usage hits limits | - |
containerd_memory_kernel_limit | ContainerD Kernel Limit | hard limit for kernel memory | bytes |
containerd_memory_kernel_max | ContainerD Kernel Max | max kernel memory usage recorded | bytes |
containerd_memory_kernel_tcp_usage | ContainerD Kernel TCP Usage | current TCP buffer memory allocation | bytes |
containerd_memory_kernel_tcp_failcnt | ContainerD Kernel TCP fail rate | rate of number of tcp buf memory usage hits limits | - |
containerd_memory_kernel_tcp_limit | ContainerD Kernel TCP Limit | show hard limit for TCP buffer memory | bytes |
containerd_memory_kernel_tcp_max | ContainerD Kernel TCP Max | maximum TCP buffer memory usage recorded | bytes |
containerd_cpu_throttling_throttledTime | ContainerD CPU Throttled Time | CPU throttled time | percent |
containerd_cpu_usage_system | ContainerD CPU System Usage | system CPU usage of container with respect to host system | percent |
containerd_cpu_usage_total | ContainerD CPU Total Usage | total CPU usage of container with respect to host system | percent |
containerd_cpu_usage_user | ContainerD CPU User Usage | user CPU usage of container with respect to host system | percent |
containerd_blkio_service_bytes_recursive | ContainerD BlkIO Service Bytes | Number of bytes transferred to/from the disk | bytes |
containerd_blkio_serviced_recursive | ContainerD BlkIO Serviced | Number of IOs (bio) issued to the disk by the group | bytes |
containerd_blkio_queued_recursive | ContainerD BlkIO Queued | Total number of requests queued up at any given instant for the cgroup | bytes |
containerd_blkio_service_time_recursive | ContainerD BlkIO Service Time | Total amount of time between request dispatch and request completion for the IOs | bytes |
containerd_blkio_wait_time_recursive | ContainerD BlkIO Wait Time | Total amount of time the IOs for this cgroup spent waiting in the scheduler queues for service | bytes |
containerd_blkio_merged_recursive | ContainerD BlkIO Merged | Total number of bios/requests merged into requests belonging to this cgroup | bytes |
containerd_blkio_time_recursive | ContainerD BlkIO Time | disk time allocated to cgroup per device in milliseconds | bytes |
containerd_blkio_sectors_recursive | ContainerD BlkIO Sectors | number of sectors transferred to/from disk by the group | bytes |
containerd_proc_open_fds | ContainerD number of open fd | Number of open file descriptors | - |
containerd_container_uptime | ContainerD Container Uptime | Uptime of the Current Container | second |
containerd_containers_running | ContainerD Running Containers | Total number of running containers | - |
containerd_containers_stopped | ContainerD Stopped Containers | Total number of Stopped Containers | - |
containerd_image_size | ContainerD Image Size | Image sizes of different container images | bytes |
CRI-O metrics
The CRI-O metrics endpoint must be enabled before these metrics can be collected.
Metrics | Display Name | Description | Units |
---|---|---|---|
crio_operations | Operations Count | Cumulative number of CRI-O operations by operation type | - |
crio_operations_latency_microseconds | Operations Latency Microseconds | Latency of CRI-O operations. Broken down by operation type | microseconds |
crio_operations_latency_microseconds_sum | Operations Latency Microseconds Sum | Latency of CRI-O operations. Broken down by operation type. sum value | microseconds |
crio_operations_latency_microseconds_count | Operations Latency Microseconds Count | Latency of CRI-O operations. Broken down by operation type. count value | microseconds |
crio_operations_errors | Operations Errors | Cumulative number of CRI-O operation errors by operation type | - |
crio_image_pulls_by_digest | Image Pulls by Digest | Bytes transferred by CRI-O image pulls by digest | - |
crio_image_pulls_by_name | Image Pulls by Name | Bytes transferred by CRI-O image pulls by name | - |
crio_image_pulls_by_name_skipped | Image Pulls by Name Skipped | Bytes skipped by CRI-O image pulls by name | - |
crio_image_pulls_successes | Image Pulls Successes | Successful image pulls by image name | - |
crio_image_pulls_failures | Image Pulls Failures | Failed image pulls by image name and their error category | - |
crio_image_layer_reuse | Image Layer Reuse | Reused (not pulled) local image layer count by name | - |
crio_cpu_time | CPU Time | Total user and system CPU time spent | seconds |
crio_mem_resident | Mem Resident | Resident memory size | bytes |
crio_mem_virtual | Mem Virtual | Virtual memory size | bytes |
crio_process_open_fds | Process Open Fds | Number of open file descriptors | - |
crio_cpu_usage_core | CPU Usage | Cumulative CPU usage (sum across all cores) since object creation | nanoseconds |
crio_memory_working_set | Memory Working Set | Amount of working set memory | bytes |
crio_filesystem_used | Filesystem Used | Represents the bytes used for images on the filesystem. (This may differ from the total bytes used on the filesystem and may not equal CapacityBytes - AvailableBytes) | bytes |
crio_inodes_used | Inodes Used | Represents the inodes used by the images. (This may not equal InodesCapacity - InodesAvailable because the underlying filesystem may also be used for purposes other than storing images) | - |
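
Several rows above, such as crio_operations and crio_operations_errors, are described as cumulative counts, so a per-second rate has to be derived from two successive samples. The sketch below shows one common way to do that; the function name, the sampling interval, and the reset handling are assumptions, not documented agent behaviour.

```python
def counter_rate(prev: float, cur: float, interval_s: float) -> float:
    """Per-second rate between two samples of a cumulative counter
    (hypothetical helper)."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    # Assumption: a drop in value means the counter restarted from zero,
    # so the current value is taken as the whole increase.
    delta = cur - prev if cur >= prev else cur
    return delta / interval_s


# Two samples of a cumulative counter taken 10 seconds apart.
print(counter_rate(100, 150, 10))
# The counter was reset between the two samples.
print(counter_rate(100, 30, 10))
```
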
OKD Kubelet metrics
Metrics | Display Name | Description | Units |
---|---|---|---|
kube_pods_running | Pods Running | The number of running pods | - |
kube_containers_running | Containers Running | The number of running containers | - |
kube_containers_restarts | Containers Restarts | The number of times the container is restarted | - |
kube_cpu_load_10s_avg | Cpu Load 10S Avg | Container CPU load average over the last 10 seconds | - |
kube_cpu_system_total | Cpu System Total | System CPU time consumed in seconds | per second |
kube_cpu_user_total | Cpu User Total | User CPU time consumed in seconds | per second |
kube_cpu_cfs_periods | Cpu Cfs Periods | Number of elapsed enforcement period intervals | per second |
kube_cpu_cfs_throttled_periods | Cpu Cfs Throttled Periods | Number of throttled period intervals | per second |
kube_cpu_cfs_throttled_seconds | Cpu Cfs Throttled Seconds | Total duration of the container being throttled | per second |
kube_node_cpu_capacity | Node Cpu Capacity | CPU capacity of Node (Plotted in Millicores) | millicores |
kube_node_memory_capacity | Node Memory Capacity | Memory capacity of node (Plotted in Megabytes) | megabytes |
kube_node_cpu_usage_percentage | Node Cpu Usage Percentage | CPU usage percentage of node | percent |
kube_node_memory_usage_percentage | Node Memory Usage Percentage | Memory usage percentage of node | percent |
kube_node_cpu_allocatable | Node Cpu Allocatable | CPU allocatable of node | millicores |
kube_node_memory_allocatable | Node Memory Allocatable | Memory allocatable of node | megabytes |
kube_node_cpu_usage | Node Cpu Usage | CPU usage of node (Plotted in Millicores) | millicores |
kube_node_memory_usage | Node Memory Usage | Memory usage of node (Plotted in Megabytes) | megabytes |
kube_cpu_usage_total | Cpu Usage Total | CPU time consumed in seconds | per second |
kube_cpu_limits | Cpu Limits | The limit of CPU cores set | millicores |
kube_cpu_requests | Cpu Requests | The requested CPU cores | millicores |
kube_filesystem_usage | Filesystem Usage | Number of megabytes that are consumed by the container on this filesystem | megabytes |
kube_filesystem_usage_pct | Filesystem Usage Pct | Fraction of the space that can be consumed by the container on this filesystem that is currently used | Fraction |
kube_io_read_bytes | Io Read Bytes | The amount of bytes read from the disk | bytes/second |
kube_io_write_bytes | Io Write Bytes | The amount of bytes written to the disk | bytes/second |
kube_memory_limits | Memory Limits | Memory limit for the container | megabytes |
kube_memory_sw_limit | Memory Sw Limit | Memory swap limit for the container | bytes |
kube_memory_requests | Memory Requests | The requested memory | megabytes |
kube_memory_usage | Memory Usage | Current memory usage in bytes including all memory regardless of when it was accessed | bytes |
kube_memory_working_set | Memory Working Set | Current working set in megabytes, which is what the OOM killer watches | megabytes |
kube_memory_cache | Memory Cache | Number of bytes of page cache memory | bytes |
kube_memory_rss | Memory Rss | Size of RSS in bytes | bytes |
kube_memory_swap | Memory Swap | Container swap usage in bytes | bytes |
kube_network_rx_bytes | Network Rx Bytes | The amount of bytes received per second | bytes/second |
kube_network_rx_dropped | Network Rx Dropped | The amount of Rx packets dropped per second | packets/second |
kube_network_rx_errors | Network Rx Errors | The amount of Rx errors per second | errors/second |
kube_network_tx_bytes | Network Tx Bytes | The number of bytes transmitted per second | bytes/second |
kube_network_tx_dropped | Network Tx Dropped | The amount of tx packets dropped per second | packets/second |
kube_network_tx_errors | Network Tx Errors | The amount of tx errors per second | errors/second |
kube_apiserver_certificate_expiration | Apiserver Certificate Expiration | Average distribution of the remaining lifetime on the certificate used to authenticate a request since the last poll | seconds |
kube_rest_client_requests | Rest Client Requests | The number of HTTP requests | operations/second |
kube_rest_client_latency | Rest Client Latency | Average request latency in seconds. Broken down by verb and URL since the last poll | seconds |
kube_kubelet_runtime_operations | Kubelet Runtime Operations | The number of runtime operations | operations/second |
kube_kubelet_runtime_errors | Kubelet Runtime Errors | The number of runtime operations errors | operations/second |
kube_kubelet_network_plugin_latency | Kubelet Network Plugin Latency | Average latency in seconds of network plugin operations. Broken down by operation type since the last poll | seconds |
kube_kubelet_volume_stats_available_bytes | Kubelet Volume Stats Available Bytes | The number of available bytes in the volume | bytes |
kube_kubelet_volume_stats_capacity_bytes | Kubelet Volume Stats Capacity Bytes | The capacity in bytes of the volume | bytes |
kube_kubelet_volume_stats_used_bytes | Kubelet Volume Stats Used Bytes | The number of used bytes in the volume | bytes |
kube_kubelet_volume_stats_inodes | Kubelet Volume Stats Inodes | The maximum number of inodes in the volume | Inode |
kube_kubelet_volume_stats_inodes_free | Kubelet Volume Stats Inodes Free | The number of free inodes in the volume | Inode |
kube_kubelet_volume_stats_inodes_used | Kubelet Volume Stats Inodes Used | The number of used inodes in the volume | Inode |
kube_ephemeral_storage_usage | Ephemeral Storage Usage | Ephemeral storage usage of the POD | megabytes |
kube_kubelet_evictions | Kubelet Evictions | The number of pods that have been evicted from the kubelet (ALPHA in kubernetes v1.16) | - |
kube_kubelet_cpu_usage | Kubelet Cpu Usage | The number of cores used by kubelet | millicores |
kube_kubelet_memory_rss | Kubelet Memory Rss | Size of kubelet RSS in megabytes | megabytes |
kube_runtime_cpu_usage | Runtime Cpu Usage | The number of cores used by the runtime | millicores |
kube_runtime_memory_rss | Runtime Memory Rss | Size of runtime RSS | megabytes |
kube_kubelet_container_log_filesystem_used_bytes | Kubelet Container Log Filesystem Used Bytes | Bytes used by the container's logs on the filesystem (requires kubernetes 1.14+) | bytes |
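
The volume rows above report used and total values separately, both for bytes and for inodes. A fill percentage is not listed in the table, but it can be derived from those pairs. This is a minimal sketch with a hypothetical function name; it assumes the samples come from the same volume at the same time.

```python
def volume_usage(used_bytes: float, capacity_bytes: float,
                 inodes_used: float, inodes_total: float) -> dict:
    """Derive fill percentages from kubelet volume stats samples
    (hypothetical helper, not part of the agent)."""
    def pct(used, total):
        # A zero total would make the ratio meaningless, so report None.
        return 100.0 * used / total if total else None

    return {
        "bytes_pct": pct(used_bytes, capacity_bytes),
        "inodes_pct": pct(inodes_used, inodes_total),
    }


# Inputs correspond to kube_kubelet_volume_stats_used_bytes,
# kube_kubelet_volume_stats_capacity_bytes,
# kube_kubelet_volume_stats_inodes_used and kube_kubelet_volume_stats_inodes.
print(volume_usage(25, 100, 1, 4))
```
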
OKD Kube State metrics
Metrics | Display Name | Description | Units |
---|---|---|---|
kubernetes_state.container.cpu_limit | Container Cpu Limit | The limit on CPU cores to be used by a container | cpu |
kubernetes_state.container.cpu_requested | Container Cpu Requested | The number of requested CPU cores by a container | cpu |
kubernetes_state.container.memory_limit | Container Memory Limit | The limit on memory to be used by a container | bytes |
kubernetes_state.container.memory_requested | Container Memory Requested | The number of requested memory bytes by a container | bytes |
kubernetes_state.container.ready | Container Ready | Describes whether the container's readiness check succeeded | - |
kubernetes_state.container.ready.total | Total Containers Ready | Total containers whose readiness check succeeded | - |
kubernetes_state.container.restarts | Container Restarts | The number of restarts per container | - |
kubernetes_state.container.restarts.total | Total Containers Restarts Count | Total containers restarts count | - |
kubernetes_state.container.running | Container Running | Describes whether the container is currently in running state | - |
kubernetes_state.container.running.total | Total Containers Running | Total containers currently in running state | - |
kubernetes_state.container.terminated | Container Terminated | Describes whether the container is currently in terminated state | - |
kubernetes_state.container.terminated.total | Total Containers Terminated | Total containers currently in terminated state | - |
kubernetes_state.container.waiting | Container Waiting | Whether the container is currently in waiting state | - |
kubernetes_state.container.waiting.total | Total Containers Waiting | Total containers currently in waiting state | - |
kubernetes_state.daemonset.desired | Daemonset Desired | The number of nodes that should be running the daemon pod | - |
kubernetes_state.daemonset.misscheduled | Daemonset Misscheduled | The number of nodes running a daemon pod but are not expected to | - |
kubernetes_state.daemonset.ready | Daemonset Ready | The number of nodes that should be running the daemon pod and have one or more of the daemon pods running and ready | - |
kubernetes_state.daemonset.scheduled | Daemonset Scheduled | The number of nodes running at least one daemon pod as expected | - |
kubernetes_state.deployment.paused | Deployment Paused | The deployment is paused and will not be processed by the deployment controller | - |
kubernetes_state.deployment.replicas | Deployment Replicas | The number of replicas per deployment | - |
kubernetes_state.deployment.replicas_available | Deployment Replicas Available | The number of available replicas per deployment | - |
kubernetes_state.deployment.replicas_desired | Deployment Replicas Desired | The number of desired replicas per deployment | - |
kubernetes_state.deployment.replicas_unavailable | Deployment Replicas Unavailable | The number of unavailable replicas per deployment | - |
kubernetes_state.deployment.replicas_updated | Deployment Replicas Updated | The number of updated replicas per deployment | - |
kubernetes_state.deployment.rollingupdate.max_unavailable | Deployment Rollingupdate Max Unavailable | Maximum number of unavailable replicas during a rolling update of a deployment | - |
kubernetes_state.node.cpu_allocatable | Node Cpu Allocatable | The CPU resources of a node that are available for scheduling | - |
kubernetes_state.node.cpu_capacity | Node Cpu Capacity | The total CPU resources of the node | cpu |
kubernetes_state.node.memory_allocatable | Node Memory Allocatable | The memory resources of a node that are available for scheduling | bytes |
kubernetes_state.node.memory_capacity | Node Memory Capacity | The total memory resources of the node | bytes |
kubernetes_state.node.pods_allocatable | Node Pods Allocatable | The pod resources of a node that are available for scheduling | - |
kubernetes_state.node.pods_capacity | Node Pods Capacity | The total pod resources of the node | - |
kubernetes_state.node.status | Node Status | The condition of a cluster node plotted with node as an instance. This metric gives status of each node with values either 0 or 1. | - |
kubernetes_state.pod.ready | Pod Ready | Describes whether the pod is ready to serve requests, in association with the condition tag. For example, condition:true keeps the pods that are in a ready state | - |
kubernetes_state.pod.scheduled | Pod Scheduled | Describes the status of the scheduling process for the pod | - |
kubernetes_state.replicaset.fully_labeled_replicas | Replicaset Fully Labeled Replicas | The number of fully labeled replicas per ReplicaSet | - |
kubernetes_state.replicaset.replicas | Replicaset Replicas | The number of replicas per ReplicaSet | - |
kubernetes_state.replicaset.replicas_desired | Replicaset Replicas Desired | Number of desired pods for a ReplicaSet | - |
kubernetes_state.replicaset.replicas_ready | Replicaset Replicas Ready | The number of ready replicas per ReplicaSet | - |
kubernetes_state.resourcequota.limits.cpu.limit | Resourcequota Limits Cpu Limit | Hard limit on the sum of CPU core limits for a resource quota | cpu |
kubernetes_state.resourcequota.limits.cpu.used | Resourcequota Limits Cpu Used | Observed sum of limits for CPU cores for a resource quota | cpu |
kubernetes_state.resourcequota.limits.memory.limit | Resourcequota Limits Memory Limit | Hard limit on the sum of memory bytes limits for a resource quota | bytes |
kubernetes_state.resourcequota.limits.memory.used | Resourcequota Limits Memory Used | Observed sum of limits for memory bytes for a resource quota | bytes |
kubernetes_state.resourcequota.persistentvolumeclaims.limit | Resourcequota Persistentvolumeclaims Limit | Hard limit of the number of PVC for a resource quota | - |
kubernetes_state.resourcequota.persistentvolumeclaims.used | Resourcequota Persistentvolumeclaims Used | Observed number of persistent volume claims used for a resource quota | - |
kubernetes_state.resourcequota.pods.limit | Resourcequota Pods Limit | Hard limit of the number of pods for a resource quota | - |
kubernetes_state.resourcequota.pods.used | Resourcequota Pods Used | Observed number of pods used for a resource quota | - |
kubernetes_state.resourcequota.requests.cpu.limit | Resourcequota Requests Cpu Limit | Hard limit on the total of CPU core requested for a resource quota | cpu |
kubernetes_state.resourcequota.requests.cpu.used | Resourcequota Requests Cpu Used | Observed sum of CPU cores requested for a resource quota | cpu |
kubernetes_state.resourcequota.requests.memory.limit | Resourcequota Requests Memory Limit | Hard limit on the total of memory bytes requested for a resource quota | bytes |
kubernetes_state.resourcequota.requests.memory.used | Resourcequota Requests Memory Used | Observed sum of memory bytes requested for a resource quota | bytes |
kubernetes_state.resourcequota.requests.storage.limit | Resourcequota Requests Storage Limit | Hard limit on the total of storage bytes requested for a resource quota | bytes |
kubernetes_state.resourcequota.requests.storage.used | Resourcequota Requests Storage Used | Observed sum of storage bytes requested for a resource quota | bytes |
kubernetes_state.resourcequota.services.limit | Resourcequota Services Limit | Hard limit of the number of services for a resource quota | - |
kubernetes_state.resourcequota.services.loadbalancers.limit | Resourcequota Services Loadbalancers Limit | Hard limit of the number of load balancers for a resource quota | - |
kubernetes_state.resourcequota.services.loadbalancers.used | Resourcequota Services Loadbalancers Used | Observed number of load balancers used for a resource quota | - |
kubernetes_state.resourcequota.services.nodeports.limit | Resourcequota Services Nodeports Limit | Hard limit of the number of node ports for a resource quota | - |
kubernetes_state.resourcequota.services.nodeports.used | Resourcequota Services Nodeports Used | Observed number of node ports used for a resource quota | - |
kubernetes_state.resourcequota.services.used | Resourcequota Services Used | Observed number of services used for a resource quota | - |
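
The resource quota rows above come in `.used` and `.limit` pairs, so quota utilization can be computed by matching the two names. The sketch below assumes the samples for one quota are available as a plain name-to-value dictionary; that input shape and the function name are illustrative assumptions.

```python
def quota_utilization(samples: dict) -> dict:
    """Pair each '<prefix>.used' sample with '<prefix>.limit' and
    return used / limit per prefix (hypothetical helper)."""
    ratios = {}
    for name, used in samples.items():
        if not name.endswith(".used"):
            continue
        prefix = name[: -len(".used")]
        limit = samples.get(prefix + ".limit")
        if limit:  # skip pairs whose limit is missing or zero
            ratios[prefix] = used / limit
    return ratios


samples = {
    "kubernetes_state.resourcequota.pods.used": 5,
    "kubernetes_state.resourcequota.pods.limit": 20,
    # No matching limit, so this entry is skipped.
    "kubernetes_state.resourcequota.services.used": 2,
}
print(quota_utilization(samples))
```
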
OKD CoreDNS metrics
Metrics | Display Name | Description | Units |
---|---|---|---|
coredns.panics | Total Panics | Total number of panics | - |
coredns.query.count | Query count | Total query count | - |
coredns.request_duration.seconds.sum | Request Duration Seconds Sum | Duration to process each query | - |
coredns.request_duration.seconds.count | Request Duration Seconds Count | Duration per upstream interaction | - |
coredns.response_size.bytes.sum | Response Size Bytes Sum | Size of the returned response | bytes |
Note: CoreDNS is supported in Kubernetes version 1.21 and later.
OKD KubeDNS metrics
Metrics | Display Name | Description | Units |
---|---|---|---|
kubedns.cachemiss_count | Cachemiss Count | Number of DNS cache misses (from start of process) | - |
kubedns.error_count | Error Count | Number of DNS requests resulting in an error | - |
kubedns.request_count | Request Count | Total number of DNS requests made | - |
kubedns.request_duration.seconds.count | Request Duration Seconds Count | Number of requests on which the kubedns.request_duration.seconds.sum metric is evaluated | - |
kubedns.request_duration.seconds.sum | Request Duration Seconds Sum | Time (in seconds) taken to resolve each request | - |
kubedns.response_size.bytes.count | Response Size Bytes Count | Number of responses on which the kubedns.response_size.bytes.sum metric is evaluated | - |
kubedns.response_size.bytes.sum | Response Size Bytes Sum | Size of the returned response in bytes | bytes |
Note: KubeDNS is supported prior to Kubernetes version 1.21.
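
The `.sum` and `.count` rows in the KubeDNS table are meant to be read together: dividing the sum by the count gives a mean per request or per response. A minimal sketch follows, assuming both values cover the same set of requests, as the count row's description suggests; the function name is hypothetical.

```python
from typing import Optional


def mean_per_request(total: float, count: float) -> Optional[float]:
    """Mean of a sum/count metric pair, e.g.
    kubedns.request_duration.seconds.sum divided by
    kubedns.request_duration.seconds.count."""
    if count == 0:
        return None  # no requests observed, so the mean is undefined
    return total / count


# 1.5 seconds spent resolving 6 requests.
print(mean_per_request(1.5, 6))
```
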
OKD Kube Controller metrics
Metrics | Display Name | Description | Units |
---|---|---|---|
controller.workqueue.work_duration.sum | Kube Controller Workqueue Work Duration Seconds Sum | Duration taken in seconds to process an item from workqueue | seconds |
controller.workqueue.work_duration.count | Kube Controller Workqueue Work Duration Seconds Count | Total time taken in seconds to process an item from workqueue | seconds |
controller.workqueue.work_unfinished_duration | Kube Controller Workqueue Unfinished Work Seconds | Time in seconds that in-progress work has taken so far without being observed by work_duration. Large values indicate stuck threads | seconds |
controller.workqueue.work_longest_duration | Kube Controller Workqueue Longest Running Processor Seconds | Time in seconds for which the longest running processor for workqueue is running | - |
controller.workqueue.queue_duration.sum | Kube Controller Workqueue Queue Duration Seconds Sum | Duration in seconds for which an item remains in workqueue before being requested | - |
controller.workqueue.queue_duration.count | Kube Controller Workqueue Queue Duration Seconds Count | Total duration in seconds for which an item remains in workqueue before being requested | - |
controller.workqueue.nodes.count | Kube Controller Registered Nodes | Number of registered Nodes per zone | - |
controller.workqueue.nodes.unhealthy | Kube Controller Node Collector Unhealthy Nodes in Zone | Number of Nodes not ready per zone | - |
controller.workqueue.nodes.evictions | Kube Controller Node Collector Evictions Number | Number of Node evictions that happened since current instance of NodeController started | - |
controller.workqueue.depth | Kube Controller Workqueue Depth | Current depth of workqueue | - |
controller.workqueue.adds | Kube Controller Workqueue Adds Total | Total number of additions/insertions handled by workqueue | - |
controller.workqueue.retries | Kube Controller Workqueue Retries Total | Total number of retries handled by workqueue | - |
controller.rate_limiter.use | Kube Controller Node Lifecycle Controller Rate Limiter Use | A metric measuring the saturation of the rate limiter for node_lifecycle_controller | - |
controller.go.goroutines | Kube Controller Go Goroutines | Number of goroutines that currently exist | - |
controller.threads | Kube Controller Os Threads | Number of OS threads created | - |
controller.process.max_fds | Kube Controller Process Max Fds | Maximum number of open file descriptors | - |
controller.process.open_fds | Kube Controller Process Open Fds | Number of open file descriptors | - |
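The `.sum` and `.count` pairs in the table above are most useful when read together. Assuming they follow the usual Prometheus histogram convention (the sum accumulates observed seconds and the count accumulates the number of observations), the average time per item over an interval can be derived from two samples. The following is a minimal sketch under that assumption; the function name and the sample values are hypothetical and are not part of the agent.

```python
def average_over_interval(sum_start, count_start, sum_end, count_end):
    """Mean seconds per observation between two samples of a sum/count pair.

    Returns None when no new observations arrived in the interval or when
    the counters appear to have reset (for example, after a restart).
    """
    delta_count = count_end - count_start
    delta_sum = sum_end - sum_start
    if delta_count <= 0 or delta_sum < 0:
        return None
    return delta_sum / delta_count


# Hypothetical samples of controller.workqueue.work_duration.sum / .count
# taken at the start and end of a collection interval.
avg_seconds = average_over_interval(120.0, 400, 150.0, 500)
print(avg_seconds)  # 0.3
```

Dividing the deltas rather than the raw values keeps the result tied to the chosen interval instead of the whole process lifetime.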
OKD Kube Scheduler metrics
Metrics | Display Name | Description | Units |
---|---|---|---|
scheduler.binding.duration.count | Kube Scheduler Binding Duration Seconds Count | Total Binding duration in seconds | seconds |
scheduler.binding.duration.seconds | Kube Scheduler Binding Duration Seconds Sum | Binding duration | seconds |
scheduler.binding.latency.count | Kube Scheduler Binding Latency Microseconds Count | Total Binding latency | microseconds |
scheduler.binding.latency.sum | Kube Scheduler Binding Latency Microseconds | Binding latency sum | microseconds |
scheduler.cache.lookups | Kube Scheduler Equiv Cache Lookups Total | Total number of equivalence cache lookups, by whether or not a cache entry was found | - |
scheduler.client.http.requests | Kube Scheduler Rest Client Requests Total | Number of HTTP requests, partitioned by status code, method, and host | - |
scheduler.client.http.requests_duration.count | Kube Scheduler Rest Client Request Latency Seconds Count | Total request latency. Broken down by verb and URL | seconds |
scheduler.client.http.requests_duration.sum | Kube Scheduler Rest Client Request Latency Seconds Sum | Request latency. Broken down by verb and URL | seconds |
scheduler.gc_duration_seconds.count | Kube Scheduler Go GC Duration Seconds Count | A summary of the GC invocation durations | - |
scheduler.gc_duration_seconds.quantile | Kube Scheduler Go GC Duration Seconds | A summary of the GC invocation durations | - |
scheduler.gc_duration_seconds.sum | Kube Scheduler Go GC Duration Seconds Sum | A summary of the GC invocation durations | - |
scheduler.go.goroutines | Kube Scheduler Go Goroutines | Number of goroutines that currently exist | - |
scheduler.process.max_fds | Kube Scheduler Process Max Fds | Maximum number of open file descriptors | - |
scheduler.process.open_fds | Kube Scheduler Process Open Fds | Number of open file descriptors | - |
scheduler.pod_preemption.victims | Kube Scheduler Pod Preemption Victims | Number of selected preemption victims | - |
scheduler.pod_preemption.attempts | Kube Scheduler Total Preemption Attempts | Total number of preemption attempts in the cluster so far | - |
scheduler.schedule_attempts.total | Kube Scheduler Schedule Attempts Total | Number of attempts to schedule pods, by result. "unschedulable" means a pod could not be scheduled, and "error" means an internal scheduler problem | - |
scheduler.scheduling.algorithm_duration.count | Kube Scheduler Scheduling Algorithm Duration Seconds Count | Total Scheduling algorithm latency | seconds |
scheduler.scheduling.algorithm_duration.sum | Kube Scheduler Scheduling Algorithm Duration Seconds Sum | Scheduling algorithm latency | seconds |
scheduler.scheduling.algorithm_latency.count | Kube Scheduler Scheduling Algorithm Latency Microseconds Count | Total Scheduling algorithm latency | microseconds |
scheduler.scheduling.algorithm_latency.sum | Kube Scheduler Scheduling Algorithm Latency Microseconds Sum | Scheduling algorithm latency | microseconds |
scheduler.scheduling.algorithm.predicate_duration.count | Kube Scheduler Scheduling Algorithm Predicate Evaluation Count | Scheduling algorithm predicate evaluation duration | - |
scheduler.scheduling.algorithm.predicate_duration.sum | Kube Scheduler Scheduling Algorithm Predicate Evaluation Sum | Scheduling algorithm predicate evaluation duration | - |
scheduler.scheduling.algorithm.preemption_duration.count | Kube Scheduler Scheduling Algorithm Preemption Evaluation Count | Scheduling algorithm preemption evaluation duration | - |
scheduler.scheduling.algorithm.preemption_duration.sum | Kube Scheduler Scheduling Algorithm Preemption Evaluation Sum | Scheduling algorithm preemption evaluation duration | - |
scheduler.scheduling.algorithm.priority_duration.count | Kube Scheduler Scheduling Algorithm Priority Evaluation Count | Scheduling algorithm priority evaluation duration | - |
scheduler.scheduling.algorithm.priority_duration.sum | Kube Scheduler Scheduling Algorithm Priority Evaluation Sum | Scheduling algorithm priority evaluation duration | - |
scheduler.e2e.scheduling_duration.count | Kube Scheduler E2E Scheduling Duration Seconds Count | Total end-to-end scheduling latency (scheduling algorithm + binding) | seconds |
scheduler.e2e.scheduling_duration.sum | Kube Scheduler E2E Scheduling Duration Seconds Sum | End-to-end scheduling latency (scheduling algorithm + binding) | seconds |
scheduler.e2e.scheduling_latency.count | Kube Scheduler E2E Scheduling Latency Microseconds Count | Total end-to-end scheduling latency (scheduling algorithm + binding) | microseconds |
scheduler.e2e.scheduling_latency.sum | Kube Scheduler E2E Scheduling Latency Microseconds Sum | End-to-end scheduling latency (scheduling algorithm + binding) | microseconds |
scheduler.scheduling.scheduling_duration.count | Kube Scheduler Scheduling Duration Seconds Count | Scheduling latency split by sub-parts of the scheduling operation | seconds |
scheduler.scheduling.scheduling_duration.quantile | Kube Scheduler Scheduling Duration Seconds | Scheduling latency split by sub-parts of the scheduling operation | seconds |
scheduler.scheduling.scheduling_duration.sum | Kube Scheduler Scheduling Duration Seconds Sum | Scheduling latency split by sub-parts of the scheduling operation | seconds |
scheduler.scheduling.scheduling_latency.count | Kube Scheduler Scheduling Latency Seconds Count | Scheduling latency split by sub-parts of the scheduling operation | seconds |
scheduler.scheduling.scheduling_latency.quantile | Kube Scheduler Scheduling Latency Seconds | Scheduling latency split by sub-parts of the scheduling operation | seconds |
scheduler.scheduling.scheduling_latency.sum | Kube Scheduler Scheduling Latency Seconds Sum | Scheduling latency split by sub-parts of the scheduling operation | seconds |
scheduler.threads | Kube Scheduler OS Threads | Number of OS threads created | - |
scheduler.volume_scheduling_duration.sum | Kube Scheduler Volume Scheduling Duration Seconds Sum | Volume scheduling stage latency sum | - |
scheduler.volume_scheduling_duration.count | Kube Scheduler Volume Scheduling Duration Seconds Count | Volume scheduling stage latency count | - |
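Because `scheduler.schedule_attempts.total` is broken down by result, the share of each result is often more telling than the raw counts. The sketch below assumes the per-result counts have already been read into a dictionary; the function name, the `scheduled` result label, and the sample numbers are illustrative assumptions (only `unschedulable` and `error` are named in the table above).

```python
def attempt_shares(attempts_by_result):
    """Return each result's share of all scheduling attempts (0.0 to 1.0)."""
    total = sum(attempts_by_result.values())
    if total == 0:
        # No attempts recorded yet, so every share is zero.
        return {result: 0.0 for result in attempts_by_result}
    return {result: n / total for result, n in attempts_by_result.items()}


# Hypothetical per-result counts for one collection interval.
sample = {"scheduled": 90, "unschedulable": 8, "error": 2}
shares = attempt_shares(sample)
print(shares["unschedulable"])  # 0.08
```

A rising `unschedulable` share usually points at capacity or constraint problems, while a rising `error` share points at the scheduler itself.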
OKD Metrics Server metrics
Metrics | Display Name | Description | Units |
---|---|---|---|
metrics_server.go_gc_duration_seconds_sum | Go GC Duration Seconds Sum | A summary of the GC invocation durations | seconds |
metrics_server.authenticated_user_requests | Authenticated User Requests | Counter of authenticated requests broken out by username | - |
metrics_server.go_goroutines | Go Goroutines | Number of goroutines that currently exist | - |
metrics_server.manager_tick_duration_sum | Manager Tick Duration Sum | The total time spent collecting and storing metrics | seconds |
metrics_server.scraper_duration_count | Scraper Duration Count | Time spent scraping sources | seconds |
metrics_server.scraper_duration_sum | Scraper Duration Sum | Time spent scraping sources | seconds |
metrics_server.scraper_last_time | Scraper Last Time | Last time metrics-server performed a scrape since unix epoch | seconds |
metrics_server.go_gc_duration_seconds_quantile | Go GC Duration Seconds Quantile | A summary of the GC invocation durations | seconds |
metrics_server.kubelet_summary_request_duration_sum | Kubelet Summary Request Duration Sum | The Kubelet summary request latencies | seconds |
metrics_server.kubelet_summary_scrapes_total | Kubelet Summary Scrapes Total | Total number of attempted Summary API scrapes done by Metrics Server | - |
metrics_server.manager_tick_duration_count | Manager Tick Duration Count | The total time spent collecting and storing metrics | seconds |
metrics_server.process_max_fds | Process Max Fds | Maximum number of open file descriptors | - |
metrics_server.process_open_fds | Process Open Fds | Number of open file descriptors | - |
metrics_server.go_gc_duration_seconds_count | Go GC Duration Seconds Count | A summary of the GC invocation durations | - |
metrics_server.kubelet_summary_request_duration_count | Kubelet Summary Request Duration Count | The Kubelet summary request latencies | seconds |
metrics_server.process_cpu_seconds_total | Process Cpu Seconds Total | Total user and system CPU time spent | seconds |
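The `process_open_fds` and `process_max_fds` metrics are easiest to interpret as a ratio, since a process that approaches its file descriptor limit can start failing to open connections or files. The following is a small sketch of that calculation; the function name and the sample values are hypothetical.

```python
def fd_utilization_percent(open_fds, max_fds):
    """Percentage of the file descriptor limit currently in use.

    Returns None when the limit is missing or not positive, because the
    ratio is undefined in that case.
    """
    if max_fds is None or max_fds <= 0:
        return None
    return 100.0 * open_fds / max_fds


# Hypothetical values for metrics_server.process_open_fds and
# metrics_server.process_max_fds.
print(fd_utilization_percent(256, 1024))  # 25.0
```

The same calculation applies to the controller and scheduler `process.open_fds` and `process.max_fds` pairs.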
OKD API Server metrics
Metrics | Display Name | Description | Units |
---|---|---|---|
apiserver.go.threads.total | Kube apiserver Go Threads Total | Number of OS threads created | - |
apiserver.authenticated.user.requests | Kube apiserver Authenticated User Requests | Counter of authenticated requests broken out by username | - |
apiserver.http.requests.total.count | Kube apiserver HTTP Requests Total Count | Total number of HTTP requests made | - |
apiserver.authenticated.user.requests.count | Kube apiserver Authenticated User Requests Count | Counter of authenticated requests broken out by username | - |
apiserver.dropped.requests.total | Kube apiserver Dropped Requests Total | Accumulated number of requests dropped with a "Try again later" response | - |
apiserver.http.requests.total | Kube apiserver HTTP Requests Total | Total number of HTTP requests made | - |
apiserver.audit.event.total | Kube apiserver Audit Event Total | Counter of audit events generated and sent to the audit back end | - |
apiserver.rest.client.requests.total | Kube apiserver Rest Client Requests Total | Number of HTTP requests, partitioned by status code, method, and host | - |
apiserver.request.count | Kube apiserver Request Count | Counter of API server requests broken out for each verb, group, version, resource, scope, component, client, and HTTP response contentType and code | - |
apiserver.request.count.count | Kube apiserver Request Count Count | Counter of API server requests broken out for each verb, group, version, resource, scope, component, client, and HTTP response contentType and code | - |
apiserver.dropped.requests.total.count | Kube apiserver Dropped Requests Total Count | Monotonic count of requests dropped with a "Try again later" response | - |
apiserver.inflight.requests | Kube apiserver Inflight Requests | Maximum number of in-flight requests counted against this API server's limit, per request kind, in the last second | - |
apiserver.go.goroutines | Kube apiserver Goroutines | Number of goroutines that currently exist | - |
apiserver.APIServiceRegistrationController.depth | Kube apiserver APIService Registration Controller Depth | Current depth of workqueue: APIServiceRegistrationController | - |
apiserver.etcd.object.counts | Kube apiserver ETCD Object Counts | Number of stored objects at the time of last check split by kind | - |
apiserver.rest.client.requests.total.count | Kube apiserver Rest Client Requests Total Count | Number of HTTP requests, partitioned by status code, method, and host | - |