The master agent deployment collects the OKD-apiserver, OKD-controller, OKD-scheduler, OKD-kube-state, OKD-metrics-server, and OKD-coreDNS / kubeDNS metrics required to monitor Kubernetes.

Docker metrics

| Metric | Display Name | Description | Units |
| --- | --- | --- | --- |
| docker.containers.running_total | Docker Container Running Total | The total number of containers running on the host machine | - |
| docker.containers.stopped_total | Total Containers Stopped | The total number of containers stopped (not running) on the host machine | - |
| docker.container.states | Docker Container states | The state of the container | - |
| docker.containers.running | Containers Running by Image | The number of containers running on the host, plotted with image as instance | - |
| docker.containers.stopped | Containers Stopped by Image | The number of containers stopped on the host, plotted with image as instance | - |
| docker.image.size | Image Size | The amount of data (on disk) that is used for the writable layer of each container | megabytes |
| docker.image.virtual_size | Image Virtual Size | The total amount of disk space used for the read-only image data used (shared) by each container and the writable layer of each container | megabytes |
| docker.images.available | Images Available | The number of top-level images | - |
| docker.images.intermediate | Images intermediate | The number of intermediate images, which are intermediate layers that make up other images | - |
| docker.container.size_rootfs | Root Filesystem Size | The total size of all the files in the container | megabytes |
| docker.container.size_rw | Total Files Size | The total size of the files (plotted as megabytes) that are changed or newly created compared with the container's base image. Just after container creation the size should be zero; it increases as you modify or create files | megabytes |
| docker.cpu.usage | CPU Usage | The percentage of CPU time obtained by the container with regard to all CPUs | percent |
| docker.cpu.usage.overlimit | CPU Usage Over Limit | The percentage of CPU time obtained by the container over its CPU limit (if the limit is not set, this metric is not monitored and the graph is not plotted) | percent |
| docker.cpu.usage.percpu | CPU Usage per CPU | The percentage of CPU time obtained by the container with regard to each CPU | percent |
| docker.cpu.shares | Shares of CPU | Shares of CPU usage allocated to the container | - |
| docker.cpu.system | CPU System | The percentage of time the CPU is executing system calls on behalf of processes of this container, unnormalized | percent |
| docker.cpu.throttled | CPU Throttled | Number of times the cgroup is throttled | - |
| docker.cpu.user | CPU User | The percentage of time the CPU is under direct control of processes of this container, unnormalized | percent |
| docker.mem.usage | Memory Usage | The percentage of used memory out of total node memory | percent |
| docker.mem.usage.overlimit | Memory Usage Over Limit | The percentage of used memory out of the memory limit (if the limit is not set, this metric is not monitored and the graph is not plotted) | percent |
| docker.mem.in_use | Memory In Use | The fraction of used memory to available memory limit if the limit is set. Otherwise, it is against the node memory | - |
| docker.mem.limit | Memory Limit | The memory limit for the container, if set | megabytes |
| docker.io.read_bytes | IO Read Bytes | Bytes read per second from disk by the processes of the container | bytes/second |
| docker.io.write_bytes | IO Write Bytes | Bytes written per second to disk by the processes of the container | bytes/second |
| docker.mem.active_anon | Active RSS Memory | The amount of active RSS memory. Active memory is not swapped to disk | megabytes |
| docker.mem.active_file | Active Cache Memory | The amount of active cache memory. Active memory is reclaimed by the system only after inactive memory is reclaimed | megabytes |
| docker.mem.cache | Cache Size | The amount of memory that is being used to cache data from disk (for example, memory content that can be associated precisely with a block on a block device) | megabytes |
| docker.mem.inactive_anon | Inactive RSS Memory | The amount of inactive RSS memory. Inactive memory is swapped to disk when necessary | megabytes |
| docker.mem.inactive_file | Inactive Cache Memory | The amount of inactive cache memory. Inactive memory may be reclaimed first when the system needs memory | megabytes |
| docker.mem.mapped_file | Memory Mapped by Process | The amount of memory mapped by the processes in the control group | megabytes |
| docker.mem.pgfault | Memory Page Faults | The rate at which processes in the container trigger page faults by accessing a non-existent or protected part of their virtual address space. Usually a page fault of this type results in a segmentation fault | per second |
| docker.mem.pgmajfault | Memory Page Faults Virtual | The rate at which processes in the container trigger page faults by accessing a part of their virtual address space that was swapped out or corresponded to a mapped file. Usually a page fault of this type results in fetching the data from disk instead of memory | per second |
| docker.mem.pgpgin | Pages Charged Rate | The rate at which pages are charged (added to the accounting) of a cgroup | per second |
| docker.mem.pgpgout | Pages Uncharged Rate | The rate at which pages are uncharged (removed from the accounting) of a cgroup | per second |
| docker.mem.rss | RSS Memory | The amount of non-cache memory that belongs to the container's processes. For example, memory used for stacks and heaps | megabytes |
| docker.mem.soft_limit | Memory Reservation Limit | The memory reservation limit for the container, when set | megabytes |
| docker.mem.sw_in_use | Swap Memory In Use | The fraction of used swap + memory to available swap + memory if the limit is set | - |
| docker.mem.sw_limit | Swap Memory Limit | The swap + memory limit for the container, when set | megabytes |
| docker.container.interface.traffic.in | Network Rx Bytes per Sec | Network Rx Bytes per Second | bytes/second |
| docker.container.interface.traffic.out | Network Tx Bytes per Sec | Network Tx Bytes per Second | bytes/second |
| docker.container.interface.packets.in | Network Rx Packets per Sec | Network Rx Packets per Second | per second |
| docker.container.interface.packets.out | Network Tx Packets per Sec | Network Tx Packets per Second | per second |
| docker.container.interface.errors.in | Network Rx Errors per Sec | Network Rx Errors per Second | per second |
| docker.container.interface.errors.out | Network Tx Errors per Sec | Network Tx Errors per Second | per second |
| docker.container.interface.discards.in | Network Rx Drops per Sec | Network Rx Drops per Second | per second |
| docker.container.interface.discards.out | Network Tx Drops per Sec | Network Tx Drops per Second | per second |
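
The descriptions of docker.mem.in_use and docker.mem.usage.overlimit both depend on whether a memory limit is set for the container. The following is a minimal sketch of that logic, not the agent's actual implementation; the function and parameter names are hypothetical, and treating a missing or zero limit as "not set" is an assumption.

```python
from typing import Optional


def mem_in_use(used_mb: float, node_total_mb: float,
               limit_mb: Optional[float] = None) -> float:
    """Fraction of used memory against the limit if set, otherwise node memory."""
    # Assumption: a limit of None or 0 means "no limit set".
    denominator = limit_mb if limit_mb else node_total_mb
    if denominator <= 0:
        raise ValueError("memory total must be positive")
    return used_mb / denominator


def mem_usage_overlimit(used_mb: float,
                        limit_mb: Optional[float]) -> Optional[float]:
    """Percentage of used memory out of the limit, or None when no limit is set."""
    if not limit_mb:
        # Mirrors the table: without a limit the metric is not monitored.
        return None
    return 100.0 * used_mb / limit_mb
```

With a 512 MB limit, a container using 256 MB reports a fraction of 0.5; without a limit, the same container on an 8192 MB node reports 256 / 8192.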

ContainerD metrics

| Metric | Display Name | Description | Units |
| --- | --- | --- | --- |
| containerd_hugetlb_failcnt | ContainerD HugeTLB fail Rate | Rate of allocation failure due to HugeTLB limit | - |
| containerd_hugetlb_max | ContainerD HugeTLB max usage | Maximum hugepagesize hugetlb usage recorded | bytes |
| containerd_hugetlb_usage | ContainerD HugeTLB usage | Current usage for hugepagesize hugetlb | bytes |
| containerd_memory_usage | ContainerD Memory Usage | Memory usage | bytes |
| containerd_memory_usage_failcnt | ContainerD Memory Usage fail Rate | Rate of the number of times the cgroup limit was exceeded | - |
| containerd_memory_usage_limit | ContainerD Memory Usage Limit | Limit of memory usage | bytes |
| containerd_memory_usage_max | ContainerD Memory Usage Max | Maximum memory usage recorded | bytes |
| containerd_memory_cache | ContainerD Memory Cache | Bytes of page cache memory | bytes |
| containerd_memory_rss | ContainerD Memory RSS | Bytes of anonymous and swap cache memory (includes transparent huge pages) | bytes |
| containerd_memory_rss_huge | ContainerD Memory RSS Huge | Bytes of anonymous transparent huge pages | bytes |
| containerd_memory_dirty | ContainerD Memory Dirty | Bytes that are waiting to get written back to the disk | bytes |
| containerd_memory_swap_usage | ContainerD Swap Usage | Swap usage | bytes |
| containerd_memory_swap_failcnt | ContainerD Swap Usage fail Rate | Rate of the number of times the cgroup swap limit was exceeded | - |
| containerd_memory_swap_limit | ContainerD Swap Usage Limit | Limit of swap usage | bytes |
| containerd_memory_swap_max | ContainerD Swap Usage Max | Maximum swap usage recorded | bytes |
| containerd_memory_kernel_usage | ContainerD Kernel Usage Name | Current kernel memory allocation | bytes |
| containerd_memory_kernel_failcnt | ContainerD Kernel fail count | Rate of the number of times kernel memory usage hit its limit | - |
| containerd_memory_kernel_limit | ContainerD Kernel Limit | Hard limit for kernel memory | bytes |
| containerd_memory_kernel_max | ContainerD Kernel Max | Maximum kernel memory usage recorded | bytes |
| containerd_memory_kernel_tcp_usage | ContainerD Kernel TCP Usage | Current TCP buffer memory allocation | bytes |
| containerd_memory_kernel_tcp_failcnt | ContainerD Kernel TCP fail rate | Rate of the number of times TCP buffer memory usage hit its limit | - |
| containerd_memory_kernel_tcp_limit | ContainerD Kernel TCP Limit | Hard limit for TCP buffer memory | bytes |
| containerd_memory_kernel_tcp_max | ContainerD Kernel TCP Max | Maximum TCP buffer memory usage recorded | bytes |
| containerd_cpu_throttling_throttledTime | ContainerD CPU Throttled Time | CPU throttled time | percent |
| containerd_cpu_usage_system | ContainerD CPU System Usage | System CPU usage of the container with respect to the host system | percent |
| containerd_cpu_usage_total | ContainerD CPU Total Usage | Total CPU usage of the container with respect to the host system | percent |
| containerd_cpu_usage_user | ContainerD CPU User Usage | User CPU usage of the container with respect to the host system | percent |
| containerd_blkio_service_bytes_recursive | ContainerD BlkIO Service Bytes | Number of bytes transferred to/from the disk | bytes |
| containerd_blkio_serviced_recursive | ContainerD BlkIO Serviced | Number of IOs (bio) issued to the disk by the group | bytes |
| containerd_blkio_queued_recursive | ContainerD BlkIO Queued | Total number of requests queued up at any given instant for the cgroup | bytes |
| containerd_blkio_service_time_recursive | ContainerD BlkIO Service Time | Total amount of time between request dispatch and request completion for the IOs | bytes |
| containerd_blkio_wait_time_recursive | ContainerD BlkIO Wait Time | Total amount of time the IOs for this cgroup spent waiting in the scheduler queues for service | bytes |
| containerd_blkio_merged_recursive | ContainerD BlkIO Merged | Total number of bios/requests merged into requests belonging to this cgroup | bytes |
| containerd_blkio_time_recursive | ContainerD BlkIO Time | Disk time allocated to the cgroup per device in milliseconds | bytes |
| containerd_blkio_sectors_recursive | ContainerD BlkIO Sectors | Number of sectors transferred to/from disk by the group | bytes |
| containerd_proc_open_fds | ContainerD number of open fd | Number of open file descriptors | - |
| containerd_container_uptime | ContainerD Container Uptime | Uptime of the current container | second |
| containerd_containers_running | ContainerD Running Containers | Total number of running containers | - |
| containerd_containers_stopped | ContainerD Stopped Containers | Total number of stopped containers | - |
| containerd_image_size | ContainerD Image Size | Image sizes of different container images | bytes |

CRI-O metrics

Enable the CRI-O metrics endpoint to collect these metrics.

| Metric | Display Name | Description | Units |
| --- | --- | --- | --- |
| crio_operations | Operations Count | Cumulative number of CRI-O operations by operation type | - |
| crio_operations_latency_microseconds | Operations Latency Microseconds | Latency of CRI-O operations. Broken down by operation type | microseconds |
| crio_operations_latency_microseconds_sum | Operations Latency Microseconds Sum | Latency of CRI-O operations. Broken down by operation type. sum value | microseconds |
| crio_operations_latency_microseconds_count | Operations Latency Microseconds Count | Latency of CRI-O operations. Broken down by operation type. count value | microseconds |
| crio_operations_errors | Operations Errors | Cumulative number of CRI-O operation errors by operation type | - |
| crio_image_pulls_by_digest | Image Pulls by Digest | Bytes transferred by CRI-O image pulls by digest | - |
| crio_image_pulls_by_name | Image Pulls by Name | Bytes transferred by CRI-O image pulls by name | - |
| crio_image_pulls_by_name_skipped | Image Pulls by Name Skipped | Bytes skipped by CRI-O image pulls by name | - |
| crio_image_pulls_successes | Image Pulls Successes | Successful image pulls by image name | - |
| crio_image_pulls_failures | Image Pulls Failures | Failed image pulls by image name and their error category | - |
| crio_image_layer_reuse | Image Layer Reuse | Reused (not pulled) local image layer count by name | - |
| crio_cpu_time | CPU Time | Total user and system CPU time spent | seconds |
| crio_mem_resident | Mem Resident | Resident memory size | bytes |
| crio_mem_virtual | Mem Virtual | Virtual memory size | bytes |
| crio_process_open_fds | Process Open Fds | Number of open file descriptors | - |
| crio_cpu_usage_core | CPU Usage | Cumulative CPU usage (sum across all cores) since object creation | nanoseconds |
| crio_memory_working_set | Memory Working Set | Amount of working set memory | bytes |
| crio_filesystem_used | Filesystem Used | Represents the bytes used for images on the filesystem. (This may differ from the total bytes used on the filesystem and may not equal CapacityBytes - AvailableBytes) | bytes |
| crio_inodes_used | Inodes Used | Represents the inodes used by the images. (This may not equal InodesCapacity - InodesAvailable because the underlying filesystem may also be used for purposes other than storing images) | - |

OKD Kubelet metrics

| Metric | Display Name | Description | Units |
| --- | --- | --- | --- |
| kube_pods_running | Pods Running | The number of running pods | - |
| kube_containers_running | Containers Running | The number of running containers | - |
| kube_containers_restarts | Containers Restarts | The number of times the container is restarted | - |
| kube_cpu_load_10s_avg | Cpu Load 10S Avg | Container CPU load average over the last 10 seconds | - |
| kube_cpu_system_total | Cpu System Total | System CPU time consumed in seconds | per second |
| kube_cpu_user_total | Cpu User Total | User CPU time consumed in seconds | per second |
| kube_cpu_cfs_periods | Cpu Cfs Periods | Number of elapsed enforcement period intervals | per second |
| kube_cpu_cfs_throttled_periods | Cpu Cfs Throttled Periods | Number of throttled period intervals | per second |
| kube_cpu_cfs_throttled_seconds | Cpu Cfs Throttled Seconds | Total duration of the container being throttled | per second |
| kube_node_cpu_capacity | Node Cpu Capacity | CPU capacity of the node (plotted in millicores) | millicores |
| kube_node_memory_capacity | Node Memory Capacity | Memory capacity of the node (plotted in megabytes) | megabytes |
| kube_node_cpu_usage_percentage | Node Cpu Usage Percentage | CPU usage percentage of the node | percent |
| kube_node_memory_usage_percentage | Node Memory Usage Percentage | Memory usage percentage of the node | percent |
| kube_node_cpu_allocatable | Node Cpu Allocatable | Allocatable CPU of the node | millicores |
| kube_node_memory_allocatable | Node Memory Allocatable | Allocatable memory of the node | megabytes |
| kube_node_cpu_usage | Node Cpu Usage | CPU usage of the node (plotted in millicores) | millicores |
| kube_node_memory_usage | Node Memory Usage | Memory usage of the node (plotted in megabytes) | megabytes |
| kube_cpu_usage_total | Cpu Usage Total | CPU time consumed in seconds | per second |
| kube_cpu_limits | Cpu Limits | The limit of CPU cores set | millicores |
| kube_cpu_requests | Cpu Requests | The requested CPU cores | millicores |
| kube_filesystem_usage | Filesystem Usage | Number of megabytes that are consumed by the container on this filesystem | megabytes |
| kube_filesystem_usage_pct | Filesystem Usage Pct | Number of megabytes that can be consumed by the container on this filesystem | Fraction |
| kube_io_read_bytes | Io Read Bytes | The number of bytes read from the disk | bytes/second |
| kube_io_write_bytes | Io Write Bytes | The number of bytes written to the disk | bytes/second |
| kube_memory_limits | Memory Limits | Memory limit for the container | megabytes |
| kube_memory_sw_limit | Memory Sw Limit | Memory swap limit for the container | bytes |
| kube_memory_requests | Memory Requests | The requested memory | megabytes |
| kube_memory_usage | Memory Usage | Current memory usage in bytes including all memory regardless of when it was accessed | bytes |
| kube_memory_working_set | Memory Working Set | Current working set in megabytes, which the OOM killer watches | megabytes |
| kube_memory_cache | Memory Cache | Number of bytes of page cache memory | bytes |
| kube_memory_rss | Memory Rss | Size of RSS in bytes | bytes |
| kube_memory_swap | Memory Swap | Container swap usage in bytes | bytes |
| kube_network_rx_bytes | Network Rx Bytes | The number of bytes received per second | bytes/second |
| kube_network_rx_dropped | Network Rx Dropped | The number of Rx packets dropped per second | packets/second |
| kube_network_rx_errors | Network Rx Errors | The number of Rx errors per second | errors/second |
| kube_network_tx_bytes | Network Tx Bytes | The number of bytes transmitted per second | bytes/second |
| kube_network_tx_dropped | Network Tx Dropped | The number of Tx packets dropped per second | packets/second |
| kube_network_tx_errors | Network Tx Errors | The number of Tx errors per second | errors/second |
| kube_apiserver_certificate_expiration | Apiserver Certificate Expiration | Average distribution of the remaining lifetime on the certificate used to authenticate a request since the last poll | seconds |
| kube_rest_client_requests | Rest Client Requests | The number of HTTP requests | operations/second |
| kube_rest_client_latency | Rest Client Latency | Average request latency in seconds. Broken down by verb and URL since the last poll | seconds |
| kube_kubelet_runtime_operations | Kubelet Runtime Operations | The number of runtime operations | operations/second |
| kube_kubelet_runtime_errors | Kubelet Runtime Errors | The number of runtime operation errors | operations/second |
| kube_kubelet_network_plugin_latency | Kubelet Network Plugin Latency | Average latency in seconds of network plugin operations. Broken down by operation type since the last poll | seconds |
| kube_kubelet_volume_stats_available_bytes | Kubelet Volume Stats Available Bytes | The number of available bytes in the volume | bytes |
| kube_kubelet_volume_stats_capacity_bytes | Kubelet Volume Stats Capacity Bytes | The capacity in bytes of the volume | bytes |
| kube_kubelet_volume_stats_used_bytes | Kubelet Volume Stats Used Bytes | The number of used bytes in the volume | bytes |
| kube_kubelet_volume_stats_inodes | Kubelet Volume Stats Inodes | The maximum number of inodes in the volume | Inode |
| kube_kubelet_volume_stats_inodes_free | Kubelet Volume Stats Inodes Free | The number of free inodes in the volume | Inode |
| kube_kubelet_volume_stats_inodes_used | Kubelet Volume Stats Inodes Used | The number of used inodes in the volume | Inode |
| kube_ephemeral_storage_usage | Ephemeral Storage Usage | Ephemeral storage usage of the pod | megabytes |
| kube_kubelet_evictions | Kubelet Evictions | The number of pods that have been evicted from the kubelet (ALPHA in Kubernetes v1.16) | - |
| kube_kubelet_cpu_usage | Kubelet Cpu Usage | The number of cores used by kubelet | millicores |
| kube_kubelet_memory_rss | Kubelet Memory Rss | Size of kubelet RSS in megabytes | megabytes |
| kube_runtime_cpu_usage | Runtime Cpu Usage | The number of cores used by the runtime | millicores |
| kube_runtime_memory_rss | Runtime Memory Rss | Size of runtime RSS | megabytes |
| kube_kubelet_container_log_filesystem_used_bytes | Kubelet Container Log Filesystem Used Bytes | Bytes used by the container's logs on the filesystem (requires Kubernetes 1.14+) | bytes |
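
The kubelet metrics mix units: kube_memory_usage is reported in bytes while kube_memory_limits is reported in megabytes, so comparing them requires a conversion first. The sketch below illustrates one way to do that and to derive a node CPU percentage from the millicore values. The helper names are hypothetical, the 1024 * 1024 conversion factor is an assumption about how megabytes are defined, and the document does not state that kube_node_cpu_usage_percentage is computed this way.

```python
BYTES_PER_MEGABYTE = 1024 * 1024  # assumption: binary megabytes


def memory_usage_pct_of_limit(usage_bytes: float, limit_megabytes: float) -> float:
    """Percentage of a container's memory limit in use.

    usage_bytes corresponds to kube_memory_usage (bytes) and
    limit_megabytes to kube_memory_limits (megabytes).
    """
    if limit_megabytes <= 0:
        raise ValueError("memory limit must be positive")
    limit_bytes = limit_megabytes * BYTES_PER_MEGABYTE
    return 100.0 * usage_bytes / limit_bytes


def node_cpu_usage_pct(usage_millicores: float, capacity_millicores: float) -> float:
    """Node CPU usage as a percentage of capacity, both in millicores."""
    if capacity_millicores <= 0:
        raise ValueError("CPU capacity must be positive")
    return 100.0 * usage_millicores / capacity_millicores
```

For example, 268435456 bytes of usage against a 512 megabyte limit is 50 percent under the binary-megabyte assumption.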

OKD Kube State metrics

| Metric | Display Name | Description | Units |
| --- | --- | --- | --- |
| kubernetes_state.container.cpu_limit | Container Cpu Limit | The limit on CPU cores to be used by a container | cpu |
| kubernetes_state.container.cpu_requested | Container Cpu Requested | The number of requested CPU cores by a container | cpu |
| kubernetes_state.container.memory_limit | Container Memory Limit | The limit on memory to be used by a container | bytes |
| kubernetes_state.container.memory_requested | Container Memory Requested | The number of requested memory bytes by a container | bytes |
| kubernetes_state.container.ready | Container Ready | Describes whether the container's readiness check succeeded | - |
| kubernetes_state.container.ready.total | Total Containers Ready | Total containers whose readiness check succeeded | - |
| kubernetes_state.container.restarts | Container Restarts | The number of restarts per container | - |
| kubernetes_state.container.restarts.total | Total Containers Restarts Count | Total containers restarts count | - |
| kubernetes_state.container.running | Container Running | Describes whether the container is currently in running state | - |
| kubernetes_state.container.running.total | Total Containers Running | Total containers currently in running state | - |
| kubernetes_state.container.terminated | Container Terminated | Describes whether the container is currently in terminated state | - |
| kubernetes_state.container.terminated.total | Total Containers Terminated | Total containers currently in terminated state | - |
| kubernetes_state.container.waiting | Container Waiting | Describes whether the container is currently in waiting state | - |
| kubernetes_state.container.waiting.total | Total Containers Waiting | Total containers currently in waiting state | - |
| kubernetes_state.daemonset.desired | Daemonset Desired | The number of nodes that should be running the daemon pod | - |
| kubernetes_state.daemonset.misscheduled | Daemonset Misscheduled | The number of nodes that are running a daemon pod but are not expected to | - |
| kubernetes_state.daemonset.ready | Daemonset Ready | The number of nodes that should be running the daemon pod and have one or more of the daemon pods running and ready | - |
| kubernetes_state.daemonset.scheduled | Daemonset Scheduled | The number of nodes running at least one daemon pod as expected | - |
| kubernetes_state.deployment.paused | Deployment Paused | The deployment is paused and will not be processed by the deployment controller | - |
| kubernetes_state.deployment.replicas | Deployment Replicas | The number of replicas per deployment | - |
| kubernetes_state.deployment.replicas_available | Deployment Replicas Available | The number of available replicas per deployment | - |
| kubernetes_state.deployment.replicas_desired | Deployment Replicas Desired | The number of desired replicas per deployment | - |
| kubernetes_state.deployment.replicas_unavailable | Deployment Replicas Unavailable | The number of unavailable replicas per deployment | - |
| kubernetes_state.deployment.replicas_updated | Deployment Replicas Updated | The number of updated replicas per deployment | - |
| kubernetes_state.deployment.rollingupdate.max_unavailable | Deployment Rollingupdate Max Unavailable | Maximum number of unavailable replicas during a rolling update of a deployment | - |
| kubernetes_state.node.cpu_allocatable | Node Cpu Allocatable | The CPU resources of a node that are available for scheduling | - |
| kubernetes_state.node.cpu_capacity | Node Cpu Capacity | The total CPU resources of the node | cpu |
| kubernetes_state.node.memory_allocatable | Node Memory Allocatable | The memory resources of a node that are available for scheduling | bytes |
| kubernetes_state.node.memory_capacity | Node Memory Capacity | The total memory resources of the node | bytes |
| kubernetes_state.node.pods_allocatable | Node Pods Allocatable | The pod resources of a node that are available for scheduling | - |
| kubernetes_state.node.pods_capacity | Node Pods Capacity | The total pod resources of the node | - |
| kubernetes_state.node.status | Node Status | The condition of a cluster node plotted with node as an instance. This metric gives the status of each node with a value of either 0 or 1 | - |
| kubernetes_state.pod.ready | Pod Ready | Describes whether the pod is ready to serve requests, in association with the condition tag. For example, condition:true keeps the pods that are in a ready state | - |
| kubernetes_state.pod.scheduled | Pod Scheduled | Describes the status of the scheduling process for the pod | - |
| kubernetes_state.replicaset.fully_labeled_replicas | Replicaset Fully Labeled Replicas | The number of fully labeled replicas per ReplicaSet | - |
| kubernetes_state.replicaset.replicas | Replicaset Replicas | The number of replicas per ReplicaSet | - |
| kubernetes_state.replicaset.replicas_desired | Replicaset Replicas Desired | Number of desired pods for a ReplicaSet | - |
| kubernetes_state.replicaset.replicas_ready | Replicaset Replicas Ready | The number of ready replicas per ReplicaSet | - |
| kubernetes_state.resourcequota.limits.cpu.limit | Resourcequota Limits Cpu Limit | Hard limit on the sum of CPU core limits for a resource quota | cpu |
| kubernetes_state.resourcequota.limits.cpu.used | Resourcequota Limits Cpu Used | Observed sum of limits for CPU cores for a resource quota | cpu |
| kubernetes_state.resourcequota.limits.memory.limit | Resourcequota Limits Memory Limit | Hard limit on the sum of memory bytes limits for a resource quota | bytes |
| kubernetes_state.resourcequota.limits.memory.used | Resourcequota Limits Memory Used | Observed sum of limits for memory bytes for a resource quota | bytes |
| kubernetes_state.resourcequota.persistentvolumeclaims.limit | Resourcequota Persistentvolumeclaims Limit | Hard limit of the number of PVC for a resource quota | - |
| kubernetes_state.resourcequota.persistentvolumeclaims.used | Resourcequota Persistentvolumeclaims Used | Observed number of persistent volume claims used for a resource quota | - |
| kubernetes_state.resourcequota.pods.limit | Resourcequota Pods Limit | Hard limit of the number of pods for a resource quota | - |
| kubernetes_state.resourcequota.pods.used | Resourcequota Pods Used | Observed number of pods used for a resource quota | - |
| kubernetes_state.resourcequota.requests.cpu.limit | Resourcequota Requests Cpu Limit | Hard limit on the total of CPU cores requested for a resource quota | cpu |
| kubernetes_state.resourcequota.requests.cpu.used | Resourcequota Requests Cpu Used | Observed sum of CPU cores requested for a resource quota | cpu |
| kubernetes_state.resourcequota.requests.memory.limit | Resourcequota Requests Memory Limit | Hard limit on the total of memory bytes requested for a resource quota | bytes |
| kubernetes_state.resourcequota.requests.memory.used | Resourcequota Requests Memory Used | Observed sum of memory bytes requested for a resource quota | bytes |
| kubernetes_state.resourcequota.requests.storage.limit | Resourcequota Requests Storage Limit | Hard limit on the total of storage bytes requested for a resource quota | bytes |
| kubernetes_state.resourcequota.requests.storage.used | Resourcequota Requests Storage Used | Observed sum of storage bytes requested for a resource quota | bytes |
| kubernetes_state.resourcequota.services.limit | Resourcequota Services Limit | Hard limit of the number of services for a resource quota | - |
| kubernetes_state.resourcequota.services.loadbalancers.limit | Resourcequota Services Loadbalancers Limit | Hard limit of the number of load balancers for a resource quota | - |
| kubernetes_state.resourcequota.services.loadbalancers.used | Resourcequota Services Loadbalancers Used | Observed number of load balancers used for a resource quota | - |
| kubernetes_state.resourcequota.services.nodeports.limit | Resourcequota Services Nodeports Limit | Hard limit of the number of node ports for a resource quota | - |
| kubernetes_state.resourcequota.services.nodeports.used | Resourcequota Services Nodeports Used | Observed number of node ports used for a resource quota | - |
| kubernetes_state.resourcequota.services.used | Resourcequota Services Used | Observed number of services used for a resource quota | - |
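
Each resource quota is reported as a pair of metrics, one ending in .used and one ending in .limit, so the two can be combined into a utilization ratio. The following is an illustrative sketch only: the `samples` dictionary and the `quota_utilization` helper are hypothetical and stand in for however the metric values are actually retrieved.

```python
from typing import Mapping, Optional

PREFIX = "kubernetes_state.resourcequota"


def quota_utilization(samples: Mapping[str, float], resource: str) -> Optional[float]:
    """Return used / limit for a resource quota, or None if it cannot be computed.

    `resource` is the middle part of the metric name, for example
    "pods" or "requests.cpu".
    """
    used = samples.get(f"{PREFIX}.{resource}.used")
    limit = samples.get(f"{PREFIX}.{resource}.limit")
    if used is None or limit is None or limit <= 0:
        return None
    return used / limit


# Hypothetical sample values for a single namespace.
samples = {
    "kubernetes_state.resourcequota.pods.used": 8,
    "kubernetes_state.resourcequota.pods.limit": 10,
    "kubernetes_state.resourcequota.requests.cpu.used": 3,
    "kubernetes_state.resourcequota.requests.cpu.limit": 4,
}
```

A ratio close to 1 means the namespace is near its hard limit for that resource; a missing pair yields None rather than a misleading zero.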

OKD CoreDNS metrics

| Metric | Display Name | Description | Units |
| --- | --- | --- | --- |
| coredns.panics | Total Panics | Total number of panics | - |
| coredns.query.count | Query count | Total query count | - |
| coredns.request_duration.seconds.sum | Request Duration Seconds Sum | Duration to process each query | - |
| coredns.request_duration.seconds.count | Request Duration Seconds Count | Duration per upstream interaction | - |
| coredns.response_size.bytes.sum | Response Size Bytes Sum | Size of the returned response | bytes |

Note: CoreDNS is supported in Kubernetes version 1.21 and later.

OKD KubeDNS metrics

| Metric | Display Name | Description | Units |
| --- | --- | --- | --- |
| kubedns.cachemiss_count | Cachemiss Count | Number of DNS cache misses (from start of process) | - |
| kubedns.error_count | Error Count | Number of DNS requests resulting in an error | - |
| kubedns.request_count | Request Count | Total number of DNS requests made | - |
| kubedns.request_duration.seconds.count | Request Duration Seconds Count | Number of requests on which the kubedns.request_duration.seconds.sum metric is evaluated | - |
| kubedns.request_duration.seconds.sum | Request Duration Seconds Sum | Time (in seconds) taken to resolve each request | - |
| kubedns.response_size.bytes.count | Response Size Bytes Count | Number of responses on which the kubedns.response_size.bytes.sum metric is evaluated | - |
| kubedns.response_size.bytes.sum | Response Size Bytes Sum | Size of the returned response in bytes | bytes |

Note: KubeDNS is supported prior to Kubernetes version 1.21.
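
Several metrics in this document come in .sum and .count pairs, such as kubedns.request_duration.seconds.sum and kubedns.request_duration.seconds.count. Assuming both values are cumulative totals (the usual convention for such pairs, but not confirmed here), an average per request over a polling interval can be sketched from the difference between two samples. The function name and arguments are hypothetical.

```python
from typing import Optional


def average_per_request(prev_sum: float, prev_count: float,
                        curr_sum: float, curr_count: float) -> Optional[float]:
    """Average value per request between two polls of a .sum/.count pair.

    Returns None when no new requests were observed or when a counter
    went backwards (for example, after a process restart).
    """
    delta_sum = curr_sum - prev_sum
    delta_count = curr_count - prev_count
    if delta_count <= 0 or delta_sum < 0:
        return None
    return delta_sum / delta_count
```

For instance, if the sum grows from 10.0 to 13.0 seconds while the count grows from 100 to 160 requests, the average over that interval is 3.0 / 60 seconds per request.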

OKD Kube Controller metrics

| Metric | Display Name | Description | Units |
| --- | --- | --- | --- |
| controller.workqueue.work_duration.sum | Kube Controller Workqueue Work Duration Seconds Sum | Duration taken in seconds to process an item from the workqueue | seconds |
| controller.workqueue.work_duration.count | Kube Controller Workqueue Work Duration Seconds Count | Total time taken in seconds to process an item from the workqueue | seconds |
| controller.workqueue.work_unfinished_duration | Kube Controller Workqueue Unfinished Work Seconds | Time in seconds spent on work that is in progress and has not been observed by work_duration. Large values indicate stuck threads | seconds |
| controller.workqueue.work_longest_duration | Kube Controller Workqueue Longest Running Processor Seconds | Time in seconds for which the longest running processor for the workqueue has been running | - |
| controller.workqueue.queue_duration.sum | Kube Controller Workqueue Queue Duration Seconds Sum | Duration in seconds for which an item remains in the workqueue before being requested | - |
| controller.workqueue.queue_duration.count | Kube Controller Workqueue Queue Duration Seconds Count | Total duration in seconds for which an item remains in the workqueue before being requested | - |
| controller.workqueue.nodes.count | Kube Controller Registered Nodes | Number of registered nodes per zone | - |
| controller.workqueue.nodes.unhealthy | Kube Controller Node Collector Unhealthy Nodes in Zone | Number of nodes not ready per zone | - |
| controller.workqueue.nodes.evictions | Kube Controller Node Collector Evictions Number | Number of node evictions that happened since the current instance of NodeController started | - |
| controller.workqueue.depth | Kube Controller Workqueue Depth | Current depth of the workqueue | - |
| controller.workqueue.adds | Kube Controller Workqueue Adds Total | Total number of additions/insertions handled by the workqueue | - |
| controller.workqueue.retries | Kube Controller Workqueue Retries Total | Total number of retries handled by the workqueue | - |
| controller.rate_limiter.use | Kube Controller Node Lifecycle Controller Rate Limiter Use | A metric measuring the saturation of the rate limiter for node_lifecycle_controller | - |
| controller.go.goroutines | Kube Controller Go Goroutines | Number of goroutines that currently exist | - |
| controller.threads | Kube Controller Os Threads | Number of OS threads created | - |
| controller.process.max_fds | Kube Controller Process Max Fds | Maximum number of open file descriptors | - |
| controller.process.open_fds | Kube Controller Process Open Fds | Number of open file descriptors | - |

OKD Kube Scheduler metrics

MetricsDisplay NameDescriptionUnits
scheduler.binding.duration.countKube Scheduler Binding Duration Seconds CountTotal Binding duration in secondsseconds
scheduler.binding.duration.secondsKube Scheduler Binding Duration Seconds SumBinding durationseconds
scheduler.binding.latency.countKube Scheduler Binding Latency Microseconds CountTotal Binding latencymicroseconds
scheduler.binding.latency.sumKube Scheduler Binding Latency MicrosecondsBinding latency summicroseconds
scheduler.cache.lookupsKube Scheduler Equiv Cache Lookups TotalTotal number of equivalent cache lookups, by whether a cache entry was found-
scheduler.client.http.requestsKube Scheduler Rest Client Requests TotalNumber of HTTP requests, partitioned by status code, method, and host-
scheduler.client.http.requests_duration.countKube Scheduler Rest Client Request Latency Seconds CountTotal request latency. Broken down by verb and URLseconds
scheduler.client.http.requests_duration.sumKube Scheduler Rest Client Request Latency Seconds SumRequest latency. Broken down by verb and URLseconds
scheduler.gc_duration_seconds.countKube Scheduler Go GC Duration Seconds CountA summary of the GC invocation durations-
scheduler.gc_duration_seconds.quantileKube Scheduler Go GC Duration SecondsA summary of the GC invocation durations-
scheduler.gc_duration_seconds.sumKube Scheduler Go GC Duration Seconds SumA summary of the GC invocation durations-
scheduler.go.goroutinesKube Scheduler Go GoroutinesNumber of goroutines that currently exist-
scheduler.process.max_fdsKube Scheduler Process Max FdsMaximum number of open file descriptors-
scheduler.process.open_fdsKube Scheduler Process Open FdsNumber of open file descriptors-
scheduler.pod_preemption.victimsKube Scheduler Pod Preemption VictimsNumber of selected preemption victims-
scheduler.pod_preemption.attemptsKube Scheduler Total Preemption AttemptsTotal preemption attempts in the cluster till now-
scheduler.schedule_attempts.totalKube Scheduler Schedule Attempts TotalNumber of attempts to schedule pods, by the result. unschedulable means a pod could not be scheduled, and error means an internal scheduler problem-
scheduler.scheduling.algorithm_duration.countKube Scheduler Scheduling Algorithm Duration Seconds CountTotal Scheduling algorithm latencyseconds
scheduler.scheduling.algorithm_duration.sumKube Scheduler Scheduling Algorithm Duration Seconds SumScheduling algorithm latencyseconds
scheduler.scheduling.algorithm_latency.countKube Scheduler Scheduling Algorithm Latency Microseconds CountTotal Scheduling algorithm latencymicroseconds
scheduler.scheduling.algorithm_latency.sumKube Scheduler Scheduling Algorithm Latency Microseconds SumScheduling algorithm latencymicroseconds
scheduler.scheduling.algorithm.predicate_duration.countKube Scheduler Scheduling Algorithm Predicate Evaluation CountScheduling algorithm predicate evaluation duration-
scheduler.scheduling.algorithm.predicate_duration.sumKube Scheduler Scheduling Algorithm Predicate Evaluation SumScheduling algorithm predicate evaluation duration-
scheduler.scheduling.algorithm.preemption_duration.countKube Scheduler Scheduling Algorithm Preemption Evaluation CountScheduling algorithm preemption evaluation duration-
scheduler.scheduling.algorithm.preemption_duration.sumKube Scheduler Scheduling Algorithm Preemption Evaluation SumScheduling algorithm preemption evaluation duration-
scheduler.scheduling.algorithm.priority_duration.countKube Scheduler Scheduling Algorithm Priority Evaluation CountScheduling algorithm priority evaluation duration-
scheduler.scheduling.algorithm.priority_duration.sumKube Scheduler Scheduling Algorithm Priority Evaluation SumScheduling algorithm priority evaluation duration-
scheduler.e2e.scheduling_duration.countKube Scheduler E2E Scheduling Duration Seconds CountTotal end-to-end scheduling latency (scheduling algorithm + binding)seconds
scheduler.e2e.scheduling_duration.sumKube Scheduler E2E Scheduling Duration Seconds SumEnd-to-end scheduling latency (scheduling algorithm + binding)seconds
scheduler.e2e.scheduling_latency.countKube Scheduler E2E Scheduling Latency Microseconds CountTotal end-to-end scheduling latency (scheduling algorithm + binding)microseconds
scheduler.e2e.scheduling_latency.sumKube Scheduler E2E Scheduling Latency Microseconds SumEnd-to-end scheduling latency (scheduling algorithm + binding)microseconds
scheduler.scheduling.scheduling_duration.countKube Scheduler Scheduling Duration Seconds CountScheduling latency split by sub-parts of the scheduling operationseconds
scheduler.scheduling.scheduling_duration.quantileKube Scheduler Scheduling Duration SecondsScheduling latency split by sub-parts of the scheduling operationseconds
scheduler.scheduling.scheduling_duration.sumKube Scheduler Scheduling Duration Seconds SumScheduling latency split by sub-parts of the scheduling operationseconds
scheduler.scheduling.scheduling_latency.countKube Scheduler Scheduling Latency Seconds CountScheduling latency split by sub-parts of the scheduling operationseconds
scheduler.scheduling.scheduling_latency.quantileKube Scheduler Scheduling Latency SecondsScheduling latency split by sub-parts of the scheduling operationseconds
scheduler.scheduling.scheduling_latency.sumKube Scheduler Scheduling Latency Seconds SumScheduling latency split by sub-parts of the scheduling operationseconds
scheduler.threadsKube Scheduler OS ThreadsNumber of OS threads created-
scheduler.volume_scheduling_duration.sumKube Scheduler Volume Scheduling Duration Seconds SumVolume scheduling stage latency sum-
scheduler.volume_scheduling_duration.countKube Scheduler Volume Scheduling Duration Seconds CountVolume scheduling stage latency count-
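
Many of the scheduler metrics above come in `.sum` / `.count` pairs. As a minimal sketch, assuming both values are cumulative (as in the underlying Prometheus histograms and summaries; the agent may report them differently), the mean latency over an interval can be estimated from two successive samples. The function name and sample values below are hypothetical and only illustrate the arithmetic.

```python
from typing import Optional


def average_latency(sum_prev: float, count_prev: float,
                    sum_now: float, count_now: float) -> Optional[float]:
    """Mean latency per observation between two samples of a .sum/.count pair.

    Assumes both inputs are cumulative totals. Returns None when no new
    observations were recorded (or the counter was reset) in the interval.
    """
    delta_count = count_now - count_prev
    if delta_count <= 0:
        return None
    return (sum_now - sum_prev) / delta_count


# Hypothetical samples of scheduler.binding.latency.sum / .count
# taken one collection interval apart.
mean_binding_latency = average_latency(10.0, 4, 16.0, 7)
```

The result is expressed in the unit of the `.sum` metric (microseconds for `scheduler.binding.latency.sum`, seconds for the `*_duration` metrics).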

OKD Metrics Server metrics

MetricsDisplay NameDescriptionUnits
metrics_server.go_gc_duration_seconds_sumGo GC Duration Seconds SumA summary of the GC invocation durationsseconds
metrics_server.authenticated_user_requestsAuthenticated User RequestsCounter of authenticated requests broken out by username-
metrics_server.go_goroutinesGo GoroutinesNumber of goroutines that currently exist-
metrics_server.manager_tick_duration_sumManager Tick Duration SumThe total time spent collecting and storing metricsseconds
metrics_server.scraper_duration_countScraper Duration CountTime spent scraping sourcesseconds
metrics_server.scraper_duration_sumScraper Duration SumTime spent scraping sourcesseconds
metrics_server.scraper_last_timeScraper Last TimeLast time metrics-server performed a scrape since unix epochseconds
metrics_server.go_gc_duration_seconds_quantileGo GC Duration Seconds QuantileA summary of the GC invocation durationsseconds
metrics_server.kubelet_summary_request_duration_sumKubelet Summary Request Duration SumThe Kubelet summary request latenciesseconds
metrics_server.kubelet_summary_scrapes_totalKubelet Summary Scrapes TotalTotal number of attempted Summary API scrapes done by Metrics Server-
metrics_server.manager_tick_duration_countManager Tick Duration CountThe total time spent collecting and storing metricsseconds
metrics_server.process_max_fdsProcess Max FdsMaximum number of open file descriptors-
metrics_server.process_open_fdsProcess Open FdsNumber of open file descriptors-
metrics_server.go_gc_duration_seconds_countGo GC Duration Seconds CountA summary of the GC invocation durations-
metrics_server.kubelet_summary_request_duration_countKubelet Summary Request Duration CountThe Kubelet summary request latenciesseconds
metrics_server.process_cpu_seconds_totalProcess Cpu Seconds TotalTotal user and system CPU time spentseconds
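
The `process_open_fds` and `process_max_fds` metrics are most useful when read together. The sketch below, with a hypothetical function name and sample values, shows one way to turn them into a utilization percentage that can be compared against a threshold; it is an illustration, not part of the agent.

```python
from typing import Optional


def fd_utilization_percent(open_fds: float, max_fds: float) -> Optional[float]:
    """Share of the file descriptor limit currently in use, as a percentage.

    Returns None when the limit is missing or non-positive, since the
    ratio is undefined in that case.
    """
    if max_fds <= 0:
        return None
    return 100.0 * open_fds / max_fds


# Hypothetical values for metrics_server.process_open_fds and
# metrics_server.process_max_fds.
utilization = fd_utilization_percent(256, 1024)
```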

OKD API Server metrics

MetricsDisplay NameDescriptionUnits
apiserver.go.threads.totalKube apiserver Go Threads TotalNumber of OS threads created-
apiserver.authenticated.user.requestsKube apiserver Authenticated User RequestsCounter of authenticated requests broken out by username-
apiserver.http.requests.total.countKube apiserver HTTP Requests Total CountTotal number of HTTP requests made-
apiserver.authenticated.user.requests.countKube apiserver Authenticated User Requests CountCounter of authenticated requests broken out by username-
apiserver.dropped.requests.totalKube apiserver Dropped Requests TotalAccumulated number of requests dropped with Try-again-later response-
apiserver.http.requests.totalKube apiserver HTTP Requests TotalTotal number of HTTP requests made-
apiserver.audit.event.totalKube apiserver Audit Event TotalCounter of audit events generated and sent to the audit back end-
apiserver.rest.client.requests.totalKube apiserver Rest Client Requests TotalNumber of HTTP requests, partitioned by status code, method, and host-
apiserver.request.countKube apiserver Request CountCounter of API server requests broken out for each verb, group, version, resource, scope, component, client, and HTTP response contentType and code-
apiserver.request.count.countKube apiserver Request Count CountCounter of API server requests broken out for each verb, group, version, resource, scope, component, client, and HTTP response contentType and code-
apiserver.dropped.requests.total.countKube apiserver Dropped Requests Total CountMonotonic count of requests dropped with Try-again-later response-
apiserver.inflight.requestsKube apiserver Inflight RequestsMaximal number of currently used inflight request limit of this API server per request kind in the last second-
apiserver.go.goroutinesKube apiserver GoroutinesNumber of goroutines that currently exist-
apiserver.APIServiceRegistrationController.depthKube apiserver APIService Registration Controller DepthCurrent depth of workqueue: APIServiceRegistrationController-
apiserver.etcd.object.countsKube apiserver ETCD Object CountsNumber of stored objects at the time of last check split by kind-
apiserver.rest.client.requests.total.countKube apiserver Rest Client Requests Total CountNumber of HTTP requests, partitioned by status code, method, and host-