Using UCP cluster metrics with Prometheus

Estimated reading time: 5 minutes

UCP metrics

The following table lists the metrics that UCP exposes in Prometheus, along with descriptions. Note that only the metrics labeled with ucp_ are documented. Other metrics are exposed in Prometheus but are not documented.

Name Units Description Labels Metric source
ucp_controller_services number of services The total number of Swarm services   Controller
ucp_engine_container_cpu_percent percentage The percentage of CPU time this container is using. container labels Node
ucp_engine_container_cpu_total_time_nanoseconds nanoseconds Total CPU time used by this container in nanoseconds container labels Node
ucp_engine_container_health 0.0 or 1.0 Whether or not this container is healthy, according to its healthcheck. Note that if this value is 0, it just means that the container is not reporting healthy; it might not have a healthcheck defined at all, or its healthcheck might not have returned any results yet container labels Node
ucp_engine_container_memory_max_usage_bytes bytes Maximum memory used by this container in bytes container labels Node
ucp_engine_container_memory_usage_bytes bytes Current memory used by this container in bytes container labels Node
ucp_engine_container_memory_usage_percent percentage Percentage of total node memory currently being used by this container container labels Node
ucp_engine_container_network_rx_bytes_total bytes Number of bytes received by this container on this network in the last sample container networking labels Node
ucp_engine_container_network_rx_dropped_packets_total number of packets Number of packets bound for this container on this network that were dropped in the last sample container networking labels Node
ucp_engine_container_network_rx_errors_total number of errors Number of received network errors for this container on this network in the last sample container networking labels Node
ucp_engine_container_network_rx_packets_total number of packets Number of received packets for this container on this network in the last sample container networking labels Node
ucp_engine_container_network_tx_bytes_total bytes Number of bytes sent by this container on this network in the last sample container networking labels Node
ucp_engine_container_network_tx_dropped_packets_total number of packets Number of packets sent from this container on this network that were dropped in the last sample container networking labels Node
ucp_engine_container_network_tx_errors_total number of errors Number of sent network errors for this container on this network in the last sample container networking labels Node
ucp_engine_container_network_tx_packets_total number of packets Number of sent packets for this container on this network in the last sample container networking labels Node
ucp_engine_container_unhealth 0.0 or 1.0 Whether or not this container is unhealthy, according to its healthcheck. Note that if this value is 0, it just means that the container is not reporting unhealthy; it might not have a healthcheck defined at all, or its healthcheck might not have returned any results yet container labels Node
ucp_engine_containers number of containers Total number of containers on this node node labels Node
ucp_engine_cpu_total_time_nanoseconds nanoseconds System CPU time used by this container in nanoseconds container labels Node
ucp_engine_disk_free_bytes bytes Free disk space on the Docker root directory on this node in bytes. Note that this metric is not available for Windows nodes node labels Node
ucp_engine_disk_total_bytes bytes Total disk space on the Docker root directory on this node in bytes. Note that this metric is not available for Windows nodes node labels Node
ucp_engine_images number of images Total number of images on this node node labels Node
ucp_engine_memory_total_bytes bytes Total amount of memory on this node in bytes node labels Node
ucp_engine_networks number of networks Total number of networks on this node node labels Node
ucp_engine_node_health 0.0 or 1.0 Whether or not this node is healthy, as determined by UCP nodeName: node name, nodeAddr: node IP address Controller
ucp_engine_num_cpu_cores number of cores Number of CPU cores on this node node labels Node
ucp_engine_pod_container_ready 0.0 or 1.0 Whether or not this container in a Kubernetes pod is ready, as determined by its readiness probe. pod labels Controller
ucp_engine_pod_ready 0.0 or 1.0 Whether or not this Kubernetes pod is ready, as determined by its readiness probe. pod labels Controller
ucp_engine_volumes number of volumes Total number of volumes on this node node labels Node

Metrics labels

Metrics exposed by UCP in Prometheus have standardized labels, depending on the resource that they are measuring. The following table lists some of the labels that are used, along with their values:

Container labels

Label name Value
collection The collection ID of the collection this container is in, if any
container The ID of this container
image The name of this container’s image
manager “true” if the container’s node is a UCP manager, “false” otherwise
name The name of the container
podName If this container is part of a Kubernetes pod, this is the pod’s name
podNamespace If this container is part of a Kubernetes pod, this is the pod’s namespace
podContainerName If this container is part of a Kubernetes pod, this is the container’s name in the pod spec
service If this container is part of a Swarm service, this is the service ID
stack If this container is part of a Docker compose stack, this is the name of the stack

Container networking labels

The following metrics measure network activity for a given network attached to a given container. They have the same labels as Container labels, with one addition:

Label name Value
network The ID of the network

Node labels

Label name Value
manager “true” if the node is a UCP manager, “false” otherwise

Metric source

UCP exports metrics on every node and also exports additional metrics from every controller. The metrics that are exported from controllers are cluster-scoped, for example, the total number of Swarm services. Metrics that are exported from nodes are specific to those nodes, for example, the total memory on that node.

prometheus, metrics, ucp