To collect system metrics such as CPU, disk, and memory usage, enable the host metrics receiver in your Collector.
For more information, including supported operating systems, see the OpenTelemetry project documentation for the host metrics receiver.
Setup
Add the following lines to your Collector configuration:
receivers:
  hostmetrics:
    collection_interval: 10s
    scrapers:
      paging:
        metrics:
          system.paging.utilization:
            enabled: true
      cpu:
        metrics:
          system.cpu.utilization:
            enabled: true
      disk:
      filesystem:
        metrics:
          system.filesystem.utilization:
            enabled: true
      load:
      memory:
      network:
      processes:
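A receiver only takes effect once a pipeline references it. The fragment below is a minimal sketch of that wiring, assuming an exporter named `datadog` is already defined under `exporters` elsewhere in the same file (it is not part of the receiver snippet):

```yaml
service:
  pipelines:
    metrics:
      receivers: [hostmetrics]
      # Assumption: a `datadog` exporter is configured elsewhere in this file.
      exporters: [datadog]
```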
Set up the host metrics receiver on each node from which metrics need to be collected. To collect host metrics from every node in your cluster, deploy the host metrics receiver as a DaemonSet Collector. Add the following to the Collector configuration:
receivers:
  hostmetrics:
    collection_interval: 10s
    scrapers:
      paging:
        metrics:
          system.paging.utilization:
            enabled: true
      cpu:
        metrics:
          system.cpu.utilization:
            enabled: true
          system.cpu.physical.count:
            enabled: true
          system.cpu.logical.count:
            enabled: true
          system.cpu.frequency:
            enabled: true
      disk:
      filesystem:
        metrics:
          system.filesystem.utilization:
            enabled: true
      load:
      memory:
      network:
      processes:
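When the Collector runs inside a pod, the scrapers see the container's view of the system unless the node's filesystem is made visible to them. The sketch below shows one common way to handle this on Linux nodes, using the receiver's `root_path` option together with a host mount. The volume name, container name, and the `/hostfs` mount point are illustrative assumptions, not values taken from this page:

```yaml
# Collector configuration fragment
receivers:
  hostmetrics:
    # Assumption: the node's root filesystem is mounted at /hostfs (see below).
    root_path: /hostfs
---
# DaemonSet pod spec fragment (hypothetical names)
volumes:
  - name: hostfs
    hostPath:
      path: /
containers:
  - name: otel-collector
    volumeMounts:
      - name: hostfs
        mountPath: /hostfs
        readOnly: true
        mountPropagation: HostToContainer
```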
Data collected
Host metrics are collected by the host metrics receiver. For information about setting up the receiver, see OpenTelemetry Collector Datadog Exporter.
These metrics are mapped to Datadog metrics and used in Datadog views.
Note: To correlate trace and host metrics, configure Universal Service Monitoring attributes for each service, and set the host.name resource attribute to the corresponding underlying host for both service and Collector instances.
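One way to populate `host.name` on the Collector side is the `resourcedetection` processor with its `system` detector, which reads the hostname from the operating system. This is a sketch under the assumption that the OS hostname is the value you want. The processor must also be listed under `processors` in the metrics pipeline, and application services typically set the same attribute themselves, for example through the standard `OTEL_RESOURCE_ATTRIBUTES` environment variable.

```yaml
processors:
  resourcedetection:
    # The system detector adds host.name (and os.type) from the local OS.
    detectors: [system]
    # Keep a host.name that is already present on incoming telemetry.
    override: false
```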
The following table shows which Datadog host metric names correspond to which OpenTelemetry host metric names and, where applicable, the arithmetic applied to the OTel host metric to convert it to Datadog units during mapping.
OTEL | DATADOG | DESCRIPTION | FILTER | TRANSFORM |
---|---|---|---|---|
system.cpu.load_average.15m | system.load.15 | Average CPU Load over 15 minutes. | | |
system.cpu.load_average.1m | system.load.1 | Average CPU Load over 1 minute. | | |
system.cpu.load_average.5m | system.load.5 | Average CPU Load over 5 minutes. | | |
system.cpu.utilization | system.cpu.idle | Difference in system.cpu.time since the last measurement per logical CPU, divided by the elapsed time (value in interval [0,1]). | state : idle | × 100 |
system.cpu.utilization | system.cpu.iowait | Difference in system.cpu.time since the last measurement per logical CPU, divided by the elapsed time (value in interval [0,1]). | state : wait | × 100 |
system.cpu.utilization | system.cpu.stolen | Difference in system.cpu.time since the last measurement per logical CPU, divided by the elapsed time (value in interval [0,1]). | state : steal | × 100 |
system.cpu.utilization | system.cpu.system | Difference in system.cpu.time since the last measurement per logical CPU, divided by the elapsed time (value in interval [0,1]). | state : system | × 100 |
system.cpu.utilization | system.cpu.user | Difference in system.cpu.time since the last measurement per logical CPU, divided by the elapsed time (value in interval [0,1]). | state : user | × 100 |
system.filesystem.utilization | system.disk.in_use | Fraction of filesystem bytes used. | | |
system.memory.usage | system.mem.total | Bytes of memory in use. | | × 1048576 |
system.memory.usage | system.mem.usable | Bytes of memory in use. | state : free, cached, buffered | × 1048576 |
system.network.io | system.net.bytes_rcvd | The number of bytes transmitted and received. | direction : receive | |
system.network.io | system.net.bytes_sent | The number of bytes transmitted and received. | direction : transmit | |
system.paging.usage | system.swap.free | Swap (unix) or pagefile (windows) usage. | state : free | × 1048576 |
system.paging.usage | system.swap.used | Swap (unix) or pagefile (windows) usage. | state : used | × 1048576 |
For more information, see OpenTelemetry Metrics Mapping.
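As a worked illustration of the FILTER and TRANSFORM columns, take the `system.cpu.idle` row: only data points with `state: idle` are selected, and the OTel fraction in [0, 1] is multiplied by 100. The value 0.93 below is a made-up sample, not a measurement from this page:

```latex
\text{system.cpu.idle} = 100 \times \text{system.cpu.utilization}\big|_{\text{state}=\text{idle}},
\qquad \text{e.g. } 100 \times 0.93 = 93
```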
Full example configuration
For a full working example configuration with the Datadog exporter, see host-metrics.yaml.
Example logging output
ResourceMetrics #1
Resource SchemaURL: https://opentelemetry.io/schemas/1.9.0
Resource attributes:
-> k8s.pod.ip: Str(192.168.63.232)
-> cloud.provider: Str(aws)
-> cloud.platform: Str(aws_ec2)
-> cloud.region: Str(us-east-1)
-> cloud.account.id: Str(XXXXXXXXX)
-> cloud.availability_zone: Str(us-east-1c)
-> host.id: Str(i-07e7d48cedbec9e86)
-> host.image.id: Str(ami-0cbbb5a8c6f670bb6)
-> host.type: Str(m5.large)
-> host.name: Str(ip-192-168-49-157.ec2.internal)
-> os.type: Str(linux)
-> kube_app_instance: Str(opentelemetry-collector-gateway)
-> k8s.pod.name: Str(opentelemetry-collector-gateway-688585b95-l2lds)
-> k8s.pod.uid: Str(d8063a97-f48f-4e9e-b180-8c78a56d0a37)
-> k8s.replicaset.uid: Str(9e2d5331-f763-43a3-b0be-9d89c0eaf0cd)
-> k8s.replicaset.name: Str(opentelemetry-collector-gateway-688585b95)
-> k8s.deployment.name: Str(opentelemetry-collector-gateway)
-> kube_app_name: Str(opentelemetry-collector)
-> k8s.namespace.name: Str(otel-ds-gateway)
-> k8s.pod.start_time: Str(2023-11-20T12:53:08Z)
-> k8s.node.name: Str(ip-192-168-49-157.ec2.internal)
ScopeMetrics #0
ScopeMetrics SchemaURL:
InstrumentationScope otelcol/hostmetricsreceiver/memory 0.88.0-dev
Metric #0
Descriptor:
-> Name: system.memory.usage
-> Description: Bytes of memory in use.
-> Unit: By
-> DataType: Sum
-> IsMonotonic: false
-> AggregationTemporality: Cumulative
NumberDataPoints #0
Data point attributes:
-> state: Str(used)
StartTimestamp: 2023-08-21 13:45:37 +0000 UTC
Timestamp: 2023-11-20 13:04:19.489045896 +0000 UTC
Value: 1153183744
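Output in this shape is what the Collector prints when metrics pass through a console exporter at detailed verbosity. A minimal sketch, assuming a recent Collector release where that exporter is named `debug` (older releases shipped the equivalent as `logging`):

```yaml
exporters:
  debug:
    # Print every resource, scope, metric, and data point in full.
    verbosity: detailed

service:
  pipelines:
    metrics:
      receivers: [hostmetrics]
      exporters: [debug]
```

This is useful for confirming that `host.name` and the expected `system.*` metrics are present before sending data to a backend.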