In Observability Pipelines, your pipelines are composed of components that collect, process, and route your observability data. The health of your pipelines and components is indicated by health statuses and graphs, as well as resource utilization and data delivery graphs.
Health statuses are determined by specific metrics based on thresholds and default time windows. The available statuses are as follows:
Healthy
: Indicates the Worker is not falling behind.

Warning
: Indicates the Worker is not performing optimally and is at risk of falling behind. The Worker may fall behind due to issues such as a downstream destination or service causing back pressure to build up, or insufficient resources being provisioned for the Workers.

Critical
: Indicates that the Worker is falling behind. If the Worker is falling behind, it may be at risk of dropping data; however, the Worker will not drop data unintentionally as long as your pipelines are architected and configured correctly.

Internal metrics, which are grouped by health, data delivery, and resource utilization, drive the overall health status of your pipeline and its components.
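As a rough illustration of how per-component statuses could roll up into one pipeline status, the sketch below takes the most severe status among the components. The worst-of rule, the `Health` enum, and the `overall_health` function are assumptions made for this example; the text above does not specify how Observability Pipelines performs the aggregation.

```python
from enum import IntEnum


class Health(IntEnum):
    """The three statuses described above, ordered by severity."""
    HEALTHY = 0
    WARNING = 1
    CRITICAL = 2


def overall_health(component_statuses):
    """Return the most severe status among the components.

    Assumption: the pipeline is only as healthy as its least healthy
    component. A pipeline with no components is treated as HEALTHY.
    """
    return max(component_statuses.values(), default=Health.HEALTHY)


# Hypothetical component names, for illustration only.
statuses = {
    "source": Health.HEALTHY,
    "filter_transform": Health.WARNING,
    "destination": Health.HEALTHY,
}
print(overall_health(statuses).name)  # WARNING
```

Using an ordered enum keeps the comparison explicit: a single `Critical` component outranks any number of `Healthy` ones.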
Health graphs are available for the following metrics:
Data delivery graphs are available for the following metrics:
Resource utilization graphs are available for the following metrics:
Metric | OK | Warning | Critical | Description |
---|---|---|---|---|
CPU usage | <= 0.85 | > 0.85 | N/A | Tracks how much CPU a Worker process is using. A value of 1 indicates that a Worker process does not have any more headroom in the host or compute units running it. This can lead to issues such as processing latency going out of bounds and upstream or downstream overload. |
Memory usage | >= 0.15 | < 0.15 | N/A | Tracks the amount of used and free memory on the host. The Worker is not memory bound, but high memory usage can indicate leaks. |
Disk usage | >= 0.20 | > 0.20 | N/A | Measures how full a given disk is. A value of 1 indicates that no data can be stored on the disk. A value of 0 indicates that the disk is empty. |

Metric | Sources | Transforms | Destinations | OK | Warning | Critical | Description |
---|---|---|---|---|---|---|---|
Events dropped | | | | == 0 | N/A | > 0 | Expected to always be 0. If you configured the Worker to intentionally drop data, for example using the filter transform, that data is not counted here. Therefore, a single dropped event indicates that the Worker is not in a healthy state. |
Total errors | | | | == 0 | > 0 | N/A | The total number of errors encountered by the component. These errors are also emitted as Diagnostic Logs, which provide more information about specific internal error logs. |
Utilization | | | | <= 0.95 | > 0.95 | N/A | Tracks the component's activity. A value of 0 indicates an idle component that is waiting for input. A value of 1 indicates a component that is never idle. A value greater than 0.95 indicates that the component is busy and likely a bottleneck in the processing topology. |
Lag time | | | | N/A | N/A | N/A | The raw time difference (in milliseconds) between the timestamp on the event and the timestamp of when the event was ingested by the Worker. High lag time or a change in the lag time (see below) indicates whether the Worker is falling behind due to back pressure from a downstream service, lack of resources provisioned to the Worker, or a bottleneck in the pipeline. |
Lag time rate of change | | | | <= 0 | > 0 | > 1 | Indicates whether there is a substantial delay between when the event is generated and when the Worker receives the data. If there is a delay, then the Worker is falling behind in receiving data from the source. A value of 0 indicates there is no additional lag between when the observability data is generated and when the Worker receives it. A value equal to or greater than 1 indicates that there is back pressure and a bottleneck. |
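To make the thresholds in the tables above concrete, here is a minimal Python sketch that maps a few metric values to a status and computes lag time as defined in the table. The function names are hypothetical and not part of any Datadog API; only the threshold values and the lag time definition come from the tables.

```python
from datetime import datetime, timezone


def classify_cpu_usage(usage):
    """CPU usage: OK at or below 0.85, Warning above it (no Critical level)."""
    return "OK" if usage <= 0.85 else "Warning"


def classify_utilization(utilization):
    """Component utilization: OK at or below 0.95, Warning above it."""
    return "OK" if utilization <= 0.95 else "Warning"


def classify_events_dropped(count):
    """Events dropped: any unintentional drop is Critical."""
    return "OK" if count == 0 else "Critical"


def classify_lag_rate_of_change(rate):
    """Lag time rate of change: OK <= 0, Warning > 0, Critical > 1."""
    if rate > 1:
        return "Critical"
    if rate > 0:
        return "Warning"
    return "OK"


def lag_time_ms(event_timestamp, ingested_timestamp):
    """Raw lag in milliseconds between event creation and Worker ingestion."""
    return (ingested_timestamp - event_timestamp).total_seconds() * 1000


# Illustrative timestamps: the event was ingested 2.5 seconds after creation.
created = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
ingested = datetime(2024, 1, 1, 12, 0, 2, 500000, tzinfo=timezone.utc)

print(lag_time_ms(created, ingested))    # 2500.0
print(classify_cpu_usage(0.90))          # Warning
print(classify_utilization(0.50))        # OK
print(classify_events_dropped(3))        # Critical
print(classify_lag_rate_of_change(0.4))  # Warning
```

Lag time has no threshold of its own in the table, so the sketch only computes it; the status comes from its rate of change.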