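The Datadog Agent collects metrics from Druid through DogStatsD. DogStatsD collects metrics on Druid queries, ingestion, and coordination data. For more information, see the Druid metrics documentation.
In addition to collecting metrics, the Agent also sends a Service Check related to Druid's health.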
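To make the DogStatsD path concrete, the following is a minimal sketch of the datagram format that a StatsD client sends to the Agent over UDP: `<name>:<value>|<type>|#<tags>`. The helper names `format_dogstatsd` and `send_datagram` are hypothetical, and the `druid_service:druid/broker` tag is only an illustrative value. This is not how the statsd-emitter extension is implemented; it only shows the wire format.

```python
import socket

# Hypothetical helper: build one DogStatsD datagram by hand.
def format_dogstatsd(name, value, metric_type, tags=None):
    """Return '<name>:<value>|<type>' with an optional '|#tag1,tag2' suffix."""
    type_codes = {"gauge": "g", "count": "c"}
    datagram = f"{name}:{value}|{type_codes[metric_type]}"
    if tags:
        datagram += "|#" + ",".join(tags)
    return datagram

# Hypothetical helper: send the datagram to a local Agent.
def send_datagram(datagram, host="127.0.0.1", port=8125):
    """Send over UDP. UDP is connectionless, so a successful send
    does not confirm that an Agent actually received the metric."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(datagram.encode("utf-8"), (host, port))

if __name__ == "__main__":
    packet = format_dogstatsd(
        "druid.query.count", 1, "count", ["druid_service:druid/broker"]
    )
    print(packet)
    send_datagram(packet)
```

Sending a test datagram like this can help confirm that the DogStatsD port is reachable before Druid itself is configured.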
This integration requires Druid 0.16 or later.
Both steps below are required for the Druid integration to work. Install the Datadog Agent before you begin.
Configure the Druid check included in the Datadog Agent package to collect health metrics and service checks.
Edit the druid.d/conf.yaml file in the conf.d/ folder. See the sample druid.d/conf.yaml for all available configuration options.
Use the statsd-emitter extension to connect Druid to DogStatsD (included in the Datadog Agent) for metric collection.
The following steps configure the statsd-emitter extension to collect most Druid metrics.
Install the Druid extension statsd-emitter:
$ java \
-cp "lib/*" \
-Ddruid.extensions.directory="./extensions" \
-Ddruid.extensions.hadoopDependenciesDir="hadoop-dependencies" \
org.apache.druid.cli.Main tools pull-deps \
--no-default-hadoop \
-c "org.apache.druid.extensions.contrib:statsd-emitter:0.15.0-incubating"
For more information about this step, see the official guide for loading Druid extensions.
Update the Druid Java properties by adding the following configuration:
# Add `statsd-emitter` to the extensions list to be loaded
druid.extensions.loadList=[..., "statsd-emitter"]
# By default, the Druid emission period is 1 minute (PT1M).
# We recommend using 15 seconds instead:
druid.monitoring.emissionPeriod=PT15S
# Use `statsd-emitter` extension as metric emitter
druid.emitter=statsd
# Configure `statsd-emitter` endpoint
druid.emitter.statsd.hostname=127.0.0.1
druid.emitter.statsd.port=8125
# Configure `statsd-emitter` to use dogstatsd format. Must be set to true, otherwise tags are not reported correctly to Datadog.
druid.emitter.statsd.dogstatsd=true
druid.emitter.statsd.dogstatsdServiceAsTag=true
Restart Druid to start sending Druid metrics to the Agent through DogStatsD.
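Before restarting Druid, it can help to confirm that the properties file contains the emitter settings listed above. The following is a minimal sketch, not an official tool: `parse_properties` and `find_problems` are hypothetical helpers, and the parser handles only simple single-line `key=value` or `key:value` entries (both separators are valid in Java properties files).

```python
# Keys the statsd-emitter setup relies on; None means "any non-empty value".
REQUIRED = {
    "druid.emitter": "statsd",
    "druid.emitter.statsd.hostname": None,
    "druid.emitter.statsd.port": None,
    "druid.emitter.statsd.dogstatsd": "true",
}

def parse_properties(text):
    """Parse simple single-line properties; skips comments and blank lines."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        # Split on the first '=' or ':', whichever appears earlier.
        positions = [i for i in (line.find("="), line.find(":")) if i != -1]
        if not positions:
            continue
        cut = min(positions)
        props[line[:cut].strip()] = line[cut + 1:].strip()
    return props

def find_problems(props):
    """Return a list of human-readable problems; empty means all checks pass."""
    problems = []
    for key, expected in REQUIRED.items():
        actual = props.get(key)
        if not actual:
            problems.append(f"{key} is not set")
        elif expected is not None and actual != expected:
            problems.append(f"{key} should be {expected}, found {actual}")
    if "statsd-emitter" not in props.get("druid.extensions.loadList", ""):
        problems.append("statsd-emitter is missing from druid.extensions.loadList")
    return problems
```

Calling `find_problems(parse_properties(open(path).read()))` on a properties file (the path is whatever your deployment uses) returns an empty list when every required key is present.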
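Enable Druid service check collection using the default configuration in the druid.d/conf.yaml file. See the sample druid.d/conf.yaml for all available configuration options.
Available for Agent versions 6.0 and later
Log collection is disabled by default in the Datadog Agent. Enable it in your datadog.yaml file: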
logs_enabled: true
Uncomment and edit this configuration block at the bottom of your druid.d/conf.yaml:
logs:
  - type: file
    path: '<PATH_TO_DRUID_DIR>/var/sv/*.log'
    source: druid
    service: '<SERVICE_NAME>'
    log_processing_rules:
      - type: multi_line
        name: new_log_start_with_date
        pattern: \d{4}\-\d{2}\-\d{2}
Change the path and service parameter values to match your environment.
Restart the Agent.
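The multi_line rule in the log configuration treats any line that starts with a date (`\d{4}\-\d{2}\-\d{2}`) as the beginning of a new log entry and folds the lines that follow, such as Java stack traces, into that entry. The following sketch illustrates that grouping behavior under this assumption; `group_lines` is a hypothetical helper, not the Agent's implementation.

```python
import re

# Same pattern as in the log configuration: a line beginning with YYYY-MM-DD.
NEW_ENTRY = re.compile(r"\d{4}\-\d{2}\-\d{2}")

def group_lines(lines):
    """Group raw lines into entries; a date-prefixed line starts a new entry."""
    entries = []
    for line in lines:
        # re.match anchors at the start of the line.
        if NEW_ENTRY.match(line) or not entries:
            entries.append(line)
        else:
            entries[-1] += "\n" + line
    return entries

if __name__ == "__main__":
    raw = [
        "2024-01-01T00:00:00 ERROR query failed",
        "java.lang.RuntimeException: example",
        "\tat Example.run(Example.java:1)",
        "2024-01-01T00:00:01 INFO recovered",
    ]
    for entry in group_lines(raw):
        print(repr(entry))
```

If your Druid logs use a different timestamp layout, the pattern in druid.d/conf.yaml needs to change accordingly.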
Run the Agent's status subcommand and look for druid under the Checks section.
druid.coordinator.segment.count (gauge) | Coordinator segment count. Shown as segment |
druid.historical.segment.count (gauge) | Historical segment count. Shown as segment |
druid.ingest.events.buffered (gauge) | Number of events queued in the EventReceiverFirehose's buffer. Shown as event |
druid.ingest.events.duplicate (count) | Number of events rejected because the events are duplicated. Shown as event |
druid.ingest.events.messageGap (gauge) | Time gap between the data time in event and current system time. Shown as millisecond |
druid.ingest.events.processed (count) | Number of events successfully processed per emission period. Shown as event |
druid.ingest.events.thrownAway (count) | Number of events rejected because they are outside the windowPeriod. Shown as event |
druid.ingest.events.unparseable (count) | Number of events rejected because the events are unparsable. Shown as event |
druid.ingest.handoff.failed (count) | Number of handoffs that failed. |
druid.ingest.kafka.avgLag (gauge) | Average lag between the offsets consumed by the Kafka indexing tasks and latest offsets in Kafka brokers across all partitions. Minimum emission period for this metric is one minute. Shown as offset |
druid.ingest.kafka.lag (gauge) | Total lag between the offsets consumed by the Kafka indexing tasks and latest offsets in Kafka brokers across all partitions. Minimum emission period for this metric is one minute. Shown as offset |
druid.ingest.kafka.maxLag (gauge) | Max lag between the offsets consumed by the Kafka indexing tasks and latest offsets in Kafka brokers across all partitions. Minimum emission period for this metric is one minute. Shown as offset |
druid.ingest.merge.cpu (gauge) | CPU time in nanoseconds spent on merging intermediate segments. Shown as nanosecond |
druid.ingest.merge.time (gauge) | Milliseconds spent merging intermediate segments. Shown as millisecond |
druid.ingest.persists.backPressure (gauge) | Milliseconds spent creating persist tasks and blocking waiting for them to finish. Shown as millisecond |
druid.ingest.persists.count (count) | Number of times persist occurred. |
druid.ingest.persists.cpu (gauge) | CPU time in nanoseconds spent on doing intermediate persist. Shown as nanosecond |
druid.ingest.persists.failed (count) | Number of persists that failed. |
druid.ingest.persists.time (gauge) | Milliseconds spent doing intermediate persist. Shown as millisecond |
druid.ingest.rows.output (count) | Number of Druid rows persisted. Shown as row |
druid.jvm.bufferpool.capacity (gauge) | Bufferpool capacity in bytes. Shown as byte |
druid.jvm.bufferpool.count (gauge) | Bufferpool count in bytes. Shown as byte |
druid.jvm.bufferpool.used (gauge) | Bufferpool used in bytes. Shown as byte |
druid.jvm.gc.count (count) | Garbage collection count. |
druid.jvm.gc.cpu (gauge) | CPU time in nanoseconds spent on garbage collection. Shown as nanosecond |
druid.jvm.mem.committed (gauge) | Committed memory in bytes. Shown as byte |
druid.jvm.mem.init (gauge) | Initial memory in bytes. Shown as byte |
druid.jvm.mem.max (gauge) | Max memory in bytes. Shown as byte |
druid.jvm.mem.used (gauge) | Used memory in bytes. Shown as byte |
druid.jvm.pool.committed (gauge) | Committed pool in bytes. Shown as byte |
druid.jvm.pool.init (gauge) | Initial pool in bytes. Shown as byte |
druid.jvm.pool.max (gauge) | Max pool in bytes. Shown as byte |
druid.jvm.pool.used (gauge) | Pool used in bytes. Shown as byte |
druid.query.bytes (count) | Number of bytes returned in query response. Shown as byte |
druid.query.cache.delta.averageBytes (count) | Delta average cache entry byte size. Shown as byte |
druid.query.cache.delta.errors (count) | Delta number of cache errors. |
druid.query.cache.delta.evictions (count) | Delta number of cache evictions. Shown as eviction |
druid.query.cache.delta.hitRate (count) | Delta cache hit rate. Shown as fraction |
druid.query.cache.delta.hits (count) | Delta number of cache hits. Shown as hit |
druid.query.cache.delta.misses (count) | Delta number of cache misses. Shown as miss |
druid.query.cache.delta.numEntries (count) | Delta number of cache entries. |
druid.query.cache.delta.sizeBytes (count) | Delta size in bytes of cache entries. Shown as byte |
druid.query.cache.delta.timeouts (count) | Delta number of cache timeouts. |
druid.query.cache.total.averageBytes (gauge) | Total average cache entry byte size. Shown as byte |
druid.query.cache.total.errors (gauge) | Total number of cache errors. |
druid.query.cache.total.evictions (gauge) | Total number of cache evictions. Shown as eviction |
druid.query.cache.total.hitRate (gauge) | Total cache hit rate. Shown as fraction |
druid.query.cache.total.hits (gauge) | Total number of cache hits. Shown as hit |
druid.query.cache.total.misses (gauge) | Total number of cache misses. Shown as miss |
druid.query.cache.total.numEntries (gauge) | Total number of cache entries. |
druid.query.cache.total.sizeBytes (gauge) | Total size in bytes of cache entries. Shown as byte |
druid.query.cache.total.timeouts (gauge) | Total number of cache timeouts. |
druid.query.count (count) | Number of total queries. Shown as query |
druid.query.cpu.time (gauge) | Microseconds of CPU time taken to complete a query. Shown as microsecond |
druid.query.failed.count (count) | Number of failed queries. Shown as query |
druid.query.interrupted.count (count) | Number of queries interrupted due to cancellation or timeout. Shown as query |
druid.query.intervalChunk.time (gauge) | Only emitted if interval chunking is enabled. Milliseconds required to query an interval chunk. This metric is deprecated and will be removed in the future because interval chunking is deprecated. See Query Context. Shown as millisecond |
druid.query.node.backpressure (gauge) | Milliseconds that the channel to this process has spent suspended due to backpressure. Shown as millisecond |
druid.query.node.bytes (count) | Number of bytes returned from querying individual historical/realtime processes. Shown as byte |
druid.query.node.time (gauge) | Milliseconds taken to query individual historical/realtime processes. Shown as millisecond |
druid.query.node.ttfb (gauge) | Time to first byte. Milliseconds elapsed until Broker starts receiving the response from individual historical/realtime processes. Shown as millisecond |
druid.query.segment.time (gauge) | Milliseconds taken to query individual segment. Includes time to page in the segment from disk. Shown as millisecond |
druid.query.segmentAndCache.time (gauge) | Milliseconds taken to query individual segment or hit the cache (if it is enabled on the Historical process). Shown as millisecond |
druid.query.success.count (count) | Number of queries successfully processed. Shown as query |
druid.query.time (gauge) | Milliseconds taken to complete a query. Shown as millisecond |
druid.query.wait.time (gauge) | Milliseconds spent waiting for a segment to be scanned. Shown as millisecond |
druid.segment.added.bytes (count) | Size in bytes of new segments created. Shown as byte |
druid.segment.assigned.count (count) | Number of segments assigned to be loaded in the cluster. Shown as segment |
druid.segment.cost.normalization (count) | Used in cost balancing. The normalization of hosting segments. |
druid.segment.cost.normalized (count) | Used in cost balancing. The normalized cost of hosting segments. |
druid.segment.cost.raw (count) | Used in cost balancing. The raw cost of hosting segments. |
druid.segment.deleted.count (count) | Number of segments dropped due to rules. Shown as segment |
druid.segment.dropQueue.count (gauge) | Number of segments to drop. Shown as segment |
druid.segment.dropped.count (count) | Number of segments dropped due to being overshadowed. Shown as segment |
druid.segment.loadQueue.count (gauge) | Number of segments to load. Shown as segment |
druid.segment.loadQueue.failed (gauge) | Number of segments that failed to load. Shown as segment |
druid.segment.loadQueue.size (gauge) | Size in bytes of segments to load. Shown as byte |
druid.segment.max (gauge) | Maximum byte limit available for segments. Shown as byte |
druid.segment.moved.bytes (count) | Size in bytes of segments moved/archived via the Move Task. Shown as byte |
druid.segment.moved.count (count) | Number of segments moved in the cluster. Shown as segment |
druid.segment.nuked.bytes (count) | Size in bytes of segments deleted via the Kill Task. Shown as byte |
druid.segment.overShadowed.count (gauge) | Number of overShadowed segments. Shown as segment |
druid.segment.pendingDelete (gauge) | On-disk size in bytes of segments that are waiting to be cleared out. Shown as byte |
druid.segment.scan.pending (gauge) | Number of segments in queue waiting to be scanned. Shown as unit |
druid.segment.size (gauge) | Size in bytes of available segments. Shown as byte |
druid.segment.unavailable.count (count) | Number of segments (not including replicas) left to load until segments that should be loaded in the cluster are available for queries. Shown as segment |
druid.segment.underReplicated.count (count) | Number of segments (including replicas) left to load until segments that should be loaded in the cluster are available for queries. Shown as segment |
druid.segment.unneeded.count (count) | Number of segments dropped due to being marked as unused. Shown as segment |
druid.segment.used (gauge) | Bytes used for served segments. Shown as byte |
druid.segment.usedPercent (gauge) | Percentage of space used by served segments. Shown as fraction |
druid.service.health (gauge) | 1 if the service is healthy, 0 otherwise |
druid.sys.cpu (gauge) | CPU used. Shown as percent |
druid.sys.disk.read.count (count) | Reads from disk. Shown as read |
druid.sys.disk.read.size (count) | Bytes read from disk. Can be used to determine how much paging is occurring with regard to segments. Shown as byte |
druid.sys.disk.write.count (count) | Writes to disk. Shown as write |
druid.sys.disk.write.size (count) | Bytes written to disk. Can be used to determine how much paging is occurring with regard to segments. Shown as byte |
druid.sys.fs.max (gauge) | Filesystem bytes max. Shown as byte |
druid.sys.fs.used (gauge) | Filesystem bytes used. Shown as byte |
druid.sys.mem.max (gauge) | Memory max. Shown as byte |
druid.sys.mem.used (gauge) | Memory used. Shown as byte |
druid.sys.net.read.size (count) | Bytes read from the network. Shown as byte |
druid.sys.net.write.size (count) | Bytes written to the network. Shown as byte |
druid.sys.storage.used (gauge) | Disk space used. Shown as byte |
druid.sys.swap.free (gauge) | Free swap in bytes. Shown as byte |
druid.sys.swap.max (gauge) | Max swap in bytes. Shown as byte |
druid.sys.swap.pageIn (gauge) | Paged in swap. Shown as page |
druid.sys.swap.pageOut (gauge) | Paged out swap. Shown as page |
druid.task.failed.count (count) | Number of failed tasks per emission period. This metric is only available if the TaskCountStatsMonitor module is included. Shown as task |
druid.task.pending.count (count) | Number of current pending tasks. This metric is only available if the TaskCountStatsMonitor module is included. Shown as task |
druid.task.run.time (gauge) | Milliseconds taken to run a task. Shown as millisecond |
druid.task.running.count (count) | Number of current running tasks. This metric is only available if the TaskCountStatsMonitor module is included. Shown as task |
druid.task.success.count (count) | Number of successful tasks per emission period. This metric is only available if the TaskCountStatsMonitor module is included. Shown as task |
druid.task.waiting.count (count) | Number of current waiting tasks. This metric is only available if the TaskCountStatsMonitor module is included. Shown as task |
The Druid check does not include any events.
druid.service.can_connect
Returns CRITICAL if the check cannot connect to the Druid service. Returns OK otherwise.
Statuses: ok, critical
druid.service.health
Returns CRITICAL if the Druid service is not healthy. Returns OK otherwise.
Statuses: ok, critical
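The two service checks can be approximated by hand when debugging. The sketch below assumes the Druid process exposes the `/status/health` endpoint, which returns the JSON body `true` when the process is healthy. `health_status` and `check_druid` are hypothetical helpers, and this is not the Agent check's actual implementation.

```python
import urllib.error
import urllib.request

OK, CRITICAL = "ok", "critical"

def health_status(body):
    """Map the health endpoint body to a status: 'true' means healthy."""
    return OK if body.strip().lower() == "true" else CRITICAL

def check_druid(base_url, timeout=5.0):
    """Return (can_connect, health) statuses for one Druid process.

    base_url is an assumption, for example 'http://localhost:8888'.
    """
    url = base_url.rstrip("/") + "/status/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8")
    except (urllib.error.URLError, OSError):
        # Unreachable service: both checks are reported as critical here.
        return CRITICAL, CRITICAL
    return OK, health_status(body)
```

For example, `check_druid("http://localhost:8888")` returns `("ok", "ok")` only when the process both answers and reports itself healthy.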
Need help? Contact Datadog support.