TiDB

Supported OS: Mac OS, Windows

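Integration version 2.1.0

Overview

Connect your TiDB cluster to Datadog in order to:

Collect key TiDB metrics from your cluster.
Collect logs from your cluster, such as TiDB/TiKV/TiFlash logs and slow query logs.
Visualize cluster performance on the provided dashboard.

Note:
This integration requires TiDB 4.0 or later. For TiDB Cloud, see the TiDB Cloud integration.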

Setup

Installation

First, download and launch the Datadog Agent.

Then, manually install the TiDB check. Instructions vary depending on the environment.

Run datadog-agent integration install -t datadog-tidb==<INTEGRATION_VERSION>.

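Configuration

Metric collection

Edit the tidb.d/conf.yaml file in the conf.d/ folder at the root of your Agent's configuration directory to start collecting your TiDB performance data. See the sample tidb.d/conf.yaml for all available configuration options.

The sample tidb.d/conf.yaml only configures the PD instance. You need to configure the other instances in your TiDB cluster manually. For example: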

```
init_config:

instances:

  - pd_metric_url: http://localhost:2379/metrics
    send_distribution_buckets: true
    tags:
      - cluster_name:cluster01

  - tidb_metric_url: http://localhost:10080/metrics
    send_distribution_buckets: true
    tags:
      - cluster_name:cluster01

  - tikv_metric_url: http://localhost:20180/metrics
    send_distribution_buckets: true
    tags:
      - cluster_name:cluster01

  - tiflash_metric_url: http://localhost:8234/metrics
    send_distribution_buckets: true
    tags:
      - cluster_name:cluster01

  - tiflash_proxy_metric_url: http://localhost:20292/metrics
    send_distribution_buckets: true
    tags:
      - cluster_name:cluster01
```

Restart the Agent.
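
When a cluster has several hosts per component, writing one instance per endpoint by hand gets repetitive. The following Python sketch generates the instances section in the same layout as the example above. The port numbers come from that example; the host addresses, the helper names build_instances and render_conf, and the idea of generating the file with a script are illustrative assumptions, not part of the integration.

```python
# Default metric ports taken from the sample configuration above.
# Host addresses passed in below are placeholders for illustration.
COMPONENT_PORTS = {
    "pd_metric_url": 2379,
    "tidb_metric_url": 10080,
    "tikv_metric_url": 20180,
    "tiflash_metric_url": 8234,
    "tiflash_proxy_metric_url": 20292,
}


def build_instances(cluster_name, hosts_by_component):
    """Return one instance dict per (component, host) pair."""
    instances = []
    for url_key, hosts in hosts_by_component.items():
        port = COMPONENT_PORTS[url_key]
        for host in hosts:
            instances.append({
                url_key: f"http://{host}:{port}/metrics",
                "send_distribution_buckets": True,
                "tags": [f"cluster_name:{cluster_name}"],
            })
    return instances


def render_conf(instances):
    """Render the instances in the same YAML layout as the example above."""
    lines = ["init_config:", "", "instances:"]
    for inst in instances:
        lines.append("")
        first = True
        for key, value in inst.items():
            prefix = "  - " if first else "    "
            first = False
            if key == "tags":
                lines.append(f"{prefix}tags:")
                lines.extend(f"      - {tag}" for tag in value)
            elif isinstance(value, bool):
                # YAML booleans are lowercase.
                lines.append(f"{prefix}{key}: {str(value).lower()}")
            else:
                lines.append(f"{prefix}{key}: {value}")
    return "\n".join(lines) + "\n"


# Hypothetical topology: one PD host and two TiKV hosts.
conf = render_conf(build_instances("cluster01", {
    "pd_metric_url": ["10.0.1.1"],
    "tikv_metric_url": ["10.0.1.2", "10.0.1.3"],
}))
print(conf)
```

The printed text can be reviewed and then saved as tidb.d/conf.yaml. If your components listen on non-default ports, adjust COMPONENT_PORTS accordingly.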

Log collection

Available for Agent versions 6.0 and later

Log collection is disabled by default in the Datadog Agent. Enable it in your datadog.yaml file:
```
logs_enabled: true
```

Add this configuration block to your tidb.d/conf.yaml file to start collecting your TiDB logs:

```
logs:
 # pd log
 - type: file
   path: "/tidb-deploy/pd-2379/log/pd*.log"
   service: "tidb-cluster"
   source: "pd"

 # tikv log
 - type: file
   path: "/tidb-deploy/tikv-20160/log/tikv*.log"
   service: "tidb-cluster"
   source: "tikv"

 # tidb log
 - type: file
   path: "/tidb-deploy/tidb-4000/log/tidb*.log"
   service: "tidb-cluster"
   source: "tidb"
   exclude_paths:
     - /tidb-deploy/tidb-4000/log/tidb_slow_query.log
 - type: file
   path: "/tidb-deploy/tidb-4000/log/tidb_slow_query*.log"
   service: "tidb-cluster"
   source: "tidb"
   log_processing_rules:
     - type: multi_line
       name: new_log_start_with_datetime
       pattern: '#\sTime:'
   tags:
     - "custom_format:tidb_slow_query"

 # tiflash log
 - type: file
   path: "/tidb-deploy/tiflash-9000/log/tiflash*.log"
   service: "tidb-cluster"
   source: "tiflash"
```

Change the path and service values according to your cluster's configuration.

Use the following commands to show all log paths:

```
# show deploying directories
tiup cluster display <YOUR_CLUSTER_NAME>
# find specific logging file path by command arguments
ps -fwwp <TIDB_PROCESS_PID/PD_PROCESS_PID/etc.>
```

Restart the Agent.
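
The multi_line rule in the slow query log entry above treats each line that begins with "# Time:" as the start of a new log event and attaches the lines that follow to it. The Python sketch below imitates that grouping so you can check the pattern against your own log files before deploying it. It assumes the pattern is matched at the start of each line, and the sample lines are illustrative rather than copied from a real cluster.

```python
import re

# Same pattern as in the log configuration above. It is applied with
# re.match, i.e. anchored at the start of each line (an assumption about
# how the Agent applies multi_line patterns).
NEW_ENTRY = re.compile(r"#\sTime:")


def group_slow_query_entries(lines):
    """Group raw lines into entries, starting a new entry at each '# Time:' line."""
    entries = []
    current = []
    for line in lines:
        if NEW_ENTRY.match(line) and current:
            # A new entry begins: flush the lines collected so far.
            entries.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        entries.append("\n".join(current))
    return entries


# Illustrative input in the general shape of a TiDB slow query log.
sample = [
    "# Time: 2024-01-01T10:00:00.000000+08:00",
    "# Query_time: 1.5",
    "select * from t1;",
    "# Time: 2024-01-01T10:00:05.000000+08:00",
    "# Query_time: 2.25",
    "select * from t2;",
]

for entry in group_slow_query_entries(sample):
    print(entry)
    print("---")
```

With this input the six lines collapse into two entries, one per statement. Note that "# Query_time:" does not start a new entry, because the pattern requires "Time:" directly after the "# " prefix.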

Validation

Run the Agent's status subcommand and look for tidb under the Checks section.

Data Collected

Metrics


tidb_cluster.tidb_executor_statement_total (count)	The total number of statements executed. Shown as execution.
tidb_cluster.tidb_server_execute_error_total (count)	The total number of execution errors. Shown as error.
tidb_cluster.tidb_server_connections (gauge)	The current number of connections in the TiDB server. Shown as connection.
tidb_cluster.tidb_server_handle_query_duration_seconds.count (count)	The total number of handled queries in the server. Shown as query.
tidb_cluster.tidb_server_handle_query_duration_seconds.sum (count)	The sum of handled query durations in the server. Shown as second.
tidb_cluster.tikv_engine_size_bytes (gauge)	The disk usage, in bytes, of TiKV instances. Shown as byte.
tidb_cluster.tikv_store_size_bytes (gauge)	The disk capacity, in bytes, of TiKV instances. Shown as byte.
tidb_cluster.tikv_io_bytes (count)	The I/O read/write bytes of TiKV instances. Shown as byte.
tidb_cluster.tiflash_store_size_used_bytes (gauge)	The disk usage, in bytes, of TiFlash instances. Shown as byte.
tidb_cluster.tiflash_store_size_capacity_bytes (gauge)	The disk capacity, in bytes, of TiFlash instances. Shown as byte.
tidb_cluster.process_cpu_seconds_total (count)	The CPU usage, in seconds, of TiDB/TiKV/TiFlash instances. Shown as second.
tidb_cluster.process_resident_memory_bytes (gauge)	The resident memory, in bytes, of TiDB/TiKV/TiFlash instances. Shown as byte.
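
Some of the metrics listed above are most useful in combination. As a rough sketch, going only by the descriptions in the list, the mean query duration for an interval is the .sum value divided by the .count value, and TiKV disk utilization is the engine size divided by the store size. The function names and input values below are hypothetical.

```python
def mean_query_duration(duration_sum, query_count):
    """Mean handled-query duration in seconds for one interval.

    duration_sum: value of tidb_server_handle_query_duration_seconds.sum
    query_count:  value of tidb_server_handle_query_duration_seconds.count
    """
    if query_count == 0:
        return 0.0  # no queries were handled in this interval
    return duration_sum / query_count


def disk_usage_ratio(used_bytes, capacity_bytes):
    """Fraction of capacity in use, e.g. tikv_engine_size_bytes / tikv_store_size_bytes."""
    if capacity_bytes == 0:
        return 0.0
    return used_bytes / capacity_bytes


# Hypothetical values for one collection interval.
print(mean_query_duration(12.5, 500))             # 12.5 s over 500 queries
print(disk_usage_ratio(250 * 1024**3, 1024**4))   # 250 GiB used of 1 TiB
```

The same arithmetic can be expressed directly in a dashboard query; the Python form is only meant to make the relationship between the metrics explicit.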

It is possible to use the metrics configuration option to collect additional metrics from a TiDB cluster.

Events

The TiDB check does not include any events.

Service Checks

Service checks are based on the tidb_cluster.prometheus.health metric. This check is controlled by the health_service_check configuration option, which defaults to true. You can modify this behavior in the tidb.yml file.

tidb_cluster.prometheus.health

Returns CRITICAL if the Agent cannot fetch Prometheus metrics, otherwise returns OK.

Statuses: ok, critical

Troubleshooting

Missing CPU and memory metrics for TiKV and TiFlash instances on macOS

CPU and memory metrics are not provided for TiKV and TiFlash instances in the following cases:

Running TiKV or TiFlash instances with tiup playground on macOS.
Running TiKV or TiFlash instances with docker-compose up on a new Apple M1 machine.

Too many metrics

The TiDB check enables Datadog's distribution metric type by default. This part of the data is quite large and may consume a lot of resources. You can modify this behavior in the tidb.yml file:

```
send_distribution_buckets: false
```

Because a TiDB cluster has many important metrics, the TiDB check sets max_returned_metrics to 10000 by default. If necessary, you can decrease max_returned_metrics in the tidb.yml file:

```
max_returned_metrics: 1000
```
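
Before changing either setting, it can help to know how many samples an endpoint actually exposes and how many of them are histogram buckets. The Python sketch below counts both in a payload in Prometheus text exposition format. It is a rough estimate only: exactly how the check counts samples against max_returned_metrics is not described here, and the sample payload is illustrative.

```python
def count_samples(exposition_text):
    """Count samples in Prometheus text exposition format.

    Returns (total_samples, bucket_samples), where bucket_samples are the
    histogram '_bucket' series.
    """
    total = 0
    buckets = 0
    for raw in exposition_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The metric name ends at the first '{' (labels) or space (value).
        name = line.split("{", 1)[0].split(" ", 1)[0]
        total += 1
        if name.endswith("_bucket"):
            buckets += 1
    return total, buckets


# Illustrative payload; real endpoints expose far more series.
sample = """\
# HELP tidb_server_handle_query_duration_seconds Bucketed histogram.
# TYPE tidb_server_handle_query_duration_seconds histogram
tidb_server_handle_query_duration_seconds_bucket{le="0.001"} 3
tidb_server_handle_query_duration_seconds_bucket{le="0.002"} 5
tidb_server_handle_query_duration_seconds_bucket{le="+Inf"} 7
tidb_server_handle_query_duration_seconds_sum 0.0123
tidb_server_handle_query_duration_seconds_count 7
tidb_server_connections 4
"""

total, buckets = count_samples(sample)
print(total, buckets)  # 6 3
```

If bucket samples make up most of the total, setting send_distribution_buckets to false is likely to reduce volume far more than lowering max_returned_metrics.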

Need help? Contact Datadog support.