TiDB

Agent Check

Supported OS: Linux, Mac OS, Windows

Overview

Connect your TiDB cluster to Datadog to:

  • Collect key TiDB metrics from your cluster.
  • Collect logs from your cluster, such as TiDB/TiKV/TiFlash logs and slow query logs.
  • Visualize cluster performance on the provided dashboard.

Note:

  • TiDB 4.0+ is required for this integration.
  • Integration of TiDB Cloud with Datadog is not currently available.

Setup

Installation

First, download and launch the Datadog Agent.

Then, manually install the TiDB check. Instructions vary depending on the environment.

Current TiDB integration version: 1.0.0

Host

Run the following command, replacing <INTEGRATION_VERSION> with the integration version listed above:

datadog-agent integration install -t datadog-tidb==<INTEGRATION_VERSION>

Containerized

The best way to use this integration with the Docker Agent is to build the Agent with this integration installed. Use the following Dockerfile to build an updated version of the Agent:

FROM gcr.io/datadoghq/agent:latest

ARG INTEGRATION_VERSION=1.0.0

RUN agent integration install -r -t datadog-tidb==${INTEGRATION_VERSION}

Build the image and push it to your private Docker registry.

Then, upgrade the Datadog Agent container image. If you deploy the Agent with the Helm chart, modify the agents.image section in values.yaml to replace the default Agent image:

agents:
  enabled: true
  image:
    tag: <NEW_TAG>
    repository: <YOUR_PRIVATE_REPOSITORY>/<AGENT_NAME>

Use the new values.yaml to upgrade the Agent:

helm upgrade -f values.yaml <RELEASE_NAME> datadog/datadog

Configuration

Host

Metric collection
  1. Edit the tidb.d/conf.yaml file in the conf.d/ folder at the root of your Agent’s configuration directory to start collecting your TiDB performance data. See the sample tidb.d/conf.yaml for all available configuration options.

  2. Restart the Agent.
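
As an illustration, a minimal tidb.d/conf.yaml for a single-host test cluster might look like the sketch below. The three option names are the ones used in the Autodiscovery annotations in the Containerized section of this page. The localhost addresses and ports are assumptions, so substitute the addresses of your own PD, TiDB, and TiKV instances and treat the sample tidb.d/conf.yaml as the authoritative list of options.

```yaml
init_config:

instances:
    # Assumed endpoints for a local test cluster; replace with your own hosts and ports.
  - pd_metric_url: http://localhost:2379/metrics
    tidb_metric_url: http://localhost:10080/metrics
    tikv_metric_url: http://localhost:20180/metrics
```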

Log collection

Available for Agent versions >6.0

  1. Collecting logs is disabled by default in the Datadog Agent. Enable it in your datadog.yaml file:

    logs_enabled: true
    
  2. Add this configuration block to your tidb.d/conf.yaml file to start collecting your TiDB logs:

      logs:
       # pd log
       - type: file
         path: "/tidb-deploy/pd-2379/log/pd*.log"
         service: "tidb-cluster"
         source: "pd"
         
       # tikv log
       - type: file
         path: "/tidb-deploy/tikv-20160/log/tikv*.log"
         service: "tidb-cluster"
         source: "tikv"
         
       # tidb log
       - type: file
         path: "/tidb-deploy/tidb-4000/log/tidb*.log"
         service: "tidb-cluster"
         source: "tidb"
         exclude_paths:
           - /tidb-deploy/tidb-4000/log/tidb_slow_query.log
       # tidb slow query log
       - type: file
         path: "/tidb-deploy/tidb-4000/log/tidb_slow_query*.log"
         service: "tidb-cluster"
         source: "tidb"
         log_processing_rules:
           - type: multi_line
             name: new_log_start_with_datetime
             pattern: '#\sTime:'
         tags:
           - "custom_format:tidb_slow_query"
         
       # tiflash log
       - type: file
         path: "/tidb-deploy/tiflash-9000/log/tiflash*.log"
         service: "tidb-cluster"
         source: "tiflash"
    

    Change the path and service according to your cluster’s configuration.

    Use these commands to find the log paths:

    # show the deployment directories of the cluster
    tiup cluster display <YOUR_CLUSTER_NAME>
    # find a specific log file path from the process's command-line arguments
    ps -fwwp <TIDB_PROCESS_PID/PD_PROCESS_PID/etc.>
    
  3. Restart the Agent.

Containerized

Metric collection

For containerized environments, once the TiDB check is built into the Datadog Agent image, Autodiscovery is configured by default, so metrics are collected and sent to Datadog automatically.

If you need to override the default Autodiscovery behavior, add Datadog annotations to TiDB Pods:

apiVersion: v1
kind: Pod
# (...)
metadata:
  name: '<POD_NAME>'
  annotations:
    ad.datadoghq.com/tidb.check_names: '["tidb"]'
    ad.datadoghq.com/tidb.init_configs: '[{}]'
    ad.datadoghq.com/tidb.instances: '[{"pd_metric_url": "http://%%host%%:2379/metrics", "tidb_metric_url": "http://%%host%%:10080/metrics", "tikv_metric_url": "http://%%host%%:20180/metrics"}]'
    # (...)
spec:
  containers:
    - name: 'tidb'
# (...)

See the Autodiscovery Integration Templates for complete guidance.

Log collection

Available for Agent versions >6.0

Collecting logs is disabled by default in the Datadog Agent. To enable it, see the Kubernetes log collection documentation.
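
If the Agent is deployed with the Helm chart, as in the installation steps earlier on this page, log collection is typically switched on in values.yaml. The fragment below is a sketch that assumes the datadog.logs options of the datadog/datadog chart; confirm the option names against the chart version you run.

```yaml
datadog:
  logs:
    # Turn on the Agent's log collection.
    enabled: true
    # Assumption: collect logs from every container; set to false to opt in per Pod.
    containerCollectAll: true
```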

Parameter       Value
<LOG_CONFIG>    {"source": "tidb", "service": "tidb_cluster"}
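
With annotation-based configuration, <LOG_CONFIG> is usually supplied through a Pod annotation. The following sketch assumes the container is named tidb, as in the metric collection example, and uses the ad.datadoghq.com/<CONTAINER_NAME>.logs annotation form; adjust the container name and service to match your deployment.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: '<POD_NAME>'
  annotations:
    # The <LOG_CONFIG> value, wrapped in a JSON list.
    ad.datadoghq.com/tidb.logs: '[{"source": "tidb", "service": "tidb_cluster"}]'
spec:
  containers:
    - name: 'tidb'
```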

Validation

Run the Agent’s status subcommand and look for tidb under the Checks section.

Data Collected

Metrics

tidb_cluster.tidb.executor_statement_total
(count)
The total number of statements executed
tidb_cluster.tidb.server_query_total
(count)
The total number of queries
tidb_cluster.tidb.server_execute_error_total
(count)
The total number of execution errors
tidb_cluster.tidb.server_connections
(gauge)
Current number of connections in TiDB server
tidb_cluster.tidb.tikv_client_region_err_total
(count)
The total number of region errors for TiKV client
tidb_cluster.tidb.tikv_client_lock_resolver_actions_total
(count)
The total number of lock resolver actions for TiKV client
tidb_cluster.tidb.server_handle_query_duration_seconds.count
(count)
The total number of handled queries in server
tidb_cluster.tidb.server_handle_query_duration_seconds.sum
(count)
The sum of handled query duration in server
tidb_cluster.tidb.session_transaction_duration_seconds.count
(count)
The total number of transactions among every session
tidb_cluster.tidb.session_transaction_duration_seconds.sum
(count)
The sum of transactions duration among every session
tidb_cluster.tidb.tikv_client_txn_cmd_duration_seconds.count
(count)
The total number of transaction commands from TiKV client
tidb_cluster.tidb.tikv_client_txn_cmd_duration_seconds.sum
(count)
The sum of transaction command duration from TiKV client
tidb_cluster.tidb.tikv_client_backoff_seconds.count
(count)
The total number of backoffs from TiKV client
tidb_cluster.tidb.tikv_client_backoff_seconds.sum
(count)
The sum of backoff duration from TiKV client
tidb_cluster.tidb.pd_client_request_handle_requests_duration_seconds.count
(count)
The total number of requests from PD client
tidb_cluster.tidb.pd_client_request_handle_requests_duration_seconds.sum
(count)
The sum of request duration from PD client
tidb_cluster.tidb.pd_client_cmd_handle_cmds_duration_seconds.count
(count)
The total number of commands from PD client
tidb_cluster.tidb.pd_client_cmd_handle_cmds_duration_seconds.sum
(count)
The sum of command duration from PD client
tidb_cluster.tidb.domain_load_schema_duration_seconds.count
(count)
The total number of domain loaded schemas
tidb_cluster.tidb.domain_load_schema_duration_seconds.sum
(count)
The sum of domain schema loading duration
tidb_cluster.tidb.go_memstats_heap_inuse_bytes
(count)
Go runtime in-use heap memory, in bytes
tidb_cluster.tidb.process_resident_memory_bytes
(count)
Process resident memory, in bytes
tidb_cluster.pd.tso_events
(gauge)
The number of tso events
tidb_cluster.pd.cluster_status
(gauge)
PD cluster status
tidb_cluster.pd.regions_status
(gauge)
PD region status
tidb_cluster.pd.hotspot_status
(gauge)
PD hotspot status
tidb_cluster.pd.scheduler_region_heartbeat
(gauge)
PD region scheduler heartbeat
tidb_cluster.pd.grpc_server_handling_seconds.sum
(count)
The sum of duration for gRPC requests handled by PD
tidb_cluster.pd.grpc_server_handling_seconds.count
(count)
The total number of gRPC requests handled by PD
tidb_cluster.pd.scheduler_region_heartbeat_latency_seconds.sum
(count)
The sum of duration for scheduler region heartbeats
tidb_cluster.pd.scheduler_region_heartbeat_latency_seconds.count
(count)
The total number of scheduler region heartbeats
tidb_cluster.tikv.raft_store_region_count
(count)
Region count of raft stores
tidb_cluster.tikv.thread_cpu_seconds_total
(count)
Sum of cpu seconds
tidb_cluster.tikv.engine_size_bytes
(count)
Sum of engine size
tidb_cluster.tikv.channel_full_total
(count)
Sum of channel full events
tidb_cluster.tikv.server_report_failure_msg_total
(count)
Sum of server report failures
tidb_cluster.tikv.scheduler_context_total
(count)
Sum of scheduler contexts
tidb_cluster.tikv.coprocessor_executor_count
(count)
Sum of coprocessor executors
tidb_cluster.tikv.coprocessor_request_duration_seconds.sum
(count)
Sum of coprocessor request duration
tidb_cluster.tikv.coprocessor_request_duration_seconds.count
(count)
Total number of coprocessor requests
tidb_cluster.tidb.session_parse_duration_seconds.sum
(count)
Sum of session parse duration
tidb_cluster.tidb.session_parse_duration_seconds.count
(count)
Total number of session parses

Service Checks

The TiDB check does not include any service checks.

Events

The TiDB check does not include any events.

Troubleshooting

Need help? Contact Datadog support.