This check monitors Fly.io metrics through the Datadog Agent.
Follow the instructions below to install and configure this check for an Agent running on a Fly application.
The Fly.io check is included in the Datadog Agent package. We recommend running the Fly.io check on the Datadog Agent in a Fly.io application. The Agent collects Prometheus metrics as well as some additional data from the Machines API. Additionally, you can configure the Agent to receive traces and custom metrics from all of your Fly.io applications inside the organization.
Create a new application in Fly.io with the image set as the Datadog Agent when launching, or provide the image in the fly.toml file:
[build]
image = 'gcr.io/datadoghq/agent:7'
Set a secret for your Datadog API key called DD_API_KEY, and optionally your site as DD_SITE.
In your app’s directory, create a conf.yaml file for the Fly.io integration, configure the integration, and mount it in the Agent’s conf.d/fly_io.d/ directory as conf.yaml:
instances:
  - empty_default_hostname: true
    headers:
      Authorization: Bearer <YOUR_FLY_TOKEN>
    machines_api_endpoint: http://_api.internal:4280
    org_slug: <YOUR_ORG_SLUG>
Deploy your app.
Note: To collect traces and custom metrics from your applications, see Application traces.
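Before deploying, it can help to confirm that the token and organization slug in conf.yaml are accepted by the Machines API. The following Python sketch only builds the request. It assumes that `/v1/apps` with an `org_slug` query parameter is the Machines API route for listing apps (check the Fly.io Machines API reference), and `build_apps_request` is a hypothetical helper name, not part of the integration.

```python
import urllib.parse
import urllib.request


def build_apps_request(token, org_slug, endpoint="http://_api.internal:4280"):
    """Build (but do not send) a GET request that lists the apps in an org.

    The endpoint default mirrors machines_api_endpoint in conf.yaml.
    """
    query = urllib.parse.urlencode({"org_slug": org_slug})
    # Assumption: /v1/apps is the "list apps" route of the Machines API.
    url = f"{endpoint}/v1/apps?{query}"
    # Same header the check is configured with: "Authorization: Bearer <token>".
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
```

Passing the returned request to `urllib.request.urlopen` from a machine inside the Fly.io private network should return a successful response if the credentials are valid. The `_api.internal` address is not reachable from outside that network.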
Edit the fly_io.d/conf.yaml file, located in the conf.d/ folder at the root of your Agent’s configuration directory, to start collecting your Fly.io performance data. See the sample fly_io.d/conf.yaml for all available configuration options.
Run the Agent’s status subcommand and look for fly_io under the Checks section.
fly_io.app.concurrency (gauge) | |
fly_io.app.connect_time.bucket (count) | Shown as second |
fly_io.app.connect_time.count (count) | |
fly_io.app.connect_time.sum (count) | Shown as second |
fly_io.app.count (gauge) | Count of apps |
fly_io.app.http_response_time.bucket (count) | Shown as second |
fly_io.app.http_response_time.count (count) | |
fly_io.app.http_response_time.sum (count) | Shown as second |
fly_io.app.http_responses.count (gauge) | Shown as response |
fly_io.app.tcp_connects.count (gauge) | |
fly_io.app.tcp_disconnects.count (gauge) | |
fly_io.edge.data_in (gauge) | Shown as byte |
fly_io.edge.data_out (gauge) | Shown as byte |
fly_io.edge.http_response_time.bucket (count) | Shown as second |
fly_io.edge.http_response_time.count (count) | |
fly_io.edge.http_response_time.sum (count) | Shown as second |
fly_io.edge.http_responses.count (gauge) | Shown as response |
fly_io.edge.tcp_connects.count (gauge) | |
fly_io.edge.tcp_disconnects.count (gauge) | |
fly_io.edge.tls_handshake_errors (gauge) | Shown as error |
fly_io.edge.tls_handshake_time.bucket (count) | Shown as second |
fly_io.edge.tls_handshake_time.count (count) | |
fly_io.edge.tls_handshake_time.sum (count) | Shown as second |
fly_io.instance.cpu.count (count) | The amount of time each CPU (cpu_id) has spent performing different kinds of work (mode) in centiseconds |
fly_io.instance.disk.io_in_progress (gauge) | Incremented as requests are given to appropriate struct request_queue and decremented as they finish. |
fly_io.instance.disk.reads_completed.count (count) | This is the total number of reads completed successfully. |
fly_io.instance.disk.reads_merged.count (count) | Reads and writes which are adjacent to each other may be merged for efficiency. This field lets you know how often this was done. |
fly_io.instance.disk.sectors_read.count (count) | This is the total number of sectors read successfully. |
fly_io.instance.disk.sectors_written.count (count) | This is the total number of sectors written successfully. |
fly_io.instance.disk.time_io.count (count) | Counts jiffies when at least one request was started or completed. If a request runs for more than 2 jiffies, some I/O time might not be accounted for when requests are concurrent. Shown as millisecond |
fly_io.instance.disk.time_io_weighted.count (count) | Incremented at each I/O start, I/O completion, I/O merge, or read of these stats by the number of I/Os in progress (field 9) times the number of milliseconds spent doing I/O since the last update of this field. Shown as millisecond |
fly_io.instance.disk.time_reading.count (count) | This is the total number of milliseconds spent by all reads. Shown as millisecond |
fly_io.instance.disk.time_writing.count (count) | This is the total number of milliseconds spent by all writes. Shown as millisecond |
fly_io.instance.disk.writes_completed.count (count) | This is the total number of writes completed successfully. |
fly_io.instance.disk.writes_merged.count (count) | Reads and writes which are adjacent to each other may be merged for efficiency. This field lets you know how often this was done. |
fly_io.instance.filefd.allocated (gauge) | Number of allocated file descriptors |
fly_io.instance.filefd.max (gauge) | Number of maximum file descriptors |
fly_io.instance.filesystem.block_size (gauge) | File system block size. |
fly_io.instance.filesystem.blocks (gauge) | Total number of blocks on file system |
fly_io.instance.filesystem.blocks_avail (gauge) | Total number of available blocks. |
fly_io.instance.filesystem.blocks_free (gauge) | Total number of free blocks. |
fly_io.instance.load.avg (gauge) | System load average measuring the number of processes in the system run queue, with samples representing averages over 1, 5, and 15 minutes. Shown as process |
fly_io.instance.memory.active (gauge) | Memory that has been used more recently and usually not reclaimed unless absolutely necessary. Shown as byte |
fly_io.instance.memory.buffers (gauge) | Relatively temporary storage for raw disk blocks. Shown as byte |
fly_io.instance.memory.cached (gauge) | In-memory cache for files read from the disk (the pagecache) as well as tmpfs & shmem. Doesn't include SwapCached. Shown as byte |
fly_io.instance.memory.dirty (gauge) | Memory which is waiting to get written back to the disk. Shown as byte |
fly_io.instance.memory.inactive (gauge) | Memory which has been less recently used. It is more eligible to be reclaimed for other purposes. Shown as byte |
fly_io.instance.memory.mem_available (gauge) | An estimate of how much memory is available for starting new applications, without swapping. Shown as byte |
fly_io.instance.memory.mem_free (gauge) | Total free RAM. Shown as byte |
fly_io.instance.memory.mem_total (gauge) | Total usable RAM (i.e. physical RAM minus a few reserved bits and the kernel binary code). Shown as byte |
fly_io.instance.memory.pressure_full (gauge) | Memory pressure for all processes |
fly_io.instance.memory.pressure_some (gauge) | Memory pressure for at least one process |
fly_io.instance.memory.shmem (gauge) | Total memory used by shared memory (shmem) and tmpfs. Shown as byte |
fly_io.instance.memory.slab (gauge) | In-kernel data structures cache. Shown as byte |
fly_io.instance.memory.swap_cached (gauge) | Memory that was once swapped out and has been swapped back in, but is still also in the swapfile. Shown as byte |
fly_io.instance.memory.swap_free (gauge) | Memory which has been evicted from RAM, and is temporarily on the disk. Shown as byte |
fly_io.instance.memory.swap_total (gauge) | Total amount of swap space available. Shown as byte |
fly_io.instance.memory.vmalloc_chunk (gauge) | Largest contiguous block of vmalloc area which is free. Shown as byte |
fly_io.instance.memory.vmalloc_total (gauge) | Total size of vmalloc virtual address space. Shown as byte |
fly_io.instance.memory.vmalloc_used (gauge) | Amount of vmalloc area which is used. Shown as byte |
fly_io.instance.memory.writeback (gauge) | Memory which is actively being written back to the disk. Shown as byte |
fly_io.instance.net.recv_bytes.count (count) | Number of good bytes received by the interface. Shown as byte |
fly_io.instance.net.recv_compressed.count (count) | Number of correctly received compressed packets. |
fly_io.instance.net.recv_drop.count (count) | Number of packets received but not processed, e.g. due to lack of resources or unsupported protocol. Shown as packet |
fly_io.instance.net.recv_errs.count (count) | Total number of bad packets received on this network device. Shown as packet |
fly_io.instance.net.recv_fifo.count (count) | Receiver FIFO overflow event counter. |
fly_io.instance.net.recv_frame.count (count) | Receiver frame alignment errors. |
fly_io.instance.net.recv_multicast.count (count) | Multicast packets received. Shown as packet |
fly_io.instance.net.recv_packets.count (count) | Number of good packets received by the interface. Shown as packet |
fly_io.instance.net.sent_bytes.count (count) | Number of good transmitted bytes. Shown as byte |
fly_io.instance.net.sent_carrier.count (count) | Number of frame transmission errors due to loss of carrier during transmission. |
fly_io.instance.net.sent_colls.count (count) | Number of collisions during packet transmissions. |
fly_io.instance.net.sent_compressed.count (count) | Number of transmitted compressed packets. |
fly_io.instance.net.sent_drop.count (count) | Number of packets dropped on their way to transmission, e.g. due to lack of resources. Shown as packet |
fly_io.instance.net.sent_errs.count (count) | Total number of transmit problems. |
fly_io.instance.net.sent_fifo.count (count) | Sent FIFO overflow event counter. |
fly_io.instance.net.sent_packets.count (count) | Number of packets successfully transmitted. Shown as packet |
fly_io.instance.up (gauge) | Reports 1 if the VM is reporting correctly |
fly_io.instance.volume.size (gauge) | Volume size in bytes. Shown as byte |
fly_io.instance.volume.used (gauge) | Percentage of volume used. Shown as byte |
fly_io.machine.count (gauge) | Count of running machines |
fly_io.machine.cpus.count (gauge) | Number of cpus |
fly_io.machine.gpus.count (gauge) | Number of gpus |
fly_io.machine.memory (gauge) | Memory of a machine. Shown as megabyte |
fly_io.machine.swap_size (gauge) | Swap space to reserve for the Fly Machine. Shown as megabyte |
fly_io.machines_api.up (gauge) | Whether the check can access the Machines API or not |
fly_io.pg.database.size (gauge) | Database size. Shown as byte |
fly_io.pg.replication.lag (gauge) | Replication lag |
fly_io.pg_stat.activity.count (gauge) | Number of connections in this state |
fly_io.pg_stat.activity.max_tx_duration (gauge) | Maximum duration, in seconds, that any active transaction has been running. Shown as second |
fly_io.pg_stat.archiver.archived_count (gauge) | Number of WAL files that have been successfully archived |
fly_io.pg_stat.archiver.failed_count (gauge) | Number of failed attempts for archiving WAL files |
fly_io.pg_stat.bgwriter.buffers_alloc (gauge) | Number of buffers allocated |
fly_io.pg_stat.bgwriter.buffers_backend (gauge) | Number of buffers written directly by a backend |
fly_io.pg_stat.bgwriter.buffers_backend_fsync (gauge) | Number of times a backend had to execute its own fsync call (normally the background writer handles those even when the backend does its own write) |
fly_io.pg_stat.bgwriter.buffers_checkpoint (gauge) | Number of buffers written during checkpoints |
fly_io.pg_stat.bgwriter.buffers_clean (gauge) | Number of buffers written by the background writer |
fly_io.pg_stat.bgwriter.checkpoint_sync_time (gauge) | Total amount of time that has been spent in the portion of checkpoint processing where files are synchronized to disk, in milliseconds. Shown as millisecond |
fly_io.pg_stat.bgwriter.checkpoint_write_time (gauge) | Total amount of time that has been spent in the portion of checkpoint processing where files are written to disk, in milliseconds. Shown as millisecond |
fly_io.pg_stat.bgwriter.checkpoints_req (gauge) | Number of requested checkpoints that have been performed |
fly_io.pg_stat.bgwriter.checkpoints_timed (gauge) | Number of scheduled checkpoints that have been performed |
fly_io.pg_stat.bgwriter.maxwritten_clean (gauge) | Number of times the background writer stopped a cleaning scan because it had written too many buffers |
fly_io.pg_stat.bgwriter.stats_reset (gauge) | Time at which these statistics were last reset |
fly_io.pg_stat.database.blk_read_time (gauge) | Time spent reading data file blocks by backends in this database, in milliseconds. Shown as millisecond |
fly_io.pg_stat.database.blk_write_time (gauge) | Time spent writing data file blocks by backends in this database, in milliseconds. Shown as millisecond |
fly_io.pg_stat.database.blks_hit (gauge) | Number of times disk blocks were found already in the buffer cache, so that a read was not necessary (this only includes hits in the PostgreSQL buffer cache, not the operating system's file system cache) |
fly_io.pg_stat.database.blks_read (gauge) | Number of disk blocks read in this database |
fly_io.pg_stat.database.conflicts (gauge) | Number of queries canceled due to conflicts with recovery in this database. Conflicts occur only on standby servers |
fly_io.pg_stat.database.conflicts_confl_bufferpin (gauge) | Number of queries in this database that have been canceled due to pinned buffers |
fly_io.pg_stat.database.conflicts_confl_deadlock (gauge) | Number of queries in this database that have been canceled due to deadlocks |
fly_io.pg_stat.database.conflicts_confl_lock (gauge) | Number of queries in this database that have been canceled due to lock timeouts |
fly_io.pg_stat.database.conflicts_confl_snapshot (gauge) | Number of queries in this database that have been canceled due to old snapshots |
fly_io.pg_stat.database.conflicts_confl_tablespace (gauge) | Number of queries in this database that have been canceled due to dropped tablespaces |
fly_io.pg_stat.database.deadlocks (gauge) | Number of deadlocks detected in this database |
fly_io.pg_stat.database.numbackends (gauge) | Number of backends currently connected to this database. This is the only column in this view that returns a value reflecting current state; all other columns return the accumulated values since the last reset. |
fly_io.pg_stat.database.stats_reset (gauge) | Time at which these statistics were last reset |
fly_io.pg_stat.database.tup_deleted (gauge) | Number of rows deleted by queries in this database |
fly_io.pg_stat.database.tup_fetched (gauge) | Number of rows fetched by queries in this database |
fly_io.pg_stat.database.tup_inserted (gauge) | Number of rows inserted by queries in this database |
fly_io.pg_stat.database.tup_returned (gauge) | Number of rows returned by queries in this database |
fly_io.pg_stat.database.tup_updated (gauge) | Number of rows updated by queries in this database |
fly_io.pg_stat.database.xact_commit (gauge) | Number of transactions in this database that have been committed |
fly_io.pg_stat.database.xact_rollback (gauge) | Number of transactions in this database that have been rolled back |
fly_io.pg_stat.replication.pg_current_wal_lsn_bytes (gauge) | WAL position in bytes. Shown as byte |
fly_io.pg_stat.replication.pg_wal_lsn_diff (gauge) | Lag in bytes between the primary and the standby. Shown as byte |
fly_io.pg_stat.replication.reply_time (gauge) | Send time of last reply message received from standby server |
fly_io.volume.block_size (gauge) | The size of each block in bytes. Shown as byte |
fly_io.volume.blocks.count (gauge) | The total number of blocks in the volume |
fly_io.volume.blocks_avail (gauge) | The number of blocks available for data in the volume |
fly_io.volume.blocks_free (gauge) | The total number of blocks free for data and root user ops |
fly_io.volume.created (gauge) | Whether the volume has been created or not |
fly_io.volume.encrypted (gauge) | Whether the volume is encrypted or not |
fly_io.volume.size (gauge) | The size of the volume in GB. Shown as gigabyte |
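The fly_io.volume.* block gauges above can be combined into a usage figure. The Python sketch below shows the arithmetic, assuming the gauges follow the usual statvfs-style meaning that their descriptions suggest (block_size in bytes, blocks.count as the total, blocks_free as all free blocks). The function name and the sample values are illustrative, not taken from a real volume.

```python
def volume_usage(block_size, blocks, blocks_free):
    """Return (used_bytes, used_fraction) from block-level volume gauges.

    block_size  -> fly_io.volume.block_size (bytes per block)
    blocks      -> fly_io.volume.blocks.count (total blocks)
    blocks_free -> fly_io.volume.blocks_free (free blocks)
    """
    if blocks <= 0:
        raise ValueError("blocks must be positive")
    used_blocks = blocks - blocks_free
    return used_blocks * block_size, used_blocks / blocks


# Illustrative values: a 1 GiB volume with 4 KiB blocks, one quarter free.
used_bytes, used_fraction = volume_usage(block_size=4096, blocks=262144, blocks_free=65536)
print(used_bytes, used_fraction)
```

To measure the space available to non-root processes instead, the same calculation could use fly_io.volume.blocks_avail in place of blocks_free.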
The Fly.io integration does not include any events.
The Fly.io integration does not include any service checks.
Follow these steps to collect traces for an application in your Fly.io environment.
Instrument your application.
Deploy the Datadog Agent as a Fly.io application.
Set the required environment variables in the fly.toml or Dockerfile of your application and deploy the app.
Set the following as an environment variable to submit metrics to the Datadog Agent application:
[env]
DD_AGENT_HOST="<YOUR_AGENT_APP_NAME>.internal"
Set the following environment variable to ensure you report the same host for logs and metrics:
DD_TRACE_REPORT_HOSTNAME="true"
To use unified service tagging, set these environment variables:
DD_SERVICE="APP_NAME"
DD_ENV="ENV_NAME"
DD_VERSION="VERSION"
To correlate logs and traces, follow these steps and set this environment variable:
DD_LOGS_INJECTION="true"
Set the following environment variables in your Datadog Agent application’s fly.toml and deploy the app:
[env]
DD_APM_ENABLED = "true"
DD_APM_NON_LOCAL_TRAFFIC = "true"
DD_DOGSTATSD_NON_LOCAL_TRAFFIC = "true"
DD_BIND_HOST = "fly-global-services"
Note: Ensure that the settings on your Fly.io instances do not publicly expose the ports for APM and DogStatsD, if enabled.
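Once DD_AGENT_HOST is set in an application and the Agent accepts non-local DogStatsD traffic, a custom metric is a single UDP datagram sent to the Agent application. The Python sketch below illustrates this with the standard library only. It assumes the default DogStatsD port 8125, and the helper names, metric name, and tag are hypothetical. In practice an official Datadog client library would normally handle this.

```python
import os
import socket


def format_dogstatsd(name, value, metric_type="g", tags=None):
    """Build one DogStatsD datagram: <name>:<value>|<type>[|#tag1,tag2]."""
    payload = f"{name}:{value}|{metric_type}"
    if tags:
        payload += "|#" + ",".join(tags)
    return payload.encode("utf-8")


def send_metric(name, value, metric_type="g", tags=None, host=None, port=8125):
    """Send one datagram over UDP to the host named by DD_AGENT_HOST."""
    host = host or os.environ.get("DD_AGENT_HOST", "localhost")
    # Resolve the name first so the socket family matches the address;
    # Fly.io .internal names resolve to IPv6 private-network addresses.
    family, socktype, proto, _, addr = socket.getaddrinfo(
        host, port, type=socket.SOCK_DGRAM
    )[0]
    with socket.socket(family, socktype, proto) as sock:
        sock.sendto(format_dogstatsd(name, value, metric_type, tags), addr)
```

For example, `send_metric("checkout.queue_depth", 3, tags=["env:staging"])` would send the datagram `checkout.queue_depth:3|g|#env:staging` to `<YOUR_AGENT_APP_NAME>.internal` when DD_AGENT_HOST is set as shown above.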
Use the fly_logs_shipper to collect logs from your Fly.io applications.
Clone the logs shipper project.
Modify the vector-configs/vector.toml file to set the logs source as fly_io:
[transforms.log_json]
type = "remap"
inputs = ["nats"]
source = '''
. = parse_json!(.message)
.ddsource = 'fly-io'
.host = .fly.app.instance
.env = <YOUR_ENV_NAME>
'''
This configuration parses basic Fly.io-specific log attributes. To fully parse all log attributes, set ddsource to a known logs integration on a per-app basis using Vector transforms.
Deploy the logs shipper app.
Need help? Contact Datadog support.