Gitlab

Agent Check

Supported OS: Linux, Mac OS, Windows

Overview

This integration lets you:

  • Visualize and monitor metrics collected via Gitlab through Prometheus

See https://docs.gitlab.com/ee/administration/monitoring/prometheus/ for more information about Gitlab and its integration with Prometheus.

Setup

Installation

The Gitlab check is included in the Datadog Agent package, so you don’t need to install anything else on your Gitlab servers.

Configuration

Edit the gitlab.d/conf.yaml file, in the conf.d/ folder at the root of your Agent’s directory, to point to Gitlab’s Prometheus metrics endpoint. See the sample gitlab.d/conf.yaml for all available configuration options.

Note: The allowed_metrics item in the init_config section lets you specify which metrics the check extracts.
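
To illustrate what an allowed-metrics list does, the following Python sketch keeps only the listed metric names from a Prometheus text exposition payload. It is not the Agent's implementation: the function names (parse_sample, filter_allowed) and the sample payload are hypothetical, and the parser ignores escaping corner cases.

```python
def parse_sample(line):
    """Split one exposition sample line into (name, labels, value)."""
    if "{" in line:
        # Labelled sample, e.g.  name{code="200"} 1027
        name, rest = line.split("{", 1)
        labels, _, tail = rest.rpartition("}")
    else:
        # Unlabelled sample, e.g.  name 42
        name, _, tail = line.partition(" ")
        labels = ""
    # The value is the first token after the series; an optional
    # timestamp may follow it and is ignored here.
    value = float(tail.split()[0])
    return name, labels, value


def filter_allowed(exposition_text, allowed_metrics):
    """Return {name: [(labels, value), ...]} for allowed metric names only."""
    allowed = set(allowed_metrics)
    kept = {}
    for raw in exposition_text.splitlines():
        line = raw.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        name, labels, value = parse_sample(line)
        if name in allowed:
            kept.setdefault(name, []).append((labels, value))
    return kept


# Hypothetical payload, shaped like a Prometheus metrics endpoint response.
SAMPLE = """# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 42
go_threads 9
http_requests_total{code="200",handler="prometheus"} 1027
"""

result = filter_allowed(SAMPLE, ["go_goroutines", "http_requests_total"])
```

In this sketch go_threads is dropped because it is not in the list, while the two allowed metrics are kept with their label strings and numeric values.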

Validation

Run the Agent’s status subcommand and look for gitlab under the Checks section.

Data Collected

Metrics

go_gc_duration_seconds
(gauge)
A summary of the GC invocation durations
shown as request
go_gc_duration_seconds_sum
(gauge)
Sum of the GC invocation durations
shown as request
go_gc_duration_seconds_count
(gauge)
Count of the GC invocation durations
shown as request
go_goroutines
(gauge)
Number of goroutines that currently exist
shown as request
go_memstats_alloc_bytes
(gauge)
Number of bytes allocated and still in use
shown as byte
go_memstats_alloc_bytes_total
(counter)
Total number of bytes allocated
shown as byte
go_memstats_buck_hash_sys_bytes
(gauge)
Number of bytes used by the profiling bucket hash table
shown as byte
go_memstats_frees_total
(counter)
Total number of frees
shown as request
go_memstats_gc_cpu_fraction
(gauge)
The fraction of this program's available CPU time used by the GC since the program started
shown as request
go_memstats_gc_sys_bytes
(gauge)
Number of bytes used for garbage collection system metadata
shown as byte
go_memstats_heap_alloc_bytes
(gauge)
Number of heap bytes allocated and still in use
shown as byte
go_memstats_heap_idle_bytes
(gauge)
Number of heap bytes waiting to be used
shown as byte
go_memstats_heap_inuse_bytes
(gauge)
Number of heap bytes that are in use
shown as byte
go_memstats_heap_objects
(gauge)
Number of allocated objects
shown as request
go_memstats_heap_released_bytes_total
(counter)
Total number of heap bytes released to OS
shown as byte
go_memstats_heap_sys_bytes
(gauge)
Number of heap bytes obtained from system
shown as byte
go_memstats_last_gc_time_seconds
(gauge)
Number of seconds since 1970 of last garbage collection
shown as request
go_memstats_lookups_total
(counter)
Total number of pointer lookups
shown as request
go_memstats_mallocs_total
(counter)
Total number of mallocs
shown as request
go_memstats_mcache_inuse_bytes
(gauge)
Number of bytes in use by mcache structures
shown as byte
go_memstats_mcache_sys_bytes
(gauge)
Number of bytes used for mcache structures obtained from system
shown as byte
go_memstats_mspan_inuse_bytes
(gauge)
Number of bytes in use by mspan structures
shown as byte
go_memstats_mspan_sys_bytes
(gauge)
Number of bytes used for mspan structures obtained from system
shown as byte
go_memstats_next_gc_bytes
(gauge)
Number of heap bytes when next garbage collection will take place
shown as byte
go_memstats_other_sys_bytes
(gauge)
Number of bytes used for other system allocations
shown as byte
go_memstats_stack_inuse_bytes
(gauge)
Number of bytes in use by the stack allocator
shown as byte
go_memstats_stack_sys_bytes
(gauge)
Number of bytes obtained from system for stack allocator
shown as byte
go_memstats_sys_bytes
(gauge)
Number of bytes obtained by system. Sum of all system allocations
shown as byte
go_threads
(gauge)
Number of OS threads created
shown as request
http_request_duration_microseconds
(gauge)
The HTTP request latencies in microseconds
shown as request
http_request_size_bytes
(gauge)
The HTTP request sizes in bytes
shown as byte
http_requests_total
(counter)
Total number of HTTP requests made
shown as request
http_response_size_bytes
(gauge)
The HTTP response sizes in bytes
shown as byte
process_cpu_seconds_total
(counter)
Total user and system CPU time spent in seconds
shown as request
process_max_fds
(gauge)
Maximum number of open file descriptors
shown as request
process_open_fds
(gauge)
Number of open file descriptors
shown as request
process_resident_memory_bytes
(gauge)
Resident memory size in bytes
shown as byte
process_start_time_seconds
(gauge)
Start time of the process since unix epoch in seconds
shown as request
process_virtual_memory_bytes
(gauge)
Virtual memory size in bytes
shown as byte
prometheus_build_info
(gauge)
A metric with a constant '1' value labeled by version revision branch and goversion from which prometheus was built
shown as request
prometheus_config_last_reload_success_timestamp_seconds
(gauge)
Timestamp of the last successful configuration reload
shown as request
prometheus_config_last_reload_successful
(gauge)
Whether the last configuration reload attempt was successful
shown as request
prometheus_engine_queries
(gauge)
The current number of queries being executed or waiting
shown as request
prometheus_engine_queries_concurrent_max
(gauge)
The max number of concurrent queries
shown as request
prometheus_engine_query_duration_seconds
(gauge)
Query timing
shown as request
prometheus_evaluator_duration_seconds
(gauge)
The duration of rule group evaluations
shown as request
prometheus_evaluator_iterations_missed_total
(counter)
The total number of rule group evaluations missed due to slow rule group evaluation
shown as request
prometheus_evaluator_iterations_skipped_total
(counter)
The total number of rule group evaluations skipped due to throttled metric storage
shown as request
prometheus_evaluator_iterations_total
(counter)
The total number of scheduled rule group evaluations whether executed missed or skipped
shown as request
prometheus_local_storage_checkpoint_duration_seconds
(gauge)
The duration in seconds taken for checkpointing open chunks and chunks yet to be persisted
shown as request
prometheus_local_storage_checkpoint_last_duration_seconds
(gauge)
The duration in seconds it took to last checkpoint open chunks and chunks yet to be persisted
shown as request
prometheus_local_storage_checkpoint_last_size_bytes
(gauge)
The size of the last checkpoint of open chunks and chunks yet to be persisted
shown as byte
prometheus_local_storage_checkpoint_series_chunks_written
(gauge)
The number of chunks written per series while checkpointing open chunks and chunks yet to be persisted
shown as request
prometheus_local_storage_checkpointing
(gauge)
1 if the storage is checkpointing and 0 otherwise
shown as request
prometheus_local_storage_chunk_ops_total
(counter)
The total number of chunk operations by their type
shown as request
prometheus_local_storage_chunks_to_persist
(counter)
The current number of chunks waiting for persistence
shown as request
prometheus_local_storage_fingerprint_mappings_total
(counter)
The total number of fingerprints being mapped to avoid collisions
shown as request
prometheus_local_storage_inconsistencies_total
(counter)
A counter incremented each time an inconsistency in the local storage is detected. If this is greater than zero then restart the server as soon as possible
shown as request
prometheus_local_storage_indexing_batch_duration_seconds
(gauge)
Quantiles for batch indexing duration in seconds
shown as request
prometheus_local_storage_indexing_batch_sizes
(gauge)
Quantiles for indexing batch sizes (number of metrics per batch)
shown as request
prometheus_local_storage_indexing_queue_capacity
(gauge)
The capacity of the indexing queue
shown as request
prometheus_local_storage_indexing_queue_length
(gauge)
The number of metrics waiting to be indexed
shown as request
prometheus_local_storage_ingested_samples_total
(counter)
The total number of samples ingested
shown as request
prometheus_local_storage_maintain_series_duration_seconds
(gauge)
The duration in seconds it took to perform maintenance on a series
shown as request
prometheus_local_storage_memory_chunkdescs
(gauge)
The current number of chunk descriptors in memory
shown as request
prometheus_local_storage_memory_chunks
(gauge)
The current number of chunks in memory. The number does not include cloned chunks (i.e. chunks without a descriptor)
shown as request
prometheus_local_storage_memory_dirty_series
(gauge)
The current number of series that would require a disk seek during crash recovery
shown as request
prometheus_local_storage_memory_series
(gauge)
The current number of series in memory
shown as request
prometheus_local_storage_non_existent_series_matches_total
(counter)
How often a non-existent series was referred to during label matching or chunk preloading. This is an indication of outdated label indexes
shown as request
prometheus_local_storage_open_head_chunks
(gauge)
The current number of open head chunks
shown as request
prometheus_local_storage_out_of_order_samples_total
(counter)
The total number of samples that were discarded because their timestamps were at or before the last received sample for a series
shown as request
prometheus_local_storage_persist_errors_total
(counter)
The total number of errors while writing to the persistence layer
shown as request
prometheus_local_storage_persistence_urgency_score
(gauge)
A score of urgency to persist chunks. 0 is least urgent and 1 most
shown as request
prometheus_local_storage_queued_chunks_to_persist_total
(counter)
The total number of chunks queued for persistence
shown as request
prometheus_local_storage_rushed_mode
(gauge)
1 if the storage is in rushed mode and 0 otherwise
shown as request
prometheus_local_storage_series_chunks_persisted
(gauge)
The number of chunks persisted per series
shown as request
prometheus_local_storage_series_ops_total
(counter)
The total number of series operations by their type
shown as request
prometheus_local_storage_started_dirty
(gauge)
Whether the local storage was found to be dirty (and crash recovery occurred) during Prometheus startup
shown as request
prometheus_local_storage_target_heap_size_bytes
(gauge)
The configured target heap size in bytes
shown as byte
prometheus_notifications_alertmanagers_discovered
(gauge)
The number of alertmanagers discovered and active
shown as request
prometheus_notifications_dropped_total
(counter)
Total number of alerts dropped due to errors when sending to Alertmanager
shown as request
prometheus_notifications_queue_capacity
(gauge)
The capacity of the alert notifications queue
shown as request
prometheus_notifications_queue_length
(gauge)
The number of alert notifications in the queue
shown as request
prometheus_rule_evaluation_failures_total
(gauge)
The total number of rule evaluation failures
shown as request
prometheus_sd_azure_refresh_duration_seconds
(gauge)
The duration of an Azure-SD refresh in seconds
shown as request
prometheus_sd_azure_refresh_failures_total
(counter)
Number of Azure-SD refresh failures
shown as request
prometheus_sd_consul_rpc_duration_seconds
(gauge)
The duration of a Consul RPC call in seconds
shown as request
prometheus_sd_consul_rpc_failures_total
(counter)
The number of Consul RPC call failures
shown as request
prometheus_sd_dns_lookup_failures_total
(counter)
The number of DNS-SD lookup failures
shown as request
prometheus_sd_dns_lookups_total
(counter)
The number of DNS-SD lookups
shown as request
prometheus_sd_ec2_refresh_duration_seconds
(gauge)
The duration of an EC2-SD refresh in seconds
shown as request
prometheus_sd_ec2_refresh_failures_total
(counter)
The number of EC2-SD scrape failures
shown as request
prometheus_sd_file_read_errors_total
(counter)
The number of File-SD read errors
shown as request
prometheus_sd_file_scan_duration_seconds
(gauge)
The duration of the File-SD scan in seconds
shown as request
prometheus_sd_gce_refresh_duration
(gauge)
The duration of a GCE-SD refresh in seconds
shown as request
prometheus_sd_gce_refresh_failures_total
(counter)
The number of GCE-SD refresh failures
shown as request
prometheus_sd_kubernetes_events_total
(counter)
The number of Kubernetes events handled
shown as request
prometheus_sd_marathon_refresh_duration_seconds
(gauge)
The duration of a Marathon-SD refresh in seconds
shown as request
prometheus_sd_marathon_refresh_failures_total
(counter)
The number of Marathon-SD refresh failures
shown as request
prometheus_sd_openstack_refresh_duration_seconds
(gauge)
The duration of an OpenStack-SD refresh in seconds
shown as request
prometheus_sd_openstack_refresh_failures_total
(counter)
The number of OpenStack-SD scrape failures
shown as request
prometheus_sd_triton_refresh_duration_seconds
(gauge)
The duration of a Triton-SD refresh in seconds
shown as request
prometheus_sd_triton_refresh_failures_total
(counter)
The number of Triton-SD scrape failures
shown as request
prometheus_target_interval_length_seconds
(gauge)
Actual intervals between scrapes
shown as request
prometheus_target_scrape_pool_sync_total
(counter)
Total number of syncs that were executed on a scrape pool
shown as request
prometheus_target_scrapes_exceeded_sample_limit_total
(gauge)
Total number of scrapes that hit the sample limit and were rejected
shown as request
prometheus_target_skipped_scrapes_total
(counter)
Total number of scrapes that were skipped because the metric storage was throttled
shown as request
prometheus_target_sync_length_seconds
(gauge)
Actual interval to sync the scrape pool
shown as request
prometheus_treecache_watcher_goroutines
(gauge)
The current number of watcher goroutines
shown as request
prometheus_treecache_zookeeper_failures_total
(counter)
The total number of ZooKeeper failures
shown as request

Events

The Gitlab check does not include any events at this time.

Service Checks

The Gitlab check includes a readiness and a liveness service check. Moreover, it provides a service check to ensure that the local Prometheus endpoint is available.
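
As a rough picture of how an endpoint-availability service check can work, the Python sketch below probes a URL and maps the outcome to a status code. It is an illustration under stated assumptions, not the check's actual code: the probe function, the 2xx-means-OK rule, and the example URL in the usage note are hypothetical. The numeric values follow the usual Datadog service check convention (0 for OK, 2 for CRITICAL).

```python
import urllib.error
import urllib.request

# Conventional service check status codes.
OK = 0
CRITICAL = 2


def probe(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return OK if the URL answers with a 2xx status, CRITICAL otherwise.

    `opener` is injectable so the function can be exercised without a
    network; by default it is urllib.request.urlopen.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return OK if 200 <= response.status < 300 else CRITICAL
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return CRITICAL
```

For example, probe("http://localhost:9090/metrics") would return CRITICAL whenever nothing is listening at that (assumed) address.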

Troubleshooting

Need help? Contact Datadog Support.

Further Reading

Learn more about infrastructure monitoring and all our integrations on our blog.

Gitlab Runner Integration

Overview

This integration lets you:

  • Visualize and monitor metrics collected via Gitlab Runners through Prometheus
  • Validate that the Gitlab Runner can connect to Gitlab

See https://docs.gitlab.com/runner/monitoring/README.html for more information about Gitlab Runner and its integration with Prometheus.

Setup

Installation

The Gitlab Runner check is included in the Datadog Agent package, so you don’t need to install anything else on your Gitlab servers.

Configuration

Edit the gitlab_runner.d/conf.yaml file, in the conf.d/ folder at the root of your Agent’s directory, to point to the Runner’s Prometheus metrics endpoint and to the Gitlab master; the latter is needed for the connectivity service check. See the sample gitlab_runner.d/conf.yaml for all available configuration options.

Note: The allowed_metrics item in the init_config section lets you specify which metrics the check extracts.

Note: Some metrics should be reported as a rate (for example, ci_runner_errors).
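
To make the idea of reporting a counter as a rate concrete, here is a small Python sketch that turns two successive readings of a monotonically increasing counter into a per-second rate. The function name counter_rate and the sample numbers are hypothetical, and this is only one plausible way to handle a counter reset, not necessarily what the Agent does.

```python
def counter_rate(prev_value, prev_time, curr_value, curr_time):
    """Per-second rate between two counter samples, or None if undefined.

    Times are in seconds. None is returned when no time has elapsed or
    when the counter went backwards (for instance after a restart).
    """
    elapsed = curr_time - prev_time
    if elapsed <= 0:
        return None
    delta = curr_value - prev_value
    if delta < 0:
        # Counter reset: skip this interval rather than report a negative rate.
        return None
    return delta / elapsed


# Two hypothetical scrapes of ci_runner_errors taken 15 seconds apart.
rate = counter_rate(prev_value=10, prev_time=0.0, curr_value=25, curr_time=15.0)
```

Here the counter grew by 15 over 15 seconds, so the rate is 1.0 errors per second, which is easier to graph and alert on than the ever-growing raw total.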

Validation

Run the Agent’s status subcommand and look for gitlab_runner under the Checks section.

Data Collected

Metrics

ci_docker_machines_provider_machine_creation_duration_seconds_bucket
(histogram)
Histogram of Docker machine creation time
shown as request
ci_docker_machines_provider_machine_creation_duration_seconds_sum
(gauge)
Sum of Docker machine creation time
shown as request
ci_docker_machines_provider_machine_creation_duration_seconds_count
(gauge)
Count of Docker machine creation time
shown as request
ci_docker_machines_provider_machine_states
(gauge)
The current number of machines per state in this provider
shown as request
ci_runner_errors
(counter)
The number of caught errors
shown as request
ci_runner_version_info
(gauge)
A metric with a constant '1' value labeled by different build stats fields
shown as request
ci_ssh_docker_machines_provider_machine_creation_duration_seconds_bucket
(histogram)
Histogram of SSH Docker machine creation time
shown as request
ci_ssh_docker_machines_provider_machine_creation_duration_seconds_sum
(histogram)
Sum of SSH Docker machine creation time
shown as request
ci_ssh_docker_machines_provider_machine_creation_duration_seconds_count
(histogram)
Count of SSH Docker machine creation time
shown as request
ci_ssh_docker_machines_provider_machine_states
(gauge)
The current number of machines per state in this provider
shown as request
go_gc_duration_seconds
(gauge)
A summary of the GC invocation durations
shown as request
go_gc_duration_seconds_sum
(gauge)
Sum of the GC invocation durations
shown as request
go_gc_duration_seconds_count
(gauge)
Count of the GC invocation durations
shown as request
go_goroutines
(gauge)
Number of goroutines that currently exist
shown as request
go_memstats_alloc_bytes
(gauge)
Number of bytes allocated and still in use
shown as byte
go_memstats_alloc_bytes_total
(counter)
Total number of bytes allocated
shown as byte
go_memstats_buck_hash_sys_bytes
(gauge)
Number of bytes used by the profiling bucket hash table
shown as byte
go_memstats_frees_total
(counter)
Total number of frees
shown as request
go_memstats_gc_sys_bytes
(gauge)
Number of bytes used for garbage collection system metadata
shown as byte
go_memstats_heap_alloc_bytes
(gauge)
Number of heap bytes allocated and still in use
shown as byte
go_memstats_heap_idle_bytes
(gauge)
Number of heap bytes waiting to be used
shown as byte
go_memstats_heap_inuse_bytes
(gauge)
Number of heap bytes that are in use
shown as byte
go_memstats_heap_objects
(gauge)
Number of allocated objects
shown as request
go_memstats_heap_released_bytes_total
(counter)
Total number of heap bytes released to OS
shown as byte
go_memstats_heap_sys_bytes
(gauge)
Number of heap bytes obtained from system
shown as byte
go_memstats_last_gc_time_seconds
(gauge)
Number of seconds since 1970 of last garbage collection
shown as request
go_memstats_lookups_total
(counter)
Total number of pointer lookups
shown as request
go_memstats_mallocs_total
(counter)
Total number of mallocs
shown as request
go_memstats_mcache_inuse_bytes
(gauge)
Number of bytes in use by mcache structures
shown as byte
go_memstats_mcache_sys_bytes
(gauge)
Number of bytes used for mcache structures obtained from system
shown as byte
go_memstats_mspan_inuse_bytes
(gauge)
Number of bytes in use by mspan structures
shown as byte
go_memstats_mspan_sys_bytes
(gauge)
Number of bytes used for mspan structures obtained from system
shown as byte
go_memstats_next_gc_bytes
(gauge)
Number of heap bytes when next garbage collection will take place
shown as byte
go_memstats_other_sys_bytes
(gauge)
Number of bytes used for other system allocations
shown as byte
go_memstats_stack_inuse_bytes
(gauge)
Number of bytes in use by the stack allocator
shown as byte
go_memstats_stack_sys_bytes
(gauge)
Number of bytes obtained from system for stack allocator
shown as byte
go_memstats_sys_bytes
(gauge)
Number of bytes obtained by system. Sum of all system allocations
shown as byte
process_cpu_seconds_total
(counter)
Total user and system CPU time spent in seconds
shown as request
process_max_fds
(gauge)
Maximum number of open file descriptors
shown as request
process_open_fds
(gauge)
Number of open file descriptors
shown as request
process_resident_memory_bytes
(gauge)
Resident memory size in bytes
shown as byte
process_start_time_seconds
(gauge)
Start time of the process since unix epoch in seconds
shown as request
process_virtual_memory_bytes
(gauge)
Virtual memory size in bytes
shown as byte

Events

The Gitlab Runner check does not include any events at this time.

Service Checks

The Gitlab Runner check currently provides a service check to ensure that the Runner can talk to the Gitlab master and another one to ensure that the local Prometheus endpoint is available.

Troubleshooting

Need help? Contact Datadog Support.

Further Reading

Learn more about infrastructure monitoring and all our integrations on our blog.