Amazon ElastiCache

Default ElastiCache Memcached dashboard

Overview

See Monitoring ElastiCache performance metrics with Redis or Memcached for information about key performance metrics, how to collect them, and how Coursera monitors ElastiCache using Datadog.

Setup

If you haven't already, set up the Amazon Web Services integration first.

Installation without the Datadog Agent

  1. On the AWS integration page, make sure ElastiCache is enabled under the Metric Collection tab.

  2. Add the following permissions to your Datadog IAM policy so that Amazon ElastiCache metrics can be collected. For more information, see the ElastiCache policies on the AWS website.

    | AWS permission | Description |
    | --------------------------------- | ----------------------------------------------------------------------- |
    | elasticache:DescribeCacheClusters | Lists and describes cache clusters, to add tags and additional metrics. |
    | elasticache:ListTagsForResource | Lists the custom tags of a cluster, to add custom tags. |
    | elasticache:DescribeEvents | Adds events for snapshots and maintenance. |

  3. Install the Amazon ElastiCache integration in Datadog.
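
The permissions from step 2 can be expressed as an IAM policy document. The following is a minimal sketch, not an official Datadog policy: the function name `build_policy_document` is hypothetical, and `"Resource": "*"` is an assumption that you may want to narrow. It only produces the JSON; attaching it to your Datadog role is left to whatever tooling you already use.

```python
import json

# The three permissions listed in the table above; nothing else is granted.
ELASTICACHE_ACTIONS = [
    "elasticache:DescribeCacheClusters",
    "elasticache:ListTagsForResource",
    "elasticache:DescribeEvents",
]


def build_policy_document(actions=ELASTICACHE_ACTIONS):
    """Return an IAM policy document (as a dict) that allows the given actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(actions),
                # Assumption: applies to all resources; restrict if needed.
                "Resource": "*",
            }
        ],
    }


if __name__ == "__main__":
    # Print the JSON so it can be pasted into an existing policy.
    print(json.dumps(build_policy_document(), indent=2))
```

If your Datadog role already has a policy, merge the three actions into its existing statement rather than creating a second policy.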

Installation with the Datadog Agent (recommended)

Collecting native metrics with the Agent

The following diagram shows how Datadog collects metrics directly from CloudWatch with the native ElastiCache integration, and how it can additionally collect native metrics directly from the backend technology: Redis or Memcached. Collecting directly from the backend gives you access to a larger number of important metrics, at a higher resolution.

ElastiCache, Redis, and Memcached integrations

How it works

Because the Agent metrics are tied to the EC2 instance where the Agent runs, and not to the actual ElastiCache instance, you need to use the cacheclusterid tag to connect all the metrics together. Once the Agent is configured with the same tags as the ElastiCache instance, combining Redis/Memcached metrics with ElastiCache metrics is straightforward.

Step by step

Because the Agent does not run on an actual ElastiCache instance but on a remote machine, the key to setting up this integration correctly is telling the Agent where to collect the metrics from.

Gather the connection details for your ElastiCache instance

First, go to the AWS console, open the ElastiCache section, and then open the Cache Clusters tab to find the cluster you want to monitor. You should see something like the following:

ElastiCache clusters in the AWS console

Then click the “node” link to access the URL of your endpoint:

Node link in the AWS console

Write down the endpoint URL (for example: replica-001.xxxx.use1.cache.amazonaws.com) and the cacheclusterid (for example: replica-001). You need these values to configure the Agent and to create graphs and dashboards.
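
In the example above, the cacheclusterid matches the first DNS label of the endpoint. The sketch below relies on that observation to derive one value from the other. This is an assumption based only on the example, and the helper name `cache_cluster_id_from_endpoint` is hypothetical, so check the result against the ID shown in the AWS console before using it as a tag.

```python
def cache_cluster_id_from_endpoint(endpoint):
    """Return the first DNS label of an endpoint, ignoring any ':port' suffix.

    Assumption: the endpoint starts with the cache cluster ID, as in
    replica-001.xxxx.use1.cache.amazonaws.com -> replica-001.
    """
    host = endpoint.strip().split(":", 1)[0]
    label = host.split(".", 1)[0]
    if not label:
        raise ValueError(f"cannot derive a cluster ID from {endpoint!r}")
    return label


if __name__ == "__main__":
    print(cache_cluster_id_from_endpoint("replica-001.xxxx.use1.cache.amazonaws.com"))
```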

Configure the Agent

The Redis/Memcached integrations support tagging individual cache instances. Originally designed to allow monitoring of multiple instances on the same machine, these tags can also be used to filter and group metrics. The following is an example configuration for ElastiCache with Redis using redisdb.yaml. For more information about where this file is stored on your platform, see the Agent configuration directory.

init_config:

instances:
    # Endpoint URL from the AWS console
    - host: replica-001.xxxx.use1.cache.amazonaws.com
      port: 6379
      # Cache cluster ID from the AWS console
      tags:
          - cacheclusterid:replica-001

Then restart the Agent: sudo /etc/init.d/datadog-agent restart (on Linux).
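
When several nodes need the same treatment, the configuration above can be generated rather than typed by hand. The following is a minimal sketch: `render_instance_config` is a hypothetical helper, and the Memcached branch assumes that the Memcached check addresses the node with a `url` key on port 11211, which is not shown in this document and should be verified against the Memcached integration's example configuration.

```python
# Default ports per engine; 6379 is the port used in the redisdb.yaml example.
# Assumption: 11211 is the Memcached port of your cluster.
DEFAULT_PORTS = {"redis": 6379, "memcached": 11211}


def render_instance_config(endpoint, cache_cluster_id, engine="redis"):
    """Return the text of an Agent check configuration for one ElastiCache node."""
    if engine not in DEFAULT_PORTS:
        raise ValueError(f"unsupported engine: {engine}")
    # Assumption: the Redis check uses `host`, the Memcached check uses `url`.
    host_key = "host" if engine == "redis" else "url"
    lines = [
        "init_config:",
        "",
        "instances:",
        f"  - {host_key}: {endpoint}",
        f"    port: {DEFAULT_PORTS[engine]}",
        "    tags:",
        # Same tag that AWS attaches to the ElastiCache metrics.
        f"      - cacheclusterid:{cache_cluster_id}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_instance_config(
        "replica-001.xxxx.use1.cache.amazonaws.com", "replica-001"))
```

Keeping the tag value identical to the ID shown in the AWS console is what allows the two metric sources to be joined later.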

Visualize the metrics together

After a few minutes, the ElastiCache metrics and the Redis or Memcached metrics are available in Datadog for graphing, monitoring, and so on.

The following example shows a graph configured to combine the cache hit metrics from ElastiCache with the native latency metrics from Redis, using the same cacheclusterid tag, replica-001.

ElastiCache and cache metrics
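
The graph relies on both queries being scoped to the same cacheclusterid tag. As a rough sketch of what those two queries might look like as text, the helper below builds them from a cluster ID. The helper names are hypothetical, and `redis.info.latency_ms` is assumed to be the native Redis latency metric reported by the Agent; substitute whichever metric you actually graph.

```python
def scoped_query(metric, cache_cluster_id, aggregator="avg"):
    """Build a metric query restricted to one cache cluster by its tag."""
    return f"{aggregator}:{metric}{{cacheclusterid:{cache_cluster_id}}}"


def combined_queries(cache_cluster_id):
    """Return the two queries for the combined graph, sharing the same tag."""
    return [
        # CloudWatch-sourced metric from the ElastiCache integration.
        scoped_query("aws.elasticache.cache_hits", cache_cluster_id, "sum"),
        # Assumption: native latency metric collected by the Agent's Redis check.
        scoped_query("redis.info.latency_ms", cache_cluster_id),
    ]


if __name__ == "__main__":
    for query in combined_queries("replica-001"):
        print(query)
```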

Data collected

Metrics

aws.elasticache.active_defrag_hits
(gauge)
Redis - The number of value reallocations per minute performed by the active defragmentation process.
aws.elasticache.bytes_read_into_memcached
(count)
Memcached - The number of bytes that have been read from the network by the cache node.
Shown as byte
aws.elasticache.bytes_used_for_cache
(gauge)
Redis - The total number of bytes allocated by Redis.
Shown as byte
aws.elasticache.bytes_used_for_cache_items
(gauge)
Memcached - The number of bytes used to store cache items.
Shown as byte
aws.elasticache.bytes_used_for_hash
(gauge)
Memcached - The number of bytes currently used by hash tables.
Shown as byte
aws.elasticache.bytes_written_out_from_memcached
(count)
Memcached - The number of bytes that have been written to the network by the cache node.
Shown as byte
aws.elasticache.cache_hit_rate
(gauge)
Redis - Indicates the usage efficiency of the Redis instance.
Shown as percent
aws.elasticache.cache_hits
(count)
Redis - The number of successful key lookups.
Shown as hit
aws.elasticache.cache_misses
(count)
Redis - The number of unsuccessful key lookups.
Shown as miss
aws.elasticache.cas_badval
(count)
Memcached - The number of CAS (check and set) requests the cache has received where the CAS value did not match the CAS value stored.
Shown as request
aws.elasticache.cas_hits
(count)
Memcached - The number of CAS requests the cache has received where the requested key was found and the CAS value matched.
Shown as hit
aws.elasticache.cas_misses
(count)
Memcached - The number of CAS requests the cache has received where the key requested was not found.
Shown as miss
aws.elasticache.cluster_count
(count)
The number of ElastiCache clusters.
aws.elasticache.cmd_config_get
(count)
Memcached - The cumulative number of config get requests.
Shown as get
aws.elasticache.cmd_config_set
(count)
Memcached - The cumulative number of config set requests.
Shown as set
aws.elasticache.cmd_flush
(count)
Memcached - The number of flush commands the cache has received.
Shown as flush
aws.elasticache.cmd_get
(count)
Memcached - The number of get commands the cache has received.
Shown as get
aws.elasticache.cmd_set
(count)
Memcached - The number of set commands the cache has received.
Shown as set
aws.elasticache.cmd_touch
(count)
Memcached - The cumulative number of touch requests.
Shown as request
aws.elasticache.cpucredit_balance
(gauge)
The number of earned CPU credits that an instance has accrued since it was launched or started.
Shown as unit
aws.elasticache.cpucredit_usage
(gauge)
The number of CPU credits spent by the instance for CPU utilization.
Shown as unit
aws.elasticache.cpuutilization
(gauge)
The percentage of CPU utilization for the server.
Shown as percent
aws.elasticache.curr_config
(gauge)
Memcached - The current number of configurations stored.
aws.elasticache.curr_connections
(gauge)
Redis - The number of client connections, excluding connections from read replicas. Memcached - A count of the number of connections connected to the cache at an instant in time.
Shown as connection
aws.elasticache.curr_items
(gauge)
Redis - The number of items in the cache. This is derived from the Redis keyspace statistic, summing all of the keys in the entire keyspace. Memcached - A count of the number of items currently stored in the cache.
Shown as item
aws.elasticache.database_memory_usage_percentage
(gauge)
Redis - The percentage of the memory available for the cluster that is in use.
Shown as percent
aws.elasticache.db_0average_ttl
(gauge)
Redis - Exposes avg_ttl of DB0 from the keyspace statistic of the Redis INFO command.
Shown as millisecond
aws.elasticache.decr_hits
(count)
Memcached - The number of decrement requests the cache has received where the requested key was found.
Shown as hit
aws.elasticache.decr_misses
(count)
Memcached - The number of decrement requests the cache has received where the requested key was not found.
Shown as miss
aws.elasticache.delete_hits
(count)
Memcached - The number of delete requests the cache has received where the requested key was found.
Shown as hit
aws.elasticache.delete_misses
(count)
Memcached - The number of delete requests the cache has received where the requested key was not found.
Shown as miss
aws.elasticache.engine_cpuutilization
(gauge)
The percentage of CPU utilization for the Redis process.
Shown as percent
aws.elasticache.eval_based_cmds
(count)
Redis - The total number of commands for eval-based commands.
Shown as command
aws.elasticache.eval_based_cmds_latency
(gauge)
Redis - The latency of eval-based commands.
Shown as microsecond
aws.elasticache.evicted_unfetched
(count)
Memcached - The number of valid items evicted from the least recently used cache (LRU) which were never touched after being set.
Shown as item
aws.elasticache.evictions
(count)
Redis - The number of keys that have been evicted due to the maxmemory limit. Memcached - The number of non-expired items the cache evicted to allow space for new writes.
Shown as eviction
aws.elasticache.expired_unfetched
(count)
Memcached - The number of expired items reclaimed from the LRU which were never touched after being set.
Shown as item
aws.elasticache.freeable_memory
(gauge)
The amount of free memory available on the host.
Shown as byte
aws.elasticache.geo_spatial_based_cmds
(count)
Redis - The total number of geo spatial based commands.
Shown as command
aws.elasticache.get_hits
(count)
Memcached - The number of get requests the cache has received where the key requested was found.
Shown as hit
aws.elasticache.get_misses
(count)
Memcached - The number of get requests the cache has received where the key requested was not found.
Shown as miss
aws.elasticache.get_type_cmds
(count)
Redis - The total number of read-only type commands. This is derived from the Redis OSS commandstats statistic by summing all of the read-only type commands (get, hget, scard, lrange, and so on).
Shown as command
aws.elasticache.get_type_cmds_latency
(gauge)
Redis - The latency of read commands.
Shown as microsecond
aws.elasticache.hash_based_cmds
(count)
Redis - The total number of commands that are hash-based. This is derived from the Redis commandstats statistic by summing all of the commands that act upon one or more hashes.
Shown as command
aws.elasticache.hash_based_cmds_latency
(gauge)
Redis - The latency of hash-based commands.
Shown as microsecond
aws.elasticache.hyper_log_log_based_cmds
(count)
Redis - The total number of HyperLogLog based commands. This is derived from the Redis commandstats statistic by summing all of the pf type of commands (pfadd, pfcount, pfmerge).
Shown as command
aws.elasticache.incr_hits
(count)
Memcached - The number of increment requests the cache has received where the key requested was found.
Shown as hit
aws.elasticache.incr_misses
(count)
Memcached - The number of increment requests the cache has received where the key requested was not found.
Shown as miss
aws.elasticache.is_master
(gauge)
Redis - Returns 1 if the node is master, 0 otherwise.
aws.elasticache.key_based_cmds
(count)
Redis - The total number of commands that are key-based. This is derived from the Redis commandstats statistic by summing all of the commands that act upon one or more keys.
Shown as command
aws.elasticache.key_based_cmds_latency
(gauge)
Redis - The latency of key-based commands.
Shown as microsecond
aws.elasticache.list_based_cmds
(count)
Redis - The total number of commands that are list-based. This is derived from the Redis commandstats statistic by summing all of the commands that act upon one or more lists.
Shown as command
aws.elasticache.master_link_health_status
(gauge)
Redis - A value of 0 indicates that data in the ElastiCache primary node is not in sync with Redis on EC2. A value of 1 indicates that the data is in sync.
aws.elasticache.memory_fragmentation_ratio
(gauge)
Redis - Indicates the efficiency in the allocation of memory of the Redis engine.
aws.elasticache.network_bytes_in
(count)
The number of bytes the host has read from the network.
Shown as byte
aws.elasticache.network_bytes_out
(count)
The number of bytes the host has written to the network.
Shown as byte
aws.elasticache.network_packets_in
(count)
The number of packets received on all network interfaces by the instance.
Shown as packet
aws.elasticache.network_packets_out
(count)
The number of packets sent out on all network interfaces by the instance.
Shown as packet
aws.elasticache.new_connections
(count)
Redis - The total number of connections that have been accepted by the server during this period. Memcached - The number of new connections the cache has received. This is derived from the memcached total_connections statistic by recording the change in total_connections across a period of time. This will always be at least 1, due to a connection reserved for ElastiCache.
Shown as connection
aws.elasticache.new_items
(count)
Memcached - The number of new items the cache has stored. This is derived from the memcached total_items statistic by recording the change in total_items across a period of time.
Shown as item
aws.elasticache.node_count
(count)
The number of ElastiCache nodes.
Shown as node
aws.elasticache.reclaimed
(count)
Redis - The total number of key expiration events. Memcached - The number of expired items the cache evicted to allow space for new writes.
aws.elasticache.replication_bytes
(gauge)
Redis - For primaries with attached replicas, ReplicationBytes reports the number of bytes that the primary is sending to all of its replicas. This metric is representative of the write load on the replication group. For replicas and standalone primaries, ReplicationBytes is always 0.
Shown as byte
aws.elasticache.replication_lag
(gauge)
Redis - This metric is only applicable for a cache node running as a read replica. It represents how far behind, in seconds, the replica is in applying changes from the primary cache cluster.
Shown as second
aws.elasticache.save_in_progress
(gauge)
Redis - This binary metric returns 1 whenever a background save (forked or forkless) is in progress, and 0 otherwise. A background save process is typically used during snapshots and syncs. These operations can cause degraded performance. Using the SaveInProgress metric, you can diagnose whether or not degraded performance was caused by a background save process.
aws.elasticache.set_based_cmds
(count)
Redis - The total number of commands that are set-based. This is derived from the Redis commandstats statistic by summing all of the commands that act upon one or more sets.
Shown as command
aws.elasticache.set_based_cmds_latency
(gauge)
Redis - The latency of set-based commands.
Shown as microsecond
aws.elasticache.set_type_cmds
(count)
Redis - The total number of write types of commands. This is derived from the Redis OSS commandstats statistic by summing all of the mutative types of commands that operate on data (set, hset, sadd, lpop, and so on).
Shown as command
aws.elasticache.set_type_cmds_latency
(gauge)
Redis - The latency of write commands.
Shown as microsecond
aws.elasticache.slabs_moved
(count)
Memcached - The total number of slab pages that have been moved.
Shown as page
aws.elasticache.sorted_set_based_cmds
(count)
Redis - The total number of commands that are sorted set-based. This is derived from the Redis commandstats statistic by summing all of the commands that act upon one or more sorted sets.
Shown as command
aws.elasticache.sorted_set_based_cmds_latency
(gauge)
Redis - The latency of sorted set-based commands.
Shown as microsecond
aws.elasticache.stream_based_cmds
(count)
Redis - The total number of commands that are stream-based.
Shown as command
aws.elasticache.stream_based_cmds_latency
(gauge)
Redis - The latency of stream-based commands.
Shown as microsecond
aws.elasticache.string_based_cmds
(count)
Redis - The total number of commands that are string-based. This is derived from the Redis commandstats statistic by summing all of the commands that act upon one or more strings.
Shown as command
aws.elasticache.string_based_cmds_latency
(gauge)
Redis - The latency of string-based commands.
Shown as microsecond
aws.elasticache.swap_usage
(gauge)
The amount of swap used on the host.
Shown as byte
aws.elasticache.touch_hits
(count)
Memcached - The number of keys that have been touched and were given a new expiration time.
Shown as hit
aws.elasticache.touch_misses
(count)
Memcached - The number of items that have been touched, but were not found.
Shown as miss
aws.elasticache.unused_memory
(gauge)
Memcached - The amount of unused memory the cache can use to store items. This is derived from the memcached statistics limit_maxbytes and bytes by subtracting bytes from limit_maxbytes.
Shown as byte
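
Some of the metrics above are derived from other statistics. The following sketch works through two of them with made-up numbers. The unused memory formula comes from the description of aws.elasticache.unused_memory above. The hit rate formula, hits divided by hits plus misses and expressed as a percentage, is an assumption about how aws.elasticache.cache_hit_rate is computed, not something this page states.

```python
def unused_memory(limit_maxbytes, bytes_used):
    """Unused memory: the memcached `bytes` statistic subtracted from `limit_maxbytes`."""
    return limit_maxbytes - bytes_used


def cache_hit_rate(hits, misses):
    """Hit rate as a percentage; None when there were no lookups at all.

    Assumption: rate = hits / (hits + misses) * 100.
    """
    total = hits + misses
    if total == 0:
        return None
    return 100.0 * hits / total


if __name__ == "__main__":
    # Illustrative values only, not measurements.
    print(unused_memory(limit_maxbytes=1024, bytes_used=256))
    print(cache_hit_rate(hits=80, misses=20))
```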

Each metric retrieved from AWS is assigned the same tags that appear in the AWS console, including but not limited to the host name and security groups.

Note: Metrics for ElastiCache Serverless deployments are reported under the same aws.elasticache namespace. These metrics can be distinguished by their tags:

  • Existing ElastiCache metrics for self-designed caches use the cacheclusterid tag to identify an individual cache.
  • Serverless cache metrics use the clusterid tag to identify an individual cache.
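
Because the two deployment types identify a cache with different tag keys, anything that filters by cache has to pick the right key. A minimal sketch of that choice follows; the helper name `cache_scope` and the cache name `my-serverless-cache` are hypothetical.

```python
def cache_scope(cache_id, serverless=False):
    """Return the tag filter that identifies one cache.

    Serverless caches are tagged with `clusterid`; self-designed
    caches are tagged with `cacheclusterid`.
    """
    tag_key = "clusterid" if serverless else "cacheclusterid"
    return f"{tag_key}:{cache_id}"


if __name__ == "__main__":
    print(cache_scope("replica-001"))
    print(cache_scope("my-serverless-cache", serverless=True))
```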

Events

The Amazon ElastiCache integration includes events for clusters, cache security groups, and cache parameter groups. See the following example events:

Amazon ElastiCache events

Service checks

The Amazon ElastiCache integration does not include any service checks.

Troubleshooting

Need help? Contact Datadog support.
